Modeled on the QQ common-friend recommendation feature from Tencent's big-data stack, let's implement it with MapReduce.
Test data: the letter before the colon is a QQ user; the letters after it are that user's QQ friends.
A:B,C,D,F,E,O
B:A,C,E,K
C:F,A,D,I
D:A,E,F,L
E:B,C,D,M,L
F:A,B,C,D,E,O,M
G:A,C,D,E,F
H:A,C,D,E,O
I:A,O
J:B,O
K:A,C,D
L:D,E,F
M:E,F,G
O:A,H,I,J
The implementation takes two steps.
Step 1: the first map and reduce phase
1. In the map phase, emit each of the user's friends as the key and the user itself as the value:
For the first input line: B->A C->A D->A F->A E->A O->A
For the second line: A->B C->B E->B K->B
The framework then groups records by key and hands each group to a reduce task,
e.g. the group for key B arrives as: B->A B->C B->D ....
Aggregated, that group reads: B->A C D ... (all users who count B as a friend).
In the reducer, combine every pair of those users and output the shared friend:
A-C	B
A-D	B
C-D	B
.....
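The step-1 logic above can be sketched without Hadoop as plain Java. This is a minimal local simulation (the class name Step1Sketch is made up for illustration; the input lines are the first two rows of the test data):

```java
import java.util.*;

public class Step1Sketch {
    // Invert "user:friend,friend,..." lines into friend -> list of users who have that friend
    static Map<String, List<String>> invert(List<String> lines) {
        Map<String, List<String>> friendToUsers = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split(":");
            for (String friend : parts[1].split(",")) {
                friendToUsers.computeIfAbsent(friend, k -> new ArrayList<>()).add(parts[0]);
            }
        }
        // Mimic the reducer's sort so every pair comes out in a canonical order
        for (List<String> users : friendToUsers.values()) Collections.sort(users);
        return friendToUsers;
    }

    // The reducer's double loop: emit "U1-U2 <TAB> friend" for every user pair sharing that friend
    static List<String> pairs(Map<String, List<String>> friendToUsers) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : friendToUsers.entrySet()) {
            List<String> u = e.getValue();
            for (int i = 0; i < u.size() - 1; i++)
                for (int j = i + 1; j < u.size(); j++)
                    out.add(u.get(i) + "-" + u.get(j) + "\t" + e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        // With only the first two input lines, A and B share friends C and E,
        // so the pairs emitted are A-B <TAB> C and A-B <TAB> E
        List<String> lines = Arrays.asList("A:B,C,D,F,E,O", "B:A,C,E,K");
        pairs(invert(lines)).forEach(System.out::println);
    }
}
```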
Code implementation
package com.xuyu.friends;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;

public class CommonFriendsOne {

    public static class CommonFriendsMapper extends Mapper<LongWritable, Text, Text, Text> {
        Text k = new Text();
        Text v = new Text();

        // Input:  A:B,C,D,F,E,O
        // Output: B->A  C->A  D->A ...
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String[] userAndFriends = value.toString().split(":");
            String user = userAndFriends[0];
            String[] friends = userAndFriends[1].split(",");
            v.set(user);
            for (String f : friends) {
                k.set(f); // key: friend, value: user
                context.write(k, v);
            }
        }
    }

    public static class CommonFriendsOneReduce extends Reducer<Text, Text, Text, Text> {
        // One group: B->A E F J ...
        // Another:   C->B F E J ...
        // The users must be sorted so each pair is emitted in a canonical order
        @Override
        protected void reduce(Text friend, Iterable<Text> users, Context context) throws IOException, InterruptedException {
            ArrayList<String> userList = new ArrayList<String>();
            for (Text user : users) {
                userList.add(user.toString());
            }
            // sort
            Collections.sort(userList);
            // combine every pair of users and emit "user1-user2 -> friend"
            for (int i = 0; i < userList.size() - 1; i++) {
                for (int j = i + 1; j < userList.size(); j++) {
                    context.write(new Text(userList.get(i) + "-" + userList.get(j)), friend);
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        // 1. let Hadoop locate the jar containing this class
        job.setJarByClass(CommonFriendsOne.class);
        // 2. set the mapper and reducer classes for this job
        job.setMapperClass(CommonFriendsMapper.class);
        job.setReducerClass(CommonFriendsOneReduce.class);
        // 3. set the key/value types produced by the mapper
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        // 4. set the key/value types produced by the reducer
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(job, new Path("F:\\mrdata\\friends\\input"));
        FileOutputFormat.setOutputPath(job, new Path("F:\\mrdata\\friends\\output"));
        boolean res = job.waitForCompletion(true);
        System.exit(res ? 0 : -1);
    }
}
Output
The part-r-00000 file contains:
B-C	A
B-D	A
B-F	A
B-G	A
B-H	A
B-I	A
B-K	A
B-O	A
C-D	A
C-F	A
C-G	A
C-H	A
C-I	A
C-K	A
C-O	A
D-F	A
D-G	A
D-H	A
D-I	A
D-K	A
D-O	A
F-G	A
F-H	A
F-I	A
F-K	A
F-O	A
G-H	A
G-I	A
G-K	A
G-O	A
H-I	A
H-K	A
H-O	A
I-K	A
I-O	A
K-O	A
A-E	B
A-F	B
A-J	B
E-F	B
E-J	B
F-J	B
A-B	C
A-E	C
A-F	C
A-G	C
A-H	C
A-K	C
B-E	C
B-F	C
B-G	C
B-H	C
B-K	C
E-F	C
E-G	C
E-H	C
E-K	C
F-G	C
F-H	C
F-K	C
G-H	C
G-K	C
H-K	C
A-C	D
A-E	D
A-F	D
A-G	D
A-H	D
A-K	D
A-L	D
C-E	D
C-F	D
C-G	D
C-H	D
C-K	D
C-L	D
E-F	D
E-G	D
E-H	D
E-K	D
E-L	D
F-G	D
F-H	D
F-K	D
F-L	D
G-H	D
G-K	D
G-L	D
H-K	D
H-L	D
K-L	D
A-B	E
A-D	E
A-F	E
A-G	E
A-H	E
A-L	E
A-M	E
B-D	E
B-F	E
B-G	E
B-H	E
B-L	E
B-M	E
D-F	E
D-G	E
D-H	E
D-L	E
D-M	E
F-G	E
F-H	E
F-L	E
F-M	E
G-H	E
G-L	E
G-M	E
H-L	E
H-M	E
L-M	E
A-C	F
A-D	F
A-G	F
A-L	F
A-M	F
C-D	F
C-G	F
C-L	F
C-M	F
D-G	F
D-L	F
D-M	F
G-L	F
G-M	F
L-M	F
C-O	I
D-E	L
E-F	M
A-F	O
A-H	O
A-I	O
A-J	O
F-H	O
F-I	O
F-J	O
H-I	O
H-J	O
I-J	O
Step 2: run another MapReduce job over the output file of step 1.
The output should look like this (the first line means users A and B have C and E as common friends):
A-B	[C, E]
A-C	[D, F]
A-D	[E, F]
A-E	[B, C, D]
A-F	[B, C, D, E, O]
A-G	[C, D, E, F]
Code implementation
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;

public class CommonFriendsOne2 {

    public static class CommonFriendsMapper extends Mapper<LongWritable, Text, Text, Text> {
        // Input lines look like "A-B<TAB>C": split on the tab and
        // pass the pair through as the key, the friend as the value
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String[] friendsAndUser = value.toString().split("\t");
            context.write(new Text(friendsAndUser[0]), new Text(friendsAndUser[1]));
        }
    }

    public static class CommonFriendsOneReduce extends Reducer<Text, Text, Text, Text> {
        // Collect all friends shared by one user pair into a sorted list
        @Override
        protected void reduce(Text friends, Iterable<Text> users, Context context) throws IOException, InterruptedException {
            ArrayList<String> userList = new ArrayList<String>();
            for (Text user : users) {
                userList.add(user.toString());
            }
            // sort
            Collections.sort(userList);
            context.write(friends, new Text(userList.toString()));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        // 1. let Hadoop locate the jar containing this class
        job.setJarByClass(CommonFriendsOne2.class);
        // 2. set the mapper and reducer classes for this job
        job.setMapperClass(CommonFriendsMapper.class);
        job.setReducerClass(CommonFriendsOneReduce.class);
        // 3. set the key/value types produced by the mapper
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        // 4. set the key/value types produced by the reducer
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(job, new Path("F:\\mrdata\\friends\\output"));
        FileOutputFormat.setOutputPath(job, new Path("F:\\mrdata\\friends\\output2"));
        boolean res = job.waitForCompletion(true);
        System.exit(res ? 0 : -1);
    }
}
The final output is:
A-B	[C, E]
A-C	[D, F]
A-D	[E, F]
A-E	[B, C, D]
A-F	[B, C, D, E, O]
A-G	[C, D, E, F]
A-H	[C, D, E, O]
A-I	[O]
A-J	[B, O]
A-K	[C, D]
A-L	[D, E, F]
A-M	[E, F]
B-C	[A]
B-D	[A, E]
B-E	[C]
B-F	[A, C, E]
B-G	[A, C, E]
B-H	[A, C, E]
B-I	[A]
B-K	[A, C]
B-L	[E]
B-M	[E]
B-O	[A]
C-D	[A, F]
C-E	[D]
C-F	[A, D]
C-G	[A, D, F]
C-H	[A, D]
C-I	[A]
C-K	[A, D]
C-L	[D, F]
C-M	[F]
C-O	[A, I]
D-E	[L]
D-F	[A, E]
D-G	[A, E, F]
D-H	[A, E]
D-I	[A]
D-K	[A]
D-L	[E, F]
D-M	[E, F]
D-O	[A]
E-F	[B, C, D, M]
E-G	[C, D]
E-H	[C, D]
E-J	[B]
E-K	[C, D]
E-L	[D]
F-G	[A, C, D, E]
F-H	[A, C, D, E, O]
F-I	[A, O]
F-J	[B, O]
F-K	[A, C, D]
F-L	[D, E]
F-M	[E]
F-O	[A]
G-H	[A, C, D, E]
G-I	[A]
G-K	[A, C, D]
G-L	[D, E, F]
G-M	[E, F]
G-O	[A]
H-I	[A, O]
H-J	[O]
H-K	[A, C, D]
H-L	[D, E]
H-M	[E]
H-O	[A]
I-J	[O]
I-K	[A]
I-O	[A]
K-L	[D]
K-O	[A]
L-M	[E, F]
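To sanity-check the two jobs, the whole pipeline can be chained in memory as plain Java. This is a local sketch, not the Hadoop jobs themselves (the class name CommonFriendsLocal is made up for illustration); it runs step 1 and step 2 back to back over the full test data:

```java
import java.util.*;

public class CommonFriendsLocal {
    // Chain both jobs in memory: step 1 inverts user->friends and emits user pairs
    // per shared friend; step 2 groups those pairs and collects their common friends
    static Map<String, List<String>> commonFriends(List<String> lines) {
        Map<String, List<String>> friendToUsers = new TreeMap<>();
        for (String line : lines) {
            String[] p = line.split(":");
            for (String f : p[1].split(","))
                friendToUsers.computeIfAbsent(f, k -> new ArrayList<>()).add(p[0]);
        }
        Map<String, List<String>> pairToFriends = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : friendToUsers.entrySet()) {
            List<String> u = e.getValue();
            Collections.sort(u); // canonical pair order, as in the first reducer
            for (int i = 0; i < u.size() - 1; i++)
                for (int j = i + 1; j < u.size(); j++)
                    pairToFriends.computeIfAbsent(u.get(i) + "-" + u.get(j), k -> new ArrayList<>())
                                 .add(e.getKey());
        }
        pairToFriends.values().forEach(Collections::sort); // as in the second reducer
        return pairToFriends;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList(
            "A:B,C,D,F,E,O", "B:A,C,E,K", "C:F,A,D,I", "D:A,E,F,L", "E:B,C,D,M,L",
            "F:A,B,C,D,E,O,M", "G:A,C,D,E,F", "H:A,C,D,E,O", "I:A,O", "J:B,O",
            "K:A,C,D", "L:D,E,F", "M:E,F,G", "O:A,H,I,J");
        Map<String, List<String>> result = commonFriends(data);
        System.out.println("A-B" + result.get("A-B")); // prints A-B[C, E]
    }
}
```

Its result for each pair matches the job output above, e.g. A-B -> [C, E] and E-F -> [B, C, D, M].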
Copyright @须臾之余 /u/3995125