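Four small demos implemented with the MapReduce framework: wordcount, an inverted index that records term frequencies, an item-based recommendation algorithm (itemCF), and a user-based recommendation algorithm (userCF).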
map input: key: LongWritable, the starting offset of each line; value: Text, userID,itemID,score
map output: key: Text, itemID; value: Text, userID_score
reduce input: key: Text, itemID; value: Text, <userID1_score, userID2_score, userID3_score, ...>
reduce output: key: Text, itemID; value: Text, userID1_score,userID2_score,userID3_score
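The map/reduce contract above can be sketched in plain Java without any Hadoop dependency. This is only an illustration under stated assumptions: the class name `RatingMatrixSketch` and the `run` helper are hypothetical, and `run` stands in for Hadoop's shuffle phase by grouping map output in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RatingMatrixSketch {
    // map: "userID,itemID,score" -> (itemID, "userID_score")
    static Map.Entry<String, String> map(String line) {
        String[] fields = line.split(",");
        return Map.entry(fields[1], fields[0] + "_" + fields[2]);
    }

    // reduce: join every "userID_score" collected for one itemID
    static String reduce(List<String> values) {
        return String.join(",", values);
    }

    // Hypothetical local driver: groups map output by key (a stand-in
    // for the shuffle phase), then applies reduce to each group.
    static Map<String, String> run(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, String> kv = map(line);
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        Map<String, String> rows = new TreeMap<>();
        grouped.forEach((item, values) -> rows.put(item, reduce(values)));
        return rows;
    }

    public static void main(String[] args) {
        run(List.of("A,1,5", "B,1,3", "A,2,4"))
            .forEach((item, row) -> System.out.println(item + "\t" + row));
    }
}
```

In a real job the order of values reaching the reducer is not guaranteed, so the order of cells within a row may differ from run to run.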
The job below also needs the rating matrix as a cache, which is loaded in the setup method.
map input: key: LongWritable, the starting offset of each line; value: Text, itemID userID1_score,userID2_score,userID3_score
map output: key: Text, itemID; value: Text, itemID1_sim
reduce input: key: Text, itemID; value: Text, <itemID1_sim, itemID3_sim, ...>
reduce output: key: Text, itemID; value: Text, itemID1_sim,itemID3_sim,itemID5_sim
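A minimal sketch of how the map side of this job might compute similarities. The similarity measure is not named above, so cosine similarity is an assumption here, as are the tab between key and value, the two-decimal formatting, and the hypothetical names `ItemSimilaritySketch`, `parseRow`, and `cosine`. The `cache` argument plays the role of the rating matrix loaded in the setup method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class ItemSimilaritySketch {
    // Parses "userID1_score,userID2_score" into {userID -> score}.
    static Map<String, Double> parseRow(String packed) {
        Map<String, Double> row = new TreeMap<>();
        for (String cell : packed.split(",")) {
            int cut = cell.lastIndexOf('_');
            row.put(cell.substring(0, cut), Double.parseDouble(cell.substring(cut + 1)));
        }
        return row;
    }

    // Cosine similarity of two sparse rows (assumed measure).
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
        }
        for (double v : b.values()) normB += v * v;
        if (normA == 0 || normB == 0) return 0.0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // map: compares one input row with every cached row and returns the
    // "otherItemID_sim" values that would be emitted under the row's itemID.
    static List<String> map(String line, Map<String, Map<String, Double>> cache) {
        Map<String, Double> row = parseRow(line.split("\t")[1]);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Map<String, Double>> other : cache.entrySet()) {
            double sim = cosine(row, other.getValue());
            if (sim > 0) {
                out.add(other.getKey() + "_" + String.format(Locale.ROOT, "%.2f", sim));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> cache = new TreeMap<>();
        cache.put("1", parseRow("A_3,B_4"));
        cache.put("2", parseRow("A_3"));
        System.out.println(map("1\tA_3,B_4", cache));
    }
}
```

This sketch keeps an item's similarity with itself and drops zero similarities; both are design choices, not something the notes above specify. The reducer only has to join the emitted values with commas.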
map input: key: LongWritable, the starting offset of each line; value: Text, itemID userID1_score,userID2_score,userID3_score
map output: key: Text, userID; value: Text, itemID_score
reduce input: key: Text, userID; value: Text, <itemID1_score, itemID3_score, ...>
reduce output: key: Text, userID; value: Text, itemID1_score,itemID3_score,itemID2_score
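The transposition described above amounts to re-keying every cell by userID. A hedged sketch in plain Java follows; the class name `TransposeSketch`, the in-memory `run` driver, and the tab between key and value are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TransposeSketch {
    // map: "itemID<TAB>userID1_score,..." -> one (userID, "itemID_score") pair per cell
    static List<Map.Entry<String, String>> map(String line) {
        String[] kv = line.split("\t");
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String cell : kv[1].split(",")) {
            int cut = cell.lastIndexOf('_');
            String user = cell.substring(0, cut);
            String score = cell.substring(cut + 1);
            out.add(Map.entry(user, kv[0] + "_" + score));
        }
        return out;
    }

    // Hypothetical local driver: groups by userID (stand-in for the shuffle),
    // then joins each group the way the reducer would.
    static Map<String, String> run(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, String> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, String> rows = new TreeMap<>();
        grouped.forEach((user, cells) -> rows.put(user, String.join(",", cells)));
        return rows;
    }

    public static void main(String[] args) {
        run(List.of("1\tA_5,B_3", "2\tA_4"))
            .forEach((user, row) -> System.out.println(user + "\t" + row));
    }
}
```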
At this point, the transposed rating matrix is used as the cache, loaded in the setup method.
map input: key: LongWritable, the starting offset of each line; value: Text, itemID itemID1_sim,itemID3_sim,itemID5_sim
map output: key: Text, itemID; value: Text, userID_score
reduce input: key: Text, itemID; value: Text, <userID1_score, userID2_score, ...>
reduce output: key: Text, itemID; value: Text, userID1_score,userID2_score,userID3_score
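One plausible reading of this job is a matrix product: each similarity row is multiplied against every user's cached rating row, giving a predicted score per (itemID, userID) pair. The sketch below assumes that interpretation; the names `PredictSketch` and `parseRow`, the tab separator, and the two-decimal formatting are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class PredictSketch {
    // Parses "id1_value,id2_value" into {id -> value}.
    static Map<String, Double> parseRow(String packed) {
        Map<String, Double> row = new TreeMap<>();
        for (String cell : packed.split(",")) {
            int cut = cell.lastIndexOf('_');
            row.put(cell.substring(0, cut), Double.parseDouble(cell.substring(cut + 1)));
        }
        return row;
    }

    // map: input is one similarity row "itemID<TAB>itemID1_sim,...";
    // usersCache is the transposed rating matrix {userID -> {itemID -> score}}.
    // Returns the "userID_score" values that would be emitted under itemID.
    static List<String> map(String line, Map<String, Map<String, Double>> usersCache) {
        Map<String, Double> sims = parseRow(line.split("\t")[1]);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Map<String, Double>> user : usersCache.entrySet()) {
            double score = 0;
            for (Map.Entry<String, Double> sim : sims.entrySet()) {
                Double rating = user.getValue().get(sim.getKey());
                if (rating != null) score += sim.getValue() * rating; // dot product term
            }
            if (score != 0) {
                out.add(user.getKey() + "_" + String.format(Locale.ROOT, "%.2f", score));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> usersCache = new TreeMap<>();
        usersCache.put("A", parseRow("1_4,2_2"));
        usersCache.put("B", parseRow("2_6"));
        System.out.println(map("1\t1_1.0,2_0.5", usersCache));
    }
}
```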
At this point, the rating matrix is used as the cache, loaded in the setup method.
map input: key: LongWritable, the starting offset of each line; value: Text, itemID userID1_score,userID2_score,userID3_score
map output: key: Text, userID; value: Text, itemID_score
reduce input: key: Text, userID; value: Text, <itemID1_score, itemID3_score, ...>
reduce output: key: Text, userID; value: Text, itemID1_score,itemID3_score,itemID5_score
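The notes do not say what the cached rating matrix is used for in this last job. A common choice, assumed here, is to drop items the user has already rated while re-keying the predictions by userID. The class name `FilterRatedSketch` and the `ratedBy` structure (itemID mapped to the set of users who rated it, derived from the cached rating matrix) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FilterRatedSketch {
    // map: input is one prediction row "itemID<TAB>userID1_score,...".
    // Emits (userID, "itemID_score") unless the cache shows that the
    // user already rated this item (assumed filtering rule).
    static List<Map.Entry<String, String>> map(String line, Map<String, Set<String>> ratedBy) {
        String[] kv = line.split("\t");
        String item = kv[0];
        Set<String> raters = ratedBy.getOrDefault(item, Set.of());
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String cell : kv[1].split(",")) {
            int cut = cell.lastIndexOf('_');
            String user = cell.substring(0, cut);
            if (raters.contains(user)) continue; // already rated: not a recommendation
            out.add(Map.entry(user, item + "_" + cell.substring(cut + 1)));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> ratedBy = Map.of("2", Set.of("A"));
        System.out.println(map("2\tA_5.00,B_3.00", ratedBy));
    }
}
```

The reducer then joins each user's remaining "itemID_score" values with commas, giving one recommendation list per userID.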