fkdwne / mapreduce

This project forked from marvelousgirl/mapreduce

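Four small demos implemented with the MapReduce framework: wordcount, an inverted index that carries term frequencies, an item-based recommendation algorithm, and a user-based recommendation algorithm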

Java 100.00%

mapreduce's Introduction

mapreduce

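Four small demos implemented with the MapReduce framework: wordcount, an inverted index that carries term frequencies, an item-based recommendation algorithm (itemCF), and a user-based recommendation algorithm (userCF)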

itemCF steps:

step1: Build the rating matrix from the user behavior list

map input: key: LongWritable, the starting byte offset of each line; value: Text, userID,itemID,score
map output: key: Text, itemID; value: Text, userID_score
reduce input: key: Text, itemID; value: Text, <userID1_score, userID2_score, userID3_score, ...>
reduce output: key: Text, itemID; value: Text, userID1_score,userID2_score,userID3_score
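
The map and reduce roles of step1 can be sketched in plain Java, without any Hadoop classes, to make the data flow concrete. The class name `Step1RatingMatrix`, its method names, and the sample records are assumptions made for illustration and are not taken from the repository; the `TreeMap` grouping only stands in for the shuffle phase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java stand-in for step1; no Hadoop types are used.
class Step1RatingMatrix {

    // Map role: "userID,itemID,score" -> (itemID, "userID_score").
    static String[] map(String line) {
        String[] f = line.trim().split(",");
        return new String[] { f[1], f[0] + "_" + f[2] };
    }

    // Reduce role: join all "userID_score" values of one item with commas.
    static String reduce(List<String> values) {
        return String.join(",", values);
    }

    // Simulates the shuffle: group map output by key, then reduce each group.
    static Map<String, String> run(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] kv = map(line);
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            out.put(e.getKey(), reduce(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up behavior records in the userID,itemID,score layout.
        List<String> behavior = Arrays.asList("A,1,1", "C,3,5", "B,2,3", "A,2,10");
        run(behavior).forEach((item, row) -> System.out.println(item + "\t" + row));
    }
}
```

In this sketch the values of one item keep their input order; a real MapReduce job gives no such ordering guarantee for the values of a key.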

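step2: Using the rating matrix from step1, build the item-to-item similarity matrix; cosine similarity is the similarity measure

In addition, the rating matrix is also loaded as a cache, which is done in the setup method
map input: key: LongWritable, the starting byte offset of each line; value: Text, itemID userID1_score,userID2_score,userID3_score
map output: key: Text, itemID; value: Text, itemID1_sim
reduce input: key: Text, itemID; value: Text, <itemID1_sim,itemID3_sim,...>
reduce output: key: Text, itemID; value: Text, itemID1_sim,itemID3_sim,itemID5_sim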
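
As a rough illustration of step2, the sketch below computes the cosine similarity cos(a, b) = (a · b) / (|a| |b|) between one rating-matrix row and every cached row. It is plain Java rather than a Hadoop Mapper; the class `Step2CosineSimilarity`, the in-memory `cache` map that stands in for what setup would load, the choice to skip zero similarities, and the sample rows are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plain-Java stand-in for the step2 map logic.
class Step2CosineSimilarity {

    // Parses "id1_v1,id2_v2,..." into a sparse vector keyed by ID.
    static Map<String, Double> parseRow(String row) {
        Map<String, Double> v = new LinkedHashMap<>();
        for (String cell : row.split(",")) {
            String[] p = cell.trim().split("_");
            v.put(p[0], Double.parseDouble(p[1]));
        }
        return v;
    }

    // Euclidean length of a sparse vector.
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) sum += x * x;
        return Math.sqrt(sum);
    }

    // Cosine similarity; returns 0 when either vector has zero length.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
        }
        double denom = norm(a) * norm(b);
        return denom == 0.0 ? 0.0 : dot / denom;
    }

    // One map call: compare the input line "itemID<TAB>row" with every cached
    // row (including its own). Zero similarities are skipped, which is an
    // assumption made here to keep the output sparse.
    static Map<String, Double> similarities(String line, Map<String, String> cache) {
        String[] kv = line.trim().split("\\s+", 2);
        Map<String, Double> current = parseRow(kv[1]);
        Map<String, Double> sims = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : cache.entrySet()) {
            double sim = cosine(current, parseRow(e.getValue()));
            if (sim > 0.0) sims.put(e.getKey(), sim);
        }
        return sims;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new LinkedHashMap<>();
        cache.put("1", "A_1,B_2");
        cache.put("2", "A_2,B_4");
        cache.put("3", "C_5");
        System.out.println(similarities("1\tA_1,B_2", cache));
    }
}
```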

step3: Transpose the rating matrix

map input: key: LongWritable, the starting byte offset of each line; value: Text, itemID userID1_score,userID2_score,userID3_score
map output: key: Text, userID; value: Text, itemID_score
reduce input: key: Text, userID; value: Text, <itemID1_score,itemID3_score,...>
reduce output: key: Text, userID; value: Text, itemID1_score,itemID3_score,itemID2_score
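
The transposition in step3 amounts to re-keying every cell by its userID. A minimal plain-Java sketch follows, assuming whitespace between the itemID and its row; the class name `Step3Transpose` and the sample rows are hypothetical, and the `TreeMap` grouping again stands in for the shuffle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java stand-in for step3; no Hadoop types are used.
class Step3Transpose {

    static Map<String, String> run(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            // Assumed line layout: itemID, whitespace, then the row cells.
            String[] kv = line.trim().split("\\s+", 2);
            String itemId = kv[0];
            for (String cell : kv[1].split(",")) {
                String[] p = cell.trim().split("_");
                // Map role: (itemID, userID_score) becomes (userID, itemID_score).
                grouped.computeIfAbsent(p[0], k -> new ArrayList<>())
                       .add(itemId + "_" + p[1]);
            }
        }
        // Reduce role: join the cells of each user into one row.
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            out.put(e.getKey(), String.join(",", e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> matrix = Arrays.asList("1\tA_1", "2\tB_3,A_10", "3\tC_5");
        run(matrix).forEach((user, row) -> System.out.println(user + "\t" + row));
    }
}
```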

step4: Multiply the item-to-item similarity matrix by the transposed rating matrix

Here the transposed rating matrix serves as the cache, which is loaded in the setup method
map input: key: LongWritable, the starting byte offset of each line; value: Text, itemID itemID1_sim,itemID3_sim,itemID5_sim
map output: key: Text, itemID; value: Text, userID_score
reduce input: key: Text, itemID; value: Text, <userID1_score, userID2_score,...>
reduce output: key: Text, itemID; value: Text, userID1_score, userID2_score,userID3_score
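
One way to picture step4: each map call takes one row of the similarity matrix and forms its dot product with every cached column of the transposed rating matrix, giving a score for each (item, user) pair. The plain-Java sketch below follows that reading; `Step4Multiply`, the in-memory `transposedCache`, the skipping of zero results, and the sample values are assumptions, not code from the repository.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java stand-in for the step4 map logic.
class Step4Multiply {

    // Parses "id1_v1,id2_v2,..." into a sparse vector keyed by ID.
    static Map<String, Double> parseRow(String row) {
        Map<String, Double> v = new LinkedHashMap<>();
        for (String cell : row.split(",")) {
            String[] p = cell.trim().split("_");
            v.put(p[0], Double.parseDouble(p[1]));
        }
        return v;
    }

    // One map call. Input: "itemID<TAB>itemID1_sim,itemID3_sim,...".
    // transposedCache maps userID -> "itemID1_score,itemID3_score,...".
    // Returns userID -> score for the item named at the start of the line.
    static Map<String, Double> scoresForItem(String line, Map<String, String> transposedCache) {
        String[] kv = line.trim().split("\\s+", 2);
        Map<String, Double> simRow = parseRow(kv[1]);
        Map<String, Double> scores = new TreeMap<>();
        for (Map.Entry<String, String> user : transposedCache.entrySet()) {
            Map<String, Double> ratings = parseRow(user.getValue());
            double dot = 0.0;
            for (Map.Entry<String, Double> s : simRow.entrySet()) {
                Double r = ratings.get(s.getKey());
                if (r != null) dot += s.getValue() * r;
            }
            // Assumption: users with no overlapping items are not emitted.
            if (dot != 0.0) scores.put(user.getKey(), dot);
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<String, String> transposed = new LinkedHashMap<>();
        transposed.put("A", "1_2,2_4");
        transposed.put("B", "3_5");
        System.out.println(scoresForItem("1\t1_1.0,2_0.5", transposed));
    }
}
```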

step5: Using the rating matrix, drop the items each user has already interacted with

Here the rating matrix serves as the cache, which is loaded in the setup method
map input: key: LongWritable, the starting byte offset of each line; value: Text, itemID userID1_score, userID2_score,userID3_score
map output: key: Text, userID; value: Text, itemID_score
reduce input: key: Text, userID; value: Text, <itemID1_score, itemID3_score,...>
reduce output: key: Text, userID; value: Text, itemID1_score,itemID3_score,itemID5_score
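
The filtering in step5 can be sketched as a lookup against the cached rating matrix: a cell is dropped whenever its user already appears in the rating row of the same item, and the surviving cells are regrouped by userID. This is a plain-Java sketch under stated assumptions; `Step5FilterSeen`, the `ratingCache` map, and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical plain-Java stand-in for step5; no Hadoop types are used.
class Step5FilterSeen {

    // User IDs found in one rating-matrix row "userID1_score,userID2_score,...".
    static Set<String> usersOf(String ratingRow) {
        Set<String> users = new HashSet<>();
        if (ratingRow == null) return users;
        for (String cell : ratingRow.split(",")) {
            users.add(cell.trim().split("_")[0]);
        }
        return users;
    }

    // predictedLines: step4-style rows "itemID<TAB>userID1_score,...".
    // ratingCache: itemID -> "userID1_score,..." (the step1 rating matrix).
    static Map<String, String> run(List<String> predictedLines, Map<String, String> ratingCache) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : predictedLines) {
            String[] kv = line.trim().split("\\s+", 2);
            String itemId = kv[0];
            Set<String> seenBy = usersOf(ratingCache.get(itemId));
            for (String cell : kv[1].split(",")) {
                String[] p = cell.trim().split("_");
                if (seenBy.contains(p[0])) continue; // user already rated this item
                grouped.computeIfAbsent(p[0], k -> new ArrayList<>())
                       .add(itemId + "_" + p[1]);
            }
        }
        // Reduce role: join the remaining candidates of each user.
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            out.put(e.getKey(), String.join(",", e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> ratings = new LinkedHashMap<>();
        ratings.put("1", "A_1");
        ratings.put("2", "B_3");
        List<String> predicted = Arrays.asList("1\tA_4.0,B_2.5", "2\tA_3.0,B_1.0");
        run(predicted, ratings).forEach((user, row) -> System.out.println(user + "\t" + row));
    }
}
```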

userCF:

The logic is the same as itemCF; the difference is that userID is used as the row key

mapreduce's People

Contributors

marvelousgirl
