bigdata.examples

Study examples for big data.

HDFS

  1. WriteFileToHdfs
  2. ReadFileFromHdfs
  3. WriteSequenceFile
  4. ReadSequenceFile

MapReduce

  1. WordCount

    1. WordCountMapper
    2. WordCountReducer
  2. DataClean

    1. DataCleanMapper

    A simple data-cleaning program based on Sogou search records. Every record (line) contains 6 fields:

    1. datetime
    2. userid
    3. search keyword
    4. return order: the rank of the result on the current page
    5. click order: the order in which the user clicked the result on the current page
    6. click url

    The Sogou search records can be found at https://pan.baidu.com/s/1aSvsmIPSRm_ukDQKxruLgQ (extraction code: 3ype)
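
    The README does not state which cleaning rules DataCleanMapper applies. As a minimal sketch of one plausible rule, the check below keeps a line only when it splits into exactly 6 non-blank fields. The class name `DataCleanSketch`, the tab delimiter, and the rule itself are assumptions made for illustration, not taken from the repository.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    class DataCleanSketch {
        // Assumption: fields are tab-separated. The README only states the field count.
        static final String DELIMITER = "\t";
        static final int FIELD_COUNT = 6;

        /** Returns true when the line splits into exactly 6 non-blank fields. */
        static boolean isValid(String line) {
            if (line == null) {
                return false;
            }
            // A negative limit keeps trailing empty fields so they are counted.
            String[] fields = line.split(DELIMITER, -1);
            if (fields.length != FIELD_COUNT) {
                return false;
            }
            for (String field : fields) {
                if (field.trim().isEmpty()) {
                    return false;
                }
            }
            return true;
        }

        /** Keeps only the valid lines, preserving input order. */
        static List<String> clean(List<String> lines) {
            List<String> kept = new ArrayList<>();
            for (String line : lines) {
                if (isValid(line)) {
                    kept.add(line);
                }
            }
            return kept;
        }
    }
    ```

    In a map-only job the same check would typically sit inside the mapper, which emits a record only when it passes.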

  3. UserSearchCount

    1. UserSearchCountMapper
    2. UserSearchCountReducer

    A simple program that counts searches per user, based on Sogou search records that have already been cleaned (that is, the output of DataClean).
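
    The counting step can be sketched in plain Java, without Hadoop, as a map from userid to the number of records. The class name `UserSearchCountSketch` and the tab delimiter are assumptions; the userid being the second field follows the field list given for the search records.

    ```java
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    class UserSearchCountSketch {
        // Assumption: fields are tab-separated.
        static final String DELIMITER = "\t";
        static final int USER_ID_INDEX = 1; // userid is the 2nd field

        /** Counts how many records each userid has. */
        static Map<String, Integer> countByUser(List<String> cleanedLines) {
            Map<String, Integer> counts = new TreeMap<>();
            for (String line : cleanedLines) {
                String[] fields = line.split(DELIMITER, -1);
                if (fields.length <= USER_ID_INDEX) {
                    continue; // skip lines without a userid field
                }
                // merge() plays the role of the reducer summing the 1s emitted per record
                counts.merge(fields[USER_ID_INDEX], 1, Integer::sum);
            }
            return counts;
        }
    }
    ```

    In the MapReduce version, the mapper would emit (userid, 1) and the reducer would sum the values for each key.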

  4. SequenceFileWriter

    1. SmallFileInputFormat
    2. SmallFileRecordReader
    3. SequenceFileMapper

    Combines small files into a single sequence file. Each file is stored as one record: [key: filename, value: content bytes].

  5. CommentSplit

    1. CommentSplitMapper
    2. RatingSplitOutputFormat
    3. RatingSplitRecordWriter

    Splits comments by the users' rating. The input data can be found at https://pan.baidu.com/s/1ZC98VXdD-8xxoSbr6e-vlA (extraction code: x5sg)

    The 10th field of every record is the rating; 0 denotes a positive rating.
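
    As a rough illustration of the split, the sketch below groups records in memory by the value of the 10th field. The class name `CommentSplitSketch` and the tab delimiter are assumptions; the actual job routes records to different output files through its custom output format rather than building a map.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    class CommentSplitSketch {
        // Assumption: fields are tab-separated.
        static final String DELIMITER = "\t";
        static final int RATING_INDEX = 9; // 10th field, zero-based

        /** Groups records by the value of their rating field. */
        static Map<String, List<String>> splitByRating(List<String> records) {
            Map<String, List<String>> byRating = new TreeMap<>();
            for (String record : records) {
                String[] fields = record.split(DELIMITER, -1);
                if (fields.length <= RATING_INDEX) {
                    continue; // record has no rating field
                }
                String rating = fields[RATING_INDEX].trim();
                byRating.computeIfAbsent(rating, k -> new ArrayList<>()).add(record);
            }
            return byRating;
        }

        /** A rating of 0 marks a positive comment. */
        static boolean isPositive(String rating) {
            return "0".equals(rating);
        }
    }
    ```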

  6. PersonSortMain

    queen	20	12000
    pompeya	22	13000
    vexento	22	9000
    onerepublic	19	12000
    aaron	19	10000
    damon	30	29000
    raney	29	40000

    Sorts the person records (name, age, salary) by salary in descending order, then by age in ascending order.
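
    The ordering can be expressed as a composed comparator. The sketch below is plain Java rather than the Hadoop job; the names `PersonSortSketch`, `Person`, and `sortedNames` are hypothetical, and fields are split on any whitespace.

    ```java
    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    class PersonSortSketch {
        static final class Person {
            final String name;
            final int age;
            final int salary;

            Person(String name, int age, int salary) {
                this.name = name;
                this.age = age;
                this.salary = salary;
            }

            /** Parses "name age salary" separated by tabs or spaces. */
            static Person parse(String line) {
                String[] f = line.trim().split("\\s+");
                return new Person(f[0], Integer.parseInt(f[1]), Integer.parseInt(f[2]));
            }
        }

        // Salary descending first; ties are broken by age ascending.
        static final Comparator<Person> ORDER =
                Comparator.comparingInt((Person p) -> p.salary).reversed()
                        .thenComparingInt(p -> p.age);

        /** Returns the names in sorted order. */
        static List<String> sortedNames(List<String> lines) {
            List<Person> people = new ArrayList<>();
            for (String line : lines) {
                people.add(Person.parse(line));
            }
            people.sort(ORDER);
            List<String> names = new ArrayList<>();
            for (Person p : people) {
                names.add(p.name);
            }
            return names;
        }
    }
    ```

    For the sample rows above, this ordering yields raney, damon, pompeya, onerepublic, queen, aaron, vexento. In a MapReduce job the same comparison would usually live in the `compareTo` method of a composite key so that the shuffle phase performs the sort.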

  7. TopN

    1. TaobaoOrder
    2. TopNMapper
    3. TaobaoOrderPartitioner
    4. TaobaoOrderGrouping
    5. TopNReducer

    24764639956	2014-12-01 02:20:42.000	原宿风暴显色美瞳彩色隐形艺术眼镜1片 拍2包邮	33.6	2	18067785675
    24377918580	2014-12-17 08:10:25.000	大直径混血美瞳年抛彩色近视隐形眼镜2片包邮	19.8	2	173590154456
    24764639956	2014-11-12 21:28:42.000	之城混血小大直径彩色隐形眼镜1片装 包邮	49.8	2	18115243270
    24764639956	2014-11-22 13:24:46.000	纯铜艾灸盒 温灸器 5年陈艾艾柱 包邮	88	1	38644098439
    24856049592	2014-11-23 01:56:53.000	cosplay艺术片火影忍者美瞳彩色隐形眼镜	65	1	39814158438
    

    Calculates the top N purchase amounts for every user in every month, based on Taobao order history. Every order contains 6 fields:

    1. user id
    2. datetime
    3. title
    4. unit price
    5. purchase number
    6. product id

    The test data can be found at https://pan.baidu.com/s/1W7fvYdCRVu-pef_SzV_2mQ (extraction code: i98h)
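
    A minimal in-memory sketch of the top-N idea follows. It assumes tab-separated fields, takes the month from the first 7 characters of the datetime (for example `2014-12`), and treats an order's amount as unit price times purchase number. That amount formula and the names `TopNSketch` and `topN` are assumptions, not taken from the repository.

    ```java
    import java.math.BigDecimal;
    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    class TopNSketch {
        /** Maps "userId<TAB>yyyy-MM" to that group's n largest amounts, largest first. */
        static Map<String, List<BigDecimal>> topN(List<String> orders, int n) {
            Map<String, List<BigDecimal>> grouped = new TreeMap<>();
            for (String order : orders) {
                String[] f = order.split("\t", -1);
                if (f.length != 6 || f[1].length() < 7) {
                    continue; // skip malformed orders
                }
                String key = f[0] + "\t" + f[1].substring(0, 7);
                // Assumption: amount = unit price * purchase number
                BigDecimal amount = new BigDecimal(f[3].trim()).multiply(new BigDecimal(f[4].trim()));
                grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(amount);
            }
            for (Map.Entry<String, List<BigDecimal>> e : grouped.entrySet()) {
                List<BigDecimal> amounts = e.getValue();
                amounts.sort(Comparator.reverseOrder());
                if (amounts.size() > n) {
                    e.setValue(new ArrayList<>(amounts.subList(0, n)));
                }
            }
            return grouped;
        }
    }
    ```

    The class list above suggests the job itself uses a composite key with a custom partitioner and grouping comparator, so that each reducer call sees one user-month group already sorted by amount and only has to emit the first N values.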

  8. Raw weather data from NCDC

    0114010010999991990010100004+70933-008667FM-12+0009ENJA V0201901N00151004201CN0030001N9+99999+99999101721ADDAA106005091AG10001AY171061AY221061GF108991081071004501999999MD1110011+9999MW1731OA149900211REMSYN011333   91104
    0088010010999991990010103004+70933-008667FM-12+0009ENJA V0201601N00051000301CN0002001N9-00051-00061101651ADDAG12000AY171031AY241031GA1091+999999999GF109991091999999999999999MD1110051+9999MW1101
    0149010010999991990010106004+70933-008667FM-12+0009ENJA V0209991C00001000151CN0001001N9-00031-00041101621ADDAA199005091AG10000AY171061AY241061GA1091+999999999GF109991091999999999999999KA1120N-00101MD1710031+9999MW1471OA149900211REMSYN017333   21010 91104
    0088010010999991990010109004+70933-008667FM-12+0009ENJA V0200901N00101000151CN0000001N9-00031-00041101601ADDAG12000AY141031AY241031GA1091+999999999GF109991091999999999999999MD1710021+9999MW1451
    

    Characters 88 to 92 of each record (1-based, inclusive) hold the air temperature in degrees Celsius, scaled by 10. We use MapReduce to sort the records by temperature.

    1. PreSortProcessor: the input is raw NCDC data. The map stage writes its output with SequenceFileOutputFormat (key: temperature as IntWritable, value: raw record as Text). There is no reduce stage.
    2. TemperatureSorting: the input is the output of PreSortProcessor's map stage. We use TotalOrderPartitioner with InputSampler.RandomSampler to split the dataset evenly across partitions.

    The test data can be found at https://pan.baidu.com/s/1HD8-hruV_cEW5pkWWZRD4g (extraction code: w5ge)
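
    Extracting the sort key from a raw record can be sketched as below. The temperature field is a sign followed by four digits, so 1-based positions 88 to 92 correspond to `substring(87, 92)` in Java. Treating `+9999` as a missing reading follows the NCDC record format rather than anything stated in this README, and the class name `NcdcTemperatureSketch` is hypothetical.

    ```java
    import java.util.OptionalInt;

    class NcdcTemperatureSketch {
        static final int START = 87;     // 0-based index of 1-based position 88
        static final int END = 92;       // exclusive end, covers 1-based position 92
        static final int MISSING = 9999; // assumption: NCDC marker for a missing reading

        /** Returns the temperature in tenths of a degree Celsius, or empty if absent. */
        static OptionalInt parseTemperature(String record) {
            if (record == null || record.length() < END) {
                return OptionalInt.empty();
            }
            String field = record.substring(START, END);
            int value;
            try {
                value = Integer.parseInt(field); // accepts a leading '+' or '-'
            } catch (NumberFormatException e) {
                return OptionalInt.empty();
            }
            return value == MISSING ? OptionalInt.empty() : OptionalInt.of(value);
        }
    }
    ```

    Applied to the second sample record above, the field is `-0005`, which parses to -5 (that is, -0.5 degrees Celsius). The first sample record carries `+9999` and yields an empty result.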

ZooKeeper

Programming with Apache Curator

  1. CuratorClientUtil

Hive

  1. Hive JDBC sample: HiveJdbc

Contributors

linch90
