MapReduce practice.
A word counter: count the occurrences of each distinct word.
- Set classpath according to reference document
- Compile them
javac *.java
- Pack them into a jar
jar cvf WordCount.jar *.class
- Upload input file to HDFS
hadoop fs -mkdir -p /input/WordCount
hadoop fs -put input_file.txt /input/WordCount
- Create an output path (if it already exists, skip to the next step)
hadoop fs -mkdir /output
- Run it
hadoop jar WordCount.jar WordCount /input/WordCount /output/WordCount
- Cat or download output file
hadoop fs -cat /output/WordCount/part-r-00000
hadoop fs -get /output/WordCount/part-r-00000 result.txt
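The WordCount job's map step emits a `(word, 1)` pair per token and its reduce step sums the counts per word. As an illustration only, that logic can be sketched in plain Java without a cluster (the class and method names here are hypothetical, not the job's actual classes):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WordCountSketch {
    // "Map" step: split the text into whitespace-separated tokens;
    // "reduce" step: sum a 1 for each occurrence of a word.
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) continue;
            counts.merge(token, 1, Integer::sum); // combine the emitted 1s per word
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("to be or not to be"));
    }
}
```

The real job does the same aggregation, but Hadoop shuffles the `(word, 1)` pairs to reducers by key instead of using an in-memory map.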
Calculate mean scores from a student transcript, by student and by subject.
- Set classpath according to reference document
- Compile them
javac StudentMean*.java
- Pack them into a jar
jar cvf StudentMean.jar StudentMean*.class
- Upload input file to HDFS
hadoop fs -mkdir -p /input/ScoreMean
hadoop fs -put student_transcript.txt /input/ScoreMean
- Create an output path (if it already exists, skip to the next step)
hadoop fs -mkdir /output
- Run it
hadoop jar StudentMean.jar StudentMean.StudentMean /input/ScoreMean /output/StudentMean
- Cat or download output file
hadoop fs -cat /output/StudentMean/part-r-00000
hadoop fs -get /output/StudentMean/part-r-00000 StudentMean.txt
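The StudentMean job groups scores by student and averages each group. A minimal plain-Java sketch of that logic, assuming each line of student_transcript.txt has the form `student subject score` (the input format and the names below are assumptions, not taken from the actual job):

```java
import java.util.*;

public class StudentMeanSketch {
    // Group scores by the student name (field 0), then average each group,
    // mirroring the map (key = student) and reduce (mean) steps.
    public static Map<String, Double> meanByStudent(List<String> lines) {
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+"); // assumed "student subject score"
            grouped.computeIfAbsent(f[0], k -> new ArrayList<>())
                   .add(Integer.parseInt(f[2]));
        }
        Map<String, Double> means = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            means.put(e.getKey(),
                      e.getValue().stream().mapToInt(Integer::intValue).average().orElse(0));
        }
        return means;
    }

    public static void main(String[] args) {
        System.out.println(meanByStudent(
            Arrays.asList("Alice Math 90", "Alice English 80", "Bob Math 70")));
    }
}
```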
- Set classpath according to reference document
- Compile them
javac SubjectMean*.java
- Pack them into a jar
jar cvf SubjectMean.jar SubjectMean*.class
- Upload input file to HDFS (if it already exists, skip to the next step)
hadoop fs -mkdir -p /input/ScoreMean
hadoop fs -put student_transcript.txt /input/ScoreMean
- Create an output path (if it already exists, skip to the next step)
hadoop fs -mkdir /output
- Run it
hadoop jar SubjectMean.jar SubjectMean.SubjectMean /input/ScoreMean /output/SubjectMean
- Cat or download output file
hadoop fs -cat /output/SubjectMean/part-r-00000
hadoop fs -get /output/SubjectMean/part-r-00000 SubjectMean.txt
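SubjectMean is the same aggregation keyed by subject instead of student. A sketch under the same assumed `student subject score` line format (names below are hypothetical), this time accumulating a running sum and count per key, the way a MapReduce combiner would:

```java
import java.util.*;

public class SubjectMeanSketch {
    // Key on the subject name (field 1); keep [sum, count] per subject
    // so the mean can be finalized in one pass, like a reducer would.
    public static Map<String, Double> meanBySubject(List<String> lines) {
        Map<String, double[]> acc = new LinkedHashMap<>(); // value = {sum, count}
        for (String line : lines) {
            String[] f = line.trim().split("\\s+"); // assumed "student subject score"
            double[] a = acc.computeIfAbsent(f[1], k -> new double[2]);
            a[0] += Integer.parseInt(f[2]);
            a[1] += 1;
        }
        Map<String, Double> means = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : acc.entrySet())
            means.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        return means;
    }

    public static void main(String[] args) {
        System.out.println(meanBySubject(
            Arrays.asList("Alice Math 90", "Bob Math 70", "Alice English 80")));
    }
}
```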
Given a child-parent file, find all grandchild-grandparent pairs. Assume no two people share a name.
- Set classpath according to reference document
- Compile them
javac *.java
- Pack them into a jar
jar cvf GrandchildGrandparent.jar *.class
- Upload input file to HDFS
hadoop fs -mkdir -p /input/GrandchildGrandparent
hadoop fs -put child-parent.txt /input/GrandchildGrandparent
- Create an output path (if it already exists, skip to the next step)
hadoop fs -mkdir /output
- Run it
hadoop jar GrandchildGrandparent.jar GrandchildGrandparent /input/GrandchildGrandparent /output/GrandchildGrandparent
- Cat or download output file
hadoop fs -cat /output/GrandchildGrandparent/part-r-00000
hadoop fs -get /output/GrandchildGrandparent/part-r-00000 GrandchildGrandparent.txt
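This exercise is the classic MapReduce self-join: for each person, collect the people who list them as a parent (their children) and the people they list as a parent (their parents); each cross pair of those two sets is a grandchild-grandparent relation. A plain-Java sketch of that join, assuming each input line has the form `child parent` and, as stated above, that names are unique (class and method names are hypothetical):

```java
import java.util.*;

public class GrandchildSketch {
    // For each person, build their list of children and list of parents,
    // then cross the two lists, as the reducer in the self-join would.
    public static List<String> grandPairs(List<String> childParentLines) {
        Map<String, List<String>> children = new LinkedHashMap<>();
        Map<String, List<String>> parents = new LinkedHashMap<>();
        for (String line : childParentLines) {
            String[] f = line.trim().split("\\s+"); // assumed "child parent"
            children.computeIfAbsent(f[1], k -> new ArrayList<>()).add(f[0]);
            parents.computeIfAbsent(f[0], k -> new ArrayList<>()).add(f[1]);
        }
        List<String> pairs = new ArrayList<>();
        for (String person : children.keySet())
            for (String grandchild : children.get(person))
                for (String grandparent : parents.getOrDefault(person, Collections.emptyList()))
                    pairs.add(grandchild + "\t" + grandparent);
        return pairs;
    }

    public static void main(String[] args) {
        // Tom's parent is Lucy, Lucy's parent is Mary -> Tom's grandparent is Mary.
        System.out.println(grandPairs(Arrays.asList("Tom Lucy", "Lucy Mary")));
    }
}
```

In the Hadoop job, the mapper emits each record twice (once tagged as a child relation, once as a parent relation) keyed by the shared person, and the reducer performs exactly this cross-product per key.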