UTILIZE HADOOP FOR DATA ANALYSIS
After cloning the repository into CS Cloud, run the following commands to perform the analysis:
- Basic MapReduce recipe (Python), command line:
hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar -input /data/nyc/nyc-traffic.csv -output /users-cloud-16fs/(your username)/output/job1-out -mapper ~/mapper.py -reducer ~/reducer.py -file ~/mapper.py -file ~/reducer.py
- To view the output file:
hdfs dfs -cat /users-cloud-16fs/(your username)/output/job1-out/part-00000
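
For orientation, here is a minimal sketch of what a streaming mapper and reducer might look like. It is not the repository's actual mapper.py or reducer.py: the column layout of nyc-traffic.csv is not described here, so the sketch simply assumes that the key of interest is the first comma-separated field (the hypothetical key_column parameter) and counts rows per key. Both stages are kept in one file purely for illustration; the function names map_lines and reduce_lines are likewise made up.

```python
#!/usr/bin/env python
"""Illustrative Hadoop Streaming mapper/reducer sketch (hypothetical, not the repo's scripts)."""
import sys
from itertools import groupby


def map_lines(lines, key_column=0):
    """Mapper stage: emit 'key<TAB>1' for every CSV row.

    key_column is an assumption; adjust it to the real layout of the input file.
    """
    for line in lines:
        fields = line.rstrip("\n").split(",")
        # Skip blank lines and rows that are too short to contain the key.
        if len(fields) > key_column and fields[key_column]:
            yield "%s\t1" % fields[key_column]


def reduce_lines(lines):
    """Reducer stage: sum the counts for each key.

    Relies on the input being sorted by key, which Hadoop Streaming does
    between the map and reduce phases.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (key, sum(int(count) for _, count in group))


if __name__ == "__main__" and len(sys.argv) > 1:
    # "map" runs the mapper; any other argument runs the reducer.
    stage = map_lines if sys.argv[1] == "map" else reduce_lines
    sys.stdout.writelines(out + "\n" for out in stage(sys.stdin))
```

If this were saved as sketch.py (a hypothetical name), the two stages could be tried locally without Hadoop by piping a small sample through "python sketch.py map", then "sort", then "python sketch.py reduce", which mimics the sort that Hadoop performs between the phases.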