The word-count-program from jeslingnanasheela

Word Count Program Using Map And Reduce

Aim:

To execute a Word Count Program Using Map And Reduce

Procedure:

Analyze the input file content
Develop the code a. Writing a map function b. Writing a reduce function c. Writing the Driver class
Compiling the source
Building the JAR file
Starting the DFS
Creating Input path in HDFS and moving the data into Input path
Executing the program

Program: WordCount.java

Step 1: Compile the java program

For compilation we need this hadoop-core-1.2.1.jar file to compile the map reduce program.

https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-core/1.2.1

Assuming both jar and java files in same directory run the following command to compile root@a4cseh160:/#javac -classpath hadoop-core-1.2.1.jar WordCount.java Step 2: Create a jar file

Syntax:

jarcf jarfilename.jar MainClassName*.class

Output:

root@a4cseh160:/#jar cf wc.jar WordCount*.class

Step 3: Make directory in hadoop file system

Syntax:

hdfsdfs -mkdirdirectoryname

Output:

root@a4cseh160:/# hdfsdfs -mkdir /user

Step 4: Copy the input file into hdfs Syntax:

hdfsdfs -put sourcefiledestpath

Output:

root@a4cseh160:/#hdfsdfs -put /input.txt /user

Step5: To a run a program

Syntax:

hadoop jar jarfilenamemain_class_nameinputfileoutputpath

Output:

root@a4cseh160:/#hadoop jar wc.jar WordCount /user/input.txt /user/out

InputFile: (input.txt)

Cloud and Grid Lab. Cloud and Grid Lab. Cloud Lab.

Output:

3 Cloud

3 Lab.

2 Grid

2 and

Step 6: Check the output in the Web UI at http://localhost:50070.

In the Utilities tab select browse file system and select the correct user. The output is available inside the

output folder named user.

Step 7: To Delete an output folder

Syntax:

hdfsdfs -rm -R outputpath

Output:

root@a4cseh160:/#hdfsdfs -rm -R /user/out.txt

Result:

Thus the numbers of words were counted successfully by the use of Map and Reduce tasks.

Review Questions:

What is Map? a programming model or pattern within the Hadoop framework that is used to access big data stored in the Hadoop File System (HDFS).
What is Reduce? he reduce job takes the output from a map as input and combines those data tuples into a smaller set of tuples.
What is Hadoop core? Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model.

4 What is HBase, Pig, Zooeeper and Hive? HBase - Is a NoSQL database on top of HDFS (hadoop distributed file system which is the file system and core component of Hadoop). Hive - Is an ecosystem/component of Hadoop which is similar to SQL to handle mostly structured data. Pig - Is an ecosystem/component of Hadoop which is a kind of scripting language

jeslingnanasheela / word-count-program Goto Github PK

word-count-program's Introduction

word-count-program's People

Contributors

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent