Department of Electronics & Telecommunication,
Don Bosco Institute of Technology, Mumbai.
Course Coordinator: Mr. Jithin Isaac
To learn and explore Apache Hive, a data warehouse infrastructure tool for processing structured data stored in HDFS.
- Software:
- Apache Hive
- Hortonworks Data Platform Sandbox
- Installation of HDP (Done in experiment 3)
- Accessing Apache Hive client in HDP
- Basic SQL-like querying of data stored in HDFS
- Install HDP via https://jithinsisaac.github.io/posts/hdp_sandbox/
  Additional help via https://www.cloudera.com/tutorials/getting-started-with-hdp-sandbox.html
- Access the Hive client in HDP and perform DDL operations to generate data, which is stored in the HDFS
- Perform certain DML operations on the data stored in the HDFS
- More HQL commands at https://docs.cloudera.com/documentation/enterprise/5-8-x/PDF/cloudera-hive.pdf & https://hortonworks.com/wp-content/uploads/2013/05/hql_cheat_sheet.pdf
Queries related to
- Copying data from Local FS to HDFS
- Hive Shell
- Creating Database
- Creating Table
- Loading data into Table
- Select queries on the table (any 5: min, max, count, avg, describe, select with AND/OR)
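
The query topics listed above could be exercised with a sequence along the following lines. This is only a minimal sketch, not the prescribed solution: the file `employees.csv`, the HDFS directory `/tmp/hive_lab`, the database `lab_db`, the table `employees`, its columns, and the filter values are all hypothetical names chosen for illustration, and the CSV is assumed to be comma-separated with no header row.

```sql
-- Copying data from Local FS to HDFS.
-- Run these two commands in the sandbox's Linux shell, not inside Hive
-- (paths and file name are assumptions):
--   hdfs dfs -mkdir -p /tmp/hive_lab
--   hdfs dfs -put employees.csv /tmp/hive_lab/

-- Then start the Hive shell and run the statements below.

-- Creating Database (hypothetical name)
CREATE DATABASE IF NOT EXISTS lab_db;
USE lab_db;

-- Creating Table (hypothetical schema matching the assumed CSV layout)
CREATE TABLE IF NOT EXISTS employees (
  id     INT,
  name   STRING,
  dept   STRING,
  salary DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Loading data into Table.
-- LOAD DATA INPATH moves the file from its HDFS location into the table's directory.
LOAD DATA INPATH '/tmp/hive_lab/employees.csv' INTO TABLE employees;

-- Select queries on the table
DESCRIBE employees;                                            -- column names and types
SELECT COUNT(*) FROM employees;                                -- number of rows
SELECT MIN(salary), MAX(salary), AVG(salary) FROM employees;   -- aggregates
SELECT name, salary FROM employees
WHERE dept = 'EXTC' AND salary > 50000;                        -- select with AND
SELECT name, dept FROM employees
WHERE dept = 'EXTC' OR dept = 'IT';                            -- select with OR
```

Because `LOAD DATA INPATH` moves the source file rather than copying it, the file no longer appears under `/tmp/hive_lab` afterwards; repeat the `hdfs dfs -put` step if the load needs to be run again.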
- ADD THE PROCEDURE THAT YOU FOLLOWED FOR COMPLETING THE EXPERIMENT HERE
- ADD SCREENSHOTS OF YOUR OUTPUT HERE ALONG WITH VIDEO
- Submitted on 30-09-2021
- Submitted by Mr/Ms. XYZ
- Roll No. 111