
jkcso / bigdata-analytics


Optimised Hadoop scripts that run analytics queries over the latest US Geological Survey (USGS) data. They were written mostly for personal use while learning Hadoop, so the extracted data has no real value in itself; the scripts serve more as a comparison with complex SQL queries.

License: MIT License


bigdata-analytics's Introduction

Summary

Data Set

The tables of the usgs database, provided by the US Geological Survey, are available as a set of TSV files, each containing one table of data.
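A minimal sketch, in Python rather than Pig, of reading one of these TSV files for a quick look at its contents. The UTF-8 encoding and the idea that quote characters are ordinary data are assumptions; the source documents neither the file names nor the column layout, so the function returns raw rows of strings.

```python
import csv

def read_tsv(path):
    """Read a tab-separated file into a list of rows (each row a list of strings)."""
    # Assumption: files are UTF-8 encoded; the source does not say.
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE treats quote characters as ordinary data (an assumption).
        return list(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))
```

Calling `read_tsv("state.tsv")` (a hypothetical file name) would return every row of that table for inspection.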

Tables

state

Contains all states and administratively equivalent entities within the USA.

populated_place

Each state has a number of habitations recorded in the populated place table.

feature

The type column of feature identifies the type of geographic feature, such as forest, dam or lake, and includes some features classified as populated places under the type ppl.

Data Cleaning

Note that the data is not very ‘clean’: some foreign keys that might intuitively be expected to be present are missing, and there is a certain amount of inconsistency between the data found in the feature and populated place tables.
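The missing foreign keys can be found with a case-insensitive set difference. The following is a rough Python sketch of that check, not the repository's Pig code; the function name and the sample values are made up for illustration.

```python
def missing_state_names(feature_states, known_states):
    """Return sorted upper-case names found in feature_states but absent from known_states."""
    # Normalise both sides to upper case so the comparison ignores case;
    # empty or missing values are skipped.
    known = {name.strip().upper() for name in known_states if name}
    found = {name.strip().upper() for name in feature_states if name}
    return sorted(found - known)

# Illustrative values only, not taken from the data set.
print(missing_state_names(["Alabama", "alaska", "Atlantis", None], ["ALABAMA", "ALASKA"]))
# prints ['ATLANTIS']
```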

How to Run a Script

pig -x local q0.pig

Scripts

  1. A Pig script that writes a CSV file with the schema (state name) containing every state name in feature that has no corresponding record in state. State names are compared without regard to case, and all records in state can be assumed to be in upper case. The names found must be returned in upper case and ordered by state name.

  2. A Pig script that writes a CSV file with the schema (state name,population,elevation) giving, for each state, the sum of the population and the average elevation of all its populated place data. The result must be ordered by state name, and the average elevation must be rounded to the nearest integer.

  3. A Pig script that writes a CSV file with the schema (state name,county,no ppl,no stream) giving the number of populated places and the number of streams recorded in feature for each county. The result must be ordered by state name and county.

  4. A Pig script that writes a CSV file with the schema (state name,name,population) containing the state name, place name and population of each populated place, restricted to the five largest populated places in each state. The result must be ordered by state name, with the places in each state listed in descending order of population. Where populations are equal, places are ordered by name.
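The logic of scripts 2 and 4 can be sketched in Python as a way to cross-check Pig output on a small sample. This is an illustration under stated assumptions, not the repository's implementation: the tuple layouts and function names are hypothetical, missing values are assumed to arrive as None, and ties in rounding are assumed to round half up, since the description only says "nearest integer".

```python
import math
from collections import defaultdict

def state_totals(places):
    """places: iterable of (state_name, population, elevation); None marks a missing value."""
    pops = defaultdict(int)
    elevs = defaultdict(list)
    for state, population, elevation in places:
        pops[state] += population or 0
        if elevation is not None:
            elevs[state].append(elevation)
    rows = []
    for state in sorted(pops):  # ordered by state name
        values = elevs[state]
        # Assumption: round half up. Python's built-in round() rounds halves to even.
        mean = math.floor(sum(values) / len(values) + 0.5) if values else None
        rows.append((state, pops[state], mean))
    return rows

def top_places(places, limit=5):
    """places: iterable of (state_name, place_name, population) with numeric populations."""
    # Sort by state, then population descending, then name to break ties.
    ordered = sorted(places, key=lambda row: (row[0], -row[2], row[1]))
    counts = defaultdict(int)
    result = []
    for row in ordered:
        if counts[row[0]] < limit:  # keep only the first `limit` rows per state
            result.append(row)
            counts[row[0]] += 1
    return result
```

Sorting once on the composite key and then counting rows per state keeps the tie-break rule (population first, then name) in a single place.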
