
divithraju / divith-raju-building-big-data-infrastucture-nosql-and-sql


Big Data Platform on MongoDB Atlas and Heroku PostgreSQL

License: MIT License

Python 100.00%
big-data etl nosql sql bigdataplatform


Building Big Data Infrastructure using NoSQL and SQL

Big Data Platform on MongoDB Atlas and Heroku PostgreSQL

Background

The motivation behind this project: I see a lot of big datasets on the internet that people use for data analysis and machine learning, so this time I wanted to build what sits behind those datasets and learn how to build big data infrastructure.

Methodology

  • First, find developer APIs to enrich the database. I chose the Twitter API, the Yahoo Finance API, and News API; all three are available after registration.
  • Then, write Python code that connects to the databases, consumes data from the APIs, and populates both the NoSQL and SQL databases.
  • Finally, build dashboards and connect the databases to Databricks.
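The first two steps can be sketched as a small pipeline: fetch each kind of record once, then write the same records to every configured database. This is a minimal sketch under stated assumptions, not the repository's code: `run_pipeline`, the `Sink` protocol, and `InMemorySink` are hypothetical names, and real fetchers would call the Twitter, Yahoo Finance, and News APIs instead of returning fixed lists.

```python
from typing import Callable, Dict, Iterable, List, Protocol

# A record is a plain dict, so the same data can go to MongoDB or PostgreSQL.
Record = Dict[str, object]


class Sink(Protocol):
    """Hypothetical interface for a database writer (MongoDB, PostgreSQL, ...)."""

    def write(self, kind: str, records: List[Record]) -> int:
        ...


def run_pipeline(
    fetchers: Dict[str, Callable[[], Iterable[Record]]],
    sinks: List[Sink],
) -> Dict[str, int]:
    """Fetch each kind of record once and write it to every sink.

    Returns the number of records fetched per kind.
    """
    counts: Dict[str, int] = {}
    for kind, fetch in fetchers.items():
        records = list(fetch())  # consume data from one API
        counts[kind] = len(records)
        if not records:
            continue  # nothing to write for this source
        for sink in sinks:  # populate both the NoSQL and the SQL store
            sink.write(kind, records)
    return counts


class InMemorySink:
    """Stand-in sink for local testing instead of a real database."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[Record]] = {}

    def write(self, kind: str, records: List[Record]) -> int:
        self.tables.setdefault(kind, []).extend(records)
        return len(records)


if __name__ == "__main__":
    fetchers = {
        "tweets": lambda: [{"id": 1, "text": "hello"}],
        "articles": lambda: [],
    }
    mongo_like, sql_like = InMemorySink(), InMemorySink()
    print(run_pipeline(fetchers, [mongo_like, sql_like]))  # {'tweets': 1, 'articles': 0}
```

Keeping fetching separate from writing means adding a third database only requires another object with a `write` method.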

[Figure: methodology]

Tech Stack and Implementation

Almost the same code parses data from the Twitter, Yahoo Finance, and News APIs for both the NoSQL and SQL databases; the only difference is in how each database is connected to and populated. To parse data from Twitter I used the Python ‘tweepy’ package, which also provides some code snippets to start with.
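A minimal sketch of the Twitter step, assuming tweepy 4.x and its `Client.search_recent_tweets` method (the README does not say which tweepy API it used). Each tweet is flattened into a plain dict so the same output can be written to MongoDB or PostgreSQL; `fetch_recent_tweets` and the dict keys are hypothetical names. The client is passed in rather than created inside, so the function can be exercised without network access.

```python
from typing import Any, Dict, List


def fetch_recent_tweets(client: Any, query: str, max_results: int = 10) -> List[Dict[str, Any]]:
    """Search recent tweets and flatten them into plain dicts.

    `client` is assumed to be a tweepy.Client (tweepy >= 4.0); any object with a
    compatible search_recent_tweets method works, which keeps this testable offline.
    """
    # The v2 recent-search endpoint only accepts max_results from 10 to 100.
    max_results = min(max(max_results, 10), 100)
    response = client.search_recent_tweets(
        query=query,
        max_results=max_results,
        tweet_fields=["created_at", "author_id"],  # extra fields to request
    )
    tweets = response.data or []  # data is None when nothing matched
    records = []
    for t in tweets:
        records.append(
            {
                # Hypothetical field names shared by the MongoDB and SQL loaders.
                "tweet_id": str(t.id),
                "author_id": str(t.author_id) if t.author_id is not None else None,
                "created_at": t.created_at.isoformat() if t.created_at else None,
                "text": t.text,
                "query": query,
            }
        )
    return records
```

In practice the client would be created with `tweepy.Client(bearer_token=...)`, using the developer token listed under Dependencies.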

For the NoSQL big data infrastructure, I used a free MongoDB Atlas cluster, which allows up to 500 concurrent connections and 512 MB of storage. I used Python to consume data from the Twitter, Yahoo Finance, and News APIs, and the MongoDB Python client to connect to the database. Before populating the database, I created three collections (tweets, articles, stocks) in MongoDB Atlas; no DDL is needed. The MongoDB documentation is clear and easy to use.
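Because MongoDB needs no DDL, loading boils down to choosing a collection and calling `insert_many`. The sketch below routes records to the three collections named above; it assumes `db` is a pymongo `Database`, and the `COLLECTIONS` mapping keys and `insert_records` are hypothetical names, not taken from the repository.

```python
from typing import Any, Dict, Iterable, List

# Collection names come from the README; the record "kind" keys are hypothetical.
COLLECTIONS: Dict[str, str] = {
    "tweet": "tweets",
    "article": "articles",
    "stock": "stocks",
}


def insert_records(db: Any, kind: str, records: Iterable[Dict[str, Any]]) -> int:
    """Insert records into the MongoDB collection for `kind`.

    `db` is assumed to be a pymongo Database (for example client["bigdata"], where
    "bigdata" is a hypothetical database name). MongoDB creates a collection on
    first insert, so no schema has to be declared. Returns the number inserted.
    """
    if kind not in COLLECTIONS:
        raise ValueError(f"unknown record kind: {kind!r}")
    # Copy each dict: insert_many adds an "_id" field to the documents it receives.
    docs: List[Dict[str, Any]] = [dict(r) for r in records]
    if not docs:
        return 0  # pymongo's insert_many rejects an empty list
    result = db[COLLECTIONS[kind]].insert_many(docs)
    return len(result.inserted_ids)
```

Copying the input dicts keeps the generated `_id` values from leaking back into records that are later written to PostgreSQL.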

The Python connection to MongoDB uses the ‘pymongo’ package, which authenticates against Atlas and gives access to the database. To connect the free Databricks Community Edition cluster to MongoDB Atlas, I used the MongoDB Spark Connector. MongoDB Charts, which connects directly to MongoDB Atlas, is used to build real-time dashboards; the charts update automatically once new data is inserted into the database.
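Authenticating with pymongo starts from an Atlas SRV connection string. A minimal sketch of building one, assuming a hypothetical user, password, and cluster host; credentials are escaped with `urllib.parse.quote_plus`, as the PyMongo documentation advises for usernames or passwords containing reserved characters. `atlas_uri` is a hypothetical helper name.

```python
from urllib.parse import quote_plus


def atlas_uri(user: str, password: str, cluster_host: str, db_name: str = "") -> str:
    """Build a MongoDB Atlas SRV connection string for pymongo.MongoClient.

    quote_plus percent-escapes credentials, which matters when a username or
    password contains characters such as '@', ':' or '/'.
    """
    creds = f"{quote_plus(user)}:{quote_plus(password)}"
    path = f"/{db_name}" if db_name else "/"
    return f"mongodb+srv://{creds}@{cluster_host}{path}?retryWrites=true&w=majority"


if __name__ == "__main__":
    # Hypothetical user and cluster host; copy the real host from the Atlas UI.
    uri = atlas_uri("analyst", "p@ss:word", "cluster0.example.mongodb.net", "bigdata")
    print(uri)
    # With pymongo installed and network access:
    # from pymongo import MongoClient
    # db = MongoClient(uri)["bigdata"]
    # db.command("ping")  # raises if authentication or network access fails
```

On Databricks, the same URI can be handed to the MongoDB Spark Connector; its option names differ between connector versions, so they are best checked against the connector's own documentation.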

For the SQL database, I chose PostgreSQL because it is open source, which would make it easier in the future to move data from one platform to another. There are also more options for running PostgreSQL in the cloud than for SQL Server or Oracle. Looking for a free PostgreSQL platform that would be easy to scale, I settled on Heroku PostgreSQL, which offers features such as testing schema migrations, managing database access levels and protecting queries, horizontal scaling, and fast data access.

Python connector to Heroku PostgreSQL: as with any PostgreSQL database, the Python connector used here is ‘psycopg2’. The database can also be accessed with pgAdmin, just like a local PostgreSQL instance.
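Heroku publishes the database credentials in the `DATABASE_URL` config var, so a psycopg2 connection can be built from it instead of hard-coding them. A minimal sketch, assuming that variable and a hypothetical `tweets` table with `tweet_id` as its primary key; `pg_connect_kwargs` and `INSERT_TWEET` are hypothetical names. The `sslmode="require"` setting reflects Heroku's requirement for SSL on external connections.

```python
import os
from typing import Dict, Optional
from urllib.parse import unquote, urlparse


def pg_connect_kwargs(database_url: Optional[str] = None) -> Dict[str, object]:
    """Turn a Heroku-style DATABASE_URL into keyword arguments for psycopg2.connect.

    The URL has the form postgres://user:password@host:port/dbname. Reading it from
    the environment avoids hard-coding credentials, which Heroku may rotate.
    """
    url = urlparse(database_url or os.environ["DATABASE_URL"])
    if url.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"not a PostgreSQL URL: {url.scheme!r}")
    return {
        "host": url.hostname,
        "port": url.port or 5432,
        "dbname": url.path.lstrip("/"),
        # urlparse does not percent-decode credentials, so unquote them here.
        "user": unquote(url.username or ""),
        "password": unquote(url.password or ""),
        "sslmode": "require",
    }


# Parameterized INSERT for the hypothetical "tweets" table; psycopg2 fills each %s
# safely, so tweet text never needs manual quoting. ON CONFLICT assumes tweet_id
# is the primary key, which makes re-running the loader harmless.
INSERT_TWEET = (
    "INSERT INTO tweets (tweet_id, author_id, created_at, text) "
    "VALUES (%s, %s, %s, %s) ON CONFLICT (tweet_id) DO NOTHING"
)

if __name__ == "__main__":
    # With psycopg2 installed and DATABASE_URL set:
    # import psycopg2
    # with psycopg2.connect(**pg_connect_kwargs()) as conn, conn.cursor() as cur:
    #     cur.execute(INSERT_TWEET, ("1", "7", None, "hello"))
    pass
```

Passing the URL straight to `psycopg2.connect(url, sslmode="require")` also works, since libpq understands connection URIs; splitting it into keywords just makes each part explicit.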

Dependencies

  • Python packages. Use the package manager pip to install the Python dependencies:
      pip install -r requirements.txt
  • Twitter API access. Get a developer API token here.

  • News API access. Get a developer API token here.

  • Yahoo Finance API. Get access here.

  • Databases:

    • MongoDB Atlas. Get a free 512 MB cloud cluster here.
    • Heroku PostgreSQL. Get a cloud database for non-commercial apps here.
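The tokens and database URLs listed above are secrets, so one common approach is to read them from environment variables and fail early if any are missing. A minimal sketch; the variable names in `REQUIRED_SETTINGS` and the `load_settings` helper are hypothetical, and the repository may store its credentials differently. Yahoo Finance is left out because the README does not say which credential it uses.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical environment-variable names for the credentials listed above.
REQUIRED_SETTINGS: Dict[str, str] = {
    "twitter_bearer_token": "TWITTER_BEARER_TOKEN",
    "news_api_key": "NEWS_API_KEY",
    "mongodb_uri": "MONGODB_URI",
    "database_url": "DATABASE_URL",  # Heroku sets this config var on the app
}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect API tokens and database URLs, failing early if any are missing."""
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED_SETTINGS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(sorted(missing)))
    return {key: env[var] for key, var in REQUIRED_SETTINGS.items()}
```

Passing an explicit mapping instead of `os.environ` keeps the function easy to test.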

License

Licensed under the MIT License
