
OpenWeatherMap API - ETL

Overview

This project orchestrates an ETL pipeline with Apache Airflow. It retrieves weather data from the OpenWeatherMap API, stores the raw JSON in MongoDB, then transforms it into a structured format and loads it into a PostgreSQL database, where it is ready for further analysis or integration with other applications.

Technologies Used

  • Apache Airflow: Orchestrates the workflow of the ETL process.
  • Python: Scripts for fetching, transforming, and loading the data.
  • MongoDB: The initial storage of raw JSON data from the OpenWeatherMap API.
  • PostgreSQL: The final storage where transformed data is loaded for future use.
  • Docker: Containerization of the Airflow environment for consistent deployment.

Files

  • openweather_fetch_api.py: This script contains the DAG responsible for extracting weather data from the OpenWeatherMap API. It defines the tasks that call the API endpoint and fetch the latest weather information; the raw data is then stored in a MongoDB database (a minimal sketch of this flow appears after this list).

  • openweather_transform.py: This script transforms the raw JSON data fetched from the OpenWeatherMap API into a structured format that is suitable for storage in the PostgreSQL database. The transformation process includes cleaning, mapping, and aggregating the data.

  • openweather_load.py: This contains the DAG for loading the transformed weather data into the PostgreSQL database. This script handles the connection to the database and ensures that the data is loaded correctly.

  • openweathermap_dag.py: This is the main DAG file that orchestrates the execution of the fetching, transforming, and loading tasks. It schedules and sequences the ETL process, ensuring that each step is executed in the correct order and at the right time.
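
For orientation, here is a minimal, hedged sketch of how the extract task and the DAG wiring could look. The DAG id, task names, coordinates, and connection strings are illustrative placeholders, not the exact values used in this repository's scripts.

    # A minimal sketch of the extract step and the DAG wiring (Airflow 2.x style).
    # All ids, names, and connection strings below are illustrative placeholders.
    from datetime import datetime

    import requests
    from pymongo import MongoClient
    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def fetch_weather():
        # Call the OpenWeatherMap current-weather endpoint for one location.
        response = requests.get(
            "https://api.openweathermap.org/data/2.5/weather",
            params={"lat": 41.0, "lon": 29.0, "appid": "fancy-api-key"},
            timeout=30,
        )
        response.raise_for_status()
        # Store the raw JSON document in MongoDB.
        client = MongoClient("mongodb://your-username:your-password@your-host:27017")
        client["your-mongodb-database-name"]["your-mongodb-collection-name"].insert_one(response.json())


    with DAG(
        dag_id="openweathermap_etl_sketch",
        start_date=datetime(2024, 1, 1),
        schedule="@hourly",
        catchup=False,
    ):
        fetch = PythonOperator(task_id="fetch_weather", python_callable=fetch_weather)
        # The transform and load steps would be further PythonOperator tasks,
        # chained as: fetch >> transform >> load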

Setup Instructions

NOTE: Before setting up the project, don't forget to create your secrets.yaml file under the dags/ folder and fill in the required parameters.

  • secrets.yaml Template:

        api_key: fancy-api-key
        longitude: longitude-of-a-city
        latitude: latitude-of-a-city
        city: name-of-the-best-city
        mongo_uri: mongodb://your-username:your-password@your-host:your-port
        db_name: your-mongodb-database-name
        collection_name: your-mongodb-collection-name
    
        postgres_uri: postgresql+psycopg2://airflow-username:airflow-password@postgres/airflow-database-name 
        # If you haven't changed anything, these should be set to the default value, which is airflow.
        postgres_table_name: your-postgresql-table-name
    
        log_file_path: logs/fancy-file-name.log
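
At runtime the DAG scripts read these values; below is a minimal sketch of doing so with PyYAML. The load_secrets helper name is illustrative, not a function defined in this project.

    # A minimal sketch of reading dags/secrets.yaml with PyYAML.
    # load_secrets is an illustrative helper name, not one defined in this project.
    import yaml

    def load_secrets(path="dags/secrets.yaml"):
        with open(path) as f:
            return yaml.safe_load(f)

    secrets = load_secrets()
    print(secrets["city"], secrets["mongo_uri"])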
    

A) Python Environment Setup

  1. Create a Python Virtual Environment:

    Use the venv module to create a new virtual environment. Open your terminal and execute:

    python3 -m venv env

    This command creates a virtual environment named env.

  2. Activate the Virtual Environment:

    You need to activate the virtual environment to use it. Depending on your OS, the activation command differs:

    • For Windows:

      .\env\Scripts\activate
    • For Unix or MacOS:

      source env/bin/activate
  3. Install the Dependencies:

    With the virtual environment activated, install the project dependencies using the command:

    pip install -r requirements.txt

    This installs all the packages defined in requirements.txt.
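
    As a quick, optional sanity check you can confirm that the core libraries import inside the virtual environment. The package names below are assumptions based on the technologies listed above, not the exact contents of requirements.txt.

    # Optional import check for the main libraries this project relies on.
    # The actual requirements.txt may differ; adjust the imports as needed.
    import airflow
    import pymongo
    import requests
    import sqlalchemy
    import yaml

    print("core dependencies import cleanly")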

B) MongoDB with Docker

In the API fetching stage (a.k.a. Extract), we store the JSON response in MongoDB, so it needs to be initialized first. To do this:

  1. Change directory to mongodb-docker: cd mongodb-docker

  2. Start the container: docker-compose up -d or docker compose up -d
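
Once the container is up, you can optionally verify connectivity with pymongo; the URI below is a placeholder following the secrets.yaml template.

    # Optional sanity check that the MongoDB container is reachable (not part of the pipeline).
    # The URI is a placeholder following the secrets.yaml template; substitute your own values.
    from pymongo import MongoClient

    client = MongoClient(
        "mongodb://your-username:your-password@localhost:27017",
        serverSelectionTimeoutMS=5000,
    )
    print(client.server_info()["version"])  # raises an error if the server cannot be reached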

C) Airflow Setup with Docker

Since this project uses external dependencies, it needs to be built as a customized Docker image; everything required is provided in the Dockerfile. In order to do this:

  1. Initialize the database and create the first user account

    docker-compose up airflow-init
    

    or

    docker compose up airflow-init
    

    After initialization is complete, you should see a message like this:

    airflow-init_1       | Upgrades done
    airflow-init_1       | Admin user airflow created
    airflow-init_1       | 2.8.1
    start_airflow-init_1 exited with code 0
    
  2. Run the build command for our custom Docker image

    docker-compose build
    

    or

    docker compose build
    
  3. Start the Airflow containers

    docker-compose up -d
    

    or

    docker compose up -d
    

Usage

Once the containers are up and running, access the Airflow web interface at http://localhost:8080 and trigger the ETL DAG to start the data pipeline. If you haven't changed anything, the username and password are both the default value, airflow.
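
After a successful run, you can optionally confirm that rows landed in PostgreSQL, for example with SQLAlchemy. The connection string and table name below are placeholders based on the secrets.yaml template, and this assumes the Postgres port is exposed to the host.

    # Optional check that the transformed data reached PostgreSQL (not part of the pipeline).
    # The connection string and table name are placeholders based on the secrets.yaml template.
    from sqlalchemy import create_engine, text

    engine = create_engine("postgresql+psycopg2://airflow:airflow@localhost:5432/airflow")
    with engine.connect() as conn:
        count = conn.execute(text("SELECT COUNT(*) FROM your_postgresql_table_name")).scalar()
        print(f"rows loaded: {count}")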

Documentation References

For further information and to understand the setup and management of the Airflow environment within a Docker context, refer to the following documentation:

  1. Airflow with Docker: Detailed guide on how to use Apache Airflow with Docker Compose. Airflow Docker Compose Documentation

  2. Airflow Module Management: Information on managing Python modules in Apache Airflow. Airflow Modules Management Documentation

