waqarg2001 / covid-19-de-project

ETL process applied to a COVID-19 dataset of European countries using Azure services such as Databricks, Key Vault, SQL Database, and Data Factory. A Power BI dashboard was also built on the results.

Python 100.00%
azure azure-pipelines covid-19 data data-engineering data-factory data-lake databricks etl etl-pipeline

covid-19-de-project's Introduction


Uses Azure cloud services to architect and orchestrate a weekly data pipeline that performs ETL on the COVID-19 dataset of European countries published by the European Centre for Disease Prevention and Control.

Overview | Tools | Architecture | Support | License

Overview

The European Centre for Disease Prevention and Control (ECDC) was established in 2005. It is an EU agency aimed at strengthening Europe's defenses against infectious diseases.

Covid 19 Analysis is a project that uses Azure services to collect, analyze, and visualize COVID-19 data, with credentials secured through Azure Key Vault and Azure service principals. The project retrieves data from the European Centre for Disease Prevention and Control (ECDC) and combines it with population data to analyze the pandemic's impact. Data is ingested into Azure Data Lake Storage Gen2, which acts as the central storage repository and also holds intermediate and refined datasets. The data is then transformed and explored using Azure Data Factory data flows and Azure Databricks. Azure Key Vault stores the sensitive credentials and secrets. Processed data is loaded into an Azure SQL Database for querying, and Power BI is used to show the spread of, and testing for, COVID-19 in European countries.
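
The step of combining ECDC case data with population figures can be sketched in plain Python. This is only an illustration and not the project's notebooks, which use PySpark. The column names `country`, `year_week`, `weekly_count`, and `population`, the per-100k metric, and all sample values are assumptions made for this example.

```python
import csv
import io

# Hypothetical sample rows; the real ECDC extracts use their own column names.
CASES_CSV = """country,year_week,weekly_count
France,2020-W40,1000
Germany,2020-W40,250
Atlantis,2020-W40,10
"""

# Hypothetical lookup file with made-up population figures.
POPULATION_CSV = """country,population
France,2000000
Germany,1000000
"""


def load_population(text):
    """Build a country -> population lookup from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["country"]: int(row["population"]) for row in reader}


def cases_per_100k(cases_text, population):
    """Join weekly case rows to the lookup and add a per-100k rate.

    Rows whose country is missing from the lookup are skipped,
    which mirrors an inner join.
    """
    result = []
    for row in csv.DictReader(io.StringIO(cases_text)):
        pop = population.get(row["country"])
        if pop is None:
            continue
        count = int(row["weekly_count"])
        result.append({
            "country": row["country"],
            "year_week": row["year_week"],
            "weekly_count": count,
            "cases_per_100k": round(count / pop * 100_000, 2),
        })
    return result


population = load_population(POPULATION_CSV)
enriched = cases_per_100k(CASES_CSV, population)
for record in enriched:
    print(record)
```

Normalizing by population makes countries of different sizes comparable, which is one plausible reason to keep a separate lookup file alongside the raw case data.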

The repository directory structure is as follows:

├── README.md          <- The top-level README for developers using this project.
│
├── Data             <- Contains data extracted, processed, and used throughout the project.
│   ├── Raw          <- Contains raw data folders.
│   │
│   ├── Processed    <- Contains processed data produced by the Databricks Spark notebooks and Azure data flows.
│   │
│   ├── Lookup       <- Contains lookup files used for population and country info.
│   │
│   ├── Config       <- Contains the file used to automate the extraction step in ADF.
│
│
├── Databricks Notebooks         <- Scripts to aggregate and transform data.
│   ├── configuration           <- Contains configurations used for mounting ADLS and accessing Azure Key Vault.
│   │
│   ├── transformation          <- Contains transformation notebooks.
│
├── Resources                  <- Resources for the README file.
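
As a rough idea of what a notebook in the `configuration` folder might do, the sketch below builds the arguments commonly used to mount an ADLS Gen2 container in Databricks with a service principal. It is not taken from the repository: the helper `build_adls_mount_args`, the mount point layout, and every account, container, and credential value are hypothetical placeholders. The calls to `dbutils` appear only in comments because `dbutils` exists only inside a Databricks workspace.

```python
def build_adls_mount_args(storage_account, container, client_id,
                          client_secret, tenant_id):
    """Return keyword arguments suitable for dbutils.fs.mount (ADLS Gen2, OAuth)."""
    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }
    return {
        "source": f"abfss://{container}@{storage_account}.dfs.core.windows.net/",
        # Assumed mount point convention; any path under /mnt would work.
        "mount_point": f"/mnt/{storage_account}/{container}",
        "extra_configs": configs,
    }


# Placeholder values for illustration only. In a Databricks notebook the
# credentials would normally be read from a Key Vault-backed secret scope:
#   client_secret = dbutils.secrets.get(scope="<scope-name>", key="<secret-name>")
# and the result would then be passed on as:
#   dbutils.fs.mount(**mount_args)
mount_args = build_adls_mount_args(
    storage_account="examplestorage",
    container="raw",
    client_id="<client-id>",
    client_secret="<client-secret>",
    tenant_id="<tenant-id>",
)
print(mount_args["source"])
print(mount_args["mount_point"])
```

Keeping the secret lookup separate from the function that assembles the configuration means no credential has to be written into the notebook itself.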

Tools

To build this project, the following tools were used:

  • Azure Databricks
  • Azure Key Vault
  • Azure Active Directory
  • Azure Data Lake Storage Gen2
  • Azure Data Factory
  • Azure SQL Database
  • Power BI
  • PySpark
  • SQL
  • Git

Architecture

The architecture of this project is inspired by the following architecture.

[Figure: architecture diagram]

Support

If you have any doubts, queries, or suggestions, please connect with me on any of the following platforms:

  • LinkedIn
  • Gmail

License

CC BY-NC-SA

This license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator. If you remix, adapt, or build upon the material, you must license the modified material under identical terms.
