-
Developed an end-to-end ETL data pipeline on Google Cloud Platform: Data_Engineer-End-to-End_Project_on_GCP
- Collected data from a REST API and a MySQL database, then imported it into a data lake on Google Cloud Storage.
- Cleaned and manipulated the data with Pandas and Spark.
- Built an automated data pipeline with Apache Airflow on Google Cloud Composer.
- Loaded the final data into a data warehouse on Google BigQuery and visualized it in a Looker Studio (formerly Google Data Studio) dashboard to identify key factors and trends.
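The extract-and-clean step of this pipeline could be sketched as below; the field names and cleaning rules are hypothetical stand-ins, with no GCP dependencies, just to illustrate the transform stage:

```python
# Hypothetical sketch of the extract/clean step: "id" and "price" are
# illustrative field names, not the project's actual schema.
from typing import Any, Dict, List

def clean_records(raw: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop rows missing an id and normalize the price field."""
    cleaned = []
    for row in raw:
        if row.get("id") is None:
            continue  # skip incomplete rows
        cleaned.append({
            "id": int(row["id"]),
            "price": round(float(row.get("price", 0.0)), 2),
        })
    return cleaned

sample = [{"id": "1", "price": "10.5"}, {"price": "3"}, {"id": 2}]
print(clean_records(sample))
```

In the real pipeline the same kind of function would run as a Pandas/Spark job before the cleaned output lands in Cloud Storage.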
-
Database modeling and ETL data pipeline with Docker: Database-modeling-ETL-with-Airflow-on-Docker
- Designed a database data model and created a PostgreSQL database to store a dataset.
- Automated an ETL data pipeline with Apache Airflow on Docker: extracted data from PostgreSQL, transformed it with Pandas, and loaded it into a data warehouse.
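The transform-and-load step can be sketched with stdlib sqlite3 standing in for the project's PostgreSQL source and warehouse; the table and column names here are hypothetical:

```python
# Minimal sketch of transform-and-load: sqlite3 substitutes for the
# real PostgreSQL/warehouse, and fact_orders is a hypothetical table.
import sqlite3

def load_to_warehouse(rows, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS fact_orders (id INTEGER, total REAL)")
    # Transform: keep only positive totals (placeholder for the Pandas step)
    transformed = [(r["id"], r["total"]) for r in rows if r["total"] > 0]
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?)", transformed)
    conn.commit()
    return len(transformed)

conn = sqlite3.connect(":memory:")
n = load_to_warehouse([{"id": 1, "total": 5.0}, {"id": 2, "total": -1.0}], conn)
print(n)  # one valid row loaded
```

In Airflow, extract, transform, and load like this would typically be separate tasks wired together in a DAG.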
-
End-to-end batch processing and real-time data streaming on the Hadoop ecosystem managed by Cloudera: Real-time_Data_streaming_on_Hadoop-ecosystem_by_Cloudera
- Ingested raw data into Hadoop HDFS via the Hadoop CLI and cleaned it with Spark SQL.
- Streamed real-time data with Flume, stored it in HBase, transformed it with Spark Streaming, and loaded it into Hive.
- Scheduled data workflows with Oozie and used HiveQL to generate data insights.
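The streaming transform can be illustrated with a toy micro-batch aggregation, which is how Spark Streaming processes a stream internally; the event shape is hypothetical and no Hadoop components are required:

```python
# Toy micro-batch aggregation standing in for the Flume -> Spark
# Streaming -> Hive flow; {"key": ...} is an illustrative event shape.
from collections import Counter

def process_batch(batch, totals):
    """Update running per-key counts for one micro-batch of events."""
    totals.update(event["key"] for event in batch)
    return totals

totals = Counter()
for batch in [[{"key": "a"}, {"key": "b"}], [{"key": "a"}]]:
    process_batch(batch, totals)
print(dict(totals))  # running counts accumulated across both batches
```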
-
Built a Change Data Capture (CDC) pipeline to replicate data from a source database to a destination database using Kafka: Change data capture with Kafka
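The core of CDC replication is replaying ordered change events onto the destination. A minimal sketch, assuming a simplified event format (the real messages consumed from Kafka, e.g. Debezium-style, carry more metadata):

```python
# Hedged sketch of applying CDC events to a destination table; the
# event fields "op", "key", "value" are illustrative, not a real
# Kafka/Debezium message schema.
def apply_cdc_event(dest, event):
    """Replay one change event (insert/update/delete) onto dest."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        dest[key] = event["value"]
    elif op == "delete":
        dest.pop(key, None)
    return dest

dest = {}
for ev in [{"op": "insert", "key": 1, "value": "alice"},
           {"op": "update", "key": 1, "value": "alicia"},
           {"op": "delete", "key": 1}]:
    apply_cdc_event(dest, ev)
print(dest)  # empty again after insert, update, delete are replayed
```

Applying events strictly in per-key order, as a Kafka consumer reading a keyed topic partition would, is what keeps source and destination consistent.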
📫 Email: [email protected]