🚀 Welcome to my robust ETL (Extract, Transform, Load) pipeline for processing daily bank transactions using Amazon Web Services (AWS).
This project implements a scalable, serverless ETL pipeline for daily bank transactions. Amazon S3, AWS Lambda, AWS Glue, and Amazon Athena each handle one stage of the process, as outlined below.
- Source Data: Daily CSV files are uploaded to an S3 bucket (`s3://your-source-bucket`).
- AWS Lambda Function: Each CSV upload triggers an AWS Lambda function, which starts the AWS Glue job.
- AWS Glue Job: The AWS Glue job transforms the data. It uses job bookmarks for incremental loads, so each run processes only data that earlier runs have not already handled.
- Processed Data: The transformed data is securely stored in another S3 bucket (`s3://your-destination-bucket`) in the efficient Parquet format.
- Athena for Analysis: Amazon Athena is used to analyze the transformed data. Users can run SQL queries directly against the Parquet files in the destination S3 bucket.
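
The Lambda step in the list above could look roughly like the sketch below. It is a minimal illustration, not the project's actual handler: the job name `bank-transactions-etl` is a hypothetical placeholder, and the optional `glue_client` parameter is an assumption added so the handler can be exercised without AWS credentials. The `--job-bookmark-option` argument is the standard Glue parameter for turning job bookmarks on for a run.

```python
import urllib.parse

# Hypothetical Glue job name; replace it with the job defined in your account.
GLUE_JOB_NAME = "bank-transactions-etl"


def uploaded_csv_paths(event):
    """Return s3:// paths for every CSV object named in an S3 event notification."""
    paths = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        if key.lower().endswith(".csv"):
            paths.append(f"s3://{bucket}/{key}")
    return paths


def lambda_handler(event, context, glue_client=None):
    """Start one Glue job run when the event contains at least one CSV upload."""
    paths = uploaded_csv_paths(event)
    if not paths:
        return {"started": False, "paths": []}

    if glue_client is None:
        # Imported lazily so the module can be loaded where boto3 is absent.
        import boto3
        glue_client = boto3.client("glue")

    response = glue_client.start_job_run(
        JobName=GLUE_JOB_NAME,
        # Enable job bookmarks so the run skips data handled by earlier runs.
        Arguments={"--job-bookmark-option": "job-bookmark-enable"},
    )
    return {"started": True, "paths": paths, "job_run_id": response["JobRunId"]}
```

Because bookmarks track what has already been processed, the handler does not need to pass the uploaded file names to the job; it only needs to start a run.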
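
For the Athena step, a query can also be submitted programmatically. The sketch below is illustrative only: the table name and the columns `account_id`, `amount`, and `transaction_date` are assumptions, since the README does not describe the schema, and the results location passed to `start_athena_query` is likewise a placeholder you would supply yourself.

```python
import datetime


def build_daily_totals_query(table, day):
    """Build a per-account total for one day (column names are assumed)."""
    # Parsing the date first rejects anything that is not YYYY-MM-DD.
    parsed = datetime.date.fromisoformat(day)
    return (
        "SELECT account_id, SUM(amount) AS total_amount "
        f"FROM {table} "
        f"WHERE transaction_date = DATE '{parsed.isoformat()}' "
        "GROUP BY account_id"
    )


def start_athena_query(sql, database, output_location, athena_client=None):
    """Submit a query to Athena and return its execution id."""
    if athena_client is None:
        # Imported lazily so the module can be loaded where boto3 is absent.
        import boto3
        athena_client = boto3.client("athena")

    response = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes query results to this S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

Athena runs queries asynchronously, so the returned execution id is what you would later use to check the query status and fetch results.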