stefen-taime

followers: 75 following: 2 repos: 66 gists: 0

Name: Stefen

Type: User

Bio: Data Engineer & Ops

Location: Montréal

Blog: http://stefen.pythonanywhere.com/

Hello, I am Stefen 👋

I am a data engineer 💻, photographer 📸, and designer 🎨!

Hello and welcome to my GitHub! With over eight years of experience as a data engineer, I have gained deep expertise in designing, building, and managing data pipelines and infrastructures. My work supports various analytical and business intelligence needs, drawing on SQL, Python, ETL frameworks, and big data technologies such as Hadoop and Spark.

My GitHub account hosts a range of projects and code examples that demonstrate my expertise in the field of data engineering. These projects not only showcase my skills in extracting, transforming, and loading data from various sources but also reflect my abilities in data modeling, visualization, and reporting.

I am committed to continuous learning and growth, and I am open to collaborating with other professionals in the field. I invite you to join me on GitHub to explore my work, exchange ideas, and share knowledge.

🤝 Connect with me:

Feel free to contact me if you have any questions or comments!

🔭 I am currently working on

Redesigning my old projects
Data engineering projects
Data analyses
DevOps projects

🌱 I am currently learning

📱 Machine Learning
AI (Artificial Intelligence)
CloudSec (Cloud Security)
Kubernetes

💼 Technical Skills

Stefen's Projects

-google-analytics-360

Welcome to the Google Analytics 360 Dataset Project! This repository is designed for anyone interested in working with realistic Google Analytics data. Whether you're a data scientist, a student, or a marketing analyst

adv_nlp_workshop_odsc_europe22

Extensive tutorials for the Advanced NLP Workshop in Open Data Science Conference Europe 2020. We will leverage deep learning and deep transfer learning to solve popular tasks in NLP including Classification, Information Retrieval, Sentiment Analysis, Search Engines, Clustering, Paraphrase Mining, Summarization, Language Translation, Q&A systems

airflow_etl

Pipeline for updating data between OLTP and OLAP environments

azurepipeline

Azure Data Pipeline

big-o-algorithm

We’ll explain Big O notation and walk through real-world Python examples to illustrate how it applies to various time complexities.
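As a rough illustration of the kind of time complexities this description refers to, here is a minimal Python sketch. The function names are hypothetical and are not taken from the repository; they only show how constant, logarithmic, linear, and quadratic growth typically look in code.

```python
from bisect import bisect_left


def first_item(items):
    """O(1): a single step, regardless of how long the list is."""
    return items[0]


def linear_search(items, target):
    """O(n): in the worst case every element is inspected once."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1


def binary_search(sorted_items, target):
    """O(log n): the search range is halved at each step (input must be sorted)."""
    index = bisect_left(sorted_items, target)
    if index < len(sorted_items) and sorted_items[index] == target:
        return index
    return -1


def has_duplicate_pair(items):
    """O(n^2): every pair of elements is compared."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False
```

Doubling the input size leaves `first_item` unchanged, adds roughly one step to `binary_search`, doubles the worst-case work of `linear_search`, and roughly quadruples that of `has_duplicate_pair`.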

build_api_auth2.0

build_api_devops_pipeline

data-engineering-practice

Data Engineering Practice Problems

dataops

dbt-redshift-demo

dbt / Amazon Redshift Demonstration Project

de-apache-spark

Data Engineering with Apache Spark

devops-bash-script

This repository contains a collection of bash scripts for common DevOps tasks, such as installing software, setting up environments, and managing resources.

docker-stack

Directory with different docker-compose files to quickly start an infrastructure

docsearch

Our project is a testament to this need, offering a comprehensive solution that combines modern technologies and architectures to create a powerful document search engine. This engine is not just a tool but a sophisticated ecosystem designed to handle complex data processing and retrieval tasks.

elt-pipeline

Apartments Data Pipeline using Airflow and Spark.

elt_pipeline

etl-data-pipeline-rdbms-to-hdfs-using-airflow-apache-sqoop-spark-postgres-and-hive

This project aims to move data from a relational database management system (RDBMS) to the Hadoop Distributed File System (HDFS)

etl_onaws_deploy_with_terraform

The objective of this guide is to demonstrate how to automate the deployment of a data pipeline on AWS using Terraform. The pipeline will utilize AWS services such as Lambda, Glue, Crawler, Redshift, and

eventmusic

EventMusic Producer is a Dockerized application designed to read data and output them to a Kafka topic, using Avro schemas for data serialization. It integrates seamlessly with Kafka and the Schema Registry to manage the flow of event data linked to music event information.

fake-server-data

free-real-time-flight-status-pipeline

Real-time flight status data pipeline built with Kafka, Schema Registry, Avro, GraphQL, Postgres, and React.

gmail-to-mongodb-script

This script facilitates the automation of fetching emails from a user's Gmail account and storing them into a MongoDB database. The emails fetched are filtered by specific labels such as Promotions, Social, Updates, and Forums. The script is intended to run continuously, checking for new emails every minute.
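The polling behaviour described above could be sketched roughly as follows. This is not the repository's code: `fetch_emails` and `store_email` are hypothetical callables standing in for the Gmail and MongoDB calls, the `id` field is an assumed message identifier, and the label names are simply the ones listed in the description.

```python
import time

# Labels named in the project description; the real API identifiers may differ.
LABELS = ("Promotions", "Social", "Updates", "Forums")


def poll_once(fetch_emails, store_email, seen_ids):
    """Fetch emails for each label and store the ones not seen before.

    fetch_emails(label) -> iterable of dicts with an "id" key (hypothetical).
    store_email(document) -> persists one document (hypothetical).
    Returns the number of newly stored emails.
    """
    stored = 0
    for label in LABELS:
        for email in fetch_emails(label):
            if email["id"] in seen_ids:
                continue  # already stored under this or another label
            store_email({**email, "label": label})
            seen_ids.add(email["id"])
            stored += 1
    return stored


def run_forever(fetch_emails, store_email, interval_seconds=60, sleep=time.sleep):
    """Check for new emails every interval_seconds (one minute by default)."""
    seen_ids = set()
    while True:
        poll_once(fetch_emails, store_email, seen_ids)
        sleep(interval_seconds)
```

Passing the fetch and store steps in as functions keeps the loop testable without a Gmail account or a running database; a real script would also need authentication and error handling.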

how-to-automatically-deploy-a-flask-application-on-an-ec2-instance-with-a-bash-script

The main motivation for this mini-project is to get familiar with using Bash Scripting and the AWS CLI to automate command line tasks. This particular repo contains a configuration script that automatically creates an EC2 instance, accesses it via SSH, installs dependencies and hosts a simple Flask application using the image taken from Docker Hub.