san089
Name: Sanchit Kumar
Type: User
Company: ScotiaBank
Bio: Data Engineer | Graduate Student | Open-source Contributor | Learner
Location: Toronto, Canada
Introduction to data pipeline management with Airflow. Airflow schedules and maintains numerous ETL processes running on a large-scale Enterprise Data Warehouse.
Curated list of resources about Apache Airflow
Fake News Detection: feature extraction using vectorizers such as Count Vectorizer, TF-IDF Vectorizer, and Hash Vectorizer, followed by an ensemble model that classifies whether the news is fake or not.
This project gives insight into a few statistics related to the Black Friday sale.
Cloudera_Material: Study material to help people prepare for the Cloudera CCA Spark and Hadoop Developer Exam (CCA175). Feel free to collaborate.
Learning from multiple companies in Silicon Valley: Netflix, Facebook, Google, and startups.
Projects done in the Data Engineering Nanodegree by Udacity.com
A repo for data science related questions and answers
dbt_common_utils
:snake: Python wrapper for Goodreads API :books:
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
This is my personal collection of free Hadoop books, please feel free to share and learn.
This project provides an analysis of IPL (Indian Premier League) stats from 2008 to 2017.
Machine learning demo projects
Sentiment analysis on a live Twitter stream, plotting the sentiment values using Matplotlib.
:zap: motivate :zap: - A simple script to print random motivational quotes. Heavily influenced by the Linux command fortune.
A real-time event pipeline built around the Kafka ecosystem for the Chicago Transit Authority.
Example project and best practices for Python-based Spark ETL jobs and applications.
An archive of datasets distributed with R
A Kafka and Spark Streaming integration project: SF Crime Statistics with Spark Streaming.
Bare minimum code needed to detect occurrences of code and design smells
This repository is for the course SOEN 6011.
A multiplayer Risk board game.
Apache Spark (PySpark) Practice on Real Data
This project contains PySpark jobs to create data pipelines and shows how to distribute the project package on a cluster.