Kanthi's Projects
A sample project to explain the Node.js event loop in the presence of multiple processes.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Apache Airflow Website
Build highly concurrent, distributed, and resilient message-driven applications on the JVM
Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
AWS Glue crawler using a CloudFormation template to scan an S3 bucket.
Script based on boto3 to run PySpark jobs on AWS EMR.
Apache Beam is a unified programming model for Batch and Streaming
AWS SDK for Python
Mirror of Apache Cassandra
Mirror of Distributed test suite for Apache Cassandra
https://www.udemy.com/course/cca-175-spark-and-hadoop-developer-practice-tests
Prometheus exporter for Confluent Cloud API metrics
ClickHouse® is a free analytics DBMS for big data
ClickHouse Python Driver with native interface support
Golang driver for ClickHouse
JDBC driver for ClickHouse
Altinity Sink Connector for ClickHouse
ClickHouse dialect for SQLAlchemy
Code style for Airlift projects
Solutions to Codility tests in Java.
An orchestration platform for the development, production, and observation of data assets.
Distributed SQL Engine in Python using Dask
Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
An extremely simple Golang-based in-memory KV store to rule them all.
A distributed task scheduler for Dask
Apache Doris is an easy-to-use, high performance and unified analytics database.
Apache Druid: a high performance real-time analytics database.