brucemen711
Type: User
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Application Performance Optimization Summary
:memo: An awesome Data Science repository to learn and apply for real world problems.
The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
A curated list of amazingly awesome open source sysadmin resources inspired by Awesome PHP.
A beginner's guide to big data :star:
Notes from books and other interesting things that I've read. Table of contents at the end 👇
Scalable PostgreSQL for multi-tenant and real-time analytics workloads
Dynamically generate Apache Airflow DAGs from YAML configuration files
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Collection of Papers On Database Management Systems
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks
Examples for running Debezium (Configuration, Docker Compose files etc.)
ETL best practices with Airflow, with examples
Tools for working with Parquet, Impala, and Hive
Example for article Running Spark 3 with standalone Hive Metastore 3.0
Apache Iceberg
Iceberg Stack
Apache DolphinScheduler is a distributed and extensible workflow scheduler platform with powerful DAG visual interfaces, dedicated to solving complex job dependencies in the data pipeline and providing various types of jobs available `out of the box`.
Community developed integrations and plugins for the Datadog Agent.
example
High-performance Kafka connector for Spark Streaming. Supports multi-topic fetch and Kafka security. Reliable offset management in ZooKeeper. No data loss. No dependency on HDFS or WAL. Built-in PID rate controller. Supports message handlers. Offset lag checker.
New generation decentralized data warehouse and streaming data pipeline
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
Companion webpage to the book "Mathematics For Machine Learning"
YSDA course in Natural Language Processing
Official home of the community managed version of Presto, the distributed SQL query engine for big data, under the auspices of the Presto Software Foundation.