Coder Social home page Coder Social logo

luis-sousa-pinto / dozer Goto Github PK

View Code? Open in Web Editor NEW

This project forked from getdozer/dozer

0.0 0.0 0.0 56.88 MB

Connect any data source, combine them in real-time and instantly get low-latency Data APIs. All with just a simple configuration!

Home Page: https://getdozer.io

License: Apache License 2.0

Shell 0.31% JavaScript 0.01% Python 0.01% Rust 99.55% Dockerfile 0.11% Solidity 0.01%

dozer's Introduction


Connect any data source, combine them in real-time and instantly get low-latency data APIs.
⚡ All with just a simple configuration! ⚡️


CI Coverage Status Docs Join on Discord License

Overview

Dozer makes it easy to build low-latency data APIs (gRPC and REST) from any data source. Data is transformed on the fly using Dozer's reactive SQL engine and stored in a high-performance cache to offer the best possible experience. Dozer is useful for quickly building data products.

Architecture

Quick Start

Follow the instruction below to install Dozer on your machine and run a quick sample using the NY Taxi Dataset

Installation

MacOS Monterey (12) and above

brew tap getdozer/dozer && brew install dozer

Ubuntu 20.04 and above

# amd64
curl -sLO https://github.com/getdozer/dozer/releases/latest/download/dozer-linux-amd64.deb && sudo dpkg -i dozer-linux-amd64.deb

# aarch64
curl -sLO https://github.com/getdozer/dozer/releases/latest/download/dozer-linux-aarch64.deb && sudo dpkg -i dozer-linux-aarch64.deb

Dozer requires protobuf-compiler, installation instructions can be found in additional steps

Build from source

cargo install --path dozer-cli --locked

Run it

Download sample configuration and data

Create a new empty directory and run the commands below. This will download a sample configuration file and a sample NY Taxi Dataset file.

curl -o dozer-config.yaml https://raw.githubusercontent.com/getdozer/dozer-samples/main/connectors/local-storage/dozer-config.yaml
curl --create-dirs -o data/trips/fhvhv_tripdata_2022-01.parquet https://d37ci6vzurychx.cloudfront.net/trip-data/fhvhv_tripdata_2022-01.parquet

Run Dozer binary

dozer -c dozer-config.yaml

Dozer will start processing the data and populating the cache. You can see a progress of the execution from the console.

Query the APIs

When some data is loaded, you can query the cache using gRPC or REST

# gRPC
grpcurl -d '{"query": "{\"$limit\": 1}"}' -plaintext localhost:50051 dozer.generated.trips_cache.TripsCaches/query

# REST
curl -X POST  http://localhost:8080/trips/query --header 'Content-Type: application/json' --data-raw '{"$limit":3}'

Alternatively, you can use Postman to discover gRPC endpoints through gRPC reflection

postman query

Read more about Dozer here. And remember to star 🌟 our repo to support us!

Client Libraries

Library Language License
dozer-python Dozer Client library for Python Apache-2.0
dozer-js Dozer Client library for Javascript Apache-2.0
dozer-react Dozer Client library for React with easy to use hooks Apache-2.0

Python

from pydozer.api import ApiClient
api_client = ApiClient('trips')
api_client.query()

Javascript

import { ApiClient } from "@dozerjs/dozer";

const flightsClient = new ApiClient('flights');
flightsClient.count().then(count => {
    console.log(count);
});

React

import { useCount } from "@dozerjs/dozer-react";
const AirportComponent = () => {
    const [count] = useCount('trips');
    <div> Trips: {count} </div>
}

Samples

Check out Dozer's samples repository for more comprehensive examples and use case scenarios.

Type Sample Notes
Connectors Postgres Load data using Postgres CDC
Local Storage Load data from local files
AWS S3 Load data from AWS S3 bucket
Ethereum Load data from Ethereum
Kafka Load data from kafka stream
Snowflake (Coming soon) Load data using Snowflake table streams
SQL Using JOINs Dozer APIs over multiple sources using JOIN
Using Aggregations How to aggregate using Dozer
Using Window Functions Use Hop and Tumble Windows
Use Cases Flight Microservices Build APIs over multiple microservices.
Scaling Ecommerce Profile and benchmark Dozer using an ecommerce data set
Use Dozer to Instrument (Coming soon) Combine Log data to get real time insights
Real Time Model Scoring (Coming soon) Deploy trained models to get real time insights as APIs
Client Libraries Dozer React Starter Instantly start building real time views using Dozer and React
Ingest Polars/Pandas Dataframes Instantly ingest Polars/Pandas dataframes using Arrow format and deploy APIs
Authorization Dozer Authorziation How to apply JWT Auth on Dozer

Connectors

Refer to the full list of connectors and example configurations here.

Connector Status Type Schema Mapping Frequency Implemented Via
Postgres Available ✅ Relational Source Real Time Direct
Snowflake Available ✅ Data Warehouse Source Polling Direct
Local Files (CSV, Parquet) Available ✅ Object Storage Source Polling Data Fusion
Delta Lake Alpha Data Warehouse Source Polling Direct
AWS S3 (CSV, Parquet) Alpha Object Storage Source Polling Data Fusion
Google Cloud Storage(CSV, Parquet) Alpha Object Storage Source Polling Data Fusion
Ethereum Available ✅ Blockchain Logs/Contract ABI Real Time Direct
Kafka Stream Available ✅ Schema Registry Real Time Debezium
MySQL In Roadmap Relational Source Real Time Debezium
Google Sheets In Roadmap Applications Source
Excel In Roadmap Applications Source
Airtable In Roadmap Applications Source

Pipeline Log Reader Bindings

Library Language License
dozer-log-python Python binding for reading Dozer logs Apache-2.0
dozer-log-js Node.js binding for reading Dozer logs Apache-2.0

Python

we support CPython >= 3.10 on Windows, MacOS and Linux, both amd and arm architectures.

import pydozer_log

reader = await pydozer_log.LogReader.new('.dozer', 'trips')
print(await reader.next_op())

Javascript

const dozer_log = require('@dozerjs/log');

const runtime = dozer_log.Runtime();
reader = await runtime.create_reader('.dozer', 'trips');
console.log(await reader.next_op());

Releases

We release Dozer typically every 2 weeks and is available on our releases page. Currently, we publish binaries for Ubuntu 20.04, Apple(Intel) and Apple(Silicon).

Please visit our issues section if you are having any trouble running the project.

Contributing

Please refer to Contributing for more details.

dozer's People

Contributors

chubei avatar karolisg avatar v3g42 avatar mediuminvader avatar snork-alt avatar dependabot[bot] avatar duonganhthu43 avatar xudong963 avatar chloeminkyung avatar auterium avatar nurikk avatar tinnguyen71 avatar hoangnh93 avatar universalmind303 avatar abhishekmishragithub avatar dozerpadawan avatar gautamprikshit1 avatar readall avatar sonhmai avatar abcpro1 avatar cahyosubroto avatar crajcan avatar hi-rustin avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.