mfrank2016 / incubator-hudi

This project forked from apache/hudi

Upserts And Incremental Processing on Big Data

Home Page: https://hudi.apache.org

License: Apache License 2.0

Shell 2.25% Java 94.21% Scala 2.80% Dockerfile 0.57% HTML 0.13% Python 0.04%

incubator-hudi's Introduction

Apache Hudi (Incubating)

Apache Hudi (Incubating) (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage).

https://hudi.apache.org/

Features

  • Upsert support with fast, pluggable indexing
  • Atomically publish data with rollback support
  • Snapshot isolation between writer & queries
  • Savepoints for data recovery
  • Manages file sizes and layout using statistics
  • Async compaction of row & columnar data
  • Timeline metadata to track lineage
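
The upsert, atomic publish, and rollback features listed above can be illustrated with a small in-memory model. This is a toy sketch only, not Hudi's API: the class `ToyUpsertTable` and its methods are hypothetical names, a `HashMap` stands in for the pluggable index, and a list of snapshots stands in for the timeline.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of upserts on a commit timeline. Hypothetical; not the Hudi API. */
final class ToyUpsertTable {
    // Each entry is the full key -> value state after one commit.
    // The last entry is the state that readers currently see.
    private final List<Map<String, String>> timeline = new ArrayList<>();

    ToyUpsertTable() {
        timeline.add(new HashMap<>()); // commit 0: empty table
    }

    /** Applies a batch as a single commit: known keys are updated, new keys are inserted. */
    int upsert(Map<String, String> batch) {
        Map<String, String> next = new HashMap<>(timeline.get(timeline.size() - 1));
        next.putAll(batch); // the key lookup plays the role of the index
        timeline.add(next); // publishing is one step, so readers never see half a batch
        return timeline.size() - 1;
    }

    /** Discards the latest commit so that readers see the previous state again. */
    void rollback() {
        if (timeline.size() > 1) {
            timeline.remove(timeline.size() - 1);
        }
    }

    /** Returns a read-only copy of the latest committed state. */
    Map<String, String> snapshot() {
        return Collections.unmodifiableMap(new HashMap<>(timeline.get(timeline.size() - 1)));
    }

    public static void main(String[] args) {
        ToyUpsertTable table = new ToyUpsertTable();

        Map<String, String> first = new HashMap<>();
        first.put("id-1", "a");
        first.put("id-2", "b");
        table.upsert(first);

        Map<String, String> second = new HashMap<>();
        second.put("id-2", "b2"); // update of an existing key
        second.put("id-3", "c");  // insert of a new key
        table.upsert(second);
        System.out.println(table.snapshot());

        table.rollback(); // undo the second commit
        System.out.println(table.snapshot());
    }
}
```

A real table keeps this state in files on DFS rather than in memory; the sketch only shows why treating each batch as one commit makes both publishing and rollback simple.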

Hudi provides the ability to query via three types of views:

  • Read Optimized View - Provides excellent snapshot query performance via purely columnar storage (e.g. Parquet).
  • Incremental View - Provides a change stream with records inserted or updated after a point in time.
  • Real-time View - Provides snapshot queries on real-time data, using a combination of columnar & row-based storage (e.g. Parquet + Avro).
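
The difference between the three views can be sketched with a toy model. Everything here is an assumption made for illustration and is not Hudi's API: `ToyViews` and `Rec` are hypothetical names, a map stands in for compacted columnar files, a list stands in for the row-based log, and log entries are assumed to be appended in commit order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the three views over a base store plus an update log. Hypothetical. */
final class ToyViews {
    /** One record version, tagged with the commit time that wrote it. */
    static final class Rec {
        final String key;
        final String value;
        final long commitTime;

        Rec(String key, String value, long commitTime) {
            this.key = key;
            this.value = value;
            this.commitTime = commitTime;
        }
    }

    private final Map<String, Rec> base = new LinkedHashMap<>(); // stands in for columnar files
    private final List<Rec> log = new ArrayList<>();             // stands in for the row-based log

    void writeBase(Rec r) { base.put(r.key, r); }

    void appendLog(Rec r) { log.add(r); }

    /** Read optimized view: only the compacted base data, ignoring the log. */
    Map<String, Rec> readOptimized() {
        return new LinkedHashMap<>(base);
    }

    /** Real-time view: base data merged with the log; later log entries win. */
    Map<String, Rec> realTime() {
        Map<String, Rec> merged = new LinkedHashMap<>(base);
        for (Rec r : log) {
            merged.put(r.key, r);
        }
        return merged;
    }

    /** Incremental view: latest record versions written after the given commit time. */
    List<Rec> incremental(long afterCommitTime) {
        List<Rec> changed = new ArrayList<>();
        for (Rec r : realTime().values()) {
            if (r.commitTime > afterCommitTime) {
                changed.add(r);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        ToyViews views = new ToyViews();
        views.writeBase(new Rec("a", "1", 100L));
        views.writeBase(new Rec("b", "2", 100L));
        views.appendLog(new Rec("b", "3", 200L)); // update not yet compacted
        views.appendLog(new Rec("c", "4", 200L)); // insert not yet compacted

        System.out.println(views.readOptimized().keySet());
        System.out.println(views.realTime().keySet());
        System.out.println(views.incremental(100L).size());
    }
}
```

In this model the read optimized view trades freshness for scan speed, because it skips the merge step that the real-time view performs on every read.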

Learn more about Hudi at https://hudi.apache.org

Building Apache Hudi from source

Prerequisites for building Apache Hudi:

  • Unix-like system (like Linux, Mac OS X)
  • Java 8 (Java 9 or 10 may work)
  • Git
  • Maven

# Check out code and build
git clone https://github.com/apache/incubator-hudi.git && cd incubator-hudi
mvn clean package -DskipTests -DskipITs

Quickstart

Please visit https://hudi.apache.org/quickstart.html to quickly explore Hudi's capabilities using spark-shell.

incubator-hudi's People

Contributors

bvaradar, dependabot[bot], eisig, gekath, guru107, hddong, hotienvu, jianxu, kaka11chen, kaushikd49, lamberken, leesf, leletan, milantracy, n3nash, nsivabalan, ovj, panxing4game, prasannarajaperumal, pratyakshsharma, prazanna, pseudomuto, suniluber, umehrot2, vinothchandar, xjodoin, yanghua, yaooqinn, yashs360, zqureshi
