yohto / yosegi

This project forked from yahoojapan/yosegi

License: Apache License 2.0

Introduction to Yosegi

What does this project do?

Yosegi is a schema-less columnar storage format. It provides a flexible, JSON-like data representation together with read efficiency similar to that of other columnar storage formats.

Why is this project useful?

In the big data era, data volumes became too large to store as-is without compression. Driven by demand for better compression ratios and read performance, several columnar data formats (for example, Apache ORC and Apache Parquet) were proposed. Because values within a column tend to be similar, these formats achieve high compression ratios, and because data is grouped by column, they read efficiently when the data is used.
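The compression benefit of grouping data by column can be illustrated with a minimal Java sketch. It pulls one column out of a few hypothetical row-oriented records and applies run-length encoding, a generic technique chosen here only for illustration; it is not Yosegi's API or its actual encoding. The names `ColumnarSketch`, `column`, and `runLengthEncode` are hypothetical, and the code sticks to Java 8 to match the build requirement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Illustration of the general columnar idea only; this is NOT Yosegi's API
// or its on-disk encoding. Written for Java 8 to match the build requirement.
class ColumnarSketch {

    // One run of identical consecutive values within a column.
    static final class Run {
        final String value;
        final int length;

        Run(String value, int length) {
            this.value = value;
            this.length = length;
        }

        @Override
        public String toString() {
            return value + "x" + length;
        }
    }

    // Build a row (record) from alternating key/value arguments.
    static Map<String, String> row(String... keyValues) {
        Map<String, String> row = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            row.put(keyValues[i], keyValues[i + 1]);
        }
        return row;
    }

    // Pull one column out of row-oriented records, preserving row order.
    static List<String> column(List<Map<String, String>> rows, String name) {
        List<String> values = new ArrayList<>();
        for (Map<String, String> r : rows) {
            values.add(r.get(name));
        }
        return values;
    }

    // Run-length encode a column: consecutive equal values collapse into one Run.
    static List<Run> runLengthEncode(List<String> values) {
        List<Run> runs = new ArrayList<>();
        for (String v : values) {
            int last = runs.size() - 1;
            if (last >= 0 && Objects.equals(runs.get(last).value, v)) {
                runs.set(last, new Run(v, runs.get(last).length + 1));
            } else {
                runs.add(new Run(v, 1));
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = Arrays.asList(
                row("country", "JP", "user", "a"),
                row("country", "JP", "user", "b"),
                row("country", "JP", "user", "c"),
                row("country", "US", "user", "d"));
        // Similar values sit next to each other in the column, so four values become two runs.
        System.out.println(runLengthEncode(column(rows, "country"))); // prints [JPx3, USx1]
    }
}
```

In a row-oriented layout the repeated `JP` values would be interleaved with `user` values, so no run would form; storing the column contiguously is what exposes the repetition to the encoder.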

However, these data formats require the structure of a row (or record) to be defined before the data is saved. Users had to decide how the data would be used at storage time, and it was often difficult to decide in advance what kind of data would be needed.

This project provides a new columnar format that does not require a schema at storage time, with compression and read performance equal to (and in some cases higher than) that of other formats.
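The idea of writing columnar data without a predefined schema can be sketched as follows, assuming records arrive as flat `Map<String, Object>` values. Each new field name creates a column on the fly, and rows that lack a field hold `null`. `SchemaLessColumns` and its methods are hypothetical names for illustration only; they do not describe how Yosegi represents data internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of schema-less columnar ingestion: columns are created
// as new field names appear, so no schema is declared before writing.
// This is NOT Yosegi's API or its internal representation.
class SchemaLessColumns {

    // Column name -> values, one slot per appended record (null = field absent).
    private final Map<String, List<Object>> columns = new LinkedHashMap<>();
    private int rowCount = 0;

    void append(Map<String, Object> record) {
        for (Map.Entry<String, Object> field : record.entrySet()) {
            List<Object> col = columns.get(field.getKey());
            if (col == null) {
                // First time this field is seen: back-fill earlier rows with null.
                col = new ArrayList<>();
                for (int i = 0; i < rowCount; i++) {
                    col.add(null);
                }
                columns.put(field.getKey(), col);
            }
            col.add(field.getValue());
        }
        rowCount++;
        // Pad every column that this record did not mention.
        for (List<Object> col : columns.values()) {
            if (col.size() < rowCount) {
                col.add(null);
            }
        }
    }

    // Returns the values of one column, or null if the field was never seen.
    List<Object> column(String name) {
        return columns.get(name);
    }

    int rowCount() {
        return rowCount;
    }

    public static void main(String[] args) {
        SchemaLessColumns store = new SchemaLessColumns();
        Map<String, Object> first = new HashMap<>();
        first.put("id", 1);
        first.put("name", "alice");
        Map<String, Object> second = new HashMap<>();
        second.put("id", 2);
        second.put("tags", "admin"); // new field: no schema change needed
        store.append(first);
        store.append(second);
        System.out.println(store.column("name")); // prints [alice, null]
        System.out.println(store.column("tags")); // prints [null, admin]
    }
}
```

A real format would also have to deal with nested values and mixed types within one field; this sketch only covers flat records.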

Use cases

Data Analysis

Analyzing big data requires storing data compactly and retrieving it efficiently. As a columnar format, Yosegi is well suited to this need.

Data Lake

A data lake is a data pool that does not require the structure of a row (a schema) to be defined when data is stored. The stored data can then be used by defining its schema at analysis time. See DataLake.
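Schema-on-read can be sketched in Java: data is stored as raw columns, and the analyst supplies a schema (a column name mapped to a parser) only when reading. Columns not named in the read-time schema are simply skipped. `SchemaOnRead.read` and the string-based storage are assumptions made for this illustration and are not part of Yosegi.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical schema-on-read sketch: data is stored without a schema, and the
// analyst supplies one (column -> parser) only when reading. Names here are
// illustrative and are NOT part of Yosegi.
class SchemaOnRead {

    // Apply a read-time schema to raw stored columns, returning only the
    // requested columns converted to the requested types.
    static Map<String, List<Object>> read(Map<String, List<String>> stored,
                                          Map<String, Function<String, Object>> schema) {
        Map<String, List<Object>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Function<String, Object>> field : schema.entrySet()) {
            List<String> raw = stored.get(field.getKey());
            List<Object> typed = new ArrayList<>();
            // A column that was never stored yields an empty list.
            if (raw != null) {
                for (String value : raw) {
                    // Missing values stay null instead of failing the whole read.
                    typed.add(value == null ? null : field.getValue().apply(value));
                }
            }
            result.put(field.getKey(), typed);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> stored = new LinkedHashMap<>();
        stored.put("price", Arrays.asList("100", "250", null));
        stored.put("item", Arrays.asList("pen", "book", "cup"));

        // Schema decided at analysis time: only "price", parsed as Long.
        Map<String, Function<String, Object>> schema = new LinkedHashMap<>();
        schema.put("price", Long::valueOf);

        System.out.println(read(stored, schema)); // prints {price=[100, 250, null]}
    }
}
```

The same stored columns could later be read with a different schema (for example, parsing `price` as a decimal type) without rewriting the data, which is the property a data lake relies on.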

License

This project is licensed under the Apache License 2.0. Please use this project in accordance with this license.

How do I get started?

Java

For basic usage, please see the quick start.

CLI

Please see the yosegi-tools repository for details.

To see which functions it provides, look at the command list.

Apache Hadoop

Yosegi supports Apache Hadoop. Please see the yosegi-hadoop repository for details.

For basic usage, please see the quick start.

Apache Hive

Yosegi supports Apache Hive. Please see the yosegi-hive repository for details.

For basic usage, please see the quick start.

Apache Spark

Yosegi supports Apache Spark. Please see the yosegi-spark repository for details.

For basic usage, please see the quick start.

Where can I get more help, if I need it?

We plan to host support and discussion of Yosegi on a mailing list. Until the mailing list opens, please contact us via GitHub.

How to contribute

Everyone is welcome to join this project.

For information on how to start contributing to the project, please refer to the Yosegi contribution guide.

Building

System requirements

The following environment is required.

  • Mac OS X or Linux
  • Java 8 Update 92 or higher (8u92+), 64-bit
  • Maven 3.3.9 or later (for building)

Maven

Yosegi sources can be obtained from the Maven repository.

Compile sources

Compile the sources with the following command.

$ mvn clean install

Contributors

koijima, yohto, kkuramot, dependabot[bot], taikegami, jie211