RParquet

Overview

  • RParquet is a tool for reading R data frames from, and writing them to, Apache Parquet files.
  • It operates on the data through the Apache Arrow API.
  • It supports the following type mappings between R and Parquet (the two lists correspond position by position):
  • R types: "integer", "integer64", "nanotime", "numeric", "character", "logical"
  • Arrow (Parquet): "INT32", "INT64", "TIMESTAMP", "DOUBLE", "STRING", "BOOL"

Installation

  • See the parquet-cpp project for instructions on installing the parquet-cpp library.
  • The Arrow version must be changed before running cmake for parquet-cpp.
STEPS
1. git clone https://github.com/apache/arrow.git
   cd arrow
   git rev-parse --verify HEAD
   Note the commit hash that is printed, e.g. for version 0.10: 94e8196ff925e2a8051ac330a02bee0dc63702c8
   
2. cd /parquet-cpp/cmake_modules
   vi ./ArrowExternalProject.cmake
   Find the line that contains set(ARROW_VERSION...)
   Change the version hash to the one obtained in step 1,
   e.g. set(ARROW_VERSION "94e8196ff925e2a8051ac330a02bee0dc63702c8")
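
   Steps 1 and 2 can also be done without opening an editor. The following is a minimal sketch, not part of RParquet: pin_arrow_version is a hypothetical helper name, and it assumes the CMake file contains a line of the form set(ARROW_VERSION "...") as shown in the example above.

```shell
# Hypothetical helper (not part of RParquet or parquet-cpp):
# rewrite the quoted value of set(ARROW_VERSION "...") in a CMake file.
# Usage: pin_arrow_version <cmake-file> <commit-hash>
pin_arrow_version() {
  # Accept only a non-empty lowercase hex string, as printed by git rev-parse.
  case "$2" in
    ""|*[!0-9a-f]*) echo "pin_arrow_version: not a commit hash: $2" >&2; return 1 ;;
  esac
  # -i.bak edits the file in place and keeps the original as <cmake-file>.bak.
  sed -i.bak -E "s/set\(ARROW_VERSION[[:space:]]+\"[^\"]*\"/set(ARROW_VERSION \"$2\"/" "$1"
}

# Example call, assuming the arrow clone from step 1 is in ./arrow:
#   arrow_hash=$(git -C arrow rev-parse --verify HEAD)
#   pin_arrow_version /parquet-cpp/cmake_modules/ArrowExternalProject.cmake "$arrow_hash"
```

   The backup file makes it easy to compare the change with diff before running cmake.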

3. In R (>= 3.4), install the required packages:
   install.packages(c("curl", "httr", "xml2", "devtools", "roxygen2", "testthat", "knitr", "nanotime", "bit64"))

4. Set environment variables for the compile-time dependencies
   Three paths need to be set:
     "parquet-cpp/src", "parquet-cpp/build/.../include" and the Boost library path ("/usr/lib/x86_64-linux-gnu")
   For example, in RStudio or an R terminal:
     Sys.setenv(CPLUS_INCLUDE_PATH="parquet-cpp/src:parquet-cpp/build/latest/include")
     Sys.setenv(LIBRARY_PATH="/usr/lib/x86_64-linux-gnu:parquet-cpp/build/latest")
     Sys.setenv(LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu:parquet-cpp/build/latest")
     Sys.setenv(LD_RUN_PATH="/usr/lib/x86_64-linux-gnu:parquet-cpp/build/latest")
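
   The same variables can instead be exported in the shell before R is started, since R inherits the environment of the shell that launches it. The following is a sketch under assumptions: set_rparquet_env is a hypothetical helper name, its first argument is the absolute path of your parquet-cpp checkout, and the build output is assumed to be in build/latest as in the example above.

```shell
# Hypothetical helper (not part of RParquet): export the step 4 paths
# from an absolute parquet-cpp directory.
# Usage: set_rparquet_env <parquet-cpp-dir> [boost-lib-dir]
set_rparquet_env() {
  [ -n "$1" ] || { echo "usage: set_rparquet_env <parquet-cpp-dir> [boost-lib-dir]" >&2; return 1; }
  rparquet_root="$1"
  # The Boost library directory defaults to the path used in the text above.
  rparquet_boost="${2:-/usr/lib/x86_64-linux-gnu}"
  export CPLUS_INCLUDE_PATH="$rparquet_root/src:$rparquet_root/build/latest/include"
  export LIBRARY_PATH="$rparquet_boost:$rparquet_root/build/latest"
  export LD_LIBRARY_PATH="$rparquet_boost:$rparquet_root/build/latest"
  export LD_RUN_PATH="$rparquet_boost:$rparquet_root/build/latest"
}

# Example call (the directory is a placeholder), followed by starting R:
#   set_rparquet_env "$HOME/parquet-cpp"
#   R
```

   Absolute paths keep the settings valid regardless of the directory R is started from.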

5. Build the RParquet library
   In RStudio:
     Build -> Install and Restart
   In an R terminal:
     library(devtools)
     install()
     library(RParquet)

Features

 TODO

Contributors

  • laoshubobo
