daru - Data Analysis in RUby

Introduction

daru (Data Analysis in RUby) is a library for storage, analysis, manipulation and visualization of data in Ruby.

daru makes it easy and intuitive to process data, predominantly through two data structures: Daru::DataFrame and Daru::Vector. Because it is written in pure Ruby, it should work with all Ruby implementations. It has been tested with MRI 2.5.1 and 2.7.1.

daru plugin gems

daru-view provides easy, interactive plotting in web applications and IRuby notebooks. It works with Ruby web application frameworks such as Rails, Sinatra and Nanoc, and hopefully with others too.

Articles/blogs that summarize the powerful features of daru-view:

daru-io extends support for many import and export methods of Daru::DataFrame. It is intended to help Rubyists who work on data analysis or web development. It serves as a general-purpose conversion library that takes input in one format (say, JSON) and converts it to another format (say, Avro). It also makes it easy to get started analyzing data with daru. You can read more in SciRuby/blog/daru-io.

Features

  • Data structures:
    • Vector - A basic 1-D vector.
    • DataFrame - A 2-D spreadsheet-like structure for manipulating and storing data sets. This is daru's primary data structure.
  • Compatible with IRuby notebook, statsample, statsample-glm and statsample-timeseries.
  • Support for time series.
  • Singly and hierarchically indexed data structures.
  • Flexible and intuitive API for manipulation and analysis of data.
  • Easy plotting, statistics and arithmetic.
  • Plentiful iterators.
  • Optional speed and space optimization on MRI with NMatrix and GSL.
  • Easy splitting, aggregation and grouping of data.
  • Pivot tables for quickly reducing data into summaries.
  • Import and export data from and to Excel, CSV, SQL Databases, ActiveRecord and plain text files.

Installation

$ gem install daru

Notebooks

Notebooks on most use cases

Visualization

Notebooks on Time series

Notebooks on Indexing

Case Studies

Blog Posts

Time series

Categorical Data

Basic Usage

daru exposes two major data structures: DataFrame and Vector. The Vector is a basic 1-D structure that corresponds to a labelled Array. The DataFrame, daru's primary data structure, is a 2-D spreadsheet-like structure for manipulating and storing data sets.

Basic DataFrame initialization.

require 'daru'

data_frame = Daru::DataFrame.new(
  {
    'Beer' => ['Kingfisher', 'Snow', 'Bud Light', 'Tiger Beer', 'Budweiser'],
    'Gallons sold' => [500, 400, 450, 200, 250]
  },
  index: ['India', 'China', 'USA', 'Malaysia', 'Canada']
)
data_frame

Load data from CSV files.

df = Daru::DataFrame.from_csv('TradeoffData.csv')

Basic Data Manipulation

Selecting rows.

data_frame.row['USA']

Selecting columns.

data_frame['Beer']

A range of rows.

data_frame.row['India'..'USA']

The first 2 rows.

data_frame.first(2)

The last 2 rows.

data_frame.last(2)

Adding a new column.

data_frame['Gallons produced'] = [550, 500, 600, 210, 240]

Creating a new column based on data in other columns.

data_frame['Demand supply gap'] = data_frame['Gallons produced'] - data_frame['Gallons sold']

Condition based selection

Selecting countries based on the number of gallons sold in each. The syntax is similar to the one defined by Arel, i.e. filtering with a where clause.

data_frame.where(data_frame['Gallons sold'].lt(300))

You can also pass a combination of boolean operations to the #where method:

data_frame.where(
  data_frame['Beer']
  .in(['Snow', 'Kingfisher','Tiger Beer'])
  .and(
    data_frame['Gallons produced'].gt(520).or(data_frame['Gallons produced'].lt(250))
  )
)

Plotting

Daru supports plotting interactive graphs with nyaplot. You can create a plot with the #plot method. Here we draw a bar graph with the brand name on the X axis and the gallons sold on the Y axis.

data_frame.plot type: :bar, x: 'Beer', y: 'Gallons sold' do |plot, diagram|
  plot.x_label "Beer"
  plot.y_label "Gallons Sold"
  plot.yrange [0,600]
  plot.width 500
  plot.height 400
end

In addition to nyaplot, daru also supports plotting out of the box with gnuplotrb.

Documentation

Docs can be found here.

Contributing

Pick a feature from the Roadmap or the issue tracker or think of your own and send me a Pull Request!

For details see CONTRIBUTING.

Acknowledgements

  • Google and the Ruby Science Foundation for the Google Summer of Code 2016 grant for speed enhancements and implementation of support for categorical data. Special thanks to @lokeshh, @zverok and @agisga for their efforts.
  • Google and the Ruby Science Foundation for the Google Summer of Code 2015 grant for further developing daru and integrating it with other ruby gems.
  • Thank you last.fm for making user data accessible to the public.

Copyright (c) 2015, Sameer Deshmukh. All rights reserved.
