
ajunlonglive / cuxfilter


This project forked from rapidsai/cuxfilter


GPU accelerated cross filtering with cuDF. Docs here:

Home Page: https://rapidsai.github.io/cuxfilter/index.html

License: Apache License 2.0

Python 0.90% Makefile 0.01% Jupyter Notebook 99.08% Shell 0.02%

cuxfilter's Introduction

  cuxfilter

cuxfilter (ku-cross-filter) is a RAPIDS framework to connect web visualizations to GPU-accelerated crossfiltering. Inspired by the JavaScript version of the original, it enables interactive and super fast multi-dimensional filtering of 100 million+ row tabular datasets via cuDF.
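To make "crossfiltering" concrete, here is a minimal pure-Python sketch of the classic crossfilter model. It is an illustration only, not cuxfilter's implementation (which runs on the GPU via cuDF): each chart's histogram is computed with the filters of every other chart applied, but not its own, so the chart you are brushing keeps its full distribution while the others update. The function name, column names, and data are hypothetical.

```python
def crossfilter_counts(rows, filters, dim):
    """Histogram of `dim`, applying every active filter except the one on `dim` itself.

    rows:    list of dicts, one per record (hypothetical stand-in for a cuDF DataFrame)
    filters: mapping of column name -> predicate, one per chart with an active selection
    """
    counts = {}
    for row in rows:
        # A chart ignores its own filter, but respects all the others.
        if all(pred(row[d]) for d, pred in filters.items() if d != dim):
            counts[row[dim]] = counts.get(row[dim], 0) + 1
    return counts


rows = [
    {"YEAR": 2016, "MONTH": 1},
    {"YEAR": 2016, "MONTH": 2},
    {"YEAR": 2017, "MONTH": 1},
    {"YEAR": 2017, "MONTH": 1},
]
# Selecting 2017 in the YEAR chart...
filters = {"YEAR": lambda y: y == 2017}
print(crossfilter_counts(rows, filters, "MONTH"))  # {1: 2}: MONTH chart is filtered
print(crossfilter_counts(rows, filters, "YEAR"))   # {2016: 2, 2017: 2}: YEAR chart is not
```

Every interaction repeats this filter-and-aggregate pass for each chart, which is why offloading it to the GPU pays off on large datasets.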

RAPIDS Viz

cuxfilter is one of the core projects of the “RAPIDS viz” team. Taking the axiom that “a slider is worth a thousand queries” from @lmeyerov to heart, we want to enable fast exploratory data analytics through an easier-to-use pythonic notebook interface.

As there are many fantastic visualization libraries available for the web, our general principle is not to create our own viz library, but to enhance others with faster acceleration, larger datasets, and better dev UX. Basically, we want to take the headache out of interconnecting multiple charts to a GPU backend, so you can get to visually exploring data faster.

By the way, cuxfilter is best used to interact with large (1 million+ row) tabular datasets. GPUs are fast, but accessing that speedup requires some architectural overhead that isn't worthwhile for small datasets.

For more detailed requirements, see below.

cuxfilter.py Architecture

The Python version of cuxfilter leverages Jupyter Notebook and the Bokeh server to greatly reduce backend complexity. Currently we are focusing development efforts on the Python version instead of the older JavaScript version.

What is cuDataTiles?

cuxfilter.py implements cuDataTiles, a GPU-accelerated version of data tiles based on the work of Falcon. When you start interacting with a specific chart in a cuxfilter dashboard, the values for the other charts are precomputed, so scrubbing a slider does not require recalculating them.
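The data-tile idea can be sketched with prefix sums. This minimal pure-Python version is a CPU illustration under simplifying assumptions (row counts only, data already binned), not cuDataTiles itself: it builds a 2D table of counts indexed by (active-chart bin, passive-chart bin), accumulates it along the active axis once, and then answers any range selection on the active chart with one subtraction per passive bin, with no pass over the raw rows. All names are hypothetical.

```python
def build_data_tile(active_bins, passive_bins, n_active, n_passive):
    """Return tile where tile[a][p] = number of rows with active bin < a and passive bin p."""
    counts = [[0] * n_passive for _ in range(n_active)]
    for a, p in zip(active_bins, passive_bins):
        counts[a][p] += 1
    # Prefix sums over the active dimension; row 0 is all zeros.
    tile = [[0] * n_passive]
    for row in counts:
        tile.append([prev + c for prev, c in zip(tile[-1], row)])
    return tile


def query(tile, lo, hi):
    """Passive-chart histogram for rows whose active bin lies in [lo, hi)."""
    return [h - l for h, l in zip(tile[hi], tile[lo])]


# Five rows, already binned: active chart has 4 bins, passive chart has 3.
tile = build_data_tile([0, 1, 1, 2, 3], [0, 0, 1, 1, 2], n_active=4, n_passive=3)
print(query(tile, 1, 3))  # [1, 2, 0]
print(query(tile, 0, 4))  # [2, 2, 1]
```

The one-time cost of building the tile is paid when interaction with the active chart starts; every slider position afterwards is a cheap lookup.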

Open Source Projects

cuxfilter wouldn’t be possible without using these great open source projects:

Where is the original cuxfilter and Mortgage Viz Demo?

The original version (0.2) of cuxfilter, best known as the backend powering the Mortgage Viz Demo, has been moved into the GTC-2018-mortgage-visualization branch. As it has a much more complicated backend and JavaScript API, we've decided to focus on the streamlined, notebook-focused version in the /python folder.

Usage

import os

import cuxfilter
from cuxfilter import charts
from cuxfilter.sampledata import datasets_check

# update DATA_DIR if you have downloaded the datasets elsewhere
DATA_DIR = './data'
# download the auto accidents dataset into DATA_DIR if it is not already there
datasets_check('auto_accidents', base_dir=DATA_DIR)

cux_df = cuxfilter.DataFrame.from_arrow(os.path.join(DATA_DIR, 'auto_accidents.arrow'))
# cast ST_CASE (used as aggregate_col below) to float64
cux_df.data['ST_CASE'] = cux_df.data['ST_CASE'].astype('float64')

# map DAY_WEEK codes to readable x-axis labels
label_map = {
    1: 'Sunday', 2: 'Monday', 3: 'Tuesday', 4: 'Wednesday',
    5: 'Thursday', 6: 'Friday', 7: 'Saturday', 9: 'Unknown',
}
# RGB tuples used as the scatter_geo color palette
gtc_demo_red_blue_palette = [
    (49, 130, 189), (107, 174, 214), (123, 142, 216),
    (226, 103, 152), (255, 0, 104), (50, 50, 50),
]

# declare charts
chart1 = charts.cudatashader.scatter_geo(x='dropoff_x', y='dropoff_y', aggregate_col='ST_CASE',
                                         color_palette=gtc_demo_red_blue_palette)
chart2 = charts.panel_widgets.multi_select('YEAR')
chart3 = charts.bokeh.bar('DAY_WEEK', x_label_map=label_map)
chart4 = charts.bokeh.bar('MONTH')

# declare the dashboard
d = cux_df.dashboard(
    [chart1, chart2, chart3, chart4],
    layout=cuxfilter.layouts.feature_and_double_base,
    theme=cuxfilter.themes.light,
    title='Auto Accident Dataset',
)

# preview the dashboard (non-interactive) with its layout inside the notebook;
# top-level await works in Jupyter notebook cells
await d.preview()

[Image: output dashboard]

Documentation

Full documentation can be found here.

Troubleshooting help can be found here.

Dependencies

Installation

You need to have RAPIDS (cuDF) installed for cuxfilter to work.

1. If installing within the rapidsai Docker container, follow these instructions:

Before you start JupyterLab, you need to install cuxfilter and cudatashader. After starting the Docker container, run the following commands in a terminal:

# Go to the /rapids folder, where the libraries live. List the files to verify (you'll see cuspatial, cuml, cudf, etc.)
cd /rapids
ls

# Clone cuxfilter here
git clone https://github.com/rapidsai/cuxfilter

# Drop into cuxfilter's python library folder, install its requirements, and install it in editable mode
cd cuxfilter/python
/opt/conda/envs/rapids/bin/python -m pip install -U -r requirements.txt
/opt/conda/envs/rapids/bin/python -m pip install -e .

# Get back to the /rapids folder
cd /rapids

# Clone cudatashader
git clone https://github.com/rapidsai/cudatashader

# Drop into the cudatashader folder and install it
cd cudatashader
/opt/conda/envs/rapids/bin/python -m pip install -e .

# Start a JupyterLab environment, then
# visit localhost:8888/

To run the Bokeh server inside JupyterLab:

  1. Expose an additional port for the server; let's call it bokeh_port.
  2. Install the JupyterLab dependencies:
conda install -c conda-forge jupyterlab
jupyter labextension install @pyviz/jupyterlab_pyviz
jupyter labextension install jupyterlab_bokeh

  3. Run the server:

# enter the IP address without http://
# current_port is the port on which JupyterLab is running
d.app(url='ip.addr:current_port', port=bokeh_port)
# OR, to serve the dashboard as a separate web app
d.show('ip.addr:bokeh_port')

2. If installing in a conda environment

# Clone cuxfilter here
git clone https://github.com/rapidsai/cuxfilter

# Create and activate a conda environment (RAPIDS/cudf must also be installed in it)
conda create -n test_env python
source activate test_env

# Drop into cuxfilter's python library folder, make, and install
cd cuxfilter/python
make
pip install -e .

# Go back to the directory where you cloned cuxfilter
cd ../..

# Clone cudatashader
git clone https://github.com/rapidsai/cudatashader

# Drop into the cudatashader folder and install it
cd cudatashader
pip install -e .

Download Datasets

  1. Auto-download datasets

The notebooks inside python/notebooks already have a check function which verifies whether the example dataset is downloaded, and downloads it if it's not.

  2. Download manually

While in the directory where you want the datasets to be saved, run the following:

# Activate the environment where cuxfilter is installed. Skip this if you are in a Docker container
source activate test_env

#download and extract the datasets
python -c "from cuxfilter.sampledata import datasets_check; datasets_check(base_dir='./')"

Individual links:

  • Download the mortgage dataset from here

  • NYC taxi dataset from here

  • Auto dataset from here

Guides and Layout Templates

Currently supported layout templates and example code can be found here.

Currently Supported Charts

Library         Chart type
bokeh           bar, line, choropleth
cudatashader    scatter, scatter_geo, line, stacked_lines, heatmap
panel_widgets   range_slider, float_slider, int_slider, drop_down, multi_select
custom          view_dataframe

Our plan is to add support in the future for the following libraries:

  1. plotly
  2. altair
  3. pydeck

Contributing Developers Guide

cuxfilter.py acts as a connector library, so adding support for new libraries is straightforward. The python/cuxfilter/charts/core directory contains all the core chart classes: a new chart inherits from one of them, implements a few viz-related functions, and can then be used in cuxfilter dashboards directly.

See the bokeh and cudatashader directories for examples of how viz libraries are implemented. Let us know if you would like to add a chart by opening a feature request issue or submitting a PR.

For more details, check out the contributing guide.
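The pattern of inheriting a core class and filling in a few library-specific functions can be sketched as follows. The base class and its method names (`update`, `calculate_source`, `render`) are hypothetical stand-ins, not the actual classes in python/cuxfilter/charts/core; the toy subclass renders a text bar chart in place of a real viz library such as Bokeh.

```python
from abc import ABC, abstractmethod


class HypotheticalCoreChart(ABC):
    """Hypothetical stand-in for a core chart class; NOT cuxfilter's real base class."""

    def __init__(self, x):
        self.x = x          # column this chart visualizes
        self.source = None  # aggregated data handed to the viz library

    def update(self, data):
        # Shared logic lives in the base class: aggregate, then draw.
        self.source = self.calculate_source(data)
        return self.render()

    @abstractmethod
    def calculate_source(self, data):
        """Aggregate the (filtered) data into what the chart displays."""

    @abstractmethod
    def render(self):
        """Library-specific drawing code."""


class TextBarChart(HypotheticalCoreChart):
    """Toy 'viz library' backend that draws a bar chart as text."""

    def calculate_source(self, data):
        counts = {}
        for value in data[self.x]:
            counts[value] = counts.get(value, 0) + 1
        return dict(sorted(counts.items()))

    def render(self):
        return "\n".join(f"{k}: {'#' * v}" for k, v in self.source.items())


chart = TextBarChart("MONTH")
print(chart.update({"MONTH": [1, 2, 1, 3, 1]}))
```

Keeping the aggregation/update flow in the base class means a new backend only has to supply the pieces that differ per library.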

Future Work

cuxfilter development is in its early stages and ongoing. See what we are planning next on the projects page.

cuxfilter's People

Contributors

ajaythorve, dependabot[bot], exactlyallan, jacobtomlinson
