dspipe,thoppe

dspipe's Introduction

easy to use data science pipes

pip install dspipe

Shuffling your data from one directory into another directory is easier than before!

On it's own, a Pipe calls a function in parallel using joblib. By default it uses all possible cores to compute:

from dspipe import Pipe

def add2(x):
    return x + 2

result = Pipe(range(3))(add2)

# result = [2, 3, 4]

But it can do so much more! Let's consider a more involved example, let's say you have a bunch of XML files and you'd like to parse them into clean CSVs.

from dspipe import Pipe

def compute(f0, f1):
    # Read the XML file f0
    # Turn it into a dataframe
    df.to_csv(f1)

P = Pipe("data/xml", "data/parsed_csv", input_suffix='.xml', output_suffix='.csv')

P(compute, -1)

This will run the pipe for every XML file in data/xml, create the directory data/parsed_csv, and give the compute function a new filename to save it to f1 with the .xml replaced with .csv. Running the pipe with 1 as the second argument wil run single-threaded execution.

Init options for Pipe

    source: str
    dest: str = None
    input_suffix: str = ""
    output_suffix: str = ""
    shuffle: bool = False
    limit: int = None
    prefilter: bool = True
    progressbar: bool = True
    autoname: bool = False
    total: int = None

Call options for Pipe

def __call__(self, func, n_jobs=-1, **kwargs):
    # **kwargs are passed to the function

Recommend Projects

thoppe / dspipe Goto Github PK

dspipe's Introduction

dspipe's People

Contributors

Stargazers

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent