falconsoft / datapipe

Stars: 18, watchers: 4, forks: 2, size: 286 KB

dataPipe is a data processing and data analytics library for JavaScript. Inspired by LINQ (C#) and Pandas (Python)

Home Page: https://datapipe-js.com/

License: MIT License

Languages: TypeScript 89.16%, JavaScript 9.99%, HTML 0.85%
Topics: data-analysis, data-engineering, data-cleaning, data-manipulation, data-wrangling, data-munging, csv-parser, csv, linq, pandas

datapipe's Introduction

dataPipe

dataPipe is a data processing and data analytics library for JavaScript, inspired by LINQ (C#) and Pandas (Python). It provides facilities for data loading, data transformation, data analysis and other helpful data manipulation functions.

The DataPipe project was originally created to power JSPython and Worksheet Systems related projects, but it can also be used as a standalone library for your data-driven JavaScript or JSPython applications, both on the client (web browser) and on the server (Node.js).

Get started

A quick way to use it in HTML:

<script type="text/javascript" src="https://cdn.jsdelivr.net/npm/datapipe-js/dist/data-pipe.min.js"></script>

or install it from npm:

npm install datapipe-js

A quick example

JavaScript / TypeScript

StackBlitz example

const { dataPipe, avg, first } = require('datapipe-js');
// node-fetch v2 supports require(); v3 is ESM-only.
// On Node.js 18+ you can drop this line and use the built-in global fetch.
const fetch = require('node-fetch');

async function main() {
    const dataUrl = "https://raw.githubusercontent.com/FalconSoft/sample-data/master/CSV/sample-testing-data-100.csv";
    const csv = await (await fetch(dataUrl)).text();

    return dataPipe()
        .fromCsv(csv)                              // parse CSV text into row objects
        .groupBy(r => r.Country)                   // one group (array of rows) per country
        .select(g => ({
            country: first(g).Country,
            sales: dataPipe(g).sum(i => i.Sales),  // chained pipe method
            averageSales: avg(g, i => i.Sales),    // standalone utility function
            count: g.length
        }))
        .where(r => r.sales > 5000)
        .sort("sales DESC")
        .toArray();
}

main()
    .then(console.log)
    .catch(console.error);
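To show what each stage of the pipeline above computes, here is a rough plain-JavaScript equivalent with no dependencies. It is a sketch, not datapipe-js code: it assumes the rows are already parsed into objects with a numeric `Sales` field (the real example gets them from `fromCsv`), and `rows` and `salesByCountry` are hypothetical names with made-up sample data.

```javascript
// Hypothetical inline rows standing in for the parsed CSV (made-up sample data).
const rows = [
  { Country: "UK", Sales: 4000 },
  { Country: "UK", Sales: 3000 },
  { Country: "France", Sales: 2500 },
  { Country: "Germany", Sales: 9000 },
];

function salesByCountry(rows) {
  // groupBy(r => r.Country): collect rows into one array per country.
  const groups = new Map();
  for (const r of rows) {
    if (!groups.has(r.Country)) groups.set(r.Country, []);
    groups.get(r.Country).push(r);
  }

  return [...groups.values()]
    // select(...): build one summary object per group.
    .map(g => {
      const sales = g.reduce((acc, i) => acc + i.Sales, 0);
      return {
        country: g[0].Country,
        sales,
        averageSales: sales / g.length,
        count: g.length,
      };
    })
    // where(r => r.sales > 5000)
    .filter(r => r.sales > 5000)
    // sort("sales DESC")
    .sort((a, b) => b.sales - a.sales);
}

console.log(salesByCountry(rows));
```

With this sample data, France is filtered out and Germany (9000) comes before the UK (7000).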

Data management functions

All utility functions can be used either as chained (pipe) methods or as standalone functions. In the example above, a new dataPipe is created to sum up sales, while averageSales uses the standalone utility function avg.
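The dual style can be pictured as a set of standalone functions that take the array as their first argument, plus a thin wrapper whose methods delegate to them. The sketch below is a hypothetical illustration of that pattern: `sum`, `avg`, `Pipe` and `pipe` are minimal stand-ins written for this example, not datapipe-js's internals, and returning `null` for an empty array is an assumption.

```javascript
// Standalone utilities: each takes the array as its first argument.
// (Hypothetical minimal versions; the library's own functions accept more options.)
function sum(array, selector = x => x) {
  return array.reduce((acc, item) => acc + selector(item), 0);
}

function avg(array, selector = x => x) {
  // Assumption of this sketch: the average of an empty array is null.
  return array.length ? sum(array, selector) / array.length : null;
}

// A tiny pipe wrapper whose methods delegate to the same utilities,
// so every utility is also available as a chainable method.
class Pipe {
  constructor(array = []) {
    this.data = array;
  }
  where(predicate) {
    return new Pipe(this.data.filter(predicate));
  }
  sum(selector) {
    return sum(this.data, selector);
  }
  avg(selector) {
    return avg(this.data, selector);
  }
}

const pipe = array => new Pipe(array);

const g = [{ Sales: 100 }, { Sales: 300 }];
console.log(pipe(g).sum(i => i.Sales));                          // chained: 400
console.log(avg(g, i => i.Sales));                               // standalone: 200
console.log(pipe(g).where(i => i.Sales > 150).sum(i => i.Sales)); // 300
```

Keeping the logic in plain functions means the chained and standalone forms cannot drift apart.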

Data Loading

Loading and parsing data in common formats such as CSV, JSON or TSV from a local variable:

  • dataPipe (array) - accepts a JavaScript array

  • fromTable (table) - converts a list of rows and columns into an array of scalar objects

  • parseCsv (csvContent[, options]) - loads string content and processes each row, with optional but robust configuration options and callbacks, e.g. skipRows, skipUntil, takeWhile, rowSelector, rowPredicate. Unless specified otherwise, it automatically converts values to numbers, datetimes or booleans
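As a rough picture of what header-based CSV parsing with automatic type conversion involves, here is a minimal sketch. `parseCsvSketch` and `convertCell` are hypothetical names; the sketch only mimics the `skipRows` option, converts numbers and booleans (not datetimes), and, unlike a robust parser, does not handle quoted fields.

```javascript
// Converts a raw CSV cell into a number or boolean where that looks safe,
// otherwise returns the trimmed string; empty cells become null.
function convertCell(raw) {
  const value = raw.trim();
  if (value === "") return null;
  if (/^(true|false)$/i.test(value)) return value.toLowerCase() === "true";
  const n = Number(value);
  return Number.isNaN(n) ? value : n;
}

// Minimal CSV-to-objects sketch: the first non-skipped line is the header.
// Does NOT handle quoted fields, or delimiters and newlines inside quotes.
function parseCsvSketch(content, { delimiter = ",", skipRows = 0 } = {}) {
  const lines = content
    .split(/\r?\n/)
    .slice(skipRows)
    .filter(line => line.trim() !== "");
  if (lines.length === 0) return [];

  const headers = lines[0].split(delimiter).map(h => h.trim());
  return lines.slice(1).map(line => {
    const cells = line.split(delimiter);
    const row = {};
    headers.forEach((h, i) => {
      row[h] = convertCell(cells[i] ?? "");
    });
    return row;
  });
}

const csv = "Country,Sales,Active\nUK,1200.5,true\nFrance,,false";
console.log(parseCsvSketch(csv));
// → [{ Country: "UK", Sales: 1200.5, Active: true },
//    { Country: "France", Sales: null, Active: false }]
```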

Data Transformation

  • select (elementSelector), synonym map - creates a new element for each element in a pipe, based on the elementSelector callback.
  • where (predicate), synonym filter - filters elements in a pipe based on a predicate.
  • groupBy (keySelector) - groups elements in a pipe according to the keySelector callback function. Returns a pipe with new group objects.
  • distinct (elementSelector), synonym unique - returns distinct elements from an array. The optional elementSelector parameter creates a new array via a callback function and then eliminates duplicates.
  • pivot (array, rowFields, columnField, dataField, aggFunction?, columnValues?) - returns a reshaped (pivoted) array based on unique column values.
  • transpose (array) - transposes rows to columns in an array.
  • sort ([fieldName(s)]) - sorts an array of elements by the specified fields and directions, e.g. sort(array, 'name ASC', 'age DESC').
  • flattenObject (object) - flattens a complex nested object into a simple (single-level) object, e.g. flattenObject(obj).
  • unflattenObject (object) - converts a flattened object back into a nested object, e.g. unflattenObject(obj).
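The multi-field sort specs used by sort (e.g. 'name ASC', 'age DESC') can be read as "compare by the first field, and fall back to the next field only on ties". The sketch below illustrates that idea in plain JavaScript; `makeComparator` and `sortBy` are hypothetical names, and defaulting to ASC when no direction is given is an assumption rather than documented library behavior.

```javascript
// Builds a comparator from specs such as "name ASC" or "age DESC".
// Direction defaults to ASC when omitted (an assumption of this sketch).
function makeComparator(...specs) {
  const keys = specs.map(spec => {
    const [field, dir = "ASC"] = spec.trim().split(/\s+/);
    return { field, sign: dir.toUpperCase() === "DESC" ? -1 : 1 };
  });
  return (a, b) => {
    for (const { field, sign } of keys) {
      const x = a[field];
      const y = b[field];
      if (x < y) return -sign;
      if (x > y) return sign;
    }
    return 0; // equal on every key
  };
}

// Returns a sorted copy instead of sorting the input in place.
function sortBy(array, ...specs) {
  return [...array].sort(makeComparator(...specs));
}

const people = [
  { name: "Bob", age: 30 },
  { name: "Alice", age: 25 },
  { name: "Bob", age: 41 },
];
console.log(sortBy(people, "name ASC", "age DESC"));
// → Alice 25, then Bob 41, then Bob 30
```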

Joining data arrays

  • innerJoin (leftArray, rightArray, leftKey, rightKey, resultSelector) - joins two arrays by selecting the elements that have matching keys in both arrays. Elements without a match in the other array are omitted from the result.
  • leftJoin (leftArray, rightArray, leftKey, rightKey, resultSelector) - joins two arrays by selecting all elements from the left array (leftArray) and the matching elements from the right array (rightArray). Where there is no match, the right side of the result is null.
  • fullJoin (leftArray, rightArray, leftKey, rightKey, resultSelector) - joins two arrays by selecting all elements from both arrays and pairing elements with matching keys. Where an element has no match, the other side of the result is null.
  • merge (targetArray, sourceArray, targetKey, sourceKey) - merges elements from two arrays: each source element either overrides the target element with a matching key or is appended to the target array.
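The left and full join semantics above can be sketched with a hash index on the right-hand keys. This is a plain-JavaScript illustration, not datapipe-js's implementation: `indexBy`, `leftJoinSketch` and `fullJoinSketch` are hypothetical names, keys are assumed to be selector functions, and `resultSelector(left, right)` receives `null` for a missing side.

```javascript
// Indexes an array by key so each lookup is O(1); keeps every element per key
// so duplicate keys produce one output row per matching pair.
function indexBy(array, keySelector) {
  const index = new Map();
  for (const item of array) {
    const key = keySelector(item);
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(item);
  }
  return index;
}

// Left join: every left element, paired with each match or with null.
function leftJoinSketch(left, right, leftKey, rightKey, resultSelector) {
  const rightIndex = indexBy(right, rightKey);
  const result = [];
  for (const l of left) {
    const matches = rightIndex.get(leftKey(l)) ?? [null];
    for (const r of matches) result.push(resultSelector(l, r));
  }
  return result;
}

// Full join: the left join plus right elements that matched nothing (left side null).
function fullJoinSketch(left, right, leftKey, rightKey, resultSelector) {
  const result = leftJoinSketch(left, right, leftKey, rightKey, resultSelector);
  const leftKeys = new Set(left.map(leftKey));
  for (const r of right) {
    if (!leftKeys.has(rightKey(r))) result.push(resultSelector(null, r));
  }
  return result;
}

const countries = [{ code: "UK", name: "United Kingdom" }, { code: "FR", name: "France" }];
const sales = [{ country: "UK", amount: 10 }, { country: "DE", amount: 7 }];
const pick = (c, s) => ({ name: c ? c.name : null, amount: s ? s.amount : null });

console.log(leftJoinSketch(countries, sales, c => c.code, s => s.country, pick));
// → [{ name: "United Kingdom", amount: 10 }, { name: "France", amount: null }]
console.log(fullJoinSketch(countries, sales, c => c.code, s => s.country, pick));
// → the two rows above plus { name: null, amount: 7 } for "DE"
```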

Aggregation and other numerical functions

  • avg ([propertySelector, predicate]), synonym average - returns the average value of a given array. Use propertySelector to choose the property to average and predicate to filter elements if needed. Both parameters are optional.
  • max ([propertySelector, predicate]), synonym maximum - returns the maximum value of a given array. Use propertySelector to choose the property to take the maximum of and predicate to filter elements if needed. Both parameters are optional.
  • min ([propertySelector, predicate]), synonym minimum - returns the minimum value of a given array. Use propertySelector to choose the property to take the minimum of and predicate to filter elements if needed. Both parameters are optional.
  • count ([predicate]) - returns the number of elements in a pipe. With a predicate function, only the elements that match it are counted.
  • first ([predicate]) - returns the first element in a pipe. If a predicate function is provided, it returns the first element that matches it.
  • last ([predicate]) - returns the last element in a pipe. If a predicate function is provided, it returns the last element that matches it.
  • mean (array, [propertySelector]) - returns the mean of an array.
  • quantile (array, [propertySelector]) - returns a quantile of an array.
  • variance (array, [propertySelector]) - returns the sample variance of an array.
  • stdev (array, [propertySelector]) - returns the standard deviation of an array.
  • median (array, [propertySelector]) - returns the median of an array.
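For reference, here are self-contained versions of the statistics listed above, written for this example rather than taken from the library. They operate on plain numbers (a propertySelector would first map objects to numbers). Treating stdev as the sample standard deviation, and giving quantile an explicit `p` argument with linear interpolation between ranks, are assumptions of this sketch.

```javascript
function mean(values) {
  return values.reduce((a, b) => a + b, 0) / values.length;
}

// Sample variance: divides by (n - 1), matching the "sample variance" wording above.
function variance(values) {
  const m = mean(values);
  const squares = values.reduce((acc, v) => acc + (v - m) ** 2, 0);
  return squares / (values.length - 1);
}

// Assumed here to be the sample standard deviation (square root of the sample variance).
function stdev(values) {
  return Math.sqrt(variance(values));
}

// Quantile with linear interpolation between closest ranks; p is in [0, 1].
// The p parameter and the interpolation rule are assumptions of this sketch.
function quantile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const pos = (sorted.length - 1) * p;
  const lower = Math.floor(pos);
  const upper = Math.ceil(pos);
  return sorted[lower] + (sorted[upper] - sorted[lower]) * (pos - lower);
}

// The median is the 0.5 quantile.
function median(values) {
  return quantile(values, 0.5);
}

const data = [2, 4, 4, 4, 5, 5, 7, 9];
console.log(mean(data));           // 5
console.log(variance(data));       // 32 / 7 ≈ 4.571
console.log(median(data));         // 4.5
console.log(quantile(data, 0.25)); // 4
```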

Output your pipe data to

  • toArray - outputs the pipe result as a JavaScript array.
  • toObject (nameSelector, valueSelector) - outputs the pipe result as a JavaScript object, based on the name and value selectors.
  • toSeries (propertyNames) - converts an array into an object of series.
  • toCsv ([delimiter]) - outputs the pipe result as a CSV-formatted string when in a browser.
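The difference between the row-oriented and column-oriented outputs is easiest to see side by side. The sketch below is a hypothetical plain-JavaScript illustration of what toObject and toSeries describe; `toObjectSketch` and `toSeriesSketch` are names made up for this example, and falling back to the first row's keys when no property names are given is an assumption.

```javascript
// toObject-style: builds a lookup object from name and value selectors.
function toObjectSketch(array, nameSelector, valueSelector) {
  const result = {};
  for (const item of array) {
    result[nameSelector(item)] = valueSelector(item);
  }
  return result;
}

// toSeries-style: turns row objects into one array ("series") per property.
// If propertyNames is omitted, the keys of the first row are used (an assumption).
function toSeriesSketch(array, propertyNames) {
  const names = propertyNames ?? Object.keys(array[0] ?? {});
  const series = {};
  for (const name of names) {
    series[name] = array.map(row => row[name]);
  }
  return series;
}

const summary = [{ country: "UK", sales: 7000 }, { country: "DE", sales: 9000 }];
console.log(toObjectSketch(summary, r => r.country, r => r.sales)); // { UK: 7000, DE: 9000 }
console.log(toSeriesSketch(summary)); // { country: ["UK", "DE"], sales: [7000, 9000] }
```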

Other helpful utilities for working with data in JavaScript or JSPython

  • parseDatetimeOrNull (dateString[, formats]) - a more permissive date-time parser than JavaScript's built-in Date.parse(). Be aware that it gives the UK format (dd/MM/yyyy) priority, e.g. '8/2/2019' is parsed as 8 February 2019.
  • dateToString (date, format) - converts a date to a string without applying a time zone. It returns an ISO-formatted date with time (if a time is present); otherwise it returns just the date, yyyy-MM-dd.
  • parseNumberOrNull (value: string | number) - converts a value to a number or returns null.
  • parseBooleanOrNull (val: boolean | string) - converts a value to a boolean or returns null. It treats ['1', 'yes', 'true', 'on'] as true and ['0', 'no', 'false', 'off'] as false.
  • deepClone - returns a deep copy of your object or array.
  • getFieldsInfo (items: Record<string, ScalarType>[]): FieldDescription[] - generates field descriptions (first level only) from an array of items, which can then be used for a relational table definition. If any property values are objects, JSON.stringify is used to calculate the maxSize field.
  • addDays (date: Date, daysOffset: number): Date - adds days to the given date. daysOffset can be positive or negative.
  • addBusinessDays (date: Date, bDaysOffset: number): Date - works out the business date (excluding Saturdays and Sundays) that lies bDaysOffset business days away. bDaysOffset can be positive or negative.
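Two of these utilities are small enough to sketch. The versions below are hypothetical plain-JavaScript illustrations (`addBusinessDaysSketch` and `parseBooleanOrNullSketch` are names made up for this example): the business-day version steps one calendar day at a time and ignores public holidays, and case-insensitive matching in the boolean parser is an assumption.

```javascript
// Steps one calendar day at a time in the direction of the offset and counts
// only Monday-Friday; weekends are skipped, public holidays are not considered.
function addBusinessDaysSketch(date, bDaysOffset) {
  const result = new Date(date.getTime()); // copy, so the input is not mutated
  const step = bDaysOffset < 0 ? -1 : 1;
  let remaining = Math.abs(bDaysOffset);
  while (remaining > 0) {
    result.setDate(result.getDate() + step);
    const day = result.getDay(); // 0 = Sunday, 6 = Saturday
    if (day !== 0 && day !== 6) remaining--;
  }
  return result;
}

// Mirrors the true/false word lists described above (case-insensitive by assumption);
// anything else gives null.
function parseBooleanOrNullSketch(val) {
  if (typeof val === "boolean") return val;
  const s = String(val).trim().toLowerCase();
  if (["1", "yes", "true", "on"].includes(s)) return true;
  if (["0", "no", "false", "off"].includes(s)) return false;
  return null;
}

const friday = new Date(2019, 1, 8); // Friday 8 February 2019 (months are 0-based)
console.log(addBusinessDaysSketch(friday, 1).toDateString());  // Mon Feb 11 2019
console.log(addBusinessDaysSketch(friday, -5).toDateString()); // Fri Feb 01 2019
console.log(parseBooleanOrNullSketch("Yes"), parseBooleanOrNullSketch("maybe")); // true null
```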

String Utils

  • replaceAll (text, searchValue, replaceValue) - replaces all occurrences of searchValue in text.
  • formatCamelStr (text) - formats a string in camel case.
  • trimStart (text, charactersToTrim) - trims the given characters from the start.
  • trimEnd (text, charactersToTrim) - trims the given characters from the end.
  • trim (text, charactersToTrim) - trims the given characters from both ends.
  • split (text, separator, brackets): string[] - splits text into tokens. It supports multiple separators and respects open/close brackets, e.g. split('field1=func(a,b,c),field2=4', ',', ['()']) results in ["field1=func(a,b,c)", "field2=4"].
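The bracket-aware split can be implemented with a small stack of expected closing characters: a separator only ends a token when no bracket is open. This is a hypothetical sketch (`splitSketch` is not the library's function); it assumes single-character separators and bracket pairs whose opening and closing characters differ.

```javascript
// Splits text on separator characters that are NOT inside bracket pairs.
// brackets is a list of two-character strings such as "()" or "[]";
// opening and closing characters are assumed to differ.
// Separators are assumed to be single characters in this sketch.
function splitSketch(text, separator = ",", brackets = []) {
  const separators = Array.isArray(separator) ? separator : [separator];
  const openers = new Map(brackets.map(pair => [pair[0], pair[1]]));
  const closers = new Set(brackets.map(pair => pair[1]));
  const stack = []; // expected closing characters, innermost last
  const tokens = [];
  let current = "";

  for (const ch of text) {
    if (stack.length === 0 && separators.includes(ch)) {
      tokens.push(current);
      current = "";
      continue;
    }
    if (openers.has(ch)) {
      stack.push(openers.get(ch));
    } else if (closers.has(ch) && stack[stack.length - 1] === ch) {
      stack.pop();
    }
    current += ch;
  }
  tokens.push(current);
  return tokens;
}

console.log(splitSketch("field1=func(a,b,c),field2=4", ",", ["()"]));
// → ["field1=func(a,b,c)", "field2=4"]
console.log(splitSketch("a;b,c[d;e]", [",", ";"], ["[]"]));
// → ["a", "b", "c[d;e]"]
```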

License

A permissive MIT (c) - FalconSoft Ltd

datapipe's People

Contributors

olegpaska, ppaska, viktorkukurba

Forkers

kaloob, olegpaska

datapipe's Issues

Flatten is missing in exported DataPipe object type

Hi, firstly thanks for a great project.
I tried to use flatten on DataPipe object, but the type is missing it. Could you add it?

Also in web docs it is misspelled as "Flattern".

One last thing: in web docs, there is a mention that groupBy can also take string or array of strings. Again types denies that.

Cannot use in Node.js

Doesn't appear to be a way to get it to work in Node.js even after installing via npm. What require statement should be used? I've tried a few and none seem to work.

There also appears to be an error in the JS example with missing {} inside the select function:

.select(g => 
    r = {
      country: dataPipe(g).first().country,
      names: dataPipe(g).map(r => r.name).join(", "),
      count: dataPipe(g).count()
    };
    return r
  )

Should I think be:

.select(g => {
    const r = {
      country: dataPipe(g).first().country,
      names: dataPipe(g).map(r => r.name).join(", "),
      count: dataPipe(g).count()
    };
    return r;
  })
