catalystml-flogo's Introduction

Project Flogo Implementation for CatalystML

Overview of Specification

CatalystML is a language-agnostic specification aimed at facilitating data transformations. The initiating use case is transforming incoming data into a form that can be fed 1-to-1 into a Machine Learning (ML) model. Refer to the CatalystML specification for details on the specification itself, the supported operations, etc.

Flogo/Golang implementation

This repository contains an implementation of CatalystML for Flogo written in Golang. With relatively minor context switching it can also be used as a Golang implementation. The documents included within this repository detail the choices that were made when creating this implementation as well as any deviations from the specification itself (primarily things yet to be implemented).

Within Flogo this specification is implemented as an action. A Flogo action is an engine within Flogo that runs a specific type of event manipulation function (like a flow, stream, rules engine, etc.). A more detailed discussion is included in the documentation.

Use of this implementation

As discussed above, this implementation is written in Golang within the Flogo ecosystem. As such, CatalystML can be used with the Flogo command line interface (with a flogo.json) or with the Golang Flogo API (library). Two examples each for the CLI and the API are discussed below.

Flogo Command Line Interface

Flogo's command line interface is built around a JSON object that represents the structure of a Flogo application. Compiling this JSON (here is an example of compiling a flogo.json) with the Flogo CLI creates an executable binary.

There are multiple ways to embed a CatalystML structure within flogo:

  1. As a Flogo action that responds to a trigger. In this case a trigger responds to input data, while the CatalystML action transforms that data. An example flogo.json of CatalystML as a Flogo action is located here.
  2. As a Flogo activity within a Flogo flow or stream. Flows and streams are Flogo actions that allow you to chain predefined functions called activities. In this case CatalystML is simply one step in a chain of functions. This is how CatalystML-flogo would be used alongside a machine learning model, with the model itself executed in another activity. An example flogo.json of CatalystML as a Flogo activity within a stream is located here.

Golang Flogo API (library)

Project Flogo allows the functionality of Flogo to be integrated with custom Golang code by using the Flogo Golang API as a Golang library. This can be done either by following a template that includes triggers and flows/streams or by just calling the CML action as a function.

  1. An example of Golang code that includes CatalystML within a Flogo template that includes triggers and flows is located here.

  2. Here is an example of using CatalystML in Golang using the CatalystML action as a function.

catalystml-flogo's People

Contributors

abramvandergeest, balamg, mellistibco, skothari-tibco, steveny-tibco

catalystml-flogo's Issues

oneHotEncoding operation

This operation exists and was never included as an issue.

The spec has changed and now the operation is being updated.

reshape operation

  • reShape: change the dimensionality of a matrix without changing the underlying data
    • Input
      • data - [array] data to be reshaped
      • shape - [array] array of integers where: 1) the length of the array is the number of output dimensions and 2) each integer specifies the number of values for a given dimension. If the integer is 0, that dimension is sized to fit the data, e.g. [0,2,3] for a 24 value array means a 4x2x3 matrix
    • OutputType - [matrix]
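
The sizing rule for a 0 entry in shape can be sketched in Go as below. This is only an illustration, not the repository's implementation: the names resolveShape, nest and reshape are hypothetical, the data is assumed to be a flat row-major []float64, and allowing at most one 0 entry is an assumption the issue does not state.

```go
package main

import (
	"errors"
	"fmt"
)

// resolveShape fills in a single 0 entry so the shape holds exactly n values.
// Allowing at most one 0 entry is an assumption of this sketch.
func resolveShape(n int, shape []int) ([]int, error) {
	if n == 0 || len(shape) == 0 {
		return nil, errors.New("data and shape must be non-empty")
	}
	dims := append([]int(nil), shape...)
	known, zeroAt := 1, -1
	for i, d := range dims {
		switch {
		case d < 0:
			return nil, errors.New("dimensions must not be negative")
		case d == 0 && zeroAt >= 0:
			return nil, errors.New("only one dimension may be 0")
		case d == 0:
			zeroAt = i
		default:
			known *= d
		}
	}
	if zeroAt >= 0 {
		if n%known != 0 {
			return nil, fmt.Errorf("%d values do not fit shape %v", n, shape)
		}
		dims[zeroAt] = n / known
	} else if known != n {
		return nil, fmt.Errorf("%d values do not fit shape %v", n, shape)
	}
	return dims, nil
}

// nest splits flat row-major data into nested slices following dims.
func nest(data []float64, dims []int) interface{} {
	if len(dims) == 1 {
		return data
	}
	step := len(data) / dims[0]
	out := make([]interface{}, dims[0])
	for i := range out {
		out[i] = nest(data[i*step:(i+1)*step], dims[1:])
	}
	return out
}

// reshape arranges data as a nested structure of the requested shape
// without changing the underlying values.
func reshape(data []float64, shape []int) (interface{}, error) {
	dims, err := resolveShape(len(data), shape)
	if err != nil {
		return nil, err
	}
	return nest(data, dims), nil
}

func main() {
	data := make([]float64, 24)
	for i := range data {
		data[i] = float64(i)
	}
	dims, err := resolveShape(len(data), []int{0, 2, 3})
	fmt.Println(dims, err) // [4 2 3] <nil>
}
```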

tolower operation

Including a tolower operation so the NLP demo lines up properly.

sort operation

sort is an important operation

  • sort: sort a matrix/map based on given columns
    • Input
      • Data - [array of arrays or map]
        • Optional=False
      • Col - map key or column number
    • Params
      • Ascending - [boolean]
      • KeepRow - [boolean] (keeps row/column)
      • Axis - [int] (0=vertical/column, 1=horizontal/row)
    • OutputType - [array of arrays or map]
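
A minimal sketch of the matrix case, assuming rows are sorted by the value in one column. The function name sortRows is hypothetical, and the KeepRow and Axis params and the map input are deliberately left out because the issue does not pin down their behavior.

```go
package main

import (
	"fmt"
	"sort"
)

// sortRows returns the rows of data ordered by the value in column col.
// The outer slice is copied, so the order of the input is left untouched.
func sortRows(data [][]float64, col int, ascending bool) ([][]float64, error) {
	out := make([][]float64, len(data))
	for i, row := range data {
		if col < 0 || col >= len(row) {
			return nil, fmt.Errorf("row %d has no column %d", i, col)
		}
		out[i] = row
	}
	// A stable sort keeps rows with equal keys in their original order.
	sort.SliceStable(out, func(i, j int) bool {
		if ascending {
			return out[i][col] < out[j][col]
		}
		return out[i][col] > out[j][col]
	})
	return out, nil
}

func main() {
	data := [][]float64{{3, 1}, {1, 2}, {2, 3}}
	sorted, err := sortRows(data, 0, true)
	fmt.Println(sorted, err) // [[1 2] [2 3] [3 1]] <nil>
}
```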

getStopWords operation

  • getStopWords: gets an array of stop words.
    • Params:
      • lib - [string] - which library stopword list to load
        • Optional = True
        • allowed: ["nltk","none"]
        • default = "nltk"
      • lang - [string] The language to be used, based on ISO 639-1 codes. For example: English='en'.
        • Optional = True
        • default = "en"
      • fileLoc - [string]- path to file that contains list of stop words (1 word per line)
        • Optional = True
      • merge - [bool] Whether to merge the list from the file with the list from the library
        • Optional = True
        • default = False
    • OutputType - [string]

addPairWise operation

  • addPairWise: for matrices of the same shape add corresponding values
    • Input
      • matrix0 - [array of arrays]
        • Optional=False
      • matrix1 - [array of arrays]
        • Optional=False
    • Params
      • None
    • OutputType - [array of arrays] (same size as inputs)
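
As a rough Go sketch of the addPairWise operation described above, assuming both matrices are [][]float64. The function name and the error behavior on mismatched shapes are illustrative choices, not taken from the repository.

```go
package main

import (
	"errors"
	"fmt"
)

// addPairWise adds corresponding values of two matrices of the same shape.
func addPairWise(matrix0, matrix1 [][]float64) ([][]float64, error) {
	if len(matrix0) != len(matrix1) {
		return nil, errors.New("matrices have different numbers of rows")
	}
	out := make([][]float64, len(matrix0))
	for i := range matrix0 {
		if len(matrix0[i]) != len(matrix1[i]) {
			return nil, fmt.Errorf("row %d has different lengths", i)
		}
		out[i] = make([]float64, len(matrix0[i]))
		for j := range matrix0[i] {
			out[i][j] = matrix0[i][j] + matrix1[i][j]
		}
	}
	return out, nil
}

func main() {
	a := [][]float64{{1, 2}, {3, 4}}
	b := [][]float64{{10, 20}, {30, 40}}
	sum, err := addPairWise(a, b)
	fmt.Println(sum, err) // [[11 22] [33 44]] <nil>
}
```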

scale operation

  • scale: multiply every value of a matrix by a scalar
    • Input
      • Data - [array of arrays]
        • Optional=False
      • scaler - [float]
        • Optional=False
    • Params
      • None
    • OutputType - [array of arrays] (same size as input)
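
A short Go sketch of the scale operation, assuming the matrix is a [][]float64. The function name scale is illustrative; the parameter name scaler follows the issue text.

```go
package main

import "fmt"

// scale multiplies every value of data by scaler and returns a new matrix
// of the same size, leaving the input unchanged.
func scale(data [][]float64, scaler float64) [][]float64 {
	out := make([][]float64, len(data))
	for i, row := range data {
		out[i] = make([]float64, len(row))
		for j, v := range row {
			out[i][j] = v * scaler
		}
	}
	return out
}

func main() {
	fmt.Println(scale([][]float64{{1, 2}, {3, 4}}, 2)) // [[2 4] [6 8]]
}
```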

valToArray operation

the idea is to take a single value and create an array or matrix with all the elements being that value

  • castToArray: casts single value to array or array of arrays of given shape
    • Input
      • value - [int,string,float,etc]
        • Optional=False
      • shape - [array of ints] - array determines shape of output ([2,3] means a 2x3 matrix)
        • Optional=False
    • Params
      • None
    • OutputType - [array or array of arrays] (of the given shape)
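
One way to picture this operation in Go is a recursive fill over the shape. The function names valToArray and fill are hypothetical, and representing the result as nested []interface{} is an assumption made so that any value type can be repeated.

```go
package main

import (
	"errors"
	"fmt"
)

// fill builds nested slices following shape, with value at every leaf.
func fill(value interface{}, shape []int) interface{} {
	if len(shape) == 0 {
		return value
	}
	out := make([]interface{}, shape[0])
	for i := range out {
		out[i] = fill(value, shape[1:])
	}
	return out
}

// valToArray repeats a single value into an array or matrix of the given
// shape, e.g. shape [2,3] gives a 2x3 matrix.
func valToArray(value interface{}, shape []int) (interface{}, error) {
	if len(shape) == 0 {
		return nil, errors.New("shape needs at least one dimension")
	}
	for _, d := range shape {
		if d <= 0 {
			return nil, errors.New("dimensions must be positive")
		}
	}
	return fill(value, shape), nil
}

func main() {
	m, err := valToArray(7, []int{2, 3})
	fmt.Println(m, err) // [[7 7 7] [7 7 7]] <nil>
}
```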

cast operation

The structure of this one is not yet worked out, but we need to be able to cast an input to a specific type (e.g. int32 to int64 or int to float).

  • cast: convert from one base type to another

map2Table operation

  • map2Table: convert a map to a matrix
    • Input
      • map - [map] contains map to be converted to table
        • Optional=False
      • colOrder - [array of strings] list of the columns to be merged into table (ORDER MATTERS)
        • Optional=False
    • Params
      • axis - the orientation of the table
        • default=0
    • OutputType - [array of arrays]
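
A possible Go sketch of map2Table follows. It assumes the map holds equally long []float64 slices, and it assumes that axis=0 places each map entry in a column while axis=1 places each entry in a row. That reading of axis, along with the function names, is an assumption of this sketch.

```go
package main

import "fmt"

// map2Table builds a matrix from the map entries named in colOrder,
// in exactly that order.
func map2Table(m map[string][]float64, colOrder []string, axis int) ([][]float64, error) {
	rows := make([][]float64, len(colOrder))
	for i, key := range colOrder {
		vals, ok := m[key]
		if !ok {
			return nil, fmt.Errorf("key %q not found in map", key)
		}
		if i > 0 && len(vals) != len(rows[0]) {
			return nil, fmt.Errorf("key %q has a different length", key)
		}
		rows[i] = append([]float64(nil), vals...) // copy, do not alias the map
	}
	if axis == 1 {
		return rows, nil // one row per key
	}
	return transpose(rows), nil // one column per key
}

// transpose swaps rows and columns of a rectangular matrix.
func transpose(t [][]float64) [][]float64 {
	if len(t) == 0 {
		return nil
	}
	out := make([][]float64, len(t[0]))
	for r := range out {
		out[r] = make([]float64, len(t))
		for c := range t {
			out[r][c] = t[c][r]
		}
	}
	return out
}

func main() {
	m := map[string][]float64{"a": {1, 2, 3}, "b": {4, 5, 6}}
	table, err := map2Table(m, []string{"b", "a"}, 0)
	fmt.Println(table, err) // [[4 1] [5 2] [6 3]] <nil>
}
```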

ifnotin operation

Included to help nlp demo/work

  • ifNotIn: Given 2 arrays, returns a new array with the elements of the first array only if they DO NOT appear in the second array as well.
    • Input
      • arr0 - [array] first array, the one to compare to the "not in" list
        • Optional=False
      • arr1 - [array of data variables] the "not in" list that arr0 is compared against.
        • Optional=False
    • OutputType - [array]

ifin operation

creating for NLP demo

  • ifIn: Given 2 arrays, returns a new array with the elements of the first array only if they appear in the second array as well.
    • Input
      • arr0 - [array] first array, the one to compare to the "in" list
        • Optional=False
      • arr1 - [array of data variables] the "in" list that arr0 is compared against.
        • Optional=False
    • OutputType - [array]
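
ifIn and ifNotIn differ only in whether matching elements are kept or dropped, so both can be sketched in Go on top of one helper. The helper name filterByMembership is hypothetical, and restricting the elements to strings is an assumption made to suit the NLP (stop word) use.

```go
package main

import "fmt"

// filterByMembership walks arr0 in order and keeps an element when its
// presence in arr1 equals keepIfPresent.
func filterByMembership(arr0, arr1 []string, keepIfPresent bool) []string {
	set := make(map[string]struct{}, len(arr1))
	for _, v := range arr1 {
		set[v] = struct{}{}
	}
	out := []string{}
	for _, v := range arr0 {
		if _, found := set[v]; found == keepIfPresent {
			out = append(out, v)
		}
	}
	return out
}

// ifIn keeps the elements of arr0 that also appear in arr1.
func ifIn(arr0, arr1 []string) []string {
	return filterByMembership(arr0, arr1, true)
}

// ifNotIn keeps the elements of arr0 that do not appear in arr1.
func ifNotIn(arr0, arr1 []string) []string {
	return filterByMembership(arr0, arr1, false)
}

func main() {
	words := []string{"the", "quick", "brown", "fox"}
	stop := []string{"the", "a"}
	fmt.Println(ifNotIn(words, stop)) // [quick brown fox]
	fmt.Println(ifIn(words, stop))    // [the]
}
```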

filter operation

filter is an important operation

  • filter: keep/remove rows with certain values
    • Input
      • Data - [array or map]
        • Optional=False
      • Value - [int, float, string, NaN]
        • Optional=True
        • Default=NaN
      • filterType - [string]
        • Optional=True
        • Default=Remove
        • Acceptable values = "Remove", "Keep"
    • Params
      • Axis - [int] (0=vertical/column, 1=horizontal/row)
        • Optional=True
        • Default=0
      • Col - [string]
        • Optional=True
        • Default='index'
    • OutputType - [array or map] (same type as inputs)
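
The row-filtering case can be sketched in Go as follows, under several assumptions that go beyond the issue text: the data is a [][]float64, the column is selected by an integer index rather than the string Col param, a row matches when its cell in that column equals Value (with NaN matching NaN), and the Axis param is ignored. The function name filterRows is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// filterRows removes or keeps the rows whose cell in column col matches value.
// filterType must be "Remove" or "Keep", as listed in the issue.
func filterRows(data [][]float64, col int, value float64, filterType string) ([][]float64, error) {
	if filterType != "Remove" && filterType != "Keep" {
		return nil, fmt.Errorf("unknown filterType %q", filterType)
	}
	keepMatches := filterType == "Keep"
	out := [][]float64{}
	for i, row := range data {
		if col < 0 || col >= len(row) {
			return nil, fmt.Errorf("row %d has no column %d", i, col)
		}
		// NaN never equals itself, so it needs an explicit check.
		match := row[col] == value || (math.IsNaN(row[col]) && math.IsNaN(value))
		if match == keepMatches {
			out = append(out, row)
		}
	}
	return out, nil
}

func main() {
	nan := math.NaN()
	data := [][]float64{{1, nan}, {2, 5}, {3, nan}}
	cleaned, err := filterRows(data, 1, nan, "Remove")
	fmt.Println(cleaned, err) // [[2 5]] <nil>
}
```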

table2Map operation

  • table2Map: convert a matrix to a map by adding a name to each column
    • Input
      • table - [array of arrays] 2D table to be converted to map
      • colKeys - [array of strings] list of keys for map that correspond to 0 to n columns in table
    • Params
      • axis - the orientation of the table (0=vertical/column, 1=horizontal/row)
        • default=0
    • OutputType - [map]
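
A Go sketch of table2Map, assuming the table is a [][]float64. With axis=0 each key in colKeys names a column, and with axis=1 each key names a row. The function name and the error handling are illustrative only.

```go
package main

import "fmt"

// table2Map converts a 2D table to a map by naming each column (axis=0)
// or each row (axis=1) with the corresponding entry of colKeys.
func table2Map(table [][]float64, colKeys []string, axis int) (map[string][]float64, error) {
	out := make(map[string][]float64, len(colKeys))
	if axis == 1 {
		if len(table) != len(colKeys) {
			return nil, fmt.Errorf("table has %d rows, want %d", len(table), len(colKeys))
		}
		for i, key := range colKeys {
			out[key] = append([]float64(nil), table[i]...)
		}
		return out, nil
	}
	for r, row := range table {
		if len(row) != len(colKeys) {
			return nil, fmt.Errorf("row %d has %d values, want %d", r, len(row), len(colKeys))
		}
		for c, key := range colKeys {
			out[key] = append(out[key], row[c])
		}
	}
	return out, nil
}

func main() {
	table := [][]float64{{1, 2}, {3, 4}, {5, 6}}
	m, err := table2Map(table, []string{"x", "y"}, 0)
	fmt.Println(m["x"], m["y"], err) // [1 3 5] [2 4 6] <nil>
}
```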

toupper operation

Adding toupper because, since I already had tolower for the NLP demo, toupper is super easy.

Standardized logging

I propose that when we write an operation we include info level logging at the beginning and end that says something like "starting xyz operation" and "finishing xyz". Then after each info log there is a debug level log that for the beginning includes the inputs and params and for the end includes outputs. What does everyone think? Any further suggestions?
@fm-tibco @SteveNY-Tibco @skothari-tibco
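
To make the proposal concrete, here is a rough Go sketch of a wrapper that emits the four log lines around an operation. Everything in it is hypothetical: the opLogger type, runLogged and newStdLogger are illustrative names, and it uses the standard library log package rather than Flogo's own logger so that it stays self-contained.

```go
package main

import (
	"fmt"
	"log"
	"os"
)

// opLogger is a minimal leveled logger made of two functions (hypothetical).
type opLogger struct {
	Infof  func(format string, args ...interface{})
	Debugf func(format string, args ...interface{})
}

// newStdLogger writes to stderr; debug lines are printed only when debug is true.
func newStdLogger(debug bool) opLogger {
	l := log.New(os.Stderr, "", log.LstdFlags)
	return opLogger{
		Infof: func(format string, args ...interface{}) {
			l.Printf("INFO  "+format, args...)
		},
		Debugf: func(format string, args ...interface{}) {
			if debug {
				l.Printf("DEBUG "+format, args...)
			}
		},
	}
}

// runLogged wraps op with the proposed pattern: an info line at the start
// and at the end, each followed by a debug line with inputs/params or outputs.
func runLogged(lg opLogger, name string, inputs, params interface{},
	op func() (interface{}, error)) (interface{}, error) {
	lg.Infof("starting %s operation", name)
	lg.Debugf("%s inputs: %v, params: %v", name, inputs, params)
	out, err := op()
	if err != nil {
		return nil, err
	}
	lg.Infof("finishing %s operation", name)
	lg.Debugf("%s outputs: %v", name, out)
	return out, nil
}

func main() {
	out, err := runLogged(newStdLogger(true), "toLower", "ABC", nil,
		func() (interface{}, error) { return "abc", nil })
	fmt.Println(out, err)
}
```

Putting the pattern in one wrapper would keep the log format identical across operations, at the cost of passing inputs and outputs as generic values.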

lag operation

Create a new vector shifted down by lagnum, with NaN filling the missing locations, and add it to the table/map.
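
The shift itself can be sketched in Go as below, assuming the column is a []float64. The function name lag and the rejection of negative lag values are assumptions of this sketch, and attaching the result to the table or map is left out.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// lag returns a copy of col shifted down by lagNum positions, with NaN
// in the leading positions that have no earlier value.
func lag(col []float64, lagNum int) ([]float64, error) {
	if lagNum < 0 {
		return nil, errors.New("lagNum must not be negative")
	}
	out := make([]float64, len(col))
	for i := range out {
		if i < lagNum {
			out[i] = math.NaN()
		} else {
			out[i] = col[i-lagNum]
		}
	}
	return out, nil
}

func main() {
	shifted, err := lag([]float64{1, 2, 3, 4}, 2)
	fmt.Println(shifted, err) // [NaN NaN 1 2] <nil>
}
```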

runCML operation

An operation that allows you to run other CML structures/JSONs
