Coder Social home page Coder Social logo

ballerina-pmml's Introduction

PMML Library For Ballerina

A "proof-of-concept" library written in Ballerina for Ballerina to read and predict values using machine learning models defined with PMML

Contents

  1. Introduction
  2. Downloading The Library
  3. Using The Library
  4. Making Changes
  5. Copyright Information

Introduction

This is a PMML library (proof-of-concept) written to test the limits of Ballerina's built-in XML library.

Ballerina is a general purpose, concurrent and strongly typed programming language offered by WSO2 with both textual and graphical syntaxes, optimized for integration.

The Predictive Model Markup Language (PMML) is an XML-based language which provides a way for applications to define statistical and data mining models and to share models between PMML compliant applications.

This library does not fully support all features of PMML nor all versions. The library only supports PMML version 4.2 and the only PMML files that can be used are the ones that contain one of the following machine learning models:

  • Linear Regression
  • Polynomial Regression
  • Logistic Regression
  • Classification

All of the above machine learning models are defined by the <RegressionModel> element in PMML.

Downloading The Library

To download the library you can either clone the repository to your local machine by typing the following git command,

~$ git clone https://github.com/moizalicious/ballerina-pmml

Or you can download the source files by clicking on one of the following links,

Extract and open the downloaded file, then copy the ml folder provided to your own Ballerina project. Once the folder is added to your project you can start using the PMML library by importing the pmml package in your code by typing import ml.pmml;

Using The PMML Library

Using The library

To predict values using this library with a machine learning model, you need two things,

  1. The PMML file of that respective machine learning model.
  2. The independent values that is to be fed into the machine learning model to predict an outcome.

Both the PMML file and the independent values should be of Ballerina xml type. The PMML XML object must have the <PMML> element as the root element and should follow the XML Schema of PMML version 4.2.

The independent values must be added in the following XML format,

<data>
    <value1></value1>
    <value2></value2>
    ...
    ...
</data>

Where the root element must always be <data> and the child elements must have the same names that are defined in the name attribute in the data field elements of the PMML element's data dictionary (besides the target element).

Once we have these two XML objects, we have to add them as arguments to the function predict(xml pmml, xml data) to obtain the predicted value. The outcome is returned as a Ballerina any type variable. An example is given below.

Example For Classification Using The Iris Data Set

The following XML code is a trained classification model in the form of PMML for the iris data set. Using this machine learning model we can predict which species a flower would come under (setosa, versicolor or verginica), based on the flower's sepal length, sepal width, petal length & petal width. To do this, download the following XML code and save it as RegressionIris.pmml in your Ballerina project.

Click here to download the following PMML file.

<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
    <Header copyright="Copyright (c) 2013 Vfed" description="Multinomial Logistic Model">
        <Extension name="user" value="Vfed" extender="Rattle/PMML"/>
        <Application name="Rattle/PMML" version="1.3"/>
        <Timestamp>2013-09-16 10:33:36</Timestamp>
    </Header>
    <DataDictionary numberOfFields="5">
        <DataField name="Species" optype="categorical" dataType="string">
            <Value value="setosa"/>
            <Value value="versicolor"/>
            <Value value="virginica"/>
        </DataField>
        <DataField name="Sepal.Length" optype="continuous" dataType="double"/>
        <DataField name="Sepal.Width" optype="continuous" dataType="double"/>
        <DataField name="Petal.Length" optype="continuous" dataType="double"/>
        <DataField name="Petal.Width" optype="continuous" dataType="double"/>
    </DataDictionary>
    <RegressionModel modelName="multinom_Model" functionName="classification" algorithmName="multinom" normalizationMethod="softmax">
        <MiningSchema>
            <MiningField name="Species" usageType="target"/>
            <MiningField name="Sepal.Length" usageType="active"/>
            <MiningField name="Sepal.Width" usageType="active"/>
            <MiningField name="Petal.Length" usageType="active"/>
            <MiningField name="Petal.Width" usageType="active"/>
        </MiningSchema>
        <RegressionTable intercept="18.6903742570084" targetCategory="versicolor">
            <NumericPredictor name="Sepal.Length" exponent="1" coefficient="-5.4584240070066"/>
            <NumericPredictor name="Sepal.Width" exponent="1" coefficient="-8.70740085056537"/>
            <NumericPredictor name="Petal.Length" exponent="1" coefficient="14.2447701274546"/>
            <NumericPredictor name="Petal.Width" exponent="1" coefficient="-3.09768387037777"/>
        </RegressionTable>
        <RegressionTable intercept="-23.836276029112" targetCategory="virginica">
            <NumericPredictor name="Sepal.Length" exponent="1" coefficient="-7.92363397246724"/>
            <NumericPredictor name="Sepal.Width" exponent="1" coefficient="-15.3707689334102"/>
            <NumericPredictor name="Petal.Length" exponent="1" coefficient="23.6597792429927"/>
            <NumericPredictor name="Petal.Width" exponent="1" coefficient="15.1353005479779"/>
        </RegressionTable>
        <RegressionTable intercept="0.0" targetCategory="setosa"/>
    </RegressionModel>
</PMML>

The PMML file has 5 <DataField> elements,

  • Species
  • Sepal Length
  • Sepal Width
  • Petal Length
  • Petal Width

Since 'Species' is the target that we are trying to predict (as defined by the usageType attribute in the <MiningSchema> element), we do not add any values for that. So if we want to predict what kind of species a flower with sepal length = 5.1, sepal width = 3.5, petal length = 1.4 and petal width = 0.2 comes under. We write the <data> element like this,

<data>
    <Sepal.Length>5.1</Sepal.Length>
    <Sepal.Width>3.5</Sepal.Width>
    <Petal.Length>1.4</Petal.Length>
    <Petal.Width>0.2</Petal.Width>
</data>

The following code shows how to use the predict() function in the PMML library to get the predicted species of the flower based on the <data> element above (the RegressionIris.pmml file is read using the readXMLFromFile() function provided by the library),

import ml.pmml;

function main (string[]args) {
    // Read the file as a Ballerina XML object
    xml pmml = pmml:readXMLFromFile("RegressionIris.pmml");
    // Create the data XML with the independent values
    xml data = xml `<data>
                        <Sepal.Length>5.1</Sepal.Length>
                        <Sepal.Width>3.5</Sepal.Width>
                        <Petal.Length>1.4</Petal.Length>
                        <Petal.Width>0.2</Petal.Width>
                    </data>`;
    // Predict the outcome and print it to the console                
    any result = pmml:predict(pmml, data);                
    println(result);
}

The output of the above code is,

setosa

This means that the species the flower with sepal length = 5.1, sepal width = 3.5, petal length = 1.4 and petal width = 0.2 is predicted to come under the setosa category. You can change the values yourself and see how the outcome changes. To see whether the model is accurate you can use real data from here and see whether the expected outcome is printed in the console.

Note that the inputs in the data element do not always have to be continuous (i.e numerical), you can enter categorical values as well based on the PMML code requirements

For an example,

<data>
    <age>20</age>
    <salary>10000</salary>
    <car_location>street</car_location>
</data>

To know more about the other public functions provided by the library check out the User Guide

Making Changes

If you would like to contribute to this repository and help improve this library feel free to do so. Check out the Developer Guide to learn more about the source and the internal structure of the library.

Copyright Information

Copyright 2017 Moiz Mansoor Ali.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

ballerina-pmml's People

Contributors

moizmali avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.