
Demo of an In-database processing tool for scikit-learn


sklearn2sql-demo

Note: A final presentation (PDF slides) is available here: https://github.com/antoinecarme/presentations_slides/blob/main/sklearn2sql_presentation_2022-08.pdf

This repository contains some demos of the usage of sklearn2sql.

sklearn2sql is a tool, still under development, for generating deployment SQL code from scikit-learn objects.

Using sklearn2sql, it is possible to predict values from an already-fitted classifier or regressor simply by executing some SQL code. It can be seen as an alternative to PMML-based methods for in-database processing.
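To make the idea of "predicting by executing SQL" concrete, here is a minimal hand-written sketch. It does not use sklearn2sql and does not reproduce its output; the coefficients, the `input_data` table and the `linear_model_to_sql` helper are all hypothetical, standing in for the parameters of an already-fitted linear regressor.

```python
import sqlite3

# Hypothetical parameters, standing in for the coef_ and intercept_
# attributes of an already-fitted scikit-learn linear regressor.
FEATURES = ["x1", "x2"]
COEF = [2.0, -0.5]
INTERCEPT = 1.0


def linear_model_to_sql(features, coef, intercept, table):
    """Build a SELECT that evaluates intercept + sum(coef_i * feature_i)."""
    terms = " + ".join(f"({c!r} * {name})" for name, c in zip(features, coef))
    return (
        f"SELECT id, {intercept!r} + {terms} AS prediction "
        f"FROM {table} ORDER BY id"
    )


# In-memory SQLite database holding the rows to score (illustrative data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE input_data (id INTEGER PRIMARY KEY, x1 REAL, x2 REAL)")
rows = [(1, 0.0, 0.0), (2, 1.0, 2.0), (3, 3.0, 4.0)]
conn.executemany("INSERT INTO input_data VALUES (?, ?, ?)", rows)

# Scoring happens entirely inside the database.
sql = linear_model_to_sql(FEATURES, COEF, INTERCEPT, "input_data")
sql_predictions = [p for _, p in conn.execute(sql)]

# Reference predictions computed in plain Python for comparison.
py_predictions = [INTERCEPT + COEF[0] * a + COEF[1] * b for _, a, b in rows]

print(sql)
print(sql_predictions)
```

The two prediction lists agree, which is the property a generated deployment query has to guarantee: the database result matches what the in-memory model would have returned.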

(NEW) sklearn2sql is available as a RESTful web service on Heroku. A sample Python client allows you to generate SQL from your own models. Your feedback is welcome.

The SQL code is produced in a database-agnostic way (the mechanism used does not depend on the database) and supports the most widely used relational databases.

It is designed to support all classification and regression methods in scikit-learn (SVMs, linear models, naive Bayes, decision trees, MLP, etc.), as well as transformations (PCA, imputers, scalers), feature selection, outlier detection, and their derived objects (random forests, meta-estimators, pipelines, feature unions, ensembles, etc.).
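A pipeline (a transformation followed by an estimator) maps naturally onto chained common table expressions, one per step. The sketch below is a hand-written illustration under that assumption, not sklearn2sql output: the scaler and model parameters, the CTE names and the `input_data` table are hypothetical.

```python
import sqlite3

# Hypothetical fitted parameters: per-feature mean and scale (as a standard
# scaler would store them), followed by a linear model on the scaled features.
MEAN = {"x1": 2.0, "x2": 10.0}
SCALE = {"x1": 2.0, "x2": 5.0}
COEF = {"x1": 3.0, "x2": -1.0}
INTERCEPT = 0.5

# Step 1 of the pipeline: (x - mean) / scale for every feature.
scaled_cols = ", ".join(
    f"(({name} - {MEAN[name]!r}) / {SCALE[name]!r}) AS {name}_scaled"
    for name in MEAN
)
# Step 2 of the pipeline: linear combination of the scaled features.
linear_terms = " + ".join(f"({COEF[name]!r} * {name}_scaled)" for name in COEF)

# Each pipeline step becomes one CTE that reads from the previous one.
sql = f"""
WITH scaler AS (
    SELECT id, {scaled_cols} FROM input_data
),
model AS (
    SELECT id, {INTERCEPT!r} + {linear_terms} AS prediction FROM scaler
)
SELECT id, prediction FROM model ORDER BY id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE input_data (id INTEGER PRIMARY KEY, x1 REAL, x2 REAL)")
conn.executemany(
    "INSERT INTO input_data VALUES (?, ?, ?)",
    [(1, 2.0, 10.0), (2, 4.0, 5.0), (3, 0.0, 20.0)],
)
predictions = dict(conn.execute(sql).fetchall())
print(predictions)
```

Keeping one CTE per step means the SQL mirrors the structure of the pipeline, so an intermediate result can be inspected by selecting from that CTE alone.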

Roughly speaking, sklearn2sql translates a scikit-learn model into a large, machine-friendly ;) SQL query that can later be executed on your favorite database. Examples include a multilayer perceptron on Oracle and a random forest on PostgreSQL.
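As a rough illustration of what such a translation can look like for a tree-based model, the sketch below renders a tiny decision tree as a nested SQL `CASE` expression and runs it on an in-memory SQLite database. Everything here is an assumption made for the example: the tree, its thresholds, the column names and the `tree_to_sql` helper are hand-written and are not taken from sklearn2sql.

```python
import sqlite3

# Hypothetical hand-written tree: internal nodes test "feature <= threshold",
# leaves carry the predicted class label.
TREE = {
    "feature": "petal_length", "threshold": 2.5,
    "left": {"label": "setosa"},
    "right": {
        "feature": "petal_width", "threshold": 1.7,
        "left": {"label": "versicolor"},
        "right": {"label": "virginica"},
    },
}


def tree_to_sql(node):
    """Recursively render a tree node as a nested SQL CASE expression."""
    if "label" in node:
        return f"'{node['label']}'"
    return (
        f"CASE WHEN {node['feature']} <= {node['threshold']!r} "
        f"THEN {tree_to_sql(node['left'])} "
        f"ELSE {tree_to_sql(node['right'])} END"
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flowers (id INTEGER PRIMARY KEY, petal_length REAL, petal_width REAL)"
)
conn.executemany(
    "INSERT INTO flowers VALUES (?, ?, ?)",
    [(1, 1.4, 0.2), (2, 4.5, 1.5), (3, 6.0, 2.5)],
)

sql = f"SELECT id, {tree_to_sql(TREE)} AS predicted_class FROM flowers ORDER BY id"
predicted = [label for _, label in conn.execute(sql)]
print(sql)
print(predicted)
```

A real tree has many more nodes, and a forest has many trees, which is why the generated SQL quickly becomes large and is better read by a database than by a person.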

Extensions

Since the beginning of this project, some extensions have been added to support machine learning models built using tools similar to scikit-learn. The goal is to be able to generate the deployment SQL code for any kind of classification and regression model on any kind of SQL-capable database. These extensions share the same SQL generation layer used for scikit-learn.

  1. A caret2sql project has been added to support R caret models. Some R Jupyter notebook demos are available. It supports the most widely used R machine learning models.

  2. For deep learning models (neural network models), the keras2sql project has been added to support models built using the Keras framework with TensorFlow, Theano, and CNTK. Some demo Python Jupyter notebooks are available.

  3. PyTorch deep learning models are also supported through pytorch2sql. Some demo Python Jupyter notebooks are available.

  4. A similar generation process has been added for C++ backends through ml2cpp.

    1. It generates simple, readable C++ code that maps directly onto the model structure, which facilitates debugging and integration.
    2. The project uses the same low-level layers as sklearn2sql.
    3. It supports all the models supported by the SQL backend.
    4. It generates C++ code that can be executed on almost any hardware platform that has a serious C++ compiler (GCC welcome).
    5. Some demo Python Jupyter notebooks are available.
    6. The C++ code is even runnable on very small platforms (STM32, ESP32, Kendryte, etc.).
  5. A Heroku-based web service can be used to generate SQL code for a given model. scikit-learn, Keras and caret models are supported. Both SQL and C++ backends are supported.

  6. ... (wip) ...
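The C++ backend described for ml2cpp follows the same principle as the SQL one: model parameters are written out as plain source code. The sketch below is a hand-written illustration of that principle only. It does not call ml2cpp and does not reproduce its output; the coefficients and the `linear_model_to_cpp` helper are hypothetical.

```python
# Hypothetical parameters, standing in for a fitted linear regressor.
FEATURES = ["x1", "x2"]
COEF = [2.0, -0.5]
INTERCEPT = 1.0


def linear_model_to_cpp(features, coef, intercept, func_name="predict"):
    """Emit a small C++ function that evaluates the linear model."""
    args = ", ".join(f"double {name}" for name in features)
    terms = " + ".join(f"{c!r} * {name}" for name, c in zip(features, coef))
    return (
        f"double {func_name}({args}) {{\n"
        f"    return {intercept!r} + {terms};\n"
        f"}}\n"
    )


cpp_source = linear_model_to_cpp(FEATURES, COEF, INTERCEPT)
print(cpp_source)
```

The emitted function has no dependencies beyond the language itself, which is the kind of property that makes generated code usable on small targets.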

Supported Databases

Support for the most popular relational databases has been added progressively. sklearn2sql now supports almost all the leading relational databases referenced on DB-Engines.

  1. Open source databases: PostgreSQL (just perfect!!! The most derived database), MariaDB (some CTE-related bugs were reported through this project. Very reactive team. All bugs were fixed!)
  2. Commercial databases: Oracle, MS SQL Server, IBM DB2, Teradata (to cover 95% of the market and get real-world tests)
  3. Embedded databases: SQLite (even in-memory ;). Nice for prototyping, documentation and development. Zero config. Available everywhere (on Android and iOS devices and inside Jupyter notebooks ;).
  4. Hadoop databases: Hive and Impala
  5. Other: Firebird (low memory footprint, a stress test ;), MonetDB (columnar, a SQL quality reminder ;)

Contributors

antoinecarme
