Coder Social home page Coder Social logo

valeriocardoso / skope-rules Goto Github PK

View Code? Open in Web Editor NEW

This project forked from scikit-learn-contrib/skope-rules

0.0 0.0 0.0 4.66 MB

machine learning with logical rules in Python

Home Page: http://skope-rules.readthedocs.io

License: Other

Shell 4.34% Python 32.50% Batchfile 2.61% Jupyter Notebook 60.55%

skope-rules's Introduction

Travis_ Coveralls_ CircleCI_ Python27_ Python35_

image

skope-rules

Skope-rules is a Python machine learning module built on top of scikit-learn and distributed under the 3-Clause BSD license.

Skope-rules aims at learning logical, interpretable rules for "scoping" a target class, i.e. detecting with high precision instances of this class.

Skope-rules is a trade off between the interpretability of a Decision Tree and the modelization power of a Random Forest.

See the AUTHORS.rst file for a list of contributors.

image

Installation

You can get the latest sources with pip :

pip install skope-rules

Quick Start

SkopeRules can be used to describe classes with logical rules :

from sklearn.datasets import load_iris
from skrules import SkopeRules

dataset = load_iris()
feature_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
clf = SkopeRules(max_depth_duplication=2,
                 n_estimators=30,
                 precision_min=0.3,
                 recall_min=0.1,
                 feature_names=feature_names)

for idx, species in enumerate(dataset.target_names):
    X, y = dataset.data, dataset.target
    clf.fit(X, y == idx)
    rules = clf.rules_[0:3]
    print("Rules for iris", species)
    for rule in rules:
        print(rule)
    print()
    print(20*'=')
    print()

:

SkopeRules can also be used as a predictor if you use the "score_top_rules" method :

from sklearn.datasets import load_boston
from sklearn.metrics import precision_recall_curve
from matplotlib import pyplot as plt
from skrules import SkopeRules

dataset = load_boston()
clf = SkopeRules(max_depth_duplication=None,
                 n_estimators=30,
                 precision_min=0.2,
                 recall_min=0.01,
                 feature_names=dataset.feature_names)

X, y = dataset.data, dataset.target > 25
X_train, y_train = X[:len(y)//2], y[:len(y)//2]
X_test, y_test = X[len(y)//2:], y[len(y)//2:]
clf.fit(X_train, y_train)
y_score = clf.score_top_rules(X_test) # Get a risk score for each test example
precision, recall, _ = precision_recall_curve(y_test, y_score)
plt.plot(recall, precision)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision Recall curve')
plt.show()

:

For more examples and use cases please check our documentation. You can also check the demonstration notebooks.

The main advantage of decision rules is that they are offering interpretable models. The problem of generating such rules has been widely considered in machine learning, see e.g. RuleFit [1], Slipper [2], LRI [3], MLRules[4].

A decision rule is a logical expression of the form "IF conditions THEN response". In a binary classification setting, if an instance satisfies conditions of the rule, then it is assigned to one of the two classes. If this instance does not satisfy conditions, it remains unassigned.

  1. In [2, 3, 4], rules induction is done by considering each single decision rule as a base classifier in an ensemble, which is built by greedily minimizing some loss function.
  2. In [1], rules are extracted from an ensemble of trees; a weighted combination of these rules is then built by solving a L1-regularized optimization problem over the weights as described in [5].

In this package, we use the second approach. Rules are extracted from tree ensemble, which allow us to take advantage of existing fast algorithms (such as bagged decision trees, or gradient boosting) to produce such tree ensemble. Too similar or duplicated rules are then removed, based on a similarity threshold of their supports.. The main goal of this package is to provide rules verifying precision and recall conditions. It still implement a score (decision_function) method, but which does not solve the L1-regularized optimization problem as in [1]. Instead, weights are simply proportional to the OOB associated precision of the rule.

This package also offers convenient methods to compute predictions with the k most precise rules (cf score_top_rules() and predict_top_rules() functions).

[1] Friedman and Popescu, Predictive learning via rule ensembles,Technical Report, 2005.

[2] Cohen and Singer, A simple, fast, and effective rule learner, National Conference on Artificial Intelligence, 1999.

[3] Weiss and Indurkhya, Lightweight rule induction, ICML, 2000.

[4] Dembczyński, Kotłowski and Słowiński, Maximum Likelihood Rule Ensembles, ICML, 2008.

[5] Friedman and Popescu, Gradient directed regularization, Technical Report, 2004.

Dependencies

skope-rules requires:

  • Python (>= 2.7 or >= 3.3)
  • NumPy (>= 1.10.4)
  • SciPy (>= 0.17.0)
  • Pandas (>= 0.18.1)
  • Scikit-Learn (>= 0.17.1)

For running the examples Matplotlib >= 1.1.1 is required.

Documentation

You can access the full project documentation here

You can also check the notebooks/ folder which contains some examples of utilization.

skope-rules's People

Contributors

ngoix avatar vighneshbirodkar avatar floriangardin avatar datajms avatar stat17-hb avatar mechcoder avatar tomdlt avatar timstaley avatar agramfort avatar bryandeng avatar 8bit-pixies avatar fabianp avatar lawwu avatar phylliade avatar arokem avatar kjacks21 avatar mrahim avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.