csinva / imodels
Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling (sklearn-compatible).
Home Page: https://csinva.io/imodels
License: MIT License
When I run SLIMRegressor on a regression dataset:
from imodels import SLIMRegressor
model = SLIMRegressor()
for lambda_reg in [0, 1e-2, 5e-2, 1e-1, 1, 2]:
    model.fit(X_train, y_train, lambda_reg)
    mse = np.mean(np.square(y_train - model.predict(X_train)))
    print(f'lambda: {lambda_reg}\tmse: {mse: 0.2f}\tweights: {model.model.coef_}')
I get
SolverError: Either candidate conic solvers (['GLPK_MI']) do not support the cones output by the problem (SOC, NonNeg), or there are not enough constraints in the problem.
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
in
2 model = SLIMRegressor()
3 for lambda_reg in [0, 1e-2, 5e-2, 1e-1, 1, 2]:
----> 4 model.fit(X_train, y_train, lambda_reg)
5 mse = np.mean(np.square(y_train - model.predict(X_train)))
6 print(f'lambda: {lambda_reg}\tmse: {mse: 0.2f}\tweights: {model.model.coef_}')
~/opt/anaconda3/lib/python3.7/site-packages/imodels/algebraic/slim.py in fit(self, X, y, lambda_reg, sample_weight)
58 except:
59 m = Lasso(alpha=lambda_reg)
---> 60 m.fit(X, y, sample_weight=sample_weight)
61 self.model.coef_ = np.round(m.coef_).astype(np.int)
62 self.model.intercept_ = m.intercept_
TypeError: fit() got an unexpected keyword argument 'sample_weight'
Hello - thanks all for the very interesting looking package. The hierarchical shrinkage wrapper seems especially interesting/novel. I'm interested in whether it would be possible to add sample weight support to this package. For background, sample weights are a fairly typical part of many scikit-learn estimators (e.g., RandomForestRegressor or HistGradientBoostingRegressor), and are passed via the fit call, e.g., model.fit(X_train, y_train, sample_weight=w_train).
The purpose of sample weights is to increase the weighting of rows/observations based on some external criteria, typically based around how the training data was gathered, e.g., if your data has different sensors of varying sensitivity, you may increase the sample weighting of certain sensors. Or alternatively if your data is aggregated in some form, then you can increase the weights based on the aggregation (e.g., weekly data with a weight of 7, daily data with a weight of 1, etc...).
In terms of implementation, it's typically as simple as multiplying the loss for each row by the sample weights, to increase the model's sensitivity to large weightings, although I'm not sure if the novel hierarchical shrinkage capabilities of this package would present complications.
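To make the idea concrete, here is a minimal sketch of a sample-weighted squared-error loss in NumPy. The function name `weighted_mse` and the toy data are illustrative only and not part of imodels; how such weights would interact with hierarchical shrinkage is not addressed here.

```python
import numpy as np

def weighted_mse(y_true, y_pred, sample_weight=None):
    """Mean squared error where each row's loss is scaled by its weight."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if sample_weight is None:
        sample_weight = np.ones_like(y_true)
    sample_weight = np.asarray(sample_weight, dtype=float)
    per_row_loss = (y_true - y_pred) ** 2
    # Normalise by the total weight so the result stays on the scale of an MSE.
    return np.sum(sample_weight * per_row_loss) / np.sum(sample_weight)

# Toy data (hypothetical): only the last row is mispredicted.
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 5.0])
unweighted = weighted_mse(y_true, y_pred)
upweighted = weighted_mse(y_true, y_pred, sample_weight=[1, 1, 7])
```

Giving the mispredicted row a weight of 7 raises the loss relative to the unweighted case, which is the sensitivity to large weights described above.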
Thanks again for the very interesting looking package. I look forward to testing and using it.
I have a training dataset of 1473711 records.
After passing it to FIGSRegressor.fit(), it has been running for almost 3 hours with no sign of stopping.
Looking at the processes, I see 4 running in parallel:
259883 Sl 0:00 /usr/bin/python -c from joblib.externals.loky.backend.resource_tracker import main; main(60, False)
259885 S 1:47 /usr/bin/python -m joblib.externals.loky.backend.popen_loky_posix --process-name LokyProcess-1 --pipe 73
259886 S 2:18 /usr/bin/python -m joblib.externals.loky.backend.popen_loky_posix --process-name LokyProcess-2 --pipe 74
259887 S 2:01 /usr/bin/python -m joblib.externals.loky.backend.popen_loky_posix --process-name LokyProcess-3 --pipe 75
259888 S 1:48 /usr/bin/python -m joblib.externals.loky.backend.popen_loky_posix --process-name LokyProcess-4 --pipe 77
However, in top I see only 1/4 of the CPU being utilized.
Is this a bug, or is it expected?
Thanks.
When I pass feature_names=X.columns, only the first feature appears by name in the rule list; the others appear as feat i. I have been unable to fix this and would appreciate your support.
Here is the output snippet:
Selected features: Index(['Processor(P99)_Q', 'Opto(F99)_Q', 'Logic(L99)_Am', 'Qualcom',
'Toshiba', 'ABB', 'Whirlpool', 'Honeywell'],
dtype='object')
mean 0.6 (30 pts)
if Whirlpool >= 153 then 1.0 (16 pts)
mean 0.143 (14 pts)
if feat 1 >= 16882885 then 1.0 (2 pts)
mean 0.0 (12 pts)
Hi, the pandas .ix indexer is deprecated, so calling ruleFit.get_rules(exclude_zero_coef=True) raises an error at:
if exclude_zero_coef:
    rules = rules.ix[rules.coef != 0]
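`.ix` was removed in pandas 1.0, and boolean-mask selection is available through `.loc`. A minimal sketch with a made-up `rules` frame (the rows are hypothetical, not output from RuleFit):

```python
import pandas as pd

# Hypothetical rules table; only the `coef` column matters for the filter.
rules = pd.DataFrame({
    "rule": ["x0 <= 1.5", "x1 > 3.0", "x2 <= 0.2"],
    "coef": [0.0, 1.2, -0.4],
})

exclude_zero_coef = True
if exclude_zero_coef:
    # .loc with a boolean mask replaces the removed .ix indexer.
    rules = rules.loc[rules.coef != 0]
```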
Hello, I ran into a problem when using RuleFitClassifier with the tree_generator parameter:
The documentation says that the value must be GradientBoostingRegressor or GradientBoostingClassifier.
But when I pass one, I get an error telling me to use RandomForest and BoostingRegressor.
Does tree_generator not support classifiers, or am I using it incorrectly? Could you give an example of how to do this? Thanks!
The following code snippet results in an error:
from sklearn.datasets import load_iris
from imodels import RuleFitClassifier
iris = load_iris()
X, Y = iris.data, iris.target
rulefit = RuleFitClassifier()
rulefit.fit(X, Y)
print(rulefit)
The error reads:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
/tmp/ipykernel_208411/3401153452.py in <cell line: 9>()
7 rulefit = RuleFitClassifier()
8 rulefit.fit(X, Y)
----> 9 print(rulefit)
~/.cache/pypoetry/virtualenvs/xlemoo-6BFI3yUJ-py3.8/lib/python3.8/site-packages/imodels/rule_set/rule_fit.py in __str__(self)
247 s += '> \tPredictions are made by summing the coefficients of each rule\n'
248 s += '> ------------------------------\n'
--> 249 return s + self.visualize().to_string(index=False) + '\n'
250
251 def _extract_rules(self, X, y) -> List[Rule]:
~/.cache/pypoetry/virtualenvs/xlemoo-6BFI3yUJ-py3.8/lib/python3.8/site-packages/imodels/rule_set/rule_fit.py in visualize(self, decimals)
237
238 def visualize(self, decimals=2):
--> 239 rules = self._get_rules()
240 rules = rules[rules.coef != 0].sort_values("support", ascending=False)
241 pd.set_option('display.max_colwidth', None)
~/.cache/pypoetry/virtualenvs/xlemoo-6BFI3yUJ-py3.8/lib/python3.8/site-packages/imodels/rule_set/rule_fit.py in _get_rules(self, exclude_zero_coef, subregion)
208 for i in range(0, n_features):
209 if self.lin_standardise:
--> 210 coef = self.coef[i] * self.friedscale.scale_multipliers[i]
211 else:
212 coef = self.coef[i]
IndexError: index 4 is out of bounds for axis 0 with size 4
I tried to look into this issue myself, but I am not familiar enough with the method to make any definitive claims. However, this line of code seems fishy. Why not just use the actual number of features stored in self.n_features? That could be a source of the indexing error.
Hello, thanks again for the great library!
I'm interested in applying HSTree to RandomForest and ExtraTrees models.
According to the documentation, I can specify a random forest object in the estimator_ argument; however, this raises an error when I try to fit:
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
import imodels
from imodels.tree.hierarchical_shrinkage import HSTreeClassifier
model = HSTreeClassifier(estimator_=model)
model = model.fit(X, y)
File "/Users/neerick/workspace/code/autogluon/tabular/src/autogluon/tabular/models/rf/rf_model.py", line 232, in _fit
model = model.fit(X, y, sample_weight=sample_weight)
File "/Users/neerick/workspace/virtual/autogluon/lib/python3.8/site-packages/imodels/tree/hierarchical_shrinkage.py", line 64, in fit
self.complexity_ = compute_tree_complexity(self.estimator_.tree_)
AttributeError: 'RandomForestClassifier' object has no attribute 'tree_'
https://github.com/csinva/imodels/blob/master/imodels/tree/hierarchical_shrinkage.py
I don't see any tutorial / documentation for creating a random forest or extra trees model via HSTree, but the paper mentions that this is possible and gets good results. I was wondering if the maintainers could point me to an example or tutorial on this.
Thanks!
Hello
Thanks for consolidating the implementations into one nice package. I was running SkopeRules on the diabetes dataset and saw that the results are
('Insulin > 142.0 and Age > 26.5', (0.8732394366197183, 0.7005649717514124, 1))
('Insulin <= 187.5 and Insulin > 121.0 and Age > 24.5', (0.8862208393458393, 0.6502540183068366, 3))
('Insulin <= 169.75 and Insulin > 168.75', (1.0, 0.5337078651685393, 1))
('Insulin > 121.0 and BMI > 30.300000190734863 and Age <= 27.5', (0.5128579777907656, 0.17596669877528553, 2))
('Glucose <= 167.5 and Insulin > 169.75', (0.38333333333333336, 0.12921348314606743, 1))
('Insulin <= 169.75 and Insulin > 143.0 and Age <= 26.5', (0.6923076923076923, 0.10465116279069768, 1))
Looking at the code, I see that the first two elements after the Rule are precision and recall, what is the third integer?
Thanks
Uday
Hi,
I was trying to replicate some of the Random Forest results from the paper, specifically Figure 3(D) for the diabetes dataset, but I am unable to see the gap in AUC presented in the paper. It's probably me doing something silly :) - I'd appreciate some help!
To simplify identifying a good max_depth for a Random Forest object, I'm using this class, which allows me to use scikit-learn's GridSearchCV:
import utils as data_utils
import numpy as np, pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.base import BaseEstimator
from imodels import HSTreeClassifierCV
from matplotlib import pyplot as plt
import seaborn as sns; sns.set()
class HSRF(BaseEstimator):
    def __init__(self, reg_param_space, num_trees, max_depth):
        self.reg_param_space = reg_param_space
        self.num_trees = num_trees
        self.max_depth = max_depth
        self.HSTCV = None
        self.classes_ = None
    def fit(self, X, y):
        base_clf = RandomForestClassifier(n_estimators=self.num_trees, max_depth=self.max_depth)
        clf = HSTreeClassifierCV(base_clf, reg_param_list=self.reg_param_space)
        clf.fit(X, y)
        self.HSTCV = clf
        # this is needed for scikit's scorer to work
        self.classes_ = clf.estimator_.classes_
        return clf
    def predict(self, X):
        return self.HSTCV.predict(X)
    def predict_proba(self, X):
        return self.HSTCV.predict_proba(X)
And here's my code - the X and y values passed in are from the diabetes dataset:
def run_mwe(X, y):
    reg_param_space = [0.1, 1.0, 10.0, 25.0, 50.0, 100.0]  # these are from the paper, Section 4.2
    num_trees_space = np.arange(2, 21, 2)
    max_depth_space = np.arange(1, 12, 2)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, train_size=0.7)
    num_folds = 3
    df = pd.DataFrame(columns=['method', 'score', 'num_trees', 'cv_max_depth'])
    # RF experiments first
    for nt in num_trees_space:
        base_clf = RandomForestClassifier(class_weight='balanced', n_estimators=nt)
        clf = GridSearchCV(base_clf, cv=num_folds, param_grid={'max_depth': max_depth_space}, refit=True,
                           scoring='roc_auc')
        clf.fit(X_train, y_train)
        pr = clf.best_estimator_.predict_proba(X_test)[:, 1]
        score = roc_auc_score(y_test, pr)
        df = df.append({'method': 'RF', 'score': score, 'num_trees': nt, 'cv_max_depth': clf.best_params_['max_depth']},
                       ignore_index=True)
    # HS Tree experiments next
    for nt in num_trees_space:
        clf = GridSearchCV(HSRF(reg_param_space=reg_param_space, num_trees=nt, max_depth=1),
                           param_grid={'max_depth': max_depth_space}, cv=num_folds, scoring='roc_auc', verbose=4,
                           refit=True)
        clf.fit(X_train, y_train)
        pr = clf.best_estimator_.predict_proba(X_test)[:, 1]
        score = roc_auc_score(y_test, pr)
        df = df.append(
            {'method': 'HSRF', 'score': score, 'num_trees': nt, 'cv_max_depth': clf.best_params_['max_depth']},
            ignore_index=True)
    return df
When I plot the columns score against num_trees in df, I see something like this:
For the IRFClassifier, a module (irf) seems to be missing:
from irf.ensemble import wrf
ModuleNotFoundError: No module named 'irf'
When I look in the source files, this module does not exist.
Thanks!
Add any required translation code to allow imodels trees to be plotted with dtreeviz. This basically boils down to successfully generating a ShadowDecTree object from an imodels tree.
We can reuse the existing ShadowSKDTree constructor by converting imodels trees into sklearn objects, then calling:
sk_dtree = ShadowSKDTree(tree_classifier, X, y, features, target, [0, 1])
Alternatively, we can make an imodels-specific implementation of ShadowDecTree, similar to the sklearn implementation here, but that may be more work than necessary.
In addition to the pypi package, please add a conda-forge package (https://conda-forge.org).
You can easily create a boilerplate conda recipe with grayskull (starting from the pypi package): https://github.com/conda-incubator/grayskull
It seems that this package does not support categorical variables. Is that right?
Sometimes fails when feature_names not passed...
Sometimes fails when only passed one input column
Willing to help.
References:
When running a number of experiments with different splits of a given dataset, I see that GreedyRuleListClassifier's accuracy varies wildly, and sometimes the code (see the for loop below) crashes.
So, for example running 10 experiments like this, with different random splits of the same set:
import pandas
import sklearn
import sklearn.datasets
from sklearn.model_selection import train_test_split
from imodels import GreedyRuleListClassifier
X, Y = sklearn.datasets.load_breast_cancer(as_frame=True, return_X_y=True)
model = GreedyRuleListClassifier(max_depth=10)
for i in range(10):
    try:
        X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.3)
        model.fit(X_train, y_train, feature_names=X_train.columns)
        y_pred = model.predict(X_test)
        from sklearn.metrics import accuracy_score
        score = accuracy_score(y_test.values,y_pred)
        print('Accuracy:\n', score)
    except KeyError as e:
        print("Failed with KeyError")
Will give as output something along the lines of
Accuracy: 0.6081871345029239
Failed with KeyError
Accuracy: 0.4619883040935672
Accuracy: 0.45614035087719296
Accuracy: 0.2222222222222222
Failed with KeyError
Failed with KeyError
Failed with KeyError
Accuracy: 0.18128654970760233
Failed with KeyError
Is this intended behavior? While my test dataset is smallish, the variation in accuracy is still surprising to me, and so is the KeyError. I'm using scikit-learn==1.0.2 and imodels==1.3.6 and can edit the issue here to add more details.
Incidentally, the same behaviour was observed in https://datascience.stackexchange.com/a/116283/50519, noticed by @jonnor.
Thanks!
I have a training dataset of around 1.5M records. I was trying to fit FIGSRegressor to it, and it has been running for more than 2 hours without any indication of its progress.
It'd be great to have a verbose: int parameter in the constructor to report what's happening during fitting, based on the level passed to it.
E.g.
ensemble.RandomForestRegressor(n_jobs=-1, random_state=rand_state, verbose=1)
ensemble.BaggingRegressor(n_jobs=-1, random_state=rand_state, verbose=1)
xgb.XGBRegressor(verbosity=1, booster='gbtree', n_jobs=-1, random_state=rand_state)
lgb.LGBMRegressor(num_leaves=2047, random_state=rand_state, force_col_wise=True, verbose=1)
Thanks.
The readme page claims that hierarchical shrinkage supports any sklearn tree-based model, but in reality it only works with sklearn.ensemble.GradientBoostingRegressor and sklearn.ensemble.GradientBoostingClassifier. When used with sklearn.ensemble.HistGradientBoostingRegressor, the _shrink method is nullified because neither of the two conditions is true:
https://github.com/csinva/imodels/blob/master/imodels/tree/hierarchical_shrinkage.py#L125-L127 . This is same for HistGradientBoostingClassifier.
We should either add support to hist boosting trees, or clarify this in the readme.
While importing imodels in Python 3.10.7 I receive the following deprecation warning:
<frozen importlib._bootstrap>:283: DeprecationWarning: the load_module() method is deprecated and slated for removal in Python 3.12; use exec_module() instead
After some searching I believe it is coming from the setuptools package in setup.py, but I am not sure how to fix it.
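One way to locate where a warning like this originates is to turn DeprecationWarning into an exception, so that the traceback shows the call site. A minimal standard-library sketch; `legacy_import` is a made-up stand-in for whatever code triggers the warning, not anything from imodels or setuptools:

```python
import traceback
import warnings

def legacy_import():
    # Stand-in for third-party code that emits the deprecation warning.
    warnings.warn("the load_module() method is deprecated", DeprecationWarning)

captured = None
trace = ""
with warnings.catch_warnings():
    # Escalate DeprecationWarning to an error inside this block only.
    warnings.simplefilter("error", DeprecationWarning)
    try:
        legacy_import()
    except DeprecationWarning as exc:
        captured = exc
        trace = traceback.format_exc()  # includes the frame that raised
```

The same filter can be applied from the command line with `python -W error::DeprecationWarning`, which should make the import fail with a traceback pointing at the offending module.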
I am using the OptimalTreeClassifier model from GOSDT as shown in the repository's own example, but I am unable to print the tree: it throws "AttributeError: 'OptimalTreeClassifier' object has no attribute 'classes_'", as shown below.
The error is raised internally:
`---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/tmp/ipykernel_33/809247974.py in
----> 1 test_classification_binary()
/tmp/ipykernel_33/280002779.py in test_classification_binary()
23 # test acc
24 acc_train = np.mean(preds == new_y)
---> 25 print(type(m),m, 'final acc', acc_train)
26 assert acc_train > 0.8, 'acc greater than 0.8'
/opt/conda/lib/python3.7/site-packages/imodels/tree/cart_wrapper.py in str(self)
58 return 'GreedyTree:\n' + export_text(self, feature_names=self.feature_names, show_weights=True)
59 else:
---> 60 return 'GreedyTree:\n' + export_text(self, show_weights=True)
61
62
/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
70 FutureWarning)
71 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72 return f(**kwargs)
73 return inner_f
74
/opt/conda/lib/python3.7/site-packages/sklearn/tree/export.py in export_text(decision_tree, feature_names, max_depth, spacing, decimals, show_weights)
872 tree = decision_tree.tree_
873 if is_classifier(decision_tree):
--> 874 class_names = decision_tree.classes_
875 right_child_fmt = "{} {} <= {}\n"
876 left_child_fmt = "{} {} > {}\n"
AttributeError: 'OptimalTreeClassifier' object has no attribute 'classes_'
`
Implementing a Dynamic CDIs class based on FIGS.
TODOs:
D-FIGS in a new file imodels/tree/dynamic_figs.py
More details:
The D-FIGS class should inherit from the FIGS class, and take an additional dictionary at initialization, corresponding to the features phases.
fit or predict methods, the class should verify that the matrix
D-FIGS should infer the phase from the matrix.
imodels/tests/dynamic_figs_test.py, using pytest (see package documentation or you can use the figs test as reference)
Hi,
When using RuleFitClassifier(tree_generator=GradientBoostingClassifier()) with a GradientBoostingClassifier() object fitted and optimized separately via the scikit-learn API, fitting RuleFitClassifier(tree_generator=GradientBoostingClassifier()) raises the following error:
ValueError: n_estimators=1 must be larger or equal to estimators_.shape[0]=100 when warm_start==True
When inspecting what is inside RuleFitClassifier(tree_generator=GradientBoostingClassifier()) after fitting the model, the GradientBoostingClassifier() has been changed to parameters different from those optimized before fitting RuleFitClassifier(), i.e., GradientBoostingClassifier(max_leaf_nodes=4, n_estimators=1, random_state=0, warm_start=True). I'm not sure why these parameters (of the GradientBoostingClassifier()) are changed inside the RuleFitClassifier() object.
If RuleFitClassifier(tree_generator=None), everything works well.
As per documentation:
tree_generator : Optional: this object will be used as provided to generate the rules.
This will override almost all the other properties above. Must be GradientBoostingRegressor(), GradientBoostingClassifier(), or RandomForestRegressor()
Which properties of RuleFitClassifier() are overridden if tree_generator=GradientBoostingClassifier()?
Here is the closest solution I found in Issue #34, however the behavior is not clear.
Any help will be highly appreciated.
Many thanks!
Thanks for the work! It's awesome!
After imodels was updated to 1.3.8, we get the error message 'BoostedRulesClassifier' object has no attribute 'complexity_'. Was it removed or renamed? It is generally better to keep public APIs/attributes unchanged during minor releases; is there any plan to add it back?
Hi,
When I use the BoostedRulesClassifier, it sometimes throws an exception as follows:
This BoostedRulesClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
I find that the exception results from the implementation of the class RuleSet:
def _eval_weighted_rule_sum(self, X) -> np.ndarray:
    check_is_fitted(self, ['rules_without_feature_names_', 'n_features_', 'feature_placeholders'])
    X = check_array(X)
    if X.shape[1] != self.n_features_:
        raise ValueError("X.shape[1] = %d should be equal to %d, the number of features at training time."
                         " Please reshape your data."
                         % (X.shape[1], self.n_features_))
    df = pd.DataFrame(X, columns=self.feature_placeholders)
    selected_rules = self.rules_without_feature_names_
    scores = np.zeros(X.shape[0])
    for r in selected_rules:
        features_r_uses = list(map(lambda x: x[0], r.agg_dict.keys()))
        scores[df[features_r_uses].query(str(r)).index.values] += r.args[0]
    return scores
Specifically, when check_is_fitted(self, ['rules_without_feature_names_', 'n_features_', 'feature_placeholders']) runs, it finds that self.rules_without_feature_names_ does not exist, so it raises the above exception.
Reviewing my code and dataset further, I found that my training set is easy to classify, so the training error of the estimator is close to zero, which may trigger a bug in the fit function of the class BoostedRulesClassifier:
for _ in range(self.n_estimators):
    # Fit a classifier with the specific weights
    clf = self.estimator()
    clf.fit(X, y, sample_weight=w)  # uses w as the sampling weight!
    preds = clf.predict(X)
    self.estimator_mean_prediction_.append(np.mean(preds))  # just for printing
    # Indicator function
    miss = preds != y
    # Equivalent with 1/-1 to update weights
    miss2 = np.ones(miss.size)
    miss2[~miss] = -1
    # Error
    err_m = np.dot(w, miss) / sum(w)
    if err_m < 1e-3:
        return self
    # Alpha
    alpha_m = 0.5 * np.log((1 - err_m) / float(err_m))
    # New weights
    w = np.multiply(w, np.exp([float(x) * alpha_m for x in miss2]))
    self.estimators_.append(deepcopy(clf))
    self.estimator_weights_.append(alpha_m)
    self.estimator_errors_.append(err_m)
rules = []
Because err_m is zero, fit returns self directly without executing the subsequent statements; in that case, self.rules_without_feature_names_ does not exist.
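The weight formula in the loop shows why a zero error needs special handling: alpha_m = 0.5 * log((1 - err_m) / err_m) grows without bound as err_m approaches 0 and is undefined at exactly 0. A small standalone sketch (the helper `estimator_weight` and its `eps` threshold are illustrative, not imodels code):

```python
import math

def estimator_weight(err_m, eps=1e-3):
    """AdaBoost-style weight; returns inf when the error is below eps."""
    if err_m < eps:
        # (1 - err_m) / err_m would divide by zero when err_m is exactly 0.
        return float("inf")
    return 0.5 * math.log((1 - err_m) / err_m)

# Weights for a few weighted error rates, from chance level down to zero.
weights = {err: estimator_weight(err) for err in (0.5, 0.25, 0.01, 0.0)}
```

An error of 0.5 gives a weight of 0, smaller errors give larger weights, and the guard maps a (near-)zero error to infinity instead of raising.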
My current solution to this bug is to modify the following code fragment in the fit function of the class BoostedRulesClassifier:
# Error
err_m = np.dot(w, miss) / sum(w)
# modification ###########################
if err_m < 1e-3:
    # return self
    w = np.ones(miss.size) / len(y)
    self.estimators_.append(deepcopy(clf))
    self.estimator_weights_.append(float("inf"))
    self.estimator_errors_.append(err_m)
    break
####################################
# Alpha
alpha_m = 0.5 * np.log((1 - err_m) / float(err_m))
I'm not sure whether this introduces new defects, but it does resolve the exception.
Running SLIMRegressor on the Regression dataset https://www.kaggle.com/mirichoi0218/insurance with the handling of categorical values, the error I get is
SolverError Traceback (most recent call last)
in
1 from imodels import SLIMRegressor
2 rf = SLIMRegressor()
----> 3 rf.fit(X, y)
~/opt/anaconda3/lib/python3.7/site-packages/imodels/algebraic/slim.py in fit(self, X, y, lambda_reg, sample_weight)
49
50 # solve the problem using an appropriate solver
---> 51 prob.solve()
52 self.model.coef_ = w.value.astype(np.int)
53 self.model.intercept_ = 0
~/opt/anaconda3/lib/python3.7/site-packages/cvxpy/problems/problem.py in solve(self, *args, **kwargs)
394 else:
395 solve_func = Problem._solve
--> 396 return solve_func(self, *args, **kwargs)
397
398 @classmethod
~/opt/anaconda3/lib/python3.7/site-packages/cvxpy/problems/problem.py in _solve(self, solver, warm_start, verbose, gp, qcp, requires_grad, enforce_dpp, **kwargs)
749
750 data, solving_chain, inverse_data = self.get_problem_data(
--> 751 solver, gp, enforce_dpp)
752 solution = solving_chain.solve_via_data(
753 self, data, warm_start, verbose, kwargs)
~/opt/anaconda3/lib/python3.7/site-packages/cvxpy/problems/problem.py in get_problem_data(self, solver, gp, enforce_dpp)
498 self._cache.invalidate()
499 solving_chain = self._construct_chain(
--> 500 solver=solver, gp=gp, enforce_dpp=enforce_dpp)
501 self._cache.key = key
502 self._cache.solving_chain = solving_chain
~/opt/anaconda3/lib/python3.7/site-packages/cvxpy/problems/problem.py in _construct_chain(self, solver, gp, enforce_dpp)
655 A solving chain
656 """
--> 657 candidate_solvers = self._find_candidate_solvers(solver=solver, gp=gp)
658 return construct_solving_chain(self, candidate_solvers, gp=gp,
659 enforce_dpp=enforce_dpp)
~/opt/anaconda3/lib/python3.7/site-packages/cvxpy/problems/problem.py in _find_candidate_solvers(self, solver, gp)
614 in incorrect solutions and is not recommended.
615 """
--> 616 raise error.SolverError(msg)
617 candidates['qp_solvers'] = [
618 s for s in candidates['qp_solvers']
SolverError:
You need a mixed-integer solver for this model. Refer to the documentation
https://www.cvxpy.org/tutorial/advanced/index.html#mixed-integer-programs
for discussion on this topic.
Quick fix 1: if you install the python package CVXOPT (pip install cvxopt),
then CVXPY can use the open-source mixed-integer solver `GLPK`.
Quick fix 2: you can explicitly specify solver='ECOS_BB'. This may result
in incorrect solutions and is not recommended.
How do I set the maximum number of rules and the maximum length of rules for RuleFit?
When I set max_rules = 4, I get this:
rules
rule ... importance
0 crim ... 1.437129
7 dis ... 2.151996
12 lstat ... 2.533759
11 black ... 0.683094
1 zn ... 0.307754
9 tax ... 1.825235
8 rad ... 2.150141
10 ptratio ... 1.735849
6 age ... 0.489965
5 rm ... 1.107437
4 nox ... 1.251279
3 chas ... 0.213609
2 indus ... 0.032803
13 rm <= 6.805000066757202 ... 1.701190
17 dis > 1.3727499842643738 & rm <= 6.82150006294... ... 0.221855
22 lstat <= 9.890000343322754 & dis > 1.372749984... ... 0.030449
19 rm <= 6.821500062942505 & dis <= 2.00444996356... ... 0.342680
18 rm > 6.821500062942505 & lstat <= 4.7200000286... ... 2.445310
21 rm > 6.821500062942505 & lstat <= 4.7200000286... ... 0.836880
15 dis <= 1.3727499842643738 & rm <= 6.8215000629... ... 2.641944
[20 rows x 5 columns]
rules.columns
Index(['rule', 'type', 'coef', 'support', 'importance'], dtype='object')
from imodels import RuleFit
import pandas as pd
import numpy as np
#boston_data = pd.read_csv("../data/boston.csv", index_col=0)
boston_data = pd.read_csv("boston.csv", index_col=0)
y = boston_data.medv.values
X = boston_data.drop("medv", axis=1)
features = X.columns
X = X.values
rf = RuleFit(max_rules = 4)
'''
Parameters
----------
tree_size: Number of terminal nodes in generated trees. If exp_rand_tree_size=True,
this will be the mean number of terminal nodes.
sample_fract: fraction of randomly chosen training observations used to produce each tree.
FP 2004 (Sec. 2)
max_rules: approximate total number of rules generated for fitting. Note that actual
number of rules will usually be lower than this due to duplicates.
memory_par: scale multiplier (shrinkage factor) applied to each new tree when
sequentially induced. FP 2004 (Sec. 2)
rfmode: 'regress' for regression or 'classify' for binary classification.
lin_standardise: If True, the linear terms will be standardised as per Friedman Sec 3.2
by multiplying the winsorised variable by 0.4/stdev.
lin_trim_quantile: If lin_standardise is True, this quantile will be used to trim linear
terms before standardisation.
exp_rand_tree_size: If True, each boosted tree will have a different maximum number of
terminal nodes based on an exponential distribution about tree_size.
(Friedman Sec 3.3)
model_type: 'r': rules only; 'l': linear terms only; 'rl': both rules and linear terms
random_state: Integer to initialise random objects and provide repeatability.
tree_generator: Optional: this object will be used as provided to generate the rules.
This will override almost all the other properties above.
Must be GradientBoostingRegressor or GradientBoostingClassifier, optional (default=None)
'''
rf.fit(X, y, feature_names=features)
preds = rf.predict(X)
print(f'train mse: {np.mean(np.square(preds-y)):0.2f}')
rules = rf.get_rules()
rules = rules[rules.coef != 0].sort_values("support", ascending=False)
#rules[['rule', 'coef', 'support']].head().style.background_gradient(cmap='viridis')
rules[['rule', 'coef', 'support']].head()
q=0
Hi guys,
This is an amazing library! Thank you for your hard work.
I am testing the FIGSRegressor algorithm after reading your paper, but it seems that I cannot load the CV implementation. I have tried both FIGSRegressorCV and FIGSCV, but I keep getting ImportError: cannot import name 'FIGSCV' from 'imodels'.
Thank you
Does HSTree support multiclass classification problems with RandomForest / ExtraTrees as the estimator?
From my initial tests it appears buggy. Calling predict_proba with the final model results in lots of NaN predictions, along with warnings during training such as:
/Users/neerick/workspace/virtual/autogluon/lib/python3.8/site-packages/imodels/tree/hierarchical_shrinkage.py:87: RuntimeWarning: invalid value encountered in double_scalars
val = tree.value[i][0, 1] / (tree.value[i][0, 0] + tree.value[i][0, 1]) # binary classification
If helpful I can try to create a reproducible example.
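One possible source of the NaNs, assuming the line in the warning is also applied to multiclass nodes: it only reads the first two class columns, so a node containing only samples of a third class evaluates 0 / (0 + 0). A NumPy sketch with a hand-built array (the values are hypothetical and only mimic the shape indexed by `tree.value[i]`):

```python
import numpy as np

# Hypothetical per-class values for one node: no samples of class 0 or 1,
# five samples of class 2.
node_value = np.array([[0.0, 0.0, 5.0]])

with np.errstate(invalid="ignore"):  # silence the RuntimeWarning for the demo
    # Same expression as in the warning, written for binary classification.
    val = node_value[0, 1] / (node_value[0, 0] + node_value[0, 1])

# Normalising over all classes stays well defined for this node.
proba = node_value[0] / node_value[0].sum()
```

Here `val` is NaN while `proba` is a valid distribution, which would be consistent with NaN predictions becoming more likely as more trees (and therefore more such nodes) are added.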
Here is an example result comparing with the sklearn default RF (_og_) using the accuracy metric. Because HSTree returns many NaN predictions, the scores are very low.
One observation is the scores get worse the more trees there are in HSTree forests. I'd guess the likelihood of returning a NaN result is increasing with the number of trees.
model score_test score_val pred_time_test pred_time_val fit_time pred_time_test_marginal pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 RandomForest_og_n300 0.711651 0.723618 0.985573 0.050956 0.519926 0.985573 0.050956 0.519926 1 True 1
1 RandomForest_og_n100 0.710154 0.748744 0.453769 0.019050 0.170951 0.453769 0.019050 0.170951 1 True 2
2 WeightedEnsemble_L2 0.710154 0.748744 0.464755 0.019376 0.295161 0.010986 0.000326 0.124210 2 True 36
3 RandomForest_og_n40 0.700636 0.698492 0.193009 0.010738 0.088012 0.193009 0.010738 0.088012 1 True 3
4 RandomForest_og_n20 0.692039 0.698492 0.103616 0.007549 0.057396 0.103616 0.007549 0.057396 1 True 4
5 RandomForest_og_n10 0.674165 0.688442 0.075296 0.006166 0.041720 0.075296 0.006166 0.041720 1 True 5
6 RandomForest_hs=10_n10 0.521949 0.537688 0.070260 0.005246 0.082384 0.070260 0.005246 0.082384 1 True 15
7 RandomForest_hs=50_n10 0.520839 0.517588 0.075151 0.004875 0.071219 0.075151 0.004875 0.071219 1 True 20
8 RandomForest_hs=0.1_n10 0.520796 0.537688 0.074070 0.005233 0.093299 0.074070 0.005233 0.093299 1 True 35
9 RandomForest_hs=1_n10 0.520692 0.542714 0.077687 0.005690 0.075061 0.077687 0.005690 0.075061 1 True 10
10 RandomForest_hs=100_n10 0.519246 0.517588 0.075059 0.006019 0.082536 0.075059 0.006019 0.082536 1 True 25
11 RandomForest_hs=500_n10 0.488877 0.517588 0.072145 0.005125 0.072223 0.072145 0.005125 0.072223 1 True 30
12 RandomForest_hs=1_n20 0.485125 0.472362 0.113002 0.006484 0.123639 0.113002 0.006484 0.123639 1 True 9
13 RandomForest_hs=0.1_n20 0.485005 0.472362 0.111342 0.005953 0.146246 0.111342 0.005953 0.146246 1 True 34
14 RandomForest_hs=10_n20 0.484833 0.482412 0.104076 0.006577 0.131909 0.104076 0.006577 0.131909 1 True 14
15 RandomForest_hs=50_n20 0.482896 0.482412 0.115057 0.006263 0.130512 0.115057 0.006263 0.130512 1 True 19
16 RandomForest_hs=100_n20 0.480840 0.482412 0.108625 0.006045 0.135224 0.108625 0.006045 0.135224 1 True 24
17 RandomForest_hs=500_n20 0.458035 0.467337 0.108658 0.006302 0.123907 0.108658 0.006302 0.123907 1 True 29
18 RandomForest_hs=1_n40 0.451434 0.467337 0.185129 0.010619 0.210639 0.185129 0.010619 0.210639 1 True 8
19 RandomForest_hs=0.1_n40 0.451382 0.467337 0.170597 0.009024 0.244322 0.170597 0.009024 0.244322 1 True 33
20 RandomForest_hs=10_n40 0.451322 0.467337 0.173382 0.009955 0.210795 0.173382 0.009955 0.210795 1 True 13
21 RandomForest_hs=50_n40 0.450350 0.467337 0.170041 0.008673 0.236081 0.170041 0.008673 0.236081 1 True 18
22 RandomForest_hs=100_n40 0.449119 0.467337 0.169396 0.010918 0.226784 0.169396 0.010918 0.226784 1 True 23
23 RandomForest_hs=500_n40 0.435832 0.472362 0.162881 0.009256 0.202447 0.162881 0.009256 0.202447 1 True 28
24 RandomForest_hs=1_n100 0.420419 0.452261 0.442328 0.017688 0.480776 0.442328 0.017688 0.480776 1 True 7
25 RandomForest_hs=0.1_n100 0.420411 0.452261 0.354523 0.018247 0.548557 0.354523 0.018247 0.548557 1 True 32
26 RandomForest_hs=10_n100 0.419981 0.452261 0.355097 0.017487 0.469547 0.355097 0.017487 0.469547 1 True 12
27 RandomForest_hs=50_n100 0.419034 0.447236 0.344341 0.021125 0.465810 0.344341 0.021125 0.465810 1 True 17
28 RandomForest_hs=100_n100 0.418672 0.447236 0.372041 0.018402 0.477048 0.372041 0.018402 0.477048 1 True 22
29 RandomForest_hs=500_n100 0.415256 0.457286 0.338696 0.017128 0.492786 0.338696 0.017128 0.492786 1 True 27
30 RandomForest_hs=0.1_n300 0.381049 0.391960 0.967061 0.045552 1.533075 0.967061 0.045552 1.533075 1 True 31
31 RandomForest_hs=10_n300 0.381049 0.391960 1.109062 0.054005 1.442369 1.109062 0.054005 1.442369 1 True 11
32 RandomForest_hs=1_n300 0.381040 0.391960 1.677277 0.055421 2.346773 1.677277 0.055421 2.346773 1 True 6
33 RandomForest_hs=50_n300 0.380945 0.391960 0.889030 0.053650 1.320377 0.889030 0.053650 1.320377 1 True 16
34 RandomForest_hs=100_n300 0.380885 0.391960 1.031198 0.045266 1.254918 1.031198 0.045266 1.254918 1 True 21
35 RandomForest_hs=500_n300 0.380816 0.391960 0.948715 0.050209 1.266396 0.948715 0.050209 1.266396 1 True 26
Hi authors,
I am reviewing your submission and imodels repository.
I am looking into your software, but I cannot find an installation guide.
If you support installation via PyPI or setuptools, please document it for future users of your software.
Best,
Jungtaek.
Can you help clarify how to set the parameters for SBRL in order to
1. limit the number of rules, and
2. limit the number of conditions in each rule,
while still getting the best performance under these constraints?
This is similar to what pycorels (https://github.com/fingoldin/pycorels) does with max_card=2:
C = CorelsClassifier(max_card=2, c=0.0, verbosity=["loud", "samples"])
Which of the parameters below should I use for this, and how?
model = RuleListClassifier(max_iter=10000, class1label="diabetes", verbose=False)
'''
Parameters
----------
listlengthprior : int, optional (default=3)
Prior hyperparameter for expected list length (excluding null rule)
listwidthprior : int, optional (default=1)
Prior hyperparameter for expected list width (excluding null rule)
maxcardinality : int, optional (default=2)
Maximum cardinality of an itemset
minsupport : int, optional (default=10)
Minimum support (%) of an itemset
alpha : array_like, shape = [n_classes]
prior hyperparameter for multinomial pseudocounts
n_chains : int, optional (default=3)
Number of MCMC chains for inference
max_iter : int, optional (default=50000)
Maximum number of iterations
class1label: str, optional (default="class 1")
Label or description of what the positive class (with y=1) means
verbose: bool, optional (default=True)
Verbose output
random_state: int
Random seed
'''
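To make the roles of maxcardinality and minsupport in the docstring above more concrete, here is a minimal pure-Python sketch of itemset mining. It is not the imodels or SBRL implementation; mine_itemsets and the toy data are hypothetical. It only illustrates that maxcardinality bounds the number of conditions per candidate rule, while minsupport (a percentage) prunes rare candidates. The length of the final list is influenced by listlengthprior, which the docstring describes as a prior on the expected length rather than a hard limit.

```python
from itertools import combinations


def mine_itemsets(rows, maxcardinality=2, minsupport=10):
    """Return {itemset: support %} for itemsets with at most `maxcardinality` items.

    rows: list of sets of items (e.g. names of one-hot features that are "on").
    minsupport: minimum percentage of rows that must contain the itemset.
    """
    n = len(rows)
    items = sorted(set().union(*rows))
    result = {}
    for k in range(1, maxcardinality + 1):
        for candidate in combinations(items, k):
            # Count the rows that contain every item of the candidate.
            count = sum(1 for row in rows if set(candidate) <= row)
            support = 100.0 * count / n
            if support >= minsupport:
                result[candidate] = support
    return result


# Hypothetical toy data: each row lists the conditions that hold for one sample.
rows = [
    {"age>50", "bmi>30"},
    {"age>50", "bmi>30", "glucose>120"},
    {"glucose>120"},
    {"age>50"},
]
itemsets = mine_itemsets(rows, maxcardinality=2, minsupport=50)
for itemset, support in itemsets.items():
    print(itemset, support)
```

Lowering maxcardinality to 1 in this sketch leaves only single-condition candidates, which is the sense in which the parameter limits the number of conditions per rule.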
Hello. 👋 I have no issues with the BRL classifier when I'm using datasets with all numeric or all categorical features. But when I use a dataset with both, I get the following error. The categorical feature in this dataset has already been one hot encoded and I'm passing those columns to the "undiscretized_features" parameter, but it looks like it's being encoded again anyway?
KeyError Traceback (most recent call last)
<ipython-input-13-a701282e6a2c> in <module>
1 cls = BayesianRuleListClassifier()
----> 2 cls.fit(X.values, Y, feature_names = X.columns, undiscretized_features = ["X1_N", "X1_Y"])
C:\ProgramData\Anaconda3\lib\site-packages\imodels\rule_list\bayesian_rule_list\bayesian_rule_list.py in fit(self, X, y, feature_names, undiscretized_features, verbose)
204 rule_strs = itemsets_to_rules(self.final_itemsets)
205 self.rules_without_feature_names_ = [Rule(r) for r in rule_strs]
--> 206 self.rules_ = [
207 replace_feature_name(rule, self.feature_dict_) for rule in self.rules_without_feature_names_
208 ]
C:\ProgramData\Anaconda3\lib\site-packages\imodels\rule_list\bayesian_rule_list\bayesian_rule_list.py in <listcomp>(.0)
205 self.rules_without_feature_names_ = [Rule(r) for r in rule_strs]
206 self.rules_ = [
--> 207 replace_feature_name(rule, self.feature_dict_) for rule in self.rules_without_feature_names_
208 ]
209
C:\ProgramData\Anaconda3\lib\site-packages\imodels\util\rule.py in replace_feature_name(rule, replace_dict)
74 replaced_agg_dict = {}
75 for feature, symbol in rule_replaced.agg_dict:
---> 76 replaced_agg_dict[(replace_dict[feature], symbol)] = rule_replaced.agg_dict[(feature, symbol)]
77 rule_replaced.agg_dict = replaced_agg_dict
78 return rule_replaced
KeyError: 'X_0_0.0'
I'm getting the following error when I try to use a string variable in my dataset:
TypeError Traceback (most recent call last)
in
----> 1 brl.fit(X_train, y_train, undiscretized_features=['agag_id'])
~/opt/anaconda3/lib/python3.8/site-packages/imodels/rule_list/bayesian_rule_list/bayesian_rule_list.py in fit(self, X, y, feature_labels, undiscretized_features, verbose)
119 raise Exception("Only binary classification is supported at this time!")
120
--> 121 itemsets, self.discretizer = extract_fpgrowth(X, y,
122 feature_labels=feature_labels,
123 minsupport=self.minsupport,
~/opt/anaconda3/lib/python3.8/site-packages/imodels/util/extract.py in extract_fpgrowth(X, y, feature_labels, minsupport, maxcardinality, undiscretized_features, verbose)
31
32 discretizer = BRLDiscretizer(X, y, feature_labels=feature_labels, verbose=verbose)
---> 33 X = discretizer.discretize_mixed_data(X, y, undiscretized_features)
34 X_df_onehot = discretizer.onehot_df
35
~/opt/anaconda3/lib/python3.8/site-packages/imodels/util/discretization/mdlp.py in discretize_mixed_data(self, X, y, undiscretized_features)
286 "Warning: non-categorical data found. Trying to discretize. (Please convert categorical values to "
287 "strings, and/or specify the argument 'undiscretized_features', to avoid this.)")
--> 288 X = self.discretize(X, y)
289
290 self.discretized_X = X
~/opt/anaconda3/lib/python3.8/site-packages/imodels/util/discretization/mdlp.py in discretize(self, X, y)
297 print("Discretizing ", self.discretized_features, "...")
298 D = pd.DataFrame(np.hstack((X, np.array(y).reshape((len(y), 1)))), columns=list(self.feature_labels) + ["y"])
--> 299 self.discretizer = MDLP_Discretizer(dataset=D, class_label="y", features=self.discretized_features)
300
301 cat_data = pd.DataFrame(np.zeros_like(X))
~/opt/anaconda3/lib/python3.8/site-packages/imodels/util/discretization/mdlp.py in __init__(self, dataset, class_label, out_path_data, out_path_bins, features)
59 self._cuts = {f: [] for f in self._features}
60 # get cuts for all features
---> 61 self.all_features_accepted_cutpoints()
62 # discretize self._data
63 self.apply_cutpoints(out_data_path=out_path_data, out_bins_path=out_path_bins)
~/opt/anaconda3/lib/python3.8/site-packages/imodels/util/discretization/mdlp.py in all_features_accepted_cutpoints(self)
218 '''
219 for attr in self._features:
--> 220 self.single_feature_accepted_cutpoints(feature=attr)
221 return
222
~/opt/anaconda3/lib/python3.8/site-packages/imodels/util/discretization/mdlp.py in single_feature_accepted_cutpoints(self, feature, partition_index)
190 return
191 # determine whether to cut and where
--> 192 cut_candidate = self.best_cut_point(data=data_partition, feature=feature)
193 if cut_candidate == None:
194 return
~/opt/anaconda3/lib/python3.8/site-packages/imodels/util/discretization/mdlp.py in best_cut_point(self, data, feature)
160 :return: value of cut point with highest information gain (if many, picks first). None if no candidates
161 '''
--> 162 candidates = self.boundaries_in_partition(data=data, feature=feature)
163 # candidates = self.feature_boundary_points(data=data, feature=feature)
164 if not candidates:
~/opt/anaconda3/lib/python3.8/site-packages/imodels/util/discretization/mdlp.py in boundaries_in_partition(self, data, feature)
151 '''
152 range_min, range_max = (data[feature].min(), data[feature].max())
--> 153 return set([x for x in self._boundaries[feature] if (x > range_min) and (x < range_max)])
154
155 def best_cut_point(self, data, feature):
~/opt/anaconda3/lib/python3.8/site-packages/imodels/util/discretization/mdlp.py in <listcomp>(.0)
151 '''
152 range_min, range_max = (data[feature].min(), data[feature].max())
--> 153 return set([x for x in self._boundaries[feature] if (x > range_min) and (x < range_max)])
154
155 def best_cut_point(self, data, feature):
TypeError: '>' not supported between instances of 'numpy.ndarray' and 'str'
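The final TypeError comes from comparing candidate cut points with string values, which suggests that the string column reached the numeric discretizer. One possible workaround, sketched below under the assumption that the column is categorical, is to one-hot encode string columns before calling fit. one_hot_strings is a hypothetical helper, not part of imodels, and whether BRL then treats the new 0/1 columns as intended may depend on the imodels version.

```python
def one_hot_strings(rows, feature_names):
    """One-hot encode every column that contains string values.

    rows: list of lists (one inner list per sample).
    Returns (encoded_rows, encoded_feature_names); numeric columns pass through.
    """
    # Collect the sorted categories of each string-valued column.
    categories = {}
    for j in range(len(feature_names)):
        values = [row[j] for row in rows]
        if any(isinstance(v, str) for v in values):
            categories[j] = sorted(set(str(v) for v in values))

    # Build the new column names: one name per category for encoded columns.
    new_names = []
    for j, name in enumerate(feature_names):
        if j in categories:
            new_names.extend(f"{name}_{c}" for c in categories[j])
        else:
            new_names.append(name)

    # Encode each row, replacing a string value by its 0/1 indicator vector.
    encoded = []
    for row in rows:
        out = []
        for j, v in enumerate(row):
            if j in categories:
                out.extend(1 if str(v) == c else 0 for c in categories[j])
            else:
                out.append(v)
        encoded.append(out)
    return encoded, new_names


# Hypothetical data: one numeric column and one string column.
rows = [[1.5, "a"], [2.0, "b"], [0.5, "a"]]
X_enc, names = one_hot_strings(rows, ["dose", "agag_id"])
print(names)
print(X_enc)
```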
Hi guys,
I am using FIGSRegressor in combination with VotingRegressor and StackingRegressor, but I keep getting the following error whenever I run the fit function.
ValueError: The estimator FIGSRegressor should be a regressor.
Please check this example. Is there a workaround or am I missing something?
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import StackingRegressor,VotingRegressor
from sklearn.linear_model import LinearRegression
from imodels import FIGSRegressor
np.random.seed(123)
# generate X and y
n, p = 500, 10
X_sim = np.random.randn(n, p)
y_sim = 1 * X_sim[:, 0] + 2 * X_sim[:, 1] - 1 * X_sim[:, 2] + np.random.randn(n)
base_models = [('figs', FIGSRegressor()),
               ('random_forest', RandomForestRegressor())]
comb_model = VotingRegressor(estimators=base_models,
                             n_jobs=10,
                             verbose=2)
comb_model = comb_model.fit(X_sim, y_sim)
meta_model = LinearRegression()
stacking_model = StackingRegressor(estimators=base_models,
                                   final_estimator=meta_model,
                                   passthrough=False,
                                   cv=5,
                                   verbose=2)
stacking_model = stacking_model.fit(X_sim, y_sim)
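scikit-learn's VotingRegressor and StackingRegressor check each base estimator with sklearn.base.is_regressor, which an estimator normally satisfies by inheriting from RegressorMixin. The sketch below uses a hypothetical stand-in class, PlainRegressor, instead of FIGSRegressor (so it runs without imodels) and shows one possible workaround: a thin subclass that adds RegressorMixin. Whether this is the right fix for FIGSRegressor in a given imodels version is an assumption.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, is_regressor
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression


class PlainRegressor(BaseEstimator):
    """Hypothetical stand-in for an estimator that lacks RegressorMixin."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


class WrappedRegressor(RegressorMixin, PlainRegressor):
    """Same model, but flagged as a regressor for scikit-learn's checks."""


rng = np.random.RandomState(123)
X = rng.randn(100, 3)
y = X[:, 0] + 2 * X[:, 1] + rng.randn(100)

print(is_regressor(PlainRegressor()))    # the plain class fails the check
print(is_regressor(WrappedRegressor()))  # the wrapped class passes it

ensemble = VotingRegressor([("wrapped", WrappedRegressor()),
                            ("lr", LinearRegression())])
ensemble.fit(X, y)
preds = ensemble.predict(X)
```

The same pattern would be `class FIGSRegressorFixed(RegressorMixin, FIGSRegressor): pass`, where FIGSRegressorFixed is a made-up name for illustration.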
Hi 👋
is there a way to get the feature importance from the RuleFit algorithm through your implementation? 🤔
While it is great that the models in imodels are readily interpretable, it would be nice to have an MDI feature importance, i.e. Gini importance, for tree-based models like FIGS, so they can be compared to other tree-based models like sklearn's RandomForestClassifier and GradientBoostingClassifier.
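For reference, MDI (mean decrease in impurity) for a single tree can be sketched as follows: each split contributes its sample-weighted impurity decrease to the feature it splits on, and the totals are normalised to sum to one. The dict-based tree layout and the mdi_importance helper are hypothetical and are not how imodels or scikit-learn store trees.

```python
def mdi_importance(root, n_features):
    """Mean decrease in impurity per feature for one binary tree.

    Each node is a dict with "n" (samples) and "impurity"; internal nodes
    also carry "feature" (column index), "left" and "right".
    """
    total = root["n"]
    scores = [0.0] * n_features

    def visit(node):
        if node.get("feature") is None:  # leaf: nothing to add
            return
        left, right = node["left"], node["right"]
        # Weighted impurity decrease of this split, relative to all samples.
        decrease = (node["n"] * node["impurity"]
                    - left["n"] * left["impurity"]
                    - right["n"] * right["impurity"]) / total
        scores[node["feature"]] += decrease
        visit(left)
        visit(right)

    visit(root)
    norm = sum(scores)
    return [s / norm for s in scores] if norm > 0 else scores


# Hypothetical toy tree: 10 samples (5 positive, 5 negative), Gini impurity.
tree = {
    "n": 10, "impurity": 0.5, "feature": 0,
    "left": {"n": 4, "impurity": 0.0},
    "right": {
        "n": 6, "impurity": 10 / 36, "feature": 1,
        "left": {"n": 5, "impurity": 0.0},
        "right": {"n": 1, "impurity": 0.0},
    },
}
importances = mdi_importance(tree, n_features=2)
print(importances)
```

For a model made of several trees, one option is to average these per-tree scores; how best to do that for FIGS is left open here.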
In imodels_demo.ipynb the Tree #0 returned by printing the fitted model:
Glucose concentration test <= 99.500 (Tree #0 root)
    Val: 0.068 (leaf)
    Glucose concentration test <= 168.500 (split)
        #Pregnant <= 6.500 (split)
            Body mass index <= 30.850 (split)
                Val: 0.065 (leaf)
                Blood pressure(mmHg) <= 67.000 (split)
                    Val: 0.705 (leaf)
                    Val: 0.303 (leaf)
            Val: 0.639 (leaf)
        Blood pressure(mmHg) <= 93.000 (split)
            Val: 0.860 (leaf)
            Val: -0.009 (leaf)
and plotting:
do not agree!
Based on _tree_to_str_with_data, which agrees with the simpler _tree_to_str actually being called here (see below), the first line printed after a split is the left / true branch, while the second line after the split is the right / false branch.
Reading the printed version, after the first "Glucose concentration test <= 99.500" split, there should be a leaf with value 0.068 for <= 99.5, and then the "Glucose concentration test <= 168.500" split for > 99.5, but this is not the structure of the plotted tree. Also, note that both of the "Blood pressure(mmHg)" splits should end in leaves, while in the plot one of them leads to a "#Pregnant" split.
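The printing convention described above (true branch first, then false branch) can be sketched with a small recursive printer. This is not the imodels _tree_to_str code; the dict-based tree, the feature names x0 and x1, and both helpers are hypothetical. The predict function follows the same convention, so the printed order and the prediction path can be compared directly.

```python
def tree_to_str(node, depth=0):
    """Render a binary tree; the true (<=) branch is printed before the false branch."""
    indent = "\t" * depth
    if "value" in node:
        return f"{indent}Val: {node['value']:.3f} (leaf)\n"
    kind = "Tree #0 root" if depth == 0 else "split"
    line = f"{indent}{node['feature']} <= {node['threshold']:.3f} ({kind})\n"
    return (line
            + tree_to_str(node["left"], depth + 1)    # condition is true
            + tree_to_str(node["right"], depth + 1))  # condition is false


def predict(node, x):
    """Follow the left child when x[feature] <= threshold, else the right child."""
    while "value" not in node:
        go_left = x[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["value"]


# Hypothetical toy tree with made-up features and values.
toy_tree = {
    "feature": "x0", "threshold": 1.0,
    "left": {"value": 0.1},
    "right": {
        "feature": "x1", "threshold": 2.0,
        "left": {"value": 0.5},
        "right": {"value": 0.9},
    },
}
text = tree_to_str(toy_tree)
print(text)
```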
The output of print(figs.print_tree(X_train, y_train)) is shown below for reference:
Glucose concentration test <= 99.500 65/192 (33.85%)
    ΔRisk = 0.07 4/59 (6.78%)
    Glucose concentration test <= 168.500 61/133 (45.86%)
        #Pregnant <= 6.500 44/112 (39.29%)
            Body mass index <= 30.850 21/76 (27.63%)
                ΔRisk = 0.06 2/31 (6.45%)
                Blood pressure(mmHg) <= 67.000 19/45 (42.22%)
                    ΔRisk = 0.71 10/14 (71.43%)
                    ΔRisk = 0.30 9/31 (29.03%)
            ΔRisk = 0.64 23/36 (63.89%)
        Blood pressure(mmHg) <= 93.000 17/21 (80.95%)
            ΔRisk = 0.86 17/19 (89.47%)
            ΔRisk = -0.01 0/2 (0.0%)
Also see my stripped down notebook demonstrating the issue here.
Hi
I updated imodels through pip3 install --upgrade imodels to get the new BoostedRuleSetClassifier. When I run the notebook I get this:
cannot import name 'BoostedRulesClassifier' from 'imodels' (/Users/user/opt/anaconda3/lib/python3.7/site-packages/imodels/__init__.py)
pkg_resources.get_distribution('imodels').version
'0.2.5'
Any clue?
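One thing worth checking in a situation like this is whether the running interpreter sees the upgraded package at all, since a notebook kernel can use a different environment than the one pip3 upgraded. The sketch below uses only the standard library (Python 3.8+); describe_install is a hypothetical helper, and the class name is copied from the error message above.

```python
import sys
from importlib import import_module
from importlib.metadata import PackageNotFoundError, version


def describe_install(package, attribute):
    """Return (installed version or None, whether `attribute` exists in `package`)."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        installed = None
    try:
        has_attr = hasattr(import_module(package), attribute)
    except ImportError:
        has_attr = False
    return installed, has_attr


# Shows which interpreter is running and what it can see.
print(sys.executable)
print(describe_install("imodels", "BoostedRulesClassifier"))
```

If the version printed here is older than the one pip3 reports, the upgrade probably went into a different environment.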
Hi!
In the course of exploring the possibilities of RuleFit (via imodels), some questions came up; I would be happy to discuss them or get hints.
First I will describe the dataframe and the code, then show the results, and then ask my questions about them.
A data.csv has been generated in which the bulk of the cases (oil supply processes) last 4-6 hours (case_duration ~20.000), and there are abnormal cases that last 1.5 days (case_duration > 100.000).
The goal is to find out what causes this anomaly.
Here are the conditions that led to the high duration:
(this can be seen even by a human looking at the table)
In addition, there is an event log as input (only the columns that had at least a minimal impact are included; this was calculated separately earlier), there is a breakdown condition, and there is an understanding of what the output should look like.
Here is the Jupyter notebook with the code (change the extension to .ipynb), and here is data.csv again.
As you can see, the result does not quite meet expectations.
Questions:
The first thing that catches the eye is the 0.5 thresholds and the inequality signs pointing in different directions. Why do they appear in the output? After one-hot encoding, the algorithm sees only 0 and 1, a pure category. It is able to do without them, by the way. Example of a rule from a simple event log:
(everything is right here, there is nothing to find fault with).
Is there any way to tell the algorithm not to generate these numbers for category columns?
Although the expected rule involves 5 to 9 conditions, the algorithm returns only 3-4 conditions, with an incorrect answer and a large coefficient...
Based on the points above, are there any ideas on how to configure the algorithm so that it returns the correct set of rules?
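On the first question above: tree-based rule extraction typically splits a 0/1 column at a threshold of 0.5, so "col <= 0.5" means the category is absent and "col > 0.5" means it is present. If the thresholds are distracting, one option is to post-process the rule strings. The sketch below assumes rules are strings of conditions joined by " and ", which is an assumption about the output format; simplify_binary_rule and the column names are hypothetical.

```python
import re

# Matches conditions such as "col <= 0.5" or "col > 0.500".
_CONDITION = re.compile(r"^\s*(.+?)\s*(<=|>)\s*0\.50*\s*$")


def simplify_binary_rule(rule, binary_columns):
    """Rewrite 0.5-threshold conditions on known 0/1 columns as equalities."""
    parts = []
    for cond in rule.split(" and "):
        m = _CONDITION.match(cond)
        if m and m.group(1) in binary_columns:
            name, op = m.group(1), m.group(2)
            parts.append(f"{name} == 0" if op == "<=" else f"{name} == 1")
        else:
            # Leave numeric conditions and unknown columns untouched.
            parts.append(cond.strip())
    return " and ".join(parts)


rule = "pump_A <= 0.5 and pressure > 3.2 and valve_open > 0.5"
simplified = simplify_binary_rule(rule, {"pump_A", "valve_open"})
print(simplified)
```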
ConvergenceWarning message: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 0.000e+00, tolerance: 0.000e+00 - there are 2 points:
I looked at the inner code of HSTree, and for forest models it simply loops over the trees and applies the reg_param update to each one. This works, but it uses only 1 CPU core. When training these forest models, sklearn uses multithreading/multiprocessing to speed things up considerably. Have the authors considered adding parallel support to HSTree for forest models?
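Because the shrinkage of one tree does not depend on any other tree, the loop over trees can in principle be mapped over a pool of workers. The sketch below is not the HSTree code: the dict-based trees, shrink_tree and shrink_forest are hypothetical, and the update rule (each child's value is its parent's shrunk value plus the parent-to-child difference divided by 1 + reg_param / n_parent) is my reading of hierarchical shrinkage and should be treated as an assumption. A thread pool is used only to keep the sketch free of dependencies; for CPU-bound pure-Python work, a process pool or joblib would be needed to actually use several cores.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


def shrink_tree(tree, reg_param):
    """Return a copy of `tree` whose node values are hierarchically shrunk."""
    tree = copy.deepcopy(tree)

    def visit(node, parent_orig, parent_shrunk, parent_n):
        orig = node["value"]
        if parent_orig is None:  # the root value is left unchanged
            shrunk = orig
        else:
            shrunk = parent_shrunk + (orig - parent_orig) / (1 + reg_param / parent_n)
        node["value"] = shrunk
        for child in node.get("children", []):
            visit(child, orig, shrunk, node["n"])

    visit(tree, None, None, None)
    return tree


def shrink_forest(trees, reg_param, n_jobs=4):
    """Apply shrink_tree to every tree independently, using a pool of workers."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(lambda t: shrink_tree(t, reg_param), trees))


# Hypothetical toy forest: three identical stumps.
stump = {"value": 1.0, "n": 10,
         "children": [{"value": 0.0, "n": 5}, {"value": 2.0, "n": 5}]}
forest = [copy.deepcopy(stump) for _ in range(3)]
shrunk = shrink_forest(forest, reg_param=10)
print([child["value"] for child in shrunk[0]["children"]])
```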
Hello~
When I use RuleFitClassifier, it returns two identical rules with different coefficients. Is this because the internal structures do not aggregate duplicate rules? I have tried using Rulefit directly, and it does not seem to have this problem~
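Since the rules enter a linear model, two identical rules with different coefficients are equivalent to a single rule whose coefficient is their sum. As a possible post-processing step, the sketch below merges duplicates in a list of (rule, coefficient) pairs. The input format and merge_duplicate_rules are hypothetical and not part of imodels.

```python
from collections import defaultdict


def merge_duplicate_rules(rules):
    """Sum the coefficients of identical rules.

    rules: list of (rule_string, coef) pairs, with conditions joined by " and ".
    """
    merged = defaultdict(float)
    for rule, coef in rules:
        # Normalise the condition order so "a and b" equals "b and a".
        key = " and ".join(sorted(c.strip() for c in rule.split(" and ")))
        merged[key] += coef
    return dict(merged)


# Hypothetical rules: the first two are the same rule written in a different order.
rules = [
    ("x1 > 2 and x0 <= 1", 0.3),
    ("x0 <= 1 and x1 > 2", 0.2),
    ("x2 > 5", -0.1),
]
merged = merge_duplicate_rules(rules)
print(merged)
```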
Hi!
There are methods such as:
SkopeRulesClassifier
BoostedRulesClassifier
BayesianRuleSetClassifier
OptimalRuleListClassifier
BayesianRuleListClassifier
GreedyRuleListClassifier
FIGSClassifier
FIGSRegressor
etc
These more or less return a list of rules, but they lack a convenient method like "visualize()", which RuleFitClassifier and RuleFitRegressor have.
How can I get the rules they learn as a DataFrame?
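A generic way to approach this, assuming the fitted model exposes its rules in a rules_ attribute (the BayesianRuleListClassifier traceback earlier on this page sets self.rules_, but other classes may differ), is to collect the string form of each rule into rows. rules_to_rows and FakeRuleModel are hypothetical and stand in for a fitted imodels estimator.

```python
def rules_to_rows(model):
    """Collect str(rule) for every entry of model.rules_ into table rows."""
    rules = getattr(model, "rules_", None)
    if rules is None:
        raise AttributeError("model has no rules_ attribute; has it been fitted?")
    return [{"rule_index": i, "rule": str(r)} for i, r in enumerate(rules)]


class FakeRuleModel:
    """Hypothetical stand-in for a fitted rule-based estimator."""

    def __init__(self):
        self.rules_ = ["x0 <= 1.0 and x1 > 2.0", "x2 > 5.0"]


rows = rules_to_rows(FakeRuleModel())
for row in rows:
    print(row)
```

With pandas installed, pandas.DataFrame(rows) turns these rows into a DataFrame.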
I was trying to get a clear output set through running this example:
https://csinva.io/imodels/rule_set/skope_rules.html
and on your own dataset:
Hi @vissarion
I reviewed the manuscript as well as the repository.
This work provides a very important direction in interpretable machine learning, and supports such interpretable models without performance loss. Moreover, it follows the scikit-learn API style, so that it can be easily used by any practitioner.
My one issue, #43, has already been resolved, and the project complies with the conventions of open source projects as well as the JOSS guidelines.
For this reason, I recommend acceptance of this work.
Best,
Jungtaek.