jjbrophy47 / tree_influence

Influence Estimation for Gradient-Boosted Decision Trees

License: Apache License 2.0

Python 91.76% Makefile 0.04% Shell 5.38% Cython 2.82%

influence-estimation instance-attribution influential-examples instance-based boostin gradient-boosted-trees tracin influence-functions explainability interpretability

tree_influence's Issues

XGBoost parameters

When I try to run an explainer (I've tried LeafInfluence and BoostIn) on an XGBoost model, I get an error unless the model has reg_alpha=0, tree_method='hist', and scale_pos_weight=1. The errors all come from assert statements in parser_xgb. Are these assertions necessary? It would be good to be able to test models with different hyperparameters.
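Until the assertions are relaxed, a pre-flight check can report which settings would trip them before the explainer is fitted. The sketch below is based only on the three constraints named in this issue; `REQUIRED_PARAMS` and `find_unsupported_params` are hypothetical names and are not part of tree_influence.

```python
# Constraints as reported in this issue; the parser may enforce others.
REQUIRED_PARAMS = {"reg_alpha": 0, "tree_method": "hist", "scale_pos_weight": 1}


def find_unsupported_params(params):
    """Return {name: (actual, required)} for every setting that differs.

    A missing or None value is reported too, since it cannot be confirmed
    to match the required one.
    """
    return {
        name: (params.get(name), required)
        for name, required in REQUIRED_PARAMS.items()
        if params.get(name) != required
    }


example = {"reg_alpha": 0.5, "tree_method": "hist", "scale_pos_weight": 1}
print(find_unsupported_params(example))  # {'reg_alpha': (0.5, 0)}
```

With a scikit-learn style estimator, the dictionary could come from model.get_params(). Unset values may come back as None there, and this sketch flags them conservatively.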

ModuleNotFoundError: No module named 'tree_influence.explainers.parsers._tree32'

I tried running this:

from tree_influence.explainers import BoostIn

and got the error above. I don't know what could be causing it. Please help.

The only difference I can see is that _tree32 is a .c file, whereas the other modules are .py files. I already have Cython installed.
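Python cannot import a .c file directly. A Cython-generated C source has to be compiled into a platform-specific extension module (for example a .so or .pyd file) before the import can succeed, and having Cython installed does not do that by itself. The sketch below checks whether a compiled build of a module is present in a directory. `find_compiled_extension` is a hypothetical helper, and the way to rebuild the package depends on the project's build setup, which this thread does not show.

```python
import importlib.machinery
from pathlib import Path


def find_compiled_extension(package_dir, module_name):
    """Return the compiled extension files for module_name found in package_dir.

    Only file names of the form module_name + <extension suffix> count,
    so a bare .c or .pyx source file is not reported.
    """
    suffixes = importlib.machinery.EXTENSION_SUFFIXES
    return sorted(
        path
        for path in Path(package_dir).iterdir()
        if any(path.name == module_name + suffix for suffix in suffixes)
    )
```

Pointing this at the installed tree_influence/explainers/parsers directory with the module name "_tree32" would show whether only the C source is there. An empty result suggests the extension was never built for the current interpreter.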

[bug] training xgboost doesn't work with a DataFrame, only a NumPy array

Hello, and thank you for this package.
I ran into a problem while trying to use an XGBoost model that was trained on a DataFrame.
Here is my code:

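from xgboost import XGBClassifier
from tree_influence.explainers import BoostIn

# load_csv is the reporter's own helper (not shown); it presumably returns pandas objects
X_train, X_test, y_train, y_test = load_csv('X_train'), load_csv('X_test'), load_csv('y_train'), load_csv('y_test')
model = XGBClassifier(tree_method='hist')
# NumPy versions of the same data
X_train_val, y_train_val = X_train.values, y_train.values.squeeze()
X_test_val, y_test_val = X_test.values, y_test.values.squeeze()
model.fit(X_train, y_train)  # fitted on the DataFrame

# fit influence estimator
explainer = BoostIn().fit(model, X_train, y_train)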

This produces the following exception:

Traceback (most recent call last):
  File "/home/jupyter/owlytics-data-science/influence/influence.py", line 35, in <module>
    explainer = BoostIn().fit(model, X_train, y_train)
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/boostin.py", line 44, in fit
    super().fit(model, X, y)
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/base.py", line 31, in fit
    self.model_ = parse_model(model, X, y)
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/parsers/__init__.py", line 33, in parse_model
    trees, params = parse_xgb_ensemble(model)
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/parsers/parser_xgb.py", line 17, in parse_xgb_ensemble
    trees = np.array([_parse_xgb_tree(tree_str) for tree_str in string_data], dtype=np.dtype(object))
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/parsers/parser_xgb.py", line 17, in <listcomp>
    trees = np.array([_parse_xgb_tree(tree_str) for tree_str in string_data], dtype=np.dtype(object))
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/parsers/parser_xgb.py", line 88, in _parse_xgb_tree
    node_dict = _parse_line(line)
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/parsers/parser_xgb.py", line 190, in _parse_line
    res['feature'], res['threshold'] = _parse_decision_node_line(line)
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/parsers/parser_xgb.py", line 201, in _parse_decision_node_line
    feature_ndx = int(feature_str[1:])
ValueError: invalid literal for int() with base 10: 'ecent_beta_blockers_change'

However, training on X_train_val and y_train_val (NumPy arrays) works fine.
It would be great if you could support training with a DataFrame as well.
Thanks again!
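The last frame of the traceback explains the failure: the parser drops the first character of the feature name and converts the rest to an integer. That works for positional names such as f12, which XGBoost uses when the training data carries no column names, but not for a DataFrame column name. The sketch below reproduces that one line in isolation. `parse_feature_index` is a hypothetical stand-in for the parser code, and the column name is an assumption reconstructed from the error message.

```python
def parse_feature_index(feature_str):
    """Mimic the failing line: drop the first character, parse the rest as int."""
    return int(feature_str[1:])


print(parse_feature_index("f12"))  # 12

try:
    # Assumed column name, inferred from the error message in the traceback.
    parse_feature_index("recent_beta_blockers_change")
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: 'ecent_beta_blockers_change'
```

Fitting the model on X_train.values, as the reporter notes, avoids named features and therefore sidesteps the failing conversion.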

Error when fitting the estimator

Hello,

I was using your implementation of BoostIn on my own data and ran into an error, which I first assumed was caused by some inconsistency in my features. However, I get the very same error when fitting it to the iris data from the sklearn package (as used in the example document in your repository):

180 # compute leaf derivative w.r.t. each train example in leaf_docs
181 numerator = g[leaf_docs, class_idx] + leaf_vals[leaf_idx] * h[leaf_docs, class_idx] # (no. docs,)
--> 182 denominator = np.sum(h[leaf_docs, class_idx]) + l2_leaf_reg
183 leaf_dvs[leaf_docs, boost_idx, class_idx] = numerator / denominator * lr # (no. docs,)
185 # update approximation

TypeError: unsupported operand type(s) for +: 'float' and 'NoneType'

Could you please give me some guidance as to what might be going wrong? For context, I am using an XGBoost model here, and I must set scale_pos_weight=1 to avoid an assertion error. It would be nice if this could be changed as well. Thank you!
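The error message indicates that l2_leaf_reg is None when the denominator is computed. One plausible cause, stated here as an assumption rather than a confirmed diagnosis, is that the model was created without an explicit reg_lambda and the parser reads the unset value as None. The sketch below reproduces the failure with plain numbers and shows a defensive fallback. `resolve_l2_leaf_reg` is a hypothetical helper, and the default of 1.0 mirrors XGBoost's documented default for L2 regularization.

```python
def resolve_l2_leaf_reg(reg_lambda, default=1.0):
    """Fall back to a default when the regularization setting is unset (None)."""
    return default if reg_lambda is None else float(reg_lambda)


hessian_sum = 2.5  # stand-in for np.sum(h[leaf_docs, class_idx])

try:
    hessian_sum + None  # what the failing line effectively computes
except TypeError as err:
    print(err)  # unsupported operand type(s) for +: 'float' and 'NoneType'

denominator = hessian_sum + resolve_l2_leaf_reg(None)
print(denominator)  # 3.5
```

If that assumption holds, passing reg_lambda explicitly when constructing the model may serve as a workaround until the parser handles unset values.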
