
jjbrophy47 / tree_influence

Stars: 23, watchers: 4, forks: 9, size: 5.51 MB

Influence Estimation for Gradient-Boosted Decision Trees

License: Apache License 2.0

Languages: Python 91.76%, Makefile 0.04%, Shell 5.38%, Cython 2.82%
Topics: influence-estimation, instance-attribution, influential-examples, instance-based, boostin, gradient-boosted-trees, tracin, influence-functions, explainability, interpretability

tree_influence's Issues

XGBoost parameters

When trying to run an explainer (I've tried LeafInfluence and BoostIn) on an XGBoost model, I get an error unless the model uses reg_alpha=0, tree_method='hist', and scale_pos_weight=1. The errors all come from assert statements in parser_xgb. Are these assertions necessary? It would be good to be able to test models with different hyperparameters.
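The constraints described above can be checked before an explainer is fit, so a mismatch is reported by name rather than as a bare assertion failure. This is a minimal sketch: `check_xgb_constraints` and the `REQUIRED` table are hypothetical names, and the required values are only the three reported in this issue, not a complete list of what parser_xgb asserts.

```python
# Hypothetical pre-flight check. The required values are the ones reported
# in this issue, not an exhaustive list of parser_xgb's assertions.
REQUIRED = {
    "reg_alpha": 0,
    "tree_method": "hist",
    "scale_pos_weight": 1,
}


def check_xgb_constraints(params: dict) -> list:
    """Return (name, actual, expected) for every parameter that differs."""
    violations = []
    for name, expected in REQUIRED.items():
        actual = params.get(name)  # None if the parameter was never set
        if actual != expected:
            violations.append((name, actual, expected))
    return violations


if __name__ == "__main__":
    # With a real XGBClassifier you would pass model.get_params() instead.
    params = {"reg_alpha": 0.5, "tree_method": "hist", "scale_pos_weight": None}
    for name, actual, expected in check_xgb_constraints(params):
        print(f"{name}={actual!r}, expected {expected!r}")
```

Depending on the XGBoost version, parameters left at their defaults may show up as None in `get_params()`; this sketch treats None as a violation, which matches the reports here that the values must be passed explicitly.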

[bug] Training XGBoost doesn't work with a DataFrame, only with a NumPy array

Hello, and thank you for this package.
I came across a problem while trying to use an XGBoost model that was trained on a DataFrame.
Here is my code:

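from xgboost import XGBClassifier
from tree_influence.explainers import BoostIn

# load_csv is my own helper; it returns pandas DataFrames
X_train, X_test, y_train, y_test = load_csv('X_train'), load_csv('X_test'), load_csv('y_train'), load_csv('y_test')

# NumPy copies of the same data (fitting on these works)
X_train_val, y_train_val = X_train.values, y_train.values.squeeze()
X_test_val, y_test_val = X_test.values, y_test.values.squeeze()

# fitting on the DataFrames triggers the error below
model = XGBClassifier(tree_method='hist')
model.fit(X_train, y_train)

# fit influence estimator
explainer = BoostIn().fit(model, X_train, y_train)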

This produces the following exception:

Traceback (most recent call last):
  File "/home/jupyter/owlytics-data-science/influence/influence.py", line 35, in <module>
    explainer = BoostIn().fit(model, X_train, y_train)
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/boostin.py", line 44, in fit
    super().fit(model, X, y)
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/base.py", line 31, in fit
    self.model_ = parse_model(model, X, y)
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/parsers/__init__.py", line 33, in parse_model
    trees, params = parse_xgb_ensemble(model)
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/parsers/parser_xgb.py", line 17, in parse_xgb_ensemble
    trees = np.array([_parse_xgb_tree(tree_str) for tree_str in string_data], dtype=np.dtype(object))
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/parsers/parser_xgb.py", line 17, in <listcomp>
    trees = np.array([_parse_xgb_tree(tree_str) for tree_str in string_data], dtype=np.dtype(object))
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/parsers/parser_xgb.py", line 88, in _parse_xgb_tree
    node_dict = _parse_line(line)
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/parsers/parser_xgb.py", line 190, in _parse_line
    res['feature'], res['threshold'] = _parse_decision_node_line(line)
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/parsers/parser_xgb.py", line 201, in _parse_decision_node_line
    feature_ndx = int(feature_str[1:])
ValueError: invalid literal for int() with base 10: 'ecent_beta_blockers_change'

However, training on X_train_val and y_train_val (NumPy arrays) works perfectly well.
It would be great if you could support models trained on DataFrames as well.
Thanks again!
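The traceback points to a likely cause: when the model is fit on a DataFrame, XGBoost's text dump names split features by column name instead of `f0`, `f1`, ..., so `int(feature_str[1:])` strips the first character of the column name and then fails on the rest (`'ecent_beta_blockers_change'`). The sketch below is a hypothetical name-aware version of that parsing step, not the package's code; the helper name `parse_decision_node` is made up, and the dump-line format `0:[f2<2.45] yes=1,no=2,missing=1` is an assumption.

```python
import re

# Matches the split condition of a decision-node line in an XGBoost text dump,
# e.g. "0:[f2<2.45] yes=1,no=2,missing=1" (format assumed).
_SPLIT_RE = re.compile(r"\[(?P<feature>[^<\]]+)<(?P<threshold>[^\]]+)\]")


def parse_decision_node(line, feature_names=None):
    """Return (feature_index, threshold) for one decision-node line.

    Hypothetical helper: handles both the default 'f<index>' names and
    DataFrame column names, given the list of training columns.
    """
    match = _SPLIT_RE.search(line)
    if match is None:
        raise ValueError(f"not a decision node: {line!r}")
    feature_str = match.group("feature")
    threshold = float(match.group("threshold"))

    if feature_names is not None and feature_str in feature_names:
        # model was trained on a DataFrame: map the column name to its index
        return feature_names.index(feature_str), threshold
    if feature_str.startswith("f") and feature_str[1:].isdigit():
        # default naming used for NumPy input: 'f<index>'
        return int(feature_str[1:]), threshold
    raise ValueError(f"unknown feature {feature_str!r}; pass feature_names")
```

Column names are checked first so that a column literally called `f3` still maps to its real position. For a DataFrame, the names would come from `list(X_train.columns)`; until something like this is supported, the workaround found above (fitting on `X_train.values`) avoids named features entirely.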

Error when fitting the estimator

Hello,

I was using your implementation of BoostIn on my own data and ran into an error. I first thought it might be caused by some inconsistency in my features. However, when I fit it to the iris dataset from scikit-learn (as in the example in your repository), I got the very same error:

180 # compute leaf derivative w.r.t. each train example in leaf_docs
181 numerator = g[leaf_docs, class_idx] + leaf_vals[leaf_idx] * h[leaf_docs, class_idx] # (no. docs,)
--> 182 denominator = np.sum(h[leaf_docs, class_idx]) + l2_leaf_reg
183 leaf_dvs[leaf_docs, boost_idx, class_idx] = numerator / denominator * lr # (no. docs,)
185 # update approximation

TypeError: unsupported operand type(s) for +: 'float' and 'NoneType'

Could you give me some guidance on what might be going wrong? For context, I am using an XGBoost model, and I have to set scale_pos_weight=1 to avoid an assertion error. It would be nice if that could be changed as well. Thank you!
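The failing line adds `l2_leaf_reg` to a float, so that value arrived as None. One plausible explanation, which is an assumption and not confirmed in this issue, is that the parser copies hyperparameters from the scikit-learn wrapper, where parameters that were never set can be reported as None; that would also explain why scale_pos_weight=1 has to be passed explicitly. The NumPy sketch below reproduces the leaf-derivative step from the traceback for a single leaf and class, and falls back to XGBoost's documented default L2 regularization (lambda = 1) when the value is None. The function name `leaf_derivatives` and its argument layout are hypothetical.

```python
import numpy as np

XGB_DEFAULT_REG_LAMBDA = 1.0  # XGBoost's documented default for lambda / reg_lambda


def leaf_derivatives(g, h, leaf_docs, leaf_val, lr, l2_leaf_reg=None):
    """Derivative of one leaf's value w.r.t. each training example in it.

    Mirrors the lines shown in the traceback for a single class:
    (g_i + leaf_val * h_i) / (sum of h over the leaf + l2_leaf_reg) * lr
    """
    if l2_leaf_reg is None:
        # hypothetical guard: use XGBoost's default instead of raising TypeError
        l2_leaf_reg = XGB_DEFAULT_REG_LAMBDA
    numerator = g[leaf_docs] + leaf_val * h[leaf_docs]  # (no. docs,)
    denominator = np.sum(h[leaf_docs]) + l2_leaf_reg
    return numerator / denominator * lr  # (no. docs,)
```

On the user side, passing the values explicitly, e.g. `XGBClassifier(tree_method='hist', reg_lambda=1, reg_alpha=0, scale_pos_weight=1)`, may avoid both the TypeError and the assertion error, again assuming the parser reads these values from the wrapper.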