trusted-ai / aix360
Interpretability and explainability of data and machine learning models
Home Page: https://aix360.res.ibm.com/
License: Apache License 2.0
For the contrastive algorithm, I can only find constructors for the base and Keras models. Will there be any support for TensorFlow or PyTorch models? Thank you.
Certain models, such as Keras sequential models, have predict methods that behave like predict_proba, i.e. they return predicted class probabilities rather than a predicted class as the predict method of scikit-learn classifiers does. The current metrics implementation uses the predict method to get the predicted class, which throws an error for Keras models.
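A defensive wrapper that the metrics code could use (a sketch, not the current aix360 implementation; predict_classes is a name invented here) would detect a probability matrix and reduce it with argmax:

```python
import numpy as np

def predict_classes(model, X):
    """Return hard class labels whether ``model.predict`` outputs
    probabilities (Keras-style) or labels (scikit-learn-style)."""
    out = np.asarray(model.predict(X))
    if out.ndim == 2 and out.shape[1] > 1:
        # probability matrix -> most likely class per row
        return out.argmax(axis=1)
    return out.ravel()
```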
In the HELOC example you mentioned that Protodash can be applied to time series, but how would one go about doing that? Do you perhaps have an example, or can you offer some advice?
Thanks for this. I'm looking forward to checking it out.
Is there a conda-forge release? If not, is there any plan for one?
Hi there,
I copied the code verbatim (no modifications at all) from the BRCG part of the "Credit Approval Tutorial" and ran into errors. I'm quite sure the dataset was loaded correctly, as I also trained a scikit-learn Decision Tree Classifier on it in the same notebook with no problem.
Can someone help me with this issue? Am I missing something or is it an internal problem?
Thanks in advance!
Here is the code and the output.
It was run on Google Colab, with pandas 1.1.2 and the latest aix360 release, 0.2.0.
import warnings
warnings.filterwarnings('ignore')
# Load FICO HELOC data with special values converted to np.nan
from aix360.datasets.heloc_dataset import HELOCDataset, nan_preprocessing
data = HELOCDataset(custom_preprocessing=nan_preprocessing).data()
# Separate target variable
y = data.pop('RiskPerformance')
# Split data into training and test sets using fixed random seed
from sklearn.model_selection import train_test_split
dfTrain, dfTest, yTrain, yTest = train_test_split(data, y, random_state=0, stratify=y)
dfTrain.head().transpose()
# Binarize data and also return standardized ordinal features
from aix360.algorithms.rbm import FeatureBinarizer
fb = FeatureBinarizer(negations=True, returnOrd=True)
dfTrain, dfTrainStd = fb.fit_transform(dfTrain)
dfTest, dfTestStd = fb.transform(dfTest)
dfTrain['ExternalRiskEstimate'].head()
# Instantiate BRCG with small complexity penalty and large beam search width
from aix360.algorithms.rbm import BooleanRuleCG
br = BooleanRuleCG(lambda0=1e-3, lambda1=1e-3, CNF=True)
# Train, print, and evaluate model
br.fit(dfTrain, yTrain)
from sklearn.metrics import accuracy_score
print('Training accuracy:', accuracy_score(yTrain, br.predict(dfTrain)))
print('Test accuracy:', accuracy_score(yTest, br.predict(dfTest)))
print('Predict Y=0 if ANY of the following rules are satisfied, otherwise Y=1:')
print(br.explain()['rules'])
Learning CNF rule with complexity parameters lambda0=0.001, lambda1=0.001
Initial LP solved
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/pandas/core/series.py in __setitem__(self, key, value)
1001 try:
-> 1002 self._set_with_engine(key, value)
1003 except (KeyError, ValueError):
/usr/local/lib/python3.6/dist-packages/pandas/core/series.py in _set_with_engine(self, key, value)
1032 # fails with AttributeError for IntervalIndex
-> 1033 loc = self.index._engine.get_loc(key)
1034 validate_numeric_casting(self.dtype, value)
pandas/_libs/index.pyx in pandas._libs.index.BaseMultiIndexCodesEngine.get_loc()
KeyError: 'ExternalRiskEstimate'
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-98-8d81fbd6c0e1> in <module>()
26
27 # Train, print, and evaluate model
---> 28 br.fit(dfTrain, yTrain)
29 from sklearn.metrics import accuracy_score
30 print('Training accuracy:', accuracy_score(yTrain, br.predict(dfTrain)))
/usr/local/lib/python3.6/dist-packages/aix360/algorithms/rbm/boolean_rule_cg.py in fit(self, X, y)
118 UB = min(UB.min(), 0)
119 v, zNew, Anew = beam_search(r, X, self.lambda0, self.lambda1,
--> 120 K=self.K, UB=UB, D=self.D, B=self.B, eps=self.eps)
121
122 while (v < -self.eps).any() and (self.it < self.iterMax):
/usr/local/lib/python3.6/dist-packages/aix360/algorithms/rbm/beam_search.py in beam_search(r, X, lambda0, lambda1, K, UB, D, B, wLB, eps, stopEarly)
285 if i[1] == '<=':
286 thresh = Xp[i[0]].columns.get_level_values(1).to_series().replace('NaN', np.nan)
--> 287 colKeep[i[0]] = (Xp[i[0]].columns.get_level_values(0) == '>') & (thresh < i[2])
288 elif i[1] == '>':
289 thresh = Xp[i[0]].columns.get_level_values(1).to_series().replace('NaN', np.nan)
/usr/local/lib/python3.6/dist-packages/pandas/core/series.py in __setitem__(self, key, value)
1008 else:
1009 # GH#12862 adding an new key to the Series
-> 1010 self.loc[key] = value
1011
1012 except TypeError as e:
/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in __setitem__(self, key, value)
668
669 iloc = self if self.name == "iloc" else self.obj.iloc
--> 670 iloc._setitem_with_indexer(indexer, value)
671
672 def _validate_key(self, key, axis: int):
/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
1790 # setting for extensionarrays that store dicts. Need to decide
1791 # if it's worth supporting that.
-> 1792 value = self._align_series(indexer, Series(value))
1793
1794 elif isinstance(value, ABCDataFrame):
/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in _align_series(self, indexer, ser, multiindex_indexer)
1909 # series, so need to broadcast (see GH5206)
1910 if sum_aligners == self.ndim and all(is_sequence(_) for _ in indexer):
-> 1911 ser = ser.reindex(obj.axes[0][indexer[0]], copy=True)._values
1912
1913 # single indexer
/usr/local/lib/python3.6/dist-packages/pandas/core/series.py in reindex(self, index, **kwargs)
4397 )
4398 def reindex(self, index=None, **kwargs):
-> 4399 return super().reindex(index=index, **kwargs)
4400
4401 def drop(
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in reindex(self, *args, **kwargs)
4457 # perform the reindex on the axes
4458 return self._reindex_axes(
-> 4459 axes, level, limit, tolerance, method, fill_value, copy
4460 ).__finalize__(self, method="reindex")
4461
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
4480 fill_value=fill_value,
4481 copy=copy,
-> 4482 allow_dups=False,
4483 )
4484
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
4525 fill_value=fill_value,
4526 allow_dups=allow_dups,
-> 4527 copy=copy,
4528 )
4529 # If we've made a copy once, no need to make another one
/usr/local/lib/python3.6/dist-packages/pandas/core/internals/managers.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy, consolidate)
1274 # some axes don't allow reindexing with dups
1275 if not allow_dups:
-> 1276 self.axes[axis]._can_reindex(indexer)
1277
1278 if axis >= self.ndim:
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in _can_reindex(self, indexer)
3283 # trying to reindex on an axis with duplicates
3284 if not self.is_unique and len(indexer):
-> 3285 raise ValueError("cannot reindex from a duplicate axis")
3286
3287 def reindex(self, target, method=None, level=None, limit=None, tolerance=None):
ValueError: cannot reindex from a duplicate axis
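The ValueError at the bottom of the trace suggests the binarized frame ended up with duplicate column labels. As a hedged workaround (not an official fix; drop_duplicate_columns is a name invented here), one could drop exact duplicates before calling fit:

```python
import pandas as pd

def drop_duplicate_columns(df):
    """Drop columns with duplicated labels, keeping the first occurrence,
    so that pandas reindexing inside beam_search cannot hit a duplicate axis."""
    return df.loc[:, ~df.columns.duplicated()]
```

After binarizing, `dfTrain = drop_duplicate_columns(dfTrain)` (and the same for dfTest) would guarantee unique labels before `br.fit(dfTrain, yTrain)`.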
When executing the second cell in the Health and Lifestyle Survey Questions Tutorial, on this line:
nhanes = CDCDataset()
I get this error:
Downloading file ACQ_H.XPT
Downloading file ALQ_H.XPT
Downloading file BPQ_H.XPT
Downloading file CDQ_H.XPT
Downloading file CFQ_H.XPT
Downloading file CBQ_H.XPT
Downloading file CKQ_H.XPT
Downloading file HSQ_H.XPT
Downloading file DEQ_H.XPT
Downloading file DIQ_H.XPT
Downloading file DBQ_H.XPT
Downloading file DLQ_H.XPT
Downloading file DUQ_H.XPT
Downloading file ECQ_H.XPT
Downloading file FSQ_H.XPT
Downloading file HIQ_H.XPT
Downloading file HEQ_H.XPT
Downloading file HUQ_H.XPT
Downloading file HOQ_H.XPT
Downloading file IMQ_H.XPT
Downloading file INQ_H.XPT
Downloading file KIQ_U_H.XPT
Downloading file MCQ_H.XPT
Downloading file DPQ_H.XPT
Downloading file OCQ_H.XPT
Downloading file OHQ_H.XPT
Downloading file OSQ_H.XPT
Downloading file PAQ_H.XPT
Downloading file PFQ_H.XPT
Downloading file RXQASA_H.XPT
Downloading file RHQ_H.XPT
Downloading file SXQ_H.XPT
Downloading file SLQ_H.XPT
Downloading file SMQFAM_H.XPT
Downloading file SMQRTU_H.XPT
Downloading file SMQSHS_H.XPT
Downloading file CSQ_H.XPT
Downloading file VTQ_H.XPT
Downloading file WHQ_H.XPT
Downloading file WHQMEC_H.XPT
converting Acculturation : /opt/conda/lib/python3.7/site-packages/aix360/datasets/../data/cdc_data/ACQ_H.XPT to /opt/conda/lib/python3.7/site-packages/aix360/datasets/../data/cdc_data/csv/ACQ_H.csv
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-2-c372c6c55e63> in <module>
----> 1 nhanes = CDCDataset()
2 nhanes_files = nhanes.get_csv_file_names()
3 (nhanesinfo, _, _) = nhanes._cdc_files_info()
/opt/conda/lib/python3.7/site-packages/aix360/datasets/cdc_dataset.py in __init__(self, custom_preprocessing, dirpath)
49 sys.exit(1)
50
---> 51 self._convert_xpt_to_csv()
52 #if custom_preprocessing:
53 # self._data = custom_preprocessing(df)
/opt/conda/lib/python3.7/site-packages/aix360/datasets/cdc_dataset.py in _convert_xpt_to_csv(self)
133 with open(xptfile, 'rb') as in_xpt:
134 with open(csvfile, 'w',newline='') as out_csv:
--> 135 reader = xport.Reader(in_xpt)
136 writer = csv.writer(out_csv, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
137 writer.writerow(reader.fields)
/opt/conda/lib/python3.7/site-packages/xport/__init__.py in __init__(self, fp)
768
769 def __init__(self, fp):
--> 770 self.dataset = to_dataframe(fp)
771
772 def __iter__(self):
/opt/conda/lib/python3.7/site-packages/xport/__init__.py in to_dataframe(fp)
749 from xport.v56 import load
750 warnings.warn('Please use ``xport.v56.load`` in the future', DeprecationWarning)
--> 751 library = load(fp)
752 dataset = next(iter(library.values()))
753 return dataset
/opt/conda/lib/python3.7/site-packages/xport/v56.py in load(fp)
898 except UnicodeDecodeError:
899 raise TypeError(f'Expected a BufferedReader in bytes-mode, got {type(fp).__name__}')
--> 900 return loads(bytestring)
901
902
/opt/conda/lib/python3.7/site-packages/xport/v56.py in loads(bytestring)
909 >>> library = loads(bytestring)
910 """
--> 911 return Library.from_bytes(bytestring)
912
913
/opt/conda/lib/python3.7/site-packages/xport/v56.py in from_bytes(cls, bytestring, member_header_re)
703 modified=strptime(mo['modified']),
704 sas_os=mo['os'].strip(b'\x00').decode('ISO-8859-1').strip(),
--> 705 sas_version=mo['version'].strip(b'\x00').decode('ISO-8859-1').strip(),
706 )
707 LOG.info(f'Decoded {self}')
/opt/conda/lib/python3.7/site-packages/xport/__init__.py in __init__(self, members, created, modified, sas_os, sas_version)
587 self[name] = dataset # Use __setitem__ to validate metadata.
588 else:
--> 589 for dataset in members:
590 if dataset.name in self:
591 warnings.warn(f'More than one dataset named {dataset.name!r}')
/opt/conda/lib/python3.7/site-packages/xport/v56.py in from_bytes(cls, bytestring, pattern)
605 head = cls.from_header(header)
606 data = Member(pd.DataFrame.from_records(observations, columns=list(header)))
--> 607 data.copy_metadata(head)
608 LOG.info(f'Decoded XPORT dataset {data.name!r}')
609 LOG.debug('%s', data)
/opt/conda/lib/python3.7/site-packages/xport/__init__.py in copy_metadata(self, other)
410 object.__setattr__(self, name, getattr(other, name, None))
411 if isinstance(other, (Dataset, Mapping)):
--> 412 for k, v in self.items():
413 try:
414 v.copy_metadata(other[k])
~/.local/lib/python3.7/site-packages/pandas/core/frame.py in items(self)
1015 if self.columns.is_unique and hasattr(self, "_item_cache"):
1016 for k in self.columns:
-> 1017 yield k, self._get_item_cache(k)
1018 else:
1019 for i, k in enumerate(self.columns):
~/.local/lib/python3.7/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
3792 loc = self.columns.get_loc(item)
3793 values = self._mgr.iget(loc)
-> 3794 res = self._box_col_values(values, loc).__finalize__(self)
3795
3796 cache[item] = res
~/.local/lib/python3.7/site-packages/pandas/core/frame.py in _box_col_values(self, values, loc)
3312 name = self.columns[loc]
3313 klass = self._constructor_sliced
-> 3314 return klass(values, index=self.index, name=name, fastpath=True)
3315
3316 # ----------------------------------------------------------------------
/opt/conda/lib/python3.7/site-packages/xport/__init__.py in __init__(self, data, index, dtype, name, copy, fastpath, label, vtype, width, format, informat, **kwds)
308 for name, value in metadata.items():
309 setattr(self, name, getattr(self, name, value))
--> 310 LOG.debug(f'Initialized {self}')
311
312 def __finalize__(self, other, method=None, **kwds):
/opt/conda/lib/python3.7/site-packages/xport/__init__.py in __repr__(self)
274 metadata = {name: getattr(self, name) for name in metadata}
275 metadata = (f'{name}: {value}' for name, value in metadata.items() if value is not None)
--> 276 return f'{type(self).__name__}\n{super().__repr__()}\n{", ".join(metadata)}'
277
278 def __init__(
~/.local/lib/python3.7/site-packages/pandas/core/series.py in __repr__(self)
1305 min_rows=min_rows,
1306 max_rows=max_rows,
-> 1307 length=show_dimensions,
1308 )
1309 result = buf.getvalue()
~/.local/lib/python3.7/site-packages/pandas/core/series.py in to_string(self, buf, na_rep, float_format, header, index, length, dtype, name, max_rows, min_rows)
1368 float_format=float_format,
1369 min_rows=min_rows,
-> 1370 max_rows=max_rows,
1371 )
1372 result = formatter.to_string()
~/.local/lib/python3.7/site-packages/pandas/io/formats/format.py in __init__(self, series, buf, length, header, index, na_rep, name, float_format, dtype, max_rows, min_rows)
270 self.adj = get_adjustment()
271
--> 272 self._chk_truncate()
273
274 def _chk_truncate(self) -> None:
~/.local/lib/python3.7/site-packages/pandas/io/formats/format.py in _chk_truncate(self)
292 else:
293 row_num = max_rows // 2
--> 294 series = concat((series.iloc[:row_num], series.iloc[-row_num:]))
295 self.tr_row_num = row_num
296 else:
~/.local/lib/python3.7/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
293 verify_integrity=verify_integrity,
294 copy=copy,
--> 295 sort=sort,
296 )
297
~/.local/lib/python3.7/site-packages/pandas/core/reshape/concat.py in __init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
404 # Standardize axis parameter to int
405 if isinstance(sample, ABCSeries):
--> 406 axis = sample._constructor_expanddim._get_axis_number(axis)
407 else:
408 axis = sample._get_axis_number(axis)
/opt/conda/lib/python3.7/site-packages/xport/__init__.py in _constructor_expanddim(self)
338 For example, transforming a series into a dataframe.
339 """
--> 340 raise NotImplementedError("Can't copy SAS variable metadata to dataframe")
341
342 @property
NotImplementedError: Can't copy SAS variable metadata to dataframe
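The failure originates inside the xport package, not aix360 itself. Until the dependency is fixed, one possible workaround (a sketch, not aix360's actual conversion code; the paths are placeholders) is to convert the XPT files with pandas' own SAS reader, which supports the XPORT format directly:

```python
import pandas as pd

def xpt_to_csv(xpt_path, csv_path):
    # pandas.read_sas parses SAS XPORT (transport) files natively,
    # bypassing the xport package where the NotImplementedError is raised
    df = pd.read_sas(xpt_path, format='xport')
    df.to_csv(csv_path, index=False)
```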
Here are the package versions I have installed:
absl-py==0.11.0
aix360 @ file:///home/jovyan/AIX360-master
alembic==1.4.2
analytics-python==1.2.9
asgiref==3.3.1
astor==0.8.1
async-generator==1.10
attrs==20.3.0
backcall @ file:///home/conda/feedstock_root/build_artifacts/backcall_1592338393461/work
bamboolib==1.22.2
beautifulsoup4 @ file:///home/conda/feedstock_root/build_artifacts/beautifulsoup4_1589761456552/work
bleach @ file:///home/conda/feedstock_root/build_artifacts/bleach_1588608214987/work
blinker==1.4
bokeh @ file:///home/conda/feedstock_root/build_artifacts/bokeh_1592227515025/work
Bottleneck==1.3.2
Brotli==1.0.9
brotlipy==0.7.0
certifi==2020.4.5.2
certipy==0.1.3
cffi==1.14.0
chardet==3.0.4
click==7.1.2
cloudpickle @ file:///home/conda/feedstock_root/build_artifacts/cloudpickle_1588164361239/work
conda==4.8.2
conda-package-handling==1.6.0
cryptography==2.9.2
cvxopt==1.2.6
cvxpy==1.1.11
cycler==0.10.0
Cython @ file:///home/conda/feedstock_root/build_artifacts/cython_1591799499719/work
cytoolz==0.10.1
dash==1.19.0
dash-core-components==1.15.0
dash-cytoscape==0.2.0
dash-html-components==1.1.2
dash-renderer==1.9.0
dash-table==4.11.2
dask==2.15.0
decorator==4.4.2
defusedxml==0.6.0
dill @ file:///home/conda/feedstock_root/build_artifacts/dill_1592315758554/work
distributed @ file:///home/conda/feedstock_root/build_artifacts/distributed_1591409248443/work
Django==3.1.7
docutils==0.16
ecos==2.0.7.post1
entrypoints==0.3
fastcache==1.1.0
Flask==1.1.2
Flask-Compress==1.9.0
fsspec @ file:///home/conda/feedstock_root/build_artifacts/fsspec_1589989738418/work
future==0.18.2
gast==0.4.0
geographiclib==1.50
geopy==2.1.0
gevent==21.1.2
gmpy2==2.1.0b1
google-pasta==0.2.0
graphviz==0.16
greenlet==1.0.0
grpcio==1.36.1
h5py==2.10.0
HeapDict==1.0.1
idna==2.9
image==1.5.33
imageio==2.8.0
importlib-metadata @ file:///home/conda/feedstock_root/build_artifacts/importlib-metadata_1591451751445/work
interpret==0.2.4
interpret-core==0.2.4
ipykernel @ file:///home/conda/feedstock_root/build_artifacts/ipykernel_1590020200501/work/dist/ipykernel-5.3.0-py3-none-any.whl
ipympl==0.5.6
ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1590796899444/work
ipython-genutils==0.2.0
ipywidgets==7.5.1
itsdangerous==1.1.0
jedi==0.17.0
Jinja2==2.11.2
joblib @ file:///home/conda/feedstock_root/build_artifacts/joblib_1589812474002/work
json5 @ file:///home/conda/feedstock_root/build_artifacts/json5_1591810480056/work
jsonschema==3.2.0
jupyter-client==6.1.3
jupyter-core==4.6.3
jupyter-telemetry==0.0.5
jupyterhub==1.1.0
jupyterlab==2.1.3
jupyterlab-server @ file:///home/conda/feedstock_root/build_artifacts/jupyterlab_server_1590229434073/work
Keras==2.3.1
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.2
kiwisolver==1.2.0
lime==0.1.1.37
llvmlite==0.31.0
locket==0.2.0
Mako==1.1.0
Markdown==3.3.4
MarkupSafe==1.1.1
matplotlib==3.2.1
mistune==0.8.4
mock @ file:///home/conda/feedstock_root/build_artifacts/mock_1588618847833/work
mpmath==1.1.0
msgpack==1.0.0
nbconvert==5.6.1
nbformat==5.0.6
networkx==2.4
notebook @ file:///home/conda/feedstock_root/build_artifacts/notebook_1588887226267/work
numba==0.48.0
numexpr==2.7.1
numpy @ file:///home/conda/feedstock_root/build_artifacts/numpy_1591485215893/work
oauthlib==3.0.1
olefile==0.46
osqp==0.6.2.post0
packaging @ file:///home/conda/feedstock_root/build_artifacts/packaging_1589925210001/work
pamela==1.0.0
pandas==1.2.3
pandocfilters==1.4.2
parso==0.7.0
partd==1.1.0
patsy==0.5.1
PDPbox==0.2.0
pexpect==4.8.0
pickleshare==0.7.5
Pillow==7.1.2
plotly==4.14.3
ppscore==1.2.0
progressbar==2.5
prometheus-client @ file:///home/conda/feedstock_root/build_artifacts/prometheus_client_1590412252446/work
prompt-toolkit==3.0.5
protobuf==3.11.4
psutil==5.7.0
ptyprocess==0.6.0
pycosat==0.6.3
pycparser==2.20
pycurl==7.43.0.5
Pygments==2.6.1
PyJWT==1.7.1
pyOpenSSL==19.1.0
pyparsing==2.4.7
pyrsistent==0.16.0
PySocks==1.7.1
python-dateutil==2.8.1
python-editor==1.0.4
python-json-logger==0.1.11
pytz==2020.1
PyWavelets==1.1.1
PyYAML==5.3.1
pyzmq==19.0.1
qdldl==0.1.5.post0
qgrid==1.3.1
qpsolvers==1.5
quadprog==0.1.8
requests @ file:///home/conda/feedstock_root/build_artifacts/requests_1592425495151/work
retrying==1.3.3
rpy2==3.1.0
ruamel-yaml==0.15.80
ruamel.yaml.clib==0.2.0
SALib==1.3.12
scikit-image==0.16.2
scikit-learn==0.22.2.post1
scipy==1.4.1
scs==2.1.2
seaborn @ file:///home/conda/feedstock_root/build_artifacts/seaborn-base_1591878760859/work
Send2Trash==1.5.0
shap==0.34.0
simplegeneric==0.8.1
six @ file:///home/conda/feedstock_root/build_artifacts/six_1590081179328/work
skope-rules==1.0.1
sortedcontainers @ file:///home/conda/feedstock_root/build_artifacts/sortedcontainers_1591999956871/work
soupsieve @ file:///home/conda/feedstock_root/build_artifacts/soupsieve_1589778966114/work
SQLAlchemy @ file:///home/conda/feedstock_root/build_artifacts/sqlalchemy_1589421717839/work
sqlparse==0.4.1
statsmodels @ file:///home/conda/feedstock_root/build_artifacts/statsmodels_1591963256838/work
sympy==1.5.1
tables==3.6.1
tblib==1.6.0
tensorboard==1.14.0
tensorflow==1.14.0
tensorflow-estimator==1.14.0
termcolor==1.1.0
terminado==0.8.3
testpath==0.4.4
toml==0.10.2
toolz==0.10.0
torch==1.8.0
torchvision==0.9.0
tornado==6.0.4
tqdm @ file:///home/conda/feedstock_root/build_artifacts/tqdm_1591181521996/work
traitlets==4.3.3
treeinterpreter==0.2.3
tslearn==0.5.0.5
typing-extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1588470653596/work
tzlocal @ file:///home/conda/feedstock_root/build_artifacts/tzlocal_1588939190034/work
urllib3==1.25.9
vincent==0.4.4
wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1591600393557/work
webencodings==0.5.1
Werkzeug==1.0.1
widgetsnbextension==3.5.1
wrapt==1.12.1
xgboost==1.0.2
xlrd==1.2.0
xport==3.2.1
zict==2.0.0
zipp==3.1.0
zope.event==4.5.0
zope.interface==5.2.0
Our environments grow 5x in size because all these dependencies we don't use get installed, which makes our CI builds take longer.
Offering something like extras_require={'tensorflow': ['tensorflow>=1.14'], ...} would help greatly.
The docutils package isn't even necessary.
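To sketch what that could look like in setup.py (the grouping and pins below are hypothetical and would need to match aix360's real requirements):

```python
# Hypothetical extras grouping for aix360's setup.py
extras_require = {
    'tensorflow': ['tensorflow>=1.14', 'keras'],
    'torch': ['torch', 'torchvision'],
    'all': ['tensorflow>=1.14', 'keras', 'torch', 'torchvision'],
}

# Passed to setuptools as:
#   setup(..., install_requires=core_requirements, extras_require=extras_require)
# so that users install only what they need:
#   pip install aix360               # lightweight core
#   pip install aix360[tensorflow]   # core + TensorFlow backends
```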
Inspecting the KerasClassifier class for CEM, I can see that it is written specifically for single-label classification:
predicted_class = np.argmax(prob)
It may be important to prepare this class to handle multi-label, multi-class classification, perhaps by allowing the programmer to select which of the N target classes to take into account when explaining with CEMExplainer.
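To illustrate the distinction: a multi-label model's output would be thresholded per class rather than reduced with a single argmax. These helper names are hypothetical, not part of aix360:

```python
import numpy as np

def predicted_classes_multilabel(prob, threshold=0.5):
    """Multi-label: every class whose probability clears the threshold
    is 'on', instead of picking one class with argmax."""
    return (np.asarray(prob) >= threshold).astype(int)

def class_to_explain(prob, class_index, threshold=0.5):
    """Pick one of the N target heads to feed to the explainer."""
    return int(np.asarray(prob).ravel()[class_index] >= threshold)
```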
I'm trying to run BRCG on some self-generated binary data (note: the generated data is such that there exists a DNF rule that perfectly matches X with y). The first dataset is very small, and looks as follows:
X=
0 1 2 3 4
0 0 1 1 0 0
1 1 0 1 0 0
2 1 1 1 0 0
3 1 1 0 0 1
4 0 1 0 1 0
5 0 1 1 1 0
6 1 0 0 0 1
7 1 1 1 1 1
8 1 0 1 0 1
9 1 0 1 1 0
y=
0 0
1 0
2 0
3 1
4 1
5 1
6 1
7 1
8 1
9 1
where X is a pandas dataframe and y is a pandas series. I then use the following:
br = BooleanRuleCG(lambda0=1e-3, lambda1=1e-3)
br.fit(Xdf, ydf)
For some datasets this works fine, but for other datasets it gives the following error:
Initial LP solved
Traceback (most recent call last):
File "dash.py", line 37, in
br.fit(Xdf, ydf)
File "/home/marleen/miniconda3/envs/aix360_env/lib/python3.7/site-packages/aix360/algorithms/rbm/boolean_rule_cg.py", line 120, in fit
K=self.K, UB=UB, D=self.D, B=self.B, eps=self.eps)
File "/home/marleen/miniconda3/envs/aix360_env/lib/python3.7/site-packages/aix360/algorithms/rbm/beam_search.py", line 284, in beam_search
colKeep = pd.Series(Xp.columns.get_level_values(0) != i[0], index=Xp.columns)
IndexError: invalid index to scalar variable.
Any idea what is going wrong?
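A guess at what's going wrong: beam_search unpacks column labels as tuples (i[0], i[1], i[2]), which suggests BooleanRuleCG expects the multi-level column format produced by FeatureBinarizer rather than plain integer column names. A sketch of wrapping already-binary data into such a layout (the level names and values here are inferred from the traceback, not from documentation):

```python
import pandas as pd

def to_binarized_frame(X):
    # Mimic FeatureBinarizer-style 3-level columns: (feature, operation, value).
    # This exact layout is an assumption on my part.
    X = X.copy()
    X.columns = pd.MultiIndex.from_tuples(
        [(str(c), '==', '1') for c in X.columns],
        names=['feature', 'operation', 'value'])
    return X
```

Running the data through FeatureBinarizer itself, as in the HELOC tutorial, may be the more reliable route.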
I am trying to install aix360 on a Google Cloud VM and also in an Anaconda environment (on Windows), and both times it gives me the same error.
I am following the steps from the documentation but still face the same error. If someone has already installed this toolkit, could you please help me set up the environment for it?
Below is the error log:
ERROR: Command errored out with exit status 1:
command: /home/mansoor_working/anaconda3/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'
/tmp/pip-install-zt5ypyrs/cvxpy/setup.py'"'"'; file='"'"'/tmp/pip-install-zt5ypyrs/cvxpy/setup.py'"'"';f=getatt
r(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(comp
ile(code, file, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-uy3pz4nb --python-tag cp37
cwd: /tmp/pip-install-zt5ypyrs/cvxpy/
Complete output (369 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/cvxpy
copying cvxpy/init.py -> build/lib.linux-x86_64-3.7/cvxpy
copying cvxpy/error.py -> build/lib.linux-x86_64-3.7/cvxpy
copying cvxpy/settings.py -> build/lib.linux-x86_64-3.7/cvxpy
creating build/lib.linux-x86_64-3.7/cvxpy/problems
copying cvxpy/problems/init.py -> build/lib.linux-x86_64-3.7/cvxpy/problems
copying cvxpy/problems/objective.py -> build/lib.linux-x86_64-3.7/cvxpy/problems
copying cvxpy/problems/xpress_problem.py -> build/lib.linux-x86_64-3.7/cvxpy/problems
copying cvxpy/problems/problem.py -> build/lib.linux-x86_64-3.7/cvxpy/problems
copying cvxpy/problems/iterative.py -> build/lib.linux-x86_64-3.7/cvxpy/problems
creating build/lib.linux-x86_64-3.7/cvxpy/tests
copying cvxpy/tests/test_objectives.py -> build/lib.linux-x86_64-3.7/cvxpy/tests
copying cvxpy/tests/test_expressions.py -> build/lib.linux-x86_64-3.7/cvxpy/tests
copying cvxpy/tests/test_curvature.py -> build/lib.linux-x86_64-3.7/cvxpy/tests
copying cvxpy/tests/test_quadratic.py -> build/lib.linux-x86_64-3.7/cvxpy/tests
copying cvxpy/tests/test_dgp.py -> build/lib.linux-x86_64-3.7/cvxpy/tests
copying cvxpy/tests/test_solvers.py -> build/lib.linux-x86_64-3.7/cvxpy/tests
copying cvxpy/tests/init.py -> build/lib.linux-x86_64-3.7/cvxpy/tests
copying cvxpy/tests/test_monotonicity.py -> build/lib.linux-x86_64-3.7/cvxpy/tests
copying cvxpy/tests/test_quad_form.py -> build/lib.linux-x86_64-3.7/cvxpy/tests
copying cvxpy/tests/test_benchmarks.py -> build/lib.linux-x86_64-3.7/cvxpy/tests
copying cvxpy/tests/test_super_scs.py -> build/lib.linux-x86_64-3.7/cvxpy/tests
copying cvxpy/tests/base_test.py -> build/lib.linux-x86_64-3.7/cvxpy/tests
copying cvxpy/tests/test_mip_vars.py -> build/lib.linux-x86_64-3.7/cvxpy/tests
copying cvxpy/tests/test_qp.py -> build/lib.linux-x86_64-3.7/cvxpy/tests
copying cvxpy/tests/test_matrices.py -> build/lib.linux-x86_64-3.7/cvxpy/tests
copying cvxpy/tests/test_problem.py -> build/lib.linux-x86_64-3.7/cvxpy/tests
copying cvxpy/tests/test_constraints.py -> build/lib.linux-x86_64-3.7/cvxpy/tests
copying cvxpy/tests/test_atoms.py -> build/lib.linux-x86_64-3.7/cvxpy/tests
copying cvxpy/tests/test_scs.py -> build/lib.linux-x86_64-3.7/cvxpy/tests
There is a note on the explain_instance function that states:
Note that this assumes that the classifier was trained with inputs normalized in [-0.5,0.5] range.
Is there a reason this is required, and can we not use a different normalization range?
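For reference, scaling inputs into that assumed [-0.5, 0.5] range with scikit-learn would look like this (standard MinMaxScaler usage, not aix360-specific code; the data is a stand-in):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Map each feature into [-0.5, 0.5], the range explain_instance assumes
scaler = MinMaxScaler(feature_range=(-0.5, 0.5))
X = np.array([[0., 10.], [5., 20.], [10., 30.]])
X_scaled = scaler.fit_transform(X)
```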
If I change the name:
ValueError Traceback (most recent call last)
in
2 from aix360.algorithms.rbm import FeatureBinarizer
3 fb = FeatureBinarizer(negations=True, returnOrd=True)
----> 4 dfTrain, dfTrainStd = fb.fit_transform(dfTrain)
5 dfTest, dfTestStd = fb.transform(dfTest)
6 dfTrain['MostRecentBillAmountRaw'].head()
~/opt/anaconda3/envs/aix360/lib/python3.6/site-packages/sklearn/base.py in fit_transform(self, X, y, **fit_params)
697 if y is None:
698 # fit method of arity 1 (unsupervised transformation)
--> 699 return self.fit(X, **fit_params).transform(X)
700 else:
701 # fit method of arity 2 (supervised transformation)
~/PycharmProjects/AIX360/aix360/algorithms/rbm/features.py in fit(self, X)
111 self.ordinal = ordinal
112 # Fit StandardScaler to ordinal features
--> 113 self.scaler = StandardScaler().fit(data[ordinal])
114 return self
115
~/opt/anaconda3/envs/aix360/lib/python3.6/site-packages/sklearn/preprocessing/_data.py in fit(self, X, y, sample_weight)
728 # Reset internal state before fitting
729 self._reset()
--> 730 return self.partial_fit(X, y, sample_weight)
731
732 def partial_fit(self, X, y=None, sample_weight=None):
~/opt/anaconda3/envs/aix360/lib/python3.6/site-packages/sklearn/preprocessing/_data.py in partial_fit(self, X, y, sample_weight)
766 X = self._validate_data(X, accept_sparse=('csr', 'csc'),
767 estimator=self, dtype=FLOAT_DTYPES,
--> 768 force_all_finite='allow-nan', reset=first_call)
769 n_features = X.shape[1]
770
~/opt/anaconda3/envs/aix360/lib/python3.6/site-packages/sklearn/base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
419 out = X
420 elif isinstance(y, str) and y == 'no_validation':
--> 421 X = check_array(X, **check_params)
422 out = X
423 else:
~/opt/anaconda3/envs/aix360/lib/python3.6/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
~/opt/anaconda3/envs/aix360/lib/python3.6/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
538
539 if all(isinstance(dtype, np.dtype) for dtype in dtypes_orig):
--> 540 dtype_orig = np.result_type(*dtypes_orig)
541
542 if dtype_numeric:
<array_function internals> in result_type(*args, **kwargs)
ValueError: at least one array or dtype is required
ERROR: xai 0.0.5 has requirement matplotlib==3.0.2, but you'll have matplotlib 3.1.0 which is incompatible.
ERROR: xai 0.0.5 has requirement numpy==1.15.4, but you'll have numpy 1.16.4 which is incompatible.
ERROR: xai 0.0.5 has requirement pandas==0.23.4, but you'll have pandas 0.24.2 which is incompatible.
ERROR: xai 0.0.5 has requirement scikit-learn==0.20.1, but you'll have scikit-learn 0.23.1 which is incompatible.
Is there a plan to support tf2?
I'm looking to use the contrastive explainer on tabular data which the docs state is supported.
What is the recommended mechanism to deal with categorical features for this explainer?
I've one-hot encoded and then normalized like so:
c_transformer = Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore')),
('functr', FunctionTransformer(lambda x: x.toarray(), accept_sparse=True)),
('scalar', MinMaxScaler(feature_range=(-0.5, 0.5)))])
The resulting pertinent negatives and positives adjust all values of a category. As an example, here is the delta_pn (which I understand to be the difference needed to change the classification) for the sex feature, which is binary in this dataset.
sex_Female 0.500000
sex_Male -0.500000
The change impacts both categories. It's unclear how to do the inverse transform for these cases when using one-hot encoding.
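One possible way to decode such a perturbed one-hot block (a sketch; decode_onehot_delta and the groups mapping are inventions of mine, not aix360 API) is to take the argmax within each categorical group of the adversarial row:

```python
import numpy as np

def decode_onehot_delta(adv_row, groups):
    """Map a perturbed one-hot block back to a single category.

    ``groups`` maps a feature name to the list of column indices that
    one-hot encode it (hypothetical layout)."""
    decoded = {}
    for feature, idx in groups.items():
        # The 'winning' category is the one with the largest value
        decoded[feature] = int(np.argmax(np.asarray(adv_row)[idx]))
    return decoded
```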
I am using CEMExplainer to generate explanations for my dataset of the following shape: (91,3)
However, I face the following error:
IndexError Traceback (most recent call last)
in
10
11 (adv_pn, delta_pn, info_pn) = explainer.explain_instance(train_dataset, arg_mode, ae_model, arg_kappa, arg_b,
---> 12 arg_max_iter, arg_init_const, arg_beta, arg_gamma)
~/AIX360/aix360/algorithms/contrastive/CEM.py in explain_instance(self, input_X, arg_mode, AE_model, arg_kappa, arg_b, arg_max_iter, arg_init_const, arg_beta, arg_gamma)
75 target_label = orig_class
76
---> 77 target = np.array([np.eye(self._wbmodel._nb_classes)[target_label]])
78
79 # Hard coding batch_size=1
IndexError: index 80 is out of bounds for axis 0 with size 1
The code that I use to find the pertinent negative/positive:
mymodel = KerasClassifier(pred_model)
explainer = CEMExplainer(mymodel)
arg_mode = "PN" # Find pertinent negative
arg_max_iter = 1000 # Maximum number of iterations to search for the optimal PN for given parameter settings
arg_init_const = 10.0 # Initial coefficient value for main loss term that encourages class change
arg_b = 9 # No. of updates to the coefficient of the main loss term
arg_kappa = 10 # Minimum confidence gap between the PN's (changed) class probability and the original class's probability
arg_beta = 1e-1 # Controls sparsity of the solution (L1 loss)
arg_gamma = 100 # Controls how much to adhere to a (optionally trained) autoencoder
(adv_pn, delta_pn, info_pn) = explainer.explain_instance(train_dataset, arg_mode, ae_model, arg_kappa, arg_b,
arg_max_iter, arg_init_const, arg_beta, arg_gamma)
The solution to this problem is a simple one-liner. I will submit a pull request along with this issue. I'm just documenting the error so other people can find it.
FeatureBinarizer prints "Skipping column 'X': data type cannot be handled" for various integer and float subtypes that should be handled. The following code reproduces the problem.
import numpy as np
import pandas as pd
from aix360.algorithms.rbm import FeatureBinarizer
dtypes = np.dtype([
('int32', np.int32),
('int64', np.int64),
('float32', np.float32),
('float64', np.float64),
])
X = pd.DataFrame(np.array(np.arange(100)).astype(dtypes))
print(X.dtypes)
# Both fit and transform do not handle int64 and float32 though there should be no problem...
fb = FeatureBinarizer()
fb.fit(X);
fb.transform(X);
# Skipping column 'int64': data type cannot be handled
# Skipping column 'float32': data type cannot be handled
# Skipping column 'int64': data type cannot be handled
# Skipping column 'float32': data type cannot be handled
The problem is that the FeatureBinarizer class tests the type as follows:
for c in X.columns:
if np.issubdtype(X[c].dtype, np.dtype(int).type) | np.issubdtype(X[c].dtype, np.dtype(float).type):
pass
else:
print(("Skipping column '" + str(c) + "': data type cannot be handled"))
# Skipping column 'int64': data type cannot be handled
# Skipping column 'float32': data type cannot be handled
This can be resolved by using the generic NumPy abstract types np.integer and np.floating in the test.
for c in X.columns:
if np.issubdtype(X[c].dtype, np.integer) | np.issubdtype(X[c].dtype, np.floating):
pass
else:
print(("Skipping column '" + str(c) + "': data type cannot be handled"))
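The difference between the two checks can be verified in isolation: np.dtype(int).type and np.dtype(float).type are single concrete types (platform-dependent), while the abstract np.integer and np.floating match every subtype:

```python
import numpy as np

# The abstract types match all concrete integer/float subtypes.
for t in (np.int32, np.int64):
    assert np.issubdtype(t, np.integer)
for t in (np.float32, np.float64):
    assert np.issubdtype(t, np.floating)

# The exact-type check fails for at least float32, since
# np.dtype(float).type is float64 and float32 is not its subtype.
assert not np.issubdtype(np.float32, np.dtype(float).type)
```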
lime-ml.readthedocs says that explain_instance for tabular LIME expects a 1D array as input, but running the code with a 1D array produces the following error:
ValueError: Expected 2D array, got 1D array instead:
[1, 2, 3, 4, 5]
Does the function expect 2D or 1D arrays?
Edit:
Should probably mention that I get the same error message when passing 2D arrays (for example [[1,2,3,4,5]]). The problem probably lies elsewhere, but the error message is not very helpful.
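Independent of LIME's internals, one general source of this exact error message is scikit-learn predict functions, which want a 2-D array of shape (n_samples, n_features); a single row has to be reshaped before a direct model call:

```python
import numpy as np

row = np.array([1, 2, 3, 4, 5])   # 1-D: the shape LIME's docs describe
batch = row.reshape(1, -1)        # 2-D: the shape sklearn predict expects
print(batch.shape)  # -> (1, 5)
```

If the model is called directly with the 1-D row (outside the explainer), this reshape avoids the "Expected 2D array, got 1D array instead" ValueError.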
Is the benefit of ProtoDash that the data does not have to be normalised, and that you also get weights? I am just trying to understand why one should use ProtoDash rather than a nearest-neighbour algorithm. Thanks in advance.
Hi!
I also have problems downloading the FICO HELOC data set. When I fill in the requested information and click the Send button, nothing happens...
Could anyone please help me download the data set?
Kind regards,
Kjersti
I'm trying to run BRCG on some self-generated binary data (note: the generated data is such that there exists a DNF rule that perfectly matches X with y). The first dataset is very small, and looks as follows:
X=
0 1 2 3 4
0 0 1 1 0 0
1 1 0 1 0 0
2 1 1 1 0 0
3 1 1 0 0 1
4 0 1 0 1 0
5 0 1 1 1 0
6 1 0 0 0 1
7 1 1 1 1 1
8 1 0 1 0 1
9 1 0 1 1 0
y=
0
0 0
1 0
2 0
3 1
4 1
5 1
6 1
7 1
8 1
9 1
I then use the following:
br = BooleanRuleCG(lambda0=1e-3, lambda1=1e-3)
br.fit(Xdf, ydf)
The error I get is:
Initial LP solved
Traceback (most recent call last):
File "test.py", line 40, in
br.fit(Xdf, ydf)
File "/home/marleen/miniconda3/envs/aix360_env/lib/python3.7/site-packages/aix360/algorithms/rbm/boolean_rule_cg.py", line 113, in fit
r[P] = -constraints[0].dual_value
ValueError: shape mismatch: value array of shape (7,) could not be broadcast to indexing result of shape (7,1)
Do you know how to resolve this? Thanks!
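I can't verify the library internals here, but a shape mismatch between (7,) and (7,1) often means a one-column DataFrame was passed where a 1-D Series/array was expected; flattening y before fit is a cheap thing to try (this is a guess at the cause, not a confirmed fix):

```python
import numpy as np
import pandas as pd

# A one-column DataFrame, as in the y= table above, is 2-D.
ydf = pd.DataFrame({0: [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]})

# Ravel it to 1-D before calling br.fit(Xdf, y1d).
y1d = np.ravel(ydf)
print(ydf.shape, y1d.shape)  # -> (10, 1) (10,)
```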
Hi!
I'm encountering an error with a simple use case of ProtoDash to get prototypes of a given dataset.
Here's an example that triggers the error:
import pandas as pd
from sklearn import datasets
from aix360.algorithms.protodash import PDASH
# Load Iris
X, y = datasets.load_iris(True)
df = pd.DataFrame(X, columns=range(X.shape[1]))
df['y'] = y
tmp = df[df['y'] == 0].drop('y', axis=1).values
X_1 = PDASH.HeuristicSetSelection(X=tmp, Y=tmp, m=10, kernelType='gaussian', sigma=2)
# This generates an error:
# ---------------------------------------------------------------------------
# ValueError Traceback (most recent call last)
# <ipython-input-48-e631ba33f62a> in <module>
# 1 tmp = df[df['y'] == 0].drop('y', axis=1).values
# ----> 2 X_1 = PDASH.HeuristicSetSelection(X=tmp, Y=tmp, m=10, kernelType='gaussian', sigma=2)
#
# c:\users\pc\aix\aix360\aix360\algorithms\protodash\PDASH_utils.py in HeuristicSetSelection(X, Y, m, kernelType, sigma)
# 267 currK = K2
# 268 if maxGradient <= 0:
#--> 269 newCurrOptw = np.vstack((currOptw[:], np.array([0])))
# 270 newCurrSetValue = currSetValue
# 271 else:
#
#~\AppData\Local\Continuum\anaconda3\envs\aix360\lib\site-packages\numpy\core\shape_base.py in vstack(tup)
# 281 """
# 282 _warn_for_nonsequence(tup)
#--> 283 return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
# 284
# 285
#
#ValueError: all the input array dimensions except for the concatenation axis must match exactly
Interestingly, the error does not pop up for m < 10.
Is this a bug or am I using it incorrectly?
Thanks,
FeatureBinarizer throws ValueError: Length of passed values is 1, index implies 2
when passed a binary feature with missing values.
This happens in FeatureBinarizer L70, when trying to create a Series.
In these cases a pandas Series treats nunique and unique differently: nunique ignores NaNs, while unique doesn't.
Example:
x = pd.Series([0, np.nan, 1])
print(x.nunique(), len(x.unique()))
returns
2 3
Versions:
AIX360: 0.2.0
Pandas: 0.25.3
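A version-agnostic way to make the two counts agree (sketched outside aix360) is to handle NaN explicitly on whichever side you standardize on:

```python
import numpy as np
import pandas as pd

x = pd.Series([0, np.nan, 1])

# nunique() skips NaN by default; unique() keeps it.
assert x.nunique() == 2
assert len(x.unique()) == 3

# Making them consistent: either drop NaN before unique() ...
assert len(x.dropna().unique()) == x.nunique()
# ... or count NaN in nunique().
assert x.nunique(dropna=False) == len(x.unique())
```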
I followed these steps to install aix360 on MacOS 10.14.6
the command failed with the following error:
(aix360) ~/AIX360 [master] $ pip install -e .
Obtaining file:///Users/fchiossi/AIX360
Collecting joblib>=0.11
  Using cached joblib-0.14.1-py2.py3-none-any.whl (294 kB)
Collecting scikit-learn>=0.21.2
  Using cached scikit_learn-0.22.1-cp36-cp36m-macosx_10_6_intel.whl (11.1 MB)
Collecting torch
  Using cached torch-1.4.0-cp36-none-macosx_10_9_x86_64.whl (81.1 MB)
Collecting torchvision
  Using cached torchvision-0.5.0-cp36-cp36m-macosx_10_9_x86_64.whl (438 kB)
Collecting cvxpy
  Using cached cvxpy-1.0.28-cp36-cp36m-macosx_10_9_x86_64.whl (745 kB)
Collecting cvxopt
  Using cached cvxopt-1.2.4-cp36-cp36m-macosx_10_9_x86_64.whl (3.1 MB)
Collecting Image
  Using cached image-1.5.28.tar.gz (15 kB)
Collecting keras
  Using cached Keras-2.3.1-py2.py3-none-any.whl (377 kB)
Collecting matplotlib
  Using cached matplotlib-3.1.3-cp36-cp36m-macosx_10_9_x86_64.whl (13.2 MB)
Collecting numpy
  Using cached numpy-1.18.1-cp36-cp36m-macosx_10_9_x86_64.whl (15.2 MB)
Collecting pandas
  Using cached pandas-1.0.1-cp36-cp36m-macosx_10_9_x86_64.whl (9.9 MB)
Collecting scipy>=0.17
  Using cached scipy-1.4.1-cp36-cp36m-macosx_10_6_intel.whl (28.5 MB)
Collecting tensorflow==1.14
  Using cached tensorflow-1.14.0-cp36-cp36m-macosx_10_11_x86_64.whl (105.8 MB)
Collecting xport
  Using cached xport-2.0.2-py2.py3-none-any.whl (14 kB)
Collecting scikit-image
  Using cached scikit_image-0.16.2-cp36-cp36m-macosx_10_6_intel.whl (30.4 MB)
Collecting requests
  Using cached requests-2.23.0-py2.py3-none-any.whl (58 kB)
Collecting lime
  Using cached lime-0.1.1.37.tar.gz (275 kB)
Collecting shap
  Using cached shap-0.34.0.tar.gz (264 kB)
Collecting xgboost
  Using cached xgboost-1.0.1.tar.gz (820 kB)
ERROR: Command errored out with exit status 1:
  command: /Users/fchiossi/opt/anaconda3/envs/aix360/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/j1/x8bvblx563n247csfz521xkr0000gn/T/pip-install-lrjswknu/xgboost/setup.py'"'"'; __file__='"'"'/private/var/folders/j1/x8bvblx563n247csfz521xkr0000gn/T/pip-install-lrjswknu/xgboost/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/j1/x8bvblx563n247csfz521xkr0000gn/T/pip-install-lrjswknu/xgboost/pip-egg-info
  cwd: /private/var/folders/j1/x8bvblx563n247csfz521xkr0000gn/T/pip-install-lrjswknu/xgboost/
  Complete output (27 lines):
  ++ pwd
  + oldpath=/private/var/folders/j1/x8bvblx563n247csfz521xkr0000gn/T/pip-install-lrjswknu/xgboost
  + cd ./xgboost/
  + mkdir -p build
  + cd build
  + cmake ..
  ./xgboost/build-python.sh: line 21: cmake: command not found
  + echo -----------------------------
  -----------------------------
  + echo 'Building multi-thread xgboost failed'
  Building multi-thread xgboost failed
  + echo 'Start to build single-thread xgboost'
  Start to build single-thread xgboost
  + cmake .. -DUSE_OPENMP=0
  ./xgboost/build-python.sh: line 27: cmake: command not found
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/private/var/folders/j1/x8bvblx563n247csfz521xkr0000gn/T/pip-install-lrjswknu/xgboost/setup.py", line 42, in <module>
      LIB_PATH = libpath['find_lib_path']()
    File "/private/var/folders/j1/x8bvblx563n247csfz521xkr0000gn/T/pip-install-lrjswknu/xgboost/xgboost/libpath.py", line 50, in find_lib_path
      'List of candidates:\n' + ('\n'.join(dll_path)))
  XGBoostLibraryNotFound: Cannot find XGBoost Library in the candidate path, did you install compilers and run build.sh in root path?
  List of candidates:
  /private/var/folders/j1/x8bvblx563n247csfz521xkr0000gn/T/pip-install-lrjswknu/xgboost/xgboost/libxgboost.dylib
  /private/var/folders/j1/x8bvblx563n247csfz521xkr0000gn/T/pip-install-lrjswknu/xgboost/xgboost/../../lib/libxgboost.dylib
  /private/var/folders/j1/x8bvblx563n247csfz521xkr0000gn/T/pip-install-lrjswknu/xgboost/xgboost/./lib/libxgboost.dylib
  /Users/fchiossi/opt/anaconda3/envs/aix360/xgboost/libxgboost.dylib
  ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
How can I fix it?
Thanks in advance
Hello, I'm trying to use the GLRM LogisticRuleRegression and it seems to be compatible with my own code for training/evaluation with sklearn models. However, it fails when I use functions like GridSearchCV for hyperparameter tuning.
TypeError: Cannot clone object '<aix360.algorithms.rbm.logistic_regression.LogisticRuleRegression object at 0x10f731310>' (type <class 'aix360.algorithms.rbm.logistic_regression.LogisticRuleRegression'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods. (py37) Jamess-MacBook-Pro-2:sk
If the class inherits from BaseEstimator and ClassifierMixin from sklearn.base instead of just object, then it will inherit get_params() and this will resolve the issue. I've tested this on my local machine. So the change should be:
class LogisticRuleRegression(object):
to
class LogisticRuleRegression(BaseEstimator, ClassifierMixin):
This can also be applied to LinearRuleRegression (replacing ClassifierMixin with RegressorMixin) and any other similar classes, and may resolve other sklearn compatibility issues I haven't come across yet (e.g. Pipeline may be affected as well).
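If patching the installed package is not an option, a thin adapter can achieve the same effect from the outside. This is only a sketch: the mirrored constructor parameters (lambda0, lambda1) and the wrapped class are assumptions, so check them against the actual aix360 signature before relying on it.

```python
from sklearn.base import BaseEstimator, ClassifierMixin, clone

class RuleRegressionWrapper(BaseEstimator, ClassifierMixin):
    """Hypothetical sklearn-compatible adapter around LogisticRuleRegression.

    The constructor arguments mirrored here are an assumption; verify
    them against the aix360 signature before real use.
    """

    def __init__(self, lambda0=0.001, lambda1=0.001):
        self.lambda0 = lambda0
        self.lambda1 = lambda1

    def fit(self, X, y):
        # Imported lazily so the adapter itself has no hard dependency.
        from aix360.algorithms.rbm import LogisticRuleRegression
        self.model_ = LogisticRuleRegression(lambda0=self.lambda0,
                                             lambda1=self.lambda1)
        self.model_.fit(X, y)
        return self

    def predict(self, X):
        return self.model_.predict(X)

# get_params/set_params now come from BaseEstimator, so clone() works,
# which is exactly what GridSearchCV needs.
est = RuleRegressionWrapper(lambda0=0.01)
print(clone(est).get_params())
```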
Hi!
I wonder if you are interested in incorporating GRACE method into the framework?
https://dl.acm.org/doi/abs/10.1145/3394486.3403066
Thanks
Is it possible to use ProtoDash for images?
If we use it in an embedding space any recommendations on embedding space?
Thank you in advance.
Hello,
I am trying to run this example notebook for the global GLRM explainer. I get this error when I execute fit:
ValueError: cannot reindex from a duplicate axis
Any help will be appreciated.
Thanks
Can the faithfulness metric be used as a metric for explanations obtained using LIME or SHAP?
Hello, I am exploring AIX360 for the first time. I wanted to start off by executing the demo notebook of the Credit Approval use case.
I have downloaded the heloc_dataset.csv and placed it in the respective folder.
While executing the import statement:
from aix360.datasets.heloc_dataset import HELOCDataset, nan_preprocessing
The warnings below are displayed and the cell stops executing.
I have tried restarting the terminal and uninstalling and reinstalling aix360, but the same issue persists.
Please help me resolve this.
Pull request for the solution: #111
I've just noticed that FeatureBinarizer, when including the negated columns as well, does not work on datasets with a binary categorical feature. That's probably another Pandas version issue, where Pandas 1.0.0 and newer behave significantly differently than earlier versions did. (I got the error using Pandas 0.25.3.)
When calling fb.fit_transform(<dataset_with_binary_category>, negations=True), the error message was:
TypeError: unsupported operand type(s) for -: 'int' and 'Categorical'
at line 142 in function transform():
A[(str(c), 'not', '')] = 1 - A[(str(c), '', '')]
where A[(str(c), '', '')] = data[c].map(maps[c]) and c is a specific column.
At that line the subtraction does not work, because the Series A[(str(c), '', '')] is categorical.
Solution:
Just convert A[(str(c), '', '')] to integer: A[(str(c), '', '')] = data[c].map(maps[c]).astype(int). Although it could be solved in many ways, I've seen the astype(int) pattern elsewhere in the codebase, so I hope the solution is satisfactory.
With pandas versions > 1.1.0, line 148 (and also 145 and 150) returns an error: ValueError: cannot reindex from a duplicate axis.
Locally, I just added '.values'. Example of line 148:
colKeep[i[0]] = ((Xp[i[0]].columns.get_level_values(0) == '<=') & (thresh > i[2])).values
First, I want to thank you very much for providing this toolkit! I am eager to use your implementation for my own research!
Unfortunately, as I was working through the example "CEM-MAF-CelebA.ipynb" notebook for contrastive explanations, I was stopped dead while obtaining the pertinent negative explanation. (Code chunk 12)
InvalidArgumentError: Conv2DCustomBackpropInputOp only supports NHWC.
[[{{node gradients/G_paper_1_1/cond/ToRGB_lod8/Conv2D_grad/Conv2DBackpropInput}}]]
During handling of the above exception, another exception occurred:
InvalidArgumentError Traceback (most recent call last)
<ipython-input-13-b1a3ab914e94> in <module>
3 arg_max_iterations, arg_initial_const, arg_gamma, None,
4 arg_attr_reg, arg_attr_penalty_reg,
----> 5 arg_latent_square_loss_reg)
6
7 print(info_pn)
c:\workspaces\aix360\aix360\algorithms\contrastive\CEM_MAF.py in explain_instance(self, sess, input_img, input_latent, arg_mode, arg_kappa, arg_binary_search_steps, arg_max_iterations, arg_initial_const, arg_gamma, arg_beta, arg_attr_reg, arg_attr_penalty_reg, arg_latent_square_loss_reg)
95 attr_penalty_reg=arg_attr_penalty_reg, latent_square_loss_reg=arg_latent_square_loss_reg)
96
---> 97 adv_img = attack_pn.attack(input_img, target_label, input_latent)
98 adv_prob, adv_class, adv_prob_str = self._wbmodel.predict_long(adv_img)
99 attr_mod = self.check_attributes_celebA(self._attributes, input_img, adv_img)
c:\workspaces\aix360\aix360\algorithms\contrastive\CEM_MAF_aen_PN.py in attack(self, imgs, labs, latent)
268 # perform the attack
269
--> 270 self.sess.run([self.train])
271 temp_adv_latent = self.sess.run(self.adv_latent)
272 self.sess.run(self.adv_updater, feed_dict={self.assign_adv_latent: temp_adv_latent})
...
InvalidArgumentError: Conv2DCustomBackpropInputOp only supports NHWC.
[[node gradients/G_paper_1_1/cond/ToRGB_lod8/Conv2D_grad/Conv2DBackpropInput (defined at c:\workspaces\aix360\aix360\algorithms\contrastive\CEM_MAF_aen_PN.py:197) ]]
Errors may have originated from an input operation.
Input Source operations connected to node gradients/G_paper_1_1/cond/ToRGB_lod8/Conv2D_grad/Conv2DBackpropInput:
G_paper_1_1/cond/ToRGB_lod8/mul (defined at <string>:27)
I tried this example twice. Once on a windows machine (CPU only) and on a linux machine (CPU only). Both systems error out at the same step. The installation of aix360 worked both times according to the setup instructions in the git documentation.
I am thinking that the pickled CelebA model (karras2018iclr-celebahq-1024x1024.pkl) is the cause of this error.
Maybe the problem lies with the requirements: AIX360 needs tensorflow==1.14.0 whereas progressive_growing_of_gans requires tensorflow-gpu>=1.6.0.
I would really appreciate it if you could help me out on this, as I want to know whether it's a model problem, which I could fix with my own models in the future, or something more complicated than that.
Thank you very much in advance!
I have gone through the HELOC.ipynb file. I can't find documentation anywhere about what types of models can be used other than neural networks.
What model types are supported with aix360?
Thanks!
Does CEMExplainer support scikit-learn models (e.g., RandomForestClassifier)?
Hi,
I identified ways to use LightGBM with many of your tools. Let me know if this is something you would like to incorporate into your package. Maybe we can host a tutorial on the AIX360 repo showing people how to use your tools with LightGBM, as I had to develop a few workaround strategies.
https://github.com/firmai/ml-fairness-framework
Best,
Derek
Hi there,
I've actually copied the code (did no modification at all) from the BRCG part of the "Credit Approval Tutorial" code and ran into errors. I'm quite sure that the dataset was loaded appropriately, as I have also trained a scikit learn Decision Tree Classifier on it with no problem and in the same notebook.
Can someone help me with this issue? Am I missing something or is it an internal problem?
Thanks in advance!
Here is the code and the output.
It was run on google colab, with pandas 1.1.2 and the latest aix360 release, which is 0.2.0.
import warnings
warnings.filterwarnings('ignore')
# Load FICO HELOC data with special values converted to np.nan
from aix360.datasets.heloc_dataset import HELOCDataset, nan_preprocessing
data = HELOCDataset(custom_preprocessing=nan_preprocessing).data()
# Separate target variable
y = data.pop('RiskPerformance')
# Split data into training and test sets using fixed random seed
from sklearn.model_selection import train_test_split
dfTrain, dfTest, yTrain, yTest = train_test_split(data, y, random_state=0, stratify=y)
dfTrain.head().transpose()
# Binarize data and also return standardized ordinal features
from aix360.algorithms.rbm import FeatureBinarizer
fb = FeatureBinarizer(negations=True, returnOrd=True)
dfTrain, dfTrainStd = fb.fit_transform(dfTrain)
dfTest, dfTestStd = fb.transform(dfTest)
dfTrain['ExternalRiskEstimate'].head()
# Instantiate BRCG with small complexity penalty and large beam search width
from aix360.algorithms.rbm import BooleanRuleCG
br = BooleanRuleCG(lambda0=1e-3, lambda1=1e-3, CNF=True)
# Train, print, and evaluate model
br.fit(dfTrain, yTrain)
from sklearn.metrics import accuracy_score
print('Training accuracy:', accuracy_score(yTrain, br.predict(dfTrain)))
print('Test accuracy:', accuracy_score(yTest, br.predict(dfTest)))
print('Predict Y=0 if ANY of the following rules are satisfied, otherwise Y=1:')
print(br.explain()['rules'])
Learning CNF rule with complexity parameters lambda0=0.001, lambda1=0.001
Initial LP solved
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/pandas/core/series.py in __setitem__(self, key, value)
1001 try:
-> 1002 self._set_with_engine(key, value)
1003 except (KeyError, ValueError):
/usr/local/lib/python3.6/dist-packages/pandas/core/series.py in _set_with_engine(self, key, value)
1032 # fails with AttributeError for IntervalIndex
-> 1033 loc = self.index._engine.get_loc(key)
1034 validate_numeric_casting(self.dtype, value)
pandas/_libs/index.pyx in pandas._libs.index.BaseMultiIndexCodesEngine.get_loc()
KeyError: 'ExternalRiskEstimate'
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-98-8d81fbd6c0e1> in <module>()
26
27 # Train, print, and evaluate model
---> 28 br.fit(dfTrain, yTrain)
29 from sklearn.metrics import accuracy_score
30 print('Training accuracy:', accuracy_score(yTrain, br.predict(dfTrain)))
/usr/local/lib/python3.6/dist-packages/aix360/algorithms/rbm/boolean_rule_cg.py in fit(self, X, y)
118 UB = min(UB.min(), 0)
119 v, zNew, Anew = beam_search(r, X, self.lambda0, self.lambda1,
--> 120 K=self.K, UB=UB, D=self.D, B=self.B, eps=self.eps)
121
122 while (v < -self.eps).any() and (self.it < self.iterMax):
/usr/local/lib/python3.6/dist-packages/aix360/algorithms/rbm/beam_search.py in beam_search(r, X, lambda0, lambda1, K, UB, D, B, wLB, eps, stopEarly)
285 if i[1] == '<=':
286 thresh = Xp[i[0]].columns.get_level_values(1).to_series().replace('NaN', np.nan)
--> 287 colKeep[i[0]] = (Xp[i[0]].columns.get_level_values(0) == '>') & (thresh < i[2])
288 elif i[1] == '>':
289 thresh = Xp[i[0]].columns.get_level_values(1).to_series().replace('NaN', np.nan)
/usr/local/lib/python3.6/dist-packages/pandas/core/series.py in __setitem__(self, key, value)
1008 else:
1009 # GH#12862 adding an new key to the Series
-> 1010 self.loc[key] = value
1011
1012 except TypeError as e:
/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in __setitem__(self, key, value)
668
669 iloc = self if self.name == "iloc" else self.obj.iloc
--> 670 iloc._setitem_with_indexer(indexer, value)
671
672 def _validate_key(self, key, axis: int):
/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
1790 # setting for extensionarrays that store dicts. Need to decide
1791 # if it's worth supporting that.
-> 1792 value = self._align_series(indexer, Series(value))
1793
1794 elif isinstance(value, ABCDataFrame):
/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in _align_series(self, indexer, ser, multiindex_indexer)
1909 # series, so need to broadcast (see GH5206)
1910 if sum_aligners == self.ndim and all(is_sequence(_) for _ in indexer):
-> 1911 ser = ser.reindex(obj.axes[0][indexer[0]], copy=True)._values
1912
1913 # single indexer
/usr/local/lib/python3.6/dist-packages/pandas/core/series.py in reindex(self, index, **kwargs)
4397 )
4398 def reindex(self, index=None, **kwargs):
-> 4399 return super().reindex(index=index, **kwargs)
4400
4401 def drop(
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in reindex(self, *args, **kwargs)
4457 # perform the reindex on the axes
4458 return self._reindex_axes(
-> 4459 axes, level, limit, tolerance, method, fill_value, copy
4460 ).__finalize__(self, method="reindex")
4461
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
4480 fill_value=fill_value,
4481 copy=copy,
-> 4482 allow_dups=False,
4483 )
4484
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
4525 fill_value=fill_value,
4526 allow_dups=allow_dups,
-> 4527 copy=copy,
4528 )
4529 # If we've made a copy once, no need to make another one
/usr/local/lib/python3.6/dist-packages/pandas/core/internals/managers.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy, consolidate)
1274 # some axes don't allow reindexing with dups
1275 if not allow_dups:
-> 1276 self.axes[axis]._can_reindex(indexer)
1277
1278 if axis >= self.ndim:
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in _can_reindex(self, indexer)
3283 # trying to reindex on an axis with duplicates
3284 if not self.is_unique and len(indexer):
-> 3285 raise ValueError("cannot reindex from a duplicate axis")
3286
3287 def reindex(self, target, method=None, level=None, limit=None, tolerance=None):
ValueError: cannot reindex from a duplicate axis
I am using the HELOC dataset and trying to explain a single test instance using prototypes from my training subset with the code below:
explainer = ProtodashExplainer()
(W, S, _) = explainer.explain(dfTrain.to_numpy(), dfTest.iloc[0:1,:].to_numpy(), m=2)
However, I am getting below error:
Is this intentional? Please help.
Thank you
How do I get an exhaustive list of all the CNF/DNF rules?
I am trying BRCG on my dataset. I have nearly 40 features, but most of them are categorical, and nearly 18k rows of labeled data. After feature binarization, my dataset is converted into 35k features, so the overall dimension of the data is (18k, 35k). When I run BRCG on this, it throws a memory error as shown below.
The issue might be due to memory allocation in the cvxpy module. Any help in resolving this issue would be highly appreciated. I am using 32GB of RAM.
I'm working through the Credit Approval Tutorial, 3. Loan Officer: Prototypical explanations for HELOC use case.
When I run the line of code:
(Data, x_train, x_test, y_train_b, y_test_b) = heloc.split()
I get the error:
ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
Any idea on how I can get around this error?
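Stratified splitting requires at least two rows per class. heloc.split() may not expose its split parameters, but the generic workaround (sketched here, outside of aix360) is to drop singleton classes before splitting:

```python
import pandas as pd

y = pd.Series(['good', 'bad', 'good', 'bad', 'rare'])

# Keep only classes that appear at least twice, so stratification
# has at least one member per class on each side of the split.
counts = y.value_counts()
keep = y.isin(counts[counts >= 2].index)
print(sorted(y[keep].unique()))  # -> ['bad', 'good']
```

The same mask would be applied to the feature matrix before calling train_test_split with stratify.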
Running the HELOC notebook produces this error:
module 'torch.jit' has no attribute '_script_if_tracing'
which is resolved by using a different version of pytorch
!pip install torch==1.6.0+cpu torchvision==0.7.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
setup.py may need an update to pin the correct versions of torch and torchvision.
(aix360) C:\Users\ESISARP\AIX360>python setup.py
C:\Users\ESISARP\AppData\Local\Continuum\anaconda3\envs\aix360\lib\distutils\dist.py:261: UserWarning: Unknown distribution option: 'authos'
warnings.warn(msg)
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: setup.py --help [cmd1 cmd2 ...]
or: setup.py --help-commands
or: setup.py cmd --help
error: no commands supplied
It seems that somebody tried to migrate AIX360 to TensorFlow 2.3.0, but the build is broken. Could somebody look into this?
I have tried to open http://aix360.mybluemix.net/ many times without success. Is the website no longer available?
The following does not work (shape notation is informal; X is a (500,000, 64) array and Y is a (50,000, 64) array):
explainer.explain(X=<(500,000, 64) array>, Y=<(50,000, 64) array>, m=60,000)
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 3545 and the array at index 1 has size 3544
If we purely want to select from X, I don't see why there should be a problem when m > Y.shape[0].
Also, in the HELOC example you take S from Y; how does that make sense with the above description?
Also, in case I had the ordering wrong, I switched them around, to no effect:
explainer.explain(X=<(50,000, 64) array>, Y=<(500,000, 64) array>, m=60,000)
273 [newCurrOptw, value] = runOptimiser(currK, curru, currOptw, maxGradient)
--> 274 newCurrSetValue = -value
275
276 currOptw = newCurrOptw
TypeError: bad operand type for unary -: 'NoneType'
Here is a reproducible example to play with - maybe about 5 mins to run. (reset kernel after installs)
https://colab.research.google.com/drive/1FdafzzZku0RgJEk7Zf_rJ1lLuciRF0YD
Thanks.
The goal of integrating LIME & SHAP is to give users the option of using any explainer of their choice for a given use case, and to make it possible to compare different explainability algorithms, simply by installing aix360 instead of having to install multiple libraries.
The integration task would involve the following steps:
(1) Update setup to include lime and shap installs.
(2) Create tests that invoke lime and shap explainers and link these to travis.yml
(3) Write appropriate wrappers around these explainers so users have an option to invoke them in the same manner as other explainers available in aix360.
(4) Create notebooks to illustrate their usage.
(5) Update docs.
References:
LIME: https://github.com/marcotcr/lime
SHAP: https://github.com/slundberg/shap
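Step (3) might look like the following sketch; the class name and method signature here are invented for illustration and are not aix360's actual API:

```python
class LocalExplainerWrapper:
    """Hypothetical adapter giving third-party explainers a common
    explain_instance(x) entry point, in the spirit of step (3)."""

    def __init__(self, backend):
        self.backend = backend  # e.g. a configured lime or shap explainer

    def explain_instance(self, x, **kwargs):
        # Delegate to whatever native method the backend exposes.
        return self.backend.explain_instance(x, **kwargs)

# Usage with a stand-in backend (no lime/shap install needed here):
class FakeBackend:
    def explain_instance(self, x, **kwargs):
        return {'input': x, 'weights': [0.0] * len(x)}

wrapped = LocalExplainerWrapper(FakeBackend())
print(wrapped.explain_instance([1, 2, 3]))
```

In a real integration the wrapper would also normalize the return type, so that results from different backends are directly comparable.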
I'm trying to explain a CIFAR-10 classification model with CEMExplainer. By messing around with the parameters I got a result for the pertinent negative, but I've been having trouble getting any results for pertinent positives... What am I doing wrong?
mymodel = KerasClassifier(model)
explainer = CEMExplainer(mymodel)
img = x_test[0]
arg_max_iter = 1000
arg_init_const = 10.0
arg_b = 9
arg_kappa = 0.05
arg_beta = 1e-1
arg_gamma = 100
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
arg_mode = "PP" # Find pertinent positive
(adv_pp, delta_pp, info_pp) = explainer.explain_instance(np.expand_dims(img, axis=0), arg_mode, ae, arg_kappa, arg_b, arg_max_iter, arg_init_const, arg_beta, arg_gamma)