Comments (4)
Hey Yan, thanks for this question! Yes, DiCE one-hot-encodes categorical features internally, and we assume the model is also trained on one-hot-encoded data. This means the optimization has to run over more columns, but we didn't want to assume any particular ordering of the values of a categorical feature.
However, if your model is trained on label-encoded data, you can simply present all features to DiCE as continuous features and it should work. For instance, for the adult data, you could do something like:
d = dice_ml.Data(dataframe=dataset, continuous_features=list(dataset.columns), outcome_name='income')
In that case, when DiCE outputs the CFs, categorical features appear as integers (0 for Government, 2 for Private, etc.) instead of strings ("Government" or "Private").
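A minimal sketch of the label-encoding step, using only pandas. The toy dataframe and the `encode_categoricals` helper are hypothetical and not part of DiCE; the `dice_ml.Data` call appears only as a comment, copied from the suggestion above. Keeping the code-to-label mapping is one way to translate the integers in the CFs back to strings afterwards.

```python
import pandas as pd

def encode_categoricals(df, columns):
    """Label-encode the given columns (hypothetical helper, not a DiCE API).

    Returns an encoded copy of df and, per column, a dict mapping
    integer code -> original label.
    """
    encoded = df.copy()
    mappings = {}
    for col in columns:
        cat = encoded[col].astype("category")
        # Categories are sorted, so codes follow alphabetical order here.
        mappings[col] = dict(enumerate(cat.cat.categories))
        encoded[col] = cat.cat.codes.astype("int64")
    return encoded, mappings

# Toy stand-in for the adult data (made-up rows, for illustration only).
toy = pd.DataFrame({
    "age": [25, 40, 33, 58],
    "workclass": ["Private", "Government", "Private", "Self-Employed"],
    "income": [0, 1, 0, 1],
})
encoded, mappings = encode_categoricals(toy, ["workclass"])

# The encoded frame would then be handed to DiCE as suggested above:
# d = dice_ml.Data(dataframe=encoded, continuous_features=list(encoded.columns), outcome_name='income')

# Translate integer codes (e.g. from a CF dataframe) back to labels.
decoded = encoded["workclass"].map(mappings["workclass"])
```

Note that the integer assigned to each label depends on the encoder you use, so the codes here need not match the ones your model was trained with; reuse the same mapping for training and decoding.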
from dice.
Thanks, that seems like a very good idea! So if I treat workclass as a continuous feature, should I list every value as in the following format, or just give it an interval, i.e. 'workclass': [0, 3]?
d = dice_ml.Data(features={
        'age': [17, 90],
        'workclass': [0, 1, 2, 3],
        'education': [0, 1, 2, 3, 4, 5, 6, 7],
        'marital_status': [0, 1, 2, 3, 4],
        'occupation': [0, 1, 2, 3, 4, 5],
        'race': [0, 1],
        'gender': [0, 1],
        'hours_per_week': [1, 99]},
    outcome_name='income')
One more concern: if we enter workclass as a continuous feature, is it possible that the generated counterfactual examples contain 1.5 for workclass rather than just 0, 1, 2?
I also tried to use a 3D dataset, and it seems the library is not compatible with a 3-dimensional input dataset and a 3D model, e.g. a convolutional neural network.
Look forward to your advice! Thank you very much.
Yan
You need not provide the range, since it is inferred from the data. Just pass all feature names as a list to the continuous_features parameter, like this:
d = dice_ml.Data(dataframe=dataset, continuous_features=list(dataset.columns), outcome_name='income')
Make sure that the categorical features are of type int or float in your case, i.e., 'workclass' takes the integer values 0, 1, 2, and 3.
The precision of a variable is also inferred from the data, so if a variable takes only 0, 1, and 2, then the resulting CF will take only one of these 3 values and nothing else. Try it out and let me know if you get something else.
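One way to try this out is to compare the values in the generated CFs against the values seen in the training data. The sketch below assumes you have already pulled the CFs into a plain pandas DataFrame (how to do that depends on your DiCE version); `unseen_values` is a hypothetical helper, and the two CF frames are hand-made stand-ins rather than real DiCE output.

```python
import pandas as pd

def unseen_values(train_df, cf_df, columns):
    """Report, per column, any CF values that never occur in the training data.

    Hypothetical helper for checking label-encoded features; not a DiCE API.
    """
    report = {}
    for col in columns:
        seen = set(train_df[col].tolist())
        extra = set(cf_df[col].tolist()) - seen
        if extra:
            report[col] = sorted(extra)
    return report

# Label-encoded training data (made-up rows, for illustration only).
train = pd.DataFrame({"workclass": [0, 1, 2, 3, 1], "gender": [0, 1, 1, 0, 0]})

# Hand-made stand-ins for CF output: one well-behaved, one with a fractional code.
cfs_ok = pd.DataFrame({"workclass": [2, 3], "gender": [1, 0]})
cfs_bad = pd.DataFrame({"workclass": [1.5, 3], "gender": [1, 0]})

ok_report = unseen_values(train, cfs_ok, ["workclass", "gender"])
bad_report = unseen_values(train, cfs_bad, ["workclass", "gender"])
```

An empty report means every CF value for those columns is one of the observed codes; a non-empty one lists the offending values per column.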
Yes, currently DiCE supports only 2D tabular data with rows as observations and columns as features. We have not tested on image and related datasets yet, but it will be an interesting experiment (both conceptually and empirically) to do and we are looking forward to it at some point in the future!