Comments (2)
That's not possible: DiCE explanations are always truthful to the ML model by definition. In other words, we simply tweak the input instance until the same ML model gives a different prediction. It is difficult to say more without looking at your code. Perhaps you are missing something when creating the validation set. For instance, are you normalizing the continuous features and one-hot encoding the categorical features in the validation data? You can use DiCE's data interface to create a validation set as follows:
dataset = helpers.load_adult_income_dataset()
d = dice_ml.Data(dataframe=dataset, continuous_features=['age', 'hours_per_week'], outcome_name='income')
train, test = d.split_data(d.normalize_data(d.one_hot_encoded_data))
X_test = test.loc[:, test.columns != 'income']
y_test = test.loc[:, test.columns == 'income']
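To make the preprocessing point concrete, here is a minimal plain-Python sketch of what has to happen to every validation row before it reaches the model. Continuous features are min-max scaled with the *training* minimum and maximum, and categorical features are one-hot encoded against the *training* category list. The helper names (`fit_preprocessor`, `transform_row`), the toy rows and the `<feature>_<category>` column naming are hypothetical illustrations, not part of dice_ml; the snippet above relies on DiCE's data interface for this step.

```python
# Hypothetical helpers illustrating "fit on train, apply everywhere"
# preprocessing; these are NOT dice_ml functions.

def fit_preprocessor(train_rows, continuous, categorical):
    """Record min/max of each continuous feature and the category list of
    each categorical feature, using the training rows only."""
    stats = {}
    for feat in continuous:
        values = [row[feat] for row in train_rows]
        stats[feat] = (min(values), max(values))
    categories = {feat: sorted({row[feat] for row in train_rows})
                  for feat in categorical}
    return stats, categories


def transform_row(row, stats, categories):
    """Min-max scale continuous features and one-hot encode categorical
    features with the statistics fitted on the training data."""
    out = {}
    for feat, (lo, hi) in stats.items():
        span = (hi - lo) or 1.0  # guard against a constant training feature
        out[feat] = (row[feat] - lo) / span
    for feat, cats in categories.items():
        for cat in cats:
            # Assumed column naming for this sketch: '<feature>_<category>'.
            out[f"{feat}_{cat}"] = 1.0 if row[feat] == cat else 0.0
    return out


train = [
    {"age": 17, "hours_per_week": 20, "gender": "Female"},
    {"age": 90, "hours_per_week": 60, "gender": "Male"},
    {"age": 45, "hours_per_week": 40, "gender": "Male"},
]
stats, categories = fit_preprocessor(train, ["age", "hours_per_week"], ["gender"])

# A validation row must go through the same transform as the training rows.
validation_row = {"age": 31, "hours_per_week": 40, "gender": "Female"}
print(transform_row(validation_row, stats, categories))
```

Fitting the statistics on the validation data itself, or skipping one of the two steps, feeds the model inputs on a different scale than it was trained on. In this sketch a category that never occurs in the training rows simply gets all-zero one-hot columns; that is a design choice here, not necessarily what DiCE does.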
For reference, here is sample code implementing your logic; it gave me valid results.
import dice_ml
from dice_ml.utils import helpers
import tensorflow as tf
from tensorflow import keras
print(tf.__version__) # 2.1.0
# creating a testing dataset - the inbuilt ML model in DiCE for adult data is trained only on the 'train' data below
dataset = helpers.load_adult_income_dataset()
d = dice_ml.Data(dataframe=dataset, continuous_features=['age', 'hours_per_week'], outcome_name='income')
train, test = d.split_data(d.normalize_data(d.one_hot_encoded_data))
X_test = test.loc[:, test.columns != 'income']
y_test = test.loc[:, test.columns == 'income']
# normalize age=31 using the min-max range of the training data
normalized_age = (31-d.train_df['age'].min())/((d.train_df['age'].max()-d.train_df['age'].min())) # evaluates to 0.1917808219178082
# we can verify if the above number is correct using the following
# (normalized_age*(d.train_df['age'].max() - d.train_df['age'].min())) + d.train_df['age'].min() # should give you 31
my_test = X_test[X_test['age']==normalized_age]
print(my_test.shape) # (187,29)
# there are 187 test instances with age = 31; I'm choosing the first one below as an example.
# create a test instance dictionary
my_test_instance = {}
for feature in d.feature_names:
    if feature in d.continuous_feature_names:
        # undo the min-max normalization using the training data range
        my_test_instance[feature] = (my_test[feature].iloc[0]*(d.train_df[feature].max() - d.train_df[feature].min())) + d.train_df[feature].min()
    else:
        # recover the category from whichever one-hot column is set to 1
        encoded_features = [feat for feat in d.encoded_feature_names if feat.startswith(feature+'_')]
        for encoded_feat in encoded_features:
            if my_test.iloc[0][encoded_feat] == 1.0:
                my_test_instance[feature] = encoded_feat.split(feature+'_')[1]
print(my_test_instance)
# {'age': 31.0,
# 'workclass': 'Private',
# 'education': 'HS-grad',
# 'marital_status': 'Single',
# 'occupation': 'Blue-Collar',
# 'race': 'White',
# 'gender': 'Female',
# 'hours_per_week': 40.0}
d = dice_ml.Data(dataframe=dataset, continuous_features=['age', 'hours_per_week'], outcome_name='income')
backend = 'TF'+tf.__version__[0] # TF2
ML_modelpath = helpers.get_adult_income_modelpath(backend=backend)
m = dice_ml.Model(model_path=ML_modelpath, backend=backend)
exp = dice_ml.Dice(d, m)
# changing every feature except age
dice_exp = exp.generate_counterfactuals(my_test_instance, total_CFs=4, desired_class="opposite", features_to_vary=['workclass', 'education', 'marital_status', 'occupation', 'race', 'gender', 'hours_per_week'])
# visualize the results
dice_exp.visualize_as_list(show_only_changes=True) # prints the following
# Query instance (original outcome : 0)
# [31.0, 'Private', 'HS-grad', 'Single', 'Blue-Collar', 'White', 'Female', 40.0, 0.019464194774627686]
#
# Diverse Counterfactual set (new outcome : 1)
# ['-', 'Self-Employed', '-', 'Married', 'White-Collar', '-', '-', 48.0, 0.75]
# ['-', '-', 'Doctorate', 'Married', '-', '-', '-', 26.0, 0.697]
# ['-', '-', 'Masters', 'Married', '-', '-', '-', '-', 0.749]
# ['-', '-', 'Prof-school', 'Married', '-', '-', '-', 58.0, 0.858]
# check that re-running the model on each counterfactual gives the same prediction DiCE reported
for ix, cf in enumerate(exp.final_cfs):
model_pred = exp.predict_fn(cf)
cf_pred = exp.cfs_preds[ix]
print(model_pred, cf_pred)
# prints the following
# [[0.75035185]] [[0.75035185]]
# [[0.69670826]] [[0.69670826]]
# [[0.73952144]] [[0.73952144]]
# [[0.85771877]] [[0.85771877]]
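The check above can be extended slightly: besides confirming that re-running the model on each counterfactual reproduces the stored probability, one can confirm that the predicted class really differs from the query's. Below is a minimal sketch in which a hypothetical logistic model and hand-made encoded rows stand in for `exp.predict_fn` and `exp.final_cfs`; the weights, the 0.5 threshold, the tolerance and all names are assumptions for illustration, not DiCE internals.

```python
import math

# Hypothetical stand-in for a trained binary classifier: a logistic model
# over already-encoded features (weights are made up for illustration).
WEIGHTS = {"age": 1.5, "hours_per_week": 2.0, "marital_status_Married": 2.5}
BIAS = -3.0


def predict_proba(row):
    """Return P(class = 1) for one encoded row."""
    z = BIAS + sum(w * row.get(feat, 0.0) for feat, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def check_counterfactuals(query, cfs, stored_preds, threshold=0.5, tol=1e-6):
    """Recompute each counterfactual's prediction, compare it with the stored
    value, and check that its class is the opposite of the query's class."""
    query_class = int(predict_proba(query) >= threshold)
    results = []
    for cf, stored in zip(cfs, stored_preds):
        recomputed = predict_proba(cf)
        matches = abs(recomputed - stored) <= tol
        flipped = int(recomputed >= threshold) != query_class
        results.append((recomputed, matches, flipped))
    return query_class, results


query = {"age": 0.19, "hours_per_week": 0.4, "marital_status_Married": 0.0}
cfs = [
    {"age": 0.19, "hours_per_week": 0.5, "marital_status_Married": 1.0},
    {"age": 0.19, "hours_per_week": 0.6, "marital_status_Married": 0.0},
]
# First stored value is consistent with the model; the second is deliberately
# wrong, e.g. as if it had been computed on differently preprocessed input.
stored = [predict_proba(cfs[0]), 0.75]

query_class, results = check_counterfactuals(query, cfs, stored)
print("query class:", query_class)
for recomputed, matches, flipped in results:
    print(round(recomputed, 3), "matches stored:", matches, "class flipped:", flipped)
```

In a consistent pipeline every row reports both `matches` and `flipped` as `True`, as in the output above. A `False` in either column points to a preprocessing or bookkeeping mismatch rather than to the explainer itself.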
Thanks! It's very kind of you to provide the code. It helped me find my bug.