
Comments (2)

raam93 commented on July 27, 2024

That's not possible; DiCE explanations are always faithful to the ML model by definition. In other words, we simply tweak the input instance until we get a different prediction from the same ML model. It is difficult to say more without looking at your code. Perhaps you are missing a step while creating the validation set. For instance, are you normalizing the continuous features and one-hot encoding the categorical features in the validation data? You can use DiCE's data interface to create a validation set as follows:

dataset = helpers.load_adult_income_dataset()
d = dice_ml.Data(dataframe=dataset, continuous_features=['age', 'hours_per_week'], outcome_name='income')
train, test = d.split_data(d.normalize_data(d.one_hot_encoded_data))
X_test = test.loc[:, test.columns != 'income']
y_test = test.loc[:, test.columns == 'income']

For your reference, I have included sample code implementing your logic that gave me valid results.

import dice_ml
from dice_ml.utils import helpers

import tensorflow as tf
from tensorflow import keras

print(tf.__version__) # 2.1.0

# creating a testing dataset - the inbuilt ML model in DiCE for adult data is trained only on the 'train' data below
dataset = helpers.load_adult_income_dataset()
d = dice_ml.Data(dataframe=dataset, continuous_features=['age', 'hours_per_week'], outcome_name='income')
train, test = d.split_data(d.normalize_data(d.one_hot_encoded_data))
X_test = test.loc[:, test.columns != 'income']
y_test = test.loc[:, test.columns == 'income']

# get normalized age=31
normalized_age = (31 - d.train_df['age'].min()) / (d.train_df['age'].max() - d.train_df['age'].min()) # should print 0.1917808219178082

# we can verify if the above number is correct using the following
# (normalized_age*(d.train_df['age'].max() - d.train_df['age'].min())) + d.train_df['age'].min() # should give you 31

my_test = X_test[X_test['age']==normalized_age]
print(my_test.shape) # (187,29)
# there are 187 instances with age=31 in our data; I'm choosing the first one below as an example.

# create a test instance dictionary
my_test_instance = {}
for feature in d.feature_names:
    if feature in d.continuous_feature_names:
        my_test_instance[feature] = (my_test[feature].iloc[0]*(d.train_df[feature].max() - d.train_df[feature].min())) + d.train_df[feature].min()
    else:
        encoded_features = [feat for feat in d.encoded_feature_names if feat.startswith(feature)]
        for encoded_feat in encoded_features:
            if my_test.iloc[0][encoded_feat] == 1.0:
                my_test_instance[feature] = encoded_feat.split(feature+'_')[1]

print(my_test_instance)
# {'age': 31.0,
#  'workclass': 'Private',
#  'education': 'HS-grad',
#  'marital_status': 'Single',
#  'occupation': 'Blue-Collar',
#  'race': 'White',
#  'gender': 'Female',
#  'hours_per_week': 40.0}

d = dice_ml.Data(dataframe=dataset, continuous_features=['age', 'hours_per_week'], outcome_name='income')

backend = 'TF'+tf.__version__[0] # TF2
ML_modelpath = helpers.get_adult_income_modelpath(backend=backend)
m = dice_ml.Model(model_path= ML_modelpath, backend=backend)

exp = dice_ml.Dice(d, m)

# changing every feature except age
dice_exp = exp.generate_counterfactuals(my_test_instance, total_CFs=4, desired_class="opposite", features_to_vary=['workclass', 'education', 'marital_status', 'occupation', 'race', 'gender', 'hours_per_week'])

# visualize the results
dice_exp.visualize_as_list(show_only_changes=True) # prints the following
# Query instance (original outcome : 0)
# [31.0, 'Private', 'HS-grad', 'Single', 'Blue-Collar', 'White', 'Female', 40.0, 0.019464194774627686]
#
# Diverse Counterfactual set (new outcome : 1)
# ['-', 'Self-Employed', '-', 'Married', 'White-Collar', '-', '-', 48.0, 0.75]
# ['-', '-', 'Doctorate', 'Married', '-', '-', '-', 26.0, 0.697]
# ['-', '-', 'Masters', 'Married', '-', '-', '-', '-', 0.749]
# ['-', '-', 'Prof-school', 'Married', '-', '-', '-', 58.0, 0.858]

# To check that the predictions are indeed equal
for ix, cf in enumerate(exp.final_cfs):
    model_pred = exp.predict_fn(cf)
    cf_pred = exp.cfs_preds[ix]
    print(model_pred, cf_pred)
# prints the following
# [[0.75035185]] [[0.75035185]]
# [[0.69670826]] [[0.69670826]]
# [[0.73952144]] [[0.73952144]]
# [[0.85771877]] [[0.85771877]]
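As a sanity check outside DiCE, the min-max round trip used above to recover the raw age can be sketched on its own. Note the age bounds 17 and 90 below are assumptions inferred from the printed value 0.1917808219178082 (i.e. (31 - 17) / (90 - 17)), not values read from your data:

```python
def minmax_normalize(x, lo, hi):
    """Scale x from [lo, hi] to [0, 1]."""
    return (x - lo) / (hi - lo)

def minmax_denormalize(z, lo, hi):
    """Invert minmax_normalize: map z in [0, 1] back to [lo, hi]."""
    return z * (hi - lo) + lo

# assumed age bounds, consistent with the normalized value printed above
AGE_MIN, AGE_MAX = 17, 90

z = minmax_normalize(31, AGE_MIN, AGE_MAX)   # ~0.1918, matching the value above
x = minmax_denormalize(z, AGE_MIN, AGE_MAX)  # recovers ~31.0
print(z, x)
```

If the round trip does not recover the raw value, the min/max used at prediction time differ from those used at training time, which is exactly the mismatch that produces inconsistent validation results.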


sina-salek commented on July 27, 2024

Thanks! It's very kind of you to provide the code. It helped me find my bug.

