pair-code / what-if-tool
Source code/webpage/demos for the What-If Tool
Home Page: https://pair-code.github.io/what-if-tool
License: Apache License 2.0
Hi,
I was trying to run the toy CelebA model using TF Serving and I couldn't connect to the model.
Here is the command:
docker run -p 8500:8500 --mount type=bind,source='/Users/user/Downloads/wit_testing',target=/models/wit_testing -e MODEL_NAME=wit_testing -it tensorflow/serving
I ran TensorBoard on localhost:6006 and then I configured the WIT tool as follows:
I serialized the model to a ProtoBuf as instructed and serialized the data to a TFRecords file, but:
The photos on the right do not display the way they do in the web demo; there are just dots representing the datapoints.
Whenever I try to run an inference call to the model, I get a bad request error (500). The error I am getting:
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "Expects arg[0] to be float but string is provided"
debug_error_string = "{"created":"@1560888632.009402000","description":"Error received from peer ipv6:[::1]:8500","file":"src/core/lib/surface/call.cc","file_line":1041,"grpc_message":"Expects arg[0] to be float but string is provided","grpc_status":3}"
Is there some convention in naming and paths here that I am missing? Or am I doing something else completely wrong?
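For anyone hitting the same thing: the "Expects arg[0] to be float but string is provided" message suggests the served signature expects float tensors, while WIT sends serialized tf.Example strings. A minimal sketch of exporting with a parsing input receiver so the signature accepts serialized Examples; classifier and feature_spec stand in for the demo's trained estimator and its parsing spec, and this is an assumption rather than a confirmed fix:
import tensorflow as tf

# `classifier` is assumed to be the trained estimator from the demo and
# `feature_spec` its tf.Example parsing spec; both are placeholders.
serving_input_fn = (
    tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec))
classifier.export_saved_model('/Users/user/Downloads/wit_testing', serving_input_fn)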
I worked in notebook mode and was successfully able to project attribution scores (using the Shapley algorithm) onto the WIT dashboard. Due to a bigger data size, I then tried the visualization in TensorBoard mode. The instructions on the documentation page mention only two requirements: 1. an ML model in TF Serving format and 2. a TFRecord file of the example dataset. There isn't any mention of generating or uploading attribution values (generated by Google Integrated Gradients or SHAP) in TensorBoard mode. Please suggest whether it's possible to add attribution values in TensorBoard mode, or whether I am missing something.
Age Demo from WIT, from here: https://colab.research.google.com/github/pair-code/what-if-tool/blob/master/WIT_Age_Regression.ipynb
Current behavior:
On executing the command WitWidget(config_builder, height=tool_height_in_px),
I encountered the below error above the WIT extension dashboard:
`Cannot set tensorflow.serving.Regression.value to array([21.733267], dtype=float32): array([21.733267], dtype=float32) has type <class 'numpy.ndarray'>, but expected one of: numbers.Real`
Problem: Performance tab has no output in WIT dashboard
Browser: Chrome
Expected Output:
Performance tab: should work with all features of graphs and plots
Please see what the issue is. I have tried to debug the WitWidget function but have been unable to overcome this error.
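One plausible workaround, assuming the prediction comes back as a one-element numpy array: convert each prediction to a plain Python float before it reaches the Regression proto. The helper name here is hypothetical:
import numpy as np

def to_scalar_predictions(preds):
  # tensorflow.serving.Regression.value must be a Python number
  # (numbers.Real), not a numpy array.
  return [float(np.asarray(p).item()) for p in preds]

print(to_scalar_predictions([np.array([21.733267], dtype=np.float32)]))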
Thanks.
It would be nice to have a mode where one can click on a feature in the datapoint editor, edit it, and have that edit take effect for ALL datapoints, not just a single one.
Need to think about proper UI for that experience.
Hi, I encounter an error while executing jupyter nbextension install --py --symlink --sys-prefix witwidget:
Traceback (most recent call last):
  File "/usr/bin/jupyter-nbextension", line 11, in <module>
    load_entry_point('notebook==5.2.2', 'console_scripts', 'jupyter-nbextension')()
  File "/usr/lib/python3/dist-packages/jupyter_core/application.py", line 266, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "/usr/lib/python3/dist-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/usr/lib/python3/dist-packages/notebook/nbextensions.py", line 988, in start
    super(NBExtensionApp, self).start()
  File "/usr/lib/python3/dist-packages/jupyter_core/application.py", line 255, in start
    self.subapp.start()
  File "/usr/lib/python3/dist-packages/notebook/nbextensions.py", line 716, in start
    self.install_extensions()
  File "/usr/lib/python3/dist-packages/notebook/nbextensions.py", line 695, in install_extensions
    **kwargs
  File "/usr/lib/python3/dist-packages/notebook/nbextensions.py", line 211, in install_nbextension_python
    m, nbexts = _get_nbextension_metadata(module)
  File "/usr/lib/python3/dist-packages/notebook/nbextensions.py", line 1122, in _get_nbextension_metadata
    m = import_item(module)
  File "/usr/lib/python3/dist-packages/traitlets/utils/importstring.py", line 42, in import_item
    return __import__(parts[0])
  File "/home/linin/.local/lib/python3.6/site-packages/witwidget/__init__.py", line 15, in <module>
    from witwidget.notebook.visualization import *
  File "/home/linin/.local/lib/python3.6/site-packages/witwidget/notebook/visualization.py", line 27, in <module>
    from witwidget.notebook.jupyter.wit import *  # pylint: disable=wildcard-import,g-import-not-at-top
  File "/home/linin/.local/lib/python3.6/site-packages/witwidget/notebook/jupyter/wit.py", line 25, in <module>
    from witwidget.notebook import base
  File "/home/linin/.local/lib/python3.6/site-packages/witwidget/notebook/base.py", line 26, in <module>
    from six import ensure_str
ImportError: cannot import name 'ensure_str'
But I can import ensure_str in both my Python 2 and Python 3, so where could it go wrong?
Thanks a lot.
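A likely cause, assuming the kernel picks up an old copy of six: ensure_str was added in six 1.12, so older installs raise exactly this ImportError. Upgrading six in the environment that runs Jupyter (e.g. pip install --upgrade six) may fix it. A quick check of what the kernel actually sees:
import six

# ensure_str exists only in six >= 1.12; print the version and file path to
# confirm which copy the Jupyter process is importing.
print(six.__version__, six.__file__)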
In the partial dependence plot view, there is a sort by variation button to sort features by how much their partial dependence plots vary (total Y axis distance traveled across the chart). Also, if comparing two models, each feature is ranked by its largest Y axis distance traveled across the two models' PD plots for that feature.
This information should be displayed in an information popup next to the sort button, like we have with other non-obvious buttons/controls.
I am working on several different tools that use the What-If tool for interactive visuals. It seems like I keep running into the same error where I see 'Loading Widget...' and no visual.
Currently, I am working in the JupyterLab Environment of Google Cloud's AI Platform.
I did the following installations:
pip install ipywidgets
jupyter nbextension enable --py widgetsnbextension
conda install -c conda-forge nodejs
jupyter nbextension install --py --user witwidget
jupyter nbextension enable witwidget --user --py
jupyter labextension install @jupyter-widgets/jupyterlab-manager
And jupyter labextension list shows the following:
JupyterLab v1.2.16
Known labextensions:
app dir: /opt/conda/share/jupyter/lab
@jupyter-widgets/jupyterlab-manager v1.1.0 enabled OK
@jupyterlab/celltags v0.2.0 enabled OK
@jupyterlab/git v0.10.1 enabled OK
js v0.1.0 enabled OK
jupyter-matplotlib v0.7.2 enabled OK
jupyterlab-plotly v4.8.1 enabled OK
nbdime-jupyterlab v1.0.0 enabled OK
plotlywidget v4.8.1 enabled OK
wit-widget v1.6.0 enabled OK
It would be helpful if you had any insight on why this problem might be occurring!
Hi James/Team,
Could I have a notebook file (.ipynb) for multiclass classification, as I am working on the same case?
Thanks in advance !!
The witwidget does not seem to work in JupyterLab 2.x versions. Whenever I use witwidget with JupyterLab 2.x, I get an error even before running WitWidget.
But when I use JupyterLab 1.x, everything works fine. I guess the witwidget has not been ported to JupyterLab 2.x. The JupyterLab docs contain an extension migration guide, which may help in updating the widget for JupyterLab 2.x.
Could you start tagging the releases here on Github?
Add a control for users to send feedback about the tool to the WIT team.
Investigate how to best do this. Could just be a bug/feedback button that links to a new github issue as the simplest approach.
hi, thanks for sharing all your awesome work! 👍
I was exploring the UCI dataset on the web demo while reading the paper, and it looks to me like there might be a bug in how the UI state updates to color which elements of the counterfactual are different. Alternatively, I might just be misunderstanding the UX :)
I'm expecting that when I look at a data point, the attributes of the counterfactual that are different will be shown in green, like the "occupation" and "relationship" values here:
To reproduce:
1. Enable showing counterfactuals.
2. Notice that "occupation" and "relationship" are highlighted in green, which is in line with what I'd expect since they're different:
3. Click on the highest "<50k" data point, colored blue and highlighted here:
4. Check out the counterfactual.
It looks like some attributes that are the same are highlighted in green, which is not what I would expect. In this screenshot, I'd expect "occupation" to be green but "relationship" to be standard black text.
Note that the highlighting behavior is different if you clear the selection and then click directly on the data point from step 4. That shows these attributes highlighted, as I'd expected:
So this might be a UX misunderstanding, and maybe I'm not understanding how the counterfactual computation is supposed to interact with the selection. But since the behavior differs depending on the order of these actions, I suspect it's a UI bug in updating in response to state changes. I poked around a bit, and it seemed like this might be where the syncing happens between selection interactions, changing the values, and rendering the color.
Thanks! Let me know if there's anything else I can provide that's helpful for debugging.
As per package.json, this project is using bazel 0.23.2.
Line 35 in a4ada74
However, the WORKSPACE file requires bazel 0.26.1.
Line 19 in a4ada74
I tried yarn add @bazel/[email protected], and the build can start but always fails at some bazel rules package, like error loading package 'node_modules/@schematics/update/node_modules/rxjs/src/operators': Unable to find package for @build_bazel_rules_typescript//:defs.bzl: The repository '@build_bazel_rules_typescript' could not be resolved.
Or ERROR: error loading package '': in .../org_tensorflow_tensorboard/third_party/workspace.bzl: in .../npm_bazel_typescript/index.bzl: in .../npm_bazel_typescript/internal/ts_repositories.bzl: Unable to load file '@build_bazel_rules_nodejs//:index.bzl': file doesn't exist
(when I tried to upgrade the @bazel/typescript package to latest).
What are the correct versions of bazel, bazel rules, etc., to use?
I can successfully run the demo script COMPAS Recidivism Classifier in the Jupyter notebook on my local machine. But nothing shows up at the end of the notebook. I assume the visualization interface should show up after I run WitWidget(config_builder, height=tool_height_in_px).
And I have installed all the extensions at the beginning of the Jupyter notebook (Python 3):
! jupyter nbextension install --py --symlink --sys-prefix witwidget
! jupyter nbextension enable --py --sys-prefix witwidget
When a TF serving model returns a failure instead of a prediction, pass the failure string to the front-end for display instead of the generic http error we see now.
Similar to issue #37, I would like to use WIT with a TFX pipeline. I am trying this out with the Iris/Native Keras example from TFX (https://github.com/tensorflow/tfx/blob/master/tfx/examples/iris/iris_pipeline_native_keras.py). I have tried both set_custom_predict_fn and set_estimator_and_feature_spec. Both allow me to load WIT, but the Predict button cannot be used. In the set_custom_predict_fn case, WIT gives me the error "AttributeError("'list' object has no attribute 'SerializeToString'",)". In the set_estimator_and_feature_spec case, WIT gives me the error "AttributeError("'str' object has no attribute 'predict'",)".
Here's the code in a Colab: https://colab.research.google.com/drive/1tfUZ4MLT2Ynj8LNeghOUnL7iBstEQTgv
Which is the correct way to use the WitConfigBuilder with a TFX model, and how do I correct the error?
Thanks!
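For reference, a minimal sketch of one way to wire a TFX-trained Keras SavedModel into WIT via set_custom_predict_fn. The model path, FEATURE_NAMES, and the assumption that the function receives tf.train.Example protos are all placeholders rather than the confirmed answer:
import tensorflow as tf
from witwidget.notebook.visualization import WitConfigBuilder, WitWidget

FEATURE_NAMES = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
model = tf.keras.models.load_model('serving_model/iris')  # hypothetical path

def custom_predict(examples):
  # Convert the tf.train.Example protos into the dense float matrix the
  # Keras model expects.
  rows = [[ex.features.feature[name].float_list.value[0]
           for name in FEATURE_NAMES] for ex in examples]
  return model.predict(tf.constant(rows))

# `examples` below is the list of tf.train.Example protos to visualize.
config_builder = (WitConfigBuilder(examples)
                  .set_custom_predict_fn(custom_predict))
WitWidget(config_builder, height=600)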
Hi There,
I am new to using the What-If Tool. I would like to use it to see whether my ML model is fair; I already have a trained XGBoost model (a saved booster object). How can I use this model with the What-If Tool, and is this even possible? I notice in the user-modifiable notebook WIT-from scratch.ipynb that you use classifier = tf.estimator.LinearClassifier.
What if I already have a saved model of the form I mentioned above (XGBoost)? Can I still use the What-If Tool? I am concerned that TensorFlow does not support these booster-object-type models. Any help would be appreciated! Thanks!
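For what it's worth, WIT's set_custom_predict_fn accepts any Python function, so a non-TF model can sit behind it. A minimal sketch with an XGBoost booster, where the model path, the feature handling, and the binary-classification setup are assumptions:
import numpy as np
import xgboost as xgb
from witwidget.notebook.visualization import WitConfigBuilder, WitWidget

booster = xgb.Booster()
booster.load_model('model.bst')  # hypothetical path to the saved booster

def custom_predict(examples):
  # `examples` are assumed to arrive as lists of numeric feature values.
  preds = booster.predict(xgb.DMatrix(np.array(examples)))
  # Binary classification in WIT expects [P(class 0), P(class 1)] per example.
  return np.stack([1 - preds, preds], axis=1)

# `examples` and `feature_names` are placeholders for your data.
config_builder = (WitConfigBuilder(examples, feature_names)
                  .set_custom_predict_fn(custom_predict))
WitWidget(config_builder, height=600)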
Hi James and Team,
config_builder = (WitConfigBuilder(test_examples.tolist(), X_test.columns.tolist() + ["total_time"])
.set_custom_predict_fn(adjust_prediction)
.set_target_feature('total_time')
.set_model_type('regression'))
WitWidget(config_builder, height=800)
Can we return HTML or something else from the above code that we can render on a frontend? I want to use this in one of my applications.
Just like in Plotly, where we can return the HTML or open the plot in a new web page.
For both classification and regression models, we now have a global attributions table in the performance tab, if the model returns attributions along with predictions.
We need to appropriately style and position this table for both model types and for one or two models.
it says party :)
I'm trying to add some functionality of my own to the what-if-tool dashboard. I followed https://github.com/PAIR-code/what-if-tool/blob/master/DEVELOPMENT.md to set up and rebuild the package.
Because I just use a Jupyter notebook to play around with the what-if-tool, I simply followed:
a. rm -rf /tmp/wit-pip (if it already exists)
b. bazel run witwidget/pip_package:build_pip_package
c. Install the package
For use in Jupyter notebooks, install and enable the locally-built pip package per the instructions in the README, but instead use pip install <pathToBuiltPipPackageWhlFile>, then launch the jupyter notebook kernel.
But after I run step b, I don't know where <pathToBuiltPipPackageWhlFile> is. I can't find any files ending with .whl in the folder. Maybe I missed something; I'd appreciate any help.
There is also a WARNING when I run step b. Not sure if it causes the problem:
WARNING: Download from https://mirror.bazel.build/repo1.maven.org/maven2/com/google/javascript/closure-compiler-unshaded/v20190909/closure-compiler-unshaded-v20190909.jar failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException GET returned 404 Not Found
For classification models, we display a PR curve in the performance tab. We should calculate the area under this curve and display it above the curve, in the title for the chart.
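For reference, a minimal sketch of the calculation with scikit-learn (one reasonable way to compute it, not necessarily what the front-end would use; the labels and scores are hypothetical):
from sklearn.metrics import auc, precision_recall_curve

y_true = [0, 1, 1, 0, 1]             # hypothetical labels
y_score = [0.2, 0.8, 0.6, 0.3, 0.9]  # hypothetical model scores

precision, recall, _ = precision_recall_curve(y_true, y_score)
print(f'PR AUC: {auc(recall, precision):.3f}')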
When comparing two models, we could calculate rank correlation (at least for binary classification and regression models). Rank correlation is a number indicating how closely the two models' scores agree in their ordering of the test examples.
Need to think about where this info would go though. Would be valuable to calculate on slices as well, when user is slicing in performance tab.
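For concreteness, a minimal sketch of one such coefficient (Spearman's rho via scipy; the scores are hypothetical and the choice of coefficient is open):
from scipy.stats import spearmanr

# Hypothetical prediction scores from two models over the same examples,
# in the same order.
scores_a = [0.91, 0.40, 0.77, 0.12, 0.55]
scores_b = [0.88, 0.35, 0.80, 0.20, 0.47]

# rho is +1 when the two models rank the examples identically.
rho, _ = spearmanr(scores_a, scores_b)
print(f'Rank correlation: {rho:.3f}')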
I have read the code of the two repos, but I found that they are not exactly the same. What's the difference between them? For example, which one will be updated more frequently, which one will have a newer version, etc.?
I'm representing TF SIG Build, a TensorFlow special interest group dedicated to building and testing TF in open source. Our last meeting surfaced confusion from community members involved in packaging TF for other environments (e.g. Gentoo, Anaconda) about tensorboard-plugin-wit, which I think could be resolved with these two asks:
1. Document the tensorflow -> tensorboard -> tensorboard-plugin-wit dependency, which currently points to an empty PyPI page. Why does tensorboard depend on it? (e.g. "it was once part of core tensorboard but was moved to a plugin")
2. The 1.6.0post* patch releases lack a matching tag in this repo. For packagers, a tag for each release means they can rebuild the package in the necessary configuration for their platform, and it helps verify that the package on PyPI really matches up with the code.
These would help a lot!
hello! When doing development, what's a good workflow? I'll share what I discovered in the hope that it helps other folks new to the repo, or perhaps they can help me understand better ways to approach this. The TL;DR is that compilation_level = "BUNDLE" seems useful for development :)
I started with the web demo for the smiling classifier, and it seems that the way the build works, changes to the filesystem aren't detected; you have to kill the process and then run the full vulcanize process on each change. This takes about a minute on my laptop, which is what prompted me to look into this.
In the Bazel output I see:
$ bazel run wit_dashboard/demo:imagedemoserver
INFO: Analyzed target //wit_dashboard/demo:imagedemoserver (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
INFO: From Vulcanizing /wit-dashboard/image_index.html:
...
The vulcanizing step takes about a full minute to rebuild after changing just the text in a line of logging. To understand what it's doing, I read through the BUILD files and the related .bzl definitions over in TensorBoard. I noticed that there are references to other libraries in https://github.com/PAIR-code/what-if-tool/blob/master/wit_dashboard/demo/wit-image-demo.html#L19, and figured maybe that's why the build is taking so long. Removing those cut the build time down to ~45 seconds.
Removing wit-dashboard cuts the build down to ~15 seconds. Removing everything but the Polymer bits brings the build down to ~2 seconds.
Stepping back, I see that wit-dashboard includes dependencies in the BUILD file from Polymer, Facets, and TensorBoard (as well as WIT components). If I comment out the WIT dependencies from the BUILD file and from the <link /> tags in wit-dashboard.html, this still takes ~40 seconds to build. So it seems like most of the build time, even when just changing text in a console.log statement, comes from either re-compiling dependencies or from the whole-program optimization the Vulcanize task performs (or maybe that the Closure compiler performs on its behalf).
I tried copying the vulcanize.bzl from TensorBoard into the WIT folder so I could look at it and understand what it's doing. In the process, I noticed some params in the BUILD task that ultimately does the vulcanizing:
tensorboard_html_binary(
name = "imagedemoserver",
testonly = True, # Keeps JavaScript somewhat readable
compile = True, # Run Closure Compiler
input_path = "/wit-dashboard/image_index.html",
output_path = "/wit-dashboard/image_demo.html",
deps = [":imagedemo"],
)
Changing compile = False cuts the build to 2 seconds! But it doesn't work, because somewhere in the project there are goog.require-style dependencies.
Changing the compilation_level helps, though! I found these options in the Closure compiler, and luckily the build task in TensorBoard that calls Closure passes them right along. This gets things working again and down to ~20 seconds. The Closure Bazel defs say to use WHITESPACE_ONLY but warn that it disables type checking (https://github.com/bazelbuild/rules_closure/blob/4925e6228e89e3b051a89ff57b8a033fa3fb9544/README.md#arguments-2). This helps (~10 seconds) but breaks the app. The Closure docs don't mention BUNDLE, but you can see it in the source:
public enum CompilationLevel {
/** BUNDLE Simply orders and concatenates files to the output. */
BUNDLE,
Using this takes about half the time to build compared to SIMPLE_OPTIMIZATIONS.
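For reference, my local edit looks roughly like this, assuming tensorboard_html_binary passes compilation_level through to the Closure compiler (a development-only tweak, not a proposed default):
tensorboard_html_binary(
    name = "imagedemoserver",
    testonly = True,
    compile = True,
    compilation_level = "BUNDLE",  # default is ADVANCED; BUNDLE skips optimization
    input_path = "/wit-dashboard/image_index.html",
    output_path = "/wit-dashboard/image_demo.html",
    deps = [":imagedemo"],
)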
In the end, this is the impact on my sweet 2012 MacBook Pro:
# master, after just changing a logging string
$ bazel run wit_dashboard/demo:imagedemoserver
INFO: Elapsed time: 53.611s, Critical Path: 52.66s
# set compilation_level = "BUNDLE" instead of default ("ADVANCED")
$ bazel run wit_dashboard/demo:imagedemoserver
INFO: Elapsed time: 17.940s, Critical Path: 17.45s
So, I'll do this locally now, but would also love to learn if there are better ways to do this :)
Alternately, I also poked around to see if there was a way to update these calls to listen to a command-line arg or env variable passed through bazel run. I skimmed the Bazel docs and issues and saw things like aspects and bazelrc, but nothing seemed fast and direct. I suppose this could be done in TensorBoard in the tensorboard_html_binary task. But I also discovered that there are WORKSPACE and workspace.bzl tasks here, so maybe that could be a place to add a layer of indirection: the project could call into a tensorboard_html_binary_wrapper that reads some env switch, building for production by default, but running the Closure compiler without the slower advanced optimizations if you do bazel run whatev --be-faster. If doing something like that is helpful I can try, but attempting changes to the build setup is always dicey :)
If that's too much of a pain I can just add a note to https://github.com/PAIR-code/what-if-tool/blob/master/DEVELOPMENT.md to help folks discover how to speed up local builds. Thanks!
EDIT: Also noticed that a while back @stephanwlee was thinking about this upstream in tensorflow/tensorboard#1599, and some other open issues reference related things about advanced compilation mode in dev (e.g., tensorflow/tensorboard#2687).
Hi James and Team,
Is it possible to use my own data but still have the same front end?
There is one point that is now gone: 'Group unaware', which the training suggests clicking. What happened? Thanks.
WIT currently reads only the first 50 datapoints to generate candidates for categorical features used in "Partial Dependence Plots". This can be too restrictive. It should read more data to get a more complete list of categories and choose the most frequent ones for the plots.
The following post is not exactly an error with WIT, but I'm having issues with the output from Google explain, which acts as input for the WIT tool. Please help if possible.
I have a 3d input keras model which trains successfully.
Model: "model"
Layer (type) Output Shape Param #
input_1 (InputLayer) [(None, 5, 1815)] 0
bidirectional (Bidirectional (None, 5, 64) 473088
bidirectional_1 (Bidirection (None, 5, 64) 24832
output (TimeDistributed) (None, 5, 25) 1625
Total params: 499,545
Trainable params: 499,545
Non-trainable params: 0
After that, the estimator is defined and the serving function is created as:
keras_estimator = tf.keras.estimator.model_to_estimator(keras_model=model, model_dir='export')
serving_fn = tf.estimator.export.build_raw_serving_input_receiver_fn(
{'input_1': model.input}
)
export_path = keras_estimator.export_saved_model(
'gs://' + BUCKET_NAME + '/explanations',
serving_input_receiver_fn=serving_fn
).decode('utf-8')
print(export_path)
The explanation metadata is defined and copied to the required destination as below:
explanation_metadata = {
"inputs": {
"data": {
"input_tensor_name": "input_1:0",
"input_baselines": [np.mean(data_X, axis=0).tolist()],
"encoding": "bag_of_features",
"index_feature_mapping": feature_X.tolist()
}
},
"outputs": {
"duration": {
"output_tensor_name": "output/Reshape_1:0"
}
},
"framework": "tensorflow"
}
with open('explanation_metadata.json', 'w') as output_file:
json.dump(explanation_metadata, output_file)
!gsutil cp explanation_metadata.json $export_path
After that, the model is created and the version is defined as:
!gcloud ai-platform models create $MODEL --enable-logging --regions=us-central1
explain_method = 'integrated-gradients'
!gcloud beta ai-platform versions create $VERSION \
--model $MODEL \
--origin $export_path \
--runtime-version 1.15 \
--framework TENSORFLOW \
--python-version 3.7 \
--machine-type n1-standard-4 \
--explanation-method $explain_method \
--num-integral-steps 25
Everything works fine until this step, but now when I create and send the explain request as:
prediction_json = {'input_1': data_X[:5].tolist()}
with open('diag-data.json', 'w') as outfile:
json.dump(prediction_json, outfile)
!gcloud beta ai-platform explain --model $MODEL --json-instances='diag-data.json'
I get the following error
{
"error": "Explainability failed with exception: <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.INVALID_ARGUMENT\n\tdetails = "transpose expects a vector of size 4. But input(1) is a vector of size 3\n\t [[{{node bidirectional/forward_lstm_1/transpose}}]]"\n\tdebug_error_string = "{"created":"@1586068796.692241013","description":"Error received from peer ipv4:10.7.252.78:8500","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"transpose expects a vector of size 4. But input(1) is a vector of size 3\n\t [[{{node bidirectional/forward_lstm_1/transpose}}]]","grpc_status":3}"\n>"
}
I tried altering the input shape, but nothing worked. Then, to verify the format, I tried the gcloud predict command, which initially did not work but did work after reshaping the input as:
prediction_json = {'input_1': data_X[:5].reshape(-1,1815).tolist()}
with open('diag-data.json', 'w') as outfile:
json.dump(prediction_json, outfile)
!gcloud beta ai-platform predict --model $MODEL --json-instances='diag-data.json'
I'm at a dead end now with !gcloud beta ai-platform explain --model $MODEL --json-instances='diag-data.json' and am looking for much-needed help from the SO community.
Also, for ease of experimenting, the notebook can be accessed from google_explain_notebook.
Thanks for sharing! This is awesome, and super cool to see tools that let people now do explorations like in https://research.google.com/bigpicture/attacking-discrimination-in-ml/ with their own models or plain CSVs :)
In reading the UI, and in talking through this with other people about what's going on in the fairness optimizations, I found myself marking up screenshots to explain what was going on, like this:
I thought it might be a helpful improvement to make these connections more explicit and obvious, rather than having to parse the text definitions and map them to the UI and the data points on the right. The screenshot above isn't a UI proposal, but I could sketch some options if you're interested in brainstorming. It's particularly hard to see what's being compared when you slice by more than one dimension and the confusion matrix isn't visible, so it would be interesting to see if there's a way to make this visible across, say, four slices. If there are other ways to look at this, that'd be awesome to learn about too! There's a lot of information and conceptual density here, so laying it out while staying simple seems like a great design challenge, but also super hard :)
Relatedly, if I'm understanding right, for some of the choices the actual metric being optimized isn't visible anywhere at all (putting aside the cost ratio altogether for now). So for single threshold, as an example, I believe the number being optimized is the overall accuracy, the aggregation of these two numbers weighted by the count of examples:
So in this case I'm trying to see how much the overall accuracy goes down when trying different optimization strategies that all bring it down as they trade off other goals (e.g., equal opportunity). These questions may just come from me exploring the impact of different parameters to build intuition, but the kind of question I'm trying to ask is "how much worse is the metric that equal opportunity optimizes for overall, when I choose demographic parity?" and "how much worse is the metric for equal opportunity for each slice when I choose demographic parity?" Not sure if I'm making any sense, but essentially I'm trying to compare how one optimization choice impacts the other optimization choices' metrics.
Thanks!
Hi,
we really like using the What-If Tool. In the last few days we noticed that the split of the data between the datapoint editor and the performance & fairness tabs isn't performed in the same way. As an example, we binned the data of the UCI census income dataset by age into 10 bins. The number of data points in each bin can differ between the datapoint editor and the performance & fairness tabs (see figure).
For us, it would be extremely helpful if the data in, e.g., the first bin of the datapoint editor were exactly the same as in the first bin of the performance & fairness tab.
Best,
Timo
In the notebook https://colab.research.google.com/github/pair-code/what-if-tool/blob/master/WIT_COMPAS.ipynb#scrollTo=VZ-rK11X5arK
The 8th cell mentions:
"But, the FP rate is MUCH higher for African Americans and the FN rate is MUCH lower for caucasians"
Even though the data shows :
AA FN: 15.4%
Caucasian FN: 25.2%
AA FP: 19.5%
Caucasian FP: 9.7%
Is this a typo or are we missing something here?
Hey there,
I've used the WIT in the past and am now coming back to it for a new project. I'm trying to use the Jupyter integrated widget with inference done through a tensorflow/serving docker container (2.1.0), with the model exported via tf.model.save.
I'm getting a fairly unhelpful error whenever I try to run inference (when it starts up and when I click that Infer button):
TypeError('None has type NoneType, but expected one of: bytes, unicode',)
The configuration I've set up is as follows:
wit_config = (
witwidget.WitConfigBuilder(pred_df_ex)
.set_inference_address('<host_redacted>:8500')
.set_model_name('fts_test')
.set_uses_predict_api(True)
.set_predict_output_tensor('outputs')
.set_model_type('classification')
#.set_predict_input_tensor('inputs')
.set_target_feature('label')
.set_label_vocab(['No','Yes'])
)
I'm able to rule out the possibility that it's not getting a response from the server: if I mess with the configuration to make it intentionally broken (e.g. if I uncomment that input tensor line), I get an error that could only have come from the server.
<_Rendezvous of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "input size does not match signature: 1!=11 len({inputs}) != len({<feature_names_redacted>}). Sent extra: {inputs}. Missing but required: {<feature_names_redacted>}."
debug_error_string = "{"created":"@1585001946.586897426","description":"Error received from peer ipv4:<ipaddr_redacted>:8500","file":"src/core/lib/surface/call.cc","file_line":1052,"grpc_message":"input size does not match signature: 1!=11 len({inputs}) != len({<feature_names_redacted>}). Sent extra: {inputs}. Missing but required: {<feature_names_redacted>}.","grpc_status":3}"
>
I've verified that I can talk to the host no problem from this machine as well.
Honestly, I'm not sure whether or not that's an accurate assessment, however -- what I'd need is the full stack trace of that error. Any help would be appreciated.
I started from the demo income classification, using the linear regressor. I am using my own dataset to predict insurance payments with about 12 features. My dataset has over 30,000 points; the What-If Tool disconnects when I set the number of data points to 20,000. WIT works up to about 15,000 num_datapoints.
This is not a timeout problem; the code runs for under 5 minutes. Is there a limit on the number of data points that WitConfigBuilder can handle in Colab?
A response would be much appreciated.
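A possible stopgap while this is investigated, assuming the data is in a pandas DataFrame (the path and the 15,000 cutoff are placeholders based on the numbers above): down-sample before building the config.
import pandas as pd
from witwidget.notebook.visualization import WitConfigBuilder

df = pd.read_csv('insurance.csv')  # hypothetical path to the dataset

# Down-sample to a size the Colab widget has handled reliably so far.
sample = df.sample(n=15000, random_state=0)
config_builder = WitConfigBuilder(sample.values.tolist(),
                                  sample.columns.tolist())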
This is an awesome demo! I spent some time exploring the performance & fairness tabs, and then digging into facets dive, and individual examples. It's really interesting, thanks for putting together such an accessible demo 👍
In the process I found a bunch of data points that were labeled differently than I would have expected. I figured these were labeling errors in the CelebA set, but it came up often enough that it led me to investigate further and try to see how many data points appeared to be mislabeled, when compared to my own personal oracle-like labeling truth :)
To debug, I downloaded the CelebA dataset and then started looking at individual data points, assuming the Datapoint ID in WIT would correspond to the image ID and filename in CelebA. This doesn't seem to be the case, though, and so I can't figure out how to verify further.
You can pick any number as an example (Datapoint ID 1 is what I started with when trying to compare to CelebA). But for one full example: in Facets within WIT, I noticed a datapoint labeled "Sideburns" in a way I didn't expect when looking at the image myself, so I clicked in to see the image and the datapoint ID:
But then, checking the CelebA set, this is what I see for image 000038.jpg:
The data in CelebA for 38 in the list_attr_celeba.csv file is also different from the data in WIT for Datapoint ID 38:
Since there are only 250 examples in the WIT tool, I'm wondering if the full dataset is being sampled for the demo, with the Datapoint ID values mapped to 0-249 and the reference back to the original dataset lost? That's just a guess, though. I see there's a bunch of data in https://github.com/PAIR-code/what-if-tool/tree/master/data/images but I'm not sure how to debug further.
Thanks for sharing this work!
Good morning, thank you so much for this amazing piece of work. When I saw the presentation, I immediately thought of our leaders and how they might benefit from the understanding that this process brings. Due to the nature of the data that we are working with here at the Ministry of Education, we would never be able to use this tool effectively because of the breach in confidentiality. Is there a way to use this tool on a local machine and have it display the results just as it would have done in the cloud?
Thank you again for your time and support.
Best Regards,
Andrei.
We are not able to convert our custom dataset into the correct format that the What-If Tool requires.
The sample data that we are trying to convert is below.
"ID","age","workclass","fnlwgt","education","education-num","marital-status","occupation","relationship","race","sex","capital-gain","capital-loss","hours-per-week","native-country","result"
19122,42," Federal-gov",178470," HS-grad",9," Divorced"," Adm-clerical"," Not-in-family"," White"," Female",0,0,40," United-States"," <=50K"
20798,49," Federal-gov",115784," Some-college",10," Married-civ-spouse"," Craft-repair"," Husband"," White"," Male",0,0,40," United-States"," <=50K"
32472,34," Private",30673," Masters",14," Married-civ-spouse"," Prof-specialty"," Husband"," White"," Male",0,0,55," United-States"," <=50K"
21476,29," Private",157612," Bachelors",13," Never-married"," Prof-specialty"," Not-in-family"," White"," Female",14344,0,40," United-States"," >50K"
24836,30," Private",175931," HS-grad",9," Married-civ-spouse"," Craft-repair"," Husband"," White"," Male",0,0,40," United-States"," <=50K"
5285,31," Self-emp-inc",236415," Some-college",10," Married-civ-spouse"," Adm-clerical"," Wife"," White"," Female",0,0,20," United-States"," >50K"
This is the JSON format that I currently have:
[{ "ID": 19122, "age": 42, "capital-gain": 0, "capital-loss": 0, "education": " HS-grad", "education-num": 9, "fnlwgt": 178470, "hours-per-week": 40, "marital-status": " Divorced", "native-country": " United-States", "occupation": " Adm-clerical", "race": " White", "relationship": " Not-in-family", "result": " <=50K", "sex": " Female", "workclass": " Federal-gov" }]
What do we need to do to generate the JSON in the format required by the What-If Tool?
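WIT consumes tf.train.Example protos rather than plain JSON, so one approach is to convert each CSV row directly. A minimal sketch, assuming the sample above lives at a hypothetical path data.csv:
import numpy as np
import pandas as pd
import tensorflow as tf

df = pd.read_csv('data.csv')  # hypothetical path to the sample CSV above

def row_to_example(row):
  ex = tf.train.Example()
  for name, value in row.items():
    if np.issubdtype(type(value), np.integer):
      ex.features.feature[name].int64_list.value.append(int(value))
    elif np.issubdtype(type(value), np.floating):
      ex.features.feature[name].float_list.value.append(float(value))
    else:
      # Strings (note the leading spaces in the CSV) become bytes features.
      ex.features.feature[name].bytes_list.value.append(
          str(value).strip().encode('utf-8'))
  return ex

examples = [row_to_example(row) for _, row in df.iterrows()]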
Hi,
I am new to using TensorFlow and WIT, and I do not even know if I should be posting this here, but I am trying to replicate the COMPAS demo using TensorFlow Serving on Docker and I get the following error:
Request for model inference failed: RequestNetworkError: RequestNetworkError: 500 at /data/plugin/whatif/infer?inference_address.
I am using the following docker command:
docker run -p 8500:8500 --mount type=bind,source="C:\Users\arancha.abad\Importar_modelos\versiones",target=/models/saved_model -e MODEL_NAME=saved_model -t tensorflow/serving
It seems to work properly, but when I open WIT in TensorBoard, the only thing I can see is everything related to the .tfrecord file. I can see the datapoints and edit them, and I can also go to Features and see every histogram, but I can't run inference, and when WIT is opened, the error described above is displayed.
I am using tensorflow 2.2 (rc) and tensorboard 2.1.1, and this is the way I export the COMPAS model:
serving_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)
export_path = classifier.export_saved_model(export_path, serving_input_fn)
I get the saved_model.pb and the variables folder. If I use saved_model_cli to show the model, I get the following:
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:
signature_def['classification']:
The given SavedModel SignatureDef contains the following input(s):
inputs['inputs'] tensor_info:
dtype: DT_STRING
shape: (-1)
name: input_example_tensor:0
The given SavedModel SignatureDef contains the following output(s):
outputs['classes'] tensor_info:
dtype: DT_STRING
shape: (-1, 2)
name: head/Tile:0
outputs['scores'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 2)
name: head/predictions/probabilities:0
Method name is: tensorflow/serving/classify
signature_def['predict']:
The given SavedModel SignatureDef contains the following input(s):
inputs['examples'] tensor_info:
dtype: DT_STRING
shape: (-1)
name: input_example_tensor:0
The given SavedModel SignatureDef contains the following output(s):
outputs['all_class_ids'] tensor_info:
dtype: DT_INT32
shape: (-1, 2)
name: head/predictions/Tile:0
outputs['all_classes'] tensor_info:
dtype: DT_STRING
shape: (-1, 2)
name: head/predictions/Tile_1:0
outputs['class_ids'] tensor_info:
dtype: DT_INT64
shape: (-1, 1)
name: head/predictions/ExpandDims:0
outputs['classes'] tensor_info:
dtype: DT_STRING
shape: (-1, 1)
name: head/predictions/str_classes:0
outputs['logistic'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 1)
name: head/predictions/logistic:0
outputs['logits'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 1)
name: linear/linear_model/linear/linear_model/linear/linear_model/weighted_sum:0
outputs['probabilities'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 2)
name: head/predictions/probabilities:0
Method name is: tensorflow/serving/predict
signature_def['regression']:
The given SavedModel SignatureDef contains the following input(s):
inputs['inputs'] tensor_info:
dtype: DT_STRING
shape: (-1)
name: input_example_tensor:0
The given SavedModel SignatureDef contains the following output(s):
outputs['outputs'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 1)
name: head/predictions/logistic:0
Method name is: tensorflow/serving/regress
signature_def['serving_default']:
The given SavedModel SignatureDef contains the following input(s):
inputs['inputs'] tensor_info:
dtype: DT_STRING
shape: (-1)
name: input_example_tensor:0
The given SavedModel SignatureDef contains the following output(s):
outputs['classes'] tensor_info:
dtype: DT_STRING
shape: (-1, 2)
name: head/Tile:0
outputs['scores'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 2)
name: head/predictions/probabilities:0
Method name is: tensorflow/serving/classify
The model has the signatures predict, classification, regression, and serving_default, so everything seems to be fine. Right now I don't know what else I should do to make it work; maybe my mistake is in the way I create the serving_input_fn or something else, so any help would be appreciated.
Thank you for your help!
I tried to use dill to save the resulting witwidget template, but it doesn't work. Is there any way to save the result and load it back later for further analysis? Thanks
How can I create a web page, similar to the UCI Census demo, with custom data and predictions (without specifying a TF model)?
I've looked into the wit_dashboard, but it is not clear to me what I should modify and how I should specify my data.
https://colab.research.google.com/github/pair-code/what-if-tool/blob/master/WIT_COMPAS.ipynb
In this notebook, running the "Invoke What-If Tool for test data and the trained models" cell gives the following.
MessageErrorTraceback (most recent call last)
<ipython-input-8-18dbcd24366f> in <module>()
10 config_builder = WitConfigBuilder(examples[0:num_datapoints]).set_estimator_and_feature_spec(
11 classifier, feature_spec)
---> 12 WitWidget(config_builder, height=tool_height_in_px)
3 frames
/usr/local/lib/python2.7/dist-packages/witwidget/notebook/colab/wit.pyc in __init__(self, config_builder, height, delay_rendering)
238
239 if not delay_rendering:
--> 240 self.render()
241
242 # Increment the static instance WitWidget index counter
/usr/local/lib/python2.7/dist-packages/witwidget/notebook/colab/wit.pyc in render(self)
252 # Send the provided config and examples to JS
253 output.eval_js("""configCallback({config})""".format(
--> 254 config=json.dumps(self.config)))
255 output.eval_js("""updateExamplesCallback({examples})""".format(
256 examples=json.dumps(self.examples)))
/usr/local/lib/python2.7/dist-packages/google/colab/output/_js.pyc in eval_js(script, ignore_result)
37 if ignore_result:
38 return
---> 39 return _message.read_reply_from_input(request_id)
40
41
/usr/local/lib/python2.7/dist-packages/google/colab/_message.pyc in read_reply_from_input(message_id, timeout_sec)
104 reply.get('colab_msg_id') == message_id):
105 if 'error' in reply:
--> 106 raise MessageError(reply['error'])
107 return reply.get('data', None)
108
MessageError: ReferenceError: configCallback is not defined
Currently WIT sends all features to the front-end for all examples. If the examples contain image features, this means we can't load a ton of examples for that model.
Instead, for large features like images, don't send them to the front-end immediately; only send the image feature when an example is clicked for viewing in the datapoint editor.
If I can, are there any instructions? I didn't find anything about this in the repo.
Thanks for sharing this awesome work! 👍
This is what the console outputs:
So I'm guessing it's something about the Bazel config, and that it needs to include a polyfill or some other way to load Polymer code. From poking around a bit, I think it's in the config of this build command (to use the smiling dataset as an example): https://github.com/tensorflow/tensorboard/blob/master/tensorboard/plugins/interactive_inference/tf_interactive_inference_dashboard/demo/BUILD#L69
From searching around, I didn't find much info on building Polymer code with Bazel outside the googleplex. Reading that build file, it looks like it's pulling in what I'd expect in https://github.com/tensorflow/tensorboard/blob/master/tensorboard/components/tf_imports/BUILD#L16, and it looks like it pulls in external Polymer artifacts in https://github.com/tensorflow/tensorboard/blob/d3a6cfd6eb5c0fff4a405b23c5361875adf908f0/third_party/polymer.bzl#L1379. Those must be working right, since the Facets demos work fine in other browsers, so I'm guessing it's something about the specific BUILD task within tf_interactive_inference_dashboard/demo/, but I'm not sure what.
Thanks! :)
I am new to TensorFlow, and I'm quite confused about the format WIT expects from the TFRecords to feed the model.
Basically, I have a multi-input Keras model with the following signature:
The given SavedModel SignatureDef contains the following input(s):
inputs['byte_entropy'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 256)
name: serving_default_byte_entropy:0
inputs['data_directories'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 30)
name: serving_default_data_directories:0
inputs['exports'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 128)
name: serving_default_exports:0
inputs['general'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 10)
name: serving_default_general:0
inputs['header'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 62)
name: serving_default_header:0
inputs['histogram'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 256)
name: serving_default_histogram:0
inputs['imports'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 1280)
name: serving_default_imports:0
inputs['section'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 255)
name: serving_default_section:0
inputs['strings'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 104)
name: serving_default_strings:0
The given SavedModel SignatureDef contains the following output(s):
outputs['final_output'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 1)
name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict
And I have some TFRecords (records of Examples) with the same features:
feature={'histogram': FixedLenFeature(shape=[256], dtype=tf.float32, default_value=None),
'byte_entropy': FixedLenFeature(shape=[256], dtype=tf.float32, default_value=None),
'strings': FixedLenFeature(shape=[104], dtype=tf.float32, default_value=None),
'general': FixedLenFeature(shape=[10], dtype=tf.float32, default_value=None),
'header': FixedLenFeature(shape=[62], dtype=tf.float32, default_value=None),
'section': FixedLenFeature(shape=[255], dtype=tf.float32, default_value=None),
'imports': FixedLenFeature(shape=[1280], dtype=tf.float32, default_value=None),
'exports': FixedLenFeature(shape=[128], dtype=tf.float32, default_value=None),
'data_directories': FixedLenFeature(shape=[30], dtype=tf.float32, default_value=None),
'final_output': FixedLenFeature(shape=(), dtype=tf.float32, default_value=None)}
When trying to show it as a regression, I get an invalid argument error:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "input size does not match signature: 1!=9 len({byte_entropy,data_directories,exports,general,header,histogram,imports,section,strings}) != len({byte_entropy,data_directories,exports,general,header,histogram,imports,section,strings}). Sent extra: {byte_entropy,data_directories,exports,general,header,histogram,imports,section,strings}. Missing but required: {byte_entropy,data_directories,exports,general,header,histogram,imports,section,strings}."
debug_error_string = "{"created":"@1588279350.370000000","description":"Error received from peer ipv6:[::1]:8500","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"input size does not match signature: 1!=9 len({byte_entropy,data_directories,exports,general,header,histogram,imports,section,strings}) != len({byte_entropy,data_directories,exports,general,header,histogram,imports,section,strings}). Sent extra: {byte_entropy,data_directories,exports,general,header,histogram,imports,section,strings}. Missing but required: {byte_entropy,data_directories,exports,general,header,histogram,imports,section,strings}.","grpc_status":3}"
Where or how should I specify which TFRecord feature goes to which input in the model?
And if you would be so kind, do you know how I might alter the model prediction to make it suitable for classification? At the moment the prediction is a number from zero to one, and to suit WIT it should be an array with two probabilities. I know this is not a question specifically about WIT.
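On the last point, a minimal sketch of wrapping a single sigmoid output into the two-class form WIT expects for classification (e.g. inside a custom predict function; the helper name is hypothetical):
import numpy as np

def to_two_class(sigmoid_scores):
  # Turn P(class 1) into [P(class 0), P(class 1)] per example.
  p1 = np.asarray(sigmoid_scores, dtype=float).reshape(-1)
  return np.stack([1.0 - p1, p1], axis=1)

print(to_two_class([0.2, 0.9]))  # [[0.8 0.2], [0.1 0.9]]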
Hi
I trained a model with tfx and it was exported as saved_model.pb.
Now, I want to reload it and visualize it using WIT.
How can I do this?
I couldn't find a way to do it, since when reloading the model:
imported = tf.saved_model.load(export_dir=trained_model_path)
I get an object of type
<tensorflow.python.training.tracking.tracking.AutoTrackable at 0x7f3d71e456a0>
instead of an estimator.
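One workaround sketch, assuming the SavedModel kept a serving_default signature that takes serialized tf.Examples (the signature's input argument name and output key vary by export, so both are assumptions here):
import tensorflow as tf
from witwidget.notebook.visualization import WitConfigBuilder, WitWidget

imported = tf.saved_model.load(export_dir=trained_model_path)
infer = imported.signatures['serving_default']

def custom_predict(examples):
  # Serialize each tf.train.Example and call the signature directly.
  serialized = tf.constant([ex.SerializeToString() for ex in examples])
  outputs = infer(examples=serialized)  # input arg name depends on the export
  return next(iter(outputs.values())).numpy()

# `examples` is the list of tf.train.Example protos to visualize.
config_builder = (WitConfigBuilder(examples)
                  .set_custom_predict_fn(custom_predict))
WitWidget(config_builder, height=600)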
Thanks
In the performance tab, it would be nice to have a button to calculate (in the background) the slices with the largest performance disparities and surface those for the user to explore.
Currently users have to check slices one by one to look at their performance disparities.
This needs to be done efficiently, as with intersectional slices it becomes quadratic in scale.
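To make the scale concrete, a rough sketch of the brute-force version (a pandas groupby over every feature pair; the data here is hypothetical):
import itertools
import pandas as pd

# Hypothetical per-example evaluation results.
df = pd.DataFrame({'sex': ['F', 'M', 'F', 'M'],
                   'country': ['US', 'US', 'CA', 'CA'],
                   'correct': [1, 0, 1, 1]})

# Accuracy per intersectional slice for every feature pair: the number of
# pairs grows quadratically with the number of features.
for a, b in itertools.combinations(['sex', 'country'], 2):
    acc = df.groupby([a, b])['correct'].mean()
    print(acc.sort_values().head())  # worst slices first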