nteract / data-explorer Goto Github PK

102.0 13.0 12.0 5.67 MB

The Data Explorer is nteract's automatic visualization tool.

Home Page: https://data-explorer.nteract.io/

License: BSD 3-Clause "New" or "Revised" License

TypeScript 63.72% JavaScript 36.19% Shell 0.09%

data-explorer automatic-viz data-viz

data-explorer's Introduction

nteract Data Explorer

An automatic data visualization tool.

Interactive Documentation

Explore the documentation.

Creating Data Explorer

Read Elijah Meeks's post on designing the data explorer.

Using the Data Explorer

To use Data Explorer in your project, use the following approach.

yarn add @nteract/data-explorer

Install react and styled-components if you are not already using them.

yarn add react styled-components

The data prop must be a tabular data resource (MIME type application/vnd.dataresource+json):

// Default import complete with right side toolbar
import DataExplorer from "@nteract/data-explorer";

<DataExplorer data={data} />;

Or, with custom Toolbar position:

// Individual components as named imports
import { DataExplorer, Toolbar, Viz } from "@nteract/data-explorer";

<DataExplorer data={data}>
  <Toolbar />
  <Viz />
</DataExplorer>;

// Toolbar is optional
<DataExplorer data={data}>
  <Viz />
</DataExplorer>;
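
For reference, here is a rough sketch of what a value for the data prop can look like, modeled on the application/vnd.dataresource+json payloads quoted in the issues further down this page. The type names (Field, TabularDataResource) and the helper rowsMissingFields are illustrative assumptions, not exports of @nteract/data-explorer.

```typescript
// Illustrative types only; they mirror the payloads quoted on this page
// and are not taken from the @nteract/data-explorer package.
interface Field {
  name: string;
  type?: string;
}

interface TabularDataResource {
  schema: {
    fields: Field[];
    primaryKey?: string[];
    pandas_version?: string;
  };
  data: Array<Record<string, unknown>>;
}

const data: TabularDataResource = {
  schema: {
    fields: [
      { name: "index", type: "integer" },
      { name: "speed_observation", type: "number" },
    ],
    primaryKey: ["index"],
  },
  data: [
    { index: 0, speed_observation: 10 },
    { index: 1, speed_observation: 25 },
  ],
};

// Hypothetical sanity check: indices of rows that lack a key for some schema field.
function rowsMissingFields(resource: TabularDataResource): number[] {
  const names = resource.schema.fields.map((field) => field.name);
  const bad: number[] = [];
  resource.data.forEach((row, i) => {
    const complete = names.every((name) =>
      Object.prototype.hasOwnProperty.call(row, name)
    );
    if (!complete) bad.push(i);
  });
  return bad;
}
```

Each row is keyed by field name, which is the shape one of the issues below reports as rendering correctly.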

How do I contribute to this repo?

If you are interested in contributing to nteract, please read the contribution guidelines for information on how to set up your nteract repo for development, how to update documentation, and how to submit your code changes for review on GitHub.

data-explorer's People

han-tun javag97 datalayer-externals colombod hydrosquall isabella232 jess-x willingc barryt2 shalevy1 codedownio castingworkbook

data-explorer's Issues

Search all columns of rows

In the table view, I'd like to be able to quickly filter rows by some string. An example of this behavior can be found in PowerShell's ogv command:

Notice the filter bar at the top?

This is not the same as the filters that are already available in Data Explorer.

This is something faster than that. When I search for "foo" I get all the rows that have "foo" in any of the columns.
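
A minimal sketch of the requested behavior, assuming rows are plain objects keyed by column name. The function name searchAllColumns is hypothetical and is not part of Data Explorer.

```typescript
type Row = Record<string, unknown>;

// Keep the rows in which any column's string form contains the query.
// Matching is case-insensitive; an empty query keeps every row.
function searchAllColumns(rows: Row[], query: string): Row[] {
  const needle = query.toLowerCase();
  if (needle === "") return rows;
  return rows.filter((row) =>
    Object.keys(row).some(
      (key) => String(row[key]).toLowerCase().indexOf(needle) !== -1
    )
  );
}

const exampleRows: Row[] = [
  { name: "foo", note: "first" },
  { name: "bar", note: "has FOO inside" },
  { name: "baz", note: "nothing here" },
];

const hits = searchAllColumns(exampleRows, "foo");
```

Here hits holds the first two rows, because "foo" appears in the name column of one and the note column of the other.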

Is more advanced table filtering possible now or in the future?

I am new to the data-explorer and I noticed you can filter it by values in columns, like here:

But is it also possible to filter for all values within a range, so not just speed_observation=10 but something like speed_observation=[5,25]? This would be a very helpful feature; Google Colab's data_table has it.

Also, it would be great if the Data Explorer table could get other advanced filtering options like those in qgrid: filtering by a range of dates, but also by a list of strings, a list of numerics, etc.
This is how filtering looks in qgrid. Thanks
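
The two kinds of filter asked for here could look roughly like the following, assuming rows are plain objects keyed by column name. filterByRange and filterByList are hypothetical helpers, not existing Data Explorer functions.

```typescript
type Row = Record<string, unknown>;

// Keep rows whose numeric value in `column` lies in the inclusive range [min, max].
function filterByRange(rows: Row[], column: string, min: number, max: number): Row[] {
  return rows.filter((row) => {
    const value = row[column];
    return typeof value === "number" && value >= min && value <= max;
  });
}

// Keep rows whose value in `column` is one of the allowed strings or numbers.
function filterByList(rows: Row[], column: string, allowed: Array<string | number>): Row[] {
  return rows.filter((row) => {
    const value = row[column];
    return (
      (typeof value === "string" || typeof value === "number") &&
      allowed.indexOf(value) !== -1
    );
  });
}

const observations: Row[] = [
  { station: "a", speed_observation: 3 },
  { station: "b", speed_observation: 10 },
  { station: "c", speed_observation: 25 },
  { station: "d", speed_observation: 30 },
];

// speed_observation=[5,25] from the question above
const inRange = filterByRange(observations, "speed_observation", 5, 25);
const picked = filterByList(observations, "station", ["a", "d"]);
```

A date range could reuse filterByRange if dates are first converted to numeric timestamps.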

Quality of Life updates to repo

Prepare 8.3.0 release

For next release:

Update dependencies
Move to GH Actions
Update Changelog
Tag and release

Is it available in Jupyter Lab?

Do you have this package available for Jupyter Lab?

Are there similar packages for Jupyter Lab?

Thank you!

Enable Basic Sampling

With large datasets, prompt the user to enable one of a set of standard sampling options, with messaging about how they could do the sampling better themselves. For instance, if you try to render a million points on the scatterplot, it doesn't just say "Sorry, no"; it says "Here we can show 50,000 of these points using one of the following three built-in sampling options (or you could sample the data yourself in a more effective way)".
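
As one illustration of what a built-in option might be, here is a sketch of systematic sampling (taking evenly spaced points). The issue does not say which three options are intended, so this choice and the name systematicSample are assumptions.

```typescript
// Take at most `maxPoints` evenly spaced items from `points`, preserving order.
// Inputs at or under the limit are returned as a copy, unchanged.
function systematicSample<T>(points: T[], maxPoints: number): T[] {
  if (maxPoints <= 0) return [];
  if (points.length <= maxPoints) return points.slice();
  const step = points.length / maxPoints;
  const sample: T[] = [];
  for (let i = 0; i < maxPoints; i++) {
    sample.push(points[Math.floor(i * step)]);
  }
  return sample;
}

// Scaled-down stand-in for "a million points shown as 50,000".
const allPoints: number[] = [];
for (let i = 0; i < 1000; i++) allPoints.push(i);
const shown = systematicSample(allPoints, 50);
```

Systematic sampling is deterministic, so the same chart renders on every run, but it can misrepresent data with a periodic pattern; random or stratified sampling are alternatives with different trade-offs.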

Export Data Explorer Chart as PNG/SVG

Expose a couple buttons that export the chart as SVG and PNG.

Data Explorer: Add Viz Control Docs

Add docs for the Data Explorer VizControl component.

Testing: Unit tests for individual components

Presently, only the root DataExplorer is tested (see https://github.com/nteract/data-explorer/blob/main/__tests__/index.spec.tsx).

It would be easier to catch and guard against component-specific issues below the level of the root DataExplorer if we tested specific visuals and components (Plot Picker, etc.) individually.

Based on recent bugs, I think it would be valuable to implement basic tests for

Grid (table) - (e.g. for #65 )
Scatterplot (e.g. for #23)
Other visualization types as topics arise.

This exercise will also help with writing component specific documentation.

Once this is done in a basic form for a few components, we'll have a good pattern in place that should be easy for new/first time contributors to mimic/add to.

Data Explorer: Add Data Explorer Docs

Add Data Explorer docs.

Data Explorer breaks when dataframe cell has complex data in it

Repro: run the following in a cell

import pandas as pd
pd.set_option("display.html.table_schema", True)

class Cmd:
    def __init__(self, name, params):
        self.name = name
        self.params = params
    def __repr__(self):
        return f'Cmd(name={self.name}, params={self.params})'

cell_payload = [
    Cmd(name='foo', params={'bar', 'baz'}),
    Cmd(name='foo', params={'bar', 'baz'})
]
pd.DataFrame({'param_session': [cell_payload]})

The following error then appears, with a link to this error page, which reports: Objects are not valid as a React child (found: object with keys {name}). If you meant to render a collection of children, use an array instead.

For reference, this is how Pandas would normally render the cell, when setting pd.set_option("display.html.table_schema", False)

Finally, here's what the output looks like in the ipynb file when the error occurs

            "application/vnd.dataresource+json": {
              "schema": {
                "fields": [
                  {
                    "name": "index",
                    "type": "integer"
                  },
                  {
                    "name": "param_session",
                    "type": "string"
                  }
                ],
                "primaryKey": [
                  "index"
                ],
                "pandas_version": "0.20.0"
              },
              "data": [
                {
                  "index": 0,
                  "param_session": [
                    {
                      "name": "foo"
                    },
                    {
                      "name": "foo"
                    }
                  ]
                }
              ]
            }
          },
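
One possible workaround on the embedding side, sketched under the assumption that the crash comes from passing nested objects straight to React as cell contents: convert any non-primitive cell value to a JSON string before handing the rows to the component. toDisplayValue and sanitizeRows are hypothetical helpers, not part of the package.

```typescript
type Row = Record<string, unknown>;

// Primitives pass through; arrays and objects become JSON text so that
// they can be rendered as an ordinary string cell.
function toDisplayValue(value: unknown): string | number | boolean | null {
  if (value === null || value === undefined) return null;
  if (
    typeof value === "string" ||
    typeof value === "number" ||
    typeof value === "boolean"
  ) {
    return value;
  }
  if (typeof value === "object") return JSON.stringify(value);
  return String(value);
}

function sanitizeRows(rows: Row[]): Row[] {
  return rows.map((row) => {
    const clean: Row = {};
    Object.keys(row).forEach((key) => {
      clean[key] = toDisplayValue(row[key]);
    });
    return clean;
  });
}

// The row from the payload above.
const sanitized = sanitizeRows([
  { index: 0, param_session: [{ name: "foo" }, { name: "foo" }] },
]);
```

This matches the schema, which already declares param_session as a string. JSON.stringify throws on circular structures, so a real implementation would need to guard against that.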

Bump d3-scale to 4.0.0

When we decide to drop support for node < 14, we should be able to bump d3-scale with no issues.

[feature] Permit parent page to modify the behavior of the "show filters" button

Motivation

When embedding the data-explorer in a page that uses URL params like the Datasette project (see here ), using the "filters" bar can clobber state that was set by the parent page. It would be useful to be able to let embedders of this component disable this behavior.

Recommendation

Add a boolean prop disableFilterControls that defaults to false, which hides the show/hide filter button.
(Optional, probably more complex if the component isn't already maintaining internal state): Add a boolean prop disableSetUrlParams that defaults to false. When it is set to true, state is saved internally to the component and the parent page's URL parameters are not modified.

Installation issue

Hello,

The instructions say to go to the applications folder here:

cd applications/jupyter-extension
pip install -e .
jupyter serverextension enable nteract_on_jupyter

But where is this folder?

If this is the Applications folder from root, I don't have anything in it.

Thanks for your help :)

Data Explorer: Add Palette Picker Docs

Add docs for the Data Explorer PalettePicker component.

Data-explorer WITHIN jupyterlab, not jupyter nteract

I really like the data-explorer and would like to install it to work WITHIN jupyterlab, not jupyter nteract. Apparently, this is possible, as I see in this post, or just see image below:

But I find no installation instructions for integration into jupyterlab. Can you provide some? It seems that other people are interested in this functionality, too. Thanks

Add trendline to contour plot

And any other scatterplot functionality.

[bug] Data-explorer will break if provided dataframe contains a column called `none`

Motivation

Data-explorer should not behave differently based on column names.
See https://github.com/nteract/data-explorer/pull/71/files#r667406358 for an example

Proposed solution

Use a column name that isn't a string, or a very strange string (perhaps containing DATA_EXPLORER_NONE_DIM and special characters inside), to prevent collisions with real column names.
Code that checks for the literal string none should check for this special string sequence instead.
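
The proposal could be sketched as follows. The exact sentinel value, and the names NONE_DIM, isNoneDim and dimensionOptions, are assumptions made for illustration; only the DATA_EXPLORER_NONE_DIM fragment comes from the issue.

```typescript
// Hypothetical sentinel: the NUL characters make a clash with a real
// column name very unlikely.
const NONE_DIM = "\u0000DATA_EXPLORER_NONE_DIM\u0000";

// Replaces checks of the form `dimension === "none"`.
function isNoneDim(dimension: string): boolean {
  return dimension === NONE_DIM;
}

interface DimensionOption {
  label: string;
  value: string;
}

// The user still sees "none" as a label, but the stored value is the sentinel,
// so a real column called "none" remains selectable.
function dimensionOptions(columns: string[]): DimensionOption[] {
  const none: DimensionOption[] = [{ label: "none", value: NONE_DIM }];
  return none.concat(
    columns.map((column) => ({ label: column, value: column }))
  );
}

const options = dimensionOptions(["none", "speed_observation"]);
```

With this split between label and value, options contains two entries labelled "none", and only the first one is treated as "no dimension selected".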

Todo

Add Codesandbox (perhaps from forking https://codesandbox.io/s/nteract-data-explorer-issue-71-viz-switching-fix-uees1 ) demonstrating the types of errors that might emerge with columns that have this name

Automatically Describe DFs Going Into Data Explorer

df.describe(include="all") should run and be included as metadata for any dataframe that's being sent to the Data Explorer component.

Guidance around persisting Data Explorer metadata

Using the Data Explorer component, the onMetadataChange prop is called when the user changes the selected UI configuration of the component (such as by switching the chart type). It appears that the nteract notebook UI persists this metadata in a dx key in root of the output's metadata within the notebook file.

notebook.cells[0].outputs[0].metadata.dx

My question is: Is this the recommended place to store the Data Explorer metadata, or would it be better to store the metadata under the MIME type considering it applies to only that part of the output?

notebook.cells[0].outputs[0].metadata["application/vnd.dataresource+json"].dx

I don't know if there is any practical need for this, but it seemed like something to consider after seeing the following example.

https://nbformat.readthedocs.io/en/latest/format_description.html#display-data

"metadata" : {
  "image/png": {
    "width": 640,
    "height": 480,
  },
},
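
If both locations had to be supported, a reader could check the MIME-scoped key first and fall back to the root key. This is only a sketch of that lookup; readDxMetadata is a hypothetical helper and neither location is confirmed here as the recommended one.

```typescript
type Metadata = Record<string, unknown>;

const DATA_RESOURCE_MIME = "application/vnd.dataresource+json";

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Prefer metadata["application/vnd.dataresource+json"].dx, then metadata.dx.
function readDxMetadata(outputMetadata: Metadata): Record<string, unknown> | undefined {
  const scoped = outputMetadata[DATA_RESOURCE_MIME];
  const scopedDx = isRecord(scoped) ? scoped["dx"] : undefined;
  if (isRecord(scopedDx)) return scopedDx;
  const rootDx = outputMetadata["dx"];
  return isRecord(rootDx) ? rootDx : undefined;
}

const fromRoot = readDxMetadata({ dx: { view: "bar" } });
const fromMime = readDxMetadata({
  dx: { view: "bar" },
  "application/vnd.dataresource+json": { dx: { view: "line" } },
});
```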

[BUG] Can't render the data when the field values are indexed with numbers in the data object

Issue:
When the rows in an output's data object are keyed by numeric index instead of by field name, data-explorer can't render the output correctly.

        "application/vnd.dataresource+json": {
          "schema": {
            "fields": [
              {
                "name": "name"
              },
              {
                "name": "type"
              },
              {
                "name": "note"
              }
            ]
          },
          "data": [
            {
              "0": "aa",
              "1": "bb",
              "2": "cc"
            }
          ]
        }
      }

This is how it is rendered:

Here is an example of the notebook. You can use this to reproduce the issue:

{
  "metadata": {
    "kernelspec": {
      "name": "SQL",
      "display_name": "SQL",
      "language": "sql"
    },
    "language_info": {
      "name": "sql",
      "version": ""
    }
  },
  "nbformat_minor": 2,
  "nbformat": 4,
  "cells": [
    {
      "cell_type": "code",
      "source": [
        "test"
      ],
      "metadata": {
        "azdata_cell_guid": "286d911b-c759-489c-b7f8-5490479dbddd",
        "language": "sql",
        "tags": []
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "metadata": {},
          "execution_count": 4,
          "data": {
            "application/vnd.dataresource+json": {
              "schema": {
                "fields": [
                  {
                    "name": "name"
                  },
                  {
                    "name": "type"
                  },
                  {
                    "name": "note"
                  }
                ]
              },
              "data": [
                {
                  "0": "aa",
                  "1": "bb",
                  "2": "cc"
                }
              ]
            }
          }
        }
      ],
      "execution_count": 4
    }
  ]
}

Note that if the data object uses the field names, it will work correctly.

    "data": [
      {
        "name": "aa",
        "type": "bb",
        "note": "cc"
      }
    ]
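
Until positional keys are handled by the component, a caller could remap them to field names before rendering. The sketch below assumes that key "0" corresponds to the first schema field, "1" to the second, and so on; normalizeRow is a hypothetical helper.

```typescript
interface Field {
  name: string;
}

type Row = Record<string, unknown>;

// Rows that already use at least one field name are returned unchanged;
// otherwise positional keys ("0", "1", ...) are mapped onto the schema fields in order.
function normalizeRow(row: Row, fields: Field[]): Row {
  const hasNamedKeys = fields.some((field) =>
    Object.prototype.hasOwnProperty.call(row, field.name)
  );
  if (hasNamedKeys) return row;
  const named: Row = {};
  fields.forEach((field, position) => {
    named[field.name] = row[String(position)];
  });
  return named;
}

const fields: Field[] = [{ name: "name" }, { name: "type" }, { name: "note" }];
const normalized = normalizeRow({ "0": "aa", "1": "bb", "2": "cc" }, fields);
```

The result has the keys name, type and note, which is the form reported above as working.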

Failure States for Chart Views

Right now, views are suppressed if the dimensions and metrics needed to enable them are not available (for instance, Network Viz is not available unless you have two dims). Instead, the view should show a message to the user saying "You can't display a chart like this unless your data has x, y & z".

Likewise, there should be upper limits on dataset size, so that a view (like giant network charts) is not enabled unless someone explicitly asks for it.
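
A sketch of how such a message could be produced from per-view requirements. The ViewRequirement shape and checkView are hypothetical; the only requirement taken from the issue is that Network Viz needs two dimensions, and the metric minimum of 0 is an assumption.

```typescript
interface ViewRequirement {
  label: string;
  minDimensions: number;
  minMetrics: number;
}

interface Availability {
  available: boolean;
  message: string;
}

// Instead of hiding a view, report what the data is missing.
function checkView(
  requirement: ViewRequirement,
  dimensionCount: number,
  metricCount: number
): Availability {
  const missing: string[] = [];
  if (dimensionCount < requirement.minDimensions) {
    missing.push(`${requirement.minDimensions} dimension(s) (found ${dimensionCount})`);
  }
  if (metricCount < requirement.minMetrics) {
    missing.push(`${requirement.minMetrics} metric(s) (found ${metricCount})`);
  }
  if (missing.length === 0) return { available: true, message: "" };
  return {
    available: false,
    message: `You can't display ${requirement.label} unless your data has at least ${missing.join(" and ")}.`,
  };
}

const networkViz: ViewRequirement = { label: "Network Viz", minDimensions: 2, minMetrics: 0 };
const networkCheck = checkView(networkViz, 1, 3);
```

A dataset-size limit could be added the same way, as another field on the requirement that yields a message asking the user to opt in explicitly.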

Data Explorer: Reduce package size

Is your feature request related to a problem? Please describe.
The bundled size of Data Explorer is currently 1.5MB, which is too large to be a reasonable component pulled in by other libraries.

Describe the solution you'd like
Need to reduce the size of the package. Open to any suggestions for scoping the requirements down.

Describe alternatives you've considered

Data Explorer: Scatterplot size ignored in Firefox

Application or Package Used
Data Explorer

Describe the bug
Selecting a size for points in the Scatterplot chart type has no effect in Firefox. Works fine in Chrome.

To Reproduce
Steps to reproduce the behavior:

Go here to load the "Happiness" example notebook
Select the Scatterplot chart type
Select something in the Size dropdown
See the points do not resize in Firefox

Expected behavior
The points should resize in Firefox. It should look pretty close to how it looks in Chrome.

Screenshot - Firefox (not correct)

Screenshot - Chrome (correct)

Data Explorer: Multiple configurations

Is your feature request related to a problem? Please describe.
Some users want to be able to display multiple configurations of the Data Explorer for the same dataset. Currently, this requires outputting the DataFrame multiple times (in the same cell or multiple cells). Unfortunately, this approach duplicates the DataFrame schema/data, which bloats the notebook file. For larger datasets, this can cause the browser to struggle and significantly increases the load time of the notebook.

Describe the solution you'd like
One lightweight solution would be to allow multiple configurations of the Data Explorer in a single execution output by keeping track of an array of metadata configurations instead of a single configuration.

{
  metadata: {
    dx: { view: "bar" }
  }
}

becomes

{
  metadata: {
    dx: [
      { view: "bar" },
      { view: "line" }
    ]
  }
}

When there are multiple configurations, the UI would render multiple instances of the Data Explorer component instead of just one. Each instance would be passed the corresponding metadata along with the original data/schema. Presumably they would be laid out vertically, similar to what happens when you output the same DataFrame multiple times, though some treatment could be applied to separate them visually.

In terms of how the user is able to provide multiple configurations, this could be achieved through UI controls to add/remove and re-order configurations. You could also simplify such that this is only allowed via the programmatic configuration (see nteract/nteract#4377), especially as an early milestone.

I imagine the Data Explorer component that exists today wouldn't need to change much, but an additional wrapping component would be introduced. This could be done in user land, but we would need to align on the metadata standard.
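
The wrapping component would first need to accept both metadata shapes. A minimal sketch of that normalization step, with the hypothetical names DxConfig and toConfigList:

```typescript
type DxConfig = Record<string, unknown>;

// Accept the current single-object form, the proposed array form, or nothing,
// and always return a list with one entry per Data Explorer instance to render.
function toConfigList(dx: DxConfig | DxConfig[] | undefined): DxConfig[] {
  if (dx === undefined) return [{}];
  return Array.isArray(dx) ? dx : [dx];
}

const single = toConfigList({ view: "bar" });
const multiple = toConfigList([{ view: "bar" }, { view: "line" }]);
const none = toConfigList(undefined);
```

A wrapper could then map over the returned list and render one instance per entry, passing the same data and schema to each. Treating missing metadata as a single empty configuration keeps existing notebooks rendering one instance, as they do today.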

Data Explorer on Google Colab

Is there any way to run the Data Explorer in a notebook hosted on Google Colab?
