nteract / data-explorer

The Data Explorer is nteract's automatic visualization tool.

Home Page: https://data-explorer.nteract.io/

License: BSD 3-Clause "New" or "Revised" License

TypeScript 63.72% JavaScript 36.19% Shell 0.09%
data-explorer automatic-viz data-viz

data-explorer's Introduction

nteract Data Explorer

An automatic data visualization tool.

Explore the documentation.

Data Explorer Examples

Creating Data Explorer

Read Elijah Meeks's post on designing the data explorer.

Using the Data Explorer

To use Data Explorer in your project, use the following approach.

yarn add @nteract/data-explorer

Install react and styled-components if you are not already using them.

yarn add react styled-components

The data prop must be a tabular data resource (MIME type application/vnd.dataresource+json).
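To make the expected shape concrete, here is a minimal sketch of such a value in TypeScript. It mirrors the schema/data layout visible in the notebook outputs quoted in the issues further down. The type names DataResourceField and DataResource are hypothetical local definitions for illustration; the package may ship its own types, and the sample rows are placeholders.

```typescript
// Hypothetical local types for illustration; @nteract/data-explorer may export its own.
interface DataResourceField {
  name: string;
  type?: string; // e.g. "integer", "number", "string"
}

interface DataResource {
  schema: {
    fields: DataResourceField[];
    primaryKey?: string[];
  };
  // Each row is an object keyed by field name (not by column position).
  data: Array<Record<string, unknown>>;
}

// Placeholder rows for illustration only.
const data: DataResource = {
  schema: {
    fields: [
      { name: "index", type: "integer" },
      { name: "category", type: "string" },
      { name: "value", type: "number" },
    ],
    primaryKey: ["index"],
  },
  data: [
    { index: 0, category: "a", value: 10 },
    { index: 1, category: "b", value: 20 },
  ],
};
```

A value of this shape is what would be passed as `<DataExplorer data={data} />`.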

// Default import complete with right side toolbar
import DataExplorer from "@nteract/data-explorer";

<DataExplorer data={data} />;

Or, with custom Toolbar position:

// Individual components as named imports
import { DataExplorer, Toolbar, Viz } from "@nteract/data-explorer";

<DataExplorer data={data}>
  <Toolbar />
  <Viz />
</DataExplorer>;

// Toolbar is optional
<DataExplorer data={data}>
  <Viz />
</DataExplorer>;

How do I contribute to this repo?

If you are interested in contributing to nteract, please read the contribution guidelines for information on how to set up your nteract repo for development, how to update documentation, and how to submit your code changes for review on GitHub.

data-explorer's People

Contributors

benrussert, captainsafia, dependabot[bot], emeeks, hydrosquall, javag97, willingc

data-explorer's Issues

Search all columns of rows

In the table view, I'd like to be able to quickly filter rows by some string. An example of this behavior can be found in PowerShell's ogv command:

[Screenshot: PowerShell's ogv window]

Notice the filter bar at the top?

This is not the same as the filters that are already available in Data Explorer:
[Screenshot: Data Explorer's existing filters]

This is something faster than that. When I search for "foo" I get all the rows that have "foo" in any of the columns.
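The requested behavior could look roughly like the following sketch: keep a row if any of its column values, converted to a string, contains the query. The helper name filterRowsByText is hypothetical and not part of Data Explorer's API, and case-insensitive substring matching is an assumption about what "quickly filter" should mean.

```typescript
type Row = Record<string, unknown>;

// Keep rows where any column's value, rendered as a string, contains the query.
// Matching is case-insensitive; an empty query keeps every row.
function filterRowsByText(rows: Row[], query: string): Row[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") return rows;
  return rows.filter((row) =>
    Object.values(row).some(
      (value) => value != null && String(value).toLowerCase().includes(needle),
    ),
  );
}

const rows: Row[] = [
  { name: "foo", team: "red" },
  { name: "bar", team: "Foobar" },
  { name: "baz", team: "blue" },
];

console.log(filterRowsByText(rows, "foo").map((r) => r.name)); // [ 'foo', 'bar' ]
```

Note that nested objects stringify as "[object Object]", so a real implementation would likely reuse whatever cell formatting the table already applies.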

Is more advanced table filtering possible now or in the future?

I am new to the data-explorer and I noticed you can filter it by values in columns, like here:
[Screenshot from 2020-04-16 15-04-51]

But is it also possible to filter all values within a range, so not just speed_observation=10 but something like speed_observation=[5,25]? This would be a very helpful feature; Google Colab's data_table has it.

It would also be great if the Data Explorer table offered other advanced filtering options like those in qgrid: filtering by a range of dates, but also by a list of strings, a list of numbers, etc.
This is how filtering looks in qgrid. Thanks!
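The two filter types requested here can be sketched as small row predicates: an inclusive range on one column (numbers, or dates compared by timestamp) and membership in a list of allowed values. The helper names are hypothetical; this only illustrates the filtering logic, not how it would be wired into Data Explorer's filter UI.

```typescript
type Row = Record<string, unknown>;

// Numbers pass through; Dates compare by their millisecond timestamp.
function toComparable(value: unknown): number | undefined {
  if (typeof value === "number") return value;
  if (value instanceof Date) return value.getTime();
  return undefined; // strings, nulls, etc. are excluded from range filters
}

// Inclusive range filter on one column, e.g. speed_observation in [5, 25].
function filterByRange(rows: Row[], column: string, min: number | Date, max: number | Date): Row[] {
  const lo = toComparable(min) as number;
  const hi = toComparable(max) as number;
  return rows.filter((row) => {
    const v = toComparable(row[column]);
    return v !== undefined && v >= lo && v <= hi;
  });
}

// Keep rows whose value in `column` is one of the allowed strings or numbers.
function filterByList(rows: Row[], column: string, allowed: Array<string | number>): Row[] {
  const allowedSet = new Set<unknown>(allowed);
  return rows.filter((row) => allowedSet.has(row[column]));
}

const observations: Row[] = [
  { station: "A", speed_observation: 3 },
  { station: "B", speed_observation: 10 },
  { station: "C", speed_observation: 25 },
  { station: "D", speed_observation: 40 },
];

console.log(filterByRange(observations, "speed_observation", 5, 25).map((r) => r.station)); // [ 'B', 'C' ]
```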

Quality of Life updates to repo

  • Update default branch to main
  • Update README and move old install instructions out to a legacy file
  • Add a Code of Conduct (CoC) doc
  • Add a license
  • Move to GH Actions for testing
  • Add a Change Log
  • Add a Contributing file
  • Triage open issues
  • Triage open PRs
  • Do a dependency update/review
  • Add code-quality checks to CI

Prepare 8.3.0 release

For next release:

  • Update dependencies
  • Move to GH Actions
  • Update Changelog
  • Tag and release

Enable Basic Sampling

With large datasets, prompt the user to enable one of a set of standard sampling options, with messaging about how they could sample better themselves. For instance, if you try to render a million points on the scatterplot, it shouldn't just say "Sorry, no"; it should say "Here we can show 50,000 of these points using one of the following three built-in sampling options (or you could sample the data yourself in a more effective way)".
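Two standard options that could back such a prompt are uniform random sampling and systematic (every nth row) sampling. The sketch below uses reservoir sampling (Algorithm R) for the random case and caps output at the 50,000-point figure from the issue text. The function names and the MAX_POINTS constant are assumptions for illustration, not existing Data Explorer code.

```typescript
// Algorithm R: draws a uniform random sample of k items in one pass.
function reservoirSample<T>(items: readonly T[], k: number, random: () => number = Math.random): T[] {
  const reservoir: T[] = [];
  for (let i = 0; i < items.length; i++) {
    if (i < k) {
      reservoir.push(items[i]);
    } else {
      // Item i replaces a random slot with probability k / (i + 1).
      const j = Math.floor(random() * (i + 1));
      if (j < k) reservoir[j] = items[i];
    }
  }
  return reservoir;
}

// Systematic sampling: take evenly spaced items so that at most k remain.
function systematicSample<T>(items: readonly T[], k: number): T[] {
  if (items.length <= k) return items.slice();
  const step = items.length / k;
  const out: T[] = [];
  for (let i = 0; i < k; i++) out.push(items[Math.floor(i * step)]);
  return out;
}

const MAX_POINTS = 50000; // the example limit from the issue text

// Sample only when the dataset exceeds the limit.
function pointsToRender<T>(points: readonly T[]): readonly T[] {
  return points.length > MAX_POINTS ? reservoirSample(points, MAX_POINTS) : points;
}

const points = Array.from({ length: 1000000 }, (_, i) => i);
console.log(pointsToRender(points).length); // 50000
```

Random sampling preserves the overall distribution in expectation, while systematic sampling keeps ordering structure (useful for sorted or time-ordered data); which options to offer is a design decision the issue leaves open.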

Testing: Unit tests for individual components

Presently, only the root DataExplorer is tested ( see https://github.com/nteract/data-explorer/blob/main/__tests__/index.spec.tsx ).

It would be easier to catch and guard against issues in components below the root DataExplorer if we tested specific visuals and components (Plot Picker, etc.) individually.

Based on recent bugs, I think it would be valuable to implement basic tests for

  • Grid (table) (e.g. for #65)
  • Scatterplot (e.g. for #23)
  • Other visualization types as topics arise.

This exercise will also help with writing component specific documentation.

Once this is done in a basic form for a few components, we'll have a good pattern in place that should be easy for new/first time contributors to mimic/add to.

Data Explorer breaks when dataframe cell has complex data in it

Repro: run the following in a cell

import pandas as pd
pd.set_option("display.html.table_schema", True)

class Cmd:
    def __init__(self, name, params):
        self.name = name
        self.params = params
    def __repr__(self):
        return f'Cmd(name={self.name}, params={self.params})'

cell_payload = [
    Cmd(name='foo', params={'bar', 'baz'}),
    Cmd(name='foo', params={'bar', 'baz'})
]
pd.DataFrame({'param_session': [cell_payload]})

Then the following error appears, with a link to the React error page, which says: "Objects are not valid as a React child (found: object with keys {name}). If you meant to render a collection of children, use an array instead."
[Screenshot of the error]

For reference, this is how pandas normally renders the cell when pd.set_option("display.html.table_schema", False) is set:
[Screenshot of pandas' default HTML table]

Finally, here's what the output looks like in the ipynb file when the error occurs:

{
  "application/vnd.dataresource+json": {
    "schema": {
      "fields": [
        {
          "name": "index",
          "type": "integer"
        },
        {
          "name": "param_session",
          "type": "string"
        }
      ],
      "primaryKey": [
        "index"
      ],
      "pandas_version": "0.20.0"
    },
    "data": [
      {
        "index": 0,
        "param_session": [
          {
            "name": "foo"
          },
          {
            "name": "foo"
          }
        ]
      }
    ]
  }
}
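The React error arises because the param_session cell holds an array of objects while the schema declares it a string, and objects cannot be rendered directly as React children. One possible defensive fix, sketched here as an assumption rather than the project's chosen solution, is to convert every cell value to a string before it reaches the table. The helper name formatCellValue is hypothetical.

```typescript
// Render any cell value as a string so React never receives a plain object as a child.
function formatCellValue(value: unknown): string {
  if (value === null || value === undefined) return "";
  if (typeof value === "string") return value;
  if (typeof value === "number" || typeof value === "boolean") return String(value);
  try {
    // Arrays and objects (like the param_session cell above) become JSON text.
    return JSON.stringify(value);
  } catch {
    // Circular structures (and BigInt values) cannot be serialized; fall back to String().
    return String(value);
  }
}

console.log(formatCellValue([{ name: "foo" }, { name: "foo" }])); // [{"name":"foo"},{"name":"foo"}]
```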

Bump d3-scale to 4.0.0

When we decide to drop support for node < 14, we should be able to bump d3-scale with no issues.

[feature] Permit parent page to modify the behavior of the "show filters" button

Motivation

When embedding the data-explorer in a page that uses URL params, like the Datasette project (see here), using the "filters" bar can clobber state that was set by the parent page. It would be useful to let embedders of this component disable this behavior.

Recommendation

  • Add a boolean prop disableFilterControls, defaulting to false; when true, it hides the show/hide filter button.
  • (Optional, and probably more complex if the component isn't already maintaining internal state): Add a boolean prop disableSetUrlParams, defaulting to false; when true, the component keeps its filter state internally instead of modifying the parent page's URL parameters.
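The proposed props could be typed and resolved as in this sketch. Both props are hypothetical (they come from this proposal, not the current API), and filterStateTarget assumes, as the issue describes, that the filter bar currently writes its state into the URL.

```typescript
// Hypothetical props from the proposal above; neither exists in the current API.
interface FilterBehaviorProps {
  /** When true, hide the show/hide filter button. Defaults to false. */
  disableFilterControls?: boolean;
  /** When true, keep filter state inside the component instead of writing URL params. Defaults to false. */
  disableSetUrlParams?: boolean;
}

// Resolve optional props to concrete booleans using the proposed defaults.
function resolveFilterBehavior(props: FilterBehaviorProps): Required<FilterBehaviorProps> {
  return {
    disableFilterControls: props.disableFilterControls ?? false,
    disableSetUrlParams: props.disableSetUrlParams ?? false,
  };
}

// Decide where filter state should live for a given configuration.
function filterStateTarget(props: FilterBehaviorProps): "url" | "internal" {
  return resolveFilterBehavior(props).disableSetUrlParams ? "internal" : "url";
}
```

Defaulting both flags to false keeps existing embedders' behavior unchanged, which is the point of making them opt-in.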

Installation issue

Hello,

The instructions say to go to the applications folder here:

cd applications/jupyter-extension
pip install -e .
jupyter serverextension enable nteract_on_jupyter

But where is this folder?

If this is the Applications folder from root, I don't have anything in it.

Thanks for your help :)

Data-explorer WITHIN jupyterlab, not jupyter nteract

I really like the data-explorer and would like to install it to work within JupyterLab, not nteract. Apparently this is possible, as shown in this post and in the screenshot below:
[Screenshot: Data Explorer running inside JupyterLab]
But I can't find any installation instructions for JupyterLab integration. Could you provide some? Other people seem to be interested in this functionality too. Thanks

[bug] Data-explorer will break if provided dataframe contains a column called `none`

Motivation

Proposed solution

  • Use a sentinel for the "no dimension" option that is either not a string or a very unusual string (perhaps containing DATA_EXPLORER_NONE_DIM and special characters), to prevent collisions with real column names such as none.
  • Code that checks for the literal string none should check for this sentinel instead.
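A sketch of the "very strange string" option follows. The exact sentinel value and helper names are assumptions; a string is used rather than a Symbol because a string survives JSON serialization into the notebook's dx metadata, whereas a Symbol would not.

```typescript
// Hypothetical sentinel; the NUL characters make a collision with a real column name very unlikely.
const NONE_DIM = "\u0000DATA_EXPLORER_NONE_DIM\u0000";

function isNoneDim(dim: string): boolean {
  return dim === NONE_DIM; // a column literally named "none" no longer matches
}

// Build dropdown options: the sentinel is labelled "none", then the real columns follow.
function dimensionOptions(columns: string[]): Array<{ value: string; label: string }> {
  return [
    { value: NONE_DIM, label: "none" },
    ...columns.map((name) => ({ value: name, label: name })),
  ];
}
```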

Todo

Guidance around persisting Data Explorer metadata

When using the Data Explorer component, the onMetadataChange prop is called whenever the user changes the component's UI configuration (for example, by switching the chart type). The nteract notebook UI appears to persist this metadata in a dx key at the root of the output's metadata in the notebook file:

notebook.cells[0].outputs[0].metadata.dx

My question is: Is this the recommended place to store the Data Explorer metadata, or would it be better to store the metadata under the MIME type considering it applies to only that part of the output?

notebook.cells[0].outputs[0].metadata["application/vnd.dataresource+json"].dx

I don't know if there is any practical need for this, but it seemed like something to consider after seeing the following example.

https://nbformat.readthedocs.io/en/latest/format_description.html#display-data

"metadata": {
  "image/png": {
    "width": 640,
    "height": 480
  }
}
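If the MIME-scoped location were adopted, readers would likely need to accept both locations during a transition. The following sketch prefers metadata["application/vnd.dataresource+json"].dx and falls back to the root-level dx key. The function names are hypothetical, and which location should be canonical is exactly the open question in this issue.

```typescript
// Only the parts of an output's metadata that this sketch touches.
type OutputMetadata = Record<string, unknown>;

const DATA_RESOURCE_MIME = "application/vnd.dataresource+json";

// Prefer the MIME-scoped location; fall back to the root-level "dx" key.
function readDxMetadata(metadata: OutputMetadata): unknown {
  const scoped = metadata[DATA_RESOURCE_MIME];
  if (typeof scoped === "object" && scoped !== null && "dx" in scoped) {
    return (scoped as Record<string, unknown>)["dx"];
  }
  return metadata["dx"];
}

// Write to the MIME-scoped location without mutating the input.
function writeDxMetadata(metadata: OutputMetadata, dx: unknown): OutputMetadata {
  const scoped = metadata[DATA_RESOURCE_MIME];
  const base = typeof scoped === "object" && scoped !== null ? scoped : {};
  return { ...metadata, [DATA_RESOURCE_MIME]: { ...base, dx } };
}
```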

[BUG] Can't render the data when the field values are indexed with numbers in the data object

Issue:
When the rows in an output's data are keyed by column position ("0", "1", ...) instead of by field name, data-explorer can't render the output correctly.

{
  "application/vnd.dataresource+json": {
    "schema": {
      "fields": [
        {
          "name": "name"
        },
        {
          "name": "type"
        },
        {
          "name": "note"
        }
      ]
    },
    "data": [
      {
        "0": "aa",
        "1": "bb",
        "2": "cc"
      }
    ]
  }
}

This is how it is rendered:
[Screenshot of the incorrect rendering]

Here is an example of the notebook. You can use this to reproduce the issue:

{
  "metadata": {
    "kernelspec": {
      "name": "SQL",
      "display_name": "SQL",
      "language": "sql"
    },
    "language_info": {
      "name": "sql",
      "version": ""
    }
  },
  "nbformat_minor": 2,
  "nbformat": 4,
  "cells": [
    {
      "cell_type": "code",
      "source": [
        "test"
      ],
      "metadata": {
        "azdata_cell_guid": "286d911b-c759-489c-b7f8-5490479dbddd",
        "language": "sql",
        "tags": []
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "metadata": {},
          "execution_count": 4,
          "data": {
            "application/vnd.dataresource+json": {
              "schema": {
                "fields": [
                  {
                    "name": "name"
                  },
                  {
                    "name": "type"
                  },
                  {
                    "name": "note"
                  }
                ]
              },
              "data": [
                {
                  "0": "aa",
                  "1": "bb",
                  "2": "cc"
                }
              ]
            }
          }
        }
      ],
      "execution_count": 4
    }
  ]
}

Note that if the data object uses the field names as keys, it renders correctly:

"data": [
  {
    "name": "aa",
    "type": "bb",
    "note": "cc"
  }
]
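One possible workaround, sketched under the assumption that positional keys always follow the schema's field order, is to re-key such rows by field name before they reach the component. Whether this belongs in data-explorer or in whatever produces the output is not settled by the issue; normalizeRow is a hypothetical helper.

```typescript
interface Field {
  name: string;
  type?: string;
}
type Row = Record<string, unknown>;

// If a row is keyed by column position ("0", "1", ...) rather than by field
// name, re-key it using the schema's field order. Rows that already use
// field names are returned unchanged.
function normalizeRow(row: Row, fields: Field[]): Row {
  const keyedByName = fields.some((field) => field.name in row);
  if (keyedByName) return row;
  const normalized: Row = {};
  fields.forEach((field, i) => {
    const key = String(i);
    if (key in row) normalized[field.name] = row[key];
  });
  return normalized;
}

const fields: Field[] = [{ name: "name" }, { name: "type" }, { name: "note" }];
console.log(normalizeRow({ "0": "aa", "1": "bb", "2": "cc" }, fields)); // { name: 'aa', type: 'bb', note: 'cc' }
```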

Failure States for Chart Views

Right now, views are suppressed when the dimensions and metrics they need are not available (for instance, Network Viz is not available unless the data has two dimensions). Instead, the UI should show the user a message such as "You can't display a chart like this unless your data has x, y & z".

Likewise, there should be upper limits on dataset size to not enable a view unless someone explicitly asks for it (like giant network charts).
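Both ideas can be sketched as a single availability check that returns a user-facing message instead of silently hiding the view. Only the network rule (two dimensions) comes from the issue; the 5,000-row limit, the table name REQUIREMENTS, and checkView are hypothetical placeholders.

```typescript
interface DataSummary { dimensions: number; metrics: number; rows: number }
interface ViewRequirement { dimensions: number; metrics: number; maxRows?: number }

// Hypothetical requirements table; the network rule mirrors the issue, the row limit is a placeholder.
const REQUIREMENTS: Record<string, ViewRequirement> = {
  network: { dimensions: 2, metrics: 0, maxRows: 5000 },
};

type Availability =
  | { available: true }
  | { available: false; message: string; canOverride: boolean };

function checkView(view: string, summary: DataSummary, explicitlyRequested = false): Availability {
  const req = REQUIREMENTS[view];
  if (!req) return { available: true }; // no known constraints for this view
  const missing: string[] = [];
  if (summary.dimensions < req.dimensions) missing.push(`${req.dimensions} dimension(s)`);
  if (summary.metrics < req.metrics) missing.push(`${req.metrics} metric(s)`);
  if (missing.length > 0) {
    return {
      available: false,
      message: `You can't display this chart unless your data has ${missing.join(" and ")}.`,
      canOverride: false,
    };
  }
  // Oversized data is blocked by default but can be unlocked on explicit request.
  if (req.maxRows !== undefined && summary.rows > req.maxRows && !explicitlyRequested) {
    return {
      available: false,
      message: `This dataset has ${summary.rows} rows, above the ${req.maxRows}-row limit for this view.`,
      canOverride: true,
    };
  }
  return { available: true };
}
```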

Data Explorer: Reduce package size

Is your feature request related to a problem? Please describe.
The bundled size of Data Explorer is currently 1.5 MB, which is too large for a component that other libraries pull in as a dependency.

Describe the solution you'd like
Need to reduce the size of the package. Open to any suggestions for scoping the requirements down.

Describe alternatives you've considered

Data Explorer: Scatterplot size ignored in Firefox

Application or Package Used
Data Explorer

Describe the bug
Selecting a size for points in the Scatterplot chart type has no effect in Firefox. Works fine in Chrome.

To Reproduce
Steps to reproduce the behavior:

  1. Go here to load the "Happiness" example notebook
  2. Select the Scatterplot chart type
  3. Select something in the Size dropdown
  4. See the points do not resize in Firefox

Expected behavior
The points should resize in Firefox. It should look pretty close to how it looks in Chrome.

[Screenshot - Firefox (not correct)]

[Screenshot - Chrome (correct)]

Data Explorer: Multiple configurations

Is your feature request related to a problem? Please describe.
Some users want to be able to display multiple configurations of the Data Explorer for the same dataset. Currently, this requires outputting the DataFrame multiple times (in the same cell or multiple cells). Unfortunately, this approach duplicates the DataFrame schema/data, which bloats the notebook file. For larger datasets, this can cause the browser to struggle and significantly increases the load time of the notebook.

Describe the solution you'd like
One lightweight solution would be to allow multiple configurations of the Data Explorer in a single execution output by keeping track of an array of metadata configurations instead of a single configuration.

{
  metadata: {
    dx: { view: "bar" }
  }
}

becomes

{
  metadata: {
    dx: [
      { view: "bar" },
      { view: "line" }
    ]
  }
}

When there are multiple configurations, the UI would render multiple instances of the Data Explorer component instead of just one. Each instance would be passed the corresponding metadata along with the original data/schema. Presumably they would be laid out vertically, similar to what happens when you output the same DataFrame multiple times, though some treatment could be applied to separate them visually.

In terms of how the user is able to provide multiple configurations, this could be achieved through UI controls to add/remove and re-order configurations. You could also simplify such that this is only allowed via the programmatic configuration (see nteract/nteract#4377), especially as an early milestone.

I imagine the Data Explorer component that exists today wouldn't need to change much, but an additional wrapping component would be introduced. This could be done in user land, but we would need to align on the metadata standard.
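A wrapping component would mostly need to normalize the metadata and manage the list. The sketch below shows that logic only: accepting both the current single-object dx form and the proposed array form, and reordering configurations immutably. DxConfig, normalizeDx, and moveConfig are hypothetical names; only the view key comes from the examples above.

```typescript
// One saved Data Explorer configuration; only `view` comes from the examples above.
interface DxConfig {
  view: string;
  [setting: string]: unknown;
}

// Accept both the current single-object form and the proposed array form,
// so notebooks saved before the change keep working.
function normalizeDx(dx: DxConfig | DxConfig[] | undefined): DxConfig[] {
  if (dx === undefined) return []; // no saved config: caller renders one default instance
  return Array.isArray(dx) ? dx : [dx];
}

// Reorder configurations (e.g. from UI controls) without mutating the input.
function moveConfig(configs: DxConfig[], from: number, to: number): DxConfig[] {
  if (from < 0 || from >= configs.length) return configs.slice();
  const next = configs.slice();
  const [moved] = next.splice(from, 1);
  next.splice(to, 0, moved);
  return next;
}
```

Keeping the single-object form valid means existing notebooks need no migration; the wrapper would render one DataExplorer per entry of normalizeDx(metadata.dx).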
