ExplainaBoard: An Explainable Leaderboard for NLP

Introduction

ExplainaBoard is an interpretable, interactive and reliable leaderboard with seven (so far) new features (F) compared with generic leaderboards.

  • F1: Single-system Analysis: What is a system good or bad at?
  • F2: Pairwise Analysis: Where is one system better (worse) than another?
  • F3: Data Bias Analysis: What are the characteristics of different evaluated datasets?
  • F5: Common errors: What common mistakes do the top-5 systems make?
  • F6: Fine-grained errors: Where will errors occur?
  • F7: System Combination: Is there potential complementarity between different systems?
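
To make F1 and F2 concrete, here is a minimal Python sketch of bucketed analysis: examples are grouped by an attribute, accuracy is computed per bucket, and two systems are compared bucket by bucket. This is an illustration only and not ExplainaBoard's implementation. The function names, the `length` attribute, the record layout and the toy data are all assumptions made for the example.

```python
from collections import defaultdict


def bucket_label(value, edges):
    """Map a numeric attribute value to a bucket label using sorted upper bounds."""
    for edge in edges:
        if value <= edge:
            return f"<={edge}"
    return f">{edges[-1]}"


def bucket_accuracy(examples, attribute, edges):
    """Return {bucket_label: accuracy} for one system (F1-style view).

    Each example is a dict with 'gold', 'pred' and the chosen attribute
    (a hypothetical record layout, not an ExplainaBoard file format).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        label = bucket_label(ex[attribute], edges)
        total[label] += 1
        correct[label] += int(ex["gold"] == ex["pred"])
    return {label: correct[label] / total[label] for label in total}


def pairwise_gap(acc_a, acc_b):
    """Per-bucket accuracy difference, A minus B, on shared buckets (F2-style view)."""
    return {b: acc_a[b] - acc_b[b] for b in acc_a if b in acc_b}


# Hypothetical toy outputs of two systems on the same four examples.
system_a = [
    {"length": 5, "gold": "POS", "pred": "POS"},
    {"length": 8, "gold": "NEG", "pred": "NEG"},
    {"length": 25, "gold": "POS", "pred": "NEG"},
    {"length": 30, "gold": "NEG", "pred": "NEG"},
]
system_b = [
    {"length": 5, "gold": "POS", "pred": "POS"},
    {"length": 8, "gold": "NEG", "pred": "POS"},
    {"length": 25, "gold": "POS", "pred": "POS"},
    {"length": 30, "gold": "NEG", "pred": "NEG"},
]

acc_a = bucket_accuracy(system_a, "length", edges=[10])
acc_b = bucket_accuracy(system_b, "length", edges=[10])
gap = pairwise_gap(acc_a, acc_b)
print(acc_a, acc_b, gap)
```

In this toy data the two systems have the same overall accuracy, yet system A is stronger on short inputs and system B on long ones. An aggregate leaderboard score would hide this kind of difference.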

Usage

We not only provide a Web-based interactive toolkit but also release an API that lets users flexibly evaluate their systems offline. This means you can use ExplainaBoard at the following levels:

  • U1: Just playing with it: You can browse the leaderboard, track NLP progress, and understand the relative merits of different top-performing systems.
  • U2: We help you analyze your model: You submit your model outputs and they are deployed to the online ExplainaBoard.
  • U3: Do it by yourself: You can process your model outputs yourself using our API.

API-based Toolkit: Quick Installation

Method 1: Simple installation from PyPI (Python 3 only)

pip install interpret-eval

Method 2: Install from the source and develop locally (Python 3 only)

# Clone current repo
git clone https://github.com/neulab/ExplainaBoard.git
cd ExplainaBoard

# Requirements
pip install -r requirements.txt

# Install the package
python setup.py install

Then you can run the following example in bash:

  interpret-eval --task chunk --systems ./interpret_eval/example/test-conll00.tsv --output out.json

where test-conll00.tsv is your system output file, whose format depends on the task. For each task we provide one example output file to show the expected format. The above command generates a detailed report (saved in out.json) for your input system (test-conll00.tsv). Specifically, the report includes the following statistics:

  • Fine-grained performance
  • Confidence intervals
  • Error cases
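
The layout of out.json is not described here, so the following Python sketch assumes nothing about its schema. It loads the report and lists the top-level keys with the type of each value, which is a reasonable first step before reading specific fields. The helper name `summarize_report` is hypothetical and not part of the interpret-eval package.

```python
import json


def summarize_report(path):
    """Load a JSON report and return {top_level_key: type_name}.

    Makes no assumption about the report's schema; if the root is not a
    JSON object, a single '<root>' entry describes its type instead.
    """
    with open(path, encoding="utf-8") as f:
        report = json.load(f)
    if isinstance(report, dict):
        return {key: type(value).__name__ for key, value in report.items()}
    return {"<root>": type(report).__name__}


# Example call, once the command above has produced out.json:
# print(summarize_report("out.json"))
```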

Web-based Toolkit: Quick Learning

We deploy ExplainaBoard as a Web toolkit, which includes 9 NLP tasks, 40 datasets and 300 systems.

So far, ExplainaBoard covers the following tasks:

Task                      Sub-task          Datasets  Models  Attributes
Text Classification       Sentiment         8         40      2
Text Classification       Topics            4         18      2
Text Classification       Intention         1         3       2
Text-Span Classification  Aspect Sentiment  4         20      4
Text-Pair Classification  NLI               2         6       7
Sequence Labeling         NER               3         74      9
Sequence Labeling         POS               3         14      4
Sequence Labeling         Chunking          3         14      9
Sequence Labeling         CWS               7         64      7
Structure Prediction      Semantic Parsing  4         12      4
Text Generation           Summarization     2         36      7

Submit Your Results

You can submit your system's output through this form, following the format description.

Download System Outputs

We have not released datasets or the corresponding system outputs that require licenses. If you hold the required licenses, please fill in this form and we will send them to you privately. (A description of the output format can be found here.) If these system outputs are useful to you, you can cite our work.

Currently Covered Systems

Acknowledgement

We thank all authors who shared their system outputs with us: Ikuya Yamada, Stefan Schweter, Colin Raffel, Yang Liu, Li Dong. We also thank Vijay Viswanathan, Yiran Chen, Hiroaki Hayashi for useful discussion and feedback about ExplainaBoard.
