aistairc / rotowire-modified


Script for generating the rotowire-modified dataset (Iso et al., ACL 2019)

Home Page: https://www.aclweb.org/anthology/P19-1202


Rotowire-modified

Script for generating the rotowire-modified dataset used in "Learning to Select, Track, and Generate for Data-to-Text" (Iso et al., ACL 2019).

Data

This script generates the "rotowire-modified" dataset by extracting the required data from the original rotowire dataset. Because of copyright restrictions we cannot distribute the dataset itself, so we distribute only the script. The dataset generated by this script is almost the same as, but not identical to, the dataset used in the paper (Iso et al., ACL 2019). The paper's dataset contains 14 games that do not appear in the original rotowire dataset and have to be obtained from https://www.rotowire.com and https://www.nba.com. Further information about these 14 games is listed in additional_games.txt. In addition, a number of data records differ slightly because the original pages have since been updated. To support empirical comparison by other researchers, we also distribute the experimental results on this new rotowire-modified dataset in the sports-reporter repo.

Statistics of the datasets

| Data | Train | Validation | Test | Total |
|---|---|---|---|---|
| rotowire (Wiseman et al., 2017) | 3398 | 727 | 728 | 4853 |
| rotowire-modified (Iso et al., ACL 2019) | 2714 | 534 | 500 | 3748 |
| rotowire-modified (this repo) | 2705 | 532 | 497 | 3734 |
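
As a quick sanity check on these numbers, the short sketch below recomputes the per-split gap between the paper's dataset and the one this script produces. The split sizes are copied from the statistics above; the variable and function names are illustrative only and do not come from the repository. The resulting total gap is consistent with the 14 additional games mentioned in the Data section.

```python
# Split sizes copied from the statistics above.
SPLITS = ("train", "valid", "test")
paper = {"train": 2714, "valid": 534, "test": 500}  # Iso et al., ACL 2019
repo = {"train": 2705, "valid": 532, "test": 497}   # generated by this script


def split_gap(a, b):
    """Return the per-split difference a - b."""
    return {split: a[split] - b[split] for split in SPLITS}


gap = split_gap(paper, repo)
total_gap = sum(gap.values())
print(gap, total_gap)
```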

Format

Since this script simply removes duplicate records from the original dataset, the data format is the same as that of the original rotowire dataset. Please refer to the boxscore-data repo.
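
To illustrate what removing duplicate records can look like, here is a minimal sketch. It is not the logic of generate_rotowire_modified.py: the `game_key` helper is hypothetical, and the field names `day`, `home_name`, and `vis_name` are assumptions about the record layout, so check them against the boxscore-data repo before relying on them.

```python
def game_key(game):
    # Hypothetical identity key: game date plus both team names.
    # The actual script may decide what counts as a duplicate differently.
    return (game.get("day"), game.get("home_name"), game.get("vis_name"))


def drop_duplicates(games):
    """Keep the first record for each key, preserving the input order."""
    seen = set()
    unique = []
    for game in games:
        key = game_key(game)
        if key in seen:
            continue
        seen.add(key)
        unique.append(game)
    return unique


# Toy records for illustration only, not real dataset entries.
toy = [
    {"day": "11_05_14", "home_name": "Hawks", "vis_name": "Heat", "summary": ["a"]},
    {"day": "11_05_14", "home_name": "Hawks", "vis_name": "Heat", "summary": ["b"]},
    {"day": "11_06_14", "home_name": "Bulls", "vis_name": "Nets", "summary": ["c"]},
]
unique = drop_duplicates(toy)
print(len(unique))
```

Because only records are dropped and none are rewritten, the surviving entries keep the original format.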

Usage

You can download the original dataset (Wiseman et al., 2017) from the boxscore-data repo and then transform it as shown below. The script creates a directory "rotowire-modified" containing train.json, valid.json, and test.json.

DATA_PATH=<path to the locally downloaded original dataset>
python script/generate_rotowire_modified.py --src_dir $DATA_PATH
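
After the script finishes, one way to check the output is to count the games in each generated file and compare them with the sizes listed for this repo in the statistics above. The sketch below assumes each file holds a JSON list with one entry per game (as in the original rotowire format) and that the output directory is in the current working directory. `count_games` and `report` are illustrative helpers, not part of the repository.

```python
import json
from pathlib import Path

# Split sizes reported for this repo in the statistics section.
EXPECTED = {"train": 2705, "valid": 532, "test": 497}


def count_games(out_dir):
    """Count the entries in each split file, assuming a top-level JSON list."""
    counts = {}
    for split in EXPECTED:
        with open(Path(out_dir) / f"{split}.json", encoding="utf-8") as f:
            counts[split] = len(json.load(f))
    return counts


def report(counts):
    """Map each split to True if its count matches the expected size."""
    return {split: counts[split] == EXPECTED[split] for split in EXPECTED}


if __name__ == "__main__":
    out_dir = Path("rotowire-modified")
    if out_dir.is_dir():
        counts = count_games(out_dir)
        print(counts, report(counts))
    else:
        print(f"{out_dir} not found; run the generation script first")
```

A mismatch does not necessarily indicate a bug: the counts depend on the version of the original dataset that was downloaded.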
