Coder Social home page Coder Social logo

google-research-datasets / query-wellformedness Goto Github PK

View Code? Open in Web Editor NEW
83.0 7.0 12.0 512 KB

25,100 queries from the Paralex corpus (Fader et al., 2013) annotated with human ratings of whether they are well-formed natural language questions.

nlp nlp-machine-learning search-engine deep-neural-networks deep-learning information-retrieval

query-wellformedness's Introduction

Query-wellformedness Dataset

25,100 queries from the Paralex corpus (Fader et al., 2013) annotated with human ratings of whether they are well-formed natural language questions.

http://goo.gl/language/query-wellformedness

Description

Google's query wellformedness dataset was created by crowdsourcing well-formedness annotations for 25,100 queries from the Paralex corpus. Every query was annotated by five raters each with 1/0 rating of whether or not the query is well-formed. For further details please read our paper: Identifying Well-formed Natural Language Questions

For each query we provide the average of the 5 binary judgements as the wellformedness score for the query. Following are some examples of queries present in the dataset:

Query Wellformedness rating
Which form of government is still in place in greece ? 1.0
Population of owls just in north america ? 0.0
Is johnny depp a celtic fan ? 0.8
Where did Roald Dahl live in his teenaged years ? 0.6

The dataset is divided into three files: train.tsv, dev.tsv and test.tsv each containing rated queries. The size of the files is as follows:

File No. of queries
train.tsv 17,500
dev.tsv 3,750
test.tsv 3,850

Examples

The examples in each file are tab separated containing the following columns:

Column Content
1 The European Union includes how many ?
2 0.2

Reference

If you use or discuss this dataset in your work, please cite our paper:

@InProceedings{FaruquiDas2018,
  title = {{Identifying Well-formed Natural Language Questions}},
  author = {Faruqui, Manaal and Das, Dipanjan},
  booktitle = {Proc. of EMNLP},
  year = {2018}
}

License

Query-wellformedness dataset is licensed under CC BY-SA 4.0. Any third party content or data is provided β€œAs Is” without any warranty, express or implied.

Contact

If you have a technical question regarding the dataset or publication, please create an issue in this repository.

query-wellformedness's People

Contributors

daphnelg avatar mfaruqui avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

query-wellformedness's Issues

Training a model using this dataset

I want to train a model using this dataset so that when I feed it with grammatically incorrect questions it should rate it. How can create such model?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.