
askplatypus / wikidata-simplequestions


Mapping of the SimpleQuestions dataset to Wikidata

License: Other

Languages: Jupyter Notebook 74.58%, Python 25.42%
Topics: benchmark, freebase, question-answering, wikidata

wikidata-simplequestions's Issues

Data with String Label Version

Dear authors,

Thank you very much for your work.
Do you have a version of the data that explicitly includes the string labels of the entities and properties?
Like this:
Alex Golfis \t place of birth \t Athens \t what city was alex golfis born in
Instead of this:
Q16330302 \t P19 \t Q1524 \t what city was alex golfis born in

Thank you for your attention.
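A labelled copy like the one requested could be produced locally. A minimal sketch, assuming network access to the public Wikidata API at https://www.wikidata.org/w/api.php: collect the IDs from the first three tab-separated columns, fetch their English labels with the `wbgetentities` action, and substitute them back into each line. The function names (`collect_ids`, `fetch_labels`, `label_line`), the English-only labels and the User-Agent string are assumptions for illustration; only columns that look like plain Q/P identifiers are looked up, and anything else is left unchanged.

```python
import json
import re
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"
# Only plain item/property IDs are sent to the API; other values are left as-is.
ID_PATTERN = re.compile(r"[QP]\d+")
BATCH_SIZE = 50  # wbgetentities accepts up to 50 ids per request for normal clients


def collect_ids(lines):
    """Collect the item/property IDs found in the first three tab-separated columns."""
    ids = set()
    for line in lines:
        for field in line.rstrip("\n").split("\t")[:3]:
            if ID_PATTERN.fullmatch(field):
                ids.add(field)
    return ids


def fetch_labels(ids, language="en"):
    """Fetch one label per ID from the Wikidata API; IDs without a label are skipped."""
    ids = sorted(ids)
    labels = {}
    for start in range(0, len(ids), BATCH_SIZE):
        params = urllib.parse.urlencode({
            "action": "wbgetentities",
            "ids": "|".join(ids[start:start + BATCH_SIZE]),
            "props": "labels",
            "languages": language,
            "format": "json",
        })
        request = urllib.request.Request(
            f"{API_URL}?{params}",
            # Hypothetical User-Agent; Wikimedia asks clients to identify themselves.
            headers={"User-Agent": "simplequestions-label-sketch/0.1 (example)"},
        )
        with urllib.request.urlopen(request) as response:
            data = json.load(response)
        for entity_id, entity in data.get("entities", {}).items():
            label = entity.get("labels", {}).get(language)
            if label is not None:
                labels[entity_id] = label["value"]
    return labels


def label_line(line, labels):
    """Replace the first three columns with labels, keeping any ID that has no label."""
    fields = line.rstrip("\n").split("\t")
    fields[:3] = [labels.get(field, field) for field in fields[:3]]
    return "\t".join(fields)
```

For example, with the labels from the two lines quoted above ({"Q16330302": "Alex Golfis", "P19": "place of birth", "Q1524": "Athens"}), `label_line` turns the ID line into the labelled line. Labels returned by the live API may differ from the ones quoted in the issue.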

'answerable' questions are not answerable

The files ending with "_answerable" contain only triples that are also in Wikidata.

In the first few lines in annotated_wd_data_test_answerable.txt, there are several issues:

  1. 'Which genre of album is harder.....faster?': different result (rock music vs. classic rock)
  2. 'what city was alex golfis born in': fine
  3. 'what film is by the writer phil hay?': would be fine, but the triple is incorrect in wikidata
  4. 'Which equestrian was born in dublin?': There is no 'place of birth' for Mark Kyle in wikidata.
  5. 'What is a tv action show?': m/01htzx (Action) is mapped to Q11272426 (some church in the Ukraine)
  6. 'what's akbar tandjung's ethnicity': The triple is not part of wikidata.
  7. 'Which Swiss conductor's cause of death is myocardial infarction?': fine
  8. 'where was padraic mcguinness's place of death': fine
  9. 'Who influenced michael mcdowell?': The triple is not part of wikidata.
  10. 'which military was involved in the second battle of fort fisher': The triple is not part of wikidata.

So, in these first ten lines, there are three or four correct entries, five which are not answerable and one where the mapping is incorrect.

Scaling that up would mean that I can trust about 40% of all the 'answerable' examples. That's not a lot and makes the dataset unusable in my opinion.
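The manual check above can be repeated over the whole file by asking the Wikidata Query Service whether each (subject, property, object) triple exists. A minimal sketch, assuming the public endpoint https://query.wikidata.org/sparql and that every row uses plain Q/P identifiers; the function names and the User-Agent string are hypothetical. The query uses `p:`/`ps:` so that statements of any rank count, whereas `wdt:` would only see best-ranked ("truthy") values.

```python
import json
import re
import urllib.parse
import urllib.request

SPARQL_URL = "https://query.wikidata.org/sparql"
ITEM = re.compile(r"Q\d+")
PROPERTY = re.compile(r"P\d+")


def build_ask_query(subject, prop, obj):
    """ASK query: does `subject` have a `prop` statement (of any rank) whose value is `obj`?"""
    if not (ITEM.fullmatch(subject) and PROPERTY.fullmatch(prop) and ITEM.fullmatch(obj)):
        raise ValueError(f"unexpected identifiers: {subject!r} {prop!r} {obj!r}")
    # p:/ps: reach every statement; wdt: would only match best-ranked values.
    return (
        f"ASK {{ wd:{subject} p:{prop} ?statement . "
        f"?statement ps:{prop} wd:{obj} . }}"
    )


def triple_in_wikidata(subject, prop, obj):
    """Send the ASK query to the Wikidata Query Service and return its boolean result."""
    params = urllib.parse.urlencode(
        {"query": build_ask_query(subject, prop, obj), "format": "json"}
    )
    request = urllib.request.Request(
        f"{SPARQL_URL}?{params}",
        headers={"User-Agent": "simplequestions-check-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["boolean"]


def share_present(lines, lookup=triple_in_wikidata):
    """Fraction of tab-separated rows whose (subject, property, object) `lookup` confirms."""
    triples = [
        fields[:3]
        for fields in (line.rstrip("\n").split("\t") for line in lines)
        if len(fields) >= 3
    ]
    if not triples:
        return 0.0
    found = sum(1 for subject, prop, obj in triples if lookup(subject, prop, obj))
    return found / len(triples)
```

A positive answer only shows that the triple exists; it says nothing about whether the mapped entities are the ones the question means (items 1 and 5 above still need a manual look). When running it over thousands of rows, a delay between requests is advisable to stay within the service's rate limits.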

Question about the files ending with 'answerable'

This work is very interesting and helpful.
My question is: what does the suffix 'answerable' in the file names mean?

The files ending with "_full" contain only triples that are also in Wikidata.
However, there is no file ending with "_full".

Thanks again, and merry Christmas!

QALD format

Hi,
thanks for this data!
I noticed that annotated_wd_data_test_answerable.txt contains 5621 questions, whereas qald-format/annotated_wd_data_test.json contains 5721 (counted with jq ".questions[].query.answers" annotated_wd_data_test.json | grep entity). Does the qald-format directory contain the same data as the *_answerable.txt files? Furthermore, the QALD data contains multiple answers per question where applicable, but the *_answerable.txt files always give exactly one answer, and it is not always the same as in the qald-format files. For example, for "what is the film genre for snow falling on cedars?", qald-format/annotated_wd_data_test.json (and wikidata.org) gives the entities Q1054574, Q1257444, Q130232 and Q3072039 as answers, but annotated_wd_data_test.txt gives only Q1257444 (possibly old data from Freebase?).
Concerning the qald-format directory: what is the difference between annotated_wd_data_*_full.json and annotated_wd_data_*.json? For instance, annotated_wd_data_train_full.json is much larger than annotated_wd_data_train.json, whereas for valid and test it is the opposite.
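The count and answer differences described above can be listed mechanically. A minimal sketch, assuming the .txt rows are tab-separated as subject, property, object, question, and that a dict mapping each question string to its set of answer IDs has already been extracted from the QALD JSON (that extraction step is left out because the exact JSON layout is not confirmed here). The function names are hypothetical, and questions are compared after lowercasing and collapsing whitespace.

```python
def normalize(question):
    """Lowercase and collapse whitespace so small formatting differences do not matter."""
    return " ".join(question.lower().split())


def load_txt_answers(lines):
    """Map each question (4th column) to the object ID (3rd column) of its .txt row."""
    answers = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4:
            answers[normalize(fields[3])] = fields[2]
    return answers


def compare_formats(txt_answers, qald_answers):
    """Compare {question: object_id} from a .txt file with {question: answer_ids} from QALD."""
    txt = {normalize(q): obj for q, obj in txt_answers.items()}
    qald = {normalize(q): set(ids) for q, ids in qald_answers.items()}
    shared = set(txt) & set(qald)
    return {
        "only_txt": sorted(set(txt) - set(qald)),
        "only_qald": sorted(set(qald) - set(txt)),
        # The .txt object is not among the QALD answers at all.
        "disagreements": sorted(q for q in shared if txt[q] not in qald[q]),
        # QALD lists several answers where the .txt file has exactly one.
        "multiple_answers": sorted(q for q in shared if len(qald[q]) > 1),
    }
```

Rows that repeat the same question string collapse to the last one in `load_txt_answers`, so duplicate questions should be counted separately if they matter for the comparison.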

Wrong entities in the qald-format directory

It seems that, in some cases, the queries in the 'qald-format' directory point to the answer entities instead of the question entities. E.g., in the first example the query contains the link for Saving Shiloh rather than for Warner Bros.
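Affected queries could be found automatically by extracting the Wikidata item IDs each query mentions and checking whether it uses the question's subject or one of its answers. A minimal sketch, assuming the queries refer to items either with the `wd:` prefix or with full `http://www.wikidata.org/entity/` URIs, and that the subject and answer IDs for each question are available (for example from the corresponding .txt row); the function names are hypothetical.

```python
import re

# Matches item IDs written as wd:Q123 or <http://www.wikidata.org/entity/Q123>.
ENTITY_PATTERN = re.compile(r"(?:\bwd:|<http://www\.wikidata\.org/entity/)(Q\d+)\b")


def query_entities(sparql):
    """Return the set of Wikidata item IDs mentioned in a SPARQL query string."""
    return set(ENTITY_PATTERN.findall(sparql))


def uses_answer_instead_of_subject(sparql, subject_id, answer_ids):
    """True if the query omits the subject item but mentions one of the answer items."""
    entities = query_entities(sparql)
    return subject_id not in entities and bool(entities & set(answer_ids))
```

Note that `wdt:` property prefixes are not matched, because the pattern requires `wd:` to be followed directly by a `Q` identifier.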
