
Comments (4)

sebastianruder commented on April 29, 2024

Hey, sorry about my late response. This sounds like a useful idea. I just have two concerns:

  • Even though arXiv is very popular, not all papers are on arXiv. Many are available only in refereed proceedings (e.g. ACL Anthology, AAAI), which don't have an API. How would you deal with these?
  • As far as I can see, anyone who wants to contribute to the repo needs to run gener_yaml.py to produce the full yaml. Is there another way? If not, I think this places too heavy a burden on contributors; I also think having two yaml files (one template and the full version) might get confusing.

from nlp-progress.

lopusz commented on April 29, 2024

Hi Sebastian,

this time I apologize for the slow response. I was offline for two weeks.

Concerning the first bullet: the script can work not only with the arXiv API, but also with the DOI API and the Semantic Scholar API. Semantic Scholar in particular has a huge database. For example, for dependency parsing I could easily fetch metadata for every paper via the API. My feeling is that these three APIs will cover more than 95% of the listed resources. If all the data were in YAML format, one could easily write a short script to check the coverage.

Of course, for the "unAPIzed" papers one can still fall back to entering all the details by hand, as is done now.
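
A coverage check of this kind could look roughly like the sketch below. It is only an illustration: the regular expressions, the function names `classify` and `coverage`, and the sample links are assumptions made for this example, not part of the actual script. It also only tests whether a link carries an identifier one of the three APIs understands, not whether the API really returns a record for it. Old-style arXiv IDs (with a subject prefix) are not handled.

```python
import re

# Assumed link shapes for the three identifier types; placeholders, not the real script's rules.
ARXIV_RE = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})", re.IGNORECASE)
DOI_RE = re.compile(r"(?:doi\.org/|doi:)(10\.\d{4,9}/\S+)", re.IGNORECASE)
S2_RE = re.compile(r"semanticscholar\.org/paper/(?:[^/\s]+/)?([0-9a-f]{40})", re.IGNORECASE)


def classify(url):
    """Return the name of the API that could resolve this link, or None."""
    if ARXIV_RE.search(url):
        return "arxiv"
    if DOI_RE.search(url):
        return "doi"
    if S2_RE.search(url):
        return "semanticscholar"
    return None  # would need manual entry


def coverage(urls):
    """Fraction of links that carry an identifier for at least one API."""
    if not urls:
        return 0.0
    covered = sum(1 for u in urls if classify(u) is not None)
    return covered / len(urls)


# Placeholder links, invented for the example.
links = [
    "https://arxiv.org/abs/1234.56789",
    "https://doi.org/10.1234/example",
    "https://www.semanticscholar.org/paper/" + "a" * 40,
    "https://example.org/proceedings/paper.pdf",
]
print(coverage(links))
```

Links for which `classify` returns `None` are the ones that would still have to be filled in by hand.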


lopusz commented on April 29, 2024

As far as the second bullet is concerned, that is essentially my question: if and how would you see integrating this into your maintenance workflow? The idea would be that contributors only need to enter an arXiv ID, a DOI, or a Semantic Scholar ID, and the tooling would do the rest. I believe it is worth considering. Your repo will keep growing, e.g. with the addition of new languages or tasks, so tools that improve the consistency of the data would definitely increase the overall quality of NLP-progress. For example, this would mean no more title/link inconsistencies like in #95, and no more of the arxiv.org/abs vs arxiv.org/pdf inconsistencies that are there now.
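
To make the abs/pdf point concrete, here is a minimal sketch of how a link could be normalized. The function name `canonical_arxiv_url` and the choice of the `abs` page as the canonical form are assumptions for illustration; the sketch also drops any version suffix such as `v2`, which may or may not be what the repo wants.

```python
import re

# Matches new-style arXiv links in either the abs or the pdf form.
ARXIV_LINK_RE = re.compile(
    r"^https?://(?:www\.)?arxiv\.org/(?:abs|pdf)/"
    r"(?P<id>\d{4}\.\d{4,5})(?:v\d+)?(?:\.pdf)?/?$",
    re.IGNORECASE,
)


def canonical_arxiv_url(url):
    """Rewrite an arXiv link to its abs form; leave any other link unchanged."""
    match = ARXIV_LINK_RE.match(url.strip())
    if match is None:
        return url
    return "https://arxiv.org/abs/" + match.group("id")


# Placeholder ID, invented for the example.
print(canonical_arxiv_url("http://arxiv.org/pdf/1234.56789v2.pdf"))
```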

Another benefit of having full metadata would be that the tooling could easily generate a downloadable BibTeX file for every task/language, which could be helpful for many users.
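
As a rough sketch of the BibTeX idea: given a metadata record per paper, rendering an entry is a few lines. The field names (`title`, `authors`, `year`, `url`), the `@misc` entry type, and the sample record are all assumptions made for this example, and no escaping of special LaTeX characters is attempted.

```python
def bibtex_entry(key, meta):
    """Render one BibTeX entry from a metadata dict (hypothetical field names)."""
    fields = [
        ("title", meta["title"]),
        ("author", " and ".join(meta["authors"])),
        ("year", str(meta["year"])),
        ("url", meta["url"]),
    ]
    body = ",\n".join("  {} = {{{}}}".format(name, value) for name, value in fields)
    return "@misc{{{},\n{}\n}}".format(key, body)


# Placeholder record, invented for the example.
paper = {
    "title": "An Example Paper",
    "authors": ["A. Author", "B. Author"],
    "year": 2018,
    "url": "https://arxiv.org/abs/1234.56789",
}
print(bibtex_entry("author2018example", paper))
```

Concatenating such entries for all papers of one task or language would give the downloadable file.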

Best regards,
Michał


sebastianruder commented on April 29, 2024

Hey Michał, I'm really sorry about my late reply. I meant to answer sooner, but somehow this slipped through the cracks.
I really like the idea and would love to offer more functionality on top of this. However, we've decided that we'll stick with storing data in Markdown for now. I'm not sure if this is still compatible with that.

