Dear Sebastian, dear NLP-progress Contributors, Thank you for creati

Automatic metadata fetching via API call (nlp-progress issue, open)

sebastianruder commented on April 29, 2024


Comments (4)

sebastianruder commented on April 29, 2024

Hey, sorry about my late response. This sounds like a useful idea. I just have two concerns:

- Even though arXiv is very popular, not all papers are on arXiv. Many are only available in refereed proceedings (e.g. ACL Anthology, AAAI), which don't have an API. How would you deal with these?
- As far as I can see, anyone who wants to contribute to the repo needs to run gener_yaml.py to produce the full YAML. Is there another way? If not, I think this places too heavy a burden on contributors; I also think having two YAML files (a template and the full version) might get confusing.
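For illustration, the expansion step behind a template/full-YAML split can be sketched as follows. This is a hypothetical `expand` function, not the actual code of gener_yaml.py: it assumes each template entry is a dict whose optional `ref` field (a made-up name) holds an identifier, and it takes the API lookup as an injected `fetch` callable so it can be tried without network access.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def expand(template: List[Record], fetch: Callable[[str], Record]) -> List[Record]:
    """Turn template entries into full entries using an injected metadata lookup.

    `ref` is a hypothetical field name for the identifier a contributor types in.
    """
    full = []
    for entry in template:
        ref = entry.get("ref")
        # Entries without an identifier are kept exactly as entered by hand.
        fetched = fetch(str(ref)) if ref else {}
        # Later keys win, so hand-entered fields override fetched ones.
        full.append({**fetched, **entry})
    return full
```

Letting hand-entered fields override fetched ones would allow a contributor to correct a bad API result in the template rather than in the generated file.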


lopusz commented on April 29, 2024

Hi Sebastian,

This time it is I who should apologize for the slow response; I was offline for two weeks.

Concerning the first bullet: the script can deal not only with the arXiv API, but also with the DOI API and the Semantic Scholar API. Semantic Scholar in particular has a huge database; for dependency parsing, for example, I could easily fetch metadata for every paper via its API. My feeling is that these three APIs would certainly cover more than 95% of the listed resources. If all the data were in YAML format, one could easily write a short script to check the coverage.
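As a rough illustration of the arXiv part, the sketch below queries the public arXiv API (`http://export.arxiv.org/api/query?id_list=...`), which returns an Atom feed, and pulls out the title, authors, year, and abstract-page URL. Parsing is kept separate from downloading so it can be tried offline; the function names and the returned keys are hypothetical, and error handling is minimal.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Union

ATOM = "{http://www.w3.org/2005/Atom}"  # namespace used by the arXiv API feed


def parse_arxiv_feed(atom_xml: Union[str, bytes]) -> dict:
    """Extract basic metadata from the first <entry> of an arXiv API Atom feed."""
    root = ET.fromstring(atom_xml)
    entry = root.find(f"{ATOM}entry")
    if entry is None:
        raise ValueError("feed contains no <entry>")
    # Titles may be wrapped over several lines in the feed; collapse whitespace.
    title = " ".join((entry.findtext(f"{ATOM}title") or "").split())
    authors = [
        (a.findtext(f"{ATOM}name") or "").strip()
        for a in entry.findall(f"{ATOM}author")
    ]
    published = entry.findtext(f"{ATOM}published") or ""
    return {
        "title": title,
        "authors": authors,
        "year": int(published[:4]) if published else None,
        "url": (entry.findtext(f"{ATOM}id") or "").strip(),
    }


def fetch_arxiv(arxiv_id: str) -> dict:
    """Download and parse the metadata for one arXiv ID (needs network access)."""
    query = urllib.parse.urlencode({"id_list": arxiv_id})
    url = f"http://export.arxiv.org/api/query?{query}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_arxiv_feed(resp.read())
```

Lookups against the DOI and Semantic Scholar services would follow the same pattern, each with its own response parser.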

Of course, for the "unAPIzed" papers one can still fall back to entering all the details by hand, as is done now.
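The coverage check could be as short as the following sketch. It assumes the YAML has already been loaded into a list of dicts (for example with PyYAML's `yaml.safe_load`) and that each entry stores its identifier under one of the hypothetical keys `arxiv`, `doi`, or `s2`; anything without one counts as a hand-entered ("unAPIzed") paper.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

ID_KEYS = ("arxiv", "doi", "s2")  # hypothetical field names, checked in this order


def coverage(entries: Iterable[Dict]) -> Tuple[Counter, float]:
    """Count entries by identifier type and return the API-resolvable share."""
    counts: Counter = Counter()
    for entry in entries:
        # The first identifier present decides the lookup; otherwise manual entry.
        kind = next((k for k in ID_KEYS if entry.get(k)), "manual")
        counts[kind] += 1
    total = sum(counts.values())
    share = (total - counts["manual"]) / total if total else 0.0
    return counts, share
```

Running it over every task file would replace the 95% estimate with a measured figure.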


lopusz commented on April 29, 2024

As far as the second bullet is concerned, that is essentially my question: if and how would you see integrating this into your maintenance workflow? The idea would be that contributors only need to enter an arXiv ID, a DOI, or a Semantic Scholar ID, and the tooling would do the rest. I believe it is worth considering. Your repo will keep growing, e.g. with the addition of new languages or tasks, so tooling that improves the consistency of the data would definitely increase the overall quality of NLP-progress. For example, this would mean no more title/link inconsistencies like the one in #95, and no more mixing of arxiv.org/abs and arxiv.org/pdf links as there is now.
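To show how entering only an identifier could also remove the abs/pdf inconsistency, here is a minimal sketch of a hypothetical `normalize` helper. It recognizes new-style arXiv IDs (bare, or as an arxiv.org/abs or arxiv.org/pdf URL, with or without a version suffix), DOIs (bare or as a doi.org URL), and 40-character lowercase hex strings, which it assumes are Semantic Scholar paper IDs. Old-style arXiv IDs with a subject-class prefix are not handled.

```python
import re
from typing import Optional, Tuple

# New-style arXiv IDs (YYMM.NNNN or YYMM.NNNNN), bare or inside an abs/pdf URL.
ARXIV = re.compile(
    r"^(?:https?://(?:www\.)?arxiv\.org/(?:abs|pdf)/)?"
    r"(\d{4}\.\d{4,5})(?:v\d+)?(?:\.pdf)?/?$"
)
# DOIs, bare or as a doi.org / dx.doi.org URL.
DOI = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/)?(10\.\d{4,9}/\S+)$", re.IGNORECASE)
# Assumed shape of a Semantic Scholar paper ID: 40 lowercase hex characters.
S2 = re.compile(r"^[0-9a-f]{40}$")


def normalize(ref: str) -> Tuple[str, str, Optional[str]]:
    """Classify a contributor-supplied reference as (kind, canonical ID, link)."""
    ref = ref.strip()
    m = ARXIV.match(ref)
    if m:
        # Always link the unversioned abstract page, never the PDF.
        return "arxiv", m.group(1), f"https://arxiv.org/abs/{m.group(1)}"
    m = DOI.match(ref)
    if m:
        return "doi", m.group(1), f"https://doi.org/{m.group(1)}"
    if S2.match(ref):
        return "s2", ref, None  # no link is constructed for this ID type here
    raise ValueError(f"unrecognized reference {ref!r}; enter the metadata by hand")
```

Dropping the version suffix is a design choice: the unversioned abs page always shows the latest version, so every link to the same paper ends up identical.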

Another benefit of having full metadata would be that the tooling could easily generate a downloadable BibTeX file for every task/language, which could be helpful for many users.
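Generating such a file from full metadata is mostly string formatting. The sketch below renders records as BibTeX `@misc` entries; the record keys (`title`, `authors`, `year`, `url`) and the citation keys are assumptions, and it does not escape LaTeX special characters such as `&` or `%`, which a real tool would need to handle.

```python
from typing import Dict


def to_bibtex(key: str, meta: Dict) -> str:
    """Render one metadata record as a BibTeX @misc entry."""
    fields = {
        # Extra inner braces keep BibTeX styles from lowercasing the title.
        "title": "{" + meta["title"] + "}",
        "author": " and ".join(meta["authors"]),
        "year": str(meta["year"]),
        "url": meta["url"],
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@misc{{{key},\n{body}\n}}"


def bib_file(records: Dict[str, Dict]) -> str:
    """Join all entries for one task or language into a single .bib file body."""
    return "\n\n".join(to_bibtex(k, m) for k, m in records.items()) + "\n"
```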

Best regards,
Michał


sebastianruder commented on April 29, 2024

Hey Michał, I'm really sorry about my late reply. I meant to answer sooner, but somehow this slipped through the cracks.
I really like the idea and would love to offer more functionality on top of this. However, we've decided that we'll stick with storing data in Markdown for now. I'm not sure if this is still compatible with that.
