Ok. As things stand now, I think it will be more beneficial to the community to keep things in the more readable Markdown format, which makes reading and contributing easier. We can revisit converting to YAML if a more immediate need arises in the future.
from nlp-progress.
Thinking out loud:
Assuming that Markdown tables can be parsed with something like fsm, we can probably use Markdown tables + git logs for plotting and trend spotting.
We could also set up a bot that periodically (say, every two weeks) dumps the Markdown data into a more machine-readable `_data` folder for such usage.
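A first pass at such a dump could be a simple line-based parser. The following is a minimal Python sketch, assuming GitHub-style pipe tables with a header row followed by a `---` separator row. The names `split_row`, `parse_markdown_table` and `dump_table` are hypothetical, the output is JSON rather than YAML only so that the sketch needs nothing beyond the standard library, and escaped pipes and multiple tables per file are not handled.

```python
import json
import re
from pathlib import Path

# Header/body separator row, e.g. "| --- | :---: |" (outer pipes optional).
SEPARATOR = re.compile(r"^\s*\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?\s*$")


def split_row(line: str) -> list[str]:
    """Split one table row into stripped cells.

    Outer pipes are optional, so a row with a missing closing '|'
    still parses. Escaped pipes inside cells are not handled.
    """
    line = line.strip()
    if line.startswith("|"):
        line = line[1:]
    if line.endswith("|"):
        line = line[:-1]
    return [cell.strip() for cell in line.split("|")]


def parse_markdown_table(text: str) -> list[dict[str, str]]:
    """Parse the first pipe table in `text` into a list of row dicts."""
    lines = text.splitlines()
    for i in range(len(lines) - 1):
        if "|" in lines[i] and SEPARATOR.match(lines[i + 1]):
            header = split_row(lines[i])
            rows = []
            for line in lines[i + 2:]:
                if "|" not in line:
                    break  # a blank or prose line ends the table
                cells = split_row(line)
                # Pad short rows so every record has every header key;
                # cells beyond the header are silently dropped by zip().
                cells += [""] * (len(header) - len(cells))
                rows.append(dict(zip(header, cells)))
            return rows
    return []


def dump_table(md_path: Path, out_dir: Path = Path("_data")) -> Path:
    """Write the first table of one Markdown file as JSON into `out_dir`."""
    rows = parse_markdown_table(md_path.read_text(encoding="utf-8"))
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / (md_path.stem + ".json")
    out_path.write_text(json.dumps(rows, indent=2), encoding="utf-8")
    return out_path
```

A scheduled job (for example a cron entry or a CI workflow) could call `dump_table` on each task page, while the git history of the Markdown files stays available for trend spotting.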
@NirantK
It is nowhere near that simple. Turning Markdown tables into YAML files required a lot of manual labour on my part (even with some automation): various formats, some formatting mistakes, etc.
Also, for converting tables to YAML I wrote this script:
https://gist.github.com/stared/ec29b1e8d3c99a6288dcc20d77affc93
It requires some manual inspection, as:
- there is some inconsistency in table formats
- there is some misformatting (e.g. no closing `|`)
- I manually check whether to use `&Author2018` and `<<: *Author2018` mappings
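For context, `&Author2018` attaches an anchor to a mapping, and `<<: *Author2018` is the YAML merge key, which copies that mapping's keys into another one; keys written explicitly in the receiving mapping take precedence. Loaders such as PyYAML apply this automatically, so the following Python sketch only illustrates the semantics. It assumes the YAML has already been loaded into plain dicts by a loader that leaves `<<` as an ordinary key; the function name `apply_merge_key` and the sample values are hypothetical.

```python
from typing import Any


def apply_merge_key(mapping: dict[str, Any]) -> dict[str, Any]:
    """Resolve a YAML-style '<<' merge key on an already-loaded mapping.

    Follows the YAML 1.1 merge-key rules: keys written explicitly in the
    mapping win over merged ones, and when '<<' holds a list of mappings,
    earlier mappings in the list win over later ones.
    """
    if "<<" not in mapping:
        return dict(mapping)
    sources = mapping["<<"]
    if isinstance(sources, dict):
        sources = [sources]
    merged: dict[str, Any] = {}
    # Apply later sources first so that earlier ones overwrite them.
    for source in reversed(sources):
        merged.update(source)
    # Explicit keys in the mapping itself override everything merged.
    merged.update({k: v for k, v in mapping.items() if k != "<<"})
    return merged
```

This is what makes copying entries cheap: a shared entry is written once under an anchor, and each reuse only states the fields that differ.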
Thanks for sharing that script, @stared! Some neat hacks there.
I am hoping that if we enforced a Markdown table linter of some sort, this would be slightly less tedious to do. I definitely don't claim that it is simple.
To focus on the issue at hand, I am simply asking whether the gain from visualizations is worth the loss in ease of access for readers (and contributors).
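A table linter of that kind could start very small. The following is a minimal Python sketch, assuming only two checks matter at first: every row has outer pipes, and every row in a table has the same number of cells as the first row. The function name `lint_table` is hypothetical, and this is not a full GitHub Flavored Markdown table parser (any line containing `|` is treated as a table row).

```python
def lint_table(text: str) -> list[str]:
    """Report pipe-table rows with missing outer pipes or a wrong cell count.

    Any line containing '|' is treated as a table row; the first row of
    each table (usually the header) fixes the expected number of cells.
    """
    problems: list[str] = []
    expected = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if "|" not in line:
            expected = None  # a non-table line ends the current table
            continue
        if not line.startswith("|"):
            problems.append(f"line {lineno}: missing leading '|'")
        if not line.endswith("|"):
            problems.append(f"line {lineno}: missing closing '|'")
        # Drop at most one outer pipe on each side, then count cells.
        inner = line[1:] if line.startswith("|") else line
        inner = inner[:-1] if inner.endswith("|") else inner
        cells = inner.split("|")
        if expected is None:
            expected = len(cells)
        elif len(cells) != expected:
            problems.append(
                f"line {lineno}: {len(cells)} cells, expected {expected}"
            )
    return problems
```

Run over each task page in CI, a non-empty result could fail the check before a malformed table is merged.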
Yep, a table linter or better enforcement of style guidelines is something we'd definitely want to do.
So far, I haven't really seen any visualizations that added much value beyond what the tables provide. The progress visualizations at AI metrics are nice, but I don't think they're that helpful if a task doesn't have a clear metric of human performance.
@stared, do you have any thoughts regarding a "killer visualization" that would clearly warrant using YAML files?
Hey @stared - just following up :)
OK, I know it is a matter of taste. Personally, I find YAML files easier to edit than Markdown tables, and less error-prone (and certainly simpler than Markdown tables plus an enforced linter). I admit that others can have different opinions, depending on their background.
As for killer features:
- visualization (all Markdown scraping will be clunky)
- the possibility to add OTHER data (e.g. comments, or other fields when they become necessary)
- the possibility of copying entries (before, there was redundancy, and there were errors)
For contributions, I think the tricky part is telling contributors where the … is (this can be done easily by adding an automatic link, [edit entry in filename]).
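Such a link could be generated from the data file's path, since GitHub opens its web editor at URLs of the form `https://github.com/OWNER/REPO/edit/BRANCH/PATH`. The following is a minimal Python sketch; the function name `edit_link`, the placeholder repository `OWNER/REPO`, the default branch `master` and the example path are all assumptions to be replaced with the real values.

```python
from urllib.parse import quote


def edit_link(path: str, repo: str = "OWNER/REPO", branch: str = "master") -> str:
    """Build a Markdown '[edit entry in <file>]' link to GitHub's web editor.

    `repo` and `branch` are placeholders; `path` is relative to the repo root.
    """
    # quote() keeps '/' as-is and percent-encodes spaces and similar characters.
    url = f"https://github.com/{repo}/edit/{branch}/{quote(path)}"
    return f"[edit entry in {path}]({url})"
```

Rendering this next to each table would send contributors straight to the file they need to change.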
For previewing changes: by pushing to one's own repo, one can see them online.
When it comes to visualizations: true, for many areas (especially those with only four entries or so) they do not provide that much additional information.
While I really like the idea of separating the presentation from the data and storing the data in a dedicated format, the benefits at this point to me seem to be overshadowed by the additional burden placed on the contributor (who might not have used YAML before) and on the reader (who won't be able to view the tables on GitHub).
As the objective at this point should be to get more data (for more tasks and languages) into this repo, to me these two disadvantages outweigh the potential upsides of using YAML.
@sebastianruder should I go ahead and refactor the Hindi and Korean pages to use Markdown?
Yes, let's do that. Thanks!