Comments (10)
Hey @sebastianruder @Hrant-Khachatrian,
We maintain datasets and tools for several languages, e.g. Korean and Arabic, at awesome-nlp.
To avoid duplication of effort, we can either:
- Add the state-of-the-art results there (we have a research section, which we'll remove shortly and point here instead)
- Or we can move the entire non-English content to this repository and reorganize as you would like
- Other options?
Let's avoid duplication either way. What do you think?
from nlp-progress.
Hey, I totally agree. The most important thing for me is not to duplicate effort. I've been following awesome-nlp and really like the collection of tools, particularly across languages.
For me personally, knowing about a dataset without links to papers or results on that dataset, however, hasn't been that useful.
For that reason, what would make the most sense for me would be to add non-English datasets and results to this repo. I'd love to add you as collaborators/maintainers to this repo if that doesn't seem like too much additional work.
from nlp-progress.
Yes! Including other languages is definitely on the road map!
I'm not sure what the best way to include them is at the moment. We could break out the results first per task and then per language. Breaking everything out per language first might make things hard to find. Maybe with some better visualization (which we're working on at the moment), including other languages would be easier. Do you have any suggestions?
from nlp-progress.
Sure, happy to contribute.
We don't maintain results/papers for the libraries and datasets at awesome-nlp yet.
I will start by adding different language datasets, and we can add results as and when we find them. Does that sound good?
How do we handle libraries or tools?
In parallel, I'll add a link to nlpprogress.com to every language that is migrated here.
from nlp-progress.
Guys, we have results and datasets for Polish NLP, covering NER and language modeling. Where should we post them? How about adding a header to each English page, since English is the most advanced language, with links to the other languages?
I will propose a change in a second.
from nlp-progress.
Let's discuss the best format for adding other languages here. @PiotrCzapla summarized this well in #105:
Either we have a file for each task linking to the task in each language:
- language_modeling.md
  - ar_language_modeling.md
  - pl_language_modeling.md
Or we'd have a file for each language linking to each task in the language:
- ar.md
  - language modeling
  - sentiment analysis
I think this mainly depends on how people prefer to look up tasks and results. Personally, I think the preferred approach is to look for one's own language and then at the tasks in that language. As long as we don't have too many tasks and languages yet, we could thus simply extend the main README.md file with additional languages and the corresponding tasks, e.g.
- Arabic
  - Language modeling
  - Sentiment analysis
  - etc.
- English
  - Language modeling
  - Sentiment analysis
  - etc.
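As a rough sketch only, a language-first list like this could be generated from the task file names instead of being maintained by hand. The `<language code>_<task>.md` naming follows the per-task option above; the `LANGUAGES` mapping and the `build_index` helper are hypothetical names used for illustration, and files without a known language prefix are assumed to be English.

```python
from collections import defaultdict

# Hypothetical mapping from file-name prefixes to language names.
LANGUAGES = {"ar": "Arabic", "pl": "Polish"}


def build_index(filenames):
    """Group task files by language and render a nested Markdown list."""
    tasks_by_language = defaultdict(list)
    for name in filenames:
        # Drop the ".md" extension if present.
        stem = name[:-3] if name.endswith(".md") else name
        prefix, _, rest = stem.partition("_")
        if prefix in LANGUAGES and rest:
            language, task = LANGUAGES[prefix], rest
        else:
            # Assumption: files without a known prefix are English.
            language, task = "English", stem
        tasks_by_language[language].append(task.replace("_", " ").capitalize())

    lines = []
    # Languages and tasks are listed alphabetically.
    for language in sorted(tasks_by_language):
        lines.append(f"- {language}")
        for task in sorted(tasks_by_language[language]):
            lines.append(f"  - {task}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_index([
        "language_modeling.md",
        "sentiment_analysis.md",
        "ar_language_modeling.md",
        "ar_sentiment_analysis.md",
    ]))
```

With a helper along these lines, either file layout could back the same README view, so the choice of layout would mostly affect how contributors add files.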
What do you think?
from nlp-progress.
This sounds good to me!
Since most libraries cover POS tagging, tokenization, etc. in several languages, I'll start there over the coming weekend.
from nlp-progress.
Yep, good point. Let's keep English on the top.
@NirantK, given your great work on awesome-nlp and other NLP projects, would you like me to add you as a collaborator to the repo?
from nlp-progress.
Sure, happy to help.
Even so, I'll start by raising a few PRs so that we can work out any details that were accidentally left out.
from nlp-progress.