Comments (11)
Hi!
The tests-tatcorpus directory contains just a list of word forms that I collected from the Corpus of Written Tatar. They are not taken from any dictionary.
This list can be used to:
- see the effect of code changes;
- collect words unknown to the analyser and add them to the .lexc file.
I'd like to keep it there so Ilnar could also use it. If you think it is better to remove it from the repository, I will do so immediately.
Thank you!
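For illustration, here is a rough Python sketch of both uses of the list. It assumes the analyser prints the usual Apertium stream format, where each lexical unit looks like `^surface/analysis$` and an unknown form is marked with a leading `*`. The function names and the tags in the sample strings are made up for the example, and escaped characters in the stream are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (simplified: escaped "/" or "$" inside a unit is not handled)
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def parse_analyses(analysed_text):
    """Map each surface form to its raw analysis string."""
    return {surface: analyses
            for surface, analyses in LEXICAL_UNIT.findall(analysed_text)}


def unknown_words(analysed_text):
    """Return surface forms the analyser did not recognise (marked with '*')."""
    return [surface
            for surface, analyses in LEXICAL_UNIT.findall(analysed_text)
            if analyses.startswith("*")]


def changed_analyses(before, after):
    """Return forms whose analysis differs between two analyser runs."""
    old, new = parse_analyses(before), parse_analyses(after)
    return {word: (old[word], new[word])
            for word in old
            if word in new and old[word] != new[word]}


if __name__ == "__main__":
    # Hypothetical analyser output; the tags are illustrative only.
    sample = "^китап/китап<n><nom>$ ^бульдогка/*бульдогка$"
    print(unknown_words(sample))
```

The unknown forms are candidates for new .lexc entries, and comparing the output from before and after a change shows which analyses the change affected.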
from apertium-tat.
@IlnarSelimcan, @ftyers, what do you two think? I think using something like this for regression testing is good, but I still have the licensing concern (maybe less than originally) and the size concern.
The size can be halved, because not all of those files are necessary: one of them is just a backup, and another can be generated.
This repo is already 360 MiB in size. It's not enough that you delete a file - it's still part of the cloned data. Anything you add is part of the repo's history forever. Those big files should be removed and purged from history with a rewrite.
Of the 145 repos I track, it's in the top 15 size-wise.
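To see which files weigh the history down, one option is to list the largest blobs reachable from any ref. The Python sketch below wraps the standard `git rev-list --objects --all` and `git cat-file --batch-check` commands. The function names are made up for the example, and this only finds the candidates; it does not rewrite anything.

```python
import subprocess


def parse_blob_sizes(lines):
    """Parse '<type> <sha> <size> <path>' lines, keeping only blobs."""
    blobs = []
    for line in lines:
        parts = line.rstrip("\n").split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue
        path = parts[3] if len(parts) > 3 else ""
        blobs.append((int(parts[2]), parts[1], path))
    return blobs


def largest_blobs(lines, top=10):
    """Return the `top` largest blobs as (size, sha, path), biggest first."""
    return sorted(parse_blob_sizes(lines), reverse=True)[:top]


def history_blob_lines(repo="."):
    """List every object in the repo's history with its type, size and path."""
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True,
        encoding="utf-8", errors="replace",
    ).stdout
    described = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, check=True, capture_output=True,
        encoding="utf-8", errors="replace",
    ).stdout
    return described.splitlines()


if __name__ == "__main__":
    for size, sha, path in largest_blobs(history_blob_lines(".")):
        print(f"{size / 2**20:8.1f} MiB  {sha[:10]}  {path}")
```

A file that was deleted in a later commit still shows up here, which is why a plain delete does not shrink the clone.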
OK, I understand. I will remove those files right now; please help me purge them from history.
I removed the files, but I don't know how to purge them from the repo's history. @TinoDidriksen could you please help me with that?
@mansayk, which files are you planning on keeping / didn't remove?
Repository trimmed - now down to 54 MiB, which is manageable. Everyone will have to re-clone from scratch. I've taken a backup of the repo before doing the trim, just in case.
@TinoDidriksen thank you so much for your help!
@jonorthwash I will keep those test files locally and use them periodically. If I find any regressions, I will create an issue (or issues) and add some new rules to the existing tests, OK? If you have a better idea, please let me know. Thank you.
I think I've found a better solution for this in 6dbcb19. It seems to work, but improvements are welcome.
One particular thing that should be done is to split the frequency list into several parts and pass them through tat-morph in parallel (using GNU Parallel or a similar tool).
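As a possible starting point, here is a minimal Python sketch of the split-and-run-in-parallel idea, using a thread pool instead of GNU Parallel. The default command `apertium -d . tat-morph` is an assumption about how the analyser is invoked from the repository root, and the function names are made up for the example; any command that reads word forms on stdin and writes analyses to stdout can be substituted.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Assumption: the analyser is run like this from the repository root.
ANALYSER_COMMAND = ["apertium", "-d", ".", "tat-morph"]


def split_chunks(words, n_chunks):
    """Split the word list into at most n_chunks contiguous parts."""
    size = max(1, -(-len(words) // n_chunks))  # ceiling division
    return [words[i:i + size] for i in range(0, len(words), size)]


def run_analyser(chunk, command):
    """Feed one chunk (one word form per line) to the analyser process."""
    result = subprocess.run(
        command, input="\n".join(chunk) + "\n",
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def analyse_in_parallel(words, command=ANALYSER_COMMAND, n_chunks=4):
    """Analyse all chunks concurrently and join the output in input order."""
    chunks = split_chunks(words, n_chunks)
    # Threads are enough here: each one only waits on an external process.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        outputs = pool.map(run_analyser, chunks, [command] * len(chunks))
        return "".join(outputs)


if __name__ == "__main__":
    # Read one word form per line from stdin and print the analyses.
    word_forms = [line.strip() for line in sys.stdin if line.strip()]
    sys.stdout.write(analyse_in_parallel(word_forms))
```

Because `pool.map` returns results in the order of its inputs, the combined output lines up with the original frequency list, which keeps before/after comparisons simple.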