Comments (11)

mansayk commented on July 21, 2024

Hi!

The tests-tatcorpus directory contains just a list of word forms, which I collected from the Corpus of Written Tatar. They are not taken from any dictionary.

This list can be used to:

  1. see the effect of code changes;
  2. collect words unknown to the analyser and add them to the .lexc file.
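Editorial sketch of the second use, not taken from the thread: assuming the analyser writes the usual Apertium stream format, in which an unrecognised form comes back as `^form/*form$`, the unknown forms can be filtered out of its output as below. The function name `unknown_forms` and the sample string are made up for illustration, and escaped characters in the stream are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (assumption: no escaped '/', '^' or '$' inside the surface form)
LEXICAL_UNIT = re.compile(r"\^([^/$^]+)/([^$]*)\$")


def unknown_forms(analysed_text):
    """Return the surface forms whose analysis starts with '*' (unknown to the analyser)."""
    unknown = []
    for surface, analyses in LEXICAL_UNIT.findall(analysed_text):
        if analyses.startswith("*"):
            unknown.append(surface)
    return unknown


# Made-up analyser output: one known form, one unknown form.
sample = "^китап/китап<n><nom>$ ^фывапр/*фывапр$"

if __name__ == "__main__":
    print(unknown_forms(sample))
```

The resulting list is what would then be reviewed by hand before anything is added to the .lexc file.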

I'd like to keep it there so that Ilnar can use it too. If you think it is better to remove it from the repository, I will do it immediately.

Thank you!

from apertium-tat.

jonorthwash commented on July 21, 2024

@IlnarSelimcan, @ftyers, what do you two think? I think using something like this for regression testing is good, but I still have the licensing concern (maybe less than originally) and the size concern.


mansayk commented on July 21, 2024

The size can be halved, because not all of those files are necessary: one of them is just a backup, and another can be generated.


TinoDidriksen commented on July 21, 2024

This repo is already 360 MiB in size. It's not enough that you delete a file - it's still part of the cloned data. Anything you add is part of the repo's history forever. Those big files should be removed and purged from history with a rewrite.

Of the 145 repos I track, it's in the top 15 size-wise.
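Editorial sketch, not from the thread: before rewriting history it helps to know which blobs are the large ones. Assuming `git` is on the PATH, the two standard commands `git rev-list --objects --all` and `git cat-file --batch-check` can be combined to list every blob ever committed with its size. The function names and the `top` parameter are hypothetical, and this is not necessarily how the repository was actually inspected.

```python
import subprocess

# Output format for `git cat-file --batch-check`; %(rest) echoes the path
# that `git rev-list --objects` printed after each object name.
BATCH_FORMAT = "%(objecttype) %(objectname) %(objectsize) %(rest)"


def parse_batch_check(output):
    """Parse batch-check lines into (size_in_bytes, path) pairs, keeping blobs only."""
    blobs = []
    for line in output.splitlines():
        parts = line.split(" ", 3)  # the path may itself contain spaces
        if len(parts) == 4 and parts[0] == "blob":
            blobs.append((int(parts[2]), parts[3]))
    return blobs


def largest_blobs(repo_dir, top=10):
    """Return the `top` largest blobs reachable from any ref, deleted files included."""
    objects = subprocess.run(
        ["git", "rev-list", "--objects", "--all"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    sizes = subprocess.run(
        ["git", "cat-file", "--batch-check=" + BATCH_FORMAT],
        cwd=repo_dir, input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return sorted(parse_batch_check(sizes), reverse=True)[:top]
```

This only reports sizes. The purge itself needs a history-rewriting tool and a forced push, which is why everyone has to re-clone afterwards.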


mansayk commented on July 21, 2024

OK, I understand. I will remove those files right now; please help me purge them from the history.


mansayk commented on July 21, 2024

I removed the files, but I don't know how to purge them from the repo's history. @TinoDidriksen, could you please help me with that?


jonorthwash commented on July 21, 2024

@mansayk, which files are you planning to keep, i.e. which ones didn't you remove?


TinoDidriksen commented on July 21, 2024

Repository trimmed - now down to 54 MiB, which is manageable. Everyone will have to re-clone from scratch. I've taken a backup of the repo before doing the trim, just in case.


mansayk commented on July 21, 2024

@TinoDidriksen, thank you so much for your help!

@jonorthwash, I will keep those test files locally and use them periodically. If I find any regressions, I will create issues and add new cases to the existing tests, OK? If you have a better idea, please let me know. Thank you.


IlnarSelimcan commented on July 21, 2024

I think I've found a better solution for this in 6dbcb19. It seems to work, but improvements are welcome.


IlnarSelimcan commented on July 21, 2024

One particular thing that should be done is to split the frequency list into many smaller lists and pass them through tat-morph in parallel (using GNU Parallel or something similar).
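Editorial sketch of that idea, written in Python rather than with GNU Parallel: the list is cut into contiguous chunks, each chunk is piped through its own analyser process, and the outputs are joined in the original order. The command `apertium -d . tat-morph` is an assumption about how the tat-morph mode is invoked from the repository root, and every function name here is hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, n_chunks):
    """Split lines into at most n_chunks contiguous, near-equal parts, preserving order."""
    size, extra = divmod(len(lines), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when there are fewer lines than chunks
            chunks.append(lines[start:end])
        start = end
    return chunks


def run_analyser(chunk, command=("apertium", "-d", ".", "tat-morph")):
    """Pipe one chunk of word forms through the analyser (assumed command) and return its output."""
    result = subprocess.run(
        list(command), input="\n".join(chunk) + "\n",
        capture_output=True, encoding="utf-8", check=True,
    )
    return result.stdout


def analyse_in_parallel(lines, n_chunks=4, worker=run_analyser):
    """Analyse the chunks concurrently and concatenate the outputs in input order."""
    chunks = split_into_chunks(lines, n_chunks)
    # Threads suffice here: the heavy work happens in the child processes.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        return "".join(pool.map(worker, chunks))
```

Passing a different `worker` makes it possible to try the splitting and joining logic without an Apertium installation.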
