
irish-word-frequency's People

Contributors

michmech


irish-word-frequency's Issues

bigrams?

Hi,
I'm interested in the script / methodology used to construct this list.

Specifically, 'coinne' comes up quite high in the frequency list, but I imagine that's because of its use in phrases such as 'i gcoinne' (against), 'gan choinne' (unexpectedly) and 'os coinne' (in front of/opposite).

From a language-learning point of view, I'd like to learn these short phrases separately, so my idea is to allow bigrams alongside high-frequency words. For example, given the corpus frequency of 8507 for 'coinne', suppose the three phrases above had frequencies of (say) 4000, 3000 and 1000. Each phrase would then appear in the top-6,500 list in its own right, and plain 'coinne' would be left with a score of 8507 - 8000 = 507 after subtracting the bigram frequencies, which would bump it off the list.
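The arithmetic in this proposal can be sketched in a few lines of Python. This is only an illustration of the idea, not the method used to build the list: the function names `count_bigrams` and `split_out_phrases` are hypothetical, the phrase frequencies are the made-up figures from the paragraph above, and the mapping from each phrase to the lemma it contains is assumed to be supplied by hand.

```python
from collections import Counter


def count_bigrams(tokens):
    """Count adjacent token pairs in a list of tokens."""
    return Counter(zip(tokens, tokens[1:]))


def split_out_phrases(lemma_counts, phrases):
    """Give each phrase its own entry and subtract its count from its lemma.

    `phrases` maps a phrase to a (lemma, phrase_count) pair. The lemma is
    supplied by hand because mutated forms such as 'gcoinne' and 'choinne'
    do not match the lemma 'coinne' as plain strings.
    """
    merged = Counter(lemma_counts)
    for phrase, (lemma, count) in phrases.items():
        merged[phrase] = count
        merged[lemma] -= count
    return merged


# Toy token list, only to show the shape of the bigram counts.
toy_bigrams = count_bigrams(["i", "gcoinne", "an", "i", "gcoinne"])

# 8507 is the corpus frequency quoted above; the phrase counts are
# the hypothetical figures from the proposal, not measured values.
lemma_counts = {"coinne": 8507}
phrases = {
    "i gcoinne": ("coinne", 4000),
    "gan choinne": ("coinne", 3000),
    "os coinne": ("coinne", 1000),
}
merged = split_out_phrases(lemma_counts, phrases)

for entry, count in merged.most_common():
    print(entry, count)
```

With real data, the phrase counts would come from something like `count_bigrams` run over the tokenized source texts, and choosing which bigrams deserve their own entry would still need a threshold or a hand-picked list.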

Is the source code for how this list was created available?

With thanks!

téarmach

You should not have replied to the other issues; you are just encouraging me :D

I came across this one this week.

téarmach (a1, 'terminal') is listed on https://www.teanglann.ie/ but there is nothing for it on https://www.focloir.ie/

A search in the corpus at http://corpas.focloir.ie/ turns up only téarma and ghearrtéarmach.

My guess is that téarmaí (terms) has been incorrectly lemmatized to the adjective téarmach rather than the noun téarma.

proper name removal?

I was wondering why 'dobhar' appears so high up in the list. After puzzling over the dictionary entries on focloir and teanglann, I remembered that Gaoth Dobhair is likely to be a common Gaeltacht placename in the source texts. I just want to mention it as an issue in case others use this repository, and to ask whether proper names were correctly identified. (I know Gaillimh is in the list and kept capitalized, which is fine.)
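One rough way to check for cases like this, assuming access to tokenized sentences from the source texts, is to see how often a word form is written with a capital letter when it is not the first word of a sentence. The sketch below is a heuristic only, not how the list was produced; `capitalised_ratio` is a hypothetical helper and the two sentences are toy examples.

```python
from collections import Counter


def capitalised_ratio(sentences):
    """Share of non-sentence-initial occurrences of each form that contain
    an uppercase letter.

    Checking for any uppercase letter (rather than only the first character)
    also catches mutated forms such as 'nGaoth', where a lowercase prefix
    precedes the capital.
    """
    capitalised = Counter()
    total = Counter()
    for sentence in sentences:
        # Skip the first token: it is capitalised regardless of word type.
        for token in sentence[1:]:
            key = token.lower()
            total[key] += 1
            if any(ch.isupper() for ch in token):
                capitalised[key] += 1
    return {key: capitalised[key] / total[key] for key in total}


# Toy input; real input would be the tokenized source texts.
sentences = [
    ["Tá", "sé", "ina", "chónaí", "i", "nGaoth", "Dobhair"],
    ["Chuaigh", "siad", "go", "Gaoth", "Dobhair"],
]
ratios = capitalised_ratio(sentences)

# Forms with a ratio close to 1.0 are candidates for proper names.
likely_names = sorted(form for form, r in ratios.items() if r > 0.9)
print(likely_names)
```

Forms flagged this way could be excluded before lemmatization, or at least reviewed by hand, so that a placename element like Dobhair does not inflate the count for the common noun.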

`cál` too high up?

Sorry, I just wanted to register a further issue, although I know this is an old repository.
I'm wondering why cál is so high up the list, as 'kale/cabbage' doesn't seem to merit such a high position.

Anyhow, it's probably time I dived into creating a similar word frequency list myself from the source texts, as then I'll be able to investigate myself!
