Comments (12)

eternal-sorrow commented on June 28, 2024

Another example is ようこそ.

from kakugo.

eternal-sorrow commented on June 28, 2024

Also 方 (かた) in the meaning "person".

blastrock commented on June 28, 2024

I used the words from JMdict for kakugo. The issue is that the dictionary is HUGE, so I had to filter it. I chose to take only words that were tagged "ichi1":

ichi1/2: appears in the "Ichimango goi bunruishuu", Senmon Kyouiku Publishing, Tokyo, 1998. (The entries marked "ichi2" were demoted from ichi1 because they were observed to have low frequencies in the WWW and newspapers.)

(source)

And even just that gives a lot of entries. It's an arbitrary choice, but it's hard to find a criterion that keeps only words useful for a learner...
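As a rough illustration of this kind of filter (not the script actually used for kakugo), here is a minimal Python sketch that keeps only entries carrying a wanted priority tag. The `ke_pri` and `re_pri` elements are where JMdict stores these tags; the inline `SAMPLE` document, its `ent_seq` values, and the helper names are made up for the example, and a real JMdict file would need extra handling for its DTD entities.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a JMdict file (hypothetical ent_seq values).
SAMPLE = """<JMdict>
<entry>
<ent_seq>1</ent_seq>
<r_ele><reb>ようこそ</reb><re_pri>ichi1</re_pri></r_ele>
<sense><gloss>welcome!</gloss></sense>
</entry>
<entry>
<ent_seq>2</ent_seq>
<k_ele><keb>要る</keb><ke_pri>news2</ke_pri><ke_pri>nf27</ke_pri><ke_pri>spec1</ke_pri></k_ele>
<r_ele><reb>いる</reb></r_ele>
<sense><gloss>to need</gloss></sense>
</entry>
</JMdict>"""


def priority_tags(entry):
    """Collect every priority tag attached to the kanji or reading elements."""
    return {el.text for el in entry.iter() if el.tag in ("ke_pri", "re_pri")}


def headword(entry):
    """Prefer the kanji form; fall back to the reading for kana-only words."""
    keb = entry.find("k_ele/keb")
    if keb is not None:
        return keb.text
    return entry.find("r_ele/reb").text


def filter_entries(xml_text, wanted):
    """Return headwords of entries that carry at least one wanted tag."""
    root = ET.fromstring(xml_text)
    return [headword(e) for e in root.iter("entry") if priority_tags(e) & wanted]


print(filter_entries(SAMPLE, {"ichi1"}))
```

With an "ichi1"-only filter, the second sample entry is dropped because it carries only news2, nf27 and spec1.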

eternal-sorrow commented on June 28, 2024

要る is also missing. I think there is something wrong with your method of filtering the dictionary.

plhosk commented on June 28, 2024

Out of curiosity, I checked the given words in the latest English version of JMdict that I could find at the source URL above.

ようこそ is listed as "ichi1" so not sure what happened there. Maybe the tags were different in the older JMDict.

The tags for 彼氏 are news2, nf36, spec2, so it doesn't seem to be that popular. nf36 indicates it's in the 36th block of 500 words in the frequency ranking, i.e. within the top 18,000 words.

方 is ichi1, but the translations are "direction, way".

Tags for 要る are news2, nf27, spec1 so it makes sense why it's missing.

A case could be made for adding words that have spec1 tags and those with nf01 to nf10+ if they're not in the existing word list.

spec1 and spec2: a small number of words use this marker when they are detected as being common, but are not included in other lists.
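The suggestion above can be sketched as a small predicate over an entry's tag set. This is only an illustration under stated assumptions: the function names and the `max_nf` cutoff parameter are hypothetical, and the 500-words-per-bucket figure follows the JMdict description of the nfXX tags.

```python
import re

# nfXX tags: "nf" followed by a two-digit bucket number.
NF_PATTERN = re.compile(r"nf(\d{2})")


def nf_bucket(tags):
    """Return the nfXX bucket number found in the tags, or None if absent."""
    for tag in tags:
        match = NF_PATTERN.fullmatch(tag)
        if match:
            return int(match.group(1))
    return None


def rank_upper_bound(bucket):
    """Each nf bucket covers 500 words, so bucket N means rank <= N * 500."""
    return bucket * 500


def should_add(tags, max_nf=10):
    """Hypothetical rule: accept spec1 words, or words in buckets nf01..max_nf."""
    if "spec1" in tags:
        return True
    bucket = nf_bucket(tags)
    return bucket is not None and bucket <= max_nf


# Tag sets quoted in this thread.
print(should_add({"news2", "nf27", "spec1"}))  # 要る
print(should_add({"news2", "nf36", "spec2"}))  # 彼氏
```

Raising `max_nf` widens the net; the right cutoff is a judgment call, which is the same trade-off between coverage and list size discussed above.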

ZiLot34 commented on June 28, 2024

Hello, I'm posting here because this ties into the issue of "which data to use".

The kanji 和 is listed in N3, but it should actually be in N1 according to jisho: https://jisho.org/search/%E5%92%8C
The list used as input (https://www.tanos.co.uk/jlpt/) has this mistake.
I have no idea whether there are more cases like this, since I don't systematically check the JLPT level on jisho, but I'll try to be more attentive and will report any I find.

cshapeshifter commented on June 28, 2024

@blastrock: Is the code you used to generate the dictionary also open source? I poked around this repo and your other repos, but I couldn't find it. I would really like to adapt it in a fork to generate a new dictionary that includes a lot more vocab. I know you would like to reduce the size because there are many "useless" words, but there are also many words that I'm missing. I see the dictionary is a gzipped sqlite db, but I'm hoping that I don't have to write my own scripts to add more vocab. I would just like to modify the filter you used.

I started out learning kanji and vocab just using Kakugo and I think it's by far the best app out there (thanks so much!). But now I'm attending Japanese classes and I realize that the book they use (いろどり, which is free) requires me to learn a lot of vocab that is missing. A couple of examples just from the current chapter:

  • 再起動 (to restart)
  • 変倍 (different size)
  • 差出人 (sender)

If whatever scripts or code you used to generate the dictionary are open source, I can adapt them and make my own fork. I'd be happy to make pull requests for any additional features I might also work on for myself (for example, I might add a different heuristic for auto-selecting vocab based on kanji, as the existing one selects thousands of vocab words once you know a few hundred kanji).

blastrock commented on June 28, 2024

The script to generate the dictionary is not open source because it is quite ugly. I don't mind sharing it with a few people though. I'll push it to a private repo and add you to it if you want.

cshapeshifter commented on June 28, 2024

@blastrock, it would really be great if you could grant me access to that script. Thank you!

blastrock commented on June 28, 2024

Done. The repo is in a poor state, so don't hesitate to email me if you have any questions.

cshapeshifter commented on June 28, 2024

Thank you, I'll report back!

blastrock commented on June 28, 2024

I recently worked on this. In the latest release, all ichi1 and news1 words are included. For each word, I included multiple translations (kind of like the kanji test). Also, it is now possible to show words that are usually written in kana in their kanji form, like 下さい and many others. This doesn't completely solve this issue, but I think it greatly improves things.
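For readers who want to experiment with a similar selection, here is a minimal sketch of keeping ichi1/news1 entries and gathering several translations per word. It is not the actual kakugo generation script; the sample entry is hand-written and simplified rather than copied from JMdict, and the function names and the `limit` parameter are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written entry in JMdict's element layout (illustrative only).
SAMPLE = """<JMdict>
<entry>
<k_ele><keb>方</keb><ke_pri>ichi1</ke_pri></k_ele>
<r_ele><reb>かた</reb></r_ele>
<sense><gloss>direction</gloss><gloss>way</gloss></sense>
<sense><gloss>person</gloss></sense>
</entry>
</JMdict>"""

WANTED = {"ichi1", "news1"}


def is_selected(entry):
    """True if any kanji or reading element carries a wanted priority tag."""
    tags = {el.text for el in entry.iter() if el.tag in ("ke_pri", "re_pri")}
    return bool(tags & WANTED)


def collect_glosses(entry, limit=3):
    """Gather up to `limit` glosses across all senses, in document order."""
    glosses = [g.text for g in entry.iter("gloss")]
    return glosses[:limit]


root = ET.fromstring(SAMPLE)
for entry in root.iter("entry"):
    if is_selected(entry):
        print(collect_glosses(entry))
```

Collecting glosses across senses, rather than only from the first one, is what lets a meaning such as "person" for 方 show up alongside "direction, way".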
