
Comments (12)

skishore commented on June 17, 2024

Yes, all of the "matches" decompositions are written in CDL! I started with the Unihan CDL codes but curated them by hand, though, so I think they're more accurate than other sources.

from makemeahanzi.

parsimonhi commented on June 17, 2024

The folder for ROC will be named svgsZhTw. A ZhHant character is not always the same as a zhTw character. Too simple otherwise, don't you think :-)


parsimonhi commented on June 17, 2024

Thanks for the link: it looks very interesting.


skishore commented on June 17, 2024

Thanks for the report. Yes, all the stroke orders that I've computed here are based on PRC stroke order. When I started this project I was unaware of these differences, although I learned some about them through the work.

Do you know of a reference that has a comprehensive list of differences between the various stroke orders? That Wikipedia page only gives the 問 example. I know of the grass radical as well, but I wasn't able to find a complete list by browsing around.

I added a Future Work section to the README with a list of potential improvements. Is this data that you would use if it were available?


DanielChu commented on June 17, 2024

I don't know of a complete list of the differences. I think the best way forward might just be to adjust the graphics.txt format so that, in the future, people can submit alternate stroke orders for the same character, marking which type of order it is (PRC, HK, TW, JP). I think it could definitely be useful, since Hong Kong and Taiwan still teach the traditional stroke orders.
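A minimal sketch of what such a tagged format could look like, assuming graphics.txt keeps its one-JSON-object-per-line layout. The "order" field, the tag set, and the helper names are hypothetical and not part of the current format; the stroke values are placeholders standing in for SVG path strings.

```python
import json

# Hypothetical tags naming the convention a record follows.
KNOWN_ORDERS = {"PRC", "HK", "TW", "JP"}

def parse_records(lines, default_order="PRC"):
    """Group JSON-lines records by character, then by stroke-order tag."""
    by_char = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # "order" is an assumed field; untagged records count as the default.
        order = record.get("order", default_order)
        if order not in KNOWN_ORDERS:
            raise ValueError(f"unknown stroke order tag: {order!r}")
        by_char.setdefault(record["character"], {})[order] = record
    return by_char

def lookup(by_char, character, order, fallback="PRC"):
    """Return the record for the requested order, else the fallback order."""
    variants = by_char.get(character, {})
    return variants.get(order) or variants.get(fallback)

# Placeholder stroke names; the JP record swaps the second and third strokes.
sample = [
    '{"character": "王", "strokes": ["s1", "s2", "s3", "s4"]}',
    '{"character": "王", "order": "JP", "strokes": ["s1", "s3", "s2", "s4"]}',
]
records = parse_records(sample)
print(lookup(records, "王", "JP")["strokes"])
print(lookup(records, "王", "TW")["strokes"])  # no TW record, falls back to PRC
```

Treating untagged records as PRC would keep existing data valid without rewriting it.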


skishore commented on June 17, 2024

You may want to check out @parsimonhi's work on Japanese stroke order data. They used this dataset as a starting point but did a bunch of extra work to add Japanese characters that don't have corresponding hanzi and to deal with stroke order differences: https://github.com/parsimonhi/animCJK

I don't have the time to do similar work for all the various stroke orders myself, but I hope that others can fill in those gaps. In particular, if I can provide rigorously curated PRC stroke order data, it should be possible to automate producing other types of stroke order data.


hugolpz commented on June 17, 2024

Hey, @skishore, I have some good expertise on this matter. I created and co-wrote most of the section related to stroke order per polity. I knew the sources well back in 2008~10. Also, as of 2018:

  1. Can your software handle multiple stroke orders? {PRC=default|t|j|k|h}?
  2. Do you still need a list of radicals with their official stroke order per polity? (I could compile one within the year.)
  3. What progress have you made on this front?

EDIT: Moved to /parsimonhi/animCJK/issues/1


skishore commented on June 17, 2024

This project only includes PRC stroke order, and I don't have plans to make data for other orders. The animCJK project has t and j orderings for 1k-2k of these characters, though.

There's enough data in this project's output that it should be possible to automate the generation of orderings for other characters. For example, for all characters I have here, I have both the stroke order graphics and a "matches" field that shows how the strokes in a given character map to strokes in its components. If you were to change the stroke order of a component (not necessarily a radical), you could use those "matches" decompositions to automatically infer stroke order changes for all characters using that component.

I used a similar process to produce candidate stroke orders in this project itself, and it sped things up by a lot - for the most part, I just had to go through and do a quick verification of the resulting order.
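A rough sketch of that propagation step, under simplifying assumptions: the function name and the flat `matches` representation (one `(component, stroke_index)` pair or `None` per stroke) are hypothetical and much simpler than the project's actual "matches" field, and the component's strokes are assumed to keep the same slots in the character's sequence.

```python
def propagate_reorder(matches, component, new_order):
    """Return the character's stroke indices in their new drawing order.

    matches[i] is (component_name, stroke_index_in_component) or None
    for the i-th stroke of the character in its current order.
    new_order lists the component's old stroke indices in the new sequence.
    Assumes the component occurs only once in the character.
    """
    # Positions in the character currently occupied by the component.
    slots = [i for i, m in enumerate(matches)
             if m is not None and m[0] == component]
    # Map each component stroke index to the character stroke that draws it.
    by_component_index = {matches[i][1]: i for i in slots}
    if sorted(by_component_index) != sorted(new_order):
        raise ValueError("new_order must be a permutation of the matched strokes")
    result = list(range(len(matches)))
    # Fill the component's slots with its strokes in the new relative order.
    for slot, old_index in zip(slots, new_order):
        result[slot] = by_component_index[old_index]
    return result

# Hypothetical 6-stroke character: two unmatched strokes, then a
# 4-stroke component whose second and third strokes get swapped.
matches = [None, None, ("王", 0), ("王", 1), ("王", 2), ("王", 3)]
print(propagate_reorder(matches, "王", [0, 2, 1, 3]))
```

The output would still need manual verification, since a component does not always keep the same stroke count or order in every character that contains it.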


hugolpz commented on June 17, 2024

Hahahaha. Witty. Do you know of the CCDL's CDL? They also use heredity/cascading, and they are the source project for Unicode's Unihan shapes. They are cool crazies with 80k characters designed with cascading in mind: from fewer than 50 strokes, to ~1000 graphic elements, to 80k characters. Their description paper is short (6 pages) and quite cool to read.

OK, as for the polity question, I will check with AnimCJK whether they need a cross-check or support from me and our other CJK nerds ^^. cc: @parsimonhi

PS: I'm catching up with your projects and efforts. My apologies for my many questions, but it's for the greater good 👍


parsimonhi commented on June 17, 2024

AnimCJK takes into account that the same Unicode code point can have several glyphs, stroke counts, or stroke orders. The solution is simple: character files are duplicated and modified as necessary. For instance, the character "王" (U+738B) has two different stroke orders. The corresponding file 29579.svg in the svgsJa repository (i.e. for Japanese) is not the same as the one in svgsZhHans (i.e. for simplified Chinese): the second and third strokes are swapped.
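As an illustration of this layout, the file name is the decimal code point of the character (29579 is 0x738B). The sketch below derives the path for a given locale; the helper name and the locale keys are assumptions made for this example, and only the two folder names mentioned above are included.

```python
from pathlib import PurePosixPath

# Folder names taken from the comment above; the locale keys are illustrative.
FOLDERS = {"ja": "svgsJa", "zh-Hans": "svgsZhHans"}

def svg_path(character, locale):
    """Build the relative path of a character's SVG for one locale."""
    if len(character) != 1:
        raise ValueError("expected exactly one character")
    # The file name is the decimal Unicode code point.
    return PurePosixPath(FOLDERS[locale]) / f"{ord(character)}.svg"

print(svg_path("王", "ja"))
print(svg_path("王", "zh-Hans"))
```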

At the moment, 2998 Japanese characters are in the svgsJa repository, and 3538 characters are in the svgsZhHans repository. I am working on the Taiwanese version of a set of 4808 frequently used characters, but I am far from completing the task. I haven't considered other character sets so far.

I did my best to check the data against various sources. However, I cannot guarantee that AnimCJK is error-free.

About the CCDL's CDL: is it multilingual, or just focused on Chinese? I didn't find that information.

Anyway, stroke order is a difficult issue. There are inconsistencies (or errors?) everywhere. Even the number of strokes is not always well defined. For instance, does the radical 阝 have 2 strokes or 3 strokes, and in which character set? Moreover, you cannot always rely on component decomposition to automatically derive stroke count and stroke order. For instance, in Japanese, the component 牙 sometimes has 5 strokes, as in 芽, and sometimes has 4 strokes, as in 穿 (and as in simplified Chinese).

There are also changes made from time to time. For instance, there was a Japanese reform, JIS X 0213, made in 2004 (see https://kakijun.jp/main/jis2004.html), which changed a significant number of characters (as a result, KanjiVG, which is one of the best references for Japanese, is not up to date at the moment). So I don't think there is a single up-to-date source for all characters and all languages.


hugolpz commented on June 17, 2024
  • (1) AnimCJK stores stroke order variations by folder. 👍
    • /svgsZhHans/ : PRC (available: 3538)
    • /svgsJa/ : Japan (available: 2998)
    • /svgsZhTw/ : ROC (possible in the future: 4808)
  • (2) Naming convention: {html-code-for-character}.svg.
  • (3) Review: done as far as possible.
  • (4) CDL: it is flexible, but I don't know whether they have data for multiple stroke orders. I do believe they have multiple glyph data via a cascading approach, so they only have to curate the CDL data of a few thousand basic characters to affect all 80k glyphs.
  • (5) Inconsistencies: a given guideline is generally very, if not absolutely, consistent within itself. I once noticed a conflict within the ROC authoritative source; I decided to ignore it and followed the dominant rule. As for stroke counts, the basic rule is that PRC simplification has merged a few strokes together. I'm not an expert on Japanese, which is also a simplification, so I cannot tell the stroke count. But you are free to follow the most commonsensical count, as there is no official standard in Japan, only the consensus and the recommendation to follow commonsensical stroke order. In my experience with unusual stroke counts in official sources from TW and CN:
    • hans 阝 : 2 str. HPWG-S ; zhTw 阝 : 3 str. HP-WG-S ; Ja 阝 : ? str.
    • hans 廴 : 2 str. HPHP-N ; zhTw 廴 : 3 str. HP-HP-N ; Ja 廴 : ? str.
    • hans 辶 : 3 str. D-HPW-TPN ; zhTw 辶 : 4 str. D-D-HPW-TPN ; Ja 辶 : ? str.
    • 牙/芽/穿 : I was unaware of this issue. 1) It reminds me of other cases (片 is 4 or 3 strokes), where the count generally changes depending on whether the component is squeezed (3) or has space (4); 2) it could be interesting to check the etymology to see whether these may be two different radicals. See the Kangxi dictionary, the Shuowen, or older forms of 穿. Damn, it is indeed 牙 in the Qin dynasty's seal script. 😭
    • I should compile the list of such cases, as the topic comes up often.
  • (6) Guideline changes: yes, difficult, but the standards are solidified. :)


hugolpz commented on June 17, 2024

(Ping: edit made to my comment above.)
Note: 張&張 (2013: pp. 22-25) cites 32 CN vs TW stroke order variations and lists their cascading impacts.

p. 20 is interesting (can you read Chinese?).

[Image: screenshot from 2018-01-22 20-21-58]

             Same shape   Diff shape   Sum
Same order   2,407        709          3,116
Diff order   383          1,309        1,692
Sum          2,790        2,018        4,808
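
The margins of this table can be checked with a few lines of Python. This is only a sketch, assuming the table classifies a set of 4,808 characters by whether their shape and stroke order agree between the two standards being compared; the variable names are illustrative.

```python
# Cell counts copied from the table above, keyed by (order, shape).
counts = {
    ("same order", "same shape"): 2407,
    ("same order", "diff shape"): 709,
    ("diff order", "same shape"): 383,
    ("diff order", "diff shape"): 1309,
}

total = sum(counts.values())
diff_order = sum(n for (order, _), n in counts.items() if order == "diff order")
diff_shape = sum(n for (_, shape), n in counts.items() if shape == "diff shape")

# Share of characters whose stroke order differs, regardless of shape.
share_diff_order = diff_order / total
print(total, diff_order, diff_shape, f"{share_diff_order:.1%}")
```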

