A collection of free Mandarin Chinese dictionaries, for use with dictionary software such as GoldenDict.
No. | Name | Notes | Todo | Priority |
---|---|---|---|---|
1. | Chinese Word Frequencies | based on word corpora and with HSK levels | Update to HSK 3.0; rewrite explanations; add colour | |
2. | Make Me a Hanzi | Animations and Descriptions | ✓ | |
3. | Idioms / Chengyu | ?from academia / BCC idiom dictionary / Chinese name/dict corpus / Hydcd / other - English translations? | ? | Low |
4. | CC-Cedict | with enlarged characters | ✓ | |
5. | Handedict | with English machine-translations from German (for use with words not found in cc-cedict) | ✓ | |
6. | Pinyin-to-Chinese / dictionary | with Zhuyin, Pinyin and IPA, English "sounds like" (use FSI/wiki?), add GPL audio | To complete | |
7. | Unihan character dictionary (字典) | ✓ | ||
8. | Phrase dictionaries | tatoeba CUV Bible | ✓ For revision | |
9. | Chinese Idioms - W Scarborough | OCR scan | To convert | Low |
10. | Hanziyuan image library | To download | ||
11. | Taiwan Ministry of Education Dictionary (moedict) 教育部國語辭典 - 重編國語辭典 修訂本 | ?upload other formats | ||
12. | BCC 英汉词典 - BCC English-Chinese Wordlist | With spelling corrections | ✓ | |
13. | XDICT 英汉词典 English-Chinese dictionary | ✓ | ||
14. | Unihan Radical Dictionary | ✓ | ||
15. | Guoxuedashi (国学大师) Character Dictionary | ✓ | ||
16. | Chinese Lexicon | ✓ | ||
17. | CJKVI Decomposition dictionary | ✓ | ||
18. | Adso Chinese English | ✓ | ||
19. | Starling etymology | ? | Low | |
20. | Sinica etymology | ? | Low | |
21. | Kanjinetworks - Etymological Dictionary of Han/Chinese Characters | ✓ | ||
22. | LDC Chinese-English Wordlist | ✓ | ||
23. | Guoxuedashi (国学大师) Idiom Dictionaries and ?others | To search | ||
24. | WFG dictionaries | ✓ | ||
25. | Taiwan Ministry of Education Dictionary of Idioms 教育部《成語典》 | ✓ | ||
26. | 數字輸入法 Chinese Input Methods | ✓ | ||
27. | FSung fonts | ✓ | ||
28. | Zdic.net (Chinese Dict Corpus) | ✓ | ||
29. | Chinese Characters / Word org - dump, see blog | To enquire | Low | |
30. | Xiaoxue image library / other | ? | ||
31. | Chinese-English names | ✓ | ||
32. | Dictionaryphile | To search | ||
33. | Tidy files | |||
34. | Update this readme | Add Simpl./Trad./both tags; add Chinese<->English tags to table; add section examples / char. animations; separate section for dictionaries by WFG; add Unicode font check (cf. WFG) |
About 說明 / 说明
The CC-CEDICT dictionary, with enlarged Chinese characters for ease of reading, and without a small handful of obscene terms or definitions that otherwise do not belong in a dictionary.
Licence 許可證 / 许可证
Creative Commons BY-SA 3.0
Original Files 資料來源
https://www.mdbg.net/chinese/dictionary?page=cc-cedict
About 說明 / 说明
A machine translation of HanDeDict into English (by DeepL translate). Intended to accompany cc-cedict, so terms already present in cc-cedict were omitted, along with a lot of numerical terms (e.g. definitions for 1, 2, 3... 10,000... 10,001...) and a small amount of profanity.
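As a rough illustration of the filtering described above, the sketch below drops entries whose headwords already appear in CC-CEDICT. It assumes that both files use the CC-CEDICT line layout (`Traditional Simplified [pin1 yin1] /definition/`) and that entries are matched on the traditional/simplified headword pair. The function names and sample lines are made up for illustration, and this is not necessarily how the dictionary was actually built.

```python
import re

# One entry per line: Traditional Simplified [pinyin] /def 1/def 2/
ENTRY = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")


def parse_line(line):
    """Return a dict for one entry line, or None for comments and malformed lines."""
    if line.startswith("#"):
        return None
    match = ENTRY.match(line)
    if match is None:
        return None
    trad, simp, pinyin, defs = match.groups()
    return {"trad": trad, "simp": simp, "pinyin": pinyin, "defs": defs.split("/")}


def filter_new_entries(handedict_lines, cedict_lines):
    """Keep only HanDeDict entries whose headword pair is absent from CC-CEDICT."""
    known = set()
    for line in cedict_lines:
        entry = parse_line(line)
        if entry is not None:
            known.add((entry["trad"], entry["simp"]))
    kept = []
    for line in handedict_lines:
        entry = parse_line(line)
        if entry is not None and (entry["trad"], entry["simp"]) not in known:
            kept.append(entry)
    return kept


# Illustrative sample lines only, not taken from the real files.
cedict_sample = ["# comment line", "你好 你好 [ni3 hao3] /hello/hi/"]
handedict_sample = [
    "你好 你好 [ni3 hao3] /Hallo/",
    "火車站 火车站 [huo3 che1 zhan4] /Bahnhof (S)/",
]
kept = filter_new_entries(handedict_sample, cedict_sample)
print([entry["simp"] for entry in kept])
```

Matching on the headword pair rather than on pinyin keeps the check simple, at the cost of dropping entries that share a spelling with a CC-CEDICT entry but differ in reading.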
Licence 許可證 / 许可证
Creative Commons BY-SA 2.0
Original Files 資料來源
https://github.com/gugray/HanDeDict
"Make Me a Hanzi provides dictionary and graphical data for over 9000 of the most common simplified and traditional Chinese characters" (skishore).
Memo:
Add description. Add a note about the overlay problem in GoldenDict when used together with CC-CEDICT.
Licence 許可證 / 许可证
GNU LESSER GENERAL PUBLIC LICENSE, Version 3, 29 June 2007
Original Files 資料來源
https://github.com/skishore/makemeahanzi
About 說明 / 说明
...
Details
Ref. | Heading | No. of entries* | Years |
---|---|---|---|
1a | Character freq. (Books): | 9,932 | 1911-2003 |
1b | Word freq. (Books): | 76,002 | 1911-2003 |
2a | Character freq. (Movies): | 3,360 | < 2010 |
2b | Word freq. (Movies): | 69,004 | < 2010 |
3 | Word freq. (Mixed Print): | 24,669 | ~1991 |
4 | Char freq. (Usenet): | 5,083 | |
5 | Word freq. (Internet): | 50,000 | 2006 |
6 | Word freq. (Newswire): | 4,945 | 1990-2002 |
7 | HSK Levels: | 5,000 | 2010 |
8 | Pinyin ratios: | 5,000 | 2010 |
Notes:
*English, Russian, numeral and punctuation characters removed from references [3] & [5].
Corpora [3],[4] and [5] have been re-ranked in order of frequencies, taking into account joint rankings. Where two entries have the same prevalence, they are both ranked e.g. “≥2124”, with the next entry ranked as “2126”.
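The tie handling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used to build the dictionary: the function name and the sample counts are hypothetical, and only the "≥" labelling and the skipped rank after a tie are taken from the note.

```python
def rank_with_ties(frequencies):
    """Map each entry to a rank label; tied entries share a label prefixed with '≥'."""
    # Sort by descending frequency; Python's sort is stable, so ties keep input order.
    items = sorted(frequencies.items(), key=lambda kv: -kv[1])
    ranks = {}
    start = 0
    while start < len(items):
        end = start
        # Extend the group while the frequency stays the same.
        while end < len(items) and items[end][1] == items[start][1]:
            end += 1
        position = start + 1
        label = str(position) if end - start == 1 else f"≥{position}"
        for word, _count in items[start:end]:
            ranks[word] = label
        # The next group starts after every tied entry, so ranks are skipped.
        start = end
    return ranks


# Hypothetical counts, for illustration only.
sample = {"a": 10, "b": 7, "c": 7, "d": 3}
print(rank_with_ties(sample))
```

With two entries tied at position 2, both are labelled "≥2" and the following entry is ranked 4, which mirrors the "≥2124" then "2126" pattern in the note.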
Licence 許可證 / 许可证
HTML licensed under the CC BY-NC 4.0 Licence; data according to the licences below.
Sources of word frequency data and their licences:
No. | Heading | Corpus reference | Corpus licence | Word list source | Word list licence |
---|---|---|---|---|---|
1 | “Chinese Word Frequencies”: this dictionary | CC BY-NC 4.0 Licence | https://github.com/lxs602/Chinese-Mandarin-Dictionaries | ||
2 | Character/Word freq. (Books) | Da, Jun. 2004. Chinese text computing. http://lingua.mtsu.edu/chinese-computing | https://lingua.mtsu.edu/chinese-computing/copyright.html | Chinese Lexicon, by Peter Olson. https://github.com/peterolson/chinese-lexicon/tree/master/statistics | As for corpus |
3 | Character/Word freq. (Movies) | Cai, Q., & Brysbaert, M. (2010). SUBTLEX-CH: Chinese Word and Character Frequencies Based on Film Subtitles. PLoS ONE, 5(6), e10729. https://www.ugent.be/pp/experimentele-psychologie/en/research/documents/subtlexch/overview.htm | Creative Commons Attribution Licence | Chinese Lexicon, by Peter Olson. (See above) | As for corpus |
4 | Word freq. (Mixed Print) | Graff, David, and Ke Chen. Chinese Gigaword LDC2003T09. Web Download. Philadelphia: Linguistic Data Consortium, 2003. https://catalog.ldc.upenn.edu/LDC2003T09 | LDC User Agreement for Non-Members https://catalog.ldc.upenn.edu/license/ldc-non-members-agreement.pdf | http://corpus.leeds.ac.uk/frqc/giga-zh.num | corpus.leeds.ac.uk/list.html “The lists are distributed under the Creative Commons (CC BY) Attribution license”: https://creativecommons.org/licenses/by/2.5/legalcode |
5 | Character freq. (Usenet) | kFrequency field in UniHan database, Unicode version: 11.0.0 | https://www.unicode.org/license.html | ftp://ftp.unicode.org/Public/UNIDATA/Unihan.zip | As for corpus |
6 | Word freq. (Internet) | Sharoff, S. (2006) Creating general-purpose corpora using automated search engine queries. In Marco Baroni and Silvia Bernardini, (eds), WaCky! Working papers on the Web as Corpus. Gedit, Bologna, http://wackybook.sslmit.unibo.it | Creative Commons Attribution-NoDerivs 2.5 License | http://corpus.leeds.ac.uk/internet/i-zh.num | corpus.leeds.ac.uk/list.html “The lists are distributed under the Creative Commons (CC BY) Attribution license”: https://creativecommons.org/licenses/by/2.5/legalcode |
7 | Word freq. (News) | McEnery, A. M. and Xiao, R. Z. (2003) The Lancaster Corpus of Mandarin Chinese. European Language Resources Association / Oxford Text Archive, Paris, France / Oxford, UK. | The Lancaster Corpus of Mandarin Chinese End User License https://www.lancaster.ac.uk/fass/projects/corpus/LCMC/lcmc/lcmc_license.htm | http://corpus.leeds.ac.uk/frqc/lcmc.num | corpus.leeds.ac.uk/list.html “The lists are distributed under the Creative Commons (CC BY) Attribution license”: https://creativecommons.org/licenses/by/2.5/legalcode |
8 | HSK Levels | http://www.chinesetest.cn/userfiles/file/HSK/HSK-2012.xls | |||
9 | Pinyin ratios | kHanyuPinlu field in UniHan database, Unicode version: 11.0.0 | https://www.unicode.org/license.html | ftp://ftp.unicode.org/Public/UNIDATA/Unihan.zip, from: Chinese Lexicon, by Peter Olson. (See above) | As for corpus |
Note:
Assembled from the freely available BCC corpus dictionary, with 42,784 terms. The original file had some spelling errors, and though it has been proof-read in English, some may remain. Contains some uncommon words and national variants, so would be a good accompaniment to other dictionaries.
资料来源 Original Files
http://bcc.blcu.edu.cn/downloads/resources/%E8%8B%B1%E6%B1%89%E8%AF%8D%E5%85%B8.zip
About 說明 / 说明
XDict, the free English to Chinese dictionary, originally developed by Fu Jianjun, with about 177,000 terms.
Licence 許可證 / 许可证
XDICT是一个freeware,大致按照GPL传播. (XDICT is freeware, distributed broadly under the terms of the GPL.)
Original Files 資料來源
http://archive.ubuntu.com/ubuntu/pool/universe/d/dict-xdict/dict-xdict_0.1-4.2_all.deb
About 說明 / 说明
A Chinese-English dictionary, from the ADSO project by David Lancashire. Source file derived from the Speaking English Dictionary, by Warren S. Goff, which also appears to include entries from the LDC wordlist.
Using the adso translation project application itself is recommended over using this particular dictionary, as it has as-you-type translation of phrases, similar to Google or Bing Translate. An online version is hosted at Popup Chinese.
Licence 許可證 / 许可证
Adsotrans Attribution-NonCommercial License 1.1
Original Files 資料來源
https://github.com/wtanaka/adso
About 說明 / 说明
A character etymology dictionary, derived from Chinese Lexicon by Peter Olson (dong-chinese.com). Contains decomposition data, helpful images of iconographs and short definitions from CC-CEDICT. Total of 5,054 terms.
Licence 許可證 / 许可证
Freely available
Original Files 資料來源
https://github.com/peterolson/chinese-lexicon
About 說明 / 说明
A Chinese-English vocabulary, from sentences submitted to tatoeba.org. Downloaded December 2020, with 47,969 phrases. “Tatoeba is a collection of sentences and translations. It's collaborative, open, free and even addictive.” (from the Tatoeba website)
NB. To list all the sentences with audio, search for the term ‘audio’.
Chinese words segmented using jieba. Thanks also to "Generating Anki decks with audio from the Tatoeba Project", accessed December 2020.
Licence 許可證 / 许可证
CC BY 2.0 FR
About 說明 / 说明
...
About 說明 / 说明
A compilation of freely available Chinese input method codes, as listed in this table or here.
See also:
吳語臺語字輸入法 Wu and Minnan
亞洲(日韓越泰)輸入法辞书 East Asian (JKVT)
Example: 析
文字 Character(s): | 析 |
字頻序 Character Freq. (Big5) | 1207 |
字形 Glyph / 雜項 Other | |
三角編號 3 Corners | 492700 |
行列10 Array10 | 4893 |
行列30 Array30 | vo |
表形码 Biaoxing | mjt |
嘸蝦米 Boshiamy | tki |
縱橫碼 CKC | 4092 |
全拼形導碼 Daomax | mjt |
筆結基因 DNA | 362765 |
E碼漢字 E-code | yfs |
輕鬆大詞庫 EZ Big | d1 |
華象直覺 HS pictograph | YLT |
晶晶碼 Jin Jin | mjjt |
晶数码 Jin Shu | 487751 |
冰蟾全息 QXM | mjda |
萬國蝦米 Uni Liu | tki |
晚風 Wan Feng | xim |
海峰五笔 Wubi 98 | sr |
王林快码 WLKM | uf;p |
象形王碼 Wang Ma 2 | yft |
拼音 Pinyin | |
港式廣東話 CantonHK Pinyin | sik |
帶調粵語拼音 Jyutping | cik1 |
廣東拼音 Jyutping ILE | tsik7 |
正體拼音 Pinyin | xi |
雙拼加加 Shuangpin++ | xi |
以拼音为基 Pinyin-based | |
二笔快版 Er Bi - Kuai | xxej |
小鹤音形 Flypy | xim |
T9 | hspnpphs |
自然码 Zi Ran Ma | xim |
注音 Zhuyin | |
正體注音 Bopomofo (official) | vu |
全字庫注音 CNS Phonetic lite | vu |
臺語注音 Taiwanese | "vu,e" |
粵語注音 Zyujam | hud4 |
以笔顺为基 Stroke-based | |
筆順碼 Bsm | 983 |
大易二碼 Dayi2 | i1 |
大易三碼 Dayi3 | ih1 |
大易三碼 Dayi3 patched | IHE1 |
大易四碼 Dayi4 | ihe1 |
龍飛 Dragonfly | ihe1 |
六碼筆畫 G6 Code | 123312 |
筆畫數 Strokes | 804 |
筆順五碼 Stroke5 | "m/,.," |
郑码 Zheng Ma | fpd |
以倉頡为基 Cangjie-based | |
倉頡第三代 Cangjie 3 | dhml |
倉頡第五代 Cangjie 5 | dhml |
仓颉 Chan Jei | dhml |
微倉三 Changjei3 | dhml |
自由大新 FreeNewCJ | dhl |
正體簡易 SimpleCJ | dl |
簡易五代 Simplex5 | dl |
快倉七代 Speedy Cangjie 7 | dhl |
鯨魚 MyCJ Whale | dhl |
字符集 Character Encodings | |
大五碼 Big5p | aa52 |
資訊交換碼 CCCII | 21442b |
標準交換碼 CNS11643 | 14e35 |
四角号码 Four Corner | 4292 |
電信碼 Telecode | 2649 |
統一漢字 Unicode | 6790 |
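The "統一漢字 Unicode" and "大五碼 Big5p" rows in the example above can be cross-checked with Python's standard codecs. This is only a sketch: the helper name is hypothetical, and Python's built-in `big5` codec is assumed to be close to, but not necessarily identical with, the "Big5p" table used in this dictionary.

```python
def encoding_codes(char):
    """Return hexadecimal codes for a single character in a few encodings."""
    return {
        # Unicode code point, e.g. the value shown in the Unicode row.
        "unicode": format(ord(char), "x"),
        # Bytes in Python's built-in Big5 codec (may differ from Big5p for some characters).
        "big5": char.encode("big5").hex(),
        # UTF-8 bytes, included for comparison only.
        "utf8": char.encode("utf-8").hex(),
    }


codes = encoding_codes("析")
for name, value in codes.items():
    print(f"{name}: {value}")
```

The Unicode value printed for 析 should match the 6790 listed in the table; the Big5 value can be compared against the table by eye.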
資料來源 Original Files:
https://github.com/chinese-opendesktop/cin-tables
https://github.com/openvanilla/openvanilla/tree/master/DataTables
Licences 許可證 / 许可证
數字輸入法 Chinese Input Methods
吳語臺語字輸入法 Wu and Minnan
亞洲(日韓越泰)輸入法辞书 East Asian (JKVT)
About 說明 / 说明
Contains the complete range of characters from the Unihan project, and selected properties, for language learning. Pictures of characters (as SVGs, from Glyphwiki.org) are included for those that might not be displayed in the font, being less common or new.
Unihan version: 13.0 (2020-02-18), with the 9 new characters added from Unihan 14.0 (2a6de, 2a6df, 2b735, 2b736, 2b737, 2b738, 9ffd, 9ffe, 9fff). Thus contains all characters from Unihan 14.0 (up to and including ext. G).
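The hexadecimal code points listed above can be turned into characters with a few lines of Python. This is a small sketch for checking the list locally, not the method used to build the dictionary (the look-up tool actually used is credited under Acknowledgements below); the names are illustrative.

```python
# The nine code points named above as additions from Unihan 14.0.
NEW_IN_UNIHAN_14 = [
    "2a6de", "2a6df", "2b735", "2b736", "2b737", "2b738",
    "9ffd", "9ffe", "9fff",
]


def code_point_to_char(hex_code):
    """Convert a hexadecimal code point string such as '9fff' to its character."""
    return chr(int(hex_code, 16))


def describe(hex_code):
    """Format a code point as 'U+XXXX <character>'."""
    return f"U+{hex_code.upper()} {code_point_to_char(hex_code)}"


for code in NEW_IN_UNIHAN_14:
    # Whether the glyph displays depends on the fonts installed.
    print(describe(code))
```

If a character prints as an empty box, the installed fonts do not cover it, which is the case the bundled SVG pictures are meant to handle.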
Example: 隷|U+FA2F, shown both in the font and as an SVG picture:
Acknowledgements 鳴謝 / 鸣谢
Unicode code point to character look-up performed using: https://r12a.github.io/app-conversion/
Licences 許可證 / 许可证
Unihan: Unicode licence, Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
Glyphwiki: freely available
About 說明 / 说明
Description pending.
Please note that Japanese meanings are given.
Licence 許可證 / 许可证
Free distribution.
About 說明 / 说明
Shows the decomposition of characters into their constituent parts, e.g. '亭' as '⿱⿳亠口冖丁', or '乷' as '⿱沙乙'. As graphical breakdowns, these do not indicate how the character was first formed.
For further information, see the readme.
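To show how decomposition strings like those above are structured, here is a minimal sketch of a parser that turns a sequence into a nested tree. It assumes only the twelve description characters U+2FF0 to U+2FFB (of which ⿲ and ⿳ take three parts and the rest take two), and it does not handle entity references or any later additions to Unicode. The function names are illustrative.

```python
# Description characters that take three components: U+2FF2 and U+2FF3.
TERNARY = {"\u2ff2", "\u2ff3"}
# The remaining characters in U+2FF0..U+2FFB take two components.
BINARY = {chr(cp) for cp in range(0x2FF0, 0x2FFC)} - TERNARY


def _parse(sequence, pos):
    """Parse one component starting at pos; return (node, next position)."""
    if pos >= len(sequence):
        raise ValueError("sequence ended before all components were read")
    char = sequence[pos]
    pos += 1
    if char in TERNARY:
        arity = 3
    elif char in BINARY:
        arity = 2
    else:
        # An ordinary character is a leaf.
        return char, pos
    parts = []
    for _ in range(arity):
        part, pos = _parse(sequence, pos)
        parts.append(part)
    return (char, parts), pos


def parse_ids(sequence):
    """Parse a whole decomposition string into a nested (operator, parts) tree."""
    node, pos = _parse(sequence, 0)
    if pos != len(sequence):
        raise ValueError("unexpected characters after a complete sequence")
    return node


print(parse_ids("⿱⿳亠口冖丁"))
print(parse_ids("⿱沙乙"))
```

The nested result makes the layout explicit: for 亭, the top-over-bottom operator has a three-part stack (亠, 口, 冖) above 丁.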
Acknowledgements 鳴謝 / 鸣谢
Based on sequences by the CHISE project, hosted at cjkvi-ids as part of the Kanji Database. This dictionary uses data modified to remove as many entity references as possible, at cjkvi-ids-unicode, by Transfusion.
Licences 許可證 / 许可证
Chise IDS: No licence, all rights reserved by chise.org, see https://gitlab.chise.org/CHISE/ids .
Interpretations and Hanyo Denshi analysis: GPLv2, see https://github.com/cjkvi/cjkvi-ids and http://kanji-database.sourceforge.net
About 說明 / 说明
This dictionary was produced from the free release by the Taiwanese Ministry of Education, and first released in 2015. Total entries: 163,085. Compiled and with HTML design by WFG. This file is a copy of the original version, hosted here as a mirror and for those searching the web in English (with thanks to shawkynasr).
Authorisation / 授權 (from the compiler's webpage):
Version 版本
2015, revised 10th Oct 2020
Licence 許可證 / 许可证
Creative Commons NonCommercial 3.0 Unported Licence (No derivatives)
Original Files 資料來源
https://language.moe.gov.tw/001/Upload/Files/site_content/M0001/respub/index.html
http://fgwang.blogspot.com/2018/02/blog-post_14.html
About 說明 / 说明
An index of more than 106,000 variant characters by the Taiwanese Ministry of Education, compiled by WFG. These files are hosted here for those searching the web in English, and as a mirror.
Please note: From the compiler's webpage:
'Since the "Dictionary of Variants of the Ministry of Education" has not been authorized as openly as the "Mandarin Dictionary", I cannot open the complete offline dictionary for everyone to use (infringement will be involved). The compromise method is that I discard all the definitions and leave only the prefixes, and make a "Ministry of Education Variant Character Index Dictionary", so that after checking the words with "component retrieval", you can use this index dictionary to look up the words and check them. After that, click on the font size link to automatically jump to the official page.' (Google-translate)
Licence 許可證 / 许可证
Non-Commercial use only (see compiler's webpage)
Creative Commons NonCommercial 3.0 Unported Licence (No derivatives)
About 說明 / 说明
A new input method by WFG, available as a webpage for offline use, with a dictionary module also available. These files are hosted here for those searching the web in English, and as a mirror.
See also the FSung font by WFG, coding approximately 170,000 characters (including Unicode ext. A-G.) which may be needed.
Licence 許可證 / 许可证
Non-Commercial use only (see compiler's webpage)
Creative Commons NonCommercial 3.0 Unported Licence (No derivatives)
A Taiwanese character dictionary, with pinyin and Zhuyin, stroke order, radicals, Cangjie input, and CNS 11643 codes. Compiled and HTML designed by WFG. This file is hosted here for those searching the web in English, and as a mirror.
Original Files 資料來源
http://fgwang.blogspot.com/2020/07/blog-post_3.html
Licence 許可證 / 许可证
Non-Commercial use only
A 7th Century Tang dynasty dictionary ('Character Book for Seeking an Official Emolument') of 800 characters, for students of the imperial examination, by 顏元孫 Yan Yuansun. Compiled and HTML designed by WFG. This file is hosted here for those searching the web in English, and as a mirror.
Original Files 資料來源
http://fgwang.blogspot.com/2019/04/blog-post.html
Licence 許可證 / 许可证
Non-Commercial use only
A character dictionary compiled by order of the Kangxi emperor of the Qing dynasty in AD 1710, with 214 radicals forming the basis of modern radical dictionaries. Compiled and HTML designed by WFG. This file is hosted here for those searching the web in English, and as a mirror.
This is a large dictionary. One of the files has been split into three parts (康熙字典.mdd.zip, 康熙字典.mdd.z01, 康熙字典.mdd.z02), which must be opened together in an archive tool (e.g. WinZip or similar) so that they can be recombined.
Original Files 資料來源
http://fgwang.blogspot.com/2018/12/blog-post_10.html
Licence 許可證
CC BY-SA 3.0
The 2nd Century Han character dictionary, by 許慎 Xu Shen. Compiled and HTML designed by WFG. This file is hosted here for those searching the web in English, and as a mirror.
This is another large dictionary. One of the files has been split into three parts (說文解字.mdd.zip, 說文解字.mdd.z01, 說文解字.mdd.z02), which must be opened together in an archive tool (e.g. WinZip or similar) so that they can be recombined.
Original Files 資料來源
http://fgwang.blogspot.com/2019/02/blog-post.html
Licence 許可證 / 许可证
Non-Commercial use only
"Hanyu Da Cidian" is a large dictionary of Chinese and Chinese texts compiled by more than 300 scholars, from 1979 to 1993. This word list, with a total of 692,661 terms, gives the headwords of all 12 volumes of the 1st edition. It does not have definitions. The full dictionary, with definitions, is available in print and electronically; see the Wikipedia page for details (accessed 2022-Feb-04).
Original Files 資料來源
https://github.com/cjkvi/cjkvi-dict/blob/master/hydcd-word.txt
Licence 許可證
GPL v2
A collection of Chinese family and personal names (with gender), and English names (including Anglicised spellings of names from other languages) translated into Chinese. See the Readme file for further details.
Original Files 資料來源
Adapted from Chinese Names Corpus by Wainshine / ltccss
Licence 许可证 / 許可證
Apache 2.0
A public dictionary from Zdic.net in Simplified Chinese (downloaded via zd9999.com/ci).
Original Files 資料來源
Converted from Chinese Xinhua by pwxcoo
Licence 许可证 / 許可證
May be Public Domain, see Licence for details.
說明 / 说明 About
串珠聖經和合本 (Concordance)
例如查"企望",就會列出所有這個字原文對應聖經和合本翻譯的字及其經節出處,和英文翻譯 (World English Bible - British English / 國王詹姆斯版本 King James Version) 。欽定版聖經於 (KJV) 1611 年出版。建議 WEB-BE,因為它更簡單。
串珠圣经和合本 (Concordance)
例如查"企望",就会列出所有这个字原文对应圣经和合本翻译的字及其经节出处,和英文翻译 (World English Bible - British English / 国王詹姆斯版本 King James Version) 。钦定版圣经于 (KJV) 1611 年出版。建议 WEB-BE,因为它更简单。
A searchable concordance for the Chinese Union Version (CUV) Bible, with an English translation from the World English Bible - British English or the King James Version.
For example, searching for "企望" (hope) will show all verses with this word, and the matching English translation.
This dictionary was made to be a resource for learning English/Chinese, as the Bible is free in both languages, and has a very large amount of Chinese-English vocabulary available. It may be particularly helpful to those already familiar with parts of the text, through studying a passage in the corresponding language.
Additionally, for anyone interested primarily in studying God's word: although there are many concordances in English, there is perhaps only one in Chinese, and it covers the New Testament only.
發展 / 发展 Compilation
中文分词利用pywordseg (ELMo) 系統未經校核親自的。請經文錯誤回報給開發者。
中文分词利用pywordseg (ELMo) 系统未经校核亲自的。请经文错误回报给开发者。
Chinese words have been segmented automatically, without checking in person. Please report any errors you find.
Word segmentation was with pywordseg (ELMo) (https://github.com/voidism/pywordseg), using CC-CEDICT dictionary and a Chinese word list of Bible names and places (https://github.com/guoshengkang/Bible-Word-Statistics/tree/master/output_file_tf), then indexed using word_line_concordance_app (https://github.com/lostchristmas0/word_line_concordance_application) by lostchristmas0.
The simplified and traditional versions of the CUV were segmented separately, to avoid errors converting from traditional to simplified, so there may be different mistakes in each version.
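The indexing step described above (after segmentation) can be pictured as building a word-to-verse lookup. The sketch below is a simplified stand-in written with the standard library only; it does not use pywordseg or word_line_concordance_app, and the function name, verse labels and token lists are all made up for illustration rather than taken from the CUV text.

```python
from collections import defaultdict


def build_concordance(segmented_verses):
    """Map each word to the list of verse references in which it occurs."""
    index = defaultdict(list)
    for reference, words in segmented_verses:
        # dict.fromkeys removes repeated words in a verse while keeping their order.
        for word in dict.fromkeys(words):
            index[word].append(reference)
    return dict(index)


# Placeholder data: already-segmented lines with hypothetical verse labels.
sample = [
    ("verse-1", ["我", "企望", "平安"]),
    ("verse-2", ["他们", "企望", "企望", "光"]),
    ("verse-3", ["平安", "归于", "你"]),
]

concordance = build_concordance(sample)
print(concordance["企望"])
```

Because the index is keyed on the segmenter's output, any segmentation error changes which headword a verse is filed under, which is why the simplified and traditional versions may show different mistakes.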
文本 Choice of Text
The CUV was chosen as it is widely used, in the public domain, and already available segmented by Strong's numbers. The WEB-BE was chosen for similar reasons: it is a free, accurate and readable English version.
The KJV is also free, but is not recommended for those without a very good command of English.
Related Texts
Another project which may be of interest is this Chinese-English comparison Bible by michaelchanwahyan, which has several free English and Chinese versions.
資料來源 Original Files:
https://ebible.org/webbe/
https://www.o-bible.com/
許可證 / 许可证 Licences
CUV Bible: Public Domain
KJV Bible: Crown copyright
World English Bible: Public Domain. "World English Bible" is a trademark of ebible.org; see https://ebible.org/web/copyright.htm
GoldenDict - Dictionary software for Linux, Windows and Mac.
WriteMDict by Zhansilu
Mdict-utils by Liuyug
Peazip by Giorgio Tani