chewing / libchewing
libchewing - The intelligent phonetic input method library
Home Page: https://chewing.im/
License: GNU Lesser General Public License v2.1
See the following link:
https://groups.google.com/forum/#!msg/chewing-devel/DWF036WlG4s/YgTdaWkWGWMJ
Currently our API and internal functions aren't consistent about return values: should a non-zero result mean success or error? We should define SUCCESS and ERROR symbols instead of returning raw integers.
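A minimal sketch of what named result symbols could look like. The names EXAMPLE_OK, EXAMPLE_ERR and example_set_max_len are hypothetical and only illustrate the idea; they are not existing libchewing symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes; the names are illustrative only. */
enum {
    EXAMPLE_OK = 0,
    EXAMPLE_ERR = -1
};

/* Hypothetical setter that reports its result through the named symbols. */
int example_set_max_len(int *max_len, int value)
{
    if (max_len == NULL || value <= 0)
        return EXAMPLE_ERR;
    *max_len = value;
    return EXAMPLE_OK;
}
```

Callers then compare against EXAMPLE_OK or EXAMPLE_ERR rather than guessing whether a raw 0 or 1 means success.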
To prevent issues like 50101fe, we should add a sanity check for tsi.src. The idea is to collect all possible phones for each word, then compare the phones in each phrase against them to ensure that a phrase does not contain a phone that does not belong to the corresponding word.
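The check described above could be sketched roughly as follows, assuming the per-word phones have already been collected into a flat table. The ExampleWordPhone type, the numeric phone codes and both function names are hypothetical; the real tools may represent this data differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One (word, phone) pair; a word with several readings appears several times. */
typedef struct {
    const char *word;      /* a single character, UTF-8 encoded */
    unsigned short phone;  /* hypothetical numeric phone code */
} ExampleWordPhone;

/* Return 1 if the table lists `phone` as a reading of `word`, 0 otherwise. */
int example_word_has_phone(const ExampleWordPhone *table, size_t n,
                           const char *word, unsigned short phone)
{
    size_t i;

    assert(table != NULL && word != NULL);
    for (i = 0; i < n; ++i) {
        if (table[i].phone == phone && strcmp(table[i].word, word) == 0)
            return 1;
    }
    return 0;
}

/*
 * Check a phrase given as parallel arrays of words and phones.
 * Return the index of the first word whose phone is not in the table,
 * or -1 if every phone belongs to its word.
 */
int example_check_phrase(const ExampleWordPhone *table, size_t n,
                         const char *const *words,
                         const unsigned short *phones, size_t len)
{
    size_t i;

    for (i = 0; i < len; ++i) {
        if (!example_word_has_phone(table, n, words[i], phones[i]))
            return (int) i;
    }
    return -1;
}
```

A linear scan keeps the sketch short; a real checker over a full dictionary would likely sort the table or use a hash lookup.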
It looks like the unit is bytes...?
If preedit allows non-Chinese content, the name should be changed.
See the following code
The out_buf is first populated by LoadChar, then the following loop replaces data in out_buf with different phrases. However, we cannot guarantee that the byte length of a phrase is exactly the same as that of the data it replaces. If the byte lengths differ, the output buffer will be corrupted.
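One way to avoid this kind of corruption is to shift the tail of the buffer whenever the replacement has a different byte length. The sketch below is a generic illustration, not the actual libchewing code; example_replace_bytes is a hypothetical helper and assumes the buffer holds a NUL-terminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Replace old_len bytes at byte offset pos in the NUL-terminated string buf
 * with repl, moving the tail so that differing byte lengths stay consistent.
 * Returns 0 on success, -1 if the arguments are invalid or the result
 * would not fit in buf_size bytes.
 */
int example_replace_bytes(char *buf, size_t buf_size, size_t pos,
                          size_t old_len, const char *repl)
{
    size_t cur_len, new_len, tail_len;

    if (buf == NULL || repl == NULL || buf_size == 0)
        return -1;

    cur_len = strlen(buf);
    new_len = strlen(repl);
    assert(cur_len < buf_size);

    if (pos > cur_len || old_len > cur_len - pos)
        return -1;
    if (cur_len - old_len + new_len + 1 > buf_size)
        return -1;

    /* Move the tail, including the terminating NUL, to its new position. */
    tail_len = cur_len - pos - old_len + 1;
    memmove(buf + pos + new_len, buf + pos + old_len, tail_len);
    memcpy(buf + pos, repl, new_len);
    return 0;
}
```

The function works on byte offsets, so callers dealing with UTF-8 text still need to compute pos and old_len on character boundaries.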
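I have used Chewing for quite a while, and one thing has always been inconvenient: when typing Chinese punctuation, I cannot find a way to produce the enumeration comma (、). In desperation I edited
/usr/share/libchewing3/chewing/swkb.dat
and changed the mapping of the letter l to 、
Is there a better way?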
Currently we call high-level APIs like chewing_handle_CtrlNum from other high-level APIs. This means that changing the implementation of one high-level API also affects the other high-level APIs that use it.
We should create mid-level APIs that operate on the internal state and dictionary data. These APIs will also be useful to external users.
Examples: select_candidate, add_phrase (note that these APIs differ from the low-level APIs, which operate on data directly. The mid-level APIs operate on buffers and candidates.)
See the test case: the Numlock keys cannot input symbols when not in select mode. They should be able to input symbols when not in select mode.
For example, the correct sequence is "佛坲仏":
0x4e50570 佛
0x4e50578 坲
0x4e50580 仏
0x4e2b228 仏
0x4e2b230 佛
0x4e2b238 坲
0x4e4ed60 佛
0x4e4ed68 坲
0x4e4ed70 仏
0x4e4ed60 佛
0x4e4ed68 坲
0x4e4ed70 仏
Automake version 1:1.13.3-1.1ubuntu2
shows the following warning messages.
src/Makefile.am:2: warning: 'INCLUDES' is the old name for 'AM_CPPFLAGS' (or '*_CPPFLAGS')
src/common/Makefile.am:1: warning: 'INCLUDES' is the old name for 'AM_CPPFLAGS' (or '*_CPPFLAGS')
src/porting_layer/src/Makefile.am:7: warning: 'INCLUDES' is the old name for 'AM_CPPFLAGS' (or '*_CPPFLAGS')
src/tools/Makefile.am:1: warning: 'INCLUDES' is the old name for 'AM_CPPFLAGS' (or '*_CPPFLAGS')
test/Makefile.am:66: warning: 'INCLUDES' is the old name for 'AM_CPPFLAGS' (or '*_CPPFLAGS')
The following error messages appear when building libchewing.info on FreeBSD.
[ 2%] Generating doc/libchewing.info
/home/czchen/src/chewing/libchewing/doc/libchewing.texi:575: Unknown command `leq'.
/home/czchen/src/chewing/libchewing/doc/libchewing.texi:575: Misplaced {.
/home/czchen/src/chewing/libchewing/doc/libchewing.texi:575: Misplaced }.
/home/czchen/src/chewing/libchewing/doc/libchewing.texi:576: Unknown command `leq'.
/home/czchen/src/chewing/libchewing/doc/libchewing.texi:576: Misplaced {.
/home/czchen/src/chewing/libchewing/doc/libchewing.texi:576: Misplaced }.
makeinfo: Removing output file `/home/czchen/src/chewing/cmake- libchewing/doc/libchewing.info' due to errors; use --force to preserve.
*** [doc/libchewing.info] Error code 1
Index: libchewing/data/Makefile.in
===================================================================
--- libchewing.orig/data/Makefile.in 2013-02-16 11:42:56.028902101 +0800
+++ libchewing/data/Makefile.in 2013-02-16 11:42:56.024902101 +0800
@@ -496,7 +496,7 @@
echo "chewing-definition.h exists."; \
fi
$(MAKE) gendata && \
- touch "timestamp" > $@
+ touch "gendata_stamp" > $@
gendata:
$(tooldir)/sort_word$(EXEEXT) $(top_srcdir)/data/phone.cin
Index: libchewing/data/Makefile.am
===================================================================
--- libchewing.orig/data/Makefile.am 2013-02-16 11:05:23.228839264 +0800
+++ libchewing/data/Makefile.am 2013-02-16 11:43:22.044902827 +0800
@@ -43,7 +43,7 @@
echo "chewing-definition.h exists."; \
fi
$(MAKE) gendata && \
- touch "timestamp" > $@
+ touch "gendata_stamp" > $@
gendata:
$(tooldir)/sort_word$(EXEEXT) $(top_srcdir)/data/phone.cin
When entering "ul " (ㄧㄠ), chewing_zuin_Check() shall return 1, indicating that the zuin (bopomofo) buffer is empty. However, the return value of chewing_zuin_Check() in this case is 0.
See the code: when the preedit buffer contains 1 and the user presses the down key to select a candidate, libchewing calls HaninSymbolInput() to handle it. However, the state transition of isSymbol when calling HaninSymbolInput() is SYMBOL_CATEGORY_CHOICE > SYMBOL_CHOICE_INSERT. The new symbol is inserted before 1 instead of replacing it.
Currently we can use the following methods to input a symbol:
test/test-symbol.c
test/test-easy-symbol.c
test/test-fullshape.c
test/test-special.c
u <CB>a<D>1<E> and u <CB>1<D>1<E>
I think we need to reduce the number of ways to input a symbol. It is too complicated.
QuickCommit and bQuickCommit are everywhere in chewingio.c. They create lots of special cases in different chewing APIs, which causes maintenance problems.
Currently we can only use an environment variable to set the path, which should only be used to set the default path. Users will want to set the path after the library has been loaded. We will need
Steps to reproduce:
cd test
rm uhash.dat
make valgrind-check
==2524== 20 bytes in 3 blocks are definitely lost in loss record 1 of 2
==2524== at 0x4C2380C: calloc (vg_replace_malloc.c:467)
==2524== by 0x402C5BD: AlcUserPhraseSeq (hash.c:30)
==2524== by 0x402D3C6: HashInsert (hash.c:101)
==2524== by 0x402F125: UserUpdatePhrase (userphrase.c:157)
==2524== by 0x402AD21: AutoLearnPhrase (chewingutil.c:675)
==2524== by 0x4028B59: chewing_handle_Enter (chewingio.c:688)
==2524== by 0x401421: main (testchewing.c:214)
==2524==
==2524== 24 bytes in 3 blocks are definitely lost in loss record 2 of 2
==2524== at 0x4C2380C: calloc (vg_replace_malloc.c:467)
==2524== by 0x402C5CD: AlcUserPhraseSeq (hash.c:31)
==2524== by 0x402D3C6: HashInsert (hash.c:101)
==2524== by 0x402F125: UserUpdatePhrase (userphrase.c:157)
==2524== by 0x402AD21: AutoLearnPhrase (chewingutil.c:675)
==2524== by 0x4028B59: chewing_handle_Enter (chewingio.c:688)
==2524== by 0x401421: main (testchewing.c:214)
Copy from TODO
The following are proposed APIs for user dictionary manipulation.
int chewing_userphrase_enumerate(ChewingContext *ctx)
Returns 0 on success, -1 otherwise.
int chewing_userphrase_has_next(ChewingContext *ctx, size_t *phrase_len, size_t *bopomofo_len)
Returns 1 if there is a next user phrase, 0 otherwise. When it returns 1, phrase_len and bopomofo_len store the buffer lengths, including the terminating null, needed by phrase_buf and bopomofo_buf.
int chewing_userphrase_get(ChewingContext *ctx, char *phrase_buf, size_t phrase_len, char *bopomofo_buf, size_t bopomofo_len)
The length of phrase_buf shall be at least phrase_len. The length of bopomofo_buf shall be at least bopomofo_len. Returns 0 on success, -1 otherwise.
int chewing_userphrase_add(ChewingContext *ctx, char *phrase_buf, char *bopomofo_buf)
Returns 0 if the add succeeds, -1 otherwise.
int chewing_userphrase_remove(ChewingContext *ctx, char *phrase_buf, char *bopomofo_buf)
Returns 0 if the removal succeeds, -1 otherwise.
int chewing_userphrase_lookup(ChewingContext *ctx, char *phrase_buf, char *bopomofo_buf)
Returns 1 if the user phrase is present, 0 otherwise.
sort_dic uses "single word phrase" entries in tsi.src to check whether tsi.src has any problems. However, the current tsi.src lacks many single word phrases, so sort_dic generates lots of error messages when it runs. We need to merge all words in phone.cin into tsi.src to stop the error messages.
I haven't looked at the details. The tests are failing on SPARC.
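Environment:
freebsd 10-current
xfce-4.10
ibus-1.4.1 (ibus-chewing-1.4.2)
gcin-2.7.8
The THL patch is applied, but it should be unrelated, because the same thing happens when testing in bopomofo (zhuyin) mode.
At first I can select candidates and output text, but once I press backspace to delete backwards, characters stop appearing and I cannot type anything afterwards until I restart. This happens in both ibus and gcin. For example, for 「測」 I press ㄘ -> ㄜ -> ˋ in order, and right after the fourth tone the input just drops out.
From my testing, this happens after 2ca7235; before that commit there is no problem. I have not tested other input method frameworks, sorry...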
Step to reproduce:
hk4g4<H> 3 1<B>
Expected result:
測
Actual result:
crash
The backtrace shows that we used an invalid address in show_interval_buffer (gen_keystroke.c:140), which might indicate that we got a wrong interval.
Currently state transitions are handled by each function, which is error-prone. The idea is to define two functions, getState() and setState(), to handle state transitions such as English Mode -> Chinese Mode.
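A rough sketch of how getState() and setState() could centralize the transitions. The ExampleMode values, the ExampleState type and the transition table are assumptions made up for illustration; the real set of states and allowed transitions would have to come from the existing code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical set of modes; the real state set may differ. */
typedef enum {
    EXAMPLE_MODE_ENGLISH,
    EXAMPLE_MODE_CHINESE,
    EXAMPLE_MODE_SELECT,
    EXAMPLE_MODE_COUNT
} ExampleMode;

typedef struct {
    ExampleMode mode;
} ExampleState;

/* example_allowed[from][to] != 0 means the transition is permitted (assumed rules). */
static const int example_allowed[EXAMPLE_MODE_COUNT][EXAMPLE_MODE_COUNT] = {
    /* from ENGLISH to: ENGLISH, CHINESE, SELECT */ { 1, 1, 0 },
    /* from CHINESE to: ENGLISH, CHINESE, SELECT */ { 1, 1, 1 },
    /* from SELECT  to: ENGLISH, CHINESE, SELECT */ { 0, 1, 1 },
};

/* Read the current mode. */
ExampleMode getState(const ExampleState *st)
{
    assert(st != NULL);
    return st->mode;
}

/* Move to `next` if the table allows it; return 0 on success, -1 otherwise. */
int setState(ExampleState *st, ExampleMode next)
{
    assert(st != NULL);
    if ((int) next < 0 || (int) next >= EXAMPLE_MODE_COUNT)
        return -1;
    if (!example_allowed[st->mode][next])
        return -1;
    st->mode = next;
    return 0;
}
```

With every transition going through setState(), an illegal jump is rejected in one place instead of being checked separately in each handler.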
This is a mid-level API that could be useful. Currently we can only select among the candidates on the current page, even though we can enumerate all candidates. To fill the gap we should have this API.
Having a phrase dictionary editing tool only on Windows is really inconvenient.
Could one be written with something like PyGTK?
In https://github.com/chewing/libchewing/blob/master/Makefile.am#L64 we should try to use automake _MANS target. The hard part is that the manpages are generated by Doxygen.
Currently, the chiSymbolBuf length is 50, and the maximum chiSymbolBufLen is 49. This means that when the 50th character is input, libchewing first stores it in chiSymbolBuf[49] and later auto-commits it because chiSymbolBufLen exceeds the maximum. However, the easy symbol L contains three characters, Orz. In the worst case, libchewing will store Orz in chiSymbolBuf[49] ~ chiSymbolBuf[51], which causes a buffer overflow.
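The overflow described above comes from appending several characters after checking room for only one. A minimal sketch of an all-or-nothing append, using a simplified hypothetical ExampleBuf in place of the real chiSymbolBuf structure:

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_BUF_CAPACITY 50 /* same length as the buffer described above */

/* Simplified stand-in for the preedit buffer: one slot per character. */
typedef struct {
    int buf[EXAMPLE_BUF_CAPACITY];
    size_t len;
} ExampleBuf;

/*
 * Append n characters only if all of them fit.
 * Returns 0 on success, -1 if the buffer has too little room left.
 */
int example_append(ExampleBuf *b, const int *chars, size_t n)
{
    size_t i;

    assert(b != NULL && b->len <= EXAMPLE_BUF_CAPACITY);
    if (chars == NULL || n > EXAMPLE_BUF_CAPACITY - b->len)
        return -1;

    for (i = 0; i < n; ++i)
        b->buf[b->len + i] = chars[i];
    b->len += n;
    return 0;
}
```

On a -1 result the caller can commit the existing content first and retry, instead of writing past the end of the array.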
This unit test is used to test this issue. Please turn it on when the issue is fixed.
Copy from TODO
The FSF has moved to 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA,
so please update it in COPYING.
Currently we have two build systems (autotools & cmake), and they use different version schemes (current/revision/age vs. major/minor/revision). This difference causes confusion when updating the version for API changes.
Since libtool provides -version-number, which uses the same version scheme as cmake, we can use it to replace -version-info so that autotools & cmake share the same version scheme.
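To make the relation between the two schemes concrete, the sketch below converts major/minor/revision into current/revision/age. It assumes the Linux-style libtool mapping, where the installed file suffix is (current - age).age.revision; other platforms may map the numbers differently. The type and function names are hypothetical.

```c
#include <assert.h>

/* libtool -version-info triple. */
typedef struct {
    int current;
    int revision;
    int age;
} ExampleVersionInfo;

/*
 * Convert major.minor.revision into current:revision:age,
 * assuming the Linux-style rule major = current - age, minor = age.
 */
ExampleVersionInfo example_to_version_info(int major, int minor, int revision)
{
    ExampleVersionInfo v;

    assert(major >= 0 && minor >= 0 && revision >= 0);
    v.current = major + minor;
    v.revision = revision;
    v.age = minor;
    return v;
}
```

Under this assumption, version number 3.1.0 corresponds to version info 4:0:1.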
chewing.py may be needed by users.
See the test case. The user phrase 測是 is added when using Ctrl-2. However, since the content of the preedit buffer is 測…是, the user phrase 測是 should not be added.
The default value of candPerPage is 0, which causes a divide-by-zero exception. For example, the following code will crash:
ChewingContext *ctx = chewing_new();
chewing_set_selKey( ctx, SELECT_KEY, ARRAY_SIZE( SELECT_KEY ) );
chewing_set_maxChiSymbolLen( ctx, 16 );
chewing_handle_Default( ctx, '`' );
In addition, the chewing_set_candPerPage API does not check its input, so the following code will also crash:
chewing_set_candPerPage( ctx, 0 );
chewing_handle_Default( ctx, '`' );
We should set a proper default value for every configuration option and reject any nonsensical configuration passed through an API call.
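A minimal sketch of a validated setter with a sane default. ExampleConfig, the EXAMPLE_* limits and the function names are hypothetical stand-ins rather than the real libchewing types, and the chosen range of 1 to 10 is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed limits, for illustration only. */
#define EXAMPLE_MIN_CAND_PER_PAGE 1
#define EXAMPLE_MAX_CAND_PER_PAGE 10
#define EXAMPLE_DEFAULT_CAND_PER_PAGE 10

typedef struct {
    int candPerPage;
} ExampleConfig;

/* Start from a usable default instead of 0. */
void example_config_init(ExampleConfig *cfg)
{
    assert(cfg != NULL);
    cfg->candPerPage = EXAMPLE_DEFAULT_CAND_PER_PAGE;
}

/* Reject out-of-range values and keep the previous setting; 0 on success, -1 on error. */
int example_set_candPerPage(ExampleConfig *cfg, int n)
{
    if (cfg == NULL)
        return -1;
    if (n < EXAMPLE_MIN_CAND_PER_PAGE || n > EXAMPLE_MAX_CAND_PER_PAGE)
        return -1;
    cfg->candPerPage = n;
    return 0;
}

/* Number of pages needed for nCand candidates; safe because candPerPage > 0. */
int example_page_count(const ExampleConfig *cfg, int nCand)
{
    assert(cfg != NULL && cfg->candPerPage > 0 && nCand >= 0);
    return (nCand + cfg->candPerPage - 1) / cfg->candPerPage;
}
```

Because the setter never stores a value below 1, the division in example_page_count cannot hit zero.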
src/hash.c:249:2: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
src/hash.c:250:2: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
src/hash.c:251:2: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
src/hash.c:252:2: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
We should fix this and convert the binary format to a platform-independent format.
See http://bugs.debian.org/608615 for more information.
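Assuming the warnings above come from reading integer fields through cast pointers into a byte buffer, one common alternative is to encode and decode integers byte by byte in a fixed byte order. This avoids both the aliasing problem and any dependence on host endianness. The helper names below are hypothetical, and little-endian is an arbitrary choice for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v into out[0..3] in little-endian order, independent of the host. */
void example_put_u32_le(unsigned char *out, uint32_t v)
{
    assert(out != NULL);
    out[0] = (unsigned char) (v & 0xffu);
    out[1] = (unsigned char) ((v >> 8) & 0xffu);
    out[2] = (unsigned char) ((v >> 16) & 0xffu);
    out[3] = (unsigned char) ((v >> 24) & 0xffu);
}

/* Load a little-endian 32-bit value from in[0..3] without pointer casts. */
uint32_t example_get_u32_le(const unsigned char *in)
{
    assert(in != NULL);
    return (uint32_t) in[0]
        | ((uint32_t) in[1] << 8)
        | ((uint32_t) in[2] << 16)
        | ((uint32_t) in[3] << 24);
}
```

A file written this way reads back identically on machines with different endianness or alignment requirements.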
Copy from TODO
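左右括號=〔〕【】《》(){}﹙﹚『』﹛﹜﹝﹞<>≦≧﹤﹥「」
Taking this entry (左右括號, left/right brackets) as an example, 「」 should come before 『』. Why is the order reversed?
Also, I personally think ≦≧ could be moved to the end, with a copy also placed under the math symbols category.
Similarly, in
上下括號=︵︶︷︸︹︺︻︼︽︾〈〉︿﹀∩∪﹁﹂﹃﹄
∩∪ should also appear under math symbols.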
Step to reproduce:
cd test && make vcheck
Result:
==21867== HEAP SUMMARY:
==21867== in use at exit: 132 bytes in 6 blocks
==21867== total heap usage: 495 allocs, 489 frees, 112,503 bytes allocated
==21867==
==21867== 16 bytes in 2 blocks are still reachable in loss record 1 of 3
==21867== at 0x4C280A4: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21867== by 0x402F351: ReadHashItem_bin (hash.c:256)
==21867== by 0x402FB45: InitHash (hash.c:564)
==21867== by 0x40292B8: chewing_Init (chewingio.c:165)
==21867== by 0x40146F: main (testchewing.c:174)
==21867==
==21867== 20 bytes in 2 blocks are still reachable in loss record 2 of 3
==21867== at 0x4C280A4: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21867== by 0x402F3A7: ReadHashItem_bin (hash.c:266)
==21867== by 0x402FB45: InitHash (hash.c:564)
==21867== by 0x40292B8: chewing_Init (chewingio.c:165)
==21867== by 0x40146F: main (testchewing.c:174)
==21867==
==21867== 96 bytes in 2 blocks are still reachable in loss record 3 of 3
==21867== at 0x4C280A4: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==21867== by 0x402FAC6: InitHash (hash.c:575)
==21867== by 0x40292B8: chewing_Init (chewingio.c:165)
==21867== by 0x40146F: main (testchewing.c:174)
==21867==
==21867== LEAK SUMMARY:
==21867== definitely lost: 0 bytes in 0 blocks
==21867== indirectly lost: 0 bytes in 0 blocks
==21867== possibly lost: 0 bytes in 0 blocks
==21867== still reachable: 132 bytes in 6 blocks
==21867== suppressed: 0 bytes in 0 blocks
==21867==
==21867== For counts of detected and suppressed errors, rerun with: -v
==21867== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 4)
cd libchewing-0.3.3/python
python test.py
On my amd64 laptop, it crashes with a segmentation fault.
On my i386 server, it outputs nothing.
Both run FreeBSD.
Is it just not working for me, or ...?
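My operating system is openSUSE 12.1, and the input method is ibus-chewing 1.3.10 with ibus-1.4.0 and libchewing 0.3.3.
If the maximum length of the preedit buffer is not an integer multiple of the length of the phrase being typed, then when the buffer fills up and auto-commits, the first one or two characters are dropped.
For example, if the buffer holds 4 Chinese characters and I keep typing 「輸入法輸入法輸入法輸入法...」 without pressing Enter, the final output becomes 「法法法法法...」.
I downloaded libchewing 0.3.2, compiled and installed it, and the problem went away.
I also verified that when ibus-chewing auto-commits and calls chewing_commit_String() in libchewing, it really does return only 「法」 instead of 「輸入法」.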
Report from here.
$ git clone git://github.com/chewing/libchewing.git
$ cd libchewing/
$ cmake .
$ make clean
$ git status -s
D data/pinyin.tab
D data/swkb.dat
D data/symbols.dat
libchewing.texi does not have dircategory and direntry, therefore it