libthai's People

Contributors

bact, callmeott, markbrown, rossburton, thep

libthai's Issues

configure complains of syntax error

The build environment is the stock one from the Ubuntu 18.04.2 repos.
After cloning this repo and running ./autogen.sh followed by ./configure, this occurs:

checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether gcc understands -c and -o together... yes
checking for style of include used by make... GNU
checking dependency style of gcc... gcc3
checking whether ln -s works... yes
checking whether make sets $(MAKE)... (cached) yes
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking how to print strings... printf
checking for a sed that does not truncate output... /bin/sed
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for fgrep... /bin/grep -F
checking for ld used by gcc... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
checking the name lister (/usr/bin/nm -B) interface... BSD nm
checking the maximum length of command line arguments... 1572864
checking how to convert x86_64-pc-linux-gnu file names to x86_64-pc-linux-gnu format... func_convert_file_noop
checking how to convert x86_64-pc-linux-gnu file names to toolchain format... func_convert_file_noop
checking for /usr/bin/ld option to reload object files... -r
checking for objdump... objdump
checking how to recognize dependent libraries... pass_all
checking for dlltool... dlltool
checking how to associate runtime and link libraries... printf %s\n
checking for ar... ar
checking for archiver @FILE support... @
checking for strip... strip
checking for ranlib... ranlib
checking command to parse /usr/bin/nm -B output from gcc object... ok
checking for sysroot... no
checking for a working dd... /bin/dd
checking how to truncate binary pipes... /bin/dd bs=4096 count=1
checking for mt... mt
checking if mt is a manifest tool... no
checking how to run the C preprocessor... gcc -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for dlfcn.h... yes
checking for objdir... .libs
checking if gcc supports -fno-rtti -fno-exceptions... no
checking for gcc option to produce PIC... -fPIC -DPIC
checking if gcc PIC flag -fPIC -DPIC works... yes
checking if gcc static flag -static works... yes
checking if gcc supports -c -o file.o... yes
checking if gcc supports -c -o file.o... (cached) yes
checking whether the gcc linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking whether -lc should be explicitly linked in... no
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... yes
checking whether linker supports -version-script... yes
./configure: line 12542: syntax error near unexpected token `$DOXYGEN_VER,ge,DOXYGEN_REQ_VER,'
./configure: line 12542: `    AX_COMPARE_VERSION($DOXYGEN_VER,ge,DOXYGEN_REQ_VER,'

This happens whether doxygen is installed or not.

How to convert between UTF-8 and TIS-620

Hi there,

I am looking through the library for a C function, accessible via a public header, that converts from UTF-8 to TIS-620.

Where would I find that in this library?

Thanks.
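
A minimal sketch of the conversion asked about above. To my knowledge, libthai's <thai/thwchar.h> declares th_uni2tis() and th_uni2tis_line() for the code-point-to-TIS-620 step, but the UTF-8 decoding has to happen before that. So that it stays self-contained, the sketch below does not call libthai; both function names in it are hypothetical helpers, and the mapping is the standard one (U+0E01..U+0E3A and U+0E3F..U+0E5B map to 0xA1..0xDA and 0xDF..0xFB).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libthai: maps one Unicode code point
 * to its TIS-620 byte, or returns 0 when there is no mapping. */
unsigned char ucs_to_tis620(unsigned long cp)
{
    if (cp >= 1 && cp < 0x80)
        return (unsigned char)cp;                 /* ASCII is shared */
    if ((cp >= 0x0E01 && cp <= 0x0E3A) || (cp >= 0x0E3F && cp <= 0x0E5B))
        return (unsigned char)(cp - 0x0E00 + 0xA0);
    return 0;
}

/* Hypothetical helper: converts NUL-terminated UTF-8 into NUL-terminated
 * TIS-620. Returns the number of bytes written (excluding the NUL), or
 * (size_t)-1 on malformed input, an unmappable character, or a buffer
 * that is too small. */
size_t utf8_to_tis620(const char *in, unsigned char *out, size_t out_sz)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*p != 0) {
        unsigned long cp, min;
        unsigned char b;
        int extra;

        if (*p < 0x80)                { cp = *p;        extra = 0; min = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; min = 0x80; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; min = 0x800; }
        else return (size_t)-1;   /* stray continuation byte or 4-byte form */
        p++;
        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80)
                return (size_t)-1;                /* truncated sequence */
            cp = (cp << 6) | (unsigned long)(*p & 0x3F);
            p++;
        }
        if (cp < min)
            return (size_t)-1;                    /* overlong encoding */
        b = ucs_to_tis620(cp);
        if (b == 0 || n + 1 >= out_sz)
            return (size_t)-1;
        out[n++] = b;
    }
    if (n >= out_sz)
        return (size_t)-1;
    out[n] = 0;
    return n;
}
```

The reverse direction works the same way with the offset subtracted. On systems whose iconv knows the TIS-620 charset, iconv is another option that avoids hand-written decoding.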

Error running ./configure

Hello, I am trying to build libthai, but ./configure ends with this error:

./configure: line 12540: syntax error near unexpected token `$DOXYGEN_VER,ge,DOXYGEN_REQ_VER,'
./configure: line 12540: `    AX_COMPARE_VERSION($DOXYGEN_VER,ge,DOXYGEN_REQ_VER,'

I have tried setting the $DOXYGEN_VER environment variable, but it doesn't help.

New word suggestions

This issue is dedicated to suggestions for words to add to the LibThai word-break dictionary and will never be closed. Please feel free to suggest words by adding comments. Thank you.

Requesting permission to use libthai in software from CodeHard Co., Ltd.

I am from CodeHard Co., Ltd. We are interested in using libthai
in a closed-source commercial project, but because the license is LGPL 2.1 we are unable to use it.
We therefore ask the TLWG team for permission, as a special case, to use it.

Respectfully,
รุ่งวิรุณ โกมลิทธิพงศ์
CTO
CodeHard co., ltd

Error while running ./configure

I am getting an error while running configure:

./configure
checking for a BSD-compatible install... /bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking for style of include used by make... GNU
checking dependency style of gcc... gcc3
checking whether ln -s works... yes
checking whether make sets $(MAKE)... (cached) yes
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking how to print strings... printf
checking for a sed that does not truncate output... /bin/sed
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for fgrep... /bin/grep -F
checking for ld used by gcc... /bin/ld
checking if the linker (/bin/ld) is GNU ld... yes
checking for BSD- or MS-compatible name lister (nm)... /bin/nm -B
checking the name lister (/bin/nm -B) interface... BSD nm
checking the maximum length of command line arguments... 1572864
checking whether the shell understands some XSI constructs... yes
checking whether the shell understands "+="... yes
checking how to convert x86_64-unknown-linux-gnu file names to x86_64-unknown-linux-gnu format... func_convert_file_noop
checking how to convert x86_64-unknown-linux-gnu file names to toolchain format... func_convert_file_noop
checking for /bin/ld option to reload object files... -r
checking for objdump... objdump
checking how to recognize dependent libraries... pass_all
checking for dlltool... dlltool
checking how to associate runtime and link libraries... printf %s\n
checking for ar... ar
checking for archiver @file support... @
checking for strip... strip
checking for ranlib... ranlib
checking command to parse /bin/nm -B output from gcc object... ok
checking for sysroot... no
checking for mt... no
checking if : is a manifest tool... no
checking how to run the C preprocessor... gcc -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for dlfcn.h... yes
checking for objdir... .libs
checking if gcc supports -fno-rtti -fno-exceptions... no
checking for gcc option to produce PIC... -fPIC -DPIC
checking if gcc PIC flag -fPIC -DPIC works... yes
checking if gcc static flag -static works... no
checking if gcc supports -c -o file.o... yes
checking if gcc supports -c -o file.o... (cached) yes
checking whether the gcc linker (/bin/ld -m elf_x86_64) supports shared libraries... yes
checking whether -lc should be explicitly linked in... no
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... yes
checking whether linker supports -version-script... yes
./configure: line 11922: syntax error near unexpected token `$DOXYGEN_VER,ge,DOXYGEN_REQ_VER,'
./configure: line 11922: `    AX_COMPARE_VERSION($DOXYGEN_VER,ge,DOXYGEN_REQ_VER,'

After entering 'ๅ', some keys can't be committed properly.

Hello,

I use libthai and scim-thai to enter Thai text, and I found some issues while testing Thai input.
'ๅ' can't be entered consecutively, and once 'ๅ' has been entered, some text that includes 'ๅ' can't be entered.

When text is entered normally, th_validate(), as called from scim-thai, returns 1; when the problem occurs, it returns 0.
The issue does not occur with other characters, only with 'ๅ'.
It is difficult for me to tell whether this is the intended behavior. I'd appreciate it if you could answer my question.

Regards
Inhong Han

Crashing on Windows

libthai hardcodes the path to thbrk.tri at compile-time, which doesn't work on Windows, since the file is normally installed in a different prefix with whatever program depends on libthai. A workaround is to set LIBTHAI_DICTDIR, but this shouldn't be necessary - the library should be able to resolve the path to thbrk.tri from the libthai-0.dll location automatically, and should handle missing thbrk.tri without crashing.

We're seeing crashes in GIMP due to this problem.
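
A minimal sketch of the lookup order described above, with a missing file reported as NULL instead of a crash. LIBTHAI_DICTDIR and thbrk.tri come from the report; DEFAULT_DICT_DIR and both function names are assumptions made for illustration and are not libthai's actual code. Resolving the directory from the DLL location on Windows is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed compile-time default; a real build would get this from its
 * configured data directory. */
#define DEFAULT_DICT_DIR "/usr/share/libthai"

/* Hypothetical helper: builds "<dir>/thbrk.tri" into out. env_dir is the
 * value of LIBTHAI_DICTDIR (may be NULL or empty). Returns 0 on success,
 * -1 if the path does not fit into out_sz bytes. */
int dict_path(const char *env_dir, char *out, size_t out_sz)
{
    const char *dir = env_dir;
    int n;

    assert(out != NULL);
    if (dir == NULL || dir[0] == '\0')
        dir = DEFAULT_DICT_DIR;       /* fall back to the built-in prefix */
    n = snprintf(out, out_sz, "%s/thbrk.tri", dir);
    return (n < 0 || (size_t)n >= out_sz) ? -1 : 0;
}

/* Hypothetical helper: opens the dictionary, returning NULL when it is
 * missing so that the caller can degrade gracefully. */
FILE *open_dict(const char *env_dir)
{
    char path[4096];

    if (dict_path(env_dir, path, sizeof path) != 0)
        return NULL;
    return fopen(path, "rb");
}
```

A caller would pass getenv("LIBTHAI_DICTDIR") as env_dir and treat a NULL result as "no dictionary available", for example by skipping dictionary-based word breaking.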

brk_shot_reuse doesn't check the return from realloc()

The code in brk_shot_reuse() doesn't check the return value of realloc() and assigns it directly to the pointer variable being extended. This is bad for two reasons:

  • if realloc() fails, the old value of the pointer is overwritten with NULL, making it impossible to free the memory again (= memleak)
  • the code that follows then uses the NULL pointer and will just crash

Since the rest of the code handles malloc() failures, I assume this is an oversight.
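
The usual fix can be sketched as follows: keep the realloc() result in a temporary and only overwrite the original pointer on success. IntBuf and int_buf_reserve() are hypothetical stand-ins for illustration, not libthai's actual types or functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable buffer, standing in for whatever structure
 * brk_shot_reuse() extends. */
typedef struct {
    int    *items;
    size_t  capacity;   /* number of ints that items can hold */
} IntBuf;

/* Grows buf so that it can hold at least `needed` items.
 * Returns 0 on success and -1 on failure; on failure buf is unchanged,
 * so the caller can still use or free the old memory. */
int int_buf_reserve(IntBuf *buf, size_t needed)
{
    int *grown;

    assert(buf != NULL);
    if (needed <= buf->capacity)
        return 0;                          /* already large enough */
    if (needed > (size_t)-1 / sizeof *grown)
        return -1;                         /* byte count would overflow */

    grown = realloc(buf->items, needed * sizeof *grown);
    if (grown == NULL)
        return -1;                         /* buf->items is still valid */

    buf->items = grown;
    buf->capacity = needed;
    return 0;
}
```

Because the old pointer survives a failed call, the caller can propagate the error the same way the surrounding code already does for malloc() failures.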

th_brk / th_brk_find_breaks are limited to ≤ 2Gi characters

The th_brk() and th_brk_find_breaks() functions take the input string size as size_t, but return the results in an int array, effectively limiting possible results to the first INT_MAX characters. Users of these functions must therefore either ensure that the input never exceeds 2Gi characters, or find a way to call the function in a loop over chunks of no more than INT_MAX characters. Whether that is possible, and how it should be done, is far from obvious (at least to this developer, who doesn't have a clue about libthai but needs to fix 64-bit issues).

Suggestion: widen the result array to size_t, and document how to loop for existing versions.

Lots of th_brk_new() calls

Hi, I am currently writing a program using Pango, which uses libthai internally. While debugging my own program, I noticed that a lot of time is spent in the function trie_new_from_file(), which is called from th_brk_new(). This function is called every time a Pango layout is rendered, which leads to a significant slowdown.

I think the reason is here:
https://github.com/tlwg/libthai/blob/master/src/thbrk/thbrk.c#L342

The variable is_tried is checked, but it is never set. Is it possible that someone forgot to add an is_tried = true;?
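
The pattern the report seems to expect can be sketched like this: load once, remember that the attempt was made, and reuse the result afterwards. Only the flag name is_tried comes from the report; Dict, load_dict(), get_shared_dict() and get_load_count() are hypothetical names for illustration, and this is not a copy of libthai's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dictionary type standing in for the loaded trie. */
typedef struct { int placeholder; } Dict;

static Dict  dict_storage;
static Dict *shared_dict = NULL;
static int   is_tried    = 0;   /* has a load been attempted yet? */
static int   load_count  = 0;   /* counts the expensive loads */

/* Stand-in for an expensive load such as reading a trie from disk. */
static Dict *load_dict(void)
{
    load_count++;
    return &dict_storage;
}

/* Returns the shared dictionary, loading it on the first call only. */
Dict *get_shared_dict(void)
{
    if (!is_tried) {
        shared_dict = load_dict();
        is_tried = 1;           /* record the attempt, even if it failed */
    }
    return shared_dict;
}

/* Exposes the counter so that the caching can be observed. */
int get_load_count(void)
{
    return load_count;
}
```

Setting the flag even when the load fails avoids retrying a broken load on every call. The sketch is not thread-safe; a real implementation would need a lock or a once-style primitive around the first load.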

portable win32 installation issues

Hello,

I'm working on getting libthai to work with win32 Strawberry Perl, and there's a problem that I'd like to ask how best to address.
The problem is that the Strawberry installation can be put in any path, and this conflicts with libthai looking for thbrk.tri in a path determined during the build (not counting LIBTHAI_DICTDIR). I wonder if you would be willing to discuss ways to adapt libthai so that its binaries could also be installed in arbitrary paths?

A crude temporary hack that I made specifically for the libthai win32 portable install that I roll out is to use win32-specific ways to find the install path of libthai.dll, for example c:/usr/bin, and to install thbrk.tri there as well. This won't work for static builds, and I'm also not proud of installing a non-executable file in $PREFIX/bin, so I'm not proposing this fix as a patch.

Ideally, I would like to see a solution that embeds either the .tri file or its memory dump in an .o, so that it is loaded automatically whether libthai is built as a .dll or an .a file. I can work this out in a fork if it gets your blessing. Anyway, if you are interested in this, let's discuss.

Regards
Dmitry

Some issues reported by Coverity Scan

Here are some reports from Coverity Scan for libthai-0.1.28.

1. Defect type: GCC_ANALYZER_WARNING
1. libthai-0.1.28/src/thbrk/brk-maximal.c:0: scope_hint: In function 'best_brk_new'
2. libthai-0.1.28/src/thbrk/brk-maximal.c:642:5: warning[-Wanalyzer-malloc-leak]: leak of '<unknown>'
#   640|   
#   641|   exit1:
#   642|->     free (best_brk);
#   643|       return NULL;
#   644|   }
2. Defect type: GCC_ANALYZER_WARNING
1. libthai-0.1.28/src/thbrk/brk-maximal.c:598:16: warning[-Wanalyzer-null-dereference]: dereference of NULL 'node'
18. libthai-0.1.28/src/thbrk/brk-maximal.c:36: included_from: Included from here.
20. libthai-0.1.28/src/thbrk/thbrk-utils.h:32:46: note: in definition of macro 'UNLIKELY'
22. libthai-0.1.28/src/thbrk/thbrk-utils.h:32:46: note: in definition of macro 'UNLIKELY'
26. libthai-0.1.28/src/thbrk/brk-maximal.c:36: included_from: Included from here.
28. libthai-0.1.28/src/thbrk/thbrk-utils.h:32:46: note: in definition of macro 'UNLIKELY'
51. libthai-0.1.28/src/thbrk/brk-maximal.c:31: included_from: Included from here.
#   596|   brk_pool_add (BrkPool *pool, BrkPool *node)
#   597|   {
#   598|->     node->next = pool;
#   599|       return node;
#   600|   }
3. Defect type: GCC_ANALYZER_WARNING
1. libthai-0.1.28/src/thbrk/brk-maximal.c:0: scope_hint: In function 'brk_recover_try'
2. libthai-0.1.28/src/thbrk/brk-maximal.c:598:16: warning[-Wanalyzer-malloc-leak]: leak of '<unknown>'
19. libthai-0.1.28/src/thbrk/brk-maximal.c:36: included_from: Included from here.
21. libthai-0.1.28/src/thbrk/thbrk-utils.h:32:46: note: in definition of macro 'UNLIKELY'
23. libthai-0.1.28/src/thbrk/thbrk-utils.h:32:46: note: in definition of macro 'UNLIKELY'
29. libthai-0.1.28/src/thbrk/brk-maximal.c:36: included_from: Included from here.
31. libthai-0.1.28/src/thbrk/thbrk-utils.h:32:46: note: in definition of macro 'UNLIKELY'
53. libthai-0.1.28/src/thbrk/brk-maximal.c:31: included_from: Included from here.
#   596|   brk_pool_add (BrkPool *pool, BrkPool *node)
#   597|   {
#   598|->     node->next = pool;
#   599|       return node;
#   600|   }

Do the defects above indicate real issues, or are they just false alarms?

Support for other/external word breaking backends?

I am working on porting the neural-network-based word cutting tool deepcut to C++ and using it as a word breaking backend in libthai.
I have successfully integrated, built and tested it, but the binary size went up from 50 KB to over 20 MB, so it can't be built in.
Is there any plan to support external backends? I can do it if you want.
Alternatively, I can create a new project specifically for my backend, using this one as upstream.
