mzsanford / cld

Language Detection based on Chromium's Compact Language Detector library

Home Page: http://mzsanford.com/blog/introducing-libcld

License: BSD 3-Clause "New" or "Revised" License

C++ 90.71% C 4.22% Shell 0.33% Objective-C 0.04% CSS 0.02% Java 0.11% JavaScript 0.04% Python 0.85% Ruby 0.13% Batchfile 0.02% HTML 2.22% Makefile 1.33% M4 0.01%

cld's People

Contributors

baroquebobcat, dimsua, mikemccand, mzsanford, pavel-lexyr, yitzikc

cld's Issues

brew install is broken

brew install https://raw.github.com/mzsanford/homebrew/libcld/Library/Formula/libcld.rb
Error: Non-checksummed download of libcld formula file from an arbitrary URL is unsupported! `brew extract` or `brew create` and `brew tap-new` to create a formula file in a tap on GitHub instead.

so I went with:

wget https://raw.github.com/mzsanford/homebrew/libcld/Library/Formula/libcld.rb
brew install --HEAD -s libcld.rb
Error: libcld: undefined method `md5' for #<Class:0x00007fd4af8f1ef8>

so I removed the md5

brew install --HEAD -s libcld.rb
Warning: Calling bottle :unneeded is deprecated! There is no replacement.
Please report this issue to the danmx/sigil tap (not Homebrew/brew or Homebrew/core):
  /usr/local/Homebrew/Library/Taps/danmx/homebrew-sigil/sigil.rb:10


Error: Failed to load cask: libcld.rb
Cask 'libcld' is unreadable: wrong constant name #<Class:0x00007fee1ea06fe0>
Warning: Treating libcld.rb as a formula.
==> Cloning https://github.com/mzsanford/cld.git
Updating /Users/mgrosser/Library/Caches/Homebrew/libcld--git
==> Checking out branch master
Already on 'master'
Your branch is up to date with 'origin/master'.
HEAD is now at 24586f4 Merge pull request #39 from Ludar-Pavel/master
Error: Your Command Line Tools are too outdated.
Update them from Software Update in System Preferences or run:
  softwareupdate --all --install --force

If that doesn't show you any updates, run:
  sudo rm -rf /Library/Developer/CommandLineTools
  sudo xcode-select --install

Alternatively, manually download them from:
  https://developer.apple.com/download/all/.
You should download the Command Line Tools for Xcode 13.1.

I already have the Command Line Tools and Xcode installed, so I doubt that's the problem 🤷

JNI: unable to compile self-contained library, or a wrapper that works in tandem with the C library

I'm trying to build the Java port of cld on Ubuntu 12.04 64-bit, using OpenJDK 6. The shared object file built by Maven contains the JNI bindings, but the CompactLangDet::DetectLanguage symbol is undefined. I'm a bit of a C++ newbie, so I can't tell whether the intent is for the Maven-built wrapper to be used alongside the make-built libcld, or whether Maven is supposed to generate an "all-in-one" library.

I am encountering a symbol lookup error when running CompactLanguageDetector's main method from the command line, with the Maven-generated shared object being loaded as libcld, like so:

ln -s libcld/0.0.1-SNAPSHOT/libcld-0.0.1-SNAPSHOT.so libcld/0.0.1-SNAPSHOT/libcld.so 
java -Djava.library.path=./libcld/0.0.1-SNAPSHOT/ -cp cld/1.0-SNAPSHOT/cld-1.0-SNAPSHOT.jar com.mzsanford.cld.CompactLanguageDetector "octocat"
java: symbol lookup error: ~/.m2/repository/com/mzsanford/cld/libcld/0.0.1-SNAPSHOT/libcld-0.0.1-SNAPSHOT.so: undefined symbol: _ZN14CompactLangDet14DetectLanguageEPKNS_15DetectionTablesEPKcibbbbS4_i8LanguagePS5_PiPdS7_Pb

I'm setting the JAVA_HOME and LD_LIBRARY_PATH variables via an export statement:

export LD_LIBRARY_PATH="/usr/local/lib/cld:/usr/lib" && export JAVA_JOME="/usr/lib/jvm/java-6-openjdk-amd64" && mvn install

Examining the shared object produced by Maven (libcld-0.0.1-SNAPSHOT.so) using nm -gC shows that the CompactLangDet::DetectLanguage symbol is undefined.

U CompactLangDet::DetectLanguage(CompactLangDet::DetectionTables const*, char const*, int, bool, bool, bool, bool, char const*, int, Language, Language*, int*, double*, int*, bool*)
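
The same check can be scripted. Below is a minimal Python sketch that filters `nm -gC` output for undefined symbols. The helper name `undefined_symbols` and the second sample line (a defined JNI entry point with a made-up address) are illustrative assumptions, not part of the cld sources.

```python
def undefined_symbols(nm_output):
    """Return the names of undefined ('U') symbols found in `nm -gC` output."""
    names = []
    for line in nm_output.splitlines():
        parts = line.strip().split(None, 1)
        # Undefined entries have no address column: "U <symbol name>".
        if len(parts) == 2 and parts[0] == "U":
            names.append(parts[1])
    return names


# Sample input: the first line is the one reported above; the second line is
# hypothetical (the address is invented) and stands for a defined symbol.
SAMPLE = """\
                 U CompactLangDet::DetectLanguage(CompactLangDet::DetectionTables const*, char const*, int, bool, bool, bool, bool, char const*, int, Language, Language*, int*, double*, int*, bool*)
0000000000001a2c T Java_com_mzsanford_cld_CompactLanguageDetector_detectLanguageDetails
"""

for name in undefined_symbols(SAMPLE):
    print("undefined:", name)
```

To run it against a real library, the text could come from something like `subprocess.check_output(["nm", "-gC", path]).decode()`.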

I've also tried writing code that loads the Maven-generated wrapper and then libcld from /usr/local/lib/cld/, and vice versa, but it hasn't helped.

Could someone please tell me:
a) is the Maven-generated artifact supposed to be self-contained? (i.e. provides libcld and the JNI wrappers in the same file)
and
b) am I building incorrectly?

Thanks!

Here's the mvn install output for native:compile and native:link for reference:

[INFO] ------------------------------------------------------------------------
[INFO] Building Compact Language Detector - native unix
[INFO]    task-segment: [install]
[INFO] ------------------------------------------------------------------------
[INFO] [native:initialize {execution: default-initialize}]
[INFO] [native:unzipinc {execution: default-unzipinc}]
[INFO] [native:javah {execution: default-javah}]
[INFO] [native:compile {execution: default-compile}]
[INFO] /bin/sh -c cd ~/workspace/tools/investigations/compact-language-detector/ports/java/native/unix && gcc -c -fPIC -DCLD_WINDOWS -Inull/include/linux -I/usr/local/include/cld -I~/workspace/tools/investigations/compact-language-detector/ports/java/native/src/main/native -I/usr/lib/jvm/java-6-openjdk-amd64/jre/../include -I/usr/lib/jvm/java-6-openjdk-amd64/jre/../include/unix -o ~/workspace/tools/investigations/compact-language-detector/ports/java/native/unix/target/objs/com_mzsanford_cld_CompactLanguageDetector.o -c ~/workspace/tools/investigations/compact-language-detector/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp
[INFO] [native:link {execution: default-link}]
[INFO] /bin/sh -c cd ~/workspace/tools/investigations/compact-language-detector/ports/java/native/unix && gcc -shared -lcld -L/usr/local/lib/cld -o ~/workspace/tools/investigations/compact-language-detector/ports/java/native/unix/target/libcld.so target/objs/com_mzsanford_cld_CompactLanguageDetector.o
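
One possible explanation, not confirmed in this thread: the link command above lists -lcld before the object file. GNU ld processes inputs in command-line order, and when --as-needed is in effect (I believe Ubuntu's toolchain has enabled it by default since 11.10), a library named before any object that references it can be dropped from the output, which would leave DetectLanguage undefined at load time. The Python sketch below only illustrates the "objects first, libraries last" ordering; `link_command` is a hypothetical helper, and how to make the Maven native plugin emit this order is not shown here.

```python
def link_command(output, objects, lib_dirs, libs):
    """Build a gcc argv for a shared library, placing libraries after objects."""
    cmd = ["gcc", "-shared", "-o", output]
    cmd += list(objects)                      # object files first
    cmd += ["-L" + d for d in lib_dirs]       # then library search paths
    cmd += ["-l" + name for name in libs]     # libraries last
    return cmd


OBJECT = "target/objs/com_mzsanford_cld_CompactLanguageDetector.o"
cmd = link_command("target/libcld.so", [OBJECT], ["/usr/local/lib/cld"], ["cld"])
print(" ".join(cmd))
```

If this ordering is the cause, running ldd on the rebuilt wrapper should list the C library among its dependencies.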

Python: Refactor into the 'ports' tree

The original Python port needs to be moved into the ports directory and the build updated to link against the shared libcld. It probably needs documentation and test updates as well.

Cannot build with cygwin

Fresh 32-bit cygwin on Windows 7

$ make
CDPATH="${ZSH_VERSION+.}:" && cd . && aclocal-1.14
cd . && automake-1.14 --gnu
configure.ac:5: warning: AM_INIT_AUTOMAKE: two- and three-arguments forms are deprecated. For more info, see:
configure.ac:5: http://www.gnu.org/software/automake/manual/automake.html#Modernize-AM_005fINIT_005fAUTOMAKE-invocation
configure.ac:8: error: required file './compile' not found
configure.ac:8: 'automake --add-missing' can install 'compile'
configure.ac:5: error: required file './missing' not found
configure.ac:5: 'automake --add-missing' can install 'missing'
Makefile.am:28: warning: source file 'encodings/compact_lang_det/cldutil.cc' is in a subdirectory,
Makefile.am:28: but option 'subdir-objects' is disabled
automake-1.14: warning: possible forward-incompatibility.
automake-1.14: At least a source file is in a subdirectory, but the 'subdir-objects'
automake-1.14: automake option hasn't been enabled. For now, the corresponding output
automake-1.14: object file(s) will be placed in the top-level directory. However,
automake-1.14: this behaviour will change in future Automake versions: they will
automake-1.14: unconditionally cause object files to be placed in the same subdirectory
automake-1.14: of the corresponding sources.
automake-1.14: You are advised to start using 'subdir-objects' option throughout your
automake-1.14: project, to avoid future incompatibilities.

broken on mac arm

autoreconf -f -i -Wall,no-obsolete
make
...
./base/build_config.h:93:2: error: Please add support for your architecture in build/build_config.h
#error Please add support for your architecture in build/build_config.h

Compile python

Hi,
I got an error here as well:

sudo python -u setup.py install
running install
running build
running build_ext
building 'cld' extension
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/cld -I/usr/include/python2.7 -c pycldmodule.cc -o build/temp.linux-x86_64-2.7/pycldmodule.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for Ada/C/ObjC but not for C++ [enabled by default]
pycldmodule.cc:9:57: fatal error: encodings/compact_lang_det/compact_lang_det.h: No such file or directory
compilation terminated.
error: command 'gcc' failed with exit status 1
make: *** [install] Error 1

Any clue?

Compile on Ubuntu

Hi.
Has anyone tried compiling on Ubuntu? When running ./configure I get: configure: error: cannot find install-sh or install.sh in "." "./.." "./../.."

Best

Update README

The README contains incorrect information after the Google Code migration. It needs new build and install information as well as an overview.

Java: Update API

The Java API only supports simple results. A new API is needed for detailed results. Also needed are javadocs and a sane build/install system. This is difficult for JNI but should be possible.

Error when installing python bindings

I was able to build successfully with
source ./build.sh
I verified the build by running
./example
and got this output:

----[ Text (detected: ENGLISH) ]----
confiscation of goods is assigned as the penalty part most of the courts consist of members and when it is necessary to bring public cases before a jury of members two courts combine for the purpose the most important cases of all are brought jurors or
----[ Text (detected: HINDI) ]----
नेपाल एसिया मंज अख मुलुक राजधानी काठमाडौं नेपाल अधिराज्य पेरेग्वाय दक्षिण अमेरिका महाद्वीपे मध् यक्षेत्रे एक देश अस् ति फणीश्वर नाथ रेणु फिजी छु दक्षिण प्रशान् त महासागर मंज अख देश बहामास छु केरेबियन मंज अख मुलुख राजधानी नसौ सम् बद्घ विषय बुरुंडी अफ्रीका महाद्वीपे मध् यक्षेत्रे देश अस् ति सम् बद्घ विषय

However when I try to install the python bindings with:
python -u ports/python/setup.py build
I get this error:

Traceback (most recent call last):
  File "ports/python/setup.py", line 12, in <module>
    **pkgconfig('cld'))
TypeError: __init__() keywords must be strings

I am using Python 2.7.1 and OS X 10.7.3
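
A possible cause, stated as an assumption because setup.py itself is not shown here: if pkgconfig('cld') builds its keyword dictionary by mapping the first two characters of each pkg-config token (-I, -L, -l) to an Extension argument name, then any token it cannot map (for example, words from an error message when pkg-config or cld.pc is missing) ends up under the key None, and passing that dictionary with ** raises exactly this TypeError. The sketch below is a hypothetical parser that skips unmapped tokens; `parse_pkgconfig_flags` and `FLAG_MAP` are names invented for illustration.

```python
# Map pkg-config flag prefixes to distutils Extension keyword names.
FLAG_MAP = {"-I": "include_dirs", "-L": "library_dirs", "-l": "libraries"}


def parse_pkgconfig_flags(output):
    """Turn pkg-config output into Extension keyword arguments.

    Tokens that are not -I/-L/-l flags are skipped, so the result never
    contains a None key.
    """
    kw = {}
    for token in output.split():
        key = FLAG_MAP.get(token[:2])
        if key is None:
            continue  # not a flag we know how to map
        kw.setdefault(key, []).append(token[2:])
    return kw


# Example flag string of the kind pkg-config prints for an installed library.
flags = parse_pkgconfig_flags("-I/usr/local/include/cld -L/usr/local/lib/cld -lcld")
print(flags)

# Text that is not a flag list yields an empty dict instead of a None key.
print(parse_pkgconfig_flags("Package cld was not found"))
```

An empty result would then be a hint that pkg-config could not find the package, which is easier to diagnose than the TypeError.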

Please update /ports/node for compat. with current version

I compared several Language-Detection packages and came to the conclusion that yours might be one of the best for the node community. The only other lib we can rely on is FGRibreau's node-language-detect, which has a similar level of awesomeness and -ok- is in fact updated more often. But I'm pretty sure he has a time machine or something, to get his stuff done. Not fair ;o)

I like yours a little bit more because it's based on Chromium (which is important for me), is >100x faster and is surely more memory efficient than node-language-detect.

I already tried to write my own binding.gyp and to compile it via node-gyp under node v.0.10.x, but without luck.

make: Entering directory `/[...]/v0.10.20/lib/node_modules/cld/build'
  CXX(target) Release/obj.target/languagedetector/languagedetector.o
../languagedetector.cc:213:35: error: ‘eio_req’ has not been declared
../languagedetector.cc:254:30: error: ‘eio_req’ has not been declared
../languagedetector.cc: In static member function ‘static v8::Handle<v8::Value> LanguageDetector::DetectAsync(const v8::Arguments&)’:
../languagedetector.cc:206:26: error: ‘EIO_PRI_DEFAULT’ was not declared in this scope
../languagedetector.cc:206:65: error: ‘eio_custom’ was not declared in this scope
../languagedetector.cc:207:10: error: ‘EV_DEFAULT_UC’ was not declared in this scope
../languagedetector.cc:207:23: error: ‘ev_ref’ was not declared in this scope
../languagedetector.cc: In static member function ‘static void LanguageDetector::EIO_Detect(int*)’:
../languagedetector.cc:215:82: error: request for member ‘data’ in ‘* req’, which is of non-class type ‘int’
../languagedetector.cc: In static member function ‘static int LanguageDetector::EIO_AfterDetect(int*)’:
../languagedetector.cc:257:82: error: request for member ‘data’ in ‘* req’, which is of non-class type ‘int’
../languagedetector.cc:258:14: error: ‘EV_DEFAULT_UC’ was not declared in this scope
../languagedetector.cc:258:27: error: ‘ev_unref’ was not declared in this scope
make: *** [Release/obj.target/languagedetector/languagedetector.o] Error 1

So, if you can afford some minutes, you would make me really happy ^-^

I guess the required steps are:

  • migrate from wscript to binding.gyp (see node-gyp)
  • migrate from libeio to libuv (I found this old howto, but I don't know if it helps...)

Keep up the good work!

jni .so file doesn't work

I'm trying to build the Java port of cld on 64-bit CentOS, running in VirtualBox.
I followed your README.md and completed the setup.
I created a new Java project and moved 'cld/ports/java/java/src' to the new project's '/src',
but it doesn't work.
The error is:

Exception in thread "main" java.lang.UnsatisfiedLinkError: com.mzsanford.cld.CompactLanguageDetector.detectLanguageDetails(Ljava/lang/String;ZZZZLjava/lang/String;)Lcom/mzsanford/cld/LanguageDetectionResult;
    at com.mzsanford.cld.CompactLanguageDetector.detectLanguageDetails(Native Method)
    at com.mzsanford.cld.CompactLanguageDetector.detect(CompactLanguageDetector.java:23)
    at com.mzsanford.cld.CompactLanguageDetector.main(CompactLanguageDetector.java:41)

What should I do?

Ruby: Update API, Gemify and document

The Ruby port needs to be refactored into a usable gem. This includes more robust build options (check for libcld), rdoc, and a refactored API for better detailed response data.

Import cld fails with missing symbol

Hi,

After installing the library with Homebrew on Mac OS X Mavericks I am getting the following error:

import cld
Traceback (most recent call last):
File "", line 1, in
ImportError: dlopen(/Users/toddysm/anaconda/lib/python2.7/site-packages/cld.so, 2): Symbol not found: __Z12LanguageCode8Language
Referenced from: /Users/toddysm/anaconda/lib/python2.7/site-packages/cld.so
Expected in: dynamic lookup

Any ideas?

Java test won't pass

Hey,
I've installed and built the C++ library, and when I run make check, the test successfully passes.

But when I go to ports/java and run mvn test, the test fails (Compact Language Detector - native unix) and it seems like it's because it can't locate the relevant files.
I guess there could be a simple fix, but I'm not really familiar with C++ and ports :(
If anybody can help, that'd be great :)

The error log:
[INFO] --- native-maven-plugin:1.0-alpha-8:compile (default-compile) @ libcld ---
[INFO] /bin/sh -c cd /usr/local/langTest/cld/ports/java/native/unix && gcc -c -fPIC -DCLD_WINDOWS -I/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-0.b15.el6_8.x86_64/include/linux -I/usr/local/include/cld -I/usr/local/langTest/cld/ports/java/native/src/main/native -I/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-0.b15.el6_8.x86_64/jre/../include -I/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-0.b15.el6_8.x86_64/jre/../include/unix -o /usr/local/langTest/cld/ports/java/native/unix/target/objs/com_mzsanford_cld_CompactLanguageDetector.o -c /usr/local/langTest/cld/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp
/usr/local/langTest/cld/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp:7:34: error: cld/compact_lang_det.h: No such file or directory
/usr/local/langTest/cld/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp:8:57: error: cld/encodings/compact_lang_det/ext_lang_enc.h: No such file or directory
/usr/local/langTest/cld/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp:9:58: error: cld/encodings/compact_lang_det/unittest_data.h: No such file or directory
/usr/local/langTest/cld/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp:10:46: error: cld/encodings/proto/encodings.pb.h: No such file or directory
/usr/local/langTest/cld/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp:56: error: ‘Language’ has not been declared
/usr/local/langTest/cld/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp: In function ‘_jobject* mzs_new_language_detection_candidates(JNIEnv*, int*, double*, int*)’:
/usr/local/langTest/cld/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp:62: error: ‘IS_LANGUAGE_UNKNOWN’ was not declared in this scope
/usr/local/langTest/cld/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp:103: error: ‘IS_LANGUAGE_UNKNOWN’ was not declared in this scope
/usr/local/langTest/cld/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp:107: error: ‘LanguageCode’ was not declared in this scope
/usr/local/langTest/cld/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp: At global scope:
/usr/local/langTest/cld/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp:118: error: ‘Language’ has not been declared
/usr/local/langTest/cld/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp:120: error: ‘Language’ has not been declared
/usr/local/langTest/cld/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp: In function ‘_jobject* mzs_new_language_detection_result(JNIEnv*, int, bool, int*, double*, int*)’:
/usr/local/langTest/cld/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp:124: error: ‘LanguageCode’ was not declared in this scope
/usr/local/langTest/cld/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp: In function ‘_jobject* Java_com_mzsanford_cld_CompactLanguageDetector_detectLanguageDetails(JNIEnv*, _jobject*, _jstring*, jboolean, jboolean, jboolean, jboolean, _jstring*)’:
/usr/local/langTest/cld/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp:162: error: ‘Language’ was not declared in this scope
/usr/local/langTest/cld/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp:162: error: expected ‘;’ before ‘plus_one’
/usr/local/langTest/cld/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp:168: error: ‘UNKNOWN_ENCODING’ was not declared in this scope
/usr/local/langTest/cld/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp:169: error: expected ‘;’ before ‘language_hint’
/usr/local/langTest/cld/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp:172: error: expected ‘;’ before ‘language3’
/usr/local/langTest/cld/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp:178: error: expected ‘;’ before ‘lang’
/usr/local/langTest/cld/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp:179: error: ‘lang’ was not declared in this scope
/usr/local/langTest/cld/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp:179: error: ‘CompactLangDet’ has not been declared
/usr/local/langTest/cld/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp:187: error: ‘language_hint’ was not declared in this scope
/usr/local/langTest/cld/ports/java/native/src/main/native/com_mzsanford_cld_CompactLanguageDetector.cpp:188: error: ‘language3’ was not declared in this scope

Can't configure project

config.status: executing libtool commands
sed: can't read ./ltmain.sh: No such file or directory
sed: can't read libtoolT: No such file or directory

vova@Qosmio ~/Projects/cld $ stat ltmain.sh
File: «ltmain.sh» -> «/usr/local/Cellar/libtool/2.4.6_1/share/libtool/build-aux/ltmain.sh»

/usr/local/Cellar/libtool/2.4.6_1/share/libtool/build-aux/ltmain.sh - No such file

cld.ENCODINGS

Hello everyone,

cld works just fine on OS X 10.10 with Python 3.3.
But when I try to look at the languages with print(cld.LANGUAGES), I get this error:

/Library/Frameworks/Python.framework/Versions/3.3/Resources/Python.app/Contents/MacOS/Python: can't open file 'cldLANGUAGES': [Errno 2] No such file or directory

cld.ENCODINGS and cld.DETECTED_LANGUAGES do not work either.

Could someone run these quickly and paste the results for me?
That would be very nice!

Thanks!

Fix python bindings compilation on Windows

Thanks for the nice repackaging. It still needs some work on Windows though. To compile:

  1. open a cmd window
  2. run vcvarsall.bat (FIXME: why is this commented out in build.win.bat?)
  3. run build.win.bat
  4. FIXME: copy libcld.lib to ports/python/cld.lib
  5. FIXME (biggest one): setup.py fails on Windows

The hackish solution to (5) is to make pkgconfig return (on Windows only) the dict {'define_macros': [('WIN32',None)], 'libraries': packages}
Why the hack is necessary: on Windows, pkg-config is not available by default. If you get it, it depends on a DLL. If you get that one as well, it doesn't find the cld.pc file. If you copy that over, it returns the value "-I/usr/local/include/cld -L/usr/local/lib/cld -lcld", which is nonsensical for MSVC. It also fails to define WIN32, without which the whole thing does not compile.
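
A minimal Python 3 sketch of that workaround, under the assumption that setup.py passes the result of pkgconfig(...) straight to Extension as keyword arguments. The `platform` parameter and `FLAG_MAP` are additions for illustration (the parameter only makes the Windows branch easy to exercise on any machine); they are not taken from the actual setup.py.

```python
import subprocess
import sys

# Map pkg-config flag prefixes to distutils Extension keyword names.
FLAG_MAP = {"-I": "include_dirs", "-L": "library_dirs", "-l": "libraries"}


def pkgconfig(*packages, platform=sys.platform):
    """Return Extension keyword arguments for the given packages."""
    if platform == "win32":
        # Windows-only fallback described above: skip pkg-config entirely,
        # define WIN32, and link against the named libraries directly.
        return {"define_macros": [("WIN32", None)], "libraries": list(packages)}
    # Elsewhere, ask pkg-config and keep only the flags we can map.
    output = subprocess.check_output(
        ["pkg-config", "--cflags", "--libs"] + list(packages)
    ).decode()
    kw = {}
    for token in output.split():
        key = FLAG_MAP.get(token[:2])
        if key is not None:
            kw.setdefault(key, []).append(token[2:])
    return kw


print(pkgconfig("cld", platform="win32"))
```

With 'libraries': ['cld'], the linker looks for cld.lib, which is consistent with step 4 (copying libcld.lib to ports/python/cld.lib).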

Do you prefer getting a pull request or making the changes yourself? Thanks.

Bad reference

Line 10 of cld_utf8statetable states

#include "util/utf8/utf8statetable.h"

That file does not exist in the project. It does exist in the CLD2 project. This is why I think people are having linking problems. It looks like some of the new source files slipped into your project.

Installation on OS 10.10 with Python 3.3 failed

I want to install Chromium's Compact Language Detector (cld), https://github.com/mzsanford/cld, for Python.
I am working on OS X 10.10 with Python 3.3, installed from the .dmg download on the Python website.

What did I do so far?

  1. Downloaded and updated Xcode to the most recent version
  2. Installed the Command Line Tools for Xcode with $ xcode-select --install
  3. Downloaded the GNU C and C++ compilers from http://hpc.sourceforge.net and installed them via $ sudo tar xvf gcc-4.9-bin.tar -C /
  4. Installed the libcld C++ library following the instructions on https://github.com/mzsanford/cld (I installed it from source, not with Homebrew)
  5. Installed the pkg-config package: $ git clone git://anongit.freedesktop.org/pkg-config

Then I followed the steps described to install the Python bindings:

$ git clone http://github.com/mzsanford/cld.git

$ cd cld/ports/python

$ make install # This will prompt for your password

Which gave me this error message:

python -u setup.py build
Traceback (most recent call last):
  File "setup.py", line 17, in <module>
    **pkgconfig('cld'))
TypeError: __init__() keywords must be strings
make: *** [build] Error 1

I also downloaded the source and tried to install it with python3.3 setup.py, which gives me this:

pkg-config has not been found but this setup script relies on it.

Well, I installed it, but it seems that the script does not find it.
Could the problem be that it was installed for the Python the system is using (which would be 2.7) while I use Python 3.3? Or are there other suggestions?

CLD unable to detect japanese language

CLD is unable to detect Japanese in the following text:

text = '1/15 HR Div.Q&CS Dept.全体MTG 開催

1月15日(水)、赤溜オーディトリアムにてHR Div.Q&CS Dept.の全体MTGが開催されました。
アジェンダは以下のとおりです。
・Q&CSってそもそも何のための組織だっけ?:夏目通伸さん
・竹市さんより:竹市栄治さん
・製品顧客横断的な動きについて:伊藤秀也さん
・@SUPPORT案件管理について:渡部裕さん

その中から、今回は夏目通伸さんからのお話についてご紹介します。
2014年初めての全体MTGにて、「Q&CSってそもそも何のための組織だっけ?」というタイトルのもと、Q&CSが組織としてやろうとしていること、やるべきことを話されました。 '

Steps to reproduce:

  1. import cld

  2. code = cld.detect(smart_str(text), pickSummaryLanguage=True, removeWeakMatches=False)

  3. Output
    code = ('ENGLISH', 'en', True, 11, [('ENGLISH', 'en', 100, 0.8103727714748784)])

The text also contains Japanese, but it is not detected.
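
For reading the output above, here is a small Python sketch that unpacks the returned tuple into named fields. The field names are an assumption based on my understanding of the Python binding's usual result layout (language name, language code, reliability flag, text bytes found, per-language details of name, code, percent, score); `summarize` is a hypothetical helper.

```python
def summarize(result):
    """Unpack a cld.detect() style result tuple into a dict of named fields."""
    # Field names are assumed, not confirmed by this thread.
    name, code, is_reliable, bytes_found, details = result
    return {
        "name": name,
        "code": code,
        "is_reliable": is_reliable,
        "bytes_found": bytes_found,
        "details": [
            {"name": n, "code": c, "percent": p, "score": s}
            for (n, c, p, s) in details
        ],
    }


# The tuple reported in step 3 above.
reported = ('ENGLISH', 'en', True, 11, [('ENGLISH', 'en', 100, 0.8103727714748784)])
summary = summarize(reported)
print(summary["code"], summary["bytes_found"], len(summary["details"]))
```

If the fourth field really is the number of text bytes the detector examined, 11 is far fewer than the length of the input, so it may be worth checking what bytes actually reach cld.detect.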

Ruby uninitialized constant

It seems that the only real method I can call on the Ruby port of CLD at the moment returns a single enum int (0 for English, etc.).

When trying to initialise the detector I just get the following:

uninitialized constant CLD::Detector (NameError)

/ports/node *.detectSync() fails too often / returns different results than *.detect()

To be honest, I don't use *.detectSync() at the moment, because I don't understand the parameter skipWeakMatches in detail. But it's elegant (it takes about one line of code) and darn fast (50% faster than *.detect() in some cases). So it might be cool to invest a little more dev time into this function.

During my tests I noticed that *.detectSync() behaves completely differently from *.detect(), i.e. it fails a lot more often. It looks like this isn't a problem with special characters or the like, but most likely with the string length.

The following table compares the output of languagedetect's detect() (V1), CLD.detectSync() (V2), and CLD.detect() (V3) for input strings of increasing length:

Chars | V1 | V2 | V3
------- | --- | --- | ---
80 | de | de | de
160 | de | de | de
240 | de | de | de
320 | de | de | de
400 | de | false | de
480 | de | false | de
560 | de | false | de
640 | de | false | de
720 | de | de | de
800 | de | false | de
880 | de | false | de
960 | de | false | de
1040 | de | false | de
1120 | de | false | de
1200 | de | false | de
1280 | de | false | de
1360 | de | false | de
1440 | de | de | de
1520 | de | de | de
1600 | de | de | de
1680 | de | de | de
1760 | de | de | de
1840 | de | de | de
1920 | de | de | de
2000 | de | de | de
2080 | de | de | de
2160 | de | de | de
2240 | de | de | de
2320 | de | de | de
2400 | de | de | de
2480 | de | de | de
2560 | de | de | de
2640 | de | de | de
2720 | de | de | de
2800 | de | de | de
2880 | de | de | de
2960 | de | de | de
3040 | de | de | de

Aaaand some hacky code:

var ASYNC = require('async');
var LANGDETECT = new (require('languagedetect'))('iso2');
var CLD = new (require('cld/cld.node').LanguageDetector)();


//str_repeat('Foo',3) => 'FooFooFoo'
function str_repeat(a,b){ for(var c="";;)if(b&1&&(c+=a),b>>=1)a+=a;else break;return c }


//LANGDETECT.detect()
function detect_v1(str,cb)
{ 
  var res = LANGDETECT.detect(str,1); 

  cb(
    null, 
    res.length && res[0][0] ? res[0][0] : false
  ); 
}

//CLD.detectSync()
function detect_v2(str,cb)
{ 
  var res = CLD.detectSync(str);

  cb(
    null, 
    res && res !== 'un' ? res : false
  ); 
}

//CLD.detect()
function detect_v3(str,cb)
{ 
  CLD.detect(str,function(res)
  { 
    cb(
      null, 
      res && res.languageCode && res.languageCode !== 'un' ? res.languageCode : false
    ); 
  }); 
}

console.log('#Chars  | V1  | V2  | V3 ');   
console.log('------- | --- | --- | ---');   
for(var i=1, i_max=100, str; i <= i_max; ++i)
{
  //str = str_repeat('Hans und Gretel gingen in den Wald, dort war es finster und auch so bitterkalt. ',i);
    str = str_repeat('Alle meine Entchen schwimmen auf dem See, den Kopf dann in das Wasser und blub. ',i);

  (function(str)
  {
    ASYNC.parallel(
    {
      v1: function(cb){ detect_v1(str,cb); },
      v2: function(cb){ detect_v2(str,cb); },
      v3: function(cb){ detect_v3(str,cb); }
    },
    function(err,res)
    {
      console.log(str.length+' | '+res.v1+' | '+res.v2+' | '+res.v3);
    });  
  })(str);
}

All Ports: Benchmark for memory leaks

My C/C++ is really rusty and in many cases I'm not used to the embedding APIs of the various languages. Each binding in the ports directory needs to be tested for memory leaks and other issues.

OS X: configure: error: cannot find install-sh or install.sh in "." "./.." "./../.."

With OS X v10.8.3, following the instructions here: http://mzsanford.com/blog/introducing-libcld

With brew install https://raw.github.com/mzsanford/homebrew/libcld/Library/Formula/libcld.rb I get:

######################################################################## 100.0%
==> Downloading https://github.com/mzsanford/cld/tarball/v0.1.1
Already downloaded: /Library/Caches/Homebrew/libcld-0.1.1.tgz
Warning: MD5 support is deprecated and will be removed in a future version.
Please switch this formula to SHA1 or SHA256.
==> ./configure --prefix=/usr/local/Cellar/libcld/0.1.1
configure: error: cannot find install-sh or install.sh in "." "./.." "./../.."

READ THIS: https://github.com/mxcl/homebrew/wiki/troubleshooting

Or when following the manual instructions, the same error at the ./configure step.

configure: error: cannot find install-sh or install.sh in "." "./.." "./../.."

Any idea what's wrong?
