RcppMeCab: Rcpp Interface of CJK Morpheme Analyzer MeCab
Hello! I tried to install the 'RcppMeCab' package, but the installation failed. I have already installed the 'RMeCab' package. When I ran install.packages("RcppMeCab"), the following output was returned:
Package which is only available in source form, and may need compilation of
C/C++/Fortran: ‘RcppMeCab’
Do you want to attempt to install these from sources? (Yes/no/cancel) y
installing the source package ‘RcppMeCab’
downloaded 22 KB
mecab-config --cflags -fPIC -Wall -g -O2 -c RcppExports.cpp -o RcppExports.o
mecab-config --cflags -fPIC -Wall -g -O2 -c posLoopRcpp.cpp -o posLoopRcpp.o
mecab-config --cflags -fPIC -Wall -g -O2 -c posParallelRcpp.cpp -o posParallelRcpp.o
mecab-config --cflags -fPIC -Wall -g -O2 -c posRcpp.cpp -o posRcpp.o
The downloaded source packages are in
‘/private/var/folders/hg/q395slr53q5_6g6xcwy8cxnw0000gn/T/Rtmp6QRLLL/downloaded_packages’
My environment information is
R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.5
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RMeCab_1.00 devtools_1.13.5
loaded via a namespace (and not attached):
[1] httr_1.3.1 compiler_3.5.0 rvcheck_0.1.0 R6_2.2.2 tools_3.5.0 withr_2.1.2
[7] curl_3.2 memoise_1.1.0 git2r_0.21.0 digest_0.6.15
Can you help me to fix the problem?
I often analyze different languages in the same project. Is there a way to specify which language model to use (Japanese or Korean)? I can do this by changing sys_dic, but it would be easier if pos() had a lang argument that switched models internally.
By the way I started advertising your package: https://koheiw.net/wp-content/uploads/2018/07/Asian-text-analysis.pdf
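Until something like a lang argument exists, the request above could be approximated with a thin wrapper that maps a language code to a sys_dic path. This is only a sketch: pos_lang(), dic_paths, the tagger parameter, and both dictionary paths are hypothetical and not part of RcppMeCab; only pos() and its sys_dic argument come from the package.

```r
# Hypothetical dictionary locations; adjust them to the local MeCab setup.
dic_paths <- c(
  ja = "/usr/local/lib/mecab/dic/ipadic",
  ko = "/usr/local/lib/mecab/dic/mecab-ko-dic"
)

# Hypothetical wrapper: picks the system dictionary from `lang` and forwards
# the call. `tagger` is injectable so the wrapper can be tried without MeCab.
pos_lang <- function(sentence, lang = c("ja", "ko"), tagger = NULL, ...) {
  lang <- match.arg(lang)
  if (is.null(tagger)) tagger <- RcppMeCab::pos
  tagger(sentence, sys_dic = dic_paths[[lang]], ...)
}

# Stand-in tagger that only reports which dictionary it was given.
echo_dic <- function(sentence, sys_dic, ...) sys_dic
chosen <- pos_lang("text", lang = "ko", tagger = echo_dic)
print(chosen)
```

With RcppMeCab installed and the paths corrected, a call such as pos_lang(x, "ja") would forward to pos() with the Japanese dictionary.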
Hello,
I had been using RcppMeCab without problems, but an error has occurred ever since I reformatted my PC and reinstalled the latest versions of R and RStudio.
When pos() is called with the following code,
test <- "이것은 비타500입니다."
pos(iconv(test, to = "UTF-8"))
I get the following error message:
Exception:
list()
Here are the versions of R and RStudio that I am using.
R version: 4.0.3
RStudio version: 1.4.1103
On Windows only, the package is compiled against a pre-packaged libmecab.dll.
This approach now works well on 64-bit Windows with MeCab for Japanese.
However, it still has trouble on Windows with Korean MeCab.
When I tried RcppMeCab::pos with Korean MeCab and mecab-ko-dic on my local machine, it raised an Exception and returned nothing.
R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: powerpc-apple-darwin10.8.0 (32-bit)
> library(testthat)
> library(RcppMeCab)
>
> test_check("RcppMeCab")
[ FAIL 1 | WARN 0 | SKIP 1 | PASS 0 ]
══ Skipped tests ═══════════════════════════════════════════════════════════════
• Skip testing on Japanese (1)
══ Failed tests ════════════════════════════════════════════════════════════════
── Failure ('test_pos_ko.R:4:3'): Test pos tagger works on ko ──────────────────
...[] not equal to enc2utf8("저").
1/1 mismatches
x[1]: "\xb0\xac"
y[1]: "저"
[ FAIL 1 | WARN 0 | SKIP 1 | PASS 0 ]
Error: Test failures
Execution halted
Version info:
mecab: 0.996_0+ipadic
RcppMeCab: 0.0.1.3-2
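To narrow down a mismatch like the one in the failed test above, it may help to compare the raw bytes of the expected and observed strings. The sketch below uses only base R; the observed bytes are copied from the log, and it makes no claim about which encoding or code path produced them.

```r
# Expected token from the test, written as a Unicode escape (U+C800).
expected <- "\uc800"
# Observed value, copied byte for byte from the failure log.
observed <- "\xb0\xac"

# The expected string is three bytes in UTF-8.
print(charToRaw(expected))   # ec a0 80
# The observed string is only two bytes.
print(charToRaw(observed))   # b0 ac

# validUTF8() reports whether each string is a well-formed UTF-8 sequence.
expected_ok <- validUTF8(expected)
observed_ok <- validUTF8(observed)
print(c(expected = expected_ok, observed = observed_ok))
```

If the observed bytes are not valid UTF-8, the tagger output was probably produced or decoded in a different encoding than the test expects, which is one place to start looking.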
> tar <- "안녕하세요 저는 박찬엽 입니다."
> Encoding(tar)
[1] "UTF-8"
> pos(tar, sys_dic = "c:\\mecab")
Exception:
list()
However, when format = "data.frame" is used, the result is returned correctly.
pos(tar, format = "data.frame")
  doc_id sentence_id token_id  token    pos subtype
1      1           1        1   안녕    NNG    행위
2      1           1        2     하    XSV
3      1           1        3   세요  EP+EF
4      1           1        4     저     NP
5      1           1        5     는     JX
6      1           1        6 박찬엽    NNP    인명
7      1           1        7 입니다 VCP+EF
8      1           1        8      .     SF
My sessionInfo() output is below:
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=Korean_Korea.949 LC_CTYPE=Korean_Korea.949 LC_MONETARY=Korean_Korea.949
[4] LC_NUMERIC=C LC_TIME=Korean_Korea.949
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RcppMeCab_0.0.1.2
loaded via a namespace (and not attached):
[1] compiler_3.5.1 tools_3.5.1 Rcpp_0.12.18 RcppParallel_4.4.1
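Since format = "data.frame" returned a usable result in the report above, one possible workaround is to rebuild a per-document list of "token/tag" strings from that data frame. The sketch below is an assumption-laden illustration: as_joined_list() is a hypothetical helper, the data frame is typed in by hand from the output shown above rather than produced by pos(), and the "token/tag" shape is only assumed to resemble the list format.

```r
# Hand-copied subset of the columns from the data.frame output above.
tagged <- data.frame(
  doc_id = rep(1L, 8),
  token  = c("안녕", "하", "세요", "저", "는", "박찬엽", "입니다", "."),
  pos    = c("NNG", "XSV", "EP+EF", "NP", "JX", "NNP", "VCP+EF", "SF"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: join each token with its tag, then group by document.
as_joined_list <- function(df) {
  split(paste(df$token, df$pos, sep = "/"), df$doc_id)
}

res <- as_joined_list(tagged)
print(res)
```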
It's now on CRAN: https://CRAN.R-project.org/package=RcppMeCab
Need to check RcppMeCab results using Japanese.
@koheiw, Could you help on this? Any ideas?
pos() returns a data.frame, but its columns are all integers. I think it should have columns in the following format:
doc_id: factor
sentence_id: integer
token_id: integer
token: character
subtype: character
analytic: character
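The proposed column types can be written down as a small check. This is a sketch in base R only: expected_types and check_pos_columns() are hypothetical names, and the example rows are made up for illustration rather than taken from real pos() output.

```r
# Column types proposed in the list above.
expected_types <- c(
  doc_id      = "factor",
  sentence_id = "integer",
  token_id    = "integer",
  token       = "character",
  subtype     = "character",
  analytic    = "character"
)

# Hypothetical helper: TRUE for each column whose class matches the proposal.
check_pos_columns <- function(df) {
  actual <- vapply(df[names(expected_types)],
                   function(col) class(col)[1], character(1))
  actual == expected_types
}

# Illustrative data frame that follows the proposed format.
good <- data.frame(
  doc_id = factor("1"), sentence_id = 1L, token_id = 1L,
  token = "안녕", subtype = "행위", analytic = NA_character_,
  stringsAsFactors = FALSE
)
# Illustrative data frame where every column is an integer, as reported.
bad <- data.frame(
  doc_id = 1L, sentence_id = 1L, token_id = 1L,
  token = 1L, subtype = 1L, analytic = 1L
)

print(check_pos_columns(good))
print(check_pos_columns(bad))
```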
Hi @junhewk, I started using your package in an actual project, and I wanted to address #6 via a PR. However, I cannot build the package on my system because of an error when linking to MeCab. I have seen this kind of error in my own packages but could not solve it. Any clue?
==> Rcpp::compileAttributes()
* Updated R/RcppExports.R
==> R CMD INSTALL --no-multiarch --with-keep.source RcppMeCab
* installing to library ‘/home/kohei/R/x86_64-pc-linux-gnu-library/3.6’
* installing *source* package ‘RcppMeCab’ ...
** using staged installation
make: Nothing to be done for 'all'.
** libs
installing to /home/kohei/R/x86_64-pc-linux-gnu-library/3.6/00LOCK-RcppMeCab/00new/RcppMeCab/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
Error: package or namespace load failed for ‘RcppMeCab’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/kohei/R/x86_64-pc-linux-gnu-library/3.6/00LOCK-RcppMeCab/00new/RcppMeCab/libs/RcppMeCab.so':
/home/kohei/R/x86_64-pc-linux-gnu-library/3.6/00LOCK-RcppMeCab/00new/RcppMeCab/libs/RcppMeCab.so: undefined symbol: mecab_strerror
Error: loading failed
Execution halted
ERROR: loading failed
* removing ‘/home/kohei/R/x86_64-pc-linux-gnu-library/3.6/RcppMeCab’
* restoring previous ‘/home/kohei/R/x86_64-pc-linux-gnu-library/3.6/RcppMeCab’
> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: KDE neon User Edition 5.15
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.6.0 tools_3.6.0
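An undefined symbol such as mecab_strerror at load time usually means the shared object was built without being linked against libmecab, so a first check could be whether mecab-config is on the PATH and which linker flags it reports. The sketch below assumes nothing about how the package's build files use those flags; mecab_link_flags() is a hypothetical helper, and only the mecab-config --libs option is taken from MeCab itself.

```r
# Hypothetical helper: returns the linker flags reported by mecab-config,
# or NA when mecab-config cannot be found on the PATH.
mecab_link_flags <- function() {
  mecab_config <- unname(Sys.which("mecab-config"))
  if (!nzchar(mecab_config)) return(NA_character_)
  paste(system2(mecab_config, "--libs", stdout = TRUE), collapse = " ")
}

flags <- mecab_link_flags()
if (is.na(flags)) {
  message("mecab-config was not found on the PATH")
} else {
  message("mecab-config --libs reports: ", flags)
}
```

If the flags are reported but the load still fails, the next thing to inspect would be whether they actually reach the link step of the package build.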