haven-jeon / KoNLP
R package for Korean NLP
Home Page: http://cran.r-project.org/web/packages/KoNLP/index.html
Users can add extra terms to the user dictionary, so a reload function is needed that refreshes the dictionary without a full reload via "library(KoNLP)".
When running library(KoNLP), I get:
R Session Aborted
R encountered a fatal error.
version
_
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 3
minor 4.2
year 2017
month 09
day 28
svn rev 73368
language R
version.string R version 3.4.2 (2017-09-28)
nickname Short Summer
######################################
Even after completely removing R, RStudio, and all installed packages and reinstalling, the same error occurs.
Hi,
I am getting this kind of error while processing a huge vector (1 million texts):
java.lang.ArrayIndexOutOfBoundsException: 10000
at kr.ac.kaist.swrc.jhannanum.plugin.MajorPlugin.PosTagger.HmmPosTagger.KoNLPHMMTagger.new_mnode(KoNLPHMMTagger.java:349)
at kr.ac.kaist.swrc.jhannanum.plugin.MajorPlugin.PosTagger.HmmPosTagger.KoNLPHMMTagger.tagPOS(KoNLPHMMTagger.java:140)
at kr.ac.kaist.swrc.jhannanum.hannanum.Workflow.analyzeInSingleThread(Workflow.java:870)
at kr.ac.kaist.swrc.jhannanum.hannanum.Workflow.analyze(Workflow.java:534)
at kr.pe.freesearch.jhannanum.comm.HannanumInterface.extractNoun(HannanumInterface.java:141)
Is there any way to fix this problem?
KAIST :
https://github.com/haven-jeon/KoNLP/wiki/KoNLP-examples
A function is needed to import entries from the Sejong dictionary into the Hannanum analyzer dictionary.
The documentation is currently based on Roxygen;
it needs to be converted to Roxygen2 Rd.
is.jamo() requires UTF-8 input, so it needs to check whether the input is actually UTF-8.
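A minimal sketch of such a guard, using base R's validUTF8() (available since R 3.3); assert_utf8 is an illustrative helper name, not part of KoNLP:

```r
# Reject input that is not valid UTF-8 before any jamo checks.
# assert_utf8 is a hypothetical helper, not a KoNLP function.
assert_utf8 <- function(x) {
  if (!all(validUTF8(x))) {
    stop("is.jamo() requires UTF-8 encoded input")
  }
  enc2utf8(x)  # normalize the declared encoding to UTF-8
}
```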
Felix Song
Thank you for creating the KoNLP package. :)
A question came up while using KoNLP for text-mining work, so I am writing it here.
Thank you for reading!
> SimplePos09('공보관통상진흥국장전자공업국장무역조사실장제 1차관보')
java.lang.ArrayIndexOutOfBoundsException
Error in `Encoding<-`(`*tmp*`, value = "UTF-8") :
a character vector argument expected
Hello. I was impressed by the recent update and am using it very well. Thank you once again.
In my case I mostly handle reports from government agencies and research institutes, and morphological analysis on such material often runs into problems. Many of these reports are written in outline style and omit terminal punctuation. When I extract nouns with extractNoun from outline-style content vectors that lack punctuation, it often returns wrong results.
Analyzing the symptom pattern: when punctuation is present, or when the morphemes of an eojeol (word segment) are registered in the dictionary, the correct result is extracted. But a final eojeol that is not in the dictionary is forcibly extracted with its last syllable split off. For example, "힣힣힣" is split into "힣힣" and "힣".
In my case I work around this by appending punctuation to the end of each vector element, but with big data this adds meaningless extra load.
Could the cause be identified and fixed?
useNIADic()
txt.vt0 <- "저는 유능한 돌팔이입니다."
txt.vt1 <- "나는 유능한 연구원"
txt.vt2 <- "그동안 많은 일들을 해왔습니다."
txt.vt3 <- "그동안 많은 일들을 해왔음."
txt.vt4 <- "그동안 많은 일들을 해왔음"
txt.vt5 <- "그동안 많은 일들을 힣힣힣"
txt.vt6 <- "그동안 많은 일들을 힣힣힣."
extractNoun(txt.vt0)
[1] "저" "유능" "한" "돌팔이"
extractNoun(txt.vt1)
[1] "나" "유능" "한" "연구"
extractNoun(txt.vt2)
[1] "그동안" "일" "들"
extractNoun(txt.vt3)
[1] "그동안" "일" "들"
extractNoun(txt.vt4)
[1] "그동안" "일" "들" "해왔" "음"
extractNoun(txt.vt5)
[1] "그동안" "일" "들" "힣힣" "힣"
extractNoun(txt.vt6)
[1] "그동안" "일" "들" "힣힣힣"
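The reporter's workaround (appending punctuation before calling extractNoun) can be sketched as a small helper; ensure_terminal_punct is an illustrative name, not part of KoNLP:

```r
# Append a period to elements that lack terminal punctuation, so the
# tagger treats the final eojeol as sentence-final (hypothetical helper).
ensure_terminal_punct <- function(x) {
  ifelse(grepl("[.!?]$", x), x, paste0(x, "."))
}
```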
Hello.
I am really enjoying the package you made. ^^
I built a Shiny app that generates a word cloud and tried to deploy it to shinyapps.io
so my team could use it together, but deployment fails with the error below.
I asked RStudio about it, and they replied that the package developer needs to fix the error.
I would greatly appreciate it if you could take a look. ^^
(The error message is at the very bottom.)
building: Building package: KoNLP
[2016-03-27T00:46:29.686536819+0000] Execute script: packages/build/rJava.sh
trying to compile and link a JNI program
detected JNI cpp flags : -I$(JAVA_HOME)/../include -I$(JAVA_HOME)/../include/linux
detected JNI linker flags : -L$(JAVA_HOME)/lib/amd64/server -ljvm
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG -I/usr/lib/jvm/java-8-openjdk-amd64/jre/../include -I/usr/lib/jvm/java-8-openjdk-amd64/jre/../include/linux -fpic -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c conftest.c -o conftest.o
gcc -std=gnu99 -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -Wl,-z,relro -o conftest.so conftest.o -L/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server -ljvm -L/usr/lib/R/lib -lR
JAVA_HOME : /usr/lib/jvm/java-8-openjdk-amd64/jre
Java library path: $(JAVA_HOME)/lib/amd64/server
JNI cpp flags : -I$(JAVA_HOME)/../include -I$(JAVA_HOME)/../include/linux
JNI linker flags : -L$(JAVA_HOME)/lib/amd64/server -ljvm
Updating Java configuration in /usr/lib/R
Done.
[2016-03-27T00:46:51.509207884+0000] Execute script: packages/build/tm.sh
When using KoNLP on a low-memory system:
library(KoNLP)
Loading package rJava
Loading package bitops
Loading package Sejong
Successfully Loaded Sejong Package.
Java initialized.
Error : .onLoad failed in loadNamespace() for 'KoNLP', details:
call: .jinit(parameters = c("-Dfile.encoding=UTF-8", getOption("java.parameters")))
error: Cannot create Java virtual machine (-1)
Error: package/namespace load failed for 'KoNLP'
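On a low-memory system, one possible workaround is to request a smaller JVM heap before the package loads; whether this takes effect depends on the KoNLP version reading getOption("java.parameters") during .jinit(), as the error above suggests:

```r
# Workaround sketch: ask for a smaller JVM heap so .jinit() can create
# the VM on a low-memory system (256m is illustrative, tune as needed).
options(java.parameters = "-Xmx256m")
library(KoNLP)
```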
For easy testing, the package needs an open corpus in the package data directory.
unz(get("SejongDicsZip", envir=KoNLP:::.KoNLPEnv), relpath, encoding=localCharset)
Hannanum-related functions (extractNoun, Simple*) need pre-processing, such as gsub("[[:space:]]", " ", sentence), before the input reaches them, to avoid sequences like "\t\t\t\n\t\r\n" in the sentence.
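A quick illustration of that pre-processing step; clean_ws is an illustrative helper name (using "+" to collapse whitespace runs into a single space, plus trimming):

```r
# Normalize runs of tabs/newlines/spaces to single spaces and trim ends,
# so sequences like "\t\t\t\n\t\r\n" never reach the Hannanum functions.
clean_ws <- function(sentence) {
  trimws(gsub("[[:space:]]+", " ", sentence))
}
```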
Hello.
First, thank you for your hard work creating this gem of a package.
I have been using this amazing package happily on other systems, and not being able to enjoy it here is so frustrating that I am filing this issue.
After loading the KoNLP library, calling useSejongDic() or useNIADic() keeps failing, so I tried to install NIADic, but that also keeps failing.
I am sharing my system information and the error messages below. I wonder if there is a solution.
#coding:cp949
Sys.setlocale("LC_ALL","Korean")
[1] "LC_COLLATE=Korean_Korea.949;LC_CTYPE=Korean_Korea.949;LC_MONETARY=Korean_Korea.949;LC_NUMERIC=C;LC_TIME=Korean_Korea.949"
Sys.info()
sysname release version nodename
"Windows" "10 x64" "build 14393" "DESKTOP-JCEI8IR"
machine login user effective_user
"x86-64" "JooYoung" "JooYoung" "JooYoung"
devtools::install_github('haven-jeon/NIADic/NIADic', build_vignettes = TRUE)
Downloading GitHub repo haven-jeon/NIADic@master
from URL https://api.github.com/repos/haven-jeon/NIADic/zipball/master
Installing NIADic
"C:/PROGRA~1/R/R-34~1.1/bin/x64/R" --no-site-file --no-environ --no-save
--no-restore --quiet CMD build
"C:\Users\JooYoung\AppData\Local\Temp\RtmpGEbZRb\devtools22017e25f40\haven-jeon-NIADic-5ef8093\NIADic"
--no-resave-data --no-manual
library(KoNLP)
Checking user defined dictionary!
useNIADic()
Backup was just finished!
Downloading package from url: https://github.com/haven-jeon/NIADic/releases/download/0.0.1/NIADic_0.0.1.tar.gz
/usr/bin/tar: Cannot connect to C: resolve failed
/usr/bin/tar: Cannot connect to C: resolve failed
Installation failed: argument is of length zero
Error in tryCatch({ : can't install NIADic package!
Please refer 'https://github.com/haven-jeon/NIADic' to install.
Calls: useNIADic -> buildDictionary -> install_NIADic -> tryCatch
In addition: Warning messages:
1: running command 'tar.exe -xf "C:\Users\JooYoung\AppData\Local\Temp\RtmpGEbZRb\file22014f65c47.tar.gz" -C "C:/Users/JooYoung/AppData/Local/Temp/RtmpGEbZRb/devtools22025cd2c59"' had status 128
2: In utils::untar(src, exdir = target, compressed = "gzip") :
'tar.exe -xf "C:\Users\JooYoung\AppData\Local\Temp\RtmpGEbZRb\file22014f65c47.tar.gz" -C "C:/Users/JooYoung/AppData/Local/Temp/RtmpGEbZRb/devtools22025cd2c59"' returned error code 128
3: running command 'tar.exe -tf "C:\Users\JooYoung\AppData\Local\Temp\RtmpGEbZRb\file22014f65c47.tar.gz"' had status 128
4: In min(slashes) : no non-missing arguments to min; returning Inf
Execution halted
Fix the state where phase 3 waits indefinitely for data to arrive in the queue.
haven-jeon/HanNanum-Analyzer#5
haven-jeon/HanNanum-Analyzer@56de689
Since KoNLP runs single-threaded, a blocking queue is effectively unnecessary, but resolve the problem while keeping the current structure.
Needs more detailed documentation.
If the input file's encoding is incompatible with the R session's encoding, the following error is shown:
> f <- file("TextData.txt", blocking=F)
> txtLines <- readLines(f)
> nouns <- sapply(txtLines, extractNoun, USE.NAMES=F)
Error in nchar(sentence) : '1' is an invalid multibyte character
So, an encoding-detection function is needed.
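A sketch of such an encoding-detection fallback, assuming the non-UTF-8 lines are CP949 (a common encoding for Korean Windows files, an assumption here); read_korean_lines is an illustrative name:

```r
# Read lines and re-encode anything that is not valid UTF-8, assuming the
# fallback encoding is CP949 (hypothetical helper, not a KoNLP function).
read_korean_lines <- function(path) {
  lines <- readLines(path, warn = FALSE)
  bad <- !validUTF8(lines)
  if (any(bad)) {
    lines[bad] <- iconv(lines[bad], from = "CP949", to = "UTF-8")
  }
  lines
}
```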
preprocessing <- function(inputs){
if(!is.character(inputs)) {
warning("Input must be legitimate character!")
return(FALSE)
}
newInput <- gsub("[[:space:]]", " ", inputs)
newInput <- gsub("[[:space:]]+$", "", newInput)
newInput <- gsub("^[[:space:]]+", "", newInput)
if((nchar(newInput) == 0) |
(nchar(newInput) > 20 & length(strsplit(newInput, " ")[[1]]) <= 1)){
warning(sprintf("It's not kind of right sentence : '%s'", inputs))
return(FALSE)
}
return(newInput)
}
ex_str_A = '가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다'
If the input is ex_str_A, the function returns FALSE.
ex_str_B = '하하 호호 가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다가나다'
But if the input is ex_str_B, it returns ex_str_B rather than FALSE,
because nchar(newInput) > 20 is TRUE but length(strsplit(newInput, " ")[[1]]) <= 1 is FALSE.
So if SimplePos09 receives ex_str_B, it could cause a problem (possibly related to memory use by HannanumObj).
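One possible tightening of the guard, sketched below: reject inputs containing any oversized token, not only inputs that are a single oversized token. has_oversized_token is an illustrative helper, and the limit of 20 mirrors the existing check:

```r
# TRUE when any whitespace-separated token exceeds the character limit,
# catching cases like ex_str_B that slip past the current condition.
has_oversized_token <- function(x, limit = 20) {
  any(nchar(strsplit(x, " ", fixed = TRUE)[[1]]) > limit)
}
```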
A keystroke-conversion option is required.
Dear maintainers,
This concerns the packages
ALKr Amelia CSS CVST CheckDigit ChemoSpec ConjointChecks DataCombine
DataFrameConstr Delaporte DiscriMiner EpiContactTrace GOsummaries
Grid2Polygons HLMdiag HiveR IgorR KoNLP MRMR MicroStrategyR Morpho
MplusAutomation NISTnls NMF OLScurve OpenMPController OpenRepGrid
PRISMA PivotalR PropCIs Quandl R1magic RAppArmor RMendeley RMessenger
RSurvey RSvgDevice RTConnect RcmdrPlugin.MA RcmdrPlugin.MPAStats
Rjpstatdb Rook Rttf2pt1 SGP SGPdata Sejong SharpeR SuperLearner
TimeProjection TriMatch WDI XLConnect agridat backtest bagRboostR
bigrf biom bisectr bitops bmp causalsens cheddar clusteval
coarseDataTools coloc countrycode crs cumplyr cvxclustr d3Network
darch datacheck decctools devtools df2json dostats downloader dvn
eeptools events expoRkit expoTree extrafont extrafontdb faoutlier
fdasrvf fontcm forecast formatR futile.logger futile.paradigm gamlr
gazetools ggdendro ggmcmc ggplot2 ggsubplot gitter govdat growthmodels
hSDM harvestr highfrequency hts httpuv httr hysteresis imputation
installr investr ipfp jtrans kelvin kitagawa knitcitations knitr
knitrBootstrap l2boost labeledLoop lambda.r lisrelToR makeR mchof mirt
mmand mmod multilevelPSA mvc myepisodes needy networkTomography ngramr
np npRmpi nsprcomp opencpu.encode pROC pander pathdiagram pavo pbdBASE
pbdDMAT pdist phcfM pheatmap pheno2geno pitchRx plsdepot plspm plyr
pnn poppr portfolio portfolioSim profanal profr prospectr pumilioR
qdap questionr rAltmetric rImpactStory rdatamarket rdryad readbitmap
rebird rentrez repmis reports reshape2 restorepoint rfigshare
rfishbase rfisheries rgbif rio robustlmm ropensnp roxygen2 rplos
rspear rvertnet scales seacarb sig simPH simboot smss snpStatsWriter
sparsediscrim splitstackshape spsmooth sqlshare sss stringr structSSI
stylo surveydata taxize tbdiag tempdisagg tester testthat treebase
trip tripEstimation trueskill turner twitteR wethepeople zendeskR
maintained by one of you.
These contain a top-level README.md file, which is now used to generate
a corresponding README.html file on the CRAN package web pages.
Please check whether your README.md file is in fact appropriate for this
(e.g., not assuming that the content will only be accessed from the
GitHub project page): if not, please use .Rbuildignore to have README.md
excluded from the versions for publication on CRAN.
Best
OutofMemory issue on mac osx with R 3.0.x
addTermsToDictionary(c("감자", "ncn", "고구마", "ncn"))
For the large dictionary, the CRAN admins request splitting it into two packages.
Any user can privately add terms to and delete terms from the user dictionary.
Hello.
First, thank you for sharing KoNLP.
However, running SimplePos22 and MorphAnalyzer from the KoNLP package produces the error messages below. (SimplePos09 returns results normally.)
Error in .jcall(get("HannanumObj", envir = KoNLP:::.KoNLPEnv), "S", "SimplePos22", :
java.lang.OutOfMemoryError: Java heap space
Error in .jcall(get("HannanumObj", envir = KoNLP:::.KoNLPEnv), "S", "MorphAnalyzer", :
java.lang.OutOfMemoryError: Java heap space
Could you tell me what is causing the error?
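For Java heap-space errors like these, a possible workaround is to raise the JVM heap ceiling before loading KoNLP (2g is illustrative); whether this takes effect depends on the KoNLP version reading getOption("java.parameters") during .jinit():

```r
# Workaround sketch: raise the JVM heap before the namespace loads, so
# SimplePos22/MorphAnalyzer have more room than the default allocation.
options(java.parameters = "-Xmx2g")
library(KoNLP)
```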
library(KoNLP)
Loading package rJava
Loading package bitops
Error : .onLoad failed in loadNamespace() for 'KoNLP', details:
call: .jinit(parameters = c("-Dfile.encoding=UTF-8", "-Xmx1024m"))
error: Cannot create Java virtual machine (-4)
Error: package/namespace load failed for 'KoNLP'
R > KoNLP::statDic()
$summary
word tag
Length:390143 ncn :269992
Class :character pvg : 66343
Mode :character pad : 22337
mag : 22245
mmd : 3508
ii : 1594
(Other): 4124
$head
word tag
1 가게문 ncn
2 가겟방 ncn
3 가격결정론 ncn
4 가급 ncn
5 가나무 ncn
6 가는소금 ncn
$tail
word tag
390138 힝힝대다 pvg
390139 힝힝하다 pvg
390140 힝힝하다 pad
390141 힠 ncn
390142 힡트리다 pvg
390143 전작권 ncn
In the GitHub wiki, the example shows the following:
extractNoun("롯데마트가 판매하고 있는 흑마늘 양념 치킨이 논란이 되고 있다.")
[1] "롯데마트" "판매" "흑마늘" "양념" "치킨" "논란"
But with the CRAN package, the result differs:
extractNoun("롯데마트가 판매하고 있는 흑마늘 양념 치킨이 논란이 되고 있다.")
[1] "롯데마트가" "판매" "흑마늘" "양념" "치킨" "논란"
The preprocessing function needs to be fixed.
for performance improvement