sing1ee / elasticsearch-jieba-plugin

jieba analysis plugin for Elasticsearch 7.0.0, 6.4.0, 6.0.0, 5.4.0, 5.3.0, 5.2.2, 5.2.1, 5.2, 5.1.2, 5.1.1

License: MIT License

Java 100.00%
dict elasticsearch elasticsearch-jieba-plugin jieba stopwords

Contributors: microbun, poying, ran-aa, sing1ee, thearas, waizuwolf

elasticsearch-jieba-plugin's Issues

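jieba_index tokenization results differ between 6.4.1 and 7.4.2

In testing, I found that jieba-7.4.2 and jieba-6.4.1 produce different tokenization results, mainly in the token positions.
If I upgrade from ES 6.4.1 to ES 7.4.2, do I need to reindex my data because of this difference between the plugin versions?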

Doesn't work with elasticsearch 6.4.0

Doesn't work with elasticsearch 6.4.0 (and probably all versions > 6.0.0)
Plugin version 6.0.1

[2018-09-13T15:11:28,885][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.IllegalArgumentException: Unknown properties in plugin descriptor: [jvm, site, isolated]

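jieba tokenizer question

I added the string 16:9 to the dictionary, but it still gets split. Is there a setting that stops the tokenizer from splitting on the ':' character?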

Error with the synonym file synonyms.txt

Environment: ES 5.3.0, jieba 5.3.0
I followed the README step by step up to this point:
========= OK ==========
test analyzer:
GET http://localhost:9200/jieba_index/_analyze?analyzer=my_ana&text=**的伟大时代来临了,欢迎参观北京大学PKU

========================

================ Adding synonyms fails ========

Pay attention to *jieba_synonym, same with jieba_stop, the format of synonyms.txt:
北京大学,北大,pku
清华大学,清华,Tsinghua University

===> At this step I edited the corresponding synonyms.txt file and restarted ES, which then reported an error. The log is below. (With an empty file there is no error.)

[2018-08-10T15:48:55,535][WARN ][o.e.g.Gateway ] [7ixgH36] recovering index [welink_index/7MYv9V1JSs-DAF8CBEKULA] failed - recovering as closed
java.lang.IllegalArgumentException: failed to build synonyms
at org.elasticsearch.index.analysis.SynonymTokenFilterFactory.<init>(SynonymTokenFilterFactory.java:97) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.index.analysis.AnalysisRegistry.lambda$buildTokenFilterFactories$1(AnalysisRegistry.java:169) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.index.analysis.AnalysisRegistry$1.get(AnalysisRegistry.java:265) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.index.analysis.AnalysisRegistry.buildMapping(AnalysisRegistry.java:342) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.index.analysis.AnalysisRegistry.buildTokenFilterFactories(AnalysisRegistry.java:171) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.index.analysis.AnalysisRegistry.build(AnalysisRegistry.java:155) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.index.IndexService.<init>(IndexService.java:145) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.index.IndexModule.newIndexService(IndexModule.java:363) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.indices.IndicesService.createIndexService(IndicesService.java:427) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.indices.IndicesService.verifyIndexMetadata(IndicesService.java:460) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.gateway.Gateway.performStateRecovery(Gateway.java:135) [elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.gateway.GatewayService$1.doRun(GatewayService.java:229) [elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:613) [elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.3.0.jar:5.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
Caused by: java.nio.charset.MalformedInputException: Input length = 1
at java.nio.charset.CoderResult.throwException(CoderResult.java:281) ~[?:1.8.0_171]
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339) ~[?:?]
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) ~[?:?]
at java.io.InputStreamReader.read(InputStreamReader.java:184) ~[?:1.8.0_171]
at java.io.BufferedReader.read1(BufferedReader.java:210) ~[?:1.8.0_171]
at java.io.BufferedReader.read(BufferedReader.java:286) ~[?:1.8.0_171]
at java.io.BufferedReader.fill(BufferedReader.java:161) ~[?:1.8.0_171]
at java.io.BufferedReader.readLine(BufferedReader.java:324) ~[?:1.8.0_171]
at java.io.LineNumberReader.readLine(LineNumberReader.java:201) ~[?:1.8.0_171]
at org.apache.lucene.analysis.synonym.SolrSynonymParser.addInternal(SolrSynonymParser.java:82) ~[lucene-analyzers-common-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:44:09]
at org.apache.lucene.analysis.synonym.SolrSynonymParser.parse(SolrSynonymParser.java:70) ~[lucene-analyzers-common-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:44:09]
at org.elasticsearch.index.analysis.SynonymTokenFilterFactory.<init>(SynonymTokenFilterFactory.java:92) ~[elasticsearch-5.3.0.jar:5.3.0]
... 16 more

====================================

Is the following the correct format for the contents of synonyms.txt?

北京大学,北大,pku
清华大学,清华,Tsinghua University

========================

Pay attention to *jieba_synonym, same with jieba_stop, the format of synonyms.txt:
北京大学,北大,pku
清华大学,清华,Tsinghua University
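
The `MalformedInputException: Input length = 1` in the trace above is raised by the JDK character decoder while `SolrSynonymParser` reads the file line by line, which usually means the file contains bytes that are not valid in the charset used to read it. A minimal sketch, assuming the synonym file is expected to be UTF-8 and was saved in another encoding such as GBK (an assumption; the issue does not say which editor or encoding was used). The function names and the path are hypothetical.

```python
from pathlib import Path
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def rewrite_as_utf8(path: Path, source_encoding: str = "gbk") -> None:
    """Re-encode a file to UTF-8; source_encoding is a guess the caller must confirm."""
    text = path.read_bytes().decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical location; adjust to the actual Elasticsearch config directory.
    synonyms = Path("config/synonyms/synonyms.txt")
    if synonyms.exists():
        offset = first_invalid_utf8_offset(synonyms.read_bytes())
        print("valid UTF-8" if offset is None else f"invalid byte at offset {offset}")
```

If the check reports an invalid byte, re-saving the file as UTF-8 (in an editor, or with `rewrite_as_utf8` once the real source encoding is known) is one thing to try before restarting ES.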

Loading custom synonyms fails with the following error. How can I fix it?

{'error': {'root_cause': [{'type': 'illegal_argument_exception', 'reason': 'failed to build synonyms'}], 'type': 'illegal_argument_exception', 'reason': 'failed to build synonyms', 'caused_by': {'type': 'parse_exception', 'reason': 'Invalid synonym rule at line 1', 'caused_by': {'type': 'illegal_argument_exception', 'reason': 'term: 北京大学 analyzed to a token (北京大学) with position increment != 1 (got: 0)'}}}, 'status': 400}

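About Tokenizer Filter

The huaban documentation mentions tokenizer filters for "full-width to half-width conversion, uppercase to lowercase conversion, and character-level tokenization", but I could not find how to use them or what these filters are called. Any pointers would be appreciated.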

How can I match on whole words rather than single characters?

http://localhost:9200/jieba_index/fulltext/_search
{
  "query" : { "match" : { "content" : "大学" }},
  "highlight" : {
    "pre_tags" : ["", ""],
    "post_tags" : ["", ""],
    "fields" : {
      "content" : {}
    }
  }
}

Result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.68324494,
    "hits": [
      {
        "_index": "jieba_index",
        "_type": "fulltext",
        "_id": "2",
        "_score": 0.68324494,
        "_source": {
          "content": "**的伟大时代来临了,欢迎参观北京大学PKU"
        },
        "highlight": {
          "content": [
            "**的伟大时代来临了,欢迎参观北京大学PKU"
          ]
        }
      }
    ]
  }
}

What I want is:
"**的伟大时代来临了,欢迎参观北京大学PKU"

Synonyms have no effect when searching

I followed the README steps
up to
test analyzer:
GET http://localhost:9200/jieba_index/_analyze?analyzer=my_ana&text=**的伟大时代来临了,欢迎参观北京大学PKU

===== The result of this step matches the README, and the synonyms are recognized =======

But at the next step:
search
POST http://localhost:9200/jieba_index/fulltext/_search
Request body:

{
  "query" : { "match" : { "content" : "pku" }},
  "highlight" : {
    "pre_tags" : ["", ""],
    "post_tags" : ["", ""],
    "fields" : {
      "content" : {}
    }
  }
}

The result shows no synonym effect: 北京大学 is not highlighted, only pku is. Why?

illegal_argument_exception: startOffset must be non-negative

PUT jieba_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "jieba_analyzer": {
          "tokenizer": "jieba_index"
        }
      }
    }
  },
  "mappings": {
    "fulltext": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "jieba_analyzer"
        }
      }
    }
  }
}
POST jieba_index/fulltext
{"content":"李海林"}

ES returns:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=0,endOffset=3,lastStartOffset=1 for field 'content'"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=0,endOffset=3,lastStartOffset=1 for field 'content'"
  },
  "status": 400
}
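
The error message names three conditions that each token's offsets must satisfy at index time. A minimal sketch that applies the same three conditions to a list of `(start_offset, end_offset)` pairs, which may help locate the offending token in `_analyze` output. The function name is hypothetical, and the sample offsets are made up only to match the numbers in the message; they are not the plugin's confirmed output for this text.

```python
from typing import List, Optional, Tuple


def first_bad_offset(tokens: List[Tuple[int, int]]) -> Optional[int]:
    """Return the index of the first token violating the offset rules, or None.

    Rules, as worded in the error message:
      1. start_offset must be non-negative
      2. end_offset must be >= start_offset
      3. start_offset must not be smaller than the previous token's start_offset
    """
    last_start = 0
    for i, (start, end) in enumerate(tokens):
        if start < 0 or end < start or start < last_start:
            return i
        last_start = start
    return None


# Hypothetical offsets reproducing startOffset=0, endOffset=3, lastStartOffset=1.
sample = [(0, 1), (1, 3), (0, 3)]
print(first_bad_offset(sample))  # prints 2: the third token starts before the second
```

The same three tokens ordered by start offset, `[(0, 3), (0, 1), (1, 3)]`, pass the check, so the order in which overlapping tokens are emitted matters.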

Could not load plugin descriptor for existing plugin [jieba]. Was the plugin built before 2.0?

], spins? [no], types [ext4]
[2017-02-06T09:57:31,526][INFO ][o.e.e.NodeEnvironment    ] [o-IYnu8] heap size [123.7mb], compressed ordinary object pointers [true]
[2017-02-06T09:57:31,672][INFO ][o.e.n.Node               ] node name [o-IYnu8] derived from node ID [o-IYnu8pRMe6qi1j9t2bXQ]; set [node.name] to override
[2017-02-06T09:57:31,675][INFO ][o.e.n.Node               ] version[5.1.2], pid[6585], build[c8c4c16/2017-01-11T20:18:39.146Z], OS[Linux/2.6.32-220.4.1.el6.centos.plus.x86_64/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_91/25.91-b14]
[2017-02-06T09:57:32,980][ERROR][o.e.b.Bootstrap          ] Exception
java.lang.IllegalStateException: Could not load plugin descriptor for existing plugin [jieba]. Was the plugin built before 2.0?
        at org.elasticsearch.plugins.PluginsService.getPluginBundles(PluginsService.java:295) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.plugins.PluginsService.<init>(PluginsService.java:131) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.node.Node.<init>(Node.java:294) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.node.Node.<init>(Node.java:229) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.bootstrap.Bootstrap$6.<init>(Bootstrap.java:214) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:214) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:306) [elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) [elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:112) [elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.cli.SettingCommand.execute(SettingCommand.java:54) [elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) [elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.cli.Command.main(Command.java:88) [elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:89) [elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:82) [elasticsearch-5.1.2.jar:5.1.2]
Caused by: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/plugins/jieba/plugin-descriptor.properties
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[?:?]
        at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) ~[?:?]
        at java.nio.file.Files.newByteChannel(Files.java:361) ~[?:1.8.0_91]
        at java.nio.file.Files.newByteChannel(Files.java:407) ~[?:1.8.0_91]
        at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384) ~[?:1.8.0_91]
        at java.nio.file.Files.newInputStream(Files.java:152) ~[?:1.8.0_91]
        at org.elasticsearch.plugins.PluginInfo.readFromProperties(PluginInfo.java:86) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.plugins.PluginsService.getPluginBundles(PluginsService.java:292) ~[elasticsearch-5.1.2.jar:5.1.2]
        ... 13 more
[2017-02-06T09:57:32,988][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: Could not load plugin descriptor for existing plugin [jieba]. Was the plugin built before 2.0?
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:125) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:112) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.cli.SettingCommand.execute(SettingCommand.java:54) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.cli.Command.main(Command.java:88) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:89) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:82) ~[elasticsearch-5.1.2.jar:5.1.2]
Caused by: java.lang.IllegalStateException: Could not load plugin descriptor for existing plugin [jieba]. Was the plugin built before 2.0?
        at org.elasticsearch.plugins.PluginsService.getPluginBundles(PluginsService.java:295) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.plugins.PluginsService.<init>(PluginsService.java:131) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.node.Node.<init>(Node.java:294) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.node.Node.<init>(Node.java:229) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.bootstrap.Bootstrap$6.<init>(Bootstrap.java:214) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:214) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:306) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) ~[elasticsearch-5.1.2.jar:5.1.2]
        ... 6 more
Caused by: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/plugins/jieba/plugin-descriptor.properties
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:?]
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[?:?]
        at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) ~[?:?]
        at java.nio.file.Files.newByteChannel(Files.java:361) ~[?:1.8.0_91]
        at java.nio.file.Files.newByteChannel(Files.java:407) ~[?:1.8.0_91]
        at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384) ~[?:1.8.0_91]
        at java.nio.file.Files.newInputStream(Files.java:152) ~[?:1.8.0_91]
        at org.elasticsearch.plugins.PluginInfo.readFromProperties(PluginInfo.java:86) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.plugins.PluginsService.getPluginBundles(PluginsService.java:292) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.plugins.PluginsService.<init>(PluginsService.java:131) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.node.Node.<init>(Node.java:294) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.node.Node.<init>(Node.java:229) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.bootstrap.Bootstrap$6.<init>(Bootstrap.java:214) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:214) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:306) ~[elasticsearch-5.1.2.jar:5.1.2]
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) ~[elasticsearch-5.1.2.jar:5.1.2]
        ... 6 more
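
The root cause in the trace above is `java.nio.file.AccessDeniedException` on `plugin-descriptor.properties`, meaning the Elasticsearch process was not allowed to read that file. A minimal sketch that lists entries under a plugin directory that the current user cannot read; it is only meaningful when run as the same user that runs Elasticsearch. The function name is hypothetical, and the path is the one from the trace.

```python
import os
from typing import List


def unreadable_files(plugin_dir: str) -> List[str]:
    """Return paths under plugin_dir that the current user cannot read."""
    problems: List[str] = []
    for root, dirs, files in os.walk(plugin_dir):
        for name in dirs + files:
            path = os.path.join(root, name)
            if not os.access(path, os.R_OK):
                problems.append(path)
    return sorted(problems)


if __name__ == "__main__":
    for path in unreadable_files("/usr/share/elasticsearch/plugins/jieba"):
        print(f"not readable: {path}")
```

An empty result does not rule out other causes, such as a security policy that blocks access independently of file permissions.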

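Custom word is not matched

Custom word: 学区房
Before adding "学区房" to user.txt, a search matches the two words "学区" and "房".
After adding the line 学区房 10 (or 100/10000) to user.txt, a search no longer matches "学区房", nor even "学区" and "房".
However, when I test the analyzer, it does produce the token "学区房".
What could be wrong?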

gradle gz issue

Hello, I ran into the following error when running gradle gz.

How can I resolve it?

FAILURE: Build failed with an exception.

* What went wrong:
Task 'gz' not found in root project 'elasticsearch'.

* Try:
Run gradle tasks to get a list of available tasks. Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.

BUILD FAILED

Total time: 0.837 secs

How do I build the plugin for Elasticsearch 6.5.4?

I am not very familiar with Java. I changed the version in build.gradle to 6.5.4 and rebuilt the plugin, but I still get: org.elasticsearch.bootstrap.StartupException: java.lang.IllegalArgumentException: Plugin [analysis-jieba] was built for Elasticsearch version 6.4.0 but version 6.5.4 is running.
How should I change the configuration?

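Is Traditional Chinese tokenization supported?

I ran Traditional Chinese documents through this plugin. It works, but the results look a bit odd. Does the plugin currently support Traditional Chinese text?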

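Hot-reloading the dictionary

Could the maintainer suggest an approach for adding hot reloading of the dictionary, so that it can be updated dynamically?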

The 6.0 version does not work; indexing fails with an error

startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=93,endOffset=96,lastStartOffset=94

Also: the file synonyms/synonyms.txt used in the example below is not included in the packaged distribution. Was it not uploaded?
{
  "settings": {
    "analysis": {
      "filter": {
        "jieba_stop": {
          "type": "stop",
          "stopwords_path": "stopwords/stopwords.txt"
        },
        "jieba_synonym": {
          "type": "synonym",
          "synonyms_path": "synonyms/synonyms.txt"
        }
      },
      "analyzer": {
        "my_ana": {
          "tokenizer": "jieba_index",
          "filter": [
            "lowercase",
            "jieba_stop",
            "jieba_synonym"
          ]
        }
      }
    }
  }
}

Why does startup fail after unzipping the plugin into the jieba directory under plugins?

org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: Could not load plugin descriptor for existing plugin [elasticsearch-jieba-plugin-5.3.0]. Was the plugin built before 2.0?
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:127) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:114) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:58) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.cli.Command.main(Command.java:88) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:91) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:84) ~[elasticsearch-5.3.0.jar:5.3.0]
Caused by: java.lang.IllegalStateException: Could not load plugin descriptor for existing plugin [elasticsearch-jieba-plugin-5.3.0]. Was the plugin built before 2.0?
at org.elasticsearch.plugins.PluginsService.getPluginBundles(PluginsService.java:295) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.plugins.PluginsService.<init>(PluginsService.java:131) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.node.Node.<init>(Node.java:302) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.node.Node.<init>(Node.java:238) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Bootstrap$6.<init>(Bootstrap.java:242) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:242) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:360) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) ~[elasticsearch-5.3.0.jar:5.3.0]
... 6 more
Caused by: java.nio.file.NoSuchFileException: E:\elasticsearch-5.3.0\elasticsearch-5.3.0\plugins\elasticsearch-jieba-plugin-5.3.0\plugin-descriptor.properties
at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:79) ~[?:?]
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) ~[?:?]
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102) ~[?:?]
at sun.nio.fs.WindowsFileSystemProvider.newByteChannel(WindowsFileSystemProvider.java:230) ~[?:?]
at java.nio.file.Files.newByteChannel(Files.java:361) ~[?:1.8.0_102]
at java.nio.file.Files.newByteChannel(Files.java:407) ~[?:1.8.0_102]
at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384) ~[?:1.8.0_102]
at java.nio.file.Files.newInputStream(Files.java:152) ~[?:1.8.0_102]
at org.elasticsearch.plugins.PluginInfo.readFromProperties(PluginInfo.java:86) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.plugins.PluginsService.getPluginBundles(PluginsService.java:292) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.plugins.PluginsService.<init>(PluginsService.java:131) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.node.Node.<init>(Node.java:302) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.node.Node.<init>(Node.java:238) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Bootstrap$6.<init>(Bootstrap.java:242) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:242) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:360) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) ~[elasticsearch-5.3.0.jar:5.3.0]
... 6 more

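How can I extract keywords from text based on a custom dictionary?

Requirement:
Extract from a text the keywords that appear in a custom dictionary.

I have considered five possible approaches:
1. Take the input text, get the jieba tokenization result, and write code that compares the tokens against the custom dictionary, outputting the words that appear in both the text and the dictionary.
2. Take the input text and write code that checks, one by one, whether each dictionary word occurs in the text, outputting the words that appear in both the text and the dictionary.
3. Use jieba's part-of-speech tagging: tag every word in the custom dictionary with one special part of speech, then use jieba's POS-based keyword extraction to pull keywords with that tag from the input text.
4. Use jieba's custom dictionary feature so that tokenization relies entirely on the specified custom dictionary: take the input text, load that dictionary, and output the tokenization result.
5. Use jieba's weighting feature: raise the weights of the words in the custom dictionary and lower the weights of all other words, then keep only the top-weighted words from the tokenization result.

Questions:
Which of these approaches can meet the requirement?
Does jieba have a built-in feature for extracting keywords directly from a custom dictionary?

I could not find any material that addresses this kind of requirement directly, so I am asking here. Any advice is welcome!

If you are familiar with this area, please share some ideas. A link to a tutorial or some reference code would be even better. Thank you very much!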
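
Approaches 1 and 2 above can be sketched without any jieba-specific API. A minimal sketch, assuming the tokens come from some tokenizer (for example jieba) and the custom dictionary has already been loaded into a set. All function names and the sample data are hypothetical.

```python
from typing import Iterable, List, Set


def keywords_from_tokens(tokens: Iterable[str], dictionary: Set[str]) -> List[str]:
    """Approach 1: keep tokens that are dictionary words, in order, without duplicates."""
    seen: Set[str] = set()
    result: List[str] = []
    for token in tokens:
        if token in dictionary and token not in seen:
            seen.add(token)
            result.append(token)
    return result


def keywords_by_substring(text: str, dictionary: Iterable[str]) -> List[str]:
    """Approach 2: keep dictionary words that occur anywhere in the text."""
    return sorted(word for word in set(dictionary) if word in text)


# Hypothetical data: the token list stands in for a tokenizer's output.
dictionary = {"北京大学", "学区房"}
tokens = ["欢迎", "参观", "北京大学", "PKU"]
print(keywords_from_tokens(tokens, dictionary))
print(keywords_by_substring("欢迎参观北京大学PKU", dictionary))
```

Approach 1 only finds a dictionary word if the tokenizer emits it as a single token, so it depends on the dictionary also being loaded into the tokenizer. Approach 2 ignores word boundaries and scans the text once per dictionary word, which may be slow for large dictionaries.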

Tokenization differs from jieba

curl -XGET "http://172.3.0.89:9200/_analyze" -H 'Content-Type: application/json' -d'{"analyzer": "jieba_index","text": "中华人民共和国成立"}'
{"tokens":[{"token":"中华人民共和国成立","start_offset":0,"end_offset":9,"type":"word","position":0}]}

With the official jieba in precise mode, the text is split into:

[中华人民共和国, 成立 ]

Both use the default dictionary. What causes the difference?

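Call for feature requests for the ES tokenization plugin

I was busy for a while, but I now have time to sort out the issues with the tokenization plugin.
Feel free to post any requirements here; I will work on them in order of importance.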

Does not work on version 6.4

After installing the plugin as described in the README, the plugin does not seem to be working.
curl -X GET "localhost:9200/_cat/plugins?v&s=component&h=name,component,version,description"
does not list the plugin either.
Creating the index returns:
{
  "error": "IndexCreationException[[jieba_index] failed to create index]; nested: IllegalArgumentException[Custom Analyzer [my_ana] failed to find tokenizer under name [jieba_index]]; ",
  "status": 400
}

PositionIncrement problem (Elasticsearch 6.6.2 + jieba 6.4.1)

I am running Elasticsearch 6.6.2 with the jieba 6.4.1 plugin.
When creating the index, I define a custom analyzer as follows:
"jieba_syno_search": {
  "type": "custom",
  "tokenizer": "jieba_search",
  "filter": [
    "jieba_stop",
    "my_synonym_filter"
  ]
},
After sending the request, I get the following error:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "failed to build synonyms"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "failed to build synonyms",
    "caused_by": {
      "type": "parse_exception",
      "reason": "Invalid synonym rule at line 1",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "term: 美国和伊拉克 analyzed to a token (伊拉克) with position increment != 1 (got: 2)"
      }
    }
  },
  "status": 400
}
Any guidance would be appreciated, thanks!

test analyzer fails with: request body or source parameter is required

ES version: elasticsearch-6.0.0; elasticsearch-jieba-plugin version: elasticsearch-jieba-plugin-6.0.1; OS: Windows

  • Download and build
    gradle pz

  • Copy the files
    Copy build\distributions\elasticsearch-jieba-plugin-6.0.0.zip, produced by the build above, to elasticsearch-6.0.0\plugins, unzip it, and delete the zip archive

  • Create the dictionary files
    Create stopwords/stopwords.txt and synonyms/synonyms.txt under elasticsearch-6.0.0\config

  • Start ES
    start elasticsearch

  • test analyzer
    In Postman: GET http://localhost:9200/jieba_index/_analyze?analyzer=my_ana&text=测试结巴分词看看结果出乎意料

  • Error result:

{
  "error": {
    "root_cause": [
      {
        "type": "parse_exception",
        "reason": "request body or source parameter is required"
      }
    ],
    "type": "parse_exception",
    "reason": "request body or source parameter is required"
  },
  "status": 400
}
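
In Elasticsearch 6.0 the `_analyze` API no longer accepts `analyzer` and `text` as query-string parameters, which would explain this error for the request above; the parameters have to be sent in a JSON request body instead. A minimal sketch using only the Python standard library that builds such a request. The helper name `build_analyze_request` is hypothetical; the host, index, and analyzer names are the ones used in the issue.

```python
import json
import urllib.request


def build_analyze_request(base_url: str, index: str, analyzer: str, text: str) -> urllib.request.Request:
    """Build a POST to /<index>/_analyze with analyzer and text in a JSON body."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_analyze_request(
        "http://localhost:9200", "jieba_index", "my_ana", "测试结巴分词看看结果出乎意料"
    )
    # Only show what would be sent; sending needs a running node.
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen(req)` sends it to a running node. In Postman, the equivalent is to put the same JSON object in the request body and set the `Content-Type: application/json` header.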

Version 7.0 fails to start after adding the plugin

org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: Could not load plugin descriptor for plugin directory [plugin.xml]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:163) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:150) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) ~[elasticsearch-cli-7.0.0.jar:7.0.0]
at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-7.0.0.jar:7.0.0]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:115) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92) ~[elasticsearch-7.0.0.jar:7.0.0]
Caused by: java.lang.IllegalStateException: Could not load plugin descriptor for plugin directory [plugin.xml]
at org.elasticsearch.plugins.PluginsService.readPluginBundle(PluginsService.java:401) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.plugins.PluginsService.findBundles(PluginsService.java:386) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.plugins.PluginsService.getPluginBundles(PluginsService.java:379) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.plugins.PluginsService.<init>(PluginsService.java:151) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.node.Node.<init>(Node.java:306) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.node.Node.<init>(Node.java:251) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:211) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:211) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:325) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:159) ~[elasticsearch-7.0.0.jar:7.0.0]
... 6 more
Caused by: java.nio.file.FileSystemException: /home/work/fnrd/elastic-search/elasticsearch-7.0.0/plugins/plugin.xml/plugin-descriptor.properties: Not a directory
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[?:?]
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) ~[?:?]
at java.nio.file.Files.newByteChannel(Files.java:361) ~[?:1.8.0_181]
at java.nio.file.Files.newByteChannel(Files.java:407) ~[?:1.8.0_181]
at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384) ~[?:1.8.0_181]
at java.nio.file.Files.newInputStream(Files.java:152) ~[?:1.8.0_181]
at org.elasticsearch.plugins.PluginInfo.readFromProperties(PluginInfo.java:156) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.plugins.PluginsService.readPluginBundle(PluginsService.java:398) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.plugins.PluginsService.findBundles(PluginsService.java:386) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.plugins.PluginsService.getPluginBundles(PluginsService.java:379) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.plugins.PluginsService.<init>(PluginsService.java:151) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.node.Node.<init>(Node.java:306) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.node.Node.<init>(Node.java:251) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:211) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:211) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:325) ~[elasticsearch-7.0.0.jar:7.0.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:159) ~[elasticsearch-7.0.0.jar:7.0.0]

Query error

I followed the README:
1: Downloaded ES v6.0.0 and jieba v6.0.1
2: Built jieba and placed it under the ES plugins directory
3: Copied the stop words to the specified directory and created my own synonyms.txt in the specified directory (I could not find a synonym file shipped with jieba)
4: create index succeeded: PUT http://localhost:9200/jieba_index ...

==========

5: test analyzer: this step fails

I ran the following:
GET http://localhost:9200/jieba_index/_analyze?analyzer=my_ana&text=**的伟大时代来临了,欢迎参观北京大学PKU

It returns the result below. What is causing this?

{
  "error": {
    "root_cause": [
      {
        "type": "parse_exception",
        "reason": "request body or source parameter is required"
      }
    ],
    "type": "parse_exception",
    "reason": "request body or source parameter is required"
  },
  "status": 400
}

If possible, could I add you on QQ to get in touch? Thanks.

GET http://localhost:9200/jieba_index

Response:

{
  "jieba_index": {
    "aliases": {},
    "mappings": {},
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "jieba_index",
        "creation_date": "1532942171024",
        "analysis": {
          "filter": {
            "jieba_synonym": {
              "type": "synonym",
              "synonyms_path": "synonyms/synonyms.txt"
            },
            "jieba_stop": {
              "type": "stop",
              "stopwords_path": "stopwords/stopwords.txt"
            }
          },
          "analyzer": {
            "my_ana": {
              "filter": [
                "lowercase",
                "jieba_stop",
                "jieba_synonym"
              ],
              "tokenizer": "jieba_index"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "1pDMeWYJQDmRAlD4P7KfJA",
        "version": {
          "created": "6000099"
        }
      }
    }
  }
}

Using the 6.4.1 release plugin with Elasticsearch 6.4.0: creating an index configured with synonyms fails

Create the index

DELETE /jieba_test
PUT /jieba_test
{
  "settings": {
    "analysis": {
      "filter": {
        "jieba_stop": {
          "type":        "stop",
          "stopwords_path": "stopwords/stopwords.txt"
        },
        "jieba_synonym": {
          "type":        "synonym",
          "synonyms_path": "synonyms/synonyms.txt"
        }
      },
      "analyzer": {
        "my_ana": {
          "tokenizer": "jieba_index",
          "filter": [
            "lowercase",
            "jieba_stop",
            "jieba_synonym"
          ]
        }
      }
    }
  }
}

Error

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "failed to build synonyms"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "failed to build synonyms",
    "caused_by": {
      "type": "parse_exception",
      "reason": "Invalid synonym rule at line 1",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "term: 北京大学 analyzed to a token (北京大学) with position increment != 1 (got: 0)"
      }
    }
  },
  "status": 400
}

The synonym file is as follows:
北京大学,北大,pku
清华大学,清华,Tsinghua University

@sing1ee
