infinilabs / analysis-ik

🚌 The IK Analysis plugin integrates the Lucene IK analyzer into Elasticsearch and OpenSearch and supports customized dictionaries.

License: Apache License 2.0


analysis-ik's Introduction

IK Analysis for Elasticsearch and OpenSearch

The IK Analysis plugin integrates the Lucene IK analyzer and supports customized dictionaries. It supports major versions of Elasticsearch and OpenSearch. Maintained and supported with ❤️ by INFINI Labs.

The plugin provides two analyzers, ik_smart and ik_max_word, and two tokenizers with the same names, ik_smart and ik_max_word.

How to Install

You can download the packaged plugins from https://release.infinilabs.com/,

or you can install the plugin with the plugin CLI:

For Elasticsearch

bin/elasticsearch-plugin install https://get.infini.cloud/elasticsearch/analysis-ik/8.4.1

For OpenSearch

bin/opensearch-plugin install https://get.infini.cloud/opensearch/analysis-ik/2.12.0

Tip: replace the version number with the one that matches your Elasticsearch or OpenSearch installation.
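The two install commands above follow the same URL pattern. The following is a minimal Python sketch that builds the command for a given engine and version, assuming the pattern holds for other versions. The helper names `plugin_url` and `install_command` are hypothetical, and the sketch does not check that a given version is actually published.

```python
def plugin_url(engine: str, version: str) -> str:
    """Build the download URL, following the pattern of the two examples above."""
    if engine not in ("elasticsearch", "opensearch"):
        raise ValueError(f"unsupported engine: {engine!r}")
    return f"https://get.infini.cloud/{engine}/analysis-ik/{version}"


def install_command(engine: str, version: str) -> str:
    """Return the plugin CLI command line for the given engine and version."""
    return f"bin/{engine}-plugin install {plugin_url(engine, version)}"


print(install_command("elasticsearch", "8.4.1"))
```

Passing the version of the running cluster keeps the plugin and server versions aligned, which is what the tip above asks for.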

Getting Started

1. Create an index

curl -XPUT http://localhost:9200/index

2. Create a mapping

curl -XPOST http://localhost:9200/index/_mapping -H 'Content-Type:application/json' -d'
{
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_smart"
            }
        }

}'

3. Index some docs

curl -XPOST http://localhost:9200/index/_create/1 -H 'Content-Type:application/json' -d'
{"content":"美国留给伊拉克的是个烂摊子吗"}
'
curl -XPOST http://localhost:9200/index/_create/2 -H 'Content-Type:application/json' -d'
{"content":"公安部:各地校车将享最高路权"}
'
curl -XPOST http://localhost:9200/index/_create/3 -H 'Content-Type:application/json' -d'
{"content":"中韩渔警冲突调查:韩警平均每天扣1艘**渔船"}
'
curl -XPOST http://localhost:9200/index/_create/4 -H 'Content-Type:application/json' -d'
{"content":"**驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}
'

4. Query with highlighting

curl -XPOST http://localhost:9200/index/_search  -H 'Content-Type:application/json' -d'
{
    "query" : { "match" : { "content" : "**" }},
    "highlight" : {
        "pre_tags" : ["<tag1>", "<tag2>"],
        "post_tags" : ["</tag1>", "</tag2>"],
        "fields" : {
            "content" : {}
        }
    }
}
'

Result

{
    "took": 14,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 2,
        "hits": [
            {
                "_index": "index",
                "_type": "fulltext",
                "_id": "4",
                "_score": 2,
                "_source": {
                    "content": "**驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"
                },
                "highlight": {
                    "content": [
                        "<tag1>**</tag1>驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首 "
                    ]
                }
            },
            {
                "_index": "index",
                "_type": "fulltext",
                "_id": "3",
                "_score": 2,
                "_source": {
                    "content": "中韩渔警冲突调查:韩警平均每天扣1艘**渔船"
                },
                "highlight": {
                    "content": [
                        "均每天扣1艘<tag1>**</tag1>渔船 "
                    ]
                }
            }
        ]
    }
}

Dictionary Configuration

The configuration file IKAnalyzer.cfg.xml is located at either {conf}/analysis-ik/config/IKAnalyzer.cfg.xml or {plugins}/elasticsearch-analysis-ik-*/config/IKAnalyzer.cfg.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic</entry>
	<entry key="ext_stopwords">custom/ext_stopword.dic</entry>
	<entry key="remote_ext_dict">location</entry>
	<entry key="remote_ext_stopwords">http://xxx.com/xxx.dic</entry>
</properties>
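The file uses the Java properties XML layout, so its entries can be inspected with any XML parser. The following is a minimal Python sketch that reads the entries, assuming that semicolons separate multiple dictionary paths as in the `ext_dict` entry above. The function name `read_ik_config` is hypothetical and is not part of the plugin.

```python
import xml.etree.ElementTree as ET

# Sample content copied from the configuration shown above.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic</entry>
    <entry key="ext_stopwords">custom/ext_stopword.dic</entry>
    <entry key="remote_ext_dict">location</entry>
    <entry key="remote_ext_stopwords">http://xxx.com/xxx.dic</entry>
</properties>
"""


def read_ik_config(xml_bytes: bytes) -> dict:
    """Map each entry key to the list of values found in that entry."""
    root = ET.fromstring(xml_bytes)
    config = {}
    for entry in root.iter("entry"):
        value = (entry.text or "").strip()
        # Assumption: ';' separates multiple paths, as in ext_dict above.
        config[entry.get("key")] = [p.strip() for p in value.split(";") if p.strip()]
    return config


config = read_ik_config(SAMPLE)
print(config["ext_dict"])
```

A check like this can be useful before restarting a node, for example to confirm that every listed dictionary path is spelled as intended.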

Hot-reload Dictionary

The plugin supports hot reloading of IK dictionaries through the following entries in the IK configuration file described above.

	<entry key="remote_ext_dict">location</entry>
	<entry key="remote_ext_stopwords">location</entry>

Here location is a URL, such as http://yoursite.com/getCustomDict. The endpoint only needs to meet the following two requirements for hot updates to work.

  1. The HTTP response must include two headers, Last-Modified and ETag. Both are strings; if either one changes, the plugin fetches the word list again and updates its dictionary.

  2. The response body must contain one word per line, with lines separated by \n.

Meeting these two requirements enables hot word updates without restarting the ES instance.

You can put the hot words that need automatic updates in a UTF-8 encoded .txt file and serve it from nginx or another simple HTTP server. When the .txt file is modified, the HTTP server automatically returns updated Last-Modified and ETag values the next time a client requests the file. You can also build a separate tool that extracts relevant vocabulary from your business system and updates this .txt file.
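The change check described above can be sketched as follows. This is a minimal Python illustration of the rule "reload when either Last-Modified or ETag changes" and of the one-word-per-line body format. It is not the plugin's actual implementation, and the names `DictState`, `needs_reload`, and `parse_words` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DictState:
    """Header values seen on the most recent response."""
    last_modified: Optional[str]
    etag: Optional[str]


def needs_reload(previous: Optional[DictState], current: DictState) -> bool:
    """Return True when the dictionary should be fetched again."""
    if previous is None:
        return True  # nothing has been loaded yet
    return (previous.last_modified != current.last_modified
            or previous.etag != current.etag)


def parse_words(body: str) -> list:
    """Split a response body into words, one word per line."""
    return [line.strip() for line in body.split("\n") if line.strip()]


old = DictState("Wed, 01 Jan 2020 00:00:00 GMT", '"v1"')
new = DictState("Wed, 01 Jan 2020 00:00:00 GMT", '"v2"')
print(needs_reload(old, new))  # True: the ETag changed
```

Comparing the headers as opaque strings matches the description above: the client does not need to interpret the timestamp or the tag, only to notice that one of them differs.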

FAQs


  1. Why isn't the custom dictionary taking effect?

Please ensure that your custom dictionary file is encoded in UTF-8.

  2. What is the difference between ik_max_word and ik_smart?

ik_max_word: Performs the finest-grained segmentation of the text. For example, it will segment "中华人民共和国国歌" into "中华人民共和国,中华人民,中华,华人,人民共和国,人民,人,民,共和国,共和,和,国国,国歌", exhaustively generating various possible combinations, suitable for Term Query.

ik_smart: Performs the coarsest-grained segmentation of the text. For example, it will segment "中华人民共和国国歌" into "中华人民共和国,国歌", suitable for Phrase queries.

Note: ik_smart is not a subset of ik_max_word.

Community

Feel free to join the Discord server to discuss anything related to this project:

https://discord.gg/4tKTMkkvVX

License

Copyright ©️ INFINI Labs.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

analysis-ik's Issues

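ArrayIndexOutOfBoundsException when indexing with the ik analyzer

When writing to the index, if a field that uses ik as its analyzer has leading whitespace, an ArrayIndexOutOfBoundsException is thrown. Other analyzers do not have this problem.

@client.index index: 'oai_ik', type: 'item', body: body_json

Testing shows that removing the leading whitespace from the field makes indexing work normally.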
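The workaround reported here can be applied on the client before a document is sent for indexing. The following is a minimal Python sketch, assuming the document is a flat dictionary of field values. The helper `strip_leading_whitespace` is hypothetical and is not part of the plugin or of any Elasticsearch client.

```python
def strip_leading_whitespace(doc: dict) -> dict:
    """Return a copy of doc with leading whitespace removed from string fields."""
    return {
        key: value.lstrip() if isinstance(value, str) else value
        for key, value in doc.items()
    }


body = strip_leading_whitespace({"title": "  前导空格", "count": 3})
print(body)
```

Non-string values are passed through unchanged, so the helper can be applied to a whole document without knowing which fields use the ik analyzer.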

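elasticsearch-analysis-ik cannot be added cleanly to elasticsearch 1.4.0

I built elasticsearch-analysis-ik from source myself. After configuring it, Elasticsearch reports errors on startup. Could the maintainers please update the plugin?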

Exception in thread "Thread-3" java.lang.NullPointerException
        at org.wltea.analyzer.dic.Monitor.run(Monitor.java:87)
        at java.lang.Thread.run(Thread.java:745)
org.apache.http.client.ClientProtocolException
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
        at org.wltea.analyzer.dic.Monitor.run(Monitor.java:64)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.http.ProtocolException: Target host is not specified
        at org.apache.http.impl.conn.DefaultRoutePlanner.determineRoute(DefaultRoutePlanner.java:69)
        at org.apache.http.impl.client.InternalHttpClient.determineRoute(InternalHttpClient.java:124)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:183)
        ... 4 more
Exception in thread "Thread-4" java.lang.NullPointerException
        at org.wltea.analyzer.dic.Monitor.run(Monitor.java:87)
        at java.lang.Thread.run(Thread.java:745)
[2014-11-18 16:48:27,134][INFO ][mmseg-analyzer           ] chars loaded time=508ms, line=12638, on file=chars.dic
[2014-11-18 16:48:27,149][INFO ][mmseg-analyzer           ] words loaded time=2ms, line=4, on file=words-my.dic
[2014-11-18 16:48:27,379][INFO ][mmseg-analyzer           ] words loaded time=227ms, line=157202, on file=words.dic
[2014-11-18 16:48:27,380][INFO ][mmseg-analyzer           ] load all dic use time=754ms
[2014-11-18 16:48:27,382][INFO ][mmseg-analyzer           ] unit loaded time=1ms, line=22, on file=units.dic
[2014-11-18 16:48:28,196][INFO ][gateway                  ] [Son of Satan] recovered [2] indices into cluster_state
[2014-11-18 16:52:12,078][INFO ][node                     ] [Son of Satan] stopping ...
[2014-11-18 16:52:13,141][ERROR][marvel.agent.exporter    ] [Son of Satan] error sending data
java.net.ConnectException: Connection refused: connect
        at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
        at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)

After uninstalling elasticsearch-analysis-ik, startup works fine.

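The ik_max_word and ik_smart analyzers work when used on their own but have no effect when configured as properties

For example:

index:
  analysis:
    analyzer:
      ik:
        alias: [ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_max_word:
        type: ik
        use_smart: true
      ik_smart:
        type: ik
        use_smart: false

When indexing with ik, this configuration should select ik_max_word, because ik_max_word is the one set to true:

      ik_max_word:
        type: ik
        use_smart: true

But the result is that of ik_smart.

It only takes effect when I explicitly use ik_max_word or ik_smart for indexing in my code. I hope this can be resolved.

I am using this Rails plugin: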
https://github.com/karmi/retire

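Which version of IK is this?

Hello, which version of IK does this integrate? Is the longest-word segmentation mode the default? Is it possible to configure whether the longest-word or the shortest-word segmentation mode is used?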

File Character Encodings

Hi,

I've been trying to build this but as the pom.xml doesn't contain the character encoding the build fails as it defaults on my machine to UTF-8. I found that you can set the character encoding with this method: http://stackoverflow.com/a/8979120

Are you able to tell me what character encoding is used or save the files as UTF-8? I've not managed to do this.

I'm happy to try and do this if you can provide any clues. After I get past this problem I hope to update it to work with the 0.90.0 beta as needed.

ik not found

es-version: 1.4.2
es-ik-version: 1.2.9

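Running mvn package produced the jar (I had already changed the es version in the pom to 1.4.2). I put it in the es plugin directory, copied the ik directory from config into the es config directory, and finally added the following at the end of elasticsearch.yml (copied directly from my configuration):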

index: 
  analysis: 
    analyzer: 
      ik: 
        alias: [ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_max_word: 
        type: ik
        use_smart: false
      ik_smart: 
        type: ik
        use_smart: true

The command I ran:


curl -XPOST http://localhost:9200/index/fulltext/_mapping -d'
{
    "fulltext": {
             "_all": {
            "indexAnalyzer": "ik",
            "searchAnalyzer": "ik",
            "term_vector": "no",
            "store": "false"
        },
        "properties": {
            "content": {
                "type": "string",
                "store": "no",
                "term_vector": "with_positions_offsets",
                "indexAnalyzer": "ik",
                "searchAnalyzer": "ik",
                "include_in_all": "true",
                "boost": 8
            }
        }
    }
}'

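It returns an error: {"error":"MapperParsingException[Analyzer [ik] not found for field [content]]","status":400}

From what I found, this error indicates a formatting problem in the yml configuration file.
I understand the yml indentation rules: a colon must be followed by a space, and indentation must use one or more spaces rather than tabs. I edited the file accordingly and also parsed it at http://yaml-online-parser.appspot.com/ , where it was fine.

Could you help me see what the problem is?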

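Question about IK segmentation of English

Does IK support segmenting pure English text, or does it need special configuration?
I am currently using IK as the analyzer for es. When I test a pure English sentence, it is returned in full.

elasticsearch [0.90.2]: error at runtime after installing ik

The ik version is 1.2.2. Do I need to downgrade?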

org.elasticsearch.indices.IndexCreationException: [twitter] failed to create index
    at org.elasticsearch.indices.InternalIndicesService.createIndex(InternalIndicesService.java:382)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewIndices(IndicesClusterStateService.java:296)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:162)
    at org.elasticsearch.cluster.service.InternalClusterService$2.run(InternalClusterService.java:321)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:95)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:679)
Caused by: org.elasticsearch.ElasticSearchIllegalArgumentException: failed to find analyzer type [ik] or tokenizer for [ik_smart]
    at org.elasticsearch.index.analysis.AnalysisModule.configure(AnalysisModule.java:372)
    at org.elasticsearch.common.inject.AbstractModule.configure(AbstractModule.java:60)
    at org.elasticsearch.common.inject.spi.Elements$RecordingBinder.install(Elements.java:201)
    at org.elasticsearch.common.inject.spi.Elements.getElements(Elements.java:82)
    at org.elasticsearch.common.inject.InjectorShell$Builder.build(InjectorShell.java:130)
    at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:99)
    at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:129)
    at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:66)
    at org.elasticsearch.indices.InternalIndicesService.createIndex(InternalIndicesService.java:380)
    ... 7 more
Caused by: org.elasticsearch.common.settings.NoClassSettingsException: Failed to load class setting [type] with value [ik]
    at org.elasticsearch.common.settings.ImmutableSettings.loadClass(ImmutableSettings.java:349)
    at org.elasticsearch.common.settings.ImmutableSettings.getAsClass(ImmutableSettings.java:337)
    at org.elasticsearch.index.analysis.AnalysisModule.configure(AnalysisModule.java:356)
    ... 15 more
Caused by: java.lang.ClassNotFoundException: org.elasticsearch.index.analysis.ik.IkAnalyzerProvider
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
    at org.elasticsearch.common.settings.ImmutableSettings.loadClass(ImmutableSettings.java:347)
    ... 17 more

Error when installing ik on 0.90.7

64-bit Ubuntu, es version 0.90.7.
I followed the installation instructions on the project home page and got the following error:

org.elasticsearch.indices.IndexCreationException: [twitter] failed to create index
at org.elasticsearch.indices.InternalIndicesService.createIndex(InternalIndicesService.java:316)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewIndices(IndicesClusterStateService.java:328)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:178)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:414)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:135)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.elasticsearch.ElasticSearchIllegalArgumentException: failed to find analyzer type [“ik”] or tokenizer for [ik]/
at org.elasticsearch.index.analysis.AnalysisModule.configure(AnalysisModule.java:372)
at org.elasticsearch.common.inject.AbstractModule.configure(AbstractModule.java:60)
at org.elasticsearch.common.inject.spi.Elements$RecordingBinder.install(Elements.java:204)
at org.elasticsearch.common.inject.spi.Elements.getElements(Elements.java:85)
at org.elasticsearch.common.inject.InjectorShell$Builder.build(InjectorShell.java:130)
at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:99)
at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:131)
at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:69)
at org.elasticsearch.indices.InternalIndicesService.createIndex(InternalIndicesService.java:314)
... 7 more
Caused by: org.elasticsearch.common.settings.NoClassSettingsException: Failed to load class setting [type] with value [“ik”]
at org.elasticsearch.common.settings.ImmutableSettings.loadClass(ImmutableSettings.java:349)
at org.elasticsearch.common.settings.ImmutableSettings.getAsClass(ImmutableSettings.java:337)
at org.elasticsearch.index.analysis.AnalysisModule.configure(AnalysisModule.java:356)
... 15 more
Caused by: java.lang.ClassNotFoundException: org.elasticsearch.index.analysis.“ik”.“ik”AnalyzerProvider
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.elasticsearch.common.settings.ImmutableSettings.loadClass(ImmutableSettings.java:347)
... 17 more

does not work under elastic search 0.90.5

In my mac environment, it shows:

[2013-11-10 17:03:29,145][ERROR][ik-analyzer ] ik-analyzer
java.io.FileNotFoundException: /usr/local/var/config/ik/IKAnalyzer.cfg.xml (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at org.wltea.analyzer.cfg.Configuration.<init>(Configuration.java:36)
at org.elasticsearch.index.analysis.IkTokenizerFactory.<init>(IkTokenizerFactory.java:22)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:54)
at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)
at org.elasticsearch.common.inject.FactoryProxy.get(FactoryProxy.java:52)
at org.elasticsearch.common.inject.InjectorImpl$5$1.call(InjectorImpl.java:781)
at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:837)
at org.elasticsearch.common.inject.InjectorImpl$5.get(InjectorImpl.java:777)
at org.elasticsearch.common.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:221)
at com.sun.proxy.$Proxy19.create(Unknown Source)
at org.elasticsearch.index.analysis.AnalysisService.<init>(AnalysisService.java:79)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:54)
at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)

After some debugging, I found that the cause is that the settings passed to IkAnalyzerProvider no longer contain the configFile information. I know little about Elasticsearch plugins, so I am not sure whether this is due to the different Elasticsearch version.

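ik is not recognized after restarting elasticsearch

elasticsearch version: 0.20.5
I created the directory analysis-ik under plugins and dropped elasticsearch-analysis-ik-1.1.4.jar into it.
After restarting, I ran curl -XPUT http://localhost:9200/index in the terminal and got this error: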
{"error":"IndexCreationException[[index] failed to create index]; nested: ElasticSearchIllegalArgumentException[failed to find analyzer type [ik] or tokenizer for [ik]]; nested: NoClassSettingsException[Failed to load class setting [type] with value [ik]]; nested: ClassNotFoundException[ik]; ","status":500}

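Dynamically updating the ik dictionary

Could you implement a feature that reloads the dictionary when its contents change? I saw that the idea you proposed on elasticsearch.cn is based on redis, which is a good idea. I have two suggestions:
1. For the dictionaries configured in IKAnalyzer.cfg.xml, a thread could periodically check each file's MD5 and reload the dictionary if it has changed. This would give dynamic dictionary loading.
2. If the dictionary is large, redis could be used to manage it centrally and update it dynamically. Three things would be needed, though: loading the data from the original dictionary files when redis first starts; loading the changed data into redis when a dictionary file changes; and, when new words are added directly to redis, writing them out to a text dictionary for backup and for reloading after a restart.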
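The first suggestion, periodically comparing a dictionary file's MD5 digest, can be sketched as below. This Python sketch illustrates the proposal only and says nothing about how the plugin works. The class name `DictionaryWatcher` and the `reload` callback are hypothetical, and the periodic scheduling is left to the caller.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


class DictionaryWatcher:
    """Call reload(path) whenever the file's MD5 digest changes."""

    def __init__(self, path: Path, reload: Callable[[Path], None]) -> None:
        self.path = path
        self.reload = reload
        self._digest: Optional[str] = None  # digest seen on the last check

    def check(self) -> bool:
        """Reload if the digest differs from the last one seen.

        The first call always reloads, because no digest is recorded yet.
        """
        digest = hashlib.md5(self.path.read_bytes()).hexdigest()
        if digest == self._digest:
            return False
        self._digest = digest
        self.reload(self.path)
        return True
```

A caller would invoke `check()` from a timer or background thread at whatever interval suits the deployment. Hashing the whole file on every check is simple but costs one full read each time, which is the trade-off the second suggestion tries to avoid for large dictionaries.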

The default setting of 'use_smart' is not false

According to the doc:
you can set your preferred segmentation mode; the default use_smart is false.

But IKSegmenter reads it with a default of "true":

this.useSmart = settings.get("use_smart", "true").equals("true");

If the default is meant to be 'false', shouldn't it be:

this.useSmart = settings.get("use_smart", "false").equals("true");
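The effect of the default value can be illustrated with a small Python analogue of the two Java lines quoted in this report. A plain dictionary stands in for the plugin's settings object here; that substitution and the function name `use_smart` are assumptions made for illustration.

```python
def use_smart(settings: dict, default: str) -> bool:
    """Mimic settings.get("use_smart", default).equals("true")."""
    return settings.get("use_smart", default) == "true"


# With no explicit setting, the result is decided entirely by the default.
print(use_smart({}, "true"))   # True
print(use_smart({}, "false"))  # False

# An explicit setting overrides either default.
print(use_smart({"use_smart": "false"}, "true"))  # False
```

This shows why the reported line disagrees with the documentation: when `use_smart` is absent, a default of "true" turns smart mode on.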

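[0.20.2] Plugin installation failed...

ik max_word mode drops single characters during segmentation

The ik analyzer drops single characters when segmenting in max_word mode, so searches (which use smart mode) return no results.
For example:
Suppose the dictionary contains the two words "开户" and "户数" but not "开户数".
At index time, max_word mode segments "开户数" into "开户" and "户数"; the single characters "数" and "开" are missing.
At search time, smart mode segments "开户数" into "开" and "户数". The search-time tokens are therefore not a subset of the index-time tokens, and searching for "开户数" returns nothing.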
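The mismatch in this example can be checked with plain set operations. The following minimal Python sketch uses only the token lists quoted in this report; it does not call the analyzer, so the tokens are taken from the report rather than computed.

```python
# Tokens as stated in the report, not produced by running the analyzer.
index_tokens = {"开户", "户数"}   # max_word output at index time
search_tokens = {"开", "户数"}    # smart output at search time

# Tokens the query asks for that were never indexed.
missing = search_tokens - index_tokens

print(search_tokens <= index_tokens)  # False
print(missing)                        # {'开'}
```

Any query that requires every search token to be present will fail on the token reported as missing.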

Dictionary class Multi Thread NullPointerException issue

The double-checked locking used in the initial method of the Dictionary class is flawed: it does not guarantee that the Dictionary singleton is fully initialized in a multi-threaded environment. As a result, a later call to isStopWord threw a NullPointerException. After changing initial to use the synchronized keyword at method level, the problem went away. The modified code is:
public static synchronized Dictionary initial(Configuration cfg) {
    if (singleton == null) {
        singleton = new Dictionary();
        singleton.configuration = cfg;
        singleton.loadMainDict();
        singleton.loadSurnameDict();
        singleton.loadQuantifierDict();
        singleton.loadSuffixDict();
        singleton.loadPrepDict();
        singleton.loadStopWordDict();
    }
    return singleton;
}
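The same idea, holding a lock around the whole initialization so that no thread can observe a partially built singleton, can be sketched in Python. This is an analogue for illustration rather than the plugin's code: the `Dictionary` class below is a stand-in, and its `load` step and `stop_words` field are hypothetical.

```python
import threading


class Dictionary:
    _singleton = None
    _lock = threading.Lock()

    def __init__(self, cfg: dict) -> None:
        self.configuration = cfg
        self.stop_words = set()

    def load(self) -> None:
        # Stand-in for loadMainDict(), loadStopWordDict(), and the other loaders.
        self.stop_words = set(self.configuration.get("stop_words", []))

    @classmethod
    def initial(cls, cfg: dict) -> "Dictionary":
        # Hold the lock for the whole check-and-build sequence, and publish
        # the instance only after it is fully loaded.
        with cls._lock:
            if cls._singleton is None:
                instance = cls(cfg)
                instance.load()
                cls._singleton = instance
            return cls._singleton
```

Unlike the Java version above, this sketch assigns the shared reference only after loading has finished, which adds a second layer of protection: even a reader that skipped the lock could not see a half-loaded object.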

curl -XPUT http://localhost:9200/index

[root@es elasticsearch]# curl -XPUT http://localhost:9200/index
{"error":"IndexCreationException[[index] failed to create index]; nested: ElasticsearchIllegalArgumentException[failed to find analyzer type [ik] or tokenizer for [ik_smart]]; nested: NoClassSettingsException[Failed to load class setting [type] with value [ik]]; nested: ClassNotFoundException[org.elasticsearch.index.analysis.ik.IkAnalyzerProvider]; ","status":400}

Cannot install manually on ES 1.0.1

Hello,

I am not using RTF, and my manual installation of ik 1.2.5 keeps failing.
What could be the reason?

My environment:

[vagrant@devc elasticsearch-1.0.1]$ find ./plugins/analysis-ik
./plugins/analysis-ik
./plugins/analysis-ik/elasticsearch-analysis-ik-1.2.5.jar
[vagrant@devcloud2 elasticsearch-1.0.1]$ find ./config/ik
./config/ik
./config/ik/custom
./config/ik/custom/ext_stopword.dic
./config/ik/custom/mydict.dic
./config/ik/custom/single_word_full.dic
./config/ik/custom/single_word_low_freq.dic
./config/ik/custom/sougou.dic
./config/ik/IKAnalyzer.cfg.xml
./config/ik/main.dic
./config/ik/preposition.dic
./config/ik/quantifier.dic
./config/ik/stopword.dic
./config/ik/suffix.dic
./config/ik/surname.dic

In elasticsearch.yml:

index:
  analysis:
    analyzer:
      ik:
          type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_max_word:
          type: ik
          use_smart: false
      ik_smart:
          type: ik
          use_smart: true

Log shown when ES starts:

[2014-03-12 10:18:19,965][INFO ][plugins                  ] [Chief Examiner] loaded [analysis-smartcn, analysis-kuromoji, analysis-icu], sites [HQ, inquisitor]

analysis-ik is not listed, and a class not found error follows:

Caused by: java.lang.ClassNotFoundException: org.elasticsearch.index.analysis.ik.IkAnalyzerProvider
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at org.elasticsearch.common.settings.ImmutableSettings.loadClass(ImmutableSettings.java:446)
    ... 17 more

What could be the cause?

Thank you!

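Missing class file reported after installation

Installing with the plugin command fails.

So I built the source with mvn package and copied the resulting jar to the plugins directory. The plugin loads successfully, but using it reports a missing class file. mvn package does produce class files, but which directory should they go in? I have never worked with Java before, so any help is appreciated.
Environment:
centos6.5
installed via yum
elasticsearch 1.3.4 java-1.7.0-openjdk java-1.7.0-openjdk-devel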

Hello, I need help with a problem installing IK

mvn package fails with an error. Could you explain why?

[root@poplar elasticsearch-analysis-ik-master]# mvn package
/usr/java/jdk1.7.0_65
[INFO] Scanning for projects...
[ERROR] The build could not read 1 project -> [Help 1]
[ERROR]
[ERROR] The project org.elasticsearch:elasticsearch-analysis-ik:1.2.9 (/tmp/elasticsearch-analysis-ik-master/pom.xml) has 1 error
[ERROR] Non-parseable POM /root/.m2/repository/org/sonatype/oss/oss-parent/7/oss-parent-7.pom: Expected root element 'project' but found 'html' (position: START_TAG seen ... @1:6) @ /root/.m2/repository/org/sonatype/oss/oss-parent/7/oss-parent-7.pom, line 1, column 6 -> [Help 2]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
[ERROR] [Help 2] http://cwiki.apache.org/confluence/display/MAVEN/ModelParseException

It reports a POM parsing error. This is how I installed IK:

wget https://github.com/medcl/elasticsearch-analysis-ik/archive/master.zip
unzip master.zip
cd elasticsearch-analysis-ik/
mvn package

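The only change I made to pom.xml was the JDK version: I changed it to 1.7, got the error, changed it back to 1.6, and still got the error. Could you tell me how to install it? Thanks.

My Maven version: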

[root@poplar elasticsearch-analysis-ik-master]# mvn -v
/usr/java/jdk1.7.0_65
Apache Maven 3.0.4 (rNON-CANONICAL_2013-04-08_07-49_mockbuild; 2013-04-08 15:49:58+0800)
Maven home: /usr/share/maven
Java version: 1.7.0_65, vendor: Oracle Corporation
Java home: /usr/java/jdk1.7.0_65/jre
Default locale: zh_CN, platform encoding: UTF-8
OS name: "linux", version: "3.9.10-100.fc17.x86_64", arch: "amd64", family: "unix"

plugin class not found error

Version: elasticsearch-0.90.1

Error:

Caused by: org.elasticsearch.ElasticSearchIllegalArgumentException: failed to find analyzer type [ik] or tokenizer for [ik_max_word]
        at org.elasticsearch.index.analysis.AnalysisModule.configure(AnalysisModule.java:372)
        at org.elasticsearch.common.inject.AbstractModule.configure(AbstractModule.java:60)
        at org.elasticsearch.common.inject.spi.Elements$RecordingBinder.install(Elements.java:201)
        at org.elasticsearch.common.inject.spi.Elements.getElements(Elements.java:82)
        at org.elasticsearch.common.inject.InjectorShell$Builder.build(InjectorShell.java:130)
        at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:99)
        at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:129)
        at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:66)
        at org.elasticsearch.indices.InternalIndicesService.createIndex(InternalIndicesService.java:380)
        ... 6 more
Caused by: org.elasticsearch.common.settings.NoClassSettingsException: Failed to load class setting [type] with value [ik]
        at org.elasticsearch.common.settings.ImmutableSettings.loadClass(ImmutableSettings.java:348)
        at org.elasticsearch.common.settings.ImmutableSettings.getAsClass(ImmutableSettings.java:336)
        at org.elasticsearch.index.analysis.AnalysisModule.configure(AnalysisModule.java:356)
        ... 14 more
Caused by: java.lang.ClassNotFoundException: org.elasticsearch.index.analysis.ik.IkAnalyzerProvider
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
        at org.elasticsearch.common.settings.ImmutableSettings.loadClass(ImmutableSettings.java:346)
        ... 16 more

config:

path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch/
path.plugins: /usr/share/elasticsearch/plugins

# ik configs
index:
  analysis:
    analyzer:
      ik:
          type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_max_word:
          type: ik
          use_smart: false
      ik_smart:
          type: ik
          use_smart: true

Plugin location:

$ find /usr/share/elasticsearch/plugins/
/usr/share/elasticsearch/plugins/
/usr/share/elasticsearch/plugins/analysis-ik
/usr/share/elasticsearch/plugins/analysis-ik/elasticsearch-analysis-ik-1.2.1.jar

java.lang.NoClassDefFoundError: org/apache/http/client/ClientProtocolException under 1.4.0

[2014-11-29 17:38:56,633][DEBUG][action.admin.indices.create] [Tempest] [wf] failed to create
org.elasticsearch.indices.IndexCreationException: [wf] failed to create index
at org.elasticsearch.indices.InternalIndicesService.createIndex(InternalIndicesService.java:301)
at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$2.execute(MetaDataCreateIndexService.java:382)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:329)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError: org/apache/http/client/ClientProtocolException
at org.elasticsearch.index.analysis.IkTokenizerFactory.<init>(IkTokenizerFactory.java:25)
at sun.reflect.GeneratedConstructorAccessor7.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:54)
at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)
at org.elasticsearch.common.inject.FactoryProxy.get(FactoryProxy.java:52)
at org.elasticsearch.common.inject.InjectorImpl$5$1.call(InjectorImpl.java:781)
at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:837)
at org.elasticsearch.common.inject.InjectorImpl$5.get(InjectorImpl.java:777)
at org.elasticsearch.common.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:221)
at com.sun.proxy.$Proxy16.create(Unknown Source)
at org.elasticsearch.index.analysis.AnalysisService.<init>(AnalysisService.java:82)
at sun.reflect.GeneratedConstructorAccessor6.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:54)
at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)
at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:45)
at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:837)
at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:42)
at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:57)
at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:45)
at org.elasticsearch.common.inject.SingleParameterInjector.inject(SingleParameterInjector.java:42)
at org.elasticsearch.common.inject.SingleParameterInjector.getAll(SingleParameterInjector.java:66)
at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:85)
at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)
at org.elasticsearch.common.inject.FactoryProxy.get(FactoryProxy.java:52)
at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:45)
at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:837)
at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:42)
at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:57)
at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:45)
at org.elasticsearch.common.inject.SingleParameterInjector.inject(SingleParameterInjector.java:42)
at org.elasticsearch.common.inject.SingleParameterInjector.getAll(SingleParameterInjector.java:66)
at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:85)
at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)
at org.elasticsearch.common.inject.FactoryProxy.get(FactoryProxy.java:52)
at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:45)
at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:837)
at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:42)
at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:57)
at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:45)
at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:200)
at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:193)
at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:830)
at org.elasticsearch.common.inject.InjectorBuilder.loadEagerSingletons(InjectorBuilder.java:193)
at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:175)
at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:110)
at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:131)
at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:69)
at org.elasticsearch.indices.InternalIndicesService.createIndex(InternalIndicesService.java:299)
... 6 more
Caused by: java.lang.ClassNotFoundException: org.apache.http.client.ClientProtocolException
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 60 more
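
In a chained Java stack trace like this one, the last `Caused by:` line is the innermost failure. Here it is a `ClassNotFoundException` for `org.apache.http.client.ClientProtocolException`, which suggests the Apache HttpClient classes are not on the plugin's classpath. As a rough sketch for pulling that line out of a long log (the function name `root_cause` is made up for illustration):

```python
import re

# Matches lines such as "Caused by: java.lang.Foo: message" (the message is optional).
_CAUSED_BY = re.compile(r"^\s*Caused by:\s*([\w.$]+)(?::\s*(.*))?$")


def root_cause(trace):
    """Return (exception_class, message) of the last 'Caused by:' line, or None."""
    cause = None
    for line in trace.splitlines():
        match = _CAUSED_BY.match(line)
        if match:
            # Keep overwriting so that the last match in the trace wins.
            cause = (match.group(1), match.group(2) or "")
    return cause
```

The returned class name can then be looked up in the jars under the plugin directory to see whether the dependency was packaged.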

Version, branch, and description

master | 0.20.2 → master
1.1.3 | 0.20.2 → master

For the 1.14 branch, the table above needs to be changed to:

master | 0.90.x → master
1.1.4 | 0.90.x → master
1.1.3 | 0.20.2

Please fix this when you have time, thanks.

ik Connection refused

A build from the master branch causes Marvel to fail to connect.
A build from v1.2.6 works without problems.
[2015-01-21 12:14:25,450][ERROR][marvel.agent.exporter ] [Scott Washington] error connecting to [127.0.0.1:9200]
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:996)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
at org.elasticsearch.marvel.agent.exporter.ESExporter.openConnection(ESExporter.java:325)
at org.elasticsearch.marvel.agent.exporter.ESExporter.openExportingConnection(ESExporter.java:182)
at org.elasticsearch.marvel.agent.exporter.ESExporter.exportXContent(ESExporter.java:248)
at org.elasticsearch.marvel.agent.exporter.ESExporter.exportEvents(ESExporter.java:161)
at org.elasticsearch.marvel.agent.AgentService$ExportingWorker.exportEvents(AgentService.java:305)
at org.elasticsearch.marvel.agent.AgentService$ExportingWorker.run(AgentService.java:267)
at java.lang.Thread.run(Thread.java:745)

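Handling mixed Chinese-English words from the dictionary

The final IKAnalyzer2012 release could handle them. I don't know why this version cannot handle mixed Chinese-English words; I compared the core code and found no differences.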

Plugin failing to install due to SSL issues

This command is failing:

plugin -i medcl/elasticsearch-analysis-ik/1.2.6 -u https://github.com/medcl/elasticsearch-rtf/raw/master/plugins/analysis-ik/elasticsearch-analysis-ik-1.2.6.jar

The response:

-> Installing medcl/elasticsearch-analysis-ik/1.2.6...
Trying https://github.com/medcl/elasticsearch-rtf/raw/master/plugins/analysis-ik/elasticsearch-analysis-ik-1.2.6.jar...
Failed: SSLProtocolException[handshake alert:  unrecognized_name]
Trying http://download.elasticsearch.org/medcl/elasticsearch-analysis-ik/elasticsearch-analysis-ik-1.2.6.zip...
Trying http://search.maven.org/remotecontent?filepath=medcl/elasticsearch-analysis-ik/1.2.6/elasticsearch-analysis-ik-1.2.6.zip...
Trying https://oss.sonatype.org/service/local/repositories/releases/content/medcl/elasticsearch-analysis-ik/1.2.6/elasticsearch-analysis-ik-1.2.6.zip...
Trying https://github.com/medcl/elasticsearch-analysis-ik/archive/v1.2.6.zip...
Trying https://github.com/medcl/elasticsearch-analysis-ik/archive/master.zip...
Downloading ....................DONE
Installed medcl/elasticsearch-analysis-ik/1.2.6 into /usr/local/var/lib/elasticsearch/plugins/analysis-ik

Error while installing plugin, reason: IllegalArgumentException: Plugin installation assumed to be site plugin, but contains source code, aborting installation.

I dug a little deeper and tried to download the .jar with curl, and discovered it only worked when SSLv3 was forced.

The following command fails:

curl -o es-analysis-ik.jar -L https://github.com/medcl/elasticsearch-rtf/raw/master/plugins/analysis-ik/elasticsearch-analysis-ik-1.2.6.jar

curl: (35) error:14077458:SSL routines:SSL23_GET_SERVER_HELLO:reason(1112)

But works when forcing SSLv3:

curl -3 -o es-analysis-ik.jar -L https://github.com/medcl/elasticsearch-rtf/raw/master/plugins/analysis-ik/elasticsearch-analysis-ik-1.2.6.jar

Any idea how I can get this to install through the ES plugin binary? We had previously supported this plugin at Qbox.io, but I've had to remove it temporarily due to this issue.

org.elasticsearch.ElasticSearchIllegalArgumentException: failed to find analyzer

Hi,

I am using elasticsearch 0.19.0 and installed ik analyzer as plugin:
./plugin -install medcl/elasticsearch-analysis-ik/1.0.0

My elasticsearch.yaml is like the following:

index:
  analysis:
    analyzer:
      ik:
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider

I also downloaded the ik analyzer zip file and unzipped it into the %ESHOME%\config\ folder.

But when I issue the following command, an ElasticSearchIllegalArgumentException is thrown:

curl -XGET "localhost:9200/_analyze?analyzer=ik" -d "this is a test"

org.elasticsearch.ElasticSearchIllegalArgumentException: failed to find analyzer [ik]
at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.shardOperation(TransportAnalyzeAction.java:147)
at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.shardOperation(TransportAnalyzeAction.java:57)
at org.elasticsearch.action.support.single.custom.TransportSingleCustomOperationAction$AsyncSingleAction$1.run(TransportSingleCustomOperationAction.java:143)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

It seems that ES cannot find the ik analyzer.

What might be the issue here?

Thanks.

analysis-ik loads successfully in Elasticsearch 1.2.1, but reports Analyzer [ik] not found?

./bin/elasticsearch

[2014-07-08 15:09:05,884][INFO ][node ] [test-01] version[1.2.1], pid[16965], build[6c95b75/2014-06-03T15:02:52Z]
[2014-07-08 15:09:05,884][INFO ][node ] [test-01] initializing ...
[2014-07-08 15:09:05,898][INFO ][plugins ] [test-01] loaded [marvel, analysis-smartcn, analysis-ik, analysis-mmseg], sites [marvel, kopf]
[2014-07-08 15:09:07,656][INFO ][node ] [test-01] initialized
[2014-07-08 15:09:07,656][INFO ][node ] [test-01] starting ...
[2014-07-08 15:09:07,730][INFO ][transport ] [test-01] bound_address {inet[/0:0:0:0:0:0:0:0:9301]}, publish_address {inet[/192.168.16.128:9301]}
[2014-07-08 15:09:10,761][INFO ][cluster.service ] [test-01] new_master [test-01][qBp7LwCZSnew5ElUaz4u4Q][testdeMacBook-Pro.local][inet[/192.168.16.128:9301]], reason: zen-disco-join (elected_as_master)
[2014-07-08 15:09:10,782][INFO ][discovery ] [test-01] elasticsearch/qBp7LwCZSnew5ElUaz4u4Q
[2014-07-08 15:09:10,817][INFO ][http ] [test-01] bound_address {inet[/0:0:0:0:0:0:0:0:9201]}, publish_address {inet[/192.168.16.128:9201]}
[2014-07-08 15:09:11,366][INFO ][ik-analyzer ] [Dict Loading]ik/custom/mydict.dic
[2014-07-08 15:09:11,367][INFO ][ik-analyzer ] [Dict Loading]ik/custom/single_word_low_freq.dic
[2014-07-08 15:09:11,373][INFO ][ik-analyzer ] [Dict Loading]ik/custom/ext_stopword.dic
[2014-07-08 15:09:11,393][INFO ][mmseg-analyzer ] chars loaded time=18ms, line=14861, on file=chars.dic
[2014-07-08 15:09:11,394][INFO ][mmseg-analyzer ] words loaded time=1ms, line=3, on file=words-my.dic
[2014-07-08 15:09:11,654][INFO ][mmseg-analyzer ] words loaded time=260ms, line=263638, on file=words.dic
[2014-07-08 15:09:11,654][INFO ][mmseg-analyzer ] load all dic use time=279ms
[2014-07-08 15:09:11,654][INFO ][mmseg-analyzer ] unit loaded time=0ms, line=35, on file=units.dic
[2014-07-08 15:09:11,978][INFO ][gateway ] [test-01] recovered [8] indices into cluster_state
[2014-07-08 15:09:11,980][INFO ][node ] [test-01] started

curl -XPOST http://localhost:9200/index/fulltext/_mapping -d'
{
    "fulltext": {
        "_all": {
            "indexAnalyzer": "ik",
            "searchAnalyzer": "ik",
            "term_vector": "no",
            "store": "false"
        },
        "properties": {
            "content": {
                "type": "string",
                "store": "no",
                "term_vector": "with_positions_offsets",
                "indexAnalyzer": "ik",
                "searchAnalyzer": "ik",
                "include_in_all": "true",
                "boost": 8
            }
        }
    }
}'

{"error":"MapperParsingException[Analyzer [ik] not found for field [content]]","status":400}

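Loading ik on 1.2.1 has no effect

Elasticsearch 1.2.1 with ik 1.2.7 built with Maven.
After restarting the service, the log shows the plugin being loaded, but no dictionary loading appears. Using the analyzer returns:
{"error":"ElasticsearchIllegalArgumentException[failed to find analyzer [ik]]","status":400}

Contents of elasticsearch.yml: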

index:
  analysis:
    analyzer:
      ik:
        alias: [ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_max_word:
        type: ik
        use_smart: false
      ik_smart:
        type: ik
        use_smart: true

Startup log:

[2014-07-19 03:53:29,274][INFO ][node                     ] [node01] version[1.2.1], pid[1], build[6c95b75/2014-06-03T15:02:52Z]
[2014-07-19 03:53:29,275][INFO ][node                     ] [node01] initializing ...
[2014-07-19 03:53:29,294][INFO ][plugins                  ] [node01] loaded [analysis-ik, analysis-smartcn], sites []
[2014-07-19 03:53:32,463][INFO ][node                     ] [node01] initialized
[2014-07-19 03:53:32,464][INFO ][node                     ] [node01] starting ...
[2014-07-19 03:53:32,551][INFO ][transport                ] [node01] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/172.17.0.42:9300]}
[2014-07-19 03:53:35,595][INFO ][cluster.service          ] [node01] new_master [node01][GOoFWmPoTuWWQ98w_OwaHA][b5dc94d97881][inet[/172.17.0.42:9300]], reason: zen-disco-join (elected_as_master)
[2014-07-19 03:53:35,627][INFO ][discovery                ] [node01] ump_es/GOoFWmPoTuWWQ98w_OwaHA
[2014-07-19 03:53:35,653][INFO ][http                     ] [node01] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/172.17.0.42:9200]}
[2014-07-19 03:53:35,662][INFO ][gateway                  ] [node01] recovered [0] indices into cluster_state
[2014-07-19 03:53:35,663][INFO ][node                     ] [node01] started

Can't load custom dict

IKAnalyzer.cfg.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">  
<properties>  
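        <comment>IK Analyzer extension configuration</comment>
        <!-- Users can configure their own extension dictionaries here -->
        <entry key="ext_dict">custom/mydict.dic;custom/sougou.dict</entry>
        <!-- Users can configure their own extension stopword dictionaries here -->
        <entry key="ext_stopwords">custom/ext_stopword.dic</entry>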
</properties>

elasticsearch startup logs:

[2013-08-09 14:00:18,427][INFO ][cluster.routing.allocation.decider] [elephant] updating [cluster.routing.allocation.disable_allocation] from [true] to [false]
[2013-08-09 14:00:19,016][INFO ][ik-analyzer              ] [Dict Loading] ik/IKAnalyzer.cfg.xml
[2013-08-09 14:00:19,546][INFO ][ik-analyzer              ] [Dict Loading] /home/vincent/es/config/ik/main.dic,MainDict Size:7094
[2013-08-09 14:00:21,107][INFO ][ik-analyzer              ] [Dict Loading] /home/vincent/es/config/ik/main.dic,MainDict Size:12368
[2013-08-09 14:00:21,123][INFO ][ik-analyzer              ] [Dict Loading] /home/vincent/es/config/ik/surname.dic,SurnameDict Size:12368
[2013-08-09 14:00:21,127][INFO ][ik-analyzer              ] [Dict Loading] /home/vincent/es/config/ik/quantifier.dic,QuantifierDict Size:12368
[2013-08-09 14:00:21,140][INFO ][ik-analyzer              ] [Dict Loading] /home/vincent/es/config/ik/suffix.dic,SuffixDict Size:12369
[2013-08-09 14:00:21,161][INFO ][ik-analyzer              ] [Dict Loading] /home/vincent/es/config/ik/preposition.dic,PrepDict Size:12369
[2013-08-09 14:00:21,163][INFO ][ik-analyzer              ] [Dict Loading] /home/vincent/es/config/ik/stopword.dic,Stopwords Size:12369
[2013-08-09 14:00:21,173][INFO ][ik-analyzer              ] [Dict Loading] /home/vincent/es/config/ik/custom/ext_stopword.dic,Stopwords Size:12369

According to the logs, neither custom/mydict.dic nor custom/sougou.dict was loaded.

I tried:

http://localhost:9200/test/_analyze?text=万科&analyzer=ik

It does not work as I expected. 万科 is a custom word in mydict.dic.
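
The log lists `custom/ext_stopword.dic` but neither file named in `ext_dict`, so a first step is to confirm that each configured file exists where the plugin would look for it. The sketch below reads `IKAnalyzer.cfg.xml` and reports every `ext_dict` and `ext_stopwords` entry. It assumes that entries are resolved relative to the directory holding the config file, as the paths in the log suggest; the function name `list_ext_dicts` is hypothetical and not part of the plugin.

```python
import os
import xml.etree.ElementTree as ET


def list_ext_dicts(cfg_path):
    """Return (key, path, exists) for every file named in ext_dict / ext_stopwords."""
    # Assumption: entries are resolved relative to the directory holding the cfg file.
    base_dir = os.path.dirname(os.path.abspath(cfg_path))
    results = []
    for entry in ET.parse(cfg_path).getroot().iter("entry"):
        key = entry.get("key")
        if key not in ("ext_dict", "ext_stopwords"):
            continue
        # Several files may be listed in one entry, separated by ';'.
        for rel in (entry.text or "").split(";"):
            rel = rel.strip()
            if rel:
                path = os.path.join(base_dir, rel)
                results.append((key, path, os.path.isfile(path)))
    return results
```

Running it against the config above would show, for instance, whether a file really named `sougou.dict` (rather than `sougou.dic`) is present under `custom/`.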

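Cannot get it to run properly under 1.4

As the title says.

Under ES 1.4, neither the version installed with plugin -install nor a package I built myself with mvn runs properly.

Could you put together an RTF release for 1.4? ^_^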

Question about ik use_smart

index:
  analysis:
    analyzer:
      ik:
        alias: [ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_max_word:
        type: ik
        use_smart: false
      ik_smart:
        type: ik
        use_smart: true
index.analysis.analyzer.default.type: ik

ES version: 1.4.0, plugin version: 1.2.9

I have configured the settings above in elasticsearch.yml. Why is the tokenization result still the following?

Input: 北京一日游

{
    "tokens": [
        {
            "token": "北京",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "京",
            "start_offset": 1,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 2
        },
        {
            "token": "一日游",
            "start_offset": 2,
            "end_offset": 5,
            "type": "CN_WORD",
            "position": 3
        },
        {
            "token": "一日",
            "start_offset": 2,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 4
        },
        {
            "token": "一",
            "start_offset": 2,
            "end_offset": 3,
            "type": "TYPE_CNUM",
            "position": 5
        },
        {
            "token": "日",
            "start_offset": 3,
            "end_offset": 4,
            "type": "COUNT",
            "position": 6
        },
        {
            "token": "游",
            "start_offset": 4,
            "end_offset": 5,
            "type": "CN_WORD",
            "position": 7
        }
    ]
}
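
The output above contains tokens whose character ranges overlap, for example 北京 at offsets 0 to 2 and 京 at offsets 1 to 2. That is what fine-grained segmentation looks like, whereas `use_smart: true` would be expected to produce coarser tokens that do not overlap. Assuming that holds for the plugin version in use, a small helper can flag overlaps in an `_analyze` response to tell which mode actually ran. The function name `overlapping_tokens` is made up for this sketch.

```python
def overlapping_tokens(tokens):
    """Return pairs of token strings whose character ranges overlap."""
    ordered = sorted(tokens, key=lambda t: (t["start_offset"], t["end_offset"]))
    overlaps = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second["start_offset"] >= first["end_offset"]:
                break  # sorted by start, so no later token can overlap `first`
            overlaps.append((first["token"], second["token"]))
    return overlaps
```

Passing it the `tokens` list from the response gives a non-empty result for the output shown here, which is consistent with the default analyzer running in fine-grained mode rather than in smart mode.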

Need cmd to do dictionary reload

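@medcl Since we are all Chinese speakers here, I wrote this in Chinese...

Every time the dictionary needs to be reloaded, I have to restart the ES nodes one by one, which is quite tedious. Could you provide an API command so that a dictionary reload can be triggered directly by an HTTP request? Or does one already exist and I just have not found it? (I admit I only checked that the keyword "reload" does not appear in the source code.)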
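
Later versions of the plugin's README describe a hot-update mechanism: a `remote_ext_dict` entry in `IKAnalyzer.cfg.xml` points at an HTTP URL, and the plugin fetches the word list again when the `Last-Modified` or `ETag` response header changes. Whether this is available depends on the plugin version in use. As a rough sketch of the server side under that assumption, the helper below computes the two header values for a dictionary file with one word per line. `dict_headers` is a hypothetical name, not something the plugin provides.

```python
import hashlib
import os
from email.utils import formatdate


def dict_headers(dict_path):
    """Build Last-Modified and ETag header values for a dictionary file."""
    with open(dict_path, "rb") as f:
        body = f.read()
    return {
        # The file's modification time in HTTP-date format (GMT).
        "Last-Modified": formatdate(os.path.getmtime(dict_path), usegmt=True),
        # A content hash, so the value changes whenever the word list changes.
        "ETag": '"%s"' % hashlib.sha256(body).hexdigest(),
    }
```

Any web server that returns these two headers together with the file body should do; the point is only that editing the file changes at least one of them.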
