medcl / elasticsearch-analysis-mmseg

The Mmseg Analysis plugin integrates the Lucene mmseg4j analyzer (http://code.google.com/p/mmseg4j/) into Elasticsearch and supports customized dictionaries.

License: Apache License 2.0

Java 100.00%

elasticsearch-analysis-mmseg's People

Contributors

defp, medcl, nicozhang, zeeshanasghar


elasticsearch-analysis-mmseg's Issues

path.conf setting ignored

https://www.elastic.co/guide/en/elasticsearch/reference/2.2/breaking_20_setting_changes.html#_custom_config_file

The Elasticsearch plugin install script honors the path.conf setting and installs the config into the customized location.

However, the analysis-mmseg plugin ignores this setting and assumes the config is installed in a directory inside the plugin's own location.

So when a custom config location is used, the plugin fails. The workaround is to symlink the config, but I think the plugin should read the config file location from the settings.

Thanks
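A minimal sketch of the behavior the reporter asks for, assuming the dictionaries should be resolved from the configured path.conf first and from the plugin directory only as a fallback. `DicPathResolver`, `resolveDicDir`, the `mmseg` sub-directory name, and the `config/mmseg` fallback layout are all hypothetical illustrations, not the plugin's actual code; inside a real plugin the config path would come from Elasticsearch's environment rather than a plain string.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: decides where the mmseg dictionaries should be loaded from.
// Assumption: dictionaries live in an "mmseg" sub-directory of the config dir.
final class DicPathResolver {
    private DicPathResolver() {}

    /**
     * @param pathConf  value of the node's path.conf setting, or null if unset
     * @param pluginDir the plugin's own installation directory
     * @return the directory that should contain chars.dic / words*.dic
     */
    static Path resolveDicDir(String pathConf, Path pluginDir) {
        if (pathConf != null && !pathConf.trim().isEmpty()) {
            // Honor a custom config location instead of assuming the plugin dir.
            return Paths.get(pathConf.trim()).resolve("mmseg").normalize();
        }
        // Fallback: the plugin-relative location the issue describes (layout assumed).
        return pluginDir.resolve("config").resolve("mmseg").normalize();
    }

    public static void main(String[] args) {
        Path plugin = Paths.get("/usr/share/elasticsearch/plugins/analysis-mmseg");
        System.out.println(resolveDicDir("/etc/elasticsearch", plugin));
        System.out.println(resolveDicDir(null, plugin));
    }
}
```

Keeping the fallback means existing installs that rely on the plugin-relative layout keep working, while a custom path.conf no longer needs a symlink.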

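Questions about installing the mmseg plugin

System information

  • Linux MyCentOS 2.6.32-431.23.3.el6.x86_64 #1 SMP Thu Jul 31 17:20:51 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
  • 4 cores, 8 GB RAM

Installation steps

Installed Elasticsearch 1.7.2 from the RPM package; the JDK (version 1.8) was installed via yum.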

./elasticsearch -v
Version: 1.7.2, Build: e43676b/2015-09-14T09:49:53Z, JVM: 1.8.0_51

Downloaded plugins/mmseg/elasticsearch-analysis-mmseg-1.4.0.jar from the 1.7.1 branch of elasticsearch-rtf and placed it in the corresponding directory, /usr/share/elasticsearch/plugins/mmseg/elasticsearch-analysis-mmseg-1.4.0.jar:

.
├── bin
│   ├── elasticsearch
│   ├── elasticsearch.in.sh
│   └── plugin
├── lib
│   ├── antlr-runtime-3.5.jar
│   ├── apache-log4j-extras-1.2.17.jar
│   ├── asm-4.1.jar
│   ├── asm-commons-4.1.jar
│   ├── elasticsearch-1.7.2.jar
│   ├── groovy-all-2.4.4.jar
│   ├── jna-4.1.0.jar
│   ├── jts-1.13.jar
│   ├── log4j-1.2.17.jar
│   ├── lucene-analyzers-common-4.10.4.jar
│   ├── lucene-core-4.10.4.jar
│   ├── lucene-expressions-4.10.4.jar
│   ├── lucene-grouping-4.10.4.jar
│   ├── lucene-highlighter-4.10.4.jar
│   ├── lucene-join-4.10.4.jar
│   ├── lucene-memory-4.10.4.jar
│   ├── lucene-misc-4.10.4.jar
│   ├── lucene-queries-4.10.4.jar
│   ├── lucene-queryparser-4.10.4.jar
│   ├── lucene-sandbox-4.10.4.jar
│   ├── lucene-spatial-4.10.4.jar
│   ├── lucene-suggest-4.10.4.jar
│   ├── sigar
│   │   ├── libsigar-amd64-linux.so
│   │   ├── libsigar-ia64-linux.so
│   │   ├── libsigar-x86-linux.so
│   │   └── sigar-1.6.4.jar
│   └── spatial4j-0.4.1.jar
├── LICENSE.txt
├── NOTICE.txt
├── plugins
│   └── analysis-mmseg
│       └── elasticsearch-analysis-ik-1.4.0.jar
└── README.textile

Downloaded config/mmseg from the 1.7.1 branch of elasticsearch-rtf and placed it at /etc/elasticsearch/config/mmseg:

/etc/elasticsearch/
├── elasticsearch.yml
├── logging.yml
└── mmseg
    ├── chars.dic
    ├── units.dic
    ├── words.dic
    └── words-my.dic

1 directory, 6 files

Following https://github.com/medcl/elasticsearch-analysis-mmseg, appended the following to the end of /etc/elasticsearch/elasticsearch.conf:

#for chinese
index:
  analysis:
    analyzer:
      mmseg:
        alias: [news_analyzer, mmseg_analyzer]
        type: org.elasticsearch.index.analysis.MMsegAnalyzerProvider
index.analysis.analyzer.default.type : "mmseg"

Restarted the service:

service elasticsearch restart
Stopping elasticsearch:                                    [  OK  ]
Starting elasticsearch:                                    [  OK  ]

curl  http://localhost:9200
{
  "status" : 200,
  "name" : "node-0",
  "cluster_name" : "search-srv",
  "version" : {
    "number" : "1.7.2",
    "build_hash" : "e43676b1385b8125d647f593f7202acbd816e8ec",
    "build_timestamp" : "2015-09-14T09:49:53Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}



Tried to create an index:

curl -XPUT http://localhost:9200/myindex

The response was:

{
    "error":"IndexCreationException[[myindex] failed to create index]; 
nested: ElasticsearchIllegalArgumentException[failed to find analyzer type [mmseg] or tokenizer for [mmseg]]; 
nested: NoClassSettingsException[Failed to load class setting [type] with value [mmseg]]; 
nested: ClassNotFoundException[org.elasticsearch.index.analysis.mmseg.MmsegAnalyzerProvider]; ",
    "status":400
}
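The class name in the ClassNotFoundException (org.elasticsearch.index.analysis.mmseg.MmsegAnalyzerProvider) is not the class named in the config (org.elasticsearch.index.analysis.MMsegAnalyzerProvider). It appears to be built from the type string mmseg itself, which is consistent with Elasticsearch finding no registered analyzer type called mmseg and falling back to guessing a class name. The sketch below only reproduces that naming pattern as it appears in the error; `FallbackClassName` and `guess` are hypothetical names, and this is not Elasticsearch's actual implementation.

```java
import java.util.Locale;

// Reproduces the class-name pattern visible in the error message above.
// Assumption: pattern = prefix package + lower-cased type + "." + Capitalized type + suffix.
final class FallbackClassName {
    private FallbackClassName() {}

    static String guess(String prefixPackage, String type, String suffix) {
        String capitalized = type.substring(0, 1).toUpperCase(Locale.ROOT) + type.substring(1);
        return prefixPackage + type.toLowerCase(Locale.ROOT) + "." + capitalized + suffix;
    }

    public static void main(String[] args) {
        // Prints org.elasticsearch.index.analysis.mmseg.MmsegAnalyzerProvider
        System.out.println(guess("org.elasticsearch.index.analysis.", "mmseg", "AnalyzerProvider"));
    }
}
```

If the plugin had registered the mmseg type, no such guessed class would be looked up, so the error points at the plugin not being loaded rather than at the analyzer settings.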

Here is the service startup log. Was the plugin not loaded? How should I configure it so that it loads?
@medcl

[2015-09-23 23:52:13,740][INFO ][node                     ] [node-0] stopping ...
[2015-09-23 23:52:13,762][INFO ][node                     ] [node-0] stopped
[2015-09-23 23:52:13,762][INFO ][node                     ] [node-0] closing ...
[2015-09-23 23:52:13,769][INFO ][node                     ] [node-0] closed
[2015-09-23 23:52:14,353][INFO ][node                     ] [node-0] version[1.7.2], pid[5814], build[e43676b/2015-09-14T09:49:53Z]
[2015-09-23 23:52:14,354][INFO ][node                     ] [node-0] initializing ...
[2015-09-23 23:52:14,436][INFO ][plugins                  ] [node-0] loaded [], sites []
.......................sth............................
[2015-09-23 23:53:06,285][DEBUG][action.admin.indices.create] [node-0] [myindex] failed to create
org.elasticsearch.indices.IndexCreationException: [myindex] failed to create index
    at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:338)
    at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$2.execute(MetaDataCreateIndexService.java:371)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:374)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:204)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:167)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: failed to find analyzer type [mmseg] or tokenizer for [mmseg]
    at org.elasticsearch.index.analysis.AnalysisModule.configure(AnalysisModule.java:372)
    at org.elasticsearch.common.inject.AbstractModule.configure(AbstractModule.java:60)
    at org.elasticsearch.common.inject.spi.Elements$RecordingBinder.install(Elements.java:204)
    at org.elasticsearch.common.inject.spi.Elements.getElements(Elements.java:85)
    at org.elasticsearch.common.inject.InjectorShell$Builder.build(InjectorShell.java:130)
    at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:99)
    at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:131)
    at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:69)
    at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:336)
    ... 7 more
Caused by: org.elasticsearch.common.settings.NoClassSettingsException: Failed to load class setting [type] with value [mmseg]
    at org.elasticsearch.common.settings.ImmutableSettings.loadClass(ImmutableSettings.java:476)
    at org.elasticsearch.common.settings.ImmutableSettings.getAsClass(ImmutableSettings.java:464)
    at org.elasticsearch.index.analysis.AnalysisModule.configure(AnalysisModule.java:356)
    ... 15 more
Caused by: java.lang.ClassNotFoundException: org.elasticsearch.index.analysis.mmseg.MmsegAnalyzerProvider
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)

However, running the plugin tool shows:

plugin -l
Installed plugins:
    - analysis-mmseg

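Did I miss a step somewhere? Is this a configuration problem, a version problem, or an installation problem? Any help is appreciated.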
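The startup log shows `loaded [], sites []`, i.e. no plugin was loaded, and the directory tree lists elasticsearch-analysis-ik-1.4.0.jar under plugins/analysis-mmseg rather than the mmseg jar. One quick way to confirm which jar is really in place is to check whether it contains the provider class named in the config. A minimal sketch using only java.util.jar; `JarClassCheck` and `containsClass` are hypothetical names.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Checks whether a jar file contains a given class (by its .class entry).
final class JarClassCheck {
    private JarClassCheck() {}

    static boolean containsClass(String jarPath, String className) throws IOException {
        // org.example.Foo -> org/example/Foo.class
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarClassCheck <jar> <fully.qualified.ClassName>");
            return;
        }
        System.out.println(containsClass(args[0], args[1]));
    }
}
```

For example, passing the jar under plugins/analysis-mmseg together with org.elasticsearch.index.analysis.MMsegAnalyzerProvider prints `false` if the wrong jar was copied.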

QueryShardException: failed to create query

I recently upgraded ES from 2.1.0 to 5.2.2, still using the analysis-mmseg plugin. I followed the procedure to rebuild the plugin and change the index settings, and after restarting ES everything seemed OK.
Here's the index mapping:

{
  "users" : {
    "mappings" : {
      "user" : {
        "dynamic" : "false",
        "properties" : {
          "analyzed_name" : {
            "type" : "text",
            "analyzer" : "mmseg_maxword"
          },
          "id" : {
            "type" : "integer"
          },
          "likes_count" : {
            "type" : "integer"
          },
          "nickname" : {
            "type" : "keyword",
            "copy_to" : [
              "analyzed_name"
            ]
          }
        }
      }
    }
  }
}

When I run a simple query, the output is inconsistent:

curl 'localhost:9200/users/_search?pretty'  -d '{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "nickname": {
              "query": "十分",
              "boost": 100
            }
          }
        },
        {
          "match_phrase": {
            "analyzed_name": {
              "query": "十分",
              "boost": 50,
              "slop": 10
            }
          }
        },
        {
          "prefix": {
            "analyzed_name": "十分"
          }
        },
        {
          "match": {
            "analyzed_name": "十分"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}'

Expected Output:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 35,
    "successful": 35,
    "failed": 0
  },
  "hits": {
    "total": 166,
    "max_score": 2012.748,
    "hits": [
      {
        "_index": "users",
        "_type": "user",
        "_id": "1210186",
        "_score": 2012.748,
        "_source": {
          "nickname": "十分",
          "likes_count": 1
        }
      },
      {
        "_index": "users",
        "_type": "user",
        "_id": "2466113",
        "_score": 734.9008,
        "_source": {
          "nickname": "十分十分",
          "likes_count": 1
        }
      },
      {
        "_index": "users",
        "_type": "user",
        "_id": "2493231",
        "_score": 626.3631,
        "_source": {
          "nickname": "十分十分er",
          "likes_count": 1
        }
      },
      {
        "_index": "users",
        "_type": "user",
        "_id": "47203",
        "_score": 533.75287,
        "_source": {
          "nickname": "十分tenmin",
          "likes_count": 1
        }
      },
      {
        "_index": "users",
        "_type": "user",
        "_id": "530063",
        "_score": 533.75287,
        "_source": {
          "nickname": "落日十分",
          "likes_count": 1
        }
      },
      {
        "_index": "users",
        "_type": "user",
        "_id": "3305185",
        "_score": 533.75287,
        "_source": {
          "nickname": "九点十分",
          "likes_count": 1
        }
      },
      {
        "_index": "users",
        "_type": "user",
        "_id": "3334990",
        "_score": 533.75287,
        "_source": {
          "nickname": "日出十分",
          "likes_count": 1
        }
      },
      {
        "_index": "users",
        "_type": "user",
        "_id": "4382556",
        "_score": 533.75287,
        "_source": {
          "nickname": "十分好奇",
          "likes_count": 1
        }
      },
      {
        "_index": "users",
        "_type": "user",
        "_id": "4441797",
        "_score": 533.75287,
        "_source": {
          "nickname": "黎明十分",
          "likes_count": 1
        }
      },
      {
        "_index": "users",
        "_type": "user",
        "_id": "2615409",
        "_score": 527.52295,
        "_source": {
          "nickname": "林十分",
          "likes_count": 1
        }
      }
    ]
  }
}

Actual result:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "query_shard_exception",
        "reason" : "failed to create query: {\n  \"bool\" : {\n    \"should\" : [\n      {\n        \"match_phrase\" : {\n          \"nickname\" : {\n            \"query\" : \"十分\",\n            \"slop\" : 0,\n            \"boost\" : 100.0\n          }\n        }\n      },\n      {\n        \"match_phrase\" : {\n          \"analyzed_name\" : {\n            \"query\" : \"十分\",\n            \"slop\" : 10,\n            \"boost\" : 50.0\n          }\n        }\n      },\n      {\n        \"prefix\" : {\n          \"analyzed_name\" : {\n            \"value\" : \"十分\",\n            \"boost\" : 1.0\n          }\n        }\n      },\n      {\n        \"match\" : {\n          \"analyzed_name\" : {\n            \"query\" : \"十分\",\n            \"operator\" : \"OR\",\n            \"prefix_length\" : 0,\n            \"max_expansions\" : 50,\n            \"fuzzy_transpositions\" : true,\n            \"lenient\" : false,\n            \"zero_terms_query\" : \"NONE\",\n            \"boost\" : 1.0\n          }\n        }\n      }\n    ],\n    \"disable_coord\" : false,\n    \"adjust_pure_negative\" : true,\n    \"minimum_should_match\" : \"1\",\n    \"boost\" : 1.0\n  }\n}",
        "index_uuid" : "G5kCglksQQmnc21W7nJnAw",
        "index" : "users"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "users",
        "node" : "HDqZAuQORYeeLHmmhsEMTQ",
        "reason" : {
          "type" : "query_shard_exception",
          "reason" : "failed to create query: {\n  \"bool\" : {\n    \"should\" : [\n      {\n        \"match_phrase\" : {\n          \"nickname\" : {\n            \"query\" : \"十分\",\n            \"slop\" : 0,\n            \"boost\" : 100.0\n          }\n        }\n      },\n      {\n        \"match_phrase\" : {\n          \"analyzed_name\" : {\n            \"query\" : \"十分\",\n            \"slop\" : 10,\n            \"boost\" : 50.0\n          }\n        }\n      },\n      {\n        \"prefix\" : {\n          \"analyzed_name\" : {\n            \"value\" : \"十分\",\n            \"boost\" : 1.0\n          }\n        }\n      },\n      {\n        \"match\" : {\n          \"analyzed_name\" : {\n            \"query\" : \"十分\",\n            \"operator\" : \"OR\",\n            \"prefix_length\" : 0,\n            \"max_expansions\" : 50,\n            \"fuzzy_transpositions\" : true,\n            \"lenient\" : false,\n            \"zero_terms_query\" : \"NONE\",\n            \"boost\" : 1.0\n          }\n        }\n      }\n    ],\n    \"disable_coord\" : false,\n    \"adjust_pure_negative\" : true,\n    \"minimum_should_match\" : \"1\",\n    \"boost\" : 1.0\n  }\n}",
          "index_uuid" : "G5kCglksQQmnc21W7nJnAw",
          "index" : "users",
          "caused_by" : {
            "type" : "null_pointer_exception",
            "reason" : null
          }
        }
      }
    ],
    "caused_by" : {
      "type" : "query_shard_exception",
      "reason" : "failed to create query: {\n  \"bool\" : {\n    \"should\" : [\n      {\n        \"match_phrase\" : {\n          \"nickname\" : {\n            \"query\" : \"十分\",\n            \"slop\" : 0,\n            \"boost\" : 100.0\n          }\n        }\n      },\n      {\n        \"match_phrase\" : {\n          \"analyzed_name\" : {\n            \"query\" : \"十分\",\n            \"slop\" : 10,\n            \"boost\" : 50.0\n          }\n        }\n      },\n      {\n        \"prefix\" : {\n          \"analyzed_name\" : {\n            \"value\" : \"十分\",\n            \"boost\" : 1.0\n          }\n        }\n      },\n      {\n        \"match\" : {\n          \"analyzed_name\" : {\n            \"query\" : \"十分\",\n            \"operator\" : \"OR\",\n            \"prefix_length\" : 0,\n            \"max_expansions\" : 50,\n            \"fuzzy_transpositions\" : true,\n            \"lenient\" : false,\n            \"zero_terms_query\" : \"NONE\",\n            \"boost\" : 1.0\n          }\n        }\n      }\n    ],\n    \"disable_coord\" : false,\n    \"adjust_pure_negative\" : true,\n    \"minimum_should_match\" : \"1\",\n    \"boost\" : 1.0\n  }\n}",
      "index_uuid" : "G5kCglksQQmnc21W7nJnAw",
      "index" : "users",
      "caused_by" : {
        "type" : "null_pointer_exception",
        "reason" : null
      }
    }
  },
  "status" : 400
}

The Elasticsearch Log:

Caused by: org.elasticsearch.index.query.QueryShardException: failed to create query: {
  "bool" : {
    "should" : [
      {
        "match_phrase" : {
          "nickname" : {
            "query" : "十分",
            "slop" : 0,
            "boost" : 100.0
          }
        }
      },
      {
        "match_phrase" : {
          "analyzed_name" : {
            "query" : "十分",
            "slop" : 10,
            "boost" : 50.0
          }
        }
      },
      {
        "prefix" : {
          "nickname" : {
            "value" : "十分",
            "boost" : 1.0
          }
        }
      },
      {
        "match" : {
          "analyzed_name" : {
            "query" : "十分",
            "operator" : "OR",
            "prefix_length" : 0,
            "max_expansions" : 50,
            "fuzzy_transpositions" : true,
            "lenient" : false,
            "zero_terms_query" : "NONE",
            "boost" : 1.0
          }
        }
      }
    ],
    "disable_coord" : false,
    "adjust_pure_negative" : true,
    "minimum_should_match" : "1",
    "boost" : 1.0
  }
}
	at org.elasticsearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:333) ~[elasticsearch-5.2.2.jar:5.2.2]
	at org.elasticsearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:311) ~[elasticsearch-5.2.2.jar:5.2.2]
	at org.elasticsearch.search.SearchService.parseSource(SearchService.java:671) ~[elasticsearch-5.2.2.jar:5.2.2]
	at org.elasticsearch.search.SearchService.createContext(SearchService.java:540) ~[elasticsearch-5.2.2.jar:5.2.2]
	at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:516) ~[elasticsearch-5.2.2.jar:5.2.2]
	at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:251) ~[elasticsearch-5.2.2.jar:5.2.2]
	at org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:298) ~[elasticsearch-5.2.2.jar:5.2.2]
	at org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:295) ~[elasticsearch-5.2.2.jar:5.2.2]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-5.2.2.jar:5.2.2]
	at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:610) [elasticsearch-5.2.2.jar:5.2.2]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:596) [elasticsearch-5.2.2.jar:5.2.2]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.2.2.jar:5.2.2]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
Caused by: java.lang.NullPointerException
	at com.chenlb.mmseg4j.rule.MaxMatchRule.isRemove(MaxMatchRule.java:28) ~[?:?]
	at com.chenlb.mmseg4j.rule.Rule.remainChunks(Rule.java:40) ~[?:?]
	at com.chenlb.mmseg4j.ComplexSeg.seg(ComplexSeg.java:96) ~[?:?]
	at com.chenlb.mmseg4j.MaxWordSeg.seg(MaxWordSeg.java:19) ~[?:?]
	at com.chenlb.mmseg4j.MMSeg.next(MMSeg.java:178) ~[?:?]
	at com.chenlb.mmseg4j.analysis.MMSegTokenizer.incrementToken(MMSegTokenizer.java:64) ~[?:?]
	at org.apache.lucene.analysis.LowerCaseFilter.incrementToken(LowerCaseFilter.java:44) ~[lucene-core-6.4.1.jar:6.4.1 72f75b2503fa0aa4f0aff76d439874feb923bb0e - jpountz - 2017-02-01 14:43:32]

Any help is appreciated.

Cannot create index in elasticsearch-1.6.0 with the mmseg plugin installed

I have used the mmseg plugin with elasticsearch-1.5.2 for the past few months without any problems. After upgrading to elasticsearch-1.6.0, I now have a problem with the mmseg plugin.

PUT /blogs
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

{
   "error": "IndexCreationException[[blogs] failed to create index]; nested: NullPointerException; ",
   "status": 500
}

Here is the exception in elasticsearch.log:

[2015-07-02 16:55:20,603][DEBUG][action.admin.indices.create] [Micro] [blogs] failed to create
org.elasticsearch.indices.IndexCreationException: [blogs] failed to create index
    at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:338)
    at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$2.execute(MetaDataCreateIndexService.java:371)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:374)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:188)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:158)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NullPointerException
    at com.chenlb.mmseg4j.Dictionary.loadDic(Dictionary.java:160)
    at com.chenlb.mmseg4j.Dictionary.reload(Dictionary.java:364)
    at com.chenlb.mmseg4j.Dictionary.init(Dictionary.java:130)
    at com.chenlb.mmseg4j.Dictionary.<init>(Dictionary.java:123)
    at com.chenlb.mmseg4j.Dictionary.getInstance(Dictionary.java:74)
    at com.chenlb.mmseg4j.Dictionary.getInstance(Dictionary.java:63)
    at org.elasticsearch.index.analysis.MMsegTokenizerFactory.<init>(MMsegTokenizerFactory.java:31)
    at sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.lang.reflect.Constructor.newInstance(Unknown Source)
    at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:54)
    at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
    at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)
    at org.elasticsearch.common.inject.FactoryProxy.get(FactoryProxy.java:52)
    at org.elasticsearch.common.inject.InjectorImpl$5$1.call(InjectorImpl.java:781)
    at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:837)
    at org.elasticsearch.common.inject.InjectorImpl$5.get(InjectorImpl.java:777)
    at org.elasticsearch.common.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:221)
    at com.sun.proxy.$Proxy16.create(Unknown Source)
    at org.elasticsearch.index.analysis.AnalysisService.<init>(AnalysisService.java:82)
    at sun.reflect.GeneratedConstructorAccessor7.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.lang.reflect.Constructor.newInstance(Unknown Source)
    at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:54)
    at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
    at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)
    at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:45)
    at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:837)
    at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:42)
    at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:57)
    at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:45)
    at org.elasticsearch.common.inject.SingleParameterInjector.inject(SingleParameterInjector.java:42)
    at org.elasticsearch.common.inject.SingleParameterInjector.getAll(SingleParameterInjector.java:66)
    at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:85)
    at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)
    at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:45)
    at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:837)
    at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:42)
    at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:57)
    at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:45)
    at org.elasticsearch.common.inject.SingleParameterInjector.inject(SingleParameterInjector.java:42)
    at org.elasticsearch.common.inject.SingleParameterInjector.getAll(SingleParameterInjector.java:66)
    at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:85)
    at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)
    at org.elasticsearch.common.inject.FactoryProxy.get(FactoryProxy.java:52)
    at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:45)
    at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:837)
    at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:42)
    at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:57)
    at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:45)
    at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:200)
    at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:193)
    at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:830)
    at org.elasticsearch.common.inject.InjectorBuilder.loadEagerSingletons(InjectorBuilder.java:193)
    at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:175)
    at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:110)
    at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:131)
    at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:69)
    at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:336)
    ... 7 more

Hope you can take a look at the problem. Thanks again for the great plugin!
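The NullPointerException is raised inside com.chenlb.mmseg4j.Dictionary.loadDic while the tokenizer factory is constructed. One thing worth ruling out (an assumption, not a confirmed diagnosis) is a missing or misplaced dictionary directory after the upgrade. The sketch below is a hypothetical pre-flight check, assuming mmseg4j needs chars.dic plus at least one words*.dic file, matching the files shipped in the plugin's mmseg config folder (chars.dic, units.dic, words.dic, words-my.dic); `DicDirCheck` and `problems` are illustrative names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical check for an mmseg dictionary directory.
// Assumption: chars.dic and one or more words*.dic files are required.
final class DicDirCheck {
    private DicDirCheck() {}

    /** Returns a list of problems; an empty list means the directory looks usable. */
    static List<String> problems(File dir) {
        List<String> out = new ArrayList<>();
        if (dir == null || !dir.isDirectory()) {
            out.add("not a directory: " + dir);
            return out;
        }
        if (!new File(dir, "chars.dic").isFile()) {
            out.add("missing chars.dic");
        }
        // Collect words*.dic files (e.g. words.dic, words-my.dic).
        String[] words = dir.list((d, name) -> name.startsWith("words") && name.endsWith(".dic"));
        if (words == null || words.length == 0) {
            out.add("no words*.dic files");
        }
        return out;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : "config/mmseg");
        System.out.println(problems(dir));
    }
}
```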

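mmseg4j update

@medcl, mmseg4j has been upgraded to version 1.9; could you update the mmseg plugin accordingly? Also, in the current version the vocabulary in the wordsxxx.dic files is loaded but does not seem to take effect. Could you take a look? Thanks.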

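Elasticsearch fails to start after installing the mmseg plugin

System environment

Linux ubuntu 3.13.0-66-generic #108-Ubuntu SMP Wed Oct 7 15:21:40 UTC 2015 i686 i686 i686 GNU/Linux

Installation steps

Downloaded the latest package from the official website; the version is: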

bin/elasticsearch --version
Version: 2.1.1, Build: 40e2c53/2015-12-15T13:05:55Z, JVM: 1.7.0_91

Downloaded the latest source and built it without errors; the target directory contains:

drwxr-xr-x 8 root root  4096 Jan 16 04:25 ./
drwxr-xr-x 5 root root  4096 Jan 16 04:21 ../
drwxr-xr-x 2 root root  4096 Jan 16 04:25 archive-tmp/
drwxr-xr-x 4 root root  4096 Jan 16 04:22 classes/
-rw-r--r-- 1 root root 59244 Jan 16 04:22 elasticsearch-analysis-mmseg-1.7.0.jar
-rw-r--r-- 1 root root 33646 Jan 16 04:23 elasticsearch-analysis-mmseg-1.7.0-sources.jar
drwxr-xr-x 3 root root  4096 Jan 16 04:22 generated-sources/
drwxr-xr-x 2 root root  4096 Jan 16 04:22 maven-archiver/
drwxr-xr-x 2 root root  4096 Jan 16 04:25 releases/
drwxr-xr-x 2 root root  4096 Jan 16 04:25 surefire/

Copied elasticsearch-analysis-mmseg-1.7.0.jar into /usr/share/elasticsearch/plugins/mmseg/.
Downloaded the latest conf/mmseg and placed the mmseg directory under /etc/elasticsearch/.
Added the following to /etc/elasticsearch/elasticsearch.yml:

index:
  analysis: 
    analyzer:
      mmseg_maxword:
        type: custom
        filter:
        - lowercase
        tokenizer: mmseg_maxword
      mmseg_maxword_with_cut_letter_digi:
        type: custom
        filter:
        - lowercase
        - cut_letter_digit
        tokenizer: mmseg_maxword 

After restarting, Elasticsearch fails to start: the Elasticsearch process exists right after startup, but exits on its own shortly afterwards. Did I misconfigure something?
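One likely cause worth checking in the Elasticsearch log: the report copies only the jar into plugins/mmseg/, while Elasticsearch 2.x expects each directory under plugins/ to contain a plugin-descriptor.properties file and refuses to start when it is missing or incomplete. The sketch below checks for that file and a set of keys; `PluginDescriptorCheck` is a hypothetical name, and the key list is an assumption based on the 2.x descriptor format for JVM plugins, so compare it against the descriptor in whatever the build produced under target/releases/.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical check of an ES 2.x-style plugin directory.
// Assumption: these keys are expected in plugin-descriptor.properties for a JVM plugin.
final class PluginDescriptorCheck {
    private static final List<String> REQUIRED = Arrays.asList(
            "description", "version", "name", "jvm", "classname",
            "java.version", "elasticsearch.version");

    private PluginDescriptorCheck() {}

    /** Returns the missing keys; the whole list if the descriptor file is absent. */
    static List<String> missingKeys(Path pluginDir) throws IOException {
        Path descriptor = pluginDir.resolve("plugin-descriptor.properties");
        if (!Files.isRegularFile(descriptor)) {
            return new ArrayList<>(REQUIRED);
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(descriptor)) {
            props.load(in);
        }
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            if (props.getProperty(key) == null) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "/usr/share/elasticsearch/plugins/mmseg");
        System.out.println("missing: " + missingKeys(dir));
    }
}
```

If the whole list comes back, installing the packaged plugin with bin/plugin install instead of copying the bare jar is the usual route.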

Unknown tokenizer type [mmseg] for [mmseg_maxword]

Server environment: Elasticsearch 5.5.1
Analysis plugin version: 5.5.1
Client environment: spring-boot
Running the client program fails with:
Caused by: java.lang.IllegalArgumentException: Unknown tokenizer type [mmseg] for [mmseg_maxword]
at org.elasticsearch.index.analysis.AnalysisRegistry.getAnalysisProvider(AnalysisRegistry.java:387) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.elasticsearch.index.analysis.AnalysisRegistry.buildMapping(AnalysisRegistry.java:338) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.elasticsearch.index.analysis.AnalysisRegistry.buildTokenizerFactories(AnalysisRegistry.java:176) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.elasticsearch.index.analysis.AnalysisRegistry.build(AnalysisRegistry.java:154) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.elasticsearch.index.IndexService.<init>(IndexService.java:145) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.elasticsearch.index.IndexModule.newIndexService(IndexModule.java:363) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.elasticsearch.indices.IndicesService.createIndexService(IndicesService.java:449) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:414) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$1.execute(MetaDataCreateIndexService.java:366) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:45) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.elasticsearch.cluster.service.ClusterService.executeTasks(ClusterService.java:634) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.elasticsearch.cluster.service.ClusterService.calculateTaskOutputs(ClusterService.java:612) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.elasticsearch.cluster.service.ClusterService.runTasks(ClusterService.java:571) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.run(ClusterService.java:263) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) ~[elasticsearch-5.5.1.jar:5.5.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) ~[elasticsearch-5.5.1.jar:5.5.1]
Related configuration:
{
"analysis": {

"tokenizer":{
    
   "mmseg_maxword":{
      "type":"mmseg",
      "seg_type":"max_word"
   },
  "mmseg_complex":{
    "type":"mmseg",
    "seg_type":"complex"
  },
  "mmseg_simple":{
    "type":"mmseg",
    "seg_type":"simple"
  },
  "semicolon_spliter":{
    "type":"pattern",
    "seg_type":";"
  },
  "pct_spliter":{
    "type":"pattern",
    "seg_type":"[%]+"
  }
},

"filter": {

  "ngram_min_2":{
    "max_gram": 10,
    "min_gram": 2,
    "type": "nGram"
  },
  "ngram_min_1":{
    "max_gram": 10,
    "min_gram": 1,
    "type": "nGram"
  },
  "min2_length":{
    "min": 2,
    "max": 4,
    "type": "length"
  },
  "analyzer":{
    "default":{
      "type":"keyword"
    },
    "lowercase_keyword":{
       "type":"custom",
       "filter":"[standard,lowercase]",
       "tokenizer":"standard"
    },
    "lowercase_keyword_ngram_min_size1":{
      "type":"custom",
      "filter":"[ngram_min_1,standard,lowercase] ",
      "tokenizer":"nGram"
    },
    "lowercase_keyword_ngram_min_size2":{
      "type":"custom",
      "filter":"[ngram_min_2,standard,lowercase,min2_length,stop] ",
      "tokenizer":"nGram"
    },
    "lowercase_keyword_ngram":{
      "type":"custom",
      "filter":"[ngram_min_1,standard,lowercase] ",
      "tokenizer":"nGram"
    },
    "lowercase_keyword_without_standard":{
      "type":"custom",
      "filter":"[lowercase]",
      "tokenizer":"keyword"
    },
    "lowercase_whitespace":{
      "type":"custom",
      "filter":"[lowercase]",
      "tokenizer":"whitespace"
    },
  
    "mmseg":{
      "alias":"[mmseg_analyzer]",
      "type":"org.elasticsearch.index.analysis.MMsegAnalyzerProvider"
    },
    "comma_spliter":{
      "pattern":"[,|\\s]+",
      "type":"pattern"
    },
    "pct_spliter":{
      "pattern":"[%]+",
      "type":"pattern"
    },
    "custom_snowball_analyzer":{
      "language":"English",
      "type":"snowball"
    },
    "simple_english_analyzer":{
      "tokenizer":"whitespace",
      "filter":"[standard,lowercase,snowball]",
      "type":"custome"
    },
    "edge_ngram":{
      "tokenizer":"edgeNGram",
      "filter":"[lowercase]",
      "type":"custome"
    },
 
    "custom_auth_en_analyzer":{
      "tokenizer":"semicolon_spliter",
      "filter":"[standard,snowball,lowercase,trim]",
      "type":"custome"
    }
  }
}

}
}

Packaging error

[ERROR] Failed to execute goal on project elasticsearch-analysis-mmseg: Could not resolve dependencies for project org.elasticsearch:elasticsearch-analysis-mmseg:jar:1.9.1: Failed to collect dependencies for [org.elasticsearch:elasticsearch:jar:2.3.1 (compile), org.hamcrest:hamcrest-core:jar:1.3.RC2 (test), org.hamcrest:hamcrest-library:jar:1.3.RC2 (test)]: Failed to read artifact descriptor for org.elasticsearch:elasticsearch:jar:2.3.1: Could not transfer artifact org.elasticsearch:elasticsearch:pom:2.3.1 from/to oss.sonatype.org (http://oss.sonatype.org/content/repositories/releases/): oss.sonatype.org: Name or service not known: Unknown host oss.sonatype.org: Name or service not known -> [Help 1]
How should I fix this?
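The key part of the error is `Unknown host oss.sonatype.org`: the build machine could not resolve the repository's host name, so Maven never got as far as downloading anything. One way to confirm this outside Maven is a plain name lookup with `java.net.InetAddress`. This is a minimal diagnostic sketch; `HostCheck` is a hypothetical helper, not part of the plugin. If the lookup also fails here, the cause lies in the machine's DNS or network setup rather than in the plugin's pom.xml.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic helper: can this JVM resolve a host name at all?
final class HostCheck {

    /** Returns true if the host name resolves to an address, false on UnknownHostException. */
    static boolean canResolve(String host) {
        try {
            InetAddress address = InetAddress.getByName(host);
            return address != null;
        } catch (UnknownHostException e) {
            // The same failure Maven reported: no DNS answer for this name.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "oss.sonatype.org";
        System.out.println(host + (canResolve(host) ? " resolves" : " does NOT resolve"));
    }
}
```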

mmseg cause elasticsearch 5.2.0 shards unassigned

When I use mmseg on elasticsearch 5.2.0, mmseg often causes shards to become unassigned.
For example, with this configuration:
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "index.routing.allocation.include.cluster": "rec",
    "index.mapper.dynamic": false,
    "analysis": {
      "filter": {
        "stop_filter": {
          "type": "stop",
          "stopwords_path": "stopwords.txt",
          "ignore_case": true
        },
        "synonym_filter": {
          "type": "synonym",
          "synonyms_path": "synonyms.txt",
          "ignore_case": true,
          "expand": true
        }
      },
      "analyzer": {
        "text_auto": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "lowercase"
          ]
        },
        "text_complex": {
          "type": "custom",
          "tokenizer": "mmseg_complex",
          "filter": [
            "stop_filter",
            "synonym_filter",
            "lowercase"
          ]
        },
        "text_maxword": {
          "type": "custom",
          "tokenizer": "mmseg_maxword",
          "filter": [
            "stop_filter",
            "synonym_filter",
            "lowercase"
          ]
        },
        "standard_text": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
During bulk indexing, this error is reported:
[2017-02-15T11:54:11,011][DEBUG][o.e.a.b.TransportShardBulkAction] [datagrand-data-1] [duizhuang_v4][1] failed to execute bulk item (index) index {[duizhuang][duizhuang][878791], source[{"itemid": "878791", "last_update_time": 1487051295, "title": "H\u98d8\u5f69\u7ae5\u5b50\u4f5b\uff0c\u7389\u8d28\u7ec6\u817b\uff0c\u79cd\u8001\u7eaf\u51c0\u3002\u5b8c\u7f8e\uff0c\u4eae\u5ea6\u9ad8\uff0c\u54c1\u8d28\u7cbe\u54c1\uff01\u5c3a\u5bf84016.36.6\u3002\u8d85\u503c\u63a8\u83508", "item_score": 333000.0, "price": 500000, "item_modify_time": 1486590407, "title1": "H\u98d8\u5f69\u7ae5\u5b50\u4f5b\uff0c\u7389\u8d28\u7ec6\u817b\uff0c\u79cd\u8001\u7eaf\u51c0\u3002\u5b8c\u7f8e\uff0c\u4eae\u5ea6\u9ad8\uff0c\u54c1\u8d28\u7cbe\u54c1\uff01\u5c3a\u5bf84016.36.6\u3002\u8d85\u503c\u63a8\u83508", "create_time": 1486713188, "item_tags": "", "cateid": "2_13", "id": "878791"}]}
java.lang.NullPointerException: null

As a result, Elasticsearch fails to allocate the shards, and only the primary copy is left working.

Clicking the unassigned shard 1 shows the following error:
"details": "failed recovery, failure RecoveryFailedException[[duizhuang_v4][1]: Recovery failed from {datagrand-data-1}{de1cbfW0TT60i0I0WF9Dnw}{_8mhUy4DT4qwGXEEJLzf0w}{10.45.139.177}{10.45.139.177:9300}{cluster=rec, cpu=high, disk=low, cluster_awareness=rec, memory=high, ssd=yes} into {datagrand-data-3}{el9GeOaEREmcp_s33BX1YA}{DEFEaqphRhSCxqIdYtJJFw}{10.26.241.224}{10.26.241.224:9300}{cluster=rec, disk=low, cluster_awareness=rec, memory=medium, ssd=yes, cpu=high}]; nested: RemoteTransportException[[datagrand-data-1][10.45.139.177:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[2] phase2 failed]; nested: RemoteTransportException[[datagrand-data-3][10.26.241.224:9300][internal:index/shard/recovery/translog_ops]]; nested: BatchOperationException[failed to apply batch translog operation]; nested: NullPointerException; ",

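About adding dictionary entries

I heard that elasticsearch-analysis-mmseg can add dictionary entries dynamically. There are several dictionaries under the mmseg/ directory: which file is the one for adding stopwords? And after adding entries, do I need to restart elasticsearch?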

Does not support elasticsearch 0.19.4

New MMsegAnalyzerProvider source:

/*
 * Licensed to Elastic Search and Shay Banon under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. Elastic Search licenses this
 * file to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.elasticsearch.index.analysis;

import java.io.File;

import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.inject.assistedinject.Assisted;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.Index;
import org.elasticsearch.index.settings.IndexSettings;

import com.chenlb.mmseg4j.analysis.MMSegAnalyzer;

/**
 * Created by IntelliJ IDEA. User: Medcl Date: 8/2/11 Time: 4:44 PM
 */
public class MMsegAnalyzerProvider extends AbstractIndexAnalyzerProvider<MMSegAnalyzer> {

    private final MMSegAnalyzer analyzer;

    @Override
    public MMSegAnalyzer get() {
        return this.analyzer;
    }

    @Inject
    public MMsegAnalyzerProvider(Index index, @IndexSettings Settings indexSettings, Environment env,
                                 @Assisted String name, @Assisted Settings settings) {
        super(index, indexSettings, name, settings);
        // Note: the injected env is not used here; a new Environment is built from
        // the analyzer settings and the dictionaries are loaded from <config dir>/mmseg.
        Environment environment = new Environment(settings);
        String path = new File(environment.configFile(), "mmseg").getPath();
        analyzer = new MMSegAnalyzer(path);
    }

    /*
     * @Override public String name() { return "ik"; }
     * @Override public AnalyzerScope scope() { return AnalyzerScope.INDEX; }
     */

    public MMsegAnalyzerProvider(Index index, Settings indexSettings,
                                 String name, Settings settings) {
        super(index, indexSettings, name, settings);
        Environment environment = new Environment(settings);
        String path = new File(environment.configFile(), "mmseg").getPath();
        analyzer = new MMSegAnalyzer(path);
    }

    public MMsegAnalyzerProvider(Index index, Settings indexSettings,
                                 String prefixSettings, String name, Settings settings) {
        super(index, indexSettings, prefixSettings, name, settings);
        Environment environment = new Environment(settings);
        String path = new File(environment.configFile(), "mmseg").getPath();
        analyzer = new MMSegAnalyzer(path);
    }
}

Is this source OK?

ERROR: Could not find plugin descriptor 'plugin-descriptor.properties' in plugin zip

plugin install medcl/elasticsearch-analysis-mmseg
or
plugin install medcl/elasticsearch-analysis-mmseg/master

ERROR: Could not find plugin descriptor 'plugin-descriptor.properties' in plugin zip

elasticsearch -V
Version: 2.3.2, Build: b9e4a6a/2016-04-21T16:03:47Z, JVM: 1.8.0_65

Install log:

plugin install medcl/elasticsearch-analysis-mmseg/master
-> Installing medcl/elasticsearch-analysis-mmseg/master...
Trying https://download.elastic.co/medcl/elasticsearch-analysis-mmseg/elasticsearch-analysis-mmseg-master.zip ...
Trying https://search.maven.org/remotecontent?filepath=medcl/elasticsearch-analysis-mmseg/master/elasticsearch-analysis-mmseg-master.zip ...
Trying https://oss.sonatype.org/service/local/repositories/releases/content/medcl/elasticsearch-analysis-mmseg/master/elasticsearch-analysis-mmseg-master.zip ...
Trying https://github.com/medcl/elasticsearch-analysis-mmseg/archive/master.zip ...
Downloading ....................................................................................DONE
Verifying https://github.com/medcl/elasticsearch-analysis-mmseg/archive/master.zip checksums if available ...
NOTE: Unable to verify checksum for downloaded plugin (unable to find .sha1 or .md5 file to verify)
ERROR: Could not find plugin descriptor 'plugin-descriptor.properties' in plugin zip
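The log shows the installer falling back to `https://github.com/medcl/elasticsearch-analysis-mmseg/archive/master.zip`, which is GitHub's source archive of the repository rather than a built plugin package, and the installer found no `plugin-descriptor.properties` where it expects one. The sketch below lists a zip's entries with `java.util.zip.ZipFile` and reports every path at which that descriptor appears. `PluginZipCheck` is a hypothetical helper; the exact location the installer expects differs between Elasticsearch versions, so this only shows whether and where the file exists.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical diagnostic helper: where (if anywhere) does a zip contain
// plugin-descriptor.properties?
final class PluginZipCheck {

    static final String DESCRIPTOR = "plugin-descriptor.properties";

    /** Returns the paths of all entries named plugin-descriptor.properties (empty if none). */
    static List<String> findDescriptors(String zipPath) throws IOException {
        List<String> found = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                // Match the file at the zip root or inside any directory.
                if (name.equals(DESCRIPTOR) || name.endsWith("/" + DESCRIPTOR)) {
                    found.add(name);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        for (String zipPath : args) {
            System.out.println(zipPath + " -> " + findDescriptors(zipPath));
        }
    }
}
```

Running it against the downloaded archive and against a release zip (such as the ones under releases/ produced by `mvn package`) makes the difference between the two visible.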

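Can MMSeg extract key nouns?

Question 1:

A search query may be a whole sentence, for example 生姜能治很多常见病 ("ginger can treat many common ailments").
But the really important information is 生姜 (ginger) and 常见病 (common ailments);
words like 能治 (can treat) and 很多 (many) matter much less.
Is it possible to extract the key nouns here?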
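MMSeg segments text against its dictionaries, and nothing on this page indicates that it tags parts of speech, so picking out nouns by itself is likely beyond it. A common substitute is stopword filtering: drop known low-value words after segmentation, which in Elasticsearch would be a `stop` token filter placed after the mmseg tokenizer (like `stop_filter` in the shard-allocation issue's configuration). The sketch below shows only that filtering step on an already-segmented token list; `KeywordFilter` is a hypothetical name, and this is stopword removal, not noun extraction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: drop low-value words from an already-segmented token list.
// This is stopword filtering, not part-of-speech-based noun extraction.
final class KeywordFilter {

    /** Returns the tokens, in their original order, that are not in the stopword set. */
    static List<String> keep(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

Assuming the sentence segments into 生姜 / 能治 / 很多 / 常见病 (the actual split depends on the dictionaries), a stop set of {能治, 很多} would leave 生姜 and 常见病. The weakness is that every unimportant word has to be listed by hand.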

Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.4.1:single (default) on project elasticsearch-analysis-mmseg: Assembly is incorrectly configured: null: Assembly is incorrectly configured: null:
[ERROR] Assembly: null is not configured correctly: Assembly ID must be present and non-empty.

I added

<id>releases</id>

to src/main/assemblies/plugin.xml, which solved the problem.

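How can the mmseg plugin's dictionaries be updated online?

Hello. When our product, which uses the elasticsearch-analysis-mmseg tokenizer plugin, needs a dictionary update, we currently do it by hand: we replace or add dict files on every ES node and then restart ES for the change to take effect. Looking at the source, com.chenlb.mmseg4j.Dictionary provides the wordsFileIsChange and reload methods to check whether the dictionary has changed and to reload it, but external applications apparently cannot call them directly. How was this part designed? Does ES offer a mechanism that application code can call (searching Google, Solr seems to have something like an MMseg4jHandler for this)?
Also, the IK plugin already supports remote dictionary updates; will mmseg consider supporting this as well?
Thanks in advance for any answers!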
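The polling idea behind the `wordsFileIsChange` and `reload` methods mentioned above can be sketched without mmseg4j itself: remember the modification times of the `.dic` files and run a reload action when any of them changes. `DictWatcher` is a hypothetical class, and the reload action is passed in as a `Runnable`. How a plugin would obtain and reload its own `Dictionary` instance, and whether analyzers already in use would see the change, is not shown and remains an open question here.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a polling dictionary watcher. It does not call mmseg4j;
// the caller supplies the reload action.
final class DictWatcher {

    private final File dir;
    private final Runnable reloadAction;
    private final Map<String, Long> lastSeen = new HashMap<>();

    DictWatcher(File dir, Runnable reloadAction) {
        this.dir = dir;
        this.reloadAction = reloadAction;
        snapshot(); // record the initial state so the first check does not reload
    }

    /** Re-reads the .dic modification times; runs the reload action and returns true if anything changed. */
    boolean checkAndReload() {
        Map<String, Long> before = new HashMap<>(lastSeen);
        snapshot();
        if (!before.equals(lastSeen)) {
            reloadAction.run();
            return true;
        }
        return false;
    }

    // Captures name -> lastModified for every .dic file (added, removed, or touched files all count as changes).
    private void snapshot() {
        lastSeen.clear();
        File[] files = dir.listFiles((d, name) -> name.endsWith(".dic"));
        if (files == null) {
            return; // directory missing or unreadable
        }
        for (File f : files) {
            lastSeen.put(f.getName(), f.lastModified());
        }
    }
}
```

A plugin could call `checkAndReload()` periodically, for example from a `java.util.concurrent.ScheduledExecutorService`. Each node would still need its own copy of the updated files, unlike the remote-update approach the question mentions for IK.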

Plugin install failed

$ bin/plugin -install medcl/elasticsearch-analysis-mmseg/1.1.2 -verbose
-> Installing medcl/elasticsearch-analysis-mmseg/1.1.2...
Trying http://download.elasticsearch.org/medcl/elasticsearch-analysis-mmseg/elasticsearch-analysis-mmseg-1.1.2.zip...
Failed: IOException[Can't get http://download.elasticsearch.org/medcl/elasticsearch-analysis-mmseg/elasticsearch-analysis-mmseg-1.1.2.zip to /home/maralla/elasticsearch-0.20.6/plugins/analysis-mmseg.zip]; nested: FileNotFoundException[http://download.elasticsearch.org/medcl/elasticsearch-analysis-mmseg/elasticsearch-analysis-mmseg-1.1.2.zip]; nested: FileNotFoundException[http://download.elasticsearch.org/medcl/elasticsearch-analysis-mmseg/elasticsearch-analysis-mmseg-1.1.2.zip];
Trying http://search.maven.org/remotecontent?filepath=medcl/elasticsearch-analysis-mmseg/1.1.2/elasticsearch-analysis-mmseg-1.1.2.zip...
Failed: IOException[Can't get http://search.maven.org/remotecontent?filepath=medcl/elasticsearch-analysis-mmseg/1.1.2/elasticsearch-analysis-mmseg-1.1.2.zip to /home/maralla/elasticsearch-0.20.6/plugins/analysis-mmseg.zip]; nested: FileNotFoundException[http://search.maven.org/remotecontent?filepath=medcl/elasticsearch-analysis-mmseg/1.1.2/elasticsearch-analysis-mmseg-1.1.2.zip]; nested: FileNotFoundException[http://search.maven.org/remotecontent?filepath=medcl/elasticsearch-analysis-mmseg/1.1.2/elasticsearch-analysis-mmseg-1.1.2.zip];
Trying https://oss.sonatype.org/service/local/repositories/releases/content/medcl/elasticsearch-analysis-mmseg/1.1.2/elasticsearch-analysis-mmseg-1.1.2.zip...
Failed: IOException[Can't get https://oss.sonatype.org/service/local/repositories/releases/content/medcl/elasticsearch-analysis-mmseg/1.1.2/elasticsearch-analysis-mmseg-1.1.2.zip to /home/maralla/elasticsearch-0.20.6/plugins/analysis-mmseg.zip]; nested: FileNotFoundException[https://oss.sonatype.org/service/local/repositories/releases/content/medcl/elasticsearch-analysis-mmseg/1.1.2/elasticsearch-analysis-mmseg-1.1.2.zip]; nested: FileNotFoundException[https://oss.sonatype.org/service/local/repositories/releases/content/medcl/elasticsearch-analysis-mmseg/1.1.2/elasticsearch-analysis-mmseg-1.1.2.zip];
Trying https://github.com/medcl/elasticsearch-analysis-mmseg/zipball/v1.1.2... (assuming site plugin)
Failed: IOException[Can't get https://github.com/medcl/elasticsearch-analysis-mmseg/zipball/v1.1.2 to /home/maralla/elasticsearch-0.20.6/plugins/analysis-mmseg.zip]; nested: FileNotFoundException[https://nodeload.github.com/medcl/elasticsearch-analysis-mmseg/legacy.zip/v1.1.2]; nested: FileNotFoundException[https://nodeload.github.com/medcl/elasticsearch-analysis-mmseg/legacy.zip/v1.1.2];
Failed to install medcl/elasticsearch-analysis-mmseg/1.1.2, reason: failed to download out of all possible locations..., use -verbose to get detailed information

The workaround is to download the plugin from this link:
https://github.com/medcl/elasticsearch-rtf/tree/master/elasticsearch/plugins/analysis-mmseg

and install it from the local file.

Maybe this should be documented.

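Will an mmseg version supporting ES 6.x still be developed?

Our production ES cluster has always used mmseg for tokenization. Because we need to upgrade, we modified the plugin source ourselves so that it would work with ES 6.x. However, reindexing fails with an offset problem. The error is below; the cause looks similar to this issue (https://github.com/infinilabs/analysis-pinyin/issues/143):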
java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=0,endOffset=2,lastStartOffset=6 for field 'list_number'
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:767) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:392) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:240) ~[lucene-core-7.2.1.jar:7.2.1 b2b6438b37073bee1fca40374e85bf91aa457c0b - ubuntu - 2018-01-10 00:48:43]
……
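The error message spells out the rule Lucene enforces on every token it indexes: the start offset must be non-negative, the end offset must not be before the start offset, and start offsets must never go backwards. In the report, a token at 0..2 arrived after one that started at 6. The sketch below checks those three conditions over a list of (start, end) pairs, which could serve in a unit test of a modified tokenizer by feeding it the offsets collected from its `OffsetAttribute`. `OffsetCheck` is a hypothetical helper. One thing worth checking, as a guess not confirmed in this thread, is whether the modified tokenizer resets its internal offset state in `reset()`.

```java
// Hypothetical helper mirroring the three offset conditions named in Lucene's error message.
final class OffsetCheck {

    /**
     * offsets[i] = {startOffset, endOffset} of token i, in emission order.
     * Returns the index of the first token that violates the rules, or -1 if all are valid.
     */
    static int firstViolation(int[][] offsets) {
        int lastStart = 0;
        for (int i = 0; i < offsets.length; i++) {
            int start = offsets[i][0];
            int end = offsets[i][1];
            // Negative start, end before start, or start moving backwards are all rejected.
            if (start < 0 || end < start || start < lastStart) {
                return i;
            }
            lastStart = start;
        }
        return -1;
    }
}
```

For the reported case, `{{0, 2}, {6, 8}, {0, 2}}` fails at index 2, matching `startOffset=0,endOffset=2,lastStartOffset=6`.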

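The mmseg plugin does not work properly

Hello medcl,
I am using mmseg plugin version 1.8.0 with ES 2.2.0.
After cloning the mmseg plugin source with git, I built it with mvn, then unzipped elasticsearch-analysis-mmseg-1.8.0.zip from releases into the elasticsearch-2.2.0/plugins/elasticsearch-analysis-mmseg-1.8.0 folder.
I then started ES and tested tokenization with the test code from the home page. At the "Create a mapping" step I got this error:
"type": "mapper_parsing_exception",
"reason": "analyzer [mmseg_maxword] not found for field [content]"
I then replaced mmseg_maxword with mmseg_complex in the "Create a mapping" code, and the mapping was created successfully.
I also tried your elasticsearch-rtf distribution; the problem above did not occur there and the mapping was created successfully.
What is causing this, and how can it be fixed?
Thanks

    After elasticsearch-rtf worked, I suspected something had gone wrong in my maven build, so I overwrote my earlier jar with the elasticsearch-analysis-mmseg-1.8.0.jar from elasticsearch-rtf. The mapping was then created successfully and the problem was solved.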

ElasticSearchIllegalArgumentException[failed to find analyzer [mmseg]]

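Following your instructions, I copied the rtf version of mmseg and its configuration files to the corresponding locations.

The log shows that the plugin and the dictionary files were loaded successfully.

But when I run the analyzer test, the analyzer cannot be found. I installed mmseg, ik, and pinyin, and all of them behave this way.

Startup log: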

[2013-05-31 01:33:23,058][INFO ][node                     ] [Tarot] {0.90.0}[15503]: initializing ...
[2013-05-31 01:33:23,079][INFO ][plugins                  ] [Tarot] loaded [analysis-mmseg, analysis-pinyin, analysis-ik], sites [head]
[2013-05-31 01:33:25,599][INFO ][node                     ] [Tarot] {0.90.0}[15503]: initialized
[2013-05-31 01:33:25,600][INFO ][node                     ] [Tarot] {0.90.0}[15503]: starting ...
[2013-05-31 01:33:25,717][INFO ][transport                ] [Tarot] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.1.105:9300]}
[2013-05-31 01:33:28,808][INFO ][cluster.service          ] [Tarot] new_master [Tarot][QyqG17rTTzuaIZvhOMvBNw][inet[/192.168.1.105:9300]], reason: zen-disco-join (elected_as_master)
[2013-05-31 01:33:28,864][INFO ][discovery                ] [Tarot] elasticsearch/QyqG17rTTzuaIZvhOMvBNw
[2013-05-31 01:33:28,885][INFO ][http                     ] [Tarot] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.1.105:9200]}
[2013-05-31 01:33:28,885][INFO ][node                     ] [Tarot] {0.90.0}[15503]: started
[2013-05-31 01:33:29,187][INFO ][index.analysis           ] [Tarot] [dls] /etc/elasticsearch/mmseg
[2013-05-31 01:33:29,371][INFO ][com.chenlb.mmseg4j.Dictionary] chars loaded time=182ms, line=12638, on file=/etc/elasticsearch/mmseg/chars.dic
[2013-05-31 01:33:29,375][INFO ][com.chenlb.mmseg4j.Dictionary] words loaded time=1ms, line=1, on file=/etc/elasticsearch/mmseg/words-my.dic
[2013-05-31 01:33:29,576][INFO ][com.chenlb.mmseg4j.Dictionary] words loaded time=201ms, line=157202, on file=/etc/elasticsearch/mmseg/words.dic
[2013-05-31 01:33:29,576][INFO ][com.chenlb.mmseg4j.Dictionary] load all dic use time=387ms
[2013-05-31 01:33:29,577][INFO ][com.chenlb.mmseg4j.Dictionary] unit loaded time=1ms, line=22, on file=/etc/elasticsearch/mmseg/units.dic
[2013-05-31 01:33:29,579][INFO ][index.analysis           ] [Tarot] [dls] /etc/elasticsearch/mmseg
[2013-05-31 01:33:29,584][INFO ][index.analysis           ] [Tarot] [dls] /etc/elasticsearch/mmseg
[2013-05-31 01:33:29,586][INFO ][index.analysis           ] [Tarot] [dls] /etc/elasticsearch/mmseg
[2013-05-31 01:33:29,664][INFO ][ik-analyzer              ] [Dict Loading] ik/IKAnalyzer.cfg.xml
[2013-05-31 01:33:30,460][INFO ][ik-analyzer              ] Loading extension dictionary: ik/custom/mydict.dic
[2013-05-31 01:33:30,461][INFO ][ik-analyzer              ] Loading extension stopword dictionary: ik/custom/ext_stopword.dic
[2013-05-31 01:33:31,211][INFO ][gateway                  ] [Tarot] recovered [1] indices into cluster_state

Log when the analyzer cannot be found:

[2013-05-31 01:34:08,970][DEBUG][action.admin.indices.analyze] [Tarot] failed to execute [org.elasticsearch.action.admin.indices.analyze.AnalyzeRequest@32b9bd47]
org.elasticsearch.ElasticSearchIllegalArgumentException: failed to find analyzer [mmseg]
    at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.shardOperation(TransportAnalyzeAction.java:147)
    at org.elasticsearch.action.admin.indices.analyze.TransportAnalyzeAction.shardOperation(TransportAnalyzeAction.java:57)
    at org.elasticsearch.action.support.single.custom.TransportSingleCustomOperationAction$AsyncSingleAction$1.run(TransportSingleCustomOperationAction.java:142)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)

Cannot read dictionary when starting Elasticsearch

Elasticsearch cannot start up; the message is:

org.elasticsearch.bootstrap.StartupException: ElasticsearchException[Failed to load plugin class [org.elasticsearch.plugin.analysis.mmseg.AnalysisMMsegPlugin]]; nested: InvocationTargetException; nested: NullPointerException;
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:125) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:112) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.cli.SettingCommand.execute(SettingCommand.java:54) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:96) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.cli.Command.main(Command.java:62) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:89) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:82) ~[elasticsearch-5.0.2.jar:5.0.2]
Caused by: org.elasticsearch.ElasticsearchException: Failed to load plugin class [org.elasticsearch.plugin.analysis.mmseg.AnalysisMMsegPlugin]
    at org.elasticsearch.plugins.PluginsService.loadPlugin(PluginsService.java:462) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.plugins.PluginsService.loadBundles(PluginsService.java:414) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.plugins.PluginsService.<init>(PluginsService.java:144) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.node.Node.<init>(Node.java:278) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.node.Node.<init>(Node.java:217) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:196) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:196) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:291) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) ~[elasticsearch-5.0.2.jar:5.0.2]
    ... 6 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:?]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422) ~[?:1.8.0_45]
    at org.elasticsearch.plugins.PluginsService.loadPlugin(PluginsService.java:454) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.plugins.PluginsService.loadBundles(PluginsService.java:414) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.plugins.PluginsService.<init>(PluginsService.java:144) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.node.Node.<init>(Node.java:278) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.node.Node.<init>(Node.java:217) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:196) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:196) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:291) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) ~[elasticsearch-5.0.2.jar:5.0.2]
    ... 6 more
Caused by: java.lang.NullPointerException
    at com.chenlb.mmseg4j.Dictionary.loadDic(Dictionary.java:163) ~[?:?]
    at com.chenlb.mmseg4j.Dictionary.reload(Dictionary.java:367) ~[?:?]
    at com.chenlb.mmseg4j.Dictionary.init(Dictionary.java:133) ~[?:?]
    at com.chenlb.mmseg4j.Dictionary.<init>(Dictionary.java:45) ~[?:?]
    at com.chenlb.mmseg4j.Dictionary.getInstance(Dictionary.java:84) ~[?:?]
    at com.chenlb.mmseg4j.Dictionary.getInstance(Dictionary.java:67) ~[?:?]
    at org.elasticsearch.plugin.analysis.mmseg.AnalysisMMsegPlugin.<init>(AnalysisMMsegPlugin.java:35) ~[?:?]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:?]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422) ~[?:1.8.0_45]
    at org.elasticsearch.plugins.PluginsService.loadPlugin(PluginsService.java:454) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.plugins.PluginsService.loadBundles(PluginsService.java:414) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.plugins.PluginsService.<init>(PluginsService.java:144) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.node.Node.<init>(Node.java:278) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.node.Node.<init>(Node.java:217) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:196) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:196) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:291) ~[elasticsearch-5.0.2.jar:5.0.2]
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:121) ~[elasticsearch-5.0.2.jar:5.0.2]
    ... 6 more

I checked the source code and found that the method getDictRoot() in class Dictionary resolves the wrong path. So I used getDefalutPath() instead and passed the dictionary path with -Dmmseg.dic.path="$MMSEG_DICT_PATH". That works.
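The workaround above comes down to two things: prefer an explicitly configured dictionary directory (the `mmseg.dic.path` system property from the report), and fail with a readable message instead of a `NullPointerException` deep inside dictionary loading when the directory is missing. The sketch below shows that pattern in isolation. `DictPath` and `resolve` are hypothetical names, and this does not reproduce mmseg4j's own `getDefalutPath()` or `getDictRoot()` logic.

```java
import java.io.File;

// Hypothetical sketch: resolve the dictionary directory from -Dmmseg.dic.path,
// fall back to a caller-supplied default, and fail early with a clear message.
final class DictPath {

    static final String PROPERTY = "mmseg.dic.path";

    /** Returns the directory to load dictionaries from, or throws if it does not exist. */
    static File resolve(File fallback) {
        String configured = System.getProperty(PROPERTY);
        File dir = (configured != null && !configured.isEmpty()) ? new File(configured) : fallback;
        if (dir == null || !dir.isDirectory()) {
            throw new IllegalStateException("mmseg dictionary directory not found: " + dir
                    + " (set -D" + PROPERTY + "=<dir>)");
        }
        return dir;
    }
}
```

Checking the directory up front turns a startup `NullPointerException` like the one above into an error that names the path that was actually tried.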
