elasticsearch-analysis-decompound's Issues

plugin [decompound] is incompatible with version [5.4.0]; was designed for version [5.1.1]

-> Downloading http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-analysis-decompound/5.4.0.0/elasticsearch-analysis-decompound-5.4.0.0-plugin.zip
[=================================================] 100%  
Exception in thread "main" java.lang.IllegalArgumentException: plugin [decompound] is incompatible with version [5.4.0]; was designed for version [5.1.1]
at org.elasticsearch.plugins.PluginInfo.readFromProperties(PluginInfo.java:146)
at org.elasticsearch.plugins.InstallPluginCommand.verify(InstallPluginCommand.java:428)
at org.elasticsearch.plugins.InstallPluginCommand.install(InstallPluginCommand.java:495)
at org.elasticsearch.plugins.InstallPluginCommand.execute(InstallPluginCommand.java:215)
at org.elasticsearch.plugins.InstallPluginCommand.execute(InstallPluginCommand.java:199)
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:67)
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122)
at org.elasticsearch.cli.MultiCommand.execute(MultiCommand.java:69)
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122)
at org.elasticsearch.cli.Command.main(Command.java:88)
at org.elasticsearch.plugins.PluginCli.main(PluginCli.java:47)

Running on:
[2017-05-15T11:02:40,575][INFO ][o.e.n.Node ] version[5.4.0], pid[9016], build[780f8c4/2017-04-28T17:43:27.229Z], OS[Mac OS X/10.12.4/x86_64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_25/25.25-b02]

elasticsearch 1.2.*

When will the plugin be compatible with version 1.2.* of Elasticsearch? Or is there a way I can install it manually?

Matching tokens

Hi,

I am stuck with this issue and I am quite sure I am missing something really essential:

I set up the analyzer as below, and it works quite well:

GET /myIndex/_analyze?analyzer=german&text=Straßenbahnschienenritzenreiniger

gives me all kinds of tokens. But searching returns all documents containing just ONE of the tokens (with an OR operator, so to speak), ranking documents containing "straße" higher than documents containing "reiniger", and ignoring multiple matches in the score. This is of course not what I intended...

However, I can see that an AND operator for tokens would not do the right thing either... In fact, the operation that could work would be something like (tokens derived from "straße" combined with OR) AND (tokens derived from "bahn" combined with OR) AND (...)

I could run _analyze from the external application and build the AND/OR query there, but that does not seem very elegant.

Is there another/better way?

"analysis": {
    "filter": {
       "baseform": {
          "type": "baseform",
          "language": "de"
       },
       "decomp": {
          "type": "decompound"
       }
    },
    "analyzer": {
       "german": {
          "filter": [
             "decomp",
             "baseform"
          ],
          "type": "custom",
          "tokenizer": "baseform"
       }
    },
    "tokenizer": {
       "baseform": {
          "filter": [
             "decomp",
             "baseform"
          ],
          "type": "standard"
       }
    }
 }
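A client-side workaround for the AND-of-ORs query described above is to build a bool query: one `should` group per original compound part, all groups combined via `must`. A minimal sketch (the token groups are hard-coded here; in practice they would come from the `_analyze` API, and the field name `text` is a placeholder):

```python
# Build a bool query that requires a match for each original compound part,
# while accepting any of that part's analysis variants (OR within the group).
def build_and_of_ors(field, token_groups):
    return {
        "bool": {
            "must": [  # AND across the compound's parts
                {
                    "bool": {
                        "should": [  # OR within one part's variants
                            {"term": {field: tok}} for tok in group
                        ],
                        "minimum_should_match": 1,
                    }
                }
                for group in token_groups
            ]
        }
    }

# Example: tokens derived from "straße" OR'd together, AND'd with "bahn".
query = build_and_of_ors("text", [["strasse", "straße"], ["bahn"]])
```

The resulting dict can be sent as the `query` part of a search request body.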

incorrect offsets / fast vector highlighter

Hi,

I use the german-decompounder in conjunction with the fast vector highlighter. The offsets of decompounded words seem to be incorrect.

For example, the analyze API returns for "Die Jahresfeier der Rechtsanwaltskanzleien auf dem Donaudampfschiff hat viel Ökosteuer gekostet.":

{
    "tokens": [
        {
            "token": "Die",
            "start_offset": 1,
            "end_offset": 4,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "Die",
            "start_offset": 1,
            "end_offset": 4,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "Jahresfeier",
            "start_offset": 5,
            "end_offset": 16,
            "type": "<ALPHANUM>",
            "position": 2
        },
        {
            "token": "Jahr",
            "start_offset": 5,
            "end_offset": 9,
            "type": "<ALPHANUM>",
            "position": 2
        },
        {
            "token": "feier",
            "start_offset": 9,
            "end_offset": 14,
            "type": "<ALPHANUM>",
            "position": 2
        },
        ...

The fast vector highlighter returns "Die Jahr<tag1>esfei</tag1>er der Rechtsanwaltskanzleien..." when searching for "Feier", since the offset of the "feier" token is incorrect.
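For reference, the offsets one would expect for a decompounded part can be derived from the parent token's start offset plus the part's position inside the compound. A small illustrative sketch, assuming the part occurs verbatim (case-insensitively) inside the compound:

```python
def expected_part_offsets(compound, compound_start, part):
    """Locate `part` inside `compound` and map it to absolute text offsets."""
    idx = compound.lower().find(part.lower())
    if idx < 0:
        return None  # part is not a literal substring (e.g. after stemming)
    return compound_start + idx, compound_start + idx + len(part)

# "Jahresfeier" starts at offset 5; "feier" begins after "Jahres" (6 chars),
# so the correct span is (11, 16), not (9, 14) as reported above.
print(expected_part_offsets("Jahresfeier", 5, "feier"))  # → (11, 16)
```

This matches the "Jahr" token above, whose reported span (5, 9) happens to be correct because the part starts at the beginning of the compound.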

Add option to exclude certain words

It would be nice to have an option that excludes certain words, like "leinwand" or "haushalt", from decompounding. I need this because the terms "wand" and "halt" created otherwise are causing relevance issues.
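Until such an option exists, the exclusion could be approximated outside the plugin by post-filtering the subtokens. The sketch below is illustrative only: `decompose` stands in for whatever produces the subwords, and the decomposition table in the example is made up:

```python
PROTECTED = {"leinwand", "haushalt"}  # words that must stay whole

def decompound_with_exclusions(token, decompose):
    """Return the original token plus its parts, unless the token is protected."""
    if token.lower() in PROTECTED:
        return [token]  # keep protected words whole, emit no subtokens
    return [token] + decompose(token)

# Hypothetical decomposition function for illustration only.
fake = {"leinwand": ["lein", "wand"], "haustür": ["haus", "tür"]}
result = decompound_with_exclusions("leinwand", lambda t: fake.get(t, []))
```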

Controlling decomposition

I'd like to be able to use a dictionary-based approach to control which words will not be decomposed, similar to: https://www.elastic.co/guide/en/elasticsearch/guide/current/controlling-stemming.html

The words in the dictionary would not be decomposed by the plugin and would only produce the original token as output.

Example:
I'm indexing product data and merchant information. Some of the words are merchant names, like "Interdiscount". I want to be able to control the decompounding by providing a dictionary of words that must not be decomposed.

Support ES 2.4.4

I get the following error when installing:

ERROR: Plugin [decompound] is incompatible with Elasticsearch [2.4.4]. Was designed for version [2.4.1]

A patch version increase shouldn't break compatibility.
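A common, unsupported workaround for a strict patch-level check like this is to rewrite `elasticsearch.version` in the `plugin-descriptor.properties` inside the plugin zip before installing. A sketch (assuming the two patch versions are actually binary-compatible, which is not guaranteed):

```python
import re
import zipfile

def bump_plugin_es_version(src_zip, dst_zip, new_version,
                           descriptor="plugin-descriptor.properties"):
    """Copy a plugin zip, rewriting elasticsearch.version in its descriptor."""
    with zipfile.ZipFile(src_zip) as src, zipfile.ZipFile(dst_zip, "w") as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename.endswith(descriptor):
                text = re.sub(r"(?m)^elasticsearch\.version=.*$",
                              "elasticsearch.version=" + new_version,
                              data.decode("utf-8"))
                data = text.encode("utf-8")
            dst.writestr(item, data)  # writestr recomputes sizes and CRC
```

The patched zip can then be installed with `bin/plugin install file:///path/to/patched.zip`.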

Can not get decompound to work

I have the current Elasticsearch version (1.5.2) and tried to set up decompound following the thin readme, but I did not get the expected results.

PUT /leads
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "decomp": {
            "type": "decompound"
          }
        },
        "tokenizer": {
          "decomp": {
            "type": "standard",
            "filter": [
              "decomp"
            ]
          }
        }
      }
    }
  }
}

Tested with:
GET leads/_analyze?
{Die Jahresfeier der Rechtsanwaltskanzleien auf dem Donaudampfschiff hat viel Ökosteuer gekostet}
This results in the following, which is not the same as shown in the readme:

{
   "tokens": [
      {
         "token": "die",
         "start_offset": 1,
         "end_offset": 4,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "jahresfeier",
         "start_offset": 5,
         "end_offset": 16,
         "type": "<ALPHANUM>",
         "position": 2
      },
      {
         "token": "der",
         "start_offset": 17,
         "end_offset": 20,
         "type": "<ALPHANUM>",
         "position": 3
      },
      {
         "token": "rechtsanwaltskanzleien",
         "start_offset": 21,
         "end_offset": 43,
         "type": "<ALPHANUM>",
         "position": 4
      },
      {
         "token": "auf",
         "start_offset": 44,
         "end_offset": 47,
         "type": "<ALPHANUM>",
         "position": 5
      },
      {
         "token": "dem",
         "start_offset": 48,
         "end_offset": 51,
         "type": "<ALPHANUM>",
         "position": 6
      },
      {
         "token": "donaudampfschiff",
         "start_offset": 52,
         "end_offset": 68,
         "type": "<ALPHANUM>",
         "position": 7
      },
      {
         "token": "hat",
         "start_offset": 69,
         "end_offset": 72,
         "type": "<ALPHANUM>",
         "position": 8
      },
      {
         "token": "viel",
         "start_offset": 73,
         "end_offset": 77,
         "type": "<ALPHANUM>",
         "position": 9
      },
      {
         "token": "ökosteuer",
         "start_offset": 78,
         "end_offset": 87,
         "type": "<ALPHANUM>",
         "position": 10
      },
      {
         "token": "gekostet",
         "start_offset": 88,
         "end_offset": 96,
         "type": "<ALPHANUM>",
         "position": 11
      }
   ]
}

An equivalent setup via the Java API did not change the outcome.

        final XContentBuilder mappingBuilder2 = jsonBuilder()
            .startObject()
                .startObject("index") // decompound filter
                    .startObject("analysis")
                        .startObject("filter")
                            .startObject("decomp").field("type", "decompound").endObject()
                        .endObject()
                        .startObject("tokenizer")
                            .startObject("decomp").field("type", "standard")
                            .startArray("filter")                                               
                                .field("decomp")
                            .endArray()
                            .endObject()
                        .endObject()
                    .endObject()
                .endObject()
            .endObject();


        final CreateIndexRequestBuilder createIndexRequestBuilder = client.admin().indices().prepareCreate(indexName);
        createIndexRequestBuilder.setSettings(ImmutableSettings.settingsBuilder().loadFromSource(mappingBuilder2.string()));

I also tried your bundled pack of plugins, with the same result.
And yes, I did restart my test Elasticsearch server; otherwise it would have refused to create a filter of type decompound.
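One likely cause worth noting: in Elasticsearch, token filters are applied by analyzers, not by tokenizers, so a `filter` list inside a tokenizer definition (as in the settings above) has no effect. A sketch of settings that wire the filter through a custom analyzer instead; the analyzer name `decomp_analyzer` is made up for illustration:

```python
import json

# Token filters belong in an analyzer definition; a "filter" key inside a
# tokenizer definition is not part of the tokenizer schema and is ignored.
settings = {
    "settings": {
        "index": {
            "analysis": {
                "filter": {"decomp": {"type": "decompound"}},
                "analyzer": {
                    "decomp_analyzer": {      # illustrative name
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["decomp"],
                    }
                },
            }
        }
    }
}
print(json.dumps(settings, indent=2))
```

The analyzer then has to be referenced explicitly when testing, e.g. `GET leads/_analyze?analyzer=decomp_analyzer&text=...`; otherwise the default analyzer is used and no decompounding happens.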

Highlighting seems to be broken

Hi,

I tried this on 1.5.2 and 1.7.2. This script should reproduce the error (NOTE: I'm using port 9400 locally):

curl -XDELETE http://localhost:9400/xyz/
curl -XPUT http://localhost:9400/xyz/ -d '
index:
  analysis:
    analyzer:
      search_analyzer:
        type: "custom"
        tokenizer: "standard"
        filter:
          - lowercase
          - x_compound
      index_analyzer:
        type: "custom"
        tokenizer: "standard"
        filter:
          - lowercase
          - x_compound
    filter:
      x_compound:
        type: "decompound"
'

curl -XPUT http://localhost:9400/xyz/_mapping/entries -d '
{
  "properties": {
    "title": {
      "type": "string",
      "search_analyzer": "search_analyzer",
      "analyzer": "index_analyzer"
    }
  }
}'

curl -XPOST http://localhost:9400/xyz/entries -d '
{"title": "dies ist ein test"}
'
curl -XPOST http://localhost:9400/xyz/entries -d '
{"title": "dies ist ein testbeitrag"}
'

curl -XPOST http://localhost:9400/xyz/entries -d '
{"title": "dies ist ein titeltest"}
'

curl -XGET http://localhost:9400/xyz/_search?pretty -d '
{
  "fields": ["title"],
  "query": {
    "multi_match": {
      "fields": ["title"],
      "query": "test",
      "analyzer": "search_analyzer"
    }
  },
  "size": 10,
  "highlight": {
    "number_of_fragments": 1,
    "fields": {
      "title": {"number_of_fragments": 1}
    }
  }
}'

The result returned by the query is:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.7123179,
    "hits" : [ {
      "_index" : "xyz",
      "_type" : "entries",
      "_id" : "AVEBt3eMGZQfGkXb9v9D",
      "_score" : 0.7123179,
      "fields" : {
        "title" : [ "dies ist ein test" ]
      },
      "highlight" : {
        "title" : [ "dies ist ein<em>dies ist ein test</em>" ]
      }
    }, {
      "_index" : "xyz",
      "_type" : "entries",
      "_id" : "AVEBt3ezGZQfGkXb9v9E",
      "_score" : 0.5036848,
      "fields" : {
        "title" : [ "dies ist ein testbeitrag" ]
      },
      "highlight" : {
        "title" : [ "dies ist ein<em>dies</em> testbeitrag" ]
      }
    }, {
      "_index" : "xyz",
      "_type" : "entries",
      "_id" : "AVEBt3fmGZQfGkXb9v9F",
      "_score" : 0.5036848,
      "fields" : {
        "title" : [ "dies ist ein titeltest" ]
      },
      "highlight" : {
        "title" : [ "dies ist ein<em>st e</em> titeltest" ]
      }
    } ]
  }
}

As you can see, there are two problems with the highlights:

  1. The matched word is not highlighted: [ "dies ist ein<em>st e</em> titeltest" ]
  2. Parts of the sentence are duplicated: [ "dies ist ein<em>dies ist ein test</em>" ]

Maybe I have to change the mapping to use different word positions?

Thanks

Support for Elastic 2.0

The readme.md references a link to a 2.0-rc download archive, but the link is broken.
Any thoughts on supporting a recent ES version?

Still maintained?

The last update was two years ago, so I am not sure whether this plugin is still maintained.
Or maybe there is no need for it anymore, in case Elasticsearch has its own implementation?

Thanks for clarifications in advance 👍

German decomp adds ";" symbol for certain words

Configuration:

default:
  tokenizer: standard
  filter: [german_decomp]

german_decomp:
  type: decompound

Query: _analyze?text="tomaten"

Result:

{
  "tokens": [
    {
      "token": "tomaten",
      "start_offset": 1,
      "end_offset": 8,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": ";",
      "start_offset": 1,
      "end_offset": 2,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

License

Is the license really GPL?

Failure to decompose "Taschenhersteller"

Hi,

First of all, thanks for your plugin, which avoids having to use the obscure compound word token filter with hyphenation_decompounder (https://www.elastic.co/guide/en/elasticsearch/reference/2.0/analysis-compound-word-tokenfilter.html).

That said, I cannot decompose "Taschenhersteller", a German word which should be decomposed into two words: Taschen & Hersteller.
Having installed your plugin, I created the following (possibly erroneous) mapping:

-XPOST localhost:9200/my_index {
  "index": {
    "analysis": {
      "filter": {
        "decomp": {
          "type": "decompound"
        }
      },
      "tokenizer": {
        "decomp": {
          "type": "standard",
          "filter": [
            "decomp"
          ]
        }
      },
      "analyzer": {
        "my_anal": {
          "type": "custom",
          "tokenizer": "decomp"
        }
      }
    },
    "mappings": {
      "type1": {
        "properties": {
          "field1": {
            "type": "string",
            "analyzer": "my_anal"
          }
        }
      }
    }
  }
}

When trying to analyze the text "Taschenhersteller"

-XPOST localhost:9200/my_index {
    "analyzer": "my_anal",
    "text": "Taschenhersteller"
}

It gives me

{
    "tokens": [
        {
            "token": "Taschenhersteller",
            "start_offset": 0,
            "end_offset": 17,
            "type": "<ALPHANUM>",
            "position": 0
        }
    ]
}

I don't understand what I'm doing wrong...

Could you help me, please? :)

Support for ES 2.2.1

When I try to install the plugin, I get this error:

ERROR: Plugin [decompound] is incompatible with Elasticsearch [2.2.1]. Was designed for version [2.2.0]

Is there going to be a release for this version?

Thanks!

xbib.com expired

It seems your website (xbib.com) has expired, so downloads are not available anymore.
Probably a good occasion to also tackle #29 and set up an actual CI system. :)

Not recognizing 'Blutorange' as compound word

Hi, how exactly does this plugin work? Is it based on a German dictionary? We have the concrete problem that it does not decompose the word 'Blutorange'. Is there a way this can be fixed?

Build/release for 5.1.2

A release for 5.1.2 would be really awesome 👍

Isn't it possible to declare your plugin as compatible with the whole 5.1.* series, so that only one build per minor release is needed?

Support for ES 2.3.2

Currently it is not possible to install this plugin on ES 2.3.2. The error:
ERROR: Plugin [decompound] is incompatible with Elasticsearch [2.3.2]. Was designed for version [2.3.0]

Is it possible to have this plugin work without a new release for every minor Elasticsearch release?

Thanks!

Failure to decompound Wandhalter

The term Wandhalterung is split into the tokens wand, alterung instead of wand, halterung. When setting the threshold to 0.63 or higher, the tokens are wandh and alterung. What can I do to fix this?

These are my settings:

index :
    analysis :
        analyzer :
            analyzer_decomp :
                type : custom
                tokenizer : standard
                filter : [lowercase, decomp]
        filter :
            decomp:
                type: decompound
        tokenizer:
            decomp:
                type: standard
                filter:
                  - decomp

I'm using Elasticsearch 2.1.1 and elasticsearch-analysis-decompound 2.1.1.0

Failure to decompound "Kinderzahnheilkunde"

The plugin fails to decompound the German word "Kinderzahnheilkunde". The resulting tokens are ["kinderzahnheilkunde", "kinderzahnhe", "ilkunde"]. The expected tokens are ["kinderzahnheilkunde", "kinder", "zahn", "heil", "kunde"].

I'm using plugin Version 2.2.0.0 and elasticsearch 2.2.0.

Index settings are

{
        "analysis": {
            "filter": {
                "german_stop": {
                    "type": "stop",
                    "stopwords": "_german_"
                },
                "german_stemmer": {
                    "type": "stemmer",
                    "language": "light_german"
                },
                "german_decompound": {
                    "type": "decompound"
                }
            },
            "analyzer": {
                "german_with_decompounder": {
                    "tokenizer": "standard",
                    "filter": [
                            "lowercase",
                            "german_decompound",
                            "unique",
                            "german_stop",
                            "german_normalization",
                            "german_stemmer"
                    ]
                }
            }
        }
    }

I got the results from the _analyze API with the explain=true option.

{
    "detail": {
        "custom_analyzer": true,
        "charfilters": [
        ],
        "tokenizer": {
            "name": "standard",
            "tokens": [
                {
                    "token": "Kinderzahnheilkunde",
                    "start_offset": 0,
                    "end_offset": 19,
                    "type": "<ALPHANUM>",
                    "position": 0,
                    "bytes": "[4b 69 6e 64 65 72 7a 61 68 6e 68 65 69 6c 6b 75 6e 64 65]",
                    "positionLength": 1
                }
            ]
        },
        "tokenfilters": [
            {
                "name": "lowercase",
                "tokens": [
                    {
                        "token": "kinderzahnheilkunde",
                        "start_offset": 0,
                        "end_offset": 19,
                        "type": "<ALPHANUM>",
                        "position": 0,
                        "bytes": "[6b 69 6e 64 65 72 7a 61 68 6e 68 65 69 6c 6b 75 6e 64 65]",
                        "positionLength": 1
                    }
                ]
            },
            {
                "name": "german_decompound",
                "tokens": [
                    {
                        "token": "kinderzahnheilkunde",
                        "start_offset": 0,
                        "end_offset": 19,
                        "type": "<ALPHANUM>",
                        "position": 0,
                        "bytes": "[6b 69 6e 64 65 72 7a 61 68 6e 68 65 69 6c 6b 75 6e 64 65]",
                        "keyword": false,
                        "positionLength": 1
                    },
                    {
                        "token": "kinderzahnhe",
                        "start_offset": 0,
                        "end_offset": 19,
                        "type": "<ALPHANUM>",
                        "position": 0,
                        "bytes": "[6b 69 6e 64 65 72 7a 61 68 6e 68 65]",
                        "keyword": false,
                        "positionLength": 1
                    },
                    {
                        "token": "ilkunde",
                        "start_offset": 0,
                        "end_offset": 19,
                        "type": "<ALPHANUM>",
                        "position": 0,
                        "bytes": "[69 6c 6b 75 6e 64 65]",
                        "keyword": false,
                        "positionLength": 1
                    }
                ]
            },

Any suggestions for getting better results are much appreciated. Thanks.

Release for 5.5.0

A release supporting Elasticsearch 5.5.0 would be much appreciated.

I took the master, bumped elasticsearch.version in gradle.properties and haven't had any issues so far.

There is no zip for ElasticSearch 5.2.1

In case someone needs these builds:
elasticsearch-analysis-decompound-5.2.0-plugin.zip
elasticsearch-analysis-decompound-5.2.1-plugin.zip

Installation:

$ sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install file:///path/to/elasticsearch-analysis-decompound-5.2.1-plugin.zip

or

$ sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install https://github.com/jprante/elasticsearch-analysis-decompound/files/807131/elasticsearch-analysis-decompound-5.2.1-plugin.zip

CI builds

jprante, can I offer you some help with setting up a CI host that automatically builds and publishes jar files whenever a new Elasticsearch is released?

java.lang.NumberFormatException: For input string: ""

Hi!

Thanks for your plugin.

Sometimes I get this exception:

[2013-06-06 16:57:49,918][DEBUG][action.bulk              ] [Quantum] [2] failed to execute bulk item (index) index 
java.lang.NumberFormatException: For input string: "" 
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:504)
    at java.lang.Integer.parseInt(Integer.java:527)
    at org.elasticsearch.analysis.decompound.Decompounder.reduceToBaseForm(Decompounder.java:223)
    at org.elasticsearch.analysis.decompound.Decompounder.decompound(Decompounder.java:61)
    at org.elasticsearch.index.analysis.DecompoundTokenFilter.decompound(DecompoundTokenFilter.java:68)
    at org.elasticsearch.index.analysis.DecompoundTokenFilter.incrementToken(DecompoundTokenFilter.java:55)
    at org.apache.lucene.analysis.miscellaneous.UniqueTokenFilter.incrementToken(UniqueTokenFilter.java:55)
    at org.apache.lucene.analysis.de.GermanNormalizationFilter.incrementToken(GermanNormalizationFilter.java:57)
    at org.elasticsearch.common.lucene.all.AllTokenStream.incrementToken(AllTokenStream.java:57)
    at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:202)
    at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:278)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:766)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2328)
    at org.elasticsearch.index.engine.robin.RobinEngine.innerIndex(RobinEngine.java:583)
    at org.elasticsearch.index.engine.robin.RobinEngine.index(RobinEngine.java:489)
    at org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:330)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:158)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:533)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:431)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)


I tried to debug your module but couldn't find anything. It happens from time to time (when I do a bulk reindex, 50-100 docs at a time).

E.g. the first time it crashes, but the second time it works correctly with the same data.

Do you have any thoughts about the problem?

Thanks a lot anyway!

Decompound adds letters

Hi,

I just got stuck with some "FetchPhaseExecutionException" when using the highlighting and the decomp filter:

InvalidTokenOffsetsException[Token verzinnte exceeds length of provided text sized 83]

Drilling down into that was a little tricky, since the words causing the exceptions did not occur in the indexed text! After a while I found the following:

Using decompound adds some tokens to the index that are longer than the original:

E.g. for "Kupferleiter, verzinnt" it adds "verzinnt" AND "verzinnte".
I have no clue what "verzinnte" is good for; it sounds to me like the plural. However, since it is the last word in the text, highlighting fails because it exceeds the end of the text.

Here is an example analysis of "verzinnt":

{
  "tokens": [
    {
      "token": "verzinnt",
      "start_offset": 0,
      "end_offset": 8,
      "type": "",
      "position": 1
    },
    {
      "token": "verzinnte",
      "start_offset": 0,
      "end_offset": 9,
      "type": "",
      "position": 1
    }
  ]
}

My guess: the end_offset of 9 is the problem here, because the analyzed text is only 8 characters long. So when it comes to highlighting, the highlighter probably tries to highlight "verzinnte" as well, which leads to the exception...
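Until the underlying bug is fixed, a defensive post-processing step would be to clamp token offsets to the length of the analyzed text before they reach a highlighter. An illustrative sketch, not part of the plugin:

```python
def clamp_offsets(tokens, text_len):
    """Truncate spans that extend past the end of the analyzed text,
    and drop spans that start beyond it entirely."""
    out = []
    for t in tokens:
        if t["start_offset"] >= text_len:
            continue  # span lies entirely outside the text
        out.append({**t, "end_offset": min(t["end_offset"], text_len)})
    return out

# The two tokens reported above; "verzinnte" claims offset 9 in an
# 8-character text, which is what trips the highlighter.
tokens = [
    {"token": "verzinnt", "start_offset": 0, "end_offset": 8},
    {"token": "verzinnte", "start_offset": 0, "end_offset": 9},
]
print(clamp_offsets(tokens, 8))
```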

Installation doesn't work, incorrect download location

ElasticSearch Version: 0.20.5

When calling bin/plugin -install as given in the README, installation fails because the plugin cannot be downloaded from the given locations. I tried to download the plugin manually from those locations but got only 404 errors. What's the correct URL for downloading the plugin?

# bin/plugin -install jprante/elasticsearch-analysis-decompound/1.0.0
-> Installing jprante/elasticsearch-analysis-decompound/1.0.0...
Trying http://download.elasticsearch.org/jprante/elasticsearch-analysis-decompound/elasticsearch-analysis-decompound-1.0.0.zip...
Trying http://search.maven.org/remotecontent?filepath=jprante/elasticsearch-analysis-decompound/1.0.0/elasticsearch-analysis-decompound-1.0.0.zip...
Trying https://oss.sonatype.org/service/local/repositories/releases/content/jprante/elasticsearch-analysis-decompound/1.0.0/elasticsearch-analysis-decompound-1.0.0.zip...
Trying https://github.com/jprante/elasticsearch-analysis-decompound/zipball/v1.0.0... (assuming site plugin)
Failed to install jprante/elasticsearch-analysis-decompound/1.0.0, reason: failed to download out of all possible locations...
