
fast-cosine-similarity's Introduction

StaySense - Fast Cosine Similarity ElasticSearch Plugin

Extremely fast vector scoring on ElasticSearch 6.4.x+ using vector embeddings.

About StaySense: StaySense is a revolutionary software company creating the most advanced marketing software ever made publicly available for Hospitality Managers in the Vacation Rental and Hotel Industries.

Company Website: http://staysense.com

Fast Elasticsearch Vector Scoring

This Plugin allows you to score Elasticsearch documents based on embedding-vectors, using dot-product or cosine-similarity. Note, this is a linear search approach in its current version. For very large data sets, this is likely not a good choice for realtime search queries.

Depending on cluster size, this plugin will likely stop being performant beyond roughly 1 million documents.
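
To make the linear-search behaviour concrete, here is a minimal Python sketch of exact brute-force scoring. It is not the plugin's Java code: the function name `score_all` and the sample data are hypothetical. Every stored vector is compared against the query, so the cost grows linearly with the number of documents.

```python
import math

def score_all(query, docs, cosine=True):
    """Score every document vector against the query (exact, O(N * d))."""
    q_norm = math.sqrt(sum(v * v for v in query))
    scores = []
    for doc_id, vec in docs.items():
        dot = sum(q * d for q, d in zip(query, vec))
        if cosine:
            d_norm = math.sqrt(sum(v * v for v in vec))
            # Guard against zero-length vectors to avoid division by zero.
            score = dot / (q_norm * d_norm) if q_norm and d_norm else 0.0
        else:
            score = dot
        scores.append((doc_id, score))
    # Highest score first, as a search engine would rank hits.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
ranked = score_all([1.0, 0.0], docs)
```

Because nothing is pruned or hashed, the ranking is exact, which is the trade-off against approximate approaches such as LSH.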

ElasticSearch is working on some native vector functionality; however, their current solution uses Locality Sensitive Hashing (LSH) approaches. That is to say, their current beta releases all return approximate results with false negatives (not all matching records are found). If you need exact results, this new feature from the ES team will not be useful to you.

We are working on a solution to close this false-negative gap common with LSH; however, we are not communicating a delivery timeline at this time.

Until that time, if you have a large corpus, we would recommend checking out the FAISS (pronounced: Face) project.

General

  • This plugin was ported from this elasticsearch 5.x vector scoring plugin, this discussion, and lior-k's original contribution for ElasticSearch 5.5+ to achieve lightning fast result times when searching across millions of documents.
  • This port is for ElasticSearch 6.4+ and uses the ScoreScript class, which was officially split from SearchScript; the plugin is therefore incompatible with versions before 6.4.x.

Improvements

  • lior-k's implementation had some confusing variable assignments that did not consistently match with Cosine-Sim's mathematical definition. This has been updated in the code to more accurately reflect the mathematical definition.
  • Null pointer exceptions are now skipped (e.g. a document doesn't have a vector to compare against) allowing queries to complete successfully even in sparse datasets.
  • Ported for latest version of ElasticSearch.
  • Issues and Pull-Requests welcomed!
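
As a rough illustration of two of the points above (the mathematical definition, cos(q, d) = (q · d) / (‖q‖ ‖d‖), and tolerating documents that have no vector), consider the Python sketch below. It is not the plugin's code. In particular, returning 0.0 for a missing vector is an assumption made for this example, and `cosine_or_default` is a hypothetical name.

```python
import math

def cosine_or_default(query, doc_vector, default=0.0):
    """Cosine similarity, or `default` when the document has no vector.

    Assumption: a missing vector scores `default`; the plugin's actual
    fallback value is not stated in this README.
    """
    if doc_vector is None:
        return default
    dot = sum(q * d for q, d in zip(query, doc_vector))
    norm_q = math.sqrt(sum(q * q for q in query))
    norm_d = math.sqrt(sum(d * d for d in doc_vector))
    if norm_q == 0.0 or norm_d == 0.0:
        return default
    return dot / (norm_q * norm_d)

sparse_docs = {"with_vector": [3.0, 4.0], "without_vector": None}
scores = {k: cosine_or_default([3.0, 4.0], v) for k, v in sparse_docs.items()}
```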

Elasticsearch version

  • Currently designed for Elasticsearch 6.4.x+
  • Plugin is NOT backwards compatible (see note above about ScoreScript class)
  • Will successfully build for 6.4.0 and 6.4.1 (latest). Simply modify pom.xml with the correct version, then follow the Maven build steps below.

Maven Build Steps

  • Clone the project
  • mvn package to compile the plugin as a zip file
  • In Elasticsearch run elasticsearch-plugin install file:/PATH_TO_ZIP to install plugin

Why embeddings?

  • Ultimately, by defining the field mapping as a binary value and storing an embedded (base64-encoded) version of the vector, you can take advantage of Lucene's direct API to get direct byte access without transformation.
  • When creating the document, Lucene encodes the embedding directly to binary, making read access blazing fast on the search side.
  • Does Lucene do the same with non-embedded vectors? Unsure, but the plugin supports that too if you want to store in [1.2934, -2.0349, ...., .039] format and try!
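
The byte layout behind the encoded field can be sketched in Python. This assumes each dimension is stored as an 8-byte big-endian IEEE 754 double (the layout Java's `ByteBuffer.putDouble` produces by default) and that the concatenated bytes are then base64-encoded, which matches the converters later in this README. The helper names are hypothetical.

```python
import base64
import struct

def pack_vector(vec):
    """Pack floats as consecutive big-endian 8-byte doubles, then base64."""
    raw = struct.pack(">%dd" % len(vec), *vec)
    return base64.b64encode(raw).decode("ascii")

def unpack_vector(encoded):
    """Inverse of pack_vector: base64 -> bytes -> list of floats."""
    raw = base64.b64decode(encoded)
    return list(struct.unpack(">%dd" % (len(raw) // 8), raw))

encoded = pack_vector([1.0, -2.0])
decoded = unpack_vector(encoded)
```

A vector with n dimensions therefore always decodes to exactly 8 * n bytes, which is a quick sanity check for data produced by other languages.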

Usage

Documents

  • Each document you score should have a field containing the base64 representation of your vector. For example:
   {
   	"_id": 1,
   	....
   	"embeddedVector": "v7l48eAAAAA/s4VHwAAAAD+R7I5AAAAAv8MBMAAAAAA/yEI3AAAAAL/IWkeAAAAAv7s480AAAAC/v6DUgAAAAL+wJi0gAAAAP76VqUAAAAC/sL1ZYAAAAL/dyq/gAAAAP62FVcAAAAC/tQRvYAAAAL+j6ycAAAAAP6v1KcAAAAC/bN5hQAAAAL+u9ItAAAAAP4ckTsAAAAC/pmkjYAAAAD+cYpwAAAAAP5renEAAAAC/qY0HQAAAAD+wyYGgAAAAP5WrCcAAAAA/qzjTQAAAAD++LBzAAAAAP49wNKAAAAC/vu/aIAAAAD+hqXfAAAAAP4FfNCAAAAA/pjC64AAAAL+qwT2gAAAAv6S3OGAAAAC/gfMtgAAAAD/If5ZAAAAAP5mcXOAAAAC/xYAU4AAAAL+2nlfAAAAAP7sCXOAAAAA/petBIAAAAD9soYnAAAAAv5R7X+AAAAC/pgM/IAAAAL+ojI/gAAAAP2gPz2AAAAA/3FonoAAAAL/IHg1AAAAAv6p1SmAAAAA/tvKlQAAAAD/I2OMAAAAAP3FBiCAAAAA/wEd8IAAAAL94wI9AAAAAP2Y1IIAAAAA/rnS4wAAAAL9vriVgAAAAv1QxoCAAAAC/1/qu4AAAAL+inZFAAAAAv7aGA+AAAAA/lqYVYAAAAD+kNP0AAAAAP730BiAAAAA="
   }
  • Use this field mapping:
      "embeddedVector": {
        "type": "binary",
        "doc_values": true
      }
  • The vector can be of any dimension
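
A hedged sketch of how the mapping and a document body might be assembled in Python before sending them to Elasticsearch. The mapping type name `doc`, the helper name `make_document`, and the sample vector are assumptions for illustration; only the `embeddedVector` field definition comes from this README. No Elasticsearch client is used here, the code only builds the JSON bodies.

```python
import base64
import json
import struct

# Mapping body for index creation. ES 6.x expects a mapping type;
# "doc" is an assumed name.
MAPPING = {
    "mappings": {
        "doc": {
            "properties": {
                "embeddedVector": {"type": "binary", "doc_values": True}
            }
        }
    }
}

def make_document(vector, **other_fields):
    """Return a document body whose embeddedVector is base64 of big-endian doubles."""
    raw = struct.pack(">%dd" % len(vector), *vector)
    body = dict(other_fields)
    body["embeddedVector"] = base64.b64encode(raw).decode("ascii")
    return body

doc = make_document([0.25, -0.5, 0.75], title="example")
mapping_json = json.dumps(MAPPING)
```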

Converting a vector to Base64

To convert an array of doubles to a base64 string, we use these example methods:

Java

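import java.nio.ByteBuffer;
import java.nio.DoubleBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Encodes each double as 8 big-endian bytes (ByteBuffer's default order), then base64.
public static final String convertArrayToBase64(double[] array) {
	final int capacity = 8 * array.length;
	final ByteBuffer bb = ByteBuffer.allocate(capacity);
	for (int i = 0; i < array.length; i++) {
		bb.putDouble(array[i]);
	}
	bb.rewind();
	final ByteBuffer encodedBB = Base64.getEncoder().encode(bb);
	// Base64 output is plain ASCII, so the charset is stated explicitly.
	return new String(encodedBB.array(), StandardCharsets.US_ASCII);
}

// Decodes a base64 string back into the original array of doubles.
public static double[] convertBase64ToArray(String base64Str) {
	final byte[] decode = Base64.getDecoder().decode(base64Str.getBytes(StandardCharsets.US_ASCII));
	final DoubleBuffer doubleBuffer = ByteBuffer.wrap(decode).asDoubleBuffer();

	final double[] dims = new double[doubleBuffer.capacity()];
	doubleBuffer.get(dims);
	return dims;
}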

Python

import base64
import numpy as np

# Big-endian 64-bit floats, matching the byte order used by the Java example.
dbig = np.dtype('>f8')

def decode_float_list(base64_string):
    # Avoid shadowing the built-in name `bytes`.
    raw = base64.b64decode(base64_string)
    return np.frombuffer(raw, dtype=dbig).tolist()

def encode_array(arr):
    raw = np.array(arr).astype(dbig).tobytes()
    return base64.b64encode(raw).decode("utf-8")

Go

import (
	"encoding/base64"
	"encoding/binary"
	"errors"
	"math"
)

// convertArrayToBase64 encodes each float64 as 8 big-endian bytes, then base64.
func convertArrayToBase64(array []float64) string {
	bytes := make([]byte, 0, 8*len(array))
	for _, a := range array {
		bits := math.Float64bits(a)
		b := make([]byte, 8)
		binary.BigEndian.PutUint64(b, bits)
		bytes = append(bytes, b...)
	}
	return base64.StdEncoding.EncodeToString(bytes)
}

// convertBase64ToArray decodes a base64 string of big-endian float64 values.
func convertBase64ToArray(base64Str string) ([]float64, error) {
	decoded, err := base64.StdEncoding.DecodeString(base64Str)
	if err != nil {
		return nil, err
	}
	// Reject input that is not a whole number of 8-byte doubles;
	// otherwise the slice expression below would panic.
	if len(decoded)%8 != 0 {
		return nil, errors.New("decoded length is not a multiple of 8")
	}
	array := make([]float64, 0, len(decoded)/8)
	for i := 0; i < len(decoded); i += 8 {
		bits := binary.BigEndian.Uint64(decoded[i : i+8])
		f := math.Float64frombits(bits)
		array = append(array, f)
	}
	return array, nil
}

Querying

Querying with encodings

  • Query for documents based on their cosine similarity:

    For ES 6.4.x:

{
  "query": {
    "function_score": {
    "boost_mode" : "replace",
        "functions": [
          {
            "script_score": {
              "script": {
                  "source": "staysense",
                  "lang" : "fast_cosine",
                  "params": {
                      "field": "embeddedVector",
                      "cosine": true,
                      "encoded_vector" : "v+kopYAAAAA/wivkYAAAAD+wfJeAAAAAv8DL4QAAAAA/waYiwAAAAL+zAmvAAAAAv8c+aiAAAAC/07MyQAAAAL+ccr9AAAAAP9feCOAAAAC/y+ivYAAAAL/R34XgAAAAv+G8nuAAAAA/09hlwAAAAL/MkSWAAAAAP9EXn4AAAAC/zBBxYAAAAD/UY+3AAAAAP7zQSkAAAAC/zRijgAAAAA=="
                  }
              }
            }
          }
        ]
    }
  }
}
  • The example above shows a vector of 64 dimensions
  • Parameters:
    1. field: The document field containing the base64 vector to compare against.
    2. cosine: Boolean. If true, use cosine-similarity; otherwise use dot-product.
    3. encoded_vector: The encoded vector to compare to.

Querying with vectors

  • Query for documents based on their cosine similarity:

    For ES 6.4.x:

{
  "query": {
    "function_score": {
    "boost_mode" : "replace",
        "functions": [
          {
            "script_score": {
              "script": {
                  "source": "staysense",
                  "lang" : "fast_cosine",
                  "params": {
                      "field": "embeddedVector",
                      "cosine": true,
                      "vector" : [
                      -0.09217305481433868, 0.010635560378432274, -0.02878434956073761, ... , 0.08279753476381302
                      ]
                  }
              }
            }
          }
        ]
    }
  }
}
  • The example above shows a vector of 64 dimensions
  • Parameters:
    1. field: The document field containing the base64 vector to compare against.
    2. cosine: Boolean. If true, use cosine-similarity; otherwise use dot-product.
    3. vector: The comma separated non-encoded vector to compare to.
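
The two parameter styles carry the same information. As a sketch under the assumption that `encoded_vector` holds base64-encoded big-endian doubles, the hypothetical Python helper below converts a set of script params from the encoded form to the plain `vector` form.

```python
import base64
import struct

def to_vector_params(params):
    """Return a copy of script params using `vector` instead of `encoded_vector`."""
    converted = {k: v for k, v in params.items() if k != "encoded_vector"}
    raw = base64.b64decode(params["encoded_vector"])
    converted["vector"] = list(struct.unpack(">%dd" % (len(raw) // 8), raw))
    return converted

encoded_params = {
    "field": "embeddedVector",
    "cosine": True,
    # base64 of the big-endian doubles [1.0, -2.0]
    "encoded_vector": base64.b64encode(struct.pack(">2d", 1.0, -2.0)).decode("ascii"),
}
vector_params = to_vector_params(encoded_params)
```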


fast-cosine-similarity's Issues

script_lang not supported [fast_cosine]

Hi,

I am getting this error: script_lang not supported [fast_cosine] after successfully building the plugin and installing it on Elasticsearch v6.4.2. I only changed the ES version in pom.xml as suggested.
Could you please help?

Also I have tried ES official examples score-expert-script, and it outputs a similar error: script_lang not supported [expert_scripts].

Please forgive me for being quite new to Elasticsearch. I return this lang name in the getType() function in the nested FastCosineSimilarityEngine class, but it seems nothing you write there works.

private static class FastCosineSimilarityEngine implements ScriptEngine {
    //The normalized vector score from the query
    //
    double queryVectorNorm;

    @Override
    public String getType() {
        return "fast_cosine";
    }

ES log if it helps

Caused by: org.elasticsearch.index.query.QueryShardException: script_score: the script could not be loaded
    at org.elasticsearch.index.query.functionscore.ScriptScoreFunctionBuilder.doToFunction(ScriptScoreFunctionBuilder.java:99) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.index.query.functionscore.ScoreFunctionBuilder.toFunction(ScoreFunctionBuilder.java:138) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.index.query.functionscore.FunctionScoreQueryBuilder.doToQuery(FunctionScoreQueryBuilder.java:298) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.index.query.AbstractQueryBuilder.toQuery(AbstractQueryBuilder.java:98) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.index.query.QueryShardContext.lambda$toQuery$2(QueryShardContext.java:305) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:317) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:304) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.search.SearchService.parseSource(SearchService.java:724) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.search.SearchService.createContext(SearchService.java:575) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:551) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:347) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:333) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:329) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.search.SearchService$3.doRun(SearchService.java:1019) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.4.2.jar:6.4.2]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_191]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_191]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_191]
Caused by: java.lang.IllegalArgumentException: script_lang not supported [fast_cosine]
    at org.elasticsearch.script.ScriptService.getEngine(ScriptService.java:240) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.script.ScriptService.compile(ScriptService.java:294) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.index.query.functionscore.ScriptScoreFunctionBuilder.doToFunction(ScriptScoreFunctionBuilder.java:95) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.index.query.functionscore.ScoreFunctionBuilder.toFunction(ScoreFunctionBuilder.java:138) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.index.query.functionscore.FunctionScoreQueryBuilder.doToQuery(FunctionScoreQueryBuilder.java:298) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.index.query.AbstractQueryBuilder.toQuery(AbstractQueryBuilder.java:98) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.index.query.QueryShardContext.lambda$toQuery$2(QueryShardContext.java:305) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:317) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:304) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.search.SearchService.parseSource(SearchService.java:724) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.search.SearchService.createContext(SearchService.java:575) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:551) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:347) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:333) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:329) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.search.SearchService$3.doRun(SearchService.java:1019) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) ~[elasticsearch-6.4.2.jar:6.4.2]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.4.2.jar:6.4.2]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_191]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_191]
    at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_191]

[illegal_argument_exception] Seeking to negative position during search

I'm seeing the following errors when running queries against a 512-dimensional binary vector field.

[illegal_argument_exception] Seeking to negative position: MMapIndexInput(path="/usr/share/elasticsearch/data/nodes/0/indices/DtszBfcZRL286BzLzYxQIA/0/index/_139.cfs") [slice=_139_Lucene70_0.dvd] [slice=fixed-binary] (and) [illegal_argument_exception] Seeking to negative position: MMapIndexInput(path="/usr/share/elasticsearch/data/nodes/0/indices/DtszBfcZRL286BzLzYxQIA/1/index/_13w.cfs") [slice=_13w_Lucene70_0.dvd] [slice=fixed-binary] (and) [illegal_argument_exception] Seeking to negative position: MMapIndexInput(path="/usr/share/elasticsearch/data/nodes/0/indices/DtszBfcZRL286BzLzYxQIA/2/index/_15c.cfs") [slice=_15c_Lucene70_0.dvd] [slice=fixed-binary] (and) [illegal_argument_exception] Seeking to negative position: MMapIndexInput(path="/usr/share/elasticsearch/data/nodes/0/indices/DtszBfcZRL286BzLzYxQIA/3/index/_16g.cfs") [slice=_16g_Lucene70_0.dvd] [slice=fixed-binary] (and) [illegal_argument_exception] Seeking to negative position: MMapIndexInput(path="/usr/share/elasticsearch/data/nodes/0/indices/DtszBfcZRL286BzLzYxQIA/4/index/_15b.cfs") [slice=_15b_Lucene70_0.dvd] [slice=fixed-binary]

I don't get this error consistently, but enough to make me consider this a breaking bug.

Elasticsearch version: 6.4.2

Search vector - not always found

Hi, we are seeing the following inconsistent results:
We search for a vector, and most of the time ElasticSearch returns the right record with a score of 1 as the first result. But sometimes it does not return this record at all, and then another record with a score of 0.98... comes back as the first result.
There doesn't seem to be an issue with our cluster, all shards are queried successfully in all cases.
Is this an expected behavior?

Our vectors have ~300 dimensions, we're using ElasticSearch version 6.4.3

Randomness in cosine results - `queryVectorNorm` declaration

Hi,

I am using the fast-cosine-similarity plugin on a ES 6.7.1 instance and have found that cosine results show some randomness, especially when the embedded vector dimension was large (more than 50). Cosine values were sometimes greater than 1, which cannot be a valid value.

Doing some further tests I have found that the variable queryVectorNorm was one (the?) source of randomness. From what I have seen its value was sometimes changing during its calculation in the for loop below:

// If cosine calculate the query vec norm
if(cosine) {
    queryVectorNorm = 0d;
    // compute query inputVector norm once
    for (double v : inputVector) {
        queryVectorNorm += Math.pow(v, 2.0); 
    }
}

Therefore I tried moving its declaration inside the new ScoreScript.LeafFactory() and first tests are promising: no random results anymore and cosine similarities are no longer greater than 1.

I would be grateful to have your opinion on this especially as I am completely new to ES plugin development.

Thanks
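
The behaviour described in this report is consistent with a field that is shared between concurrently running queries. The Python sketch below is a deterministic simulation of that hypothesis, not the plugin's code: the hypothetical `SharedNormEngine` stores the query norm on the engine object, so a second query overwrites it before the first query scores its document, while `per_query_score` keeps the norm local to each call.

```python
import math

def norm(vec):
    """Euclidean length of a vector."""
    return math.sqrt(sum(v * v for v in vec))

class SharedNormEngine:
    """Keeps the query norm in a field shared by all queries (the suspected bug)."""
    def __init__(self):
        self.query_norm = 0.0

    def prepare(self, query):
        self.query_norm = norm(query)

    def score(self, query, doc):
        dot = sum(q * d for q, d in zip(query, doc))
        return dot / (self.query_norm * norm(doc))

def per_query_score(query, doc):
    """Keeps the norm local to the call, so queries cannot interfere."""
    dot = sum(q * d for q, d in zip(query, doc))
    return dot / (norm(query) * norm(doc))

engine = SharedNormEngine()
query_a, query_b, doc = [3.0, 4.0], [0.3, 0.4], [3.0, 4.0]
engine.prepare(query_a)   # query A computes its norm (5.0)
engine.prepare(query_b)   # query B overwrites it (about 0.5) before A scores
broken = engine.score(query_a, doc)      # uses B's norm, so the result exceeds 1
correct = per_query_score(query_a, doc)  # stays within the valid cosine range
```

Under this assumption, moving the variable into per-query scope (as the reporter did inside the LeafFactory) removes the interference.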

Script could not be loaded while querying index

I was able to install the plugin successfully after following instructions on the README.

However, when I try to query the index using the given query format, I get the following error:
RequestError(400, search_phase_execution_exception, script_score: the script could not be loaded)

How should I resolve this in order to query the index using cosine similarity?

Compilation error when building

Hi,

I attempted to build the project with mvn clean install and got the following error:

[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /tmp/fast-cosine-similarity/src/main/java/com/staysense/fastcosinesimilarity/FastCosineSimilarityPlugin.java:[30,55] error: package com.elasticsearch.staysense.fastcosinesimilarity does not exist
[ERROR] /tmp/fast-cosine-similarity/src/main/java/com/staysense/fastcosinesimilarity/FastCosineSimilarityPlugin.java:[171,43] error: variable docVector is already defined in method execute()
[ERROR] /tmp/fast-cosine-similarity/src/main/java/com/staysense/fastcosinesimilarity/FastCosineSimilarityPlugin.java:[172,40] error: no suitable method found for get(ByteArrayDataInput)
[ERROR] /tmp/fast-cosine-similarity/src/main/java/com/staysense/fastcosinesimilarity/FastCosineSimilarityPlugin.java:[180,50] error: array required, but ByteArrayDataInput found
[ERROR] /tmp/fast-cosine-similarity/src/main/java/com/staysense/fastcosinesimilarity/FastCosineSimilarityPlugin.java:[184,69] error: array required, but ByteArrayDataInput found

Seems like docVector is defined twice with the same name and wrong package name is used.

"main" java.lang.IllegalArgumentException: the version needs to contain major, minor, and revision,

Hi,
I closely followed the instructions in the readme, and after executing this line:

./elasticsearch-plugin install file:/Users/francisdamachi/Downloads/elasticsearch-6.4.1/staysense-cosine-sim-6.4.1.zip

I get the error: Exception in thread "main" java.lang.IllegalArgumentException: the version needs to contain major, minor, and revision, and optionally the build: ${elasticsearch.version}

I made sure that I installed the elasticsearch version [https://www.elastic.co/downloads/past-releases/elasticsearch-6-4-1]

How did you solve this?

Best regards
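
The `${elasticsearch.version}` text in this message looks like a Maven property that was not substituted when the plugin zip was built. As a hedged diagnostic sketch (assuming the zip contains a `plugin-descriptor.properties` file, as Elasticsearch plugin archives do), the Python function below lists descriptor lines that still contain an unresolved `${...}` placeholder. The function name and the in-memory sample zip are hypothetical.

```python
import io
import re
import zipfile

PLACEHOLDER = re.compile(r"\$\{[^}]+\}")

def unresolved_descriptor_lines(zip_file):
    """Return descriptor lines that still contain a ${...} placeholder."""
    bad = []
    with zipfile.ZipFile(zip_file) as zf:
        for name in zf.namelist():
            if not name.endswith("plugin-descriptor.properties"):
                continue
            for line in zf.read(name).decode("utf-8").splitlines():
                if PLACEHOLDER.search(line):
                    bad.append(line.strip())
    return bad

# Build a small sample zip in memory to exercise the check.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as zf:
    zf.writestr(
        "plugin-descriptor.properties",
        "name=example\nelasticsearch.version=${elasticsearch.version}\n",
    )
sample.seek(0)
problems = unresolved_descriptor_lines(sample)
```

Running the function against the real plugin zip (by passing its path) would show whether the version property was filled in during the build.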

Comparison to dense_vector and cosineSimilarity

Hi,
I really like your plugin. I had to update to the latest version (7.0.1) for my use case and it still worked! (I later incorporated andrassy's changes when I realized that there are some current forks...)
I found a builtin feature that is really similar, so I want to ask how they compare.

  • dense_vector mapping type (as well as a sparse variant)
  • cosineSimilarity, dotProduct
    → only in master/future versions (not in the current 7.0.1),
    either because I am doing something wrong or because it is not implemented yet (according to the docs); I always get a compile error in the search_phase_execution_exception when searching

The limit seems to be 500 (current) / 1024 (master/future) elements for dense_vector. It is an experimental feature, but I'm interested in the performance.
For 5 million doc2vec (dim=300) vectors I currently have an average search time of about 5 seconds (including retrieval of the search vector for a given id).

Null pointer exception

Greetings:
I am using Elasticsearch version 6.4 and the plugin built from the target/release folder.

My schema is like this.

curl -X PUT "localhost:9200/testplugin" -H 'Content-Type: application/json' -d'
{
  "settings": { "number_of_shards": 1 },
  "mappings": {
    "doc": {
      "properties": {
        "doc_id": { "type": "long" },
        "embedding_vector": { "type": "binary", "doc_values": true },
        "vector": { "type": "float" },
        "sentence": { "type": "text" }
      }
    }
  }
}
'
I query like this:
{
  "query": {
    "function_score": {
      "boost_mode": "replace",
      "functions": [
        {
          "script_score": {
            "script": {
              "source": "staysense",
              "lang": "fast_cosine",
              "params": {
                "field": "embedding_vector",
                "vector": query_vec,
                "cosine": cosine
              }
            }
          }
        }
      ]
    }
  }
}

I get a NullPointerException:

Caused by: org.elasticsearch.search.query.QueryPhaseExecutionException: Query Failed [Failed to execute main query]
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:298) ~[elasticsearch-6.4.0.jar:6.4.0]
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:107) ~[elasticsearch-6.4.0.jar:6.4.0]
at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:324) ~[elasticsearch-6.4.0.jar:6.4.0]
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:357) ~[elasticsearch-6.4.0.jar:6.4.0]
at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:333) [elasticsearch-6.4.0.jar:6.4.0]
at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:329) [elasticsearch-6.4.0.jar:6.4.0]
at org.elasticsearch.search.SearchService$3.doRun(SearchService.java:1019) [elasticsearch-6.4.0.jar:6.4.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) [elasticsearch-6.4.0.jar:6.4.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.4.0.jar:6.4.0]
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) [elasticsearch-6.4.0.jar:6.4.0]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.4.0.jar:6.4.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]

Caused by: java.lang.NullPointerException
at com.staysense.fastcosinesimilarity.FastCosineSimilarityPlugin$FastCosineSimilarityEngine$1$1.setDocument(FastCosineSimilarityPlugin.java:121) ~[?:?]
at org.elasticsearch.common.lucene.search.function.ScriptScoreFunction$1.score(ScriptScoreFunction.java:78) ~[elasticsearch-6.4.0.jar:6.4.0]
at org.elasticsearch.common.lucene.search.function.FunctionScoreQuery$FunctionFactorScorer.computeScore(FunctionScoreQuery.java:392) ~[elasticsearch-6.4.0.jar:6.4.0]
at org.elasticsearch.common.lucene.search.function.FunctionScoreQuery$FunctionFactorScorer.score(FunctionScoreQuery.java:374) ~[elasticsearch-6.4.0.jar:6.4.0]


Is there something wrong with my query or schema? Any help is appreciated.

Thanks a lot.
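
The stack trace points at `setDocument`, which would be consistent with some documents having no value in `embedding_vector`. One possible workaround, offered as an assumption rather than a confirmed fix, is to restrict scoring to documents where the field exists by adding an `exists` query inside `function_score`. The Python sketch below only builds the request body; the helper name `build_filtered_query` is hypothetical.

```python
def build_filtered_query(query_vec, field="embedding_vector", cosine=True):
    """Score only documents that have a value for `field`."""
    return {
        "query": {
            "function_score": {
                # Only documents matching this inner query are scored.
                "query": {"exists": {"field": field}},
                "boost_mode": "replace",
                "functions": [{
                    "script_score": {
                        "script": {
                            "source": "staysense",
                            "lang": "fast_cosine",
                            "params": {
                                "field": field,
                                "vector": list(query_vec),
                                "cosine": cosine,
                            },
                        }
                    }
                }],
            }
        }
    }

body = build_filtered_query([0.1, 0.2])
```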

Cannot Seem to Load StaySense Script

I am trying to use this plugin with ES 6.4.1, but am getting an error of: elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'script_score: the script could not be loaded') during loading of my Flask app. I have, to my knowledge followed the installation instructions correctly and have elasticsearch-fast-cosine-similarity in my plugins folder of ES.

I map the feature like so:

"name_embedding_encoded": {
                            "type": "binary",
                            "doc_values": True
                          }

Create the query:

name_embedding_query = {
                                "function_score": {
                                "boost_mode" : "replace",
                                    "functions": [
                                      {
                                        "script_score": {
                                          "script": {
                                              "source": "staysense",
                                              "lang" : "fast_cosine",
                                              "params": {
                                                  "field": "name_embedding_encoded",
                                                  "cosine": True,
                                                  "encoded_vector": term
                                              }
                                          }
                                        }
                                      }
                                    ]
                                }
                              }

and then make the query:

client = Elasticsearch()
s = Search(using=client, index=INDEX_NAME, doc_type=DOC_TYPE)
docs = s.query(name_embedding_query)[:count].execute()

But cannot seem to get the script to load. What am I missing?
