koursaros-ai / nboost

NBoost is a scalable, search-api-boosting platform for deploying transformer models to improve the relevance of search results on different platforms (e.g. Elasticsearch)

License: Apache License 2.0

Python 87.14% Shell 0.25% Dockerfile 0.51% Smarty 1.72% JavaScript 4.86% HTML 4.66% CSS 0.85%
elasticsearch tensorflow pytorch python machine-learning deep-learning microservices proxy nboost nlp

nboost's Introduction

🧪 We're looking for beta testers for our virtual assistant widget. Contact us if you're interested in using it on your website.

NBoost


Highlights · Overview · Benchmarks · Install · Getting Started · Kubernetes · Documentation · Tutorials · Contributing · Release Notes · Blog

What is it

NBoost is a scalable, search-engine-boosting platform for developing and deploying state-of-the-art models to improve the relevance of search results.

NBoost leverages fine-tuned models to produce domain-specific neural search engines. The platform can also improve other downstream tasks that require ranked input, such as question answering.

Contact us to request domain-specific models or leave feedback

Overview

The workflow of NBoost is relatively simple. Take the graphic above, and imagine that the server in this case is Elasticsearch.

In a conventional search request, the user sends a query to Elasticsearch and gets back the results.

In an NBoost search request, the user sends a query to the model. Then, the model asks for results from Elasticsearch and picks the best ones to return to the user.

Benchmarks

🔬 Note that we are evaluating the models on differently constructed sets than the ones they were trained on (MS Marco vs TREC-CAR), which suggests that these models generalize to many other real-world search problems.

| Fine-tuned Models | Dependency | Eval Set | Search Boost [1] | Speed on GPU |
|---|---|---|---|---|
| nboost/pt-tinybert-msmarco (default) | PyTorch | bing queries | +45% (0.26 vs 0.18) | ~50ms/query |
| nboost/pt-bert-base-uncased-msmarco | PyTorch | bing queries | +62% (0.29 vs 0.18) | ~300 ms/query |
| nboost/pt-bert-large-msmarco | PyTorch | bing queries | +77% (0.32 vs 0.18) | - |
| nboost/pt-biobert-base-msmarco | PyTorch | biomed | +66% (0.17 vs 0.10) | ~300 ms/query |

Instructions for reproducing here.

[1] MRR compared to BM25, the default for Elasticsearch. Reranking top 50.
[2] https://github.com/nyu-dl/dl4marco-bert
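For context on the Search Boost column: MRR (mean reciprocal rank) averages the reciprocal rank of the first relevant result over all queries. Below is a minimal sketch of how MRR@k could be computed; it is illustrative only, not the official benchmarking script.

# Illustrative MRR@k computation (not the official benchmarking script).
# `results` is a list of (ranked_ids, relevant_ids) pairs, one per query.
def mrr_at_k(results, k=10):
    total = 0.0
    for ranked_ids, relevant_ids in results:
        for rank, doc_id in enumerate(ranked_ids[:k], start=1):
            if doc_id in relevant_ids:
                total += 1.0 / rank
                break
    return total / len(results)

# Two queries: relevant doc found at rank 2 and rank 1 -> MRR = (0.5 + 1.0) / 2 = 0.75
print(mrr_at_k([(["d3", "d7"], {"d7"}), (["d1", "d2"], {"d1"})]))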

To use one of these fine-tuned models with NBoost, run, for example, nboost --model_dir bert-base-uncased-msmarco, and the model will be downloaded and cached automatically.

Using pre-trained language understanding models, you can boost search relevance metrics by nearly 2x compared to plain text search, with little to no extra configuration. When assessing performance, there is often a tradeoff between model accuracy and speed, so we benchmark both factors above. This leaderboard is a work in progress, and we intend to release more cutting-edge models!

Install NBoost

There are two ways to get NBoost: as a Docker image or as a PyPI package. For cloud users, we highly recommend using NBoost via Docker.

🚸 Depending on your model, you should install the respective TensorFlow or PyTorch dependencies. We package them below.

For installing NBoost, follow the table below.

| Dependency | 🐳 Docker | 📦 PyPI | 🐙 Kubernetes |
|---|---|---|---|
| PyTorch (recommended) | koursaros/nboost:latest-pt | pip install nboost[pt] | helm install nboost/nboost --set image.tag=latest-pt |
| TensorFlow | koursaros/nboost:latest-tf | pip install nboost[tf] | helm install nboost/nboost --set image.tag=latest-tf |
| All | koursaros/nboost:latest-all | pip install nboost[all] | helm install nboost/nboost --set image.tag=latest-all |
| - (for testing) | koursaros/nboost:latest-alpine | pip install nboost | helm install nboost/nboost --set image.tag=latest-alpine |

Any way you install it, if you end up reading the following message after $ nboost --help or $ docker run koursaros/nboost --help, then you are ready to go!

(screenshot: successful installation of NBoost)

Getting Started

📡The Proxy

(diagram: component overview)

The Proxy is the core of NBoost. It is essentially a wrapper that serves the model and understands incoming messages from specific search APIs (e.g. Elasticsearch). When the proxy receives a message, it increases the number of results the client is asking for so that the model can rerank a larger set and return the (hopefully) better results.

For instance, if a client asks Elasticsearch for 10 results for the query "brown dogs", the proxy may increase the request to 100 results and then filter down to the best ten for the client.
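Conceptually, the rerank step looks something like the sketch below. This is an illustration of the idea only, not NBoost's internal code; the score function stands in for the fine-tuned transformer.

# Conceptual sketch of the proxy's rerank step (illustration only, not NBoost internals).
def rerank(query, candidates, score, topk):
    """Order upstream hits by model score and keep only the best topk."""
    ordered = sorted(candidates,
                     key=lambda hit: score(query, hit["_source"]["passage"]),
                     reverse=True)
    return ordered[:topk]

# Toy usage with a stand-in word-overlap scorer (NBoost uses a fine-tuned transformer).
hits = [{"_source": {"passage": "stock market update"}},
        {"_source": {"passage": "brown dogs are friendly"}}]
overlap = lambda q, p: len(set(q.split()) & set(p.split()))
print(rerank("brown dogs", hits, overlap, topk=1))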

Setting up a Neural Proxy for Elasticsearch in 3 minutes

In this example we will set up a proxy to sit in between the client and Elasticsearch and boost the results!

Installing NBoost with PyTorch

If you want to run the example on a GPU, make sure you have TensorFlow 1.14-1.15, PyTorch, or ONNX Runtime with CUDA to support the modeling functionality. If you just want to run it on a CPU, don't worry about it. In either case, just run:

pip install nboost[pt]

Setting up an Elasticsearch Server

🔔 If you already have an Elasticsearch server, you can skip this step!

If you don't have Elasticsearch, not to worry! We recommend setting up a local Elasticsearch cluster using Docker (provided you have Docker installed). First, get the ES image by running:

docker pull elasticsearch:7.4.2

Once you have the image, you can run an Elasticsearch server via:

docker run -d -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:7.4.2
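To sanity-check that the cluster is up before wiring in the proxy, you can hit the root endpoint. A small sketch, assuming the requests package is installed:

# Quick sanity check that Elasticsearch is reachable on localhost:9200.
import requests

info = requests.get("http://localhost:9200").json()
print(info["version"]["number"])  # should print 7.4.2 for the image above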

Deploying the proxy

Now we're ready to deploy our Neural Proxy! It is very simple; just run:

nboost                                  \
    --uhost localhost                   \
    --uport 9200                        \
    --search_route "/<index>/_search"   \
    --query_path url.query.q            \
    --topk_path url.query.size          \
    --default_topk 10                   \
    --choices_path body.hits.hits       \
    --cvalues_path _source.passage

📢 The --uhost and --uport should be the same as the Elasticsearch server above! Uhost and uport are short for upstream-host and upstream-port (referring to the upstream server).

If you get this message: Listening: <host>:<port>, then we're good to go!
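To make the path arguments concrete: they are jsonpath-style pointers into the proxied request and response. The sketch below shows, under assumed request/response shapes (not NBoost internals), which parts of an Elasticsearch round trip the flags above refer to.

# Assumed shapes, for illustration of what the path flags point at.
proxied_request = {
    "url": {"query": {"q": "passage:vegas",    # --query_path   url.query.q
                      "size": "2"}},           # --topk_path    url.query.size
}
upstream_response = {
    "body": {
        "hits": {
            "hits": [                          # --choices_path body.hits.hits
                {"_id": "1", "_source": {"passage": "Vegas hotels near the Strip ..."}},
                {"_id": "2", "_source": {"passage": "Airport shuttle times ..."}},
                # --cvalues_path _source.passage -> the text the model reranks
            ]
        }
    },
}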

Indexing some data

NBoost has a handy indexing tool built in (nboost-index). For demonstration purposes, we will index a set of passages about traveling and hotels through NBoost. You can add the index to your Elasticsearch server by running:

travel.csv comes with NBoost

nboost-index --file travel.csv --index_name travel --delim , --id_col

Now let's test it out! Query Elasticsearch through the proxy with:

curl "http://localhost:8000/travel/_search?pretty&q=passage:vegas&size=2"

If the Elasticsearch result has the nboost tag in it, congratulations, it's working!
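You can run the same check programmatically; the proxy adds an nboost key to the Elasticsearch response body. A small sketch using the requests package:

# Query the proxy and confirm the response was handled by NBoost.
import requests

resp = requests.get(
    "http://localhost:8000/travel/_search",
    params={"q": "passage:vegas", "size": 2},
).json()

print("reranked by nboost:", "nboost" in resp)
for hit in resp["hits"]["hits"]:
    print(hit["_source"]["passage"][:80])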

(screenshot: example response containing the nboost tag)

What just happened?

Let's check out the NBoost frontend. Go to your browser and visit localhost:8000/nboost.

If you don't have access to a browser, you can curl http://localhost:8000/nboost/status for the same information.

The frontend recorded everything that happened:

  1. NBoost got a request for 2 search results. (average_topk)
  2. NBoost connected to the server at localhost:9200.
  3. NBoost sent a request for 50 search results to the server. (topn)
  4. NBoost received 50 search results from the server. (average_choices)
  5. The model picked the best 2 search results and returned them to the client.
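You can also pull these statistics programmatically from the status endpoint mentioned above. A small sketch using the requests package; the exact keys may vary by version, so printing the whole payload is the safest way to see what your installation reports:

# Fetch the proxy's recorded statistics from the /nboost/status endpoint.
import requests

status = requests.get("http://localhost:8000/nboost/status").json()
print(status)  # per the list above, includes counters such as average_topk, topn, average_choices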

Elastic made easy

To increase the number of parallel proxies, simply increase --workers. For a more robust deployment approach, you can distribute the proxy via Kubernetes (see below).

Kubernetes

See also

For in-depth query DSL and other search API solutions (such as the Bing API), see the docs.

Deploying NBoost via Kubernetes

We can easily deploy NBoost in a Kubernetes cluster using Helm.

Add the NBoost Helm Repo

First we need to register the repo with your Kubernetes cluster.

helm repo add nboost https://raw.githubusercontent.com/koursaros-ai/nboost/master/charts/
helm repo update

Deploy some NBoost replicas

Let's try deploying four replicas:

helm install --name nboost --set replicaCount=4 nboost/nboost

All possible --set (values.yaml) options are listed below:

| Parameter | Description | Default |
|---|---|---|
| replicaCount | Number of replicas to deploy | 3 |
| image.repository | NBoost Image name | koursaros/nboost |
| image.tag | NBoost Image tag | latest-pt |
| args.model | Name of the model class | nil |
| args.model_dir | Name or directory of the finetuned model | pt-bert-base-uncased-msmarco |
| args.qa | Whether to use the qa plugin | False |
| args.qa_model_dir | Name or directory of the qa model | distilbert-base-uncased-distilled-squad |
| args.host | Hostname of the proxy | 0.0.0.0 |
| args.port | Port for the proxy to listen on | 8000 |
| args.uhost | Hostname of the upstream search api server | elasticsearch-master |
| args.uport | Port of the upstream server | 9200 |
| args.data_dir | Directory to cache model binary | nil |
| args.max_seq_len | Max combined token length | 64 |
| args.bufsize | Size of the http buffer in bytes | 2048 |
| args.batch_size | Batch size for running through rerank model | 4 |
| args.multiplier | Factor to increase results by | 5 |
| args.workers | Number of threads serving the proxy | 10 |
| args.query_path | Jsonpath in the request to find the query | nil |
| args.topk_path | Jsonpath to find the number of requested results | nil |
| args.choices_path | Jsonpath to find the array of choices to reorder | nil |
| args.cvalues_path | Jsonpath to find the str values of the choices | nil |
| args.cids_path | Jsonpath to find the ids of the choices | nil |
| args.search_path | The url path to tag for reranking via nboost | nil |
| service.type | Kubernetes Service type | LoadBalancer |
| resources | Resource needs and limits to apply to the pod | {} |
| nodeSelector | Node labels for pod assignment | {} |
| affinity | Affinity settings for pod assignment | {} |
| tolerations | Toleration labels for pod assignment | [] |
| image.pullPolicy | Image pull policy | IfNotPresent |
| imagePullSecrets | Docker registry secret names as an array | [] (does not add image pull secrets to deployed pods) |
| nameOverride | String to override Chart.name | nil |
| fullnameOverride | String to override Chart.fullname | nil |
| serviceAccount.create | Specifies whether a service account is created | nil |
| serviceAccount.name | The name of the service account to use. If not set and create is true, a name is generated using the fullname template | nil |
| podSecurityContext.fsGroup | Group ID for the container | nil |
| securityContext.runAsUser | User ID for the container | 1001 |
| ingress.enabled | Enable ingress resource | false |
| ingress.hostName | Hostname to your installation | nil |
| ingress.path | Path within the url structure | [] |
| ingress.tls | Enable ingress with tls | [] |
| ingress.tls.secretName | TLS type secret to be used | chart-example-tls |

Documentation

ReadTheDocs

The official NBoost documentation is hosted on nboost.readthedocs.io. It is automatically built, updated and archived on every new release.

Contributing

Contributions are greatly appreciated! You can make corrections or updates and commit them to NBoost. Here are the steps:

  1. Create a new branch, say fix-nboost-typo-1
  2. Fix/improve the codebase
  3. Commit the changes. Note the commit message must follow the naming style, say Fix/model-bert: improve the readability and move sections
  4. Make a pull request. Note the pull request must follow the naming style. It can simply be one of your commit messages, just copy paste it, e.g. Fix/model-bert: improve the readability and move sections
  5. Submit your pull request and wait for all checks to pass (usually 10 minutes)
    • Coding style
    • Commit and PR styles check
    • All unit tests
  6. Request reviews from one of the developers from our core team.
  7. Merge!

More details can be found in the contributor guidelines.

Citing NBoost

If you use NBoost in an academic paper, we would love to be cited. Here are the two ways of citing NBoost:

  1. \footnote{https://github.com/koursaros-ai/nboost}
    
  2. @misc{koursaros2019NBoost,
      title={NBoost: Neural Boosting Search Results},
      author={Thienes, Cole and Pertschuk, Jack},
      howpublished={\url{https://github.com/koursaros-ai/nboost}},
      year={2019}
    }

License

If you have downloaded a copy of the NBoost binary or source code, please note that the NBoost binary and source code are both licensed under the Apache License, Version 2.0.

Koursaros AI is excited to bring this open source software to the community.
Copyright (C) 2019. All rights reserved.

nboost's People

Contributors

colethienes, eltu, kaykanloo, klasocki, pertschuk, rajeshkp, sagarpalyal, teozosa


nboost's Issues

Complex query

Hello there,

Firstly, thank you for your work.

I got a question about complex query. I looked into es.py in codex folder and we can see you are looking for body['query']['match'] or body['query']['match']['query'] to find the query.

My question is: for a complex query like the one below, is it possible to use nboost?
When I tried it, my query seemed to just be proxied to Elasticsearch without any post-processing.

{
  "size": 11,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            {
              "match": {
                "text": {
                  "query": "fréquenc recyclag format conducteur professionnel",
                  "operator": "and"
                }
              }
            },
            {
              "match": {
                "text": {
                  "query": "fréquenc recyclag format conducteur professionnel",
                  "operator": "or"
                }
              }
            }
          ]
        }
      },
      "script_score": {
        "script": {
          "source": "1 + ((5 - doc[\"priority\"].value) / 10.0) + ((doc[\"branch\"].value == \"All\") ? 0.5 : 0)"
        }
      }
    }
  }
}

Thanks,

Alexandre

Supported ElasticSearch version

I intend to deploy nboost on the AWS ES service, but the AWS ES service only supports ES v7.1. Is there a dependency on a specific ES version for nboost?

Kubernetes install issue - ModuleNotFoundError

I am trying to deploy nboost in Kubernetes cluster using "koursaros/nboost:latest-pt" image. But it is failing. Getting the following message in the logs

Traceback (most recent call last):
  File "/opt/conda/bin/nboost", line 5, in <module>
    from nboost.__main__ import main
  File "/opt/conda/lib/python3.6/site-packages/nboost/__main__.py", line 2, in <module>
    from nboost.proxy import Proxy
  File "/opt/conda/lib/python3.6/site-packages/nboost/proxy.py", line 3, in <module>
    from flask import (
ModuleNotFoundError: No module named 'flask'

Can you please advise.
Thank you.

Index multiple fields.

I couldn't index multiple fields into the Elasticsearch index using the nboost-index command. My CSV file contains 5 columns and I want to index all of the fields but search on just one field. How can I achieve that in NBoost?

nboost Results-Not giving correct number of output passages

Hi,

I tried the travel.csv file to produce search results using nboost. The parameters below were given along with the other required parameters, but the API returned only 5 passage results; I was expecting 10, since size was set to 10. Could you please rectify this or let me know what the issue might be? (I have followed the same steps as mentioned in the document.)
'default_topk': 20, 'topn': 100, 'q': 'passage:How long will it take from airport to hotel', 'size': 10

Thanks&Regards,
Jisha Joseph.

NBoost using existing indexes

Do you have an example of how to use nboost with an existing index? I have an elastic instance with documents already ingested and indexed. I want to try using nboost. I've done the demo provided with travel.csv and got the following output:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  } ...
  "nboost": {}

but when I run another query pointing at my existing index like so:

curl "http://localhost:8000/knowledge-management/_search?pretty&q=body:technology&size=2"

I get the following output:

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 16,
    "successful" : 16,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

which looks like it's just bypassing nboost? Thoughts? Suggestions? I really appreciate the help!

Cannot download models (biobert)

Trying to run nboost with TensorFlow using the BioBERT model. Getting the following stack trace:

:resolve_model:[__i:res: 43]:Extracting "/usr/local/lib/python3.6/dist-packages/nboost/.cache/biobert-base-uncased-msmarco" from /usr/local/lib/python3.6/dist-packages/nboost/.cache/biobert-base-uncased-msmarco.tar.gz                                                                                                                                                                                                                
 Traceback (most recent call last):                                                                                                                                                                                    
   File "/usr/local/bin/nboost", line 8, in <module>                                                                                                                                                                   
     sys.exit(main())                                                                                                                                                                                                  
   File "/usr/local/lib/python3.6/dist-packages/nboost/__main__.py", line 10, in main                                                                                                                                  
     proxy = Proxy(**vars(args))                                                                                                                                                                                       
   File "/usr/local/lib/python3.6/dist-packages/nboost/proxy.py", line 56, in __init__                                                                                                                                 
     **cli_args)  # type: RerankModelPlugin                                                                                                                                                                            
   File "/usr/local/lib/python3.6/dist-packages/nboost/plugins/models/__init__.py", line 44, in resolve_model                                                                                                          
     extract_tar_gz(binary_path, data_dir)                                                                                                                                                                             
   File "/usr/local/lib/python3.6/dist-packages/nboost/helpers.py", line 96, in extract_tar_gz                                                                                                                         
     tar = tarfile.open(fileobj=fileobj)                                                                                                                                                                               
   File "/usr/lib/python3.6/tarfile.py", line 1576, in open                                                                                                                                                            
     raise ReadError("file could not be opened successfully")                                                                                                                                                          
 tarfile.ReadError: file could not be opened successfully

Looks like there's an issue with the hosted file: https://storage.googleapis.com/koursaros/biobert-base-uncased-msmarco.tar.gz I'm getting:

<Code>UserProjectAccountProblem</Code>
<Message>User project billing account not in good standing.</Message>
<Details>
The billing account for the owning project is disabled in state closed
</Details>
</Error>

Kubernetes Install issue

Hi,

I'm trying to install nboost in Kubernetes cluster using below command,
helm install nboost --set replicaCount=3 --set service.type=ClusterIP --set args.uhost=elasticsearch-elasticsearch-coordinating-only nboost/nboost

It's crashing:

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
usage: nboost [-h] [--debug DEBUG] [--no_rerank NO_RERANK] [--search_route SEARCH_ROUTE]
              [--query_path QUERY_PATH] [--topk_path TOPK_PATH] [--default_topk DEFAULT_TOPK]
              [--cvalues_path CVALUES_PATH] [--cids_path CIDS_PATH] [--choices_path CHOICES_PATH]
              [--query_prep QUERY_PREP] [--verbose VERBOSE] [--host HOST] [--port PORT]
              [--uhost UHOST] [--uport UPORT] [--ussl USSL] [--delim DELIM] [--lr LR]
              [--max_seq_len MAX_SEQ_LEN] [--bufsize BUFSIZE] [--batch_size BATCH_SIZE]
              [--topn TOPN] [--workers WORKERS] [--data_dir DATA_DIR] [--model MODEL]
              [--model_dir MODEL_DIR] [--qa QA] [--prerank PRERANK] [--qa_model QA_MODEL]
              [--qa_model_dir QA_MODEL_DIR] [--filter_results FILTER_RESULTS]
nboost: error: unrecognized arguments: --config=elasticsearch --multiplier=5 --rerank=true --search_path=/.*/_search

Can you please help.

CPU based TF error

I'm getting this error while starting. I have changed the Docker image since I don't have a GPU on my server, but it still complains about a missing libcuda.

I've tried to build the image, and also tried tensorflow/tensorflow without gpu tag (tensorflow/tensorflow:1.15.0-py3)

nboost_1                 | C:BertModel:[pro:run:273]:Upstream host is data.humanoyd.com:80
nboost_1                 | 2019-12-10 06:30:49.581369: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
nboost_1                 | 2019-12-10 06:30:49.619399: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2294605000 Hz
nboost_1                 | 2019-12-10 06:30:49.622090: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f3a75d30d10 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
nboost_1                 | 2019-12-10 06:30:49.622150: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
nboost_1                 | 2019-12-10 06:30:49.629999: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
nboost_1                 | 2019-12-10 06:30:49.630250: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
nboost_1                 | 2019-12-10 06:30:49.630376: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (a9720d7c87ae): /proc/driver/nvidia/version does not exist
nboost_1                 | 2019-12-10 06:30:50.895906: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 93763584 exceeds 10% of system memory.

Is this related to a dependency of nboost, or what should I do? Any hints are appreciated. I imagine a lot of people would like to use a CPU, since this is more of a server thing and not a lot of providers support GPUs.

When using QA: "....0-py3.7.egg/nboost/plugins/models/__init__.py", line 24, in resolve_model"

I have the distilbert-base-uncased-distilled-squad model downloaded into the .cache directory and have the following configuration. All values are at default except these.
The distilbert model does not seem to be resolvable as it isn't in the CLASS_MAP.

nboost    --uhost 192.168.5.123    --uport 9200   --search_route "/<index>/_search"   \
    --query_path url.query.q            \
    --topk_path url.query.size          \
    --default_topk 10                   \
    --choices_path body.hits.hits       \
    --cvalues_path _source.context      \
    --qa True                           

I had the qa.model_dir in the above script but that also does not resolve the issue.
Question: should I rebuild nboost after adding the model to the CLASS_MAP?

File "/home/steph/anaconda3/envs/bert/lib/python3.7/site-packages/nboost-0.3.0-py3.7.egg/nboost/plugins/models/__init__.py", line 24, in resolve_model
    raise ImportError('Class "%s" not in %s.' % CLASS_MAP.keys())
TypeError: not enough arguments for format string

train model?

Would it be possible to train a model on our own data, leveraging pretrained BERT models from the transformers library, for instance?
I don't see anything related to the training part (for the PyTorch side in particular).

How to use AWS ES proxy url?

Hello,

I indexed all my data in a cloud AWS ES instance and am trying to handle search by connecting via the proxy SSL URL along with an authentication token. I am failing to establish a connection, since there is no default argument for passing the authentication token. Could you please help me figure out how to connect?

How does it compare to S-BERT + Elastic vector field?

Hi,

I have read with great interest the discussion between @realsergii (one of the authors of https://github.com/UKPLab/sentence-transformers) and @pertschuk (author of nboost) here.

I gather that the S-BERT task is harder because:

BERT is able to use attention to compare directly both sentences(e.g. word-by-word comparison), while SBERT must map individual sentences from an unseen topic to a vector space such that arguments with similar claims and reasons are close.

(from https://arxiv.org/pdf/1908.10084.pdf)

What I want to know is how big the difference is.
My understanding is that @pertschuk ran quite a lot of tests before starting this project, and I am wondering if we are speaking of a 5 / 10 / 20 / more point difference in a relevance measure (for instance)?

Thank you for all the info you can bring.

Kind regards,
Michael

Support for complex elasticsearch dsl queries - like queries using "OR", "AND"

Hi,
Thanks for this amazing work.
I could make it work for simple Elasticsearch queries, but I can't find a way to use it for more complex queries. Suppose I have to write a query that searches for both "head" and "pain".

E.g., if I just have to search for "head", it works. Below is the request format:

response = requests.get(
    url='http://localhost:8000/temp/_search',
    json={
        'nboost': {
            'uhost': 'localhost',
            'uport': 9205,
            'query_path': 'body.query.bool.must.term.*',
            'topk_path': 'body.size',
            'default_topk': 2,
            'topn': 10,
            'choices_path': '_source',
            'cvalues_path': '_source.*'
        },
        'size': 2,
        "query": {
            "bool": {
                      "must": {"term": {"message": "head"}}
                    }
            }
        }
)

Now when I have to search for both "head" and "pain", it should be something like this, but it does not work.

response = requests.get(
    url='http://localhost:8000/temp/_search',
    json={
        'nboost': {
            'uhost': 'localhost',
            'uport': 9205,
            'query_path': 'body.query.bool.must.term.*',
            'topk_path': 'body.size',
            'default_topk': 2,
            'topn': 10,
            'choices_path': '_source',
            'cvalues_path': '_source.*'
        },
        'size': 2,
        "query": {
            "bool": {
                      "must": [{"term": {"message": "head"}}, {"term": {"message": "pain"}}]
                    }
            }
        }
)

So, does NBoost support this kind of complex query? If so, how can it be done? I am not able to find anything in the docs.

Thanks,

Error while deploying the proxy: TypeError: func() takes 1 positional argument but 2 were given

When I am running nboost --uhost localhost, I am getting this error.
Traceback (most recent call last):
  File "c:\users\7328637\appdata\local\continuum\anaconda3\envs\tensorflow_env\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\7328637\appdata\local\continuum\anaconda3\envs\tensorflow_env\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\7328637\AppData\Local\Continuum\anaconda3\envs\tensorflow_env\Scripts\nboost.exe\__main__.py", line 7, in <module>
  File "c:\users\7328637\appdata\local\continuum\anaconda3\envs\tensorflow_env\lib\site-packages\nboost\__main__.py", line 10, in main
    proxy = Proxy(**vars(args))
  File "c:\users\7328637\appdata\local\continuum\anaconda3\envs\tensorflow_env\lib\site-packages\nboost\proxy.py", line 57, in __init__
    **cli_args)  # type: RerankModelPlugin
  File "c:\users\7328637\appdata\local\continuum\anaconda3\envs\tensorflow_env\lib\site-packages\nboost\plugins\models\__init__.py", line 40, in resolve_model
    logger.info('Downloading "%s" model.', model_dir)
TypeError: func() takes 1 positional argument but 2 were given

Dependency missing: numpy

I'm trying to run a container using docker-compose configured like:

nboost_mrturing:
   container_name: nboost_mrturing
   image: koursaros/nboost:latest-alpine
   environment:
     - uhost=elasticsearch-master
     - uport=9200
   ports:
     - 9090:8000

But the container returns an error:

nboost_mrturing         |   File "/usr/local/lib/python3.7/site-packages/nboost/model/bert_model/__init__.py", line 3, in <module>
nboost_mrturing         |     import numpy as np
nboost_mrturing         | ModuleNotFoundError: No module named 'numpy'

Am I missing something?

Tensorflow Drone Build

Here's the stack from the drone:

WARNING:tensorflow:From /drone/src/nboost/model/bert_model/__init__.py:26: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.
--
97 |  
98 | WARNING:tensorflow:From /drone/src/nboost/model/bert_model/__init__.py:26: The name tf.logging.ERROR is deprecated. Please use tf.compat.v1.logging.ERROR instead.
99 |  
100 | 2019-12-02 18:45:47.464940: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
101 | 2019-12-02 18:45:47.493108: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1996200000 Hz
102 | 2019-12-02 18:45:47.495342: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f036d87ea20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
103 | 2019-12-02 18:45:47.495363: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
104 | 2019-12-02 18:45:47.497434: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
105 | 2019-12-02 18:45:47.497454: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (-1)
106 | 2019-12-02 18:45:47.497473: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (5fd1963b10ef): /proc/driver/nvidia/version does not exist

Not sure if this is the sole issue, or there is another issue. Regardless, the drone is timing out.

use case; tyres database

Hi,

Hope you are all well !

I have created a small project for collecting data from online tyre sellers, https://github.com/lucmichalski/peaks-tires, and I was wondering what kind of benefits nboost can provide to my search engine. Currently, my setup uses manticoresearch as a full-text search engine.

So, how can nboost help with problems like tyre dimension search or complex searches like tyre brand/model?

Thanks for any insights or inputs.

Cheers,
Luc Michalski

Benchmarking guide doesn't work

Hi,

Thank you for this work! It's great!

I'm trying to reproduce the benchmark results and tried to follow the guide here, but I suspect that it is not up to date?
I received the following error when sending the requests:

E:Proxy:[pro:loo:110]:Request (127.0.0.1:61243): server status {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"request [/trec_car/_search] contains unrecognized parameter: [nboost]"}],"type":"illegal_argument_exception","reason":"request [/trec_car/_search] contains unrecognized parameter: [nboost]"},"status":400}

What is the right way to do the benchmarking with the current codebase?

Thanks!

More documentation please

This library is excellent and I would really like to use it, so first of all thank you for making it available!

Unfortunately, there isn't enough documentation to make it particularly usable beyond a default deployment. For example, it is not clear how I could:

  1. send a query directly to es through the nboost proxy without reranking (this is useful for evaluating performance)
  2. use my own model; you provide cli access to control from a list of hosted models but I would like to load a custom model

I am digging through the codebase to solve these issues, but it would be great if you could document the API, as I am sure these things are straightforward.

Thanks again for the great library.

ModuleNotFoundError: No module named 'jsonpath_ng.bin'; 'jsonpath_ng' is not a package

Hi,
I was getting the following error - ModuleNotFoundError: No module named 'jsonpath_ng.bin'; 'jsonpath_ng' is not a package

I installed nboost with pip install nboost[pt]

I found the problem to be with jsonpath_ng. See the issue here.

I solved it by uninstalling jsonpath_ng. Do this:

pip uninstall jsonpath_ng

Then reinstall an older version:

 pip install jsonpath-ng==1.4.3

Using nboost with Elastic Cloud/AWS Elasticsearch service

Hi,

First of all, thanks for this awesome project!

I have gone through the examples in the docs and all of them were on a local server. I would really appreciate an example/docs for deploying this with Elastic Cloud or the AWS Elasticsearch service.

Will I have to host the proxy somewhere and point it towards my ES index?

Thanks,
Abhijith

ES search template support

First of all, thank you very much for making search better :)
Question: are you planning to extend nboost by adding support for search templates? E.g. a query could look like:

POST http://192.168.99.100:8000/index/_search/template
{
  "id": "searchTemplate",
  "params": {
    "query": "my query",
    "from": 0,
    "size": 9
  }
}

Thanks!

ElasticSearch

In your experiments, did you tune Elasticsearch with analyzers (stemming, lemmatization, stopwords...) or did you only use ES out of the box?

Error while deploying proxy

I am facing an error when I run the deploy proxy command below.

Error -
I:resolve_model:[__i:res: 37]:Found model cache in /opt/conda/lib/python3.7/site-packages/nboost/.cache/pt-tinybert-msmarco.tar.gz
I:resolve_model:[__i:res: 43]:Extracting "/opt/conda/lib/python3.7/site-packages/nboost/.cache/pt-tinybert-msmarco" from /opt/conda/lib/python3.7/site-packages/nboost/.cache/pt-tinybert-msmarco.tar.gz
Traceback (most recent call last):
  File "/opt/conda/bin/nboost", line 10, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.7/site-packages/nboost/__main__.py", line 10, in main
    proxy = Proxy(**vars(args))
  File "/opt/conda/lib/python3.7/site-packages/nboost/proxy.py", line 48, in __init__
    **cli_args)  # type: RerankModelPlugin
  File "/opt/conda/lib/python3.7/site-packages/nboost/plugins/models/__init__.py", line 44, in resolve_model
    extract_tar_gz(binary_path, data_dir)
  File "/opt/conda/lib/python3.7/site-packages/nboost/helpers.py", line 96, in extract_tar_gz
    tar = tarfile.open(fileobj=fileobj)
  File "/opt/conda/lib/python3.7/tarfile.py", line 1578, in open
    raise ReadError("file could not be opened successfully")
tarfile.ReadError: file could not be opened successfully

Command:

nboost                                  \
    --uhost localhost                   \
    --uport 9200                        \
    --search_route "/<index>/_search"   \
    --query_path url.query.q            \
    --topk_path url.query.size          \
    --default_topk 10                   \
    --choices_path body.hits.hits       \
    --cvalues_path _source.passage

P.S. I am populating the correct values of uhost and uport when running it, and I installed the NBoost package via the pip method.

Also, when I try to look into the file /opt/conda/lib/python3.7/site-packages/nboost/.cache/pt-tinybert-msmarco.tar.gz it seems to be an empty file with 0 bytes

Has anyone faced the same issue?

Thanks in advance for your help

Container not accessible

Hi!

I'm able to run my nboost container but I can't make an http request.

So this is my docker ps output:

CONTAINER ID        IMAGE                                                     COMMAND                  CREATED             STATUS              PORTS                              NAMES
07bc9e7b2e4a        koursaros/nboost:latest-tf                                "nboost --uhost elas…"   4 minutes ago       Up 4 minutes        0.0.0.0:8000->8000/tcp             nboost

And the output of my cURL - curl "http://localhost:8000/travel/_search?pretty&q=passage:vegas&size=2" - is:

curl: (56) Recv failure: Connection reset by peer

And here is the latest output of the container:

nboost         | 2019-11-27 01:36:57.934037: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
nboost         | 2019-11-27 01:36:57.945768: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (-1)
nboost         | 2019-11-27 01:36:57.945820: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (07bc9e7b2e4a): /proc/driver/nvidia/version does not exist
nboost         | 2019-11-27 01:36:59.531419: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 93763584 exceeds 10% of system memory.

My docker-compose config is:

  nboost:
    container_name: nboost
    image: koursaros/nboost:latest-tf
    command: --uhost elasticsearch-master --uport 9200 --field passage --workers 1
    ports:
      - 8000:8000

And finally I'm seeing:

C:BertModel:[__i:run:247]:Upstream host is elasticsearch-master:9200
nboost         | I:BertModel:[__i:run: 48]:Starting 1 workers...
nboost         | C:BertModel:[__i:run: 55]:Listening on 127.0.0.1:8000...

Changing port does not work? Argument ignored?

When starting nboost with argument --uport 6754 I still see errors that nboost can't connect to instance at 0.0.0.0:9200 urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='0.0.0.0', port=9200): Max retries exceeded with url: /

Seems like port is overridden by default values anyways?

Train model using domain data

Hi, thank you for the amazing framework. I'm planning to use this framework for a question answering system.
Can you please explain how I can train a model with my own domain data? I know that nboost doesn't support training, but can you help me understand how to train a model outside nboost and configure it in nboost later?

Thank you.

Do I need to re-index an existing ElasticSearch index?

I have an existing large index inside my ElasticSearch (~million documents, some of them pretty long).
I would like to use it with nboost, but avoid costly re-indexing and creating a csv file.

Is it possible, or do I need to use the nboost-index tool every time I want to work with new data?

Dataset For benchmarking

I need data for benchmarking. I am not getting correctly formatted data. If anyone can help with that, that would be great.

ReactiveSearch compatibility

Hi, I tried to use it with ReactiveSearch (for React JS), and it makes it crash. Did you change the normal response? I'm using ES 6.

Unable to open database file

Hi, I have tried to run nboost and encountered a database issue (error message: unable to open database file).

These are my steps:

  • set up and run Elasticsearch locally (I have tried CRUD and it works)
  • clone this repo and python setup.py install, then run nboost --uport 9200 (I also tried the full example in README.md)
  • then index the sample file: nboost-index --file travel.csv --index_name travel --delim , --id_col
  • when I try to access the dashboard localhost:8000/nboost it only shows the header (without the config card widget)
  • when I try to curl "http://localhost:8000/travel/_search?pretty&q=passage:vegas&size=2"

it returns:

{"doc":null,"msg":"('unable to open database file',)","type":"OperationalError"}

Am I missing any step?

I guess this is because I did not have the nboost.db file (and I did not find it in this repo)

Unable to provide field "title_lang.en"

The platform works really well and I thank the authors for releasing it to the world.

I have a field "title_lang.en" in my ElasticSearch index. I want to rerank my documents based on this field. In the --query_path, --cvalues_path, and q, when I provide the field "title_lang.en", Nboost just acts as a proxy to Elasticsearch and I see the message "Missed query". Perhaps, NBoost is looking at ".en" as a JSON path instead of looking at "title_lang.en" as a whole field to query ES. What would be a work around to get NBoost to accept this?

I am currently looking at creating alias field names on ES or reindexing with new field names, but it would be very convenient if a workaround existed within NBoost. Thanks for the support.

Model name 'nboost/pt-tinybert-msmarco' was not found in model name list

I'm trying to follow the example, but following either the pip or the docker route I get:

nboost                                  \
    --uhost localhost                   \
    --uport 9200                        \
    --search_route "/<index>/_search"   \
    --query_path url.query.q            \
    --topk_path url.query.size          \
    --default_topk 10                   \
    --choices_path body.hits.hits       \
    --cvalues_path _source.passage

Model name 'nboost/pt-tinybert-msmarco' was not found in model name list. This looks similar in some ways to #58 . Any suggestions?
