Comments (15)
@Gladiator566 : I have created vespa-engine/vespa#30219 for the X-Content-Hash issue you reported - this is most likely a different issue than reported by @ricoms here. Thanks for reporting!
from pyvespa.
Hi, sorry for the slow response! The (first) problem to solve is the configproxy not reaching the config server. Please try https://github.com/vespa-engine/sample-apps/blob/master/examples/operations/multinode-HA/docker-compose.yaml and validate that it works. Then you can modify the compose file, adding your own services.
The best place to look is probably network/hostnames. This looks like a connectivity problem, so maybe add a network and use a fully qualified hostname instead of just vespa.
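To narrow down a connectivity problem like this, a quick TCP reachability check run from inside the container that hosts the configproxy can help. The following is a minimal sketch, not part of Vespa or pyvespa: the helper name `can_connect` is made up for illustration, and it assumes the config server listens on port 19071 (the default) and that `vespa` is the hostname used in your compose file.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the hostname and connects in one step,
        # so both DNS failures and refused connections end up as OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example usage (hostname and port are assumptions, adjust to your setup):
#   can_connect("vespa", 19071)
```

If the short hostname fails but a fully qualified name succeeds, the problem is likely name resolution on the Docker network rather than Vespa itself.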
from pyvespa.
@kkraune Hi, I am trying to use the bge-m3 model for embedding-based hybrid search, and I followed the official tutorial to deploy Vespa in a local Docker container. Since I have millions of documents to feed, I tried the feed_iterable function to feed the data in bulk, and I ran into the same problems as above, such as "WARNING/urllib3.connectionpool: Retrying NewConnectionError: Failed to establish a new connection" or "Max retries exceeded with URL". I tried setting the max_connections parameter to a huge number, and I also tried creating a session for the feed, but neither worked. How can I solve this connection error so I can insert bulk data into Vespa? Thank you!
My env: Linux, pyvespa version 0.39, Docker image is latest.
from pyvespa.
Hi @Gladiator566, I think you should look in vespa.log to find out what the problem might be. After that, follow the advice above and try /multinode-HA/docker-compose.yaml to verify that it works before trying your own configuration.
You can also try https://pyvespa.readthedocs.io/en/latest/getting-started-pyvespa-cloud.html to make it easier, using the free trial, to eliminate other failures.
from pyvespa.
@kkraune I tried to use Vespa Cloud as in the tutorial, but I got an error like: RuntimeError: Status code 400 doing POST at https://api.vespa-external.aws.oath.cloud:4443/application/v4/tenant/bge-m3/application/bgeM3/instance/default/deploy/dev-aws-us-east-1c: Value of X-Content-Hash header does not match computed content hash. How can I solve this problem?
from pyvespa.
Thanks for reporting. Can you add the steps you took, so we can reproduce? Or did you follow the steps in https://pyvespa.readthedocs.io/en/latest/getting-started-pyvespa-cloud.html and it failed at app = vespa_cloud.deploy()?
A good hint is also to make sure there are no applications already deployed.
@hmusum I assume this is an error from our API. We should document how to fix this.
from pyvespa.
@kkraune Yes, I followed the exact steps in https://pyvespa.readthedocs.io/en/latest/getting-started-pyvespa-cloud.html, concretely as below:
- download vespa-cli from github latest release
- vespa config set target cloud
- vespa config set application bge-m3.bgeM3
- vespa auth cert -N
- vespa auth api-key
- add public api-key to cloud browser key site
vespa_cloud = VespaCloud(tenant=os.environ["TENANT_NAME"], application='bgeM3', key_content=None, key_location=api_key_path, application_package=application_package)
and it failed at app = vespa_cloud.deploy(). There are no applications already deployed in the cloud.
Thanks.
from pyvespa.
Hi again, I tried https://pyvespa.readthedocs.io/en/latest/getting-started-pyvespa-cloud.html and it worked for me. I ran the notebook locally on my laptop. Some ideas:
- The .vespa directory in your home dir stores credentials. You can temporarily move it to another name to reset all credentials, and try the guide again with no other changes.
- You can also delete the api-key in the console and try with a fresh one.
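The credential reset suggested above can be done by hand with a simple rename. As a minimal sketch in Python (the helper name `set_aside` and the backup naming scheme are made up for illustration, they are not part of pyvespa or the Vespa CLI):

```python
import time
from pathlib import Path
from typing import Optional


def set_aside(directory: Path) -> Optional[Path]:
    """Rename a directory to a timestamped backup name.

    Returns the new path, or None if the directory does not exist.
    """
    if not directory.exists():
        return None
    backup = directory.with_name(f"{directory.name}.bak-{int(time.time())}")
    directory.rename(backup)
    return backup


# Example usage: move ~/.vespa out of the way before retrying the guide.
#   set_aside(Path.home() / ".vespa")
```

Renaming rather than deleting keeps the old credentials around, so the directory can be renamed back if the fresh setup does not help.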
from pyvespa.
The "Value of X-Content-Hash header does not match computed content hash" error is due to some misconfiguration or bug on the client side, but it's hard to say what the user should do without knowing the root cause of the error.
from pyvespa.
The problem seems to be a mismatch between the hash computed in
https://github.com/vespa-engine/pyvespa/blob/master/vespa/deployment.py#L644
and the validation of it in Vespa Cloud. This is with pyvespa 0.39, which is the latest version. We are looking into it.
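To illustrate why this kind of check is sensitive, here is a minimal sketch of a content-hash comparison. It assumes the hash is a Base64-encoded SHA-256 digest of the request body; that is an assumption made for illustration, the linked deployment.py is the authoritative source for what pyvespa actually does, and the function name `content_hash` is hypothetical.

```python
import base64
import hashlib


def content_hash(body: bytes) -> str:
    """Base64-encoded SHA-256 digest of the exact bytes given."""
    return base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")


# The hash sent in the header must be computed over the exact bytes sent.
# Any change to the payload after hashing (re-encoding, re-serialization,
# an added newline) gives a different digest, and the server rejects it.
signed_body = b'{"example": "payload"}'
sent_body = signed_body + b"\n"
print(content_hash(signed_body) == content_hash(sent_body))  # False
```

Under this assumption, a mismatch means the bytes that were hashed on the client are not the bytes the server received, which points at the request construction rather than at the credentials.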
from pyvespa.
@kkraune Hi, I am trying to use the bge-m3 model for embedding-based hybrid search, and I followed the official tutorial to deploy Vespa in a local Docker container. Since I have millions of documents to feed, I tried the feed_iterable function to feed the data in bulk, and I ran into the same problems as above, such as "WARNING/urllib3.connectionpool: Retrying NewConnectionError: Failed to establish a new connection" or "Max retries exceeded with URL". I tried setting the max_connections parameter to a huge number, and I also tried creating a session for the feed, but neither worked. How can I solve this connection error so I can insert bulk data into Vespa? Thank you!
Hi, I am also encountering the same problem. Do you know how to fix it?
from pyvespa.
Hi @vudangthinh! I don't think increasing the number of connections will help. The error message is probably a symptom of a maxed-out instance.
The Vespa CLI (https://docs.vespa.ai/en/vespa-cli.html) has better feed flow control. Can you please try that, see how the feeding goes, and let me know?
from pyvespa.
I tried to use vespa feed, however the error still persists.
At first the indexing was OK, but after feeding many documents the errors started to happen:
feed: got error "Post "http://127.0.0.1:8080/document/v1/benchmark/hybridsearch/docid/2936": write tcp 127.0.0.1:52604->127.0.0.1:8080: write: broken pipe" (no body) for put id:benchmark:hybridsearch::2936: retrying
feed: got error "Post "http://127.0.0.1:8080/document/v1/benchmark/hybridsearch/docid/3283": write tcp 127.0.0.1:52610->127.0.0.1:8080: write: broken pipe" (no body) for put id:benchmark:hybridsearch::3283: retrying
feed: got error "Post "http://127.0.0.1:8080/document/v1/benchmark/hybridsearch/docid/3090": write tcp 127.0.0.1:52622->127.0.0.1:8080: write: broken pipe" (no body) for put id:benchmark:hybridsearch::3090: retrying
feed: got error "Post "http://127.0.0.1:8080/document/v1/benchmark/hybridsearch/docid/3273": write tcp 127.0.0.1:52634->127.0.0.1:8080: write: broken pipe" (no body) for put id:benchmark:hybridsearch::3273: retrying
from pyvespa.
OK - can you please check vespa.log inside the Docker container? It could be a resource problem, and the log might say so.
from pyvespa.
You are probably sending more requests than the system can handle in a timely manner, and therefore some of them end up crossing a connection recycling event. These will be retried until timeout, so this is not really an error in itself, but you probably want to increase your resources (maybe run with GPU) or feed slower. Setting a lower timeout (--timeout) should get rid of these messages and lead to less queuing, which is probably advantageous if you want to determine faster what actual max throughput you can get.
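For the "feed slower" option when feeding from Python, one approach is to rate-limit the iterable on the client side before handing it to the feeder. This is a minimal sketch, not a pyvespa feature: the `throttle` helper is hypothetical, and whether it is enough depends on how the feeding client consumes and buffers the iterable.

```python
import time
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttle(items: Iterable[T], max_per_second: float) -> Iterator[T]:
    """Yield items from `items` no faster than `max_per_second`."""
    if max_per_second <= 0:
        raise ValueError("max_per_second must be positive")
    interval = 1.0 / max_per_second
    next_time = time.monotonic()
    for item in items:
        delay = next_time - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        # Schedule the next item one interval after this one is released,
        # without building up a burst if the consumer was slow.
        next_time = max(next_time, time.monotonic()) + interval
        yield item


# Example usage (assumes `app` is a pyvespa application handle and `docs`
# is your document iterable; the rate of 200/s is an arbitrary start value):
#   app.feed_iterable(throttle(docs, 200), schema="hybridsearch")
```

Starting with a low rate and raising it until retries appear gives a rough estimate of the throughput the instance can sustain.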
from pyvespa.