Comments (7)
I have updated https://pyvespa.readthedocs.io/en/latest/troubleshooting.html#full-disk, linking to a new example in https://pyvespa.readthedocs.io/en/latest/application-packages.html , where one can set a higher limit.
I know this is a bit cumbersome, please give this a try and let me know.
As https://docs.vespa.ai/en/proton.html#proton-maintenance-jobs describes, Vespa needs disk space for compaction jobs. How much is schema-dependent; 75% is a conservative number that helps operators avoid index corruption due to a full disk.
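For readers who want to see what "setting a higher limit" amounts to, here is a minimal sketch that builds the relevant configuration fragment with the Python standard library only. It assumes, based on my reading of the Vespa services.xml reference, that the limit lives in a `<tuning><resource-limits><disk>` element under the `<content>` cluster. The helper name `resource_limits_fragment` and the value 0.90 are illustrative and not part of pyvespa; the linked application-packages page remains the authoritative example.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def resource_limits_fragment(disk: float, memory: Optional[float] = None) -> str:
    """Build a <tuning> fragment carrying feed-block resource limits.

    Hypothetical helper: the element names are an assumption taken from the
    Vespa services.xml reference, not from pyvespa itself.
    """
    for value in (disk, memory):
        if value is not None and not 0.0 < value <= 1.0:
            raise ValueError("limits are fractions in the range (0, 1]")

    tuning = ET.Element("tuning")
    limits = ET.SubElement(tuning, "resource-limits")
    ET.SubElement(limits, "disk").text = f"{disk:.2f}"
    if memory is not None:
        ET.SubElement(limits, "memory").text = f"{memory:.2f}"

    ET.indent(tuning)  # pretty-print; requires Python 3.9+
    return ET.tostring(tuning, encoding="unicode")


# Illustrative value: allow feeding until the disk is 90% full.
print(resource_limits_fragment(0.90))
```

The printed fragment would be placed inside the `<content>` element of services.xml before deploying the application package.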
from pyvespa.
- When running locally, I suggest configuring the disk size with Docker Desktop. Yes, Vespa could have its own way of setting disk limits, but it doesn't seem necessary given that Docker/Linux containers already do this, and running in a container is preferable in any case.
- Since Vespa assumes it controls the entire disk given to it, it follows that if the disk is 90% full Vespa has already used 90% of it and should not use even more.
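The reasoning in the two points above can be sketched as a small calculation. This is an illustration of the idea only, not Vespa's actual implementation: it assumes the limit is compared against the fraction of the whole disk in use, regardless of which process wrote the data. The function names and the 0.75 default are taken from the discussion, not from Vespa's code.

```python
import shutil

GIB = 1024 ** 3


def disk_utilization(used_bytes: int, total_bytes: int) -> float:
    """Fraction of the whole disk that is in use."""
    return used_bytes / total_bytes


def feed_blocked(used_bytes: int, total_bytes: int, limit: float = 0.75) -> bool:
    """True when utilization exceeds the limit (assumed comparison)."""
    return disk_utilization(used_bytes, total_bytes) > limit


def current_utilization(path: str = ".") -> float:
    """Utilization of the filesystem that holds `path`, as the OS reports it."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


# A 100 GiB disk where other software already occupies 80 GiB:
# feeding is blocked even if Vespa itself has stored almost nothing.
print(feed_blocked(80 * GIB, 100 * GIB))              # True
# The same disk with a limit raised to 0.90 accepts writes again.
print(feed_blocked(80 * GIB, 100 * GIB, limit=0.90))  # False
```

Giving Vespa a dedicated, smaller disk through the container runtime changes `total_bytes` rather than the limit, which is the approach suggested above.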
@ch3rn0v
1 - For development, I see that I am running with a volume of 170G and using Docker Desktop for Mac. In production I think we are using xfs anyway. I am not a Docker expert, so I have no expert advice to give here.
2 - It is slightly more complicated, as it is a variable number which will initially be very small and not enough to cater for the fixed overhead. Doing that would also allow other services using disk/memory to run on the same node, which would be a feature for development nodes but would make running a production system much more complicated. And since there are other ways to solve this issue, we have not prioritized making our own solution.
Thanks a lot for your effort! If I were making design choices regarding the architecture, I would fix the issue at its root, which I believe is the fact that the disk space required by Vespa for "compaction jobs" is proportional to the space already occupied by the storage. It has absolutely nothing to do with the total disk space. An empty storage doesn't need an extra 100 GB free in order to add ten 5 KB documents.
I wonder what other convenient choices were made by Vespa authors, but I'd rather not spend my time satisfying my curiosity so long as there are properly designed alternatives that just work.
Some time ago, in a distant past, the conclusion was that Vespa was made for large systems spanning multiple distributed machines, focusing on scalability and ease of operation. From this it followed that, apart from monitoring and other minor services, Vespa would be the only service running.
This still holds true.
If you want to present a smaller portion of your machine to Vespa, Docker/Podman is the better solution for that.
@baldersheim, thank you for the explanation! If you don't mind:
- How does one experiment with Vespa locally or develop it before deploying to multiple distributed machines?
- How is a proportion of already occupied space worse than a proportion of all available space?
I tried using Docker. However, specifying `overlay2.size=10G` doesn't work for file systems other than xfs, and passing `--storage-opt size=10G` to a docker call requires either changing the call in Vespa's source code or passing it as an argument or config somewhere, but I don't see it mentioned anywhere in Vespa's documentation.
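To make the second option concrete, here is a rough sketch of a `docker run` command line that carries the size option, assembled in Python without being executed. It does not show how pyvespa starts its container. The helper name `vespa_docker_run_cmd`, the container name, and the port mappings are assumptions for illustration, and whether Docker accepts the option still depends on the storage driver and backing file system, as described above.

```python
import shlex
from typing import List


def vespa_docker_run_cmd(
    size: str = "10G",
    name: str = "vespa",
    image: str = "vespaengine/vespa",
) -> List[str]:
    """Assemble (but do not run) a docker command with a per-container disk size.

    Hypothetical helper: the ports and names are illustrative assumptions.
    """
    return [
        "docker", "run", "--detach",
        "--name", name,
        "--storage-opt", f"size={size}",  # the option discussed above
        "--publish", "8080:8080",         # query/feed endpoint (assumed mapping)
        "--publish", "19071:19071",       # config server endpoint (assumed mapping)
        image,
    ]


cmd = vespa_docker_run_cmd()
# Show the command as a single shell-quoted line for inspection.
print(shlex.join(cmd))
```

Running the printed command by hand starts the container outside pyvespa, after which one could connect to it instead of letting pyvespa create the container.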
Now that I see the reasoning behind this choice, I understand it much better. Thanks a lot for clarifying this! Unfortunately, I didn't find any easy and clean way to run and test Vespa locally without the Docker GUI, so perhaps I'll try it out some other time out of curiosity. For now, running Weaviate seemed simple enough, but trying Vespa and Faiss is something I'd still like to do in order to compare them.