Comments (10)
One other distinction to make: "in memory" doesn't necessarily mean "all in client with no backend."
It could also mean a full backend/frontend split, but with the backend implemented in-memory + very simple persistence (as opposed to a more complex vector database) for the MVP.
In the past, when I've advocated starting "in memory", that's what I was referring to... a full backend, but with a trivial in-memory implementation. Not trying to cram all the computation into the frontend.
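As a minimal sketch of what I mean (all names here are hypothetical, not actual Chroma code): the backend codes against one storage interface, and the MVP implementation is just a dict, so a real persistence layer can slot in later without touching the API layer.

```python
from abc import ABC, abstractmethod


class EmbeddingStore(ABC):
    """Storage interface the rest of the backend codes against."""

    @abstractmethod
    def add(self, key: str, embedding: list[float]) -> None: ...

    @abstractmethod
    def get(self, key: str) -> list[float] | None: ...


class InMemoryStore(EmbeddingStore):
    """Trivial MVP implementation: just a dict."""

    def __init__(self) -> None:
        self._data: dict[str, list[float]] = {}

    def add(self, key: str, embedding: list[float]) -> None:
        self._data[key] = embedding

    def get(self, key: str) -> list[float] | None:
        return self._data.get(key)


# Later, a vector-DB-backed store implements the same interface and the
# rest of the backend doesn't change.
```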
MLFlow is an interesting model, but note that they are highly modular and support many different topologies. We could do that too, but my guess is we want to streamline and present "one way" as the default. We can enable other modalities as options if that's where the market pushes us.
@levand @atroyn Opened this issue to discuss. I realized we should talk through this while building #13
Making a list (will be updated inline) of projects that we can perhaps find some inspiration from...
Logging / APM
- https://github.com/getsentry/sentry (python agent and backend)
- https://github.com/prometheus/prometheus (backend is go however)
- https://github.com/SigNoz/signoz (backend is go however)
Product Analytics
- https://github.com/PostHog/posthog (python agent and backend)
Orchestration
- https://github.com/dagster-io/dagster
- https://github.com/apache/airflow
- https://github.com/PrefectHQ/prefect
ML monitoring / experiment management
- https://github.com/mlflow/mlflow (this flexible agent/server modality is particularly interesting)
to be continued........
This is a great question.
We're talking about an initial MVP, right? My question is... does a "Wolf A" model actually satisfy the "V" in "MVP"? Is it viable? It could make a good demo and help generate sales leads, for sure, but as I understand the product, we're almost certainly going to need Wolf B for any kind of production use. For example, as soon as we start persisting data, we're going to need to persist it somewhere other than a developer's laptop or CI instance.
So I'm going to go out on a limb and say that I don't think a completely in-process model makes sense even for an MVP. So, in this hypothesis, we're going to have a frontend and a backend (`chroma-client` and `chroma-server`, as hypothesized).
But there's still a decision point to be made here:
- Do we have a "thick" backend and "thin" frontend, with most of the logic and algorithmic work performed in the backend and the frontend just serving as a developer interface?
- Or do we have a "thick" frontend and a "thin" backend, with all the real logic and work performed in-process in the client, and the backend just being a thin proxy for persistence?
- Or technically, you could split it and have some algorithmic work performed on the server and some on the client.
I do tend to disregard option 3, just because it could get a lot more complicated for probably not a ton of benefit.
When comparing a type 1 and a type 2 model, we need to consider:
- Where is it easier/possible to satisfy the computational requirements of the algorithms we want to run?
- What are the network transport constraints of satisfying a type 1 vs a type 2 model?
Satisfying the computational requirements is probably going to be easier in a type 1 model, since it's easier to require that a Chroma Server has $x amount of RAM and CPU/GPU power than to require it of every possible client.
The network transport constraints are another question. Ultimately, it's a wash, because the same data has to be brought together at some point to perform the operations we want. If we're considering batch-mode operations, it genuinely doesn't matter: the same amount of data has to traverse some wire, somewhere, to make it happen. For high-frequency non-batched operations, you have to add 1-3 milliseconds of latency for each request, and in that case it could make sense to have the computation local to the request, if it's a particularly performance-intensive scenario.
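Rough arithmetic behind that claim (illustrative numbers, not measurements):

```python
# Back-of-envelope for the transport question (all numbers illustrative).
EMBEDDING_BYTES = 768 * 4        # one 768-dim float32 embedding
RTT_S = 0.002                    # ~2 ms round trip within a region
WIRE_BYTES_PER_S = 1e9 / 8       # ~1 Gbit/s effective throughput

# Batch mode: 1M embeddings cross the wire once, whichever side computes.
batch_bytes = 1_000_000 * EMBEDDING_BYTES
print(f"batch transfer: ~{batch_bytes / WIRE_BYTES_PER_S:.0f} s either way")

# High-frequency mode: every request pays the round trip.
per_request_s = RTT_S + EMBEDDING_BYTES / WIRE_BYTES_PER_S
print(f"per-request overhead: ~{per_request_s * 1000:.2f} ms")
# The ~2 ms round trip dominates the payload cost, so only latency-sensitive,
# non-batched workloads give the thick-client (type 2) model an edge.
```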
Here is another, slightly different perspective - I like how MLFlow handles tracking: https://mlflow.org/docs/latest/tracking.html#how-runs-and-artifacts-are-recorded. (Ignore the `artifact` part of the charts, since we don't have heavy files to move around like MLFlow does.) In this paradigm, the lightest-weight place things are serialized is a `.chroma` folder. (MLFlow is 100% Python as well, and Apache 2.0.) I think how they accomplish this is that all code is packaged up in the pip project. That means there is not a separate client or server... it's just about which code you are using in various scenarios. I guess the downsides of that are (1) the size of the project in megabytes, and (2) versioning frequency and version management across the frontend and backend without explicit version pinning.
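For concreteness, this is roughly what that looks like in MLFlow itself (standard MLFlow API, not our code): the same client calls record either to a local folder or to a remote server, selected purely by the tracking URI.

```python
import mlflow

# Lightest-weight mode: runs are serialized to a local folder
# (./mlruns by default; the analogue for us would be a .chroma folder).
mlflow.set_tracking_uri("file:./mlruns")

# Client/server mode: the exact same code, pointed at a tracking server.
# mlflow.set_tracking_uri("http://tracking-server:5000")

with mlflow.start_run():
    mlflow.log_param("model", "resnet50")
    mlflow.log_metric("accuracy", 0.93)
```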
On (1)/(2)/(3), "where to put the business logic": I 100% agree that (3) is bad. My general bias is towards having a thick backend and thin client, especially since most operations will need context from the DB in order to complete (they will need to query the NN index, for example).
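A toy illustration of why that pushes toward a thick backend (a brute-force scan standing in for a real NN index; shapes and numbers invented): the query needs every stored embedding on hand, so a thick client would first have to download the whole collection.

```python
import numpy as np


def nearest_neighbors(stored: np.ndarray, query: np.ndarray, k: int = 5) -> np.ndarray:
    """Brute-force stand-in for an NN index query.

    `stored` is the full (n, d) matrix of embeddings living next to the
    DB; the result is only k row indices. Run server-side, k indices go
    over the wire; run client-side, all n embeddings do.
    """
    dists = np.linalg.norm(stored - query, axis=1)
    return np.argsort(dists)[:k]


# e.g. 1M stored 768-dim float32 embeddings is ~3 GB to ship to each
# fresh client, versus a few bytes per query answered in the backend.
store = np.random.rand(10_000, 768).astype(np.float32)
print(nearest_neighbors(store, np.random.rand(768).astype(np.float32)))
```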
The discussion is still very open! :) @atroyn join the mix as well!
one additional note... mlflow is purely a store - it does no processing on the data. that is different from us, where our processing is computationally expensive.
@atroyn should we try to make the in-memory thing work at all? i'm starting to tend towards "not worth it"
I am inclined towards a thin-client Wolf B for a few reasons:
- It is hard to switch from A to B and we almost certainly will in future.
- It is a neater separation of concerns to separate Chroma processing from the user's training / inference runtime.
- Most computation will happen close to the DB, since we'll need the things stored in the DB (like training embeddings) to do the computation.
There's risks around things like breaking the client/server versioning in the future, and there is added complexity, but this gut checks as the right move to me.
Anthropic has something similar-ish called Garçon, for probing remotely running models, with the aim that their scientists can easily examine something running somewhere else; it uses a client-server setup.
I also read Luke's "in memory" as referring to where the processing is done: flat storage, with all computation done in memory rather than in a vector DB. I favor this as well, for development speed and ease of deployment onto the user's machine.
Ok I agree with all of this. I think it was good to talk through, thanks for the thoughts! Keeping things simpler and opinionated is the right way to go (assuming we have the right opinions of course).
So I believe we all agree we will move forward with:
- `chroma-client` - a thin Python client that writes to the backend
- `chroma-server` - a fat Python backend
That means that if a user is using a notebook, they will need to run `docker-compose up` (or whatever our backend init script is) in the notebook. Docker does work on Google Colab! I am ok with this. Just confirming we are all on the same page here.
There is the additional question of how thin the client is... specifically, whether the backend has the idea of `log`, or whether the client simply knows to call the things that `log` does (e.g. store the data here, trigger this reprocessing). The current open discussion is here: #13 (comment)
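To illustrate the two ends of that spectrum (endpoint names and payloads are made up for the sake of the sketch):

```python
import requests

BASE = "http://localhost:8000"  # wherever chroma-server is listening


def log_thin(dataset: str, embeddings: list[list[float]]) -> None:
    """Thinnest client: the backend owns the whole idea of `log`."""
    resp = requests.post(
        f"{BASE}/log", json={"dataset": dataset, "embeddings": embeddings}
    )
    resp.raise_for_status()


def log_orchestrating(dataset: str, embeddings: list[list[float]]) -> None:
    """Slightly thicker client: it knows `log` means store + reprocess,
    and calls the backend's finer-grained endpoints itself."""
    requests.post(
        f"{BASE}/store", json={"dataset": dataset, "embeddings": embeddings}
    ).raise_for_status()
    requests.post(f"{BASE}/reprocess", json={"dataset": dataset}).raise_for_status()
```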
Closing this issue as we have agreed on a direction.