
Comments (10)

levand commented on July 24, 2024

One other distinction to make: "in memory" doesn't necessarily mean "all in client with no backend."

It could also mean a full backend/frontend split, but with the backend implemented in-memory + very simple persistence (as opposed to a more complex vector database) for the MVP.

In the past when I've advocated starting "in memory", that's what I was referring to... a full backend, but with a trivial in-memory implementation, not trying to cram all the computation into the frontend.
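
A minimal sketch of what that could look like (all names here are hypothetical, not actual Chroma code): the backend codes against a small storage interface, and the MVP implementation is just a dict in memory with trivial pickle-to-disk persistence.

    import pickle
    from abc import ABC, abstractmethod
    from pathlib import Path

    class EmbeddingStore(ABC):
        """Hypothetical storage interface the rest of the backend codes against."""

        @abstractmethod
        def add(self, key: str, embedding: list[float]) -> None: ...

        @abstractmethod
        def get(self, key: str) -> list[float] | None: ...

    class InMemoryStore(EmbeddingStore):
        """MVP implementation: a dict in memory, pickled to disk on write."""

        def __init__(self, path: Path = Path(".chroma/store.pkl")):
            self._path = path
            self._data: dict[str, list[float]] = {}
            if path.exists():
                self._data = pickle.loads(path.read_bytes())

        def add(self, key: str, embedding: list[float]) -> None:
            self._data[key] = embedding
            self._path.parent.mkdir(parents=True, exist_ok=True)
            self._path.write_bytes(pickle.dumps(self._data))  # "very simple persistence"

        def get(self, key: str) -> list[float] | None:
            return self._data.get(key)

A real vector database implementation could later slot in behind the same interface without touching the rest of the backend.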

MLFlow is an interesting model, but note that they are highly modular and support many different topologies. We could do that too, but my guess is we want to streamline and present "one way" as the default. We can enable other modalities as options if that's where the market pushes us.


jeffchuber commented on July 24, 2024

@levand @atroyn Opened this issue to discuss. I realized we should talk through this while building #13


jeffchuber commented on July 24, 2024

Making a list (will be updated inline) of projects that we can perhaps find some inspiration from...

Logging / APM

Product Analytics

Orchestration

ML monitoring / experiment management

to be continued........


levand commented on July 24, 2024

This is a great question.

We're talking about an initial MVP, right? My question is... does a "Wolf A" model actually satisfy the "V" in "MVP"? Is it viable? It could make a good demo and help generate sales leads, for sure, but as I understand the product, we're almost certainly going to need Wolf B for any kind of production use. For example, as soon as we start persisting data, we're going to need to persist it somewhere other than a developer's laptop or CI instance.

So I'm going to go out on a limb and say that I don't think a completely in-process model makes sense even for an MVP. Under that hypothesis, we're going to have a frontend and a backend (chroma-client and chroma-server, as proposed).

But there's still a decision point to be made here:

  1. Do we have a "thick" backend and a "thin" frontend, with most of the logic and algorithmic work performed in the backend, and the frontend just serving as a developer interface?
  2. Or do we have a "thick" frontend and a "thin" backend, with all the real logic and work performed in-process in the client, and the backend just being a thin proxy for persistence?
  3. Or technically, you could split it and have some algorithmic work performed on the server and some on the client.

I do tend to disregard option 3, just because it could get a lot more complicated for probably not a ton of benefit.
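
To make options 1 and 2 concrete, here is a rough sketch (endpoint and class names are invented for illustration, not a real design). In a type 1 split the client is little more than an HTTP wrapper; in a type 2 split the client pulls the data down and runs the algorithm itself.

    import requests

    # Type 1: thin frontend - the server owns the index and does the search.
    class ThinClient:
        def __init__(self, base_url: str = "http://localhost:8000"):
            self.base_url = base_url

        def nearest_neighbors(self, embedding: list[float], k: int) -> list[str]:
            resp = requests.post(f"{self.base_url}/query",
                                 json={"embedding": embedding, "k": k})
            resp.raise_for_status()
            return resp.json()["ids"]

    # Type 2: thick frontend - the server is just a persistence proxy,
    # so the client must fetch the stored embeddings and compute locally.
    class ThickClient:
        def __init__(self, base_url: str = "http://localhost:8000"):
            self.base_url = base_url

        def nearest_neighbors(self, embedding: list[float], k: int) -> list[str]:
            resp = requests.get(f"{self.base_url}/embeddings")
            resp.raise_for_status()
            stored = resp.json()  # {id: embedding} - potentially very large
            dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
            return sorted(stored, key=lambda i: dist(stored[i], embedding))[:k]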

When trying to compare between type 1 and type 2, we need to consider:

  • Where is it easier/possible to satisfy the computational requirements of the algorithms we want to run?
  • What are the network transport constraints of a type 1 vs. a type 2 model?

Satisfying the computational requirements is probably going to be easier in a type 1 model, since it's easier to require that a Chroma server has $x amount of RAM and CPU/GPU power than to require it of every possible client.

The network transport constraints are another question. Ultimately, it's a wash, because the same data has to be brought together at some point to perform the operations we want. For batch-mode operations it genuinely doesn't matter, because the same amount of data has to traverse some wire, somewhere, either way. For high-frequency non-batched operations, you have to add 1-3 milliseconds of latency per request, and in that case it could make sense to have the computation local to the request, if it's a particularly performance-intensive scenario.
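
A quick back-of-the-envelope on that last point (the 2 ms figure is an assumption for illustration, not a measurement):

    # Per-request network overhead only matters for high-frequency,
    # non-batched operations; batched traffic amortizes it away.
    PER_REQUEST_LATENCY_S = 0.002  # assumed ~2 ms round-trip overhead
    N_OPERATIONS = 100_000

    batched = PER_REQUEST_LATENCY_S * 1            # one bulk request
    chatty = PER_REQUEST_LATENCY_S * N_OPERATIONS  # one request per operation

    print(f"batched overhead: {batched:.3f} s")  # 0.002 s
    print(f"chatty overhead:  {chatty:.1f} s")   # 200.0 s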


jeffchuber commented on July 24, 2024

Here is another, slightly different perspective - I like how MLFlow handles tracking: https://mlflow.org/docs/latest/tracking.html#how-runs-and-artifacts-are-recorded. (Ignore the artifact part of the charts, since we don't have heavy files to move around like MLFlow does.) In this paradigm, the lightest-weight place things get serialized is a .chroma folder. (MLFlow is 100% Python as well, and Apache 2.0.) I think the way they accomplish this is that all the code is packaged up in the pip project, so there is no separate client or server... it's just a question of which code path you are using in a given scenario. I guess the downsides of that are (1) the size of the project in megabytes, and (2) versioning frequency and version management across the frontend and backend without explicit version pinning.
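
A rough sketch of that "one pip package, many topologies" pattern (the URI scheme and names here are hypothetical, loosely modeled on MLFlow's tracking URI, not Chroma's actual API):

    class LocalFolderStore:
        """Serializes runs straight into a local folder (e.g. ./.chroma)."""
        def __init__(self, folder: str):
            self.folder = folder

    class RemoteStore:
        """Thin wrapper that forwards the same calls to a server over HTTP."""
        def __init__(self, url: str):
            self.url = url

    def get_store(uri: str = ".chroma"):
        # The same package ships both implementations; the URI just
        # selects which code path runs, so there is no separate
        # client/server install.
        if uri.startswith(("http://", "https://")):
            return RemoteStore(uri)
        return LocalFolderStore(uri)

    store = get_store()                        # lightest weight: a .chroma folder
    store = get_store("http://tracker:8000")   # same API, remote backend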

On (1)/(2)/(3), "where to put the business logic": I 100% agree that (3) is bad. My general bias is towards a thick backend and a thin client, especially since most operations will need context from the db in order to complete (they will need to query the NN index, for example).

The discussion is still very open! :) @atroyn join the mix as well!


jeffchuber commented on July 24, 2024

One additional note... MLFlow is purely a store - it does no processing on the data it holds. That is different from us, where the processing is computationally expensive.

@atroyn should we try to make the in-memory thing work at all? I'm starting to lean towards "not worth it".


atroyn commented on July 24, 2024

I am inclined towards a thin client and Wolf B, for a few reasons:

  • It is hard to switch from A to B, and we almost certainly will in the future.
  • It is a neater separation of concerns to keep Chroma's processing separate from the user's training / inference runtime.
  • Most computation will happen close to the DB, since we'll need the things stored in the DB (like training embeddings) to do the computation.

There are risks around things like breaking client/server versioning in the future, and there is added complexity, but this gut-checks as the right move to me.

Anthropic has something similar-ish called Garçon, used for probing remotely running models so that their scientists can easily examine something running somewhere else; it uses a client-server setup.


atroyn commented on July 24, 2024

I also read Luke's "in memory" as referring to where the processing is done: flat storage, with all computation done in memory rather than in a vector DB. I favor this as well, for development speed and ease of deployment onto the user's machine.


jeffchuber commented on July 24, 2024

Ok I agree with all of this. I think it was good to talk through, thanks for the thoughts! Keeping things simpler and opinionated is the right way to go (assuming we have the right opinions of course).

So I believe we all agree we will move forward with:

  • chroma-client - a thin python client that writes to the backend
  • chroma-server - a fat python backend

That means that if a user is working in a notebook, they will need to run docker-compose up (or whatever our backend init script is) alongside the notebook. Docker does work on Google Colab! I am ok with this - just confirming we are all on the same page here.

There is the additional question of how thin the client is... specifically, whether the backend has the concept of a log, or whether the client simply knows to call the things that log does (e.g. store this data here, trigger this reprocessing). The current open discussion is here: #13 (comment)
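
To make that question concrete, a hypothetical sketch of the two client shapes (endpoint names invented for illustration):

    import requests

    # Option A: the backend has the concept of "log"; the client is maximally thin.
    def log_thin(base_url: str, record: dict) -> None:
        requests.post(f"{base_url}/log", json=record).raise_for_status()

    # Option B: the backend only exposes primitives; the client knows what
    # "log" means and orchestrates the steps itself.
    def log_orchestrating(base_url: str, record: dict) -> None:
        requests.post(f"{base_url}/records", json=record).raise_for_status()
        requests.post(f"{base_url}/reprocess",
                      json={"scope": "incremental"}).raise_for_status()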


jeffchuber commented on July 24, 2024

Closing this issue as we have agreed on a direction.

