
Comments (10)

levand commented on July 24, 2024

One other distinction to make: "in memory" doesn't necessarily mean "all in client with no backend."

It could also mean a full backend/frontend split, but with the backend implemented in-memory + very simple persistence (as opposed to a more complex vector database) for the MVP.

In the past when I've advocated starting "in memory", that's what I was referring to... a full backend, but with a trivial in-memory implementation, not trying to cram all the computation into the frontend.
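
A minimal sketch of what that could look like (all names here are hypothetical, not actual Chroma code): the backend codes against a small storage interface, and the MVP implementation is just a dict in memory with trivial pickle-to-disk persistence.

    import pickle
    from abc import ABC, abstractmethod
    from pathlib import Path

    class EmbeddingStore(ABC):
        """Hypothetical storage interface the rest of the backend codes against."""

        @abstractmethod
        def add(self, key: str, embedding: list[float]) -> None: ...

        @abstractmethod
        def get(self, key: str) -> list[float] | None: ...

    class InMemoryStore(EmbeddingStore):
        """MVP implementation: a dict in memory, pickled to disk on write."""

        def __init__(self, path: Path = Path(".chroma/store.pkl")):
            self._path = path
            self._data: dict[str, list[float]] = {}
            if path.exists():
                self._data = pickle.loads(path.read_bytes())

        def add(self, key: str, embedding: list[float]) -> None:
            self._data[key] = embedding
            self._path.parent.mkdir(parents=True, exist_ok=True)
            self._path.write_bytes(pickle.dumps(self._data))  # "very simple persistence"

        def get(self, key: str) -> list[float] | None:
            return self._data.get(key)

A real vector database implementation could later slot in behind the same interface without touching the rest of the backend.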

MLFlow is an interesting model, but note that they are highly modular and support many different topologies. We could do that too, but my guess is we want to streamline and present "one way" as the default. We can enable other modalities as options if that's where the market pushes us.


jeffchuber commented on July 24, 2024

@levand @atroyn Opened this issue to discuss. I realized we should talk through this while building #13


jeffchuber commented on July 24, 2024

Making a list (will be updated inline) of projects that we can perhaps find some inspiration from...

Logging / APM

Product Analytics

Orchestration

ML monitoring / experiment management

to be continued........


levand commented on July 24, 2024

This is a great question.

We're talking about an initial MVP, right? My question is... does a "Wolf A" model actually satisfy the "V" in "MVP"? Is it viable? It could make a good demo and help generate sales leads, for sure, but as I understand the product, we're almost certainly going to need Wolf B for any kind of production use. For example, as soon as we start persisting data, we're going to need to persist it somewhere other than a developer's laptop or CI instance.

So I'm going to go out on a limb and say that I don't think a completely in-process model makes sense even for an MVP. Under that hypothesis, we're going to have a frontend and a backend (chroma-client and chroma-server, as proposed).

But there's still a decision point to be made here:

  1. Do we have a "thick" backend and a "thin" frontend, with most of the logic and algorithmic work performed in the backend, and the frontend just serving as a developer interface?
  2. Or do we have a "thick" frontend and a "thin" backend, with all the real logic and work performed in-process in the client, and the backend just being a thin proxy for persistence?
  3. Or technically, you could split it and have some algorithmic work performed on the server and some on the client.

I do tend to disregard option 3, just because it could get a lot more complicated for probably not a ton of benefit.
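
To make options 1 and 2 concrete, here is a rough sketch (endpoint and class names are invented for illustration, not a real design). In a type 1 split the client is little more than an HTTP wrapper; in a type 2 split the client pulls the data down and runs the algorithm itself.

    import requests

    # Type 1: thin frontend - the server owns the index and does the search.
    class ThinClient:
        def __init__(self, base_url: str = "http://localhost:8000"):
            self.base_url = base_url

        def nearest_neighbors(self, embedding: list[float], k: int) -> list[str]:
            resp = requests.post(f"{self.base_url}/query",
                                 json={"embedding": embedding, "k": k})
            resp.raise_for_status()
            return resp.json()["ids"]

    # Type 2: thick frontend - the server is just a persistence proxy,
    # so the client must fetch the stored embeddings and compute locally.
    class ThickClient:
        def __init__(self, base_url: str = "http://localhost:8000"):
            self.base_url = base_url

        def nearest_neighbors(self, embedding: list[float], k: int) -> list[str]:
            resp = requests.get(f"{self.base_url}/embeddings")
            resp.raise_for_status()
            stored = resp.json()  # {id: embedding} - potentially very large
            dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
            return sorted(stored, key=lambda i: dist(stored[i], embedding))[:k]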

When trying to compare between type 1 and type 2, we need to consider:

  • Where is it easier/possible to satisfy the computational requirements of the algorithms we want to run?
  • What are the network transport constraints of a type 1 vs. a type 2 model?

Satisfying the computational requirements is probably going to be easier in a type 1 model, since it's easier to require that a Chroma server has $x amount of RAM and CPU/GPU power than to require it of every possible client.

The network transport constraints are another question. Ultimately, it's a wash, because the same data has to be brought together at some point to perform the operations we want. For batch-mode operations it genuinely doesn't matter, because the same amount of data has to traverse some wire, somewhere, either way. For high-frequency non-batched operations, you have to add 1-3 milliseconds of latency per request, and in that case it could make sense to have the computation local to the request, if it's a particularly performance-intensive scenario.
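
A quick back-of-the-envelope on that last point (the 2 ms figure is an assumption for illustration, not a measurement):

    # Per-request network overhead only matters for high-frequency,
    # non-batched operations; batched traffic amortizes it away.
    PER_REQUEST_LATENCY_S = 0.002  # assumed ~2 ms round-trip overhead
    N_OPERATIONS = 100_000

    batched = PER_REQUEST_LATENCY_S * 1            # one bulk request
    chatty = PER_REQUEST_LATENCY_S * N_OPERATIONS  # one request per operation

    print(f"batched overhead: {batched:.3f} s")  # 0.002 s
    print(f"chatty overhead:  {chatty:.1f} s")   # 200.0 s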


jeffchuber commented on July 24, 2024

Here is another, slightly different perspective - I like how MLFlow handles tracking: https://mlflow.org/docs/latest/tracking.html#how-runs-and-artifacts-are-recorded. (Ignore the artifact part of the charts, since we don't have heavy files to move around like MLFlow does.) In this paradigm, the lightest-weight place things get serialized is a .chroma folder. (MLFlow is 100% Python as well, and Apache 2.0.) I think the way they accomplish this is that all the code is packaged up in the pip project, so there is no separate client or server... it's just a question of which code path you are using in a given scenario. I guess the downsides of that are (1) the size of the project in megabytes, and (2) versioning frequency and version management across the frontend and backend without explicit version pinning.
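
A rough sketch of that "one pip package, many topologies" pattern (the URI scheme and names here are hypothetical, loosely modeled on MLFlow's tracking URI, not Chroma's actual API):

    class LocalFolderStore:
        """Serializes runs straight into a local folder (e.g. ./.chroma)."""
        def __init__(self, folder: str):
            self.folder = folder

    class RemoteStore:
        """Thin wrapper that forwards the same calls to a server over HTTP."""
        def __init__(self, url: str):
            self.url = url

    def get_store(uri: str = ".chroma"):
        # The same package ships both implementations; the URI just
        # selects which code path runs, so there is no separate
        # client/server install.
        if uri.startswith(("http://", "https://")):
            return RemoteStore(uri)
        return LocalFolderStore(uri)

    store = get_store()                        # lightest weight: a .chroma folder
    store = get_store("http://tracker:8000")   # same API, remote backend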

On (1)/(2)/(3), "where to put the business logic": I 100% agree that (3) is bad. My general bias is towards a thick backend and a thin client, especially since most operations will need context from the db in order to complete (they will need to query the NN index, for example).

The discussion is still very open! :) @atroyn join the mix as well!


jeffchuber commented on July 24, 2024

One additional note... MLFlow is purely a store - it does no processing on the data it holds. That is different from us, where the processing is computationally expensive.

@atroyn should we try to make the in-memory thing work at all? I'm starting to lean towards "not worth it".


atroyn commented on July 24, 2024

I am inclined towards a thin client and Wolf B, for a few reasons:

  • It is hard to switch from A to B, and we almost certainly will in the future.
  • It is a neater separation of concerns to keep Chroma's processing separate from the user's training / inference runtime.
  • Most computation will happen close to the DB, since we'll need the things stored in the DB (like training embeddings) to do the computation.

There are risks around things like breaking client/server versioning in the future, and there is added complexity, but this gut-checks as the right move to me.

Anthropic has something similar-ish called Garçon, used for probing remotely running models so that their scientists can easily examine something running somewhere else; it uses a client-server setup.


atroyn commented on July 24, 2024

I also read Luke's "in memory" as referring to where the processing is done: flat storage, with all computation done in memory rather than in a vector DB. I favor this as well, for development speed and ease of deployment onto the user's machine.


jeffchuber commented on July 24, 2024

Ok I agree with all of this. I think it was good to talk through, thanks for the thoughts! Keeping things simpler and opinionated is the right way to go (assuming we have the right opinions of course).

So I believe we all agree we will move forward with:

  • chroma-client - a thin python client that writes to the backend
  • chroma-server - a fat python backend

That means that if a user is working in a notebook, they will need to run docker-compose up (or whatever our backend init script is) alongside the notebook. Docker does work on Google Colab! I am ok with this - just confirming we are all on the same page here.

There is the additional question of how thin the client is... specifically, whether the backend has the concept of a log, or whether the client simply knows to call the things that log does (e.g. store this data here, trigger this reprocessing). The current open discussion is here: #13 (comment)
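
To make that question concrete, a hypothetical sketch of the two client shapes (endpoint names invented for illustration):

    import requests

    # Option A: the backend has the concept of "log"; the client is maximally thin.
    def log_thin(base_url: str, record: dict) -> None:
        requests.post(f"{base_url}/log", json=record).raise_for_status()

    # Option B: the backend only exposes primitives; the client knows what
    # "log" means and orchestrates the steps itself.
    def log_orchestrating(base_url: str, record: dict) -> None:
        requests.post(f"{base_url}/records", json=record).raise_for_status()
        requests.post(f"{base_url}/reprocess",
                      json={"scope": "incremental"}).raise_for_status()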


jeffchuber commented on July 24, 2024

Closing this issue as we have agreed on a direction.

