Comments (16)

ded-ditat commented on June 1, 2024

Also requesting this. We would like to use auto embeddings with the small model and not ada.

This is Dylan with revtron.ai btw.

Excellent call out.

from nhost.

dbarrosop commented on June 1, 2024

Thanks for reporting this. It makes sense and should be an easy addition. We will take a look as soon as possible.

dbarrosop commented on June 1, 2024

Would you mind testing v0.5.0-beta1? This version adds a new column model to the autoembeddings_configuration table that can have one of the following values:

  • text-embedding-ada-002 (default)
  • text-embedding-3-small
  • text-embedding-3-large

The dashboard doesn't support this yet, so you will have to update the value directly in the database tab.
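
Until the dashboard supports it, switching models means editing the row by hand. As a rough illustration only, the Python sketch below guards that update: it checks the requested model against the three values listed above and builds a parameterized SQL statement. The table and column names come from the comment above; the `id` column in the `WHERE` clause, the `%s` placeholder style, and the helper name `build_model_update` are assumptions, and the table may live in a schema that needs qualifying.

```python
# Values listed in the comment above; the first one is the default.
ALLOWED_MODELS = (
    "text-embedding-ada-002",
    "text-embedding-3-small",
    "text-embedding-3-large",
)


def build_model_update(config_id, model):
    """Return (sql, params) that sets `model` on one configuration row.

    Assumes the row is identified by an `id` column (hypothetical) and
    that the driver uses %s placeholders, as psycopg does.
    """
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    sql = "UPDATE autoembeddings_configuration SET model = %s WHERE id = %s"
    return sql, (model, config_id)


# "<configuration-id>" is a placeholder, not a real identifier.
sql, params = build_model_update("<configuration-id>", "text-embedding-3-small")
print(sql)
print(params)
```

Rejecting unknown model names before touching the database keeps a typo from ending up in the configuration row.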

osseonews commented on June 1, 2024

Sure, I can test, but I probably won't get to it until later next week. BTW, how do we install v0.5.0-beta1?

dbarrosop commented on June 1, 2024

If you are using the CLI/TOML, just place it under:

[ai]
version=xxx

(It should already be there, so just update the version.)

If you aren't, just go to dashboard -> settings -> ai and enter the version (don't worry if it doesn't show in the menu, just enter the custom value).

osseonews commented on June 1, 2024

I changed the version in the settings, but all I see now is that my nhost workspace is updating, and I don't see the "model" column added.

osseonews commented on June 1, 2024

And I just got this error in my project: "Error deploying the project most likely due to invalid configuration. Please review your project's configuration and logs for more information."

dbarrosop commented on June 1, 2024

Apologies, it should be 0.5.0-beta1.

osseonews commented on June 1, 2024

OK, it updated and I changed the embeds. I'll test it soon.

osseonews commented on June 1, 2024

Quick question: when setting the query for the auto-embedding, it says that the "id" field is required, but what about the other fields in the query? Are those the ones used to create the actual vector, so we should only include the fields containing the text we want to embed? For example, in your sample query below, is the embedding created from the "name", "genre", and "overview" fields, concatenated into a single input for the vector? And if we only want to embed, say, "overview", would we just include that field in the query and remove the others?

query GetOutdatedMovies {
  movies(where: {
    _or: [
      {embeddings: {_is_null: true}}, # new rows without embeddings
      {outdated: {_eq: true}},        # existing rows with changed data
    ]
  }) {
    id                                # id column is mandatory
    name
    genre
    overview
  }
}

osseonews commented on June 1, 2024

BTW, I just ran the embeds with the small embedding model, and the vector searches are mostly meaningless. For example, I did a graphite search with the keyword "supercalifragilisticexpialidocious" and it returned a result, even though none of our content has anything to do with that. I would have expected it to return nothing. Also, for other, real searches, the results don't match the query at all. I can't say whether this is a problem with the model or with pgvector, and it's possible this would have occurred before this update as well. There is just something wrong here.

dbarrosop commented on June 1, 2024

There is nothing wrong with the models. The queries return the best matches, even if they are unrelated. Another feature arriving alongside this one when 0.5.0 is released will let you set a maximum distance to avoid the issue you describe.

osseonews commented on June 1, 2024

OK, that makes sense. I was actually wondering what distance was used. Also, is there an easy way to turn off embeddings on a table without deleting the auto-embeddings setup? I was thinking of just deleting the triggers we created for the outdated field based on the docs.

dbarrosop commented on June 1, 2024

You can try with 0.5.0. I still need to update documentation and we need to add the feature to the dashboard but it should be usable so no need to wait. Re maxDistance, you now have a maxDistance option when doing similarity/search queries. Something like:

graphiteXXXSearch(args: {
    query: "blah",
    amount: 10,
    maxDistance: 0.20,
}) {
    ...
}

maxDistance is a float between 0 (exact match) and 1 (nothing in common) and it defaults to 1 (for backwards compatibility).
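
To make the behaviour described above concrete, here is a small Python sketch of a nearest-neighbour search with a distance cut-off. It is not graphite's implementation: it assumes the distance is cosine distance (what pgvector's `<=>` operator computes), and the names `cosine_distance`, `search`, `amount`, and `max_distance` are illustrative stand-ins for the `amount` and `maxDistance` arguments. The toy vectors are made up.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 0 for the same direction, 1 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search(query_vec, rows, amount=10, max_distance=1.0):
    """Return up to `amount` (row_id, distance) pairs, nearest first.

    Rows farther away than `max_distance` are dropped. With the default
    of 1.0 nothing is dropped for these non-negative toy vectors, so the
    nearest rows come back even when they are unrelated to the query.
    """
    scored = [(row_id, cosine_distance(query_vec, vec)) for row_id, vec in rows.items()]
    scored.sort(key=lambda pair: pair[1])
    return [(row_id, d) for row_id, d in scored[:amount] if d <= max_distance]


# Hypothetical 3-dimensional "embeddings"; real ones have far more dimensions.
rows = {
    "close": [1.0, 0.1, 0.0],
    "unrelated": [0.0, 0.2, 1.0],
}
query = [1.0, 0.0, 0.0]

print(search(query, rows))                     # both rows, nearest first
print(search(query, rows, max_distance=0.20))  # only the row within 0.20
```

Without a cut-off, a top-`amount` search always returns something as long as the table has rows, which matches the nonsense-keyword result reported earlier in the thread. A tighter `max_distance` trades recall for relevance, so a suitable value likely needs tuning against real queries.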

osseonews commented on June 1, 2024

Can we already use 0.5.0? We can just change the version in the AI Settings page?

dbarrosop commented on June 1, 2024

Yes
