
Comments (8)

theholymath commented on July 22, 2024

Is there a way to pipe a FAISS index into this pipeline? Is there an example of how to ingest other vector DBs?

from dspy.

okhat commented on July 22, 2024

Yes! We’re almost done releasing that. cc: @VThejas

okhat commented on July 22, 2024

See here: https://github.com/stanford-futuredata/ColBERT#running-a-lightweight-colbertv2-server

Before you launch the server, use the ColBERT intro notebook (or the Overview in the ColBERT README) to index your collection.

drawal1 commented on July 22, 2024

A RAG example using Pinecone for retrieval would be helpful. Or... are there reasons not to use Pinecone here?

okhat commented on July 22, 2024

You can use Pinecone for sure, @drawal1. We don't have it built-in though. Would you like to add it?

We just need something as minimal as this wrapper for it:

import dspy

class Pinecone(dspy.Retrieve):
    def __init__(self, k=3):
        super().__init__(k=k)
        # TODO: initialize pinecone here with any kwargs you need

    def forward(self, query):
        # TODO: passages = search with pinecone for self.k top passages for `query`
        return dspy.Prediction(passages=passages)

For more info (probably not necessary), see https://github.com/stanfordnlp/dspy/blob/main/dspy/retrieve/retrieve.py
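
For illustration, the search step behind the `forward` TODO could look roughly like the sketch below. It is plain Python over a toy in-memory index with hand-made vectors; `TOY_INDEX`, `cosine`, and `top_k_passages` are hypothetical names, not part of dspy or Pinecone. A real wrapper would replace this with a call to the vector DB and return the result as dspy.Prediction(passages=...).

```python
import math

# Hypothetical toy index: (passage text, embedding vector) pairs.
TOY_INDEX = [
    ("Paris is the capital of France.", [1.0, 0.0, 0.0]),
    ("The Nile is a river in Africa.", [0.0, 1.0, 0.0]),
    ("Berlin is the capital of Germany.", [0.9, 0.1, 0.0]),
]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_passages(query_vec, index, k=3):
    """Return the texts of the k passages most similar to query_vec."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

Inside `forward`, the query would first be embedded into `query_vec` by whatever embedding model the index was built with, and `self.k` would be passed as `k`.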

okhat commented on July 22, 2024

Then you can use dspy.Pinecone instead of dspy.Retrieve (there's a cleaner way to do this, but we can start like that).

drawal1 commented on July 22, 2024

@okhat, thank you! I tested the code below and it works. I will submit a pull request if this looks reasonable.

"""
Retriever model for Pinecone
"""

import pinecone  # type: ignore
import openai   # type: ignore
import dspy     # type: ignore

OPENAI_API_KEY = 'YOUR_OPENAI_API_KEY'
PINECONE_API_KEY = 'YOUR_PINECONE_API_KEY'
PINECONE_ENVIRONMENT = 'YOUR_PINECONE_ENVIRONMENT'  # for example 'us-east4-gcp'
INDEX_NAME = 'YOUR_PINECONE_INDEX_NAME'  # You should have an index built already. See the Pinecone docs
EMBED_MODEL = 'YOUR_EMBEDDING_MODEL'  # for example OpenAI's 'text-embedding-ada-002'

def init_pinecone(pinecone_api_key, pinecone_env, index_name):
    """Initialize pinecone and load the index"""
    pinecone.init(
        api_key=pinecone_api_key,  # find at app.pinecone.io
        environment=pinecone_env,  # next to api key in console
    )

    return pinecone.Index(index_name)

PINECONE_INDEX = init_pinecone(PINECONE_API_KEY, PINECONE_ENVIRONMENT, INDEX_NAME)

class PineconeRM(dspy.Retrieve):
    """
        Retrieve module for Pinecone
        Example usage:
            self.retrieve = PineconeRM(k=num_passages)
    """
    def __init__(self, k=3):
        super().__init__(k=k)

    def forward(self, query_or_queries):
        """Search Pinecone for the top self.k passages for the query."""
        # Wrap a single query string in a list
        queries = [query_or_queries] if isinstance(query_or_queries, str) else query_or_queries

        # Legacy (pre-1.0) openai client call; the key is passed as `api_key`
        embedding = openai.Embedding.create(input=queries, engine=EMBED_MODEL, api_key=OPENAI_API_KEY)
        # NOTE: only the first query's embedding is used; any further queries are ignored
        query_vec = embedding['data'][0]['embedding']

        # Retrieve the top-k matches from Pinecone, including their metadata
        results_dict = PINECONE_INDEX.query(query_vec, top_k=self.k, include_metadata=True)

        passages = [result['metadata']['text'] for result in results_dict['matches']]
        return dspy.Prediction(passages=passages)
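
One caveat with the code above: when a list of queries is passed in, only the first embedding is searched. A possible way to handle several queries is to search once per query and merge the matches, keeping the best score per passage. The sketch below is plain Python; `merge_matches` is a hypothetical helper, and its input is assumed to mimic the `matches` shape used above (dicts with a `score` and a `metadata['text']` field). It also assumes a metric where a higher score means a closer match.

```python
def merge_matches(matches_per_query, k=3):
    """Merge per-query match lists into the k best unique passages.

    matches_per_query: one list of match dicts per query, each dict
    shaped like {"score": float, "metadata": {"text": str}} (assumed).
    """
    best = {}  # passage text -> highest score seen across all queries
    for matches in matches_per_query:
        for match in matches:
            text = match["metadata"]["text"]
            score = match["score"]
            if text not in best or score > best[text]:
                best[text] = score

    # Assumes higher score = more similar (e.g. cosine or dot product).
    ranked = sorted(best.items(), key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in ranked[:k]]
```

With this, `forward` could query the index once per embedding in `embedding['data']` and pass the collected `matches` lists to the helper before building the prediction.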

drawal1 commented on July 22, 2024

@okhat I have submitted the pull request, fyi
