yash9439 / rayqdrantfastembed

Generating embeddings for thousands of PDF documents and storing them in Qdrant, using FastEmbed with distributed computing in Ray

License: Apache License 2.0

Jupyter Notebook 100.00%
distributed-computing fastembed qdrant-vector-database rag ray

rayqdrantfastembed's Introduction

Ray Distributed Computing with FastEmbed and Qdrant

This repository demonstrates the use of the Ray distributed computing framework together with FastEmbed for embedding generation and Qdrant for similarity search. Specifically, it shows how to efficiently generate embeddings for text data, store them in Qdrant, and run similarity search queries.

Requirements

  • Python 3.x
  • Jupyter Notebook (for running RayQdrant.ipynb)
  • PyPDF2
  • nltk
  • ray
  • fastembed
  • qdrant_client

You can install the required libraries using pip:

pip install PyPDF2 nltk fastembed qdrant-client[fastembed]
pip install -U "ray[data,train,tune,serve]"

Usage

  1. Clone the Repository:

    Clone this repository to your local machine:

    git clone https://github.com/yash9439/RayQdrantFastEmbed.git
  2. Start Qdrant with Docker:

    Pull the Qdrant image and run it, mapping the REST (6333) and gRPC (6334) ports and mounting a local directory for storage:

    sudo docker pull qdrant/qdrant
    sudo docker run -p 6333:6333 -p 6334:6334 -v $(pwd)/qdrant_storage:/qdrant/storage:z qdrant/qdrant
  3. Run Jupyter Notebook:

    Open the RayQdrant.ipynb file using Jupyter Notebook:

    jupyter notebook RayQdrant.ipynb

    Execute each cell in the notebook sequentially to run the code. Ensure you have the necessary dependencies installed.

  4. Interpret Results:

    After running the notebook, you will see the time taken for embedding generation with Ray distributed computing, along with the results of the similarity search queries against Qdrant.

Folder Structure

  • Docs/: This directory contains the PDF documents for which embeddings are generated.
  • RayQdrant.ipynb: Jupyter Notebook containing the code for embedding generation using Ray and similarity search using Qdrant.

License

This code is provided under the Apache License 2.0.

Feel free to modify and distribute it as needed. If you find any issues or have suggestions for improvements, please feel free to open an issue or create a pull request.

rayqdrantfastembed's People

Contributors: yash9439

rayqdrantfastembed's Issues

use fastembed-gpu Error!

import os
import time

import fitz  # PyMuPDF
import numpy as np
import ray
from fastembed import TextEmbedding
from nltk.tokenize import sent_tokenize
def extract_text_from_pdf(pdf_path):
    reader =  fitz.open(pdf_path)
    extracted_text = ""
    for page_num in range(len(reader)):
        page = reader.load_page(page_num)
        extracted_text += page.get_text()
    return extracted_text


def extract_text_from_pdfs_in_directory(directory):
    for filename in os.listdir(directory):
        if filename.endswith(".pdf"):
            pdf_path = os.path.join(directory, filename)
            extracted_text = extract_text_from_pdf(pdf_path)
            txt_filename = os.path.splitext(filename)[0] + ".txt"
            txt_filepath = os.path.join(directory, txt_filename)

            with open(txt_filepath, "w",encoding='utf-8') as txt_file:
                txt_file.write(extracted_text)

# Specify the directory containing PDF files
directory_path = r"/home/xxx/fastembd/demo"
s_time = time.time()
# Extract text from PDFs in the directory and save as text files
extract_text_from_pdfs_in_directory(directory_path)
e_time = time.time()
print(f"Time taken to extract text from all PDFs: {e_time - s_time} seconds")

# List all .txt files in the directory
txt_files = [file for file in os.listdir(directory_path) if file.endswith('.txt')]

# List to store sentences from all files
all_sentences = []

# Read each text file, split into sentences with NLTK, and store
s_time = time.time()
for txt_file in txt_files:
    file_path = os.path.join(directory_path, txt_file)
    with open(file_path, "r", encoding="utf-8") as file:
        text = file.read()
        sentences = sent_tokenize(text)
        all_sentences.extend(sentences)
e_time = time.time()
print(len(all_sentences))
print(f"Time taken for NLTK sentence splitting: {e_time - s_time} seconds")

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

embedding_model_gpu = TextEmbedding(model_name="BAAI/bge-base-en", cache_dir="./embeddings", providers=['CUDAExecutionProvider'])
print(embedding_model_gpu.model.model.get_providers())

ray.init()

@ray.remote
class EmbeddingWorker:
    def __init__(self):
        embedding_model_gpu = TextEmbedding(model_name="BAAI/bge-base-en", cache_dir="./embeddings", providers=['CUDAExecutionProvider'])
        embedding_model_gpu.model.model.get_providers()

        self.embedding_model = embedding_model_gpu

    def embed_documents(self, documents):
        embeddings = []
        for document in documents:
            embeddings.append(np.array(list(self.embedding_model.embed([document]))))
        return embeddings

# Define the number of workers
num_workers = 1  # Adjust this according to your resources
documents = all_sentences

# Split documents into chunks for each worker.
# Ceil division so the number of chunks never exceeds num_workers and
# no documents are dropped by the zip() below.
chunk_size = -(-len(documents) // num_workers)
document_chunks = [documents[i:i + chunk_size] for i in range(0, len(documents), chunk_size)]

# Start the workers
embedding_workers = [EmbeddingWorker.remote() for _ in range(num_workers)]

# Perform embedding generation in parallel
start_time = time.time()
embedding_tasks = [worker.embed_documents.remote(chunk) for worker, chunk in zip(embedding_workers, document_chunks)]
embeddings = ray.get(embedding_tasks)
end_time = time.time()

# Flatten the embeddings list

embeddings = [embedding for sublist in embeddings for embedding in sublist]
print(len(embeddings))
# print(embeddings)
print("Time taken to generate embeddings with Ray Distributed Computing:", end_time - start_time, "seconds")

# Shutdown Ray
ray.shutdown()
ray::EmbeddingWorker.__init__() (pid=3197588, ip=192.168.45.164, actor_id=7a76e5641370afec20a0db9403000000, repr=<test.EmbeddingWorker object at 0x7f29f8207070>)
  File "/home/xxx/fastembd/test.py", line 82, in __init__
    embedding_model_gpu = TextEmbedding(model_name="BAAI/bge-base-en", cache_dir="./embeddings", providers=['CUDAExecutionProvider'])
  File "/root/miniconda3/envs/FastEmbed/lib/python3.10/site-packages/fastembed/text/text_embedding.py", line 68, in __init__
    self.model = EMBEDDING_MODEL_TYPE(
  File "/root/miniconda3/envs/FastEmbed/lib/python3.10/site-packages/fastembed/text/onnx_embedding.py", line 227, in __init__
    self.load_onnx_model(
  File "/root/miniconda3/envs/FastEmbed/lib/python3.10/site-packages/fastembed/text/onnx_text_model.py", line 46, in load_onnx_model
    super().load_onnx_model(
  File "/root/miniconda3/envs/FastEmbed/lib/python3.10/site-packages/fastembed/common/onnx_model.py", line 84, in load_onnx_model
    self.model = ort.InferenceSession(
  File "/root/miniconda3/envs/FastEmbed/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 432, in __init__
    raise fallback_error from e
  File "/root/miniconda3/envs/FastEmbed/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 427, in __init__
    self._create_inference_session(self._fallback_providers, None)
  File "/root/miniconda3/envs/FastEmbed/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
RuntimeError: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:123 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, common::Status> = void] CUDA failure 100: no CUDA-capable device is detected ; GPU=32554 ; hostname=yingke ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=280 ; expr=cudaSetDevice(info_.device_id);

If the program does not use Ray, the model loads on the GPU without any problem. I hope you can provide a GPU version for use. Looking forward to your reply. Thank you very much!
