run-llama / finetune-embedding
Fine-Tuning Embedding for RAG with Synthetic Data
Hi, the BGE model can take instructions, but the finetune-embedding example doesn't mention including instructions in either the fine-tuning or the evaluation process. Is there any additional info on that?
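One way to experiment with this is to prefix only the queries and leave the corpus passages untouched. This sketch assumes the fine-tuning dataset keeps queries in an id-to-text dict (as EmbeddingQAFinetuneDataset.queries does) and that you are using an English BGE model whose model card recommends the query instruction below. add_query_instruction and BGE_QUERY_INSTRUCTION are hypothetical names for illustration, not part of llama_index.

```python
from typing import Dict

# Assumption: the query instruction published for the English BGE models
# (e.g. BAAI/bge-base-en-v1.5); check the model card of the model you use.
BGE_QUERY_INSTRUCTION = "Represent this sentence for searching relevant passages: "


def add_query_instruction(
    queries: Dict[str, str], instruction: str = BGE_QUERY_INSTRUCTION
) -> Dict[str, str]:
    """Hypothetical helper (not a llama_index API).

    Returns a copy of `queries` with `instruction` prepended to every query.
    Corpus passages are not touched, since BGE only prefixes the query side.
    Queries that already start with the instruction are not prefixed twice.
    """
    return {
        qid: text if text.startswith(instruction) else instruction + text
        for qid, text in queries.items()
    }


# Hypothetical toy queries, keyed by query id like the dataset's queries dict.
queries = {"q1": "What risks does the company report?"}
prefixed = add_query_instruction(queries)
print(prefixed["q1"])
```

Whatever prefix is used during fine-tuning should presumably also be applied to queries at evaluation and retrieval time, so the model sees the same query format it was trained on.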
Trying to use the generate_qa_embedding_pairs method to create synthetic data. When running
from llama_index.finetuning import generate_qa_embedding_pairs
I run into the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[3], line 2
1 import torch
----> 2 from llama_index.finetuning import generate_qa_embedding_pairs
3 from llama_index.llms.openllm import OpenLLM
4 from llama_index.llms.mistralai import MistralAI
File /mnt/team-shared/.venv/lib/python3.10/site-packages/llama_index/finetuning/__init__.py:13
6 from llama_index.finetuning.embeddings.common import (
7 EmbeddingQAFinetuneDataset,
8 generate_qa_embedding_pairs,
9 )
10 from llama_index.finetuning.embeddings.sentence_transformer import (
11 SentenceTransformersFinetuneEngine,
12 )
---> 13 from llama_index.finetuning.gradient.base import GradientFinetuneEngine
14 from llama_index.finetuning.openai.base import OpenAIFinetuneEngine
15 from llama_index.finetuning.rerankers.cohere_reranker import (
16 CohereRerankerFinetuneEngine,
17 )
File /mnt/team-shared/.venv/lib/python3.10/site-packages/llama_index/finetuning/gradient/__init__.py:1
----> 1 from llama_index.finetuning.gradient.base import GradientFinetuneEngine
3 __all__ = ["GradientFinetuneEngine"]
...
702 self.__forward_evaluated__ = True
File <string>:1
TypeError: conlist() got an unexpected keyword argument 'max_items'
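The last line points to a pydantic signature mismatch. In pydantic 1.x, conlist() takes min_items/max_items, while pydantic 2.x renamed them to min_length/max_length. This TypeError therefore usually means that code written for pydantic v1 is being imported in an environment with pydantic v2. The traceback only shows that the error happens while importing llama_index.finetuning.gradient.base. It does not show which package makes the call. Here is a minimal stdlib sketch that checks which pydantic major version is installed. The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def pydantic_major(ver: str) -> int:
    """Return the major version number from a version string like '2.5.3'."""
    return int(ver.split(".")[0])


def accepts_max_items(ver: str) -> bool:
    """True if conlist(max_items=...) is valid for this pydantic version.

    pydantic 1.x spells the keywords min_items/max_items; pydantic 2.x uses
    min_length/max_length, so max_items raises the TypeError shown above.
    """
    return pydantic_major(ver) < 2


def installed_pydantic() -> Optional[str]:
    """Return the installed pydantic version, or None if it is not installed."""
    try:
        return version("pydantic")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    ver = installed_pydantic()
    if ver is None:
        print("pydantic is not installed")
    elif accepts_max_items(ver):
        print(f"pydantic {ver}: conlist(max_items=...) is accepted")
    else:
        print(f"pydantic {ver}: v1-style conlist(max_items=...) raises TypeError")
```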
ValueError: File ../llama_index/docs/examples/data/10k/lyft_2021.pdf does not exist. Can you please link to the required data?
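The path in this error is relative (../llama_index/...), so it is resolved against the directory the notebook or script is run from. The path suggests that the example expects a checkout of the llama_index repository next to this one. A minimal sketch that shows where Python is actually looking before any data is loaded (missing_files is a hypothetical helper):

```python
from pathlib import Path
from typing import List, Optional

# Path copied from the error message; it is relative to the current
# working directory, not to the location of the notebook file.
REQUIRED_FILES = ["../llama_index/docs/examples/data/10k/lyft_2021.pdf"]


def missing_files(paths: List[str], base: Optional[Path] = None) -> List[Path]:
    """Return the absolute paths from `paths` that are not existing files under `base`."""
    root = base if base is not None else Path.cwd()
    missing = []
    for p in paths:
        candidate = (root / p).resolve()
        if not candidate.is_file():
            missing.append(candidate)
    return missing


for path in missing_files(REQUIRED_FILES):
    print(f"missing: {path}")
```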
Running pip install -r requirements.txt produces several errors:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
s3fs 2023.4.0 requires fsspec==2023.4.0, but you have fsspec 2023.9.2 which is incompatible.
llama-index 0.8.5.post2 requires fsspec>=2023.5.0, but you have fsspec 2023.4.0 which is incompatible.
Any ideas how to fix that?
OS: macOS Ventura 13.6 (22G120)
Python 3.11.4
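The two pip lines, read together, describe an unsatisfiable pair of constraints. s3fs 2023.4.0 pins fsspec to exactly 2023.4.0, while llama-index 0.8.5.post2 needs fsspec 2023.5.0 or newer. No single fsspec release satisfies both, so whichever one pip installs leaves one package broken. The conflict can only go away by changing the s3fs (or llama-index) version, not fsspec alone. The toy check below compares plain numeric versions of equal length and handles only the == and >= operators that appear in the log. It is not a full PEP 440 implementation, and its names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Version = Tuple[int, ...]


def parse(ver: str) -> Version:
    """Parse a plain numeric version like '2023.9.2' into (2023, 9, 2)."""
    return tuple(int(part) for part in ver.split("."))


# Only the two operators that appear in the pip log are supported.
OPS: Dict[str, Callable[[Version, Version], bool]] = {
    "==": lambda have, want: have == want,
    ">=": lambda have, want: have >= want,
}

# fsspec constraints copied from the pip error above.
CONSTRAINTS = {
    "s3fs 2023.4.0": ("==", "2023.4.0"),
    "llama-index 0.8.5.post2": (">=", "2023.5.0"),
}


def unsatisfied(fsspec_version: str) -> List[str]:
    """Return the packages whose fsspec constraint this version violates."""
    have = parse(fsspec_version)
    return [
        pkg
        for pkg, (op, want) in CONSTRAINTS.items()
        if not OPS[op](have, parse(want))
    ]


for candidate in ["2023.4.0", "2023.9.2"]:
    print(candidate, "breaks:", unsatisfied(candidate))
# 2023.4.0 breaks: ['llama-index 0.8.5.post2']
# 2023.9.2 breaks: ['s3fs 2023.4.0']
```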