marker-inc-korea / ragchain

Extension of Langchain for RAG. Easy benchmarking, multiple retrievals, reranker, time-aware RAG, and so on...

License: Apache License 2.0

Python 97.18% Mermaid 2.82%

ragchain's Introduction

RAGchain

RAGchain is a framework for developing advanced RAG (Retrieval Augmented Generation) workflows powered by LLMs (Large Language Models). Existing frameworks such as Langchain and LlamaIndex let you build simple RAG workflows, but they have limitations when it comes to building complex, high-accuracy ones.

RAGchain is designed to overcome these limitations by providing powerful features that make advanced RAG workflows easy to build. It is also partially compatible with Langchain, so you can leverage many of its integrations for vector storage, embeddings, document loaders, and LLMs.

Docs | API Spec | QuickStart

Quick Install

pip install RAGchain

Why RAGchain?

RAGchain offers several powerful features for building high-quality RAG workflows:

OCR Loaders

Simple file loaders may not be sufficient when trying to enhance accuracy or ingest real-world documents. OCR models can scan documents and convert them into text with high accuracy, improving the quality of responses from LLMs.

Reranker

Reranking is a popular method used in many research projects to improve retrieval accuracy in RAG workflows. Unlike LangChain, which doesn't include reranking as a default feature, RAGchain comes with various rerankers.

Built for multiple retrievers

In real-world scenarios, you may need multiple retrievers depending on your requirements. RAGchain is highly optimized for using multiple retrievers because it separates retrieval from the DB. The retrieval component stores vector representations of the contents, and the DB stores the contents themselves. A Linker connects the two, so it is easy to combine multiple retrievers and DBs.
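As a rough illustration of this split, the sketch below keeps passage contents in dict-backed stores and resolves retriever results through a linker. Every name here (`InMemoryDB`, `SimpleLinker`, `KeywordRetriever`) is hypothetical and chosen for the example; none of them is RAGchain's actual API, and the toy retriever uses token overlap in place of real vector search.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryDB:
    """Hypothetical content store: passage ID -> passage text."""
    passages: dict = field(default_factory=dict)

    def save(self, passage_id, content):
        self.passages[passage_id] = content

    def fetch(self, passage_id):
        return self.passages[passage_id]


@dataclass
class SimpleLinker:
    """Hypothetical linker: passage ID -> the DB that holds its content."""
    origins: dict = field(default_factory=dict)

    def register(self, passage_id, db):
        self.origins[passage_id] = db

    def resolve(self, passage_ids):
        # Look up each ID in whichever DB it was registered with.
        return [self.origins[pid].fetch(pid) for pid in passage_ids]


class KeywordRetriever:
    """Toy retriever: stores only a searchable representation (token sets) per ID."""

    def __init__(self):
        self.index = {}

    def ingest(self, passage_id, content):
        self.index[passage_id] = set(content.lower().split())

    def retrieve_ids(self, query, top_k=1):
        tokens = set(query.lower().split())
        ranked = sorted(self.index, key=lambda pid: len(tokens & self.index[pid]), reverse=True)
        return ranked[:top_k]


# Two separate DBs, one retriever, one linker tying them together.
db_a, db_b = InMemoryDB(), InMemoryDB()
linker = SimpleLinker()
retriever = KeywordRetriever()

for pid, text, db in [("p1", "rerankers improve retrieval accuracy", db_a),
                      ("p2", "ocr loaders read scanned documents", db_b)]:
    db.save(pid, text)
    linker.register(pid, db)
    retriever.ingest(pid, text)

hits = linker.resolve(retriever.retrieve_ids("scanned documents"))
print(hits)
```

Because the retriever only ever returns IDs, a second retriever could be added over the same passages without duplicating their contents.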

Pre-made RAG pipelines

We provide pre-made pipelines that let you quickly set up a RAG workflow. We plan to add more complex pipelines, which are hard to build yourself but powerful. With pipelines, you can build a powerful RAG system quickly and easily.

Easy benchmarking

It is crucial to benchmark and test your RAG workflows. RAGchain includes an easy-to-use benchmarking module for evaluation, which supports your own questions as well as various datasets.
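To make the idea of benchmarking a retriever concrete, here is a minimal sketch that averages recall@k over a set of questions with known relevant passage IDs. The function names, the stub retriever, and the sample data are assumptions made for illustration; this is not RAGchain's evaluator interface.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant IDs that appear in the top-k retrieved IDs."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(retrieved_ids[:k]) & relevant) / len(relevant)


def evaluate(retrieve, qa_pairs, k):
    """Mean recall@k over (question, relevant_ids) pairs."""
    scores = [recall_at_k(retrieve(question), relevant, k) for question, relevant in qa_pairs]
    return sum(scores) / len(scores)


# Stub retriever with canned rankings, standing in for a real one.
canned = {"q1": ["d1", "d3", "d2"], "q2": ["d4", "d5", "d6"]}
qa_pairs = [("q1", ["d1", "d2"]), ("q2", ["d9"])]

mean_recall = evaluate(lambda q: canned[q], qa_pairs, k=2)
print(mean_recall)
```

Swapping the lambda for a real retriever's lookup function is enough to compare two retrieval setups on the same questions.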

Installation

From pip

Simply install from PyPI.

pip install RAGchain

From source

First, clone this git repository to your local machine.

git clone https://github.com/Marker-Inc-Korea/RAGchain.git
cd RAGchain

Then, install the RAGchain module.

python3 setup.py develop

To use the files in the root folder and run the tests, install the dev requirements.

pip install -r dev_requirements.txt

Supporting Features

Advanced RAG features

Retrievals

BM25
Vector DB
Hybrid (rrf and cc)
HyDE
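For the hybrid entry above, "rrf" and "cc" most likely refer to reciprocal rank fusion and convex combination, two common ways of merging sparse (for example BM25) and dense (vector) results. The sketch below shows both under that assumption; the function names, the `k=60` constant, the min-max normalization, and the `alpha` weight are illustrative choices, not RAGchain's implementation.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: each list adds 1 / (k + rank) to an ID, with rank starting at 1."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def min_max(scores):
    """Rescale scores to [0, 1] so that different retrievers become comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def convex_combination(sparse_scores, dense_scores, alpha=0.5):
    """Fuse by alpha * sparse + (1 - alpha) * dense on normalized scores."""
    sparse, dense = min_max(sparse_scores), min_max(dense_scores)
    fused = {
        doc_id: alpha * sparse.get(doc_id, 0.0) + (1 - alpha) * dense.get(doc_id, 0.0)
        for doc_id in set(sparse) | set(dense)
    }
    return sorted(fused, key=fused.get, reverse=True)


rrf_order = reciprocal_rank_fusion([["a", "b", "c"], ["b", "c", "a"]])
cc_order = convex_combination({"a": 3.0, "b": 2.0, "c": 1.0},
                              {"a": 0.1, "b": 0.9, "c": 0.5})
print(rrf_order, cc_order)
```

RRF needs only rank positions, so it works even when the retrievers' raw scores are on incompatible scales; convex combination uses the scores themselves and therefore depends on the normalization step.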

OCR Loaders

Rerankers

UPR
TART
BM25
LLM
MonoT5

Web Search

Google Search
Bing Search

Workflows (pipeline)

Basic
Visconde
Rerank
Google Search

Extra utils

Query Decomposition
Evidence Extractor
REDE Search Detector
Semantic Clustering
Cluster Time Compressor

Dataset Evaluators

Contributing

We welcome any contributions. Please feel free to raise issues and submit pull requests.

Acknowledgement

This project is an early version, so it can be unstable. The project is licensed under the Apache 2.0 License.

ragchain's Issues

See referenced file contents in the web UI

View which files are stored in the vector store from the Gradio web UI

We can use the DB and the DB linker for this feature. But how can we fetch many files at once? And what about the file system?

Fix LLM continuing questions in answers

Sparse retrieval system for Korean

I think I can use pyserini with a custom tokenizer, or this?

Make an evaluation method and get a test dataset.

KorQuAD is one of the candidates

Simple GUI

Gradio could be useful,
AnythingLLM could be another option,
or Streamlit.

Add Colab Demo

Make a deployment method for private servers

Add ko-sroberta-multitask support

https://github.com/jhgan00/ko-sentence-transformers

Enable CPU for KoAlpaca-Polyglot model

Prevent duplicate file embedding

Adapt new KuLLM-ggml model

I think there is an error in the tokenizer or in the ggml file that we're using for the KuLLM model.
I will quantize the model myself and adapt it. I think that could achieve better performance.

Local Installer for windows

Local installer for everyone.

Ingest tables so the LLM understands table contents better

A special table token, or a new way of organizing table contents?
We need more research on this, but it is crucial for performance.

[HotFix] Enable docker workflows

workflow => workflows

Refactor the LLM model loader using the Factory Pattern

Factory Pattern
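One possible shape for such a refactor is a registry-based factory, sketched below. The loader names, the `device` parameter, and the dict placeholders returned in place of real models are all hypothetical; the sketch only illustrates the pattern, not the project's actual loader.

```python
from typing import Callable, Dict

# Registry mapping a model type name to a function that builds the model.
_LOADERS: Dict[str, Callable[[str], object]] = {}


def register_loader(name):
    """Decorator that adds a loader function to the registry under `name`."""
    def decorator(fn):
        _LOADERS[name] = fn
        return fn
    return decorator


@register_loader("openai")
def load_openai(device):
    # Placeholder: a real loader would construct an API client here.
    return {"model": "openai", "device": device}


@register_loader("kullm")
def load_kullm(device):
    # Placeholder: a real loader would read local model weights here.
    return {"model": "kullm", "device": device}


def load_model(name, device="cpu"):
    """Single entry point: look up the loader by name and call it."""
    try:
        loader = _LOADERS[name]
    except KeyError:
        raise ValueError(f"Unknown model type {name!r}; choose from {sorted(_LOADERS)}") from None
    return loader(device)


model = load_model("kullm", device="cuda")
print(model)
```

Adding a new model then means writing one decorated function, with no changes to the calling code.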

Add HyDE support

Langchain docs

Fix: LLM can't finish an answer by itself

Add Dockerfile and Docker-compose

For easy deployment

Add fasttext-ko-vectors

https://huggingface.co/facebook/fasttext-ko-vectors

Enable CPU for KoSimCSE Embedding

The original localGPT had options for running on CPU.
Enable them for better compatibility.

Add example documents for demo and demo conversation results

Write test code for each feature

TDD is good ^^

Add Pinecone support

[HotFix] Delete leftover constants

I think this was not merged correctly

Add README content with an initial explanation of KoPrivateGPT (Korean version)

Change to Korean embedding

Pipeline system for executing the whole system

A pipeline system, after refactoring the whole project

[HotFix] Error in accuracy in evaluate.py

[HotFix] Readme.md docker network connect explanation is wrong

docker connect ~ should become docker network connect ~

sry ;)

Add KULLM model

KULLM

Add OpenAI embedding

gpt3.5 chat model for the OpenAI choice

Upload file and ingest GUI

Load the Chroma or Pinecone instance only once in the db.py search function

        db = self.load()
        # TODO : Load Chroma or Pinecone instance everytime when you want to search?
        return db.similarity_search(query=query, k=top_k)
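One way to address the TODO in the snippet above is to load the store lazily and reuse it on later searches. The sketch below assumes a hypothetical `CachedDB` wrapper and a `FakeStore` stand-in that mimics only the `similarity_search(query=..., k=...)` call seen in the snippet; a real Chroma or Pinecone store would be returned by the loader instead.

```python
class FakeStore:
    """Stand-in for a Chroma or Pinecone store; mimics only similarity_search."""

    def __init__(self, docs):
        self.docs = docs

    def similarity_search(self, query, k=4):
        # Substring match instead of real vector similarity, to keep the sketch runnable.
        return [d for d in self.docs if query.lower() in d.lower()][:k]


class CachedDB:
    """Hypothetical wrapper that loads the underlying store once and reuses it."""

    def __init__(self, loader):
        self._loader = loader
        self._store = None
        self.load_count = 0  # only here to show how often the loader runs

    def load(self):
        if self._store is None:
            self._store = self._loader()
            self.load_count += 1
        return self._store

    def search(self, query, top_k=4):
        db = self.load()
        return db.similarity_search(query=query, k=top_k)


db = CachedDB(lambda: FakeStore(["HyDE retrieval", "BM25 retrieval", "OCR loader"]))
first = db.search("retrieval", top_k=1)
second = db.search("ocr")
print(first, second, db.load_count)
```

The trade-off is that a cached instance will not see documents ingested by another process until it is reloaded, so an explicit reset may be needed after ingestion.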
