AI FAQ Proof-of-Concept project: a chatbot that answers questions about the Hyperledger ecosystem

License: Apache License 2.0

Python 100.00%

aifaq's Introduction

Hyperledger QA PoC

This is a Proof-of-Concept application that lets you ask questions to a Python chatbot script fine-tuned on Hyperledger Standard Documents. I implemented this first version as a mentee during the Hyperledger Mentorship Program 2023.

Use case

This NLP application gives people access to the Hyperledger Standard Documentation. The scope of the lab is to support Hyperledger users (users, developers, etc.) in their work, so that they do not have to wade through oceans of documents to find the information they are looking for. Large Language Models, both paid and open source, have yielded remarkable results. Today we can implement a conversational AI tool that replies to questions related to a specific context.

Architecture

The model is XLM-R (HuggingFace deepset/xlm-roberta-large-squad2), a pre-trained model already fine-tuned on the SQuAD dataset. Below is the architecture of the model:

Pipeline

In this PoC I use Haystack (Haystack by Deepset) to build the QA pipeline. Below is an image of the architecture:

I use Elasticsearch (Elastic Search website) as the Retriever component.
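To make the Retriever step more concrete, here is a minimal pure-Python sketch of BM25 ranking, the default similarity in Elasticsearch. It is not the code of this PoC and it does not call Haystack or Elasticsearch: the `tokenize` and `bm25_rank` helpers and the three sample documents are assumptions made up for illustration, and whether the PoC retriever uses the default BM25 settings is also an assumption.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a real analyzer does much more.
    return text.lower().split()


def bm25_rank(query, docs, k1=1.2, b=0.75):
    """Return (doc_index, score) pairs sorted by descending BM25 score."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for idx, terms in enumerate(tokenized):
        tf = Counter(terms)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long documents are penalized through b.
            norm = tf[term] + k1 * (1 - b + b * len(terms) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((idx, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical mini-corpus, for illustration only.
docs = [
    "Hyperledger Fabric uses channels to keep ledgers private",
    "Hyperledger Besu is an Ethereum client written in Java",
    "Chaincode is the smart contract layer of Hyperledger Fabric",
]
ranking = bm25_rank("what are fabric channels", docs)
print(ranking[0][0])  # index of the best-matching document
```

In a retriever-reader pipeline, the top-ranked documents from this step are what the reader model scans for an answer span, so the retriever mainly decides which text the reader gets to see.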

Installation

For the installation instructions, see the links below:
Haystack installation

Elastic Search Windows installation

Ingestion files

In the ingest folder, you can find two kinds of files:

es format (Elasticsearch) files, which contain the data for the unstructured documents
one SQuAD format file (Stanford Question Answering Dataset) for the fine-tuning process
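For readers who have not seen the SQuAD format, the sketch below shows the general shape of a SQuAD 2.0 record and a small check that each `answer_start` offset really points at the answer text. The record itself (title, context, question, id) is invented for illustration and is not taken from the file in the ingest folder; `find_misaligned_answers` is a hypothetical helper.

```python
# Hypothetical record in SQuAD 2.0 layout; the content is made up.
squad = {
    "version": "v2.0",
    "data": [{
        "title": "Hyperledger Fabric",
        "paragraphs": [{
            "context": "Hyperledger Fabric is a permissioned blockchain framework.",
            "qas": [{
                "id": "fabric-0001",
                "question": "What kind of framework is Hyperledger Fabric?",
                "is_impossible": False,
                "answers": [{
                    "text": "a permissioned blockchain framework",
                    # Character offset of the answer inside the context.
                    "answer_start": 22,
                }],
            }],
        }],
    }],
}


def find_misaligned_answers(squad):
    """Return ids of questions whose answer text is not found at answer_start."""
    bad = []
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa.get("answers", []):
                    start = answer["answer_start"]
                    end = start + len(answer["text"])
                    if context[start:end] != answer["text"]:
                        bad.append(qa["id"])
    return bad


print(find_misaligned_answers(squad))
```

Running a check like this before fine-tuning can catch offsets that drifted while the source documents were being edited, since a wrong `answer_start` gives the model a wrong training target.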

Current version notes

This is the first version of the PoC. Below is a list of improvements that will be applied soon:

Model: a more sophisticated model (e.g. Zephyr 7B alpha)
Dataset: currently I have ingested only 2 documents as an example, but real systems work with hundreds of documents
Retriever: more sophisticated techniques that use embeddings
QA type: I will use generative QA (RAG) instead of extractive QA
Hardware: the system currently requires 10 minutes to ingest the files; a GPU can help save much of that time
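As a rough idea of what the embedding-based Retriever in the list above could look like, the sketch below ranks documents by cosine similarity between a query vector and document vectors. The 3-dimensional vectors, the document ids, and the `top_k` helper are made up for illustration; a real system would obtain the vectors from an embedding model, which is not shown here.

```python
import math


def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the k (doc_id, similarity) pairs most similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up vectors standing in for real document embeddings.
doc_vecs = {
    "fabric-channels": [0.9, 0.1, 0.0],
    "besu-client": [0.1, 0.9, 0.2],
    "fabric-chaincode": [0.7, 0.0, 0.6],
}
query_vec = [1.0, 0.0, 0.1]  # made-up embedding of the user question

results = top_k(query_vec, doc_vecs)
print([doc_id for doc_id, _ in results])
```

In a generative (RAG) setup, the passages returned by a step like this would be placed in the prompt of the language model instead of being scanned for an answer span.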
