my-golang-repo / aifaq

This project forked from hyperledger-labs/aifaq

AI FAQ Proof-of-Concept project: a chatbot that answers questions about the Hyperledger ecosystem

License: Apache License 2.0

Python 100.00%

aifaq's Introduction

Hyperledger QA PoC

This is a Proof-of-Concept application that lets you ask questions to a chatbot, implemented as a Python script and fine-tuned on Hyperledger standard documents. I implemented this first version as a mentee during the Hyperledger Mentorship Program 2023.

Use case

This NLP application gives people access to the Hyperledger standard documentation. The scope of the lab is to support Hyperledger users (users, developers, etc.) in their work, so that they do not have to wade through oceans of documents to find the information they are looking for. Large Language Models, both paid and open source, have yielded remarkable results. Today we can implement a conversational AI tool that replies to questions related to a specific context.

Architecture

The model is XLM-R (HuggingFace deepset/xlm-roberta-large-squad2), a pre-trained model fine-tuned on the SQuAD dataset. The architecture of the model is shown below:
[Figure: architecture of the model]
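
To make the extractive setup concrete: a SQuAD-style reader such as deepset/xlm-roberta-large-squad2 scores every token as a possible answer start and as a possible answer end, and the answer is the span with the best combined score. The sketch below shows only that span-selection step in plain Python. The function name `best_span`, the toy scores, and the `max_len` limit are illustrative assumptions, not code from this repository.

```python
from typing import List, Tuple


def best_span(start_scores: List[float], end_scores: List[float],
              max_len: int = 30) -> Tuple[int, int, float]:
    """Return (start, end, score) of the highest-scoring answer span.

    A span is valid when start <= end and it covers at most max_len tokens.
    The span score is the sum of its start score and its end score.
    """
    if len(start_scores) != len(end_scores) or not start_scores:
        raise ValueError("score lists must be non-empty and of equal length")
    best = (0, 0, float("-inf"))
    for i, s in enumerate(start_scores):
        # Only consider end positions that keep the span within max_len tokens.
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best[2]:
                best = (i, j, score)
    return best


# Toy example: hypothetical scores for a six-token context.
tokens = ["Hyperledger", "Fabric", "is", "a", "permissioned", "blockchain"]
start = [0.1, 0.0, 0.2, 1.5, 2.0, 0.3]
end = [0.0, 0.4, 0.1, 0.2, 0.9, 2.5]
i, j, score = best_span(start, end)
answer = " ".join(tokens[i:j + 1])
print(answer, score)
```

A real reader adds tokenization, handling of long contexts, and a "no answer" option, all of which are left out here.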

Pipeline

In this PoC I use Haystack (Haystack by deepset) to build the QA pipeline. The architecture of the pipeline is shown below:
[Figure: architecture of the QA pipeline]

I use Elasticsearch (Elasticsearch website) as the Retriever component.
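
The Retriever's job is to narrow the document set down to a few candidates before the reader model runs. Elasticsearch ranks full-text matches with BM25 by default, so the following is a minimal pure-Python sketch of that scoring idea. It is not the Haystack or Elasticsearch API. The function name `bm25_rank`, the whitespace tokenizer, the sample documents, and the parameter values `k1=1.2` and `b=0.75` are assumptions for illustration.

```python
import math
from collections import Counter
from typing import List, Tuple


def bm25_rank(query: str, docs: List[str], k1: float = 1.2,
              b: float = 0.75) -> List[Tuple[int, float]]:
    """Rank docs against query with BM25; return (doc_index, score), best first."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for idx, terms in enumerate(tokenized):
        tf = Counter(terms)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer documents need more matches.
            norm = tf[term] + k1 * (1 - b + b * len(terms) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((idx, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical mini-corpus; the real PoC indexes Hyperledger documents.
docs = [
    "Hyperledger Fabric is a permissioned blockchain framework",
    "Hyperledger Besu is an Ethereum client",
    "Elasticsearch stores and searches text documents",
]
ranking = bm25_rank("permissioned blockchain", docs)
top_k = [docs[i] for i, score in ranking[:2] if score > 0]
print(top_k)
```

Only the documents in `top_k` would be passed on to the reader, which keeps the expensive model away from irrelevant text.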

Installation

For the installation instructions, see the links below:
Haystack installation

Elasticsearch Windows installation

Ingestion files

In the ingest folder, you can find two kinds of files:

  1. es format (Elasticsearch) files, which contain the data for the unstructured documents
  2. one SQuAD format file (Stanford Question Answering Dataset) for the fine-tuning process
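
For orientation, the SQuAD format nests questions and answers under the paragraph they refer to, and each answer records its character offset within that paragraph. The sketch below builds one such record and checks the offsets, assuming the SQuAD 2.0 layout. The helper names, the sample text, and the question id are made up for illustration and are not taken from the files in the ingest folder.

```python
import json
from typing import Dict, List


def make_squad_example(title: str, context: str, question: str,
                       answer_text: str, qid: str) -> Dict:
    """Build a one-question SQuAD-style dataset with a verified answer offset."""
    answer_start = context.find(answer_text)
    if answer_start < 0:
        raise ValueError("answer_text must appear verbatim in context")
    return {
        "version": "v2.0",
        "data": [{
            "title": title,
            "paragraphs": [{
                "context": context,
                "qas": [{
                    "id": qid,
                    "question": question,
                    "is_impossible": False,
                    "answers": [{"text": answer_text,
                                 "answer_start": answer_start}],
                }],
            }],
        }],
    }


def check_offsets(dataset: Dict) -> List[str]:
    """Return ids of questions whose answer offset does not match the context."""
    bad = []
    for article in dataset["data"]:
        for para in article["paragraphs"]:
            for qa in para["qas"]:
                for ans in qa["answers"]:
                    start = ans["answer_start"]
                    end = start + len(ans["text"])
                    if para["context"][start:end] != ans["text"]:
                        bad.append(qa["id"])
    return bad


dataset = make_squad_example(
    title="Hyperledger Fabric",
    context="Hyperledger Fabric is a permissioned blockchain framework.",
    question="What kind of blockchain is Hyperledger Fabric?",
    answer_text="permissioned",
    qid="example-0001",  # hypothetical id
)
print(json.dumps(dataset, indent=2))
print(check_offsets(dataset))
```

Running a check like `check_offsets` before fine-tuning is worthwhile because a wrong `answer_start` silently teaches the model the wrong span.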

Current version notes

This is the first version of the PoC. Below is a list of improvements that will be applied soon:

  1. Model: a more sophisticated model (e.g. Zephyr 7B alpha)
  2. Dataset: currently only 2 documents are ingested as an example, but real systems work with hundreds of documents
  3. Retriever: more sophisticated techniques that use embeddings
  4. QA type: I will use generative QA (RAG) instead of extractive QA
  5. Hardware: the system currently needs 10 minutes to ingest the files; a GPU can save much of that time

aifaq's People

Contributors

gcapuzzi
