redhat-et / foundation-models-for-documentation

Improve the ROSA customer experience (and customer retention) by leveraging foundation models to provide ChatGPT-style search over Red Hat customer documentation assets.

License: Other

Languages: Python 1.16%, Makefile 0.25%, Jupyter Notebook 98.46%, Shell 0.12%, Dockerfile 0.01%

foundation-models-for-documentation's People

Contributors

codificat, goern, llmet, michaelclifford, oindrillac, shreyanand, suppathak


foundation-models-for-documentation's Issues

[EPIC KR1] Assemble ROSA Documentation Text

The first step in training a QA model for ROSA documentation search is to assemble and process the documentation text from various sources. This issue focuses on the following steps:

  • #3
  • Collect PDFs of product and FAQ pages for ROSA (a download sketch follows this list)
  • #4
  • #6
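
One concrete step above is collecting PDFs of the product and FAQ pages. A minimal download sketch, assuming a hand-maintained list of URLs; the URLs and output directory below are placeholders, not the real documentation links:

```python
from pathlib import Path

import requests

# Hypothetical list of ROSA documentation PDFs; the real URLs would come
# from the product and FAQ pages referenced in this issue.
PDF_URLS = [
    "https://example.com/rosa/getting-started.pdf",
    "https://example.com/rosa/faq.pdf",
]

def download_pdfs(urls, dest_dir="data/pdfs"):
    """Download each PDF into dest_dir, skipping files that already exist."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for url in urls:
        target = dest / url.rsplit("/", 1)[-1]
        if target.exists():
            continue
        response = requests.get(url, timeout=60)
        response.raise_for_status()
        target.write_bytes(response.content)

if __name__ == "__main__":
    download_pdfs(PDF_URLS)
```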

Optimizing LLMs for max performance when serving on ODH

What is the resource requirement of the deployed model? Explain the resources defined for the model pod.

What is the throughput of the model? How can we increase the throughput?

Given a combination of hardware, model type, and optimization techniques, what can be the maximum expected and observed throughput?
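
These are open questions rather than settled answers. As a starting point for the throughput question, a rough measurement can be taken by timing concurrent requests against the serving endpoint; a minimal sketch, assuming a hypothetical HTTP endpoint and payload shape:

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

# Hypothetical endpoint; the real route depends on how the model is served on ODH.
ENDPOINT = "http://llm-service:8080/generate"
PROMPT = "What is Red Hat OpenShift Service on AWS?"

def one_request():
    response = requests.post(
        ENDPOINT, json={"prompt": PROMPT, "max_new_tokens": 64}, timeout=120
    )
    response.raise_for_status()
    return response.json()

def measure_throughput(n_requests=32, concurrency=4):
    """Return requests/second for n_requests sent with the given concurrency."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(lambda _: one_request(), range(n_requests)))
    elapsed = time.perf_counter() - start
    return n_requests / elapsed

if __name__ == "__main__":
    print(f"throughput: {measure_throughput():.2f} requests/s")
```

Sweeping the concurrency and generation length gives a first picture of how throughput scales before applying any optimization techniques.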

[EPIC] Adapting Foundation Models

Foundation models need to be adapted for specific use cases and domains. There are several open questions around how to target different use cases. As a part of this epic, we will find answers to the following questions:

  • How do different variants of LLMs compare with each other, in terms of architecture (input tokens, hidden & attention layers, parameters, decoder/encoder variations), licenses, hardware utilization, etc.?
  • What is the difference between small FMs (<15B) and large FMs (>50B)?
    • How does performance vary for few-shot prompting of large models vs fine-tuning smaller models? (A few-shot prompting sketch follows this list.)
    • #49
    • Do we need a hierarchy of models for specific tasks? For example, one base large model for text generation and two smaller models each for code generation and documentation QA? What's the difference between Bloom 13B and Bloom 3B?
    • Do smaller models have a smaller context window or token limit and is that a limitation? How are contexts used by the models, in other words how is the model learning complemented by the context to generate a response?
    • What is the relevance of vector databases in these solutions? Are they still relevant in smaller fine-tuned models with smaller context windows?
    • What are the production cost and performance comparisons of these approaches? Design experiments to show some of these comparisons.
  • What is the role of datasets in fine tuning? Does fine tuning for a domain require a QA-format dataset or a self-supervised dataset of masked words in sentences (recheck)? Can we try BERT-based models that have a different architecture?
  • What are the various steps that take place in QA with FMs? #30
  • Adapt learnings from this to solve ROSA use case #18
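
To ground the few-shot prompting question above, a minimal sketch of prompting a model with in-context QA examples via the Hugging Face transformers pipeline; the model choice and example pairs are placeholders, not decisions made in this epic:

```python
from transformers import pipeline

# Placeholder model; any causal LM from the Hugging Face Hub could be swapped in.
generator = pipeline("text-generation", model="bigscience/bloom-3b")

# Few-shot prompt: a handful of in-context QA examples followed by the real question.
prompt = (
    "Q: What does ROSA stand for?\n"
    "A: Red Hat OpenShift Service on AWS.\n\n"
    "Q: How is ROSA billed?\n"
    "A: Through your AWS account, as a consumption-based service.\n\n"
    "Q: Where can I find the ROSA documentation?\n"
    "A:"
)

output = generator(prompt, max_new_tokens=50, do_sample=False)
print(output[0]["generated_text"])
```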

Issues with the application

  • Increase the timeout for cases where it takes longer to generate the answer
  • Add a try/except around llama model querying so that the application exits gracefully if the model is not responsive (a sketch follows below)
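
For the second item, a minimal sketch of the guarded querying, assuming a hypothetical HTTP endpoint and response shape; the real application may call the model differently:

```python
import logging
import sys

import requests

logger = logging.getLogger(__name__)

# Hypothetical endpoint for the llama model; the real application wires this differently.
LLAMA_ENDPOINT = "http://llama-service:8080/generate"

def query_llama(prompt: str, timeout: int = 300) -> str:
    """Query the model with a generous timeout so long generations are not cut off."""
    response = requests.post(LLAMA_ENDPOINT, json={"prompt": prompt}, timeout=timeout)
    response.raise_for_status()
    return response.json()["text"]

def answer_or_exit(prompt: str) -> str:
    try:
        return query_llama(prompt)
    except (requests.Timeout, requests.ConnectionError) as err:
        # Exit cleanly instead of hanging when the model is not responsive.
        logger.error("model is not responsive: %s", err)
        sys.exit(1)
```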

Create a SQuAD-like toy dataset with ROSA documentation

We would need a dataset to fine-tune large language models for searching ROSA documentation and to evaluate the performance of different models. As a start, take existing FAQs and convert them into QA format (a toy example of the format follows the questions below).
As a part of this issue, discuss and answer the following questions:

  • What volume of QA data can we realistically manage? 100-1000 rows?
  • What kind of questions do we want the model to answer?
  • Do we want quick questions or longer ones or both?
  • Can the model point back to the resources?
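
As a concrete illustration of the target format, a minimal sketch of one SQuAD-style record built from a single FAQ pair; the question, answer, and context are placeholders for real ROSA FAQ content:

```python
import json

# One SQuAD-style entry built from a hypothetical FAQ pair.  In SQuAD format each
# answer records its character offset ("answer_start") within the context passage.
context = (
    "Red Hat OpenShift Service on AWS (ROSA) is a fully managed OpenShift "
    "service, jointly supported by Red Hat and AWS."
)
answer_text = "a fully managed OpenShift service"

record = {
    "title": "ROSA FAQ",
    "paragraphs": [
        {
            "context": context,
            "qas": [
                {
                    "id": "rosa-faq-0001",
                    "question": "What is ROSA?",
                    "answers": [
                        {
                            "text": answer_text,
                            "answer_start": context.index(answer_text),
                        }
                    ],
                    "is_impossible": False,
                }
            ],
        }
    ],
}

with open("rosa_squad_toy.json", "w") as f:
    json.dump({"version": "0.1", "data": [record]}, f, indent=2)
```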

[spike] Fine-tuning options for LLMs

As part of #8, in order to adapt the language models we use to the domain we work on (ROSA), explore and document the various options available:

  • prompt engineering with context (a minimal sketch follows below)
  • model fine-tuning

For each option we want to understand:

  • its pre-requisites (e.g. type, quality and quantity of data required, level of model access, cost of the adaptation process)
  • pros and cons
  • what OSS options are recommended for each (see #9)
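
For the first option, prompt engineering with context, a minimal sketch of the kind of prompt template involved; the retrieval step that fills in the context is omitted and the wording is a placeholder:

```python
# A context-injected prompt template: retrieved documentation passages are pasted
# into the prompt instead of changing any model weights.
PROMPT_TEMPLATE = """Answer the question using only the context below.
If the answer is not in the context, say you don't know.

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(question: str, retrieved_passages: list[str]) -> str:
    """Assemble the final prompt from the question and retrieved ROSA doc passages."""
    return PROMPT_TEMPLATE.format(
        context="\n\n".join(retrieved_passages), question=question
    )
```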

[EPIC] Resource requirements and cost of foundation models

The large size of foundation models raises several resource and cost questions around deploying them in production. This EPIC will focus on creating experiments and showing results around some of the following questions:

  • What is the relationship between a model's parameter count and its memory consumption? Create a "rosetta stone" document of the GPU memory required by models of different parameter sizes. Create a notebook that captures the footprint of the GPU memory used.
  • How are the models loaded into GPU RAM? Is it directly from S3 or do we also require significant RAM? If so, capture the RAM requirements in a notebook. What about CPU? Update the cost document with RAM, and CPU information. Are there ways to optimize this?
  • What happens when we load the models in a lower precision format like INT-8? How are accuracy, CPU, and memory performance affected? Explain theoretically and show results in a notebook (a minimal loading sketch follows this list). Touch upon challenges of frameworks like bitsandbytes in production.
  • Is distributed training and inference with a lot of cheap instances more efficient per dollar than 1 instance with a large GPU? If we have just one GPU of 16GB memory, how much can be done with it in the space of LLMs? Design experiments and share results in a notebook.
  • What are the options for running these models on CPU only? With INT8 precision the rule of thumb is roughly 1 GB of GPU memory per 1B parameters; are there ways to optimize beyond that?
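
For the lower-precision question above, a minimal sketch of loading a model in INT8 via bitsandbytes through the transformers `load_in_8bit` flag; the model name is a placeholder, and the actual memory and accuracy numbers are exactly what the experiments in this epic should measure:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "bigscience/bloom-3b"  # placeholder; swap in the model under test

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# load_in_8bit=True quantizes linear layers to INT8 via bitsandbytes at load time;
# device_map="auto" lets accelerate place weights on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    load_in_8bit=True,
    device_map="auto",
)

# Rough footprint check: report the allocated GPU memory after loading.
print(f"GPU memory allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")
```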

[EPIC] Serving Foundation Models

Serving foundation models on an OpenShift cluster is a crucial step towards using them in production. As a part of this issue we will deploy demo models using open source tools and/or the stack recommended by RHODS. The following questions will be explored further as part of this epic:

Serving LLMs on Open Data Hub / RHODS with ModelMesh

How can we create model endpoints using ModelMesh on the cluster?

This should involve instructions on deploying ModelMesh using OperatorHub and adding the model to an S3 bucket.

Create the predict function, explain how it is called, and demo a notebook that sends a payload to the model endpoint.
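
A minimal sketch of the payload step, assuming the ModelMesh endpoint speaks the KServe v2 REST inference protocol; the host, model name, and tensor layout are placeholders that depend on how the model is packaged:

```python
import requests

# Placeholder values; the real host, model name, and tensor spec depend on the deployment.
BASE_URL = "http://modelmesh-serving.project.svc.cluster.local:8008"
MODEL_NAME = "rosa-qa-model"

payload = {
    "inputs": [
        {
            "name": "input_text",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["How do I create a ROSA cluster?"],
        }
    ]
}

response = requests.post(
    f"{BASE_URL}/v2/models/{MODEL_NAME}/infer",
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json()["outputs"])
```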

Dataset: plain text version of the various data sources, generated directly from the source files

Describe the solution you'd like

A mechanism to obtain (a set of) plain text (ASCII) files directly from the source files of the various documentation sources around ROSA wherever possible, to avoid having to rely on text extraction from rendered websites or PDF files.

Describe alternatives you've considered

We have a collection of various PDF files, from which plain text can be extracted.
However, standard text extraction from PDF has some limitations / problems (a sketch of this extraction follows the list below):

  • hyperlinks are not visible
  • document structure is harder to retain
  • the PDF renderer introduces noise (page headers/footers, numbered references, etc)
  • the PDFs are harder to refresh / keep up to date with the current docs
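
For reference, a minimal sketch of the PDF-based extraction described above, using pypdf as one example library; it shows where the inline headers/footers and lost hyperlinks come from:

```python
from pathlib import Path

from pypdf import PdfReader

def pdf_to_text(pdf_path: str) -> str:
    """Concatenate the extracted text of every page in a PDF."""
    reader = PdfReader(pdf_path)
    # Note: extract_text() flattens layout, drops hyperlink targets, and keeps
    # page headers/footers inline, which is exactly the noise described above.
    return "\n".join(page.extract_text() or "" for page in reader.pages)

if __name__ == "__main__":
    out_dir = Path("data/text")
    out_dir.mkdir(parents=True, exist_ok=True)
    for pdf in Path("data/pdfs").glob("*.pdf"):
        (out_dir / (pdf.stem + ".txt")).write_text(pdf_to_text(str(pdf)))
```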

Additional context

Sources include:

Distributed fine tuning of LLMs

The fine-tuning notebook uses 1 GPU and the LoRA technique to fine-tune a T5 model with 3B parameters. The task in this issue is to fine-tune the same model (or the 7B version of the model) on multiple GPU nodes. Use InstaScale and CodeFlare to schedule the training job and retrieve the fine-tuned model. Create a notebook that demos this.
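
For orientation, a minimal single-GPU sketch of the LoRA setup the notebook uses, via the peft library; the model name and hyperparameters are placeholders, and the actual work of this issue, scheduling the job across GPU nodes with CodeFlare/InstaScale, is not shown:

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "google/flan-t5-xl"  # ~3B parameter T5 variant; placeholder choice

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

# LoRA adds small trainable low-rank matrices to the attention projections,
# so only a tiny fraction of the parameters are updated during fine tuning.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q", "v"],  # T5 attention query/value projections
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```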

Deploy Language Model(s) to our cluster exposing a completion API

This should be an enabler for #25 and #26, but also will help to explore open language models and how to serve them.

Goal: to deploy one or more of the publicly available Language Models to our cluster, so that they can be accessed via an API hosted by ourselves.

A comprehensive list of such models can be found in Hugging Face's Transformers' list of models, e.g. https://huggingface.co/docs/transformers/model_doc/bloom (see the list of text models). A summary / classification is also available here: https://huggingface.co/docs/transformers/model_summary#summary-of-the-models
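
A minimal sketch of what such a self-hosted completion API could look like, wrapping a Hugging Face text-generation pipeline in FastAPI; the model and route names are placeholders, not a choice made in this issue:

```python
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Placeholder model; any open text-generation model from the Hub could be served this way.
generator = pipeline("text-generation", model="bigscience/bloom-1b7")

class CompletionRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 64

@app.post("/v1/completions")
def complete(request: CompletionRequest):
    output = generator(
        request.prompt,
        max_new_tokens=request.max_new_tokens,
        do_sample=False,
    )
    return {"completion": output[0]["generated_text"]}
```

Served with e.g. `uvicorn app:app`, this exposes a simple JSON completion endpoint that the issues referenced above could build on.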

A mechanism to introspect language chain operations

Is your feature request related to a problem? Please describe

As we build language "chains" that involve multiple components and steps (creation of embeddings, querying embeddings, prompt template selection, API queries), I would like to be able to see what is going on in detail during a query to the model.

Describe the solution you'd like

Some "debug" option for the whole chains. Possibly something that can be external so that it does not depend on being inside a notebook to be able to see it.

Additional context

#15 (review)

Collection of Comparison Data for Reward Model Training

The process of training a reward model for Reinforcement Learning from Human Feedback (RLHF) involves gathering comparison data to accurately assess the quality of prompt-response pairs. This comparison data aids in fine-tuning a Large Language Model (LLM) using Reinforcement Learning (RL) to generate responses that align with human preferences. This issue aims to guide the development of a systematic workflow for collecting comparison data using the Argilla platform.
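
A minimal sketch of what the collection setup could look like with Argilla's FeedbackDataset API (class names as in Argilla 1.x); the dataset, field, and question names are placeholders, and designing the real workflow is the work of this issue:

```python
import argilla as rg

# Connect to a running Argilla instance (placeholder URL and key).
rg.init(api_url="http://argilla.example.com", api_key="admin.apikey")

# Each record shows a prompt with two candidate responses; annotators pick the better one.
dataset = rg.FeedbackDataset(
    fields=[
        rg.TextField(name="prompt"),
        rg.TextField(name="response_a"),
        rg.TextField(name="response_b"),
    ],
    questions=[
        rg.LabelQuestion(
            name="preferred",
            title="Which response answers the prompt better?",
            labels=["response_a", "response_b"],
        )
    ],
)

dataset.add_records(
    [
        rg.FeedbackRecord(
            fields={
                "prompt": "How do I upgrade a ROSA cluster?",
                "response_a": "candidate answer from model A",
                "response_b": "candidate answer from model B",
            }
        )
    ]
)

dataset.push_to_argilla(name="rosa-comparison-data")
```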
