redhat-et / foundation-models-for-documentation

Improve the ROSA customer experience (and customer retention) by leveraging foundation models to provide ChatGPT-style search over Red Hat customer documentation assets.

License: Other

Languages: Python 1.16%, Makefile 0.25%, Jupyter Notebook 98.46%, Shell 0.12%, Dockerfile 0.01%

foundation-models-for-documentation's People

Contributors

codificat, goern, llmet, michaelclifford, oindrillac, shreyanand, suppathak


foundation-models-for-documentation's Issues

[EPIC KR1] Assemble ROSA Documentation Text

The first step in training a QA model for ROSA documentation search is to assemble and process the documentation text from various sources. This issue focuses on the following steps:

  • #3
  • Collect PDFs of product and FAQ pages for ROSA (a download sketch follows this list)
  • #4
  • #6
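
One concrete step above is collecting PDFs of the product and FAQ pages. A minimal download sketch, assuming a hand-maintained list of URLs; the URLs and output directory below are placeholders, not the real documentation links:

```python
from pathlib import Path

import requests

# Hypothetical list of ROSA documentation PDFs; the real URLs would come
# from the product and FAQ pages referenced in this issue.
PDF_URLS = [
    "https://example.com/rosa/getting-started.pdf",
    "https://example.com/rosa/faq.pdf",
]

def download_pdfs(urls, dest_dir="data/pdfs"):
    """Download each PDF into dest_dir, skipping files that already exist."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for url in urls:
        target = dest / url.rsplit("/", 1)[-1]
        if target.exists():
            continue
        response = requests.get(url, timeout=60)
        response.raise_for_status()
        target.write_bytes(response.content)

if __name__ == "__main__":
    download_pdfs(PDF_URLS)
```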

Optimizing LLMs for max performance when serving on ODH

What is the resource requirement of the deployed model? Explain the resources defined for the model pod.

What is the throughput of the model? How can we increase the throughput?

Given a combination of hardware, model type, and optimization techniques, what can be the maximum expected and observed throughput?
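
These are open questions rather than settled answers. As a starting point for the throughput question, a rough measurement can be taken by timing concurrent requests against the serving endpoint; a minimal sketch, assuming a hypothetical HTTP endpoint and payload shape:

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

# Hypothetical endpoint; the real route depends on how the model is served on ODH.
ENDPOINT = "http://llm-service:8080/generate"
PROMPT = "What is Red Hat OpenShift Service on AWS?"

def one_request():
    response = requests.post(
        ENDPOINT, json={"prompt": PROMPT, "max_new_tokens": 64}, timeout=120
    )
    response.raise_for_status()
    return response.json()

def measure_throughput(n_requests=32, concurrency=4):
    """Return requests/second for n_requests sent with the given concurrency."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(lambda _: one_request(), range(n_requests)))
    elapsed = time.perf_counter() - start
    return n_requests / elapsed

if __name__ == "__main__":
    print(f"throughput: {measure_throughput():.2f} requests/s")
```

Sweeping the concurrency and generation length gives a first picture of how throughput scales before applying any optimization techniques.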

[EPIC] Adapting Foundation Models

Foundation models need to be adapted for specific use cases and domains. There are several open questions around how to target different use cases. As a part of this epic, we will find answers to the following questions:

  • How do different variants of LLMs compare with each other, in terms of architecture (input tokens, hidden & attention layers, parameters, decoder/encoder variations), licenses, hardware utilization, etc.?
  • What is the difference between small FMs (<15B) and large FMs (>50B)?
    • How does performance vary for few-shot prompting of large models vs fine-tuning smaller models? (A few-shot prompting sketch follows this list.)
    • #49
    • Do we need a hierarchy of models for specific tasks? For example, one base large model for text generation and two smaller models each for code generation and documentation QA? What's the difference between Bloom 13B and Bloom 3B?
    • Do smaller models have a smaller context window or token limit and is that a limitation? How are contexts used by the models, in other words how is the model learning complemented by the context to generate a response?
    • What is the relevance of vector databases in these solutions? Are they still relevant in smaller fine-tuned models with smaller context windows?
    • What are the production cost and performance comparisons of these approaches? Design experiments to show some of these comparisons.
  • What is the role of datasets in fine tuning? Does fine tuning for a domain require a QA-format dataset or a self-supervised dataset of masked words in sentences (recheck)? Can we try BERT-based models that have a different architecture?
  • What are the various steps that take place in QA with FMs? #30
  • Adapt learnings from this to solve ROSA use case #18
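
To ground the few-shot prompting question above, a minimal sketch of prompting a model with in-context QA examples via the Hugging Face transformers pipeline; the model choice and example pairs are placeholders, not decisions made in this epic:

```python
from transformers import pipeline

# Placeholder model; any causal LM from the Hugging Face Hub could be swapped in.
generator = pipeline("text-generation", model="bigscience/bloom-3b")

# Few-shot prompt: a handful of in-context QA examples followed by the real question.
prompt = (
    "Q: What does ROSA stand for?\n"
    "A: Red Hat OpenShift Service on AWS.\n\n"
    "Q: How is ROSA billed?\n"
    "A: Through your AWS account, as a consumption-based service.\n\n"
    "Q: Where can I find the ROSA documentation?\n"
    "A:"
)

output = generator(prompt, max_new_tokens=50, do_sample=False)
print(output[0]["generated_text"])
```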

Issues with the application

  • Increase the timeout for cases where it takes longer to generate the answer
  • Add a try/except around llama model querying so that the application exits gracefully if the model is not responsive (a sketch follows below)
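
For the second item, a minimal sketch of the guarded querying, assuming a hypothetical HTTP endpoint and response shape; the real application may call the model differently:

```python
import logging
import sys

import requests

logger = logging.getLogger(__name__)

# Hypothetical endpoint for the llama model; the real application wires this differently.
LLAMA_ENDPOINT = "http://llama-service:8080/generate"

def query_llama(prompt: str, timeout: int = 300) -> str:
    """Query the model with a generous timeout so long generations are not cut off."""
    response = requests.post(LLAMA_ENDPOINT, json={"prompt": prompt}, timeout=timeout)
    response.raise_for_status()
    return response.json()["text"]

def answer_or_exit(prompt: str) -> str:
    try:
        return query_llama(prompt)
    except (requests.Timeout, requests.ConnectionError) as err:
        # Exit cleanly instead of hanging when the model is not responsive.
        logger.error("model is not responsive: %s", err)
        sys.exit(1)
```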

Create a SQuAD-like toy dataset with ROSA documentation

We would need a dataset to fine-tune large language models for searching ROSA documentation and to evaluate the performance of different models. As a start, take existing FAQs and convert them into QA format (a toy example of the format follows the questions below).
As a part of this issue, discuss and answer the following questions:

  • What volume of QA data can we realistically manage? 100-1000 rows?
  • What kind of questions do we want the model to answer?
  • Do we want quick questions or longer ones or both?
  • Can the model point back to the resources?
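
As a concrete illustration of the target format, a minimal sketch of one SQuAD-style record built from a single FAQ pair; the question, answer, and context are placeholders for real ROSA FAQ content:

```python
import json

# One SQuAD-style entry built from a hypothetical FAQ pair.  In SQuAD format each
# answer records its character offset ("answer_start") within the context passage.
context = (
    "Red Hat OpenShift Service on AWS (ROSA) is a fully managed OpenShift "
    "service, jointly supported by Red Hat and AWS."
)
answer_text = "a fully managed OpenShift service"

record = {
    "title": "ROSA FAQ",
    "paragraphs": [
        {
            "context": context,
            "qas": [
                {
                    "id": "rosa-faq-0001",
                    "question": "What is ROSA?",
                    "answers": [
                        {
                            "text": answer_text,
                            "answer_start": context.index(answer_text),
                        }
                    ],
                    "is_impossible": False,
                }
            ],
        }
    ],
}

with open("rosa_squad_toy.json", "w") as f:
    json.dump({"version": "0.1", "data": [record]}, f, indent=2)
```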

[spike] Fine-tuning options for LLMs

As part of #8, in order to adapt the language models we use to the domain we work on (ROSA), explore and document the various options available:

  • prompt engineering with context (a minimal sketch follows below)
  • model fine-tuning

For each option we want to understand:

  • its pre-requisites (e.g. type, quality and quantity of data required, level of model access, cost of the adaptation process)
  • pros and cons
  • what OSS options are recommended for each (see #9)
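
For the first option, prompt engineering with context, a minimal sketch of the kind of prompt template involved; the retrieval step that fills in the context is omitted and the wording is a placeholder:

```python
# A context-injected prompt template: retrieved documentation passages are pasted
# into the prompt instead of changing any model weights.
PROMPT_TEMPLATE = """Answer the question using only the context below.
If the answer is not in the context, say you don't know.

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(question: str, retrieved_passages: list[str]) -> str:
    """Assemble the final prompt from the question and retrieved ROSA doc passages."""
    return PROMPT_TEMPLATE.format(
        context="\n\n".join(retrieved_passages), question=question
    )
```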

[EPIC] Resource requirements and cost of foundation models

The large size of foundation models raises several resource and cost questions around deploying them in production. This EPIC will focus on creating experiments and showing results around some of the following questions:

  • What is the relationship between a model's parameter count and its memory consumption? Create a "rosetta stone" document of the GPU memory required by models of different parameter sizes. Create a notebook that captures the footprint of the GPU memory used.
  • How are the models loaded into GPU RAM? Is it directly from S3 or do we also require significant RAM? If so, capture the RAM requirements in a notebook. What about CPU? Update the cost document with RAM, and CPU information. Are there ways to optimize this?
  • What happens when we load the models in a lower precision format like INT-8? How are accuracy, CPU, and memory performance affected? Explain theoretically and show results in a notebook (a minimal loading sketch follows this list). Touch upon challenges of frameworks like bitsandbytes in production.
  • Is distributed training and inference with a lot of cheap instances more efficient per dollar than 1 instance with a large GPU? If we have just one GPU of 16GB memory, how much can be done with it in the space of LLMs? Design experiments and share results in a notebook.
  • What are the options for running these models on CPU only? With INT8 precision the rule of thumb is roughly 1 GB of GPU memory per 1B parameters; are there ways to optimize beyond that?
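
For the lower-precision question above, a minimal sketch of loading a model in INT8 via bitsandbytes through the transformers `load_in_8bit` flag; the model name is a placeholder, and the actual memory and accuracy numbers are exactly what the experiments in this epic should measure:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "bigscience/bloom-3b"  # placeholder; swap in the model under test

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# load_in_8bit=True quantizes linear layers to INT8 via bitsandbytes at load time;
# device_map="auto" lets accelerate place weights on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    load_in_8bit=True,
    device_map="auto",
)

# Rough footprint check: report the allocated GPU memory after loading.
print(f"GPU memory allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")
```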

[EPIC] Serving Foundation Models

Serving foundation models on an OpenShift cluster is a crucial step towards using them in production. As a part of this issue we will deploy demo models using open source tools and/or the stack recommended by RHODS. The following questions will be explored further as part of this epic:

Serving LLMs on Open Data Hub / RHODS with ModelMesh

How can we create model endpoints using ModelMesh on the cluster?

This should involve instructions on deploying ModelMesh using OperatorHub and adding the model to an S3 bucket.

Create the predict function, explain how it is called, and demo a notebook that sends a payload to the model endpoint.
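
A minimal sketch of the payload step, assuming the ModelMesh endpoint speaks the KServe v2 REST inference protocol; the host, model name, and tensor layout are placeholders that depend on how the model is packaged:

```python
import requests

# Placeholder values; the real host, model name, and tensor spec depend on the deployment.
BASE_URL = "http://modelmesh-serving.project.svc.cluster.local:8008"
MODEL_NAME = "rosa-qa-model"

payload = {
    "inputs": [
        {
            "name": "input_text",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["How do I create a ROSA cluster?"],
        }
    ]
}

response = requests.post(
    f"{BASE_URL}/v2/models/{MODEL_NAME}/infer",
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json()["outputs"])
```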

Dataset: plain text version of the various data sources, generated directly from the source files

Describe the solution you'd like

A mechanism to obtain (a set of) plain text (ASCII) files directly from the source files of the various documentation sources around ROSA wherever possible, to avoid having to rely on text extraction from rendered websites or PDF files.

Describe alternatives you've considered

We have a collection of various PDF files, from which plain text can be extracted.
However, standard text extraction from PDF has some limitations / problems (a sketch of this extraction follows the list below):

  • hyperlinks are not visible
  • document structure is harder to retain
  • the PDF renderer introduces noise (page headers/footers, numbered references, etc)
  • the PDFs are harder to refresh / keep up to date with the current docs
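
For reference, a minimal sketch of the PDF-based extraction described above, using pypdf as one example library; it shows where the inline headers/footers and lost hyperlinks come from:

```python
from pathlib import Path

from pypdf import PdfReader

def pdf_to_text(pdf_path: str) -> str:
    """Concatenate the extracted text of every page in a PDF."""
    reader = PdfReader(pdf_path)
    # Note: extract_text() flattens layout, drops hyperlink targets, and keeps
    # page headers/footers inline, which is exactly the noise described above.
    return "\n".join(page.extract_text() or "" for page in reader.pages)

if __name__ == "__main__":
    out_dir = Path("data/text")
    out_dir.mkdir(parents=True, exist_ok=True)
    for pdf in Path("data/pdfs").glob("*.pdf"):
        (out_dir / (pdf.stem + ".txt")).write_text(pdf_to_text(str(pdf)))
```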

Additional context

Sources include:

Distributed fine tuning of LLMs

The fine-tuning notebook uses 1 GPU and the LoRA technique to fine-tune a T5 model with 3B parameters. The task in this issue is to fine-tune the same model (or the 7B version of the model) on multiple GPU nodes. Use InstaScale and CodeFlare to schedule the training job and retrieve the fine-tuned model. Create a notebook that demos this.
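
For orientation, a minimal single-GPU sketch of the LoRA setup the notebook uses, via the peft library; the model name and hyperparameters are placeholders, and the actual work of this issue, scheduling the job across GPU nodes with CodeFlare/InstaScale, is not shown:

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "google/flan-t5-xl"  # ~3B parameter T5 variant; placeholder choice

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

# LoRA adds small trainable low-rank matrices to the attention projections,
# so only a tiny fraction of the parameters are updated during fine tuning.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q", "v"],  # T5 attention query/value projections
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```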

Deploy Language Model(s) to our cluster exposing a completion API

This should be an enabler for #25 and #26, but also will help to explore open language models and how to serve them.

Goal: to deploy one or more of the publicly available Language Models to our cluster, so that they can be accessed via an API hosted by ourselves.

A comprehensive list of such models can be found in Hugging Face's Transformers' list of models, e.g. https://huggingface.co/docs/transformers/model_doc/bloom (see the list of text models). A summary / classification is also available here: https://huggingface.co/docs/transformers/model_summary#summary-of-the-models
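
A minimal sketch of what such a self-hosted completion API could look like, wrapping a Hugging Face text-generation pipeline in FastAPI; the model and route names are placeholders, not a choice made in this issue:

```python
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Placeholder model; any open text-generation model from the Hub could be served this way.
generator = pipeline("text-generation", model="bigscience/bloom-1b7")

class CompletionRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 64

@app.post("/v1/completions")
def complete(request: CompletionRequest):
    output = generator(
        request.prompt,
        max_new_tokens=request.max_new_tokens,
        do_sample=False,
    )
    return {"completion": output[0]["generated_text"]}
```

Served with e.g. `uvicorn app:app`, this exposes a simple JSON completion endpoint that the issues referenced above could build on.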

A mechanism to introspect language chain operations

Is your feature request related to a problem? Please describe

As we build language "chains" that involve multiple components and steps (creation of embeddings, querying embeddings, prompt template selection, API queries), I would like to be able to see what is going on in detail during a query to the model.

Describe the solution you'd like

Some "debug" option for the whole chains. Possibly something that can be external so that it does not depend on being inside a notebook to be able to see it.

Additional context

#15 (review)

Collection of Comparison Data for Reward Model Training

The process of training a reward model for Reinforcement Learning from Human Feedback (RLHF) involves gathering comparison data to accurately assess the quality of prompt-response pairs. This comparison data aids in fine-tuning a Large Language Model (LLM) using Reinforcement Learning (RL) to generate responses that align with human preferences. This issue aims to guide the development of a systematic workflow for collecting comparison data using the Argilla platform.
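
A minimal sketch of what the collection setup could look like with Argilla's FeedbackDataset API (class names as in Argilla 1.x); the dataset, field, and question names are placeholders, and designing the real workflow is the work of this issue:

```python
import argilla as rg

# Connect to a running Argilla instance (placeholder URL and key).
rg.init(api_url="http://argilla.example.com", api_key="admin.apikey")

# Each record shows a prompt with two candidate responses; annotators pick the better one.
dataset = rg.FeedbackDataset(
    fields=[
        rg.TextField(name="prompt"),
        rg.TextField(name="response_a"),
        rg.TextField(name="response_b"),
    ],
    questions=[
        rg.LabelQuestion(
            name="preferred",
            title="Which response answers the prompt better?",
            labels=["response_a", "response_b"],
        )
    ],
)

dataset.add_records(
    [
        rg.FeedbackRecord(
            fields={
                "prompt": "How do I upgrade a ROSA cluster?",
                "response_a": "candidate answer from model A",
                "response_b": "candidate answer from model B",
            }
        )
    ]
)

dataset.push_to_argilla(name="rosa-comparison-data")
```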
