This deployment is based on validated pattern framework
that uses GitOps to easily provision all operators and apps. It deploys a Chatbot application that leverages the power of Large Language Models (LLMs) in conjunction with the Retrieval-Augmented Generation (RAG) framework running on Red Hat OpenShift to generate a project proposal for a given Red Hat product.
- Podman
- Red Hat Openshift cluster
- GPU Node to run Hugging Face Text Generation Inference server on Red Hat OpenShift cluster.
- Create a fork of the rag-llm-gitops git repository.
The goal of this demo is to demonstrate a Chatbot LLM application augmented with data from Red Hat product documentation running on Red Hat OpenShift. It deploys an LLM application that connects to multiple LLM providers such as OpenAI, Hugging Face, and NVIDIA NIM. The application generates a project proposal for a Red Hat product
- LLM Application augmented with content from Red Hat product documentation.
- Multiple LLM providers (OpenAI, Hugging Face, NVIDIA)
- Redis Vector Database to store embeddings of RedHat product documentation.
- Monitoring dashboard to provide key metrics such as ratings
- GitOps setup to deploy e2e demo (frontend / vector database / served models)
- Hugging Face Text Generation Inference Server: The pattern deploys a Hugging Face TGIS server. The server deploys
meta-llama/Llama-2-7b-chat-hf
model. The server will require a GPU node. - Redis Server: A Redis Server is deployed to store vector embeddings created from Red Hat product documentation.
- Populate VectorDb Job: The job creates the embeddings and populates the vector database (Redis).
- LLM Application: This is a Chatbot application that can generate a project proposal by augmenting the LLM with the Red Hat product documentation stored in vector db.
- Prometheus: Deploys a prometheus instance to store the various metrics from the LLM application and TGIS server.
- Grafana: Deploys Grafana application to visualize the metrics.
Following commands will take about 15-20 minutes
Validated pattern will be deployed
git clone https://github.com/<<your-username>>/rag-llm-gitops.git
cd rag-llm-gitops
oc login --token=<> --server=<> # login to Openshift cluster
podman machine start
# Copy values-secret.yaml.template to ~/values-secret-rag-llm-gitops.yaml.
# You should never check these files
# Add secrets to the values-secret.yaml that needs to be added to the vault.
cp values-secret.yaml.template ~/values-secret-rag-llm-gitops.yaml
./pattern.sh make install
- Login to the OpenShift web console.
- Navigate to the Workloads --> Pods.
- Select the
rag-llm
project from the drop down. - Following pods should be up and running.
Note: If the hf-text-generation-server is not running, make sure you have followed the steps to configure a node with GPU from the instructions provided above.
-
Navigate to routes, Networking --> Routes
-
Click on the Location link and it should launch the application
-
It will use the default provider and model configured as part of the application deployment. The default provider is a Hugging Face model server running in the OpenShift. The model server is deployed with this valdiated pattern and requires a node with GPU.
-
Enter any company name
-
Enter the product as
RedHat OpenShift
-
Click the
Generate
button, a project proposal should be generated. The project proposal also contains the reference of the RAG content. The project proposal document can be Downloaded in the form of a PDF document.
You can optionally add additional providers. The application supports the following providers
- Hugging Face Text Generation Inference Server
- OpenAI
- NVIDIA
Click on the Add Provider
tab to add a new provider. Fill in the details and click Add Provider
button. The provider should be added in the Providers
dropdown uder Chatbot
tab.
Follow the instructions in step 3 to generate the proposal document using the OpenAI provider.
You can provide rating to the model by clicking on the Rate the model
radio button. The rating will be captured as part of the metrics and can help the company which model to deploy in prodcution.
By default, Grafana application is deployed in llm-monitoring
namespace.To launch the Grafana Dashboard, follow the instructions below:
- Grab the credentials of Grafana Application
- Navigate to Workloads --> Secrets
- Click on the grafana-admin-credentials and copy the GF_SECURITY_ADMIN_USER, GF_SECURITY_ADMIN_PASSWORD
- Launch Grafana Dashboard
- Navigate to Networking --> Routes in the llm-monitoring namespace.
- Click on the
Location
link forgrafana-route
. - Enter the Grafana admin credentials.
- Ratings are displayed for each model.