Coder Social home page Coder Social logo

muhammadashiqameer / generativeaiexamples Goto Github PK

View Code? Open in Web Editor NEW

This project forked from nvidia/generativeaiexamples

0.0 0.0 0.0 16.91 MB

Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.

Home Page: https://nvidia.github.io/GenerativeAIExamples/latest/index.html

License: Apache License 2.0

Shell 1.47% JavaScript 0.01% Python 50.06% Go 4.76% CSS 0.05% Makefile 0.76% HTML 0.72% Smarty 0.04% Jupyter Notebook 41.14% Dockerfile 0.55% Jinja 0.44%

generativeaiexamples's Introduction

NVIDIA Generative AI Examples

documentation

Introduction

State-of-the-art Generative AI examples that are easy to deploy, test, and extend. All examples run on the high performance NVIDIA CUDA-X software stack and NVIDIA GPUs.

NVIDIA NGC

Generative AI Examples can use models and GPUs from the NVIDIA NGC: AI Development Catalog.

Sign up for a free NGC developer account to access:

  • GPU-optimized containers used in these examples
  • Release notes and developer documentation

Retrieval Augmented Generation (RAG)

A RAG pipeline embeds multimodal data -- such as documents, images, and video -- into a database connected to a LLM. RAG lets users chat with their data!

Developer RAG Examples

The developer RAG examples run on a single VM. The examples demonstrate how to combine NVIDIA GPU acceleration with popular LLM programming frameworks using NVIDIA's open source connectors. The examples are easy to deploy with Docker Compose.

Examples support local and remote inference endpoints. If you have a GPU, you can inference locally with TensorRT-LLM. If you don't have a GPU, you can inference and embed remotely with NVIDIA API Catalog endpoints.

Model Embedding Framework Description Multi-GPU TRT-LLM NVIDIA Endpoints Triton Vector Database
mixtral_8x7b nvolveqa_40k LangChain NVIDIA API Catalog endpoints chat bot [code, docs] No No Yes Yes Milvus or pgvector
llama-2 e5-large-v2 LlamaIndex Canonical QA Chatbot [code, docs] Yes Yes No Yes Milvus or pgvector
llama-2 all-MiniLM-L6-v2 LlamaIndex Chat bot, GeForce, Windows [repo] No Yes No No FAISS
llama-2 nvolveqa_40k LangChain Chat bot with query decomposition agent [code, docs] No No Yes Yes Milvus or pgvector
mixtral_8x7b nvolveqa_40k LangChain Minimilastic example: RAG with NVIDIA AI Foundation Models [code, README] No No Yes Yes FAISS
mixtral_8x7b
Deplot
Neva-22b
nvolveqa_40k Custom Chat bot with multimodal data [code, docs] No No Yes No Milvus or pvgector
llama-2 e5-large-v2 LlamaIndex Chat bot with quantized LLM model [docs] Yes Yes No Yes Milvus or pgvector
mixtral_8x7b none PandasAI Chat bot with structured data [code, docs] No No Yes No none
llama-2 nvolveqa_40k LangChain Chat bot with multi-turn conversation [code, docs] No No Yes No Milvus or pgvector

Enterprise RAG Examples

The enterprise RAG examples run as microservices distributed across multiple VMs and GPUs. These examples show how to orchestrate RAG pipelines with Kubernetes and deployed with Helm.

Enterprise RAG examples include a Kubernetes operator for LLM lifecycle management. It is compatible with the NVIDIA GPU operator that automates GPU discovery and lifecycle management in a Kubernetes cluster.

Enterprise RAG examples also support local and remote inference with TensorRT-LLM and NVIDIA API Catalog endpoints.

Model Embedding Framework Description Multi-GPU Multi-node TRT-LLM NVIDIA Endpoints Triton Vector Database
llama-2 NV-Embed-QA LlamaIndex Chat bot, Kubernetes deployment [README] No No Yes No Yes Milvus

Tools

Example tools and tutorials to enhance LLM development and productivity when using NVIDIA RAG pipelines.

Name Description NVIDIA Endpoints
Evaluation RAG evaluation using synthetic data generation and LLM-as-a-judge [code, docs] Yes
Observability Monitoring and debugging RAG pipelines [code, docs] Yes

Open Source Integrations

These are open source connectors for NVIDIA-hosted and self-hosted API endpoints. These open source connectors are maintained and tested by NVIDIA engineers.

Name Framework Chat Text Embedding Python Description
NVIDIA AI Foundation Endpoints Langchain Yes Yes Yes Easy access to NVIDIA hosted models. Supports chat, embedding, code generation, steerLM, multimodal, and RAG.
NVIDIA Triton + TensorRT-LLM Langchain Yes Yes Yes This connector allows Langchain to remotely interact with a Triton inference server over GRPC or HTTP for optimized LLM inference.
NVIDIA Triton Inference Server LlamaIndex Yes Yes No Triton inference server provides API access to hosted LLM models over gRPC.
NVIDIA TensorRT-LLM LlamaIndex Yes Yes No TensorRT-LLM provides a Python API to build TensorRT engines with state-of-the-art optimizations for LLM inference on NVIDIA GPUs.

Support, Feedback, and Contributing

We're posting these examples on GitHub to support the NVIDIA LLM community and facilitate feedback. We invite contributions via GitHub Issues or pull requests!

Known Issues

  • Some known issues are identified as TODOs in the Python code.
  • The datasets provided as part of this project are under a different license for research and evaluation purposes.
  • This project downloads and installs third-party open source software projects. Review the license terms of these open source projects before use.

generativeaiexamples's People

Contributors

chiachihchen avatar dependabot[bot] avatar dharmendrach avatar fciannella avatar harperjuanl avatar jliberma avatar mohammedpithapur avatar muhammadashiqameer avatar nealvaidya avatar nv-pranjald avatar rmkraus avatar shubhadeepd avatar sumitkbh avatar tabrizian avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.