
chatrtx's Introduction

🚀 RAG on Windows using TensorRT-LLM and LlamaIndex 🦙

ChatRTX is a demo app that lets you personalize a GPT large language model (LLM) connected to your own content: docs, notes, photos. Leveraging retrieval-augmented generation (RAG), TensorRT-LLM, and RTX acceleration, you can query a custom chatbot to quickly get contextually relevant answers. The app also accepts voice queries and lets you retrieve images matching your voice or text input. And because it all runs locally on your Windows RTX PC or workstation, you get fast and secure results. ChatRTX supports various file formats, including text, PDF, doc/docx, XML, PNG, JPG, and BMP. Simply point the application at the folder containing your files and it will load them into the library in a matter of seconds.

The AI models that are supported in this app:

  • LLaMa 2 13B
  • Mistral 7B
  • ChatGLM3 6B
  • Whisper Medium (for supporting voice input)
  • CLIP (for images)

The pipeline incorporates the above AI models, TensorRT-LLM, LlamaIndex, and the FAISS vector search library. In the sample application here, the dataset consists of recent articles sourced from NVIDIA GeForce News.

What is RAG? 🔍

Retrieval-augmented generation (RAG) for large language models (LLMs) seeks to enhance prediction accuracy by connecting the LLM to your data during inference. This approach constructs a comprehensive prompt enriched with context, historical data, and recent or relevant knowledge.
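As an illustration only (this is not the app's actual prompt template), the augmented prompt can be thought of as plain string assembly:

```
def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Schematic sketch of a RAG prompt: retrieved document text is placed
    ahead of the user's question so the LLM answers from that context."""
    context = "\n\n".join(retrieved_chunks)  # text chunks pulled from your files
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```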

Getting Started

Hardware requirement

  • ChatRTX is currently built for RTX 3xxx and RTX 4xxx series GPUs that have at least 8GB of GPU memory.
  • 50 GB of available hard disk space
  • Windows 10/11
  • Driver 535.11 or later
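You can confirm that your GPU, VRAM, and driver meet these requirements with nvidia-smi, which ships with the driver:

    nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv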

Installer

If you are using the ChatRTX installer, setup of the models selected during installation is done by the installer. You can skip the installation steps below, launch the app from the installed 'NVIDIA ChatRTX' desktop icon, and refer to the Use additional model section to add more models.

Install Prerequisites

  1. Install Python 3.10.11 and set up a virtual environment.

    • create your virtual environment (recommended)
    python3.10 -m venv ChatRTX
    
    • activate your environment
    ChatRTX\Scripts\activate
    

    You can also use conda to create your virtual environment (optional)

    • create conda environment
    conda create -n chatrtx_env python=3.10
    
    • activate your conda environment
    conda activate chatrtx_env
    
  2. Clone the ChatRTX code repo into a local directory (%ChatRTX Folder%) using Git for Windows, and install the necessary dependencies. This directory will be the root directory for this guide.

    git clone https://github.com/NVIDIA/trt-llm-rag-windows.git
    cd trt-llm-rag-windows # root dir
    
    #install dependencies
    pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/nightly/cu121
    
  3. Install the TensorRT-LLM wheel. The wheel is already present in the wheel directory.

    cd wheel
    pip install tensorrt_llm-0.9.0-cp310-cp310-win_amd64.whl --extra-index-url https://pypi.nvidia.com --extra-index-url https://download.pytorch.org/whl/cu121
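    Optionally, sanity-check that the CUDA-enabled PyTorch build pulled in by the steps above can see the GPU:

    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"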
    
  4. Download 'ngcsdk-3.41.2-py3-none-any.whl' from here and install it using the command below. This enables downloads from NGC:

    pip install .\ngcsdk-3.41.2-py3-none-any.whl
    
  5. Microsoft MPI: Download and install Microsoft MPI. You will be prompted to choose between an exe, which installs the MPI executable, and an msi, which installs the MPI SDK. Download and install both.
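    After installing both, you can verify that mpi4py (pulled in with the dependencies above) can load the MPI runtime; this is the import that fails with a DLL error when Microsoft MPI is missing:

    python -c "from mpi4py import MPI; print(MPI.Get_library_version())"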

Setup Mistral AWQ INT4 model

In this project, we use AWQ int4 quantized models for the LLMs. Before using one, you'll need to build a TensorRT engine specific to your GPU. The steps to build the engine are below.

  1. Create a model directory for the Mistral model

    cd model
    mkdir mistral_model
    cd mistral_model
    
    #Create the relevant directories
    mkdir engine model_checkpoints tokenizer
    
  2. Download the tokenizer files into the model/mistral_model/tokenizer directory

    cd model/mistral_model/tokenizer
    
    #Use curl to download the tokenizer files
    "C:\Windows\System32\curl.exe" -L -o config.json "https://api.ngc.nvidia.com/v2/models/org/nvidia/team/llama/mistral-7b-int4-chat/1.2/files?redirect=true&path=mistral7b_hf_tokenizer/config.json"
    "C:\Windows\System32\curl.exe" -L -o tokenizer.json "https://api.ngc.nvidia.com/v2/models/org/nvidia/team/llama/mistral-7b-int4-chat/1.2/files?redirect=true&path=mistral7b_hf_tokenizer/tokenizer.json"
    "C:\Windows\System32\curl.exe" -L -o tokenizer.model "https://api.ngc.nvidia.com/v2/models/org/nvidia/team/llama/mistral-7b-int4-chat/1.2/files?redirect=true&path=mistral7b_hf_tokenizer/tokenizer.model"
    "C:\Windows\System32\curl.exe" -L -o tokenizer_config.json "https://api.ngc.nvidia.com/v2/models/org/nvidia/team/llama/mistral-7b-int4-chat/1.2/files?redirect=true&path=mistral7b_hf_tokenizer/tokenizer_config.json"
    
    
  3. Download the Mistral AWQ int4 model checkpoints into the model/mistral_model/model_checkpoints folder

    cd model/mistral_model/model_checkpoints
    
    #Use curl to download the model checkpoint files
    "C:\Windows\System32\curl.exe" -L -o config.json "https://api.ngc.nvidia.com/v2/models/org/nvidia/team/llama/mistral-7b-int4-chat/1.2/files?redirect=true&path=config.json"
    "C:\Windows\System32\curl.exe" -L -o license.txt "https://api.ngc.nvidia.com/v2/models/org/nvidia/team/llama/mistral-7b-int4-chat/1.2/files?redirect=true&path=license.txt"
    "C:\Windows\System32\curl.exe" -L -o rank0.safetensors "https://api.ngc.nvidia.com/v2/models/org/nvidia/team/llama/mistral-7b-int4-chat/1.2/files?redirect=true&path=rank0.safetensors"
    "C:\Windows\System32\curl.exe" -L -o README.txt "https://api.ngc.nvidia.com/v2/models/org/nvidia/team/llama/mistral-7b-int4-chat/1.2/files?redirect=true&path=README.txt"
    
    
  4. Build the Mistral TRT-LLM int4 AWQ Engine

    #inside the root directory
    trtllm-build --checkpoint_dir .\model\mistral_model\model_checkpoints --output_dir .\model\mistral_model\engine --gpt_attention_plugin float16 --gemm_plugin float16 --max_batch_size 1 --max_input_len 7168 --max_output_len 1024 --context_fmha=enable --paged_kv_cache=disable --remove_input_padding=disable
    

    We use the following directories, created above, in the build command:

    Name              Details
    --checkpoint_dir  TRT-LLM checkpoints directory
    --output_dir      TRT-LLM engine directory

    Refer to the TRT-LLM repository to learn more about the various commands and parameters.

Setup Whisper medium INT8 model

  1. Create the directories to store the Whisper model

    cd model
    mkdir whisper
    cd whisper
    
    #Create the relevant directories
    mkdir whisper_assets whisper_medium_int8_engine
    
  2. Download model weights and tokenizer

    cd model/whisper/whisper_assets
    
    #Use curl to download the tokenizer and model weights files 
    "C:\Windows\System32\curl.exe" -L -o mel_filters.npz "https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/mel_filters.npz"
    "C:\Windows\System32\curl.exe" -L -o multilingual.tiktoken "https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/multilingual.tiktoken"
    "C:\Windows\System32\curl.exe" -L -o medium.pt "https://openaipublic.azureedge.net/main/whisper/models/345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1/medium.pt"
    
  3. Build command

    # call command from root_dir
    python .\whisper\build_files\build.py --output_dir .\model\whisper\whisper_medium_int8_engine --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin --enable_context_fmha --max_batch_size 1 --max_beam_width 1 --model_name medium --use_weight_only --model_dir .\model\whisper\whisper_assets
    

    We use the following directories, created above, in the build command:

    Name          Details
    --model_dir   Whisper model weights directory (whisper_assets)
    --output_dir  TRT-LLM engine directory

    Refer to the TRT-LLM repository to learn more about the various commands and parameters.

Get Embedding Model:

  1. Create the directory structure below in the model folder

    cd model
    mkdir multilingual-e5-base
    
  2. Download the 'multilingual-e5-base' embedding model files listed below from here

    files to download: 1_Pooling/config.json, commit.txt, config.json, model.safetensors, modules.json, README.md, sentence_bert_config.json, sentencepiece.bpe.model, special_tokens_map.json, tokenizer.json, tokenizer_config.json
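    The link above is the authoritative source for these files. As an alternative sketch, the same model is also published on Hugging Face as 'intfloat/multilingual-e5-base' (an assumption on our part; commit.txt is specific to the original download location and is not mirrored there), so most of the files could be fetched with huggingface_hub:

```
from huggingface_hub import snapshot_download

# Sketch only: assumes the Hugging Face mirror "intfloat/multilingual-e5-base"
# matches the files listed above (commit.txt comes from the original source).
snapshot_download(
    repo_id="intfloat/multilingual-e5-base",
    local_dir="model/multilingual-e5-base",
    allow_patterns=[
        "1_Pooling/config.json", "config.json", "model.safetensors", "modules.json",
        "README.md", "sentence_bert_config.json", "sentencepiece.bpe.model",
        "special_tokens_map.json", "tokenizer.json", "tokenizer_config.json",
    ],
)
```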

Building the above two models is sufficient to run the app. Other models can be downloaded and built after running the app.

Deploying the App

  • Run App

Running the commands below will launch the app UI in your browser:

``` 
# call command from root_dir

python verify_install.py

python app.py

``` 

You can refer to the User Guide for additional information on using the app.

  • Use additional model

  1. In the app UI that launches in the browser after running app.py, click 'Add new models' in the 'AI model' section.
  2. Select the model from the drop-down list, read the model license, and check the 'License' box.
  3. Click the 'Download models' icon to start downloading the model files in the background.
  4. After the download finishes, click the newly appearing 'Install' button. This will build the TRT-LLM engine files if necessary.
  5. The installed model will now show up in the 'Select AI model' drop-down list.
  • Deleting model

If a model is no longer needed, it can be removed by:

  1. Clicking on the gear icon on the top right of the UI.
  2. Clicking on 'Delete AI model' icon adjacent to the model name.

Using your own data

  • By default, this app loads data from the dataset/ directory into the vector store. To use your own data, select the folder in the 'Dataset' section of the UI (see the sketch below).
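For reference, here is a minimal sketch of how a folder can be indexed with LlamaIndex and FAISS, the same libraries the pipeline uses. This is not the app's own code; the API shown matches llama-index 0.9.x, and the local embedding model name is an assumption:

```
import faiss
from llama_index import ServiceContext, SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores import FaissVectorStore

# Embed locally (768-dim vectors for multilingual-e5-base); requires sentence-transformers.
service_context = ServiceContext.from_defaults(
    llm=None, embed_model="local:intfloat/multilingual-e5-base"
)

documents = SimpleDirectoryReader("dataset").load_data()

vector_store = FaissVectorStore(faiss_index=faiss.IndexFlatL2(768))
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, service_context=service_context
)

# Retrieve the chunks most relevant to a question; in the app these chunks are
# then passed to the TensorRT-LLM engine to generate the answer.
for hit in index.as_retriever(similarity_top_k=3).retrieve("What is Portal Prelude RTX?"):
    print(hit.score, hit.node.get_content()[:120])
```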

Known Issues and Limitations

The following known issues exist in the current version:

  • The app currently works with the Microsoft Edge and Google Chrome browsers. Due to a bug, the application does not work with the Firefox browser.
  • The app does not remember context. This means follow up questions will not be answered based on the context of the previous questions. For example, if you previously asked “What is the price of the RTX 4080 Super?” and follow that up with “What are its hardware specifications?”, the app will not know that you are asking about the RTX 4080 Super.
  • The source file attribution in the response is not always correct.
  • In the unlikely case that the app gets stuck in an unusable state that cannot be resolved by restarting, deleting the preferences.json file (by default located at C:\Users\<user>\AppData\Local\NVIDIA\ChatRTX\RAG\trt-llm-rag-windows-main\config\preferences.json) and restarting often fixes it; the delete command is shown below.
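For that last case, the file can be deleted from a Command Prompt (%LOCALAPPDATA% normally resolves to C:\Users\<user>\AppData\Local):

    del "%LOCALAPPDATA%\NVIDIA\ChatRTX\RAG\trt-llm-rag-windows-main\config\preferences.json"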

This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.

chatrtx's People

Contributors

akaanupkumar, anujj, blsharda, eltociear, kedarpotdar-nv, shishirgoyal85


chatrtx's Issues

CHM file support

Please add CHM support. PrivateGPT also didn't have support for it before, but they now have it with this: langchain-ai/langchain#15519

I see Chat with RTX (and all the underlying tech) as a better alternative to PrivateGPT: much faster and more stable.
CHM files are pretty useful since a lot of documentation uses the format, so I could just feed it all my software documentation.

Thank you.

garbage output ?


Hi RAG team, many thanks for this demo work.

I wonder what's wrong here: why am I getting only meaningless output?

The setup is as follows:

  1. download hf checkpoint: llama2-13b-chat-hf
  2. build trt-llm engine as following:
python3 convert_checkpoint.py  --model_dir /workspace/llama2/Llama-2-13b-chat-hf/ --output_dir /workspace/llama2/engine --dtype float16  --use_weight_only --weight_only_precision int4

trtllm-build --checkpoint_dir /workspace/llama2/engine --output_dir /workspace/llama2/engine --gemm_plugin float16 --max_input_len 15360 --max_output_len 1024 --max_batch_size 1 

thanks for helping

I had to install Microsoft MPI to be able to run it, otherwise it was throwing an error

Installed it from here: https://www.microsoft.com/en-us/download/details.aspx?id=105289

Here's the error I was getting:

Traceback (most recent call last):
File "C:\Users\Phoenix\AppData\Local\NVIDIA\ChatWithRTX\RAG\trt-llm-rag-windows-main\app.py", line 101, in
llm = TrtLlmAPI(
File "C:\Users\Phoenix\AppData\Local\NVIDIA\ChatWithRTX\RAG\trt-llm-rag-windows-main\trt_llama_api.py", line 106, in init
runtime_rank = tensorrt_llm.mpi_rank()
File "C:\Users\Phoenix\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\tensorrt_llm_utils.py", line 221, in mpi_rank
return mpi_comm().Get_rank()
File "C:\Users\Phoenix\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\tensorrt_llm_utils.py", line 216, in mpi_comm
from mpi4py import MPI
ImportError: DLL load failed while importing MPI: The specified module could not be found.

ChatWithRTX Installer is corrupt.

Sorry if this isn't the correct place to post this, but I don't know where else to notify the dev team about it. The zip file containing the prebuilt installer from NVIDIA's web page (https://www.nvidia.com/en-us/ai-on-rtx/chat-with-rtx-generative-ai/) is corrupt. I and others have tried downloading the zip file multiple times, and it still errors out saying that the file is corrupt.

There's a whole Reddit thread with others having the same issue: https://www.reddit.com/r/ChatWithRTX/comments/1avw4vn/corrupt_zip_file/

Some questions crash the app

I am using a custom dataset, RTX 4090, the trt-llm-0.7.1 branch (39a4bd5 specifically) with an engine built as per instructions.

It answers some questions just fine, i.e. it works most of the time. But some questions crash the app with the following in the console:

Traceback (most recent call last):
  File "/home/username/.distrobox/homes/deepstream64/.local/lib/python3.10/site-packages/gradio/routes.py", line 507, in predict
    output = await route_utils.call_process_api(
  File "/home/username/.distrobox/homes/deepstream64/.local/lib/python3.10/site-packages/gradio/route_utils.py", line 219, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/username/.distrobox/homes/deepstream64/.local/lib/python3.10/site-packages/gradio/blocks.py", line 1437, in process_api
    result = await self.call_function(
  File "/home/username/.distrobox/homes/deepstream64/.local/lib/python3.10/site-packages/gradio/blocks.py", line 1107, in call_function
    prediction = await fn(*processed_input)
  File "/home/username/.distrobox/homes/deepstream64/.local/lib/python3.10/site-packages/gradio/utils.py", line 616, in async_wrapper
    response = await f(*args, **kwargs)
  File "/home/username/.distrobox/homes/deepstream64/.local/lib/python3.10/site-packages/gradio/chat_interface.py", line 417, in _submit_fn
    response = await anyio.to_thread.run_sync(
  File "/home/username/.distrobox/homes/deepstream64/.local/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/username/.distrobox/homes/deepstream64/.local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/home/username/.distrobox/homes/deepstream64/.local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/home/username/.distrobox/homes/deepstream64/trt-llm-rag-windows/app.py", line 94, in chatbot
    response = query_engine.query(query)
  File "/home/username/.distrobox/homes/deepstream64/.local/lib/python3.10/site-packages/llama_index/indices/query/base.py", line 23, in query
    response = self._query(str_or_query_bundle)
  File "/home/username/.distrobox/homes/deepstream64/.local/lib/python3.10/site-packages/llama_index/query_engine/retriever_query_engine.py", line 178, in _query
    response = self._response_synthesizer.synthesize(
  File "/home/username/.distrobox/homes/deepstream64/.local/lib/python3.10/site-packages/llama_index/response_synthesizers/base.py", line 128, in synthesize
    response_str = self.get_response(
  File "/home/username/.distrobox/homes/deepstream64/.local/lib/python3.10/site-packages/llama_index/response_synthesizers/compact_and_refine.py", line 34, in get_response
    response = super().get_response(
  File "/home/username/.distrobox/homes/deepstream64/.local/lib/python3.10/site-packages/llama_index/response_synthesizers/refine.py", line 116, in get_response
    response = self._give_response_single(
  File "/home/username/.distrobox/homes/deepstream64/.local/lib/python3.10/site-packages/llama_index/response_synthesizers/refine.py", line 175, in _give_response_single
    StructuredRefineResponse, program(context_str=cur_text_chunk)
  File "/home/username/.distrobox/homes/deepstream64/.local/lib/python3.10/site-packages/llama_index/response_synthesizers/refine.py", line 56, in __call__
    answer = self._llm_predictor.predict(
  File "/home/username/.distrobox/homes/deepstream64/.local/lib/python3.10/site-packages/llama_index/llm_predictor/base.py", line 149, in predict
    response = self._llm.complete(formatted_prompt)
  File "/home/username/.distrobox/homes/deepstream64/.local/lib/python3.10/site-packages/llama_index/llms/base.py", line 277, in wrapped_llm_predict
    f_return_val = f(_self, *args, **kwargs)
  File "/home/username/.distrobox/homes/deepstream64/trt-llm-rag-windows/trt_llama_api.py", line 230, in complete
    output_ids = self._model.decode(input_ids, input_lengths, self._sampling_config)
  File "/home/username/.distrobox/homes/deepstream64/.local/lib/python3.10/site-packages/tensorrt_llm/runtime/generation.py", line 704, in wrapper
    ret = func(self, *args, **kwargs)
  File "/home/username/.distrobox/homes/deepstream64/.local/lib/python3.10/site-packages/tensorrt_llm/runtime/generation.py", line 2271, in decode
    return self.decode_regular(
  File "/home/username/.distrobox/homes/deepstream64/.local/lib/python3.10/site-packages/tensorrt_llm/runtime/generation.py", line 2004, in decode_regular
    should_stop, next_step_tensors, tasks, context_lengths, host_context_lengths, attention_mask, logits, encoder_input_lengths = self.handle_per_step(
  File "/home/username/.distrobox/homes/deepstream64/.local/lib/python3.10/site-packages/tensorrt_llm/runtime/generation.py", line 1739, in handle_per_step
    raise RuntimeError('Executing TRT engine failed!')
RuntimeError: Executing TRT engine failed!
[02/21/2024-00:08:50] [TRT] [E] 3: [executionContext.cpp::setInputShape::2309] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2309, condition: satisfyProfile Runtime dimension does not satisfy any optimization profile.)
[... the same setInputShape error is repeated about 40 more times ...]
[02/21/2024-00:08:50] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[... the same resolveSlots error is repeated about 40 more times ...]

Any idea what could be the issue?

I am building a software like this right now, and finding this is depressing

I have an enthusiast project I am working on right now with Omdena AI that operates essentially identically to this software. I was never looking for compensation or anything; I planned on building freeware that any developer can use. You just cranked it out while I was still doing project proposals. Now that I am three weeks into the project, I find this, and I am upset. My only consolation is that you published it faster than I did because you are so rich you can pay thousands of developers.

My only concern about your code is that it's a black box, not a white box. I am leaving all my code open, along with documentation on how to tweak every fine detail, and making it easier to install than this.

I don't think all these GUIs are necessary. We are just doing a Streamlit UI that any developer can come along and put into their own game; we use RAG, "in character" speech, and all these context tweaks. It's really coming along well. But I feel bad now.

Have a nice day.

invalid session error

After a normal installation,

I tried to open the program (it only ran automatically after setup, because I wasn't able to find any installed program, command, or shortcut to run it; another huge enhancement opportunity there), and the program in the browser throws an invalid session error, even in incognito mode on different browsers.

Tested on Windows 11.
RTX 3080
64 GB RAM

not able to debug this even with gdb

(trtllm) anil@anil-gpu2:/media/anil/New Volume/nihal/mlr_chat$ python3 app.py
Invalid MIT-MAGIC-COOKIE-1 key[anil-gpu2:47878] *** Process received signal ***
[anil-gpu2:47878] Signal: Segmentation fault (11)
[anil-gpu2:47878] Signal code: Address not mapped (1)
[anil-gpu2:47878] Failing at address: 0x440000e9
[anil-gpu2:47878] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f5a86cc0420]
[anil-gpu2:47878] [ 1] /lib/x86_64-linux-gnu/libmpi.so.40(PMPI_Comm_set_errhandler+0x47)[0x7f582b18efc7]
[anil-gpu2:47878] [ 2] /home/anil/miniconda3/envs/trtllm/lib/python3.10/site-packages/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so(+0x9abf0)[0x7f5815d2dbf0]
[anil-gpu2:47878] [ 3] /home/anil/miniconda3/envs/trtllm/lib/python3.10/site-packages/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so(+0x2decf)[0x7f5815cc0ecf]
[anil-gpu2:47878] [ 4] python3(PyModule_ExecDef+0x70)[0x597d40]
[anil-gpu2:47878] [ 5] python3[0x5990c9]
[anil-gpu2:47878] [ 6] python3[0x4fd37b]
[anil-gpu2:47878] [ 7] python3(_PyEval_EvalFrameDefault+0x5a74)[0x4f37a4]
[anil-gpu2:47878] [ 8] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:47878] [ 9] python3(_PyEval_EvalFrameDefault+0x4b26)[0x4f2856]
[anil-gpu2:47878] [10] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:47878] [11] python3(_PyEval_EvalFrameDefault+0x731)[0x4ee461]
[anil-gpu2:47878] [12] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:47878] [13] python3(_PyEval_EvalFrameDefault+0x31f)[0x4ee04f]
[anil-gpu2:47878] [14] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:47878] [15] python3(_PyEval_EvalFrameDefault+0x31f)[0x4ee04f]
[anil-gpu2:47878] [16] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:47878] [17] python3[0x4fd514]
[anil-gpu2:47878] [18] python3(_PyObject_CallMethodIdObjArgs+0x137)[0x50c327]
[anil-gpu2:47878] [19] python3(PyImport_ImportModuleLevelObject+0x525)[0x50b685]
[anil-gpu2:47878] [20] python3[0x517454]
[anil-gpu2:47878] [21] python3[0x4fd907]
[anil-gpu2:47878] [22] python3(PyObject_Call+0x209)[0x50a259]
[anil-gpu2:47878] [23] python3(_PyEval_EvalFrameDefault+0x5a74)[0x4f37a4]
[anil-gpu2:47878] [24] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:47878] [25] python3(_PyEval_EvalFrameDefault+0x31f)[0x4ee04f]
[anil-gpu2:47878] [26] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:47878] [27] python3[0x4fd514]
[anil-gpu2:47878] [28] python3(_PyObject_CallMethodIdObjArgs+0x137)[0x50c327]
[anil-gpu2:47878] [29] python3(PyImport_ImportModuleLevelObject+0x9da)[0x50bb3a]
[anil-gpu2:47878] *** End of error message ***
Segmentation fault (core dumped)

# environment versions

_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
absl-py 2.1.0 pypi_0 pypi
accelerate 0.20.3 pypi_0 pypi
aiofiles 23.2.1 pypi_0 pypi
aiohttp 3.9.3 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
alembic 1.13.1 pypi_0 pypi
altair 5.2.0 pypi_0 pypi
annotated-types 0.6.0 pypi_0 pypi
anyio 3.7.1 pypi_0 pypi
async-timeout 4.0.3 pypi_0 pypi
attrs 23.2.0 pypi_0 pypi
beautifulsoup4 4.12.3 pypi_0 pypi
blas 1.0 mkl
build 1.1.1 pypi_0 pypi
bzip2 1.0.8 h5eee18b_5
ca-certificates 2024.2.2 hbcca054_0 conda-forge
certifi 2024.2.2 pyhd8ed1ab_0 conda-forge
charset-normalizer 2.0.4 pyhd3eb1b0_0
click 8.1.7 pypi_0 pypi
colorama 0.4.6 pypi_0 pypi
colored 2.2.4 pypi_0 pypi
coloredlogs 15.0.1 pypi_0 pypi
contourpy 1.2.0 pypi_0 pypi
ctransformers 0.2.26 pypi_0 pypi
cuda-cudart 12.1.105 0 nvidia
cuda-cupti 12.1.105 0 nvidia
cuda-libraries 12.1.0 0 nvidia
cuda-nvrtc 12.1.105 0 nvidia
cuda-nvtx 12.1.105 0 nvidia
cuda-opencl 12.4.99 0 nvidia
cuda-python 12.2.0 pypi_0 pypi
cuda-runtime 12.1.0 0 nvidia
cycler 0.12.1 pypi_0 pypi
cython 3.0.9 pypi_0 pypi
dataclasses-json 0.6.4 pypi_0 pypi
datasets 2.14.6 pypi_0 pypi
deprecated 1.2.14 pypi_0 pypi
diffusers 0.15.0 pypi_0 pypi
dill 0.3.7 pypi_0 pypi
distro 1.9.0 pypi_0 pypi
docx2txt 0.8 pypi_0 pypi
environs 9.5.0 pypi_0 pypi
evaluate 0.4.1 pypi_0 pypi
exceptiongroup 1.2.0 pypi_0 pypi
faiss-cpu 1.7.4 pypi_0 pypi
fastapi 0.110.0 pypi_0 pypi
ffmpeg 4.3 hf484d3e_0 pytorch
ffmpy 0.3.2 pypi_0 pypi
filelock 3.13.1 py310h06a4308_0
flask 2.2.3 pypi_0 pypi
flask-marshmallow 0.15.0 pypi_0 pypi
flask-migrate 4.0.4 pypi_0 pypi
flask-sqlalchemy 3.0.3 pypi_0 pypi
flatbuffers 24.3.7 pypi_0 pypi
fonttools 4.50.0 pypi_0 pypi
freetype 2.12.1 h4a9f257_0
frozenlist 1.4.1 pypi_0 pypi
fsspec 2023.10.0 pypi_0 pypi
gmp 6.2.1 h295c915_3
gmpy2 2.1.2 py310heeb90bb_0
gnutls 3.6.15 he1e5248_0
gradio 4.14.0 pypi_0 pypi
gradio-client 0.8.0 pypi_0 pypi
greenlet 3.0.3 pypi_0 pypi
grpcio 1.56.0 pypi_0 pypi
h11 0.14.0 pypi_0 pypi
httpcore 1.0.4 pypi_0 pypi
httpx 0.27.0 pypi_0 pypi
huggingface-hub 0.21.4 pypi_0 pypi
humanfriendly 10.0 pypi_0 pypi
idna 3.4 py310h06a4308_0
importlib-metadata 7.1.0 pypi_0 pypi
importlib-resources 6.4.0 pypi_0 pypi
intel-openmp 2023.1.0 hdb19cb5_46306
itsdangerous 2.1.2 pypi_0 pypi
janus 1.0.0 pypi_0 pypi
jinja2 3.1.3 py310h06a4308_0
joblib 1.3.2 pypi_0 pypi
jpeg 9e h5eee18b_1
jsonpatch 1.33 pypi_0 pypi
jsonpointer 2.4 pypi_0 pypi
jsonschema 4.21.1 pypi_0 pypi
jsonschema-specifications 2023.12.1 pypi_0 pypi
kiwisolver 1.4.5 pypi_0 pypi
lame 3.100 h7b6447c_0
langchain 0.0.310 pypi_0 pypi
langsmith 0.0.43 pypi_0 pypi
lark 1.1.9 pypi_0 pypi
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.38 h1181459_1
lerc 3.0 h295c915_0
libcublas 12.1.0.26 0 nvidia
libcufft 11.0.2.4 0 nvidia
libcufile 1.9.0.20 0 nvidia
libcurand 10.3.5.119 0 nvidia
libcusolver 11.4.4.55 0 nvidia
libcusparse 12.0.2.55 0 nvidia
libdeflate 1.17 h5eee18b_1
libffi 3.4.4 h6a678d5_0
libgcc-ng 11.2.0 h1234567_1
libgfortran-ng 7.5.0 h14aa051_20 conda-forge
libgfortran4 7.5.0 h14aa051_20 conda-forge
libgomp 11.2.0 h1234567_1
libiconv 1.16 h7f8727e_2
libidn2 2.3.4 h5eee18b_0
libjpeg-turbo 2.0.0 h9bf148f_0 pytorch
libnpp 12.0.2.50 0 nvidia
libnvjitlink 12.1.105 0 nvidia
libnvjpeg 12.1.1.14 0 nvidia
libpng 1.6.39 h5eee18b_0
libstdcxx-ng 11.2.0 h1234567_1
libtasn1 4.19.0 h5eee18b_0
libtiff 4.5.1 h6a678d5_0
libunistring 0.9.10 h27cfd23_0
libuuid 1.41.5 h5eee18b_0
libwebp-base 1.3.2 h5eee18b_0
llama-index 0.9.27 pypi_0 pypi
llvm-openmp 14.0.6 h9e868ea_0
lz4-c 1.9.4 h6a678d5_0
mako 1.3.2 pypi_0 pypi
markdown-it-py 3.0.0 pypi_0 pypi
markupsafe 2.1.3 py310h5eee18b_0
marshmallow 3.21.1 pypi_0 pypi
matplotlib 3.8.3 pypi_0 pypi
mdurl 0.1.2 pypi_0 pypi
mkl 2023.1.0 h213fc3f_46344
mkl-service 2.4.0 py310h5eee18b_1
mkl_fft 1.3.8 py310h5eee18b_0
mkl_random 1.2.4 py310hdb19cb5_0
mpc 1.1.0 h10f8cd9_1
mpfr 4.0.2 hb69a4c5_1
mpi 1.0 mpich conda-forge
mpi4py 3.1.4 py310hfc96bbd_0
mpich 3.3.2 hc856adb_0
mpmath 1.3.0 py310h06a4308_0
multidict 6.0.5 pypi_0 pypi
multiprocess 0.70.15 pypi_0 pypi
mypy-extensions 1.0.0 pypi_0 pypi
ncurses 6.4 h6a678d5_0
nest-asyncio 1.6.0 pypi_0 pypi
nettle 3.7.3 hbbd107a_1
networkx 3.1 py310h06a4308_0
ninja 1.11.1.1 pypi_0 pypi
nltk 3.8.1 pypi_0 pypi
numpy 1.24.0 pypi_0 pypi
nvidia-ammo 0.7.4 pypi_0 pypi
nvidia-cublas-cu12 12.1.3.1 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.1.105 pypi_0 pypi
nvidia-cudnn-cu12 8.9.2.26 pypi_0 pypi
nvidia-cufft-cu12 11.0.2.54 pypi_0 pypi
nvidia-curand-cu12 10.3.2.106 pypi_0 pypi
nvidia-cusolver-cu12 11.4.5.107 pypi_0 pypi
nvidia-cusparse-cu12 12.1.0.106 pypi_0 pypi
nvidia-nccl-cu12 2.18.1 pypi_0 pypi
nvidia-nvjitlink-cu12 12.4.99 pypi_0 pypi
nvidia-nvtx-cu12 12.1.105 pypi_0 pypi
onnx 1.14.1 pypi_0 pypi
onnx-graphsurgeon 0.3.27 pypi_0 pypi
onnxruntime 1.16.3 pypi_0 pypi
openai 1.14.2 pypi_0 pypi
openh264 2.1.1 h4ff587b_0
openjpeg 2.4.0 h3ad879b_0
openssl 3.0.13 h7f8727e_0
optimum 1.17.1 pypi_0 pypi
orjson 3.9.15 pypi_0 pypi
packaging 24.0 pypi_0 pypi
pandas 2.0.3 pypi_0 pypi
pillow 10.2.0 py310h5eee18b_0
pip 23.3.1 py310h06a4308_0
polygraphy 0.49.0 pypi_0 pypi
protobuf 5.26.0 pypi_0 pypi
psutil 5.9.7 pypi_0 pypi
py-cpuinfo 9.0.0 pypi_0 pypi
pyarrow 15.0.2 pypi_0 pypi
pyarrow-hotfix 0.6 pypi_0 pypi
pydantic 2.3.0 pypi_0 pypi
pydantic-core 2.6.3 pypi_0 pypi
pydantic-settings 2.0.3 pypi_0 pypi
pydub 0.25.1 pypi_0 pypi
pygments 2.17.2 pypi_0 pypi
pymilvus 2.3.0 pypi_0 pypi
pynvml 11.5.0 pypi_0 pypi
pyparsing 3.1.2 pypi_0 pypi
pypdf 3.15.5 pypi_0 pypi
pypdf2 3.0.1 pypi_0 pypi
pyproject-hooks 1.0.0 pypi_0 pypi
python 3.10.14 h955ad1f_0
python-dateutil 2.9.0.post0 pypi_0 pypi
python-dotenv 1.0.1 pypi_0 pypi
python-multipart 0.0.9 pypi_0 pypi
pytorch-cuda 12.1 ha16c6d3_5 pytorch
pytorch-mutex 1.0 cuda pytorch
pytube 15.0.0 pypi_0 pypi
pytz 2024.1 pypi_0 pypi
pyyaml 6.0.1 py310h5eee18b_0
readline 8.2 h5eee18b_0
referencing 0.34.0 pypi_0 pypi
regex 2023.12.25 pypi_0 pypi
requests 2.31.0 py310h06a4308_1
responses 0.18.0 pypi_0 pypi
rich 13.7.1 pypi_0 pypi
rouge-score 0.1.2 pypi_0 pypi
rpds-py 0.18.0 pypi_0 pypi
safetensors 0.4.2 pypi_0 pypi
scikit-learn 1.4.1.post1 pypi_0 pypi
scipy 1.12.0 pypi_0 pypi
semantic-version 2.10.0 pypi_0 pypi
sentence-transformers 2.2.2 pypi_0 pypi
sentencepiece 0.1.99 pypi_0 pypi
setuptools 68.2.2 py310h06a4308_0
shellingham 1.5.4 pypi_0 pypi
six 1.16.0 pypi_0 pypi
sniffio 1.3.1 pypi_0 pypi
soupsieve 2.5 pypi_0 pypi
sqlalchemy 2.0.28 pypi_0 pypi
sqlite 3.41.2 h5eee18b_0
starlette 0.36.3 pypi_0 pypi
sympy 1.12 py310h06a4308_0
tbb 2021.8.0 hdb19cb5_0
tenacity 8.2.3 pypi_0 pypi
tensorrt 9.2.0.post12.dev5 pypi_0 pypi
tensorrt-bindings 9.2.0.post12.dev5 pypi_0 pypi
tensorrt-libs 9.2.0.post12.dev5 pypi_0 pypi
tensorrt-llm 0.7.1 pypi_0 pypi
threadpoolctl 3.4.0 pypi_0 pypi
tiktoken 0.3.3 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
tokenizers 0.13.4rc3 pypi_0 pypi
tomli 2.0.1 pypi_0 pypi
tomlkit 0.12.0 pypi_0 pypi
toolz 0.12.1 pypi_0 pypi
torch 2.1.2 pypi_0 pypi
torchaudio 2.2.1 py310_cu121 pytorch
torchvision 0.17.1 py310_cu121 pytorch
tqdm 4.66.2 pypi_0 pypi
transformers 4.33.1 pypi_0 pypi
triton 2.1.0 pypi_0 pypi
typer 0.9.0 pypi_0 pypi
typing-inspect 0.9.0 pypi_0 pypi
typing_extensions 4.9.0 py310h06a4308_1
tzdata 2024.1 pypi_0 pypi
ujson 5.9.0 pypi_0 pypi
urllib3 2.1.0 py310h06a4308_0
uvicorn 0.29.0 pypi_0 pypi
websockets 11.0.3 pypi_0 pypi
werkzeug 3.0.1 pypi_0 pypi
wheel 0.41.2 py310h06a4308_0
wrapt 1.16.0 pypi_0 pypi
xxhash 3.4.1 pypi_0 pypi
xz 5.4.6 h5eee18b_0
yaml 0.2.5 h7b6447c_0
yarl 1.9.4 pypi_0 pypi
youtube-transcript-api 0.6.2 pypi_0 pypi
zipp 3.18.1 pypi_0 pypi
zlib 1.2.13 h5eee18b_0
zstd 1.5.5 hc292b87_0

Can Chat with RTX support Pre-trained models as dataset input?

Hi there,

Exciting to know that we can use an LLM locally. We have created a few customized models with BERT and also GPT, and now we want to move them to run locally with RTX. But we don't want to start from scratch. Is there any way the local RTX chat framework can read our pre-trained models, so we can leverage them and keep adding more data to them?

feat: can it read repository

Is it possible to give it access to a git repository, for example a self-hosted GitLab server, and use those repos as the dataset?
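A possible workaround (not a built-in feature): since the app indexes any local folder, the repository could be cloned locally and that folder selected in the 'Dataset' section of the UI. The URL and path below are placeholders:

    git clone https://gitlab.example.com/group/project.git C:\data\project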

Incredibly unclear instructions

Was this posted with the intent of people actually using it?
Which files are the tokenizer and where do I put them? Where is the .engine file?
Has anyone actually gotten this to work, or is it fake?

Error Code 1: Serialization (Serialization assertion stdVersionRead == kSERIALIZATION_VERSION failed.Version tag does not match.

When I run the command to start the application, I get the version mismatch error:

The command:

python app.py --trt_engine_path model/ --trt_engine_name llama_float16_tp1_rank0.engine --tokenizer_dir_path Llama-2-13b-chat-hf --data_dir dataset/

The error:

Error Code 1: Serialization (Serialization assertion stdVersionRead == kSERIALIZATION_VERSION failed.Version tag does not match. Note: Current Version: 228, Serialized Engine Version: 226)
Traceback (most recent call last):
  File "C:\Users\unubi\trt-llm-rag-windows\app.py", line 63, in <module>
    llm = TrtLlmAPI(
  File "C:\Users\unubi\trt-llm-rag-windows\trt_llama_api.py", line 166, in __init__
    decoder = tensorrt_llm.runtime.GenerationSession(self._model_config,
  File "C:\Users\unubi\anaconda3\envs\myenv\lib\site-packages\tensorrt_llm\runtime\generation.py", line 457, in __init__
    self.runtime = _Runtime(engine_buffer, mapping)
  File "C:\Users\unubi\anaconda3\envs\myenv\lib\site-packages\tensorrt_llm\runtime\generation.py", line 150, in __init__
    self.__prepare(mapping, engine_buffer)
  File "C:\Users\unubi\anaconda3\envs\myenv\lib\site-packages\tensorrt_llm\runtime\generation.py", line 168, in __prepare
    assert self.engine is not None
AssertionError
Exception ignored in: <function _Runtime.__del__ at 0x000001F6C370CA60>
Traceback (most recent call last):
  File "C:\Users\unubi\anaconda3\envs\myenv\lib\site-packages\tensorrt_llm\runtime\generation.py", line 266, in __del__
    cudart.cudaFree(self.address)  # FIXME: cudaFree is None??
AttributeError: '_Runtime' object has no attribute 'address'

[TensorRT-LLM][ERROR] CUDA runtime error in cudaDeviceGetDefaultMemPool

When running NVIDIA Chat with RTX, I get the following error

RuntimeError: [TensorRT-LLM] [ERROR] CUDA runtime error in cudaDeviceGetDefaultMemPool(&memPool, device): operation not supported (C:\Users\tejaswinp\workspace\tekit\cpp\tensorrt_llm\runtime\bufferManager.cpp:171)


Windows 11 Pro with 16 GB RAM and an NVIDIA A-40-16Q, running in a VM environment.

Let me know what else I can provide to help with troubleshooting.

Not enough free memory to build TRT engine

Hi,

I want to check if anyone has been able to complete the prerequisites and run this reference app on an RTX 4070 Ti?

I get stuck on building the TRT engine step with the following error:
Requested amount of GPU memory (1024 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.

I opened issue issues/1045 on the TensorRT-LLM repo with more detailed info on my system configuration and repro steps.

I tried playing with different --max_input_len and --max_output_len values, reducing to 512, but that doesn't make any difference. I also tried following the same process but using the smaller model Llama-2-7b, which I theoretically understand should require much less dedicated GPU memory. However, I run into the exact same memory issue.

I would appreciate any suggestions on how to proceed and get this reference app running.

Fine-tune WhereIsAI/UAE-Large-V1 embeddings for Models First ?

Hi,

The embedding model used by the default ChatRTX installation is WhereIsAI/UAE-Large-V1. To use the generated embeddings with an LLM imported into ChatRTX, do I first need to fine-tune a pre-trained LLM with AnglE so that the WhereIsAI/UAE-Large-V1 embeddings are compatible with the LLM? e.g.

angle = AnglE.from_pretrained('NousResearch/Llama-2-7b-hf', pretrained_lora_path='SeanLee97/angle-llama-7b-nli-v2')

Thank you !

How to implement chat history memory and system prompt?

I see the possibility of implementing chat history and system prompt features like ChatGPT, because I found that the stream_chatbot function in app.py has unused chat_history and session_id parameters, but I don't know where to start due to the lack of development guidance and my limited programming knowledge.
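
For reference, a rough, hypothetical sketch of one way chat history and a system prompt could be folded into a single completion call. The helper names (build_prompt, chat) are invented for illustration and are not part of the ChatRTX code:

chat_history = []  # (user_message, assistant_reply) tuples kept for the session

def build_prompt(system_prompt, history, user_message):
    # Fold the system prompt and prior turns into one prompt string.
    turns = "\n".join(f"User: {u}\nAssistant: {a}" for u, a in history)
    return f"{system_prompt}\n{turns}\nUser: {user_message}\nAssistant:"

def chat(llm, system_prompt, user_message):
    prompt = build_prompt(system_prompt, chat_history, user_message)
    reply = llm.complete(prompt).text  # assumes a llama_index-style LLM exposing .complete()
    chat_history.append((user_message, reply))
    return reply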

Support for Markdown files

Just started playing around and this is awesome. Would love to see support for recursive search of Markdown files, as most of my notes from Obsidian/Logseq are plain text but with a .md ending.
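
For what it's worth, the llama_index SimpleDirectoryReader used for folder ingestion can already be pointed at .md files; a minimal sketch, assuming the accepted extension list is simply extended where the reader is constructed:

from llama_index import SimpleDirectoryReader  # import path varies with llama_index version

# Adding ".md" to the accepted extensions and enabling recursion picks up
# Obsidian/Logseq notes in nested folders.
documents = SimpleDirectoryReader(
    "dataset",
    recursive=True,
    required_exts=[".txt", ".md"],
).load_data()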

Connection error

I clean-installed this program and got this error message when I ran it. Any ideas?

App running with config
{
    "models": {
        "supported": [
            {
                "name": "Mistral 7B int4",
                "installed": true,
                "metadata": {
                    "model_path": "model\mistral\mistral7b_int4_engine",
                    "engine": "llama_float16_tp1_rank0.engine",
                    "tokenizer_path": "model\mistral\mistral7b_hf",
                    "max_new_tokens": 1024,
                    "max_input_token": 7168,
                    "temperature": 0.1
                }
            },
            {
                "name": "Llama 2 13B int4",
                "installed": true,
                "metadata": {
                    "model_path": "model\llama\llama13_int4_engine",
                    "engine": "llama_float16_tp1_rank0.engine",
                    "tokenizer_path": "model\llama\llama13_hf",
                    "max_new_tokens": 1024,
                    "max_input_token": 3900,
                    "temperature": 0.1
                }
            }
        ],
        "selected": "Mistral 7B int4"
    },
    "sample_questions": [
        {
            "query": "How does NVIDIA ACE generate emotional responses?"
        },
        {
            "query": "What is Portal prelude RTX?"
        },
        {
            "query": "What is important about Half Life 2 RTX?"
        },
        {
            "query": "When is the launch date for Ratchet & Clank: Rift Apart on PC?"
        }
    ],
    "dataset": {
        "sources": [
            "directory",
            "youtube",
            "nodataset"
        ],
        "selected": "directory",
        "path": "dataset",
        "isRelative": true
    },
    "strings": {
        "directory": "Folder Path",
        "youtube": "YouTube URL",
        "nodataset": "AI model default"
    }
}
Traceback (most recent call last):
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\urllib3\connectionpool.py", line 793, in urlopen
response = self._make_request(
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\urllib3\connectionpool.py", line 491, in _make_request
raise new_e
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\urllib3\connectionpool.py", line 467, in _make_request
self._validate_conn(conn)
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\urllib3\connectionpool.py", line 1099, in _validate_conn
conn.connect()
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\urllib3\connection.py", line 653, in connect
sock_and_verified = _ssl_wrap_socket_and_match_hostname(
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\urllib3\connection.py", line 806, in ssl_wrap_socket_and_match_hostname
ssl_sock = ssl_wrap_socket(
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\urllib3\util\ssl
.py", line 465, in ssl_wrap_socket
ssl_sock = ssl_wrap_socket_impl(sock, context, tls_in_tls, server_hostname)
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\urllib3\util\ssl
.py", line 509, in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\ssl.py", line 513, in wrap_socket
return self.sslsocket_class._create(
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\ssl.py", line 1104, in _create
self.do_handshake()
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\ssl.py", line 1375, in do_handshake
self._sslobj.do_handshake()
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\requests\adapters.py", line 486, in send
resp = conn.urlopen(
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\urllib3\connectionpool.py", line 847, in urlopen
retries = retries.increment(
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\urllib3\util\retry.py", line 470, in increment
raise reraise(type(error), error, _stacktrace)
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\urllib3\util\util.py", line 38, in reraise
raise value.with_traceback(tb)
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\urllib3\connectionpool.py", line 793, in urlopen
response = self._make_request(
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\urllib3\connectionpool.py", line 491, in _make_request
raise new_e
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\urllib3\connectionpool.py", line 467, in _make_request
self._validate_conn(conn)
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\urllib3\connectionpool.py", line 1099, in _validate_conn
conn.connect()
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\urllib3\connection.py", line 653, in connect
sock_and_verified = _ssl_wrap_socket_and_match_hostname(
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\urllib3\connection.py", line 806, in ssl_wrap_socket_and_match_hostname
ssl_sock = ssl_wrap_socket(
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\urllib3\util\ssl
.py", line 465, in ssl_wrap_socket
ssl_sock = ssl_wrap_socket_impl(sock, context, tls_in_tls, server_hostname)
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\urllib3\util\ssl
.py", line 509, in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\ssl.py", line 513, in wrap_socket
return self.sslsocket_class._create(
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\ssl.py", line 1104, in _create
self.do_handshake()
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\ssl.py", line 1375, in do_handshake
self._sslobj.do_handshake()
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\RAG\trt-llm-rag-windows-main\app.py", line 114, in
embed_model = HuggingFaceEmbeddings(model_name=embedded_model)
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\langchain\embeddings\huggingface.py", line 66, in init
self.client = sentence_transformers.SentenceTransformer(
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\sentence_transformers\SentenceTransformer.py", line 87, in init
snapshot_download(model_name_or_path,
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\sentence_transformers\util.py", line 442, in snapshot_download
model_info = _api.model_info(repo_id=repo_id, revision=revision, token=token)
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\huggingface_hub\utils_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\huggingface_hub\hf_api.py", line 2219, in model_info
r = get_session().get(path, headers=headers, timeout=timeout, params=params)
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\requests\sessions.py", line 602, in get
return self.request("GET", url, **kwargs)
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\requests\sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\requests\sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\huggingface_hub\utils_http.py", line 67, in send
return super().send(request, *args, **kwargs)
File "C:\Users\bckim\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag\lib\site-packages\requests\adapters.py", line 501, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: (ProtocolError('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None)), '(Request ID: c6321ef8-fa49-48f6-b52c-f68f29a3d51c)')
Press any key to continue . . .
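
A hedged reading of the traceback above: the app fails while sentence_transformers tries to download the embedding model from Hugging Face, and the connection is reset before startup completes. One workaround sketch, assuming the model being fetched is the default embedding model (WhereIsAI/UAE-Large-V1, as mentioned elsewhere on this page), is to pre-populate the local Hugging Face cache from a network that can reach huggingface.co:

from huggingface_hub import snapshot_download

# Populates the local Hugging Face cache; once cached, the app should not need
# to reach huggingface.co at startup.
snapshot_download(repo_id="WhereIsAI/UAE-Large-V1")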

AttributeError: 'WeightOnlyGroupwiseQuantLinear' object has no attribute 'prequant_scaling_factor'

Receiving the above error when attempting to build the TRT engine.

Using a 3090 with driver 546.33, CUDA 12.3 and tensorrt_llm-0.7.1

Traceback (most recent call last):
File "Z:\oracle\model\TensorRT-LLM\examples\llama\build.py", line 983, in
build(0, args)
File "Z:\oracle\model\TensorRT-LLM\examples\llama\build.py", line 927, in build
engine = build_rank_engine(builder, builder_config, engine_name,
File "Z:\oracle\model\TensorRT-LLM\examples\llama\build.py", line 727, in build_rank_engine
load_from_awq_llama(tensorrt_llm_llama=tensorrt_llm_llama,
File "C:\Users\andrew\anaconda3\envs\test\lib\site-packages\tensorrt_llm\models\llama\weight.py", line 1564, in load_from_awq_llama
process_and_assign_qkv_weight(prefix + awq_key_list[3],
File "C:\Users\andrew\anaconda3\envs\test\lib\site-packages\tensorrt_llm\models\llama\weight.py", line 1511, in process_and_assign_qkv_weight
mOp.prequant_scaling_factor.value = qkv_pre_quant_scale.to(
File "C:\Users\andrew\anaconda3\envs\test\lib\site-packages\tensorrt_llm\module.py", line 51, in getattr
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'WeightOnlyGroupwiseQuantLinear' object has no attribute 'prequant_scaling_factor'

pycryptodome

Hi nVidia,

Please include pycryptodome in the requirements.txt file.

Thanks

Update to Llama 3 8B model

It would be great if the LLaMa 2 13B AWQ 4-bit quantized model currently used were upgraded to the Llama 3 8B model. It can be quantized similarly. This would have several advantages:

  • The Llama 3 8B model performs significantly better on all benchmarks.
  • Being an 8B model instead of a 13B model:
    • it could reduce the VRAM requirement from 8GB to 6GB, enabling popular GPUs like the RTX
      3050, RTX 3060 Laptop and RTX 4050 Laptop to run this demo.
    • it would be more than 50% faster due to the reduction in parameter count.

The models are available at: https://huggingface.co/collections/meta-llama/meta-llama-3-66214712577ca38149ebb2b6

ModuleNotFoundError: No module named 'tensorrt'

Have this error:

Traceback (most recent call last):
File "F:\Programs\RTXChat\RAG\trt-llm-rag-windows-main\app.py", line 28, in
from trt_llama_api import TrtLlmAPI
File "F:\Programs\RTXChat\RAG\trt-llm-rag-windows-main\trt_llama_api.py", line 42, in
from utils import (DEFAULT_HF_MODEL_DIRS, DEFAULT_PROMPT_TEMPLATES,
File "F:\Programs\RTXChat\RAG\trt-llm-rag-windows-main\utils.py", line 22, in
import tensorrt_llm
File "F:\Programs\ChatWithRTX\env_nvd_rag\lib\site-packages\tensorrt_llm_init_.py", line 15, in
import tensorrt_llm.functional as functional
File "F:\Programs\ChatWithRTX\env_nvd_rag\lib\site-packages\tensorrt_llm\functional.py", line 26, in
import tensorrt as trt
ModuleNotFoundError: No module named 'tensorrt'

Not sure if it's worth mentioning, but the first install failed while building Mistral; this one completed installation successfully, it just won't launch.

Win10, RTX 3060ti, i5-12400F, installed through an exe from nvidia site.

Missing config/preferences.json and Cookie Handling Errors

Description

I encountered several issues while attempting to run a Gradio-based application. The first problem was the application's failure to start due to a missing config/preferences.json file. I addressed this issue by copying an existing config.json file to preferences.json. However, after resolving the missing file issue, I faced further errors related to cookie handling in web requests, which led to session validation failures.

Steps Taken

  1. Missing config/preferences.json File: Noticed the application failed to start due to the absence of config/preferences.json. To resolve this, I copied config.json to preferences.json, which allowed the application to start.
  2. Cookie Handling Errors: After the application started, it encountered errors during the session validation process, specifically a ValueError related to unpacking cookie key-value pairs.

Environment

  • Operating System: win11
  • Python Version: latest
  • Gradio Version: latest

Expected Behavior

The application should handle the absence of config/preferences.json gracefully, either by providing clear instructions for its creation or handling its absence without preventing the application from starting. Furthermore, cookie handling in web requests should be robust enough to manage different formats without causing session validation failures.

Actual Behavior

  1. The application did not start due to the missing config/preferences.json, which I worked around by copying config.json to preferences.json.
  2. Post-resolution, the application faced ValueError: too many values to unpack (expected 2) during session validation, indicating issues with cookie handling.

Possible Solution

  • Enhance the application's startup process to handle missing configuration files more gracefully, perhaps by providing default settings or clearer setup instructions.
  • Improve cookie parsing logic to ensure that different cookie formats do not lead to session validation errors.

Additional Context

This issue was encountered in a Conda environment specifically set up for running a Gradio application with particular machine learning models. The session validation errors occurred despite resolving the initial configuration file issue.

Linux

Why is this Windows-only?

Installed both models in Chat-With-RTX but can only use Mistral 7B int4

I followed the installation instructions and the installer reported that I have successfully installed both models - Llama and Mistral. However when I started the app-launcher.bat file, it said:

Environment path found: C:\Users\Jason\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag
App running with config
 {
    "models": {
        "supported": [
            {
                "name": "Mistral 7B int4",
                "installed": true,
                "metadata": {
                    "model_path": "model\\mistral\\mistral7b_int4_engine",
                    "engine": "llama_float16_tp1_rank0.engine",
                    "tokenizer_path": "model\\mistral\\mistral7b_hf",
                    "max_new_tokens": 1024,
                    "max_input_token": 7168,
                    "temperature": 0.1
                }
            },
            {
                "name": "Llama 2 13B int4",
                "installed": false,
                "metadata": {
                    "model_path": "model\\llama\\llama13_int4_engine",
                    "engine": "llama_float16_tp1_rank0.engine",
                    "tokenizer_path": "model\\llama\\llama13_hf",
                    "max_new_tokens": 1024,
                    "max_input_token": 3900,
                    "temperature": 0.1
                }
            }
        ],
        "selected": "Mistral 7B int4"
    },
...

It said that Llama is not installed, but it should be! What might be the possible cause of this problem? How can I fix it?
Should I follow the README file instead?
Extra information:
I am a user in China, using an (almost) new computer with an RTX 3060 GPU and 16GB of RAM.

Not able to generate engine on RTX4060 Laptop 8GB 100% CPU being utilised

I have an RTX 4060 8GB in my laptop, with 16 GB of RAM and an Intel i7-12700H CPU. When I run build-llama.sh or build-mistral.sh, the process gets killed automatically with the output below, and I found that my CPU hits 100% utilisation while the script runs. I'm attaching a screenshot of the same; kindly help me with this.

(trtllm) vishwajeet@vishwa:~/Desktop/MYGPT/trt-llm-rag-linux$ bash build-mistral.sh
You are using a model of type mistral to instantiate a model of type llama. This is not supported for all configurations of models and can yield errors.
[03/22/2024-20:50:37] [TRT-LLM] [I] Serially build TensorRT engines.
[03/22/2024-20:50:39] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2991, GPU +0, now: CPU 4121, GPU 1039 (MiB)
[03/22/2024-20:50:41] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1798, GPU +314, now: CPU 6055, GPU 1353 (MiB)
[03/22/2024-20:50:41] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[03/22/2024-20:50:41] [TRT-LLM] [W] Invalid timing cache, using freshly created one
[03/22/2024-20:50:41] [TRT-LLM] [I] [MemUsage] Rank 0 Engine build starts - Allocated Memory: Host 7.1113 (GiB) Device 1.3216 (GiB)
build-mistral.sh: line 1: 6084 Killed python build.py --model_dir './model/mistral/mistral7b_hf' --quant_ckpt_path './model/mistral/mistral7b_int4_quant_weights/mistral_tp1_rank0.npz' --dtype float16 --remove_input_padding --use_gpt_attention_plugin float16 --enable_context_fmha --use_gemm_plugin float16 --use_weight_only --weight_only_precision int4_awq --per_group --output_dir './model/mistral/mistral7b_int4_engine' --world_size 1 --tp_size 1 --parallel_build --max_input_len 1024 --max_batch_size 1 --max_output_len 1024

(screenshot attached)

Building TRT Engine on Windows 11 Results in LoraConfig.from_hf() Arguments Error

I currently run an i9 with an RTX 3080. I was trying to build a custom engine using the instructions.

I downloaded a copy of the meta-llama/Llama-2-13b-chat-hf model and the LLaMa 2 13B AWQ int4 checkpoint model.pt file. When I run the command below (I double-checked the dirs and even used direct notation, i.e., C:\users[username]...):

python .\build.py --model_dir %USERPROFILE%\inference\models\Llama-2-13b-chat-hf\ --quant_ckpt_path %USERPROFILE%\inference\models\ --dtype float16 --use_gpt_attention_plugin float16 --remove_input_padding --use_gemm_plugin float16 --use_weight_only --weight_only_precision int4_awq --per_group --enable_context_fmha --max_batch_size 1 --max_input_len 3000 --max_output_len 1024 --output_dir %USERPROFILE%\inference\models\engines\

But it always results in:

Traceback (most recent call last):
  File "C:\Users\MYNAME\inference\TensorRT-LLM\examples\llama\build.py", line 1040, in <module>
    args = parse_arguments()
  File "C:\Users\MYNAME\inference\TensorRT-LLM\examples\llama\build.py", line 627, in parse_arguments
    lora_config = LoraConfig.from_hf(args.hf_lora_dir,
TypeError: LoraConfig.from_hf() takes 3 positional arguments but 4 were given

I uninstalled all libraries, CUDA, etc. and reinstalled using the instructions... redownloaded the models and double checked versions. It still gives the same error.

Any thoughts?

ModuleNotFoundError: No module named 'tensorrt_llm.bindings'

The bindings module is missing, it seems. This results in an error when I run the command to build the TRT engine based on the instructions in the readme.

For RTX 4090 (TensorRT 9.1.0.4 & TensorRT-LLM 0.5.0), a prebuilt TRT engine is provided. For other RTX GPUs or TensorRT versions, follow these steps to build your TRT engine:

Download LLaMa 2 13B chat model from https://huggingface.co/meta-llama/Llama-2-13b-chat-hf

Download LLaMa 2 13B AWQ int4 checkpoints model.pt from here

Clone the TensorRT LLM repository:

git clone https://github.com/NVIDIA/TensorRT-LLM.git

Navigate to the examples\llama directory and run the following script:

python build.py --model_dir <path to llama13_chat model> --quant_ckpt_path <path to model.pt> --dtype float16 --use_gpt_attention_plugin float16 --use_gemm_plugin float16 --use_weight_only --weight_only_precision int4_awq --per_group --enable_context_fmha --max_batch_size 1 --max_input_len 3000 --max_output_len 1024 --output_dir <TRT engine folder>

Here is my custom build command:

python build.py --model_dir C:\Users\unubi\trt-llm-rag-windows\Llama-2-13b-chat-hf --quant_ckpt_path C:\Users\unubi\trt-llm-rag-windows\checkpoint\model.pt --dtype float16 --use_gpt_attention_plugin float16 --use_gemm_plugin float16 --use_weight_only --weight_only_precision int4_awq --per_group --enable_context_fmha --max_batch_size 1 --max_input_len 3000 --max_output_len 1024 --output_dir C:\Users\unubi\trt-llm-rag-windows\engine

Here is the full error log:


Traceback (most recent call last):
  File "C:\Users\unubi\trt-llm-rag-windows\TensorRT-LLM\examples\llama\build.py", line 39, in <module>
    import tensorrt_llm
  File "c:\users\unubi\trt-llm-rag-windows\tensorrt-llm\tensorrt_llm\__init__.py", line 29, in <module>
    from .hlapi.llm import LLM, ModelConfig
  File "c:\users\unubi\trt-llm-rag-windows\tensorrt-llm\tensorrt_llm\hlapi\__init__.py", line 1, in <module>     
    from .llm import LLM, ModelConfig
  File "c:\users\unubi\trt-llm-rag-windows\tensorrt-llm\tensorrt_llm\hlapi\llm.py", line 18, in <module>
    from ..executor import GenerationExecutor, GenerationResult
  File "c:\users\unubi\trt-llm-rag-windows\tensorrt-llm\tensorrt_llm\executor.py", line 10, in <module>
    import tensorrt_llm.bindings as tllm
ModuleNotFoundError: No module named 'tensorrt_llm.bindings'
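
A hedged observation on this traceback: the import resolves tensorrt_llm from the cloned source tree (c:\users\unubi\trt-llm-rag-windows\tensorrt-llm\tensorrt_llm\__init__.py) rather than from site-packages, and the source tree does not contain the compiled bindings extension. A quick way to confirm which copy Python picks up:

import tensorrt_llm

# If this prints a path inside the cloned TensorRT-LLM repo instead of site-packages,
# Python is importing the source tree (which lacks the compiled bindings). Run build.py
# from a directory outside the clone, or adjust sys.path so the installed wheel wins.
print(tensorrt_llm.__file__)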

build TRT engine fail

Loading weights from groupwise AWQ LLaMA checkpoint...
Traceback (most recent call last):
File "C:\Users\peter\TensorRT-LLM\examples\llama\build.py", line 906, in
build(0, args)
File "C:\Users\peter\TensorRT-LLM\examples\llama\build.py", line 850, in build
engine = build_rank_engine(builder, builder_config, engine_name,
File "C:\Users\peter\TensorRT-LLM\examples\llama\build.py", line 665, in build_rank_engine
load_func(tensorrt_llm_llama=tensorrt_llm_llama,
File "C:\Users\peter\TensorRT-LLM\examples\llama\weight.py", line 1433, in load_from_awq_llama
assert False, "Unsupported AWQ quantized checkpoint format"
AssertionError: Unsupported AWQ quantized checkpoint format

model.pt was downloaded from https://catalog.ngc.nvidia.com/orgs/nvidia/models/llama2-13b/files?version=1.2

'trtllm-build' is not recognized

Hello,

I am trying to run the command below, but I get the error 'trtllm-build' is not recognized. I noticed that the 'trtllm-build' build script is not even in the branch. Are the instructions wrong? Can someone help me resolve this error and find out how to get 'trtllm-build'? Thanks in advance.

Full Command:
trtllm-build --checkpoint_dir .\model\mistral_model\model_checkpoints --output_dir .\model\mistral_model\engine --gpt_attention_plugin float16 --gemm_plugin float16 --max_batch_size 1 --max_input_len 7168 --max_output_len 1024 --context_fmha=enable --paged_kv_cache=disable --remove_input_padding=disable
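
A hedged check, assuming the installed tensorrt_llm wheel is recent enough to ship the trtllm-build console script: confirm the wheel is present in the active environment and that the script is actually on PATH.

import shutil
import tensorrt_llm

print("tensorrt_llm version:", tensorrt_llm.__version__)
# Prints None if the trtllm-build executable is not on PATH for this environment.
print("trtllm-build found at:", shutil.which("trtllm-build"))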

Large amount of files get stuck before loading

I'm using Chat With RTX, and I'm trying to load my whole disk X, which is a mounted network disk.

It has approx. 200-300 GB of content, mostly scripts.

I left it loading the data for about 3 days and it didn't show any difference in the terminal, so I changed the SimpleDirectoryReader function to log each file it found. That's how I realized it just gets stuck after some time while still "looking" for files, not even in the load_data function but in _add_files.

I added the line:

print(f"Added {ref} to the list of files to process.", flush=True)

After a certain time running, it just stops logging and doesn't do anything, and it always stops at the same files.

It doesn't give any error, close the terminal, or anything; it just stops.

(screenshot attached)

Is there any limit to it?

It always stops at the same file count; if I remove this file, it will stop at the next one, at the same number.

Also, if there is a limit, is it possible for me to create several embeddings manually, so that each batch stays inside the limit?
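
Not an answer to whether a hard limit exists, but a hedged sketch of the batching idea from the previous paragraph: index one subfolder at a time and persist after each batch, so a hang can be narrowed to a specific folder and earlier progress is kept. The import path and mount point are illustrative and depend on the installed llama_index version:

import os

from llama_index import SimpleDirectoryReader, VectorStoreIndex  # import path varies by version

ROOT = "X:/"  # hypothetical mounted network drive

index = None
for entry in sorted(os.listdir(ROOT)):
    sub = os.path.join(ROOT, entry)
    if not os.path.isdir(sub):
        continue
    docs = SimpleDirectoryReader(sub, recursive=True).load_data()
    if index is None:
        index = VectorStoreIndex.from_documents(docs, show_progress=True)
    else:
        for doc in docs:
            index.insert(doc)
    # Persist after every subfolder so a later hang does not discard earlier work.
    index.storage_context.persist(persist_dir="storage")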

⁉️ Benefit of using ChatRTX instead of LMStudio or Ollama and other similar tools❓❔

Hello,
I'd like to genuinely understand the purpose of, and difference between, using this vs. alternative tools like Ollama, BigAGI and Anything LLM that also offer RAG, inference, etc.

Don't they also use the same NVIDIA drivers? What is the benefit of developing ChatRTX?

If I need to do inference on any of the models, should I use ollama or ChatRTX? Does ChatRTX use any specific additional feature of the NVIDIA GPUs that the other tools can't use or don't have access to?

Thanks

Dtype value read by app.py is empty. Gets error message: Unsupported dtype

I am trying to use trt-llm-rag with the Mistral 7B model.
I used int8 weight-only quantization when building the TRT engine.
The app launches but throws an error when an input is passed to the chat:

tensorrt_llm/runtime/generation.py", line 834, in dtype
return str_dtype_to_torch(self._model_config.dtype)
tensorrt_llm/_utils.py", line 149, in str_dtype_to_torch
assert ret is not None, f'Unsupported dtype: {dtype}'

The reason is that dtype is empty (== ""). This may be due to an error in reading the config file.

Here is the config.json for the engine:

{
"version": "0.9.0.dev2024040900",
"pretrained_config": {
"architecture": "MistralForCausalLM",
"dtype": "float16",
"logits_dtype": "float32",
"vocab_size": 32000,
"max_position_embeddings": 32768,
"hidden_size": 4096,
"num_hidden_layers": 32,
"num_attention_heads": 32,
"num_key_value_heads": 8,
"head_size": 128,
"hidden_act": "silu",
"intermediate_size": 14336,
"norm_epsilon": 1e-05,
"position_embedding_type": "rope_gpt_neox",
"use_parallel_embedding": false,
"embedding_sharding_dim": 0,
"share_embedding_table": false,
"mapping": {
"world_size": 1,
"tp_size": 1,
"pp_size": 1
},
"quantization": {
"quant_algo": "W8A16",
"kv_cache_quant_algo": null,
"group_size": 128,
"smoothquant_val": null,
"has_zero_point": false,
"pre_quant_scale": false,
"exclude_modules": [
"lm_head"
]
},
"kv_dtype": "float16",
"rotary_scaling": null,
"moe_normalization_mode": null,
"rotary_base": 1000000.0,
"moe_num_experts": 0,
"moe_top_k": 0,
"moe_tp_mode": 2,
"attn_bias": false,
"disable_weight_only_quant_plugin": false,
"mlp_bias": false
},
"build_config": {
"max_input_len": 1024,
"max_output_len": 1024,
"max_batch_size": 1,
"max_beam_width": 1,
"max_num_tokens": 1024,
"opt_num_tokens": 1,
"max_prompt_embedding_table_size": 0,
"gather_context_logits": false,
"gather_generation_logits": false,
"strongly_typed": false,
"builder_opt": null,
"profiling_verbosity": "layer_names_only",
"enable_debug_output": false,
"max_draft_len": 0,
"use_refit": false,
"input_timing_cache": null,
"output_timing_cache": "model.cache",
"lora_config": {
"lora_dir": [],
"lora_ckpt_source": "hf",
"max_lora_rank": 64,
"lora_target_modules": [],
"trtllm_modules_to_hf_modules": {}
},
"auto_parallel_config": {
"world_size": 1,
"gpus_per_node": 8,
"cluster_key": "A40",
"cluster_info": null,
"sharding_cost_model": "alpha_beta",
"comm_cost_model": "alpha_beta",
"enable_pipeline_parallelism": false,
"enable_shard_unbalanced_shape": false,
"enable_shard_dynamic_shape": false,
"enable_reduce_scatter": true,
"builder_flags": null,
"debug_mode": false,
"infer_shape": true,
"validation_mode": false,
"same_buffer_io": {
"past_key_value_(\d+)": "present_key_value_\1"
},
"same_spec_io": {},
"sharded_io_allowlist": [
"past_key_value_\d+",
"present_key_value_\d*"
],
"fast_reduce": true,
"fill_weights": false,
"parallel_config_cache": null,
"profile_cache": null,
"dump_path": null,
"debug_outputs": []
},
"weight_sparsity": false,
"use_fused_mlp": false,
"plugin_config": {
"bert_attention_plugin": "float16",
"gpt_attention_plugin": "float16",
"gemm_plugin": "float16",
"smooth_quant_gemm_plugin": null,
"identity_plugin": null,
"layernorm_quantization_plugin": null,
"rmsnorm_quantization_plugin": null,
"nccl_plugin": null,
"lookup_plugin": null,
"lora_plugin": null,
"weight_only_groupwise_quant_matmul_plugin": null,
"weight_only_quant_matmul_plugin": "float16",
"quantize_per_token_plugin": false,
"quantize_tensor_plugin": false,
"moe_plugin": "float16",
"mamba_conv1d_plugin": "float16",
"context_fmha": true,
"context_fmha_fp32_acc": false,
"paged_kv_cache": true,
"remove_input_padding": true,
"use_custom_all_reduce": true,
"multi_block_mode": false,
"enable_xqa": true,
"attention_qk_half_accumulation": false,
"tokens_per_block": 128,
"use_paged_context_fmha": false,
"use_fp8_context_fmha": false,
"use_context_fmha_for_generation": false,
"multiple_profiles": false,
"paged_state": true,
"streamingllm": false
}
}
}

already passed the verify_install.py but can't pass the app.py: "ImportError: DLL load failed while importing tensorrt: The specified module could not be found."

(ChatRTX) E:\chat1\trt-llm-rag-windows>python app.py
E:\chat1\ChatRTX\lib\site-packages\transformers\utils\generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils.pytree.register_pytree_node instead.
torch_pytree.register_pytree_node(
Traceback (most recent call last):
File "E:\chat1\trt-llm-rag-windows\app.py", line 31, in
from trt_llama_api import TrtLlmAPI
File "E:\chat1\trt-llm-rag-windows\trt_llama_api.py", line 42, in
from utils import (DEFAULT_HF_MODEL_DIRS, DEFAULT_PROMPT_TEMPLATES,
File "E:\chat1\trt-llm-rag-windows\utils.py", line 22, in
from tensorrt_llm.builder import get_engine_version
File "E:\chat1\ChatRTX\lib\site-packages\tensorrt_llm_init
.py", line 32, in
import tensorrt_llm.functional as functional
File "E:\chat1\ChatRTX\lib\site-packages\tensorrt_llm\functional.py", line 26, in
import tensorrt as trt
File "E:\chat1\ChatRTX\lib\site-packages\tensorrt_init
.py", line 18, in
from tensorrt_bindings import *
File "E:\chat1\ChatRTX\lib\site-packages\tensorrt_bindings_init.py", line 71, in
from .tensorrt import *
ImportError: DLL load failed while importing tensorrt: The specified module could not be found.
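
A hedged note: this DLL error usually means a native dependency of TensorRT (for example the CUDA runtime DLLs) cannot be found by the loader, and on Python 3.8+ for Windows the PATH variable alone is not searched for extension-module dependencies. A minimal sketch of registering the CUDA bin directory explicitly before the import; the path below is an example, not a given:

import os

# Hypothetical CUDA location; point this at the bin folder of the CUDA toolkit build
# your TensorRT / TensorRT-LLM wheels expect.
os.add_dll_directory(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin")

import tensorrt as trt

print(trt.__version__)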

ModuleNotFoundError: No module named 'llama_index'

ModuleNotFoundError: No module named 'llama_index'

Environment path found: G:\Chat_RTX\env_nvd_rag
App running with config
 {
    "models": {
        "supported": [
            {
                "name": "Mistral 7B int4",
                "installed": true,
                "metadata": {
                    "model_path": "model\\mistral\\mistral7b_int4_engine",
                    "engine": "llama_float16_tp1_rank0.engine",
                    "tokenizer_path": "model\\mistral\\mistral7b_hf",
                    "max_new_tokens": 1024,
                    "max_input_token": 7168,
                    "temperature": 0.1
                }
            },
            {
                "name": "Llama 2 13B int4",
                "installed": true,
                "metadata": {
                    "model_path": "model\\llama\\llama13_int4_engine",
                    "engine": "llama_float16_tp1_rank0.engine",
                    "tokenizer_path": "model\\llama\\llama13_hf",
                    "max_new_tokens": 1024,
                    "max_input_token": 3900,
                    "temperature": 0.1
                }
            }
        ],
        "selected": "Mistral 7B int4"
    },
    "sample_questions": [
        {
            "query": "How does NVIDIA ACE generate emotional responses?"
        },
        {
            "query": "What is Portal prelude RTX?"
        },
        {
            "query": "What is important about Half Life 2 RTX?"
        },
        {
            "query": "When is the launch date for Ratchet & Clank: Rift Apart on PC?"
        }
    ],
    "dataset": {
        "sources": [
            "directory",
            "youtube",
            "nodataset"
        ],
        "selected": "directory",
        "path": "dataset",
        "isRelative": true
    },
    "strings": {
        "directory": "Folder Path",
        "youtube": "YouTube URL",
        "nodataset": "AI model default"
    }
}
Traceback (most recent call last):
  File "G:\Chat_RTX\RAG\trt-llm-rag-windows-main\app.py", line 28, in <module>
    from trt_llama_api import TrtLlmAPI
  File "G:\Chat_RTX\RAG\trt-llm-rag-windows-main\trt_llama_api.py", line 23, in <module>
    from llama_index.bridge.pydantic import Field, PrivateAttr
ModuleNotFoundError: No module named 'llama_index'
Press any key to continue . . .

2x 3090, Windows 10. I went into C:\Users\<USER>\AppData\Local\NVIDIA\MiniConda\Scripts, ran conda activate G:\Chat_RTX\env_nvd_rag and then pip install -U llama_index, and the same error popped up.
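
One hedged diagnostic for this class of problem: confirm which interpreter app.py actually runs under and whether llama_index resolves from it, since installing the package into a different Conda environment will not help.

import importlib.util
import sys

print("interpreter:", sys.executable)
# Prints None if llama_index is not importable from this interpreter.
print("llama_index spec:", importlib.util.find_spec("llama_index"))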

Enhancement: Support for reading PDFs with partial DRM (AES) - include PyCryptodome dependency

Description
When attempting to read PDF files that have partial DRM capabilities (e.g., Printing, Content Copying, and Content Copying for Accessibility allowed), the operation fails when reading local files with the following error message: "Failed to load file <filename.pdf> with error: PyCryptodome is required for AES algorithm. Skipping..." This issue arises due to the absence of the PyCryptodome library, which is necessary for handling AES encryption used by these DRM features.

Expected Behavior
The expected behavior is that the project should be able to read PDF files, including those with partial DRM capabilities, without throwing errors related to the absence of cryptographic support. Users should be able to process such PDFs for legitimate use cases, such as reading text for accessibility purposes, where the use complies with the DRM's allowances. Note if there is a restriction that would prevent reading the file, an error should still be thrown stating that the necessary DRM permissions do not allow reading of this document.

Actual Behavior
The actual behavior is that when attempting to read a PDF with partial DRM capabilities, the process is aborted due to the missing PyCryptodome dependency, and the file cannot be read or processed further.

Steps to Reproduce
Attempt to read a PDF file with partial DRM capabilities using the project.
Observe the error message indicating the absence of PyCryptodome for AES algorithm support.

Suggested Enhancement
To resolve this issue and enhance the capability to read a wider range of PDF files, suggest including PyCryptodome as a dependency/requirement within the project's Python implementation.

Additional Context
The ability to read PDFs with partial DRM is crucial for various legitimate use cases, including accessibility and content analysis, where the user is not infringing on the copyright or DRM protections but merely accessing the content in a manner that the DRM allows (e.g., reading for visually impaired users), or where legal and necessary references are provided in their document.
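
For reference, a minimal check that the AES backend is present in the environment; the "PyCryptodome is required for AES algorithm" message comes from the PDF reader when the Crypto package (provided by pycryptodome) cannot be imported:

# pycryptodome installs the "Crypto" package; if this import fails,
# pip install pycryptodome (or adding it to requirements.txt) resolves the skipped files.
from Crypto.Cipher import AES

print(AES)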

Override stringent GPU requirements

There must be a flag or something to override the stringent GPU requirements

Yes, that would probably mean performance would suffer, but it's still better than nothing.

So if anyone knows how to, I'd love to hear it :)

Unable to stream results. TypeError: 'NoneType' object is not iterable

I am attempting to build a chatbot using TrtLlmAPI as the llm

llm = TrtLlmAPI(
    model_path=trt_engine_path,
    engine_name=trt_engine_name,
    tokenizer_dir=tokenizer_dir_path,
    temperature=0.1,
    max_new_tokens=1024,
    context_window=1024 * 4,
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=False,
)
...
documents = SimpleDirectoryReader(self.data_dir, recursive=True, required_exts=exts).load_data()
faiss_index = faiss.IndexFlatL2(self.d)
vector_store = FaissVectorStore(faiss_index=faiss_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context, show_progress=True)
index.storage_context.persist(persist_dir=storage_path)
return index

and a query engine to perform the query

return self.index.as_query_engine(
    streaming=True,
    similarity_top_k=4,
)

I can successfully execute the query when waiting for the full response, but once I enable the streaming flag it just starts throwing exceptions

response = query_engine.query(compiledQuery)
for token in response.response_gen:
    print(token)

TypeError: 'NoneType' object is not iterable

I have tried a number of different ways to get streaming to work, and from what I can see in the RTX Chat codebase, this is what they are doing, but it is not working for me; I get the above error.
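
A hedged, defensive sketch for the streaming call, reusing the names from the snippets above: when the underlying LLM does not produce a token generator, response_gen can come back as None, so guarding the loop at least distinguishes "streaming unsupported" from other failures:

response = query_engine.query(compiledQuery)

gen = getattr(response, "response_gen", None)
if gen is not None:
    for token in gen:
        print(token, end="", flush=True)
else:
    # Fall back to the fully materialised answer when no token stream is available.
    print(response.response)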

How to switch language?!

I want to get responses in Spanish, not in English! Why is the app advertised in Spanish when it only runs in English, and why are there no instructions on how to change it?

How do I define other large language models in the configuration?

Hello, in ChatRTX 0.3, how do I define other large language models in the configuration? Can you provide an example configuration? I have modified the config.json file, but the page shows a prompt to download the model instead of using the model I have already configured, as in previous versions.
{
    "name": "internlm 7B int4",
    "id": "internlm_model",
    "ngc_model_name": "nvidia/internlm/internlm-7b-int4-rtx:1",
    "is_downloaded_required": true,
    "downloaded": false,
    "is_installation_required": true,
    "setup_finished": true,
    "min_gpu_memory": 16,
    "should_show_in_UI": true,
    "prerequisite": {
        "checkpoints_files": [
            "config.json",
            "rank0.safetensors",
            "Prohibited_use_policy.txt",
            "license.txt",
            "Notice.txt"
        ],
        "tokenizer_ngc_dir": "internlm7b_hf_tokenizer",
        "tokenizer_files": {
            "vocab_file": "tmp_vocab.model"
        },
        "checkpoints_local_dir": "model_checkpoints",
        "vocab_local_dir": "tokenizer",
        "engine_build_command": "trtllm-build --checkpoint_dir %checkpoints_local_dir% --gemm_plugin float16 --gpt_attention_plugin float16 --max_batch_size 1 --max_input_len 7168 --max_output_len 1024 --output_dir %engine_dir%",
        "engine_dir": "engine"
    },
    "metadata": {
        "engine": "rank0.engine",
        "max_new_tokens": 1024,
        "max_input_token": 7168,
        "temperature": 0.1
    },
    "model_info": "internlm-7B is a 7B parameter model from Gemma family of models from shanghai | License ",
    "model_license": " License ",
    "model_size": "6.6GB"
},
