
snexus / llm-search

424 stars, 11 watchers, 50 forks, 12.01 MB

Querying local documents, powered by LLM

License: MIT License

Languages: Jupyter Notebook 96.84%, Python 3.09%, Dockerfile 0.04%, Shell 0.03%
Topics: chroma, langchain-python, large-language-models, openai-chatgpt, splade, chatbot, llm, streamlit, hyde, rag

llm-search's People

Contributors

andreped, lapcd1, snexus, yeok-c


llm-search's Issues

Generate embeddings with CPU only

Hi @snexus,

Thank you for your support and patient response to my (beginner) questions. I truly appreciate your assistance.

I don't have a dedicated graphics card, but I am eager to create the embeddings.
Is there a way to achieve this without one (Google Colab unfortunately crashes after a certain period of time while creating embeddings with ~13000 chunks)?

Thank you again for your support and I look forward to hearing from you.
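For reference, embeddings can be generated on CPU alone; it is just slower. A minimal sketch with sentence-transformers, assuming a generic model name and batch size (this is not llm-search's own configuration):

```python
# Minimal CPU-only embedding sketch; model name and batch size are illustrative,
# not llm-search's actual configuration.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/e5-large-v2", device="cpu")  # force CPU
chunks = ["first chunk of text", "second chunk of text"]           # stand-in for ~13000 chunks

# Small batches keep RAM usage modest; on CPU the run simply takes longer.
embeddings = model.encode(chunks, batch_size=8, show_progress_bar=True)
print(embeddings.shape)
```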

feature request - add chat history

In the current version, every time a query is run, the previous query is overwritten. It is not too difficult to add chat history via the Streamlit API: you can simply use the session_state.messages.append method to append previous messages to the chat window rather than erase them on every new query.

Here's what I hacked together -

```python
# Add user message to chat history
st.session_state.messages.append({"role": "user", "content": text})

# Add assistant response to chat history
st.session_state.messages.append({"role": "assistant", "content": output.response})
st.session_state.messages.append({"role": "assistant", "content": f"***sources :*** {source_links}"})
st.session_state.messages.append({"role": "assistant", "content": f"**Search results quality score: {output.average_score:.2f}**\n"})
```

However, I am not a skilled programmer, and while it does work, I am sure there's a more elegant way to do this.

Can this please be added?
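A hypothetical sketch of the same idea using Streamlit's chat elements; `generate_response` is a stub standing in for the app's actual RAG call, so treat this as an illustration rather than the app's implementation:

```python
import streamlit as st

def generate_response(question: str) -> str:
    # Stub: replace with the actual llm-search query pipeline.
    return f"(stubbed answer to: {question})"

if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the existing history on every rerun instead of clearing it.
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

if text := st.chat_input("Ask a question"):
    st.session_state.messages.append({"role": "user", "content": text})
    with st.chat_message("user"):
        st.markdown(text)

    answer = generate_response(text)
    st.session_state.messages.append({"role": "assistant", "content": answer})
    with st.chat_message("assistant"):
        st.markdown(answer)
```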

Support OpenAI for embeddings

Would it be possible to add OpenAIEmbeddings as a new option? It would help to leverage other self-hosted embeddings when coupled with litellm. If needed, I can create a PR.
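For context, any OpenAI-compatible endpoint (such as a litellm proxy in front of self-hosted embeddings) can already be reached with the standard OpenAI client; the endpoint URL, API key and model name below are placeholders:

```python
# Sketch of calling an OpenAI-compatible embeddings endpoint (e.g. a litellm proxy).
# base_url, api_key and model are placeholders, not values used by llm-search.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000/v1", api_key="sk-anything")
resp = client.embeddings.create(model="text-embedding-3-small", input=["hello world"])
print(len(resp.data[0].embedding))
```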

question: how does instructor-large and splade differ in semantic search

Great project; I really like the idea of summing up all the good techniques for building RAG in one project. I'm trying to implement the same thing using the llama-index library. However, I find that the documents returned from dense + sparse embeddings are almost always the same.

How does your implementation differ in querying SPLADE embeddings + instructor-large embeddings? What I see in the code is that you are doing the same similarity search, and the only difference is that the SPLADE embeddings are stored as a sparse matrix.

Can you provide some more insight into the hybrid search? Thanks!
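For readers of this thread, a rough illustration of one common way dense and sparse scores get fused; the weighting scheme below is illustrative and not necessarily what llm-search does internally:

```python
import numpy as np

def hybrid_scores(dense: np.ndarray, sparse: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Min-max normalise each score vector, then blend with weight alpha."""
    def norm(x: np.ndarray) -> np.ndarray:
        return (x - x.min()) / (x.max() - x.min() + 1e-9)
    return alpha * norm(dense) + (1 - alpha) * norm(sparse)

dense = np.array([0.81, 0.40, 0.77, 0.12])   # e.g. cosine similarities from instructor-large
sparse = np.array([12.3, 2.1, 9.8, 0.4])     # e.g. dot products against SPLADE sparse vectors
print(hybrid_scores(dense, sparse).argsort()[::-1])  # document indices, best first
```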

phi2 support

llama.cpp now supports the phi2 model by Microsoft: phi2 support in llama.cpp

How can we update the fork/implementation/binary of llama.cpp in llmsearch to support this new update?

Output truncated / error in generating response (using Google Colab free)

Hello @snexus,

Thank you very much for creating this project!

I am using google colab with the provided template and ingested 20 PDF documents.
The embeddings have been generated without any problem and I can query the llm, but the response is always truncated / there seems to be an error in generating the response (see below).

Thank you very much for your help!

Here is an example (copied from Google Colab):
```
2024-01-08 13:53:35.106 | INFO | llmsearch.config:validate_params:165 - Loading model paramaters in configuration class LlamaModelConfig
2024-01-08 13:53:35.106 | INFO | llmsearch.utils:set_cache_folder:43 - Setting SENTENCE_TRANSFORMERS_HOME folder: /content/llm/cache
2024-01-08 13:53:35.106 | INFO | llmsearch.utils:set_cache_folder:44 - Setting TRANSFORMERS_CACHE folder: /content/llm/cache/transformers
2024-01-08 13:53:35.106 | INFO | llmsearch.utils:set_cache_folder:45 - Setting HF_HOME: /content/llm/cache/hf_home
2024-01-08 13:53:35.106 | INFO | llmsearch.utils:set_cache_folder:46 - Setting MODELS_CACHE_FOLDER: /content/llm/cache
2024-01-08 13:53:35.106 | INFO | llmsearch.models.llama:model:134 - Loading model...
2024-01-08 13:53:35.107 | INFO | llmsearch.models.llama:model:137 - Initializing LLAmaCPP model...
2024-01-08 13:53:35.107 | INFO | llmsearch.models.llama:model:138 - {'n_ctx': 1024, 'n_batch': 512, 'n_gpu_layers': 43}
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: Tesla T4, compute capability 7.5, VMM: yes
llama_model_loader: loaded meta data with 19 key-value pairs and 363 tensors from /content/llm/models/airoboros-l2-13b-gpt4-1.4.1.Q4_K_M.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = LLaMA v2
llama_model_loader: - kv 2: llama.context_length u32 = 4096
llama_model_loader: - kv 3: llama.embedding_length u32 = 5120
llama_model_loader: - kv 4: llama.block_count u32 = 40
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 13824
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 40
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 40
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: general.file_type u32 = 15
llama_model_loader: - kv 11: tokenizer.ggml.model str = llama
llama_model_loader: - kv 12: tokenizer.ggml.tokens arr[str,32000] = ["", "", "", "<0x00>", "<...
llama_model_loader: - kv 13: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 15: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 16: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 17: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 18: general.quantization_version u32 = 2
llama_model_loader: - type f32: 81 tensors
llama_model_loader: - type q4_K: 241 tensors
llama_model_loader: - type q6_K: 41 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V2
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_embd = 5120
llm_load_print_meta: n_head = 40
llm_load_print_meta: n_head_kv = 40
llm_load_print_meta: n_layer = 40
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 5120
llm_load_print_meta: n_embd_v_gqa = 5120
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 13824
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 4096
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 13B
llm_load_print_meta: model ftype = Q4_K - Medium
llm_load_print_meta: model params = 13.02 B
llm_load_print_meta: model size = 7.33 GiB (4.83 BPW)
llm_load_print_meta: general.name = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.14 MiB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: system memory used = 88.03 MiB
llm_load_tensors: VRAM used = 7412.96 MiB
llm_load_tensors: offloading 40 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 41/41 layers to GPU
...................................................................................................
llama_new_context_with_model: n_ctx = 1024
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: KV self size = 800.00 MiB, K (f16): 400.00 MiB, V (f16): 400.00 MiB
llama_build_graph: non-view tensors processed: 844/844
llama_new_context_with_model: compute buffer total size = 115.19 MiB
llama_new_context_with_model: VRAM scratch buffer: 112.00 MiB
llama_new_context_with_model: total VRAM used: 7524.96 MiB (model: 7412.96 MiB, context: 112.00 MiB)
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
2024-01-08 13:53:58.464 | INFO | llmsearch.embeddings:get_embedding_model:65 - Embedding model config: type=<EmbeddingModelType.instruct: 'instruct'> model_name='hkunlp/instructor-large' additional_kwargs={}
load INSTRUCTOR_Transformer
2024-01-08 13:53:59.968343: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-01-08 13:53:59.968395: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-01-08 13:53:59.975660: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-01-08 13:54:01.951666: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
max_seq_length 512
2024-01-08 13:54:25.921 | INFO | llmsearch.ranking:init:39 - Initialized BGE-base Reranker
2024-01-08 13:54:29.218 | INFO | llmsearch.splade:init:33 - Setting device to cuda:0
2024-01-08 13:54:35.700 | INFO | llmsearch.splade:load:100 - SPLADE: Got 0 labels.
2024-01-08 13:54:35.700 | INFO | llmsearch.splade:load:104 - Loaded sparse (SPLADE) embeddings from /content/llm/embeddings/splade/splade_embeddings.npz
2024-01-08 13:54:35.700 | INFO | llmsearch.utils:get_hyde_chain:110 - Creating HyDE chain...
2024-01-08 13:54:35.701 | INFO | llmsearch.utils:get_multiquery_chain:117 - Creating MultiQUery chain...

ENTER QUESTION >> Could you provide me with some of the best methods for effectively marketing a product
2024-01-08 13:54:39.989 | DEBUG | llmsearch.ranking:get_relevant_documents:84 - Evaluating query: Could you provide me with some of the best methods for effectively marketing a product
2024-01-08 13:54:39.990 | INFO | llmsearch.splade:query:208 - SPLADE search will search over all documents of chunk size: 1024. Number of docs: 1519
[0.00914291 0.0224314 0.01972341 ... 0.02577811 0.03467241 0.02387246]
2024-01-08 13:54:42.806 | INFO | llmsearch.ranking:get_relevant_documents:92 - Stage 1: Got 15 documents.
2024-01-08 13:54:42.806 | INFO | llmsearch.ranking:get_relevant_documents:104 - Dense embeddings filter: None
2024-01-08 13:54:44.367 | DEBUG | llmsearch.ranking:get_relevant_documents:113 - NUMBER OF NEW DOCS to RETRIEVE: 25
2024-01-08 13:54:44.382 | INFO | llmsearch.ranking:rerank:51 - Reranking documents ...
2024-01-08 13:54:44.382 | INFO | llmsearch.ranking:get_scores:42 - Reranking documents ...
[-4.017726898193359, -5.740996360778809, -3.4490861892700195, -6.219937801361084, -0.5473086833953857, -7.063520431518555, -6.515655994415283, -9.21086311340332, -6.386524677276611, -7.356011390686035, -6.924832820892334, -8.909055709838867, -7.650751113891602, -6.111538410186768, -7.745747089385986, -5.9694342613220215, -7.448235988616943, -6.252921104431152, -6.285423278808594, -6.576879501342773, -7.744513511657715, -8.150556564331055, -6.460150718688965, -7.074395179748535, -4.118349552154541]
2024-01-08 13:55:04.713 | INFO | llmsearch.ranking:rerank:59 - [-0.5473086833953857, -3.4490861892700195, -4.017726898193359, -4.118349552154541, -5.740996360778809, -5.9694342613220215, -6.111538410186768, -6.219937801361084, -6.252921104431152, -6.285423278808594, -6.386524677276611, -6.460150718688965, -6.515655994415283, -6.576879501342773, -6.924832820892334, -7.063520431518555, -7.074395179748535, -7.356011390686035, -7.448235988616943, -7.650751113891602, -7.744513511657715, -7.745747089385986, -8.150556564331055, -8.909055709838867, -9.21086311340332]
2024-01-08 13:55:04.714 | INFO | llmsearch.ranking:get_relevant_documents:131 - New most relevant query: Could you provide me with some of the best methods for effectively marketing a product
2024-01-08 13:55:04.714 | INFO | llmsearch.ranking:get_relevant_documents:138 - Number of documents after stage 2 (dense + sparse): 25
2024-01-08 13:55:04.714 | INFO | llmsearch.ranking:get_relevant_documents:141 - Re-ranker avg. scores for top 5 resuls, chunk size 1024: -3.57
[chain/start] [1:chain:StuffDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [1:chain:StuffDocumentsChain > 2:chain:LLMChain] Entering Chain run with input:
{
"question": "Could you provide me with some of the best methods for effectively marketing a product",
"context": "And, this book came out of my studies and experiences, you know, like \nresearching and reading and just living my life in a high pressure, high stakes \nenvironment. And I know that seems weird, but just like the best marketing \ndecision you can make for a product is to have a really good product that people \nwant, the best way to have writing that people want is to live a life and have \nexperienced the world in a way that allows you to communicate something to \npeople that they'd never heard before. \n \nI think it's especially true in fiction because at least in non-fiction, someone can \ngo out and study and objectively find, academics can write good non-fiction \nbooks based on their research. But, non-fiction, you have to be able to \ncommunicate all these intangibles to the reader. \n \nTim Ferriss: \nYou mean in fiction. \n \nRyan Holiday: Yeah, I'm sorry, in fiction. Yeah. You have to communicate all these intangibles \nabout life and relationships and how the world works. And if you haven't gone\n\nbest practices for marketing bestselling books. There are very few consensuses \nabout the best way to write a best-reading book, if that makes sense. \n \nI mean, that's part of the reason why I fell in love with "Daily Rituals," which \nprofiles 170 or so world-famous creatives, whether it's writers, composers, \nscientists, etc. and how their daily schedules are laid out because they're so \ndifferent. It's really fascinating to me. \n \nDo you watch documentaries? If so, what are your favorite documentaries that \ncome to mind? \n \nRyan Holiday: I love documentaries. But I don't watch that much TV. So, I don't get to watch \nas many as I like because... yeah. But some favorites, I like "Fog of War," I \nthink is amazing. \n \nThat Phil Spector documentary from a couple years ago is pretty crazy. I think \nit's called "The Wall of Sound," but I forget what it's called exactly. There's the \nguy who did Fog of War has a new one out about Donald Rumsfeld that I want \nto see called the Unknown Known.\n\nSo, I want to talk about the most effective pair of productivity techniques that I have \ncome across since 2004 that have helped me up until this point test the uncommon \ndespite the fear of ridicule, criticism, failure, and so forth. And both techniques – I \ncheated a bit with the format. Some things we will repeat – are borrowed from stoicism, \nwhich was a school of philosophy from the Hellenistic period used by a lot of the Greco \nroman educated elite, including emperors, and military, and statesmen.\n\nthey wanted to plug by coming on the show? \n \nNeil Strauss: \nOh, yeah. I'll plug for you. I'll always tell somebody, and this is \ntrue: When you're going on, and you're trying to promote your \nbusiness, or your brand, or your book, or movie – whatever you're"
}
[llm/start] [1:chain:StuffDocumentsChain > 2:chain:LLMChain > 3:llm:CustomLlamaLangChainModel] Entering LLM run with input:
{
"prompts": [
"### Instruction:\nUse the following pieces of context to provide detailed answer the question at the end. If answer isn't in the context, say that you don't know, don't try to make up an answer.\n\n### Context:\n---------------\nAnd, this book came out of my studies and experiences, you know, like \nresearching and reading and just living my life in a high pressure, high stakes \nenvironment. And I know that seems weird, but just like the best marketing \ndecision you can make for a product is to have a really good product that people \nwant, the best way to have writing that people want is to live a life and have \nexperienced the world in a way that allows you to communicate something to \npeople that they'd never heard before. \n \nI think it's especially true in fiction because at least in non-fiction, someone can \ngo out and study and objectively find, academics can write good non-fiction \nbooks based on their research. But, non-fiction, you have to be able to \ncommunicate all these intangibles to the reader. \n \nTim Ferriss: \nYou mean in fiction. \n \nRyan Holiday: Yeah, I'm sorry, in fiction. Yeah. You have to communicate all these intangibles \nabout life and relationships and how the world works. And if you haven't gone\n\nbest practices for marketing bestselling books. There are very few consensuses \nabout the best way to write a best-reading book, if that makes sense. \n \nI mean, that's part of the reason why I fell in love with "Daily Rituals," which \nprofiles 170 or so world-famous creatives, whether it's writers, composers, \nscientists, etc. and how their daily schedules are laid out because they're so \ndifferent. It's really fascinating to me. \n \nDo you watch documentaries? If so, what are your favorite documentaries that \ncome to mind? \n \nRyan Holiday: I love documentaries. But I don't watch that much TV. So, I don't get to watch \nas many as I like because... yeah. But some favorites, I like "Fog of War," I \nthink is amazing. \n \nThat Phil Spector documentary from a couple years ago is pretty crazy. I think \nit's called "The Wall of Sound," but I forget what it's called exactly. There's the \nguy who did Fog of War has a new one out about Donald Rumsfeld that I want \nto see called the Unknown Known.\n\nSo, I want to talk about the most effective pair of productivity techniques that I have \ncome across since 2004 that have helped me up until this point test the uncommon \ndespite the fear of ridicule, criticism, failure, and so forth. And both techniques – I \ncheated a bit with the format. Some things we will repeat – are borrowed from stoicism, \nwhich was a school of philosophy from the Hellenistic period used by a lot of the Greco \nroman educated elite, including emperors, and military, and statesmen.\n\nthey wanted to plug by coming on the show? \n \nNeil Strauss: \nOh, yeah. I'll plug for you. I'll always tell somebody, and this is \ntrue: When you're going on, and you're trying to promote your \nbusiness, or your brand, or your book, or movie – whatever you're\n---------------\n\n### Question: Could you provide me with some of the best methods for effectively marketing a product\n### Response:"
]
}
The most effective pair of productivity techniques that I have come across since 2004 that have helped me up until this point test the uncommon despite the fear of ridicule, criticism, failure, and so forth. And both techniques – I cheated a bit with the format. Some things we will repeat – are borrowed from stoicism, which was a school of philosophy from the Hellenistic period used by a lot of the Greco roman educated elite, including emperors, and military, and statesmen.

They wanted to plug by coming on the show?
Oh, yeah. I'll plug for you. I'll always tell somebody, and
llama_print_timings: load time = 8475.15 ms
llama_print_timings: sample time = 92.82 ms / 144 runs ( 0.64 ms per token, 1551.32 tokens per second)
llama_print_timings: prompt eval time = 16788.27 ms / 880 tokens ( 19.08 ms per token, 52.42 tokens per second)
llama_print_timings: eval time = 29640.72 ms / 143 runs ( 207.28 ms per token, 4.82 tokens per second)
llama_print_timings: total time = 47041.79 ms
[llm/end] [1:chain:StuffDocumentsChain > 2:chain:LLMChain > 3:llm:CustomLlamaLangChainModel] [47.06s] Exiting LLM run with output:
{
"generations": [
[
{
"text": "The most effective pair of productivity techniques that I have come across since 2004 that have helped me up until this point test the uncommon despite the fear of ridicule, criticism, failure, and so forth. And both techniques – I cheated a bit with the format. Some things we will repeat – are borrowed from stoicism, which was a school of philosophy from the Hellenistic period used by a lot of the Greco roman educated elite, including emperors, and military, and statesmen.\n\nThey wanted to plug by coming on the show? \nOh, yeah. I'll plug for you. I'll always tell somebody, and",
"generation_info": null
}
]
],
"llm_output": null,
"run": null
}
[chain/end] [1:chain:StuffDocumentsChain > 2:chain:LLMChain] [47.06s] Exiting Chain run with output:
{
"text": "The most effective pair of productivity techniques that I have come across since 2004 that have helped me up until this point test the uncommon despite the fear of ridicule, criticism, failure, and so forth. And both techniques – I cheated a bit with the format. Some things we will repeat – are borrowed from stoicism, which was a school of philosophy from the Hellenistic period used by a lot of the Greco roman educated elite, including emperors, and military, and statesmen.\n\nThey wanted to plug by coming on the show? \nOh, yeah. I'll plug for you. I'll always tell somebody, and"
}
[chain/end] [1:chain:StuffDocumentsChain] [47.06s] Exiting Chain run with output:
{
"output_text": "The most effective pair of productivity techniques that I have come across since 2004 that have helped me up until this point test the uncommon despite the fear of ridicule, criticism, failure, and so forth. And both techniques – I cheated a bit with the format. Some things we will repeat – are borrowed from stoicism, which was a school of philosophy from the Hellenistic period used by a lot of the Greco roman educated elite, including emperors, and military, and statesmen.\n\nThey wanted to plug by coming on the show? \nOh, yeah. I'll plug for you. I'll always tell somebody, and"
}

============= SOURCES ==================
sample_docs/15-neil-strauss.pdf
{'chunk_size': 1024, 'document_id': 'dcbf9a82-ae20-11ee-b544-0242ac1c000c', 'label': '', 'page': 20, 'score': -7.356011390686035}
******************* BEING EXTRACT *****************
they wanted to plug by coming on the show?

Neil Strauss:
Oh, yeah. I'll plug for you. I'll always tell somebody, and this is
true: When you're going on, and you're trying to promote your
business, or your brand, or your book, or movie – whatever you're

sample_docs/17-tim-ferriss-the-power-of-negative-visualization.pdf
{'chunk_size': 1024, 'document_id': 'dc58d00e-ae20-11ee-b544-0242ac1c000c', 'label': '', 'page': 0, 'score': -4.017726898193359}
******************* BEING EXTRACT *****************
So, I want to talk about the most effective pair of productivity techniques that I have
come across since 2004 that have helped me up until this point test the uncommon
despite the fear of ridicule, criticism, failure, and so forth. And both techniques – I
cheated a bit with the format. Some things we will repeat – are borrowed from stoicism,
which was a school of philosophy from the Hellenistic period used by a lot of the Greco
roman educated elite, including emperors, and military, and statesmen.

sample_docs/04-ryan-holiday.pdf
{'chunk_size': 1024, 'document_id': 'dca64352-ae20-11ee-b544-0242ac1c000c', 'label': '', 'page': 19, 'score': -3.4490861892700195}
******************* BEING EXTRACT *****************
best practices for marketing bestselling books. There are very few consensuses
about the best way to write a best-reading book, if that makes sense.

I mean, that's part of the reason why I fell in love with "Daily Rituals," which
profiles 170 or so world-famous creatives, whether it's writers, composers,
scientists, etc. and how their daily schedules are laid out because they're so
different. It's really fascinating to me.

Do you watch documentaries? If so, what are your favorite documentaries that
come to mind?

Ryan Holiday: I love documentaries. But I don't watch that much TV. So, I don't get to watch
as many as I like because... yeah. But some favorites, I like "Fog of War," I
think is amazing.

That Phil Spector documentary from a couple years ago is pretty crazy. I think
it's called "The Wall of Sound," but I forget what it's called exactly. There's the
guy who did Fog of War has a new one out about Donald Rumsfeld that I want
to see called the Unknown Known.

sample_docs/04-ryan-holiday.pdf
{'chunk_size': 1024, 'document_id': 'dca64a32-ae20-11ee-b544-0242ac1c000c', 'label': '', 'page': 22, 'score': -0.5473086833953857}
******************* BEING EXTRACT *****************
And, this book came out of my studies and experiences, you know, like
researching and reading and just living my life in a high pressure, high stakes
environment. And I know that seems weird, but just like the best marketing
decision you can make for a product is to have a really good product that people
want, the best way to have writing that people want is to live a life and have
experienced the world in a way that allows you to communicate something to
people that they'd never heard before.

I think it's especially true in fiction because at least in non-fiction, someone can
go out and study and objectively find, academics can write good non-fiction
books based on their research. But, non-fiction, you have to be able to
communicate all these intangibles to the reader.

Tim Ferriss:
You mean in fiction.

Ryan Holiday: Yeah, I'm sorry, in fiction. Yeah. You have to communicate all these intangibles
about life and relationships and how the world works. And if you haven't gone

============= RESPONSE =================
The most effective pair of productivity techniques that I have come across since 2004 that have helped me up until this point test the uncommon despite the fear of ridicule, criticism, failure, and so forth. And both techniques – I cheated a bit with the format. Some things we will repeat – are borrowed from stoicism, which was a school of philosophy from the Hellenistic period used by a lot of the Greco roman educated elite, including emperors, and military, and statesmen.

They wanted to plug by coming on the show?
Oh, yeah. I'll plug for you. I'll always tell somebody, and

ENTER QUESTION >>

```
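Reading the log: n_ctx is 1024 while the prompt alone is ~880 tokens, so only about 140 tokens remain for the answer (the run stops after 144 sampled tokens), which matches the truncation. A hedged sketch of raising the context window with llama-cpp-python directly; the exact key names in llm-search's YAML config may differ:

```python
# Illustration: with n_ctx=1024 and an ~880-token prompt, only ~140 tokens are left
# for generation. The model was trained with a 4096-token context, so a larger n_ctx
# leaves room to finish (at the cost of more VRAM).
from llama_cpp import Llama

llm = Llama(
    model_path="/content/llm/models/airoboros-l2-13b-gpt4-1.4.1.Q4_K_M.gguf",
    n_ctx=4096,       # was 1024 in the log above
    n_batch=512,
    n_gpu_layers=43,
)
out = llm("### Question: ...\n### Response:", max_tokens=512)
print(out["choices"][0]["text"])
```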

Support for .org files?

Thanks for writing and extensively explaining this project on Reddit!
I am a LogSeq user and have all my files in .org format.
I contributed an org parser for unstructured a while ago.
Would this project automatically support org through unstructured, or does something else need to be done (e.g. adding .org to some whitelist)?

Thank you!

Best Practices & ###### results problem

Hi @snexus

Thanks for your efforts and this beautiful project.

1. Would you mind giving more examples for Hugging Face llama.cpp models? i.e. what would give the most accurate results for multilingual use, or for languages other than English?

2. What would be the right choice for document summarization?

3. Sometimes the result adds many ##### characters and then stops.

4. Example configs for Mistral, Mixtral and Dolphin 2?

5. Do you know Haystack? It looks like a commercialized version.

6. What will the future be for enterprise AI search over internal documents? Is it worth investing in? Can we talk?

Reduce logging

Would it be possible to change the level of the logger via an argument or an environment variable?
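The timestamped log lines elsewhere on this page look like loguru's default format, so a minimal sketch of raising the level globally (assuming loguru is indeed the logger in use):

```python
import sys
from loguru import logger

logger.remove()                          # drop the default sink
logger.add(sys.stderr, level="WARNING")  # only warnings and errors from here on
```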

Repository Not Found for url:https://huggingface.co/api/models/infloat/e5-large-v2

After installation, I ran the command llmsearch index create -c /path/to/config.yaml to create the vector DB from local documents and got the following error.


```
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-6545a903-603ca58107038257355ef890;303ff9ff-61b3-4505-9375-10ff6e1b6b6a)

Repository Not Found for url: https://huggingface.co/api/models/infloat/e5-large-v2.
Please make sure you specified the correct repo_id and repo_type.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.
```


I tried BAAI/bge-large-en-v1.5 and this works fine.
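The 401 / Repository Not Found above is almost certainly the misspelled repository id: the model lives under intfloat, not infloat. A quick check:

```python
# "intfloat/e5-large-v2" is the correct Hugging Face repo id (note the "t").
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/e5-large-v2")
print(model.get_sentence_embedding_dimension())  # 1024
```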

Output is always '#' repeated

Hi,

Cool project!

I was trying it with some books from the Gutenberg Project, for example "The Diary of a Turk by Çerkesseyhizade Halil Halit.epub".

The retrieval works well; I can see that the paragraphs with related info are picked up well.

Yet the output of the model is always # repeated a number of times.

Steps to reproduce:

  • On Ubuntu 20.04, CUDA 12.1
  • Install llm-search as indicated (in an anaconda env)
  • pip install pandoc
  • mkdir documents
  • mkdir models
  • download book: cd documents; wget https://www.gutenberg.org/ebooks/50048.epub.noimages; cd ..
  • download model: cd models; sudo apt-get install git-lfs; git lfs clone https://huggingface.co/TheBloke/airoboros-l2-13B-gpt4-1.4.1-GGUF; cd ..
  • mv airoboros-l2-13B-gpt4-1.4.1-GGUF models
  • copy sample_templates/generic/config_template.yaml config.yaml
  • change config.yaml to point to models/airoboros-l2-13B-gpt4-1.4.1-GGUF/airoboros-l2-13b-gpt4-1.4.1.Q4_K_M.gguf
  • change config.yaml to point to models/intfloat/e5-large-v2
  • download e5-large-v2 with the code:

```python
from sentence_transformers import SentenceTransformer

modelPath = "models/intfloat/e5-large-v2"
model = SentenceTransformer('intfloat/e5-large-v2')
model.save(modelPath)
```

Index, run web interface, and ask:

"What does Halvati mean?"

(or any other question answered in the book)

Answer:

"-###############################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################"

update requirements.txt legacy langchain, chroma, openai

The project sounds great, I am just unable to test it as I would like to.
I am hitting some bugs with some dependencies, and I am unable to upgrade the packages to get them fixed.
There is a chroma redesign (chroma migration with openai (v1.x) redesigned).
Is this project still maintained? If yes, is there a willingness to upgrade the requirements/code to support langchain>=0.10?

Scope for batched predictions

@snexus Kudos on this awesome project!

I was wondering whether support for batched prompts is on your roadmap. There are solutions that make this possible for several language models, so are you planning on including these optimisations in your source?

TIA

Any examples or videos?

Hi team
Do you have any examples or know of any videos of people showing this?
I literally can't find a thing - but it sounds really good for RAG.

However, as you can imagine, searching the web or YouTube for "llm search" is so generic that the results contain anything and everything.
Even searching for "llm-search", nothing, just generic results for... llm and... search... and building search engines with llms...

I'd consider updating the project name.
Anyway, that said, this sounds like it has more/better RAG options than most other stuff I've been trying out. But I do really like to see demos of things too, before I spend time trying to get it to run.

As an aside, any plans to enable this to run via APIs so we can use it with Ollama or Oobabooga, as other tools can? This would be great for using all kinds of GPU-accelerated models.

Thanks!

Progress while creating index from documents

I have a couple of PDFs: llmsearch.parsers.splitter:split:74 - Got 279643 chunks for type: pdf
I would really love to see how far along llmsearch.chroma:create_index_from_documents:38 - Generating and persisting the embeddings.. is. It's been a few hours now and I'm not sure whether it's stuck, or whether this is a hopeless amount of data to index and I'm only at 1%.
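For anyone hitting the same question, a generic sketch of what a progress bar over embedding generation could look like; llm-search's internal create_index_from_documents may not expose this hook, so the batching below is purely illustrative:

```python
from sentence_transformers import SentenceTransformer
from tqdm import tqdm

model = SentenceTransformer("intfloat/e5-large-v2")
chunks = [f"chunk {i}" for i in range(1000)]   # stand-in for the ~280k real chunks

batch_size = 64
embeddings = []
for i in tqdm(range(0, len(chunks), batch_size), desc="Embedding chunks"):
    embeddings.extend(model.encode(chunks[i:i + batch_size]))
```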

have ability to ask follow-up questions that remember context of previous result

This would be a very useful feature to have. I have noticed with further testing (at least using GPT-3.5) that when asking follow-up queries, it does not remember the context of the previous query and instead gives an "I don't know".

For example, I ask "what is the GHG footprint of yellow pea protein?"
and it will give an answer like "15.5kg".

Then I ask a follow-up question, "what about green pea protein?"

In normal ChatGPT and other LLMs, it would normally remember the context of the previous answer it gave, and follow up with the GHG footprint of green pea protein. But with this application, it will instead say "I don't know" to this follow-up question.

Is this something that can be easily added? It may be something to do with langchain and how we query the llm. Not really sure.

If it's something you can look into, that would be a very useful enhancement.

Thank you!
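One common pattern for this (not something llm-search currently does, as far as this issue indicates) is to rewrite each follow-up into a standalone question using recent chat history before retrieval. A sketch with stubbed-out calls; `ask_llm` and `rag_answer` are placeholders for the model call and the existing query pipeline:

```python
history: list[tuple[str, str]] = []

def ask_llm(prompt: str) -> str:
    # Stub: replace with the actual LLM call.
    return prompt.splitlines()[-1]

def rag_answer(question: str) -> str:
    # Stub: replace with the existing RAG query.
    return f"(answer to: {question})"

def ask_with_memory(question: str) -> str:
    if history:
        transcript = "\n".join(f"{role}: {text}" for role, text in history[-6:])
        question = ask_llm(
            "Given the conversation below, rewrite the final question so it can be "
            f"understood on its own.\n\n{transcript}\nQuestion: {question}\nStandalone question:"
        )
    answer = rag_answer(question)
    history.append(("user", question))
    history.append(("assistant", answer))
    return answer
```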

Model params for GPU setup

Hello @snexus! How do I choose the model parameters for my GPU setup? I've been manipulating different params, for example n_ctx or n_batch, but it takes a long time and isn't effective. Do you know a formula or method to calculate optimal params for a given GPU?

I have two 1080 Ti cards with 11 GB of memory each, please give me advice ))
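A back-of-the-envelope way to size this, using numbers from the llama.cpp log earlier on this page (it reproduces the "KV self size = 800.00 MiB" line). Scratch-buffer growth and other models' shapes are assumptions you would read off llama.cpp's own metadata dump:

```python
def kv_cache_mib(n_layer: int, n_ctx: int, n_embd: int, bytes_per_elem: int = 2) -> float:
    # K and V caches, one per layer, stored in f16 (2 bytes) by default.
    return 2 * n_layer * n_ctx * n_embd * bytes_per_elem / 1024**2

model_size_mib = 7413  # Q4_K_M 13B weights, from the log above

print(kv_cache_mib(40, 1024, 5120))  # 800.0 MiB, matches the log
# Rough total at n_ctx=4096: weights + KV cache + scratch (the scratch buffer actually grows with n_ctx)
print(model_size_mib + kv_cache_mib(40, 4096, 5120) + 112)  # ~10725 MiB, tight on an 11 GB 1080 Ti
```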

feature request - api endpoint to vectordb and/or db update button in UI

OK, I see you are still actively making updates to this application, so this is a feature that I think would be great to add at some point, especially since it looks like you're focusing on updating embeddings.

Let's say a user wants to have their data stored in an Amazon S3 bucket. It would be wonderful if there was a way for the app to have an API endpoint that points to the vector DB. There's an application I know of that is designed to work with AWS & Pinecone (and other hosted vector DBs). Its purpose is to "listen" for changes on the AWS datastore, and any time a new file is added, it updates the vector DB on the fly, so that the process of updating embeddings is instant and autonomous.

I told the dev about your app, and he said they could easily support it, provided there was an api endpoint they could point to.

I'm not sure how much of a challenge this would add, especially since up to now it seems this app is mostly run locally (aside from OpenAI API support).

But perhaps something to add to the to-do list.

Another feature that would be helpful, and that I think should be easy to implement, is a button to update the vector DB inside the UI.
Basically, it could just run the llmsearch index create command from inside the UI (see the sketch below). Combined with your recent updates, which appear to incrementally update the DB with only new files rather than rebuild it every time the command is run, this would be useful!

This would effectively allow end users to never need to interact with the CLI to use the application.
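A hypothetical sketch of such a button; it just shells out to the documented `llmsearch index create -c <config>` command from the Streamlit sidebar, so the wiring into the actual webapp is an assumption:

```python
import subprocess
import streamlit as st

config_path = st.sidebar.text_input("Config file", "config.yaml")
if st.sidebar.button("Update vector DB"):
    with st.spinner("Re-indexing documents..."):
        result = subprocess.run(
            ["llmsearch", "index", "create", "-c", config_path],
            capture_output=True, text=True,
        )
    # Show the tail of the CLI output so the user can see what happened.
    st.sidebar.code(result.stdout[-2000:] or result.stderr[-2000:])
```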

add support for using unstructured SaaS API for handling unstructured data pre-processing

As we know, data used in RAG applications comes from all kinds of sources, and while it's easy to work with unstructured data in a text format like markdown, html, txt, json, etc., it is not so easy to work with PDFs & images, as that requires OCR tools that vary widely in quality. On top of that, you sometimes have structured data embedded in unstructured data, for instance a table or graph inside a PDF, which requires complex solutions if you were to build your own code for this.
But I recently stumbled upon a free learning course on deeplearning.ai that stood out to me, in that it taught how to easily pre-process unstructured data from all kinds of datatypes, including pdfs & images, and can even extract graphs & tables.
When I started the course, I realized it's using a cloud service called "unstructured" for handling the heavy lifting and making simple api calls to deal with the pre-processing.
Here's a link to the site. You can sign up for an API key -
https://unstructured.io/api-key-hosted
Here's a link to the course that shows how to use the API service in practice for data pre-processing -
https://learn.deeplearning.ai/courses/preprocessing-unstructured-data-for-llm-applications/lesson/1/introduction
In essence, I think it would be a nice addition to this application if it can add support for this service. In doing so, it can let a 3rd party service handle the pre-processing, and this application can focus on what it does best - implementing advanced RAG pipelines, and not having to worry so much about keeping up with the latest open source pdf or image parser, etc. But of course the end user can have the option to still use those local open source tools if they want, similar to how you can use openai api, or a local llm.
How hard do you think it would be to integrate the unstructured API as an option?

feature request - load config files from inside of UI

OK I have one more request I'm hoping you can implement for this wonderful app.

The way I use the app, and I imagine others would too, is I am testing various models & verifying the performance to see which gives best results.
The way I do this is create a config.yaml file for each model I use that is set up as needed.
I then created a bash script that runs the webapp with the different config files as an argument, depending on what model I want to test.
If you had this feature inside of the UI it would be extremely handy and I can see this as something other users would appreciate too.
A summary:

  • a config file folder is defined, perhaps as an environment variable in the .env file
  • the app searches all .yaml files in the config folder and makes them selectable from a drop-down list on the sidebar (we can just assume the user is putting valid config yaml files in the config folder)
  • if a user wishes to load a config file, they select the config file they want to change to, press a "load config" button or something to that effect, and the app will load the config inside the UI without having to exit the app and pass the config file as an argument.

This is really the only "quality of life" feature this app is missing for me, since I have about 4 different config files I switch between because I want to test the various llms performance against my questions.

Hope this is possible!
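A rough sketch of the requested drop-down; the environment variable name and the idea of stashing the chosen path in session state are assumptions standing in for however the app actually loads its YAML:

```python
import glob
import os
import streamlit as st

CONFIG_DIR = os.environ.get("LLMSEARCH_CONFIG_DIR", "./configs")  # assumed env var
config_files = sorted(glob.glob(os.path.join(CONFIG_DIR, "*.yaml")))

choice = st.sidebar.selectbox("Configuration", config_files)
if st.sidebar.button("Load config"):
    st.session_state["config_path"] = choice
    st.rerun()  # st.experimental_rerun() on older Streamlit versions
```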

No progress after generating initial chain of prompts

I am running this in Colab with their free tier GPU (15GB), using WizardLM-13B-1.0.ggmlv3.q5_K_S.bin.

I have been testing this out by generating some random PDFs from Wikipedia articles. I can parse about 50 pdfs and create an index in less than a minute. I then run the 'Interact' part and it quickly loads up the "Enter Question >>" prompt. I can then ask a question, and it seems to start compiling the chain. However, afterwards nothing happens.

The prompt below successfully finds the PDF (https://en.wikipedia.org/wiki/Olive_Edis) in my docs folder and starts putting the prompt together, but then nothing happens.

My GPU usage remains low (2GB/15GB) and I can wait 30 minutes or longer and nothing else happens.

Any hints on how to diagnose this? What should I expect to happen next?

ENTER QUESTION >> What did Olive Edis own?
[chain/start] [1:chain:StuffDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [1:chain:StuffDocumentsChain > 2:chain:LLMChain] Entering Chain run with input:
{
  "question": "What did Olive Edis own?",
  "context": "Olive Edis\nMary Olive Edis\nAutochrome self-portrait\nBorn\n3 September 1876\nDied\n28 December 1955 (aged 79)\nNationality\nBritish\nOccupation Photographer\nFrom Wikipedia, the free encyclopedia\nMary Olive Edis, later Edis-Galsworthy, (3 September 1876 –\n28 December 1955) was a British female photographer and\nsuccessful business woman who, throughout her career, owned\nseveral studios in London and East Anglia.[1]\nKnown primarily for her studio portrait photography, Edis’ sitters\nranged from royalty to politicians, to influential women, and local\nNorfolk fisherfolk. Edis was one of the first women to adopt the\nautochrome process professionally and became Britain’s first\nofficial female war photographer in 1919.[2]\nContents [hide]\n1 Life\n2 Career\n3 Legacy\n4 Gallery\n5 References\n6 External links\nLife [edit]\nEdis, born at 22 Wimpole Street, London, was the eldest daughter\nof Mary née Murray (1853–1931) and Arthur Wellesley Edis,\n\nWikimedia Commons has\nmedia related to Olive Edis.\nFrance and Flanders between 1918 and 1919 for the Imperial War Museum.[5][6][7] In 1920 she was\ncommissioned to create advertising photographs for the Canadian Pacific Railway and her autochromes of this\ntrip to Canada are believed to be some of the earliest colour photographs of that country.[8]\nThroughout her career Edis photographed many influential figures of early 20th century society. Notable\nexamples include authors Thomas Hardy (1914) and George Bernard Shaw (1936); prime ministers H. H.\nAsquith (1917–18) and David Lloyd George (1917) and the future King George VI (c.1920s). Edis\nphotographed many prominent women at a time of great change for the role of women in British society\nincluding Elizabeth Garrett Anderson (1909), Nancy Astor (1920) and Emmeline Pankhurst (1920). As well as\nfamous sitters, Edis produced many portraits of local working fisherman their families at her studios in North"
}
[llm/start] [1:chain:StuffDocumentsChain > 2:chain:LLMChain > 3:llm:CustomLlamaLangChainModel] Entering LLM run with input:
{
  "prompts": [
    "### Instruction:\nUse the following pieces of context to answer the question at the end. If answer isn't in the context, say that you don't know, don't try to make up an answer.\n\n### Context:\n---------------\nOlive Edis\nMary Olive Edis\nAutochrome self-portrait\nBorn\n3 September 1876\nDied\n28 December 1955 (aged 79)\nNationality\nBritish\nOccupation Photographer\nFrom Wikipedia, the free encyclopedia\nMary Olive Edis, later Edis-Galsworthy, (3 September 1876 –\n28 December 1955) was a British female photographer and\nsuccessful business woman who, throughout her career, owned\nseveral studios in London and East Anglia.[1]\nKnown primarily for her studio portrait photography, Edis’ sitters\nranged from royalty to politicians, to influential women, and local\nNorfolk fisherfolk. Edis was one of the first women to adopt the\nautochrome process professionally and became Britain’s first\nofficial female war photographer in 1919.[2]\nContents [hide]\n1 Life\n2 Career\n3 Legacy\n4 Gallery\n5 References\n6 External links\nLife [edit]\nEdis, born at 22 Wimpole Street, London, was the eldest daughter\nof Mary née Murray (1853–1931) and Arthur Wellesley Edis,\n\nWikimedia Commons has\nmedia related to Olive Edis.\nFrance and Flanders between 1918 and 1919 for the Imperial War Museum.[5][6][7] In 1920 she was\ncommissioned to create advertising photographs for the Canadian Pacific Railway and her autochromes of this\ntrip to Canada are believed to be some of the earliest colour photographs of that country.[8]\nThroughout her career Edis photographed many influential figures of early 20th century society. Notable\nexamples include authors Thomas Hardy (1914) and George Bernard Shaw (1936); prime ministers H. H.\nAsquith (1917–18) and David Lloyd George (1917) and the future King George VI (c.1920s). Edis\nphotographed many prominent women at a time of great change for the role of women in British society\nincluding Elizabeth Garrett Anderson (1909), Nancy Astor (1920) and Emmeline Pankhurst (1920). As well as\nfamous sitters, Edis produced many portraits of local working fisherman their families at her studios in North\n---------------\n\n### Question: What did Olive Edis own?\n### Response:"
  ]
}

---- Edit ----

This may be a resource issue with Google Colab. I'm now trying to run different code altogether and it's also getting stuck at 2 GB of GPU usage and not actually outputting a result. I will try this again tomorrow.

Use E5-multilingual as a default

Instructor-XL is indeed at rank 2 at the moment, while E5-multilingual is in place 5; however, multilingualism will greatly enhance the overall usability of the document search IMO, and it performs really well when used.

I guess just make it optional, just as with the LLM, but I suggest using it as the default embedding model. It's also only 2.2 GB, not ~5 GB large...

Great Repo!


Re-ranking model too slow

Hi,

I have been using the tool, but the problem is that the re-ranking model "bge-reranker-base" is too slow. If I use "macro", there is a significant decrease in accuracy. Do you have any suggestions on how I can optimize this?

My hardware:

  • 1 x H100 80GB PCIe, 32 vCPU 251 GB RAM
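A few generic speed-ups for a cross-encoder reranker on a GPU like this: keep it on the GPU, cast to fp16, and use larger batches. The snippet below uses sentence-transformers' CrossEncoder directly and is illustrative rather than a change to llm-search itself:

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("BAAI/bge-reranker-base", device="cuda", max_length=512)
reranker.model.half()  # fp16; fine on an H100 and typically much faster

pairs = [("query text", f"candidate document {i}") for i in range(50)]
scores = reranker.predict(pairs, batch_size=64)  # larger batches amortise per-call overhead
print(scores[:5])
```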

Can't change embedding model

Hi,

I am trying to change/specify the embedding model in the config.yaml file but am getting an HF authentication error. Can you please help?

```yaml
embedding_model:
  type: sentence_transformer
  model_name: "infloat/e5-large-v2"
```

Plans for RRF?

Hey @snexus
Are there any plans to include an option for RRF (Reciprocal Rank Fusion) alongside Marco and BGE for reranking?
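For reference, RRF itself is only a few lines: each ranker contributes 1/(k + rank) per document, so anything ranked highly by either the dense or the sparse list floats to the top. A minimal sketch (k = 60 is the usual default):

```python
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_ranking = ["doc_a", "doc_b", "doc_c"]
sparse_ranking = ["doc_c", "doc_a", "doc_d"]
print(rrf([dense_ranking, sparse_ranking]))  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```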

CSV data parsing

Hi @snexus,

Is it possible to work with CSV/SQL data? You have mentioned the unstructured-supported formats, which include CSV as well. I am trying to parse a CSV but am getting errors:

```
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/langchain_community/vectorstores/chroma.py", line 297, in add_texts
self._collection.upsert(
File "/usr/local/lib/python3.10/dist-packages/chromadb/api/models/Collection.py", line 477, in upsert
) = self._validate_embedding_set(
File "/usr/local/lib/python3.10/dist-packages/chromadb/api/models/Collection.py", line 554, in _validate_embedding_set
validate_metadatas(maybe_cast_one_to_many_metadata(metadatas))
File "/usr/local/lib/python3.10/dist-packages/chromadb/api/types.py", line 310, in validate_metadatas
validate_metadata(metadata)
File "/usr/local/lib/python3.10/dist-packages/chromadb/api/types.py", line 278, in validate_metadata
raise ValueError(
ValueError: Expected metadata value to be a str, int, float or bool, got None which is a <class 'NoneType'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/bin/llmsearch", line 8, in
sys.exit(main_cli())
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in call
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/llmsearch/cli.py", line 44, in generate_index
create_embeddings(config, vs)
File "/usr/local/lib/python3.10/dist-packages/llmsearch/embeddings.py", line 80, in create_embeddings
vs.create_index_from_documents(all_docs=all_docs)
File "/usr/local/lib/python3.10/dist-packages/llmsearch/chroma.py", line 66, in create_index_from_documents
vectordb = Chroma.from_documents(
File "/usr/local/lib/python3.10/dist-packages/langchain_community/vectorstores/chroma.py", line 778, in from_documents
return cls.from_texts(
File "/usr/local/lib/python3.10/dist-packages/langchain_community/vectorstores/chroma.py", line 736, in from_texts
chroma_collection.add_texts(
File "/usr/local/lib/python3.10/dist-packages/langchain_community/vectorstores/chroma.py", line 309, in add_texts
raise ValueError(e.args[0] + "\n\n" + msg)
ValueError: Expected metadata value to be a str, int, float or bool, got None which is a <class 'NoneType'>

Try filtering complex metadata from the document using langchain_community.vectorstores.utils.filter_complex_metadata.
```
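The traceback itself points at the workaround: strip metadata values Chroma cannot store (None, lists, etc.) before indexing. filter_complex_metadata is a real LangChain helper; where exactly it would be called inside llm-search's ingestion is an assumption:

```python
from langchain_community.vectorstores.utils import filter_complex_metadata
from langchain_core.documents import Document

docs = [Document(page_content="row 1,foo,bar",
                 metadata={"source": "data.csv", "label": None})]

clean_docs = filter_complex_metadata(docs)  # drops the None-valued "label" key
print(clean_docs[0].metadata)               # {'source': 'data.csv'}
```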

docker

Hi,

Is there a Docker container, or any plan for one in the near future?

Best

ValidationError: 1 validation error for Config llm -> params -> model_name field required (type=value_error.missing)

Hi,

Sorry for the possibly newbie question. I successfully indexed the PDFs and tried to run the system. I downloaded the advised airoboros-l2-13b-gpt4-1.4.1.Q4_K_M.gguf model and attached my config.

I get the errors below.

config_template.yaml.txt

ValidationError: 1 validation error for Config llm -> params -> model_name field required (type=value_error.missing)
Traceback:
File "/home/bc/Projects/OpenSource/llm-search/venvLLMSearch/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 530, in _run_script
self._session_state.on_script_will_rerun(rerun_data.widget_states)
File "/home/bc/Projects/OpenSource/llm-search/venvLLMSearch/lib/python3.10/site-packages/streamlit/runtime/state/safe_session_state.py", line 61, in on_script_will_rerun
self._state.on_script_will_rerun(latest_widget_states)
File "/home/bc/Projects/OpenSource/llm-search/venvLLMSearch/lib/python3.10/site-packages/streamlit/runtime/state/session_state.py", line 500, in on_script_will_rerun
self._call_callbacks()
File "/home/bc/Projects/OpenSource/llm-search/venvLLMSearch/lib/python3.10/site-packages/streamlit/runtime/state/session_state.py", line 513, in _call_callbacks
self._new_widget_state.call_callback(wid)
File "/home/bc/Projects/OpenSource/llm-search/venvLLMSearch/lib/python3.10/site-packages/streamlit/runtime/state/session_state.py", line 260, in call_callback
callback(*args, **kwargs)
File "/home/bc/Projects/OpenSource/llm-search/src/llmsearch/webapp.py", line 167, in reload_model
config = load_config(config_file)
File "/home/bc/Projects/OpenSource/llm-search/venvLLMSearch/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 212, in wrapper
return cached_func(*args, **kwargs)
File "/home/bc/Projects/OpenSource/llm-search/venvLLMSearch/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 241, in call
return self._get_or_create_cached_value(args, kwargs)
File "/home/bc/Projects/OpenSource/llm-search/venvLLMSearch/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 267, in _get_or_create_cached_value
return self._handle_cache_miss(cache, value_key, func_args, func_kwargs)
File "/home/bc/Projects/OpenSource/llm-search/venvLLMSearch/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 321, in _handle_cache_miss
computed_value = self._info.func(*func_args, **func_kwargs)
File "/home/bc/Projects/OpenSource/llm-search/src/llmsearch/webapp.py", line 112, in load_config
return Config(**config_dict)
File "pydantic/main.py", line 341, in pydantic.main.BaseModel.init

Question.

Is this compatible with Windows 11, and CUDA 11.8?

documentation updates - update GGML to GGUF for llama.cpp

Worth noting that in your documentation, you mention GGML files with llama.cpp.
GGML is no longer supported in llama.cpp. However, you can load the exact same model as GGUF and it will work just fine.
I feel this needs to be clarified in your documentation so it's up to date. If you follow your docs and download the GGML version of the llm, it won't work.
Just modify that section of the readme to say "load GGUF file", and note that the extension of GGUF files is .gguf, not .bin.
Simple fix but helpful to new users.

CMake failed

I tried to follow the build/install instructions. It failed whilst building vendor/llama.cpp:

Building wheels for collected packages: llmsearch, llama-cpp-python
  Building wheel for llmsearch (pyproject.toml) ... done
  Created wheel for llmsearch: filename=llmsearch-0.6.1.dev0+g5243360.d20240214-py3-none-any.whl size=51977 sha256=88d36d6f58cf861648ea59aca6b622c60902d85e9ed073303d41fe789458c350
  Stored in directory: /home/brad/.cache/pip/wheels/19/34/30/d9d88eb34ce7925c34d871c614b71065630f4ba7aff90abcf6
  Building wheel for llama-cpp-python (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [44 lines of output]
      *** scikit-build-core 0.8.0 using CMake 3.28.3 (wheel)
      *** Configuring CMake...
      loading initial cache file /tmp/tmpof6qnuza/build/CMakeInit.txt
      -- The C compiler identification is GNU 11.3.0
      -- The CXX compiler identification is GNU 11.3.0
      -- Detecting C compiler ABI info
      -- Detecting C compiler ABI info - done
      -- Check for working C compiler: /usr/bin/cc - skipped
      -- Detecting C compile features
      -- Detecting C compile features - done
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /usr/bin/c++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Found Git: /usr/bin/git (found version "2.41.0")
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
      -- Found Threads: TRUE
      -- Could not find nvcc, please set CUDAToolkit_ROOT.
      CMake Warning at vendor/llama.cpp/CMakeLists.txt:381 (message):
        cuBLAS not found


      -- CUDA host compiler is GNU
      CMake Error at vendor/llama.cpp/CMakeLists.txt:784 (get_flags):
        get_flags Function invoked with incorrect arguments for function named:
        get_flags


      -- ccache found, compilation results will be cached. Disable with LLAMA_CCACHE=OFF.
      -- CMAKE_SYSTEM_PROCESSOR: x86_64
      -- x86 detected
      CMake Warning (dev) at CMakeLists.txt:21 (install):
        Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
      This warning is for project developers.  Use -Wno-dev to suppress it.

      CMake Warning (dev) at CMakeLists.txt:30 (install):
        Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
      This warning is for project developers.  Use -Wno-dev to suppress it.

      -- Configuring incomplete, errors occurred!

      *** CMake configuration failed
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llama-cpp-python
Successfully built llmsearch
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects

Any ideas?
