andrewnguonly / lumos
A RAG LLM co-pilot for browsing the web, powered by local LLMs
License: MIT License
Using tinyllama
[GIN] 2024/02/10 - 12:33:07 | 200 | 51.673542ms | 127.0.0.1 | POST "/api/embeddings"
[GIN] 2024/02/10 - 12:33:07 | 200 | 51.70225ms | 127.0.0.1 | POST "/api/embeddings"
[GIN] 2024/02/10 - 12:33:07 | 200 | 51.951042ms | 127.0.0.1 | POST "/api/embeddings"
[GIN] 2024/02/10 - 12:33:07 | 200 | 43.755125ms | 127.0.0.1 | POST "/api/embeddings"
Maybe you can "get away with" using a smaller model just for embeddings to make things a bit more responsive?
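For example (just a sketch, assuming the extension uses LangChain's OllamaEmbeddings; the model name is only an illustration of a lighter-weight option), the embedding model could be decoupled from the chat model:

import { OllamaEmbeddings } from "@langchain/community/embeddings/ollama";

// Hypothetical split: a small model dedicated to embeddings, independent of
// whichever model is selected for chat/completions.
const embeddings = new OllamaEmbeddings({
  baseUrl: "http://localhost:11434",
  model: "all-minilm", // illustrative; any small embedding-capable model
});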
Hey! Thanks for your time on this POC.
It's not working for me. I'm waiting for instructions to debug this; the installation is not user friendly.
Apple M1 Pro
Sonoma 14.1.1
16GB RAM
Ollama working well:
ollama -v
ollama version 0.1.12
OLLAMA_ORIGINS=chrome-extension://* ollama serve
[GIN] 2023/11/30 - 11:55:01 | 400 | 174.833µs | 127.0.0.1 | POST "/api/embeddings"
[GIN] 2023/11/30 - 11:55:03 | 400 | 411.917µs | 127.0.0.1 | POST "/api/embeddings"
[GIN] 2023/11/30 - 11:55:06 | 400 | 285.208µs | 127.0.0.1 | POST "/api/embeddings"
[GIN] 2023/11/30 - 11:55:11 | 400 | 375.584µs | 127.0.0.1 | POST "/api/embeddings"
[GIN] 2023/11/30 - 11:55:19 | 400 | 382.666µs | 127.0.0.1 | POST "/api/embeddings"
When I try to use the extension, I receive the following error. I have Ollama running locally and can query it from Emacs and receive responses.
Access to fetch at 'http://127.0.0.1:11434/api/embeddings' from origin 'chrome-extension://asfdasfasdfs' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
I fed this to Llama and it suggested the following fixes. However, I don't see where in background.ts the fetch call is for me to add a no-cors mode.
The error message you encountered is related to the Cross-Origin Resource Sharing (CORS) policy in web browsers. The browser is blocking access to the http://127.0.0.1:11434/api/embeddings URL because the request originates from a different domain than the one that served the HTML document. The error message specifically states that there is no Access-Control-Allow-Origin header present in the response to the preflight request (a request made by the browser to check if the server supports CORS). As a result, the browser is blocking the request from proceeding. To fix this issue, you have two options:

Add the Access-Control-Allow-Origin header to the server response. This can be done by adding the following line to the server-side code that handles the API request:
#+BEGIN_SRC php
header('Access-Control-Allow-Origin: *');
#+END_SRC
This will allow the browser to make requests to the API from any origin.

Disable CORS checking for the specific API request by setting the mode parameter to 'no-cors' in the fetch() function. This can be done like this:
#+BEGIN_SRC javascript
fetch('http://127.0.0.1:11434/api/embeddings', {
mode: 'no-cors'
})
#+END_SRC
This will disable CORS checking for the specific API request, allowing it to proceed even though there is no Access-Control-Allow-Origin header present in the response.

Note that disabling CORS checking can be a security risk, as it allows requests from any origin to access the API. You should only use this option when you have verified that the API is being accessed from a trusted source.
This is helpful for use cases where a user just wants to prompt the LLM without any content.
Also, consider implementing function calling to retrieve a binary yes or no response.
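As a rough sketch of the yes/no idea (not true function calling, just a classification prompt against Ollama's /api/generate endpoint; the helper name and model are illustrative):

// Hypothetical helper: ask the local model for a strict yes/no answer.
const classifyYesNo = async (question: string): Promise<boolean> => {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama2", // illustrative model name
      prompt: `Answer with a single word, "yes" or "no": ${question}`,
      stream: false,
    }),
  });
  const data = await res.json();
  // Ollama returns the completion in the "response" field.
  return data.response.trim().toLowerCase().startsWith("yes");
};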
Each domain should have its own chunkSize and chunkOverlap values, and these values should be passed to the background script for processing (see the sketch below). Also, investigate whether it would be useful for each domain to have its own vector store retrieval config (e.g. number of documents to return).
https://github.com/ollama/ollama/blob/v0.1.23/api/types.go#L45
Note: This may be dependent on LangChain changes.
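A minimal sketch of the per-domain idea (the hostnames, default values, and perDomainConfig name are all illustrative, not from the repo):

// Hypothetical per-domain chunking config, keyed by hostname.
const perDomainConfig: Record<string, { chunkSize: number; chunkOverlap: number }> = {
  "news.ycombinator.com": { chunkSize: 500, chunkOverlap: 0 },
  "en.wikipedia.org": { chunkSize: 2000, chunkOverlap: 200 },
};
const defaultChunking = { chunkSize: 1000, chunkOverlap: 100 };

// Content script side: resolve the config for the current domain and pass it
// along with the prompt, mirroring the existing chrome.runtime.sendMessage call.
const { chunkSize, chunkOverlap } =
  perDomainConfig[window.location.hostname] ?? defaultChunking;
chrome.runtime.sendMessage({ prompt: "summarize this page", chunkSize, chunkOverlap });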
First of all, thank you for your excellent work, but I have run into some issues with my local deployment. My Docker backend has successfully brought up the local model service, and curl requests from the command line work fine, but when I set OLLAMA_BASE_URL and OLLAMA_MODEL in script/background.ts, the plugin does not respond in the browser.
To be specific: I run Ollama inside Docker for local model serving. Accessing it with curl from outside Docker returns results normally, but after setting the two parameters in the plugin, there is no response from the plugin inside the browser.
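For reference, a configuration sketch (the port mapping, origins value, and model name are assumptions about a typical Docker setup, not taken from this report):

// script/background.ts (sketch): the base URL must be reachable from the host
// browser, which usually means the container is started with a port mapping
// such as -p 11434:11434 and with OLLAMA_ORIGINS="chrome-extension://*" set
// inside the container; otherwise requests fail CORS with no visible response.
const OLLAMA_BASE_URL = "http://localhost:11434";
const OLLAMA_MODEL = "llama2"; // illustrative model name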
Hi @andrewnguonly, can you please try to build another project with https://github.com/pieces-app/client?
Pieces allows using LLMs to get ChatGPT-like persistent chats that can answer any questions, and it supports local LLMs. There is a /chats API endpoint.
The Chrome extension is here: https://docs.pieces.app/extensions-plugins/chrome
Can you build a similar project, but with the LLM running through their TypeScript SDK?
On long pages it seems to halt (e.g. https://news.ycombinator.com/item?id=39190468).
Maybe this is fixed in newer versions.
Might be nice to have some indication of the amount of work it's doing, a progress bar or something. I mean, you know how many chunks it needs to embed, right?
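A rough sketch of what that could look like, since the chunk count is known up front (the message type and helper name are hypothetical):

// Sketch: embed chunks one at a time and report progress to the popup.
const embedWithProgress = async (
  chunks: string[],
  embed: (text: string) => Promise<number[]>
): Promise<number[][]> => {
  const vectors: number[][] = [];
  for (const [i, chunk] of chunks.entries()) {
    vectors.push(await embed(chunk));
    // "lumos-progress" is an illustrative message type, not an existing one.
    chrome.runtime.sendMessage({ type: "lumos-progress", done: i + 1, total: chunks.length });
  }
  return vectors;
};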
I don't know the feasibility, but wondering if you can do the embedding in parallel somehow?
I suppose with an mmap'd model shared by multiple processes it could be?
But that's more of an Ollama question perhaps?
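On the client side, the requests could at least be issued in small batches (a sketch; whether the Ollama server actually processes them concurrently is the server-side question above):

// Sketch: send embedding requests in batches with Promise.all.
const embedInBatches = async (
  chunks: string[],
  embed: (text: string) => Promise<number[]>,
  batchSize = 4
): Promise<number[][]> => {
  const vectors: number[][] = [];
  for (let i = 0; i < chunks.length; i += batchSize) {
    const batch = chunks.slice(i, i + batchSize);
    vectors.push(...(await Promise.all(batch.map(embed))));
  }
  return vectors;
};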
Thanks
As the title says, can you add the License for this awesome project?
I just updated to commit 72439bf but it seems like there is a regression?
The TTL is 60 minutes but it seems like it's requesting a series of embeddings for each query.
Ok, so I uninstalled it, then reinstalled it, in case my chrome storage options got in a wonky state somehow.
It's then not showing the connection indicator (which I /was/ seeing at first!) for the model 404.
So, back to the embeddings, I've removed/installed. Once I select a model in options hopefully we are good?
Response:
Lots of embeddings (long page):
Hrmmm, it definitely seems like it's calling the embedding endpoint many times for each query. I could have sworn you were caching, that's what the TTL means, right!?
Oh, it's not cached when isHighlightedContent:
chrome.runtime.sendMessage({
prompt: prompt,
skipRAG: false,
chunkSize: config.chunkSize,
chunkOverlap: config.chunkOverlap,
url: activeTabUrl.toString(),
skipCache: isHighlightedContent,
imageURLs: imageURLs,
});
Based on:
const getHighlightedContent = (): string => {
const selection = window.getSelection();
return selection ? selection.toString().trim() : "";
};
Oh, I see! I guess it's a bit complicated to use the cache easily, eh?
Hrmmm, there are other optimizations you could do, but compared to creating completely new embeddings, what about a simple linear search over the highlighted string to see if it contains any of the chunks that would otherwise be returned by the configured parser (i.e. the "canonical" chunks)?
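Something like this, perhaps (a sketch; it assumes the canonical chunks from a prior full-page pass are available):

// Sketch: reuse cached chunk embeddings whenever the highlighted text simply
// contains chunks that were already embedded for the full page.
const reusableChunks = (highlighted: string, cachedChunks: string[]): string[] =>
  cachedChunks.filter((chunk) => highlighted.includes(chunk));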
Can it do function calling? It would automate so much stuff if it could. Please close this if it already can do that.
I have played with function calling on ChatGPT and tried to make a local ChatGPT-based tool, but I can't just let ChatGPT go to pages and do research for me (it would be too expensive).
With function calling, Lumos would be able to answer any question by sending the request to the appropriate tool.
Played with this again; did a git reset --hard to origin to update (a bit mindlessly, oops), and of course it overwrote DEFAULT_MODEL (oh, I see there's a GUI for that now in Options).
An indicator somewhere for:
Is Ollama alive?
Does it need starting? Is it responding at all?
Is the model available? (I guess on-the-fly model config is a whole other issue.)
Do origins need configuring? 403 (or whatever) Forbidden responses?
I guess you could even use the browser action icon (see the sketch below).
The thing is it just seems to quietly go about doing nothing when ollama is not running.
I have various quants of the same model; I don't know how often people would actually do that, but yeah, if they do, maybe it should show the tag as a discriminator?
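A minimal sketch of the browser-action idea (the badge text and helper name are arbitrary; it only assumes Ollama answers plain GET requests at its base URL):

// Sketch: ping the Ollama base URL from the background script and surface the
// result on the extension's action icon.
const updateOllamaBadge = async (baseUrl: string): Promise<void> => {
  try {
    const res = await fetch(baseUrl); // Ollama replies "Ollama is running" at /
    await chrome.action.setBadgeText({ text: res.ok ? "" : "ERR" });
  } catch {
    await chrome.action.setBadgeText({ text: "OFF" });
  }
};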
This is a feature request. Since sending the entire page is quite time consuming, can the extension allow sending only the selected portion of a web page to Ollama?
This would be especially helpful for processing documents such as Google Docs, as well as large web pages.