upstash / degree-guru

AI chatbot for expert answers on university degrees

Home Page: https://degreeguru.vercel.app/

JavaScript 5.74% TypeScript 55.60% CSS 2.00% Python 36.10% Dockerfile 0.56%
ai python rag vercel-ai

degree-guru's Issues

Additional Scrollbar Fix

Remove overflow-y-scroll from the main tag in page.tsx; it displays an additional scrollbar area that is never used.

scrape doesn't crawl any pages?

I tried to get Scrapy to crawl a basic website, but it doesn't seem to crawl anything. At first I thought it was due to the Vercel deploy, but nothing happens even on a basic droplet. The documentation is also a bit sparse. Any idea what could be wrong?

2024-05-13 17:24:26 [scrapy.utils.log] INFO: Scrapy 2.11.0 started (bot: degreegurucrawler)
2024-05-13 17:24:26 [scrapy.utils.log] INFO: Versions: lxml 5.2.2.0, libxml2 2.12.6, cssselect 1.2.0, parsel 1.9.1, w3lib 2.1.2, Twisted 22.10.0, Python 3.12.3 (main, Apr 10 2024, 05:33:47) [GCC 13.2.0], pyOpenSSL 24.1.0 (OpenSSL 3.2.1 30 Jan 2024), cryptography 42.0.7, Platform Linux-6.8.0-31-generic-x86_64-with-glibc2.39
2024-05-13 17:24:26 [httpx] DEBUG: load_ssl_context verify=True cert=None trust_env=True http2=False
2024-05-13 17:24:26 [httpx] DEBUG: load_verify_locations cafile='/root/scrape/venv/lib/python3.12/site-packages/certifi/cacert.pem'
2024-05-13 17:24:26 [httpx] DEBUG: load_ssl_context verify=True cert=None trust_env=True http2=False
2024-05-13 17:24:26 [httpx] DEBUG: load_verify_locations cafile='/root/scrape/venv/lib/python3.12/site-packages/certifi/cacert.pem'
2024-05-13 17:24:26 [httpcore.connection] DEBUG: connect_tcp.started host='adjusted-quagga-67119-eu1-vector.upstash.io' port=443 local_address=None timeout=5.0 socket_options=None
2024-05-13 17:24:26 [httpcore.connection] DEBUG: connect_tcp.complete return_value=<httpcore._backends.sync.SyncStream object at 0x7189d254c6b0>
2024-05-13 17:24:26 [httpcore.connection] DEBUG: start_tls.started ssl_context=<ssl.SSLContext object at 0x7189d252c750> server_hostname='adjusted-quagga-67119-eu1-vector.upstash.io' timeout=5.0
2024-05-13 17:24:26 [httpcore.connection] DEBUG: start_tls.complete return_value=<httpcore._backends.sync.SyncStream object at 0x7189d2921070>
2024-05-13 17:24:26 [httpcore.http11] DEBUG: send_request_headers.started request=<Request [b'POST']>
2024-05-13 17:24:26 [httpcore.http11] DEBUG: send_request_headers.complete
2024-05-13 17:24:26 [httpcore.http11] DEBUG: send_request_body.started request=<Request [b'POST']>
2024-05-13 17:24:26 [httpcore.http11] DEBUG: send_request_body.complete
2024-05-13 17:24:26 [httpcore.http11] DEBUG: receive_response_headers.started request=<Request [b'POST']>
2024-05-13 17:24:26 [httpcore.http11] DEBUG: receive_response_headers.complete return_value=(b'HTTP/1.1', 200, b'OK', [(b'Date', b'Mon, 13 May 2024 17:24:26 GMT'), (b'Content-Type', b'application/json'), (b'Content-Length', b'270'), (b'Connection', b'keep-alive'), (b'Strict-Transport-Security', b'max-age=31536000; includeSubDomains')])
2024-05-13 17:24:26 [httpx] INFO: HTTP Request: POST https://MYVECTORURL.vector.upstash.io/info "HTTP/1.1 200 OK"
2024-05-13 17:24:26 [httpcore.http11] DEBUG: receive_response_body.started request=<Request [b'POST']>
2024-05-13 17:24:26 [httpcore.http11] DEBUG: receive_response_body.complete
2024-05-13 17:24:26 [httpcore.http11] DEBUG: response_closed.started
2024-05-13 17:24:26 [httpcore.http11] DEBUG: response_closed.complete
Creating a vector index at https://MYVECTORURL.vector.upstash.io.
Vector store info before crawl: InfoResult(vector_count=0, pending_vector_count=0, index_size=0, dimension=1536, similarity_function='DOT_PRODUCT', namespaces={'': NamespaceInfo(vector_count=0, pending_vector_count=0)})
2024-05-13 17:24:26 [scrapy.addons] INFO: Enabled addons:
[]
2024-05-13 17:24:26 [asyncio] DEBUG: Using selector: EpollSelector
2024-05-13 17:24:26 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor
2024-05-13 17:24:26 [scrapy.utils.log] DEBUG: Using asyncio event loop: asyncio.unix_events._UnixSelectorEventLoop
2024-05-13 17:24:26 [scrapy.extensions.telnet] INFO: Telnet Password: a8d1a25a67da58af
2024-05-13 17:24:26 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats']
2024-05-13 17:24:26 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'degreegurucrawler',
'DEPTH_LIMIT': 3,
'FEED_EXPORT_ENCODING': 'utf-8',
'NEWSPIDER_MODULE': 'degreegurucrawler.spiders',
'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',
'SPIDER_MODULES': ['degreegurucrawler.spiders'],
'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'}
2024-05-13 17:24:26 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2024-05-13 17:24:26 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2024-05-13 17:24:26 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2024-05-13 17:24:26 [scrapy.core.engine] INFO: Spider opened
2024-05-13 17:24:26 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-05-13 17:24:26 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2024-05-13 17:24:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://WEBSITEURL.com> (referer: None)
2024-05-13 17:24:26 [scrapy.core.engine] INFO: Closing spider (finished)
2024-05-13 17:24:26 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 217,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 11688,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'elapsed_time_seconds': 0.223009,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2024, 5, 13, 17, 24, 26, 770066, tzinfo=datetime.timezone.utc),
'httpcompression/response_bytes': 65674,
'httpcompression/response_count': 1,
'log_count/DEBUG': 4,
'log_count/INFO': 10,
'memusage/max': 89600000,
'memusage/startup': 89600000,
'response_received_count': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2024, 5, 13, 17, 24, 26, 547057, tzinfo=datetime.timezone.utc)}
2024-05-13 17:24:26 [scrapy.core.engine] INFO: Spider closed (finished)
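
One way to read the stats dump above: scheduler/enqueued is 1, so only the start URL was ever scheduled and no follow-up requests were queued. The sketch below is a hypothetical helper (diagnose_crawl is not part of Scrapy or of this repo) that turns such a stats dictionary into a short diagnosis. It assumes a single start URL by default and only reads stat keys that Scrapy reports. It narrows down where to look but does not identify the root cause.

```python
def diagnose_crawl(stats: dict, start_url_count: int = 1) -> str:
    """Summarize a Scrapy stats dump (hypothetical helper, not a Scrapy API)."""
    enqueued = stats.get("scheduler/enqueued", 0)
    responses = stats.get("response_received_count", 0)
    offsite = stats.get("offsite/filtered", 0)        # set by OffsiteMiddleware
    duplicates = stats.get("dupefilter/filtered", 0)  # set by the duplicate filter

    if responses == 0:
        return "no responses received: check the start URL and network access"
    if enqueued <= start_url_count:
        # Nothing beyond the start URLs reached the scheduler.
        if offsite:
            return f"{offsite} requests were filtered as offsite: check the allowed domains"
        if duplicates:
            return f"{duplicates} requests were filtered as duplicates"
        return "only the start URL was scheduled: no links were extracted or followed"
    return f"{responses} responses received from {enqueued} scheduled requests"


# Values copied from the log above.
stats_from_log = {
    "downloader/request_count": 1,
    "response_received_count": 1,
    "scheduler/enqueued": 1,
    "scheduler/dequeued": 1,
    "finish_reason": "finished",
}
print(diagnose_crawl(stats_from_log))
```

If the result points at link extraction, the next things to compare are the link-extractor rules and allowed domains in the crawler configuration against the links actually present in the fetched page.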

httpx.UnsupportedProtocol

I have followed all the steps in the README, but I end up with this error every time I run the following command from the README:

scrapy crawl configurable --logfile degreegurucrawl.log

Do you know why? I haven't found a solution yet.

httpx.UnsupportedProtocol: Request URL is missing an 'http://' or 'https://' protocol.

An example of the crawl.yaml:
[Screenshot, 2024-03-15 at 21:34:34]
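
The error text says that a URL handed to httpx has no scheme. Which value is at fault cannot be determined from the report. In the log of the previous issue httpx is used for the request to the Upstash Vector endpoint, so the vector index URL is one plausible candidate, but that is an assumption. A minimal sketch of a check that could be run on any configured URL before the crawl starts (check_url_scheme is a hypothetical helper, not part of httpx or this repo):

```python
from urllib.parse import urlparse


def check_url_scheme(url: str) -> str:
    """Return the stripped URL, or raise if it lacks an http(s) scheme or a host."""
    cleaned = url.strip()
    parsed = urlparse(cleaned)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(
            f"URL {url!r} must start with 'http://' or 'https://' and include a host"
        )
    return cleaned


# Example values are placeholders, not real endpoints.
print(check_url_scheme("https://example.com"))
try:
    check_url_scheme("example.com")
except ValueError as error:
    print(error)
```

An empty or unset value fails the same check, so it also catches a configuration variable that was never filled in.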

Suggestions for this repo

This repo is amazing and thank you for it. My suggestions are:

  1. Somehow merge or refactor this repo with Vercel's own https://github.com/vercel/ai-chatbot, as it has some additional features and UI improvements. It already uses Vercel KV (which is Upstash), but it is missing the Upstash Vector database for RAG. Combining KV for login and saved chats with Vector for RAG would be very powerful.

  2. Instead of only providing code for scraping, I think it would be better to store the data that can be vectorized in a Redis KV from Upstash. That would allow editing the content and updating a vector. I don't think scraping is the best approach, especially if you can't edit the content before creating the vector entries.

Hallucinations and unresponsiveness

After several crawls of the same website, the answers are not accurate at all. I have tried changing models, but I still have not achieved good results.

Thanks!

ModuleNotFoundError: No module named 'langchain'

Hi,

I am completely new to this, and when I try to run the project on my iMac, I get this error: ModuleNotFoundError: No module named 'langchain'

Do I need to install something?
I have installed Ollama.

[Screenshot, 2024-02-19 at 6:34:42 PM]

Thanks,
Wallace.
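
ModuleNotFoundError means the Python interpreter running the project cannot find the langchain package in its environment. Installing Ollama does not install Python packages. A minimal sketch for checking which packages are missing from the active environment (missing_modules is a hypothetical helper, and the names passed to it are only examples):

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "json" ships with Python; "langchain" is the module named in the error.
print(missing_modules(["json", "langchain"]))
```

Any name printed here has to be installed into the same environment that runs the project, for example with pip install langchain, or from the project's dependency list if it provides one.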
