benthecoder / classgpt

ChatGPT for lecture slides

Home Page: https://benneo.super.site/

License: MIT License

Python 16.59% Jupyter Notebook 82.55% Dockerfile 0.86%
chatgpt gpt openai python llama-index langchain

classgpt's Introduction

ClassGPT

ChatGPT for my lecture slides

Built with Streamlit, powered by LlamaIndex and LangChain.

Uses the latest ChatGPT API from OpenAI.

Inspired by AthensGPT

App Demo

demo.mp4

How this works

  1. Parses the PDF with pypdf
  2. Builds an index with LlamaIndex's GPTSimpleVectorIndex
  3. Stores indexes and files on S3
  4. Queries the index
    • uses the latest ChatGPT model, gpt-3.5-turbo
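
As a rough illustration of the first step only, the sketch below pulls plain text out of a PDF with pypdf's `PdfReader` and `extract_text()`. The helper names `pages_to_text` and `load_pdf_text` are hypothetical and are not taken from the ClassGPT codebase; how the app actually feeds text into LlamaIndex is not shown here.

```python
def pages_to_text(reader) -> str:
    """Join the text of every page of a pypdf PdfReader-like object.

    extract_text() can return an empty result for image-only slides,
    so missing text is replaced with an empty string.
    """
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def load_pdf_text(path: str) -> str:
    """Open a PDF from disk and return its text (requires `pip install pypdf`)."""
    from pypdf import PdfReader  # imported lazily so the helper above has no dependency

    return pages_to_text(PdfReader(path))
```

Lecture slides that are mostly images will yield little or no text this way, so the quality of the answers depends on how much extractable text the PDF contains.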

Usage

Configuration and secrets

  1. Configure AWS (quickstart)
    aws configure
  2. Create an S3 bucket with a unique name

  3. Change the bucket name in the codebase (look for bucket_name = "classgpt") to whatever you created.

  4. Rename .env.local.example to .env and add your OpenAI credentials
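
To check that the .env file is picked up as expected, a minimal stdlib-only sketch like the one below can parse it. The function name `load_env` is hypothetical, and the variable name `OPENAI_API_KEY` in the usage note is an assumption: it is the name the openai Python library reads by default, but the keys in .env.local.example may differ.

```python
from pathlib import Path


def load_env(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```

For example, `"OPENAI_API_KEY" in load_env(".env")` is a quick way to confirm the key is present before starting the app (assuming that is the key name the project uses).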

Locally

  1. Create a Python environment
    conda create -n classgpt python=3.9
    conda activate classgpt
  2. Install dependencies
    pip install -r requirements.txt
  3. Run the Streamlit app
    cd app/
    streamlit run 01_❓_Ask.py

Docker

Alternatively, you can use Docker:

    docker compose up

Then open up a new tab and navigate to http://localhost:8501/

TODO

  • local mode for app (no s3)
    • global variable use_s3 to toggle between local and s3 mode
  • deploy app to streamlit cloud
    • have input box for openai key
    • uses pyarrow local FS to store files
  • update code for new langchain update
  • Custom prompts and tweak settings
    • create a settings page for tweaking model parameters and provide custom prompts example
  • Add ability to query on multiple files

FAQ

Tokens

Tokens can be thought of as pieces of words. Before the API processes the prompts, the input is broken down into tokens. These tokens are not cut up exactly where the words start or end - tokens can include trailing spaces and even sub-words. Here are some helpful rules of thumb for understanding tokens in terms of lengths:

  • 1 token ~= 4 chars in English
  • 1 token ~= ¾ words
  • 100 tokens ~= 75 words
  • 1-2 sentences ~= 30 tokens
  • 1 paragraph ~= 100 tokens
  • 1,500 words ~= 2048 tokens
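
The rules of thumb above can be turned into a quick estimator. This is only a sketch of the heuristics (4 characters per token, ¾ of a word per token), not a real tokenizer; the function name `estimate_tokens` is hypothetical, and actual counts from the OpenAI tokenizer will differ.

```python
def estimate_tokens(text: str) -> dict:
    """Estimate token counts for English text using two rules of thumb."""
    chars = len(text)
    words = len(text.split())
    return {
        "by_chars": round(chars / 4),      # 1 token ~= 4 characters
        "by_words": round(words * 4 / 3),  # 1 token ~= 3/4 of a word
    }
```

The two estimates usually disagree slightly; taking the larger one gives a more conservative figure when budgeting a prompt.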

Try the OpenAI Tokenizer tool

Source

Embeddings

An embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness. Small distances suggest high relatedness and large distances suggest low relatedness.
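
As a small illustration of "distance measures relatedness", the sketch below computes cosine distance between two vectors using only the standard library. Cosine distance is one common choice; the text above does not say which metric ClassGPT or LlamaIndex uses, and the short vectors in the usage note are made up for illustration (real text-embedding-ada-002 vectors have 1536 dimensions).

```python
import math


def cosine_distance(a, b) -> float:
    """Return 1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)
```

With toy vectors, `cosine_distance([1.0, 0.0], [0.9, 0.1])` is smaller than `cosine_distance([1.0, 0.0], [0.0, 1.0])`, which matches the idea that a smaller distance suggests higher relatedness.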

For text-embedding-ada-002, the cost is $0.0004 / 1K tokens, or roughly 3,000 pages per dollar

Models

For the gpt-3.5-turbo model (ChatGPT API), the cost is $0.002 / 1K tokens

For the text-davinci-003 model, the cost is $0.02 / 1K tokens
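
The per-1K-token prices listed in this FAQ translate into a simple cost estimate. The sketch below hard-codes those listed prices; the names `PRICE_PER_1K_TOKENS` and `estimate_cost` are hypothetical, and current OpenAI pricing may differ from these figures.

```python
# Prices in USD per 1,000 tokens, copied from the figures quoted in this FAQ.
PRICE_PER_1K_TOKENS = {
    "text-embedding-ada-002": 0.0004,
    "gpt-3.5-turbo": 0.002,
    "text-davinci-003": 0.02,
}


def estimate_cost(model: str, tokens: int) -> float:
    """Return the estimated cost in USD for processing `tokens` tokens."""
    return tokens / 1000 * PRICE_PER_1K_TOKENS[model]
```

At $0.0004 per 1K tokens, one dollar covers 2,500,000 embedding tokens, so the "3,000 pages per dollar" figure corresponds to roughly 800 tokens per page.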

References

Streamlit

Deployment

LlamaIndex

Loading data

multimodal

ChatGPT

Langchain

Boto3

Docker stuff

classgpt's Issues

Access Denied

Hi, when I set up the environment, I found that bucket names must be unique in the global namespace, so I cannot create a bucket named classgpt.
I don't know if this is the reason for getting "Access Denied" (ClientError: An error occurred (AccessDenied) when calling the ListObjects operation: Access Denied) when I ran streamlit.
