
deutschey / ai-mreflow


This project forked from vdutts7/ai-mreflow


YouTubeGPT - AI Chat with 100+ videos ft. YouTuber Matt Wolfe (@mreflow) 🐺🟣🤖💬

Home Page: https://mreflow-ai.vercel.app/

Shell 1.01% JavaScript 14.68% Python 32.66% TypeScript 46.19% CSS 5.47%




YouTubeGPT ft. Matt Wolfe (@mreflow)

AI Chatbot with 100+ videos from YouTuber Matt Wolfe @mreflow

Table of Contents

    📝 About
    💻 How to build
    🚀 Next steps
    🔧 Built With
    👀 Contact



📝 About

Chat with 100+ YouTube videos from any creator in less than 10 minutes. This project combines basic Python scripting, vector embeddings, OpenAI, Pinecone, and LangChain into a modern chat interface, so you can quickly reference any content your favorite YouTuber covers. Ask questions in natural language and get back detailed answers (1) in the style and tone of your YouTuber and (2) with hyperlinks to the top 2-3 videos referenced.


💻 How to build

Note: these instructions are for macOS; adjust them accordingly for Windows / Linux.

Initial setup

Clone and install dependencies:

git clone https://github.com/vdutts7/ai-mreflow
cd ai-mreflow
npm i

Copy .env.example to a new file named .env in the root directory and fill in the API keys:

ASSEMBLY_AI_API_TOKEN=""
OPENAI_API_KEY=""
PINECONE_API_KEY=""
PINECONE_ENVIRONMENT=""
PINECONE_INDEX=""
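Before running anything, it can help to confirm that none of these keys was left empty. The following is a minimal sketch, not a script from the repository: it assumes the .env file uses only simple KEY="value" lines like the ones above (no multi-line values or variable expansion), and the function names are hypothetical.

```python
from pathlib import Path

# The keys listed in .env.example.
REQUIRED_KEYS = [
    "ASSEMBLY_AI_API_TOKEN",
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
    "PINECONE_ENVIRONMENT",
    "PINECONE_INDEX",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY="value" lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding double or single quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(text: str) -> list[str]:
    """Return the required keys that are absent or still empty."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        missing = missing_keys(env_path.read_text())
        print("Missing keys:", ", ".join(missing) or "none")
    else:
        print(".env not found; copy .env.example first")
```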

Get API keys:

IMPORTANT: Verify that .gitignore contains .env so your API keys are never committed.
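The authoritative check is `git check-ignore -v .env`, which prints the matching rule when the file is ignored. For a quick scripted check, here is a simplified sketch (hypothetical helper, not part of the repository) that only recognises a few exact patterns and ignores negations such as `!.env` and nested .gitignore files.

```python
from pathlib import Path

# Simplified: only these exact lines are treated as ignoring the root .env
# file. Real gitignore matching (negation, nested paths) is richer.
ENV_PATTERNS = {".env", "/.env", "*.env", ".env*"}


def gitignore_covers_env(text: str) -> bool:
    """Return True if a .gitignore body contains a line that ignores .env."""
    lines = {line.strip() for line in text.splitlines()}
    return bool(lines & ENV_PATTERNS)


if __name__ == "__main__":
    path = Path(".gitignore")
    ok = path.exists() and gitignore_covers_env(path.read_text())
    print(".env is ignored" if ok else "WARNING: add .env to .gitignore")
```

Note that a pattern like `.env*.local` (common in Next.js projects) does not cover a plain `.env`, which is why the sketch rejects it.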

Handle massive data

Outline:

  • Export metadata (.csv) of YouTube videos ⬇️
  • Download the audio files
  • Transcribe audio files

Navigate to the scripts folder, which will host all of the data from the YouTube videos:

cd scripts

Set up the Python environment:

conda env list
conda activate youtube-chat
pip install -r requirements.txt

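Scrape the YouTube channel: replace <username> with the handle of your choice (mreflow for this project) and <k-last-vids> with the number of videos you want included (the script traverses backwards, starting from the most recent upload). A new CSV file is created at the path given as the last argument. The commands from here on use paths relative to the repository root, so run them from there (cd .. if you are still inside scripts).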

python scripts/scrape_vids.py https://www.youtube.com/@<username> <k-last-vids> scripts/vid_list/<your-csv-file>.csv
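The "keep the k most recent uploads" step can be sketched as follows. This is an illustration under assumptions, not the logic of scrape_vids.py: fetching the channel's upload list is not shown, the input is assumed to be (title, url) pairs ordered newest-first, and the `title,url` header is hypothetical (example.csv defines the real columns).

```python
import csv
import io
from itertools import islice
from typing import Iterable


def last_k_csv(videos: Iterable[tuple[str, str]], k: int) -> str:
    """Return CSV text for the first k (title, url) pairs.

    `videos` must be ordered newest-first, so taking the first k items walks
    backwards from the most recent upload.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["title", "url"])  # hypothetical header; check example.csv
    writer.writerows(islice(videos, k))
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up sample data, newest upload first.
    sample = [
        ("Newest video", "https://www.youtube.com/watch?v=AAA"),
        ("Older video", "https://www.youtube.com/watch?v=BBB"),
        ("Oldest video", "https://www.youtube.com/watch?v=CCC"),
    ]
    print(last_k_csv(sample, 2), end="")
```

Using the csv module rather than joining strings matters here because video titles often contain commas; those fields get quoted automatically.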

Refer to example.csv inside the folder and verify that your output matches its format.

Download audio files:

python scripts/download_yt_audios.py scripts/vid_list/<your-csv-file>.csv scripts/audio_files/

We use AssemblyAI's API, through a wrapper class, for speech-to-text instead of OpenAI's Whisper API. Their script provides step-by-step directions for a more efficient, faster conversion; Whisper was far too slow for this volume of audio and would have cost more. I spent about $3.50 to transcribe the 112 videos for Matt Wolfe.

python scripts/transcribe_audios.py scripts/audio_files/ scripts/transcripts
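To budget for another channel, the reported figure works out to $3.50 / 112 = $0.03125 per video. The sketch below scales that linearly; it assumes the new channel's videos are roughly as long as Matt Wolfe's, since transcription is priced by audio length rather than by video count.

```python
# Figures reported above: about $3.50 for 112 Matt Wolfe videos.
OBSERVED_COST_USD = 3.50
OBSERVED_VIDEOS = 112


def estimate_cost(n_videos: int) -> float:
    """Scale the observed per-video cost linearly to n_videos (rough estimate)."""
    per_video = OBSERVED_COST_USD / OBSERVED_VIDEOS  # 0.03125 USD per video
    return round(per_video * n_videos, 2)


if __name__ == "__main__":
    for n in (112, 200, 500):
        print(f"{n} videos: ~${estimate_cost(n):.2f}")
```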

Upsert the transcripts to the Pinecone database:

python scripts/pinecone_helper.py scripts/vid_list/<your-csv-file>.csv scripts/transcripts/

The Pinecone index setup I used: a P1 pod, since P1 is optimized for speed, and a dimension of 1536, which is the size of OpenAI's embedding vectors and therefore what the index must use for both storing and querying data in the vectorstore.
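Because the index dimension is fixed at creation, a vector of any other length will be rejected. A cheap guard before upserting is sketched below; the helper name is hypothetical and not part of pinecone_helper.py.

```python
from typing import Sequence

# Dimension configured on the Pinecone index; it has to equal the length of
# the OpenAI embedding vectors being upserted and queried.
INDEX_DIMENSION = 1536


def check_dimensions(vectors: Sequence[Sequence[float]],
                     dimension: int = INDEX_DIMENSION) -> None:
    """Raise ValueError naming the first vector whose length != dimension."""
    for i, vec in enumerate(vectors):
        if len(vec) != dimension:
            raise ValueError(
                f"vector {i} has {len(vec)} values, index expects {dimension}"
            )


if __name__ == "__main__":
    check_dimensions([[0.0] * 1536])  # passes silently
    try:
        check_dimensions([[0.0] * 1536, [0.0] * 768])
    except ValueError as err:
        print(err)
```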

Embeddings and database backend

Breaking down scripts/pinecone_helper.py:

  • Chunk size of 1000 characters with a 500-character overlap. This worked for me, but experiment and adjust according to your content library's size, complexity, etc.
  • Metadata: (1) video url and (2) video title
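The chunking scheme above can be illustrated with a plain sliding window. This is a simplified sketch, not the splitter pinecone_helper.py actually uses (LangChain's text splitters also try to break on separators such as newlines); the `Chunk` type and function name are hypothetical. With a 500-character overlap, the window advances 500 characters at a time, so most text lands in two chunks and the vector count roughly doubles.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    metadata: dict[str, str]  # video url and title, as listed above


def chunk_transcript(text: str, url: str, title: str,
                     chunk_size: int = 1000, overlap: int = 500) -> list[Chunk]:
    """Split text into fixed-size character windows overlapping by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # 500 with the defaults
    chunks: list[Chunk] = []
    for start in range(0, len(text), step):
        chunks.append(Chunk(text[start:start + chunk_size],
                            {"url": url, "title": title}))
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the transcript
    return chunks


if __name__ == "__main__":
    parts = chunk_transcript("abcdefghij", "https://youtu.be/x", "Demo",
                             chunk_size=4, overlap=2)
    print([c.text for c in parts])  # windows of 4 chars, stepping by 2
```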

With the Pinecone vectorstore loaded, we use LangChain's Conversational Retrieval QA chain to ask questions, pull the relevant chunks and their metadata from our embeddings, and return them to the user packaged as an answer.
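In the app, the similarity search runs inside Pinecone, and the conversational chain first has the LLM rewrite a follow-up question plus the chat history into a standalone question before embedding it. The retrieval step itself can be sketched in pure Python as "rank stored vectors by similarity to the query vector and keep the top k". Cosine similarity is an assumption here (the index metric is not stated above), and the toy 2-dimensional vectors stand in for real 1536-dimensional embeddings.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: Sequence[float],
          store: list[tuple[Sequence[float], dict]],
          k: int = 3) -> list[dict]:
    """Return the metadata of the k stored vectors most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(query, item[0]), reverse=True)
    return [meta for _, meta in ranked[:k]]


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; real ones have 1536 values.
    store = [
        ([1.0, 0.0], {"title": "Video A"}),
        ([0.0, 1.0], {"title": "Video B"}),
        ([0.9, 0.1], {"title": "Video C"}),
    ]
    print(top_k([1.0, 0.0], store, k=2))
```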

The relevant video titles are cited via hyperlinks directly to the video url.
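Several retrieved chunks often come from the same video, so the citations need de-duplicating before they are shown. A sketch of turning chunk metadata into the top 2-3 unique links follows; the Markdown `[title](url)` output format and the function name are assumptions, not necessarily what the frontend renders.

```python
def format_sources(metadatas: list[dict[str, str]], limit: int = 3) -> list[str]:
    """Turn retrieved chunk metadata into at most `limit` unique Markdown links.

    Links are de-duplicated by URL while keeping retrieval order, so the most
    relevant video is cited first.
    """
    seen: set[str] = set()
    links: list[str] = []
    for meta in metadatas:
        url = meta["url"]
        if url in seen:
            continue
        seen.add(url)
        links.append(f"[{meta['title']}]({url})")
        if len(links) == limit:
            break
    return links


if __name__ == "__main__":
    retrieved = [
        {"url": "https://youtu.be/a", "title": "Video A"},
        {"url": "https://youtu.be/a", "title": "Video A"},
        {"url": "https://youtu.be/b", "title": "Video B"},
    ]
    print(format_sources(retrieved))
```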

Frontend UI with chat

Next.js styled with Tailwind CSS. src/pages/index.tsx contains the base skeleton, and src/pages/api/chat-chain.ts is the heart of the code, where the LangChain connections are defined.

Run app

npm run dev

Go to http://localhost:3000. You should now be able to type and ask questions. Done ✅


🚀 Next steps

Deploy

I used Vercel as this was a relatively small project.

Alternatives: Heroku, Firebase, AWS Elastic Beanstalk, DigitalOcean, etc.

Customizations

UI/UX: change to your liking.

Bot personality: edit the prompt template in src/pages/api/chat-chain.ts to fine-tune the bot's voice and gain more control over its outputs.
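The real template lives in src/pages/api/chat-chain.ts (TypeScript). The Python sketch below only illustrates the shape of such a template: persona instructions, retrieved context, then the question. The wording, placeholder names, and `build_prompt` helper are all hypothetical.

```python
# Hypothetical template; the project's actual wording is in chat-chain.ts.
QA_TEMPLATE = """You are an assistant that answers in the style and tone of {creator}.
Use only the transcript excerpts below. If the answer is not in them, say you don't know.

Excerpts:
{context}

Question: {question}
Answer:"""


def build_prompt(creator: str, context_chunks: list[str], question: str) -> str:
    """Fill the template, separating retrieved chunks with a divider line."""
    context = "\n---\n".join(context_chunks)
    return QA_TEMPLATE.format(creator=creator, context=context, question=question)


if __name__ == "__main__":
    print(build_prompt("Matt Wolfe", ["chunk one", "chunk two"], "What is Midjourney?"))
```

Tightening the persona line or the "say you don't know" rule is usually the quickest way to change how the bot sounds and how often it answers beyond the transcripts.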


🔧 Built With

Next.js · TypeScript · Python · LangChain · OpenAI · AssemblyAI · Pinecone · Tailwind CSS · Vercel


👀 Contact

[email protected]

🔗 Project Link: https://github.com/vdutts7/ai-mreflow


Contributors: vdutts7
