
caption-transcript-summarize's Introduction

Caption Transcript Summarize

Python scripts that parse .vtt or .txt transcripts from Zoom meetings or YouTube videos and condense them into summaries, topic lists, and action items via ChatGPT.

With these scripts, a one-hour meeting with ~6000 spoken words can be summarized down to ~1500 words.


Project Functionality

The project consists of two separate Python scripts. The first script converts the .vtt or .txt caption file into a plain text transcript. The second script splits the transcript into 2000-character chunks and generates a summary and action items using ChatGPT.
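
The chunking step can be illustrated with a minimal sketch. The function name chunk_text and the choice to break on whitespace instead of cutting mid-word are assumptions made for illustration; the actual script may split the transcript differently.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 2000) -> List[str]:
    """Split text into chunks of at most max_chars characters.

    Hypothetical helper: breaks on whitespace so words stay intact,
    and hard-splits only words that are longer than max_chars.
    """
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # A single oversized word cannot fit in any chunk, so cut it up.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) > max_chars:
            # Adding this word would overflow: close the current chunk.
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk returned by this sketch could then be sent to the model separately, and the per-chunk results combined into the final summary.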

Installation

This project is written in Python and requires pip to install the necessary packages:

  • pip install openai
  • pip install tqdm

Usage

Use the following commands to convert the .vtt file and generate the summary:

  • python caption-to-transcript.py input.vtt
  • python summarize-transcript.py output.txt

Remember to replace 'your_api_key' in the summarize-transcript.py script with your actual OpenAI API key. The output will be printed to the console. You can redirect the output to a file if you wish to save it.

How to run summarize-transcript.py

Follow these steps to run the summarize-transcript.py script:

  1. Make sure you have Python installed on your computer. You can check by running the following command in your terminal or command prompt:
  • python --version

If you don't have Python installed, you can download it from the official website: https://www.python.org/downloads/

  2. Install the required libraries. Open your terminal or command prompt, navigate to the directory containing the summarize-transcript.py script, and run the following commands:
  • pip install openai
  • pip install tqdm

  3. Rename config_template.py to config.py, open it in a text editor, and replace the placeholder your_api_key with your actual OpenAI API key. You can obtain an API key by signing up for an account on the OpenAI website: https://beta.openai.com/signup/

  4. In the terminal or command prompt, navigate to the directory containing both the summarize-transcript.py script and the transcript file (generated using caption-to-transcript.py, or any other transcript file you'd like to summarize). Run the following command:

  • python summarize-transcript.py transcript_file.txt

Replace transcript_file.txt with the name of your transcript file. The script will generate a summary and a bullet list of topics discussed and action items, then save the result in a text file in the same directory with a format like YYMMDD_meeting_saved_closed_caption_transcript_summary.txt.
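
As a rough illustration of how the output name relates to the input, here is a minimal sketch. It assumes the summary file name is the transcript's base name with _summary appended, which is consistent with the example name above but is not confirmed by the script itself. The helper names summary_path and save_summary are hypothetical.

```python
from pathlib import Path


def summary_path(transcript_file: str) -> Path:
    """Return a sibling path named '<transcript stem>_summary.txt'.

    Assumption: the summary is written next to the transcript.
    """
    p = Path(transcript_file)
    return p.with_name(f"{p.stem}_summary.txt")


def save_summary(transcript_file: str, summary_text: str) -> Path:
    """Write summary_text next to the transcript and return the new path."""
    out = summary_path(transcript_file)
    out.write_text(summary_text, encoding="utf-8")
    return out
```

Under this assumption, a transcript whose name starts with a YYMMDD date would yield a summary file that keeps the same date prefix.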

caption-transcript-summarize's People

Contributors

mylesdebastion


caption-transcript-summarize's Issues

caption-to-transcript.py

I didn't have a native WebVTT file lying around, so I converted an SRT to VTT using Subtitle Edit.

The first WebVTT output format option (without #) came out mostly fine; the one small problem was a lot of double spaces between words.

The second WebVTT output format option (with #) came out a mess, with # and time codes still remaining in the text.

It would be cool if the script could work with WebVTT files other than those from Zoom.

I'd love to see the script work with SRT files, too.

Something else I noticed is that my output files still had .vtt extensions instead of the expected .txt on macOS.

examples.zip
formatted WebVTT files though.
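
One way the behaviour requested in this issue could be approached is sketched below. This is not the code in caption-to-transcript.py: the names TIMING, TAG, captions_to_text, and transcript_path are hypothetical. The sketch assumes cues are separated by blank lines, keeps only the lines that follow a timing line (so headers, cue numbers, and cue identifiers are dropped), accepts both the WebVTT (.) and SRT (,) millisecond separators, collapses repeated spaces, and always gives the output a .txt extension.

```python
import re
from pathlib import Path

# Cue timing line, e.g. "00:00:01.000 --> 00:00:03.000" (WebVTT, hours optional)
# or "00:00:01,000 --> 00:00:03,000" (SRT).
TIMING = re.compile(
    r"^\s*(?:\d+:)?\d{2}:\d{2}[.,]\d{3}\s+-->\s+(?:\d+:)?\d{2}:\d{2}[.,]\d{3}"
)
# Inline markup such as <i>...</i> or WebVTT voice tags like <v Speaker>.
TAG = re.compile(r"<[^>]+>")


def captions_to_text(raw: str) -> str:
    """Return the spoken text of a WebVTT or SRT file as a single line."""
    raw = raw.replace("\r\n", "\n").replace("\r", "\n").lstrip("\ufeff")
    pieces = []
    for block in re.split(r"\n\s*\n", raw):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            if TIMING.match(line):
                # Everything after the timing line is cue text; anything
                # before it (cue number or identifier) is discarded.
                pieces.extend(lines[i + 1:])
                break
        # Blocks without a timing line (header, NOTE, STYLE) are skipped.
    text = TAG.sub("", " ".join(pieces))
    # Collapse runs of whitespace, including double spaces between words.
    return " ".join(text.split())


def transcript_path(caption_path: str) -> Path:
    """Swap the caption extension (.vtt, .srt, ...) for .txt."""
    return Path(caption_path).with_suffix(".txt")
```

Because the sketch keys on timing lines and not on Zoom-specific layout, the same function would apply to files exported from other tools, provided their cues follow the blank-line-separated structure assumed here.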
