pashpashpash / manga-reader Goto Github PK

Generate a video recap of any manga volume PDF with GPT Vision and Elevenlabs narration. Discord: https://discord.gg/MMqcuDe2WZ

Home Page: https://mangarecap.ai

License: MIT License

Python 79.47% Jupyter Notebook 20.53%

accessibility comics elevenlabs manga manga-reader openai summarization video vision comic

manga-reader's Introduction

Manga Recap with GPT-4 Vision

Project Overview

This project aims to generate summaries of manga volumes by analyzing images extracted from PDF files of the manga. It uses the GPT-4 Vision API to understand the content of manga pages and produce compelling, story-telling tone summaries. The project processes PDFs to extract images, scales them to a specific size, encodes them in base64, and then uses these images as input for the GPT-4 Vision API alongside custom prompts to generate summaries. Once a summary is generated, it is sent to ElevenLabs API for narration. The resulting narration and relevant panel images are then combined to create a video recap summarizing the volume.

Join the Discord: https://discord.gg/MMqcuDe2WZ

example-1.mp4

Features

PDF processing to extract manga pages as images as well as panel extraction from within pages.
Image scaling to fit the requirements of the GPT-4 Vision API.
Base64 encoding of images for API submission.
Generating text summaries of manga volumes in a story-telling tone.
Narration of the generated summaries using the ElevenLabs API.
Video creation from the narration and relevant panel images/pages.

Prerequisites

Before you begin, ensure you have met the following requirements:

Python 3.7+
Pip3 (Python package manager)
Virtual environment (recommended)

Installation Steps

Create a virtual environment to manage your project's dependencies separately.

python3 -m venv venv

Activate the virtual environment

source venv/bin/activate

Install Required Python Packages

pip3 install -r requirements.txt

Set Up Environment Variables

Create a .env file in the root directory of your project. Add your OpenAI API key to this file:

OPENAI_API_KEY=your_openai_api_key_here
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here

Prepare Your Manga PDFs

Place your manga volume PDF files in a directory structure as expected by the script, for example, naruto/v10/v10.pdf. Additionally, you should have a chapter-reference.pdf and a profile-reference.pdf in each manga directory. For example, naruto/chapter-reference.pdf and naruto/profile-reference.pdf. These files are used by GPT vision to identify the chapter pages and character introductions, respectively, so that jobs can be split up by chapter and for characters to be identified correctly by GPT Vision.

Running the Project

To run the project, execute the app.py script from the root directory of your project:

python3 app.py --manga naruto --volume-number 10

This script processes the specified PDF files, extracts and scales images, encodes them in base64, and sends them to the GPT-4 Vision API for analysis. The summaries generated by the API are printed to the console, including the total tokens used. The script then sends the summaries to the ElevenLabs API for narration. The resulting narration and relevant panel images are then combined to create a video recap summarizing the volume. The video is saved inside the relevant volume directory, i.e. naruto/v10/recap.mp4.

Optional/Recommended running instructions

I personally recommend running this in a Jupiter notebook (anime_recap.ipynb), as it allows you run the script one cell at a time, which is useful for debugging and understanding the process.

manga-reader's People

Contributors

Stargazers

Watchers

Forkers

rishi23root

manga-reader's Issues

[Short term] Clean up the open source code to make it maximally convenient for new people to get set up and running it.

This is a more general task, but it's important to clean up the code. Right now it's a bit of a mess with python objects being created with unstandardized parameters (for example, the volume object, narration_script, movie_script object, etc. These should likely be classes instead of objects.

The goal here is for the source code to be cleanly organized into understandable, standardized objects/classes/functions so that even people with a limited understanding of code can feel like they can modify the behavior of the code to their liking. This will also help other developers get involved and understand what the code does.

Lastly, the README should be updated to include all of the setup steps, as right now certain things are excluded (i.e. needing to set up torch among other libraries prior to running)

[Short term] Add an argument like --only-summary that just outputs a summary of a given volume without narration or video creation

Picrel, many people have expressed interest in this feature. cc @rishi23root

[Short term] Add option to download manga pdf instead of uploading one

https://github.com/Zehina/Webtoon-Downloader

[Medium Term] Optimize/reduce prices of the generated summaries

There's lots of room to optimize the processing costs, including switching away from elevenlabs to a cheaper azure TTS for example.

[Medium term] Better system for character identification.

Right now users have to create a "profile reference" PDF that has an example profile reference page for the manga. Then as part of the "identify important_pages" step, GPT vision is used to identify a character profile page within the volume. This is far from ideal, as it slows down the time it takes for a user to run the script both in terms of extra setup and slow GPT vision processing time. This is far from ideal.

Perhaps this can help:
https://github.com/ragavsachdeva/magi

I'm happy for other ideas on how to improve this.

[Long term] Animate key frames with SORA

Looking forward, the AI-generated videos SORA model looks promising. The pieces that we have built now can be used to turn MangaRecap into a full fledged animation studio. Input a manga and it creates an entire animated recap for you, with accurate characters and plots.

[Medium Term] Colorize black&white panels/pages automatically

Perhaps we can use the same tech these guys use? https://toona.io/colorizer

Text extraction

I have the manga chapters in a pdf files. They are in english. I want to extract all the text present in that manga chapters. can you provide me a solution for that ? Thank you : )

[Short term] Better system for splitting volumes into chapters

Right now, the script feeds every single page of a manga volume into GPT vision to identify chapter start pages in order to split up the volume into chapters later in the code. This happens as part of the "identifying important_pages" step, which is slow and expensive. There are better ways of doing this including but not limited to identifying the table of contents and mapping the table of contents chapter pages to the relevant PDF page indexes.

https://github.com/pashpashpash/manga-reader/blob/main/app.py#L39-L66

[Creative] Improve quality of the videos

There's also a lot that can be improved with the quality of the videos. There can be customizeable backgrounds, panning, zooming into panels, and more creative panel/page placement.

This is a good benchmark for video quality: https://www.youtube.com/watch?v=CrYApEUR904&t=14599s

[Short term] Implement retries and waiting for concurrent API calls in the case of throttling.

I have had some people tell me that the GPT concurrent calls I am making right now are being throttled -- most likely because my Openai organization has higher concurrency limits compared to new accounts. This can be solved by

Limiting max concurrent requests
Retries should be implemented as well.

https://github.com/pashpashpash/manga-reader/blob/main/app.py#L47-L66

https://github.com/pashpashpash/manga-reader/blob/main/app.py#L173-L185

From someone who attempted to run the code:

openai.RateLimitError: Error code: 429 - {'error': {'message': 'Request too large for gpt-4-vision-preview in organization org-xxxxxxxxxx on tokens per min (TPM): Limit 10000, Requested 25012. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}