GPT4V-Image-Captioner / GPT4V图像打标器

We now have sd-webui-GPT4V-Image-Captioner for SD WebUI

This is a multifunctional image processing toolbox built with Gradio, capable of tagging images using the GPT-4-vision API, the cogVLM model, Qwen-VL(Alibaba Cloud), the Moondream model.

Key features include:

One-click installation and use
Single image and multi-image batch tagging
Choice of online GPT4V or Qwen-VL(Alibaba Cloud) & local CogVLM and Moondream models
Visual tag analysis and processing
Image pre-compression
Keyword filtering and watermark image recognition

Developers: Jiaye, LEOSAM是只兔狲, SleeeepyZhou, Fok, GPT4. Welcome everyone to add more new features to this project.

Installation and Startup Guide

Windows (If the automatic installation fails, please refer to the Manual Installation Instructions)

Open Command Prompt as administrator and navigate to the directory where you want to clone the repository.

Clone the repository using the following command:

git clone https://github.com/jiayev/GPT4V-Image-Captioner

Double-click install_windows.bat to run and install all necessary dependencies.
After the installation is complete, you can launch the GPT4V-Image-Captioner by double-clicking start_windows.bat.
Hold down Ctrl and click on the URL in the terminal (or copy the URL to your browser), which will open the Gradio app interface in your default browser.
Enter the official OpenAI or third-party GPT-4V API Key and API Url at the top of the interface. After setting the image address, you can start tagging the image.

Linux / macOS

Open a terminal and navigate to the directory where you want to clone the repository.

Clone the repository using the following command:

git clone https://github.com/jiayev/GPT4V-Image-Captioner

Navigate to the cloned directory:
```
cd GPT4V-Image-Captioner
```
Make the install and start scripts executable with the following command:
```
chmod +x install_linux_mac.sh; chmod +x start_linux_mac.sh
```
Execute the install script:
```
./install_linux_mac.sh
```
Launch the GPT4V-Image-Captioner in the terminal by executing the launch script:
```
./start_linux_mac.sh
```
Copy the URL displayed in the terminal and open it in your browser to access the Gradio app interface.
Enter the official OpenAI or third-party GPT-4V API Key and API Url at the top of the interface. After setting the image address, you can start tagging the image.

Windows Manual Installation Instructions

Open the Command Prompt by pressing Win + R, typing cmd, and then pressing Enter.
Clone the repository to your local machine using the following command:
```
git clone https://github.com/jiayev/GPT4V-Image-Captioner
```
Once cloning is complete, navigate to the cloned directory:
```
cd GPT4V-Image-Captioner
```
Before installing any dependencies, make sure that Python is installed on your system. Check for Python's presence by typing the following command and pressing Enter in the Command Prompt:
```
python --version
```
If Python is not installed, you will get an error message. In that case, please visit the Python official download page and follow the instructions to install it.
Create a virtual environment named myenv to avoid contaminating the global Python environment:
```
python -m venv myenv
```
Activate the virtual environment you just created:
```
myenv\Scripts\activate
```
Update pip to date:
```
python -m pip install --upgrade pip
```

Install libraries within the virtual environment:

pip install scipy networkx wordcloud matplotlib Pillow tqdm gradio requests

After completing the steps above, you can start GPT4V-Image-Captioner by double-clicking the start_windows.bat file.

mooneese / gpt4v-image-captioner Goto Github PK

gpt4v-image-captioner's Introduction

GPT4V-Image-Captioner / GPT4V图像打标器

Installation and Startup Guide

Windows (If the automatic installation fails, please refer to the Manual Installation Instructions)

Linux / macOS

Windows Manual Installation Instructions

gpt4v-image-captioner's People

Contributors

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent