yaofanguk / video-subtitle-generator


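Generates subtitles from video and audio and writes them to SRT files. No third-party API is required; speech-to-text runs locally. A Transformer-based framework for video subtitle generation, with a GUI tool for generating subtitles from videos and producing SRT files.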

License: Apache License 2.0

Python 100.00%
whisper audio2text generation srt subtitle transcription

video-subtitle-generator's People

Contributors

wongchichong, yaofanguk


video-subtitle-generator's Issues

Error at runtime

What is causing this? Failed to open file b'C:\Users\\xe5\x9d\xa4\xe5\x9d\xa4\xe5\x93\x8e\AppData\Local\Temp\scipy-6mn3p0vy'

SystemError: initialization of _internal failed without raising an exception

GPU: Tesla P4
PyTorch: 2.0.1 (CUDA 11.8)

I first set up a new conda environment following the instructions, but the model ran on the CPU.

I then installed the CUDA build of PyTorch using the command generated on the official website. Now running vsg throws the following error.

Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Users\USERNAME\.conda\envs\vsgEnv\lib\threading.py", line 932, in _bootstrap_inner
    self.run()
  File "C:\Users\USERNAME\.conda\envs\vsgEnv\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "gui.py", line 173, in task
    from backend import main
  File "D:\Program Files\Video Subtitle Generator\backend\main.py", line 15, in <module>
    import librosa
  File "C:\Users\USERNAME\.conda\envs\vsgEnv\lib\site-packages\librosa\__init__.py", line 209, in <module>
    from . import core
  File "C:\Users\USERNAME\.conda\envs\vsgEnv\lib\site-packages\librosa\core\__init__.py", line 5, in <module>
    from .convert import *  # pylint: disable=wildcard-import
  File "C:\Users\USERNAME\.conda\envs\vsgEnv\lib\site-packages\librosa\core\convert.py", line 7, in <module>
    from . import notation
  File "C:\Users\USERNAME\.conda\envs\vsgEnv\lib\site-packages\librosa\core\notation.py", line 8, in <module>
    from ..util.exceptions import ParameterError
  File "C:\Users\USERNAME\.conda\envs\vsgEnv\lib\site-packages\librosa\util\__init__.py", line 77, in <module>
    from .utils import *  # pylint: disable=wildcard-import
  File "C:\Users\USERNAME\.conda\envs\vsgEnv\lib\site-packages\librosa\util\utils.py", line 9, in <module>
    import numba
  File "C:\Users\USERNAME\.conda\envs\vsgEnv\lib\site-packages\numba\__init__.py", line 42, in <module>
    from numba.np.ufunc import (vectorize, guvectorize, threading_layer,
  File "C:\Users\USERNAME\.conda\envs\vsgEnv\lib\site-packages\numba\np\ufunc\__init__.py", line 3, in <module>
    from numba.np.ufunc.decorators import Vectorize, GUVectorize, vectorize, guvectorize
  File "C:\Users\USERNAME\.conda\envs\vsgEnv\lib\site-packages\numba\np\ufunc\decorators.py", line 3, in <module>
    from numba.np.ufunc import _internal
SystemError: initialization of _internal failed without raising an exception

Is there a specific PyTorch version or configuration I should use? Thanks!

SSL connection error (I am using a conda environment)

Collecting package metadata (current_repodata.json): ...working... failed

CondaSSLError: OpenSSL appears to be unavailable on this machine. OpenSSL is required to
download and install packages.

Exception: HTTPSConnectionPool(host='repo.anaconda.com', port=443): Max retries exceeded with url: /pkgs/main/win-64/current_repodata.json (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available."))
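
The message above says the Python interpreter that conda runs on cannot load its SSL module. A minimal diagnostic sketch, meant to be run with that same interpreter, is shown here. The helper name `ssl_status` is made up for illustration, and the snippet only reports the state of the `ssl` module; it does not repair the installation.

```python
def ssl_status():
    """Return the linked OpenSSL version, or a message if ssl cannot be imported."""
    try:
        import ssl  # fails when the OpenSSL libraries cannot be found or loaded
    except ImportError as exc:
        return f"ssl module unavailable: {exc}"
    return ssl.OPENSSL_VERSION


if __name__ == "__main__":
    print(ssl_status())
```

If the import fails here as well, the problem lies in the interpreter's environment rather than in this project.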

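Can video URL input be supported?

The project is great and matches exactly what I had in mind, but how could video URL input be supported?
This is not my field and I don't know much about it, so a rough outline of the approach would be enough.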
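
One possible approach, sketched under assumptions: download the video to a temporary local file first, then hand that local path to the existing file-based workflow. The function name `download_video` is hypothetical and not part of this project, and the sketch only handles URLs that point directly at a media file, not pages on video-sharing sites.

```python
import os
import shutil
import tempfile
import urllib.parse
import urllib.request


def download_video(url, dest_dir=None):
    """Download a direct media URL to a local temp file and return its path."""
    # Keep the extension from the URL path so later tools can guess the format.
    suffix = os.path.splitext(urllib.parse.urlparse(url).path)[1] or ".mp4"
    fd, local_path = tempfile.mkstemp(suffix=suffix, dir=dest_dir)
    with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as resp:
        shutil.copyfileobj(resp, out)  # stream to disk without loading it all in memory
    return local_path
```

The returned path could then be entered wherever the tool currently expects a local video file. The caller is responsible for deleting the temporary file afterwards.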

The large model cannot be loaded with the current code

Environment: Windows 10 LTSC, Python 3.8.
Error output:
Traceback (most recent call last):
    sg.run()
  File "backend/main.py", line 214, in run
    recognizer = AudioRecogniser(language=self.language)
  File "backend/main.py", line 32, in __init__
    self.model = whisper.load_model(self.model_path)
  File "D:\Projects\video-subtitle-generator\backend\whisper\__init__.py", line 147, in load_model
    dims = ModelDimensions(**checkpoint["dims"])
KeyError: 'dims'
The medium model generates the subtitle file normally.

CUDA error: no kernel image

I built the environment according to the documentation, and running python backend/main.py produced this error:

Traceback (most recent call last):
  File "backend/main.py", line 275, in <module>
    sg.run()                                                                                                                                                                                                                                                  
  File "backend/main.py", line 214, in run                                                                                                                                                                                                                      
    recognizer = AudioRecogniser(language=self.language)                                                                                                                                                                                                      
  File "backend/main.py", line 32, in __init__                                                                                                                                                                                                                  
    self.model = whisper.load_model(self.model_path)                                                                                                                                                                                                          
  File "/home/ddl/video_cut/video-subtitle-generator/backend/whisper/__init__.py", line 149, in load_model                                                                                                                                                      
    model.load_state_dict(checkpoint["model_state_dict"])                                                                                                                                                                                                     
  File "/home/ddl/anaconda3/envs/vsgEnv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1497, in load_state_dict                                                                                                                                  
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(                                                                                                                                                                               
RuntimeError: Error(s) in loading state_dict for Whisper:                                                                                                                                                                                                           
While copying the parameter named "encoder.blocks.0.attn.query.weight", whose dimensions in the model are torch.Size([1024, 1024]) and whose dimensions in the checkpoint are torch.Size([1024, 1024]), an exception occurred : ('CUDA error: no kernel image is available for execution on the device',). 
While copying the parameter named "encoder.blocks.0.attn.key.weight", whose dimensions in the model are torch.Size([1024, 1024]) and whose dimensions in the checkpoint are torch.Size([1024, 1024]), an exception occurred : ('CUDA error: no kernel image is available for execution on the device',).
While copying the parameter named "encoder.blocks.0.attn.value.weight", whose dimensions in the model are torch.Size([1024, 1024]) and whose dimensions in the checkpoint are torch.Size([1024, 1024]), an exception occurred : ('CUDA error: no kernel image is available for execution on the device',). 
While copying the parameter named "encoder.blocks.0.attn.out.weight", whose dimensions in the model are torch.Size([1024, 1024]) and whose dimensions in the checkpoint are torch.Size([1024, 1024]), an exception occurred : ('CUDA error: no kernel image is available for execution on the device',). 

My GPU is a 3090. What should I do about it? Looking forward to your reply!
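
"No kernel image is available" usually means the installed PyTorch binary was not compiled for the GPU's compute capability (8.6 for an RTX 3090). A rough way to compare the two is sketched below. The helper names are made up for illustration, and the check looks only at native `sm_XY` entries, ignoring PTX (`compute_XY`) entries, so treat a negative result as a hint rather than proof.

```python
import re


def parse_sm(arch):
    """Turn an entry such as 'sm_86' into (8, 6); return None for other entries."""
    match = re.fullmatch(r"sm_(\d+)(\d)[a-z]?", arch)
    if match is None:
        return None  # e.g. PTX entries like 'compute_80', ignored in this sketch
    return int(match.group(1)), int(match.group(2))


def capability_supported(capability, arch_list):
    """True if some sm_XY entry has the same major version and a minor <= the device's."""
    major, minor = capability
    for arch in arch_list:
        parsed = parse_sm(arch)
        if parsed and parsed[0] == major and parsed[1] <= minor:
            return True
    return False


def report():
    """Print the device capability next to the architectures PyTorch was built for."""
    import torch  # imported here so the helpers above work without PyTorch installed

    if not torch.cuda.is_available():
        print("CUDA is not available to this PyTorch build")
        return
    capability = torch.cuda.get_device_capability(0)
    arch_list = torch.cuda.get_arch_list()
    print("device capability:", capability)
    print("compiled architectures:", arch_list)
    print("native kernels present:", capability_supported(capability, arch_list))


if __name__ == "__main__":
    report()
```

If the device's architecture is missing from the list, installing a PyTorch build made with a newer CUDA toolkit is the usual remedy.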

Installation reported no errors, but running python main.py fails

(vsgEnv) E:\video-subtitle-generator\backend>python main.py
Traceback (most recent call last):
  File "main.py", line 15, in <module>
    import librosa
  File "D:\Python\Miniconda3\envs\vsgEnv\lib\site-packages\librosa\__init__.py", line 209, in <module>
    from . import core
  File "D:\Python\Miniconda3\envs\vsgEnv\lib\site-packages\librosa\core\__init__.py", line 6, in <module>
    from .audio import *  # pylint: disable=wildcard-import
  File "D:\Python\Miniconda3\envs\vsgEnv\lib\site-packages\librosa\core\audio.py", line 11, in <module>
    import scipy.signal
  File "D:\Python\Miniconda3\envs\vsgEnv\lib\site-packages\scipy\signal\__init__.py", line 331, in <module>
    from ._peak_finding import *
  File "D:\Python\Miniconda3\envs\vsgEnv\lib\site-packages\scipy\signal\_peak_finding.py", line 8, in <module>
    from scipy.stats import scoreatpercentile
  File "D:\Python\Miniconda3\envs\vsgEnv\lib\site-packages\scipy\stats\__init__.py", line 468, in <module>
    from ._rvs_sampling import rvs_ratio_uniforms, NumericalInverseHermite  # noqa
  File "D:\Python\Miniconda3\envs\vsgEnv\lib\site-packages\scipy\stats\_rvs_sampling.py", line 3, in <module>
    from ._unuran import unuran_wrapper
  File "unuran_wrapper.pyx", line 221, in init scipy.stats._unuran.unuran_wrapper
  File "unuran_wrapper.pyx", line 200, in scipy.stats._unuran.unuran_wrapper._setup_unuran
  File "messagestream.pyx", line 36, in scipy._lib.messagestream.MessageStream.__cinit__
OSError: Failed to open file b'C:\Users\\xe5\x8f\xbc\xe7\x9d\x80\xe9\x9b\xb6~1\AppData\Local\Temp\scipy-_60rnr8b'
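
The failing path contains non-ASCII bytes from the Windows user name. One plausible reading, stated here as an assumption and not confirmed by the maintainers, is that SciPy cannot open a temp file under such a path. The sketch below checks whether the current temp directory is ASCII-only; `is_ascii_path` is a made-up helper name.

```python
import tempfile


def is_ascii_path(path):
    """True if every character in the path is plain ASCII."""
    return path.isascii()


if __name__ == "__main__":
    tmp = tempfile.gettempdir()  # resolved from TMPDIR, TEMP or TMP when they are set
    state = "ASCII-only" if is_ascii_path(tmp) else "contains non-ASCII characters"
    print(f"{tmp}: {state}")
```

If the directory is not ASCII-only, pointing the TEMP and TMP environment variables at a directory such as C:\Temp before launching the program is a workaround worth trying.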

Runtime error

Enter the video path: F:\video-subtitle-generator-main\test\test_cn.mp4
return program
'ffmpeg' is not recognized as an internal or external command, operable program or batch file.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Users\user007\miniconda3\envs\vsgEnv\lib\multiprocessing\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "F:\video-subtitle-generator-main\backend\main.py", line 64, in __call__
subprocess.check_output(command, stdin=open(os.devnull), shell=use_shell)
File "C:\Users\user007\miniconda3\envs\vsgEnv\lib\subprocess.py", line 415, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "C:\Users\user007\miniconda3\envs\vsgEnv\lib\subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ffmpeg', '-ss', '0.006000000000000005', '-t', '1.524', '-y', '-i', 'C:\Users\user007\AppData\Local\Temp\tmpyt4zyzet.wav', '-loglevel', 'error', 'C:\Users\user007\AppData\Local\Temp\tmpk3bh6dlw.flac']' returned non-zero exit status 1.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "backend/main.py", line 228, in <module>
sg.run()
File "backend/main.py", line 194, in run
for i, extracted_region in enumerate(pool.imap(converter, regions)):
File "C:\Users\user007\miniconda3\envs\vsgEnv\lib\multiprocessing\pool.py", line 868, in next
raise value
subprocess.CalledProcessError: Command '['ffmpeg', '-ss', '0.006000000000000005', '-t', '1.524', '-y', '-i', 'C:\Users\user007\AppData\Local\Temp\tmpyt4zyzet.wav', '-loglevel', 'error', 'C:\Users\user007\AppData\Local\Temp\tmpk3bh6dlw.flac']' returned non-zero exit status 1.
(vsgEnv) PS F:\video-subtitle-generator-main>

Is some component missing?
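
The shell message in the log says that no `ffmpeg` executable can be found on PATH, which is why the subprocess call exits with a non-zero status. A minimal pre-flight check is sketched below; `find_executable` is a hypothetical helper, not something the project provides.

```python
import shutil


def find_executable(name):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable("ffmpeg")
    if path is None:
        print("ffmpeg was not found on PATH; install it or add its folder to PATH")
    else:
        print("ffmpeg found at", path)
```

After installing ffmpeg or editing PATH, open a new terminal so the change is picked up.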

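RuntimeError: Numpy is not available during processing

File opened successfully: D:/0software/video-subtitle-generator/test/test_cn.mp4
Recognition language selected: auto-detect
Recognition mode selected: standard
['D:/0software/video-subtitle-generator/test/test_cn.mp4']
[Processing] Generating subtitles; this step may take a long time, please be patient...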
Exception in thread Thread-1 (task):
Traceback (most recent call last):
File "C:\Program Files\Python310\lib\threading.py", line 1016, in _bootstrap_inner
self.run()
File "C:\Program Files\Python310\lib\threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "D:\0software\video-subtitle-generator\gui.py", line 175, in task
self.sg.run()
File "D:\0software\video-subtitle-generator\backend\main.py", line 227, in run
transcript = recognizer(data)
File "D:\0software\video-subtitle-generator\backend\main.py", line 37, in __call__
mel = whisper.log_mel_spectrogram(audio_data).to(self.model.device)
File "D:\0software\video-subtitle-generator\backend\whisper\audio.py", line 131, in log_mel_spectrogram
audio = torch.from_numpy(audio)
RuntimeError: Numpy is not available

I am using a venv, so I don't understand why the system-wide Python installation is involved.
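
A venv shares the standard library of the interpreter it was created from, so frames such as threading.py under C:\Program Files\Python310 are expected and do not by themselves show that the wrong Python is running. To see which interpreter is active and whether NumPy is importable from it, something like the following sketch can be run inside the environment. `in_virtualenv` is a made-up helper name, and a missing or incompatible NumPy in the venv is only an assumed cause.

```python
import sys


def in_virtualenv():
    """True when running inside a venv (its prefix differs from the base install)."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a venv:", in_virtualenv())
    try:
        import numpy
        print("numpy", numpy.__version__, "loaded from", numpy.__file__)
    except ImportError as exc:
        print("numpy cannot be imported:", exc)
```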

Fails to start

Following the instructions, running python backend/main.py reports No module named 'fsplit'.

I then installed fsplit separately with pip install fsplit, but that still fails with:

Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "C:\Users\Fisher\AppData\Local\Temp\pip-install-5zl3_dr0\fsplit_45d13962e7b147008eee4fa18d381872\setup.py", line 11, in
from fsplit import version
File "C:\Users\Fisher\AppData\Local\Temp\pip-install-5zl3_dr0\fsplit_45d13962e7b147008eee4fa18d381872\fsplit_init_.py", line 11, in
from info import version # define version variable
ModuleNotFoundError: No module named 'info'
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

How can I fix this?

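No module named 'fsplit'

After creating the virtual environment I followed the README throughout, but running gui.py inside the environment produces an error.
The logs are as follows:

File opened successfully: F:/video2txt/02/002.mp4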
['F:/video2txt/02/002.mp4']
Exception in thread Thread-1:
Traceback (most recent call last):
File "F:\Miniconda\envs\vsgEnv_1\lib\threading.py", line 932, in _bootstrap_inner
self.run()
File "F:\Miniconda\envs\vsgEnv_1\lib\threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File ".\gui.py", line 173, in task
from backend import main
File "F:\video2txt\VSG\video-subtitle-generator\backend\main.py", line 23, in
from backend import config
File "F:\video2txt\VSG\video-subtitle-generator\backend\config.py", line 10, in
from fsplit.filesplit import Filesplit
ModuleNotFoundError: No module named 'fsplit'

I tried installing fsplit both automatically and manually, but it will not install because the fsplit package is missing the info module.

(vsgEnv_1) (vsgEnv_1) PS F:\video2txt\VSG\video-subtitle-generator> python "E:\Download of Chrome\dist\fsplit-1.0.0\setup.py"
Traceback (most recent call last):
File "E:\Download of Chrome\dist\fsplit-1.0.0\setup.py", line 11, in
from fsplit import version
File "E:\Download of Chrome\dist\fsplit-1.0.0\fsplit_init_.py", line 11, in
from info import version # define version variable
ModuleNotFoundError: No module named 'info'

Did I do something wrong?
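
The pip log in another issue on this page pins filesplit==3.0.2. It is an assumption here, not something the page confirms, that this distribution (and not the separate PyPI package named fsplit) is what provides the `fsplit` module the code imports. Whatever the source, a quick way to list which top-level modules are missing from the active environment is sketched below; `missing_modules` is a hypothetical helper.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Top-level modules that appear in the tracebacks on this page.
    print(missing_modules(["fsplit", "librosa", "torch"]))
```

If `fsplit` shows up as missing, reinstalling from the project's requirements file inside the same environment is the first thing to try.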

Error installing dependencies

Running python gui.py produces the following error:

ModuleNotFoundError: No module named 'utils.formatter'

I already tried installing utils with pip install utils, but the error persists.

Python version: 3.8.12

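Installation reported no errors, but main.py fails when run

Neither the conda nor the pip installation steps reported any errors.
Below is the error output from running main on Windows 11.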

return program
RuntimeError: module compiled against API version 0xf but this version of numpy is 0xe
ImportError: numpy.core.multiarray failed to import

The above exception was the direct cause of the following exception:

SystemError: <built-in method __contains__ of dict object at 0x00000162E8958BC0> returned a result with an error set

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File ".\main.py", line 228, in <module>
    sg.run()
  File ".\main.py", line 199, in run
    transcript = recognizer(data)
  File ".\main.py", line 35, in __call__
    _, probs = self.model.detect_language(mel)
  File "D:\Miniconda3\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Administrator\Downloads\video-subtitle-generator-main\video-subtitle-generator-main\backend\whisper\decoding.py", line 56, in detect_language
    language_probs = [
  File "C:\Users\Administrator\Downloads\video-subtitle-generator-main\video-subtitle-generator-main\backend\whisper\decoding.py", line 59, in <listcomp>
    for j, c in zip(tokenizer.all_language_tokens, tokenizer.all_language_codes)
  File "C:\Users\Administrator\Downloads\video-subtitle-generator-main\video-subtitle-generator-main\backend\whisper\tokenizer.py", line 228, in all_language_codes
    return tuple(self.decode([l]).strip("<|>") for l in self.all_language_tokens)
  File "C:\Users\Administrator\Downloads\video-subtitle-generator-main\video-subtitle-generator-main\backend\whisper\tokenizer.py", line 228, in <genexpr>
    return tuple(self.decode([l]).strip("<|>") for l in self.all_language_tokens)
  File "C:\Users\Administrator\Downloads\video-subtitle-generator-main\video-subtitle-generator-main\backend\whisper\tokenizer.py", line 141, in decode
    return self.tokenizer.decode(token_ids, **kwargs)
  File "D:\Miniconda3\lib\site-packages\transformers\tokenization_utils_base.py", line 3474, in decode
    token_ids = to_py_obj(token_ids)
  File "D:\Miniconda3\lib\site-packages\transformers\utils\generic.py", line 174, in to_py_obj
    return [to_py_obj(o) for o in obj]
  File "D:\Miniconda3\lib\site-packages\transformers\utils\generic.py", line 174, in <listcomp>
    return [to_py_obj(o) for o in obj]
  File "D:\Miniconda3\lib\site-packages\transformers\utils\generic.py", line 175, in to_py_obj
    elif is_tf_tensor(obj):
  File "D:\Miniconda3\lib\site-packages\transformers\utils\generic.py", line 151, in is_tf_tensor
    return False if not is_tf_available() else _is_tensorflow(x)
  File "D:\Miniconda3\lib\site-packages\transformers\utils\generic.py", line 142, in _is_tensorflow
    import tensorflow as tf
  File "D:\Miniconda3\lib\site-packages\tensorflow\__init__.py", line 37, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "D:\Miniconda3\lib\site-packages\tensorflow\python\__init__.py", line 37, in <module>
    from tensorflow.python.eager import context
  File "D:\Miniconda3\lib\site-packages\tensorflow\python\eager\context.py", line 33, in <module>
    from tensorflow.python.client import pywrap_tf_session
  File "D:\Miniconda3\lib\site-packages\tensorflow\python\client\pywrap_tf_session.py", line 19, in <module>
    from tensorflow.python.client._pywrap_tf_session import *
ImportError: initialization failed
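
The first error in this log indicates that some extension module was compiled against a newer NumPy C API (0xf) than the NumPy that is installed (0xe). The site-packages frames also point at D:\Miniconda3\lib, the base installation, not at a dedicated environment. A small sketch for listing the versions actually installed in the running interpreter follows; `installed_versions` is a made-up helper name and the package list is only an example.

```python
import sys
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for name, version in installed_versions(["numpy", "torch", "tensorflow", "transformers"]).items():
        print(f"{name}: {version}")
```

Running it from the environment used to start main.py shows whether NumPy is older than the other packages expect.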

Error during translation

While copying the parameter named "decoder.blocks.23.mlp.2.weight", whose dimensions in the model are torch.Size([1024, 4096]) and whose dimensions in the checkpoint are torch.Size([1024, 4096]), an exception occurred : ('CUDA error: no kernel image is available for execution on the device\nCUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.',).

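Ran fine the first time, but after a forced close it never runs properly again

This seems to be solved: running it from a CMD window opened with administrator privileges works.

Original problem:

The first time I opened it I chose the fast mode, and subtitle generation started normally. I was impatient, though, and when I saw an error I decided to close it and run again with the medium mode.
While it was closing, something abnormal seemed to appear in the command line, and I then force-closed the command-line window.

Since then, whatever I try, the GUI opens and the video loads normally, but clicking run just hangs, and no GPU memory is used at all.

There is also a baffling side effect: my trojan-Qt5 (proxy client) now reports various port errors on startup, which I had never seen before.

So far I have tried re-extracting the project and restarting the computer; neither helped.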

Error when installing dependencies

Collecting appdirs==1.4.4
Using cached appdirs-1.4.4-py2.py3-none-any.whl (9.6 kB)
Collecting audioread==2.1.9
Using cached audioread-2.1.9.tar.gz (377 kB)
Preparing metadata (setup.py) ... done
Collecting certifi==2021.10.8
Using cached certifi-2021.10.8-py2.py3-none-any.whl (149 kB)
Collecting cffi==1.15.0
Using cached cffi-1.15.0.tar.gz (484 kB)
Preparing metadata (setup.py) ... done
Collecting chardet==4.0.0
Using cached chardet-4.0.0-py2.py3-none-any.whl (178 kB)
Collecting click==8.1.3
Using cached click-8.1.3-py3-none-any.whl (96 kB)
Collecting decorator==5.1.1
Using cached decorator-5.1.1-py3-none-any.whl (9.1 kB)
Collecting filelock==3.6.0
Using cached filelock-3.6.0-py3-none-any.whl (10.0 kB)
Collecting filesplit==3.0.2
Using cached filesplit-3.0.2.tar.gz (5.7 kB)
Preparing metadata (setup.py) ... done
Collecting idna==3.3
Using cached idna-3.3-py3-none-any.whl (61 kB)
Collecting joblib==1.1.0
Using cached joblib-1.1.0-py2.py3-none-any.whl (306 kB)
Collecting librosa==0.9.1
Using cached librosa-0.9.1-py3-none-any.whl (213 kB)
ERROR: Ignored the following versions that require a different python version: 0.36.0 Requires-Python >=3.6,<3.10; 0.37.0 Requires-Python >=3.7,<3.10; 0.38.0 Requires-Python >=3.7,<3.11; 0.38.1 Requires-Python >=3.7,<3.11
ERROR: Could not find a version that satisfies the requirement llvmlite==0.38.0 (from versions: 0.2.0, 0.2.1, 0.2.2, 0.4.0, 0.5.0, 0.6.0, 0.7.0, 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0.1, 0.12.1, 0.13.0, 0.14.0, 0.15.0, 0.16.0, 0.17.0, 0.17.1, 0.18.0, 0.19.0, 0.20.0, 0.21.0, 0.22.0, 0.23.0, 0.23.2, 0.24.0, 0.25.0, 0.26.0, 0.27.0, 0.27.1, 0.28.0, 0.29.0, 0.30.0, 0.31.0, 0.32.0, 0.32.1, 0.33.0, 0.34.0, 0.35.0, 0.39.0, 0.39.1)
ERROR: No matching distribution found for llvmlite==0.38.0

Everything before this ran smoothly and I don't know what is going on here. I hope someone can help.
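
The pip output states that llvmlite 0.38.0 requires Python >=3.7,<3.11, and the versions pip still offers skip 0.36 through 0.38, which is consistent with the environment running a Python outside that range. The sketch below checks the running interpreter against those bounds; `python_in_range` is a hypothetical helper, and the bounds are copied from the log above.

```python
import sys


def python_in_range(version, lower, upper):
    """True if (major, minor) satisfies lower <= version < upper."""
    return lower <= tuple(version[:2]) < upper


if __name__ == "__main__":
    ok = python_in_range(sys.version_info, (3, 7), (3, 11))
    print("Python", sys.version.split()[0], "can install llvmlite 0.38.0:", ok)
```

If the result is False, creating the environment with a Python version inside the supported range should let the pinned requirement resolve.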
