
chenyme / chenyme-aavt

571 stars · 59 forks · 22.3 MB

This is a fully automated (audio/)video translation project: Whisper recognizes the speech, a large language model translates the subtitles, and the subtitles are then merged back into the video to produce a translated version.

License: MIT License

Python 91.76% Batchfile 8.24%
faster-whisper gpt-4 gpt-4o speech-recognition video-translation whisper

chenyme-aavt's People

Contributors

chenyme


chenyme-aavt's Issues

Audio over the size limit (200 MB+) fails to upload

APIStatusError: 413 Request Entity Too Large (error page served by Cloudflare)
Traceback:
File "E:\AAVT_0.8_full\env\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 600, in _run_script
    exec(code, module.__dict__)
File "E:\AAVT_0.8_full\Chenyme-AAVT.py", line 40, in <module>
    video()
File "E:\AAVT_0.8_full\project\video.py", line 168, in video
    result = openai_whisper(st.session_state.openai_key, st.session_state.openai_base, proxy_on, whisper_prompt, temperature, output_file)
File "E:\AAVT_0.8_full\utils\utils.py", line 65, in openai_whisper
    transcript = client.audio.transcriptions.create(
File "E:\AAVT_0.8_full\env\lib\site-packages\openai\resources\audio\transcriptions.py", line 116, in create
    return self._post(
File "E:\AAVT_0.8_full\env\lib\site-packages\openai\_base_client.py", line 1240, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "E:\AAVT_0.8_full\env\lib\site-packages\openai\_base_client.py", line 921, in request
    return self._request(
File "E:\AAVT_0.8_full\env\lib\site-packages\openai\_base_client.py", line 976, in _request
    return self._retry_request(
File "E:\AAVT_0.8_full\env\lib\site-packages\openai\_base_client.py", line 1053, in _retry_request
    return self._request(
File "E:\AAVT_0.8_full\env\lib\site-packages\openai\_base_client.py", line 976, in _request
    return self._retry_request(
File "E:\AAVT_0.8_full\env\lib\site-packages\openai\_base_client.py", line 1053, in _retry_request
    return self._request(
File "E:\AAVT_0.8_full\env\lib\site-packages\openai\_base_client.py", line 1020, in _request
    raise self._make_status_error_from_response(err.response) from None
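The 413 here is returned by a proxy before the request even reaches the model, and the OpenAI transcription endpoint itself caps uploads at 25 MB, so a 200 MB+ audio file has to be split before upload (the actual cutting could be done with e.g. ffmpeg's segment muxer). A minimal sketch of just the chunk arithmetic; `plan_chunks` is a hypothetical helper, not part of this project:

```python
import math

OPENAI_AUDIO_LIMIT = 25 * 1024 * 1024  # the Whisper API rejects uploads over 25 MB

def plan_chunks(file_size_bytes: int, duration_s: float, limit: int = OPENAI_AUDIO_LIMIT):
    """Return (chunk_count, seconds_per_chunk) so every piece stays under `limit`."""
    count = max(1, math.ceil(file_size_bytes / limit))
    return count, duration_s / count

# A 200 MB, 10-minute file needs 8 chunks of 75 s each.
print(plan_chunks(200 * 1024 * 1024, 600))  # -> (8, 75.0)
```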

Recommend using a virtual environment

The install script should create a dedicated virtual environment and install the dependency packages inside it, rather than installing them into the global Python environment.
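This can even be scripted with the standard library. A minimal sketch, assuming the environment directory is named `env` (matching the paths seen in the tracebacks on this page):

```python
import venv
from pathlib import Path

# Create a dedicated environment instead of installing into the global Python.
env_dir = Path("env")
if not env_dir.exists():
    venv.EnvBuilder(with_pip=True).create(env_dir)

# Then install dependencies with the environment's own pip, e.g.
#   env\Scripts\pip install -r requirements.txt   (Windows)
#   env/bin/pip install -r requirements.txt       (Linux/macOS)
```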

KeyError: st.session_state has no key "w_model_option"

Traceback (most recent call last):
File "C:\Users\Administrator\pinokio\bin\miniconda\lib\site-packages\streamlit\runtime\state\session_state_proxy.py", line 119, in __getattr__
    return self[key]
File "C:\Users\Administrator\pinokio\bin\miniconda\lib\site-packages\streamlit\runtime\state\session_state_proxy.py", line 90, in __getitem__
    return get_session_state()[key]
File "C:\Users\Administrator\pinokio\bin\miniconda\lib\site-packages\streamlit\runtime\state\safe_session_state.py", line 91, in __getitem__
    return self._state[key]
File "C:\Users\Administrator\pinokio\bin\miniconda\lib\site-packages\streamlit\runtime\state\session_state.py", line 400, in __getitem__
    raise KeyError(_missing_key_error_message(key))
KeyError: 'st.session_state has no key "w_model_option". Did you forget to initialize it? More info: https://docs.streamlit.io/library/advanced-features/session-state#initialization'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\Administrator\pinokio\bin\miniconda\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 535, in _run_script
    exec(code, module.__dict__)
File "E:\software\Chenyme_AAVT_0.5.1\Chenyme_AAVT_0.5.1\pages\📽️视频(Video).py", line 59, in <module>
    result = get_whisper_result(uploaded_file, output_file, device, st.session_state.w_model_option,
File "C:\Users\Administrator\pinokio\bin\miniconda\lib\site-packages\streamlit\runtime\state\session_state_proxy.py", line 121, in __getattr__
    raise AttributeError(_missing_attr_error_message(key))
AttributeError: st.session_state has no attribute "w_model_option". Did you forget to initialize it? More info: https://docs.streamlit.io/library/advanced-features/session-state#initialization
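The fix Streamlit's docs point to is initializing the key before the first read. A minimal sketch of the pattern; a plain dict stands in for `st.session_state` so it runs without Streamlit, and the default value "base" is an assumption:

```python
# st.session_state supports `in` and indexing like a dict; seed required keys
# once, early in the script, before any widget or function reads them.
session_state = {}  # stand-in for st.session_state in this sketch

DEFAULTS = {"w_model_option": "base", "openai_key": ""}
for key, value in DEFAULTS.items():
    if key not in session_state:
        session_state[key] = value
```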

Kimi translation function hits the API request rate limit

RateLimitError: Error code: 429 - {'error': {'message': 'max request per minute reached: 3, please try again after 1 seconds', 'type': 'rate_limit_reached_error'}}
Traceback:
File "C:\ai\Chenyme_AAVT_0.6.3_FIixbug\Chenyme_AAVT_0.6.3_FIixbug\env\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 584, in _run_script
    exec(code, module.__dict__)
File "C:\ai\Chenyme_AAVT_0.6.3_FIixbug\Chenyme_AAVT_0.6.3_FIixbug\pages\📽️视频(Video).py", line 130, in <module>
    result = kimi_translate(st.session_state.kimi_key, translate_option, result, language1, language2, token_num)
File "C:\ai\Chenyme_AAVT_0.6.3_FIixbug\Chenyme_AAVT_0.6.3_FIixbug\utils\utils.py", line 190, in kimi_translate
    completion = client.chat.completions.create(
File "C:\ai\Chenyme_AAVT_0.6.3_FIixbug\Chenyme_AAVT_0.6.3_FIixbug\env\lib\site-packages\openai\_utils\_utils.py", line 275, in wrapper
    return func(*args, **kwargs)
File "C:\ai\Chenyme_AAVT_0.6.3_FIixbug\Chenyme_AAVT_0.6.3_FIixbug\env\lib\site-packages\openai\resources\chat\completions.py", line 667, in create
    return self._post(
File "C:\ai\Chenyme_AAVT_0.6.3_FIixbug\Chenyme_AAVT_0.6.3_FIixbug\env\lib\site-packages\openai\_base_client.py", line 1233, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "C:\ai\Chenyme_AAVT_0.6.3_FIixbug\Chenyme_AAVT_0.6.3_FIixbug\env\lib\site-packages\openai\_base_client.py", line 922, in request
    return self._request(
File "C:\ai\Chenyme_AAVT_0.6.3_FIixbug\Chenyme_AAVT_0.6.3_FIixbug\env\lib\site-packages\openai\_base_client.py", line 998, in _request
    return self._retry_request(
File "C:\ai\Chenyme_AAVT_0.6.3_FIixbug\Chenyme_AAVT_0.6.3_FIixbug\env\lib\site-packages\openai\_base_client.py", line 1046, in _retry_request
    return self._request(
File "C:\ai\Chenyme_AAVT_0.6.3_FIixbug\Chenyme_AAVT_0.6.3_FIixbug\env\lib\site-packages\openai\_base_client.py", line 998, in _request
    return self._retry_request(
File "C:\ai\Chenyme_AAVT_0.6.3_FIixbug\Chenyme_AAVT_0.6.3_FIixbug\env\lib\site-packages\openai\_base_client.py", line 1046, in _retry_request
    return self._request(
File "C:\ai\Chenyme_AAVT_0.6.3_FIixbug\Chenyme_AAVT_0.6.3_FIixbug\env\lib\site-packages\openai\_base_client.py", line 1013, in _request
    raise self._make_status_error_from_response(err.response) from None

Log as above.

Maybe add a configurable cap on requests per minute?
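Until such an option exists, a client-side backoff wrapper works around the 3-requests-per-minute cap. A hedged sketch; matching the exception by class name is a simplification to keep it dependency-free, not the openai SDK's own API:

```python
import time

def with_retry(call, max_retries=5, base_delay=1.0):
    """Call `call()` and retry with exponential backoff on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            # Match openai.RateLimitError (or similar) by class name.
            if "RateLimit" not in type(exc).__name__ or attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# usage sketch: result = with_retry(lambda: client.chat.completions.create(...))
```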

Clicking "Run" on the web page errors and the console window crashes

The console window showed the following

OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
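This is OpenMP error #15: two copies of the Intel OpenMP runtime (libiomp5md.dll) got loaded, often because two installed packages each bundle their own copy. The message itself names an unsafe but common workaround; it must be applied before the conflicting libraries are imported:

```python
import os

# Unsafe, unsupported workaround quoted in the error message: tolerate
# duplicate OpenMP runtimes. Set this before importing torch / faster-whisper.
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
```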

Video generation problem in version 0.6.2

I hit a problem generating video subtitles with the latest 0.6.2 release package; my configuration is as follows


During video generation I observed that GPU (CUDA) utilization rises noticeably for the first few minutes, but the second half shows only CPU usage.

After generation finishes, opening the file in an external player such as VLC shows the subtitles twice: once burned into the video frames and once as an external subtitle track, overlapping in the player window.

  1. During video generation, how can the GPU be used more effectively, avoiding the inefficient CPU-only phase?
  2. Could video generation skip burning the subtitles into the frames and keep only the external subtitle track?
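On point 2, the doubled subtitles suggest the output both burns the text into the frames and muxes it as a track. ffmpeg can instead mux an external .srt as a soft (selectable) track without re-encoding; a sketch of the two command forms, with placeholder file names:

```python
# Soft subtitles: copy the existing streams and embed the srt as an mp4 track.
soft_cmd = [
    "ffmpeg", "-i", "input.mp4", "-i", "subs.srt",
    "-c", "copy", "-c:s", "mov_text",
    "output.mp4",
]

# Burn-in (what bakes subtitles into the frames) uses a video filter and
# forces a full re-encode — this is the slower, CPU-heavy path:
burn_cmd = ["ffmpeg", "-i", "input.mp4", "-vf", "subtitles=subs.srt", "output.mp4"]
```

These lists would be passed to `subprocess.run`; the soft-subtitle form avoids the re-encode entirely.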

Streamlit server cannot be connected

After opening the WebUI, Chrome's console reports errors. The frontend still works, but clicking "Run" errors again.


Failed to load resource: net::ERR_NAME_NOT_RESOLVED
main.eccc579f.js:2

   GET http://localhost:8501/%E9%9F%B3%E9%A2%91(Audio)/_stcore/health net::ERR_CONNECTION_REFUSED

(anonymous) @ main.eccc579f.js:2
xhr @ main.eccc579f.js:2
ke @ main.eccc579f.js:2
_request @ main.eccc579f.js:2
request @ main.eccc579f.js:2
P.forEach.Be. @ main.eccc579f.js:2
(anonymous) @ main.eccc579f.js:2
c @ main.eccc579f.js:2
(anonymous) @ main.eccc579f.js:2
pingServer @ main.eccc579f.js:2
setFsmState @ main.eccc579f.js:2
stepFsm @ main.eccc579f.js:2
websocket.onclose @ main.eccc579f.js:2
main.eccc579f.js:2

   GET http://localhost:8501/%E9%9F%B3%E9%A2%91(Audio)/_stcore/host-config net::ERR_CONNECTION_REFUSED

Is there a mode that runs directly from the command line?

I left the program running in the background for over two hours while watching a movie. Edge then killed the background tab; the command-line window was still running, but everything the page had displayed was reset, and the finished results were nowhere to be seen.

Hope Docker support can be added

Very nice automation project. It would be great to add a Docker installation method so it can run on a server.
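Until an official image exists, something along these lines could serve as a starting point. Everything here is an assumption (base image, dependency list, entry file name taken from the traceback paths on this page), not the project's supported setup:

```dockerfile
# Hypothetical Dockerfile sketch, not an official image.
FROM python:3.10-slim
RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir streamlit openai faster-whisper
EXPOSE 8501
CMD ["streamlit", "run", "Chenyme-AAVT.py", "--server.address=0.0.0.0"]
```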

Discussion of a few bugs

First of all, thanks to the author for open-sourcing this project. Overall it is very usable and I personally like it a lot.

Below are a few bugs I ran into, in the hope they help the project improve:

  • project\video.py:
    • Line 64: the variable vad should be assigned a boolean rather than a string, i.e. vad = True if VAD_on else False. As currently implemented, VAD is always enabled regardless of what is selected in the UI.
    • Line 95: the "local model" branch of the UI never defines language2, which causes an undefined reference in the later local_translate call. Also, language = ('中文', 'English', '日本語', '한국인', 'Italiano', 'Deutsch') could perhaps be assigned earlier (e.g. at line 93) so that it covers all translation settings.

About the translation prompt: when testing with a weaker, locally deployed ChatGLM3-6B-int4, the current prompt translates poorly and the model outputs a lot of filler. I currently changed the prompt as follows, which yields filler-free translations:

messages = [
    {
        "role": "user",
        # "Translate the bracketed text below into {language2}; reply with only the translated text."
        "content": f"请将下列括号内的文本翻译为{language2},只需直接回答翻译后的文本。\n[{text}]",
    }
]

Just my personal opinion, for reference.

Bugs found while testing 0.6.1

Converting audio raises an error; converting video does not.

2024-03-12 07:30:23.183 Uncaught app exception
Traceback (most recent call last):
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python39\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 542, in _run_script
    exec(code, module.__dict__)
File "C:\Chenyme_AAVT_0.6.1\pages\🎙️音频(Audio).py", line 46, in <module>
    result = get_whisper_result(uploaded_file, cache_dir, device, w_model_option, w_version, vad)
TypeError: get_whisper_result() missing 3 required positional arguments: 'lang', 'beam_size', and 'min_vad'

openai_key reported as missing

I have entered the key in the browser and also set it in the config file, but asking the AI assistant on the home page still errors, and generating a video does the same: the key is reported as missing.

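A defensive lookup that falls back from the UI value to the config file to an environment variable would make this failure less surprising. A sketch; `resolve_openai_key` and the key names are hypothetical, not this project's actual config schema:

```python
import os

def resolve_openai_key(session_state: dict, config: dict) -> str:
    """Hypothetical fallback chain: UI input -> config file -> environment."""
    return (
        session_state.get("openai_key")
        or config.get("openai_key")
        or os.environ.get("OPENAI_API_KEY", "")
    )
```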

OpenMP

Local model mode
Loading model: D:/BigModel/Chenyme-AAVT-main/models/medium
OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.

On Linux, clicking GPU acceleration shows "Could not load library libcudnn_ops_infer.so.8"

Installed per install.bat:
pip install streamlit -i https://pypi.tuna.tsinghua.edu.cn/simple some-package
pip install -U openai-whisper -i https://pypi.tuna.tsinghua.edu.cn/simple some-package
pip install openai -i https://pypi.tuna.tsinghua.edu.cn/simple some-package
pip install langchain -i https://pypi.tuna.tsinghua.edu.cn/simple some-package
pip install torch torchvision torchaudio -i https://pypi.tuna.tsinghua.edu.cn/simple some-package
pip install faster-whisper -i https://pypi.tuna.tsinghua.edu.cn/simple some-package

After launch, GPU acceleration fails to run, even though /workspace/venv/lib/python3.10/site-packages/nvidia/cudnn/lib/libcudnn_ops_infer.so.8 exists:
Could not load library libcudnn_ops_infer.so.8. Error: libcudnn_ops_infer.so.8: cannot open shared object file: No such file or directory
Please make sure libcudnn_ops_infer.so.8 is in your library path!

root@ae950ec2447b:/workspace# find / -type f -name libcudnn_ops_infer.so.8
/opt/conda/lib/python3.10/site-packages/torch/lib/libcudnn_ops_infer.so.8
/opt/conda/pkgs/pytorch-2.1.2-py3.10_cuda11.8_cudnn8.7.0_0/lib/python3.10/site-packages/torch/lib/libcudnn_ops_infer.so.8
find: '/proc/17/task/17/net': Invalid argument
find: '/proc/17/net': Invalid argument
find: '/proc/18/task/18/net': Invalid argument
find: '/proc/18/net': Invalid argument
find: '/proc/19/task/19/net': Invalid argument
find: '/proc/19/net': Invalid argument
find: '/sys/kernel/slab': Input/output error
/workspace/venv/lib/python3.10/site-packages/nvidia/cudnn/lib/libcudnn_ops_infer.so.8
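The `find` output shows the library exists but is not on the dynamic loader's search path, and on Linux LD_LIBRARY_PATH is read at process start, so it has to be exported before launching Streamlit. A small sketch that locates the library and prints the export line to use; `find_lib` is a hypothetical helper:

```python
import os

def find_lib(base: str, name: str = "libcudnn_ops_infer.so.8"):
    """Return every directory under `base` containing the missing cuDNN library."""
    return [root for root, _dirs, files in os.walk(base) if name in files]

dirs = find_lib("/workspace/venv/lib")
if dirs:
    # Run this export in the shell before `streamlit run ...`.
    print("export LD_LIBRARY_PATH=" + ":".join(dirs) + ":$LD_LIBRARY_PATH")
```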

The following error appeared on first use

ValueError: [Errno 22] Invalid argument
File "D:\Chenyme_AAVT_0.6.3_FIixbug\env\Lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 584, in _run_script
    exec(code, module.__dict__)
File "D:\Chenyme_AAVT_0.6.3_FIixbug\pages\📽️视频(Video).py", line 116, in <module>
    result = get_whisper_result(uploaded_file, output_file, device, models_option,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Chenyme_AAVT_0.6.3_FIixbug\utils\utils.py", line 82, in get_whisper_result
segments, _ = model.transcribe(path_video,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Chenyme_AAVT_0.6.3_FIixbug\env\Lib\site-packages\faster_whisper\transcribe.py", line 294, in transcribe
audio = decode_audio(audio, sampling_rate=sampling_rate)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Chenyme_AAVT_0.6.3_FIixbug\env\Lib\site-packages\faster_whisper\audio.py", line 52, in decode_audio
for frame in frames:
File "D:\Chenyme_AAVT_0.6.3_FIixbug\env\Lib\site-packages\faster_whisper\audio.py", line 103, in _resample_frames
for frame in itertools.chain(frames, [None]):
File "D:\Chenyme_AAVT_0.6.3_FIixbug\env\Lib\site-packages\faster_whisper\audio.py", line 90, in _group_frames
for frame in frames:
File "D:\Chenyme_AAVT_0.6.3_FIixbug\env\Lib\site-packages\faster_whisper\audio.py", line 80, in _ignore_invalid_frames
yield next(iterator)
^^^^^^^^^^^^^^
File "av\container\input.pyx", line 212, in decode
File "av\packet.pyx", line 87, in av.packet.Packet.decode
File "av\stream.pyx", line 168, in av.stream.Stream.decode
File "av\codec\context.pyx", line 513, in av.codec.context.CodecContext.decode
File "av\codec\context.pyx", line 416, in av.codec.context.CodecContext._send_packet_and_recv
File "av\error.pyx", line 336, in av.error.err_check

heygen video translation

What if we went even bolder:

  • Whisper solves speech-to-subtitles
  • LLMs (ChatGPT, Google Translate) solve multilingual translation
  • MockingBird or so-vits-svc-fork trains the original speaker's timbre (voiceprint)
  • Using the recognized text's timeline, ffmpeg splits the video into segments by voice, while the trained voice model synthesizes audio tracks from the translated text
  • (Optional) GeneFace++ or Wav2Lip then corrects the lip sync
  • Finally merge everything back together (ffmpeg)

Is this roughly how HeyGen's video translation works? Of course I'm a rookie, and the real pipeline is surely far more complex. The hardest part is identifying the time axes of the different voices, and along the way there are many other problems such as background-sound removal and recognition-error calibration.

CUDA acceleration unavailable after installing torch with CUDA

For some reason the acceleration option is unavailable even though CUDA+torch works normally; testing Stable-diffusion webui works fine.

Manual check:
import torch
torch.cuda.is_available()
returns True
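One possible explanation: faster-whisper runs on CTranslate2, which loads CUDA/cuDNN independently of torch, so `torch.cuda.is_available()` returning True does not guarantee faster-whisper can see the GPU. A diagnostic sketch with guarded imports, since either package may be absent:

```python
def cuda_report() -> dict:
    """Report CUDA visibility for torch and for CTranslate2 separately."""
    report = {}
    try:
        import torch
        report["torch_cuda"] = torch.cuda.is_available()
    except ImportError:
        report["torch_cuda"] = None  # torch not installed
    try:
        import ctranslate2
        report["ct2_cuda_devices"] = ctranslate2.get_cuda_device_count()
    except ImportError:
        report["ct2_cuda_devices"] = None  # ctranslate2 not installed
    return report

print(cuda_report())
```

If torch reports True but CTranslate2 reports 0 devices, the problem is in the faster-whisper/cuDNN stack rather than the torch install.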

On the text limit

When translating with Kimi, once the subtitles exceed a certain amount, translation stops working. For example, with a video over 10 minutes long, nothing gets translated and the returned subtitles are still in English.
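A client-side workaround is to send the subtitles in batches that stay under the model's context limit and translate each batch in its own request. A sketch using a character budget as a rough token proxy; the 2000-character default is an arbitrary assumption:

```python
def batch_lines(lines, max_chars=2000):
    """Group subtitle lines into batches whose combined length stays under max_chars."""
    batches, current, size = [], [], 0
    for line in lines:
        if current and size + len(line) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        batches.append(current)
    return batches

# Translate each batch separately, then concatenate the results, so no single
# request exceeds the model's context window.
```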
