
chenyme / chenyme-aavt

571 stars · 59 forks · 22.3 MB

This is a fully automated (audio/)video translation project: Whisper recognizes the speech, a large language model translates the subtitles, and the subtitles are then merged back into the video to produce a translated version.

License: MIT License

Python 91.76% Batchfile 8.24%
faster-whisper gpt-4 gpt-4o speech-recognition video-translation whisper

chenyme-aavt's People

Contributors

chenyme


chenyme-aavt's Issues

Audio over the size limit (200 MB+) fails to upload

APIStatusError: 413 Request Entity Too Large (error page served by Cloudflare)
Traceback:
File "E:\AAVT_0.8_full\env\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 600, in _run_script
    exec(code, module.__dict__)
File "E:\AAVT_0.8_full\Chenyme-AAVT.py", line 40, in <module>
    video()
File "E:\AAVT_0.8_full\project\video.py", line 168, in video
    result = openai_whisper(st.session_state.openai_key, st.session_state.openai_base, proxy_on, whisper_prompt, temperature, output_file)
File "E:\AAVT_0.8_full\utils\utils.py", line 65, in openai_whisper
    transcript = client.audio.transcriptions.create(
File "E:\AAVT_0.8_full\env\lib\site-packages\openai\resources\audio\transcriptions.py", line 116, in create
    return self._post(
File "E:\AAVT_0.8_full\env\lib\site-packages\openai\_base_client.py", line 1240, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "E:\AAVT_0.8_full\env\lib\site-packages\openai\_base_client.py", line 921, in request
    return self._request(
File "E:\AAVT_0.8_full\env\lib\site-packages\openai\_base_client.py", line 976, in _request
    return self._retry_request(
File "E:\AAVT_0.8_full\env\lib\site-packages\openai\_base_client.py", line 1053, in _retry_request
    return self._request(
File "E:\AAVT_0.8_full\env\lib\site-packages\openai\_base_client.py", line 976, in _request
    return self._retry_request(
File "E:\AAVT_0.8_full\env\lib\site-packages\openai\_base_client.py", line 1053, in _retry_request
    return self._request(
File "E:\AAVT_0.8_full\env\lib\site-packages\openai\_base_client.py", line 1020, in _request
    raise self._make_status_error_from_response(err.response) from None
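The 413 here is returned by a proxy before the request even reaches the model, and the OpenAI transcription endpoint itself caps uploads at 25 MB, so a 200 MB+ audio file has to be split before upload (the actual cutting could be done with e.g. ffmpeg's segment muxer). A minimal sketch of just the chunk arithmetic; `plan_chunks` is a hypothetical helper, not part of this project:

```python
import math

OPENAI_AUDIO_LIMIT = 25 * 1024 * 1024  # the Whisper API rejects uploads over 25 MB

def plan_chunks(file_size_bytes: int, duration_s: float, limit: int = OPENAI_AUDIO_LIMIT):
    """Return (chunk_count, seconds_per_chunk) so every piece stays under `limit`."""
    count = max(1, math.ceil(file_size_bytes / limit))
    return count, duration_s / count

# A 200 MB, 10-minute file needs 8 chunks of 75 s each.
print(plan_chunks(200 * 1024 * 1024, 600))  # -> (8, 75.0)
```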

Recommend using a virtual environment

The install script should create a dedicated virtual environment and install the dependency packages inside it, rather than installing them into the global Python environment.
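This can even be scripted with the standard library. A minimal sketch, assuming the environment directory is named `env` (matching the paths seen in the tracebacks on this page):

```python
import venv
from pathlib import Path

# Create a dedicated environment instead of installing into the global Python.
env_dir = Path("env")
if not env_dir.exists():
    venv.EnvBuilder(with_pip=True).create(env_dir)

# Then install dependencies with the environment's own pip, e.g.
#   env\Scripts\pip install -r requirements.txt   (Windows)
#   env/bin/pip install -r requirements.txt       (Linux/macOS)
```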

KeyError: st.session_state has no key "w_model_option"

Traceback (most recent call last):
File "C:\Users\Administrator\pinokio\bin\miniconda\lib\site-packages\streamlit\runtime\state\session_state_proxy.py", line 119, in __getattr__
    return self[key]
File "C:\Users\Administrator\pinokio\bin\miniconda\lib\site-packages\streamlit\runtime\state\session_state_proxy.py", line 90, in __getitem__
    return get_session_state()[key]
File "C:\Users\Administrator\pinokio\bin\miniconda\lib\site-packages\streamlit\runtime\state\safe_session_state.py", line 91, in __getitem__
    return self._state[key]
File "C:\Users\Administrator\pinokio\bin\miniconda\lib\site-packages\streamlit\runtime\state\session_state.py", line 400, in __getitem__
    raise KeyError(_missing_key_error_message(key))
KeyError: 'st.session_state has no key "w_model_option". Did you forget to initialize it? More info: https://docs.streamlit.io/library/advanced-features/session-state#initialization'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\Administrator\pinokio\bin\miniconda\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 535, in _run_script
    exec(code, module.__dict__)
File "E:\software\Chenyme_AAVT_0.5.1\Chenyme_AAVT_0.5.1\pages\📽️视频(Video).py", line 59, in <module>
    result = get_whisper_result(uploaded_file, output_file, device, st.session_state.w_model_option,
File "C:\Users\Administrator\pinokio\bin\miniconda\lib\site-packages\streamlit\runtime\state\session_state_proxy.py", line 121, in __getattr__
    raise AttributeError(_missing_attr_error_message(key))
AttributeError: st.session_state has no attribute "w_model_option". Did you forget to initialize it? More info: https://docs.streamlit.io/library/advanced-features/session-state#initialization
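The fix Streamlit's docs point to is initializing the key before the first read. A minimal sketch of the pattern; a plain dict stands in for `st.session_state` so it runs without Streamlit, and the default value "base" is an assumption:

```python
# st.session_state supports `in` and indexing like a dict; seed required keys
# once, early in the script, before any widget or function reads them.
session_state = {}  # stand-in for st.session_state in this sketch

DEFAULTS = {"w_model_option": "base", "openai_key": ""}
for key, value in DEFAULTS.items():
    if key not in session_state:
        session_state[key] = value
```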

Kimi translation function hits the API request rate limit

RateLimitError: Error code: 429 - {'error': {'message': 'max request per minute reached: 3, please try again after 1 seconds', 'type': 'rate_limit_reached_error'}}
Traceback:
File "C:\ai\Chenyme_AAVT_0.6.3_FIixbug\Chenyme_AAVT_0.6.3_FIixbug\env\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 584, in _run_script
    exec(code, module.__dict__)
File "C:\ai\Chenyme_AAVT_0.6.3_FIixbug\Chenyme_AAVT_0.6.3_FIixbug\pages\📽️视频(Video).py", line 130, in <module>
    result = kimi_translate(st.session_state.kimi_key, translate_option, result, language1, language2, token_num)
File "C:\ai\Chenyme_AAVT_0.6.3_FIixbug\Chenyme_AAVT_0.6.3_FIixbug\utils\utils.py", line 190, in kimi_translate
    completion = client.chat.completions.create(
File "C:\ai\Chenyme_AAVT_0.6.3_FIixbug\Chenyme_AAVT_0.6.3_FIixbug\env\lib\site-packages\openai\_utils\_utils.py", line 275, in wrapper
    return func(*args, **kwargs)
File "C:\ai\Chenyme_AAVT_0.6.3_FIixbug\Chenyme_AAVT_0.6.3_FIixbug\env\lib\site-packages\openai\resources\chat\completions.py", line 667, in create
    return self._post(
File "C:\ai\Chenyme_AAVT_0.6.3_FIixbug\Chenyme_AAVT_0.6.3_FIixbug\env\lib\site-packages\openai\_base_client.py", line 1233, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "C:\ai\Chenyme_AAVT_0.6.3_FIixbug\Chenyme_AAVT_0.6.3_FIixbug\env\lib\site-packages\openai\_base_client.py", line 922, in request
    return self._request(
File "C:\ai\Chenyme_AAVT_0.6.3_FIixbug\Chenyme_AAVT_0.6.3_FIixbug\env\lib\site-packages\openai\_base_client.py", line 998, in _request
    return self._retry_request(
File "C:\ai\Chenyme_AAVT_0.6.3_FIixbug\Chenyme_AAVT_0.6.3_FIixbug\env\lib\site-packages\openai\_base_client.py", line 1046, in _retry_request
    return self._request(
File "C:\ai\Chenyme_AAVT_0.6.3_FIixbug\Chenyme_AAVT_0.6.3_FIixbug\env\lib\site-packages\openai\_base_client.py", line 998, in _request
    return self._retry_request(
File "C:\ai\Chenyme_AAVT_0.6.3_FIixbug\Chenyme_AAVT_0.6.3_FIixbug\env\lib\site-packages\openai\_base_client.py", line 1046, in _retry_request
    return self._request(
File "C:\ai\Chenyme_AAVT_0.6.3_FIixbug\Chenyme_AAVT_0.6.3_FIixbug\env\lib\site-packages\openai\_base_client.py", line 1013, in _request
    raise self._make_status_error_from_response(err.response) from None

Log as above.

Maybe add a configurable cap on requests per minute?
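Until such an option exists, a client-side backoff wrapper works around the 3-requests-per-minute cap. A hedged sketch; matching the exception by class name is a simplification to keep it dependency-free, not the openai SDK's own API:

```python
import time

def with_retry(call, max_retries=5, base_delay=1.0):
    """Call `call()` and retry with exponential backoff on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            # Match openai.RateLimitError (or similar) by class name.
            if "RateLimit" not in type(exc).__name__ or attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# usage sketch: result = with_retry(lambda: client.chat.completions.create(...))
```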

Clicking "Run" on the web page errors and the console window crashes

The console window showed the following

OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
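This is OpenMP error #15: two copies of the Intel OpenMP runtime (libiomp5md.dll) got loaded, often because two installed packages each bundle their own copy. The message itself names an unsafe but common workaround; it must be applied before the conflicting libraries are imported:

```python
import os

# Unsafe, unsupported workaround quoted in the error message: tolerate
# duplicate OpenMP runtimes. Set this before importing torch / faster-whisper.
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
```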

Video generation problem in version 0.6.2

I hit a problem generating video subtitles with the latest 0.6.2 release package; my configuration is as follows


During video generation I observed that GPU (CUDA) utilization rises noticeably for the first few minutes, but the second half shows only CPU usage.

After generation finishes, opening the file in an external player such as VLC shows the subtitles twice: once burned into the video frames and once as an external subtitle track, overlapping in the player window.

  1. During video generation, how can the GPU be used more effectively, avoiding the inefficient CPU-only phase?
  2. Could video generation skip burning the subtitles into the frames and keep only the external subtitle track?
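On point 2, the doubled subtitles suggest the output both burns the text into the frames and muxes it as a track. ffmpeg can instead mux an external .srt as a soft (selectable) track without re-encoding; a sketch of the two command forms, with placeholder file names:

```python
# Soft subtitles: copy the existing streams and embed the srt as an mp4 track.
soft_cmd = [
    "ffmpeg", "-i", "input.mp4", "-i", "subs.srt",
    "-c", "copy", "-c:s", "mov_text",
    "output.mp4",
]

# Burn-in (what bakes subtitles into the frames) uses a video filter and
# forces a full re-encode — this is the slower, CPU-heavy path:
burn_cmd = ["ffmpeg", "-i", "input.mp4", "-vf", "subtitles=subs.srt", "output.mp4"]
```

These lists would be passed to `subprocess.run`; the soft-subtitle form avoids the re-encode entirely.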

Streamlit server cannot be connected

After opening the WebUI, Chrome's console reports errors. The frontend still works, but clicking "Run" errors again.


Failed to load resource: net::ERR_NAME_NOT_RESOLVED
main.eccc579f.js:2

   GET http://localhost:8501/%E9%9F%B3%E9%A2%91(Audio)/_stcore/health net::ERR_CONNECTION_REFUSED

(anonymous) @ main.eccc579f.js:2
xhr @ main.eccc579f.js:2
ke @ main.eccc579f.js:2
_request @ main.eccc579f.js:2
request @ main.eccc579f.js:2
P.forEach.Be. @ main.eccc579f.js:2
(anonymous) @ main.eccc579f.js:2
c @ main.eccc579f.js:2
(anonymous) @ main.eccc579f.js:2
pingServer @ main.eccc579f.js:2
setFsmState @ main.eccc579f.js:2
stepFsm @ main.eccc579f.js:2
websocket.onclose @ main.eccc579f.js:2
main.eccc579f.js:2

   GET http://localhost:8501/%E9%9F%B3%E9%A2%91(Audio)/_stcore/host-config net::ERR_CONNECTION_REFUSED

Is there a mode that runs directly from the command line?

I left the program running in the background for over two hours while watching a movie. Edge then killed the background tab; the command-line window was still running, but everything the page had displayed was reset, and the finished results were nowhere to be seen.

Hope Docker support can be added

Very nice automation project. It would be great to add a Docker installation method so it can run on a server.
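Until an official image exists, something along these lines could serve as a starting point. Everything here is an assumption (base image, dependency list, entry file name taken from the traceback paths on this page), not the project's supported setup:

```dockerfile
# Hypothetical Dockerfile sketch, not an official image.
FROM python:3.10-slim
RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir streamlit openai faster-whisper
EXPOSE 8501
CMD ["streamlit", "run", "Chenyme-AAVT.py", "--server.address=0.0.0.0"]
```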

Discussion of a few bugs

First of all, thanks to the author for open-sourcing this project. Overall it is very usable and I personally like it a lot.

Below are a few bugs I ran into, in the hope they help the project improve:

  • project\video.py:
    • Line 64: the variable vad should be assigned a boolean rather than a string, i.e. vad = True if VAD_on else False. As currently implemented, VAD is always enabled regardless of what is selected in the UI.
    • Line 95: the "local model" branch of the UI never defines language2, which causes an undefined reference in the later local_translate call. Also, language = ('中文', 'English', '日本語', '한국인', 'Italiano', 'Deutsch') could perhaps be assigned earlier (e.g. at line 93) so that it covers all translation settings.

About the translation prompt: when testing with a weaker, locally deployed ChatGLM3-6B-int4, the current prompt translates poorly and the model outputs a lot of filler. I currently changed the prompt as follows, which yields filler-free translations:

messages = [
    {
        "role": "user",
        # "Translate the bracketed text below into {language2}; reply with only the translated text."
        "content": f"请将下列括号内的文本翻译为{language2},只需直接回答翻译后的文本。\n[{text}]",
    }
]

Just my personal opinion, for reference.

Bugs found while testing 0.6.1

Converting audio raises an error; converting video does not.

2024-03-12 07:30:23.183 Uncaught app exception
Traceback (most recent call last):
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python39\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 542, in _run_script
    exec(code, module.__dict__)
File "C:\Chenyme_AAVT_0.6.1\pages\🎙️音频(Audio).py", line 46, in <module>
    result = get_whisper_result(uploaded_file, cache_dir, device, w_model_option, w_version, vad)
TypeError: get_whisper_result() missing 3 required positional arguments: 'lang', 'beam_size', and 'min_vad'

openai_key reported as missing

I have entered the key in the browser and also set it in the config file, but asking the AI assistant on the home page still errors, and generating a video does the same: the key is reported as missing.

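A defensive lookup that falls back from the UI value to the config file to an environment variable would make this failure less surprising. A sketch; `resolve_openai_key` and the key names are hypothetical, not this project's actual config schema:

```python
import os

def resolve_openai_key(session_state: dict, config: dict) -> str:
    """Hypothetical fallback chain: UI input -> config file -> environment."""
    return (
        session_state.get("openai_key")
        or config.get("openai_key")
        or os.environ.get("OPENAI_API_KEY", "")
    )
```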

OpenMP

Local model mode
Loading model: D:/BigModel/Chenyme-AAVT-main/models/medium
OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.

On Linux, clicking GPU acceleration shows "Could not load library libcudnn_ops_infer.so.8"

Installed per install.bat:
pip install streamlit -i https://pypi.tuna.tsinghua.edu.cn/simple some-package
pip install -U openai-whisper -i https://pypi.tuna.tsinghua.edu.cn/simple some-package
pip install openai -i https://pypi.tuna.tsinghua.edu.cn/simple some-package
pip install langchain -i https://pypi.tuna.tsinghua.edu.cn/simple some-package
pip install torch torchvision torchaudio -i https://pypi.tuna.tsinghua.edu.cn/simple some-package
pip install faster-whisper -i https://pypi.tuna.tsinghua.edu.cn/simple some-package

After launch, GPU acceleration fails to run, even though /workspace/venv/lib/python3.10/site-packages/nvidia/cudnn/lib/libcudnn_ops_infer.so.8 exists:
Could not load library libcudnn_ops_infer.so.8. Error: libcudnn_ops_infer.so.8: cannot open shared object file: No such file or directory
Please make sure libcudnn_ops_infer.so.8 is in your library path!

root@ae950ec2447b:/workspace# find / -type f -name libcudnn_ops_infer.so.8
/opt/conda/lib/python3.10/site-packages/torch/lib/libcudnn_ops_infer.so.8
/opt/conda/pkgs/pytorch-2.1.2-py3.10_cuda11.8_cudnn8.7.0_0/lib/python3.10/site-packages/torch/lib/libcudnn_ops_infer.so.8
find: '/proc/17/task/17/net': Invalid argument
find: '/proc/17/net': Invalid argument
find: '/proc/18/task/18/net': Invalid argument
find: '/proc/18/net': Invalid argument
find: '/proc/19/task/19/net': Invalid argument
find: '/proc/19/net': Invalid argument
find: '/sys/kernel/slab': Input/output error
/workspace/venv/lib/python3.10/site-packages/nvidia/cudnn/lib/libcudnn_ops_infer.so.8
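The `find` output shows the library exists but is not on the dynamic loader's search path, and on Linux LD_LIBRARY_PATH is read at process start, so it has to be exported before launching Streamlit. A small sketch that locates the library and prints the export line to use; `find_lib` is a hypothetical helper:

```python
import os

def find_lib(base: str, name: str = "libcudnn_ops_infer.so.8"):
    """Return every directory under `base` containing the missing cuDNN library."""
    return [root for root, _dirs, files in os.walk(base) if name in files]

dirs = find_lib("/workspace/venv/lib")
if dirs:
    # Run this export in the shell before `streamlit run ...`.
    print("export LD_LIBRARY_PATH=" + ":".join(dirs) + ":$LD_LIBRARY_PATH")
```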

The following error appeared on first use

ValueError: [Errno 22] Invalid argument
File "D:\Chenyme_AAVT_0.6.3_FIixbug\env\Lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 584, in _run_script
    exec(code, module.__dict__)
File "D:\Chenyme_AAVT_0.6.3_FIixbug\pages\📽️视频(Video).py", line 116, in <module>
    result = get_whisper_result(uploaded_file, output_file, device, models_option,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Chenyme_AAVT_0.6.3_FIixbug\utils\utils.py", line 82, in get_whisper_result
segments, _ = model.transcribe(path_video,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Chenyme_AAVT_0.6.3_FIixbug\env\Lib\site-packages\faster_whisper\transcribe.py", line 294, in transcribe
audio = decode_audio(audio, sampling_rate=sampling_rate)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Chenyme_AAVT_0.6.3_FIixbug\env\Lib\site-packages\faster_whisper\audio.py", line 52, in decode_audio
for frame in frames:
File "D:\Chenyme_AAVT_0.6.3_FIixbug\env\Lib\site-packages\faster_whisper\audio.py", line 103, in _resample_frames
for frame in itertools.chain(frames, [None]):
File "D:\Chenyme_AAVT_0.6.3_FIixbug\env\Lib\site-packages\faster_whisper\audio.py", line 90, in _group_frames
for frame in frames:
File "D:\Chenyme_AAVT_0.6.3_FIixbug\env\Lib\site-packages\faster_whisper\audio.py", line 80, in _ignore_invalid_frames
yield next(iterator)
^^^^^^^^^^^^^^
File "av\container\input.pyx", line 212, in decode
File "av\packet.pyx", line 87, in av.packet.Packet.decode
File "av\stream.pyx", line 168, in av.stream.Stream.decode
File "av\codec\context.pyx", line 513, in av.codec.context.CodecContext.decode
File "av\codec\context.pyx", line 416, in av.codec.context.CodecContext._send_packet_and_recv
File "av\error.pyx", line 336, in av.error.err_check

heygen video translation

What if we went even bolder:

  • Whisper solves speech-to-subtitles
  • LLMs (ChatGPT, Google Translate) solve multilingual translation
  • MockingBird or so-vits-svc-fork trains the original speaker's timbre (voiceprint)
  • Using the recognized text's timeline, ffmpeg splits the video into segments by voice, while the trained voice model synthesizes audio tracks from the translated text
  • (Optional) GeneFace++ or Wav2Lip then corrects the lip sync
  • Finally merge everything back together (ffmpeg)

Is this roughly how HeyGen's video translation works? Of course I'm a rookie, and the real pipeline is surely far more complex. The hardest part is identifying the time axes of the different voices, and along the way there are many other problems such as background-sound removal and recognition-error calibration.

CUDA acceleration unavailable after installing torch with CUDA

For some reason the acceleration option is unavailable even though CUDA+torch works normally; testing Stable-diffusion webui works fine.

Manual check:
import torch
torch.cuda.is_available()
returns True
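One possible explanation: faster-whisper runs on CTranslate2, which loads CUDA/cuDNN independently of torch, so `torch.cuda.is_available()` returning True does not guarantee faster-whisper can see the GPU. A diagnostic sketch with guarded imports, since either package may be absent:

```python
def cuda_report() -> dict:
    """Report CUDA visibility for torch and for CTranslate2 separately."""
    report = {}
    try:
        import torch
        report["torch_cuda"] = torch.cuda.is_available()
    except ImportError:
        report["torch_cuda"] = None  # torch not installed
    try:
        import ctranslate2
        report["ct2_cuda_devices"] = ctranslate2.get_cuda_device_count()
    except ImportError:
        report["ct2_cuda_devices"] = None  # ctranslate2 not installed
    return report

print(cuda_report())
```

If torch reports True but CTranslate2 reports 0 devices, the problem is in the faster-whisper/cuDNN stack rather than the torch install.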

On the text limit

When translating with Kimi, once the subtitles exceed a certain amount, translation stops working. For example, with a video over 10 minutes long, nothing gets translated and the returned subtitles are still in English.
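A client-side workaround is to send the subtitles in batches that stay under the model's context limit and translate each batch in its own request. A sketch using a character budget as a rough token proxy; the 2000-character default is an arbitrary assumption:

```python
def batch_lines(lines, max_chars=2000):
    """Group subtitle lines into batches whose combined length stays under max_chars."""
    batches, current, size = [], [], 0
    for line in lines:
        if current and size + len(line) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        batches.append(current)
    return batches

# Translate each batch separately, then concatenate the results, so no single
# request exceeds the model's context window.
```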
