gitmylo / audio-webui

A webui for different audio related Neural Networks

License: MIT License

Python 98.75% Batchfile 0.12% Shell 0.12% Jupyter Notebook 0.60% JavaScript 0.40%
ai audioldm bark rvc text-to-audio text-to-speech voice-cloning audiocraft music bark-gui

audio-webui's Introduction

❗❗ Please read ❗❗

This code works on Python 3.10. Lower versions don't support "|" type annotations, and I believe 3.11 isn't currently supported by the TTS library.

You also need to have Git installed. You might already have it; run git --version in a console/terminal to check.

Some features require ffmpeg to be installed.

On Windows, you need to have the Visual Studio C++ build tools installed.
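
If you want to check the command line prerequisites in one go, something like the following could work. It is only a sketch: `missing_tools` is a hypothetical helper, and it only checks that `git` and `ffmpeg` are findable on PATH, not that the build tools are installed.

```python
import shutil


def missing_tools(tools=("git", "ffmpeg"), which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("Missing:", ", ".join(missing) if missing else "nothing")
```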

Common issues

Latest big features

  • Extensions

👍 Automatic installers

Automatic installers! (Download)

  1. Put the installer in a folder
  2. Run the installer for your operating system.
  3. Now run the webui's install script. Follow the steps at 📦 Installing

📦 Docker

Links to community audio-webui docker projects
Note: The docker repositories are not maintained by me, so docker-related issues should go to those repositories. If an issue is related to audio-webui directly, create the issue here, unless a fix has already been made.

💻 Local install (Manual)

🔽 Downloading

It is recommended to use git to download the webui, since git allows for easy updating.

To download using git, run git clone https://github.com/gitmylo/audio-webui in a console/terminal

📦 Installing

Installation is done automatically in a venv when you run run.bat or run.sh (.bat on Windows, .sh on Linux/MacOS).

🔼 Updating

To update,
run update.bat on Windows, update.sh on Linux/MacOS
OR run git pull in the folder your webui is installed in.

🏃‍ Running

Running should be as simple as running run.bat or run.sh depending on your OS. Everything should get installed automatically.

If there's an issue with running, please create an issue.

💻 Google colab notebook

Open in colab Open in github

💻 Common command line flags

| Name | Args | Short | Usage | Description |
| --- | --- | --- | --- | --- |
| --skip-install | [None] | -si | -si | Skip installing packages |
| --skip-venv | [None] | -sv | -sv | Skip creating/activating venv, also skips install. (for advanced users) |
| --no-data-cache | [None] | [None] | --no-data-cache | Don't change the default dir for huggingface_hub models. (This might fix some models not loading) |
| --launch | [None] | [None] | --launch | Automatically open the webui in your browser once it launches. |
| --share | [None] | -s | -s | Share the gradio instance publicly |
| --username | username (str) | -u, --user | -u username | Set the username for gradio |
| --password | password (str) | -p, --pass | -p password | Set the password for gradio |
| --theme | theme (str) | [None] | --theme "gradio/soft" | Set the theme for gradio |
| --listen | [None] | -l | -l | Listen on the network, allowing other devices within your local network to access the server (or devices outside it, if the port is forwarded) |
| --port | port (int) | [None] | --port 12345 | Set a custom port to listen on. By default, a port is picked automatically. |
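
For readers unfamiliar with how flags like these are usually declared, here is a rough argparse sketch covering a few of them. This is an illustration only, not the webui's actual parser: `build_parser` and the `dest` names are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring a few flags from the table above."""
    parser = argparse.ArgumentParser(prog="audio-webui")
    parser.add_argument("-si", "--skip-install", action="store_true",
                        help="Skip installing packages")
    parser.add_argument("-s", "--share", action="store_true",
                        help="Share the gradio instance publicly")
    parser.add_argument("-u", "--user", "--username", dest="username",
                        type=str, default=None,
                        help="Set the username for gradio")
    parser.add_argument("-l", "--listen", action="store_true",
                        help="Listen on the local network")
    parser.add_argument("--port", type=int, default=None,
                        help="Custom port; picked automatically if omitted")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args(["-si", "--port", "12345"]))
```

Boolean switches use `store_true`, so they default to False and need no argument, which matches the "[None]" entries in the Args column.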

✨ Current goals and features ✨

moved to a separate readme

More readme

Link

audio-webui's People

Contributors

buffmcbighuge, cocktailpeanut, d8ahazard, ethanperrine, framp, gitmylo, pravdomil, sammcj

audio-webui's Issues

[BUG REPORT] Pitch extraction silently fails.

I have 4 GB of data that I am processing into NumPy files for RVC. The audio is cut fine, but the pitch extraction fails silently. It makes about 8000 files per folder and keeps failing on the last or second-to-last one, regardless of whether I use Chrome or Firefox.

Since it can't pick up where it left off, I always have to start again and it's not failing on the same file.
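
One way extraction could be made resumable is to skip inputs whose output already exists. A minimal sketch, assuming one .npy output per .wav input with the same file stem; `pending_files` and the directory layout are hypothetical and not taken from the webui's code.

```python
from pathlib import Path


def pending_files(audio_dir: Path, pitch_dir: Path,
                  suffix: str = ".npy") -> list[Path]:
    """List .wav files in audio_dir with no matching output in pitch_dir."""
    # Assumption: outputs are named after the input stem, e.g. 0.wav -> 0.npy.
    done = {p.stem for p in pitch_dir.glob(f"*{suffix}")}
    return sorted(p for p in audio_dir.glob("*.wav") if p.stem not in done)
```

Running the extraction loop over `pending_files(...)` instead of every file would let a second attempt continue after the files that were already written.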

[BUG REPORT] ImportError: numpy.core.multiarray failed to import

At the end it tells me that I might need to install Visual C++ build tools, but I already have them installed.

Error:

C:\Users\USUARIO\Desktop\nylo\audio-webui\venv\lib\site-packages\whisper\timing.py:58: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
def backtrace(trace: np.ndarray):
Traceback (most recent call last):
File "__init__.pxd", line 942, in numpy.import_array
RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\USUARIO\Desktop\nylo\audio-webui\main.py", line 24, in <module>
from webui.webui import launch_webui
File "C:\Users\USUARIO\Desktop\nylo\audio-webui\webui\webui.py", line 1, in <module>
from webui.ui.ui import create_ui
File "C:\Users\USUARIO\Desktop\nylo\audio-webui\webui\ui\ui.py", line 2, in <module>
from .tabs import *
File "C:\Users\USUARIO\Desktop\nylo\audio-webui\webui\ui\tabs\__init__.py", line 7, in <module>
from .training_tab import training_tab
File "C:\Users\USUARIO\Desktop\nylo\audio-webui\webui\ui\tabs\training_tab.py", line 2, in <module>
from webui.ui.tabs.training.rvc import train_rvc
File "C:\Users\USUARIO\Desktop\nylo\audio-webui\webui\ui\tabs\training\rvc.py", line 2, in <module>
import webui.ui.tabs.training.training.rvc_workspace as rvc_ws
File "C:\Users\USUARIO\Desktop\nylo\audio-webui\webui\ui\tabs\training\training\rvc_workspace.py", line 33, in <module>
from webui.modules.implementations.rvc.custom_pitch_extraction import pitch_extract as pe
File "C:\Users\USUARIO\Desktop\nylo\audio-webui\webui\modules\implementations\rvc\custom_pitch_extraction.py", line 6, in <module>
import pyworld
File "C:\Users\USUARIO\Desktop\nylo\audio-webui\venv\lib\site-packages\pyworld\__init__.py", line 7, in <module>
from .pyworld import *
File "pyworld\pyworld.pyx", line 6, in init pyworld.pyworld
File "__init__.pxd", line 944, in numpy.import_array
ImportError: numpy.core.multiarray failed to import
numpy.core.multiarray failed to import
Your install might have failed to install one of the requirements, are you missing a package?
Depending on the error message that was given during install, you might need to install visual C++ build tools.
Or read common issues at https://github.com/gitmylo/audio-webui/wiki/common-issues
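
The RuntimeError in this log gives two NumPy C-API versions in hex, and comparing them suggests which side is out of date. Below is a small sketch that pulls them out of the message; `api_versions` is a made-up helper, and reading "installed is lower than built" as "the installed numpy is older than the one pyworld was compiled against" is my interpretation rather than something the log states outright.

```python
import re

PATTERN = re.compile(
    r"compiled against API version (0x[0-9a-f]+) "
    r"but this version of numpy is (0x[0-9a-f]+)"
)


def api_versions(message: str) -> tuple[int, int]:
    """Return (built_against, installed) C-API versions from the error text."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("not a NumPy API mismatch message")
    return int(match.group(1), 16), int(match.group(2), 16)


built, installed = api_versions(
    "RuntimeError: module compiled against API version 0x10 "
    "but this version of numpy is 0xf"
)
# 0x10 is 16 and 0xf is 15, so the installed numpy is the older side here.
print(built, installed, "numpy too old" if installed < built else "ok")
```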

[BUG REPORT]

Describe the bug
It keeps looping when I try to use run.bat.

To Reproduce
Steps to reproduce the behavior:

  1. it did the installs
  2. it keeps asking if my pip is up to date
  3. it says no module named bark

Expected behavior
to run the ui

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Successfully installed librosa-0.10.0.post2

[notice] A new release of pip available: 22.3.1 -> 23.1.2
[notice] To update, run: python.exe -m pip install --upgrade pip
Installing requirement pytube...
Requirement already satisfied: pytube in c:\users\tomas_000\desktop\audio-webui-master\venv\lib\site-packages (15.0.0)

[notice] A new release of pip available: 22.3.1 -> 23.1.2
[notice] To update, run: python.exe -m pip install --upgrade pip
Installing requirement openai-whisper...
Requirement already satisfied: openai-whisper in c:\users\tomas_000\desktop\audio-webui-master\venv\lib\site-packages (20230314)
Requirement already satisfied: more-itertools in c:\users\tomas_000\desktop\audio-webui-master\venv\lib\site-packages (from openai-whisper) (9.1.0)
Requirement already satisfied: torch in c:\users\tomas_000\desktop\audio-webui-master\venv\lib\site-packages (from openai-whisper) (2.0.1+cu117)
Requirement already satisfied: numpy in c:\users\tomas_000\desktop\audio-webui-master\venv\lib\site-packages (from openai-whisper) (1.23.5)
Requirement already satisfied: numba in c:\users\tomas_000\desktop\audio-webui-master\venv\lib\site-packages (from openai-whisper) (0.56.4)
Requirement already satisfied: ffmpeg-python==0.2.0 in c:\users\tomas_000\desktop\audio-webui-master\venv\lib\site-packages (from openai-whisper) (0.2.0)
Requirement already satisfied: tiktoken==0.3.1 in c:\users\tomas_000\desktop\audio-webui-master\venv\lib\site-packages (from openai-whisper) (0.3.1)
Requirement already satisfied: tqdm in c:\users\tomas_000\desktop\audio-webui-master\venv\lib\site-packages (from openai-whisper) (4.65.0)
Requirement already satisfied: future in c:\users\tomas_000\desktop\audio-webui-master\venv\lib\site-packages (from ffmpeg-python==0.2.0->openai-whisper) (0.18.3)
Requirement already satisfied: regex>=2022.1.18 in c:\users\tomas_000\desktop\audio-webui-master\venv\lib\site-packages (from tiktoken==0.3.1->openai-whisper) (2023.6.3)
Requirement already satisfied: requests>=2.26.0 in c:\users\tomas_000\desktop\audio-webui-master\venv\lib\site-packages (from tiktoken==0.3.1->openai-whisper) (2.28.1)
Requirement already satisfied: llvmlite<0.40,>=0.39.0dev0 in c:\users\tomas_000\desktop\audio-webui-master\venv\lib\site-packages (from numba->openai-whisper) (0.39.1)
Requirement already satisfied: setuptools in c:\users\tomas_000\desktop\audio-webui-master\venv\lib\site-packages (from numba->openai-whisper) (65.5.0)
Requirement already satisfied: sympy in c:\users\tomas_000\desktop\audio-webui-master\venv\lib\site-packages (from torch->openai-whisper) (1.11.1)
Requirement already satisfied: jinja2 in c:\users\tomas_000\desktop\audio-webui-master\venv\lib\site-packages (from torch->openai-whisper) (3.1.2)
Requirement already satisfied: networkx in c:\users\tomas_000\desktop\audio-webui-master\venv\lib\site-packages (from torch->openai-whisper) (2.8.8)
Requirement already satisfied: typing-extensions in c:\users\tomas_000\desktop\audio-webui-master\venv\lib\site-packages (from torch->openai-whisper) (4.4.0)
Requirement already satisfied: filelock in c:\users\tomas_000\desktop\audio-webui-master\venv\lib\site-packages (from torch->openai-whisper) (3.9.0)
Requirement already satisfied: colorama in c:\users\tomas_000\desktop\audio-webui-master\venv\lib\site-packages (from tqdm->openai-whisper) (0.4.6)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\tomas_000\desktop\audio-webui-master\venv\lib\site-packages (from requests>=2.26.0->tiktoken==0.3.1->openai-whisper) (2022.12.7)
Requirement already satisfied: charset-normalizer<3,>=2 in c:\users\tomas_000\desktop\audio-webui-master\venv\lib\site-packages (from requests>=2.26.0->tiktoken==0.3.1->openai-whisper) (2.1.1)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\users\tomas_000\desktop\audio-webui-master\venv\lib\site-packages (from requests>=2.26.0->tiktoken==0.3.1->openai-whisper) (1.26.13)
Requirement already satisfied: idna<4,>=2.5 in c:\users\tomas_000\desktop\audio-webui-master\venv\lib\site-packages (from requests>=2.26.0->tiktoken==0.3.1->openai-whisper) (3.4)
Requirement already satisfied: MarkupSafe>=2.0 in c:\users\tomas_000\desktop\audio-webui-master\venv\lib\site-packages (from jinja2->torch->openai-whisper) (2.1.2)
Requirement already satisfied: mpmath>=0.19 in c:\users\tomas_000\desktop\audio-webui-master\venv\lib\site-packages (from sympy->torch->openai-whisper) (1.2.1)

[notice] A new release of pip available: 22.3.1 -> 23.1.2
[notice] To update, run: python.exe -m pip install --upgrade pip
Installing requirement git+https://github.com/facebookresearch/audiocraft.git...
Collecting git+https://github.com/facebookresearch/audiocraft.git
Cloning https://github.com/facebookresearch/audiocraft.git to c:\users\public\documents\wondershare\creatortemp\pip-req-build-nqhxh4ny
ERROR: Error [WinError 2] The system cannot find the file specified while executing command git version
ERROR: Cannot find command 'git' - do you have 'git' installed and in your PATH?

[notice] A new release of pip available: 22.3.1 -> 23.1.2
[notice] To update, run: python.exe -m pip install --upgrade pip
Preparing
Traceback (most recent call last):
File "C:\Users\tomas_000\Desktop\audio-webui-master\main.py", line 15, in <module>
from webui.modules.implementations.tts_monkeypatching import patch as patch1
File "C:\Users\tomas_000\Desktop\audio-webui-master\webui\modules\implementations\__init__.py", line 1, in <module>
import webui.modules.implementations.ttsmodels as tts
File "C:\Users\tomas_000\Desktop\audio-webui-master\webui\modules\implementations\ttsmodels.py", line 15, in <module>
from webui.modules.implementations.patches.bark_custom_voices import wav_to_semantics, generate_fine_from_wav,
File "C:\Users\tomas_000\Desktop\audio-webui-master\webui\modules\implementations\patches\bark_custom_voices.py", line 4, in <module>
from bark.generation import SAMPLE_RATE, load_codec_model
ModuleNotFoundError: No module named 'bark'
Press any key to continue . . .

[FEATURE REQUEST] Streamlined install system.

A system which streamlines installs by having the Python code itself check whether packages are already installed. This would remove the need for the --skip-install flag, as it would skip any packages that are already on the right version.

No --listen parameter

I run this stuff over the network from a server and am not able to access the web UI over my local network. A public instance is not ideal for me, since traffic would go out of my LAN and back again.

[FEATURE REQUEST] Quality of Life improvements

Hi! Very handy to have this all-in-one tool!

Could it have a toggle to enable or disable auto-saving of generated audio?

I'd also like to make another suggestion: there is a repo called Voicefixer for cleaning up samples. Do you know how to incorporate it? That would be amazing. ;)

Suggestion - upgrade pip at start of install

Saves all these warnings as install happens.

[notice] A new release of pip available: 22.2.2 -> 23.1.2
[notice] To update, run: python.exe -m pip install --upgrade pip
Installing requirement pyworld>=0.3.2...

[notice] A new release of pip available: 22.2.2 -> 23.1.2
[notice] To update, run: python.exe -m pip install --upgrade pip
Installing requirement faiss-cpu==1.7.3...

[notice] A new release of pip available: 22.2.2 -> 23.1.2
[notice] To update, run: python.exe -m pip install --upgrade pip
Installing requirement torchcrepe==0.0.18...

Also, maybe show all stats as each package installs. This helps those with slow connections see that the install is still running, rather than assume it has hung.
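
A sketch of how the install step could build its pip commands so that pip is upgraded first and the repeated notice is suppressed. `pip_command` is a made-up helper; `--disable-pip-version-check` is a standard pip option, but whether the installer calls pip this way is an assumption.

```python
import sys


def pip_command(*args: str, quiet_version_check: bool = True) -> list[str]:
    """Build a pip invocation that uses the current interpreter."""
    command = [sys.executable, "-m", "pip"]
    if quiet_version_check:
        # Suppresses the "A new release of pip available" notice.
        command.append("--disable-pip-version-check")
    return command + list(args)


# Upgrade pip once, up front, then install requirements as before.
upgrade_pip = pip_command("install", "--upgrade", "pip")
install_pkg = pip_command("install", "torchcrepe==0.0.18")
```

Each list can be passed to `subprocess.run` without capturing output, so pip's own progress output stays visible while packages download.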

RVC training

To do: Implement standalone training for RVC model training.

  • #13
  • #12
    • Training workspaces
    • #14
    • Loading and saving
    • Allow the user to download the base models for training automatically
    • Add saving of extracted pitches (f0) for all files
    • Training of models using data provided

[BUG REPORT] Tortoise-v2 TTS model doesn't load

I get this error

raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UnifiedVoice:
        Unexpected key(s) in state_dict: "gpt.h.0.attn.bias", "gpt.h.0.attn.masked_bias", "gpt.h.1.attn.bias", "gpt.h.1.attn.masked_bias", "gpt.h.2.attn.bias", "gpt.h.2.attn.masked_bias", "gpt.h.3.attn.bias", "gpt.h.3.attn.masked_bias", "gpt.h.4.attn.bias", "gpt.h.4.attn.masked_bias", "gpt.h.5.attn.bias", "gpt.h.5.attn.masked_bias", "gpt.h.6.attn.bias", "gpt.h.6.attn.masked_bias", "gpt.h.7.attn.bias", "gpt.h.7.attn.masked_bias", "gpt.h.8.attn.bias", "gpt.h.8.attn.masked_bias", "gpt.h.9.attn.bias", "gpt.h.9.attn.masked_bias", "gpt.h.10.attn.bias", "gpt.h.10.attn.masked_bias", "gpt.h.11.attn.bias", "gpt.h.11.attn.masked_bias", "gpt.h.12.attn.bias", "gpt.h.12.attn.masked_bias", "gpt.h.13.attn.bias", "gpt.h.13.attn.masked_bias", "gpt.h.14.attn.bias", "gpt.h.14.attn.masked_bias", "gpt.h.15.attn.bias", "gpt.h.15.attn.masked_bias", "gpt.h.16.attn.bias", "gpt.h.16.attn.masked_bias", "gpt.h.17.attn.bias", "gpt.h.17.attn.masked_bias", "gpt.h.18.attn.bias", "gpt.h.18.attn.masked_bias", "gpt.h.19.attn.bias", "gpt.h.19.attn.masked_bias", "gpt.h.20.attn.bias", "gpt.h.20.attn.masked_bias", "gpt.h.21.attn.bias", "gpt.h.21.attn.masked_bias", "gpt.h.22.attn.bias", "gpt.h.22.attn.masked_bias", "gpt.h.23.attn.bias", "gpt.h.23.attn.masked_bias", "gpt.h.24.attn.bias", "gpt.h.24.attn.masked_bias", "gpt.h.25.attn.bias", "gpt.h.25.attn.masked_bias", "gpt.h.26.attn.bias", "gpt.h.26.attn.masked_bias", "gpt.h.27.attn.bias", "gpt.h.27.attn.masked_bias", "gpt.h.28.attn.bias", "gpt.h.28.attn.masked_bias", "gpt.h.29.attn.bias", "gpt.h.29.attn.masked_bias".
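
Every key in that list ends in `.attn.bias` or `.attn.masked_bias`, so one possible workaround is to drop those entries from the checkpoint before loading it. A minimal sketch, assuming the extra keys are leftover buffers the model no longer expects (`drop_unexpected` is a hypothetical helper, not part of the TTS library):

```python
def drop_unexpected(state_dict: dict,
                    suffixes=(".attn.bias", ".attn.masked_bias")) -> dict:
    """Return a copy of state_dict without keys ending in the given suffixes."""
    return {
        key: value
        for key, value in state_dict.items()
        if not key.endswith(suffixes)
    }


checkpoint = {
    "gpt.h.0.attn.bias": "buffer",
    "gpt.h.0.attn.masked_bias": "buffer",
    "gpt.h.0.attn.c_attn.weight": "weights",
}
print(sorted(drop_unexpected(checkpoint)))
```

Alternatively, PyTorch's `load_state_dict` accepts `strict=False`, which ignores unexpected keys, though it also hides keys that are genuinely missing.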

Additionally is there a config file available to change the generation settings in tortoise?

Can't skip install.

The parameter didn't work. I had to comment it out in the main.py script. I already have an nvidia environment that I use, no need for me to download torch for the 20th time :P

After I commented it out the ui started, I think I needed to install 3 packages with pip.

[FEATURE REQUEST] Support so-vits-svc

Is your feature request related to a problem? Please describe.

RVC works great with spoken word but isn't so good at songs. I was finally able to test so-vits-svc after fixing their UI's loading of files.
Its results are much better for sung content. It can transform female to male voices effectively, unlike RVC.

Describe the solution you'd like

Support training and use of so-vits models alongside RVC.

Describe alternatives you've considered

Janky so-vits-svc-fork UI that's geared more towards real-time voice changing.

[BUG REPORT] After latest update RVC doesn't work anymore

Updated to latest version. When I try to generate anything with RVC I get these errors:

TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

if data.dtype in [np.float64, np.float32, np.float16]: AttributeError: 'NoneType' object has no attribute 'dtype'

[BUG REPORT] ConnectionResetError

Describe the bug
When generating audio with bark, generation gets through its final stages and completes successfully (I believe), giving me an audio file that sounds fine as far as I can tell. However, some errors are typically thrown in the cmd window as well. I wanted to make you aware of them at the very least, and to make sure they aren't causing other issues.

Generation split into sections: ['[woman] "I can't believe you did this to me!" [man] Alice shouted, throwing the vase at Bob's head. He ducked and ran out of the door, leaving her alone with the broken pieces of their marriage.', ' [woman] [whispering] "Don't go, please," [man] she whispered, collapsing on the floor in tears. Bob didn’t say anything. He was too afraid of Alice’s anger and hurt. He knew he had made a terrible mistake,', ' but he didn’t know how to fix it. He just wanted to get away from the scene of his betrayal.']
ffmpeg version 5.1.2-full_build-www.gyan.dev Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 12.1.0 (Rev2, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libaribb24 --enable-libdav1d --enable-libdavs2 --enable-libuavs3d --enable-libzvbi --enable-librav1e --enable-libsvtav1 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libaom --enable-libjxl --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-liblensfun --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libshaderc --enable-vulkan --enable-libplacebo --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libilbc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
libavutil 57. 28.100 / 57. 28.100
libavcodec 59. 37.100 / 59. 37.100
libavformat 59. 27.100 / 59. 27.100
libavdevice 59. 7.100 / 59. 7.100
libavfilter 8. 44.100 / 8. 44.100
libswscale 6. 7.100 / 6. 7.100
libswresample 4. 7.100 / 4. 7.100
libpostproc 56. 6.100 / 56. 6.100
Input #0, png_pipe, from 'C:\Users\Patrick\AppData\Local\Temp\tmpg12s0oww.png':
Duration: N/A, bitrate: N/A
Stream #0:0: Video: png, rgba(pc), 1000x200, 25 fps, 25 tbr, 25 tbn
Guessed Channel Layout for Input Stream #1.0 : mono
Input #1, wav, from 'C:\Users\Patrick\AppData\Local\Temp\tmpmckuvh0q.wav':
Duration: 00:00:38.92, bitrate: 384 kb/s
Stream #1:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 24000 Hz, mono, s16, 384 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (png (native) -> h264 (libx264))
Stream #1:0 -> #0:1 (pcm_s16le (native) -> aac (native))
Press [q] to stop, [?] for help
[libx264 @ 0000000000ef7000] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 0000000000ef7000] profile High, level 3.0, 4:2:0, 8-bit
[libx264 @ 0000000000ef7000] 264 - core 164 r3099 e067ab0 - H.264/MPEG-4 AVC codec - Copyleft 2003-2022 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=6 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'C:\Users\Patrick\AppData\Local\Temp\tmp3n0lbnii.mp4':
Metadata:
encoder : Lavf59.27.100
Stream #0:0: Video: h264 (avc1 / 0x31637661), yuv420p(tv, progressive), 1000x200, q=2-31, 25 fps, 12800 tbn
Metadata:
encoder : Lavc59.37.100 libx264
Side data:
cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 24000 Hz, mono, fltp, 69 kb/s
Metadata:
encoder : Lavc59.37.100 aac
frame= 973 fps=793 q=-1.0 Lsize= 473kB time=00:00:38.95 bitrate= 99.4kbits/s speed=31.7x
video:82kB audio:366kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 5.534837%
[libx264 @ 0000000000ef7000] frame I:4 Avg QP:12.78 size: 5977
[libx264 @ 0000000000ef7000] frame P:301 Avg QP:20.67 size: 119
[libx264 @ 0000000000ef7000] frame B:668 Avg QP:16.30 size: 35
[libx264 @ 0000000000ef7000] consecutive B-frames: 1.0% 20.6% 5.2% 73.2%
[libx264 @ 0000000000ef7000] mb I I16..4: 75.5% 5.9% 18.6%
[libx264 @ 0000000000ef7000] mb P I16..4: 0.9% 0.5% 0.0% P16..4: 3.8% 0.2% 0.1% 0.0% 0.0% skip:94.6%
[libx264 @ 0000000000ef7000] mb B I16..4: 0.1% 0.0% 0.0% B16..8: 1.6% 0.1% 0.0% direct: 0.0% skip:98.2% L0:45.7% L1:54.1% BI: 0.2%
[libx264 @ 0000000000ef7000] 8x8 transform intra:19.8% inter:38.3%
[libx264 @ 0000000000ef7000] coded y,uvDC,uvAC intra: 4.8% 12.4% 10.5% inter: 0.0% 0.1% 0.0%
[libx264 @ 0000000000ef7000] i16 v,h,dc,p: 94% 2% 5% 0%
[libx264 @ 0000000000ef7000] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 8% 1% 90% 1% 0% 0% 0% 0% 0%
[libx264 @ 0000000000ef7000] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 61% 17% 17% 0% 1% 1% 1% 1% 1%
[libx264 @ 0000000000ef7000] i8c dc,h,v,p: 39% 5% 55% 0%
[libx264 @ 0000000000ef7000] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 0000000000ef7000] ref P L0: 95.9% 0.2% 3.1% 0.8%
[libx264 @ 0000000000ef7000] ref B L0: 76.2% 22.1% 1.6%
[libx264 @ 0000000000ef7000] ref B L1: 99.5% 0.5%
[libx264 @ 0000000000ef7000] kb/s:17.14
[aac @ 0000000002aff900] Qavg: 1151.147
ERROR:asyncio:Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
File "C:\Users\Patrick\AppData\Local\Programs\Python\Python310\lib\asyncio\events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "C:\Users\Patrick\AppData\Local\Programs\Python\Python310\lib\asyncio\proactor_events.py", line 162, in _call_connection_lost
self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
ERROR:asyncio:Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
File "C:\Users\Patrick\AppData\Local\Programs\Python\Python310\lib\asyncio\events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "C:\Users\Patrick\AppData\Local\Programs\Python\Python310\lib\asyncio\proactor_events.py", line 162, in _call_connection_lost
self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

Longer bark generations in some way.

I put in text in the box but only 14s will generate. I know there was some limit on this in the official bark but maybe it's possible to generate again and concatenate the files. It can't get past the first sentence of the vaporeon copypasta.
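
The "generate again and concatenate" idea could be sketched like this: split the text into sentence-based chunks, generate each chunk, and join the audio. The `generate` callable stands in for whatever produces samples for one chunk; it and the 200-character limit are assumptions, not bark's actual API or limits.

```python
import re


def split_sentences(text: str, max_chars: int = 200) -> list[str]:
    """Group whole sentences into chunks of at most max_chars where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


def generate_long(text, generate, max_chars=200):
    """Call generate(chunk) for each chunk and join the returned samples."""
    audio = []
    for chunk in split_sentences(text, max_chars):
        audio.extend(generate(chunk))
    return audio
```

A single sentence longer than `max_chars` is kept whole rather than cut mid-sentence, so chunks can exceed the limit in that case.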

[QUESTION] All about these indexes?

So what is the deal on this? It has to now be shared with the models? The files are gigantic. I notice that my training loss is looking lower and the models sound a tiny bit nicer. Singing is way better. No more noise on empty sections of audio either.

See a lot of up and down on the graph after a certain number of epochs. The scale is really tiny so I can't tell much besides it zig-zagging towards 0. No idea on final loss unless it's in a text file somewhere. Would going for 200 epochs help now? Previously I didn't get much out of it as loss would flatline.

[BUG REPORT] RuntimeError when cloning Bark model

Describe the bug
When trying to clone a voice for bark I get the following error

Extracting semantics
Tokenizing semantics
Traceback (most recent call last):
File "A:\Desktop\00 AI Images\audio-webui\venv\lib\site-packages\gradio\routes.py", line 437, in run_predict
output = await app.get_blocks().process_api(
File "A:\Desktop\00 AI Images\audio-webui\venv\lib\site-packages\gradio\blocks.py", line 1352, in process_api
result = await self.call_function(
File "A:\Desktop\00 AI Images\audio-webui\venv\lib\site-packages\gradio\blocks.py", line 1077, in call_function
prediction = await anyio.to_thread.run_sync(
File "A:\Desktop\00 AI Images\audio-webui\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "A:\Desktop\00 AI Images\audio-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "A:\Desktop\00 AI Images\audio-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
result = context.run(func, *args)
File "A:\Desktop\00 AI Images\audio-webui\venv\lib\site-packages\gradio\helpers.py", line 602, in tracked_fn
response = fn(*args)
File "A:\Desktop\00 AI Images\audio-webui\webui\ui\tabs\text_to_speech.py", line 81, in _generate
response, file = loader.get_response(*inputs, progress=progress)
File "A:\Desktop\00 AI Images\audio-webui\webui\modules\implementations\ttsmodels.py", line 210, in get_response
_speaker = self.create_voice(temp_file.name, clone_model)
File "A:\Desktop\00 AI Images\audio-webui\webui\modules\implementations\ttsmodels.py", line 42, in create_voice
fine_prompt = generate_fine_from_wav(file)
File "A:\Desktop\00 AI Images\audio-webui\webui\modules\implementations\patches\bark_custom_voices.py", line 88, in generate_fine_from_wav
encoded_frames = model.encode(wav)
File "A:\Desktop\00 AI Images\audio-webui\venv\lib\site-packages\encodec\model.py", line 144, in encode
encoded_frames.append(self._encode_frame(frame))
File "A:\Desktop\00 AI Images\audio-webui\venv\lib\site-packages\encodec\model.py", line 161, in _encode_frame
emb = self.encoder(x)
File "A:\Desktop\00 AI Images\audio-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "A:\Desktop\00 AI Images\audio-webui\venv\lib\site-packages\encodec\modules\seanet.py", line 144, in forward
return self.model(x)
File "A:\Desktop\00 AI Images\audio-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "A:\Desktop\00 AI Images\audio-webui\venv\lib\site-packages\torch\nn\modules\container.py", line 217, in forward
input = module(input)
File "A:\Desktop\00 AI Images\audio-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "A:\Desktop\00 AI Images\audio-webui\venv\lib\site-packages\encodec\modules\conv.py", line 210, in forward
return self.conv(x)
File "A:\Desktop\00 AI Images\audio-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "A:\Desktop\00 AI Images\audio-webui\venv\lib\site-packages\encodec\modules\conv.py", line 120, in forward
x = self.conv(x)
File "A:\Desktop\00 AI Images\audio-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "A:\Desktop\00 AI Images\audio-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 313, in forward
return self._conv_forward(input, self.weight, self.bias)
File "A:\Desktop\00 AI Images\audio-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 309, in _conv_forward
return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

Appears to be no way to download or run RVC model

I really like what you're doing!😄 But I was trying the new RVC feature and there appears to be no way to run it right now. Nothing shows up in the dropdown menu and nothing auto-downloads like it does for the other TTS models. Everything else works fine.

[FEATURE REQUEST] Bark options hints

Add the following hints to the inputs:

Text temperature
1.0 more diverse, 0.1 more conservative

Waveform temperature
1.0 more diverse, 0.1 more conservative
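
To show why a lower temperature is "more conservative", here is a small worked example of softmax sampling probabilities at 0.1 and 1.0. It assumes bark's temperatures behave like the usual sampling temperature (logits divided by the temperature before softmax); the logits are made up.

```python
import math


def softmax_with_temperature(logits: list[float],
                             temperature: float) -> list[float]:
    """Convert logits to probabilities, dividing by temperature first."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
conservative = softmax_with_temperature(logits, 0.1)
diverse = softmax_with_temperature(logits, 1.0)
print([round(p, 3) for p in conservative])
print([round(p, 3) for p in diverse])
```

At 0.1 almost all of the probability lands on the top-scoring option, while at 1.0 the other options keep a noticeable share, which is where the extra variety comes from.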

Multi Voice Models and tortoise.

I had to edit the TTS python module to set strict=False in order to load tortoise, and it produces slow but great outputs. However, the voice is still locked to male only and cannot be changed. Quite a few models have this problem, where the "good" TTS only has one voice or won't run at all because no speaker is selected.

We can choose the speaker for RVC models, so how about for other TTS models?

[QUESTION] Experiences with bark seed.

Have you tried to use seeds with bark? suno-ai/bark#175

Repeated generations almost never have the same voice for me and this looks to be a way to maybe have that happen. That way a good clone (in theory) could be maintained across multiple sentences and maybe even restarts of the UI.
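
As a rough illustration of the idea, the sketch below shows how a fixed seed makes a random sampling step repeatable. It uses Python's standard `random` module as a stand-in; a real fix would have to seed whatever generators Bark actually uses, which is not shown here. `sample_tokens` is a hypothetical helper, not part of Bark or the webui.

```python
import random

def sample_tokens(seed, count=5):
    """Draw `count` pseudo-random token ids from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    return [rng.randrange(10_000) for _ in range(count)]

first = sample_tokens(42)
second = sample_tokens(42)   # same seed, same sequence
other = sample_tokens(43)    # different seed, different sequence
print(first == second)
```

Storing the seed next to a generated clip would, under this assumption, be enough to reproduce the same sampling decisions after a restart.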

Oh and they merged the other PR finally.

[BUG REPORT]

I'm trying to run the software but I get the following, what can I do?
thanks in advance for your support!
E.

C:\audio-webui\audio-webui>run
Checking installs and venv + autodebug checks
Python version: 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
activating venv
Checking installs and venv + autodebug checks
Python version: 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Installed PyTorch!

▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
█░▄▄▀██░██░█▄▄░▄▄██░▄▄▄░██░▄▄▀██░▄▄▄██░▄▄▀██░██░██░▄▄░██
█░▀▀░██░██░███░████░███░██░██░██░▄▄▄██░▄▄▀██░██░██░█▀▀██
█░██░██▄▀▀▄███░████░▀▀▀░██░▀▀░██░▀▀▀██░▀▀░██▄▀▀▄██░▀▀▄██
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

Install failed!
STDOUT:
Looking in indexes: https://download.pytorch.org/whl/cu117
Collecting torch

STDERR:
ERROR: Exception:
Traceback (most recent call last):
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_internal\cli\base_command.py", line 160, in exc_logging_wrapper
status = run_func(*args)
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_internal\cli\req_command.py", line 247, in wrapper
return func(self, options, args)
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_internal\commands\install.py", line 400, in run
requirement_set = resolver.resolve(
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_internal\resolution\resolvelib\resolver.py", line 92, in resolve
result = self._result = resolver.resolve(
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_vendor\resolvelib\resolvers.py", line 481, in resolve
state = resolution.resolve(requirements, max_rounds=max_rounds)
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_vendor\resolvelib\resolvers.py", line 348, in resolve
self._add_to_criteria(self.state.criteria, r, parent=None)
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_vendor\resolvelib\resolvers.py", line 172, in _add_to_criteria
if not criterion.candidates:
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_vendor\resolvelib\structs.py", line 151, in __bool__
return bool(self._sequence)
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_internal\resolution\resolvelib\found_candidates.py", line 155, in __bool__
return any(self)
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_internal\resolution\resolvelib\found_candidates.py", line 143, in <genexpr>
return (c for c in iterator if id(c) not in self._incompatible_ids)
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_internal\resolution\resolvelib\found_candidates.py", line 47, in _iter_built
candidate = func()
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_internal\resolution\resolvelib\factory.py", line 206, in _make_candidate_from_link
self._link_candidate_cache[link] = LinkCandidate(
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_internal\resolution\resolvelib\candidates.py", line 297, in __init__
super().__init__(
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_internal\resolution\resolvelib\candidates.py", line 162, in __init__
self.dist = self._prepare()
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_internal\resolution\resolvelib\candidates.py", line 231, in _prepare
dist = self._prepare_distribution()
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_internal\resolution\resolvelib\candidates.py", line 308, in _prepare_distribution
return preparer.prepare_linked_requirement(self._ireq, parallel_builds=True)
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_internal\operations\prepare.py", line 491, in prepare_linked_requirement
return self._prepare_linked_requirement(req, parallel_builds)
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_internal\operations\prepare.py", line 536, in _prepare_linked_requirement
local_file = unpack_url(
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_internal\operations\prepare.py", line 166, in unpack_url
file = get_http_url(
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_internal\operations\prepare.py", line 107, in get_http_url
from_path, content_type = download(link, temp_dir.path)
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_internal\network\download.py", line 134, in __call__
resp = _http_get_download(self._session, link)
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_internal\network\download.py", line 117, in _http_get_download
resp = session.get(target_url, headers=HEADERS, stream=True)
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_vendor\requests\sessions.py", line 600, in get
return self.request("GET", url, **kwargs)
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_internal\network\session.py", line 518, in request
return super().request(method, url, *args, **kwargs)
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_vendor\requests\sessions.py", line 587, in request
resp = self.send(prep, **send_kwargs)
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_vendor\requests\sessions.py", line 701, in send
r = adapter.send(request, **kwargs)
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_vendor\cachecontrol\adapter.py", line 48, in send
cached_response = self.controller.cached_request(request)
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_vendor\cachecontrol\controller.py", line 155, in cached_request
resp = self.serializer.loads(request, cache_data, body_file)
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_vendor\cachecontrol\serialize.py", line 95, in loads
return getattr(self, "_loads_v{}".format(ver))(request, data, body_file)
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_vendor\cachecontrol\serialize.py", line 186, in _loads_v4
cached = msgpack.loads(data, raw=False)
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_vendor\msgpack\fallback.py", line 123, in unpackb
unpacker.feed(packed)
File "C:\audio-webui\audio-webui\venv\lib\site-packages\pip\_vendor\msgpack\fallback.py", line 381, in feed
self._buffer.extend(view)
MemoryError

[notice] A new release of pip available: 22.3.1 -> 23.1.2
[notice] To update, run: python.exe -m pip install --upgrade pip

Please read the error above carefully.
If you are unsure, please create an issue at https://github.com/gitmylo/audio-webui/issues.
When creating an issue, please include your full autodebug message.
Exit code: 2
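
The traceback above ends inside pip's HTTP cache (`cachecontrol` and `msgpack`), so one thing that might be worth trying is installing with pip's cache disabled via the `--no-cache-dir` option. The sketch below only builds such a command. `build_pip_install` is a hypothetical helper, not part of the webui's installer, and whether the installer passes the index as `--index-url` is an assumption based on the "Looking in indexes" line.

```python
import sys

def build_pip_install(packages, index_url=None, use_cache=False):
    """Build a pip install command line, bypassing pip's HTTP cache by default."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if not use_cache:
        cmd.append("--no-cache-dir")  # real pip option: neither read nor write the cache
    if index_url:
        cmd += ["--index-url", index_url]
    return cmd + list(packages)

command = build_pip_install(
    ["torch"],
    index_url="https://download.pytorch.org/whl/cu117",  # index shown in the log above
)
print(" ".join(command))
```

The printed command would be run from inside the activated venv. If the cached download itself is corrupt, `pip cache purge` is another option for clearing it.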

Looping venv.

I explored the issue with ChatGPT and it gave me the following.
I asked it why the contents of venv.py would cause a loop.

The problem with this script is in the activate_venv() function. When the virtual environment (venv) is not activated, it tries to activate it and then it calls itself by running the script again inside the venv. This leads to an infinite recursion, as the new script instance will again check if it's in a venv, and if not, it will try to activate it and call itself again, and so on.

Here's the problem in more detail:

In the line:

run_command([('call' if is_windows() else 'source', venv_activate_path), ('python', ' '.join([f'"{arg}"' for arg in sys.argv]))])
The run_command function seems to be running a command which involves both activating the venv and then running the same Python script (since sys.argv contains the command-line arguments, which in this case would be the script itself).

This is followed by exit(), which would normally stop execution, but because you've launched another instance of the script inside the run_command function, this exit command doesn't stop the recursion from happening.

One way to solve this problem could be to refactor the script so that the script doesn't call itself. Instead of calling the script again after activating the venv, you could just run the code that needs the venv directly after activating it. This way, the venv will be activated only once, and there won't be any infinite recursion.

Please note that to do this, you might need to refactor your script to be able to run the code you want from within the script, rather than needing to start a new script instance. This would involve isolating the parts of the code that need the venv into a separate function, and then calling this function directly after the venv activation. This would avoid the recursion issue.
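
One way to picture a guard against this kind of relaunch loop is sketched below. It is not the project's venv.py; `in_venv`, `venv_python`, and `relaunch_command` are hypothetical names. The sketch assumes the standard venv layout (`Scripts\python.exe` on Windows, `bin/python` elsewhere) and relies on the documented fact that `sys.prefix` differs from `sys.base_prefix` inside a venv.

```python
import os
import sys

def in_venv():
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)

def venv_python(venv_dir, windows=None):
    """Path to the venv's interpreter, assuming the standard venv layout."""
    if windows is None:
        windows = os.name == "nt"
    if windows:
        return os.path.join(venv_dir, "Scripts", "python.exe")
    return os.path.join(venv_dir, "bin", "python")

def relaunch_command(venv_dir, argv):
    """Command that reruns the script inside the venv, or None if already inside."""
    if in_venv():
        return None  # already in a venv: do not relaunch, so no loop is possible
    return [venv_python(venv_dir)] + list(argv)

print(relaunch_command("venv", sys.argv))
```

Calling the venv's interpreter directly avoids depending on the shell's `call` or `source` behaviour, and the `in_venv()` check means the relaunched process returns `None` instead of spawning itself again.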

missing tensorboardX requirement

First run on Win 11 installs most requirements then fails launching with a message:
2023-05-29 11:29:34 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX

Activating the venv, running pip install tensorboardX, and then running again with --skip-install resolves this.
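
A missing requirement like this could be caught before launch with a small import check. The sketch below uses only the standard library; `missing_modules` is a hypothetical helper, not something the webui provides.

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# "tensorboardX" is the module from the report above; "json" is always present.
print(missing_modules(["json", "tensorboardX"]))
```

Run inside the venv, an empty list means every listed module can be imported; anything printed still needs a `pip install`.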

[BUG REPORT]

Describe the bug
Syntax error on line 1 ("@echo off") when running install.bat on Windows.

To Reproduce
Steps to reproduce the behavior:

  1. python install.bat

Expected behavior
Expected the software to install.

Install/Setup for Mac

Do you have instructions for running this project on Mac? I get the following error when I execute run.sh.


Checking installs and venv
activating venv
sh: bin/activate: No such file or directory
Press any key to exit...


[FEATURE REQUEST] Real time voice changing with RVC

Allow for real-time voice changing with RVC by adding a new audio "input" device, which is basically a fake microphone that can then be used for voice cloning. If that isn't possible, use an output device instead.

Where to place model?

I have downloaded the model separately but I'm not sure on the folder structure. I tried putting it in suno--bark but it still tries to download the model.

[BUG REPORT] Selecting "small" in settings appears to d/l the large models.

Cloning is hitting different so I decided to try the small models again. When I select small models after using large I get an index error. When I do it on a fresh start it tries to download coarse_2.pt and text_2.pt, etc even though they're already moved to the suno folder in audio-webui models directory (and are the large models?).

Also, will it be possible to mix and match small text + large coarse + small fine, etc.?
