The xva-synth's discuss from danruta

Where is xVASynth.exe file?

Thank you for such an exciting project.

"To start, double click the xVASynth.exe file, and make sure to click Allow"

Where is xVASynth.exe file?

UnicodeDecodeError: 'cp932' codec can't decode byte 0x99

Hello, just reporting server.py errors out when trying to read scripts.js with error:

python server.py
Traceback (most recent call last):
  File "C:\Users\WDAGUtilityAccount\Desktop\_windows sandbox mapped\xVASynth_2\resources\app\server.py", line 16, in <module>
    lines = f.read().split("\n")
UnicodeDecodeError: 'cp932' codec can't decode byte 0x99 in position 45308: illegal multibyte sequence

Basically, it won't run with Japanese locale.
I originally tried running the program as normal via xVASynth.exe, but server.exe would just close right away.

Logs don't show anything obvious dealing with that error, only app.log is written to
app.log

Can this work offline?

I do not understand why a program that is more than 10GB's requires the internet.

ffmpeg: MP3 format crash with Deesing

By default, the Deesing setting is set to 0.1. This makes ffmpeg go into a fatal error and display the pop-up:
Output ./resources/app/output/temp-16335802487988227_ffmpeg.mp3 same as Input #0 - exiting FFmpeg cannot edit existing files in-place.

This part of the code is at fault, as it expects WAV.
https://github.com/DanRuta/xVA-Synth/blob/master/python/audio_post.py#L81

Voice Request: Elder Scrolls Online voices

ESO has excellent and extensive voice acting. While ESO obviously can't be modded, it's voices would fit in well with other games and provide a bit more variety.

TTS Server only mode?

Hi, thanks for your great effort. Is there any possibility on running this project in a mozilla-tts / OpenTTS-compatible headless mode? (e.g. Without the electron UI, so that all the synthesis could be done headless).
I tried running server.py, but so far got stuck on the

ImportError: cannot import name 'quote' from 'urllib' (/usr/lib/python3.7/urllib/__init__.py)

I plan to hook it to https://rhasspy.readthedocs.io/en/latest/

Letter tones and intensity (suggestion)

First, I'd like to thank you for this amazing tool. Training ML models is a damn hard task, and is the very reason the development of the Grand Prognosticator has been halted for now, lmao. Kudos for this. Your tool is exactly what I needed to complete my mod, because the new lines for Falk Firebeard had no voice. Now they are perfectly voice acted, and I have you to thank you for it.

This comes as a suggestion and not an issue. I'm not versed in JavaScript, so I don't know how to help you much and I have yet to inspect the code in depth, but one interesting thing for this tool to have would be to allow the user to select more than one letter at once and raise their bars. This way, we would maintain the relative position of the bars between letters, but change their absolute position and change the tone of the sentence itself, as a whole, instead of tweaking letter by letter. I don't know how hard it would be to implement such a change, but this would surely be an interesting one!

Method for starting via the command line and perhaps preloading a model etc?

I've looked through the documentation but I can't find anything. Is this possible?

ARPAbet "the orc" fusion on v3 voices

Phrase "the orc" becomes "{DH AH0 AO1 R K}" rather than {DH AH0} {AO1 R K}

And I cannot find it as a single word in CMUDICT. Odd exception.

Importing a custom model

I've been training a tacotron2 model and I'm trying to import it to XVA Synth. However, the only output for most of these trainers is a raw file with no extension. I've tried renaming the checkpoint file as .pt like the models to no avail. I also copied and edited a JSON to go with the model. How were your models exported? I can't find any documentation on exporting checkpoints as .pt or .hg.pt files. Thanks.

Issue saving files

I'm having issues saving files recorded when there are more than two question marks in the sentence. And the phrase I'm synthesizing sounds better when I write "What? Are you mad?" than when I write these two things separately (but when I do, the file is saved normally).

The synthesized phrase is generated and it's playable, but I can't generate any actual files with it. The issue does not occur with exclamation marks, only with question marks. Maybe dropping any punctuation contained in the sentence synthesized when generating the file name would fix this.

HiFi-GAN training

Are the HiFi-GAN models trained using mel-spectrograms generated from the ground truth audio, from the Tacotron2 models or from the Fastpitch models?

Bug: Phoneme editor container stretches longer than its content

Users can scroll the empty part.

How about translation other languages?

Guide plі in which (direction) neuron network (neural framework) you use and why? in wich direction should I look, in order too make voices more realistic, and sound in other languages? Is it possible at all or it is very complex and hard to train network?

Wav2Vec2, FastPitch 1, and FastPitch 1.1 code does not remove '.exe' from expected executable paths when not running on Windows

In the following code locations, a '.exe' file path extension is expected independent of platform:

xVA-Synth/python/fastpitch/model.py

Line 103 in 72ac458

ffmpeg_path = f'{"./resources/app" if self.PROD else "."}/python/ffmpeg.exe'

xVA-Synth/python/fastpitch1_1/model.py

Line 200 in 72ac458

ffmpeg_path = f'{"./resources/app" if self.PROD else "."}/python/ffmpeg.exe'

xVA-Synth/python/wav2vec2/model.py

Line 33 in 72ac458

ffmpeg_path = f'{"./resources/app" if PROD else "."}/python/ffmpeg.exe'

Additionally, all code paths that check if the platform is Linux should be rewritten to check if the platform is NOT Windows: (for compatibility with macOS, BSD, etc.)

xVA-Synth/python/xvapitch/text/ipa_to_xvaarpabet.py

Line 285 in 72ac458

if platform.system() == 'Linux':

xVA-Synth/python/xvapitch/model.py

Line 179 in 72ac458

    
           ffmpeg_path = 'ffmpeg' if platform.system() == 'Linux' else f'{"./resources/app" if self.PROD else "."}/python/ffmpeg.exe'

xVA-Synth/python/xvapitch/model.py

Line 257 in 72ac458

    
           ffmpeg_path = 'ffmpeg' if platform.system() == 'Linux' else f'{"./resources/app" if self.PROD else "."}/python/ffmpeg.exe'

xVA-Synth/python/xvapitch/model.py

Line 389 in 72ac458

    
           ffmpeg_path = 'ffmpeg' if platform.system() == 'Linux' else f'{"./resources/app" if self.PROD else "."}/python/ffmpeg.exe'

xVA-Synth/python/xvapitch/model.py

Line 721 in 72ac458

    
           ffmpeg_path = 'ffmpeg' if platform.system() == 'Linux' else f'{"./resources/app" if self.PROD else "."}/python/ffmpeg.exe'

xVA-Synth/python/audio_post.py

Line 37 in 72ac458

    
           ffmpeg_path = 'ffmpeg' if platform.system() == 'Linux' else f'{"./resources/app" if PROD else "."}/python/ffmpeg.exe'

xVA-Synth/python/audio_post.py

Line 112 in 72ac458

    
           ffmpeg_path = 'ffmpeg' if platform.system() == 'Linux' else f'{"./resources/app" if PROD else "."}/python/ffmpeg.exe'

xVA-Synth/python/audio_post.py

Line 289 in 72ac458

    
           ffmpeg_path = 'ffmpeg' if platform.system() == 'Linux' else f'{"./resources/app" if PROD else "."}/python/ffmpeg.exe'

AM cleanup crash v3 models if sentence begins with "Am"

Test sentence for v3 models:

Am I?

xVAPitch/v3 models <PAD> symbols ignored by modifier if done one by one

If modifier values of each letter in a word are changed one by one, then symbols are not affected and the effect is diminished.

    "_",
    "<PAD>",
    "P",
    "<PAD>",
    "AE1",
    "<PAD>",
    "L",
    "<PAD>",
    "AH0",
    "<PAD>",
    "D",
    "<PAD>",
    "IH0",
    "<PAD>",
    "N",
    "<PAD>",
    "_"
],
"pitchNew": [
    0,
    0,
    5.911111,
    0,
    5.911111,
    0,
    5.911111,
    0,
    5.911111,
    0,
    5.911111,
    0,
    5.911111,
    0,
    5.911111,
    0,
    0
],

Using the new features with ffmpeg 4.3

Hello! I liked this new feature to tweak the output a bit. I had to use Audacity to undersample the sounds to 16 bits so I could use them with the Creation Kit, but now that we can do it directly from the app, it's even better. The problem is that it says a file missing when I try to save the file.

I download the latest version of ffmpeg from their website, but I don't know what to do with it. What am I missing here?

Thanks!

Using the program for other voices

Hello,

How can I use the FastPitch part of this program for a non-Bethesda game speaker?

I discovered this repo in my quest to create a personal modding project involving voice synthesis. Over the past week, I’ve successfully been fiddling around with the Real Time Voice Cloning repo to fine-tune their pretrained models to a single speaker. The model is continuing to improve slowly, but my sole gripe is the inability to control the pitch of the generated audios.

Would it be possible for you to tell me how I can modify or use your repo for a non-Bethesda speaker? I know how to compile datasets with LibriTTS and train a pre-made synthesizer on them, but not much else.

If you can help, I’d be grateful!

Colab

Hello and thank you for your great work!
Is it possible to run your code from google colab? I am acquainted with python, but not with java, and was wondering if the code could be run from source.
Thank you very much in advance!

Voice Request: Lucien Lachance of Oblivion

If we could get a voice for this much-loved character I would be very grateful.

And thank you for this project. When I heard voice synthesizing was becoming possible the thought of creating a project like this crossed my mind, but I'm a novice programmer and wouldn't even know where to start. I'm happy to see I wasn't the only one who thought of it lol.

Cannot use xVASynth Fuz Ro Bork plugin v1.2 anymore since xVASynth v3.0.0.

I get this Error Message when i enable the Plugin:

Failed to initialize the following plugin: skyrim_fuz_ro_bork->start->post

TypeError: Cannot read properties of undefined (reading 'app') at setupSettings (F:\xVASynth v3.0.0\resources\app\plugins\skyrim_fuz_ro_bork\main.js:13:132) at startTTSListening (F:\xVASynth v3.0.0\resources\app\plugins\skyrim_fuz_ro_bork\main.js:231:9) at Object.setup (F:\xVASynth v3.0.0\resources\app\plugins\skyrim_fuz_ro_bork\main.js:27:5) at PluginsManager.loadModuleFns (F:\xVASynth v3.0.0\resources\app\javascript\plugins_manager.js:415:32) at PluginsManager.loadModules (F:\xVASynth v3.0.0\resources\app\javascript\plugins_manager.js:357:27) at PluginsManager.apply (F:\xVASynth v3.0.0\resources\app\javascript\plugins_manager.js:304:39) at HTMLButtonElement. (F:\xVASynth v3.0.0\resources\app\javascript\plugins_manager.js:57:63)
Maybe xVASynth Fuz Ro Bork needs to be updated? Or the "plugins_manager.js"

README.md instructions are not sufficient for a fresh install

I tried setting this up on a fresh linux install and found that the steps in the readme are incomplete to set this up from scratch. The steps mentioned are:

npm install
npm start
# source $VIRTUALENV_HOME/bin/activate # optional
pip3 install -r requirements.txt

I needed to follow the following additional steps to get this working:

Electron: Some dependency is missing. Setting up the electron example project by following the instructions here installed it: https://github.com/electron/electron-quick-start
npm audit fix Not sure why, but it doesn't work without this.
If using GPU, Install cuda by following the instructions here: https://docs.nvidia.com/cuda/#installation-guides
Install pytorch (if using GPU, with cuda) by following the instructions here: https://pytorch.org/get-started/locally/
Add at least one model file to the models/ directory. Make sure the game name directory is the same as the game name in the voice set json file. Without this it throws an error when enabling use gpu in the settings:

Traceback (most recent call last):
  File "./server.py", line 129, in do_POST
    fastpitch_model.device = torch.device('cuda' if use_gpu else 'cpu')
AttributeError: 'int' object has no attribute 'device'
{'device': 'gpu'}

Without doing this, the loading fastpitch model dialog appears but does not progress. Electron should show an error dialog here instead of just printing the error in a log file, but this is a separate issue from the one being discussed here. I believe the above steps are sufficient because the DEBUG* files and the xVA-Synth\server.log file do not have any errors when I try to generate sounds.
(Side note: I am on linux, so the \ above caused the file to be created in the parent directory. use pathlib in python 3 to make this work across platforms)

Resolution:

I'm not familiar with NPM packages to point out which packages are missing for NPM to get electron working. Most likely changing the package requirements file will fix this.
- What version of electron is needed? The npm install command installs v2.0.18, while it looks like the current version is 11.x.x
For the cuda and pytorch dependencies, it is probably better for the user to install them manually. The links above should be added to the README
- What version of CUDA is recommended? For me, this is working with cuda 10.2, the current pytorch default in pip.

Intel ARC GPU support

Hello, probably a big ask, but would you consider adding intel arc GPU acceleration (it also invariantly speeds up Intel CPUs), however, there are plenty of examples over in the IPEX-LLM repo for implementations. It's a long shot, but it would truly be great if possible.

All that would be required for the user would be a warning to visit the IPEX-LLM docs to install the necessary library.

The examples include loading to and from the GPU, etc., and includes a pytorch version specific for the GPUs.

Hopefully this is possible, as without GPU acceleration the program can be really rather slow.

Thank you.

xVASynth v3.0.0. / xVASynthVoices.json

xVASynthVoices.json

TypeError: model.games.forEach is not a function at file:///F:/xVASynth%20v3.0.0/resources/app/javascript/script.js:281:37 at Array.forEach () at file:///F:/xVASynth%20v3.0.0/resources/app/javascript/script.js:271:23 at Array.forEach () at file:///F:/xVASynth%20v3.0.0/resources/app/javascript/script.js:262:24 at new Promise () at window.loadAllModels (file:///F:/xVASynth%20v3.0.0/resources/app/javascript/script.js:221:12) at FSWatcher. (file:///F:/xVASynth%20v3.0.0/resources/app/javascript/script.js:1523:29) at FSWatcher.emit (node:events:526:28) at FSWatcher._handle.onchange (node:internal/fs/watchers:212:12)

ffmpeg error when attempting to export audio?

Originally, I thought this was due to me using the mp3 mode (as the temporary .wav file is still created and plays normally when I grab it)

But I can't seem to get audio files to export after updating - Previously, I was on 1.3.6 and exporting worked fine.

Below is a sample from server.log:

2021-07-20 02:07:02,345 - POST /outputAudio
2021-07-20 02:07:02,347 - audio options: {'hz': 22050, 'padStart': 0, 'padEnd': 0, 'bit_depth': 'pcm_s32le', 'amplitude': 1}
2021-07-20 02:07:02,348 - ffmpeg command: ffmpeg -i ./resources/app/output/temp-4752225620384505.wav -acodec pcm_s32le -af volume=1 -ar 22050 C:/Users/Kaz/Desktop/out/sk_delvin/You're my son. I can't let you just run off and get hurt.wav
2021-07-20 02:07:02,391 - Traceback (most recent call last):
  File "python\audio_post.py", line 39, in run_audio_post
  File "ffmpeg\_run.py", line 320, in run
  File "ffmpeg\_run.py", line 285, in run_async
  File "subprocess.py", line 709, in __init__
  File "subprocess.py", line 997, in _execute_child
FileNotFoundError: [WinError 2] The system cannot find the file specified

This happens no matter what voice or what mode i choose. I did accidentally wipe out my entire xVASynth folder when updating to 1.4.0, but I would've assumed that since i was "reinstalling" at that point (albiet by accident) exporting shouldn't have been an issue :?

I hope this is somehow user error, but I may as well open an issue here in case I'm not alone - Dr. Google said there was another issue similar to this but it happened randomly, as opposed to consistently as here.

Add audio streaming support

@DanRuta ~~had mentioned that v0.12 of TorchAudio has streaming capabilities. If TorchAudio was xVASynth's was updated, perhaps it would easily allow implementing audio streaming?~~
https://pytorch.org/blog/pytorch-1.12-new-library-releases/#beta-streaming-api

[Edit] Misinterpreted, while xVASynth does install TorchAudio, it is only used for mel spectrogram representation

Missing license

This by default means that there is no license, so all rights are reserved in the most extreme way, technically this means for users:

If you find software that doesn’t have a license, that generally means you have no permission from the creators of the software to use, modify, or share the software. Although a code host such as GitHub may allow you to view and fork the code, this does not imply that you are permitted to use, modify, or share the software for any purpose.

More information is here: https://choosealicense.com/no-permission/

In case this isn't intentional, which I find quite likely, simply add a LICENSE or LICENSE.md file to the repository with the license of your choice. Be aware that depending on what sources / libraries you used, you might be required to use the same or similar licenses as them.

Thank you for the application by the way.

ffmpeg 4.3.2 sample overwrite error

Version:
CPU
1.2.2 (FO4)

Expected behavior:
samples overwrite when prompted

Actual behavior:
samples do not overwrite when prompted

Description of Unexpected Behavior:
Installing ffmpeg 4.3.2 after keeping samples from prior to installing ffmpeg gives error and will not overwrite previously recorded sample.

How to reproduce error:

choose "Keep Sample" without installing ffmpeg 4.3.2
install ffmpeg following instructions from video link in Description of app on NexusMods
make minor adjustment to previous sample by clicking the edit icon
"Generate Voice"
error occurs

Additional information:
Voice model: Danse 1.1
Vocoder: WaveGlow

Prior to installing ffmpeg:
Sample bit rate: default
Audio bit depth: default
Format: wav

After installing ffmpeg and customizing:
Updated bit rate: 44100
Updated audio bit depth: pcm_s16le
Format: wav.
Silence padding start/end: 5 ms

Notes:
I've not found any documentation that this is expected behavior for the software.

I unfortunately clicked off the popup and was unable to get a screenshot of error.

I cannot find documentation for where the error logs are located to give any other info. In which folder do I look for logs in order to provide said information? The popup appears to be inefficient in this situation.

linux release ?

hello !
impressive work ! are you planning a linux release anytime soon ?

Loading new model does not reset Styles

In fact they are re-added to the select option list:

Portable Mode

Would it be possible to get a portable mode for this? It's a great application, but sometimes I want several installs to tweak etc.

Implement Smoothness parameter

Implement a "smoothness" parameter like Sonantic's Pitch editor:
https://www.youtube.com/watch?v=fNtwg-lXie8&t=194s

Nexus Mods Premium - install stalls if no game folder found

Voices stops installing, if there is no game folder.

Drag & Drop voice model archives on sidebar - progressbar

ZIP archives can be dropped on the sidebar in order to install them. But there is no progress bar that notifies the user that it is ongoing.

[ Feature Request ] An option to enable command-line-interface/cli.

As the title suggests, having an option to see the command line interface would be quite useful for both devs and users. The feature will help with catching errors, observing generation speed, or info in case generation or synthesis gets stuck.

Allow plugin to change emotional modifiers to whole sequence

Allow plugins to send emotional modifier values.

Error generating batch voice for Cirilla

I got a simple .csv file for batch synthesis here. The problem is that when I choose Cirilla's voice, batch generation doesn't work. I'll leave the traceback below. But the app works good if I choose other voices, such as Cerys (both from "The Witcher" game).

BathSynthesis.csv

Traceback (most recent call last):
File "server.py", line 335, in do_POST
File "python\xvapitch\model.py", line 379, in infer_batch
File "C:\Users\Ksardas\Downloads\xVASynth v3.0.0 Main app-44184-3-0-0-1684850032\v3.0.0\.\resources\app\python\xvapitch\xvapitch_model.py", line 223, in infer_advanced
return self.infer_using_vals(logger, plugin_manager, cleaned_text, text, lang_embs, speaker_embs, pace, None, None, None, None, None, None, pitch_amp=pitch_amp)
File "C:\Users\Ksardas\Downloads\xVASynth v3.0.0 Main app-44184-3-0-0-1684850032\v3.0.0\.\resources\app\python\xvapitch\xvapitch_model.py", line 292, in infer_using_vals
x, x_emb, x_mask = self.text_encoder(input_symbols, x_lengths, lang_emb=None, stats=False, lang_emb_full=lang_emb_full)
File "torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\Ksardas\Downloads\xVASynth v3.0.0 Main app-44184-3-0-0-1684850032\v3.0.0\.\resources\app\python\xvapitch\xvapitch_model.py", line 602, in forward
x = torch.cat((x_emb, lang_emb_full), dim=-1)
RuntimeError: Tensors must have same number of dimensions: got 3 and 5

ARPABet dictionaries are not loaded for v3 models

Arpabet dictionaries are loaded for fastpitch1_1 models, but not xVAPitch.

https://github.com/DanRuta/xVA-Synth/blob/master/server.py#L251-L252

https://github.com/DanRuta/xVA-Synth/blob/master/server.py#L488-L489

Workaround: modify the resources\app\python\xvapitch\text\dicts\cmudict.txt file and restart xVASynth

Todd Howard voice

Do we have or will we have Todd Howard as one of the voices? Is that within the scope of this project?

I would really love to have Todd Howard as one of the possible voices. This would take me even further to my dream of having Todd Howard in Fallout 4 as a companion telling me to buy Skyrim (again).

Thanks.

Work Offline

It would be excellent to be able to launch and use xvasynth while not connected to the internet.

File access relative to current working directory throughout code, make it hard to use code in library

Throughout the code base, to figure out location of resource files, the following code pattern is used:

f'{"./resources/app" if self.PROD else "."}/some/path'

This creates a dependency on the current working directory, and makes the code harder to reuse outside of xVA-Synth - to package it in a library for example.

As the PROD attribute is propagated throughout the code base, it could as well be a path prefix, call it ROOT, instead of a boolean. The code pattern would become simpler and more flexible than current usage:

f'{self.ROOT}/some/path'

Plus dependency on current working directory - which can change due to many factors, e.g. how a Windows shortcut is configured - is removed.

Nexus Mods Premium - File integrity check

There are times when users that download all voice models, for example Skyrim, have incomplete downloads. There should be an error message for such cases.

Nexus Mods does not offer MD5/SHA1 checksums for download files, only a method for finding files by a MD5 checksum. The other method would be to check the filesize in kilobytes, info which Nexus Mods does offer:
https://github.com/Nexus-Mods/node-nexus-api/blob/af3f1874e3d914df2a881e6faa2b8458957fab60/docs/interfaces/_types_.ifileinfo.md#size

danruta / xva-synth Goto Github PK

xva-synth's Issues

Recommend Projects

Recommend Topics

Recommend Org