danruta / xva-synth
Machine learning based speech synthesis Electron app, with voices from specific characters from video games
License: GNU General Public License v3.0
Thank you for such an exciting project.
"To start, double click the xVASynth.exe file, and make sure to click Allow"
Where is xVASynth.exe file?
Hello, just reporting that server.py errors out when trying to read scripts.js, with this error:
python server.py
Traceback (most recent call last):
File "C:\Users\WDAGUtilityAccount\Desktop\_windows sandbox mapped\xVASynth_2\resources\app\server.py", line 16, in <module>
lines = f.read().split("\n")
UnicodeDecodeError: 'cp932' codec can't decode byte 0x99 in position 45308: illegal multibyte sequence
Basically, it won't run with Japanese locale.
I originally tried running the program as normal via xVASynth.exe, but server.exe would just close right away.
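A likely fix for the cp932 failure above is to pass an explicit encoding when server.py reads the script file, since `open()` otherwise falls back to the locale's default codec (cp932 on Japanese Windows). A minimal sketch, with an illustrative path and helper name rather than the exact code from server.py:

```python
import os
import tempfile

def read_lines_utf8(path):
    # Explicit encoding avoids UnicodeDecodeError under non-UTF-8 locales
    # (e.g. cp932 on Japanese Windows); errors="replace" keeps startup
    # alive even if a stray byte slips into the file.
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.read().split("\n")

# Demo: a UTF-8 file containing a character cp932 cannot decode the same way
path = os.path.join(tempfile.mkdtemp(), "script.js")
with open(path, "wb") as f:
    f.write('const s = "it’s";\n'.encode("utf-8"))
print(read_lines_utf8(path)[0])
```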
I do not understand why a program that is more than 10 GB requires the internet.
By default, the de-essing setting is set to 0.1. This makes ffmpeg hit a fatal error and display the pop-up:
Output ./resources/app/output/temp-16335802487988227_ffmpeg.mp3 same as Input #0 - exiting FFmpeg cannot edit existing files in-place.
This part of the code is at fault, as it expects WAV.
https://github.com/DanRuta/xVA-Synth/blob/master/python/audio_post.py#L81
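Since ffmpeg refuses to write its output over its own input, one way around the error above is to process into a sibling temp file and then atomically swap it into place. A hedged sketch of that pattern, with the actual ffmpeg invocation left abstract (the function names here are illustrative, not from audio_post.py):

```python
import os
import tempfile

def process_in_place(path, process_fn):
    # ffmpeg cannot edit files in-place, so write the processed audio to a
    # distinct temp path first, then replace the original atomically.
    root, ext = os.path.splitext(path)
    tmp_out = root + "_tmp" + ext
    process_fn(path, tmp_out)   # e.g. an ffmpeg de-essing pass
    os.replace(tmp_out, path)   # atomic on the same filesystem
```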
ESO has excellent and extensive voice acting. While ESO obviously can't be modded, its voices would fit in well with other games and provide a bit more variety.
Hi, thanks for your great effort. Is there any possibility on running this project in a mozilla-tts / OpenTTS-compatible headless mode? (e.g. Without the electron UI, so that all the synthesis could be done headless).
I tried running server.py, but so far got stuck on the
ImportError: cannot import name 'quote' from 'urllib' (/usr/lib/python3.7/urllib/__init__.py)
I plan to hook it to https://rhasspy.readthedocs.io/en/latest/
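The ImportError above is a Python 2 vs 3 issue: `quote` moved to `urllib.parse` in Python 3. A version-tolerant import would likely unblock the headless run:

```python
# In Python 3, `quote` lives in urllib.parse, not urllib; this fallback
# keeps the module importable on both (xVASynth itself targets Python 3).
try:
    from urllib.parse import quote
except ImportError:  # Python 2
    from urllib import quote

print(quote("what? are you mad?"))  # → what%3F%20are%20you%20mad%3F
```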
First, I'd like to thank you for this amazing tool. Training ML models is a damn hard task, and is the very reason the development of the Grand Prognosticator has been halted for now, lmao. Kudos for this. Your tool is exactly what I needed to complete my mod, because the new lines for Falk Firebeard had no voice. Now they are perfectly voice acted, and I have you to thank for it.
This comes as a suggestion and not an issue. I'm not versed in JavaScript, so I don't know how to help you much and I have yet to inspect the code in depth, but one interesting thing for this tool to have would be to allow the user to select more than one letter at once and raise their bars. This way, we would maintain the relative position of the bars between letters, but change their absolute position and change the tone of the sentence itself, as a whole, instead of tweaking letter by letter. I don't know how hard it would be to implement such a change, but this would surely be an interesting one!
I've looked through the documentation but I can't find anything. Is this possible?
Phrase "the orc" becomes "{DH AH0 AO1 R K}" rather than {DH AH0} {AO1 R K}
And I cannot find it as a single word in CMUDICT. Odd exception.
I've been training a tacotron2 model and I'm trying to import it to XVA Synth. However, the only output for most of these trainers is a raw file with no extension. I've tried renaming the checkpoint file as .pt like the models to no avail. I also copied and edited a JSON to go with the model. How were your models exported? I can't find any documentation on exporting checkpoints as .pt or .hg.pt files. Thanks.
I'm having issues saving files recorded when there are more than two question marks in the sentence. And the phrase I'm synthesizing sounds better when I write "What? Are you mad?" than when I write these two things separately (but when I do, the file is saved normally).
The synthesized phrase is generated and it's playable, but I can't generate any actual files with it. The issue does not occur with exclamation marks, only with question marks. Maybe dropping any punctuation contained in the sentence synthesized when generating the file name would fix this.
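The question-mark failure fits the Windows filename rules: `?` is one of the characters Windows forbids in file names, while `!` is allowed, which matches the symptom exactly. A minimal sketch of the suggested fix, stripping reserved characters before building the output file name (the helper name is hypothetical):

```python
import re

def safe_filename(text, max_len=100):
    # Windows rejects  \ / : * ? " < > |  in file names, which is why
    # "What? Are you mad?" fails to save while exclamation marks survive.
    name = re.sub(r'[\\/:*?"<>|]', "", text).strip()
    return name[:max_len] or "output"

print(safe_filename("What? Are you mad?"))  # → What Are you mad
```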
Are the HiFi-GAN models trained using mel-spectrograms generated from the ground truth audio, from the Tacotron2 models or from the Fastpitch models?
Could you provide a guide on which neural network framework you use, and why? In which direction should I look in order to make voices more realistic, and to support other languages? Is that possible at all, or is the network very complex and hard to train?
In the following code locations, a '.exe' file path extension is expected independent of platform:
xVA-Synth/python/fastpitch/model.py
Line 103 in 72ac458
xVA-Synth/python/fastpitch1_1/model.py
Line 200 in 72ac458
xVA-Synth/python/wav2vec2/model.py
Line 33 in 72ac458
Additionally, all code paths that check whether the platform is Linux should be rewritten to check whether the platform is NOT Windows, for compatibility with macOS, BSD, etc.:
xVA-Synth/python/xvapitch/model.py
Line 179 in 72ac458
xVA-Synth/python/xvapitch/model.py
Line 257 in 72ac458
xVA-Synth/python/xvapitch/model.py
Line 389 in 72ac458
xVA-Synth/python/xvapitch/model.py
Line 721 in 72ac458
xVA-Synth/python/audio_post.py
Line 37 in 72ac458
xVA-Synth/python/audio_post.py
Line 112 in 72ac458
xVA-Synth/python/audio_post.py
Line 289 in 72ac458
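Both problems above reduce to the same idiom: branch on "is Windows" rather than "is Linux", and only append `.exe` on Windows. A minimal sketch of what the listed call sites could converge on (names here are illustrative, not the repo's own):

```python
import sys

# Treating every non-Windows platform as POSIX covers macOS and the BSDs,
# not just Linux; likewise the ".exe" suffix should only be appended on
# Windows rather than unconditionally.
IS_WINDOWS = sys.platform == "win32"

def binary_name(base="ffmpeg"):
    # e.g. "ffmpeg.exe" on Windows, plain "ffmpeg" everywhere else
    return base + (".exe" if IS_WINDOWS else "")
```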
Test sentence for v3 models:
Am I?
If modifier values of each letter in a word are changed one by one, then symbols are not affected and the effect is diminished.
"_", "<PAD>", "P", "<PAD>", "AE1", "<PAD>", "L", "<PAD>", "AH0", "<PAD>", "D", "<PAD>", "IH0", "<PAD>", "N", "<PAD>", "_" ], "pitchNew": [ 0, 0, 5.911111, 0, 5.911111, 0, 5.911111, 0, 5.911111, 0, 5.911111, 0, 5.911111, 0, 5.911111, 0, 0 ],
Hello! I liked this new feature to tweak the output a bit. I had to use Audacity to convert the sounds to 16-bit so I could use them with the Creation Kit, but now that we can do it directly from the app, it's even better. The problem is that it says a file is missing when I try to save the file.
I downloaded the latest version of ffmpeg from their website, but I don't know what to do with it. What am I missing here?
Thanks!
Hello,
How can I use the FastPitch part of this program for a non-Bethesda game speaker?
I discovered this repo in my quest to create a personal modding project involving voice synthesis. Over the past week, I’ve successfully been fiddling around with the Real Time Voice Cloning repo to fine-tune their pretrained models to a single speaker. The model is continuing to improve slowly, but my sole gripe is the inability to control the pitch of the generated audios.
Would it be possible for you to tell me how I can modify or use your repo for a non-Bethesda speaker? I know how to compile datasets with LibriTTS and train a pre-made synthesizer on them, but not much else.
If you can help, I’d be grateful!
Hello and thank you for your great work!
Is it possible to run your code from Google Colab? I am acquainted with Python, but not with JavaScript, and was wondering if the code could be run from source.
Thank you very much in advance!
If we could get a voice for this much-loved character I would be very grateful.
And thank you for this project. When I heard voice synthesizing was becoming possible the thought of creating a project like this crossed my mind, but I'm a novice programmer and wouldn't even know where to start. I'm happy to see I wasn't the only one who thought of it lol.
I get this error message when I enable the plugin:
Failed to initialize the following plugin: skyrim_fuz_ro_bork->start->post
TypeError: Cannot read properties of undefined (reading 'app')
    at setupSettings (F:\xVASynth v3.0.0\resources\app\plugins\skyrim_fuz_ro_bork\main.js:13:132)
    at startTTSListening (F:\xVASynth v3.0.0\resources\app\plugins\skyrim_fuz_ro_bork\main.js:231:9)
    at Object.setup (F:\xVASynth v3.0.0\resources\app\plugins\skyrim_fuz_ro_bork\main.js:27:5)
    at PluginsManager.loadModuleFns (F:\xVASynth v3.0.0\resources\app\javascript\plugins_manager.js:415:32)
    at PluginsManager.loadModules (F:\xVASynth v3.0.0\resources\app\javascript\plugins_manager.js:357:27)
    at PluginsManager.apply (F:\xVASynth v3.0.0\resources\app\javascript\plugins_manager.js:304:39)
    at HTMLButtonElement. (F:\xVASynth v3.0.0\resources\app\javascript\plugins_manager.js:57:63)
Maybe xVASynth Fuz Ro Bork needs to be updated? Or the "plugins_manager.js"
I tried setting this up on a fresh linux install and found that the steps in the readme are incomplete to set this up from scratch. The steps mentioned are:
npm install
npm start
# source $VIRTUALENV_HOME/bin/activate # optional
pip3 install -r requirements.txt
I needed to follow these additional steps to get it working:
npm audit fix
Not sure why, but it doesn't work without this.
Set up the models/ directory, and make sure the game name directory is the same as the game name in the voice set json file. Without this, it throws an error when enabling "use gpu" in the settings:
Traceback (most recent call last):
File "./server.py", line 129, in do_POST
fastpitch_model.device = torch.device('cuda' if use_gpu else 'cpu')
AttributeError: 'int' object has no attribute 'device'
{'device': 'gpu'}
Without doing this, the "loading fastpitch model" dialog appears but does not progress. Electron should show an error dialog here instead of just printing the error to a log file, but that is a separate issue from the one being discussed here. I believe the above steps are sufficient, because the DEBUG* files and the xVA-Synth\server.log file do not contain any errors when I try to generate sounds.
(Side note: I am on linux, so the \ above caused the file to be created in the parent directory. Use pathlib in Python 3 to make this work across platforms.)
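The side note's pathlib suggestion can be sketched in a few lines: joining path segments with `Path` instead of hard-coding backslashes produces the right separator on every platform, so `"xVA-Synth\server.log"` would no longer land in the parent directory on Linux.

```python
from pathlib import Path

# Path arguments (or the "/" operator) are joined with the platform's
# separator, so this is "xVA-Synth/server.log" on POSIX and
# "xVA-Synth\server.log" on Windows.
log_path = Path("xVA-Synth") / "server.log"
print(log_path.name)  # → server.log
```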
Resolution:
The npm install command installs v2.0.18, while it looks like the current version is 11.x.x.
Hello, probably a big ask, but would you consider adding Intel Arc GPU acceleration? (It also speeds up Intel CPUs.) There are plenty of implementation examples over in the IPEX-LLM repo. It's a long shot, but it would truly be great if possible.
All that would be required for the user would be a warning to visit the IPEX-LLM docs to install the necessary library.
The examples include loading to and from the GPU, and there is a PyTorch build specific to these GPUs.
Hopefully this is possible, as without GPU acceleration the program can be really rather slow.
Thank you.
xVASynthVoices.json
TypeError: model.games.forEach is not a function
    at file:///F:/xVASynth%20v3.0.0/resources/app/javascript/script.js:281:37
    at Array.forEach ()
    at file:///F:/xVASynth%20v3.0.0/resources/app/javascript/script.js:271:23
    at Array.forEach ()
    at file:///F:/xVASynth%20v3.0.0/resources/app/javascript/script.js:262:24
    at new Promise ()
    at window.loadAllModels (file:///F:/xVASynth%20v3.0.0/resources/app/javascript/script.js:221:12)
    at FSWatcher. (file:///F:/xVASynth%20v3.0.0/resources/app/javascript/script.js:1523:29)
    at FSWatcher.emit (node:events:526:28)
    at FSWatcher._handle.onchange (node:internal/fs/watchers:212:12)
Originally, I thought this was due to me using the mp3 mode (as the temporary .wav file is still created and plays normally when I grab it)
But I can't seem to get audio files to export after updating - Previously, I was on 1.3.6 and exporting worked fine.
Below is a sample from server.log:
2021-07-20 02:07:02,345 - POST /outputAudio
2021-07-20 02:07:02,347 - audio options: {'hz': 22050, 'padStart': 0, 'padEnd': 0, 'bit_depth': 'pcm_s32le', 'amplitude': 1}
2021-07-20 02:07:02,348 - ffmpeg command: ffmpeg -i ./resources/app/output/temp-4752225620384505.wav -acodec pcm_s32le -af volume=1 -ar 22050 C:/Users/Kaz/Desktop/out/sk_delvin/You're my son. I can't let you just run off and get hurt.wav
2021-07-20 02:07:02,391 - Traceback (most recent call last):
File "python\audio_post.py", line 39, in run_audio_post
File "ffmpeg\_run.py", line 320, in run
File "ffmpeg\_run.py", line 285, in run_async
File "subprocess.py", line 709, in __init__
File "subprocess.py", line 997, in _execute_child
FileNotFoundError: [WinError 2] The system cannot find the file specified
This happens no matter what voice or what mode I choose. I did accidentally wipe out my entire xVASynth folder when updating to 1.4.0, but I would've assumed that since I was "reinstalling" at that point (albeit by accident), exporting shouldn't have been an issue.
I hope this is somehow user error, but I may as well open an issue here in case I'm not alone - Dr. Google said there was another issue similar to this but it happened randomly, as opposed to consistently as here.
@DanRuta had mentioned that v0.12 of TorchAudio has streaming capabilities. If xVASynth's TorchAudio were updated, perhaps it would easily allow implementing audio streaming?
https://pytorch.org/blog/pytorch-1.12-new-library-releases/#beta-streaming-api
[Edit] I misinterpreted this; while xVASynth does install TorchAudio, it is only used for the mel-spectrogram representation.
By default this means that there is no license, so all rights are reserved in the most extreme way. Technically, this means for users:
If you find software that doesn’t have a license, that generally means you have no permission from the creators of the software to use, modify, or share the software. Although a code host such as GitHub may allow you to view and fork the code, this does not imply that you are permitted to use, modify, or share the software for any purpose.
More information is here: https://choosealicense.com/no-permission/
In case this isn't intentional, which I find quite likely, simply add a LICENSE or LICENSE.md file to the repository with the license of your choice. Be aware that depending on what sources/libraries you used, you might be required to use the same or similar licenses as them.
Thank you for the application by the way.
Version:
CPU
1.2.2 (FO4)
Expected behavior:
samples overwrite when prompted
Actual behavior:
samples do not overwrite when prompted
Description of Unexpected Behavior:
After installing ffmpeg 4.3.2, samples kept from before the ffmpeg install produce an error and will not be overwritten.
How to reproduce error:
Additional information:
Voice model: Danse 1.1
Vocoder: WaveGlow
Prior to installing ffmpeg:
Sample bit rate: default
Audio bit depth: default
Format: wav
After installing ffmpeg and customizing:
Updated bit rate: 44100
Updated audio bit depth: pcm_s16le
Format: wav
Silence padding start/end: 5 ms
Notes:
I've not found any documentation that this is expected behavior for the software.
I unfortunately clicked off the popup and was unable to get a screenshot of error.
I cannot find documentation for where the error logs are located. In which folder should I look for logs so I can provide more information? The popup alone is not very helpful in this situation.
Hello!
Impressive work! Are you planning a Linux release anytime soon?
Would it be possible to get a portable mode for this? It's a great application, but sometimes I want several installs to tweak etc.
Implement a "smoothness" parameter like Sonantic's Pitch editor:
https://www.youtube.com/watch?v=fNtwg-lXie8&t=194s
ZIP archives can be dropped on the sidebar in order to install them. But there is no progress bar notifying the user that the installation is ongoing.
As the title suggests, having an option to see the command line interface would be quite useful for both devs and users. The feature would help with catching errors, observing generation speed, or getting info in case generation or synthesis gets stuck.
Allow plugins to send emotional modifier values.
I have a simple .csv file for batch synthesis here. The problem is that when I choose Cirilla's voice, batch generation doesn't work. I'll leave the traceback below. The app works fine if I choose other voices, such as Cerys (both from "The Witcher" game).
Traceback (most recent call last):
File "server.py", line 335, in do_POST
File "python\xvapitch\model.py", line 379, in infer_batch
File "C:\Users\Ksardas\Downloads\xVASynth v3.0.0 Main app-44184-3-0-0-1684850032\v3.0.0\.\resources\app\python\xvapitch\xvapitch_model.py", line 223, in infer_advanced
return self.infer_using_vals(logger, plugin_manager, cleaned_text, text, lang_embs, speaker_embs, pace, None, None, None, None, None, None, pitch_amp=pitch_amp)
File "C:\Users\Ksardas\Downloads\xVASynth v3.0.0 Main app-44184-3-0-0-1684850032\v3.0.0\.\resources\app\python\xvapitch\xvapitch_model.py", line 292, in infer_using_vals
x, x_emb, x_mask = self.text_encoder(input_symbols, x_lengths, lang_emb=None, stats=False, lang_emb_full=lang_emb_full)
File "torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\Ksardas\Downloads\xVASynth v3.0.0 Main app-44184-3-0-0-1684850032\v3.0.0\.\resources\app\python\xvapitch\xvapitch_model.py", line 602, in forward
x = torch.cat((x_emb, lang_emb_full), dim=-1)
RuntimeError: Tensors must have same number of dimensions: got 3 and 5
Arpabet dictionaries are loaded for fastpitch1_1 models, but not xVAPitch.
https://github.com/DanRuta/xVA-Synth/blob/master/server.py#L251-L252
https://github.com/DanRuta/xVA-Synth/blob/master/server.py#L488-L489
Workaround: modify the resources\app\python\xvapitch\text\dicts\cmudict.txt file and restart xVASynth.
Do we have or will we have Todd Howard as one of the voices? Is that within the scope of this project?
I would really love to have Todd Howard as one of the possible voices. This would take me even further to my dream of having Todd Howard in Fallout 4 as a companion telling me to buy Skyrim (again).
Thanks.
It would be excellent to be able to launch and use xvasynth while not connected to the internet.
Throughout the code base, to figure out location of resource files, the following code pattern is used:
f'{"./resources/app" if self.PROD else "."}/some/path'
This creates a dependency on the current working directory, and makes the code harder to reuse outside of xVA-Synth - to package it in a library for example.
As the PROD attribute is propagated throughout the code base, it could just as well be a path prefix (call it ROOT) instead of a boolean. The code pattern would become simpler and more flexible than the current usage:
f'{self.ROOT}/some/path'
Plus, the dependency on the current working directory (which can change due to many factors, e.g. how a Windows shortcut is configured) is removed.
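A minimal sketch of the proposed refactor: resolve ROOT once at startup, anchored on the source file's location rather than the CWD, and build every resource path from it. The class and attribute names follow the issue's proposal and are not existing xVA-Synth code:

```python
import os

class Paths:
    def __init__(self, prod, base=None):
        # Anchor on this file's directory rather than the current working
        # directory, which can change depending on how the app is launched.
        here = base or os.path.dirname(os.path.abspath(__file__))
        # In PROD builds the Python code lives under resources/app.
        self.ROOT = os.path.join(here, "resources", "app") if prod else here

    def resource(self, *parts):
        # Replaces f'{"./resources/app" if self.PROD else "."}/some/path'
        return os.path.join(self.ROOT, *parts)
```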
There are times when users who download all voice models for a game, for example Skyrim, end up with incomplete downloads. There should be an error message for such cases.
Nexus Mods does not offer MD5/SHA1 checksums for download files, only a method for finding files by an MD5 checksum. The other option would be to check the file size in kilobytes, information which Nexus Mods does offer:
https://github.com/Nexus-Mods/node-nexus-api/blob/af3f1874e3d914df2a881e6faa2b8458957fab60/docs/interfaces/_types_.ifileinfo.md#size
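Given only a kilobyte size from the API, the completeness check could be a simple comparison with a small rounding tolerance. A hedged sketch (the function name and tolerance are illustrative, not part of the Nexus API):

```python
import os
import tempfile

def looks_complete(path, expected_size_kb, tolerance_kb=1):
    # Nexus Mods reports size in kilobytes only, so allow a small rounding
    # tolerance rather than demanding an exact byte match; a truncated
    # download will typically miss by far more than 1 KB.
    actual_kb = os.path.getsize(path) / 1024
    return abs(actual_kb - expected_size_kb) <= tolerance_kb
```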