Models trained to reconstruct mel spectrograms of voices saying "Ah" using speaker embeddings.
Required data:
- mel spectrograms (extracted using librosa), saved as .npy files, and
- speaker embeddings, saved as .npy files, extracted using the speaker encoder provided by Corentin Jemine at https://github.com/CorentinJ/Real-Time-Voice-Cloning/