mindspore-lab / mindaudio


A toolbox of audio models and algorithms based on MindSpore

License: Apache License 2.0

Python 18.47% Shell 0.03% Jupyter Notebook 81.50%

deep-learning speech-recognition audio speaker-verification mindspore

mindaudio's People

Contributors


daiyuxin0511 vigo999 yiluxiangbei zyhstack wdfk123 jianyunchao mygia8 litingyu1997 liunix61 zainlau

mindaudio's Issues

[API][330][amplitude_to_dB]

Turn a spectrogram from the amplitude/power scale to decibel scale.
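As an illustration of the conversion described in this issue, here is a minimal plain-Python sketch. The function name, the `stype`, `ref`, and `amin` parameters, and the nested-list input are assumptions made for this example and are not taken from the mindaudio API. Power values are assumed to use 10*log10 and amplitude values 20*log10.

```python
import math


def amplitude_to_db(spec, stype="power", ref=1.0, amin=1e-10):
    """Convert a 2-D nested list of non-negative values to decibels.

    Hypothetical signature: stype, ref and amin are illustrative names.
    """
    # Power spectrograms use 10*log10, amplitude spectrograms use 20*log10.
    multiplier = 10.0 if stype == "power" else 20.0
    # Clamp to amin so that zeros do not produce log10(0).
    ref_db = multiplier * math.log10(max(ref, amin))
    return [
        [multiplier * math.log10(max(v, amin)) - ref_db for v in row]
        for row in spec
    ]


power_spec = [[1.0, 10.0], [100.0, 0.0]]
db = amplitude_to_db(power_spec)
```

The clamp to `amin` is one common way to keep silent bins finite; the actual floor used by the library may differ.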

[Models][330][Fastspeech2 Train/Eval]

[API][330][angle]

Compute the angle of a complex number sequence.

[API][330][trim]

Trim an audio signal to keep the consecutive non-silent segment.

[API][330][context_window]

Create a context window from an audio signal to gather multiple time steps into a single feature vector.
Returns the array with the surrounding context.
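The idea of gathering neighbouring time steps can be sketched as follows. This is a simplified example, not the library's implementation: the `left` and `right` parameters, the zero padding at the edges, and the frame-by-frame concatenation order are all assumptions.

```python
def context_window(frames, left=1, right=1):
    """Stack each frame with its neighbours; edges are zero-padded.

    frames is a (time, features) nested list. left/right are hypothetical
    parameter names for the number of context frames on each side.
    """
    n_feats = len(frames[0])
    zero = [0.0] * n_feats
    padded = [zero] * left + [list(f) for f in frames] + [zero] * right
    width = left + 1 + right
    out = []
    for t in range(len(frames)):
        stacked = []
        # Concatenate the frames covered by the window centred on frame t.
        for frame in padded[t:t + width]:
            stacked.extend(frame)
        out.append(stacked)
    return out


frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
windows = context_window(frames, left=1, right=1)
```

Each output vector has `(left + 1 + right) * n_feats` entries, so three 2-dimensional frames become three 6-dimensional vectors here.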

[API][330][dB_to_amplitude]

Turn a dB-scaled spectrogram to the power/amplitude scale.

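[LJSpeech-wavegrad][Ascend][GRAPH] Error reported during distributed training

An error is reported when running 8-device (8p) training. The error message appears when the training log first starts printing, but training continues and the loss values and other information are printed normally.

Steps to reproduce: mpirun --allow-run-as-root -n 8 python recipes/LJSpeech/tts/wavegrad/train.py --device_target Ascend --is_distributed True --context_mode graph
Error screenshot: uploading screenshots is not supported at the moment; it will be sent to the developers online for discussion.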

[API][330][sliding_window_cmn]

Apply sliding-window cepstral mean (and optionally variance) normalization per utterance.
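A simplified sketch of the mean-only case is shown below. It assumes a centred window that is truncated at the utterance edges; the function signature is hypothetical, and the library's version may expose further options (such as variance normalization or a minimum window size) that are not modelled here.

```python
def sliding_window_cmn(feats, window=3):
    """Subtract from each frame the mean of a centred window of frames.

    feats is a (time, features) nested list. Mean normalization only;
    the window is truncated at the start and end of the utterance.
    """
    n_frames = len(feats)
    half = window // 2
    out = []
    for t in range(n_frames):
        lo = max(0, t - half)
        hi = min(n_frames, t + half + 1)
        span = feats[lo:hi]
        # Per-coefficient mean over the frames inside the window.
        means = [sum(col) / len(span) for col in zip(*span)]
        out.append([v - m for v, m in zip(feats[t], means)])
    return out


feats = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
normed = sliding_window_cmn(feats, window=3)
```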

[API][330][spectral_centroid]

Create a spectral centroid from an audio signal.
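For reference, the spectral centroid is the magnitude-weighted mean frequency of a spectrum. The sketch below computes it for a single unwindowed frame with a naive DFT; framing, windowing, and the function signature are assumptions for illustration, and a real implementation would typically work per frame on an STFT.

```python
import cmath
import math


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of one frame of samples."""
    n = len(frame)
    n_bins = n // 2 + 1
    mags = []
    for k in range(n_bins):
        # Naive DFT for bin k (O(n) per bin, fine for a short example).
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(frame)
        )
        mags.append(abs(acc))
    freqs = [k * sample_rate / n for k in range(n_bins)]
    total = sum(mags)
    if total == 0.0:
        return 0.0  # Silent frame: the centroid is undefined, return 0.
    return sum(f * m for f, m in zip(freqs, mags)) / total


sr = 8000
n = 64
# A 1000 Hz sine falls exactly on DFT bin 8 for these settings.
tone = [math.sin(2 * math.pi * 1000 * i / sr) for i in range(n)]
centroid = spectral_centroid(tone, sr)
```

For the pure tone above, the centroid lands at approximately 1000 Hz, because almost all of the magnitude sits in a single bin.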

[API][330][invert_channels]

Inverts channels of the audio.

[API][330][melspectrogram]

Create a mel-scaled spectrogram from an audio signal.

[API][330][insert_in_background]

[API][330][stereo_to_mono]

Convert stereo audio to mono by averaging the channels.
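The channel averaging can be sketched in a few lines. The channels-first `(channels, samples)` layout is an assumption of this example; the library may expect a different axis order.

```python
def stereo_to_mono(waveform):
    """Average the channels of a (channels, samples) nested list."""
    n_channels = len(waveform)
    # zip(*waveform) yields one tuple per sample with a value per channel.
    return [sum(samples) / n_channels for samples in zip(*waveform)]


stereo = [[0.2, 0.4, -1.0], [0.6, 0.0, 1.0]]
mono = stereo_to_mono(stereo)
```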

[API][330][split]

Split an audio signal into non-silent intervals.

[API][330][frequencymasking]

Apply masking to a spectrogram in the frequency domain.
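A minimal sketch of frequency masking, assuming a `(freq, time)` nested list and a band whose width is drawn uniformly up to `max_width`. The parameter names and the sampling scheme are illustrative assumptions, not the library's signature.

```python
import random


def frequency_mask(spec, max_width, mask_value=0.0, rng=None):
    """Mask a random band of consecutive frequency bins.

    Returns the masked copy and the (start, width) of the band.
    """
    if rng is None:
        rng = random.Random()
    n_freq = len(spec)
    width = rng.randint(0, min(max_width, n_freq))
    start = rng.randint(0, n_freq - width)
    masked = [list(row) for row in spec]  # Copy; the input is left intact.
    for f in range(start, start + width):
        masked[f] = [mask_value] * len(masked[f])
    return masked, (start, width)


spec = [[1.0] * 5 for _ in range(8)]
masked, (start, width) = frequency_mask(spec, max_width=3, rng=random.Random(0))
```

Passing a seeded `random.Random` makes the mask reproducible, which is convenient in tests.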

[API][330][compute_deltas]

Compute delta coefficients of a spectrogram.
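One common definition of delta coefficients is d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2), with N = (win_length - 1) // 2. The sketch below applies it to a single coefficient track and replicates the edge values for padding; both the formula choice and the padding mode are assumptions about what the API does.

```python
def compute_deltas(seq, win_length=5):
    """Delta coefficients of a 1-D sequence over time (edge-replicated)."""
    n = (win_length - 1) // 2
    denom = 2 * sum(i * i for i in range(1, n + 1))
    last = len(seq) - 1

    def at(t):
        # Clamp the index so out-of-range frames repeat the edge value.
        return seq[min(max(t, 0), last)]

    return [
        sum(i * (at(t + i) - at(t - i)) for i in range(1, n + 1)) / denom
        for t in range(len(seq))
    ]


ramp = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
deltas = compute_deltas(ramp)
```

On a linear ramp the interior deltas equal the slope, while the edge values are smaller because of the replicated padding.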

[API][330][rescale]

Performs signal rescaling to a target level.

[API][330][resample]

Resample a signal from one sampling rate to another. A resampling method can be specified.

[LJSpeech-wavegrad] [Ascend] [GRAPH] Single card training error

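An error is reported when running single-card training.

Steps to reproduce: python recipes/LJSpeech/tts/wavegrad/train.py --device_target Ascend --device_id 0 --context_mode graph
Error screenshot: uploading screenshots is not supported at the moment; it will be sent to the developers for discussion.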

[API][330][unitarize]

Normalizes a signal to unitary average or peak amplitude.

[Models][330][Wavegrad Train/Eval]

[API][330][complex_norm]

Compute the norm of complex number sequence.

[API][330][normalize]

Normalize an array along a specified axis.

Update build version to 0.1.1

Update build version from 0.1.0 to 0.1.1

[API][330][mfcc]

Generate Mel-frequency cepstral coefficient (MFCC) features from an input audio signal.

[API][330][melscale]

Convert a normal STFT to an STFT on the Mel scale.

[API][330][timemasking]

Apply masking to a spectrogram in the time domain.

[API][330][compute_amplitude]

Compute amplitude of a batch of waveforms.

[API][330][spectrogram]

Create a spectrogram from an audio signal.

Please add distributed training scenarios in readme

Please add distributed training scenarios to the readme. The training section currently only describes single-card training.

[API][330][add_noise]

Add background noise.

[Models][330][EcapaTDNN-Train/Eval]

[Models][330][Deepspeech2 Train/Eval]

[LJSpeech-wavegrad] Please clarify the path of manifest_path in readme

The readme describes the data preprocessing step as follows:
Preprocess data to get a "_wav.npy" and "_feature.npy" for each ".wav" file in your dataset folder. Set your data_path and manifest_path in wavegrad_base.yaml. You can now run the following command:
python recipes/LJSpeech/tts/wavegrad/preprocess.py --device_target CPU --device_id 0

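When preprocessing the data for the first time, I ran into an error. After talking with the developers, I learned that manifest_path in the description above is the path of the newly generated .csv file, not the path of metadata.csv from the extracted original dataset.
I suggest adding a note about this path so that users are not misled.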

[API][330][magphase]

Separate a complex-valued spectrogram with shape (..., 2) into its magnitude and phase.
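A minimal sketch of the separation, assuming the trailing dimension of size 2 holds the real and imaginary parts in that order. The `power` parameter is a hypothetical addition for this example.

```python
import math


def magphase(spec, power=1.0):
    """Split (freq, time, 2) real/imag pairs into magnitude**power and phase."""
    mag = [[math.hypot(re, im) ** power for re, im in row] for row in spec]
    # atan2 returns the phase in radians in the range [-pi, pi].
    phase = [[math.atan2(im, re) for re, im in row] for row in spec]
    return mag, phase


spec = [[[3.0, 4.0], [0.0, 1.0]], [[-2.0, 0.0], [1.0, 1.0]]]
mag, phase = magphase(spec)
```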

[API][330][notch_filter]

A notch filter only filters a very narrow band.

[API][330][reverberate]

Reverberate a given signal with a given Room Impulse Response (RIR). It performs convolution between the RIR and the signal,
but without changing the original amplitude of the signal.
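The behaviour described above can be sketched as a convolution followed by a rescale. This example assumes that "original amplitude" means the mean absolute amplitude and that the output is truncated to the input length; both are assumptions, and the function is an illustration rather than the library's implementation.

```python
def reverberate(signal, rir):
    """Convolve signal with rir, then restore the mean absolute amplitude."""
    out = [0.0] * len(signal)
    # Direct convolution, truncated to the length of the input signal.
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            if i + j < len(out):
                out[i + j] += s * h
    orig_amp = sum(abs(x) for x in signal) / len(signal)
    new_amp = sum(abs(x) for x in out) / len(out)
    if new_amp == 0.0:
        return out  # Nothing to rescale for an all-zero result.
    scale = orig_amp / new_amp
    return [x * scale for x in out]


dry = [1.0, 0.0, 0.0, 0.0]
wet = reverberate(dry, rir=[1.0, 0.5])
```

With the impulse input above, the RIR shape is preserved in the output while its mean absolute amplitude matches that of the dry signal.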