kaituoxu / conv-tasnet

A PyTorch implementation of Conv-TasNet described in "TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation" with Permutation Invariant Training (PIT).

License: MIT License

Shell 10.12% Perl 32.59% Python 56.90% Makefile 0.39%
speech-separation source-separation audio-separation pit pytorch tasnet conv-tasnet permutation-invariant-training

conv-tasnet's Introduction

Conv-TasNet

A PyTorch implementation of Conv-TasNet described in "TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation".

Results

From  | N   | L  | B   | H   | P | X | R | Norm | Causal | batch size | SI-SNRi (dB) | SDRi (dB)
Paper | 256 | 20 | 256 | 512 | 3 | 8 | 4 | gLN  | X      | -          | 14.6         | 15.0
Here  | 256 | 20 | 256 | 512 | 3 | 8 | 4 | gLN  | X      | 3          | 15.5         | 15.7

Install

  • PyTorch 0.4.1+
  • Python 3 (Anaconda recommended)
  • pip install -r requirements.txt
  • If you need to convert wsj0 to wav format and generate the mixture files: cd tools; make

Usage

If you already have the mixture wsj0 data:

  1. $ cd egs/wsj0, then modify the wsj0 data path data at the beginning of run.sh to your path.
  2. $ bash run.sh, that's all!

If you only have the original wsj0 data (sphere format):

  1. $ cd egs/wsj0, then modify the three wsj0 data paths at the beginning of run.sh to your paths.
  2. Convert the sphere-format wsj0 to wav format and generate the mixtures. The Stage 0 part provides an example.
  3. $ bash run.sh, that's all!

You can change a hyper-parameter with $ bash run.sh --parameter_name parameter_value, e.g., $ bash run.sh --stage 3. See the parameter names in egs/wsj0/run.sh before the line . utils/parse_options.sh.

Workflow

Workflow of egs/wsj0/run.sh:

  • Stage 0: Convert sphere format to wav format and generate mixture (optional)
  • Stage 1: Generating json files including wav path and duration
  • Stage 2: Training
  • Stage 3: Evaluate separation performance
  • Stage 4: Separate speech using Conv-TasNet

More detail

# Set PATH and PYTHONPATH
$ cd egs/wsj0/; . ./path.sh
# Train:
$ train.py -h
# Evaluate performance:
$ evaluate.py -h
# Separate mixture audio:
$ separate.py -h

How to visualize loss?

If you want to visualize your loss, you can use visdom to do that:

  1. Open a new terminal on your remote server (tmux is recommended) and run $ visdom
  2. Open a new terminal and run $ bash run.sh --visdom 1 --visdom_id "<any-string>" or $ train.py ... --visdom 1 --visdom_id "<any-string>"
  3. Open your browser and go to <your-remote-server-ip>:8097, e.g., 127.0.0.1:8097
  4. On the visdom page, choose <any-string> in Environment to see your loss curves
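
For orientation, here is a minimal Python sketch of pushing a loss curve to a running visdom server; the environment name, window name and dummy loss values are placeholders, not what train.py actually uses.

import numpy as np
import visdom

# Connect to the visdom server started with "$ visdom" (default port 8097).
vis = visdom.Visdom(env="conv-tasnet-demo")   # hypothetical environment name

losses = []
for epoch, loss in enumerate([-5.2, -8.1, -10.4], start=1):   # dummy loss values
    losses.append(loss)
    vis.line(
        X=np.arange(1, len(losses) + 1),
        Y=np.array(losses),
        win="train_loss",   # reuse the same window so the curve updates in place
        opts=dict(title="Training loss", xlabel="epoch", ylabel="loss (-SI-SNR)"),
    )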

How to resume training?

$ bash run.sh --continue_from <model-path>

How to use multi-GPU?

Use a comma-separated gpu-id sequence, such as:

$ bash run.sh --id "0,1"

How to solve out of memory?

  • If it happens during training, try reducing batch_size or using more GPUs: $ bash run.sh --batch_size <lower-value>
  • If it happens during cross validation, try reducing cv_maxlen: $ bash run.sh --cv_maxlen <lower-value>

conv-tasnet's People

Contributors

kaituoxu


conv-tasnet's Issues

typo at line 68, 96

line 68, 96 # generate minibach infomations

  • # generate minibatch informations

AssertionError while training a separation model for the 3-speaker scenario (C=3)

File "/nfs/users/Conv-TasNet/src/pit_criterion.py", line 21, in cal_loss
source_lengths)
File "/nfs/users/Conv-TasNet/src/pit_criterion.py", line 34, in cal_si_snr_with_pit
assert source.size() == estimate_source.size()
AssertionError

C is set to 3 and the training data is formatted accordingly with mix and s1,s2,s3.

Any support is appreciated
Thank you

why are long audio files ignored?

if num_segments > batch_size:

hi, thanks for sharing this!

If I'm not mistaken, the code above means that audio files longer than one minibatch are ignored during training. Why? They could be read in segments; otherwise a lot of audio from the database is not used at all (a sketch of this idea follows below).
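
As a rough illustration of the suggestion above, a hedged sketch of chunking a long utterance into fixed-length segments instead of dropping it; the function name and segment length are illustrative, and this is not what the repo's AudioDataset currently does.

import numpy as np

def chunk_utterance(signal, segment_len, drop_last_shorter_than=0):
    # Split a 1-D signal into consecutive fixed-length segments.
    segments = [signal[start:start + segment_len]
                for start in range(0, len(signal), segment_len)]
    # Optionally drop a trailing remainder that is too short to be useful.
    if segments and len(segments[-1]) < drop_last_shorter_than:
        segments.pop()
    return segments

# Example: a 60 s file at 8 kHz becomes 15 segments of 4 s each.
parts = chunk_utterance(np.zeros(60 * 8000), segment_len=4 * 8000)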

Reproduce results - mixing SNR ratio

Hi,
Thanks a lot for sharing the code!!
I'm trying to reproduce your results. I'm running everything exactly as specified, but my best model only reaches SDR 13.4 and SI-SNR 12.8.

I was wondering whether you changed the mixing SNR ratio between the speakers to be in the range [-5, 5], or did you leave it in the range [0, 5]?

Why does my training loss plateau around -14/-16, and why doesn't evaluation reach 15.45?

Hello, author. I have trained for 100 epochs, but the train loss stays around -14 and the valid loss around -16 and never goes down, and the evaluated SDR stays around 12 or 13. I used the generated min folders as your instructions said, with the 20,000-utterance wsj0 training set, and the same model hyper-parameters. I also tested again with the best model you trained and uploaded, and the result was Average SDR improvement: 12.93, Average SISNR improvement: 12.50.
Now I would like to ask what to do in this situation. I don't know how to improve the SI-SNR; the result I trained is not 15 as you reported. Thank you very much.

Can't find WSJ0 Dataset

I can't find the wsj0 dataset for training the network!
I am looking for the wsj0 dataset. Please help me.

Error when trying to reduce batch_size below 64 on the DSD100 dataset

Hey,
First of all, I am very thankful for your amazing work.
I am trying to test the flexibility of the model on the DSD100 dataset,
to see whether it can separate singer and drums instead of two speakers.
I am facing an issue when trying to reduce the batch_size to 3 (default = 128).
In solver.py -> _run_one_epoch(), the loop
{ for i, (data) in enumerate(data_loader): } is never entered when batch_size is less than 64;
otherwise RAM usage exceeds 12 GB (Titan V).
Can you please help me understand this error? What could be the difference between the datasets?
Thanks :D

Why did you use normalization=False in the librosa write call in separate.py?

When I set normalization=True the output is clear without any disturbance. Did you get estimate_source values between -1 and 1 using normalization=False?
As this is done in the time domain, our output range is -1 to 1, but when we use ReLU the output can be more than 1, right? Please help!
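
For reference, a hedged sketch of peak-normalizing an estimated source before writing it to disk; it uses soundfile rather than the librosa write call mentioned above, and the function name is illustrative.

import numpy as np
import soundfile as sf

def write_normalized(path, est_source, sample_rate=8000):
    # Rescale by the peak so the waveform fits in [-1, 1] and does not clip on write.
    peak = np.max(np.abs(est_source))
    if peak > 1.0:
        est_source = est_source / peak
    sf.write(path, est_source, sample_rate)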

Question regarding Normalization

How can I implement the cLN variant used in this paper, which is the next version of Conv-TasNet released by the same authors?
Again, thanks for this amazing implementation 🥇.
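
One possible way to write a cumulative layer norm (cLN) in PyTorch, normalizing each frame by the statistics of all channels and all frames up to and including it, as described in the Conv-TasNet paper. This is a sketch, not code from this repository, and the [M, N, K] (batch, channels, frames) layout is an assumption.

import torch
import torch.nn as nn

class CumulativeLayerNorm(nn.Module):
    # cLN sketch: statistics are taken over channels and past frames only.
    def __init__(self, channels, eps=1e-8):
        super().__init__()
        self.gain = nn.Parameter(torch.ones(1, channels, 1))
        self.bias = nn.Parameter(torch.zeros(1, channels, 1))
        self.eps = eps

    def forward(self, x):                      # x: [M, N, K]
        M, N, K = x.size()
        step_sum = x.sum(dim=1)                # [M, K], sum over channels per frame
        step_pow_sum = x.pow(2).sum(dim=1)     # [M, K]
        cum_sum = torch.cumsum(step_sum, dim=1)          # running sums over frames
        cum_pow_sum = torch.cumsum(step_pow_sum, dim=1)
        entry_cnt = torch.arange(N, N * (K + 1), N,      # N, 2N, ..., K*N entries seen so far
                                 dtype=x.dtype, device=x.device)
        cum_mean = cum_sum / entry_cnt                   # [M, K]
        cum_var = cum_pow_sum / entry_cnt - cum_mean.pow(2)
        cum_mean, cum_var = cum_mean.unsqueeze(1), cum_var.unsqueeze(1)  # [M, 1, K]
        return self.gain * (x - cum_mean) / torch.sqrt(cum_var + self.eps) + self.bias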

"mask_nonlinear" choose problem

Hi Kaituo Xu,
When I train your code on a new dataset I run into a problem. When I choose "mask_nonlinear" = relu, I get results similar to yours. However, when I choose "mask_nonlinear" = softmax, the training loss stays at -1.4. Did you meet the same problem when you chose softmax? The choice in the Conv-TasNet paper is softmax, so why do you choose relu?

Questions about the SI-SNR

Thanks for your helpful sharing, but there are still some questions bothering me.
First, when I run your code the loss is negative, since your loss function is -SI-SNR, but a negative loss does not seem common in deep learning.
Then I noticed that, when calculating the SI-SNR, s_target is defined using both the clean and the estimated source. This confused me a lot; could you give some explanation?
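
To make the two points concrete, here is a minimal numpy sketch of the standard SI-SNR computation (zero-mean both signals, project the estimate onto the clean source, then take the energy ratio). It follows the definition in the Conv-TasNet paper rather than the repo's exact pit_criterion.py code; variable names are illustrative.

import numpy as np

def si_snr(estimate, source, eps=1e-8):
    # 1) Remove the mean so the measure ignores DC offsets.
    estimate = estimate - estimate.mean()
    source = source - source.mean()
    # 2) s_target is the projection of the estimate onto the clean source.
    #    This is why it depends on both signals: it keeps only the component
    #    of the estimate that lies along the true source.
    s_target = np.dot(estimate, source) * source / (np.dot(source, source) + eps)
    # 3) Whatever is left over is treated as noise.
    e_noise = estimate - s_target
    ratio = np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps)
    return 10 * np.log10(ratio + eps)

# Training minimizes -si_snr(...), so a loss of -15 simply means about 15 dB SI-SNR.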

Evaluation seems to only run on sequences with length < cv_maxlen

Hi,
Thanks for providing your implementation of Conv-Tasnet. I trained a model and when I evaluated it on the test set (using run.sh) I was surprised to only see 2618 sequences evaluated (while the test set size is 3000). It seems that the AudioDataset created in evaluate.py uses the default cv_maxlen parameter (8 seconds), such that only test sequences shorter than 8 seconds are evaluated (2618 sequences). This would mean that the test SDR is not representative of how well the model performs on longer utterances. I attached the line where the AudioDataset is created.
Best regards,
Neil

dataset = AudioDataset(args.data_dir, args.batch_size,

Question about accelerating

It seems that loading the dataset is slow; is there any way to accelerate it?

Hello, thank you for sharing.
However, I found that reading the data while training the model seems too slow. Is there any way to make it run faster?
Thanks and best wishes!
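
As a generic hint (not specific to this repo's AudioDataLoader, whose constructor may differ), data loading in PyTorch is usually sped up with more worker processes and pinned memory:

from torch.utils.data import DataLoader

def make_loader(dataset, batch_size):
    # "dataset" stands in for the repo's AudioDataset here.
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=8,      # read and decode audio in parallel worker processes
        pin_memory=True,    # faster host-to-GPU copies
    )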

C++ implementation of the Convtasnet model

Has anyone been able to try a C++ implementation of the ConvTasNet model, at least inference from a pretrained model? I am facing issues with the real-time performance of the model.

What real-time factors can we actually expect?

Thanks in advance

Questions about the training and pre-trained model

@kaituoxu Hi, it's really nice work. Thanks for sharing the code.
However, I have a question about training. How many wave files did you use for training?
I use about 100,000 .wav files for training with batch_size=10 (2 GPUs, 2080 Ti). It seems each epoch needs about 8 hours, which is too slow.
And could you please share the pre-trained model?
I am looking forward to your reply.

batch_size

According to the table, the batch_size is 3. But what is the segment length of each waveform during training? (Is it 3 seconds, and in that case what does batch_size count?)
The default batch_size in the train.py code is 128, so which value is the actual batch_size?

Thanks a lot.

Sample of separated files for spkr1 and spkr2.

Hi,
Could you please provide a few samples of separated files?
I trained a model on my own dataset, and I just want to compare the separation result by listening to the output files for spkr1 and spkr2.

Thanks in advance.

RuntimeError: Expected object of backend CUDA but got backend CPU for argument #3 'index'

Like WSJ0-mix, I made some mixtures from LibriSpeech, in which the tr, cv and tt dirs contain 20000, 5000 and 3000 utterances respectively. But when I try to run your scripts, something goes wrong. Can you help me figure it out? Thanks.

And the key error of the train log is as follows:
"Training...
Traceback (most recent call last):
File "/home/yjm/Conv-TasNet/egs/LibriSpeech/../../src/train.py", line 145, in
main(args)
File "/home/yjm/Conv-TasNet/egs/LibriSpeech/../../src/train.py", line 139, in main
solver.train()
File "/home/yjm/Conv-TasNet/src/solver.py", line 76, in train
tr_avg_loss = self._run_one_epoch(epoch)
File "/home/yjm/Conv-TasNet/src/solver.py", line 178, in _run_one_epoch
estimate_source = self.model(padded_mixture)
File "/home/yjm/anaconda3/envs/tensorflow/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/yjm/anaconda3/envs/tensorflow/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 143, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/yjm/anaconda3/envs/tensorflow/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/yjm/anaconda3/envs/tensorflow/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
raise output
File "/home/yjm/anaconda3/envs/tensorflow/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in worker
output = module(*input, **kwargs)
File "/home/yjm/anaconda3/envs/tensorflow/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/yjm/Conv-TasNet/src/conv_tasnet.py", line 54, in forward
est_source = self.decoder(mixture_w, est_mask)
File "/home/yjm/anaconda3/envs/tensorflow/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/yjm/Conv-TasNet/src/conv_tasnet.py", line 141, in forward
est_source = overlap_and_add(est_source, self.L//2) # M x C x T
File "/home/yjm/Conv-TasNet/src/utils.py", line 45, in overlap_and_add
result.index_add_(-2, frame, subframe_signal)
RuntimeError: Expected object of backend CUDA but got backend CPU for argument #3 'index'"
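
The traceback points at the index_add_ call in overlap_and_add, where the frame index tensor appears to live on the CPU while the signal is on the GPU. A hedged sketch of the usual fix is to move the index onto the signal's device before the call; the variable names follow the traceback, the surrounding code is assumed.

# Inside overlap_and_add (src/utils.py), just before the failing call -- sketch only:
frame = frame.to(subframe_signal.device)      # keep the index on the same device as the data
result.index_add_(-2, frame, subframe_signal)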

Cannot get the same evaluation SI-SNRi, even if using the pretrained model

Hi, thanks for the code and the pretrained model, they really help me a lot!

When I try to use your pretrained model provided at the link pan.baidu.com/s/1-Rqm7GwpV7Cc1XzHSpHROg, I find that when running evaluate.py the result is very different from your evaluate.log.
The evaluate.log reports "Average SISNR improvement: 15.45".
However, when I run it, it is around 9.8.

I assume that we should have the same json files in data/tt/. In that case, with the same code and the same weights, we should get the same SISNRi of 15.45.
I am wondering what makes the difference. Could I know the commit id of your repo when you ran evaluate.py? And could I have a look at your data/tt/mix.json (maybe just the first 10 lines)?

Below are the first few lines of my data/tt/mix.json
[ [ "datasets/data/wsj0-mix/2speakers/wav8k/min/tt/mix/445c0206_0.60431_22gc0105_-0.60431.wav", 33301 ], [ "datasets/data/wsj0-mix/2speakers/wav8k/min/tt/mix/420c020h_1.1139_442c0203_-1.1139.wav", 51541 ], [ "datasets/data/wsj0-mix/2speakers/wav8k/min/tt/mix/22go0107_0.079969_051c010u_-0.079969.wav", 30391 ], [ "datasets/data/wsj0-mix/2speakers/wav8k/min/tt/mix/444o0314_2.1819_053o020e_-2.1819.wav", 25624 ], [ "datasets/data/wsj0-mix/2speakers/wav8k/min/tt/mix/423o0304_1.419_420c020x_-1.419.wav", 48961 ], [ "datasets/data/wsj0-mix/2speakers/wav8k/min/tt/mix/423o030b_1.4753_053o0209_-1.4753.wav", 44774 ], [ "datasets/data/wsj0-mix/2speakers/wav8k/min/tt/mix/441o030o_1.9903_445c020y_-1.9903.wav", 26795 ], [ "datasets/data/wsj0-mix/2speakers/wav8k/min/tt/mix/22ga010u_0.43921_443o030l_-0.43921.wav", 45120 ],

If this is not where we differ, what other possibilities are there? Thanks!

Error when loading pretrained model

Hi,
when loading the pretrained model downloaded from https://pan.baidu.com/s/1-Rqm7GwpV7Cc1XzHSpHROg#list/path=%2F, an error occurred:
Traceback (most recent call last):
File "src/separate.py", line 99, in
separate(args)
File "src/separate.py", line 39, in separate
model = TasNet.load_model(args.model_path)
File "/qgrapework/sspworks/TasNet_kaituoxu_20190624/src/tasnet.py", line 44, in load_model
model = cls.load_model_from_package(package)
File "/qgrapework/sspworks/TasNet_kaituoxu_20190624/src/tasnet.py", line 50, in load_model_from_package
package['hidden_size'], package['num_layers']
KeyError: 'hidden_size'
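
The missing 'hidden_size' key suggests a Conv-TasNet checkpoint is being loaded with the TasNet class from src/tasnet.py, which expects LSTM-style fields. If the downloaded file is the Conv-TasNet model, it presumably has to be loaded with the ConvTasNet class instead; a hedged sketch, assuming ConvTasNet exposes a load_model classmethod analogous to TasNet's:

# Sketch only: assumes ConvTasNet provides a load_model classmethod like TasNet does.
from conv_tasnet import ConvTasNet

model = ConvTasNet.load_model("path/to/pretrained_model.pth")   # hypothetical checkpoint path
model.eval()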

RuntimeError: CUDA out of memory.

Even with batch_size = 3 (the default) and 3 GPUs, I still get the following error. How can I solve this problem?

RuntimeError: CUDA out of memory. Tried to allocate 18.75 MiB (GPU 0; 11.90 GiB total capacity; 9.54 GiB already allocated; 9.00 MiB free; 159.98 MiB cached)

+-------------------------------+----------------------+----------------------+
| 1 TITAN X (Pascal) Off | 00000000:02:00.0 Off | N/A |
| 25% 44C P8 11W / 250W | 2MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 TITAN X (Pascal) Off | 00000000:03:00.0 Off | N/A |
| 23% 39C P8 9W / 250W | 2MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 TITAN X (Pascal) Off | 00000000:04:00.0 Off | N/A |
| 23% 36C P8 9W / 250W | 2MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

Reproduce results

Hi, can you show your reproduced SDR results on the test set? Thanks.

How long does training usually take on 1 GPU (Nvidia 1080 Ti)?

I started run.sh to initiate a training session (stage 2) and it is still running after 4 full days. I checked htop and nvidia-smi and both tell me the training is running. Does it normally take that long, or has something gone wrong? The run.sh parameter setting is the default, as listed below.

#!/bin/bash

# Created on 2018/12
# Author: Kaituo XU

# -- START IMPORTANT
# * If you have mixture wsj0 audio, modify data to your path that includes tr, cv and tt.
# * If you just have origin sphere format wsj0, modify wsj0_origin to your path and
#   modify wsj0_wav to the path where the wav format wsj0 should go, then read and run the stage 1 part.
#   After that, modify data and run from stage 2.

wsj0_origin=/home/xxx/xxxx/Speech_Corpus/csr_1
wsj0_wav=/home/xxx/xxxxx/Speech_Corpus/wsj0-wav/wsj0
data=/home/xxx/xxxxx/Speech_Corpus/wsj-mix/2speakers/wav8k/min/
stage=2  # Modify this to control which stage to start from

# -- END

dumpdir=data  # directory to put generated json file

# -- START Conv-TasNet Config

train_dir=$dumpdir/tr
valid_dir=$dumpdir/cv
evaluate_dir=$dumpdir/tt
separate_dir=$dumpdir/tt
sample_rate=8000
segment=4 # seconds
cv_maxlen=6 # seconds

# Network config

N=256
L=20
B=256
H=512
P=3
X=8
R=4
norm_type=gLN
causal=0
mask_nonlinear='relu'
C=2

# Training config

use_cuda=1
id=0
epochs=100
half_lr=1
early_stop=0
max_norm=5

# minibatch

shuffle=1
batch_size=3
num_workers=4

# optimizer

optimizer=adam
lr=1e-3
momentum=0
l2=0

# save and visualize

checkpoint=0
continue_from=""
print_freq=10
visdom=0
visdom_epoch=0
visdom_id="Conv-TasNet Training"

# evaluate

ev_use_cuda=0
cal_sdr=1

# -- END Conv-TasNet Config

# exp tag

tag="" # tag for managing experiments.

ngpu=1 # always 1

. utils/parse_options.sh || exit 1;
. ./cmd.sh
. ./path.sh

if [ $stage -le 0 ]; then
echo "Stage 0: Convert sphere format to wav format and generate mixture"
local/data_prepare.sh --data ${wsj0_origin} --wav_dir ${wsj0_wav}

echo "NOTE: You should generate mixture by yourself now.
You can use tools/create-speaker-mixtures.zip which is downloaded from
http://www.merl.com/demos/deep-clustering/create-speaker-mixtures.zip
If you don't have Matlab and want to use Octave, I suggest replacing
all mkdir(...) in create_wav_2speakers.m with system(['mkdir -p '...])
because mkdir in Octave does not support the 'mkdir -p' behavior.
e.g.:
mkdir([output_dir16k '/' min_max{i_mm} '/' data_type{i_type}]);
->
system(['mkdir -p ' output_dir16k '/' min_max{i_mm} '/' data_type{i_type}]);"
exit 1
fi

if [ $stage -le 1 ]; then
echo "Stage 1: Generating json files including wav path and duration"
[ ! -d $dumpdir ] && mkdir $dumpdir
preprocess.py --in-dir $data --out-dir $dumpdir --sample-rate $sample_rate
fi

if [ -z ${tag} ]; then
  expdir=exp/train_r${sample_rate}_N${N}_L${L}_B${B}_H${H}_P${P}_X${X}_R${R}_C${C}_${norm_type}_causal${causal}_${mask_nonlinear}_epoch${epochs}_half${half_lr}_norm${max_norm}_bs${batch_size}_worker${num_workers}_${optimizer}_lr${lr}_mmt${momentum}_l2${l2}_`basename $train_dir`
else
  expdir=exp/train_${tag}
fi

if [ $stage -le 2 ]; then
  echo "Stage 2: Training"
  ${cuda_cmd} --gpu ${ngpu} ${expdir}/train.log \
    CUDA_VISIBLE_DEVICES="$id" \
    train.py \
    --train_dir $train_dir \
    --valid_dir $valid_dir \
    --sample_rate $sample_rate \
    --segment $segment \
    --cv_maxlen $cv_maxlen \
    --N $N \
    --L $L \
    --B $B \
    --H $H \
    --P $P \
    --X $X \
    --R $R \
    --C $C \
    --norm_type $norm_type \
    --causal $causal \
    --mask_nonlinear $mask_nonlinear \
    --use_cuda $use_cuda \
    --epochs $epochs \
    --half_lr $half_lr \
    --early_stop $early_stop \
    --max_norm $max_norm \
    --shuffle $shuffle \
    --batch_size $batch_size \
    --num_workers $num_workers \
    --optimizer $optimizer \
    --lr $lr \
    --momentum $momentum \
    --l2 $l2 \
    --save_folder ${expdir} \
    --checkpoint $checkpoint \
    --continue_from "$continue_from" \
    --print_freq ${print_freq} \
    --visdom $visdom \
    --visdom_epoch $visdom_epoch \
    --visdom_id "$visdom_id"
fi
(the rest omitted)

Minor mistake?

Hi, I noticed one small thing while running your code. It seems that when you load the data, you sort only by data length, and some audios in wsj0 have exactly the same length, which makes the mix/s1/s2 files not matched.

However, I think it doesn't hurt training since this situation is rare. I ran into the problem because I wanted to parallelize the inference stage over multiple CPUs, and I got different results from the GPU version.

But thanks for your very well-organized code! It is very helpful to me.

Questions regarding the Decoder

Hi,

Thank you for an awesome, well-organized repository.
I have a question regarding the Decoder block. The paper states that they use a 1-D transposed convolution operation for generating the decoder basis functions (see the paper; a screenshot of the relevant passage was attached).

However, I see you use a linear dense layer in the decoder

self.basis_signals = nn.Linear(N, L, bias=False)

Could you explain the reason for this choice?
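
For comparison, a hedged sketch of the transposed-convolution formulation: applying a bias-free Linear(N, L) to each frame and then overlap-adding with a hop of L/2 performs the same operation as a ConvTranspose1d with kernel_size=L and stride=L//2 (the weights are just reshaped/transposed), so the two versions differ mainly in how the overlap-add is written. Shapes below are illustrative.

import torch
import torch.nn as nn

N, L = 256, 20                       # basis count and filter length, as in the results table

# Transposed-convolution decoder: masked encoder output [M, N, K] -> waveform [M, 1, T].
decoder = nn.ConvTranspose1d(N, 1, kernel_size=L, stride=L // 2, bias=False)

masked_repr = torch.randn(4, N, 100)          # dummy [batch, N, frames] input
waveform = decoder(masked_repr)               # shape [4, 1, (100 - 1) * (L // 2) + L]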

Bad performance when using for speech enhancement

Hi, very nice work. I noticed that some people are using Conv-TasNet for speech enhancement and getting good results, but I encountered some problems when using this code for speech enhancement. I am trying to separate clean speech and noise from noisy speech, using the VCTK dataset. The waveforms of the results look very weird (see the attached screenshots).

When I changed the mask activation to sigmoid, the result was still not good.

I wonder if anyone has a thought on how to solve this problem. Thanks in advance!

Different data loading in evaluate.py and separate.py

I want to separate a mixed recording with separate.py, but the separated sounds are noisy.
The mixture was a female-male mix, and I also tested it with evaluate.py; the result was about 14 dB improvement in SDRi.
When I test this mixed file with separate.py and listen to the separated files, I find that they are indeed the separated female and male voices, but both are noisy.
I don't know what the reason for the noise is.

Is this because separate.py and evaluate.py use a different EvalDataLoader and DataLoader?

Thanks for your helpful repo.

Help

Hi, I would like to do speech enhancement with your work. Do you have a model available, and how do I use your code to enhance speech?
Thank you!

Zero mean norm for SDR loss

Hi, I'm preparing a lecture on source separation …. Do you know where the zero_mean norm for the SDR losses comes from and what the intuition behind it is? Was this in the original Conv-TasNet paper?

GPU RAM Issue

In cv you have set cv_maxlen = 6 or 8, so it doesn't take files longer than 6 to 8 seconds. When I tried to change that, I faced RAM issues: training completed, but when validation started I hit an out-of-memory error, so I cropped the wav files to 4 seconds and didn't face any issue.

Can you explain why we run into RAM issues when trying to send more data?
For my project I am working on music data and need to increase the sample rate from 8000 to 22050, i.e. 2.5 times more than your implementation, which forces me to split my data into segments of less than 2 seconds to pass through validation.

Is there no other way to solve this RAM issue? Why is it occupying so much space in RAM?
Thanks in advance.

Missing a specific wsj0 folder designated by mix_2_spk_tr.txt for creating 2-speaker mixtures

I tried to set up the wsj0 data and run Conv-TasNet. I have the wsj0 corpus downloaded from the LDC repository. I followed the instructions in ...../egs/wsj0/run.sh and obtained .wavs successfully converted by sph2pipe. Then I ran create_wav_2speakers.m in MATLAB to get the 2-speaker mixture .wavs, but it failed in the middle because a specific file, ...../wsj0/si_tr_s/401/401c020s.wav, was missing. That .m program reads two file names at a time from mix_2_spk_tr.txt and creates a mixture from them, but the whole directory ..../401/... to which that .wav belongs is missing from the original wsj0 file tree. How can I fix this problem?
