modelscope / 3d-speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

License: Apache License 2.0

campplus speaker-diarization speaker-verification voxceleb 3d-speaker eres2net rdino language-identification modelscope cnceleb

3d-speaker's Introduction




3D-Speaker is an open-source toolkit for single- and multi-modal speaker verification, speaker recognition, and speaker diarization. All pretrained models are accessible on ModelScope. We also release a large-scale speech corpus, likewise named 3D-Speaker, to facilitate research on speech representation disentanglement.

Quickstart

Install 3D-Speaker

git clone https://github.com/alibaba-damo-academy/3D-Speaker.git && cd 3D-Speaker
conda create -n 3D-Speaker python=3.8
conda activate 3D-Speaker
pip install -r requirements.txt

Running experiments

# Speaker verification: ERes2Net on 3D-Speaker dataset
cd egs/3dspeaker/sv-eres2net/
bash run.sh
# Speaker verification: ERes2NetV2 on 3D-Speaker dataset
cd egs/3dspeaker/sv-eres2netv2/
bash run.sh
# Speaker verification: CAM++ on 3D-Speaker dataset
cd egs/3dspeaker/sv-cam++/
bash run.sh
# Speaker verification: ECAPA-TDNN on 3D-Speaker dataset
cd egs/3dspeaker/sv-ecapa/
bash run.sh
# Self-supervised speaker verification: RDINO on 3D-Speaker dataset
cd egs/3dspeaker/sv-rdino/
bash run.sh
# Self-supervised speaker verification: SDPN on VoxCeleb dataset
cd egs/voxceleb/sv-sdpn/
bash run.sh
# Audio and multimodal Speaker diarization:
cd egs/3dspeaker/speaker-diarization/
bash run_audio.sh
bash run_video.sh
# Language identification
cd egs/3dspeaker/language-identification
bash run.sh

Inference using pretrained models from ModelScope

All pretrained models are released on ModelScope.

# Install modelscope
pip install modelscope
# ERes2Net trained on 200k labeled speakers
model_id=iic/speech_eres2net_sv_zh-cn_16k-common
# ERes2NetV2 trained on 200k labeled speakers
model_id=iic/speech_eres2netv2_sv_zh-cn_16k-common
# CAM++ trained on 200k labeled speakers
model_id=iic/speech_campplus_sv_zh-cn_16k-common
# Run CAM++ or ERes2Net inference
python speakerlab/bin/infer_sv.py --model_id $model_id

# SDPN trained on VoxCeleb
model_id=iic/speech_sdpn_ecapa_tdnn_sv_en_voxceleb_16k
# Run SDPN inference
python speakerlab/bin/infer_sv_ssl.py --model_id $model_id
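
The same checkpoints can also be driven through the ModelScope pipeline API rather than the CLI scripts. A minimal sketch (the input wav names are placeholders; see the revision-pinned usage example in the issues below for a complete call):

from modelscope.pipelines import pipeline

sv_pipeline = pipeline(
    task='speaker-verification',
    model='iic/speech_campplus_sv_zh-cn_16k-common',
)
# Pass a pair of 16 kHz wav paths or URLs; the result contains a score and a yes/no decision.
result = sv_pipeline(['speaker1_a.wav', 'speaker1_b.wav'])
print(result)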

Overview of Content

What's new 🔥

Contact

If you have any comments or questions about 3D-Speaker, please contact us by

  • email: {chenyafeng.cyf, zsq174630, tongmu.wh, shuli.cly}@alibaba-inc.com

License

3D-Speaker is released under the Apache License 2.0.

Acknowledgements

3D-Speaker contains third-party components and code modified from open-source repositories, including:
Speechbrain, Wespeaker, D-TDNN, DINO, VICReg, TalkNet-ASD, Ultra-Light-Fast-Generic-Face-Detector-1MB

Citations

If you find this repository useful, please consider giving a star ⭐ and citation 🦖:

@inproceedings{chen2024eres2netv2,
  title={ERes2NetV2: Boosting Short-Duration Speaker Verification Performance with Computational Efficiency},
  author={Chen, Yafeng and Zheng, Siqi and Wang, Hui and Cheng, Luyao and others},
  booktitle={INTERSPEECH},
  year={2024}
}
@article{chen2024sdpn,
  title={Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision},
  author={Chen, Yafeng and Zheng, Siqi and Wang, Hui and Cheng, Luyao and others},
  url={https://arxiv.org/pdf/2308.02774},
  year={2024}
}
@article{chen20243d,
  title={3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization},
  author={Chen, Yafeng and Zheng, Siqi and Wang, Hui and Cheng, Luyao and others},
  url={https://arxiv.org/pdf/2403.19971},
  year={2024}
}
@inproceedings{zheng20233d,
  title={3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement},
  author={Zheng, Siqi and Cheng, Luyao and Chen, Yafeng and Wang, Hui and Chen, Qian},
  url={https://arxiv.org/pdf/2306.15354},
  year={2023}
}
@inproceedings{wang2023cam++,
  title={CAM++: A Fast and Efficient Network For Speaker Verification Using Context-Aware Masking},
  author={Wang, Hui and Zheng, Siqi and Chen, Yafeng and Cheng, Luyao and Chen, Qian},
  booktitle={INTERSPEECH},
  year={2023}
}
@inproceedings{chen2023enhanced,
  title={An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification},
  author={Chen, Yafeng and Zheng, Siqi and Wang, Hui and Cheng, Luyao and Chen, Qian and Qi, Jiajun},
  booktitle={INTERSPEECH},
  year={2023}
}
@inproceedings{chen2023pushing,
  title={Pushing the limits of self-supervised speaker verification using regularized distillation framework},
  author={Chen, Yafeng and Zheng, Siqi and Wang, Hui and Cheng, Luyao and Chen, Qian},
  booktitle={ICASSP},
  year={2023}
}

3d-speaker's People

Contributors

alibaba-oss, geekorangeluyao, querryton, speaker-lover, wanghuii1, yfchenlucky, yfchenmodelscope


3d-speaker's Issues

Error when running bash run.sh

I ran run.sh in sv-cam++ with a single GPU, and it fails at Stage 3 with the error below. Is this a Python problem? Any advice would be appreciated.
Stage3: Training the speaker model...
/root/miniconda3/envs/3D-Speaker/bin/python: can't open file 'speakerlab/bin/train.py': [Errno 20] Not a directory
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 3209) of binary: /root/miniconda3/envs/3D-Speaker/bin/python
Traceback (most recent call last):
File "/root/miniconda3/envs/3D-Speaker/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/root/miniconda3/envs/3D-Speaker/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/root/miniconda3/envs/3D-Speaker/lib/python3.8/site-packages/torch/distributed/run.py", line 762, in main
run(args)
File "/root/miniconda3/envs/3D-Speaker/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
elastic_launch(
File "/root/miniconda3/envs/3D-Speaker/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/root/miniconda3/envs/3D-Speaker/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

speakerlab/bin/train.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2023-11-02_16:34:10
host : autodl-container-9ee2119752-04687cb0
rank : 0 (local_rank: 0)
exitcode : 2 (pid: 3209)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Problem with training part.

Hi, I am Nathan, and I am facing some problems with the training part.

My env
Centos7.5
#PIP
pytorch-wpe 0.0.1
rotary-embedding-torch 0.5.3
torch 1.12.1+cu113 // To use CUDA, I reinstalled torch and torchaudio.
torch-complex 0.4.3
torchaudio 0.12.1+cu113
torchvision 0.13.1+cu113

#rpm
libcudnn8-devel-8.2.0.53-1.cuda11.3.x86_64
libcudnn8-8.2.0.53-1.cuda11.3.x86_64

libnccl-devel-2.9.9-1+cuda11.3.x86_64
libnccl-2.9.9-1+cuda11.3.x86_64

To run a script, I followed 'egs/voxceleb/sv-ecapa/run.sh'.
I set 4 GPUs (with a single GPU it does not work either),
but I got the error below.

Stage3: Training the speaker model...
WARNING:torch.distributed.run:


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


2024-02-15 14:31:58,001 - INFO: Use GPU: 3 for training.
2024-02-15 14:31:58,003 - INFO: Use GPU: 2 for training.
2024-02-15 14:31:58,009 - INFO: Use GPU: 1 for training.
2024-02-15 14:31:58,011 - INFO: Use GPU: 0 for training.
Traceback (most recent call last):
File "speakerlab/bin/train.py", line 176, in <module>
main()
File "speakerlab/bin/train.py", line 60, in main
model = torch.nn.parallel.DistributedDataParallel(model)
File "/home/asr/miniconda3/envs/3D-Speaker/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 646, in __init__
_verify_param_shape_across_processes(self.process_group, parameters)
File "/home/asr/miniconda3/envs/3D-Speaker/lib/python3.8/site-packages/torch/distributed/utils.py", line 89, in _verify_param_shape_across_processes
return dist._verify_params_across_processes(process_group, tensors, logger)
RuntimeError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1191, unhandled system error, NCCL version 2.10.3
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. It can be also caused by unexpected exit of a remote peer, you can check NCCL warnings for failure reason and see if there is connection closure by a peer.
(The same traceback is printed by each of the four ranks; the copies are interleaved in the original log.)
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 121550 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 121547) of binary: /home/asr/miniconda3/envs/3D-Speaker/bin/python
Traceback (most recent call last):
File "/home/asr/miniconda3/envs/3D-Speaker/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/home/asr/miniconda3/envs/3D-Speaker/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
return f(*args, **kwargs)
File "/home/asr/miniconda3/envs/3D-Speaker/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
run(args)
File "/home/asr/miniconda3/envs/3D-Speaker/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
elastic_launch(
File "/home/asr/miniconda3/envs/3D-Speaker/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/asr/miniconda3/envs/3D-Speaker/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

speakerlab/bin/train.py FAILED

Failures:
[1]:
time : 2024-02-15_14:32:03
host : e7bcf3a85e2c
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 121548)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2024-02-15_14:32:03
host : e7bcf3a85e2c
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 121549)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
time : 2024-02-15_14:32:03
host : e7bcf3a85e2c
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 121547)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Corrupted data

While processing the data with the provided scripts, I found one 0 KB file: 3dspeaker/train/3D_SPK_00014/3D_SPK_00014_008_Device06_Distance08_Dialect00.wav

Inference acceleration

When applying the speaker-classification module to hundreds of millions of utterances, how should VAD and embedding extraction be batched for efficient inference?

Thanks in advance for your reply and suggestions.

Windows

Is there a supported way to set up the training environment on Windows?

Questions about the speech_eres2net_sv_zh-cn_16k-common pretrained model

1. The model is described as trained on 200k speakers, but the 3D-Speaker corpus contains only 10,000 speakers. Was additional data used?
2. I used this model to extract embeddings for the CNCeleb test and enrollment sets, then computed EER with the project's compute_score_metrics.py. I get 4.08, noticeably higher than the reported 2.8. Is that expected?

About the new 250k ERes2Net model

Hello,
First of all, thank you very much for the models and code you have contributed on ModelScope.
I saw that ModelScope recently released a 250k ERes2Net model, "speech_eres2net_base_250k_sv_zh-cn_16k-common".
Below is the code I use to run inference with it locally:

model_id=damo/speech_eres2net_base_250k_sv_zh-cn_16k-common
python speakerlab/bin/infer_sv.py --model_id $model_id --wavs $wav_path

However, I found that this model is missing some required configuration entries, e.g.:

ERes2Net_Large_3D_Speaker = {
    'obj': 'speakerlab.models.eres2net.ResNet.ERes2Net',
    'args': {
        'feat_dim': 80,
        'embedding_size': 512,
        'm_channels': 64,
    },
}

supports = {...}

I would appreciate your help. Thank you very much!

Missing transcripts?

I read the FAQ on the page, but some transcripts still seem to be missing; for example, speaker 3D_SPK_00001 does not appear in transcription/train_transcription or transcription/test_transcription.
Did I miss something, or are transcripts provided for only part of the corpus?

ValueError: need at least one array to stack

/opt/conda/lib/python3.10/site-packages/sklearn/cluster/_kmeans.py:1416: FutureWarning: The default value of n_init will change from 10 to 'auto' in 1.4. Set the value of n_init explicitly to suppress the warning
super()._check_params_vs_input(X, default_n_init=10)
Traceback (most recent call last):
File "/vepfs/code/MossFormer/3D-Speaker/egs/3dspeaker/speaker-diarization/local/cluster_and_postprocess_h5.py", line 93, in audio_only_func_getnums
labels = cluster(embeddings)
File "/vepfs/code/MossFormer/3D-Speaker/speakerlab/process/cluster.py", line 186, in call
labels = self.filter_minor_cluster(labels, X, self.min_cluster_size)
File "/vepfs/code/MossFormer/3D-Speaker/speakerlab/process/cluster.py", line 203, in filter_minor_cluster
major_center = np.stack([x[labels == i].mean(0)
File "/opt/conda/lib/python3.10/site-packages/numpy/core/shape_base.py", line 445, in stack
raise ValueError('need at least one array to stack')
ValueError: need at least one array to stack

During handling of the above exception, another exception occurred:

===========================
When running labels = cluster(embeddings)  # embeddings shape [14, 192]
the error above was raised.
What causes this, and how can it be fixed?
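
A minimal sketch of a possible guard, assuming filter_minor_cluster collects cluster centers only from clusters of at least min_cluster_size: with only 14 segments, every cluster may fall below that size, leaving np.stack with an empty list. Names follow the traceback above; this is an illustration, not the project's official fix.

import numpy as np

def filter_minor_cluster(labels, x, min_cluster_size):
    # labels: 1-D integer array of cluster ids; x: [num_segments, emb_dim]
    counts = np.bincount(labels)
    major_labels = [i for i, c in enumerate(counts) if c >= min_cluster_size]
    if not major_labels:
        # every cluster is "minor": keep labels unchanged instead of
        # calling np.stack on an empty list
        return labels
    major_center = np.stack([x[labels == i].mean(0) for i in major_labels])
    minor_mask = ~np.isin(labels, major_labels)
    if minor_mask.any():
        # reassign minor segments to the nearest major center
        # (dot-product similarity, assuming length-normalized embeddings)
        sims = x[minor_mask] @ major_center.T
        labels[minor_mask] = np.array(major_labels)[sims.argmax(1)]
    return labels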

sv-rdino - RuntimeError

I have been trying to train sv-rdino, and at runtime the code reported the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [2048]] is at version 3;

How should we solve this problem?

Classifier?

Hello, the eres2net model consists of two parts, an embedding extractor and a classifier, but only the embedding-extraction pretrained model is provided. Would you consider releasing the pretrained classifier as well?

Questions about SV verification results

Problem description

When running speaker verification, one recording contains speech while the other is almost silent (no speech). The score should fall below the 0.6 threshold, yet it comes out above 0.6. Is there a way to inspect the basis for the model's decision? And what is a reasonable value for the threshold in general?

Model used

damo/speech_campplus_sv_cn_cnceleb_16k

Result

{'score': 0.68535, 'text': 'yes'}
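
For context, embedding-based verification assumes both inputs contain speech; scores on silent or non-speech audio are not meaningful, so gating inputs with VAD before scoring is advisable. The decision threshold can also be passed explicitly. A sketch based on the pipeline usage shown later on this page (the wav paths are placeholders):

from modelscope.pipelines import pipeline

sv_pipeline = pipeline(
    task='speaker-verification',
    model='damo/speech_campplus_sv_cn_cnceleb_16k',
)
# thr sets the same-speaker decision threshold; raise it for stricter decisions.
result = sv_pipeline(['probe_a.wav', 'probe_b.wav'], thr=0.6)
print(result)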

Which version of FunASR does speaker-diarization require? Stage 6 produces no output

With the latest FunASR==1.0.4, I had to add model_revision and modify vad_pipeline(wpath), but Stage 6 then fails with the error below. Switching back to the older 0.8.8 does not work either.

Stage 1: Prepare input wavs...
--2024-01-30 18:07:32--  https://modelscope.cn/api/v1/models/damo/speech_eres2net-large_speaker-diarization_common/repo?Revision=master&FilePath=examples/example.wav
Resolving modelscope.cn (modelscope.cn)... 39.101.130.40
Connecting to modelscope.cn (modelscope.cn)|39.101.130.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 30720078 (29M) [application/octet-stream]
Saving to: 'examples/example.wav'

examples/example.wav                   100%[==========================================================================>]  29.30M  43.9MB/s  in 0.7s

2024-01-30 18:07:34 (43.9 MB/s) - 'examples/example.wav' saved [30720078/30720078]

--2024-01-30 18:07:34--  https://modelscope.cn/api/v1/models/damo/speech_eres2net-large_speaker-diarization_common/repo?Revision=master&FilePath=examples/example.rttm
Resolving modelscope.cn (modelscope.cn)... 39.101.130.40
Connecting to modelscope.cn (modelscope.cn)|39.101.130.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1329 (1.3K) [application/octet-stream]
Saving to: 'examples/example.rttm'

examples/example.rttm                  100%[==========================================================================>]   1.30K  --.-KB/s  in 0s

2024-01-30 18:07:34 (29.3 MB/s) - 'examples/example.rttm' saved [1329/1329]

Stage2: Do vad for input wavs...
2024-01-30 18:07:37,343 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:07:37,345 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:07:37,470 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
[2024-01-30 18:07:38,659] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Please install rotary_embedding_torch by: 
 pip install -U rotary_embedding_torch
Please install rotary_embedding_torch by: 
 pip install -U rotary_embedding_torch
Please install rotary_embedding_torch by: 
 pip install -U rotary_embedding_torch
Please install rotary_embedding_torch by: 
 pip install -U rotary_embedding_torch
2024-01-30 18:07:44,757 - modelscope - INFO - Use user-specified model revision: v2.0.4
2024-01-30 18:07:45,018 - modelscope - INFO - initiate model from /home/winner/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch
2024-01-30 18:07:45,018 - modelscope - INFO - initiate model from location /home/winner/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch.
2024-01-30 18:07:45,019 - modelscope - INFO - initialize model from /home/winner/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch
2024-01-30 18:07:49,164 - modelscope - WARNING - No preprocessor field found in cfg.
2024-01-30 18:07:49,164 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2024-01-30 18:07:49,164 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': '/home/winner/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch'}. trying to build by task and model information.
2024-01-30 18:07:49,164 - modelscope - WARNING - No preprocessor key ('funasr', 'voice-activity-detection') found in PREPROCESSOR_MAP, skip building preprocessor.
[INFO]: Start computing VAD...
rtf_avg: 0.225: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00,  3.69s/it]
rtf_avg: 594.604: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:11<00:00, 11.90s/it]
[INFO]: VAD json is prepared in exp/json/vad.json
Stage3: Prepare subsegments info...
[INFO]: Generate sub-segmetns...
[INFO]: Subsegments json is prepared in exp/json/subseg.json
Stage4: Extract speaker embeddings...
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
2024-01-30 18:08:21,239 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:08:21,241 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:08:21,262 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:08:21,264 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:08:21,274 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:08:21,275 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:08:21,362 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:08:21,363 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:08:21,382 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:08:21,384 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:08:21,386 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:08:21,388 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:08:21,394 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
2024-01-30 18:08:21,414 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
2024-01-30 18:08:21,430 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
2024-01-30 18:08:21,486 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
2024-01-30 18:08:21,502 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
2024-01-30 18:08:21,510 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
2024-01-30 18:08:21,716 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:08:21,718 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:08:21,829 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
2024-01-30 18:08:21,835 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2024-01-30 18:08:21,837 - modelscope - INFO - Loading ast index from /home/winner/.cache/modelscope/ast_indexer
2024-01-30 18:08:21,968 - modelscope - INFO - Loading done! Current index file version is 1.11.1, with md5 e4ea8cecd8079cde83f512df2bae21a7 and a total number of 956 components indexed
[2024-01-30 18:08:22,719] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 18:08:22,719] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 18:08:22,743] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 18:08:22,763] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 18:08:22,797] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 18:08:22,825] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 18:08:23,048] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-30 18:08:23,275] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-01-30 18:08:32,879 - modelscope - INFO - Use user-specified model revision: v1.0.0
WARNING: The number of threads exceeds the number of files
WARNING: The number of threads exceeds the number of files
WARNING: The number of threads exceeds the number of files
[INFO] Start computing embeddings...
[INFO] Start computing embeddings...
WARNING: The number of threads exceeds the number of files
WARNING: The number of threads exceeds the number of files
[WARNING] Embeddings has been saved previously. Skip it.
[WARNING] Embeddings has been saved previously. Skip it.
WARNING: The number of threads exceeds the number of files
Stage5: Perform clustering and output sys rttms...
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
[INFO] Start clustering...
[INFO] Start clustering...
[INFO] Start clustering...
WARNING: The number of threads exceeds the number of files
WARNING: The number of threads exceeds the number of files
WARNING: The number of threads exceeds the number of files
WARNING: The number of threads exceeds the number of files
WARNING: The number of threads exceeds the number of files
/home/winner/anaconda3/envs/py38-pt200/lib/python3.8/site-packages/sklearn/cluster/_kmeans.py:870: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
  warnings.warn(
/home/winner/anaconda3/envs/py38-pt200/lib/python3.8/site-packages/sklearn/cluster/_kmeans.py:870: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
  warnings.warn(
/home/winner/anaconda3/envs/py38-pt200/lib/python3.8/site-packages/sklearn/cluster/_kmeans.py:870: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
  warnings.warn(
Stage6: Get the final metrics...
Computing DER...
2024-01-30 18:08:53,245 - INFO: Concatenating individual RTTM files...
2024-01-30 18:08:53,285 - INFO: MS: 2.069159, FA: 0.203668, SER: 0.000000, DER: 2.272828
Computing ACC...
error,there is no fileid_sys in ref rttm: output
seg pur error,there is no fileid_sys in ref rttm: %s output
eval_elems_seg error,there is no fileid_sys in ref rttm: %s output
All metrics have been done.

About VoxCeleb DINO

I have recently been revisiting your project. I trained DINO on VoxCeleb2, but the EER is only around 14% after the first few epochs. I am not sure whether this is normal; could you share a copy of your training log? Thank you very much.


Example cannot run

Stage5: Get the final metrics...
Refrttm.list is not detected. Can't calculate the result

Low GPU Training speed of CAM++?

Hello, thank you for open-sourcing the CAM++ model. The results are impressive!

I tried to train CAM++ but found it slightly slower than ResNet34; the same training config is used for both models (2x A100).
Interestingly, after exporting both models to ONNX and running them with onnxruntime on CPU, CAM++ is still about 3 times faster than ResNet34 (about 1/3 the RTF), consistent with the conclusion in your recent PR from 2023-04-20.

My question is: do you also observe that CAM++ trains more slowly than ResNet34, and how do you explain it? Lower inference RTF on CPU but lower training speed on GPU?

Inconsistent Performance and Loss when Resuming Training

Thank you for your excellent work. 🙂

We have observed that whenever we resume training with a different number of epochs after training completion, the loaded historical model exhibits significantly lower accuracy compared to the corresponding epoch during the original training. For instance, when loading a model trained for 100 epochs, its performance is only comparable to that of a model trained for 30 epochs.

This inconsistency in performance after resuming training poses a challenge for us to continue training from a checkpoint and obtain the desired results.

(training-curve screenshots attached in the original issue)

Transcription

Some audio clips in 3D-Speaker have no corresponding transcripts.

Fine-tuning

Hello, after training DINO I would like to fine-tune with labeled data. How do I load the previously trained .pth model?
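
A minimal sketch of loading a previously trained checkpoint before fine-tuning (the path and the key layout inside the checkpoint are assumptions; inspect your own .pth to see whether the weights sit at the top level or under a key such as 'model'):

import torch

# 'model' here stands for the embedding backbone you are fine-tuning
# (e.g. the same network class used during DINO training); constructing it
# is project-specific and omitted.
checkpoint = torch.load('exp/dino/models/your_checkpoint.pth', map_location='cpu')
state_dict = checkpoint.get('model', checkpoint) if isinstance(checkpoint, dict) else checkpoint
# strict=False tolerates extra heads (e.g. the DINO projection head) that
# the fine-tuning model does not have.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print('missing keys:', missing)
print('unexpected keys:', unexpected)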

DDP WARNING

Hello, and thank you very much for adding ECAPA-TDNN. While training the VoxCeleb ecapatdnn recipe in egs, I ran into the warning below and do not know what causes it.
(screenshot of the warning attached in the original issue)

Error loading the ERes2Net model

When loading the speech_eres2net_sv_zh-cn_16k-common model with torch.load, I get _pickle.UnpicklingError: invalid load key, '\x08'. Have you run into this before? Environment: Python 3.10.9, torch 1.12.1.
The same code loads the speech_campplus_sv_zh-cn_16k-common model without any problem.
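
"invalid load key" from torch.load usually means the file on disk is not a valid PyTorch checkpoint, for example an incomplete download or a Git LFS pointer file. A sketch of re-fetching the weights through the ModelScope hub before loading (the checkpoint filename inside the model repo is an assumption; list the downloaded directory to find the real one):

import os
import torch
from modelscope.hub.snapshot_download import snapshot_download

model_dir = snapshot_download('iic/speech_eres2net_sv_zh-cn_16k-common')
print(os.listdir(model_dir))  # locate the actual checkpoint file
ckpt_path = os.path.join(model_dir, 'pretrained_eres2net.ckpt')  # hypothetical name
state = torch.load(ckpt_path, map_location='cpu')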

Methods for fine-tuning of pretrained models in modelscope

Hello, thank you for the wonderful repository! It has really helped.
Currently, our team is trying to fine-tune the ERes2Net-200k model published on ModelScope using a large amount of speech data. Since I was not able to fine-tune it properly, I suspect several configuration parameters need to be modified for this task. Could you please share those details? If the fine-tuning succeeds with good results, I will share the methodology with the community.

EResNet result on VoxCeleb is not comparable

I ran the exact same script for the EResNet experiment on VoxCeleb. The EER and minDCF I got are 1.0105 and 0.1146, which do not match the paper. The only difference is that I trained the model on 4 A100 machines, but I doubt that is the reason. Could you please provide the train.log and train_epoch.log files?

I also noticed that in prepare_data_csv.csv the default segment duration is 4 seconds, but in conf/eres2net.yaml it is 3 seconds. May I ask why?

Error occurred during "bash run.sh" for speaker diarization

Hi, my name is Nathan. I am trying to run 3D-Speaker to obtain an RTTM from a pretrained model on ModelScope,
but I get the error below.

(3D-Speaker) [asr@0419bb3cf325 speaker-diarization]$ bash run.sh
Stage 1: Prepare input wavs...
--2024-02-05 09:07:39-- https://modelscope.cn/api/v1/models/damo/speech_eres2net-large_speaker-diarization_common/repo?Revision=master&FilePath=examples/2speakers_example.wav
Resolving modelscope.cn (modelscope.cn)... 39.101.130.40
Connecting to modelscope.cn (modelscope.cn)|39.101.130.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2528044 (2.4M) [application/octet-stream]
Saving to: 'examples/2speakers_example.wav'

100%[===========================================================================>] 2,528,044 831KB/s in 3.0s

2024-02-05 09:07:43 (831 KB/s) - 'examples/2speakers_example.wav' saved [2528044/2528044]

--2024-02-05 09:07:43-- https://modelscope.cn/api/v1/models/damo/speech_eres2net-large_speaker-diarization_common/repo?Revision=master&FilePath=examples/2speakers_example.rttm
Resolving modelscope.cn (modelscope.cn)... 39.101.130.40
Connecting to modelscope.cn (modelscope.cn)|39.101.130.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 380 [application/octet-stream]
Saving to: 'examples/2speakers_example.rttm'

100%[===========================================================================>] 380 --.-K/s in 0s

2024-02-05 09:07:44 (40.0 MB/s) - 'examples/2speakers_example.rttm' saved [380/380]

Stage2: Do vad for input wavs...
2024-02-05 09:07:46,885 - modelscope - INFO - PyTorch version 1.13.1 Found.
2024-02-05 09:07:46,886 - modelscope - INFO - Loading ast index from /home/asr/.cache/modelscope/ast_indexer
2024-02-05 09:07:47,056 - modelscope - INFO - Updating the files for the changes of local files, first time updating will take longer time! Please wait till updating done!
2024-02-05 09:07:47,083 - modelscope - INFO - AST-Scanning the path "/home/asr/miniconda3/envs/3D-Speaker/lib/python3.8/site-packages/modelscope" with the following sub folders ['models', 'metrics', 'pipelines', 'preprocessors', 'trainers', 'msdatasets', 'exporters']
2024-02-05 09:08:18,037 - modelscope - INFO - Scanning done! A number of 964 components indexed or updated! Time consumed 30.954344987869263s
2024-02-05 09:08:18,114 - modelscope - INFO - Loading done! Current index file version is 1.12.0, with md5 ccb085697b83dbefd09232fac3402a63 and a total number of 964 components indexed
Please install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
Please install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
Please Requires the ffmpeg CLI and ffmpeg-python package to be installed.
Please install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
Please install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
2024-02-05 09:08:22,477 - modelscope - WARNING - Model revision not specified, use revision: v2.0.4
2024-02-05 09:08:22,825 - modelscope - INFO - initiate model from /home/asr/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch
2024-02-05 09:08:22,826 - modelscope - INFO - initiate model from location /home/asr/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch.
2024-02-05 09:08:22,827 - modelscope - INFO - initialize model from /home/asr/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch
2024-02-05 09:08:22,874 - modelscope - WARNING - No preprocessor field found in cfg.
2024-02-05 09:08:22,875 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2024-02-05 09:08:22,875 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': '/home/asr/.cache/modelscope/hub/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch'}. trying to build by task and model information.
2024-02-05 09:08:22,875 - modelscope - WARNING - No preprocessor key ('funasr', 'voice-activity-detection') found in PREPROCESSOR_MAP, skip building preprocessor.
2024-02-05 09:08:22,876 - modelscope - INFO - cuda is not available, using cpu instead.
[INFO]: Start computing VAD...
rtf_avg: 0.043: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.22it/s]
Traceback (most recent call last):
File "local/voice_activity_detection.py", line 90, in <module>
main()
File "local/voice_activity_detection.py", line 71, in main
for vad_t in vad_time['text']:
TypeError: list indices must be integers or slices, not str

If I print vad_time, I get:
[{'key': 'rand_key_2yW4Acq9GFz6Y', 'value': [[5240, 29010], [29290, 37360], [37640, 67570], [67860, 78980]]}]

I do not understand where the 'text' field is supposed to come from.
Please check this problem.
Thank you.
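
Judging from the printed structure, newer FunASR versions return a list of {'key': ..., 'value': [[start_ms, end_ms], ...]} dicts instead of a dict with a 'text' field. A sketch of adapting the loop to the format shown above (field names taken from the printed output; exact return formats vary across FunASR versions):

# structure printed above
vad_time = [{'key': 'rand_key_2yW4Acq9GFz6Y',
             'value': [[5240, 29010], [29290, 37360], [37640, 67570], [67860, 78980]]}]

for item in vad_time:
    for start_ms, end_ms in item['value']:
        # segment boundaries are in milliseconds
        print(start_ms / 1000.0, end_ms / 1000.0)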

Preparing CNCeleb wavs

While preparing CNCeleb, I found that the flac2wav step has no flac2wav.py under local/. When I used the one from sv-ecapa instead, some FLAC files failed to convert to WAV.

On the selection of num_of_spk in speaker-diarization

The spectral clustering in speakerlab/process/cluster.py uses the following code to estimate the number of speakers:

lambda_gap_list = self.getEigenGaps(
                lambdas[self.min_num_spks - 1:self.max_num_spks + 1])
num_of_spk = np.argmax(lambda_gap_list) + self.min_num_spks

But other related projects use the following code to estimate the number of speakers:

num_spks = num_spks if num_spks is not None \
                else cp.argmax(cp.diff(eig_values[:max_num_spks + 1])) + 1
num_spks = max(num_spks, min_num_spks)

# another
lambda_gap_list = self.getEigenGaps(lambdas[1 : self.max_num_spkrs])

num_of_spk = (
    np.argmax(
        lambda_gap_list[
            : min(self.max_num_spkrs, len(lambda_gap_list))
        ]
    )
    if lambda_gap_list
    else 0
) + 2

I would like to know the theoretical basis for this design. If the amount of speech per speaker is uneven, for example when one speaker speaks very little, is this estimate still valid? Could you point to relevant references? Thank you in advance for your answer.
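
For reference, a self-contained sketch of the eigen-gap heuristic that the first snippet implements: the estimated speaker count is where the gap between consecutive Laplacian eigenvalues, restricted to [min_num_spks, max_num_spks], is largest. The affinity construction and bounds here are illustrative, not the project's exact pipeline.

import numpy as np

def estimate_num_spks(embeddings, min_num_spks=1, max_num_spks=10):
    # cosine affinity between length-normalized embeddings
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    affinity = np.clip(x @ x.T, 0.0, 1.0)
    # unnormalized graph Laplacian
    laplacian = np.diag(affinity.sum(axis=1)) - affinity
    eigvals = np.sort(np.linalg.eigvalsh(laplacian))
    # gaps between consecutive eigenvalues inside the allowed range
    gaps = np.diff(eigvals[min_num_spks - 1:max_num_spks + 1])
    return int(np.argmax(gaps)) + min_num_spks

When one speaker contributes very few segments, the corresponding eigen-gap can be small, which is one reason implementations differ in the index range they search over; whether the estimate stays reliable in that regime is exactly the question raised above.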

About the learning rate

After switching to your cosine schedule, I see the learning rate increase every epoch. Shouldn't it decay from 0.2 down to 0.00005 under a cosine schedule?
(learning-rate curve screenshot attached in the original issue)
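
One common explanation (an assumption here, since the scheduler configuration is not shown) is a warmup phase: many cosine schedules first ramp the learning rate up linearly for a number of steps or epochs before the cosine decay toward the minimum begins, so an increasing LR early in training can be expected. A minimal sketch of such a warmup-plus-cosine rule:

import math

def lr_at(step, total_steps, warmup_steps, max_lr=0.2, min_lr=5e-5):
    if step < warmup_steps:
        # linear warmup: the LR increases during the first warmup_steps
        return max_lr * (step + 1) / warmup_steps
    # cosine decay from max_lr down to min_lr over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

If the LR keeps rising well past the configured warmup length, the schedule is likely misconfigured, e.g. a warmup set in epochs but stepped per batch.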

About the result

I reproduced the code using the DINO framework with multi-crop (two 2 s local crops, one 3 s global crop) and ECAPA (512), without RDINO. The final result is 5.0. Is that a normal result?

The example compares whether two recordings come from the same person; how do I compare against an audio library containing many people?

from modelscope.pipelines import pipeline
sv_pipeline = pipeline(
    task='speaker-verification',
    model='damo/speech_campplus_sv_zh-cn_16k-common',
    model_revision='v1.0.0'
)
speaker1_a_wav = 'https://modelscope.cn/api/v1/models/damo/speech_campplus_sv_zh-cn_16k-common/repo?Revision=master&FilePath=examples/speaker1_a_cn_16k.wav'
speaker1_b_wav = 'https://modelscope.cn/api/v1/models/damo/speech_campplus_sv_zh-cn_16k-common/repo?Revision=master&FilePath=examples/speaker1_b_cn_16k.wav'
speaker2_a_wav = 'https://modelscope.cn/api/v1/models/damo/speech_campplus_sv_zh-cn_16k-common/repo?Revision=master&FilePath=examples/speaker2_a_cn_16k.wav'
# same-speaker recordings
result = sv_pipeline([speaker1_a_wav, speaker1_b_wav])
print(result)
# different-speaker recordings
result = sv_pipeline([speaker1_a_wav, speaker2_a_wav])
print(result)
# a custom score threshold can be set; the higher the threshold, the stricter the same-speaker decision
result = sv_pipeline([speaker1_a_wav, speaker2_a_wav], thr=0.31)
print(result)
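
A sketch of one way to scale this to a many-speaker library: extract one embedding per enrolled speaker (how you obtain embeddings is an assumption here, e.g. via speakerlab/bin/infer_sv.py or your own wrapper) and score a probe against all of them with cosine similarity:

import numpy as np

def best_match(probe_emb, gallery, thr=0.31):
    # gallery: dict mapping speaker_id -> 1-D numpy embedding
    probe = probe_emb / np.linalg.norm(probe_emb)
    best_id, best_score = None, -1.0
    for spk_id, emb in gallery.items():
        score = float(probe @ (emb / np.linalg.norm(emb)))  # cosine similarity
        if score > best_score:
            best_id, best_score = spk_id, score
    # below the threshold, report the probe as an unknown speaker
    return (best_id, best_score) if best_score >= thr else (None, best_score)

For very large galleries, stack the embeddings into one matrix and replace the Python loop with a single matrix-vector product (or an approximate nearest-neighbor index).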
