yufan-aslp / alimeeting

This project accompanies the recently launched ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT) and provides participants with baseline systems for speech recognition and speaker diarization in meeting scenarios.

Shell 30.46% Python 49.70% Perl 19.84%

m2met alimeeting aishell-4 asr speaker-diarization multi-speaker-asr challenge

alimeeting's Introduction

M2MeT challenge baseline -- AliMeeting

This project provides the baseline system recipes for the ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT). The challenge consists of two tracks: Automatic Speech Recognition (ASR) and Speaker Diarization. Detailed descriptions of each track can be found in its corresponding directory. The goal of this project is to simplify the training and evaluation procedures and to make it easy for participants to reproduce the baseline experiments and develop novel methods.

Setup

git clone https://github.com/yufan-aslp/AliMeeting.git

Introduction

Speech Recognition Track: Follow the detailed steps in ./asr.
Speaker Diarization Track: Follow the detailed steps in ./speaker.

General steps

Prepare the training data for the speaker diarization and ASR models.
Follow the running steps of the speaker diarization experiment to obtain the RTTM file. The RTTM file contains the voice activity detection (VAD) and speaker diarization results, which are used to compute the final Diarization Error Rate (DER).
For the ASR track, either single-speaker or multi-speaker ASR models can be trained. ASR systems are evaluated by Character Error Rate (CER).
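Both metrics are ratios of errors to the reference. As a minimal sketch (not the official scoring tools used by the baseline), the snippet below computes CER as the character-level Levenshtein distance divided by the number of reference characters, and DER from missed-speech, false-alarm and speaker-confusion durations that are assumed to be already measured. Ignoring whitespace in CER and passing the three DER error durations in directly are assumptions of this sketch; real diarization scoring also has to find the speaker mapping and scoring regions, which is skipped here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost for sub/del/ins)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of r
                cur[j - 1] + 1,          # insertion of h
                prev[j - 1] + (r != h),  # substitution, or match when r == h
            )
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate; whitespace is ignored (an assumption for Mandarin text)."""
    ref_chars = [c for c in ref if not c.isspace()]
    hyp_chars = [c for c in hyp if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def der(missed: float, false_alarm: float, confusion: float, total_ref_speech: float) -> float:
    """Diarization Error Rate from error durations (seconds) that were already measured."""
    return (missed + false_alarm + confusion) / total_ref_speech


if __name__ == "__main__":
    # One substitution (气 -> 汽) and one deletion (很): 2 edits over 6 reference characters.
    print(cer("今天天气很好", "今天天汽好"))
    # Errors can add up to more than the reference speech, so DER is not capped at 1.0.
    print(der(missed=99.0, false_alarm=3.0, confusion=0.0, total_ref_speech=100.0))
```

Because false alarms are counted against the reference speech duration, DER values above 100% are possible.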

Citation

If you use the challenge dataset or our baseline systems, please consider citing the following:

@inproceedings{Yu2022M2MeT,
  title={M2{M}e{T}: The {ICASSP} 2022 Multi-Channel Multi-Party Meeting Transcription Challenge},
  author={Yu, Fan and Zhang, Shiliang and Fu, Yihui and Xie, Lei and Zheng, Siqi and Du, Zhihao and Huang, Weilong and Guo, Pengcheng and Yan, Zhijie and Ma, Bin and Xu, Xin and Bu, Hui},
  booktitle={Proc. ICASSP},
  year={2022},
  organization={IEEE}
}

@inproceedings{Yu2022Summary,
  title={Summary On The {ICASSP} 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge},
  author={Yu, Fan and Zhang, Shiliang and Guo, Pengcheng and Fu, Yihui and Du, Zhihao and Zheng, Siqi and Huang, Weilong and Xie, Lei  and Tan, Zheng-Hua and Wang, DeLiang and Qian, Yanmin and Lee, Kong Aik and Yan, Zhijie and Ma, Bin and Xu, Xin and Bu, Hui},
  booktitle={Proc. ICASSP},
  year={2022},
  organization={IEEE}
}

Challenge introduction paper: M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge (https://arxiv.org/abs/2110.07393)

Challenge summary paper: Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge (https://arxiv.org/abs/2202.03647)

The AliMeeting data can be downloaded from https://www.openslr.org/119

The room configuration of the AliMeeting Train set can be downloaded from https://speech-lab-share-data.oss-cn-shanghai.aliyuncs.com/AliMeeting/AliMeeting_Trainset_Room.xlsx

M2MeT challenge CodaLab (open evaluation platform for the Eval and Test sets of both tracks): https://codalab.lisn.upsaclay.fr/competitions/?q=M2MeT

Organizing Committee

Lei Xie, AISHELL Foundation, China
Bin Ma, Principal Engineer at Alibaba, Singapore
DeLiang Wang, Professor, Ohio State University, USA
Zheng-Hua Tan, Professor, Aalborg University, Denmark
Kong Aik Lee, Senior Scientist, Institute for Infocomm Research, A*STAR, Singapore
Zhijie Yan, Director of Speech Lab at Alibaba, China
Yanmin Qian, Associate Professor, Shanghai Jiao Tong University, China
Hui Bu, CEO, AIShell Inc., China

Code license

Apache 2.0

alimeeting's Issues

/local/alimeeting_data_prep.sh: line 89: utils/filter_scp.pl: No such file or directory

Where is the directory “utils”? ...

Inference error on a custom file with 2 speakers

filenames: ['aaaa']
Finished the feature extracting (12921856, 2)

  0%|          | 0/174 [00:00<?, ?it/s]
  0%|          | 0/174 [00:00<?, ?it/s]
INFO:__main__:End:   Processing file aaaa: Elapsed: 1.611116647720337 seconds
Traceback (most recent call last):
  File "VBx/predict.py", line 176, in <module>
    fea = features.fbank_htk(seg, window, noverlap, fbank_mx, USEPOWER=True, ZMEANSOURCE=True)
  File "/hdd/saumya/AliMeeting/speaker/VBx/features.py", line 101, in fbank_htk
    x *= window
ValueError: operands could not be broadcast together with shapes (182,400,2) (400,) (182,400,2) 
# Accounting: time=5 threads=1
# Ended (code 1) at Thu Nov 18 16:16:36 IST 2021, elapsed time 5 seconds

Log path: data/test/dia_part/exp/extract_embedding.1.log
When running inference on the file 'aaaa.wav', the script fails at the embedding extraction step. Can someone help?
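The traceback points to a shape mismatch: the audio was read as a 2-column array (12921856, 2), so the framed segment keeps a trailing channel axis (182, 400, 2), while the analysis window is 1-D with 400 samples, and NumPy cannot broadcast the two. A minimal sketch of one workaround, assuming the feature code expects a mono signal: reduce the audio to a single channel (select one or average them) before framing. The 400-sample frame, 160-sample hop (25 ms / 10 ms at 16 kHz) and Hamming window below are illustrative assumptions, not necessarily what VBx/features.py uses, and where exactly to apply the reduction in VBx/predict.py has not been verified against the repository.

```python
import numpy as np


def to_mono(signal: np.ndarray, channel=None) -> np.ndarray:
    """Reduce a (num_samples, num_channels) array to 1-D.

    If `channel` is given, that channel is selected; otherwise channels are averaged.
    """
    if signal.ndim == 1:
        return signal
    if channel is not None:
        return signal[:, channel]
    return signal.mean(axis=1)


def frame_and_window(x: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    """Split a 1-D signal into overlapping frames and apply a Hamming window."""
    num_frames = 1 + (len(x) - frame_len) // hop
    # Row k holds the sample indices [k * hop, k * hop + frame_len).
    idx = hop * np.arange(num_frames)[:, None] + np.arange(frame_len)[None, :]
    frames = x[idx].astype(np.float64)
    frames *= np.hamming(frame_len)  # (num_frames, 400) * (400,) broadcasts fine
    return frames


if __name__ == "__main__":
    stereo = np.random.randn(16000, 2)  # 1 s of synthetic 2-channel audio at 16 kHz

    # Reproduce the error: frames with a trailing channel axis cannot take a 1-D window.
    frames_2ch = np.zeros((182, 400, 2))
    try:
        frames_2ch *= np.hamming(400)
    except ValueError as err:
        print("multi-channel frames:", err)

    frames = frame_and_window(to_mono(stereo, channel=0))
    print(frames.shape)
```

Selecting a single channel keeps the signal closest to one real microphone, while averaging mixes all channels; which one suits the baseline models better is not established here.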

Could you please share the room configuration of the 13 recording rooms?

Could you please share the room configuration of the 13 rooms mentioned in the challenge description paper?
It has not been released on OpenSLR.
https://www.openslr.org/119

Thank you!

About dev set information

In Table 1 of the paper, is the average overlap ratio 34.76% or 34.20%?
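The overlap ratio can be recomputed from the reference speaker turns. A minimal sketch, assuming the ratio is defined as the time during which two or more speakers are active divided by the time during which at least one speaker is active (the paper's exact definition should be checked). Averaging per-session ratios and pooling durations over the whole set generally give different values, so the sketch computes both; the two toy sessions are hypothetical.

```python
def overlap_stats(turns):
    """Return (speech_time, overlap_time) for a list of (start, end) turns in seconds.

    speech_time: time during which at least one speaker is active.
    overlap_time: time during which two or more speakers are active.
    """
    events = []
    for start, end in turns:
        events.append((start, 1))   # a speaker becomes active
        events.append((end, -1))    # a speaker stops
    events.sort()
    active, prev_t = 0, None
    speech = overlap = 0.0
    for t, delta in events:
        if prev_t is not None:
            span = t - prev_t
            if active >= 1:
                speech += span
            if active >= 2:
                overlap += span
        active += delta
        prev_t = t
    return speech, overlap


def overlap_ratio_pooled(sessions):
    """Pool speech and overlap durations over all sessions (dict: session id -> turns)."""
    totals = [overlap_stats(turns) for turns in sessions.values()]
    speech = sum(s for s, _ in totals)
    overlap = sum(o for _, o in totals)
    return overlap / speech if speech else 0.0


def overlap_ratio_per_session_mean(sessions):
    """Average the per-session ratios instead of pooling durations."""
    ratios = [o / s for s, o in map(overlap_stats, sessions.values()) if s > 0]
    return sum(ratios) / len(ratios) if ratios else 0.0


if __name__ == "__main__":
    sessions = {"A": [(0.0, 10.0), (5.0, 15.0)], "B": [(0.0, 10.0), (20.0, 30.0)]}
    print(overlap_ratio_pooled(sessions))            # 5 / 35
    print(overlap_ratio_per_session_mean(sessions))  # (1/3 + 0) / 2
```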

Reproduce paper results

Hi,

If I follow the steps in the README, should I be able to reproduce the baseline results in the paper?

The results I currently get are the numbers shown in red.

Where can I get the AliMeeting data now that the competition is over?

Hello, I want to conduct speaker diarization (SD) research on AliMeeting, but I missed the competition. Where can I get the open-source AliMeeting data now?

Corpus release plans

I was wondering whether there are any plans to make the data publicly available now that the challenge is over. I have prepared a Lhotse recipe for the dataset here, but I am waiting to push it to master until the corpus is officially available.

Failed to reproduce results

I want to reproduce your work, but the results are very poor.

speaker_all_DER_overlaps_0.log

File                  DER     JER    B3-Precision    B3-Recall    B3-F1    GKT(ref, sys)    GKT(sys, ref)    H(ref|sys)    H(sys|ref)    MI    NMI
-----------------  ------  ------  --------------  -----------  -------  ---------------  ---------------  ------------  ------------  ----  -----
R8001_M8004_MS801  102.74   99.74            0.76         0.99     0.86             0.00             0.00          0.94          0.05  0.00   0.00
R8003_M8001_MS801  102.83   99.77            0.78         0.99     0.87             0.00             0.00          0.87          0.05  0.00   0.00
R8007_M8010_MS803  102.35   99.82            0.77         0.99     0.87             0.00             0.00          0.97          0.05  0.00   0.00
R8007_M8011_MS806  133.25   99.37            0.77         0.89     0.83             0.00             0.00          0.89          0.33  0.00   0.00
R8008_M8013_MS807  101.34   99.89            0.79         1.00     0.88             0.00             0.00          0.76          0.02  0.00   0.00
R8009_M8018_MS809  103.67   99.72            0.79         0.99     0.88             0.00             0.00          0.67          0.05  0.00   0.00
R8009_M8019_MS810  100.21   99.93            0.74         1.00     0.85             0.00             0.00          0.79          0.01  0.00   0.00
R8009_M8020_MS810  100.33  100.00            0.79         1.00     0.88             0.00             0.00          0.66          0.01  0.00   0.00
*** OVERALL ***    106.07   99.75            0.78         0.98     0.87             0.98             0.75          0.82          0.07  2.99   0.88

I think something is wrong, but I don't know what happened.

The log seems fine:

Prepare Alimeeting data
fix_data_dir.sh: kept all 8 utterances.
fix_data_dir.sh: old files are kept in ./data/Eval_Ali_far/sad_part/.backup
steps/make_mfcc.sh --nj 4 --cmd run.pl -q all.q --mem 4G --mfcc-config conf/mfcc_hires.conf ./data/Eval_Ali_far/sad_part ./data/Eval_Ali_far/sad_part/make_mfcc ./data/Eval_Ali_far/sad_part/feat/mfcc
utils/validate_data_dir.sh: Successfully validated data-directory ./data/Eval_Ali_far/sad_part
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_mfcc.sh: Succeeded creating MFCC features for sad_part
Do SAD
--nj 4 --stage 0 --cmd run.pl -q all.q --mem 4G ./data/Eval_Ali_far/sad_part exp/segmentation_1a/tdnn_stats_sad_1a/ ./data/Eval_Ali_far/sad_part/feat/mfcc ./data/Eval_Ali_far/sad_part/exp ./data/Eval_Ali_far/sad_part/sad
diff: ./data/Eval_Ali_far/sad_part/exp/final.raw: No such file or directory
./data/Eval_Ali_far/sad_part
steps/nnet3/compute_output.sh --nj 4 --cmd run.pl -q all.q --mem 4G --iter final --extra-left-context 0 --extra-right-context 0 --extra-left-context-initial -1 --extra-right-context-final -1 --frames-per-chunk 150 --apply-exp true --frame-subsampling-factor 3 ./data/Eval_Ali_far/sad_part ./data/Eval_Ali_far/sad_part/exp ./data/Eval_Ali_far/sad_part/exp/sad_sad
utils/data/get_utt2dur.sh: ./data/Eval_Ali_far/sad_part/utt2dur already exists with the expected length.  We won't recompute it.
utils/data/get_utt2dur.sh: ./data/Eval_Ali_far/sad_part/utt2dur already exists with the expected length.  We won't recompute it.
utils/data/subsegment_data_dir.sh: note: frame shift is 0.01 [affects feats.scp]
utils/data/get_utt2num_frames.sh: ./data/Eval_Ali_far/sad_part/utt2num_frames already present!
utils/data/subsegment_data_dir.sh: subsegmented data from ./data/Eval_Ali_far/sad_part to ./data/Eval_Ali_far/sad_part/sad_seg
local/segmentation/detect_speech_activity.sh: Created output segmented kaldi data directory in ./data/Eval_Ali_far/sad_part/sad_seg
Do Speaker Embedding Extractor
Collect 8 utt2segments in file ./data/Eval_Ali_far/sad_part/sad_seg/segments
Write 8 labels
success
Do the Speaker Embedding Cluster
Process textgrid to obtain rttm label
Get DER result
Loading speaker turns from reference RTTMs...
Loading speaker turns from system RTTMs...
WARNING: No universal evaluation map specified. Approximating from reference and speaker turn extents...
Trimming reference speaker turns to UEM scoring regions...
Trimming system speaker turns to UEM scoring regions...
Checking for overlapping reference speaker turns...
WARNING: Merging overlapping speaker turns. FILE: R8001_M8004_MS801, SPEAKER: 1 n_turns_pre: 277 n_turns_post: 276
WARNING: Merging overlapping speaker turns. FILE: R8007_M8011_MS806, SPEAKER: 1 n_turns_pre: 196 n_turns_post: 194
WARNING: Merging overlapping speaker turns. FILE: R8007_M8011_MS806, SPEAKER: 2 n_turns_pre: 170 n_turns_post: 169
WARNING: Merging overlapping speaker turns. FILE: R8009_M8018_MS809, SPEAKER: 1 n_turns_pre: 390 n_turns_post: 376
WARNING: Merging overlapping speaker turns. FILE: R8009_M8018_MS809, SPEAKER: 2 n_turns_pre: 275 n_turns_post: 268
WARNING: Merging overlapping speaker turns. FILE: R8009_M8019_MS810, SPEAKER: 1 n_turns_pre: 493 n_turns_post: 458
WARNING: Merging overlapping speaker turns. FILE: R8009_M8019_MS810, SPEAKER: 2 n_turns_pre: 466 n_turns_post: 442
WARNING: Merging overlapping speaker turns. FILE: R8009_M8020_MS810, SPEAKER: 1 n_turns_pre: 509 n_turns_post: 487
WARNING: Merging overlapping speaker turns. FILE: R8009_M8020_MS810, SPEAKER: 2 n_turns_pre: 365 n_turns_post: 345
Checking for overlapping system speaker turns...
Scoring...
Loading speaker turns from reference RTTMs...
Loading speaker turns from system RTTMs...
WARNING: No universal evaluation map specified. Approximating from reference and speaker turn extents...
Trimming reference speaker turns to UEM scoring regions...
Trimming system speaker turns to UEM scoring regions...
Checking for overlapping reference speaker turns...
WARNING: Merging overlapping speaker turns. FILE: R8009_M8018_MS809, SPEAKER: 1 n_turns_pre: 390 n_turns_post: 376
WARNING: Merging overlapping speaker turns. FILE: R8009_M8018_MS809, SPEAKER: 2 n_turns_pre: 275 n_turns_post: 268
WARNING: Merging overlapping speaker turns. FILE: R8009_M8019_MS810, SPEAKER: 1 n_turns_pre: 493 n_turns_post: 458
WARNING: Merging overlapping speaker turns. FILE: R8009_M8019_MS810, SPEAKER: 2 n_turns_pre: 466 n_turns_post: 442
WARNING: Merging overlapping speaker turns. FILE: R8009_M8020_MS810, SPEAKER: 1 n_turns_pre: 509 n_turns_post: 487
WARNING: Merging overlapping speaker turns. FILE: R8009_M8020_MS810, SPEAKER: 2 n_turns_pre: 365 n_turns_post: 345
Checking for overlapping system speaker turns...
Scoring...
Loading speaker turns from reference RTTMs...
Loading speaker turns from system RTTMs...
WARNING: No universal evaluation map specified. Approximating from reference and speaker turn extents...
Trimming reference speaker turns to UEM scoring regions...
Trimming system speaker turns to UEM scoring regions...
Checking for overlapping reference speaker turns...
Checking for overlapping system speaker turns...
Scoring...
Loading speaker turns from reference RTTMs...
Loading speaker turns from system RTTMs...
WARNING: No universal evaluation map specified. Approximating from reference and speaker turn extents...
Trimming reference speaker turns to UEM scoring regions...
Trimming system speaker turns to UEM scoring regions...
Checking for overlapping reference speaker turns...
WARNING: Merging overlapping speaker turns. FILE: R8001_M8004_MS801, SPEAKER: 1 n_turns_pre: 277 n_turns_post: 276
WARNING: Merging overlapping speaker turns. FILE: R8007_M8011_MS806, SPEAKER: 1 n_turns_pre: 196 n_turns_post: 194
WARNING: Merging overlapping speaker turns. FILE: R8007_M8011_MS806, SPEAKER: 2 n_turns_pre: 170 n_turns_post: 169
Checking for overlapping system speaker turns...
Scoring...
Loading speaker turns from reference RTTMs...
Loading speaker turns from system RTTMs...
WARNING: No universal evaluation map specified. Approximating from reference and speaker turn extents...
Trimming reference speaker turns to UEM scoring regions...
Trimming system speaker turns to UEM scoring regions...
Checking for overlapping reference speaker turns...
WARNING: Merging overlapping speaker turns. FILE: R8001_M8004_MS801, SPEAKER: 1 n_turns_pre: 277 n_turns_post: 276
WARNING: Merging overlapping speaker turns. FILE: R8007_M8011_MS806, SPEAKER: 1 n_turns_pre: 196 n_turns_post: 194
WARNING: Merging overlapping speaker turns. FILE: R8007_M8011_MS806, SPEAKER: 2 n_turns_pre: 170 n_turns_post: 169
WARNING: Merging overlapping speaker turns. FILE: R8009_M8018_MS809, SPEAKER: 1 n_turns_pre: 390 n_turns_post: 376
WARNING: Merging overlapping speaker turns. FILE: R8009_M8018_MS809, SPEAKER: 2 n_turns_pre: 275 n_turns_post: 268
WARNING: Merging overlapping speaker turns. FILE: R8009_M8019_MS810, SPEAKER: 1 n_turns_pre: 493 n_turns_post: 458
WARNING: Merging overlapping speaker turns. FILE: R8009_M8019_MS810, SPEAKER: 2 n_turns_pre: 466 n_turns_post: 442
WARNING: Merging overlapping speaker turns. FILE: R8009_M8020_MS810, SPEAKER: 1 n_turns_pre: 509 n_turns_post: 487
WARNING: Merging overlapping speaker turns. FILE: R8009_M8020_MS810, SPEAKER: 2 n_turns_pre: 365 n_turns_post: 345
Checking for overlapping system speaker turns...
Scoring...
Loading speaker turns from reference RTTMs...
Loading speaker turns from system RTTMs...
WARNING: No universal evaluation map specified. Approximating from reference and speaker turn extents...
Trimming reference speaker turns to UEM scoring regions...
Trimming system speaker turns to UEM scoring regions...
Checking for overlapping reference speaker turns...
WARNING: Merging overlapping speaker turns. FILE: R8009_M8018_MS809, SPEAKER: 1 n_turns_pre: 390 n_turns_post: 376
WARNING: Merging overlapping speaker turns. FILE: R8009_M8018_MS809, SPEAKER: 2 n_turns_pre: 275 n_turns_post: 268
WARNING: Merging overlapping speaker turns. FILE: R8009_M8019_MS810, SPEAKER: 1 n_turns_pre: 493 n_turns_post: 458
WARNING: Merging overlapping speaker turns. FILE: R8009_M8019_MS810, SPEAKER: 2 n_turns_pre: 466 n_turns_post: 442
WARNING: Merging overlapping speaker turns. FILE: R8009_M8020_MS810, SPEAKER: 1 n_turns_pre: 509 n_turns_post: 487
WARNING: Merging overlapping speaker turns. FILE: R8009_M8020_MS810, SPEAKER: 2 n_turns_pre: 365 n_turns_post: 345
Checking for overlapping system speaker turns...
Scoring...
Loading speaker turns from reference RTTMs...
Loading speaker turns from system RTTMs...
WARNING: No universal evaluation map specified. Approximating from reference and speaker turn extents...
Trimming reference speaker turns to UEM scoring regions...
Trimming system speaker turns to UEM scoring regions...
Checking for overlapping reference speaker turns...
Checking for overlapping system speaker turns...
Scoring...
Loading speaker turns from reference RTTMs...
Loading speaker turns from system RTTMs...
WARNING: No universal evaluation map specified. Approximating from reference and speaker turn extents...
Trimming reference speaker turns to UEM scoring regions...
Trimming system speaker turns to UEM scoring regions...
Checking for overlapping reference speaker turns...
WARNING: Merging overlapping speaker turns. FILE: R8001_M8004_MS801, SPEAKER: 1 n_turns_pre: 277 n_turns_post: 276
WARNING: Merging overlapping speaker turns. FILE: R8007_M8011_MS806, SPEAKER: 1 n_turns_pre: 196 n_turns_post: 194
WARNING: Merging overlapping speaker turns. FILE: R8007_M8011_MS806, SPEAKER: 2 n_turns_pre: 170 n_turns_post: 169
Checking for overlapping system speaker turns...
Scoring...

When I look at the RTTM file, I find that it contains very few segments.

R8009_M8020_MS810.rttm

SPEAKER R8009_M8020_MS810 1 5220.040000 1.600000 <NA> <NA> 1 <NA> <NA>
SPEAKER R8009_M8020_MS810 1 9910.240000 1.120000 <NA> <NA> 1 <NA> <NA>
SPEAKER R8009_M8020_MS810 1 9943.990000 1.360000 <NA> <NA> 1 <NA> <NA>
SPEAKER R8009_M8020_MS810 1 10217.830000 1.270000 <NA> <NA> 1 <NA> <NA>
SPEAKER R8009_M8020_MS810 1 15154.510000 0.790000 <NA> <NA> 1 <NA> <NA>
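A quick sanity check is to summarize what an RTTM file contains per recording: the number of segments, the total speech duration, the distinct speaker labels, and the end time of the last segment, which can then be compared with the recording's actual duration. A minimal sketch, assuming standard 10-field RTTM SPEAKER lines like those above (type, file, channel, onset, duration, then speaker name in the eighth field):

```python
from collections import defaultdict


def summarize_rttm(lines):
    """Per-file segment count, total speech, speaker labels and last end time."""
    stats = defaultdict(
        lambda: {"segments": 0, "speech": 0.0, "speakers": set(), "last_end": 0.0}
    )
    for line in lines:
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue  # skip blank or non-SPEAKER lines
        file_id, onset, dur, spk = fields[1], float(fields[3]), float(fields[4]), fields[7]
        s = stats[file_id]
        s["segments"] += 1
        s["speech"] += dur
        s["speakers"].add(spk)
        s["last_end"] = max(s["last_end"], onset + dur)
    return dict(stats)


if __name__ == "__main__":
    sample = [
        "SPEAKER R8009_M8020_MS810 1 5220.040000 1.600000 <NA> <NA> 1 <NA> <NA>",
        "SPEAKER R8009_M8020_MS810 1 9910.240000 1.120000 <NA> <NA> 1 <NA> <NA>",
        "SPEAKER R8009_M8020_MS810 1 9943.990000 1.360000 <NA> <NA> 1 <NA> <NA>",
        "SPEAKER R8009_M8020_MS810 1 10217.830000 1.270000 <NA> <NA> 1 <NA> <NA>",
        "SPEAKER R8009_M8020_MS810 1 15154.510000 0.790000 <NA> <NA> 1 <NA> <NA>",
    ]
    for file_id, s in summarize_rttm(sample).items():
        print(file_id, s["segments"], f'{s["speech"]:.2f}s',
              sorted(s["speakers"]), f'{s["last_end"]:.2f}s')
```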

Then I checked the lab file.

R8009_M8020_MS810.lab

5220.04 5221.64 sp
9910.24 9911.36 sp
9943.99 9945.35 sp
10217.83 10219.1 sp
15154.51 15155.3 sp
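The RTTM and lab files above list the same five segments: each RTTM line takes the lab segment's start time as its onset and end minus start as its duration, and every segment is labeled as speaker 1. A minimal sketch of that correspondence, assuming lab lines of the form start end label and a hypothetical speaker_of callback that supplies one speaker label per segment; this is not the repository's own conversion script.

```python
def lab_to_rttm(lab_lines, file_id, speaker_of=lambda index, start, end: "1"):
    """Convert 'start end label' lab lines into RTTM SPEAKER lines.

    speaker_of is a hypothetical callback returning the speaker label of a segment;
    the default assigns every segment to speaker "1".
    """
    rttm = []
    for index, line in enumerate(lab_lines):
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines
        start, end = float(fields[0]), float(fields[1])
        spk = speaker_of(index, start, end)
        rttm.append(
            f"SPEAKER {file_id} 1 {start:.6f} {end - start:.6f} <NA> <NA> {spk} <NA> <NA>"
        )
    return rttm


if __name__ == "__main__":
    lab = ["5220.04 5221.64 sp", "9910.24 9911.36 sp"]
    for line in lab_to_rttm(lab, "R8009_M8020_MS810"):
        print(line)
```

With real diarization output, speaker_of would look up the cluster assigned to each segment; if every segment maps to the same label, as in the RTTM above, the clustering step did not separate any speakers.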

I think there may be a problem in the SAD stage.
However, I downloaded my SAD model from the provided path
and then moved the exp directory into the speaker directory,
following the usage instructions.

My segments file:

Could you help me?
If you need more information, please let me know.
Thank you very much.

Speaker Diarization Usage

Hi~

Stage 3 of your main usage instructions says to use scripts/segment_to_lab.sh to change the file format, but that file is empty.

Your code calls scripts/segment_to_lab.py directly instead.
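Kaldi segments files have lines of the form segment-id recording-id start end (times in seconds), and the lab file shown in the previous issue has lines of the form start end sp. A minimal sketch of that conversion, grouping segments by recording and sorting them by start time; it is an illustration under those assumptions, not the contents of scripts/segment_to_lab.py, and the constant label sp simply mirrors the example lab file.

```python
from collections import defaultdict
from pathlib import Path


def segments_to_lab(segment_lines, label="sp"):
    """Group Kaldi 'segments' lines by recording and format them as lab lines."""
    by_recording = defaultdict(list)
    for line in segment_lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        _, recording_id, start, end = fields
        by_recording[recording_id].append((float(start), float(end)))
    return {
        rec: [f"{start} {end} {label}" for start, end in sorted(segs)]
        for rec, segs in by_recording.items()
    }


def write_lab_files(labs, out_dir):
    """Write one <recording-id>.lab file per recording into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for rec, lines in labs.items():
        (out / f"{rec}.lab").write_text("\n".join(lines) + "\n")


if __name__ == "__main__":
    # Hypothetical segment IDs; only the recording ID and times matter here.
    segments = ["seg-b R8009_M8020_MS810 9910.24 9911.36",
                "seg-a R8009_M8020_MS810 5220.04 5221.64"]
    print(segments_to_lab(segments))
```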

I think the lab format conversion has completed, but I still run into a problem:

run.pl: 4 / 4 failed, log is in ./data/Eval_Ali_far/dia_part/exp/extract_embedding.*.log

INFO:__main__:Start: Processing file R8001_M8004_MS801:

filenames: ['R8001_M8004_MS801' 'R8003_M8001_MS801']
Finished the feature extracting (25181600, 8)
  0%|          | 0/251 [00:00<?, ?it/s]
  0%|          | 0/251 [00:00<?, ?it/s]
INFO:__main__:End:   Processing file R8001_M8004_MS801: Elapsed: 4.490773439407349 seconds
Traceback (most recent call last):
  File "/mnt/HDD/HDD2/DTDwind/vbx_new/AliMeeting/speaker/VBx/predict.py", line 176, in <module>
    fea = features.fbank_htk(seg, window, noverlap, fbank_mx, USEPOWER=True, ZMEANSOURCE=True)
  File "/mnt/HDD/HDD2/DTDwind/vbx_new/AliMeeting/speaker/VBx/features.py", line 101, in fbank_htk
    x *= window
ValueError: operands could not be broadcast together with shapes (322,400,8) (400,) (322,400,8)
# Accounting: time=7 threads=1
# Ended (code 1) at Fri Aug  5 17:28:33 CST 2022, elapsed time 7 seconds

Do you know what is happening, and do you have any suggestions?

Thanks!
