yufan-aslp / alimeeting

The project is associated with the recently launched ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT) and provides participants with baseline systems for speech recognition and speaker diarization in meeting scenarios.

Shell 30.46% Python 49.70% Perl 19.84%
m2met alimeeting aishell-4 asr speaker-diarization multi-speaker-asr challenge

alimeeting's Introduction

M2MeT challenge baseline -- AliMeeting

This project provides the baseline system recipes for the ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT). The challenge consists of two tracks: Automatic Speech Recognition (ASR) and Speaker Diarization. A detailed description of each track can be found in its corresponding directory. The goal of this project is to simplify the training and evaluation procedures so that participants can easily reproduce the baseline experiments and develop novel methods.

Setup

git clone https://github.com/yufan-aslp/AliMeeting.git

Introduction

General steps

  1. Prepare the training data for the speaker diarization and ASR models.
  2. Follow the running steps of the speaker diarization experiment to obtain the RTTM file. The RTTM file contains the voice activity detection (VAD) and speaker diarization results, which are used to compute the final Diarization Error Rate (DER).
  3. For the ASR track, either single-speaker or multi-speaker ASR models can be trained. ASR systems are evaluated by Character Error Rate (CER).
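As a rough illustration of the CER metric in step 3: CER is the character-level edit distance (substitutions + deletions + insertions) between the hypothesis and the reference, divided by the number of reference characters. The sketch below implements that definition in plain Python. It is not the challenge's official scoring script, and removing whitespace before scoring is an assumption made here for Mandarin text.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """CER = (S + D + I) / N, with N the number of reference characters.

    Assumption: whitespace is removed before scoring (typical for Mandarin).
    """
    ref_chars = "".join(ref.split())
    hyp_chars = "".join(hyp.split())
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    # One substituted character out of six -> CER = 1/6
    print(round(cer("今天天气很好", "今天天汽很好"), 4))  # 0.1667
```

For a whole test set, errors and reference lengths would normally be summed over all utterances before dividing, rather than averaging per-utterance CERs.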

Citation

If you use the challenge dataset or our baseline systems, please consider citing the following:

@inproceedings{Yu2022M2MeT,
  title={M2{M}e{T}: The {ICASSP} 2022 Multi-Channel Multi-Party Meeting Transcription Challenge},
  author={Yu, Fan and Zhang, Shiliang and Fu, Yihui and Xie, Lei and Zheng, Siqi and Du, Zhihao and Huang, Weilong and Guo, Pengcheng and Yan, Zhijie and Ma, Bin and Xu, Xin and Bu, Hui},
  booktitle={Proc. ICASSP},
  year={2022},
  organization={IEEE}
}

@inproceedings{Yu2022Summary,
  title={Summary On The {ICASSP} 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge},
  author={Yu, Fan and Zhang, Shiliang and Guo, Pengcheng and Fu, Yihui and Du, Zhihao and Zheng, Siqi and Huang, Weilong and Xie, Lei  and Tan, Zheng-Hua and Wang, DeLiang and Qian, Yanmin and Lee, Kong Aik and Yan, Zhijie and Ma, Bin and Xu, Xin and Bu, Hui},
  booktitle={Proc. ICASSP},
  year={2022},
  organization={IEEE}
}

Challenge introduction paper: M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge (https://arxiv.org/abs/2110.07393)

Challenge summary paper: Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge (https://arxiv.org/abs/2202.03647)

The AliMeeting data can be downloaded from https://www.openslr.org/119

The room configuration of the AliMeeting Train set can be downloaded from https://speech-lab-share-data.oss-cn-shanghai.aliyuncs.com/AliMeeting/AliMeeting_Trainset_Room.xlsx

M2MeT challenge CodaLab (open evaluation platform for the Eval and Test sets of both tracks): https://codalab.lisn.upsaclay.fr/competitions/?q=M2MeT

Code license

Apache 2.0

alimeeting's People

Contributors

hnluo, yufan-aslp

alimeeting's Issues

Inference error on a custom file with two speakers

filenames: ['aaaa']
Finished the feature extracting (12921856, 2)

  0%|          | 0/174 [00:00<?, ?it/s]
  0%|          | 0/174 [00:00<?, ?it/s]
INFO:__main__:End:   Processing file aaaa: Elapsed: 1.611116647720337 seconds
Traceback (most recent call last):
  File "VBx/predict.py", line 176, in <module>
    fea = features.fbank_htk(seg, window, noverlap, fbank_mx, USEPOWER=True, ZMEANSOURCE=True)
  File "/hdd/saumya/AliMeeting/speaker/VBx/features.py", line 101, in fbank_htk
    x *= window
ValueError: operands could not be broadcast together with shapes (182,400,2) (400,) (182,400,2) 
# Accounting: time=5 threads=1
# Ended (code 1) at Thu Nov 18 16:16:36 IST 2021, elapsed time 5 seconds

path: data/test/dia_part/exp/extract_embedding.1.log
When running inference on 'aaaa.wav', the script fails at the embedding-extraction step. Can someone help?
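The shapes in the traceback suggest that the audio reached `features.fbank_htk` as a 2-D (samples, channels) array: framing a 2-channel signal gives (frames, 400, 2), and NumPy cannot broadcast the (400,) window against a trailing axis of size 2. The same pattern appears in the later 8-channel report, (322, 400, 8). A likely workaround is to reduce the signal to one dimension before feature extraction. The sketch below reproduces the error and the fix with NumPy; `to_mono` and `frame_signal` are hypothetical helpers written for illustration, and the 400-sample window with a 160-sample shift (25 ms / 10 ms at 16 kHz) is an assumption, not taken from the VBx code.

```python
import numpy as np
from typing import Optional


def to_mono(signal: np.ndarray, channel: Optional[int] = None) -> np.ndarray:
    """Reduce a (num_samples, num_channels) array to a 1-D signal.

    If `channel` is given, keep only that channel; otherwise average all
    channels. A 1-D input is returned unchanged.
    """
    if signal.ndim == 1:
        return signal
    if channel is not None:
        return signal[:, channel]
    return signal.mean(axis=1)


def frame_signal(x: np.ndarray, win_len: int = 400, shift: int = 160) -> np.ndarray:
    """Cut a signal into overlapping frames along the first (time) axis.

    A 1-D input gives (num_frames, win_len); a (num_samples, C) input gives
    (num_frames, win_len, C), the shape pattern seen in the traceback.
    """
    num_frames = 1 + (len(x) - win_len) // shift
    idx = np.arange(win_len)[None, :] + shift * np.arange(num_frames)[:, None]
    return x[idx]


if __name__ == "__main__":
    stereo = np.random.default_rng(0).standard_normal((16000, 2))  # 1 s of fake 2-channel audio
    window = np.hamming(400)
    try:
        frame_signal(stereo) * window  # (98, 400, 2) * (400,) raises ValueError
    except ValueError as err:
        print("multi-channel input fails:", err)
    frames = frame_signal(to_mono(stereo, channel=0)) * window
    print("mono frames:", frames.shape)  # (98, 400)
```

Keeping one channel versus averaging is a design choice: for far-field microphone-array recordings, averaging channels without time alignment can smear the signal, so selecting a single channel (or a beamformed output) may be preferable.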

Reproduce paper results

Hi,

If I follow the steps in the README, should I be able to reproduce the baseline results in the paper?

The results I currently get are the numbers in red.

[image not available]

Corpus release plans

I was wondering if there are any plans to make the data publicly available, now that the challenge is over? I have prepared a Lhotse recipe for the dataset here but I am waiting to push it to master until the corpus is officially available.

Failed to reproduce results

I want to reproduce your work, but the results are very poor:

speaker_all_DER_overlaps_0.log

File                  DER     JER    B3-Precision    B3-Recall    B3-F1    GKT(ref, sys)    GKT(sys, ref)    H(ref|sys)    H(sys|ref)    MI    NMI
-----------------  ------  ------  --------------  -----------  -------  ---------------  ---------------  ------------  ------------  ----  -----
R8001_M8004_MS801  102.74   99.74            0.76         0.99     0.86             0.00             0.00          0.94          0.05  0.00   0.00
R8003_M8001_MS801  102.83   99.77            0.78         0.99     0.87             0.00             0.00          0.87          0.05  0.00   0.00
R8007_M8010_MS803  102.35   99.82            0.77         0.99     0.87             0.00             0.00          0.97          0.05  0.00   0.00
R8007_M8011_MS806  133.25   99.37            0.77         0.89     0.83             0.00             0.00          0.89          0.33  0.00   0.00
R8008_M8013_MS807  101.34   99.89            0.79         1.00     0.88             0.00             0.00          0.76          0.02  0.00   0.00
R8009_M8018_MS809  103.67   99.72            0.79         0.99     0.88             0.00             0.00          0.67          0.05  0.00   0.00
R8009_M8019_MS810  100.21   99.93            0.74         1.00     0.85             0.00             0.00          0.79          0.01  0.00   0.00
R8009_M8020_MS810  100.33  100.00            0.79         1.00     0.88             0.00             0.00          0.66          0.01  0.00   0.00
*** OVERALL ***    106.07   99.75            0.78         0.98     0.87             0.98             0.75          0.82          0.07  2.99   0.88

I think something is wrong, but I don't know what happened.
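For reference, the standard (NIST-style) definition of DER is given below, where $T_{\mathrm{FA}}$ is false-alarm speech time, $T_{\mathrm{miss}}$ is missed speech time, $T_{\mathrm{conf}}$ is speaker-confusion time, and $T_{\mathrm{ref}}$ is the total reference speech time. Because the false-alarm term is not bounded by the reference duration, DER can exceed 100%: a system that outputs almost no speech has $T_{\mathrm{miss}} \approx T_{\mathrm{ref}}$, which already gives a DER near 100% before any false alarm or confusion is added. This is only the general definition; it does not by itself identify what went wrong in this run.

```latex
\mathrm{DER} = \frac{T_{\mathrm{FA}} + T_{\mathrm{miss}} + T_{\mathrm{conf}}}{T_{\mathrm{ref}}}
```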

The log seems fine:

Prepare Alimeeting data
fix_data_dir.sh: kept all 8 utterances.
fix_data_dir.sh: old files are kept in ./data/Eval_Ali_far/sad_part/.backup
steps/make_mfcc.sh --nj 4 --cmd run.pl -q all.q --mem 4G --mfcc-config conf/mfcc_hires.conf ./data/Eval_Ali_far/sad_part ./data/Eval_Ali_far/sad_part/make_mfcc ./data/Eval_Ali_far/sad_part/feat/mfcc
utils/validate_data_dir.sh: Successfully validated data-directory ./data/Eval_Ali_far/sad_part
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_mfcc.sh: Succeeded creating MFCC features for sad_part
Do SAD
--nj 4 --stage 0 --cmd run.pl -q all.q --mem 4G ./data/Eval_Ali_far/sad_part exp/segmentation_1a/tdnn_stats_sad_1a/ ./data/Eval_Ali_far/sad_part/feat/mfcc ./data/Eval_Ali_far/sad_part/exp ./data/Eval_Ali_far/sad_part/sad
diff: ./data/Eval_Ali_far/sad_part/exp/final.raw: No such file or directory
./data/Eval_Ali_far/sad_part
steps/nnet3/compute_output.sh --nj 4 --cmd run.pl -q all.q --mem 4G --iter final --extra-left-context 0 --extra-right-context 0 --extra-left-context-initial -1 --extra-right-context-final -1 --frames-per-chunk 150 --apply-exp true --frame-subsampling-factor 3 ./data/Eval_Ali_far/sad_part ./data/Eval_Ali_far/sad_part/exp ./data/Eval_Ali_far/sad_part/exp/sad_sad
utils/data/get_utt2dur.sh: ./data/Eval_Ali_far/sad_part/utt2dur already exists with the expected length.  We won't recompute it.
utils/data/get_utt2dur.sh: ./data/Eval_Ali_far/sad_part/utt2dur already exists with the expected length.  We won't recompute it.
utils/data/subsegment_data_dir.sh: note: frame shift is 0.01 [affects feats.scp]
utils/data/get_utt2num_frames.sh: ./data/Eval_Ali_far/sad_part/utt2num_frames already present!
utils/data/subsegment_data_dir.sh: subsegmented data from ./data/Eval_Ali_far/sad_part to ./data/Eval_Ali_far/sad_part/sad_seg
local/segmentation/detect_speech_activity.sh: Created output segmented kaldi data directory in ./data/Eval_Ali_far/sad_part/sad_seg
Do Speaker Embedding Extractor
Collect 8 utt2segments in file ./data/Eval_Ali_far/sad_part/sad_seg/segments
Write 8 labels
success
Do the Speaker Embedding Cluster
Process textgrid to obtain rttm label
Get DER result
Loading speaker turns from reference RTTMs...
Loading speaker turns from system RTTMs...
WARNING: No universal evaluation map specified. Approximating from reference and speaker turn extents...
Trimming reference speaker turns to UEM scoring regions...
Trimming system speaker turns to UEM scoring regions...
Checking for overlapping reference speaker turns...
WARNING: Merging overlapping speaker turns. FILE: R8001_M8004_MS801, SPEAKER: 1 n_turns_pre: 277 n_turns_post: 276
WARNING: Merging overlapping speaker turns. FILE: R8007_M8011_MS806, SPEAKER: 1 n_turns_pre: 196 n_turns_post: 194
WARNING: Merging overlapping speaker turns. FILE: R8007_M8011_MS806, SPEAKER: 2 n_turns_pre: 170 n_turns_post: 169
WARNING: Merging overlapping speaker turns. FILE: R8009_M8018_MS809, SPEAKER: 1 n_turns_pre: 390 n_turns_post: 376
WARNING: Merging overlapping speaker turns. FILE: R8009_M8018_MS809, SPEAKER: 2 n_turns_pre: 275 n_turns_post: 268
WARNING: Merging overlapping speaker turns. FILE: R8009_M8019_MS810, SPEAKER: 1 n_turns_pre: 493 n_turns_post: 458
WARNING: Merging overlapping speaker turns. FILE: R8009_M8019_MS810, SPEAKER: 2 n_turns_pre: 466 n_turns_post: 442
WARNING: Merging overlapping speaker turns. FILE: R8009_M8020_MS810, SPEAKER: 1 n_turns_pre: 509 n_turns_post: 487
WARNING: Merging overlapping speaker turns. FILE: R8009_M8020_MS810, SPEAKER: 2 n_turns_pre: 365 n_turns_post: 345
Checking for overlapping system speaker turns...
Scoring...
Loading speaker turns from reference RTTMs...
Loading speaker turns from system RTTMs...
WARNING: No universal evaluation map specified. Approximating from reference and speaker turn extents...
Trimming reference speaker turns to UEM scoring regions...
Trimming system speaker turns to UEM scoring regions...
Checking for overlapping reference speaker turns...
WARNING: Merging overlapping speaker turns. FILE: R8009_M8018_MS809, SPEAKER: 1 n_turns_pre: 390 n_turns_post: 376
WARNING: Merging overlapping speaker turns. FILE: R8009_M8018_MS809, SPEAKER: 2 n_turns_pre: 275 n_turns_post: 268
WARNING: Merging overlapping speaker turns. FILE: R8009_M8019_MS810, SPEAKER: 1 n_turns_pre: 493 n_turns_post: 458
WARNING: Merging overlapping speaker turns. FILE: R8009_M8019_MS810, SPEAKER: 2 n_turns_pre: 466 n_turns_post: 442
WARNING: Merging overlapping speaker turns. FILE: R8009_M8020_MS810, SPEAKER: 1 n_turns_pre: 509 n_turns_post: 487
WARNING: Merging overlapping speaker turns. FILE: R8009_M8020_MS810, SPEAKER: 2 n_turns_pre: 365 n_turns_post: 345
Checking for overlapping system speaker turns...
Scoring...
Loading speaker turns from reference RTTMs...
Loading speaker turns from system RTTMs...
WARNING: No universal evaluation map specified. Approximating from reference and speaker turn extents...
Trimming reference speaker turns to UEM scoring regions...
Trimming system speaker turns to UEM scoring regions...
Checking for overlapping reference speaker turns...
Checking for overlapping system speaker turns...
Scoring...
Loading speaker turns from reference RTTMs...
Loading speaker turns from system RTTMs...
WARNING: No universal evaluation map specified. Approximating from reference and speaker turn extents...
Trimming reference speaker turns to UEM scoring regions...
Trimming system speaker turns to UEM scoring regions...
Checking for overlapping reference speaker turns...
WARNING: Merging overlapping speaker turns. FILE: R8001_M8004_MS801, SPEAKER: 1 n_turns_pre: 277 n_turns_post: 276
WARNING: Merging overlapping speaker turns. FILE: R8007_M8011_MS806, SPEAKER: 1 n_turns_pre: 196 n_turns_post: 194
WARNING: Merging overlapping speaker turns. FILE: R8007_M8011_MS806, SPEAKER: 2 n_turns_pre: 170 n_turns_post: 169
Checking for overlapping system speaker turns...
Scoring...
Loading speaker turns from reference RTTMs...
Loading speaker turns from system RTTMs...
WARNING: No universal evaluation map specified. Approximating from reference and speaker turn extents...
Trimming reference speaker turns to UEM scoring regions...
Trimming system speaker turns to UEM scoring regions...
Checking for overlapping reference speaker turns...
WARNING: Merging overlapping speaker turns. FILE: R8001_M8004_MS801, SPEAKER: 1 n_turns_pre: 277 n_turns_post: 276
WARNING: Merging overlapping speaker turns. FILE: R8007_M8011_MS806, SPEAKER: 1 n_turns_pre: 196 n_turns_post: 194
WARNING: Merging overlapping speaker turns. FILE: R8007_M8011_MS806, SPEAKER: 2 n_turns_pre: 170 n_turns_post: 169
WARNING: Merging overlapping speaker turns. FILE: R8009_M8018_MS809, SPEAKER: 1 n_turns_pre: 390 n_turns_post: 376
WARNING: Merging overlapping speaker turns. FILE: R8009_M8018_MS809, SPEAKER: 2 n_turns_pre: 275 n_turns_post: 268
WARNING: Merging overlapping speaker turns. FILE: R8009_M8019_MS810, SPEAKER: 1 n_turns_pre: 493 n_turns_post: 458
WARNING: Merging overlapping speaker turns. FILE: R8009_M8019_MS810, SPEAKER: 2 n_turns_pre: 466 n_turns_post: 442
WARNING: Merging overlapping speaker turns. FILE: R8009_M8020_MS810, SPEAKER: 1 n_turns_pre: 509 n_turns_post: 487
WARNING: Merging overlapping speaker turns. FILE: R8009_M8020_MS810, SPEAKER: 2 n_turns_pre: 365 n_turns_post: 345
Checking for overlapping system speaker turns...
Scoring...
Loading speaker turns from reference RTTMs...
Loading speaker turns from system RTTMs...
WARNING: No universal evaluation map specified. Approximating from reference and speaker turn extents...
Trimming reference speaker turns to UEM scoring regions...
Trimming system speaker turns to UEM scoring regions...
Checking for overlapping reference speaker turns...
WARNING: Merging overlapping speaker turns. FILE: R8009_M8018_MS809, SPEAKER: 1 n_turns_pre: 390 n_turns_post: 376
WARNING: Merging overlapping speaker turns. FILE: R8009_M8018_MS809, SPEAKER: 2 n_turns_pre: 275 n_turns_post: 268
WARNING: Merging overlapping speaker turns. FILE: R8009_M8019_MS810, SPEAKER: 1 n_turns_pre: 493 n_turns_post: 458
WARNING: Merging overlapping speaker turns. FILE: R8009_M8019_MS810, SPEAKER: 2 n_turns_pre: 466 n_turns_post: 442
WARNING: Merging overlapping speaker turns. FILE: R8009_M8020_MS810, SPEAKER: 1 n_turns_pre: 509 n_turns_post: 487
WARNING: Merging overlapping speaker turns. FILE: R8009_M8020_MS810, SPEAKER: 2 n_turns_pre: 365 n_turns_post: 345
Checking for overlapping system speaker turns...
Scoring...
Loading speaker turns from reference RTTMs...
Loading speaker turns from system RTTMs...
WARNING: No universal evaluation map specified. Approximating from reference and speaker turn extents...
Trimming reference speaker turns to UEM scoring regions...
Trimming system speaker turns to UEM scoring regions...
Checking for overlapping reference speaker turns...
Checking for overlapping system speaker turns...
Scoring...
Loading speaker turns from reference RTTMs...
Loading speaker turns from system RTTMs...
WARNING: No universal evaluation map specified. Approximating from reference and speaker turn extents...
Trimming reference speaker turns to UEM scoring regions...
Trimming system speaker turns to UEM scoring regions...
Checking for overlapping reference speaker turns...
WARNING: Merging overlapping speaker turns. FILE: R8001_M8004_MS801, SPEAKER: 1 n_turns_pre: 277 n_turns_post: 276
WARNING: Merging overlapping speaker turns. FILE: R8007_M8011_MS806, SPEAKER: 1 n_turns_pre: 196 n_turns_post: 194
WARNING: Merging overlapping speaker turns. FILE: R8007_M8011_MS806, SPEAKER: 2 n_turns_pre: 170 n_turns_post: 169
Checking for overlapping system speaker turns...
Scoring...

When I looked at the RTTM file, I found very few segments:

R8009_M8020_MS810.rttm

SPEAKER R8009_M8020_MS810 1 5220.040000 1.600000 <NA> <NA> 1 <NA> <NA>
SPEAKER R8009_M8020_MS810 1 9910.240000 1.120000 <NA> <NA> 1 <NA> <NA>
SPEAKER R8009_M8020_MS810 1 9943.990000 1.360000 <NA> <NA> 1 <NA> <NA>
SPEAKER R8009_M8020_MS810 1 10217.830000 1.270000 <NA> <NA> 1 <NA> <NA>
SPEAKER R8009_M8020_MS810 1 15154.510000 0.790000 <NA> <NA> 1 <NA> <NA>

Then I checked the lab file:

R8009_M8020_MS810.lab

5220.04 5221.64 sp
9910.24 9911.36 sp
9943.99 9945.35 sp
10217.83 10219.1 sp
15154.51 15155.3 sp

I suspect a problem in the SAD stage. However, I downloaded my SAD model from the provided path and then moved the exp directory into the speaker directory, as described in the usage instructions.
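One way to check whether the SAD output is implausibly sparse is to total the speech time in the lab file and compare it with the recording length. The sketch below assumes the `<start> <end> <label>` layout shown above, with times in seconds; `parse_lab` and `sad_summary` are hypothetical helpers, and the recording duration in the example is a placeholder, not a value from this data.

```python
from typing import Iterable, List, Tuple

Segment = Tuple[float, float, str]


def parse_lab(lines: Iterable[str]) -> List[Segment]:
    """Parse '<start> <end> <label>' lines (times in seconds), skipping blanks."""
    segments = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        segments.append((float(parts[0]), float(parts[1]), parts[2]))
    return segments


def sad_summary(segments: List[Segment], recording_duration: float) -> dict:
    """Total speech time, its share of the recording, and segments past the end."""
    speech = sum(end - start for start, end, _ in segments)
    return {
        "num_segments": len(segments),
        "speech_seconds": speech,
        "speech_ratio": speech / recording_duration,
        "beyond_end": [seg for seg in segments if seg[1] > recording_duration],
    }


if __name__ == "__main__":
    lab = """5220.04 5221.64 sp
9910.24 9911.36 sp
9943.99 9945.35 sp
10217.83 10219.1 sp
15154.51 15155.3 sp"""
    # Placeholder duration; in practice read the real value from utt2dur.
    summary = sad_summary(parse_lab(lab.splitlines()), recording_duration=20000.0)
    print(summary["num_segments"], round(summary["speech_seconds"], 2))  # 5 6.14
```

For the five segments listed above, the total speech time is about 6.1 seconds. In a Kaldi data directory the recording duration can be read from `utt2dur`, which the log shows already exists for `sad_part`.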

My segments file.

Could you help me?
If you need more information, please contact me.
Thank you very much.

Speaker Diarization Usage

Hi~

Stage 3 of your main script says to use scripts/segment_to_lab.sh to change the file format, but that file is empty.

Your code, however, calls scripts/segment_to_lab.py directly.
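For reference, a Kaldi `segments` file has one line per segment, `<utterance-id> <recording-id> <start> <end>`, with times in seconds, while the lab files shown in the previous issue use `<start> <end> sp`. The conversion between the two can be sketched as below. This is a hypothetical stand-in written for illustration, not the repository's scripts/segment_to_lab.py, and the utterance and recording IDs in the example are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def segments_to_lab(lines: Iterable[str], label: str = "sp") -> Dict[str, List[str]]:
    """Convert Kaldi `segments` lines into per-recording lab lines.

    Input lines:  <utterance-id> <recording-id> <start> <end>
    Output lines: <start> <end> <label>, sorted by start time.
    """
    by_recording: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        _utt_id, rec_id, start, end = parts
        by_recording[rec_id].append((float(start), float(end)))
    return {
        rec_id: [f"{start} {end} {label}" for start, end in sorted(segs)]
        for rec_id, segs in by_recording.items()
    }


if __name__ == "__main__":
    segments = [
        "rec1-0001 rec1 9910.24 9911.36",  # hypothetical IDs
        "rec1-0000 rec1 5220.04 5221.64",
    ]
    for rec_id, lab_lines in segments_to_lab(segments).items():
        # A real script would write these to f"{rec_id}.lab"; printed here instead.
        print(rec_id, lab_lines)
```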

I believe the lab-format conversion itself completed, but I still run into a problem:

run.pl: 4 / 4 failed, log is in ./data/Eval_Ali_far/dia_part/exp/extract_embedding.*.log

Log in extract_embedding.1.log:

INFO:__main__:Start: Processing file R8001_M8004_MS801:

filenames: ['R8001_M8004_MS801' 'R8003_M8001_MS801']
Finished the feature extracting (25181600, 8)
  0%|          | 0/251 [00:00<?, ?it/s]
  0%|          | 0/251 [00:00<?, ?it/s]
INFO:__main__:End:   Processing file R8001_M8004_MS801: Elapsed: 4.490773439407349 seconds
Traceback (most recent call last):
  File "/mnt/HDD/HDD2/DTDwind/vbx_new/AliMeeting/speaker/VBx/predict.py", line 176, in <module>
    fea = features.fbank_htk(seg, window, noverlap, fbank_mx, USEPOWER=True, ZMEANSOURCE=True)
  File "/mnt/HDD/HDD2/DTDwind/vbx_new/AliMeeting/speaker/VBx/features.py", line 101, in fbank_htk
    x *= window
ValueError: operands could not be broadcast together with shapes (322,400,8) (400,) (322,400,8)
# Accounting: time=7 threads=1
# Ended (code 1) at Fri Aug  5 17:28:33 CST 2022, elapsed time 7 seconds

Do you know what happened, and do you have any suggestions?

Thanks!
