- 🎓 I'm currently a Ph.D. student at Harbin Institute of Technology and a research intern at Microsoft Research Asia.
- 🌱 My research interests include self-supervised learning, speech and audio processing, and spoken language processing.
- 📄 My research highlights:
- [Nov 2023] VALL-E produced the AI audiobook of Impromptu: Amplifying Our Humanity Through AI with an “AI Reid” voice.
- [Apr 2023] VALL-E won the UNESCO Netexplo Innovation Award 2023 (top 10 out of over 3,000 innovations of the year).
- [Apr 2023] BEATs was accepted to ICML 2023 as an oral paper.
- [Mar 2023] VALL-E X, a cross-lingual version of VALL-E, can help anyone speak a foreign language in their own voice without an accent. See https://aka.ms/vallex for demos.
- [Jan 2023] VALL-E, a language-modeling approach to text-to-speech synthesis, achieves state-of-the-art zero-shot TTS performance and exhibits emergent in-context learning capabilities. See https://aka.ms/valle for demos.
- [Dec 2022] BEATs, an audio pre-training framework based on discrete label prediction, ranks 1st on the AudioSet, Balanced AudioSet, and ESC-50 leaderboards. We released the code and pre-trained models.
- [Nov 2022] WavLM is now available in TorchAudio. Try it out here.
- [Sep 2022] SpeechLM, a text-enhanced speech pre-training model, achieves a 16% relative WER reduction over data2vec with only 10K text sentences on the LibriSpeech speech recognition benchmark. We released the code and pre-trained models.
- [Sep 2022] WavLM was published in the IEEE Journal of Selected Topics in Signal Processing.
- [Jan 2022] WavLM ranks 1st on the VoxSRC 2021 speaker verification permanent leaderboard.
- [Dec 2021] A WavLM speaker verification demo is available on Hugging Face.
- [Nov 2021] WavLM code and pre-trained models are released here.
- [Oct 2021] WavLM ranks 1st on the SUPERB leaderboard.
- [Oct 2021] WavLM, a large-scale self-supervised pre-training framework for full-stack speech processing, achieves state-of-the-art performance on 19 tasks, including all 15 tasks of the SUPERB benchmark, the VoxCeleb1 speaker verification benchmark, the LibriCSS speech separation benchmark, the CALLHOME speech diarization benchmark, and the LibriSpeech speech recognition benchmark.
- [Oct 2021] Our ultra-fast continuous speech separation model is shipped in the Microsoft Conversation Transcription Service.
- [Dec 2020] Our continuous speech separation model is shipped in the Microsoft Conversation Transcription Service.
- [Oct 2020] The Microsoft speaker diarization system with conformer-based continuous speech separation ranked 1st in the VoxCeleb Speaker Recognition Challenge 2020.
- [Aug 2020] Continuous speech separation with Conformer achieves state-of-the-art performance on the LibriCSS speech separation benchmark. We released the code and pre-trained models. See demos here.
- [Apr 2020] RecAdam, my first first-author paper, achieves state-of-the-art performance on the GLUE benchmark. We released the code.
-
Name: Sanyuan Chen (陈三元)
Company: Meta
Bio: Research Scientist @ Meta FAIR
Location: New York