Coder Social home page Coder Social logo

neuralspeech's Introduction

NeuralSpeech

NeuralSpeech is a research project at Microsoft Research Asia, which focuses on neural network based speech processing, including automatic speech recognition (ASR), text-to-speech synthesis (TTS), spatial audio synthesis, video dubbing, etc.

Currently this repo covers several research work:

For more research in NeuralSpeech project, you can refer to this page: https://speechresearch.github.io/. We will release more research work in the future.

For our research on AI music, you can refer to our Muzic project: https://github.com/microsoft/muzic.

We are hiring!

We are hiring researchers on speech (speech synthesis, speech recognition, voice conversion, audio processing), natural language processing, and machine learning. Please contact Xu Tan ([email protected]) if you have interests.

Reference

If you find NeuralSpeech project useful in your work, you can cite the following papers:

  • [1] FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition, Yichong Leng, Xu Tan, Linchen Zhu, Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiang-Yang Li, Ed Lin and Tie-Yan Liu, NeurIPS 2021.
  • [2] FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition, Yichong Leng, Xu Tan, Rui Wang, Linchen Zhu, Jin Xu, Wenjie Liu, Linquan Liu, Tao Qin, Xiang-Yang Li, Ed Lin, Tie-Yan Liu, Findings of EMNLP 2021.
  • [3] SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition, Yichong Leng, Xu Tan, Wenjie Liu, Kaitao Song, Rui Wang, Xiang-Yang Li, Tao Qin, Edward Lin, Tie-Yan Liu, AAAI 2023.
  • [4] [MaskCorrect] Mask the Correct Tokens: An Embarrassingly Simple Approach for Error Correction, Kai Shen, Yichong Leng, Xu Tan, Siliang Tang, Yuan Zhang, Wenjie Liu, Edward Lin, EMNLP 2022.
  • [5] [CMatch] Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching, Wenxin Hou, Jindong Wang, Xu Tan, Tao Qin, Takahiro Shinozaki, INTERSPEECH 2021.
  • [6] [Adapter] Exploiting Adapters for Cross-lingual Low-resource Speech Recognition, Wenxin Hou, Han Zhu, Yidong Wang, Jindong Wang, Tao Qin, Renjun Xu, Takahiro Shinozaki. IEEE/ACM TASLP 2022.
  • [7] LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search, Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Jinzhu Li, Sheng Zhao, Enhong Chen and Tie-Yan Liu, ICASSP 2021.
  • [8] PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior, Sang-gil Lee, Heeseung Kim, Chaehun Shin, Xu Tan, Chang Liu, Qi Meng, Tao Qin, Wei Chen, Sungroh Yoon, Tie-Yan Liu, ICLR 2022.
  • [9] BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis, Yichong Leng, Zehua Chen, Junliang Guo, Haohe Liu, Jiawei Chen, Xu Tan, Danilo Mandic, Lei He, Xiang-Yang Li, Tao Qin, Sheng Zhao and Tie-Yan Liu, NeurIPS 2022.
  • [10] VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing, Yihan Wu, Junliang Guo, Xu Tan, Chen Zhang, Bohan Li, Ruihua Song, Lei He, Sheng Zhao, Arul Menezes, Jiang Bian, AAAI 2022.
  • [11] PromptTTS 2: Describing and Generating Voices with Text Prompt, Yichong Leng, Zhifang Guo, Kai Shen, Xu Tan, Zeqian Ju, Yanqing Liu, Yufei Liu, Dongchao Yang, Leying Zhang, Kaitao Song, Lei He, Xiang-Yang Li, Sheng Zhao, Tao Qin, Jiang Bian, ICLR 2024.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

neuralspeech's People

Contributors

yichongleng avatar l0sg avatar tan-xu avatar renqianluo avatar wyh2000 avatar microsoftopensource avatar houwenxin avatar xel-maker avatar jindongwang avatar tilakraj0308 avatar microsoft-github-policy-service[bot] avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.