low_power_slt's Introduction

Low Power SLT

Low Power Computing for Speech and Language Technologies

Collection of resources designed for those interested in low power computing for speech and language technologies, e.g. low-power, energy-efficient, on-device approaches to speech and language processing.

Keywords: low-power, energy-efficient, speech processing, natural language processing

Related topics: on-device training, on-device inference, edge computing, federated learning, embedded systems, mobile devices, IoT, sustainability, open source, low cost, green computing, power efficiency, sustainable computing systems, tiny ml, private-by-design, language processing, speech synthesis, keyword spotting, natural language understanding, dialogue systems, language modelling, text processing

Table of Contents

  1. Background and Motivation
  2. In the News
  3. Getting Started
  4. Research Publications
  5. Journals and Conferences
  6. Events and Announcements
  7. Future Plans and Feedback

Background and Motivation

Research areas that aim to enhance the performance and capabilities of speech and language technologies that run on resource-constrained devices are attracting increasing attention.

The benefits of such methods are becoming increasingly important and include:

  • privacy
  • energy efficiency
  • real-time processing
  • reduced latency
  • offline functionality
  • edge intelligence
  • scalability and flexibility

In the News

These articles highlight the significance of low power and on-device computing in enabling efficient and privacy-preserving speech and language processing on mobile devices, edge devices, and IoT devices. They discuss the challenges, opportunities, and advancements in this field, motivating further research and development in energy-efficient algorithms, hardware optimizations, and edge AI technologies.

Getting Started

Hardware platforms

The following hardware platforms are commonly used for processing, training and inference in low-power SLT settings and are useful to know about.

Type       Examples
MCU        Espressif ESP32, Arduino Uno
CPU        Raspberry Pi 3/4 (CPU)
CPU + GPU  Nvidia Jetson Nano
GPU        Nvidia provides many GPUs, e.g. the RTX 3090 graphics card
TPU        Coral AI USB Accelerator featuring the Edge TPU

Software

For MCU devices

Designed to run on MCU devices:

  • TensorFlow Lite for Microcontrollers (TFMicro) framework supports deploying and running machine learning (ML) models on microcontroller devices, like the ESP32; models are trained elsewhere (for example in TensorFlow) and converted before deployment. Examples include:
    • keyword spotting (Speech Commands dataset)
    • handwritten digit recognition (MNIST dataset).
  • Espressif ESP-SR helps build speech applications for ESP32 and ESP32-S3 chips. Includes
  • Espressif ESP Skainet also provides intelligent voice assistant functionality for the ESP32-S3 chip, supporting:
  • Espressif Audio Development Framework (ESP-ADF) supports the development of audio applications for Espressif SoCs, with features like:
    • Music player or recorder, supporting various audio formats
    • Play music from various sources, including HTTP, SPIFFS, SDCARD
    • Integrate media services, like DLNA, VoIP
    • Internet Radio
    • Voice recognition and integration with online services such as Alexa, DuerOS etc.

For CPU and GPU devices

TensorFlow Lite (TFLite) for mobile and edge devices

  • Uses TensorFlow models converted into a smaller, more efficient ML format. Pre-trained models are available, and can be modified, or you can train your own TensorFlow models and convert them to TFLite format.
  • Examples include speech recognition, pose estimation, text classification, autocomplete text inputs, natural language question answering, smart reply chat suggestions, audio classification, and on-device training, plus more.
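
One common reason a converted model is smaller is that its weights can be stored as 8-bit integers rather than 32-bit floats. The sketch below illustrates that idea in plain Python with symmetric 8-bit quantization. It is not TFLite's converter: the function names and the weight values are made up for illustration.

```python
import struct


def quantize_int8(weights):
    """Symmetric 8-bit quantization: map each float to an integer code in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid dividing by zero
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only to illustrate the idea.
weights = [-0.82, -0.10, 0.0, 0.33, 0.91]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Storage cost: 4 bytes per float32 value versus 1 byte per int8 code.
float_bytes = len(struct.pack(f"{len(weights)}f", *weights))
int8_bytes = len(struct.pack(f"{len(codes)}b", *codes))
print(float_bytes, int8_bytes)  # prints "20 5"

# The rounding error per weight is at most half a quantization step.
worst_error = max(abs(w - r) for w, r in zip(weights, restored))
print(worst_error <= scale / 2 + 1e-12)
```

The fourfold size reduction comes at the cost of a small, bounded rounding error per weight, which is why accuracy is usually re-checked after conversion.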

Snowboy Hotword Detection

  • A DNN-based, customisable hotword and wake word detection toolkit that can run on a Raspberry Pi and allows you to train your own models

Vosk

  • Offline speech recognition API for Android, iOS, Raspberry Pi, and servers (GitHub)
  • Enables speech recognition for 20+ languages and dialects
  • Small models (50 MB) that provide continuous large vocabulary transcription, zero-latency response with streaming API, a reconfigurable vocabulary and speaker identification
  • Applications include chatbots, smart home applications, virtual assistants, lecture transcriptions, movie subtitles

Coqui

  • Provides a library for Text-to-Speech (speech synthesis) with pretrained models (TTS toolkit) (main focus)
  • Provides a library for Speech-to-Text (speech recognition) with pretrained models (STT models) (no longer maintained, shift towards Whisper)

Whisper

  • A general-purpose speech recognition model trained on a large dataset of audio
  • Multitasking model that can perform multilingual speech recognition, speech translation and language identification
  • Python and PyTorch are used to train and test models of various sizes (tiny, base, small, medium, large)

Mozilla DeepSpeech

  • An open source, embedded speech-to-text engine (offline, on-device)
  • Runs in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers
  • Based on the research paper 'Deep Speech: Scaling up end-to-end speech recognition' (here)

ESPNet

  • End-to-end speech processing toolkit
  • Covers end-to-end speech recognition, text-to-speech, speech translation, speech enhancement, speaker diarization, spoken language understanding, etc.
  • Uses PyTorch as a deep learning engine, follows Kaldi style data processing, feature extraction, and provides recipes for speech processing experiments

SpeechBrain

  • An open-source conversational AI toolkit, currently in beta
  • Aims to provide a single, flexible and user-friendly toolkit that can be used to easily develop state-of-the-art speech technologies
  • Includes systems for speech recognition, speaker recognition, speech enhancement, speech separation, language identification, multi-microphone signal processing, and more.

Kaldi

  • Toolkit for speech recognition written in C++, widely used and intended for use by speech recognition researchers/professionals
  • Aims to provide modern, flexible code that is easy to modify and extend

Further Open Source Systems

Willow & Willow Inference Server (WIS)

  • Willow is
  • The Willow Inference Server is a local, self-hosted and highly optimized language inference server that supports ASR/STT, TTS, and LLM tasks across WebRTC, REST and WS.
  • The project attracted much attention to the Espressif ESP32-S3 Box that it uses for the user interface

Rasa Open Source

  • A machine learning framework for building chat and voice-based AI assistants that make use of contextual information.
  • The platform connects to messaging channels and third-party systems through a set of APIs, so you can build contextual assistants on channels such as Facebook, Slack and Telegram.

Rhasspy

  • Rhasspy focuses on wakeword detection, speech recognition, and language understanding for smart home applications
  • Various options are available for the implementation of wake word detection, speech to text and intent recognition, etc.

Jasper

  • Open source platform supporting the development of always-on, voice-controlled applications
  • Uses PocketSphinx or Julius for STT.

ESP-Skainet

  • An offline voice assistant with wakeword engine and speech command recognition for up to 200 commands.
  • The library includes acoustic algorithms for speech enhancement, acoustic echo cancellation, voice activity detection, automatic gain control, noise suppression.
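
Voice activity detection, one of the functions listed above, can be illustrated with a simple energy threshold. The sketch below is a generic textbook approach in plain Python, not the algorithm ESP-Skainet uses. The frame length, threshold and synthetic test signal are assumptions chosen for illustration.

```python
import math


def frame_rms(samples, frame_len):
    """Split the signal into non-overlapping frames and return each frame's RMS energy."""
    starts = range(0, len(samples) - frame_len + 1, frame_len)
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in starts
    ]


def detect_speech(samples, frame_len=160, threshold=0.1):
    """Flag a frame as active when its RMS energy exceeds the (assumed) threshold."""
    return [rms > threshold for rms in frame_rms(samples, frame_len)]


# Synthetic input: two silent frames followed by two frames of a 440 Hz tone at 16 kHz.
sample_rate = 16000
silence = [0.0] * 320
tone = [0.5 * math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(320)]

flags = detect_speech(silence + tone)
print(flags)  # prints "[False, False, True, True]"
```

On a low-power device, a cheap check like this is typically used to keep heavier stages such as wake word detection or speech recognition idle until there is something to process.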

FedML-AI

  • Provides a research and production integrated edge-cloud platform for federated ML
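
Federated ML combines model updates from many devices without collecting their raw data. Below is a minimal sketch of federated averaging (FedAvg), assuming each client reports a weight vector together with its local sample count. It is a generic illustration and not the FedML API; the function name and client values are made up.

```python
def federated_average(client_updates):
    """Average client weight vectors, weighting each client by its sample count.

    client_updates: list of (weights, num_samples) pairs, where all weight
    vectors have the same length.
    """
    total_samples = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [
        sum(weights[i] * n for weights, n in client_updates) / total_samples
        for i in range(dim)
    ]


# Hypothetical updates from two devices: (local weights, number of local samples).
updates = [
    ([1.0, 2.0], 10),
    ([3.0, 4.0], 30),
]
global_weights = federated_average(updates)
print(global_weights)  # prints "[2.5, 3.5]"
```

Only the weight vectors and sample counts leave each device in this scheme, which is what makes the approach attractive for private-by-design speech and language applications.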

Tutorials

How to Run a ChatGPT-Like LLM on Your PC Offline

Research Publications

Review Papers

  • Xu, Jingjing, et al. "A survey on green deep learning." arXiv preprint arXiv:2111.05193 (2021).
  • Schizas, Nikolaos, et al. "TinyML for Ultra-Low Power AI and Large Scale IoT Deployments: A Systematic Review." Future Internet 14.12 (2022): 363.
  • Han, Hui, and Julien Siebert. "TinyML: A systematic review and synthesis of existing research." 2022 International Conference on Artificial Intelligence in Information and Communication (ICAIIC). IEEE, 2022.
  • Treviso, Marcos, et al. "Efficient methods for natural language processing: A survey." arXiv preprint arXiv:2209.00099 (2022).
  • Hessenthaler, Marius, et al. "Bridging fairness and environmental sustainability in natural language processing." arXiv preprint arXiv:2211.04256 (2022).

Speech processing

  • Wong, Alexander, et al. "Tinyspeech: Attention condensers for deep speech recognition neural networks on edge devices." arXiv preprint arXiv:2008.04245 (2020).
  • Li, Qin, et al. "MSP-MFCC: Energy-efficient MFCC feature extraction method with mixed-signal processing architecture for wearable speech recognition applications." IEEE Access 8 (2020): 48720-48730.
  • Tian, He, et al. "Bioinspired dual-channel speech recognition using graphene-based electromyographic and mechanical sensors." Cell Reports Physical Science 3.10 (2022).
  • Tambe, Thierry, et al. "A 16-nm soc for noise-robust speech and nlp edge ai inference with bayesian sound source separation and attention-based dnns." IEEE Journal of Solid-State Circuits 58.2 (2022): 569-581.
  • Maayah, Marina, et al. "LimitAccess: on-device TinyML based robust speech recognition and age classification." Discover Artificial Intelligence 3.1 (2023): 8.

Natural language processing

  • Tambe, Thierry, et al. "Edgebert: Sentence-level energy optimizations for latency-aware multi-task nlp inference." MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture. 2021.
  • Gundu, Krishna Sriharsha, et al. "Comparative Analysis of Energy Consumption in Text Processing Models." International Conference on Advancements in Smart Computing and Information Security. Cham: Springer Nature Switzerland, (2022).
  • Ge, Tao, Si-Qing Chen, and Furu Wei. "EdgeFormer: A parameter-efficient transformer for on-device Seq2Seq generation." arXiv preprint arXiv:2202.07959 (2022).
  • Moro, Gianluca, Luca Ragazzi, and Lorenzo Valgimigli. "Carburacy: summarization models tuning and comparison in eco-sustainable regimes with a novel carbon-aware accuracy." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 37. No. 12. 2023.

Related areas

On-device training

  • Incel, Ozlem Durmaz, and Sevda Ozge Bursa. "On-Device Deep Learning for Mobile and Wearable Sensing Applications: A Review." IEEE Sensors Journal (2023).

Federated learning

  • Liu, Ming, et al. "Federated learning meets natural language processing: A survey." arXiv preprint arXiv:2107.12603 (2021).

IoT

  • Burrello, Alessio, et al. "A microcontroller is all you need: Enabling transformer execution on low-power iot endnodes." 2021 IEEE International Conference on Omni-Layer Intelligent Systems (COINS). IEEE, 2021.

Model compression

  • Mireshghallah, Fatemehsadat, et al. "Differentially private model compression." Advances in Neural Information Processing Systems 35 (2022): 29468-29483.

Journals & Conferences

Journals

  • IEEE Transactions on Audio, Speech and Language Processing
  • IEEE Transactions on Signal Processing
  • IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • IEEE Journal of Selected Topics in Signal Processing
  • ACM Transactions on Speech and Language Processing
  • ACM Transactions on Embedded Computing Systems
  • ACM Transactions on Intelligent Systems and Technology
  • EURASIP Journal on Audio, Speech, and Music Processing

Conferences

  • Interspeech (Annual Conference of the International Speech Communication Association)
  • IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
  • IEEE Spoken Language Technology Workshop (SLT)
  • IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
  • Association for Computational Linguistics (ACL) Conference
  • Conference on Empirical Methods in Natural Language Processing (EMNLP)
  • European Signal Processing Conference (EUSIPCO)
  • IEEE International Conference on Multimedia and Expo (ICME)

Challenges

Some challenges that consider low-power and on-device speech and language processing:

  • Interspeech Challenges - (see Interspeech 2023 challenges)
  • Dialog System Technology Challenges - (see the 11th challenge)
  • IEEE Signal Processing Cup: Annual competition that challenges teams of undergraduate students to solve a specific signal processing problem. While the topics vary each year, some editions have focused on speech and audio processing, encouraging participants to develop efficient and low-power algorithms for speech-related tasks. (see the 2022 challenge)

Events & Announcements

A place to share upcoming special issues/workshops/conferences on low-power SLTs.

Future plans & feedback

The aim is to maintain an up-to-date snapshot of recent and interesting works relating to low-power speech and language technologies, whilst also detailing resources for getting started in this area.

The repo is intended as a springboard to form a special interest group of those who are working on or interested in this field of research. While the repo will be maintained, it could also be useful to share quarterly updates and announcements with the community via a newsletter.

Interested in a newsletter? Sign up here

Feedback from the community will be invaluable to the usefulness of this repository and group moving forward. Please feel free to send suggestions for anything you think should be included, anything you are working on, or any other ideas you may have.

Have suggestions? Give feedback here

Finally, to help grow the community: if you know others interested in low-power SLTs, please share this with them! :)

low_power_slt's People

Contributors

mjhewitt1
