Low Power SLT

Low Power Computing for Speech and Language Technologies

A collection of resources for those interested in low-power computing for speech and language technologies, covering low-power, energy-efficient, on-device approaches to speech and language processing.

Keywords: low-power, energy-efficient, speech processing, natural language processing

Related topics: on-device training, on-device inference, edge computing, federated learning, embedded systems, mobile devices, IoT, sustainability, open source, low cost, green computing, power efficiency, sustainable computing systems, tiny ml, private-by-design, language processing, speech synthesis, keyword spotting, natural language understanding, dialogue systems, language modelling, text processing

Table of Contents

Background and Motivation
In the News
Getting Started
Research Publications
Journals and Conferences
Events and Announcements
Future Plans and Feedback

Background and Motivation

Research areas that aim to enhance the performance and capabilities of speech and language technologies that run on resource-constrained devices are attracting increasing attention.

The benefits of such methods are becoming increasingly important and include:

privacy
energy efficiency
real-time processing
reduced latency
offline functionality
edge intelligence
scalability and flexibility

In the News

These articles highlight the significance of low power and on-device computing in enabling efficient and privacy-preserving speech and language processing on mobile devices, edge devices, and IoT devices. They discuss the challenges, opportunities, and advancements in this field, motivating further research and development in energy-efficient algorithms, hardware optimizations, and edge AI technologies.

Getting Started

Hardware platforms

The following hardware platforms are commonly used for processing, training and inference in low-power SLT settings and are useful to know about.

Type	Examples
MCU	Espressif ESP32, Arduino Uno
CPU	Raspberry Pi 3/4
CPU + GPU	Nvidia Jetson Nano
GPU	Nvidia GPUs, e.g. the RTX 3090 graphics card
TPU	Coral AI USB Accelerator featuring the Edge TPU

Software

For MCU devices

Designed to run on MCU devices:

The TensorFlow Lite for Microcontrollers (TFMicro) framework supports training and deploying machine learning (ML) models on microcontroller devices such as the ESP32. Examples include:
- keyword spotting (Speech Commands dataset)
- handwritten digit recognition (MNIST dataset).
Espressif ESP-SR helps build speech applications for ESP32 and ESP32-S3 chips. It includes:
- wakeword engine (WakeNet)
- command recognition (MultiNet)
- an audio front-end (AFE)
- speech synthesis for Chinese.
Espressif ESP-Skainet also provides intelligent voice assistant functionality for the ESP32-S3 chip, supporting:
- WakeNet wakeword engine
- MultiNet command recognition.
Espressif Audio Development Framework (ESP-ADF) supports the development of audio applications for Espressif SoCs, with features such as:
- Music player or recorder, supporting various audio formats
- Play music from various sources, including HTTP, SPIFFS, SDCARD
- Integrate media services, like DLNA, VoIP
- Internet Radio
- Voice recognition and integration with online services such as Alexa, DuerOS etc.

For CPU and GPU devices

TensorFlow Lite (TFLite) for mobile and edge devices

Uses TensorFlow models converted into a smaller, more efficient ML format. Pre-trained models are available, and can be modified, or you can train your own TensorFlow models and convert them to TFLite format.
Examples include speech recognition, pose estimation, text classification, autocomplete text inputs, natural language question answering, smart reply chat suggestions, audio classification, and on-device training, plus more.
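One reason a converted model can be smaller is weight quantization, where 32-bit floats are stored as 8-bit integers. The sketch below illustrates that idea in plain Python. It is not the TFLite converter or its API: the function names `quantize_int8` and `dequantize` and the example weights are hypothetical, and the affine scheme shown is only one common way of doing this.

```python
from array import array


def quantize_int8(weights):
    """Map float weights to int8 codes with an affine (scale, zero_point) scheme."""
    # Include 0.0 in the range so that zero maps to an exact int8 code.
    lo = min(min(weights), 0.0)
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 if all weights are zero
    zero_point = round(-128 - lo / scale)
    codes = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    """Recover approximate float weights from int8 codes."""
    return [(c - zero_point) * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, scale, zero_point = quantize_int8(weights)
restored = dequantize(codes, scale, zero_point)

float32_bytes = len(array("f", weights).tobytes())  # typically 4 bytes per weight
int8_bytes = len(array("b", codes).tobytes())       # 1 byte per weight
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(float32_bytes, int8_bytes, max_error)
```

The storage cost drops to roughly a quarter, at the price of a rounding error bounded by about half a quantization step per weight.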

Snowboy Hotword Detection

A DNN-based, customisable hotword and wake word detection toolkit that can run on a Raspberry Pi and allows you to train your own models

Vosk

Offline speech recognition API for Android, iOS, Raspberry Pi, and servers (GitHub)
Enables speech recognition for 20+ languages and dialects
Small models (50 MB) that provide continuous large-vocabulary transcription, zero-latency response with a streaming API, a reconfigurable vocabulary and speaker identification
Applications include chatbots, smart home applications, virtual assistants, lecture transcriptions, movie subtitles

Coqui

Provides a library for Text-to-Speech (speech synthesis) with pretrained models (TTS toolkit) (main focus)
Provides a library for Speech-to-Text (speech recognition) with pretrained models (STT models) (no longer maintained, shift towards Whisper)

Whisper

A general-purpose speech recognition model trained on a large dataset of audio
Multitasking model that can perform multilingual speech recognition, speech translation and language identification
Python and PyTorch are used to train and test models of various sizes (tiny, base, small, medium, large)

Mozilla DeepSpeech

An open source, embedded speech-to-text engine (offline, on-device)
Runs in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers
Based on the research paper 'Deep Speech: Scaling up end-to-end speech recognition' (here)

ESPNet

End-to-end speech processing toolkit
Covers end-to-end speech recognition, text-to-speech, speech translation, speech enhancement, speaker diarization, spoken language understanding, etc.
Uses PyTorch as a deep learning engine, follows Kaldi style data processing, feature extraction, and provides recipes for speech processing experiments

SpeechBrain

An open-source conversational AI toolkit, currently in beta
Aims to provide a single, flexible and user-friendly toolkit that can be used to easily develop state-of-the-art speech technologies
Includes systems for speech recognition, speaker recognition, speech enhancement, speech separation, language identification, multi-microphone signal processing, and more.

Kaldi

Toolkit for speech recognition written in C++, widely used and intended for use by speech recognition researchers/professionals
Aims to provide modern, flexible code that is easy to modify and extend

Further Open Source Systems

Willow & Willow Inference Server (WIS)

Willow is
The Willow Inference Server is a local, self-hosted and highly optimized language inference server that supports ASR/STT, TTS, and LLM tasks across WebRTC, REST and WS.
The project attracted much attention to the Espressif ESP32-S3 Box that it uses for the user interface

Rasa Open Source

A machine learning framework for building chat and voice-based AI assistants that make use of contextual information.
The platform allows for connecting to messaging channels and third-party systems through a set of APIs, so you can build contextual assistants on Facebook, Slack and Telegram.

Rhasspy

Rhasspy focuses on wakeword detection, speech recognition, and language understanding for smart home applications
Various options are available for the implementation of wake word detection, speech to text and intent recognition, etc.

Jasper

Open source platform supporting the development of always-on, voice-controlled applications
Uses PocketSphinx or Julius for STT.

ESP-Skainet

An offline voice assistant with wakeword engine and speech command recognition for up to 200 commands.
The library includes acoustic algorithms for speech enhancement, acoustic echo cancellation, voice activity detection, automatic gain control, noise suppression.
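To give a feel for what voice activity detection does, here is a minimal energy-threshold sketch in plain Python. It is not the algorithm used by ESP-Skainet or ESP-SR, which is not described here. The function names, the 160-sample frame length (10 ms at an assumed 16 kHz sample rate) and the -30 dB threshold are all hypothetical choices, and samples are assumed to be floats in the range [-1, 1].

```python
import math


def frame_levels_db(samples, frame_len):
    """Split samples into non-overlapping frames and return each frame's RMS level in dB."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Floor the RMS so that pure silence does not hit log10(0).
        levels.append(20.0 * math.log10(max(rms, 1e-10)))
    return levels


def detect_speech(samples, frame_len=160, threshold_db=-30.0):
    """Return one flag per frame: True where the frame level exceeds the threshold."""
    return [level > threshold_db for level in frame_levels_db(samples, frame_len)]


# Synthetic input: 20 ms of silence followed by 20 ms of a 440 Hz tone.
silence = [0.0] * 320
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(320)]
flags = detect_speech(silence + tone)
print(flags)
```

A fixed threshold like this is cheap enough for a microcontroller but is easily fooled by background noise, which is one reason practical systems tend to use adaptive or model-based detectors.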

FedML-AI

Provides a research and production integrated edge-cloud platform for federated ML
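For readers new to federated learning, the core aggregation step of federated averaging (FedAvg) can be sketched in a few lines. This is a generic illustration and does not use the FedML API. The function name `federated_average` and the client data are hypothetical, and model weights are simplified to flat lists of floats.

```python
def federated_average(client_updates):
    """Average client weight vectors, weighting each by its number of local examples.

    client_updates is a list of (weights, num_examples) pairs, where all
    weight vectors have the same length.
    """
    total_examples = sum(n for _, n in client_updates)
    if total_examples == 0:
        raise ValueError("at least one client must contribute examples")

    dim = len(client_updates[0][0])
    averaged = [0.0] * dim
    for weights, num_examples in client_updates:
        share = num_examples / total_examples
        for i, w in enumerate(weights):
            averaged[i] += share * w
    return averaged


# Two hypothetical devices: the second holds three times as much local data,
# so its weights count three times as much in the average.
updates = [([1.0, 2.0], 1), ([3.0, 4.0], 3)]
global_weights = federated_average(updates)
print(global_weights)
```

Only the weight vectors and example counts leave each device in this scheme; the raw speech or text data stays local, which is the link to the privacy benefit listed earlier.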

Tutorials

How to Run a ChatGPT-Like LLM on Your PC Offline

Research Publications

Review Papers

Xu, Jingjing, et al. "A survey on green deep learning." arXiv preprint arXiv:2111.05193 (2021).
Schizas, Nikolaos, et al. "TinyML for Ultra-Low Power AI and Large Scale IoT Deployments: A Systematic Review." Future Internet 14.12 (2022): 363.
Han, Hui, and Julien Siebert. "TinyML: A systematic review and synthesis of existing research." 2022 International Conference on Artificial Intelligence in Information and Communication (ICAIIC). IEEE, 2022.
Treviso, Marcos, et al. "Efficient methods for natural language processing: A survey." arXiv preprint arXiv:2209.00099 (2022).
Hessenthaler, Marius, et al. "Bridging fairness and environmental sustainability in natural language processing." arXiv preprint arXiv:2211.04256 (2022).

Speech processing

Wong, Alexander, et al. "Tinyspeech: Attention condensers for deep speech recognition neural networks on edge devices." arXiv preprint arXiv:2008.04245 (2020).
Li, Qin, et al. "MSP-MFCC: Energy-efficient MFCC feature extraction method with mixed-signal processing architecture for wearable speech recognition applications." IEEE Access 8 (2020): 48720-48730.
Tian, He, et al. "Bioinspired dual-channel speech recognition using graphene-based electromyographic and mechanical sensors." Cell Reports Physical Science 3.10 (2022).
Tambe, Thierry, et al. "A 16-nm soc for noise-robust speech and nlp edge ai inference with bayesian sound source separation and attention-based dnns." IEEE Journal of Solid-State Circuits 58.2 (2022): 569-581.
Maayah, Marina, et al. "LimitAccess: on-device TinyML based robust speech recognition and age classification." Discover Artificial Intelligence 3.1 (2023): 8.

Natural language processing

Tambe, Thierry, et al. "Edgebert: Sentence-level energy optimizations for latency-aware multi-task nlp inference." MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture. 2021.
Gundu, Krishna Sriharsha, et al. "Comparative Analysis of Energy Consumption in Text Processing Models." International Conference on Advancements in Smart Computing and Information Security. Cham: Springer Nature Switzerland, (2022).
Ge, Tao, Si-Qing Chen, and Furu Wei. "EdgeFormer: A parameter-efficient transformer for on-device Seq2Seq generation." arXiv preprint arXiv:2202.07959 (2022).
Moro, Gianluca, Luca Ragazzi, and Lorenzo Valgimigli. "Carburacy: summarization models tuning and comparison in eco-sustainable regimes with a novel carbon-aware accuracy." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 37. No. 12. 2023.

Related areas

On-device training

Incel, Ozlem Durmaz, and Sevda Ozge Bursa. "On-Device Deep Learning for Mobile and Wearable Sensing Applications: A Review." IEEE Sensors Journal (2023).

Federated learning

Liu, Ming, et al. "Federated learning meets natural language processing: A survey." arXiv preprint arXiv:2107.12603 (2021).

IoT

Burrello, Alessio, et al. "A microcontroller is all you need: Enabling transformer execution on low-power iot endnodes." 2021 IEEE International Conference on Omni-Layer Intelligent Systems (COINS). IEEE, 2021.

Model compression

Mireshghallah, Fatemehsadat, et al. "Differentially private model compression." Advances in Neural Information Processing Systems 35 (2022): 29468-29483.

Journals & Conferences

Journals

IEEE Transactions on Audio, Speech and Language Processing
IEEE Transactions on Signal Processing
IEEE/ACM Transactions on Audio, Speech, and Language Processing
IEEE Journal of Selected Topics in Signal Processing
ACM Transactions on Speech and Language Processing
ACM Transactions on Embedded Computing Systems
ACM Transactions on Intelligent Systems and Technology
Eurasip Journal on Audio, Speech, and Music Processing

Conferences

Interspeech (Annual Conference of the International Speech Communication Association)
IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
IEEE Spoken Language Technology Workshop (SLT)
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Association for Computational Linguistics (ACL) Conference
Conference on Empirical Methods in Natural Language Processing (EMNLP)
European Signal Processing Conference (EUSIPCO)
IEEE International Conference on Multimedia and Expo (ICME)

Challenges

Some challenges that consider low-power and on-device speech and language processing:

Interspeech Challenges - (see Interspeech 2023 challenges)
Dialog System Technology Challenges - (see the 11th challenge)
IEEE Signal Processing Cup: Annual competition that challenges teams of undergraduate students to solve a specific signal processing problem. While the topics vary each year, some editions have focused on speech and audio processing, encouraging participants to develop efficient and low-power algorithms for speech-related tasks. (see the 2022 challenge)

Events & Announcements

A place to share upcoming special issues/workshops/conferences on low-power SLTs.

Future Plans & Feedback

The aim is to maintain an up-to-date snapshot of recent and interesting works relating to low-power speech and language technologies, whilst also detailing resources for getting started in this area.

The repo is intended as a springboard to form a special interest group of those who are working on or interested in this field of research. While the repo will be maintained, it could be good to share quarterly updates and announcements with the community via a newsletter.

Interested in a newsletter? Sign up here

Feedback from the community will be invaluable to the usefulness of this repository and group moving forward. Please feel free to send suggestions for anything you think should be included, anything you are working on, or any other ideas you may have.

Have suggestions? Give feedback here

Finally, to help grow the community: if you know others interested in low-power SLTs, please share this with them! :)
