Thanks for allosaurus, my experiments with it have been fruitful so far. Very impressi

Realtime? (low-latency streaming inference) about allosaurus HOT 5 OPEN

xinjli commented on June 8, 2024 1

Realtime? (low-latency streaming inference)

from allosaurus.

Comments (5)

IpsumDominum commented on June 8, 2024 2

Hello, just to clarify: is this not possible because the current model uses a bidirectional LSTM?

xinjli commented on June 8, 2024

Hi, thanks for asking!

Unfortunately, the current model cannot do real-time transcription. A real-time model would need a special architecture, which the current model does not implement.
If you want to use it for real-time purposes, the best way for now may be to feed your audio stream into the model in fixed-length chunks (e.g. 2 seconds) and then concatenate the outputs.
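The chunking workaround described above can be sketched roughly as follows. This is only an illustration: `recognize_chunk` is a hypothetical callable that you would supply yourself (for example, a wrapper that writes the chunk to a temporary file and runs the recognizer on it). It is not part of the allosaurus API. Phones that straddle a chunk boundary may be recognized poorly with this approach.

```python
from typing import Callable, List, Sequence


def split_into_chunks(samples: Sequence, sample_rate: int,
                      chunk_seconds: float = 2.0) -> List[Sequence]:
    """Split a sample buffer into consecutive fixed-length chunks.

    The last chunk may be shorter than chunk_seconds.
    """
    chunk_len = int(round(sample_rate * chunk_seconds))
    if chunk_len <= 0:
        raise ValueError("chunk length must be at least one sample")
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


def transcribe_stream(samples: Sequence, sample_rate: int,
                      recognize_chunk: Callable[[Sequence, int], str],
                      chunk_seconds: float = 2.0) -> str:
    """Run a (hypothetical) per-chunk recognizer and join the outputs."""
    pieces = []
    for chunk in split_into_chunks(samples, sample_rate, chunk_seconds):
        text = recognize_chunk(chunk, sample_rate)  # user-supplied, assumed
        if text:
            pieces.append(text)
    return " ".join(pieces)
```

With this scheme the latency is at least the chunk length (2 seconds here) plus the time spent on feature extraction and inference for each chunk.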

willstott101 commented on June 8, 2024

Fair enough. I'm curious about the theoretical minimum latency of the model. I see there is a "window_size": 0.025 in pm_config.json and a "window_size": 3 in am_config.json (uni2005). Is the minimum latency therefore basically 0.025 * 3 (seconds, I assume), or am I wrong in assuming that those window sizes are the overall limit on the data passed to any given execution of the neural network? Perhaps the network keeps state as windows are passed to it. Perhaps those windows aren't actually what I think they are. 🤷

xinjli commented on June 8, 2024

Hi, sorry for the late reply.

For this model, the minimum latency would be 0.025 + 0.01 + 0.01 = 0.045, because consecutive windows overlap and are shifted by 0.01. And of course you also need to consider the time spent on feature extraction and inference.
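The arithmetic in this reply can be written out as a small helper. This is a sketch under two assumptions that the thread does not confirm explicitly: that "window_size": 3 in am_config.json means three consecutive feature frames are combined, and that 0.01 is the shift between frames. The function and parameter names are made up for illustration.

```python
def context_span(window_size: float, window_shift: float, num_frames: int) -> float:
    """Audio duration covered by num_frames overlapping analysis windows.

    The first window covers window_size; each further window adds
    only window_shift of new audio.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    return window_size + (num_frames - 1) * window_shift


# Values quoted in the thread: 0.025 window, 0.01 shift, 3 frames (assumed meaning).
span = context_span(0.025, 0.01, 3)
print(f"{span:.3f}")  # prints 0.045
```

This covers only the audio that has to be buffered. Feature extraction and inference time come on top, as the reply notes.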

padster06 commented on June 8, 2024

Hi, continuing Will's thread about real-time audio streaming: we ran into a bit of a blocker. The lifter function (pm.feature.lifter) seems to change its output based on the length of the array passed in as "cepstra". The same array with fewer elements is returned with different values. Is there an obvious way to make this function invariant to input array length? Or do we need to keep state and use some kind of rolling average?

Thanks
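For reference, a textbook sinusoidal lifter depends only on the coefficient index, not on the number of frames, so it gives the same values for a full array and for a slice of it. The sketch below is not taken from allosaurus's pm.feature.lifter. It assumes "cepstra" is a 2-D array of shape (frames, coefficients) and uses the common default L = 22. If the length dependence actually comes from another step (for example a normalization computed over the whole input), this sketch would not address it.

```python
import numpy as np


def lifter(cepstra, L=22):
    """Apply a sinusoidal lifter to a (frames, coefficients) cepstral matrix.

    The weights depend only on the coefficient index, so the result for
    any frame is independent of how many frames are passed in.
    """
    cepstra = np.asarray(cepstra, dtype=float)
    if L <= 0:
        return cepstra  # liftering disabled
    n = np.arange(cepstra.shape[1])                 # coefficient indices
    lift = 1.0 + (L / 2.0) * np.sin(np.pi * n / L)  # one weight per coefficient
    return cepstra * lift                           # broadcast over frames


# Check length invariance on random data: lifting a slice matches slicing the lifted array.
rng = np.random.default_rng(0)
full = rng.standard_normal((100, 13))
same = np.allclose(lifter(full)[:40], lifter(full[:40]))
```

If `same` comes out as False with your own lifter, one thing to check is whether the array is being passed with frames and coefficients swapped, since the weights are indexed along the second axis.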
