
ahmetoner / whisper-asr-webservice


OpenAI Whisper ASR Webservice API

Home Page: https://ahmetoner.github.io/whisper-asr-webservice

License: MIT License

Languages: Python 86.75%, Dockerfile 13.25%
Topics: asr, automatic-speech-recognition, docker, openai-whisper, speech, speech-recognition, speech-to-text

whisper-asr-webservice's People

Contributors

ahmetoner, alienware, ariym, ayancey, besimali, dalesjo, drnic, evilfreelancer, nick-allen, obra, vijaim, zj1123581321


whisper-asr-webservice's Issues

Queueing for parallel execution

Hi @ahmetoner!

We found your nice web service while doing research for a project we're doing to make Whisper more accessible, where a web service is a necessary component. Thanks for making it and making it open source! We plan on doing that as well, and would be happy to share it with you in case it's useful.

For our use case, we'd need the web service to be able to queue files for transcription, as many users may use it at once, and Whisper will hog the whole GPU and (as far as we know) crash upon trying to use VRAM that isn't available. If we can extend your web service to work off a queuing system, would you be interested in a pull request for that?

Best regards,
The speech to text team @schibsted

GPU out of memory issues

Hi there,
Is there a way to limit the number of concurrent processes to prevent GPU out-of-memory errors? For example, we could use a semaphore to limit the number of concurrent jobs.
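
A minimal sketch of that semaphore idea, assuming the FastAPI handler calls a module-level Whisper model as in webservice.py (all names here are illustrative, not the project's actual code):

import asyncio

# Allow only one transcription on the GPU at a time; additional requests
# wait here instead of oversubscribing VRAM.
MAX_CONCURRENT_JOBS = 1
gpu_semaphore = asyncio.Semaphore(MAX_CONCURRENT_JOBS)

async def transcribe_limited(model, audio, **options):
    async with gpu_semaphore:
        # Run the blocking Whisper call off the event loop.
        return await asyncio.to_thread(model.transcribe, audio, **options)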

Request to include Faster-Whisper in API choices

Hi,

I have been testing Faster-Whisper, a library that claims to provide 4x faster inference than normal Whisper. After trying it out, I can confirm that it does seem to be faster.

I was wondering if it would be possible to include Faster-Whisper as a choice in your API? This would be beneficial for users who need faster inference times and would like to have the option to use Faster-Whisper.
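
For reference, a minimal faster-whisper call looks roughly like this (API names as published by the faster-whisper project; shown only to illustrate the library, not this webservice's code):

from faster_whisper import WhisperModel

model = WhisperModel("base", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3", beam_size=5)
print("Detected language:", info.language)
for segment in segments:
    # Each segment carries start/end timestamps and the decoded text.
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))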

Thank you for considering my request. Let me know if you need any additional information or have any questions.

Best regards,
Rishish Pandey

Can't run docker

2023-02-04 17:35:15 [2023-02-05 01:35:15 +0000] [1] [INFO] Starting gunicorn 20.1.0
2023-02-04 17:35:15 [2023-02-05 01:35:15 +0000] [1] [INFO] Listening at: http://0.0.0.0:9000 (1)
2023-02-04 17:35:15 [2023-02-05 01:35:15 +0000] [1] [INFO] Using worker: uvicorn.workers.UvicornWorker
2023-02-04 17:35:15 [2023-02-05 01:35:15 +0000] [8] [INFO] Booting worker with pid: 8
2023-02-04 17:35:26 [2023-02-05 01:35:26 +0000] [8] [ERROR] Exception in worker process
2023-02-04 17:35:26 Traceback (most recent call last):
2023-02-04 17:35:26 File "/usr/local/lib/python3.10/urllib/request.py", line 1348, in do_open
2023-02-04 17:35:26 h.request(req.get_method(), req.selector, req.data, headers,
2023-02-04 17:35:26 File "/usr/local/lib/python3.10/http/client.py", line 1282, in request
2023-02-04 17:35:26 self._send_request(method, url, body, headers, encode_chunked)
2023-02-04 17:35:26 File "/usr/local/lib/python3.10/http/client.py", line 1328, in _send_request
2023-02-04 17:35:26 self.endheaders(body, encode_chunked=encode_chunked)
2023-02-04 17:35:26 File "/usr/local/lib/python3.10/http/client.py", line 1277, in endheaders
2023-02-04 17:35:26 self._send_output(message_body, encode_chunked=encode_chunked)
2023-02-04 17:35:26 File "/usr/local/lib/python3.10/http/client.py", line 1037, in _send_output
2023-02-04 17:35:26 self.send(msg)
2023-02-04 17:35:26 File "/usr/local/lib/python3.10/http/client.py", line 975, in send
2023-02-04 17:35:26 self.connect()
2023-02-04 17:35:26 File "/usr/local/lib/python3.10/http/client.py", line 1447, in connect
2023-02-04 17:35:26 super().connect()
2023-02-04 17:35:26 File "/usr/local/lib/python3.10/http/client.py", line 941, in connect
2023-02-04 17:35:26 self.sock = self._create_connection(
2023-02-04 17:35:26 File "/usr/local/lib/python3.10/socket.py", line 824, in create_connection
2023-02-04 17:35:26 for res in getaddrinfo(host, port, 0, SOCK_STREAM):
2023-02-04 17:35:26 File "/usr/local/lib/python3.10/socket.py", line 955, in getaddrinfo
2023-02-04 17:35:26 for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
2023-02-04 17:35:26 socket.gaierror: [Errno -3] Temporary failure in name resolution
2023-02-04 17:35:26
2023-02-04 17:35:26 During handling of the above exception, another exception occurred:
2023-02-04 17:35:26
2023-02-04 17:35:26 Traceback (most recent call last):
2023-02-04 17:35:26 File "/app/.venv/lib/python3.10/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
2023-02-04 17:35:26 worker.init_process()
2023-02-04 17:35:26 File "/app/.venv/lib/python3.10/site-packages/uvicorn/workers.py", line 66, in init_process
2023-02-04 17:35:26 super(UvicornWorker, self).init_process()
2023-02-04 17:35:26 File "/app/.venv/lib/python3.10/site-packages/gunicorn/workers/base.py", line 134, in init_process
2023-02-04 17:35:26 self.load_wsgi()
2023-02-04 17:35:26 File "/app/.venv/lib/python3.10/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
2023-02-04 17:35:26 self.wsgi = self.app.wsgi()
2023-02-04 17:35:26 File "/app/.venv/lib/python3.10/site-packages/gunicorn/app/base.py", line 67, in wsgi
2023-02-04 17:35:26 self.callable = self.load()
2023-02-04 17:35:26 File "/app/.venv/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
2023-02-04 17:35:26 return self.load_wsgiapp()
2023-02-04 17:35:26 File "/app/.venv/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
2023-02-04 17:35:26 return util.import_app(self.app_uri)
2023-02-04 17:35:26 File "/app/.venv/lib/python3.10/site-packages/gunicorn/util.py", line 359, in import_app
2023-02-04 17:35:26 mod = importlib.import_module(module)
2023-02-04 17:35:26 File "/usr/local/lib/python3.10/importlib/init.py", line 126, in import_module
2023-02-04 17:35:26 return _bootstrap._gcd_import(name[level:], package, level)
2023-02-04 17:35:26 File "", line 1050, in _gcd_import
2023-02-04 17:35:26 File "", line 1027, in _find_and_load
2023-02-04 17:35:26 File "", line 1006, in _find_and_load_unlocked
2023-02-04 17:35:26 File "", line 688, in _load_unlocked
2023-02-04 17:35:26 File "", line 883, in exec_module
2023-02-04 17:35:26 File "", line 241, in _call_with_frames_removed
2023-02-04 17:35:26 File "/app/app/webservice.py", line 55, in
2023-02-04 17:35:26 model = whisper.load_model(model_name)
2023-02-04 17:35:26 File "/app/.venv/lib/python3.10/site-packages/whisper/init.py", line 108, in load_model
2023-02-04 17:35:26 checkpoint_file = _download(_MODELS[name], download_root, in_memory)
2023-02-04 17:35:26 File "/app/.venv/lib/python3.10/site-packages/whisper/init.py", line 50, in _download
2023-02-04 17:35:26 with urllib.request.urlopen(url) as source, open(download_target, "wb") as output:
2023-02-04 17:35:26 File "/usr/local/lib/python3.10/urllib/request.py", line 216, in urlopen
2023-02-04 17:35:26 return opener.open(url, data, timeout)
2023-02-04 17:35:26 File "/usr/local/lib/python3.10/urllib/request.py", line 519, in open
2023-02-04 17:35:26 response = self._open(req, data)
2023-02-04 17:35:26 File "/usr/local/lib/python3.10/urllib/request.py", line 536, in _open
2023-02-04 17:35:26 result = self._call_chain(self.handle_open, protocol, protocol +
2023-02-04 17:35:26 File "/usr/local/lib/python3.10/urllib/request.py", line 496, in _call_chain
2023-02-04 17:35:26 result = func(*args)
2023-02-04 17:35:26 File "/usr/local/lib/python3.10/urllib/request.py", line 1391, in https_open
2023-02-04 17:35:26 return self.do_open(http.client.HTTPSConnection, req,
2023-02-04 17:35:26 File "/usr/local/lib/python3.10/urllib/request.py", line 1351, in do_open
2023-02-04 17:35:26 raise URLError(err)
2023-02-04 17:35:26 urllib.error.URLError: <urlopen error [Errno -3] Temporary failure in name resolution>
2023-02-04 17:35:26 [2023-02-05 01:35:26 +0000] [8] [INFO] Worker exiting (pid: 8)
2023-02-04 17:35:26 [2023-02-05 01:35:26 +0000] [1] [INFO] Shutting down: Master
2023-02-04 17:35:26 [2023-02-05 01:35:26 +0000] [1] [INFO] Reason: Worker failed to boot.
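
The traceback shows the worker dying while downloading the model ("Temporary failure in name resolution"), i.e. the container cannot resolve hostnames. Assuming the host's DNS configuration is the culprit, one common workaround is to pass an explicit resolver to the container:

docker run -d --dns 8.8.8.8 -p 9000:9000 onerahmet/openai-whisper-asr-webservice:latest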

[Feature Request] Allow model selections

Running the command on localhost, there are several options, but model (tiny, etc.) isn't one of them. Could that be added?

Again, great job with this.

Thomas
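
For reference, later issues in this list show the model being selected with the ASR_MODEL environment variable, e.g.:

docker run -d -p 9000:9000 -e ASR_MODEL=tiny onerahmet/openai-whisper-asr-webservice:latest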

Only the base model is used, even choosing another option

Hi, thanks a lot for releasing Whisper for Docker. It's working as expected and using the GPU now.

However, when we select another model (other than the base model), the base model is apparently still what gets used.

How can we fix this problem so we can use the other models?

Command used:

docker run -d --gpus all -p 9000:9000 -e ASR_MODEL=medium onerahmet/openai-whisper-asr-webservice-gpu

Thanks, Lucas Rodrigues.

GPU consumption at 100% all the time

Good morning, it is now working with the medium model.

However, I'm a little confused. I have zero knowledge about Docker, but when opening Docker and starting the instance, I notice that GPU consumption spikes intermittently to almost 100%.

So I would like to know whether the VRAM is really being used, or whether it is just a (fake) virtualization.

Thank you, and I look forward to your reply.

Yours sincerely,
Lucas Rodrigues.

Just A Note on Mac Torch Install

On Mac, I couldn't get torch to install with the commands you had listed, but I modified the command to
pip3 install torch==1.13.0 -f https://download.pytorch.org/whl/torch and it installed.

Feature Request: Callback URL when done transcribing

Good morning,

Thanks for all the effort you have put into this repo. We use Cloudflare to tunnel requests to Whisper running on a server in our office (currently CPU only). Cloudflare has a maximum timeout of 100 seconds, and some long files take longer than that to transcribe. Would it be possible to add a callback URL (and ideally headers/cookies) to the API POST, so that when the transcription has finished the service can POST the results back?
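
A minimal sketch of the requested behaviour, written as a wrapper around a loaded Whisper model (all names here are illustrative, not the project's API):

import requests

def transcribe_with_callback(model, audio, callback_url, headers=None):
    # The HTTP request that triggered this can return 202 immediately;
    # the long-running transcription happens afterwards.
    result = model.transcribe(audio)
    # POST the finished transcript back to the caller-supplied URL.
    requests.post(callback_url, json=result, headers=headers or {}, timeout=30)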

Local setup instructions seem out-of-date

There seem to be a few issues with the local setup instructions (a consolidated, corrected sequence follows the list):

  1. pip3 install poetry==1.2.2
    The version specified here seems to have some issues with loading the dependencies (at least for me)
    The latest poetry version seems to be fine, is there any reason not to use it?
  2. pip3 install torch==1.13.0+cpu -f https://download.pytorch.org/whl/torch
    There seems to be no 1.13.0+cpu version as mentioned in #57
    Also I was wondering why this step is required, isn't torch also part of the poetry dependencies?
  3. The committed poetry.lock seems to be out of date
    It gives the warning Warning: poetry.lock is not consistent with pyproject.toml. You may be getting improper dependencies.
    It also doesn't install all dependencies correctly and throws an error when installing more-itertools
    Running poetry lock solved the problem for me
  4. The 'Starting the webservice' instruction requires a global installation of gunicorn
    I guess this should be poetry run gunicorn --bind 0.0.0.0:9001 --workers 1 --timeout 0 app.webservice:app -k uvicorn.workers.UvicornWorker
  5. The mentioned script poetry run whisper_asr is no longer part of https://github.com/ahmetoner/whisper-asr-webservice/blob/main/pyproject.toml
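
Putting those fixes together, a local setup that should work (ports and flags taken from point 4) might look like:

pip3 install poetry
poetry lock
poetry install
poetry run gunicorn --bind 0.0.0.0:9001 --workers 1 --timeout 0 app.webservice:app -k uvicorn.workers.UvicornWorker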

I get: Expected UploadFile, received: <class 'str'>

I have thrown together this Python audio recorder that tries to use this server, but with the code below I got the response:
{"detail":[{"loc":["body","audio_file"],"msg":"Expected UploadFile, received: <class 'str'>","type":"value_error"}]}
Could somebody help me and tell me how to fix this? I think the problem is in the last ~15 lines; the recording itself works correctly.

#!/usr/bin/env python3

import soundfile as sf
import pyaudio
import wave
import os
import tempfile
import signal
import requests

chunk = 1024  # Record in chunks of 1024 samples
sample_format = pyaudio.paInt16  # 16 bits per sample
channels = 1
fs = 44100  # Record at 44100 samples per second

stop_recording = False

def signal_handler(sig, frame):
    global stop_recording 
    stop_recording = True

signal.signal(signal.SIGINT, signal_handler)

p = pyaudio.PyAudio()  # Create an interface to PortAudio

print('Recording')

stream = p.open(format=sample_format,
                channels=channels,
                rate=fs,
                frames_per_buffer=chunk,
                input=True)

frames = []  # Initialize array to store frames

# Record in chunks until SIGINT (Ctrl+C) is received
while True:
    data = stream.read(chunk)
    frames.append(data)
    if stop_recording:
        break

# Stop and close the stream 
stream.stop_stream()
stream.close()
# Terminate the PortAudio interface
p.terminate()

print('Finished recording')

with tempfile.TemporaryDirectory() as tmp_dir:
    wav_path = f"{tmp_dir}/temp.wav"
    mp3_path = f"{tmp_dir}/temp.mp3"

    print('saving wav')
    wf = wave.open(wav_path, 'wb')
    wf.setnchannels(channels)
    wf.setsampwidth(p.get_sample_size(sample_format))
    wf.setframerate(fs)
    wf.writeframes(b''.join(frames))
    wf.close()

    print('saving mp3')
    data, fs = sf.read(wav_path) 
    sf.write(mp3_path, data, fs)
    with open(mp3_path, 'rb') as f:
        data_mp3 = f.read()

    print(type(data_mp3))

    url = 'http://fritz.local:9000/asr'
    params = {'task':'transcribe'}
    send_data = {'audio_file': data_mp3}
    x = requests.post(url, json=params, data = send_data)
    print(x.text)
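
For context, FastAPI's UploadFile expects a multipart/form-data upload, so the bytes have to go in requests' files= argument rather than data=. Assuming task is a query parameter (as the params dict above suggests), the last lines inside the with block would become something like:

url = 'http://fritz.local:9000/asr'
with open(mp3_path, 'rb') as f:
    # Send as multipart/form-data so FastAPI sees an UploadFile.
    x = requests.post(url,
                      params={'task': 'transcribe'},
                      files={'audio_file': ('temp.mp3', f, 'audio/mpeg')})
print(x.text)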

TARGETPLATFORM is not specified?

When trying to build the Dockerfile on my system, I ran into an issue where the TARGETPLATFORM ARG was not populated and torch did not install, causing the container to build successfully but immediately die once started. Unless there is something I am missing in regard to the TARGETPLATFORM ARG, I would suggest that "--platform" be added to the documentation examples, and potentially that torch be installed for amd64 by default, as I assume that platform is the more common one.
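
For example, with BuildKit the platform can be pinned explicitly at build time:

docker build --platform linux/amd64 -t whisper-asr-webservice .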

[FEATURE REQUEST] Translate into more than one language

I recently used your application and noticed that the translation feature only supports English. However, I have seen other Whisper tools that can translate to multiple languages. I was wondering if it would be possible to add support for additional languages in the future?

It would be great if the application could support translation to any language it supports; this would greatly enhance the user experience and provide more flexibility.

Please let me know if this is something that can be implemented, and if there's any additional information that you need from me.

Thank you!

Possible to run on Jetson Nano?

I'm trying to run the GPU-accelerated version on a Jetson Nano. I'm not sure if it's supposed to work, though.

I updated Docker to the latest version. Unfortunately, the GPU version of the package does not support the ARM architecture, so I tried to build it myself.

When trying to build the Dockerfile.gpu I'm running into the following errors:

#0 47.29 Collecting pycparser
#0 47.33   Downloading pycparser-2.21-py2.py3-none-any.whl (118 kB)
#0 47.41      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 118.7/118.7 kB 1.6 MB/s eta 0:00:00
#0 48.74 Installing collected packages: webencodings, pylev, ptyprocess, msgpack, lockfile, distlib, zipp, urllib3, tomlkit, six, shellingham, pyrsistent, pycparser, poetry-core, platformdirs, pkginfo, pexpect, packaging, more-itertools, jeepney, idna, filelock, crashtest, charset-normalizer, certifi, cachy, attrs, virtualenv, requests, jsonschema, jaraco.classes, importlib-metadata, html5lib, dulwich, cleo, cffi, requests-toolbelt, cryptography, cachecontrol, SecretStorage, keyring, poetry-plugin-export, poetry
#0 58.98 Successfully installed SecretStorage-3.3.3 attrs-22.1.0 cachecontrol-0.12.11 cachy-0.3.0 certifi-2022.12.7 cffi-1.15.1 charset-normalizer-2.1.1 cleo-1.0.0a5 crashtest-0.3.1 cryptography-38.0.4 distlib-0.3.6 dulwich-0.20.50 filelock-3.8.2 html5lib-1.1 idna-3.4 importlib-metadata-4.13.0 jaraco.classes-3.2.3 jeepney-0.8.0 jsonschema-4.17.3 keyring-23.11.0 lockfile-0.12.2 more-itertools-9.0.0 msgpack-1.0.4 packaging-22.0 pexpect-4.8.0 pkginfo-1.9.2 platformdirs-2.6.0 poetry-1.2.0 poetry-core-1.1.0 poetry-plugin-export-1.1.2 ptyprocess-0.7.0 pycparser-2.21 pylev-1.4.0 pyrsistent-0.19.2 requests-2.28.1 requests-toolbelt-0.9.1 shellingham-1.5.0 six-1.16.0 tomlkit-0.11.6 urllib3-1.26.13 virtualenv-20.17.1 webencodings-0.5.1 zipp-3.11.0
#0 61.34 Looking in links: https://download.pytorch.org/whl/torch
#0 63.51 ERROR: Could not find a version that satisfies the requirement torch==1.13.0+cu117 (from versions: 1.8.0, 1.8.1, 1.9.0, 1.10.0, 1.10.1, 1.10.2, 1.11.0, 1.12.0, 1.12.1, 1.13.0)
#0 63.51 ERROR: No matching distribution found for torch==1.13.0+cu117
------
failed to solve: executor failed running [/bin/sh -c python3 -m venv $POETRY_VENV     && $POETRY_VENV/bin/pip install -U pip setuptools     && $POETRY_VENV/bin/pip install poetry==${POETRY_VERSION}     && $POETRY_VENV/bin/pip install torch==1.13.0+cu117 -f https://download.pytorch.org/whl/torch]: exit code: 1

Request for Parallel Processing in Web Service

Hi,

I recently used your web service and attempted to make multiple requests together. However, I noticed that the service handled them sequentially. I was wondering if it's possible to process them in parallel instead?

As a newcomer to this field, I'm not sure if multiple GPUs are necessary to achieve parallel processing. If that's the case, please let me know.

Thank you for your time and assistance. Please feel free to ask if you need any further information.

Best regards,
Rishish Pandey

Combining with an external VAD or diarization

Start/end times of the segments are not always correct. In fact, the first segment always starts at the beginning of the file, even if it starts with a long silence. Apart from the timing being off, this can also cause the model to hallucinate a transcript, as it tries to transcribe the silence.

This can be solved by doing the segmentation based on voice activity detection (VAD) first, and then using whisper to transcribe each segment.

While we are at it, we can even put pyannote in the pipeline to support the full diarization.
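
A rough sketch of that pipeline, using Silero VAD via torch.hub (the entry points below are Silero's published ones; treat the details as an assumption, not this project's code):

import torch
import whisper

# Load Silero VAD and its helper functions from torch.hub.
vad_model, utils = torch.hub.load('snakers4/silero-vad', 'silero_vad')
get_speech_timestamps, _, read_audio, _, _ = utils

SAMPLE_RATE = 16000
audio = read_audio('audio.wav', sampling_rate=SAMPLE_RATE)
# Find speech regions first, so leading silence is never fed to Whisper.
speech_regions = get_speech_timestamps(audio, vad_model, sampling_rate=SAMPLE_RATE)

asr = whisper.load_model('base')
for region in speech_regions:
    chunk = audio[region['start']:region['end']].numpy()
    result = asr.transcribe(chunk)
    # Offset by the region start to recover timestamps in the original file.
    print(region['start'] / SAMPLE_RATE, result['text'])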

Installation Instructions Unclear

The instructions seem to be written for people who know a lot about something I don't.
I'm a clever guy, but I have no idea how to make sense of the instructions.

GPU

I get this error message:
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
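
That error usually means the NVIDIA container runtime cannot find the host's driver libraries. On a Debian/Ubuntu host with the NVIDIA driver and NVIDIA's apt repository already set up, the usual fix (distro-dependent, so treat as a starting point) is:

sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker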

limit 6 tabs for long requests

Good afternoon.
For long requests, no more than 6 are processed at once; it is not even possible to open more than 6 tabs.
Is there any way to change this limit?

Returning only auto-detected language

I ran into an issue where the language returned is the auto-detected language instead of translating it to English or specifically stating that the file is in English.

I'm setting the parameters as per the below

   // Kotlin using OKHttp, basically copied from Postman generated code
   val myBody = MultipartBody.Builder()
      .setType(MultipartBody.FORM)
      .addFormDataPart(
         "audio_file",
         aFile.name,
         aFile.asRequestBody("application/octet-stream".toMediaTypeOrNull()))
      .addFormDataPart("task", "transcribe") // also tried "task", "translate"
      .addFormDataPart("language", "en")
      .addFormDataPart("output", "json")
      .build()

But with my accented Spanish speaking file and my accented Japanese speaking file, the response is coming back in Spanish/Japanese. It seems like the model isn't picking up the language or translate flags and just going by the detected language.

I suspect this could be because the language options are all lower case, while the example given on openai/whisper starts off with uppercase:

The default setting (which selects the small model) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the --language option:

whisper japanese.wav --language Japanese
Adding --task translate will translate the speech into English:

whisper japanese.wav --language Japanese --task translate
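
One thing worth checking: this webservice appears to read task/language/output from the query string rather than from the multipart body (compare the params dict in the UploadFile issue above), so sending them as form parts may simply be ignored. A Python equivalent of the intended request, under that assumption, would be:

import requests

with open('japanese.mp3', 'rb') as f:
    r = requests.post('http://localhost:9000/asr',
                      params={'task': 'translate', 'language': 'ja', 'output': 'json'},
                      files={'audio_file': f})
print(r.text)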

Cannot run multiple workers

Running multiple workers on uvicorn is not supported if it is run via Python.

We need to run uvicorn via CLI to enable worker scaling.
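
For example, the worker count can be raised on the command line, mirroring the gunicorn invocation quoted in the local-setup issue above:

poetry run gunicorn --bind 0.0.0.0:9000 --workers 2 --timeout 0 app.webservice:app -k uvicorn.workers.UvicornWorker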

Concurrency

Hi 👋 -- thank you for sharing your efforts!

I noticed that concurrent requests will result in a 500 error; please find a trace at the end of this message.

ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/uvicorn/protocols/http/httptools_impl.py", line 404, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/fastapi/applications.py", line 261, in __call__
    await super().__call__(scope, receive, send)
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/starlette/applications.py", line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/starlette/routing.py", line 656, in __call__
    await route.handle(scope, receive, send)
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/starlette/routing.py", line 259, in handle
    await self.app(scope, receive, send)
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/starlette/routing.py", line 61, in app
    response = await func(request)
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/fastapi/routing.py", line 227, in app
    raw_response = await run_endpoint_function(
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/fastapi/routing.py", line 162, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/starlette/concurrency.py", line 39, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/app/src/whisper_asr/webservice.py", line 31, in transcribe_file
    result = model.transcribe(audio, **options_dict)
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/whisper/transcribe.py", line 182, in transcribe
    result = decode_with_fallback(segment)[0]
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/whisper/transcribe.py", line 110, in decode_with_fallback
    results = model.decode(segment, options)
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/whisper/decoding.py", line 699, in decode
    result = DecodingTask(model, options).run(mel)
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/whisper/decoding.py", line 631, in run
    tokens, sum_logprobs, no_speech_probs = self._main_loop(audio_features, tokens)
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/whisper/decoding.py", line 586, in _main_loop
    logits = self.inference.logits(tokens, audio_features)
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/whisper/decoding.py", line 145, in logits
    return self.model.decoder(tokens, audio_features, kv_cache=self.kv_cache)
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/whisper/model.py", line 189, in forward
    x = block(x, xa, mask=self.mask, kv_cache=kv_cache)
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/whisper/model.py", line 124, in forward
    x = x + self.attn(self.attn_ln(x), mask=mask, kv_cache=kv_cache)
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/whisper/model.py", line 85, in forward
    wv = self.qkv_attention(q, k, v, mask)
  File "/root/.cache/pypoetry/virtualenvs/whisper-asr-webservice-9TtSrW0h-py3.9/lib/python3.9/site-packages/whisper/model.py", line 97, in qkv_attention
    qk = qk + mask[:n_ctx, :n_ctx]
RuntimeError: The size of tensor a (67) must match the size of tensor b (3) at non-singleton dimension 3

Pinning poetry version

The recently released 1.4.1 version of poetry exposes an issue with some dependencies where package hashes/sizes are marked as invalid and installation fails with WheelFileValidationError, e.g. torch==1.12.0 - pytorch/pytorch#97153

While this behavior can be disabled by setting installer.modern-installation to false (python-poetry/poetry#7686) or downgrading from 1.4.1 to 1.4.0, pinning pip install poetry in Dockerfiles seems like a good idea in case of other breaking changes in the future.
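
Concretely, that means either pinning the version or keeping 1.4.1 and disabling the new installer (config key from python-poetry/poetry#7686):

pip install poetry==1.4.0
poetry config installer.modern-installation false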

exec /opt/poetry-venv/bin/poetry: exec format error

While running the CPU Dockerfile on AWS ECS Fargate, I'm getting this poetry exec format error. The Fargate task definition stays in the Pending state, and I cannot get it to the Running state. Any idea why?

Feature Request: Api key

It would be cool to add an API key.

Use case:
Host Whisper and restrict access via an API key.

E.g., an additional field could be set that is checked against the correct value, e.g. from an environment variable.
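
A minimal sketch of such a guard in FastAPI, checking a request header against an environment variable (all names here are illustrative):

import os
from fastapi import Depends, FastAPI, Header, HTTPException

API_KEY = os.environ.get("API_KEY")

async def verify_api_key(x_api_key: str = Header(None)):
    # Reject the request unless the X-Api-Key header matches the configured key.
    if API_KEY is None or x_api_key != API_KEY:
        raise HTTPException(status_code=401, detail="Invalid or missing API key")

app = FastAPI(dependencies=[Depends(verify_api_key)])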

RuntimeError: 2D or 3D (batch mode) tensor expected for input, but got: [ torch.FloatTensor{1,1,0} ]

I am getting the following error while running this on Mac ( Intel)

Run Command:
docker run -d -p 9000:9000 -e ASR_MODEL=base onerahmet/openai-whisper-asr-webservice:latest

Console Log:

[2022-12-19 19:47:42 +0000] [1] [INFO] Starting gunicorn 20.1.0
[2022-12-19 19:47:42 +0000] [1] [INFO] Listening at: http://0.0.0.0:9000 (1)
[2022-12-19 19:47:42 +0000] [1] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2022-12-19 19:47:42 +0000] [8] [INFO] Booting worker with pid: 8
100%|███████████████████████████████████████| 139M/139M [00:31<00:00, 4.62MiB/s]
[2022-12-19 19:48:16 +0000] [8] [INFO] Started server process [8]
[2022-12-19 19:48:16 +0000] [8] [INFO] Waiting for application startup.
[2022-12-19 19:48:16 +0000] [8] [INFO] Application startup complete.
/app/.venv/lib/python3.9/site-packages/whisper/transcribe.py:78: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
[2022-12-19 19:50:26 +0000] [8] [ERROR] Exception in ASGI application
Traceback (most recent call last):
  File "/app/.venv/lib/python3.9/site-packages/uvicorn/protocols/http/httptools_impl.py", line 404, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/app/.venv/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/app/.venv/lib/python3.9/site-packages/fastapi/applications.py", line 270, in __call__
    await super().__call__(scope, receive, send)
  File "/app/.venv/lib/python3.9/site-packages/starlette/applications.py", line 124, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/app/.venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/app/.venv/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/app/.venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 75, in __call__
    raise exc
  File "/app/.venv/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 64, in __call__
    await self.app(scope, receive, sender)
  File "/app/.venv/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/app/.venv/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/app/.venv/lib/python3.9/site-packages/starlette/routing.py", line 680, in __call__
    await route.handle(scope, receive, send)
  File "/app/.venv/lib/python3.9/site-packages/starlette/routing.py", line 275, in handle
    await self.app(scope, receive, send)
  File "/app/.venv/lib/python3.9/site-packages/starlette/routing.py", line 65, in app
    response = await func(request)
  File "/app/.venv/lib/python3.9/site-packages/fastapi/routing.py", line 231, in app
    raw_response = await run_endpoint_function(
  File "/app/.venv/lib/python3.9/site-packages/fastapi/routing.py", line 162, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/app/.venv/lib/python3.9/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/app/.venv/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/app/.venv/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/app/.venv/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/app/app/webservice.py", line 70, in transcribe
    result = run_asr(audio_file.file, task, language)
  File "/app/app/webservice.py", line 117, in run_asr
    result = model.transcribe(audio, **options_dict)
  File "/app/.venv/lib/python3.9/site-packages/whisper/transcribe.py", line 84, in transcribe
    mel = log_mel_spectrogram(audio)
  File "/app/.venv/lib/python3.9/site-packages/whisper/audio.py", line 115, in log_mel_spectrogram
    stft = torch.stft(audio, N_FFT, HOP_LENGTH, window=window, return_complex=True)
  File "/app/.venv/lib/python3.9/site-packages/torch/functional.py", line 630, in stft
    input = F.pad(input.view(extended_shape), [pad, pad], pad_mode)
RuntimeError: 2D or 3D (batch mode) tensor expected for input, but got: [ torch.FloatTensor{1,1,0} ]

Docker install issue

poetry install failed

#11 6.118   • Installing ffmpeg-python (0.2.0)
#11 6.120   • Installing h11 (0.14.0)
#11 6.120   • Installing httptools (0.5.0)
#11 6.122   • Installing iniconfig (2.0.0)
#11 6.124   • Installing pluggy (1.0.0)
#11 6.126   • Installing py (1.11.0)
#11 6.128   • Updating more-itertools (9.1.0 -> 9.0.0)
#11 6.129   • Installing python-dotenv (0.21.1)
#11 6.130   • Installing pydantic (1.10.4)
#11 6.135   • Updating setuptools (67.6.0 -> 67.1.0)
#11 6.136   • Installing toml (0.10.2)
#11 6.138   • Installing transformers (4.26.0)
#11 6.143   • Installing torch (1.12.0)
#11 6.145   • Installing starlette (0.20.4)
#11 6.151   • Installing uvloop (0.17.0)
#11 6.155   • Installing websockets (10.4)
#11 6.157   • Installing watchfiles (0.18.1)
#11 6.245 Connection pool is full, discarding connection: pypi.org. Connection pool size: 10
 
#11 6.259 Connection pool is full, discarding connection: pypi.org. Connection pool size: 10
#11 6.270 Connection pool is full, discarding connection: pypi.org. Connection pool size: 10
#11 6.276 Connection pool is full, discarding connection: pypi.org. Connection pool size: 10
#11 6.296 Connection pool is full, discarding connection: pypi.org. Connection pool size: 10
#11 6.338 Connection pool is full, discarding connection: pypi.org. Connection pool size: 10
 
#11 26.60
#11 26.60   _WheelFileValidationError
#11 26.60
#11 26.60   ["In /root/.cache/pypoetry/artifacts/7d/3e/e5/7917acdd005be9659419d0aaf31f173ded60517f74f643685cbb825ba8/torch-1.12.0-cp310-cp310-manylinux1_x86_64.whl, hash / size of torch-1.12.0.dist-info/METADATA didn't match RECORD"]
#11 26.60
#11 26.60   at .venv/lib/python3.10/site-packages/installer/sources.py:289 in validate_record
#11 26.62       285│                         f"In {self._zipfile.filename}, hash / size of {item.filename} didn't match RECORD"
#11 26.62       286│                     )
#11 26.62       287│
#11 26.62       288│         if issues:
#11 26.62     → 289│             raise _WheelFileValidationError(issues)
#11 26.62       290│
 
#11 26.62       291│     def get_contents(self) -> Iterator[WheelContentElement]:
#11 26.62       292│         """Sequential access to all contents of the wheel (including dist-info files).
#11 26.62       293│
#11 26.62
 
#11 ERROR: executor failed running [/bin/sh -c poetry install]: exit code: 1
-----
> [7/7] RUN poetry install:
-----
 
executor failed running [/bin/sh -c poetry install]: exit code: 1

Can't run on Ampere A100 40Gb

Hi,
I'm trying to install the package on an Ampere A100 40GB system, using the default Dockerfile.gpu file.
The image gets built but when the container starts I get the following logs:
[...]
NVIDIA A100-PCIE-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70
[...]

What do I have to modify to get it to run in this environment?
Many thanks.
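
For what it's worth, the sm_37–sm_70 list corresponds to an older CUDA build of PyTorch; a CUDA 11.7 build includes sm_80 (A100) kernels. The same install line quoted from the GPU Dockerfile in the Jetson issue above should work on an x86_64 host:

pip3 install torch==1.13.0+cu117 -f https://download.pytorch.org/whl/torch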

Running dual GPUs. Only one is used.

Processor 13th Gen Intel(R) Core(TM) i9-13900KF 3.00 GHz
Installed RAM 64.0 GB (63.8 GB usable)
System type 64-bit operating system, x64-based processor
GPU : Dual Nvidia Geforce 3090


PS C:\Users\rainm> docker run -it --gpus=all --rm nvidia/cuda:11.4.2-base-ubuntu20.04 nvidia-smi
Thu Feb  9 07:10:31 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.05    Driver Version: 528.24       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
|  0%   48C    P8    43W / 420W |  11306MiB / 24576MiB |     15%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:02:00.0 Off |                  N/A |
|  0%   48C    P8    38W / 420W |  11306MiB / 24576MiB |      4%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A         7      C   /python3.10                     N/A      |
|    1   N/A  N/A         7      C   /python3.10                     N/A      |
+-----------------------------------------------------------------------------+

2023-02-08 16:13:49 [2023-02-09 00:13:49 +0000] [1] [INFO] Starting gunicorn 20.1.0
2023-02-08 16:13:49 [2023-02-09 00:13:49 +0000] [1] [INFO] Listening at: http://0.0.0.0:9000 (1)
2023-02-08 16:13:49 [2023-02-09 00:13:49 +0000] [1] [INFO] Using worker: uvicorn.workers.UvicornWorker
2023-02-08 16:13:49 [2023-02-09 00:13:49 +0000] [7] [INFO] Booting worker with pid: 7
100%|█████████████████████████████████████| 2.87G/2.87G [02:13<00:00, 23.1MiB/s]
2023-02-08 16:16:18 [2023-02-09 00:16:18 +0000] [7] [INFO] Started server process [7]
2023-02-08 16:16:18 [2023-02-09 00:16:18 +0000] [7] [INFO] Waiting for application startup.
2023-02-08 16:16:18 [2023-02-09 00:16:18 +0000] [7] [INFO] Application startup complete.
2023-02-08 16:17:21 [2023-02-09 00:17:21 +0000] [7] [ERROR] Exception in ASGI application
2023-02-08 16:17:21 Traceback (most recent call last):
2023-02-08 16:17:21 File "/app/.venv/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 404, in run_asgi
2023-02-08 16:17:21 result = await app( # type: ignore[func-returns-value]
2023-02-08 16:17:21 File "/app/.venv/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
2023-02-08 16:17:21 return await self.app(scope, receive, send)
2023-02-08 16:17:21 File "/app/.venv/lib/python3.10/site-packages/fastapi/applications.py", line 270, in __call__
2023-02-08 16:17:21 await super().__call__(scope, receive, send)
2023-02-08 16:17:21 File "/app/.venv/lib/python3.10/site-packages/starlette/applications.py", line 124, in __call__
2023-02-08 16:17:21 await self.middleware_stack(scope, receive, send)
2023-02-08 16:17:21 File "/app/.venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
2023-02-08 16:17:21 raise exc
2023-02-08 16:17:21 File "/app/.venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
2023-02-08 16:17:21 await self.app(scope, receive, _send)
2023-02-08 16:17:21 File "/app/.venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 75, in __call__
2023-02-08 16:17:21 raise exc
2023-02-08 16:17:21 File "/app/.venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 64, in __call__
2023-02-08 16:17:21 await self.app(scope, receive, sender)
2023-02-08 16:17:21 File "/app/.venv/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
2023-02-08 16:17:21 raise e
2023-02-08 16:17:21 File "/app/.venv/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
2023-02-08 16:17:21 await self.app(scope, receive, send)
2023-02-08 16:17:21 File "/app/.venv/lib/python3.10/site-packages/starlette/routing.py", line 680, in __call__
2023-02-08 16:17:21 await route.handle(scope, receive, send)
2023-02-08 16:17:21 File "/app/.venv/lib/python3.10/site-packages/starlette/routing.py", line 275, in handle
2023-02-08 16:17:21 await self.app(scope, receive, send)
2023-02-08 16:17:21 File "/app/.venv/lib/python3.10/site-packages/starlette/routing.py", line 65, in app
2023-02-08 16:17:21 response = await func(request)
2023-02-08 16:17:21 File "/app/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 231, in app
2023-02-08 16:17:21 raw_response = await run_endpoint_function(
2023-02-08 16:17:21 File "/app/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 162, in run_endpoint_function
2023-02-08 16:17:21 return await run_in_threadpool(dependant.call, **values)
2023-02-08 16:17:21 File "/app/.venv/lib/python3.10/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
2023-02-08 16:17:21 return await anyio.to_thread.run_sync(func, *args)
2023-02-08 16:17:21 File "/app/.venv/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
2023-02-08 16:17:21 return await get_asynclib().run_sync_in_worker_thread(
2023-02-08 16:17:21 File "/app/.venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
2023-02-08 16:17:21 return await future
2023-02-08 16:17:21 File "/app/.venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
2023-02-08 16:17:21 result = context.run(func, *args)
2023-02-08 16:17:21 File "/app/app/webservice.py", line 71, in transcribe
2023-02-08 16:17:21 result = run_asr(audio_file.file, task, language, initial_prompt)
2023-02-08 16:17:21 File "/app/app/webservice.py", line 122, in run_asr
2023-02-08 16:17:21 result = model.transcribe(audio, **options_dict)
2023-02-08 16:17:21 File "/app/.venv/lib/python3.10/site-packages/whisper/transcribe.py", line 84, in transcribe
2023-02-08 16:17:21 mel = log_mel_spectrogram(audio)
2023-02-08 16:17:21 File "/app/.venv/lib/python3.10/site-packages/whisper/audio.py", line 115, in log_mel_spectrogram
2023-02-08 16:17:21 stft = torch.stft(audio, N_FFT, HOP_LENGTH, window=window, return_complex=True)
2023-02-08 16:17:21 File "/app/.venv/lib/python3.10/site-packages/torch/functional.py", line 630, in stft
2023-02-08 16:17:21 input = F.pad(input.view(extended_shape), [pad, pad], pad_mode)
2023-02-08 16:17:21 RuntimeError: 2D or 3D (batch mode) tensor expected for input, but got: [ torch.FloatTensor{1,1,0} ]
2023-02-08 16:18:29 [2023-02-09 00:18:29 +0000] [7] [ERROR] Exception in ASGI application (identical traceback repeated)
2023-02-08 16:19:21 [2023-02-09 00:19:21 +0000] [7] [ERROR] Exception in ASGI application (identical traceback repeated)
2023-02-08 23:06:56 [2023-02-09 07:06:56 +0000] [1] [INFO] Starting gunicorn 20.1.0
2023-02-08 23:06:56 [2023-02-09 07:06:56 +0000] [1] [INFO] Listening at: http://0.0.0.0:9000 (1)
2023-02-08 23:06:56 [2023-02-09 07:06:56 +0000] [1] [INFO] Using worker: uvicorn.workers.UvicornWorker
2023-02-08 23:06:56 [2023-02-09 07:06:56 +0000] [7] [INFO] Booting worker with pid: 7
2023-02-08 23:07:12 [2023-02-09 07:07:12 +0000] [7] [INFO] Started server process [7]
2023-02-08 23:07:12 [2023-02-09 07:07:12 +0000] [7] [INFO] Waiting for application startup.
2023-02-08 23:07:12 [2023-02-09 07:07:12 +0000] [7] [INFO] Application startup complete.

update the versions

Please update the versions of the applications that you use, as well as the whisper commit (7858aa9c08d98f75575035ecd6481f462d66ca27). And is it possible to use Python 3.11?

Transcription never finishes

I found your container after my own implementation kept stalling. Yours seems to be doing the same.

When run locally, whisper runs fine in a virtual env.

But when I run your container or my container, it stalls here for 20-50 minutes before returning a transcription:

2023-04-01 18:32:58 [2023-04-01 10:32:58 +0000] [1] [INFO] Starting gunicorn 20.1.0
2023-04-01 18:32:58 [2023-04-01 10:32:58 +0000] [1] [INFO] Listening at: http://0.0.0.0:9000 (1)
2023-04-01 18:32:58 [2023-04-01 10:32:58 +0000] [1] [INFO] Using worker: uvicorn.workers.UvicornWorker
2023-04-01 18:32:58 [2023-04-01 10:32:58 +0000] [7] [INFO] Booting worker with pid: 7
100%|███████████████████████████████████████| 139M/139M [00:07<00:00, 19.9MiB/s]
2023-04-01 18:33:07 [2023-04-01 10:33:07 +0000] [7] [INFO] Started server process [7]
2023-04-01 18:33:07 [2023-04-01 10:33:07 +0000] [7] [INFO] Waiting for application startup.
2023-04-01 18:33:07 [2023-04-01 10:33:07 +0000] [7] [INFO] Application startup complete.
2023-04-01 18:33:59 /app/.venv/lib/python3.10/site-packages/whisper/transcribe.py:78: UserWarning: FP16 is not supported on CPU; using FP32 instead
2023-04-01 18:33:59 warnings.warn("FP16 is not supported on CPU; using FP32 instead")
