word-level timestamps (faster-whisper issue, CLOSED)

systran commented on May 11, 2024
word-level timestamps

from faster-whisper.

Comments (8)

guillaumekln commented on May 11, 2024

I just pushed an experimental branch implementing word-level timestamps! It would be great if you can test this early.

Note that I implemented exactly the same logic as openai/whisper. So if there is a strange result and openai/whisper has the same result, you should report the issue to openai/whisper and not here.

Here's how you can test this today:

Install the development branch of faster-whisper

pip install --force-reinstall "faster-whisper[conversion] @ https://github.com/guillaumekln/faster-whisper/archive/refs/heads/word-level-timestamps.tar.gz"

Install the development build of CTranslate2

  1. Go to this build page
  2. Download the artifact "python-wheels"
  3. Extract the archive
  4. Install the wheel matching your system and Python version, for example:
pip install --force-reinstall ctranslate2-3.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
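After two forced reinstalls it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `report` are made up for illustration, and the package names are taken from the pip commands above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages=("faster-whisper", "ctranslate2")):
    """Map each package name to its installed version (None if missing)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name}: {version or 'not installed'}")
```

If the CTranslate2 line does not show the development build you installed from the wheel, the wheel was probably installed into a different environment.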

Reconvert the model

The model should be converted again with the latest version of CTranslate2 as the configuration needs to be updated with additional information:

ct2-transformers-converter --model openai/whisper-large-v2 --output_dir whisper-large-v2-ct2 --copy_files tokenizer.json --quantization float16
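A quick sanity check of the converted directory might look like the sketch below. The file names are an assumption: `tokenizer.json` comes from `--copy_files` in the command above, while `model.bin` and `config.json` are what the converter usually writes. The comment above does not say which configuration keys were added, so `config_keys` only lists what is there for manual comparison with an older conversion. Both helper names are hypothetical.

```python
import json
from pathlib import Path

# Assumption: a converted model directory holds these files. tokenizer.json
# comes from --copy_files; the other two names are the usual converter output.
EXPECTED_FILES = ("model.bin", "config.json", "tokenizer.json")


def missing_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


def config_keys(model_dir):
    """Return the sorted top-level keys of config.json for manual inspection."""
    with open(Path(model_dir) / "config.json", encoding="utf-8") as handle:
        return sorted(json.load(handle))


if __name__ == "__main__":
    print("missing:", missing_files("whisper-large-v2-ct2"))
```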

Transcribe with word-level timestamps

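from faster_whisper import WhisperModel

model = WhisperModel("whisper-large-v2-ct2")  # output_dir of the conversion step
audio_path = "audio.mp3"  # placeholder: path to your audio file
segments, _ = model.transcribe(audio_path, word_timestamps=True)

for segment in segments:
    print(segment.words)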
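For easier reading of the output, the words can be printed one per line. This is only a sketch: it assumes each entry in `segment.words` exposes `start`, `end` and `word` attributes (mirroring the fields openai/whisper returns), and the helper names `format_word`, `iter_word_lines` and `main` are made up for illustration.

```python
def format_word(word):
    """Render one word as '[start -> end] text' with times in seconds."""
    return f"[{word.start:.2f}s -> {word.end:.2f}s] {word.word.strip()}"


def iter_word_lines(segments):
    """Yield one formatted line per word across all segments."""
    for segment in segments:
        # Treat a missing word list (None) as empty.
        for word in segment.words or []:
            yield format_word(word)


def main(model_path, audio_path):
    # Imported lazily so the helpers above work without the library installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_path)
    segments, _ = model.transcribe(audio_path, word_timestamps=True)
    for line in iter_word_lines(segments):
        print(line)
```

Calling `main("whisper-large-v2-ct2", "audio.mp3")` would print the words for a placeholder audio file using the model directory from the conversion step.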

guillaumekln commented on May 11, 2024

Hi,

Word-level timestamps are currently not possible. They usually require extensions to the model that are not implemented at this time.

tohe91 commented on May 11, 2024

Thank you for the amazing work on this!
It would be amazing if word-level timestamps could be implemented in faster-whisper once the word-level-timestamps branch is merged into main in whisper.

collynce commented on May 11, 2024

Just checked out the whisper repo and the word-level timestamp PR has been merged. It would be great indeed to have the same in faster-whisper.

Great work!

eschmidbauer commented on May 11, 2024

just tested this with the tiny model and it worked!
going to do more tests but this is great, thanks so much for sharing!

eschmidbauer commented on May 11, 2024

large-v2 seems to work too. Thanks again

Jeronymous commented on May 11, 2024

When I tested word timestamps on a bunch of files, I saw this error in some corner cases:

  File "/usr/local/lib/python3.10/site-packages/faster_whisper/transcribe.py", line 531, in add_word_timestamps
    alignment = self.find_alignment(tokenizer, text_tokens, mel, num_frames)
  File "/usr/local/lib/python3.10/site-packages/faster_whisper/transcribe.py", line 598, in find_alignment
    start_times = jump_times[word_boundaries[:-1]]
IndexError: index 1 is out of bounds for axis 0 with size 1

guillaumekln commented on May 11, 2024

Thank you for testing!

Can you confirm that the same file works without issue in openai/whisper? If so, could you share the input file?
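To narrow down which inputs trigger the error when testing many files, a small harness along these lines could help. It is a sketch, not part of faster-whisper: `run_batch` and the `transcribe_file` callback are hypothetical names. Because the segments are produced lazily, the callback should iterate over all of them so that any exception is raised inside the `try` block.

```python
import traceback


def run_batch(paths, transcribe_file):
    """Run transcribe_file on each path; return (succeeded, failures).

    failures maps each failing path to its formatted traceback, which can be
    attached to a bug report together with the input file.
    """
    succeeded, failures = [], {}
    for path in paths:
        try:
            transcribe_file(path)
            succeeded.append(path)
        except Exception:
            failures[path] = traceback.format_exc()
    return succeeded, failures
```

The same list of failing paths can then be run through openai/whisper to see whether the problem reproduces there.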
