Comments (20)
@Gldkslfmsd Hi, I ran pip install sacremoses, but I still get:
from mosestokenizer import MosesTokenizer
ModuleNotFoundError: No module named 'mosestokenizer'
from whisper_streaming.
@lucasjinreal which versions of macOS and Python (and pip) are you using?
I.e. what do these commands give you?
sw_vers
python3 --version
pip --version
Asking for pip specifically because with brew you can have a weird situation where pip doesn't use the python3 in your path, but the one shipped with macOS.
opus-fast-mosestokenizer should have wheels for your setup (it does for mine, using up-to-date macOS 14.0 and Python 3.11 from Homebrew). I don't think building the package from source using pip works, since it has quite a few dependencies that don't come in the source tar.gz.
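The same facts can also be collected from inside Python, which sidesteps the question of which pip is on the PATH. A minimal sketch using only the standard library; the function name describe_environment is made up for this example and is not part of whisper_streaming:

```python
import platform
import subprocess
import sys


def describe_environment():
    """Collect the facts that decide which wheel pip can install."""
    info = {
        "python_executable": sys.executable,
        "python_version": platform.python_version(),
        # Typically "arm64" on Apple Silicon and "x86_64" on Intel Macs.
        "machine": platform.machine(),
        # Empty string on systems other than macOS.
        "macos_version": platform.mac_ver()[0],
    }
    try:
        # "python -m pip" runs the pip that belongs to this interpreter,
        # unlike a bare "pip", which may resolve to a different Python.
        result = subprocess.run(
            [sys.executable, "-m", "pip", "--version"],
            capture_output=True,
            text=True,
            check=True,
        )
        info["pip"] = result.stdout.strip()
    except (subprocess.CalledProcessError, OSError):
        info["pip"] = "pip is not available for this interpreter"
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the interpreter path printed here differs from the one a bare pip --version reports, the two belong to different Python installations.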
OK. But we need sentence segmentation in this repo, and I'm afraid that sacremoses doesn't have it. Do you know another option, @jelmervdl ?
Speed is the first priority in this repo, second priority is having it working.
Yes, that's fair. I mainly use other tools, not Python libraries, for that, so no, I don't really have an off-the-shelf alternative for you.
I'll have a look at whether I can make the setup.py behave more like a proper package that will download and install dependencies when it is not installed from a wheel. If that works, I'll also see whether I can switch this repo to the cibuildwheel action, which can do cross-compilation to make ARM wheels on GitHub Actions.
@lucasjinreal -- at this moment, you can use this fork: #23
I cannot merge it in its current form because it breaks things on Unix, but it should work for you, albeit suboptimally and very slowly.
#20 ?
How about sacremoses?
OK. Then I'm sorry, I can't help with macOS support; I don't have access to macOS.
But generally, you can replace the tokenizer with another one that works. There are other implementations of mosestokenizer. I hope that one of them works, although it will not be as fast as fast-mosestokenizer.
And are you sure that it doesn't work on macOS? Isn't it because of your Python version?
mingruimingrui/fast-mosestokenizer#4
@jelmervdl Hi, I am using Python 3.10 here.
Running pip install opus-fast-mosestokenizer, I just got:
~~~~~^
/opt/homebrew/Cellar/abseil/20230802.1/include/absl/utility/utility.h:198:12: error: no member named 'in_place_index' in namespace 'std'
using std::in_place_index;
~~~~~^
/opt/homebrew/Cellar/abseil/20230802.1/include/absl/utility/utility.h:199:12: error: no member named 'in_place_index_t' in namespace 'std'
using std::in_place_index_t;
~~~~~^
In file included from /private/var/folders/m_/kzyr4q_11cl35ngrj77k28f00000gn/T/pip-install-frrhajaq/opus-fast-mosestokenizer_f3c2b876b3ef46da81e2b520befe57f0/src/Tokenizer.cpp:3:
In file included from /private/var/folders/m_/kzyr4q_11cl35ngrj77k28f00000gn/T/pip-install-frrhajaq/opus-fast-mosestokenizer_f3c2b876b3ef46da81e2b520befe57f0/include/mosestokenizer/Tokenizer.h:14:
In file included from /opt/homebrew/Cellar/re2/20230901/include/re2/re2.h:219:
/opt/homebrew/Cellar/abseil/20230802.1/include/absl/types/optional.h:48:12: error: no member named 'optional' in namespace 'std'
using std::optional;
~~~~~^
/opt/homebrew/Cellar/abseil/20230802.1/include/absl/types/optional.h:49:12: error: no member named 'make_optional' in namespace 'std'
using std::make_optional;
~~~~~^
/opt/homebrew/Cellar/abseil/20230802.1/include/absl/types/optional.h:50:12: error: no member named 'nullopt_t' in namespace 'std'; did you mean 'nullptr_t'?
using std::nullopt_t;
~~~~~^~~~~~~~~
nullptr_t
Apparently, (A) it doesn't provide a wheel, so it builds locally; and (B) it doesn't work with my clang version; it seems to be using a very old C++ standard setting.
@Gldkslfmsd Unfortunately, I tested jelmervdl's fork and it cannot be built on macOS: many of its dependencies are bundled as tar.gz files, and some of them cannot be built on macOS at all. Besides, it is really bad practice to include your dependencies as tar.gz files (that is not the problem here, but shipping something like glibc as a tar.gz would cause many fundamental issues).
I still need help; none of the opus-fastxxx packages actually work on macOS.
@lucasjinreal How about installing from a wheel instead of running pip install opus-fast-mosestokenizer? That means downloading the .whl file from Jelmer's link and then following this: https://www.educative.io/answers/how-to-install-a-python-package-with-a-whl-file
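Before downloading a .whl by hand, it can help to check whether its platform tag matches the machine at all. The sketch below reads the tag from the file name, following the standard wheel naming scheme in which the platform tag is the last dash-separated field. The helper names and the example file names are hypothetical, and the check only compares CPU architecture, which is much cruder than what pip itself does:

```python
import platform


def wheel_platform_tag(wheel_filename):
    """Return the platform tag, the last dash-separated field of a wheel name."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {wheel_filename}")
    return wheel_filename[: -len(".whl")].split("-")[-1]


def wheel_may_fit_machine(wheel_filename, machine=None):
    """Rough architecture-only check; the operating system is not compared."""
    machine = (machine or platform.machine()).lower()
    tag = wheel_platform_tag(wheel_filename).lower()
    if tag == "any":
        return True  # pure-Python wheel, no compiled code
    if tag.endswith("universal2"):
        return machine in ("arm64", "x86_64")  # macOS fat binaries
    return tag.endswith(machine)


if __name__ == "__main__":
    # Hypothetical file names, used only to illustrate the naming scheme.
    for name in (
        "example_pkg-1.0-cp311-cp311-macosx_10_9_x86_64.whl",
        "example_pkg-1.0-cp311-cp311-macosx_11_0_arm64.whl",
    ):
        print(name, wheel_may_fit_machine(name))
```

If no wheel passes this check for the current machine, pip falls back to the source distribution and tries to compile it locally.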
@Gldkslfmsd Thanks for the info. Unfortunately, I am using an M1 MacBook Pro, and no such wheel is provided for it.
However, I really wonder how anyone is able to build it. The opus-fast-mosestokenizer and fast-mosestokenizer code looks like a mess to me.
Is there a more build-friendly alternative to them? Both opus-fast-mosestokenizer and fast-mosestokenizer seem to be no longer maintained, and opus-fast-mosestokenizer is just a fork without issue feedback. Using them carries a very large risk.
Aah, ARM, yes, I hadn't thought of that. Unfortunately there are no wheels for M1 yet. Those should definitely be added.
From what I understand, the plugin prefers to compile its own dependencies during build as opposed to using the shared libraries because often those don't allow static linking, which it needs for wheels.
The opus-fast-mosestokenizer repo is just a fork with the goal of fixing the installation issues encountered by the OPUS project and the HPLT partners that use it. I tried to get wheels built upstream, but the original fast-mosestokenizer repo seems to be unmaintained at the moment.
If you don't really need the speed, I'd say stick with something like sacremoses, which is better maintained and doesn't have all the dependencies.
OK, thanks, @jelmervdl .
Meanwhile, @lucasjinreal, you can try this: https://pypi.org/project/mosestokenizer/ It runs a Perl script in a background process. Does it work on your macOS? If yes, then we can integrate it into the create_tokenizer function by wrapping it to adapt the interface.
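A wrapper of that kind could look roughly like the sketch below. It assumes that the rest of whisper_streaming only needs an object with a split(text) method returning a list of sentences, and that MosesSentenceSplitter from the PyPI mosestokenizer package is called with a list of non-blank lines, as in its Sample Usage. The names SentenceSplitterAdapter and create_mosestokenizer_splitter are made up for this example:

```python
class SentenceSplitterAdapter:
    """Expose split(text) on top of a splitter that takes a list of lines."""

    def __init__(self, split_lines):
        # split_lines: callable mapping a list of lines to a list of sentences.
        self._split_lines = split_lines

    def split(self, text):
        # The wrapped splitter is assumed to reject newlines and blank lines,
        # so break the text into lines and drop the empty ones first.
        lines = [line for line in text.splitlines() if line.strip()]
        if not lines:
            return []
        return list(self._split_lines(lines))


def create_mosestokenizer_splitter(lang):
    """Build the adapter around the Perl-backed splitter (assumed API)."""
    # Imported lazily so that the adapter itself works without the package.
    from mosestokenizer import MosesSentenceSplitter

    return SentenceSplitterAdapter(MosesSentenceSplitter(lang))
```

Keeping the adapter independent of the import makes it possible to test the interface with a stand-in splitter on machines where the package is not installed.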
@Gldkslfmsd I was able to install it: Successfully installed mosestokenizer-1.2.1 openfile-0.0.7 toolwrapper-2.1.0 uctools-1.3.0
I'm not sure whether it can be called correctly. Having a fallback that doesn't require building C++ libs is a good choice.
@jelmervdl I would really appreciate a setup.py that works for most distributions. It would be useful for resolving the build issues with the opus-xxx libs. To me, the fast-xxx libs are really outdated, while the opus-xxx ones are hard to build: they don't support C++14, which Google's Abseil needs.
Hi, Mac M2 user here, following the thread.
Confirming that mosestokenizer installs on ARM and can thereafter be called: https://pastebin.com/YrTNA0jc
Similarly, sacremoses builds as well, but as @Gldkslfmsd mentioned earlier, it doesn't do sentence segmentation.
Also confirming opus-fast-mosestokenizer does not build for mac-arm. Full log here: https://pastebin.com/a10cW1jF
fast-mosestokenizer doesn't work either, but you seem to have already caught that: #5
ProductName: macOS
ProductVersion: 13.6
BuildVersion: 22G120
Python 3.10.13
pip 23.3
@lucasjinreal "I'm not sure whether it can be called correctly."
Please see the Sample Usage at https://pypi.org/project/mosestokenizer/ . Can you reproduce the part with MosesSentenceSplitter in your Python? If yes, then it can be called correctly.
It seems that @vincentwi confirms it works on Mac, so now we need to integrate it. It's simple, in the create_tokenizer function, but I'm sorry, I'm busy for the next month.
"Having a fallback that doesn't require building C++ libs is a good choice."
Agree.
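The fallback idea can be sketched independently of any particular tokenizer: try the fast backend first and move on to the next one if it cannot be created. This is only an illustration of the pattern, not code from whisper_streaming; create_with_fallback and the two stand-in factories are hypothetical names:

```python
def create_with_fallback(factories, lang):
    """Try each (name, factory) pair in order and return the first that works."""
    errors = []
    for name, factory in factories:
        try:
            return name, factory(lang)
        except Exception as exc:  # e.g. ImportError or a failing background process
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("no sentence splitter available: " + "; ".join(errors))


if __name__ == "__main__":
    def unavailable(lang):
        # Stand-in for a backend whose import fails, as in the traceback above.
        raise ModuleNotFoundError("No module named 'mosestokenizer'")

    def placeholder(lang):
        # Stand-in for a working backend; a real one would return a tokenizer.
        return f"tokenizer for {lang}"

    name, tokenizer = create_with_fallback(
        [("fast", unavailable), ("fallback", placeholder)], "en"
    )
    print(name, tokenizer)
```

Collecting the individual errors in the final exception keeps the original failure visible when every backend is missing.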
@Gldkslfmsd any examples to follow?
closing -- same reason as #23
Here's the solution so people can find it while the pull request to opus-fast-mosestokenizer is pending: