macOS support (whisper_streaming issue, closed)

lucasjinreal commented on June 17, 2024
macOS support

from whisper_streaming.

Comments (20)

lucasjinreal commented on June 17, 2024

@Gldkslfmsd Hi, I ran: pip install sacremoses

It still fails with:

from mosestokenizer import MosesTokenizer
ModuleNotFoundError: No module named 'mosestokenizer'

jelmervdl commented on June 17, 2024

@lucasjinreal which versions of macOS, Python, and pip are you using?

I.e. what do these commands give you?

sw_vers
python3 --version
pip --version

Asking for pip specifically because with brew you can have a weird situation where pip doesn't use the python3 in your path, but the one shipped with macOS.
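To rule out that kind of mismatch, a small script can report which interpreter is running and which pip belongs to it. This is only a sketch: describe_environment is a made-up helper name, and it uses nothing beyond the standard library.

```python
import platform
import subprocess
import sys


def describe_environment():
    """Collect the interpreter and pip details that decide which wheels fit."""
    info = {
        "python_executable": sys.executable,
        "python_version": platform.python_version(),
        # Typically "arm64" on Apple silicon and "x86_64" on Intel Macs.
        "machine": platform.machine(),
    }
    try:
        # "python -m pip" runs the pip that belongs to this interpreter,
        # unlike a bare "pip" that happens to be first on PATH.
        result = subprocess.run(
            [sys.executable, "-m", "pip", "--version"],
            capture_output=True,
            text=True,
            check=True,
        )
        info["pip"] = result.stdout.strip()
    except (subprocess.CalledProcessError, OSError):
        info["pip"] = "pip is not available for this interpreter"
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the path printed in the pip line does not sit under the same installation as python_executable, then a bare pip install is probably targeting a different Python than the one you run.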

opus-fast-mosestokenizer should have wheels for your setup (it does for mine, using up-to-date macOS 14.0 and Python 3.11 from Homebrew). I don't think building the package from source using pip works, since it has quite a few dependencies that don't come in the source tar.gz.

Gldkslfmsd commented on June 17, 2024

OK. But we need sentence segmentation in this repo, and I'm afraid that sacremoses doesn't have it. Do you know another option, @jelmervdl ?

Speed is the first priority in this repo, second priority is having it working.

jelmervdl commented on June 17, 2024

Yes, that's fair. I mainly use other tools, not python libraries, for that, so no I don't really have an off-the-shelf alternative for you.

I'll have a look at whether I can make the setup.py behave more like a proper package that downloads and installs its dependencies when no wheel is available. If that works, I'll also see whether I can switch this repo to the cibuildwheel action, which can cross-compile ARM wheels on GitHub Actions.

Gldkslfmsd commented on June 17, 2024

@lucasjinreal -- for now, you can use this fork: #23
I cannot merge it in its current form because it breaks things on Unix, but it should work for you, although suboptimally and very slowly.

Gldkslfmsd commented on June 17, 2024

#20 ?
How about sacremoses?

Gldkslfmsd commented on June 17, 2024

OK. Then I'm sorry, I can't help with macOS support; I don't have access to macOS.

But generally, you can replace the tokenizer with another one that works. There are several implementations of mosestokenizer. I hope one of them works, although it won't be as fast as fast-mosestokenizer.

Gldkslfmsd commented on June 17, 2024

And are you sure that it doesn't work on macOS? Couldn't it be because of your Python version?

mingruimingrui/fast-mosestokenizer#4

lucasjinreal commented on June 17, 2024

@jelmervdl Hi, I'm using Python 3.10 here.

I ran pip install opus-fast-mosestokenizer and just got:

        ~~~~~^
      /opt/homebrew/Cellar/abseil/20230802.1/include/absl/utility/utility.h:198:12: error: no member named 'in_place_index' in namespace 'std'
      using std::in_place_index;
            ~~~~~^
      /opt/homebrew/Cellar/abseil/20230802.1/include/absl/utility/utility.h:199:12: error: no member named 'in_place_index_t' in namespace 'std'
      using std::in_place_index_t;
            ~~~~~^
      In file included from /private/var/folders/m_/kzyr4q_11cl35ngrj77k28f00000gn/T/pip-install-frrhajaq/opus-fast-mosestokenizer_f3c2b876b3ef46da81e2b520befe57f0/src/Tokenizer.cpp:3:
      In file included from /private/var/folders/m_/kzyr4q_11cl35ngrj77k28f00000gn/T/pip-install-frrhajaq/opus-fast-mosestokenizer_f3c2b876b3ef46da81e2b520befe57f0/include/mosestokenizer/Tokenizer.h:14:
      In file included from /opt/homebrew/Cellar/re2/20230901/include/re2/re2.h:219:
      /opt/homebrew/Cellar/abseil/20230802.1/include/absl/types/optional.h:48:12: error: no member named 'optional' in namespace 'std'
      using std::optional;
            ~~~~~^
      /opt/homebrew/Cellar/abseil/20230802.1/include/absl/types/optional.h:49:12: error: no member named 'make_optional' in namespace 'std'
      using std::make_optional;
            ~~~~~^
      /opt/homebrew/Cellar/abseil/20230802.1/include/absl/types/optional.h:50:12: error: no member named 'nullopt_t' in namespace 'std'; did you mean 'nullptr_t'?
      using std::nullopt_t;
            ~~~~~^~~~~~~~~
                 nullptr_t

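Apparently, (A) it doesn't provide a wheel for my setup, so it builds locally; and (B) it doesn't build with my clang version. It seems to be compiled with a very old C++ standard setting: the missing names, such as std::optional and std::in_place_index, are C++17 features.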

lucasjinreal commented on June 17, 2024

@Gldkslfmsd unfortunately, I tested jelmervdl's fork and it does not build on macOS. Many of its dependencies are bundled as tar.gz files, and some of them cannot be built on macOS at all. Besides, bundling your dependencies as tar.gz files is really bad practice (it's not a problem in itself, but shipping something like glibc as a tar.gz would cause many fundamental issues).

I still need help; none of the opus-fastxxx packages actually work on macOS.

Gldkslfmsd commented on June 17, 2024

@lucasjinreal how about installing from a wheel instead of running pip install opus-fast-mosestokenizer? That means downloading the .whl file from Jelmer's link and then following this: https://www.educative.io/answers/how-to-install-a-python-package-with-a-whl-file

lucasjinreal commented on June 17, 2024

@Gldkslfmsd thanks for the info. Unfortunately, I'm using an M1 MacBook Pro, and no wheel is provided for it.

However, I really wonder how anyone is able to build it... The opus-fast-mosestokenizer and fast-mosestokenizer code looks like a mess to me.

Is there a more build-friendly alternative library? Both opus-fast-mosestokenizer and fast-mosestokenizer seem to be no longer maintained, and opus-fast-mosestokenizer is just a fork with no way to report issues. Using them carries a very large risk.

jelmervdl commented on June 17, 2024

Aah, arm, yes, I hadn't thought of that. Unfortunately there are no wheels for m1 yet. Those should definitely be added.
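For context, pip decides whether a wheel fits by the platform tag at the end of its file name (the wheel naming convention from PEP 427). A rough sketch of that check follows, using hypothetical file names and a deliberately simplified matching rule; real pip relies on the full tag logic from the packaging library, and both function names here are made up for illustration.

```python
import platform


def wheel_platform_tags(filename):
    """Return the platform tags encoded in a wheel file name."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    # name-version[-build]-python-abi-platform.whl: the platform tag is last.
    platform_part = filename[: -len(".whl")].split("-")[-1]
    # Compressed tag sets are joined with dots.
    return platform_part.split(".")


def wheel_matches_machine(filename, machine=None):
    """Simplified check: does any platform tag cover this CPU architecture?"""
    machine = (machine or platform.machine()).lower()
    for tag in wheel_platform_tags(filename):
        if tag == "any":
            return True  # pure-Python wheel, no compiled code
        if tag.endswith("_" + machine):
            return True
        # universal2 macOS wheels carry both arm64 and x86_64 binaries.
        if tag.startswith("macosx_") and tag.endswith("_universal2"):
            if machine in ("arm64", "x86_64"):
                return True
    return False


# Hypothetical file names, for illustration only.
intel_wheel = "example_pkg-1.0-cp310-cp310-macosx_10_9_x86_64.whl"
arm_wheel = "example_pkg-1.0-cp310-cp310-macosx_11_0_arm64.whl"
print(wheel_matches_machine(intel_wheel, "arm64"))  # False
print(wheel_matches_machine(arm_wheel, "arm64"))    # True
```

When no published wheel matches, pip falls back to the source distribution and tries to compile it locally, which is where the build errors earlier in this thread come from.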

From what I understand, the plugin prefers to compile its own dependencies during build as opposed to using the shared libraries because often those don't allow static linking, which it needs for wheels.

The opus-fast-mosestokenizer repo is just a fork with the goal of fixing the installation issues encountered by the OPUS project and the HPLT partners that use it. I tried to get wheels built upstream, but the original fast-mosestokenizer repo seems to be unmaintained at the moment.

If you don't really need the speed, I'd say stick with something like sacremoses, which is better maintained and doesn't have all the dependencies.

Gldkslfmsd commented on June 17, 2024

OK, thanks, @jelmervdl.

Meanwhile, @lucasjinreal, you can try this: https://pypi.org/project/mosestokenizer/ It runs a Perl script in a background process. Does it work on your macOS? If yes, then we can integrate it into the create_tokenizer function by wrapping it to adapt the interface.

lucasjinreal commented on June 17, 2024

@Gldkslfmsd I was able to install it: Successfully installed mosestokenizer-1.2.1 openfile-0.0.7 toolwrapper-2.1.0 uctools-1.3.0
I'm not sure whether it can be called correctly. Having a fallback that doesn't require building C++ libs is a good choice.

@jelmervdl I'd really appreciate it if a setup.py that works for most distributions could be added. It would be useful for resolving the build issues of the opus-xxx libs. For me, the fast-xxx libs are really outdated, while the opus-xxx libs are hard to build: the build doesn't support C++14, but Google's Abseil needs it.

vincentwi commented on June 17, 2024

Hi, Mac M2 user here, following the thread.

Confirming that mosestokenizer builds on ARM and can then be called: https://pastebin.com/YrTNA0jc

Similarly, sacremoses builds as well, but as @Gldkslfmsd mentioned earlier, it doesn't do sentence segmentation.

Also confirming that opus-fast-mosestokenizer does not build for mac-arm. Full log here: https://pastebin.com/a10cW1jF

fast-mosestokenizer doesn't work either, but you seem to have already caught that: #5

ProductName: macOS
ProductVersion: 13.6
BuildVersion: 22G120
Python 3.10.13
pip 23.3

Gldkslfmsd commented on June 17, 2024

@lucasjinreal "Not sure can be called correctly or not."

Please see the Sample Usage section at https://pypi.org/project/mosestokenizer/ and try to reproduce the part with MosesSentenceSplitter in your Python. If that works, then it can be called correctly.

It seems that @vincentwi confirms it works on Mac, so now we need to integrate it. It's simple, in the create_tokenizer function, but I'm sorry, I'm busy for the next month.

"Have a fallback without building c++ libs is a good choice."

agree
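A minimal sketch of such a fallback, under two assumptions that this thread does not confirm: that the rest of whisper_streaming only needs an object with a split(text) method returning a list of sentences, and that MosesSentenceSplitter from the PyPI mosestokenizer package is a callable that takes a list of lines, as its Sample Usage suggests. SentenceSplitterAdapter and create_fallback_tokenizer are hypothetical names.

```python
class SentenceSplitterAdapter:
    """Give any 'list of lines -> list of sentences' callable a split(text) method."""

    def __init__(self, split_lines):
        self._split_lines = split_lines

    def split(self, text):
        text = text.strip()
        if not text:
            # Avoid handing empty lines to the underlying splitter.
            return []
        return list(self._split_lines([text]))


def create_fallback_tokenizer(lang):
    """Hypothetical fallback for when the fast C++ tokenizer cannot be built."""
    # Assumption: the PyPI "mosestokenizer" package is installed and exposes
    # MosesSentenceSplitter as a callable object, as in its Sample Usage.
    from mosestokenizer import MosesSentenceSplitter

    return SentenceSplitterAdapter(MosesSentenceSplitter(lang))


# Demonstration with a stand-in splitter, so this runs without the package.
demo = SentenceSplitterAdapter(
    lambda lines: " ".join(lines).replace("! ", "!\n").split("\n")
)
print(demo.split("Hello World! Hello again."))  # ['Hello World!', 'Hello again.']
```

The Sample Usage wraps the splitter in a with block; this sketch instead keeps it alive for as long as the tokenizer object exists, which trades tidy cleanup for not restarting the background Perl process on every call.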

lucasjinreal commented on June 17, 2024

@Gldkslfmsd any examples to follow?

Gldkslfmsd commented on June 17, 2024

closing -- same reason as #23

kgrusha commented on June 17, 2024

Here's the solution so people can find it while the pull request to opus-fast-mosestokenizer is pending:
