Comments (19)
> (Strange though, I have a clear memory of getting GPU support out of a `tensorflow` PyPI release. But maybe that was in an Nvidia Docker image, or TF 2.)

Behaviour changed between releases, so that explains it:
(Left: October 2019, right: February 2020)
from ocrd_all.
> For tensorflow 1.15.*, one can simply depend on `tensorflow-gpu == 1.15.*` for CPU **and** GPU support. I am not aware of any issues using `tensorflow-gpu`'s CPU fallback on CPU.

> But isn't that equally true for using `tensorflow == 1.15.*`? It is the variant with a `-gpu` suffix that is going to be dropped eventually IIUC.

Nah, they had recommended `tensorflow-gpu` for TF2 CPU+GPU, but changed it again to just `tensorflow` 🤣 So if `tensorflow == 1.15.*` has GPU support, I am happy with that convention, too.
> 1. https://github.com/OCR-D/ocrd_all#conflicting-requirements states that
Yes, that section needs to be updated (cf. #35). But the real problem is that TF2 dependencies are lurking everywhere, so we will very soon have the unacceptable state that no catch-all venv (satisfying both TF1 and TF2 modules) is possible anymore. By then, a new solution needs to be in place, which (at least partially) isolates venvs from each other again.
> 2. For tensorflow 1.15.*, one can simply depend on `tensorflow-gpu == 1.15.*` _for CPU **and** GPU_ support. I am not aware of any issues using `tensorflow-gpu`'s CPU fallback on CPU.

But isn't that equally true for using `tensorflow == 1.15.*`? It is the variant with a `-gpu` suffix that is going to be dropped eventually IIUC.
Is there a chance to upgrade everything to TensorFlow 2?
> Is there a chance to upgrade everything to TensorFlow 2?

Code migration is not so difficult – yes, that could be streamlined in a coordinated PR effort. But IIRC the hard problem is that models will be incompatible and thus have to be retrained. This is something the module providers themselves have to decide on, whether and when it is prudent. And it's highly unlikely the time frames will converge.
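For the code-migration half of that, TF2 ships a conversion script, `tf_upgrade_v2`, which rewrites TF1-style API calls to their `tf.compat.v1` equivalents. A sketch of its use (the module paths here are illustrative, not from this thread, and the tool requires a TF 2.x installation):

```shell
# tf_upgrade_v2 is installed alongside the tensorflow 2.x package.
# Input/output trees below are hypothetical example paths.
tf_upgrade_v2 \
    --intree ocrd_some_module/ \
    --outtree ocrd_some_module_tf2/ \
    --reportfile tf2_migration_report.txt
# The report file lists every rewritten call and flags spots that
# still need manual attention. Note: this converts Python code only,
# NOT trained models -- which is exactly the hard part discussed here.
```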
Of course there is a chance, it just involves quite a bit of work. For maintained software like ocrd_calamari:
- Training a new model for a week (done)
- Updating
- Testing
- Proper evaluation (no regression?)

This stuff is a. not super high on priority lists because of effort vs. benefit, b. takes time and c. sometimes depends on other software involved. ocrd_all will always have to deal with version conflicts.
And I imagine there are research projects that have no maintenance anymore, or maybe just some poor PhD student with other priorities.
> But isn't that equally true for using `tensorflow == 1.15.*`?

I do not get GPU support with that, only CPU. With `tensorflow-gpu == 1.15.*` I have no issues. But I'll try again after lunch, to make sure.
> But IIRC the hard problem is that models will be incompatible and thus have to be retrained.

Maybe existing models can be converted, too?
> > But IIRC the hard problem is that models will be incompatible and thus have to be retrained.
>
> Maybe existing models can be converted, too?

In some cases this is possible. But not for e.g. Calamari 0.3.5 → 1.0, unless they support it.
> > But isn't that equally true for using `tensorflow == 1.15.*`?
>
> I do not get GPU support with that, only CPU. With `tensorflow-gpu == 1.15.*` I have no issues. But I'll try again after lunch, to make sure.

Alright, these are my results using the below script:
```
== tensorflow==1.15.*, CUDA_VISIBLE_DEVICES='0'
Already using interpreter /usr/bin/python3
2020-02-25 17:21:35.205395: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-25 17:21:35.220274: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2799925000 Hz
2020-02-25 17:21:35.220640: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x564f5da0e220 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-02-25 17:21:35.220655: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
GPU available: False
```
```
== tensorflow==1.15.*, CUDA_VISIBLE_DEVICES=''
Already using interpreter /usr/bin/python3
2020-02-25 17:21:55.577941: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-25 17:21:55.593243: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2799925000 Hz
2020-02-25 17:21:55.593497: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5594505bb720 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-02-25 17:21:55.593532: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
GPU available: False
```
```
== tensorflow-gpu==1.15.*, CUDA_VISIBLE_DEVICES='0'
Already using interpreter /usr/bin/python3
2020-02-25 17:22:27.264675: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-25 17:22:27.281148: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2799925000 Hz
2020-02-25 17:22:27.281383: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b5f6815f70 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-02-25 17:22:27.281398: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-02-25 17:22:27.282909: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-02-25 17:22:27.424313: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b5f68a56b0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-02-25 17:22:27.424336: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 2080, Compute Capability 7.5
2020-02-25 17:22:27.424711: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.86
pciBusID: 0000:01:00.0
2020-02-25 17:22:27.424872: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-02-25 17:22:27.425769: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-02-25 17:22:27.426610: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-02-25 17:22:27.426867: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-02-25 17:22:27.428707: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-02-25 17:22:27.430106: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-02-25 17:22:27.433060: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-25 17:22:27.433717: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-02-25 17:22:27.433752: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-02-25 17:22:27.434268: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-25 17:22:27.434279: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 0
2020-02-25 17:22:27.434284: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0: N
2020-02-25 17:22:27.434897: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/device:GPU:0 with 6786 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5)
GPU available: True
```
```
== tensorflow-gpu==1.15.*, CUDA_VISIBLE_DEVICES=''
Already using interpreter /usr/bin/python3
2020-02-25 17:22:58.971329: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-25 17:22:58.987226: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2799925000 Hz
2020-02-25 17:22:58.987497: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x558cc0be40d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-02-25 17:22:58.987526: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-02-25 17:22:58.989005: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-02-25 17:22:58.992375: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-02-25 17:22:58.992396: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: b-pc30533
2020-02-25 17:22:58.992402: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: b-pc30533
2020-02-25 17:22:58.992431: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 440.59.0
2020-02-25 17:22:58.992449: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 440.59.0
2020-02-25 17:22:58.992455: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 440.59.0
GPU available: False
```
Script:
```shell
#!/bin/sh
for package in "tensorflow==1.15.*" "tensorflow-gpu==1.15.*"; do
    for CUDA_VISIBLE_DEVICES in "0" ""; do
        echo "== $package, CUDA_VISIBLE_DEVICES='$CUDA_VISIBLE_DEVICES'"
        export CUDA_VISIBLE_DEVICES
        venv=/tmp/tmp.$RANDOM
        virtualenv --quiet -p /usr/bin/python3 $venv
        . $venv/bin/activate
        pip3 install --quiet --upgrade pip
        pip3 install --quiet "$package"
        python3 -c 'import tensorflow as tf; print("GPU available:", tf.test.is_gpu_available())'
    done
done
```
So, `tensorflow-gpu==1.15.*` is the right choice for TF1; it gives GPU and CPU support. (The script does not check for CPU support, but I know that `-gpu` works for CPU too.)
> So, `tensorflow-gpu==1.15.*` is the right choice for TF1; it gives GPU and CPU support. (The script does not check for CPU support, but I know that `-gpu` works for CPU too.)

Indeed! We should open issues/PRs to all directly or indirectly affected module repos.
(Strange though, I have a clear memory of getting GPU support out of a `tensorflow` PyPI release. But maybe that was in an Nvidia Docker image, or TF 2.)
> With tensorflow-gpu == 1.15.* I have no issues.

Bad news: With `tensorflow-gpu==1.15.*` I have issues because it does not work on macOS. `tensorflow==1.15.*` works fine there.
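One way to paper over the macOS gap without touching any wheels could be PEP 508 environment markers in the requirements, selecting the package per platform. A sketch (the platform split itself is my suggestion, not something decided in this thread; pins as discussed above):

```
tensorflow-gpu == 1.15.* ; sys_platform != "darwin"
tensorflow == 1.15.* ; sys_platform == "darwin"
```

pip evaluates the marker at install time, so the same requirements file would install `tensorflow-gpu` on Linux (GPU plus CPU fallback) and plain `tensorflow` on macOS (CPU only). It does not help modules that hard-code `tensorflow-gpu` in their own `install_requires`, though.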
> > With tensorflow-gpu == 1.15.* I have no issues.
>
> Bad news: With `tensorflow-gpu==1.15.*` I have issues because it does not work on macOS. `tensorflow==1.15.*` works fine there.

These TF devs keep driving me mad. I thought we had this solved by now.
Okay, can you re-label the prebuilt `tensorflow` as `tensorflow-gpu` somehow?
Or should we build our own TF wheels under the correct name for macOS and include them in the supply chain?
> Okay, can you re-label the prebuilt `tensorflow` as `tensorflow-gpu` somehow?

Yes, that is possible. Of course there remains the conflict between TF1 and TF2, so the resulting installation won't work.
Building TF is a nightmare. It takes days for ARM, and I expect many hours for macOS.
> > Okay, can you re-label the prebuilt `tensorflow` as `tensorflow-gpu` somehow?
>
> Yes, that is possible. Of course there remains the conflict between TF1 and TF2, so the resulting installation won't work.

I don't think this is the right approach. First of all, you don't discriminate the version you are delegating to. And second, this requires installing `tensorflow` from the same base version (which, yes, then makes it impossible to have both TF1 and TF2 installed at the same time).
I was thinking along the lines of modifying the name in the official wheel.
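The "modifying the name in the official wheel" idea can be sketched in a few lines: copy the wheel archive while rewriting the `Name:` field in its METADATA. This is a minimal illustration only, with a hypothetical helper name; a real re-label would also have to rename the `.dist-info` directory, update the `RECORD` hashes, and rename the wheel file itself:

```python
import re
import zipfile


def relabel_wheel(src, dst, new_name):
    """Copy wheel archive src to dst, rewriting Name: in METADATA.

    Sketch only: it leaves the .dist-info directory name, the RECORD
    hashes and the wheel filename untouched, all of which a complete
    re-label would have to fix as well.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename.endswith(".dist-info/METADATA"):
                # Replace only the first "Name: ..." header line.
                text = data.decode("utf-8")
                text = re.sub(r"^Name:.*$", "Name: " + new_name,
                              text, count=1, flags=re.M)
                data = text.encode("utf-8")
            zout.writestr(item, data)
```

The attraction over building from source is that this runs in seconds on any platform, at the cost of producing a wheel whose internal metadata is only partially consistent.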
> Building TF is a nightmare. It takes days for ARM, and I expect many hours for macOS.

I know. And it never quite works out of the box as documented (at least for me). Too fast to die, too slow to live.
But building from scratch trivially gives you whatever package name you want. (So we could have `tensorflow` for TF2 and `tensorflow-gpu` for TF1 – even if it does not have actual GPU support on macOS.) But I am still more inclined to the wheel patching approach.
@kba your thoughts?
So, except for ARM and macOS and Python 3.8 support (it just keeps growing) – which we should probably discuss in #147 – I think this has been solved by #118. @mikegerber can we close?