Tensorflow CPU vs GPU (ocrd_all issue, 19 comments, closed)

ocr-d commented on September 27, 2024
Tensorflow CPU vs GPU

from ocrd_all.

Comments (19)

mikegerber commented on September 27, 2024

> (Strange though, I have a clear memory of getting GPU support out of a tensorflow PyPI release. But maybe that was in an Nvidia Docker image, or TF 2.)

Behaviour changed between releases, so that explains it:

https://web.archive.org/web/diff/20191015141958/20191208214348/https://www.tensorflow.org/install/pip

[image: archived versions of the TensorFlow pip install page side by side. Left: October 2019, right: February 2020]

mikegerber commented on September 27, 2024
> 1. For tensorflow 1.15.*, one can simply depend on tensorflow-gpu == 1.15.* for CPU and GPU support. I am not aware of any issues using tensorflow-gpu's CPU fallback on CPU

> But isn't that equally true for using tensorflow == 1.15.*? It is the variant with a -gpu suffix that is going to be dropped eventually IIUC.

Nah, they had recommended tensorflow-gpu for TF2 CPU+GPU but changed it again to just tensorflow 🤣 So if tensorflow == 1.15.* has GPU support I am happy with that convention, too.

bertsky commented on September 27, 2024
> 1. https://github.com/OCR-D/ocrd_all#conflicting-requirements states that

Yes, that section needs to be updated (cf. #35). But the real problem is that TF2 dependencies are lurking everywhere, so we will very soon have the unacceptable state that no catch-all venv (satisfying both TF1 and TF2 modules) is possible anymore. By then, a new solution needs to be in place, which (at least partially) isolates venvs from each other again.

> 2. For tensorflow 1.15.*, one can simply depend on `tensorflow-gpu == 1.15.*` _for CPU **and** GPU_ support. I am not aware of any issues using `tensorflow-gpu`'s CPU fallback on CPU

But isn't that equally true for using tensorflow == 1.15.*? It is the variant with a -gpu suffix that is going to be dropped eventually IIUC.

stweil commented on September 27, 2024

Is there a chance to upgrade everything to Tensorflow 2?

bertsky commented on September 27, 2024

> Is there a chance to upgrade everything to Tensorflow 2?

Code migration is not so difficult – yes, that could be streamlined in a coordinated PR effort. But IIRC the hard problem is that models will be incompatible and thus have to be retrained. This is something that the module providers have to decide on whether and when it is prudent themselves. And it's highly unlikely the time frames will converge.

mikegerber commented on September 27, 2024

Of course there is a chance, it just involves quite a bit of work. For a maintained software like
ocrd_calamari:

  • Training a new model for a week (done)
  • Updating
  • Testing
  • Proper evaluation (no regression?)

This stuff is (a) not super high on priority lists because of effort vs. benefit, (b) takes time, and (c) sometimes depends on other software involved. ocrd_all will always have to deal with version conflicts.

And I imagine there are research projects that have no maintenance anymore, or maybe just some poor PhD student with other priorities.

mikegerber commented on September 27, 2024

> But isn't that equally true for using tensorflow == 1.15.*?

I do not get GPU support with that, only CPU. With tensorflow-gpu == 1.15.* I have no issues. But I'll try again after lunch, to make sure.

stweil commented on September 27, 2024

> But IIRC the hard problem is that models will be incompatible and thus have to be retrained.

Maybe existing models can be converted, too?

mikegerber commented on September 27, 2024

> > But IIRC the hard problem is that models will be incompatible and thus have to be retrained.
>
> Maybe existing models can be converted, too?

In some cases this is possible. But not for e.g. Calamari 0.3.5 → 1.0, unless they support it.

mikegerber commented on September 27, 2024

> > But isn't that equally true for using tensorflow == 1.15.*?
>
> I do not get GPU support with that, only CPU. With tensorflow-gpu == 1.15.* I have no issues. But I'll try again after lunch, to make sure.

Alright, these are my results using the below script:

== tensorflow==1.15.*, CUDA_VISIBLE_DEVICES='0'
Already using interpreter /usr/bin/python3
2020-02-25 17:21:35.205395: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-25 17:21:35.220274: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2799925000 Hz
2020-02-25 17:21:35.220640: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x564f5da0e220 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-02-25 17:21:35.220655: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
GPU available: False
== tensorflow==1.15.*, CUDA_VISIBLE_DEVICES=''
Already using interpreter /usr/bin/python3
2020-02-25 17:21:55.577941: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-25 17:21:55.593243: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2799925000 Hz
2020-02-25 17:21:55.593497: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5594505bb720 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-02-25 17:21:55.593532: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
GPU available: False
== tensorflow-gpu==1.15.*, CUDA_VISIBLE_DEVICES='0'
Already using interpreter /usr/bin/python3
2020-02-25 17:22:27.264675: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-25 17:22:27.281148: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2799925000 Hz
2020-02-25 17:22:27.281383: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b5f6815f70 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-02-25 17:22:27.281398: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-02-25 17:22:27.282909: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-02-25 17:22:27.424313: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b5f68a56b0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-02-25 17:22:27.424336: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2080, Compute Capability 7.5
2020-02-25 17:22:27.424711: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.86
pciBusID: 0000:01:00.0
2020-02-25 17:22:27.424872: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-02-25 17:22:27.425769: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-02-25 17:22:27.426610: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-02-25 17:22:27.426867: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-02-25 17:22:27.428707: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-02-25 17:22:27.430106: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-02-25 17:22:27.433060: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-25 17:22:27.433717: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-02-25 17:22:27.433752: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-02-25 17:22:27.434268: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-25 17:22:27.434279: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 
2020-02-25 17:22:27.434284: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N 
2020-02-25 17:22:27.434897: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/device:GPU:0 with 6786 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5)
GPU available: True
== tensorflow-gpu==1.15.*, CUDA_VISIBLE_DEVICES=''
Already using interpreter /usr/bin/python3
2020-02-25 17:22:58.971329: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-25 17:22:58.987226: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2799925000 Hz
2020-02-25 17:22:58.987497: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x558cc0be40d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-02-25 17:22:58.987526: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-02-25 17:22:58.989005: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-02-25 17:22:58.992375: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-02-25 17:22:58.992396: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: b-pc30533
2020-02-25 17:22:58.992402: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: b-pc30533
2020-02-25 17:22:58.992431: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 440.59.0
2020-02-25 17:22:58.992449: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 440.59.0
2020-02-25 17:22:58.992455: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 440.59.0
GPU available: False

Script:

#!/bin/sh
# Compare the tensorflow and tensorflow-gpu 1.15 wheels, each in a fresh
# throwaway virtualenv, with and without a GPU exposed via CUDA_VISIBLE_DEVICES.
for package in "tensorflow==1.15.*" "tensorflow-gpu==1.15.*"; do
  for CUDA_VISIBLE_DEVICES in "0" ""; do

    echo "== $package, CUDA_VISIBLE_DEVICES='$CUDA_VISIBLE_DEVICES'"

    export CUDA_VISIBLE_DEVICES

    # $RANDOM is not POSIX sh, so use mktemp for the throwaway venv
    venv=$(mktemp -d)
    virtualenv --quiet -p /usr/bin/python3 "$venv"
    . "$venv/bin/activate"

    pip3 install --quiet --upgrade pip
    pip3 install --quiet "$package"

    python3 -c 'import tensorflow as tf; print("GPU available:", tf.test.is_gpu_available())'

    deactivate
    rm -rf "$venv"
  done
done

mikegerber commented on September 27, 2024

So, tensorflow-gpu==1.15.* is the right choice for TF1, it gives GPU and CPU support. (The script does not check for CPU support, I know that -gpu works for CPU too)

bertsky commented on September 27, 2024

> So, tensorflow-gpu==1.15.* is the right choice for TF1, it gives GPU and CPU support. (The script does not check for CPU support, I know that -gpu works for CPU too)

Indeed! We should open issues/PRs to all directly or indirectly affected module repos.

(Strange though, I have a clear memory of getting GPU support out of a tensorflow PyPI release. But maybe that was in an Nvidia Docker image, or TF 2.)

stweil commented on September 27, 2024

> With tensorflow-gpu == 1.15.* I have no issues.

Bad news: With tensorflow-gpu==1.15.* I have issues because it does not work on macOS. tensorflow==1.15.* works fine there.
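
If both findings hold (tensorflow-gpu on Linux, plain tensorflow on macOS), one way to express that in a single dependency list is a PEP 508 environment marker. A hypothetical requirements.txt sketch, not what ocrd_all actually uses:

```
# pick the TF1 wheel per platform (illustrative sketch)
tensorflow-gpu == 1.15.* ; sys_platform != "darwin"
tensorflow == 1.15.* ; sys_platform == "darwin"
```

pip evaluates the markers at install time and skips whichever requirement does not match the running platform.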

bertsky commented on September 27, 2024

> > With tensorflow-gpu == 1.15.* I have no issues.
>
> Bad news: With tensorflow-gpu==1.15.* I have issues because it does not work on macOS. tensorflow==1.15.* works fine there.

These TF devs keep driving me mad. I thought we had this solved by now.

Okay, can you re-label the prebuilt tensorflow as tensorflow-gpu somehow?
Or should we build our own TF wheels under the correct name for macOS and include them in the supply chain?

stweil commented on September 27, 2024

> Okay, can you re-label the prebuilt tensorflow as tensorflow-gpu somehow?

Yes, that is possible. Of course there remains the conflict between TF1 and TF2, so the resulting installation won't work.
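
One possible shape of that re-labelling is a local shim distribution that publishes the name tensorflow-gpu but merely depends on the plain tensorflow wheel. A sketch only: the directory name and the version pin are illustrative assumptions, not stweil's actual setup.

```shell
# create a minimal shim package "tensorflow-gpu" that just pulls in
# the plain tensorflow wheel of the same TF1 version
mkdir -p tf-gpu-shim
cat > tf-gpu-shim/setup.py <<'EOF'
from setuptools import setup

setup(
    name="tensorflow-gpu",
    version="1.15.0",
    # delegate to the real wheel; the TF1/TF2 conflict remains, because
    # both still resolve to the same "tensorflow" distribution name
    install_requires=["tensorflow == 1.15.*"],
)
EOF
# afterwards: pip3 install ./tf-gpu-shim
```

This satisfies any module that requires tensorflow-gpu on macOS, at the cost of the version-pinning and coexistence caveats discussed in the following comments.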

stweil commented on September 27, 2024

Building TF is a nightmare. It takes days for ARM, and I expect many hours for macOS.

bertsky commented on September 27, 2024

> > Okay, can you re-label the prebuilt tensorflow as tensorflow-gpu somehow?
>
> Yes, that is possible. Of course there remains the conflict between TF1 and TF2, so the resulting installation won't work.

I don't think this is the right approach. First of all, you don't distinguish which version you are delegating to. And second, it requires installing tensorflow of the same base version (which, yes, then makes it impossible to have both TF1 and TF2 installed at the same time).

I was thinking along the lines of modifying the name in the official wheel.

bertsky commented on September 27, 2024

> Building TF is a nightmare. It takes days for ARM, and I expect many hours for macOS.

I know. And it never quite works out of the box as documented (at least for me). Too fast to die, too slow to live.

But building from scratch trivially gives you whatever package name you want. (So we could have tensorflow for TF2 and tensorflow-gpu for TF1 – even if it does not have actual GPU support on macOS.) But I am still more inclined to the wheel patching approach.

@kba your thoughts?

bertsky commented on September 27, 2024

So, except for ARM and macOS and Python 3.8 support (it just keeps growing) – which we should probably discuss in #147 – I think this has been solved by #118. @mikegerber can we close?
