Tensorflow CPU vs GPU (ocrd_all issue, 19 comments, closed)

ocr-d commented on September 27, 2024
Tensorflow CPU vs GPU

from ocrd_all.

Comments (19)

mikegerber commented on September 27, 2024

> (Strange though, I have a clear memory of getting GPU support out of a tensorflow PyPI release. But maybe that was in an Nvidia Docker image, or TF 2.)

Behaviour changed between releases, so that explains it:

https://web.archive.org/web/diff/20191015141958/20191208214348/https://www.tensorflow.org/install/pip

[image: archived versions of the TensorFlow pip install page side by side. Left: October 2019, right: February 2020]

mikegerber commented on September 27, 2024
> 1. For tensorflow 1.15.*, one can simply depend on tensorflow-gpu == 1.15.* for CPU and GPU support. I am not aware of any issues using tensorflow-gpu's CPU fallback on CPU

> But isn't that equally true for using tensorflow == 1.15.*? It is the variant with a -gpu suffix that is going to be dropped eventually IIUC.

Nah, they had recommended tensorflow-gpu for TF2 CPU+GPU but changed it again to just tensorflow 🤣 So if tensorflow == 1.15.* has GPU support I am happy with that convention, too.

bertsky commented on September 27, 2024
> 1. https://github.com/OCR-D/ocrd_all#conflicting-requirements states that

Yes, that section needs to be updated (cf. #35). But the real problem is that TF2 dependencies are lurking everywhere, so we will very soon have the unacceptable state that no catch-all venv (satisfying both TF1 and TF2 modules) is possible anymore. By then, a new solution needs to be in place, which (at least partially) isolates venvs from each other again.

> 2. For tensorflow 1.15.*, one can simply depend on `tensorflow-gpu == 1.15.*` _for CPU **and** GPU_ support. I am not aware of any issues using `tensorflow-gpu`'s CPU fallback on CPU

But isn't that equally true for using tensorflow == 1.15.*? It is the variant with a -gpu suffix that is going to be dropped eventually IIUC.

stweil commented on September 27, 2024

Is there a chance to upgrade everything to Tensorflow 2?

bertsky commented on September 27, 2024

> Is there a chance to upgrade everything to Tensorflow 2?

Code migration is not so difficult – yes, that could be streamlined in a coordinated PR effort. But IIRC the hard problem is that models will be incompatible and thus have to be retrained. This is something that the module providers have to decide on whether and when it is prudent themselves. And it's highly unlikely the time frames will converge.

mikegerber commented on September 27, 2024

Of course there is a chance, it just involves quite a bit of work. For a maintained software like
ocrd_calamari:

  • Training a new model for a week (done)
  • Updating
  • Testing
  • Proper evaluation (no regression?)

This stuff is (a) not super high on priority lists because of effort vs. benefit, (b) takes time, and (c) sometimes depends on other software involved. ocrd_all will always have to deal with version conflicts.

And I imagine there are research projects that have no maintenance anymore, or maybe just some poor PhD student with other priorities.

mikegerber commented on September 27, 2024

> But isn't that equally true for using tensorflow == 1.15.*?

I do not get GPU support with that, only CPU. With tensorflow-gpu == 1.15.* I have no issues. But I'll try again after lunch, to make sure.

stweil commented on September 27, 2024

> But IIRC the hard problem is that models will be incompatible and thus have to be retrained.

Maybe existing models can be converted, too?

mikegerber commented on September 27, 2024

> > But IIRC the hard problem is that models will be incompatible and thus have to be retrained.
>
> Maybe existing models can be converted, too?

In some cases this is possible. But not for e.g. Calamari 0.3.5 → 1.0, unless they support it.

mikegerber commented on September 27, 2024

> > But isn't that equally true for using tensorflow == 1.15.*?
>
> I do not get GPU support with that, only CPU. With tensorflow-gpu == 1.15.* I have no issues. But I'll try again after lunch, to make sure.

Alright, these are my results using the below script:

== tensorflow==1.15.*, CUDA_VISIBLE_DEVICES='0'
Already using interpreter /usr/bin/python3
2020-02-25 17:21:35.205395: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-25 17:21:35.220274: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2799925000 Hz
2020-02-25 17:21:35.220640: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x564f5da0e220 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-02-25 17:21:35.220655: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
GPU available: False
== tensorflow==1.15.*, CUDA_VISIBLE_DEVICES=''
Already using interpreter /usr/bin/python3
2020-02-25 17:21:55.577941: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-25 17:21:55.593243: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2799925000 Hz
2020-02-25 17:21:55.593497: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5594505bb720 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-02-25 17:21:55.593532: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
GPU available: False
== tensorflow-gpu==1.15.*, CUDA_VISIBLE_DEVICES='0'
Already using interpreter /usr/bin/python3
2020-02-25 17:22:27.264675: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-25 17:22:27.281148: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2799925000 Hz
2020-02-25 17:22:27.281383: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b5f6815f70 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-02-25 17:22:27.281398: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-02-25 17:22:27.282909: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-02-25 17:22:27.424313: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b5f68a56b0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-02-25 17:22:27.424336: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2080, Compute Capability 7.5
2020-02-25 17:22:27.424711: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.86
pciBusID: 0000:01:00.0
2020-02-25 17:22:27.424872: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-02-25 17:22:27.425769: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-02-25 17:22:27.426610: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-02-25 17:22:27.426867: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-02-25 17:22:27.428707: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-02-25 17:22:27.430106: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-02-25 17:22:27.433060: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-25 17:22:27.433717: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-02-25 17:22:27.433752: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-02-25 17:22:27.434268: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-25 17:22:27.434279: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 
2020-02-25 17:22:27.434284: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N 
2020-02-25 17:22:27.434897: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/device:GPU:0 with 6786 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5)
GPU available: True
== tensorflow-gpu==1.15.*, CUDA_VISIBLE_DEVICES=''
Already using interpreter /usr/bin/python3
2020-02-25 17:22:58.971329: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-25 17:22:58.987226: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2799925000 Hz
2020-02-25 17:22:58.987497: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x558cc0be40d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-02-25 17:22:58.987526: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-02-25 17:22:58.989005: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-02-25 17:22:58.992375: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-02-25 17:22:58.992396: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: b-pc30533
2020-02-25 17:22:58.992402: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: b-pc30533
2020-02-25 17:22:58.992431: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 440.59.0
2020-02-25 17:22:58.992449: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 440.59.0
2020-02-25 17:22:58.992455: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 440.59.0
GPU available: False

Script:

#!/bin/sh
# Compare the tensorflow and tensorflow-gpu 1.15 wheels, each in a fresh
# throwaway virtualenv, with and without a GPU exposed via CUDA_VISIBLE_DEVICES.
for package in "tensorflow==1.15.*" "tensorflow-gpu==1.15.*"; do
  for CUDA_VISIBLE_DEVICES in "0" ""; do

    echo "== $package, CUDA_VISIBLE_DEVICES='$CUDA_VISIBLE_DEVICES'"

    export CUDA_VISIBLE_DEVICES

    # $RANDOM is not POSIX sh, so use mktemp for the throwaway venv
    venv=$(mktemp -d)
    virtualenv --quiet -p /usr/bin/python3 "$venv"
    . "$venv/bin/activate"

    pip3 install --quiet --upgrade pip
    pip3 install --quiet "$package"

    python3 -c 'import tensorflow as tf; print("GPU available:", tf.test.is_gpu_available())'

    deactivate
    rm -rf "$venv"
  done
done

mikegerber commented on September 27, 2024

So, tensorflow-gpu==1.15.* is the right choice for TF1, it gives GPU and CPU support. (The script does not check for CPU support, I know that -gpu works for CPU too)

bertsky commented on September 27, 2024

> So, tensorflow-gpu==1.15.* is the right choice for TF1, it gives GPU and CPU support. (The script does not check for CPU support, I know that -gpu works for CPU too)

Indeed! We should open issues/PRs to all directly or indirectly affected module repos.

(Strange though, I have a clear memory of getting GPU support out of a tensorflow PyPI release. But maybe that was in an Nvidia Docker image, or TF 2.)

stweil commented on September 27, 2024

> With tensorflow-gpu == 1.15.* I have no issues.

Bad news: With tensorflow-gpu==1.15.* I have issues because it does not work on macOS. tensorflow==1.15.* works fine there.
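
If both findings hold (tensorflow-gpu on Linux, plain tensorflow on macOS), one way to express that in a single dependency list is a PEP 508 environment marker. A hypothetical requirements.txt sketch, not what ocrd_all actually uses:

```
# pick the TF1 wheel per platform (illustrative sketch)
tensorflow-gpu == 1.15.* ; sys_platform != "darwin"
tensorflow == 1.15.* ; sys_platform == "darwin"
```

pip evaluates the markers at install time and skips whichever requirement does not match the running platform.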

bertsky commented on September 27, 2024

> > With tensorflow-gpu == 1.15.* I have no issues.
>
> Bad news: With tensorflow-gpu==1.15.* I have issues because it does not work on macOS. tensorflow==1.15.* works fine there.

These TF devs keep driving me mad. I thought we had this solved by now.

Okay, can you re-label the prebuilt tensorflow as tensorflow-gpu somehow?
Or should we build our own TF wheels under the correct name for macOS and include them in the supply chain?

stweil commented on September 27, 2024

> Okay, can you re-label the prebuilt tensorflow as tensorflow-gpu somehow?

Yes, that is possible. Of course there remains the conflict between TF1 and TF2, so the resulting installation won't work.
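
One possible shape of that re-labelling is a local shim distribution that publishes the name tensorflow-gpu but merely depends on the plain tensorflow wheel. A sketch only: the directory name and the version pin are illustrative assumptions, not stweil's actual setup.

```shell
# create a minimal shim package "tensorflow-gpu" that just pulls in
# the plain tensorflow wheel of the same TF1 version
mkdir -p tf-gpu-shim
cat > tf-gpu-shim/setup.py <<'EOF'
from setuptools import setup

setup(
    name="tensorflow-gpu",
    version="1.15.0",
    # delegate to the real wheel; the TF1/TF2 conflict remains, because
    # both still resolve to the same "tensorflow" distribution name
    install_requires=["tensorflow == 1.15.*"],
)
EOF
# afterwards: pip3 install ./tf-gpu-shim
```

This satisfies any module that requires tensorflow-gpu on macOS, at the cost of the version-pinning and coexistence caveats discussed in the following comments.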

stweil commented on September 27, 2024

Building TF is a nightmare. It takes days for ARM, and I expect many hours for macOS.

bertsky commented on September 27, 2024

> > Okay, can you re-label the prebuilt tensorflow as tensorflow-gpu somehow?
>
> Yes, that is possible. Of course there remains the conflict between TF1 and TF2, so the resulting installation won't work.

I don't think this is the right approach. First of all, you don't distinguish which version you are delegating to. And second, it requires installing tensorflow of the same base version (which, yes, then makes it impossible to have both TF1 and TF2 installed at the same time).

I was thinking along the lines of modifying the name in the official wheel.

bertsky commented on September 27, 2024

> Building TF is a nightmare. It takes days for ARM, and I expect many hours for macOS.

I know. And it never quite works out of the box as documented (at least for me). Too fast to die, too slow to live.

But building from scratch trivially gives you whatever package name you want. (So we could have tensorflow for TF2 and tensorflow-gpu for TF1 – even if it does not have actual GPU support on macOS.) But I am still more inclined to the wheel patching approach.

@kba your thoughts?

bertsky commented on September 27, 2024

So, except for ARM and macOS and Python 3.8 support (it just keeps growing) – which we should probably discuss in #147 – I think this has been solved by #118. @mikegerber can we close?
