Crash when calling MacenkoNormalizer.fit with tensorflow backend about torchstain HOT 12 OPEN

eidoslab commented on August 18, 2024 1

Crash when calling MacenkoNormalizer.fit with tensorflow backend

from torchstain.

Comments (12)

andreped commented on August 18, 2024

I am sorry if my question is trivial but I have trouble using this package with the tensorflow backend.

Hello, @bertrandchauveau! I had this issue when making this myself, so no worries :]

You can take a look at what is done in the tests here.

Basically, do this instead:

import tensorflow as tf
import torchstain
import numpy as np

T = lambda x: tf.convert_to_tensor(np.moveaxis(x, -1, 0).astype("float32"))
t_to_transform = T(to_transform)

normalizer = torchstain.normalizers.MacenkoNormalizer(backend='tensorflow')
normalizer.fit(T(target))
result, _, _ = normalizer.normalize(I=t_to_transform, stains=True)

result = result.numpy().astype("float32")
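
For reference, the T helper only reorders the axes and casts the dtype. A minimal NumPy-only sketch of that step, assuming the input is a height x width x channels array as returned by a typical image reader (the name to_channels_first is made up here for illustration and is not part of torchstain):

```python
import numpy as np

def to_channels_first(image):
    """Move the channel axis first (H x W x C -> C x H x W) and cast to float32."""
    return np.moveaxis(image, -1, 0).astype("float32")

# Dummy 4 x 6 RGB image standing in for a real tile.
rgb = np.zeros((4, 6, 3), dtype=np.uint8)
chw = to_channels_first(rgb)
print(rgb.shape, "->", chw.shape, chw.dtype)  # (4, 6, 3) -> (3, 4, 6) float32
```

Wrapping the result in tf.convert_to_tensor, as T does, then gives the channels-first tensor that is passed to fit and normalize.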

Could you try this first to see if it resolves your issue? I'm a bit occupied right now, but I could take another look tomorrow if you are still having issues.

This will be better documented in the upcoming release, which includes some new and interesting stain normalization techniques and new backends (see here).

BTW: What is the status of the release, @carloalbertobarbano? Shall we aim to get it released by next week? I have a master's student who would be interested in the new modified Reinhard implementation.


bertrandchauveau commented on August 18, 2024

Thank you for your quick response!

Sadly the same problem occurs, i.e. crashes when running:

normalizer.fit(T(target))

The "T" conversion does the same thing as my own attempt at converting to a tf tensor.


andreped commented on August 18, 2024

Sadly the same problem occurs, i.e. crashes when running:

Hmm, well, what I described above is what we do in the unit test, so that should work. Could you show me the error log from the terminal?

Also, could you try downloading the test data that we used for the unit tests here and here, and running them through your code? I believe that should work. If it does, then the intensity range of your own image after imread is probably wrong. You can inspect the intensity range by running print(np.unique(image))
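
A small sketch of that check, in case it helps. It assumes the normalizer expects 8-bit style values (0 to 255), which is what the unit-test images give after imread; the helper names are hypothetical and not part of torchstain:

```python
import numpy as np

def summarize_intensity(image):
    """Return the dtype, minimum, and maximum of an image array."""
    image = np.asarray(image)
    return {
        "dtype": str(image.dtype),
        "min": float(image.min()),
        "max": float(image.max()),
    }

def looks_like_8bit_range(image):
    """Heuristic: True if values lie in 0..255 and exceed 1, so not a 0..1 image."""
    image = np.asarray(image)
    return bool(image.min() >= 0 and 1 < image.max() <= 255)

# An 8-bit style image and the same image rescaled to 0..1.
image_8bit = np.array([[0, 128, 255]], dtype=np.uint8)
image_unit = image_8bit / 255.0

print(summarize_intensity(image_8bit), looks_like_8bit_range(image_8bit))
print(summarize_intensity(image_unit), looks_like_8bit_range(image_unit))
```

If the second case matches your data, multiplying by 255 before the conversion to a tensor would be the first thing to try.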

Also, I noticed that you are a pathologist. If you just want to get a method working, I would recommend trying the command-line tool fast-stain-normalization, which is based on torchstain. It lets you normalize an entire folder without writing code: just provide arguments to the CLI and run it from the terminal. You can see how to use it here.


bertrandchauveau commented on August 18, 2024

I had the same issue with the test images that you provided.

This is the error message from the terminal:

2023-02-26 16:20:02.217992: I tensorflow/stream_executor/cuda/cuda_blas.cc:1614] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2023-02-26 16:20:02.225717: I tensorflow/core/util/cuda_solvers.cc:179] Creating GpuSolver handles for stream 000001CE08BFD700
2023-02-26 16:20:03.039762: F tensorflow/core/util/cuda_solvers.cc:114] Check failed: cusolverDnCreate(&cusolver_dn_handle) == CUSOLVER_STATUS_SUCCESS Failed to create cuSolverDN instance.
[I 16:20:28.009 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports

I tried this kind of thing, based on what I saw on Stack Overflow, but the kernel still crashes:

gpu = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(device=gpu[0], enable=True)

As I understand it, TensorFlow tries to place the tensors on the GPU, but for whatever reason, it does not work (as you said, I'm a pathologist). Note that I have an RTX 4090 in a Windows setup, and I have not encountered similar issues when training deep learning models.

So by forcing Tensorflow to use the CPU with:

with tf.device('/CPU:0'):
    tf_normalizer.fit(T(target))
    result_tf, _, _ = tf_normalizer.normalize(I=t_to_transform, stains=True)

It works as intended.
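
Another way to get the same CPU-only behaviour, sketched here under the assumption that the normalization runs in its own script or process, is to hide the GPU from CUDA before TensorFlow is imported. CUDA_VISIBLE_DEVICES is a standard CUDA environment variable; the function name below is only illustrative:

```python
import os

def hide_gpus_from_cuda():
    """Hide all CUDA devices from libraries imported after this call."""
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

hide_gpus_from_cuda()

# TensorFlow must be imported only after the variable is set, for example:
# import tensorflow as tf
# tf.config.list_physical_devices("GPU") is then expected to return an empty list.
print(os.environ["CUDA_VISIBLE_DEVICES"])  # -1
```

Unlike the tf.device block, this affects the whole process, so it only fits a separate preprocessing step and not a script that also trains on the GPU.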

Should it also work with the GPU?


andreped commented on August 18, 2024

I was unable to reproduce your issue. See gist.
As you can see from the gist, it works just fine with GPU, also for TF backend.

I'm guessing that what you are observing is related to the TensorFloat-32 message you are seeing, which I have not seen before. This likely happens because you have a very new GPU, the 4090, which I would think might cause some issues.

First, I would try disabling TensorFloat-32 by adding this to the top of your script (after the tf import): tf.config.experimental.enable_tensor_float_32_execution(False)

If that does not fix the issue, try installing the nightly release of TF to see if this has been fixed recently:

pip uninstall tensorflow && pip install tf-nightly


bertrandchauveau commented on August 18, 2024

Thank you for your response. I agree that it works nicely in Colab.

On my local machine, I disabled TensorFloat-32 with:
tf.config.experimental.enable_tensor_float_32_execution(False)

But the kernel still crashes when fitting the normalizer.
Upgrading TensorFlow won’t be as simple as that, since I am currently running on native Windows, and tf 2.10.0 was the last version that allowed this according to the tf documentation. Upgrading would require using WSL2, which I am not ready for right now.

My initial idea (perhaps not a good one) for my project was to use torchstain to normalize images on the fly using a custom data generator, this to avoid the duplication of the dataset (normalized and non-normalized).

For now, I will duplicate my dataset, as relying on the CPU for normalization slows down batch preparation considerably. I’ll give it another try when I’m ready to upgrade TensorFlow, or I will try PyTorch, which seems less Windows-phobic.


carloalbertobarbano commented on August 18, 2024

Hi @bertrandchauveau, what version of CUDA and cuDNN are you using?


andreped commented on August 18, 2024

My initial idea (perhaps not a good one) for my project was to use torchstain to normalize images on the fly using a custom data generator, this to avoid the duplication of the dataset (normalized and non-normalized).

That's exactly what I do in my training frameworks, and that works just fine. As long as you are using tf.data.Dataset and take advantage of multithreading, there is barely any lag :] But I guess it depends on how much lag you expect and can tolerate, how large the images are, which CPU and SSD/HDD you have, and whatnot.
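
A rough sketch of that kind of pipeline, not my actual training code. The normalize_stub function is a hypothetical stand-in that only rescales pixel values; a real stain normalizer call would go in its place, and whether it can run directly inside map or needs something like tf.py_function is not covered here:

```python
import numpy as np
import tensorflow as tf

def normalize_stub(image):
    """Hypothetical stand-in for a stain normalizer: rescale pixels to [0, 1]."""
    return tf.cast(image, tf.float32) / 255.0

# Four fake 8 x 8 RGB tiles standing in for histology patches.
tiles = np.random.randint(0, 256, size=(4, 8, 8, 3), dtype=np.uint8)

dataset = (
    tf.data.Dataset.from_tensor_slices(tiles)
    # Let tf.data run the per-image function on several threads.
    .map(normalize_stub, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(2)
    # Prepare upcoming batches while the current one is being consumed.
    .prefetch(tf.data.AUTOTUNE)
)

batches = list(dataset.as_numpy_iterator())
print(len(batches), batches[0].shape, batches[0].dtype)
```

The num_parallel_calls and prefetch settings are what hide most of the preprocessing cost behind the training step.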

I don't really work on Windows for training models anymore. Note that multithreading does not work as well on Windows as on UNIX-based systems.

Hi @bertrandchauveau, what version of CUDA and cuDNN are you using?

I guess as you seem to be using anaconda, you have installed CUDA through something like this. As I said, I don't have that much experience with conda, as I don't use it myself, but I guess @carloalbertobarbano can help you on that.


bertrandchauveau commented on August 18, 2024

Hi @carloalbertobarbano,
cudatoolkit 11.2.2
cudnn 8.1.0.77
Exactly, installed via conda


andreped commented on August 18, 2024

@bertrandchauveau Are you still experiencing issues?


bertrandchauveau commented on August 18, 2024

Hi,
Thank you for your message and sorry for my late reply. Since my last message:

I installed torchstain 1.3.0.
The kernel still crashes when using the Macenko approach, in fact now already when calling:
torchstain.normalizers.MacenkoNormalizer(backend='tensorflow')
The error message is the same as before.
When using the modified Reinhard method on a single image, it sometimes worked with the GPU and sometimes crashed. I did not have time to explore this further.

It works when I force torchstain to work on the CPU. With tf.data.Dataset, it is true that there is not much lag during pure training (about +10% for me as compared to no stain normalization) but the validation step after each training epoch is much longer.

As you suggested, I tried to install the latest tf 2.12 on WSL, but for now I have failed: there seem to be endless error messages before tf will simply work and recognize the GPU...

I should have a bit more time this week to see why sometimes it seems to work with the modified Reinhard method.


andreped commented on August 18, 2024

As you suggested, I tried to install the latest tf 2.12 on WSL, but for now I have failed: there seem to be endless error messages before tf will simply work and recognize the GPU...

AFAIK, there is not yet a precompiled binary of tf 2.12 for Windows, so I believe that might cause some issues. But if you are using WSL, it should work better. You could post the error messages you are getting, and I could try to debug them for you. Note that I believe you need a nightly release, as your GPU might be too new, as discussed above.

I should have a bit more time this week to see why sometimes it seems to work with the modified Reinhard method.

Why it sometimes works and sometimes fails does not make much sense to me. Have you tried not using Anaconda and using regular Python virtual environments instead? You would then need to set up CUDA yourself.
