Comments (18)
Adding here that we never faced this in original cufinufft. So i am feeling this is a more specific issue with the wrapper
from tensorflow-nufft.
I have never observed this behaviour personally, although it's not the first report I have had of runs occasionally going wrong. Unfortunately I have not had the time to investigate that yet. It may be hardware-specific or problem-specific. Could you share some code (as simple as possible) that causes the issue, even if only occasionally?
Do you know if it's possible to run into this issue with eager execution, or does it only happen in graph execution?
As for debugging yourself, I use a VS code container for development, which has all the necessary dependencies to compile and run the op. It might be a good starting point where you can configure any debugger you would like to use.
from tensorflow-nufft.
It occurs both in eager mode and graph execution. I checked that.
I will try to setup a simple repro for you.
I will use the container for some debug
from tensorflow-nufft.
Great, thank you! I'll be very interested to hear about anything you can find out. I'll also do my best to have a look as soon as possible.
from tensorflow-nufft.
Adding additional details as i observe them
This issue occurs only when (take it with a apinch of salt as I really cant be sure given how random it is, i just run 100 iterations to be sure for now)
- Gradients Tape is enabled
- When we have multiscale ssim loss on the output.
- It crops up even when optimizer is not used to update trajectory
I am currently still setting up a minimum repro, sadly, the code seems to work fine in repro, but I think its a matter on closing in on where the issue is being exactly triggered.
from tensorflow-nufft.
I finally have a working reproducable example / test. Although, this works after running the code for many iterations.
Here is a simple boiled down case of a model which tries to learn a k-space sampling pattern for simple density compensated adjoint (here is the paper describing parts of it) :
import tensorflow as tf
import numpy as np
import tensorflow_nufft as tfft
import tensorflow_mri as tfmri
# We assume no gradients with respect to density compensation
def nograd_density_estimator(traj, img_size, method, max_iter):
@tf.custom_gradient
def get_density(traj):
out = tf.math.reciprocal_no_nan(tfmri.sampling.estimate_density(traj, img_size, method, max_iter))
# Get approx scaling by take adj_op(op(1))
test_im = tf.ones(img_size, dtype=tf.complex64)
test_im_recon = tfft.nufft(
tf.cast(out, tf.complex64) * tfft.nufft(
test_im,
traj,
transform_type='type_2',
fft_direction='forward'
),
traj,
grid_shape=img_size,
transform_type='type_1',
fft_direction='backward'
)
ratio = tf.reduce_mean(tf.math.abs(test_im_recon))
out = out / tf.cast(ratio, out.dtype)
def grad(dy):
return None
return out, grad
return tf.stop_gradient(get_density(tf.transpose(traj)))
# Simple model to learn trajectory on Density compensated adjoint
class TrajModel(tf.keras.Model):
def __init__(self, shape=(2, 32*512), img_size=(320, 320)):
super(TrajModel, self).__init__()
self.shape = shape
# Radial initialization of trajectory
self.init = tf.transpose(tf.reshape(tfmri.radial_trajectory(base_resolution=512/2, views=32), (-1, 2)))
self.traj = self.add_weight(shape=self.shape, dtype=tf.float32)
self.img_size = img_size
self.traj.assign(self.init)
def call(self, image):
density_comp = nograd_density_estimator(self.traj, self.img_size, method='pipe', max_iter=10)
kspace = tfft.nufft(image, tf.transpose(self.traj), transform_type='type_2', fft_direction='forward')
kspace = kspace * tf.cast(density_comp, kspace.dtype)
recon = tfft.nufft(kspace, tf.transpose(self.traj), grid_shape=image.shape[-2:], transform_type='type_1', fft_direction='backward')
return tf.abs(recon[..., None])
#tf.config.run_functions_eagerly(True)
my_model = TrajModel()
np.random.seed(0)
I = np.random.random((1, 1, 320, 320)) + 1j * np.random.random((1, 1, 320, 320))
I = I.astype('complex64')
opt = tf.keras.optimizers.Adam(learning_rate=1e-3)
for i in range(1000):
print(i)
with tf.GradientTape(persistent=True) as tape:
recon = my_model(I)
# loss = tf.reduce_mean(tf.abs(recon -tf.abs(I)))
loss = tf.reduce_mean(1 - tf.image.ssim_multiscale(tf.abs(I[..., None]), recon, max_val=tf.reduce_max(tf.abs(I))))
print("Loss : " + str(loss))
print("Max value of Recon : " + str(np.max(recon)))
grads = tape.gradient(loss, my_model.trainable_variables)
print(tf.math.reduce_max(grads))
The same code seems to work fine (again pinch of salt, in my runs :P ) for L1 loss (if you comment the mSSIM loss and enable just the line above).
All the above details i provided in above comments ( #20 (comment) and #20 (comment) ) hold true here as well.
from tensorflow-nufft.
I think I resolved the issue. It is as I expected, the need for a cudaDeviceSynchronize
. (Also see https://stackoverflow.com/questions/55975775/non-deterministic-results-using-gpus-with-tensorflow-and-tensorflow-serving and https://forums.developer.nvidia.com/t/multiple-runs-different-results/37950 )
I will make a PR soon. After final tests. We cant exactly have a test in place to detect this as it would run on CPU where it will be fine, and also the above repro isnt exactly perfect.
I am just doing some final tests to check everything is running fine.
Thank you @jmontalt for your help!
from tensorflow-nufft.
Excellent! Race conditions of this sort are often easily fixed one you've closed in on the problem, but can be very tricky to pinpoint. I was not looking forward to it. So thanks to you @chaithyagr!
I agree we'll need to do without a test in this case. The nature of the problem means it's impossible to get deterministic behaviour, and the GitHub runner does not have a GPU anyway.
I will merge this ASAP.
from tensorflow-nufft.
Well ideally, the race conditions can easily be found by using racecheck by Nvidia. It just happened that I found this while looking through the codes even before I got that racecheck run up with line informations.
I think cufinufft has a similar issue and a colleague of mine @paquiteau has isolated it around the same time.
Turns out same discoveries at same time do happen xD
from tensorflow-nufft.
Hey, it looks like the issue is still not fixed. I am not yet very sure whats the issue. @jmontalt can you pleas reopen this issue? I will try to see if I can get another PR with the fix
from tensorflow-nufft.
Oh, sure!
from tensorflow-nufft.
@jmontalt I took some time and went through the cuda API calls and trace in NSight. I suspected that most of the calls are occurring in the same stream due to which, we mostly wont need a device sync (my earlier fix isnt needed). It was a false fix given how variable this issue is, I just got wrong in my first thinking.
I had a question:
I see that for spread
step, we do not set the global memory to 0 as is done for execute_type_1
. Is there any particular reason? Do we expect tensorflow to do this?
I feel most our issues crop up from this, as I face this issue when I have density compensation estimation in my test code and if i run the same estimation and replace spread
and interp
with complete NUFFT operations (op
and adj_op
), everything runs smoothly for now (still testing to be very sure this time).
Further, I observed that the spread
runs kernels which sends AtomicAdd
operations to update the global memory, which will continue to hold on to the issue of uninitialized memory.
from tensorflow-nufft.
Hey @chaithyagr, many thanks for looking into this. I was able to reproduce your example. In my machine, it seems to happen at every iteration after the first one. After including the latest fix, I have not observed the issue in 1,000 iterations. You may have nailed it down this time.
You're absolutely right, the grid data should be initialized to 0. This memory was originally allocated with OpKernelContext::allocate_temp
, documented here. I have no reason to believe this will initialize its memory.
I might remove the device synchronization instructions we added last time. There is no point in making the CPU wait if subsequent kernels will be launched in the same stream. Do you agree?
from tensorflow-nufft.
Yes I agree with u. I tested without them also. We don't need them. Looking at the cuda trace I can be sure that we run almost everything in the same stream.
from tensorflow-nufft.
Great. I'll prepare the merge.
That certainly should be the case. The CUDA stream is assigned by the TensorFlow runtime and our code should respect its wishes.
from tensorflow-nufft.
Hello @jmontalt I dont see the next version up, I thought it was a CD?
Perhaps we can something like https://github.com/zaccharieramzi/tfkbnufft/blob/master/.github/workflows/publish.yml ?
from tensorflow-nufft.
It's certainly a CD, but releases are only triggered on tagged commits. I tagged it this morning and it was released half an hour ago.
from tensorflow-nufft.
Ah I see, my bad. Thank u
from tensorflow-nufft.
Related Issues (19)
- `tfft.util.estimate_density` is inaccurate HOT 1
- NUFFT ignores TensorFlow's intra-op parallelism setting HOT 1
- NUFFT ignores CUDA device/stream specified by TF framework HOT 1
- Accept tensors in the `grid_shape` argument HOT 1
- Implement 1D NUFFT for GPU HOT 1
- Finish code refactoring/linting
- I dont think the gradients wrt to `points` is right? HOT 5
- Implement shape inference function HOT 1
- Compiling without cuda? HOT 16
- Random failing to associate CUDA stream issues? HOT 4
- Test Case NUFFT1 and Real Valued Data HOT 2
- Allow user to specify `max_batch_size`? HOT 7
- Still facing NaN issues! HOT 21
- Scaling for NUFFT operator HOT 1
- Cannot pip install HOT 2
- ImportError: cannot import name 'nufft_options_pb2' HOT 1
- Running NUFFT in parallel causes a seg fault in CPU kernel HOT 2
- how to specify number of modes/frequencies? HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from tensorflow-nufft.