mmuckley / torchkbnufft Goto Github PK

View Code? Open in Web Editor NEW

201.0 7.0 44.0 14.54 MB

A high-level, easy-to-deploy non-uniform Fast Fourier Transform in PyTorch.

Home Page: https://torchkbnufft.readthedocs.io/

License: MIT License

Python 100.00%

mri deep-learning nufft pytorch reconstruction

torchkbnufft's People

Stargazers

Watchers

Forkers

manik-hossain hdocmsu jtamir rubenstern fagan2888 jyhmiinlin back2yes zaccharieramzi gongjizhang tianxiaoxiao88 edongdongchen adamwu1979 chaithyagr ricshaw zoonono fzimmermann89 yousuf-siat xpjiang koflera imraj-singh airplaneless andrewwmao ckolbptb hbb1 mriosu mxlutz bigponglee minjunkang zhantaochen hellopipu sschmitter sawydust1228 zhengguotan njstewart-eju nishi951 maartenterpstra ahsanjav cwru-mri andrew-dupuis softvision17 grox167 mylyu nmickevicius eminsight

torchkbnufft's Issues

Failed for 3D non-Cartesian trajectory

Hi there,

I tried to use the torchkbnufft for 3D non-Cartesian trajectory for simulation. I used a 1x1x64x64x64 image and 3D radial trajectory as input to kbNufft for forward NUFFT and then use the acquired kdata for adjoint NUFFT, the output image doesn't look right. I'm wondering if torchkbnufft can be used for 3D non-Cartesian reconstruction? If so, are there any examples for 3D non-Cartesian reconstruction using this package?

Yours,
Qijia

Support for NFFTs of real valued Arrays

Hi,

as far as I can see, there is no support for real-valued input arrays?

However, with rfft we could save a factor of 2 on computation for those cases by using rfft.

Are there some issues with that?

Best,

Felix

Is it possible to use *cfl files with this library?

Hello,
I am aware this is a very specific question, but I will try, and if you think this is not the setting for such a discussion feel free to lock the issue.

So, I am trying to use this library to reconstruct real MRI data acquired on a Siemens mMR scanner.
For that purpose I am able to use 2 different data export formats:

meas*_.dat files (which I can decode using mapvbvd)
*.cfl file (which I can decode using a function integrated in BART)

Decoding the .dat file, and writing the trajectory myself as a python function worked for a radial stack-of-stars sequence, and I am able to get nice images using your library.

For a SpiralVIBE UTE, instead, I was trying to decode those cfl files, as I don't exactly know how to code the trajectory myself. Decoding those files give me both the kspace data and the trajectory.

However, the shape of the arrays I get from those .cfl files is most likely not suited for what tkbn expects me to feed to the nufft operator and needs to be tweaked. The issue is that I am failing to understand how to do it in the correct way.

The original shape of those arrays is:

kspace: (580, 296, 160, 20)
ktraj :(3, 580, 296, 160)

without any reshape or transposition, I always get an error saying: ValueError: If omega has batch dim, omega batch dimension must match data

I have tried all sorts of transpositions with no success. Even trying to unsqueeze the kspace array gives the same error.

The only solution that 'works' is when I unravel both arrays to:

kspace: [1, 1, 27468800]
ktraj :[3, 27468800]

But even in this case, the images I get inverting the kspace are just a 'point' in the center. of the FOV.

I was wondering if you had tried working with a similar type of 3D spiral trajectory and could me any tip about how to organize those arrays before using them as input for the nufft operator.

Thank you!

Performance vs FFT

Hi,

thanks for this package :)!

However, I was wondering about the performance. I would have expected maybe x10 in difference to a standard FFT (from my past experience with CPU based NFFTs.). But here, I observe a difference in factor x100. The adjoint is almost a factor of x1000 slower.

Any thoughts on this?

Felix

Code:

import torch
import torchkbnufft as tkbn
import numpy as np
from skimage.data import shepp_logan_phantom
import matplotlib.pyplot as plt
import napari
import glob
import imageio.v3 as iio
import os
from tqdm import tqdm

device = 'cuda' if torch.cuda.is_available() else 'cpu'

N = 512
N_z = 100
N_angles = int(np.ceil(np.pi * N / 2))
voxels = (torch.zeros(1, N_z, N, N, dtype=torch.float32) + 1j * 0).to(device).to(torch.complex64)


def prepare_nufft(arr, N_angles):
    N = arr.shape[-1]
    k_1d = np.reshape(np.linspace(-0.5, 0.5, N + 1)[:-1], (-1, 1))
    angles = np.reshape(np.linspace(0, np.pi, N_angles + 1)[:-1], (1, -1))
    k_y = np.cos(angles) * k_1d
    k_x = np.sin(angles) * k_1d
    ks = np.reshape(np.stack((k_y, k_x)), (2, -1))
    ktraj = torch.tensor(ks, dtype=torch.float32).to(device)
    nufft = tkbn.KbNufft(im_size=(N, N), numpoints=6).to(device)

    adjnufft = tkbn.KbNufftAdjoint(im_size=(N, N), numpoints=6).to(device)
    
    return nufft, adjnufft, ktraj


nufft, adjnufft, ktraj = prepare_nufft(voxels, N_angles)

%%time
kdata = nufft(voxels, ktraj)
CPU times: user 30.1 ms, sys: 0 ns, total: 30.1 ms
Wall time: 29.6 ms

%%time
#kdata = nufft(voxels, ktraj)
img_filter = adjnufft(kdata, ktraj)
#torch.cuda.synchronize()
CPU times: user 268 ms, sys: 172 ms, total: 440 ms
Wall time: 394 ms



%%time
torch.fft.fft(voxels)
CPU times: user 337 µs, sys: 180 µs, total: 517 µs
Wall time: 283 µs

KB-NUFFT hangs when used in PyTorch DataLoader with num_workers > 0

Hi,

I'm trying to perform on-the-fly data undersampling in my PyTorch dataset. To do this, I perform a Toeplitz NUFFT in the __getitem__ function of my Dataset class. This works as expected. Now, I want to to batching, so I wrap the PyTorch Dataset in a PyTorch DataLoader. This works as expected when num_workers=0. However, when num_workers is non-zero, computation of the NUFFT seemingly enters an infinite loop.

Expected behaviour

Performing a NUFFT in parallel using multiple workers should result in undersampled images.

Observed behaviour

Sampling the dataloader results in a hanging script, seemingly entering an infinite loop.

Extra information

A minimally-working example of this behaviour is attached below.
This behaviour is observed with Torch-kbnufft version 1.3.0 and PyTorch version 1.12.
CUDA is not used in this example, but it also happens when the NUFFT is computed on the GPU.
It is not limited to a Toeplitz NUFFT but also happens with the table NUFFT.
Density compensation has no influence on the observed behaviour

Minimal example

import torch
from skimage.data import shepp_logan_phantom
from skimage.transform import rescale
import numpy as np
import torchkbnufft as tkbn
import matplotlib.pyplot as plt
from torch.utils.data import Dataset, DataLoader


NUM_WORKERS=0 # set to > 0 to hang

class DataUndersampler(Dataset):
    def __init__(self, undersampling_factor, spatial_scaling_factor):
        self.image = shepp_logan_phantom()
        self.image_shape = self.image.shape
        self.undersampling_factor = undersampling_factor
        self.spatial_scaling_factor = spatial_scaling_factor
        # Need the original size to determine the undersampling factor and NUFFT operators
        self.orig_size = self.image_shape
        self.image_shape = (self.image_shape[0] // self.spatial_scaling_factor, self.image_shape[1] // self.spatial_scaling_factor)
        # Create an oversampled grid
        spokelength = self.image_shape[0] * 2
        self.grid_size = (spokelength, spokelength)
        
        # Generate A LOT of spokes, pick a starting point at random
        nspokes = 2000
        
        # Sample enough spokes to achieve undersampling factor
        self.spokes_to_sample = int((self.orig_size[0] * np.pi / 2) / self.undersampling_factor)
        # Generate a golden angle radial trajectory
        ga = np.deg2rad(180 / ((1 + np.sqrt(5)) / 2))
        kx = np.zeros(shape=(spokelength, nspokes))
        ky = np.zeros(shape=(spokelength, nspokes))
        ky[:, 0] = np.linspace(-np.pi, np.pi, spokelength)
        for i in range(1, nspokes):
            kx[:, i] = np.cos(ga) * kx[:, i - 1] - np.sin(ga) * ky[:, i - 1]
            ky[:, i] = np.sin(ga) * kx[:, i - 1] + np.cos(ga) * ky[:, i - 1]
            
        self.ky = np.transpose(ky)
        self.kx = np.transpose(kx)
        # 1D Ramlak. Needed for density compensation. Density is a linear function 
        # depending on the distance of the center of k-space...
        ram_lak = np.abs(np.linspace(-1, 1, spokelength + 1))
        ram_lak = ram_lak[:-1]
        
        #... except for the center, we know exactly how often we sample that,
        # namely as many times as the number of spokes
        middle_idx = len(ram_lak) // 2
        ram_lak[middle_idx] = 1/(2 * self.spokes_to_sample)
        self.ram_lak = ram_lak
    def __len__(self):
        return 1
    def __getitem__(self, index):
        if self.spatial_scaling_factor > 1:
            img = rescale(self.image, 1/self.spatial_scaling_factor).astype(np.complex)
        else:
            img = self.image.astype(np.complex)
            
        if self.undersampling_factor > 1:
            img_tensor = torch.from_numpy(img).unsqueeze(0).unsqueeze(0)

            toep_ob = tkbn.ToepNufft()
            # We pick a random starting spoke
            offset = np.random.choice(range(self.ky.shape[0]-self.spokes_to_sample))

            # And select as many subsequent spokes to reach the desired undersampling factor
            # todo: use continuing trajectories for cine?
            selected_ky = self.ky[offset:offset+self.spokes_to_sample].flatten()
            selected_kx = self.kx[offset:offset+self.spokes_to_sample].flatten()

            ktraj = torch.tensor(np.stack((selected_ky, selected_kx), axis=0))
            # Repeat and reshape the ram-lak so every spoke is density-compensated
            ram_lak_t = torch.from_numpy(np.tile(self.ram_lak, self.spokes_to_sample)).unsqueeze(0).unsqueeze(0)
            # Calculate the really efficient Toeplitz kernel to compute the NUFFT
            dcomp_kernel = tkbn.calc_toeplitz_kernel(ktraj, self.image_shape, weights=ram_lak_t, norm='ortho',numpoints=(3,3))  # with density compensation
            # And in a single step, compute the radial k-space and back to image space
            img = toep_ob(img_tensor, dcomp_kernel, norm='ortho').abs().squeeze().numpy()
            # renormalize the output, because undersampling can change this.
            img /= np.max(img)
            
        return np.abs(img)

# Create dataset
dset = DataUndersampler(
            undersampling_factor=2, 
            spatial_scaling_factor=1)


# this works fine
undersampled_img = dset[0]

# From here, the observed behaviour emerges. 

dloader = DataLoader(dset,
            shuffle=False,
            batch_size=1, num_workers=NUM_WORKERS)

# this statement hangs when num_workers > 0
undersampled_img = next(iter(dloader))

Could you give an example of implementing a two-dimensional Fourier transform by use torchkbnufft?

Hello,torchkbnufft is great but the example is not easy.So,I want to know if I want to compute the Fourier transform of a two-dimensional array by use torchkbnufft,what should I do?
example:a = torch.rand((4,4))
fft: b = torch.fft.fft2(a)
torchkbnufft:???

Issue with batched toeplitz kernels

Hey Matt, I seem to be consistently getting an error when attempting to call "calc_toeplitz_kernel" with 'omega' with dimension (N, 3, klength) and 'weights' with dimension (N, 1, klength). This seems to occur even when N=1. The error goes away if I simply don't supply the density compensation weights. Any suggestions for how to fix?

Potential issue with shell-based thread management

I made a change in #27 that could be a potential issue, so I would like to document it here. Particularly, in this commit the following lines were commented out:

if USING_OMP and cpu_device:
    torch.set_num_threads(num_threads)

The reason for this was that in PyTorch 1.8 these lines led to a severe performance regression - i.e., at the Python level it seemed PyTorch wasn't handling switching the number of available threads very well. I removed the lines as the regression was too large.

The downside is that shell-based OMP thread management may be ignored within forks - OMP specifies the number of threads that can be spawned but does not keep a global limit, so if your global limit is 8 and you fork 8 times, each one of those forks could create 8 new threads and lead to oversubscription.

In general it's a niche issue that I hope doesn't affect most people. I haven't been able to figure out how to fix this - the answer may be to just go to C++ as mentioned in Issue #28 if anyone decides to tackle this.

Inverse non-uniform Fourier Transform

Thank you very much for putting torchkbnufft together. I like the fact that it supports autograd. My question is how does one take an inverse non-uniform Fourier Transform with torchkbnufft? Thanks!

Gradients w.r.t. coordinates in ktraj

Hi,
thanks for the great package!
Is it right that both nufft and adjoint nufft are only differentiable w.r.t. the input image/k-space but not w.r.t. the coordinates in the input trajectory?
Is it feasible to add gradients w.r.t. to the coordinates as well?

Best,
Tobit

Batched k-space NUFFT

Hi,
Since I work with cardiac MRI k-space data (golden angle radial acquisition, 2D + time), my application involves a large number of small NUFFTs. Currently, the only way to process the data is to loop over the time dimension because there is no batch dimension in the trajectory tensor (i.e., different trajectories for every 2D slice).

Is there a more efficient solution than just iterating over the batch dimension with diffenrent trajectories? @mmuckley pointed out in #18 that he has an efficient solution for this use case.

Thanks a lot!

Forward NUFFT not backpropagatable!

Hi,
I know it's bold claim, but I think there is an issue with the autograd functionality in the forward nufft operation. In order to reproduce the problem I modified the cell [8] in the Basic Example in the following way:

image.requires_grad = True
kdata = nufft_ob(image, ktraj)
kdata.requires_grad

The first time I executed the cell, the output is True, as expected. But every subsequent time, it prints False. I am not an expert in the torch autograd functionality but I would expect the output to have the same requires_grad value as the the input.
I also did the same test with the adjoint nufft (cell [10] modified):

kdata.requires_grad = True
image_blurry = adjnufft_ob(kdata, ktraj)
print(image_blurry.requires_grad)

... and the output was always True. The same holds for the examples in v0.3.4.
Could it be that the nufft_ob's internal state changes with every call?

Thanks a lot!

KbInterpAdjoint gives mismatch results?

The omega is given by np.mgrid as on-grid locaiton 2 * pi * n / N

data = torch.randn(1, 1, 8, 8) + 1j * torch.randn(1, 1, 8, 8)
[x, y] = np.mgrid[-4:4, -4:4].astype(np.float32) * np.pi / 8
omega = torch.stack([torch.tensor(x), torch.tensor(y)], dim=0)
adjkb_ob = tkbn.KbInterpAdjoint(im_size=(8, 8), grid_size=(8, 8))
image = adjkb_ob(data.view(1, 1, -1), omega.view(2, -1))
plt.figure()
plt.imshow(data.squeeze().abs())
plt.figure()
plt.imshow(image.squeeze().abs())
plt.show()

The left one is the original data and the right is after interpolation. They are quite different !
I wonder whether the on-grid location after interpolation is 2 * pi * n / N ?

Higher order derivatives involving nufft?

Hi,

My problem is quite specific but I try my best to describe it here:

I have two loss functions - let's call them loss1 and loss2. loss1 is computed from an input image x, applying nufft operator on it and then computing loss1, i.e. f(x) := loss1(nufft(x)) (more operations are involved but they are irrelevant here).

Subsequently, I call torch.autograd.grad on f to compute loss2, i.e. f'(x) is part of the backpropagation graph for loss2. However, when I try to execute my code, sometimes, I get the following error message in the line where the autograd.grad function is called:

Frankly, the setup is quite complicated but it works fine when I remove the nufft operator from the function f. When the above error doesn't appear (usually, when I restart the kernel in Jupyter), the code breaks only at the final point when backward() on loss2 is called with an even more cryptic message:

Any suggestions what the underlying problem might be? Happy to provide more information if needed.

Thanks and best,
ajlok3

data range changed after processing

For example, in the provided code example Toeplitz Example，you can find that:

After applying the nufft operation, the range of the data has changed a lot. Although it is possible to force the unification of the data range through data normalization, how to normalize itself is still debatable.

ToepNufft's scale is inconsistent with KbNufft when norm = 'ortho' and grid size != 2*im_size

Hi Matthew,
Thanks for such an elegant implementation for NUFFT on Pytorch.
I have on little question: when choosing 'ortho' norm and the grid_size is not exactly 2*im_size, the scale of ToepNUFFT is inconsistent with AdjKbNufft(KbNufft()). This also happens before 1.0.0.
Thanks in advance.

Batched input with varying size leads to TorchScript error

When using batched inputs with varying ktraj sizes, I get a TorchScript RuntimeError when using a cuda GPU. On CPU, the error does not occur. See following example:

import torch
import torchkbnufft as tkbn
torch.manual_seed(0)

device = torch.device("cuda:0")

batch_size = 16
x = torch.rand((256, 256))
im_size = x.shape
x = x.unsqueeze(0).unsqueeze(0).to(torch.complex64)
x = x.repeat(batch_size, 1, 1, 1).to(device)

for _ in range(10):
    klength = 32 + torch.randint(0, 128, size=(1,)).item()
    print(klength)
    ktraj = torch.stack(
        (torch.zeros(klength), torch.linspace(-torch.pi, torch.pi, klength))
    )
    ktraj = ktraj.to(device)
    ktraj = ktraj.unsqueeze(0).repeat(batch_size, 1, 1)

    nufft_ob = tkbn.KbNufft(im_size=im_size).to(device)
    kdata = nufft_ob(x, ktraj)

Output for me is

104
138
88
61
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[2], line 21
     18 ktraj = ktraj.unsqueeze(0).repeat(batch_size, 1, 1)
     20 nufft_ob = tkbn.KbNufft(im_size=im_size).to(device)
---> 21 kdata = nufft_ob(x, ktraj)

File ~/miniforge3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py:1110, in Module._call_impl(self, *input, **kwargs)
   1106 # If we don't have any hooks, we want to skip the rest of the logic in
   1107 # this function, and just call forward.
   1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110     return forward_call(*input, **kwargs)
   1111 # Do not call functions when jit is used
   1112 full_backward_hooks, non_full_backward_hooks = [], []

File ~/miniforge3/envs/py310/lib/python3.10/site-packages/torchkbnufft/modules/kbnufft.py:211, in KbNufft.forward(self, image, omega, interp_mats, smaps, norm)
    208     assert isinstance(self.table_oversamp, Tensor)
    209     assert isinstance(self.offsets, Tensor)
--> 211     output = tkbnF.kb_table_nufft(
    212         image=image,
    213         scaling_coef=self.scaling_coef,
    214         im_size=self.im_size,
    215         grid_size=self.grid_size,
    216         omega=omega,
    217         tables=tables,
    218         n_shift=self.n_shift,
    219         numpoints=self.numpoints,
    220         table_oversamp=self.table_oversamp,
    221         offsets=self.offsets.to(torch.long),
    222         norm=norm,
    223     )
    225 if not is_complex:
    226     output = torch.view_as_real(output)

File ~/miniforge3/envs/py310/lib/python3.10/site-packages/torchkbnufft/functional/nufft.py:172, in kb_table_nufft(image, scaling_coef, im_size, grid_size, omega, tables, n_shift, numpoints, table_oversamp, offsets, norm)
    169     is_complex = False
    170     image = torch.view_as_complex(image)
--> 172 data = kb_table_interp(
    173     image=fft_and_scale(
    174         image=image,
    175         scaling_coef=scaling_coef,
    176         im_size=im_size,
    177         grid_size=grid_size,
    178         norm=norm,
    179     ),
    180     omega=omega,
    181     tables=tables,
    182     n_shift=n_shift,
    183     numpoints=numpoints,
    184     table_oversamp=table_oversamp,
    185     offsets=offsets,
    186 )
    188 if is_complex is False:
    189     data = torch.view_as_real(data)

File ~/miniforge3/envs/py310/lib/python3.10/site-packages/torchkbnufft/functional/interp.py:117, in kb_table_interp(image, omega, tables, n_shift, numpoints, table_oversamp, offsets)
    114     is_complex = False
    115     image = torch.view_as_complex(image)
--> 117 data = KbTableInterpForward.apply(
    118     image, omega, tables, n_shift, numpoints, table_oversamp, offsets
    119 )
    121 if is_complex is False:
    122     data = torch.view_as_real(data)

File ~/miniforge3/envs/py310/lib/python3.10/site-packages/torchkbnufft/_autograd/interp.py:87, in KbTableInterpForward.forward(ctx, image, omega, tables, n_shift, numpoints, table_oversamp, offsets)
     81 """Apply table interpolation.
     82 
     83 This is a wrapper for for PyTorch autograd.
     84 """
     85 grid_size = torch.tensor(image.shape[2:], device=image.device)
---> 87 output = table_interp(
     88     image=image,
     89     omega=omega,
     90     tables=tables,
     91     n_shift=n_shift,
     92     numpoints=numpoints,
     93     table_oversamp=table_oversamp,
     94     offsets=offsets,
     95 )
     97 ctx.save_for_backward(
     98     omega, n_shift, numpoints, table_oversamp, offsets, grid_size, *tables
     99 )
    101 return output

File ~/miniforge3/envs/py310/lib/python3.10/site-packages/torchkbnufft/_nufft/interp.py:376, in table_interp(image, omega, tables, n_shift, numpoints, table_oversamp, offsets, min_kspace_per_fork)
    374 if USING_OMP and image.device == torch.device("cpu"):
    375     torch.set_num_threads(threads_per_fork)
--> 376 kdat = table_interp_fork_over_batchdim(
    377     image, omega, tables, n_shift, numpoints, table_oversamp, offsets, num_forks
    378 )
    379 if USING_OMP and image.device == torch.device("cpu"):
    380     torch.set_num_threads(num_threads)

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "/home/CODE1/320171775/miniforge3/envs/py310/lib/python3.10/site-packages/torchkbnufft/_nufft/interp.py", line 265, in table_interp_fork_over_batchdim

    # collect the results
    return torch.cat([torch.jit.wait(future) for future in futures])
                      ~~~~~~~~~~~~~~ <--- HERE
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: MALFORMED INPUT: Index out of bounds in check_bounds. Index: 511; bounds: [0, 1). - ret_val(dtype=int64_t, sizes=[1], strides=[1])

grid_size

Hi @mmuckley, thank you for the wonderful work. I'm using the KbNufftAdjoint for spiral data and am wondering how do I disable the oversampling during gridding? I tried to set the grid_size the same as the im_size to avoid it but it doesn't seem to work. The final result still looks smaller than it should be. I tried various settings with no success....

Different k-space trajectories

Hi,

I am a bit confused about how the ktraj is defined.
In the examples you provide, you use a radial k-space trajectory, and it's shape is (2, 2 x nspokes x image_size).
How can I define a simple Cartesian trajectory?

Thanks!

Allow user control of threading

We should have some parameter that the user can pass that controls the number of threads used for a NUFFT. Presently we discover the number of threads on the system here. What this feature proposes is to pass in num_threads as a parameter. By default it can be None, which would revert to the current behavior.

This may address issues with using NUFFTs in dataloaders with multiple workers as seen in #73.

is nufft a reversible transformation？

x != A'Ax

Make compatible with latest PyTorch

Hi,
When I try to import the lib, there raises an error: AttributeError: module 'torch' has no attribute 'complex32'. How can I solve the issue?
Thanks!

torch.func (pytorch 2.0) compatibility

Hi,

to be able to use torch.func.* (for example, torch.func.grad) with torchkbnufft, the torch.autograd.Functions in torchkbnufft/_autograd/interp.py would need an update to the newer calling signature.
See https://pytorch.org/docs/stable/notes/extending.func.html.

In particular, the forward should no longer take a ctx argument. Instead, it should return the output and all tensors etc. that need to be saved to ctx. And a third function, setup_context should save it in ctx.
Also, it would be nice to make use of vmap possible

As this would break compatibility for pytorch versions <2.0 (released before March 2023), this would either require a new torchkbnufft version and a bump in the required torch version, or an import-time switch depending on the pytorch version.

I would be willing to prepare a PR for either option.

batched nufft

Basic KB-NUFFT Example

Thank you for sophisticated implementation of NUFFT with pytorch.
I have a question about "Basic KB-NUFFT Example".

In the code, I compared the original image of the shepp-logan and the final product of tkbn.KbNufftAdjoint, namely "sharp image (with Pipe dcomp)" in the last image,
and found that the accuracy is in order of ~10%.

Is this an expected result (since its purpose is demonstration)?
And if so, could you kindly teach me how to improve it to moderate accuracy like ~0.1%?

My environment is:
MacOS 11.3.1
python 3.8.3
torch 1.10.2
torchvision 0.11.3
jupyter 6.0.3

Basic 1D case

Hello, i have several 1D time series data that is non-uniformly sampled. How would i apply this case in your package? Thank you very much for the help in advance :)

Performance degradation for CPU NUFFT with PyTorch 1.8

I am noticing the performance degradation with PyTorch 1.8 for the CPU on my home system (Windows 10, i5 8400, GTX 1660, torchkbnufft version 1.1.0). It looks like the GPU is relatively unaffected. The details are below. I'm not sure why this is happening yet, but I will try to look into it. If anyone has any information, feel free to post on this issue.

PyTorch 1.8:

running profiler...
im_size: (256, 256), spokelength: 512, num spokes: 405, ncoil: 15, batch_size: 1, device: cpu, sparse_mats: False, toep_mat: False, size_3d: None
forward average time: 2.0657340599999996, backward average time: 3.4234444799999992
im_size: (256, 256), spokelength: 512, num spokes: 405, ncoil: 15, batch_size: 1, device: cpu, sparse_mats: False, toep_mat: True, size_3d: None
toeplitz forward/backward average time: 0.13343995500000005
im_size: (256, 256), spokelength: 512, num spokes: 405, ncoil: 15, batch_size: 1, device: cpu, sparse_mats: True, toep_mat: False, size_3d: None
forward average time: 1.0262545000000016, backward average time: 1.0705226799999992
im_size: (256, 256), spokelength: 512, num spokes: 405, ncoil: 15, batch_size: 1, device: cuda, sparse_mats: False, toep_mat: False, size_3d: None
GPU forward max memory: 0.159003136 GB, forward average time: 0.074529785, GPU adjoint max memory: 0.152530432 GB, backward average time: 0.0685140699999998
im_size: (256, 256), spokelength: 512, num spokes: 405, ncoil: 15, batch_size: 1, device: cuda, sparse_mats: False, toep_mat: True, size_3d: None
GPU forward max memory: 0.114505216 GB, toeplitz forward/backward average time: 0.006467924000000096
im_size: (256, 256), spokelength: 512, num spokes: 405, ncoil: 15, batch_size: 1, device: cuda, sparse_mats: True, toep_mat: False, size_3d: None
GPU forward max memory: 0.77268992 GB, forward average time: 0.20692490499999963, GPU adjoint max memory: 1.035167232 GB, backward average time: 0.2132972450000004

PyTorch 1.7.1:

running profiler...
im_size: (256, 256), spokelength: 512, num spokes: 405, ncoil: 15, batch_size: 1, device: cpu, sparse_mats: False, toep_mat: False, size_3d: None
forward average time: 1.8955573599999997, backward average time: 1.6387825
im_size: (256, 256), spokelength: 512, num spokes: 405, ncoil: 15, batch_size: 1, device: cpu, sparse_mats: False, toep_mat: True, size_3d: None
toeplitz forward/backward average time: 0.12237997000000007
im_size: (256, 256), spokelength: 512, num spokes: 405, ncoil: 15, batch_size: 1, device: cpu, sparse_mats: True, toep_mat: False, size_3d: None
forward average time: 0.8352743000000004, backward average time: 1.01682184
im_size: (256, 256), spokelength: 512, num spokes: 405, ncoil: 15, batch_size: 1, device: cuda, sparse_mats: False, toep_mat: False, size_3d: None
GPU forward max memory: 0.158736896 GB, forward average time: 0.07951689000000002, GPU adjoint max memory: 0.152530432 GB, backward average time: 0.06854967499999987
im_size: (256, 256), spokelength: 512, num spokes: 405, ncoil: 15, batch_size: 1, device: cuda, sparse_mats: False, toep_mat: True, size_3d: None
GPU forward max memory: 0.114505216 GB, toeplitz forward/backward average time: 0.006591889999999978
im_size: (256, 256), spokelength: 512, num spokes: 405, ncoil: 15, batch_size: 1, device: cuda, sparse_mats: True, toep_mat: False, size_3d: None
GPU forward max memory: 0.77268992 GB, forward average time: 0.2121914199999999, GPU adjoint max memory: 1.035167232 GB, backward average time: 0.21654677000000006

Accuracy for numpoints = 2 - 7

Not necessary to ask this. Close the issue.

Usage examples

Thanks for the great effort of bringing NUFFT to Pytorch!
The current version of the code is geared towards image analysis, and MRI specifically.
As such, it is hard to grasp the meaning of arguments and their names from the examples.
It would be great to have a basic 1D example of forward/inverse NUFFT-I/NUFFT-II to broaden user
audience

Warnings during run

I got these warnings with the version of torchkbnufft that I installed through pip3 by following:

sudo pip3 install torchkbnufft

/usr/local/lib/python3.8/dist-packages/odak-0.1.5-py3.8.egg/odak/wave/classical.py:82: UserWarning: Casting complex values to real discards the imaginary part (Triggered internally at  /pytorch/aten/src/ATen/native/Copy.cpp:162.)
  H           = torch.tensor(H).to(dtype).unsqueeze(0).unsqueeze(0)
/usr/local/lib/python3.8/dist-packages/torchkbnufft/kbnufft.py:109: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /pytorch/torch/csrc/utils/tensor_numpy.cpp:141.)
  torch.tensor(np.imag(self.scaling_coef))
/usr/local/lib/python3.8/dist-packages/torchkbnufft/nufft/fft_functions.py:46: UserWarning: The function torch.fft is deprecated and will be removed in PyTorch 1.8. Use the new torch.fft module functions, instead, by importing torch.fft and calling torch.fft.fft or torch.fft.fftn. (Triggered internally at  /pytorch/aten/src/ATen/native/SpectralOps.cpp:567.)
  x = torch.fft(x, grid_size.numel())

nonuniform discrete Fourier transform of type II

If I read

https://en.wikipedia.org/wiki/Non-uniform_discrete_Fourier_transform#Definition

correctly, then "adjoint nonuniform DFT" (also called "nonuniform discrete Fourier transform of type II") would entail interpolating in space.

Thus, I expect that KbNufftAdjoint would not operate in frequency space, as seems to be suggested by the docs. Is there a standard reduction in this field that allows me to interpolate in space using this logic please?

Question about density compensation

Hi,

I am interested in the way you implemented density compensation.
On this line, you use [-2:, ...] on a tensor that is of shape [2, 324000]. Therefore I don't understand the use of slicing in this instance.

The reason I want to know more is because I want to use density compensation in 3D, and I think that in the current state, this slicing is preventing it. Other than that, do you see any reason why the current density compensation wouldn't work well for 3D trajectories?

faster cal_density_compensation_function?

Hi @mmuckley,

thank you for the great job for non-Cartesian MR reconstruction!

I'm currently working on cardiac MRI with radial acquisition. I found that the computation of density compensation takes much longer time than the nufft itself. Would it be possible to speed it up? Or any suggestions on that?

Thanks in advance!

Can the current package handle 2x oversampling data along readout direction？

Can the current package handle 2x oversampling data along readout direction？ For example，if readout point of each spoke is 512（2x oversampling included），image size is 256，grid size is 384。

MriSenseNufft, AdjMriSenseNufft, AdjKbNufft, ToepSenseNufft removed

Hello @mmuckley ,

Thank you very much for sharing such a great work.

When I try some open-source project, some errors happen, because some functions like MriSenseNufft, AdjMriSenseNufft, AdjKbNufft, ToepSenseNufft, etc., have been deprecated.
If I still want to use the latest version of torchkbnufft, how can I solve this problem?

Thanks a lot.

Best Regards,
kingaza

Repository Refactor

Hello everyone.

Since I built this repository, I've learned about Python and would like to make a number of changes to the backend of torchkbnufft. Among other things, I'd like to minimize the use of high-level Python constructs like dicts and make the backend functions scriptable by torch.jit.script. This will likely involve several backwards compatibility-breaking changes. I'll try to keep track of progress in this Issue.

Turn off complex tensors completely

Since some features of pytorch are not yet supported for ComplexFloat tensors, it would be desirable to have a "switch" to turn off complex tensor completely (maybe in a different branch?).

My particular use case involves multi-GPU training with nccl backend. The error message I get is:

RuntimeError: Input tensor data type is not supported for NCCL process group: ComplexFloat

... and the complex tensors don't seem to come from my code but from within KbNUFFT object within the model.

Toeplitz Kernel has boundary artifacts

Hi,

I'm not quite familiar with the mathematics of the kernel but it looks like that the kernel has some boundary issues.
I would have expected that it decreases with intensities but it doesn't.

This is for the use-case of radial polar patterns.

What you see is the real part of the kernel, after fftshift.

Did you observe that before?

Best,

Felix

Deprecation warning and shutdown

When I use self.nufft_ob = tkbn.KbNufft(im_size=im_size, #grid_size=(256,256), ).to(self.dev),
I got warning and shutdown

DeprecationWarning: `np.complex` is a deprecated alias for the builtin `complex`. To silence this warning, use `complex` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.complex128` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

but there is no problem with tkbn.ToepNufft()

and I want to know how to inverse NuFFT with toepliz.

Thank you

Thread allocation issues with batched density compensation

Hello,
I encountered this problem on our computation server, it has a dual socket setup with two cpus each having 64 cores (together 128 cores and 256 threads). I have a dynamic dataset with radial sampling where are 100 frames each having 37 spokes (different trajectories). I wanted to compute the density compensation in a batched way, so that it is faster. When run in the batched form I get errors failing obtaining resources from the libgomp:

libgomp: Thread creation failed: Resource temporarily unavailable
This does not differ if I run the compensation on a CPU (the trajectories are on a CPU device) or they are on a GPU (CUDA).

On the other hand when I run this density compensation in a for loop (each 2D frame at a time), I get the results without an issue.

Also when working with the forward/adjoint operators later on they work in a batched form just fine(although quite slow). This problem is related only to the density compensation function. Might this be an issue with thread allocation specifically in this function?

Unfortunately I am able to reproduce this problem only at this specific setup, I tried it on a different PC and there the batch computation works just fine but quite slow even on a GPU. When I looked at the GPU usage it was very low, while the CPU usage was quite high. I first thought that there is some precomputation going on when the trajectories are not the same for each frame, although it might be related to the fact that it is creating too many threads in that case. I am not that familiar with this implementation. I am used to planning the NUFFT as in for example the gpuNUFFT or pynufft and then applying the transforms and here it seems different, as no planning step is taken ahead of applying the forward/adjoint operations.

C++ Interpolation Backend with PyTorch Extensions

I'm wondering if we might get better performance with the package by rewriting the backend to call into the PyTorch C++ API directly. This would bypass the Python interpreter, which gives quite a bit of overhead with indexing, and indexing is usually our slowest or second-slowest operation when doing interpolations. The process is documented fairly well here.

The nice thing about using the PyTorch C++ API is we've basically already written the code in C++. torch.Tensor.index_add_ becomes at::Tensor::index_add_ and so on and so forth, so we could basically keep the code we have in Python and transcribe it to C++. As long as we stay wtihin the PyTorch API we shouldn't have to worry about C++ stuff like memory management, as the PyTorch garbage collector will take care of that for us.

One thing to figure out would be distribution. I think the best thing would be a modification to setup.py that builds for any target system and then uploads it to PyPI, keeping the current pure-Python as a fallback. That is to say, we should avoid compiling anything on the install system. The user should only download wheels and immediately use it, which is what people expect from this package. We could accomplish this by modifying the distribution config to build and upload wheels for each target system. Conservatively we could build for Mac and Linux targets and have Windows as a stretch goal.

I don't have time to do this in the immediate future, but I could assist and review diffs from anyone that is interested. Feel free to post here or submit PRs if you are.

mmuckley / torchkbnufft Goto Github PK

torchkbnufft's People

Stargazers

Watchers

Forkers

torchkbnufft's Issues

Expected behaviour

Observed behaviour

Extra information

Minimal example

Recommend Projects

Recommend Topics

Recommend Org