Spreadinterponly (CPU and GPU) #602

ahbarnett · 2025-01-08T23:28:27Z

This is a proposal for convenient access to the spread/interp task via the FINUFFT/cuFINUFFT interface (since the spreadinterp module is not part of our API). It is based on @chaithyagr's PR #599 for the new CPU opts field. The GPU spread/interp-only mode is already part of 2.3.1 and master.

Design discussion:

I feel that the GPU spreadinterponly interface has to be tweaked so that upsampfac controls the kernel shape. This would also apply to the CPU. Currently the GPU forces upsampfac=1.0, but under the hood uses the kernel for the default upsampfac=2, which makes no sense.
E.g., I would like to be able to set upsampfac=infinity, which gives a certain kernel shape needed in Ewald methods.

I will hash this out here. Comments welcome.
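To make "upsampfac controls the kernel shape" concrete, here is a minimal Python sketch of how the kernel width `ns` and shape parameter `beta` could follow from `tol` and `upsampfac`. The heuristics are written from memory of the FINUFFT 2.3-era `setup_spreader` and should be treated as assumptions (the cap on `ns` is omitted); the function names `es_kernel_params` and `es_kernel` are hypothetical. The only value checked against this thread is the debug log further down, which reports `ns=7 beta=16.1` for `eps=1e-06 sigma=2`.

```python
import math

def es_kernel_params(eps, upsampfac):
    """Return (ns, beta) for the ES kernel. Heuristics assumed, not authoritative."""
    if upsampfac == 2.0:
        # round() guards against floating-point fuzz in log10 right at an integer
        ns = math.ceil(round(-math.log10(eps / 10.0), 12))
    else:
        # valid for any upsampfac in (1, inf]; 1/inf evaluates to 0.0
        ns = math.ceil(-math.log(eps) / (math.pi * math.sqrt(1.0 - 1.0 / upsampfac)))
    ns = max(2, ns)
    beta_over_ns = {2: 2.20, 3: 2.26, 4: 2.38}.get(ns, 2.30)
    if upsampfac != 2.0:
        gamma = 0.97  # assumed safety factor
        beta_over_ns = gamma * math.pi * (1.0 - 1.0 / (2.0 * upsampfac))
    return ns, beta_over_ns * ns

def es_kernel(z, beta):
    """ES kernel exp(beta*(sqrt(1-z^2)-1)) for |z| <= 1, zero outside."""
    if abs(z) > 1.0:
        return 0.0
    return math.exp(beta * (math.sqrt(1.0 - z * z) - 1.0))

for sigma in (1.25, 2.0, math.inf):
    ns, beta = es_kernel_params(1e-6, sigma)
    print(f"upsampfac={sigma}: ns={ns}, beta={beta:.3g}")
```

Under these assumptions, `upsampfac=inf` is a perfectly well-defined kernel choice even though no NUFFT could ever use it.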

Tasks:

clean up CPU spreadinterponly=1 logic
add the new opts field to all language interfaces and check
example which demos it (1D)
add tester to CI (1D)
docs CPU
MATLAB demo CPU
Revisit gpu_spreadinterponly=1 and edit logic so upsampfac is respected rather than forced to be 1.0
GPU doc copying CPU doc
remove error code 23, since when spreadinterponly=1 then any upsampfac in (1,Inf] is valid.


ahbarnett · 2025-01-09T03:25:31Z

@chaithyagr If you look at the example, tester, and docs/opts.rst you will see my proposal for the cpu spreadinterponly=1 implementation. See what you think. If you are not unhappy - let me know - I will apply this to GPU too - the difference being you would not have to set upsampfac=1 to use it, rather would set it as with a NUFFT. This is really the only way I can see it working, without breaking the API.

I still would be curious how you guys access the actual kernel function used, in your DCW for MRI application...

ahbarnett · 2025-01-16T19:44:14Z

@chaithyagr Still hoping for some feedback on GPU interface tweak, so I can go ahead (early February at this rate) with:

Revisit gpu_spreadinterponly=1 and edit logic so upsampfac is respected rather than forced to be 1.0.

chaithyagr · 2025-01-17T09:17:48Z

Hey @ahbarnett, I am so sorry for my delay. I think I responded to your question regarding ns going to infinity in #564.
Coming to this update,
I am okay with this new interface; I will have to change some stuff in mind-inria/mri-nufft#195 and mind-inria/mri-nufft#224.
Just some quick questions to be sure I understood correctly:

So right now, if spreadinterp_only is set to 1, the output is still with upsampfac>1, so basically we will need to send arrays of the right sizes on our own. This was not needed when we do a NUFFT, as the output shape is exactly the size we need.

To be clearer:
If we are doing a NUFFT with a 1000-point trajectory to a shape of (256, 256) and upsampfac=2,
then the image size in the forward NUFFT is (256, 256), but if spreadinterp_only is set, it is (512, 512), right?

The same applies for the adjoint operation, where we provide a memory location for the output image dimensions.

Do we have checks on the input shapes of the images? I didn't find this originally, and I just wanted to make sure that the sizes passed to the API remain the same, which is why I chose the route of setting upsampfac=1.

Finally, please do let me know when you are ready; I would like to make sure my tests pass and make any fixes on my side.

Thank you so much for handling this. I will make my changes on the finufft side of mri-nufft today and give you an update on whether it works well.

chaithyagr · 2025-01-17T12:40:40Z

Assuming that we need to provide (512, 512), note that the current state of the code leads to the following error:

finufft/python/finufft/finufft/_interfaces.py

Lines 467 to 474 in d13dade

    
        if fshape[-1] != ms:
            raise RuntimeError('FINUFFT f.shape is not consistent with n_modes')
        if dim>1:
            if fshape[-2] != mt:
                raise RuntimeError('FINUFFT f.shape is not consistent with n_modes')
        if dim>2:
            if fshape[-3] != mu:
                raise RuntimeError('FINUFFT f.shape is not consistent with n_modes')

Basically the validity checks are right, except when we are pushing out a larger array because spreadinterp_only is true.

chaithyagr · 2025-01-27T17:41:49Z

@ahbarnett did you get a chance to look at my comments? The main issue is that setting spreadinterp_only=1 with upsampfac!=1 implies we need to update the API so that it supports image-domain shapes = img_size * upsampfac.

ahbarnett · 2025-02-08T01:37:03Z

@chaithyagr Sorry about the delay. But the answer to your question is no: the output array for a type 1 always has the requested size. Recall that when spreadinterponly=1, upsampfac only controls the kernel choice and width in gridpoints; there is no actual upsampling, just spreading to the user-requested grid. It wouldn't make sense to do much else. The spreading kernel must be controlled somehow, and this is the most elegant way to do it (via tol and upsampfac, as kernel control parameters only). Note that for type 2 it would also be meaningless to "upsample" the user's regular input array: again, plain interpolation from their grid is done. See examples/spreadinterponly1d.cpp
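A minimal pure-Python sketch of what "just spreading to the user-requested grid" means in 1D follows. It is not the FINUFFT implementation: `spread_1d` and `es_kernel` are hypothetical names, strengths are real rather than complex for brevity, and the defaults `ns=7`, `beta=16.1` are simply the values shown in the GPU debug log in this thread. The point is that the kernel parameters change which grid points are touched and with what weight, while the output length is always `N`.

```python
import math

def es_kernel(z, beta):
    """ES kernel exp(beta*(sqrt(1-z^2)-1)) on |z| < 1, zero outside."""
    if abs(z) >= 1.0:
        return 0.0
    return math.exp(beta * (math.sqrt(1.0 - z * z) - 1.0))

def spread_1d(x, c, N, ns=7, beta=16.1):
    """Spread strengths c at points x in [-pi, pi) onto the N-point periodic grid."""
    h = 2.0 * math.pi / N
    F = [0.0] * N                       # output always has the user-requested size
    for xj, cj in zip(x, c):
        t = (xj + math.pi) / h          # point location in grid-index units
        i0 = math.ceil(t - ns / 2.0)    # leftmost grid index inside the kernel support
        for i in range(i0, i0 + ns):
            z = (i - t) / (ns / 2.0)    # offset rescaled to [-1, 1]
            F[i % N] += cj * es_kernel(z, beta)   # periodic wrap-around
    return F

# One unit-strength point at the origin, in the spirit of the 1D example
F = spread_1d([0.0], [1.0], N=1000)
print(len(F), max(F), F.index(max(F)))
```

Passing a different `ns` and `beta` (as a different `tol` or `upsampfac` would) changes the kernel footprint but never `len(F)`.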

Let me know if unclear. I will finish up the PR now on the GPU side. Best, Alex


ahbarnett · 2025-02-08T20:11:47Z

@chaithyagr I am done and would like to bring it in, if you can let me know. From your end (GPU user) there is only one change: upsampfac cannot be set to 1.0; it must be a valid setting in order to know what kernel shape to use. Recall these are 0.0 (auto-choose), one of the valid settings 1.25 or 2.0 (for fast kerevalmeth=0), or any number in (1,+Inf], for slower kerevalmeth=1. Recall that upsampfac is only a kernel shape control parameter, and does not scale the I/O f grid which is always the requested size (N1*N2 etc). It is amusing that upsampfac=Inf is a valid setting for kerevalmeth=1, even though it could never be used for a NUFFT. I have removed your error code 23, as there is no need. I improved the comments. I fixed the docs to match, and to improve the explanation for you about the precise grid that is spread/interp to.

I added a matlab CPU demo that includes plotting the spreading kernel:

Here you get to see the output array size (500x1000) and that the origin precisely aligns with (250,500), ie (N1/2, N2/2).
This helps understand the overlay of the spreadinterp grid on the physical (for you k-space) coordinates x,y in [-pi,pi], etc.
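As a small illustration of that overlay, the sketch below lists the grid coordinates in each dimension, using the spacing h = 2*pi/N and the points -pi, -pi+h, ..., pi-h described in the docs for this mode. The helper name `grid_coords` is hypothetical, and indices here are 0-based offsets; how a given plot or language labels them is an assumption to check.

```python
import math

def grid_coords(N):
    """Coordinates of the N spread/interp grid points in one dimension."""
    h = 2.0 * math.pi / N
    return [-math.pi + h * j for j in range(N)]

x = grid_coords(500)    # N1 = 500
y = grid_coords(1000)   # N2 = 1000

# The origin sits at 0-based offset N/2 in each dimension
print(x[250], y[500])
# First and last points: -pi and pi - h (the point +pi itself is the periodic image of -pi)
print(x[0], x[-1])
```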

Best, Alex

chaithyagr · 2025-02-08T20:22:43Z

Hey @ahbarnett, that's awesome. Thank you for your updates. I will surely get back to you early next week!
Based on your response, I still don't think I understand what upsampfac does in your case.

My understanding, based on how it used to work in gpuNUFFT, is that:

For the forward NUFFT (image to k-space): we take an image of size N to a Fourier-domain grid of size upsampfac*N, after which the data is interpolated to the k-space locations.
For the adjoint NUFFT (k-space to image): we spread the data to an upsampfac*N grid, followed by an FFT, and crop the data in the image domain.

Am I missing something?

If the above is true, wouldn't removing the FFT imply exposing a larger grid image as input (for forward) and as output (for adjoint)? Right?

I.e., isn't the output the spread grid, with a different size?

I will go through the code to get more clarity. By the way, the CPU one works when we give the right shapes as expected, but I think the output was cropped. Hopefully I can get some example images to clarify my message above.


ahbarnett · 2025-02-08T21:33:33Z

src/finufft_core.cpp

+    if (opts.spreadinterponly) { // (unusual case of no NUFFT, just report)
+
+      // spreadinterp grid will simply be the user's "mode" grid...
+      for (int idim = 0; idim < dim; ++idim) nfdim[idim] = mstu[idim];


@chaithyagr this is the CPU version where you see upsampling is switched off. Indeed, it needs to write to the user sizes of array, so it cannot be any other way.
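To contrast the two cases, here is a hedged Python sketch of the grid-size choice. The NUFFT branch follows my understanding of how FINUFFT sizes its internal fine grid (scale by upsampfac, enforce at least twice the kernel width, round up to an even 2,3,5-smooth integer); treat that branch as an assumption. The spreadinterponly branch mirrors the diff above. `grid_size` is a hypothetical name.

```python
def next235even(n):
    """Smallest even integer >= n whose only prime factors are 2, 3, 5."""
    if n <= 2:
        return 2
    if n % 2 == 1:
        n += 1
    while True:
        m = n
        for p in (2, 3, 5):
            while m % p == 0:
                m //= p
        if m == 1:
            return n
        n += 2

def grid_size(N, upsampfac, ns, spreadinterponly):
    """Size of the grid that spreading writes to, for one dimension of N modes."""
    if spreadinterponly:
        return N                      # as in the diff: nfdim[idim] = mstu[idim]
    nf = int(upsampfac * N)           # assumed NUFFT sizing: upsample ...
    nf = max(nf, 2 * ns)              # ... keep room for the kernel ...
    return next235even(nf)            # ... and pick an FFT-friendly size

print(grid_size(256, 2.0, 7, spreadinterponly=False))  # internal fine grid in a NUFFT
print(grid_size(256, 2.0, 7, spreadinterponly=True))   # user's own grid, no upsampling
```

In the first case the fine grid is internal and never seen by the user; in the second the spread grid is the user's array, so it cannot be larger than what they allocated.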

ahbarnett · 2025-02-08T21:35:04Z

@chaithyagr Sorry, I forgot to set nf1=ms (ie, N1), and the equivalents in other dims. I fixed that now. The problem is we don't have a tester for your gpu_spreadinterponly=1 mode, so CI doesn't catch such a bug.

For CPU I know the behavior is correct. I didn't understand your comment about output being cropped: that is just for plotting purposes! The I/O grid sizes are (500,1000) and indeed the result matches that. Re code, see l. 665 of src/finufft_core.cpp. I can't make github make a link to that since it's in a PR. I tagged you in a comment.

But, in terms of behavior, I feel that the docs are clear: there is no upsampling, and the "upsampfac" merely controls the kernel parameters. I'm not sure what's unclear about that. Please let me know.

PS I have no idea what gpuNUFFT does! (recall that none of these other codes provide actual mathematical formulae for what they do... I could do that for spreadinterponly=1 mode, but I want to know it's used first....)

Thanks, Alex

ahbarnett · 2025-02-08T21:36:56Z

include/cufinufft/impl.h

-
+    if (d_plan->opts.gpu_spreadinterponly) {
+      // spread/interp grid is precisely the user "mode" sizes, no upsampling
+      nf1 = d_plan->ms;


@chaithyagr ...and here is the corresponding GPU code. At the risk of repetition, since the user allocates an N1*N2 output array, spreading could not write to any other size without segfault. Agreed?

ahbarnett · 2025-02-08T21:55:53Z

Indeed, I was able to hack my local test/cuda/cufinufft1d_test to set debug=1 and gpu_spreadinterponly=1,
giving the following.
This verifies that the nf1=N1, ie, user array sizes are respected, even though sigma=2 (upsampfac for kernel design):

[time  ] dummy warmup call to CUFFT	 0.00434 s
[cufinufft] (ms,mt,mu): 1000 1 1
[setup_spreader] (kerevalmeth=1) eps=1e-06 sigma=2: chose ns=7 beta=16.1
[cufinufft] spreader options:
[cufinufft] nspread: 7
[cufinufft] bin size x: 1048
[cufinufft] shared memory required for the spreader: 16896
[cufinufft] spreadinterponly mode: (nf1,nf2,nf3) = (1000, 1, 1)
[time  ] cufinufft plan:		 0.00805 s
[cufinufft] plan->M=1000
[time  ] cufinufft setNUpts:		 0.000264 s
[time  ] cufinufft exec:		 0.00173 s
[time  ] cufinufft destroy:		 1.95e-05 s
[Method 1] 1000 U pts to 1000 NU pts in 0.0101 s:      9.94e+04 NU pts/s
					(exec-only thoughput: 5.78e+05 NU pts/s)
[gpu   ] one mode: rel err in F[370] is 5.62

Note the error is garbage, as it should be since it thinks it's a NUFFT :)

We need to add a GPU version of the basic SI-only math test that I created for the CPU. Anyway, I'll leave this now. Let me know if you are happy. Thanks, Alex

chaithyagr · 2025-02-10T15:12:39Z

Hey @ahbarnett, I finally tested your code on our plugin and it works great! Thank you so much for your updates and for cleaning up my hacky implementation. Please let us know if you need anything else from us. Currently we have some basic Python tests in our repository. Sadly we don't have any C++ tests. I can help with that if needed, but I can't work on it urgently; please let us know.

Thank you again for this work!

DiamonDinoia · 2025-02-10T15:38:35Z


Could you send a link to the tests?

chaithyagr · 2025-02-10T15:47:00Z

The ongoing PR is at mind-inria/mri-nufft#224

In particular, you can see https://github.com/chaithyagr/mri-nufft/blob/7187491268f19ded65627cbed8d0a4a7acfbb9b6/tests/operators/test_density_for_op.py#L28-L52

This test checks that the density compensation weights are approximately the inverse of the radial distance for radial trajectories. It's not a very specific mathematical test, but when the trajectory is sampled at Nyquist, we expect the density to match the theoretical one. However, for MR reconstruction, approximations are okay.
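For readers unfamiliar with this use-case, below is a minimal pure-Python sketch of an iterative density-compensation estimate built only from spreading and interpolation, in the style of the Pipe and Menon fixed-point iteration w <- w / (interp(spread(w))). It is not the mri-nufft or FINUFFT code: all function names are hypothetical, the problem is 1D with real weights, and the kernel parameters `ns=7`, `beta=16.1` are borrowed from the debug log in this thread.

```python
import math

def es_kernel(z, beta=16.1):
    """ES kernel on |z| < 1, zero outside."""
    return math.exp(beta * (math.sqrt(1.0 - z * z) - 1.0)) if abs(z) < 1.0 else 0.0

def stencil(xj, N, ns):
    """Grid indices and kernel weights touched by one point xj in [-pi, pi)."""
    t = (xj + math.pi) * N / (2.0 * math.pi)
    i0 = math.ceil(t - ns / 2.0)
    return [(i % N, es_kernel((i - t) / (ns / 2.0))) for i in range(i0, i0 + ns)]

def spread(x, w, N, ns=7):
    """Spreading only (type 1 without FFT or deconvolution)."""
    F = [0.0] * N
    for xj, wj in zip(x, w):
        for i, k in stencil(xj, N, ns):
            F[i] += wj * k
    return F

def interp(x, F, ns=7):
    """Interpolation only (type 2 without FFT or deconvolution)."""
    N = len(F)
    return [sum(F[i] * k for i, k in stencil(xj, N, ns)) for xj in x]

def density_weights(x, N, n_iter=10):
    """Fixed-point iteration: divide weights by their own spread-then-interp image."""
    w = [1.0] * len(x)
    for _ in range(n_iter):
        d = interp(x, spread(x, w, N))
        w = [wj / dj for wj, dj in zip(w, d)]
    return w

dense = [-2.0 + 0.05 * j for j in range(40)]   # 40 closely spaced samples
sparse = [0.0 + 0.2 * j for j in range(10)]    # 10 widely spaced samples
w = density_weights(dense + sparse, N=64)
print(sum(w[:40]) / 40, sum(w[40:]) / 10)      # mean weight: dense vs sparse region
```

Samples in the sparsely sampled region end up with larger weights than those in the densely sampled one, which is the qualitative behavior a density-compensation test looks for.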

DiamonDinoia

This looks good.

Just minor documentation issues.

DiamonDinoia · 2025-02-10T20:00:16Z

docs/opts.rst

+**spreadinterponly**: [only has effect for type 1 or 2.] For experts only!
+If ``0`` do
+the NUFFT as intended.  If ``1``, omit the FFT and deconvolution
+(diagonal division by kernel Fourier transform) steps, thus returning
+*garbage answers as a NUFFT*, but allowing experts to perform solely
+spreading (if type 1) or solely interpolation (if type 2) by hijacking
+the usual FINUFFT API.  The spreading is onto the grid of the
+user-given size (``N1`` in x, ``N2`` in y, etc), with grid points
+located at coordinates $\{-\pi, -\pi+h, \dots, \pi-h\}$ in each
+dimension, where $h = 2\pi/N$ is the spacing for that dimension ($N$
+here meaning ``N1``, etc). Interpolation is from that same grid.  The
+kernel (width and shape parameter) is determined by ``tol`` and
+``opts.upsampfac``, just as it would be in an actual NUFFT. Note that
+the upsampling factor here only controls the kernel; the grid size
+never differs from ``N1``, etc.  The kernel is not directly
+accessible, leaving the user to figure out how to make use of this
+interface to extract the actual kernel function.  This provides a
+convenient (if hacky) interface to our ``spreadinterp`` module
+(including looping over multiple vectors, if ``ntransf>1``).  The
+known use-case here is estimating so-called density compensation,
+conventionally used in MRI (see `MRI-NUFFT
+<https://mind-inria.github.io/mri-nufft/nufft.html>`_), although it
+might also be useful in spectral Ewald.


Looking at the rendered docs here: https://github.com/flatironinstitute/finufft/blob/spreadinterponly/docs/opts.rst

modeord has a nice itemize. I think it is worth doing the same here.

Maybe: if 1: It does not perform a NUFFT!

hijacking -> adapting

if hacky -> unconventional | unorthodox

DiamonDinoia · 2025-02-10T20:01:35Z

examples/spreadinterponly1d.cpp

+  x[0]       = 0.0;
+  c[0]       = 1.0;
+  int unused = 1;
+  int ier    = finufft1d1(1, &x[0], &c[0], unused, tol, N, &F[0], &opts); // warm-up


x.data(), c.data(), F.data()

&x[0] is hacky; we should not encourage users to do so.

DiamonDinoia · 2025-02-10T20:03:41Z

include/finufft_opts.h

-               //                  1 FFT-style mode order
+  int modeord;          // (type 1,2 only): 0 CMCL-style increasing mode order
+                        //                  1 FFT-style mode order
+  int spreadinterponly; // 0 do actual NUFFT


As in the comment for modeord, "(type 1,2 only)" should be specified here too.

DiamonDinoia · 2025-02-10T20:13:31Z

src/spreadinterp.cpp

              upsampfac);
      return FINUFFT_ERR_UPSAMPFAC_TOO_SMALL;
    }
-    // calling routine must abort on above errors, since opts is garbage!
-    if (showwarn && upsampfac > 4.0)
+    // calling routine must abort on above errors, since (spread)opts is garbage!


We should update the comment to match the code. This is garbage only for a NUFFT.


chaithyagr and others added 13 commits November 26, 2024 11:03

WIP

838245b

Added support to do spread interp only

a033f3b

double free and wrap

a67b717

Remove unwanted creation

d77cbb1

Remove unwanted resizing

2094338

Working codes with mri-nufft

d0d60fe

Fixes, update API

305482b

remove unwanted changes

09a9d0c

remove span

90a0675

WIP: merge ChaithyaGR CPU spreadinterponly, tidy up, add to fort, mat (fails), py

85e1c4e

got chaithya CPU spreadinterp working, added example; couple minor GPU type changes to keep

2e490a1

better doc example/spreadinterponly.cpp

8170567

added test (and into CI) for CPU spreadinterponly=1

0d4c0e4

ahbarnett mentioned this pull request Jan 9, 2025

Add support for spreadinterponly in finufft #599

Closed

ahbarnett added 4 commits January 8, 2025 21:22

test spreadinterponly give utils namespace to fix windows CI which needs rand_r

244791f

setup_spreader needed to know if spreadinterponly=1 to switch off large-upsampfac warning (it legit could be Inf)

d8f42d6

doc CPU opts.spreadinterponly

29940c2

spreadtest fix extra arg of setup_spreader()

d13dade

ahbarnett mentioned this pull request Jan 9, 2025

Support for gpu_spreadinterponly=1 #564

Merged

comment in example

2b8786f

DiamonDinoia marked this pull request as draft January 24, 2025 18:04

DiamonDinoia mentioned this pull request Jan 30, 2025

Towards 2.4 #491

Open

8 tasks

ahbarnett added 2 commits February 7, 2025 22:09

2d spreadinterponly matlab demo with plot

587ddf2

tweak opts.h

bd96294

ahbarnett added 8 commits February 7, 2025 22:34

clarify cpu spreadinterponly behavior in docs

89deb03

merge in matlab opts12 snafu from master

175ef83

add utils to test_defs.h, fixing spreadinterp1d_test build

2acdb7a

remove err code 23, change sionly logic in setup_spreader, correct GPU sionly docs

b43e3a5

added debug output to GPU setup_spreader, as CPU

295cd10

sionly changelog

12d1760

paren typo in cuda spreadinterp.cpp

d5b2bad

2nd paren typo in cuda spreadinterp.cpp

c4a78dd

ahbarnett marked this pull request as ready for review February 8, 2025 19:17

ahbarnett assigned chaithyagr and unassigned chaithyagr Feb 8, 2025

ahbarnett requested a review from DiamonDinoia February 8, 2025 19:58

ahbarnett added 2 commits February 8, 2025 15:51

set nf1=N1, etc, when gpu_spreadinterponly=1

10342bb

gpu_sionly typo in gpu impl.h; sorry for using Jenkins as a debugger; clang-format make compat w/ 14.0.0

ba1cc8a

ahbarnett commented Feb 8, 2025


DiamonDinoia mentioned this pull request Feb 8, 2025

GPU Math test for spreadinterponly #624

Open

DiamonDinoia approved these changes Feb 10, 2025
