FIX Without OPENMP, LM_distributable_computation doesn't write output. #1566

ashgillman · 2025-02-20T03:35:44Z

G'day! Long time no see.

I'm having a little explore into the gantry shift PR to see what is required, but first step is to get master building on my machine (OSX).

I'm not familiar with this code, but it looks like the idea was to store the outputs per thread in local_ooutput_image_sptrs and sum them up at the end. Without OpenMP (required if Clang), it is treated as one thread, but the writing back out never happend.

Testing performed

I've tested with ctest, and tests now passing.

Checklist before requesting a review

I have performed a self-review of my code
[-] I have added docstrings/doxygen in line with the guidance in the developer guide
[] I have implemented unit tests that cover any new or modified functionality (if applicable)
The code builds and runs on my machine
[] documentation/release_XXX.md has been updated with any functionality change (if applicable)

…le_computation doesn't write output.

ashgillman · 2025-02-20T03:59:05Z

See also: KrisThielemans@cab1c8e

But the re-enabling the build fails with:

/Users/runner/work/_temp/9e40082b-aa77-4c9f-b976-fcf58e7dc9a9.sh: line 6: /opt/homebrew/opt/llvm/bin/clang: No such file or directory

 + brew install llvm@16
==> Downloading https://ghcr.io/v2/homebrew/core/llvm/16/manifests/16.0.6_1
==> Fetching llvm@16
==> Downloading https://ghcr.io/v2/homebrew/core/llvm/16/blobs/sha256:e799fe5056da9a07d79474f7a7290e0a465474eb004cfc614775f290197e10e5
==> Pouring [email protected]_1.arm64_sonoma.bottle.tar.gz
==> Caveats
To use the bundled libc++ please add the following LDFLAGS:
  LDFLAGS="-L/opt/homebrew/opt/llvm@16/lib/c++ -Wl,-rpath,/opt/homebrew/opt/llvm@16/lib/c++"

llvm@16 is keg-only, which means it was not symlinked into /opt/homebrew,
because this is an alternate version of another formula.

If you need to have llvm@16 first in your PATH, run:
  echo 'export PATH="/opt/homebrew/opt/llvm@16/bin:$PATH"' >> /Users/runner/.bash_profile

For compilers to find llvm@16 you may need to set:
  export LDFLAGS="-L/opt/homebrew/opt/llvm@16/lib"
  export CPPFLAGS="-I/opt/homebrew/opt/llvm@16/include"

So it uses a different path because 16 is non-default.

https://formulae.brew.sh/formula/llvm

Default should be 19

ashgillman · 2025-02-20T04:40:55Z

Okay, so failing a lot of tests with Clang.

Seems like it fails whenever a test deliberately throws an exception, not being caught. Possibly related: llvm/llvm-project#94292

I'm using clang 16 locally with no error. Probably the best way is to revert the 16->19 change and fix the CC CXX paths, but I'll await what you think, Kris, rather than using up all the project's GHA credits.

KrisThielemans · 2025-02-20T07:36:32Z

I'm amazed. This seems a bug that is not specific to MacOS at all. I see that we don't test with OpenMP=OFF at all before this PR! I think it'd be good to add an Ubuntu gcc test without OpenMP as well.

I think there's a mistake though.

STIR/src/include/stir/recon_buildblock/distributable.txx

Line 96 in 729527c

local_double_out_ptrs.resize(1, double_out_ptr);

initialises local_double_out_ptrs[0] with double_out_ptr. So, I think we'll be double counting the double_out_ptr after this fix. This will currently not be used/tested, but please check.

I think for the image, originally it was also intended to avoid the final copy, i.e. process directly into output_image_ptr. However, I'm not sure if that'd be correct (without spending more time than I have) due to accumulation vs overwriting.

Possibly the best/clearest solution might be to always use the code in

STIR/src/include/stir/recon_buildblock/distributable.txx

Lines 77 to 88 in 729527c

    
           info("Listmode gradient calculation: starting loop with " + std::to_string(omp_get_num_threads()) + " threads", 2); 
        
           local_output_image_sptrs.resize(omp_get_max_threads(), shared_ptr<DiscretisedDensity<3, float>>()); 
        
           local_double_out_ptrs.resize(omp_get_max_threads(), 0); 
        
           if (double_out_ptr) 
        
             { 
        
               local_double_outs.resize(omp_get_max_threads(), 0.); 
        
               for (int t = 0; t < omp_get_max_threads(); ++t) 
        
                 local_double_out_ptrs[t] = &local_double_outs[t]; 
        
             } 
        
           local_counts.resize(omp_get_max_threads(), 0); 
        
           local_count2s.resize(omp_get_max_threads(), 0); 
        
           local_row.resize(omp_get_max_threads(), ProjMatrixElemsForOneBin());

, using num_threads and max_num_threads variables initialised appropriately.

KrisThielemans · 2025-02-20T07:38:09Z

Regarding the exception problem, this is #1490. I have no energy for this, so using clang 16 on GHA seems best.

The fact that the bug is not MacOS specific implies you could do those fixes in a separate PR if you prefer.

danieldeidda · 2025-02-24T16:32:42Z

BTW I installed master with openMP off and run "run_tests_listmode_recon.sh" i get no issue

============= nonTOF tests using PET_ACQ_small.l.hdr.STIR ==========

=== Generate attenuation image

WARNING: image duration not set, so time frame definitions will not be initialised

INFO: Processing next shape...
=== Calculate ACFs
=== Creating my_test_lm_frame.fdef (time frame definitions)
=============== Using frame_nonTOF definition =============
=== Unlist listmode data (for comparison)
---- Executable ran ok
=== Create additive sino my_additive_sinogram_frame_nonTOF.hs
---- Executable ran ok
=== Reconstruct listmode data without cache
---- Executable ran ok
=== Reconstruct listmode data with cache and store it on disk
---- Executable ran ok
=== Compare reconstructed images with and without caching LM file
---- This test seems to be ok !
=== Reconstruct listmode data with cache loaded from the disk
---- Executable ran ok
=== Compare reconstructed images without caching LM file and with loading cache from disk
---- This test seems to be ok !
=== Reconstruct projection data for comparison
---- Executable ran ok
=== Compare sensitivity images
---- This test seems to be ok !
=== Compare reconstructed images
---- This test seems to be ok !
=============== Using counts_nonTOF definition =============
=== Unlist listmode data (for comparison)
---- Executable ran ok
=== Create additive sino my_additive_sinogram_counts_nonTOF.hs
---- Executable ran ok
=== Reconstruct listmode data without cache
---- Executable ran ok
=== Reconstruct listmode data with cache and store it on disk
---- Executable ran ok
=== Compare reconstructed images with and without caching LM file
---- This test seems to be ok !
=== Reconstruct listmode data with cache loaded from the disk
---- Executable ran ok
=== Compare reconstructed images without caching LM file and with loading cache from disk
---- This test seems to be ok !
=== Reconstruct projection data for comparison
---- Executable ran ok
=== Compare sensitivity images
---- This test seems to be ok !
=== Compare reconstructed images
---- This test seems to be ok !

============= TOF tests using root_header.hroot ==========

=== Generate attenuation image

WARNING: image duration not set, so time frame definitions will not be initialised

INFO: Processing next shape...
=== Calculate ACFs
=== Creating my_test_lm_frame.fdef (time frame definitions)
=============== Using frame_TOF definition =============
=== Unlist listmode data (for comparison)
---- Executable ran ok
=== Create additive sino my_additive_sinogram_frame_TOF.hs
---- Executable ran ok
=== Reconstruct listmode data without cache
---- Executable ran ok
=== Reconstruct listmode data with cache and store it on disk
---- Executable ran ok
=== Compare reconstructed images with and without caching LM file
---- This test seems to be ok !
=== Reconstruct listmode data with cache loaded from the disk
---- Executable ran ok
=== Compare reconstructed images without caching LM file and with loading cache from disk
---- This test seems to be ok !
=== Reconstruct projection data for comparison
---- Executable ran ok
=== Compare sensitivity images
---- This test seems to be ok !
=== Compare reconstructed images
---- This test seems to be ok !
=============== Using counts_TOF definition =============
=== Unlist listmode data (for comparison)
---- Executable ran ok
=== Create additive sino my_additive_sinogram_counts_TOF.hs
---- Executable ran ok
=== Reconstruct listmode data without cache
---- Executable ran ok
=== Reconstruct listmode data with cache and store it on disk
---- Executable ran ok
=== Compare reconstructed images with and without caching LM file
---- This test seems to be ok !
=== Reconstruct listmode data with cache loaded from the disk
---- Executable ran ok
=== Compare reconstructed images without caching LM file and with loading cache from disk
---- This test seems to be ok !
=== Reconstruct projection data for comparison
---- Executable ran ok
=== Compare sensitivity images
---- This test seems to be ok !
=== Compare reconstructed images
---- This test seems to be ok !

--------------- End of tests -------------

Everything seems to be fine !

KrisThielemans · 2025-02-24T23:13:28Z

BTW I installed master with openMP off and run "run_tests_listmode_recon.sh" i get no issue

Interesting. Not sure if I understand that, but I've now added a GHA test without OpenMP for Ubuntu as well, so at least we should be safe now.

src/include/stir/recon_buildblock/distributable.txx

KrisThielemans · 2025-02-24T23:26:57Z

@ashgillman I've now simplified the code. If it all works, I suggest a squash-merge for this PR, as we went back and forth a bit.

Thanks for finding the bug!

set env variables for llvm compiler on MacOS (need to include the version number)

KrisThielemans · 2025-02-24T23:45:12Z

I've had to fix my code again, so decided to clean-up history, keeping your and my updates, so a merge seems best.

fingers crossed...

KrisThielemans · 2025-02-25T00:15:28Z

@ashgillman please check final code. Merge (my) tomorrow!

KrisThielemans · 2025-02-25T07:49:36Z

AppVeyor fails due to unrelated problem, see #1570.

KrisThielemans · 2025-02-25T07:51:00Z

@danieldeidda would you be able to check this PR both with and without OpenMP? (Please run run_tests_listmode_recon.sh and ctest).

ashgillman · 2025-02-25T11:02:03Z

Apologies, I haven't looked back over this thread since Friday.

If you can hold off, I will run it tomorrow. If you need to merge ASAP, I see no issue given the build runs. I eyeballed the code and saw nothing obvious of issue.

KrisThielemans · 2025-02-25T11:22:07Z

no hurry. Let me know when done.

ashgillman · 2025-02-26T01:26:51Z

Yes! Both ctest and recon_test_pack/run_tests.sh run without reporting any error!

FIX UCL#1422 UCL#1491, maybe UCL#1200. Without OPENMP, LM_distributab…

7386883

…le_computation doesn't write output.

ashgillman requested a review from KrisThielemans February 20, 2025 04:01

KrisThielemans self-assigned this Feb 24, 2025

KrisThielemans added this to the v6.3 milestone Feb 24, 2025

KrisThielemans reviewed Feb 24, 2025

View reviewed changes

src/include/stir/recon_buildblock/distributable.txx Outdated Show resolved Hide resolved

KrisThielemans force-pushed the fix/DistributableNoOpenMP branch from 5ce9d6b to d17d7b6 Compare February 24, 2025 23:24

DOC: add 6.3 release notes for fix [skip ci]

5419f52

KrisThielemans added 3 commits February 24, 2025 23:28

[GHA] re-enable/fix MacOS build

2ddbe35

set env variables for llvm compiler on MacOS (need to include the version number)

[GHA] add Ubuntu OpenMP=OFF job

f531135

unify OpenMP=ON/OFF for lm-distributable

fd2ed8c

KrisThielemans force-pushed the fix/DistributableNoOpenMP branch 2 times, most recently from 4915e74 to fd2ed8c Compare February 24, 2025 23:43

KrisThielemans approved these changes Feb 25, 2025

View reviewed changes

KrisThielemans merged commit 42aebde into UCL:master Feb 26, 2025
10 of 11 checks passed

KrisThielemans deleted the fix/DistributableNoOpenMP branch February 26, 2025 08:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FIX Without OPENMP, LM_distributable_computation doesn't write output. #1566

FIX Without OPENMP, LM_distributable_computation doesn't write output. #1566

ashgillman commented Feb 20, 2025

ashgillman commented Feb 20, 2025

ashgillman commented Feb 20, 2025

KrisThielemans commented Feb 20, 2025

KrisThielemans commented Feb 20, 2025

danieldeidda commented Feb 24, 2025 •

edited

Loading

KrisThielemans commented Feb 24, 2025

KrisThielemans commented Feb 24, 2025

KrisThielemans commented Feb 24, 2025

KrisThielemans commented Feb 25, 2025

KrisThielemans commented Feb 25, 2025

KrisThielemans commented Feb 25, 2025

ashgillman commented Feb 25, 2025 •

edited

Loading

KrisThielemans commented Feb 25, 2025

ashgillman commented Feb 26, 2025

FIX Without OPENMP, LM_distributable_computation doesn't write output. #1566

FIX Without OPENMP, LM_distributable_computation doesn't write output. #1566

Conversation

ashgillman commented Feb 20, 2025

Testing performed

Checklist before requesting a review

ashgillman commented Feb 20, 2025

ashgillman commented Feb 20, 2025

KrisThielemans commented Feb 20, 2025

KrisThielemans commented Feb 20, 2025

danieldeidda commented Feb 24, 2025 • edited Loading

KrisThielemans commented Feb 24, 2025

KrisThielemans commented Feb 24, 2025

KrisThielemans commented Feb 24, 2025

KrisThielemans commented Feb 25, 2025

KrisThielemans commented Feb 25, 2025

KrisThielemans commented Feb 25, 2025

ashgillman commented Feb 25, 2025 • edited Loading

KrisThielemans commented Feb 25, 2025

ashgillman commented Feb 26, 2025

danieldeidda commented Feb 24, 2025 •

edited

Loading

ashgillman commented Feb 25, 2025 •

edited

Loading