Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX Without OPENMP, LM_distributable_computation doesn't write output. #1566

Merged
merged 5 commits into from
Feb 26, 2025

Conversation

ashgillman
Copy link
Collaborator

Should fix #1422 #1491, maybe #1200.

G'day! Long time no see.

I'm having a little explore into the gantry shift PR to see what is required, but first step is to get master building on my machine (OSX).

I'm not familiar with this code, but it looks like the idea was to store the outputs per thread in local_ooutput_image_sptrs and sum them up at the end. Without OpenMP (required if Clang), it is treated as one thread, but the writing back out never happend.

Testing performed

I've tested with ctest, and tests now passing.

Checklist before requesting a review

  • I have performed a self-review of my code
  • [-] I have added docstrings/doxygen in line with the guidance in the developer guide
  • [] I have implemented unit tests that cover any new or modified functionality (if applicable)
  • The code builds and runs on my machine
  • [] documentation/release_XXX.md has been updated with any functionality change (if applicable)

@ashgillman
Copy link
Collaborator Author

See also: KrisThielemans@cab1c8e

But the re-enabling the build fails with:

/Users/runner/work/_temp/9e40082b-aa77-4c9f-b976-fcf58e7dc9a9.sh: line 6: /opt/homebrew/opt/llvm/bin/clang: No such file or directory
 + brew install llvm@16
==> Downloading https://ghcr.io/v2/homebrew/core/llvm/16/manifests/16.0.6_1
==> Fetching llvm@16
==> Downloading https://ghcr.io/v2/homebrew/core/llvm/16/blobs/sha256:e799fe5056da9a07d79474f7a7290e0a465474eb004cfc614775f290197e10e5
==> Pouring [email protected]_1.arm64_sonoma.bottle.tar.gz
==> Caveats
To use the bundled libc++ please add the following LDFLAGS:
  LDFLAGS="-L/opt/homebrew/opt/llvm@16/lib/c++ -Wl,-rpath,/opt/homebrew/opt/llvm@16/lib/c++"

llvm@16 is keg-only, which means it was not symlinked into /opt/homebrew,
because this is an alternate version of another formula.

If you need to have llvm@16 first in your PATH, run:
  echo 'export PATH="/opt/homebrew/opt/llvm@16/bin:$PATH"' >> /Users/runner/.bash_profile

For compilers to find llvm@16 you may need to set:
  export LDFLAGS="-L/opt/homebrew/opt/llvm@16/lib"
  export CPPFLAGS="-I/opt/homebrew/opt/llvm@16/include"

So it uses a different path because 16 is non-default.

https://formulae.brew.sh/formula/llvm

Default should be 19

@ashgillman
Copy link
Collaborator Author

Okay, so failing a lot of tests with Clang.

Seems like it fails whenever a test deliberately throws an exception, not being caught. Possibly related: llvm/llvm-project#94292

I'm using clang 16 locally with no error. Probably the best way is to revert the 16->19 change and fix the CC CXX paths, but I'll await what you think, Kris, rather than using up all the project's GHA credits.

@KrisThielemans
Copy link
Collaborator

I'm amazed. This seems a bug that is not specific to MacOS at all. I see that we don't test with OpenMP=OFF at all before this PR! I think it'd be good to add an Ubuntu gcc test without OpenMP as well.

I think there's a mistake though.

local_double_out_ptrs.resize(1, double_out_ptr);
initialises local_double_out_ptrs[0] with double_out_ptr. So, I think we'll be double counting the double_out_ptr after this fix. This will currently not be used/tested, but please check.

I think for the image, originally it was also intended to avoid the final copy, i.e. process directly into output_image_ptr. However, I'm not sure if that'd be correct (without spending more time than I have) due to accumulation vs overwriting.

Possibly the best/clearest solution might be to always use the code in

info("Listmode gradient calculation: starting loop with " + std::to_string(omp_get_num_threads()) + " threads", 2);
local_output_image_sptrs.resize(omp_get_max_threads(), shared_ptr<DiscretisedDensity<3, float>>());
local_double_out_ptrs.resize(omp_get_max_threads(), 0);
if (double_out_ptr)
{
local_double_outs.resize(omp_get_max_threads(), 0.);
for (int t = 0; t < omp_get_max_threads(); ++t)
local_double_out_ptrs[t] = &local_double_outs[t];
}
local_counts.resize(omp_get_max_threads(), 0);
local_count2s.resize(omp_get_max_threads(), 0);
local_row.resize(omp_get_max_threads(), ProjMatrixElemsForOneBin());
, using num_threads and max_num_threads variables initialised appropriately.

@KrisThielemans
Copy link
Collaborator

Regarding the exception problem, this is #1490. I have no energy for this, so using clang 16 on GHA seems best.

The fact that the bug is not MacOS specific implies you could do those fixes in a separate PR if you prefer.

@KrisThielemans KrisThielemans self-assigned this Feb 24, 2025
@KrisThielemans KrisThielemans added this to the v6.3 milestone Feb 24, 2025
@danieldeidda
Copy link
Collaborator

danieldeidda commented Feb 24, 2025

BTW I installed master with openMP off and run "run_tests_listmode_recon.sh" i get no issue

============= nonTOF tests using PET_ACQ_small.l.hdr.STIR ==========

=== Generate attenuation image

WARNING: image duration not set, so time frame definitions will not be initialised

INFO: Processing next shape...
=== Calculate ACFs
=== Creating my_test_lm_frame.fdef (time frame definitions)
=============== Using frame_nonTOF definition =============
=== Unlist listmode data (for comparison)
---- Executable ran ok
=== Create additive sino my_additive_sinogram_frame_nonTOF.hs
---- Executable ran ok
=== Reconstruct listmode data without cache
---- Executable ran ok
=== Reconstruct listmode data with cache and store it on disk
---- Executable ran ok
=== Compare reconstructed images with and without caching LM file
---- This test seems to be ok !
=== Reconstruct listmode data with cache loaded from the disk
---- Executable ran ok
=== Compare reconstructed images without caching LM file and with loading cache from disk
---- This test seems to be ok !
=== Reconstruct projection data for comparison
---- Executable ran ok
=== Compare sensitivity images
---- This test seems to be ok !
=== Compare reconstructed images
---- This test seems to be ok !
=============== Using counts_nonTOF definition =============
=== Unlist listmode data (for comparison)
---- Executable ran ok
=== Create additive sino my_additive_sinogram_counts_nonTOF.hs
---- Executable ran ok
=== Reconstruct listmode data without cache
---- Executable ran ok
=== Reconstruct listmode data with cache and store it on disk
---- Executable ran ok
=== Compare reconstructed images with and without caching LM file
---- This test seems to be ok !
=== Reconstruct listmode data with cache loaded from the disk
---- Executable ran ok
=== Compare reconstructed images without caching LM file and with loading cache from disk
---- This test seems to be ok !
=== Reconstruct projection data for comparison
---- Executable ran ok
=== Compare sensitivity images
---- This test seems to be ok !
=== Compare reconstructed images
---- This test seems to be ok !

============= TOF tests using root_header.hroot ==========

=== Generate attenuation image

WARNING: image duration not set, so time frame definitions will not be initialised

INFO: Processing next shape...
=== Calculate ACFs
=== Creating my_test_lm_frame.fdef (time frame definitions)
=============== Using frame_TOF definition =============
=== Unlist listmode data (for comparison)
---- Executable ran ok
=== Create additive sino my_additive_sinogram_frame_TOF.hs
---- Executable ran ok
=== Reconstruct listmode data without cache
---- Executable ran ok
=== Reconstruct listmode data with cache and store it on disk
---- Executable ran ok
=== Compare reconstructed images with and without caching LM file
---- This test seems to be ok !
=== Reconstruct listmode data with cache loaded from the disk
---- Executable ran ok
=== Compare reconstructed images without caching LM file and with loading cache from disk
---- This test seems to be ok !
=== Reconstruct projection data for comparison
---- Executable ran ok
=== Compare sensitivity images
---- This test seems to be ok !
=== Compare reconstructed images
---- This test seems to be ok !
=============== Using counts_TOF definition =============
=== Unlist listmode data (for comparison)
---- Executable ran ok
=== Create additive sino my_additive_sinogram_counts_TOF.hs
---- Executable ran ok
=== Reconstruct listmode data without cache
---- Executable ran ok
=== Reconstruct listmode data with cache and store it on disk
---- Executable ran ok
=== Compare reconstructed images with and without caching LM file
---- This test seems to be ok !
=== Reconstruct listmode data with cache loaded from the disk
---- Executable ran ok
=== Compare reconstructed images without caching LM file and with loading cache from disk
---- This test seems to be ok !
=== Reconstruct projection data for comparison
---- Executable ran ok
=== Compare sensitivity images
---- This test seems to be ok !
=== Compare reconstructed images
---- This test seems to be ok !

--------------- End of tests -------------

Everything seems to be fine !

@KrisThielemans
Copy link
Collaborator

BTW I installed master with openMP off and run "run_tests_listmode_recon.sh" i get no issue

Interesting. Not sure if I understand that, but I've now added a GHA test without OpenMP for Ubuntu as well, so at least we should be safe now.

@KrisThielemans KrisThielemans force-pushed the fix/DistributableNoOpenMP branch from 5ce9d6b to d17d7b6 Compare February 24, 2025 23:24
@KrisThielemans
Copy link
Collaborator

@ashgillman I've now simplified the code. If it all works, I suggest a squash-merge for this PR, as we went back and forth a bit.

Thanks for finding the bug!

@KrisThielemans KrisThielemans force-pushed the fix/DistributableNoOpenMP branch 2 times, most recently from 4915e74 to fd2ed8c Compare February 24, 2025 23:43
@KrisThielemans
Copy link
Collaborator

I've had to fix my code again, so decided to clean-up history, keeping your and my updates, so a merge seems best.

fingers crossed...

@KrisThielemans
Copy link
Collaborator

@ashgillman please check final code. Merge (my) tomorrow!

@KrisThielemans
Copy link
Collaborator

AppVeyor fails due to unrelated problem, see #1570.

@KrisThielemans
Copy link
Collaborator

@danieldeidda would you be able to check this PR both with and without OpenMP? (Please run run_tests_listmode_recon.sh and ctest).

@ashgillman
Copy link
Collaborator Author

ashgillman commented Feb 25, 2025

Apologies, I haven't looked back over this thread since Friday.

If you can hold off, I will run it tomorrow. If you need to merge ASAP, I see no issue given the build runs. I eyeballed the code and saw nothing obvious of issue.

@KrisThielemans
Copy link
Collaborator

no hurry. Let me know when done.

@ashgillman
Copy link
Collaborator Author

Yes! Both ctest and recon_test_pack/run_tests.sh run without reporting any error!

@KrisThielemans KrisThielemans merged commit 42aebde into UCL:master Feb 26, 2025
10 of 11 checks passed
@KrisThielemans KrisThielemans deleted the fix/DistributableNoOpenMP branch February 26, 2025 08:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MacOS job fails on the list-mode Hessian-concavity test
3 participants