[ERROR] Segfault in extract_forceconstants when built with -Ofast #122

Open
ejmeitz opened this issue Jan 22, 2025 · 4 comments

ejmeitz commented Jan 22, 2025

The error: a segmentation fault when running extract_forceconstants -rc2 4.0 built with the optimization flag -Ofast. (I tried -Og to debug, and it works fine with that.)

The input data I attached has 5000 samples (yes, that's a lot, but I have 96 GB of RAM and the IFC calculation does not need that much). This particular crash only occurs when N_timesteps is set to 5000 in infile.meta. The segfault occurs somewhere in the symmetry calculation, as the last output to screen was:

 GENERATING CONSTRAINTS
 ... 0 secondorder constraints (0.00027s)
 ... created 0 constraints (0.00029s)
... creating fc coefficients            100.0% |========================================|  4.58623s
 ... least squares solver
 ... solved second order (1.06431s)
 ... finished solving for forceconstants
 ... diagnostics from three-fold cross-validation:
Segmentation fault

If I set N_timesteps to around 2500, the code runs a bit longer and crashes somewhere after this line, based on the screen output, which ended at:

 ... energies writen to `outfile.energies`

 ENERGIES (meV/atom):
                     rms          std          std(res)
          input:     9.693095     0.551539     -
   second order:     9.693138     0.552289     0.016234
Segmentation fault

I'm on git commit 23c403e. I set the optimization level to -Og to try to debug, and the error went away, so I can't give much more information. But it seems related to the number of samples somehow. Yes, this is WAY more samples than TDEP is probably intended to use, but I thought I'd report it since it's unclear to me what's happening.
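A crash that depends on both the optimization level and the number of samples is sometimes a stack overflow rather than a heap problem. Nothing in this thread confirms that here; it is only a hypothesis worth ruling out. It assumes the build uses gfortran (not stated above), where -Ofast also enables -fstack-arrays and so places automatic arrays and array temporaries on the stack. The sketch below reads the current stack limit and checks a purely hypothetical array size against it; `stack_soft_limit`, `fits_within`, and the 64 MiB figure are illustrative names and values, not anything taken from TDEP.

```python
import resource

def stack_soft_limit():
    """Return the soft stack limit in bytes (resource.RLIM_INFINITY if unlimited)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    return soft

def fits_within(limit_bytes, needed_bytes):
    """True if needed_bytes fits under limit_bytes; an unlimited limit always fits."""
    if limit_bytes == resource.RLIM_INFINITY:
        return True
    return needed_bytes <= limit_bytes

if __name__ == "__main__":
    limit = stack_soft_limit()
    if limit == resource.RLIM_INFINITY:
        print("stack limit: unlimited")
    else:
        print(f"stack limit: {limit / 2**20:.1f} MiB")
    # Hypothetical footprint of one temporary array; the real sizes inside
    # extract_forceconstants are not known from this thread.
    hypothetical_bytes = 64 * 2**20
    print("hypothetical 64 MiB temporary fits:", fits_within(limit, hypothetical_bytes))
```

If the limit turns out to be small, rerunning after `ulimit -s unlimited` in the same shell would be a quick way to test the hypothesis: if the segfault disappears, stack-allocated temporaries are a likely culprit; if it persists, the cause is elsewhere.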

input_data.zip
position_data.zip
It won't let me upload the infile.forces for some reason....

ejmeitz added the error label Jan 22, 2025
Contributor

mjv500 commented Jan 22, 2025 via email

Author

ejmeitz commented Jan 22, 2025

OK, I ran gdb with those flags on, and I can now get a line number for both cases. I don't have polar interactions in this simulation, so the fp array should be empty.

With N = 2500, it segfaults at:
[Image]

With N = 5000, it segfaults at:
[Image]

Contributor

mjv500 commented Jan 23, 2025 via email

Author

ejmeitz commented Jan 23, 2025

I monitored the memory usage with htop while running. At most about 10 GB is used, and I have 96 GB available, so it really shouldn't be an OOM issue. Some memory is definitely being accessed that shouldn't be. f0, f1, ..., fp should just be N x 3 arrays, i.e. 5000 x 3 of Float32, which is tiny.
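To put a number on "tiny", here is a minimal sketch of the arithmetic, assuming each array really is a dense 5000 x 3 block of 4-byte (32-bit) floats as described; `array_bytes` is a helper named here for illustration only.

```python
import math

def array_bytes(shape, itemsize):
    """Return the size in bytes of a dense array with the given shape."""
    return math.prod(shape) * itemsize

# Figures quoted in the comment: 5000 x 3 values, 4 bytes each.
per_array = array_bytes((5000, 3), 4)
print(per_array)         # 60000 bytes
print(per_array / 1024)  # 58.59375 KiB
```

At roughly 59 KiB per array, even a handful of such arrays is negligible next to the 10 GB observed in htop, which supports the view that the crash comes from an invalid access and not from running out of memory.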

I just ran valgrind; the log is attached. I've never used valgrind before, but the output reports several invalid writes in the same function that crashes under gdb. Happy to run more tests if you have ideas on how to fix the bug based on the valgrind log.

valgrind_output.log
