[ERROR] Segfault in extract_forceconstants when built with -Ofast #122
Hi Ethan,
Thanks for the heads-up and for flagging this. I agree this is not the normal use
case, and it's strange that it's not a memory issue. Perhaps you can
compile with -Ofast -g -check all -warn all -diag-enable remark
and see what happens?
Best,
Matthieu
On Wed, Jan 22, 2025 at 6:06 PM Ethan Meitz ***@***.***> wrote:
The error: segmentation fault when running extract_forceconstants -rc2 4.0
with the optimization flag set to -Ofast (I tried -Og to debug and it works
fine with that).
The input data I attached has 5000 samples (yes, that's a lot, but I have
96GB of RAM and the IFC calculation does not need that much). The error
only occurs when N_timesteps is set to 5000 inside infile.meta. The
segfault occurs somewhere in the symmetry calculation, as the last output to
screen was:
GENERATING CONSTRAINTS
... 0 secondorder constraints (0.00027s)
... created 0 constraints (0.00029s)
... creating fc coefficients 100.0% |========================================| 4.58623s
... least squares solver
... solved second order (1.06431s)
... finished solving for forceconstants
... diagnostics from three-fold cross-validation:
Segmentation fault
If I set N_timesteps to 2500, the code runs a bit longer and crashes
somewhere after this line
<https://github.com/tdep-developers/tdep/blob/23c403ee227fe92bd04f0f537ea74e2fae5b2e5f/src/extract_forceconstants/main.f90#L602>
based on the screen output, which ended at:
... energies writen to `outfile.energies`
ENERGIES (meV/atom):
rms std std(res)
input: 9.693095 0.551539 -
second order: 9.693138 0.552289 0.016234
Segmentation fault
I'm on git commit 23c403e. I set the optimization level to -Og
to try to debug, and the error went away, so I can't give much more
information. But it seems related to the number of samples somehow. Yes,
this is WAY more samples than TDEP is probably intended to use, but I thought
I'd report it since it's unclear to me what's happening.
input_data.zip
<https://github.com/user-attachments/files/18509129/input_data.zip>
position_data.zip
<https://github.com/user-attachments/files/18509137/position_data.zip>
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Professor Matthieu J Verstraete
Fellow, American Physical Society
Chair, Steering Committee, European Theoretical Spectroscopy Facility
www.etsf.eu
Alumnus Fellow, Young Academy of Europe yacadeuro.org
Institute for Theoretical Physics (ITP) Department of Physics
Buys Ballot Gebouw/Building, Princetonplein 5, office
University of Utrecht, 3584 CC Utrecht
ITP Secretariat: +31 30 253 5928 E-mail: ***@***.***
Group web page (Liège): http://www.nanomat.ulg.ac.be/
Nanomat lab, Q-Mat center, Université de Liège
Département de Physique, Bat. B5a, 4/50
Allée du 6 août, 19 B-4000 Sart Tilman, Liège Belgium
Phone : +32 4 366 90 17
European Theoretical Spectroscopy Facility (ETSF)
Mail : ***@***.***
***@***.***
***@***.***
Hi Ethan,
fp is empty but allocated, so it should fail at first use if memory is
insufficient or something like that. You could try resetting it to 0 just
before these lines, but the fact that the segfault moves around with N is
suspicious; it could be due to a completely different line/array. What is
the smallest N that fails? We could at least put a warning in the
code that TDEP is not designed for that type of scaling (though there is no
excuse for a segfault).
Have you tried valgrind as well?
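One way the question about the smallest failing N could be answered is by bisection. The sketch below is only an illustration: `fails` is a hypothetical callback that would set N_timesteps in infile.meta, run `extract_forceconstants -rc2 4.0`, and report whether the run segfaulted. It also assumes that once some N fails, every larger N fails too, which this thread does not establish. The `segfaulted` helper relies on Python's `subprocess` reporting a child killed by a signal as a negative return code on POSIX systems.

```python
import signal


def segfaulted(returncode):
    """True if a subprocess return code indicates death by SIGSEGV (POSIX)."""
    return returncode == -int(signal.SIGSEGV)


def smallest_failing(lo, hi, fails):
    """Return the smallest n in (lo, hi] for which fails(n) is True.

    Assumes fails(lo) is False, fails(hi) is True, and that failure is
    monotonic in n (an assumption, not something shown in the thread).
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid  # mid fails: the threshold is at or below mid
        else:
            lo = mid  # mid passes: the threshold is above mid
    return hi
```

In practice `fails` would wrap something like `subprocess.run([...]).returncode` passed through `segfaulted`; how infile.meta is rewritten for each trial is left out because its format is not shown here.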
On Wed, Jan 22, 2025 at 10:17 PM Ethan Meitz ***@***.***> wrote:
OK, I ran gdb with those flags on and I can now get a line number for
both cases. I don't have polar interactions in this simulation, so the fp
array *should* be empty.
With N = 2500 it segfaults at:
Thread 1 "extract_forceco" received signal SIGSEGV, Segmentation fault.
0x000000000040fc24 in extract_forceconstants () at main.f90:611
611 sfp = lo_stddev(f0 - fp)*lo_force_hartreebohr_to_eVa
With N = 5000 it segfaults at:
Thread 1 "extract_forceco" received signal SIGSEGV, Segmentation fault.
0x0000000000534554 in ifc_solvers.ifc_solvers_diagnostics::report_diagnostics (map=..., tp=..., ih=..., dh=..., mw=...,
mem=..., verbosity=<optimized out>) at ifc_solvers_diagnostics.f90:110
110 force_rsq(div, 2) = distributed_rsquare(f2 + fp, f0, nrow, mw)
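Both crash sites evaluate a whole-array expression (`f0 - fp`, `f2 + fp`) inside a function call, which a compiler may materialize as a temporary array. One thing that might be worth ruling out, and which is not established anywhere in this thread, is that such temporaries land on the stack and exceed the stack limit rather than the available RAM; gfortran, for instance, documents that -Ofast enables -fstack-arrays. The sketch below only compares a rough size estimate against the current soft stack limit. The atom count (256) and the layout (three double-precision values per atom per timestep) are hypothetical, not taken from the attached data.

```python
import resource

BYTES_PER_REAL = 8  # assumes double-precision reals


def temporary_bytes(n_timesteps, n_atoms):
    """Estimated size of one force-sized temporary: 3 components per atom per step."""
    return n_timesteps * n_atoms * 3 * BYTES_PER_REAL


def soft_stack_limit_bytes():
    """Soft stack limit of this process in bytes, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    return None if soft == resource.RLIM_INFINITY else soft


if __name__ == "__main__":
    n_atoms = 256  # hypothetical supercell size, for illustration only
    limit = soft_stack_limit_bytes()
    for n_timesteps in (2500, 5000):
        need = temporary_bytes(n_timesteps, n_atoms)
        if limit is None:
            verdict = "stack is unlimited"
        elif need > limit:
            verdict = "larger than the soft stack limit"
        else:
            verdict = "fits within the soft stack limit"
        print(f"N = {n_timesteps}: about {need / 2**20:.1f} MiB, {verdict}")
```

If the estimate for the real system does exceed the limit, rerunning after raising the stack limit in the shell (`ulimit -s`) would be a cheap way to test the idea; if the crash persists, the cause lies elsewhere.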
I monitored the memory usage with htop while running. At most about 10 GB is used and I have 96 GB available, so it really shouldn't be an OOM issue. There's definitely some memory being accessed that shouldn't be. I also just ran valgrind; the log is attached. I've never actually used valgrind before, but the output does show several invalid writes in the function that crashes when running under gdb. Happy to run more tests if you have ideas on how to fix the bug based on the valgrind log.
It won't let me upload the infile.forces for some reason....