Nelder-Mead optimiser fails #763

Open
amiszuda opened this issue Jun 28, 2023 · 11 comments
@amiszuda

amiszuda commented Jun 28, 2023

With MPI turned on, the Nelder-Mead (NM) optimiser fails with the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-18-0acdc279fa53> in <module>
      1 phoebe.mpi_on(nprocs=12)
----> 2 b.run_solver('opt_nm_full', solution='LC_nm_full', overwrite=True)

~/.local/lib/python3.6/site-packages/phoebe/parameters/parameters.py in _send_if_client(self, *args, **kwargs)
    422 
    423         else:
--> 424             return fctn(self, *args, **kwargs)
    425 
    426     return _send_if_client

~/.local/lib/python3.6/site-packages/phoebe/frontend/bundle.py in run_solver(self, solver, solution, detach, return_changes, **kwargs)
  13635 
  13636             if not detach:
> 13637                 return job_param.attach(sleep=job_sleep)
  13638             else:
  13639                 logger.info("detaching from run_solver.  Call get_parameter(solution='{}').attach() to re-attach".format(solution))

~/.local/lib/python3.6/site-packages/phoebe/parameters/parameters.py in attach(self, wait, sleep, cleanup, return_changes)
  12582         else:
  12583             logger.info("current status: {}, pulling job results".format(status))
> 12584             return self._retrieve_and_attach_results(cleanup=cleanup, return_changes=return_changes)
  12585 
  12586     def load_progress(self, cleanup=True, return_changes=False):

~/.local/lib/python3.6/site-packages/phoebe/parameters/parameters.py in _retrieve_and_attach_results(self, cleanup, return_changes)
  12444             _ = self.get_status()
  12445 
> 12446         ret_ps = self._retrieve_results()
  12447 
  12448         if not len(ret_ps.to_list()) and 'progress' in self._value:

~/.local/lib/python3.6/site-packages/phoebe/parameters/parameters.py in _retrieve_results(self)
  12414                         raise ValueError("job has not yet produced any output, with the following log:\n\n{}".format("\n".join(e.readlines())))
  12415                     else:
> 12416                         raise ValueError("job failed with the following log:\n\n{}".format("\n".join(e.readlines())))
  12417 
  12418             else:

ValueError: job failed with the following log:

Nothing follows the line "ValueError: job failed with the following log:"; the log itself is empty.

With MPI turned off, the solver runs. The error does not occur on every run.
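As a side note on why the message can end right after the colon: the raise statement visible in the traceback builds the message by joining the lines of an error file. The minimal sketch below mimics that one expression outside PHOEBE. The helper name `format_job_failure` and the temporary file are hypothetical stand-ins, and the assumption is that the job's error file was simply empty when it was read.

```python
import os
import tempfile


def format_job_failure(err_path):
    """Build the message with the same expression as the raise in the traceback.

    Hypothetical helper for illustration only; it is not part of PHOEBE.
    """
    with open(err_path, "r") as e:
        return "job failed with the following log:\n\n{}".format("\n".join(e.readlines()))


# An empty file stands in for a job that died without writing anything to stderr.
with tempfile.TemporaryDirectory() as tmp:
    empty_err = os.path.join(tmp, "empty.err")
    open(empty_err, "w").close()
    message = format_job_failure(empty_err)

# The message ends right after the colon and two newlines: there is no log text.
print(repr(message))
```

If this is what happens here, the empty message only says that nothing was written to the error file, not that the job ran cleanly.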

The working example is provided below:

import phoebe
import numpy as np
import matplotlib.pyplot as plt

b = phoebe.default_binary(semidetached='secondary')
lc = np.loadtxt('lc_corrected_for_occonell.txt',unpack=True)


b.add_dataset('lc', times = lc[0], fluxes=lc[1], sigmas=lc[2], 
              compute_phases=phoebe.linspace(0,1,101), passband='TESS:T')

b.add_solver('estimator.lc_periodogram', solver='lcperiod_bls', 
             algorithm='bls', minimum_n_cycles=2, sample_mode='manual',
             sample_periods = np.linspace(3.,4.,1000),
             overwrite=True)
b.run_solver('lcperiod_bls', solution='lcperiod_bls_sol', overwrite=True)
print(b['fitted_values@lcperiod_bls_sol'])
b.adopt_solution('lcperiod_bls_sol')

b.set_value('teff@primary', value=25000)
b.set_value('atm@primary', 'blackbody')
b.set_value('atm@secondary', 'blackbody')
b.set_value('ld_mode_bol@primary', 'manual')
b.set_value('ld_mode_bol@secondary', 'manual')

b.add_solver('estimator.lc_geometry', solver='lc_est_lcgeom', phase_bin = False)
b.run_solver('lc_est_lcgeom', solution='lc_soln_lcgeom')
b.flip_constraint('ecc', solve_for='esinw')
b.flip_constraint('per0', solve_for='ecosw')
b.run_compute(model='lc_geom', sample_from='lc_soln_lcgeom', overwrite=True)
b.adopt_solution('lc_soln_lcgeom', overwrite=True)

b.add_compute('ellc', compute='fastcompute')
b.flip_constraint('esinw', solve_for='ecc')
b.flip_constraint('ecosw', solve_for='per0')

b.add_solver('optimizer.nelder_mead', 
             solver='opt_nm_full',
             fit_parameters = ['t0_supconj@binary', 'period@binary','incl@binary', 
                               'teffratio', 'requivsumfrac', 'esinw', 'ecosw', 'q', 
                               'sma@binary', 'vgamma@system'],
             compute='fastcompute', overwrite=True)
b.set_value('maxiter@opt_nm_full', solver='opt_nm_full', value=10000)
b.set_value('expose_lnprobabilities@opt_nm_full', True)
b.set_value('progress_every_niters@opt_nm_full', 1)

phoebe.mpi_on(nprocs=12)
b.run_solver('opt_nm_full', solution='NM_sol', overwrite=True)

The situation is no different when MPI is turned off, or when the solver runs on compute='phoebe01'.

lc_corrected_for_occonell.txt

@amiszuda
Author

@kecnry, sorry to ping you, but my previous post somehow went missing, so I am not sure whether posting another one under the same issue number will make the issue show up in the notifications.

@kecnry
Member

kecnry commented Oct 10, 2023

I had seen it come through, noticed it was blank, and just assumed you cleared it instead of closing it. Let me see if anyone can reproduce this on their own machines so we can track down the error (since the error log is otherwise empty).

@kecnry
Member

kecnry commented Oct 11, 2023

@amiszuda - what version of phoebe are you running?

@amiszuda
Author

2.4.10 on Ubuntu

@bpablo
Contributor

bpablo commented Oct 17, 2023

Hello,

I have tried to reproduce this error, but unfortunately the script errors out before I ever get to your issue, with the following:
ValueError: 0 results found for twig: 'ecc@binary', {'context': 'constraint', 'check_visible': True, 'check_default': True, 'check_advanced': False, 'check_single': False}

Can you confirm that this script does in fact work for you and you aren't getting this error?

@amiszuda
Author

Hi Bert!

I am not getting that error, though I am getting others regarding constraints or ld_mode='interp' not being supported by blackbody. This is also strange, since I did not get those when running the same commands in the notebook; however, I have noticed such differences a couple of times before. As this issue was reported some time ago, and since I did not provide a full report with the exact, bit-for-bit code that crashed on my end (apologies!), I will dig a bit and attach a new script, both as an executable Python script and as a notebook, hoping it will give a better log of what is happening here.
I'll do my best to get back to you asap.

@amiszuda
Author

Hi again,

Below, I provide a fully reproducible script and a notebook sheet. This is the exact version of the code that causes the following crash:

# crimpl: ls -d /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-*
# crimpl: ls /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12/*
# crimpl: mkdir -p /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs
# crimpl: cp exportpath.sh /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/
# crimpl: source /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/exportpath.sh; conda -V
# crimpl: source /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/exportpath.sh; mkdir -p /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12
# crimpl: cp crimpl_submit_script.sh /media/data/Work/BCep/TIC0247315421/phoebe/_cPywECbIMpNKMWIELAdcSpclMhYJei.py /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12/
# crimpl: source /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/exportpath.sh; echo '_cPywECbIMpNKMWIELAdcSpclMhYJei.py' >> /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12/crimpl-input-files.list
# crimpl: source /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/exportpath.sh; echo 'False' > /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12/crimpl-conda-environment
# crimpl (detached): source /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/exportpath.sh; cd /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12; chmod +x ./crimpl_submit_script.sh; nohup bash ./crimpl_submit_script.sh 2> ./crimpl_submit_script.sh.err & echo $! > crimpl-nohup.pid
# crimpl: ls -d /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-*
# crimpl: cat /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12/crimpl-job.status
# crimpl: cat /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12/crimpl-job.status
# crimpl: ls /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12/*
# crimpl: ls /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12/*
# crimpl: ls /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12/*
# crimpl: cat /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12/crimpl-input-files.list
# crimpl: ls /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12/*
# crimpl: cp /media/data/Work/BCep/TIC0247315421/phoebe/phoebe_crimpl_jobs/crimpl-job-2023.10.26-10.30.12/nohup.out ./
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-37-3a686994c1ae> in <module>
      1 phoebe.mpi_on(nprocs=12)
      2 # phoebe.mpi_off()
----> 3 b.run_solver('opt_nm_full', solution='NM', overwrite=True)

~/.local/lib/python3.6/site-packages/phoebe/parameters/parameters.py in _send_if_client(self, *args, **kwargs)
    422 
    423         else:
--> 424             return fctn(self, *args, **kwargs)
    425 
    426     return _send_if_client

~/.local/lib/python3.6/site-packages/phoebe/frontend/bundle.py in run_solver(self, solver, solution, detach, return_changes, **kwargs)
  13635 
  13636             if not detach:
> 13637                 return job_param.attach(sleep=job_sleep)
  13638             else:
  13639                 logger.info("detaching from run_solver.  Call get_parameter(solution='{}').attach() to re-attach".format(solution))

~/.local/lib/python3.6/site-packages/phoebe/parameters/parameters.py in attach(self, wait, sleep, cleanup, return_changes)
  12582         else:
  12583             logger.info("current status: {}, pulling job results".format(status))
> 12584             return self._retrieve_and_attach_results(cleanup=cleanup, return_changes=return_changes)
  12585 
  12586     def load_progress(self, cleanup=True, return_changes=False):

~/.local/lib/python3.6/site-packages/phoebe/parameters/parameters.py in _retrieve_and_attach_results(self, cleanup, return_changes)
  12444             _ = self.get_status()
  12445 
> 12446         ret_ps = self._retrieve_results()
  12447 
  12448         if not len(ret_ps.to_list()) and 'progress' in self._value:

~/.local/lib/python3.6/site-packages/phoebe/parameters/parameters.py in _retrieve_results(self)
  12414                         raise ValueError("job has not yet produced any output, with the following log:\n\n{}".format("\n".join(e.readlines())))
  12415                     else:
> 12416                         raise ValueError("job failed with the following log:\n\n{}".format("\n".join(e.readlines())))
  12417 
  12418             else:

ValueError: job failed with the following log:

phoebe_failing_job.tar.gz
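Since the traceback carries no log text, the files in the crimpl job directory may be worth reading directly. The following is a minimal sketch, assuming the file names that appear in the crimpl commands above (crimpl_submit_script.sh.err, nohup.out, crimpl-job.status) and assuming the timestamps in the directory names are zero-padded so they sort chronologically. The helpers `latest_job_dir` and `read_job_logs` are hypothetical and not part of crimpl or PHOEBE.

```python
import sys
from pathlib import Path

# File names taken from the crimpl command log above.
LOG_NAMES = ("crimpl_submit_script.sh.err", "nohup.out", "crimpl-job.status")


def latest_job_dir(jobs_root):
    """Return the last crimpl-job-* directory by name, or None if there is none.

    Assumes zero-padded timestamps in the names, so name order is time order.
    """
    candidates = sorted(p for p in Path(jobs_root).glob("crimpl-job-*") if p.is_dir())
    return candidates[-1] if candidates else None


def read_job_logs(job_dir, names=LOG_NAMES):
    """Return {file name: text}, with None for files that do not exist."""
    job_dir = Path(job_dir)
    logs = {}
    for name in names:
        path = job_dir / name
        logs[name] = path.read_text(errors="replace") if path.is_file() else None
    return logs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python inspect_job.py /path/to/phoebe_crimpl_jobs
    job = latest_job_dir(sys.argv[1])
    if job is None:
        print("no crimpl-job-* directories found")
    else:
        for name, text in read_job_logs(job).items():
            print("== {} ==".format(name))
            print("(missing)" if text is None else (text or "(empty)"))
```

Distinguishing a missing file from an empty one may help narrow down whether the job never started or started and died silently.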

@bpablo
Contributor

bpablo commented Nov 28, 2023

Hey Amadeusz,

I still can't reproduce what you see. With MPI on it fails, but it does give me an error message, as I don't think this computer is set up for it. However, without MPI it appears to be working:

--------------------------------
  0%|                                                     | 2/10000 [00:11<15:33:51,  5.60s/it]/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/astropy/units/quantity.py:666: RuntimeWarning: invalid value encountered in subtract
  result = super().__array_ufunc__(function, method, *arrays, **kwargs)
  0%|                                                    | 22/10000 [02:30<18:54:41,  6.82s/it]/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/scipy/optimize/_optimize.py:917: RuntimeWarning: invalid value encountered in subtract
  np.max(np.abs(fsim[0] - fsim[1:])) <= fatol):
  1%|▎                                                    | 64/10000 [05:38<8:34:04,  3.10s/it]
---------------------------------

It isn't finished yet so maybe it will fail eventually, but at this point it appears to be fine. Can you confirm whether you see any of this or not?
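To check whether a machine is set up for MPI at all, something like the sketch below could be run first. It assumes, without confirmation from this thread, that MPI mode needs an MPI launcher on the PATH and an importable mpi4py package. The function name `mpi_prerequisites` is hypothetical.

```python
import importlib.util
import shutil


def mpi_prerequisites(launchers=("mpirun", "mpiexec")):
    """Report which MPI launchers are on PATH and whether mpi4py can be imported.

    Assumption: these are the pieces MPI mode depends on; adjust as needed.
    """
    return {
        # shutil.which returns the full path of the executable, or None.
        "launchers": {name: shutil.which(name) for name in launchers},
        # find_spec returns None when the top-level package is not installed.
        "mpi4py_importable": importlib.util.find_spec("mpi4py") is not None,
    }


report = mpi_prerequisites()
print(report)
```

Running it in both the Jupyter kernel and a plain terminal may show whether the two environments see the same launcher and packages.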

@amiszuda
Author

Hey Bert,

No, nothing like it. It's weird, as MPI sometimes works and sometimes doesn't. One more thing, which I noticed only recently: the empty ValueError: job failed with the following log: appears only in Jupyter.
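Because the failure is intermittent, it may help to repeat the same call several times and keep the traceback from each failed attempt. The sketch below is a generic, hypothetical helper (`count_failures` is not part of PHOEBE), shown with a stand-in function that fails on every second call.

```python
import traceback


def count_failures(run_once, attempts=5):
    """Call run_once() repeatedly; return (attempt index, traceback) per failure."""
    failures = []
    for attempt in range(attempts):
        try:
            run_once()
        except Exception:
            failures.append((attempt, traceback.format_exc()))
    return failures


# Stand-in for the real call: fails on every second invocation.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] % 2 == 0:
        raise ValueError("job failed with the following log:\n\n")


failures = count_failures(flaky, attempts=4)
print("{} of 4 attempts failed".format(len(failures)))
```

With the bundle from this issue, `run_once` could be `lambda: b.run_solver('opt_nm_full', solution='NM', overwrite=True)`, run once in jupyter-notebook and once from a plain Python script to compare how often each fails.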

@bpablo
Contributor

bpablo commented Dec 5, 2023

It ran for me and finished in Jupyter. I am using JupyterLab, though, not Notebook. Is it the same for you?

@amiszuda
Author

No, I am using jupyter-notebook
