Catch and handle caget and caput exception #591

Open
jesusvasquez333 opened this issue Jan 25, 2021 · 2 comments
Assignees
Labels
enhancement New feature or request SLAC

Comments

@jesusvasquez333
Contributor

jesusvasquez333 commented Jan 25, 2021

Describe the bug

Currently, the pysmurf.client code implements the low-level EPICS caget and caput commands without handling exceptions. So when an exception is raised by the pyepics layer, the pysmurf.client code halts.

To avoid this, the code that implements the caget and caput calls needs to wrap those calls in try/except statements and handle the exceptions accordingly.

To Reproduce

I don't know a way to reproduce this type of issue, but this is the client backtrace from an instance when it happened:

/data/smurf_data/20210122/1611355797/outputs
CA.Client.Exception...............................................
    Warning: "Virtual circuit unresponsive"
    Context: "localhost:38373"
    Source File: ../tcpiiu.cpp line 920
    Current Time: Fri Jan 22 2021 22:51:25.903226123
..................................................................
---------------------------------------------------------------------------
CASeverityException                       Traceback (most recent call last)
<ipython-input-7-78593f2b9dd5> in <module>
      1 while True:
----> 2     pb.run(2, epics_prefix, config_file, S.shelf_manager, True, c03_hack=True, subband_low=52, subband_high=460)
      3     pb.run(3, epics_prefix, config_file, S.shelf_manager, True, c03_hack=True, subband_low=52, subband_high=460)
      4 
/usr/local/src/pysmurf/scratch/eyy/test_scripts/profile_band.py in run(band, epics_root, config_file, shelf_manager, setup, no_band_off, no_find_freq, subband_low, subband_high, loopback, no_setup_notches, reset_rate_khz, n_phi0, threading_test, c03_hack)
    231     if setup:
    232         status = execute(status, lambda: S.setup(set_defaults_max_timeout_sec=1200),
--> 233                          'setup')
    234 
    235         # Check if setup succeeded
/usr/local/src/pysmurf/scratch/eyy/test_scripts/profile_band.py in execute(status_dict, func, label, save_dict)
    212 
    213         # Run function
--> 214         status_dict[label]['output'] = func()
    215 
    216         # Add end time
/usr/local/src/pysmurf/scratch/eyy/test_scripts/profile_band.py in <lambda>()
    230     # Setup
    231     if setup:
--> 232         status = execute(status, lambda: S.setup(set_defaults_max_timeout_sec=1200),
    233                          'setup')
    234 
/usr/local/src/pysmurf/python/pysmurf/client/base/smurf_control.py in setup(self, write_log, payload_size, set_defaults_max_timeout_sec, **kwargs)
    424         self.set_read_all(write_log=write_log)
    425         set_defaults_success = self.set_defaults_pv(write_log=write_log,
--> 426             max_timeout_sec=set_defaults_max_timeout_sec)
    427 
    428         # Checking if setDefaults succeeded is only supported for
/usr/local/src/pysmurf/python/pysmurf/client/command/smurf_command.py in set_defaults_pv(self, wait_after_sec, max_timeout_sec, caget_timeout_sec, **kwargs)
    594                 # to "False".  Otherwise we keep trying.
    595                 if self.get_configuring_in_progress(
--> 596                         timeout=caget_timeout_sec, **kwargs) is False:
    597                     success=True
    598                     break
/usr/local/src/pysmurf/python/pysmurf/client/command/smurf_command.py in get_configuring_in_progress(self, **kwargs)
    342         ret = self._caget(self.smurf_application +
    343                           self._configuring_in_progress_reg,
--> 344                           as_string=True, **kwargs)
    345         if ret == 'True':
    346             return True
/usr/local/src/pysmurf/python/pysmurf/client/command/smurf_command.py in _caget(self, cmd, write_log, execute, count, log_level, enable_poll, disable_poll, new_epics_root, yml, retry_on_fail, max_retry, **kwargs)
    171         # Get the data
    172         elif execute and not self.offline:
--> 173             ret = epics.caget(cmd, count=count, **kwargs)
    174 
    175             # If epics doesn't respond in time, epics.caget returns None.
/usr/local/lib/python3.6/dist-packages/epics/__init__.py in caget(pvname, as_string, count, as_numpy, use_monitor, timeout)
     87     if thispv.connected:
     88         if as_string:
---> 89             thispv.get_ctrlvars()
     90         timeout -= (time.time() - start_time)
     91         val = thispv.get(count=count, timeout=timeout,
/usr/local/lib/python3.6/dist-packages/epics/pv.py in wrapped(self, *args, **kwargs)
     46             raise RuntimeError('Expected CA context is unset')
     47         elif expected_context == initial_context:
---> 48             return func(self, *args, **kwargs)
     49 
     50         # If not using the expected context, switch to it here:
/usr/local/lib/python3.6/dist-packages/epics/pv.py in get_ctrlvars(self, timeout, warn)
    729         if not self.wait_for_connection():
    730             return None
--> 731         kwds = ca.get_ctrlvars(self.chid, timeout=timeout, warn=warn)
    732         if kwds is not None:
    733             self._args.update(kwds)
/usr/local/lib/python3.6/dist-packages/epics/ca.py in wrapper(*args, **kwds)
    618                 timeout = kwds.get('timeout', DEFAULT_CONNECTION_TIMEOUT)
    619                 connect_channel(chid, timeout=timeout)
--> 620         return fcn(*args, **kwds)
    621     return wrapper
    622 
/usr/local/lib/python3.6/dist-packages/epics/ca.py in get_ctrlvars(chid, timeout, warn)
   1754     ftype = promote_type(chid, use_ctrl=True)
   1755     metadata = get_with_metadata(chid, ftype=ftype, count=1, timeout=timeout,
-> 1756                                  wait=True)
   1757     if metadata is not None:
   1758         # Ignore the value returned:
/usr/local/lib/python3.6/dist-packages/epics/ca.py in wrapper(*args, **kwds)
    618                 timeout = kwds.get('timeout', DEFAULT_CONNECTION_TIMEOUT)
    619                 connect_channel(chid, timeout=timeout)
--> 620         return fcn(*args, **kwds)
    621     return wrapper
    622 
/usr/local/lib/python3.6/dist-packages/epics/ca.py in get_with_metadata(chid, ftype, count, wait, timeout, as_string, as_numpy)
   1368             ret = libca.ca_array_get_callback(
   1369                 ftype, count, chid, _CB_GET, ctypes.py_object(ftype))
-> 1370             PySEVCHK('get', ret)
   1371 
   1372     if wait:
/usr/local/lib/python3.6/dist-packages/epics/ca.py in PySEVCHK(func_name, status, expected)
    642     if status == expected:
    643         return status
--> 644     raise CASeverityException(func_name, message(status))
    645 
    646 def withSEVCHK(fcn):
CASeverityException:  get returned 'Virtual circuit disconnect'

In this example, pyepics raised a CASeverityException, which was not handled by the client code, so the script stopped.

A similar situation can also happen during a register write, i.e. during a caput call.

Expected behavior

Exceptions should be caught and handled, perhaps by retrying the call up to a maximum number of times, similar to the solution to #589.

For example, something like this (which combines the solution for #589):

    # Fragment of the _caget method body; assumes "import epics" at module level.
    n_retry = 0
    success = False
    while True:
        try:
            ret = epics.caget(cmd, count=count, **kwargs)

            if ret is not None:
                # If ret is not None, mark the call as succeeded
                success = True
            else:
                # Log the error
                if write_log:
                    self.log(f'"epics.caget({cmd})" returned None.')

        except Exception as e:
            # Catch any type of exception coming from pyepics, and log it.
            if write_log:
                self.log(f'Exception during a "epics.caget({cmd})" call: {e}')

        # If epics.caget succeeded, break the loop and continue.
        if success:
            break

        # If epics.caget failed (either by raising an exception or by returning None),
        # and we should not retry on error or we have reached the maximum number of
        # retries, stop the script by raising an exception.
        if not retry_on_fail or n_retry >= max_retry:

            # Give a different error log message depending on the reason we stopped.
            if not retry_on_fail:
                error_msg = f'"epics.caget({cmd})" failed, but we are not retrying...'
            else:
                error_msg = f'Maximum number of retries reached during a "epics.caget({cmd})"'

            # Write the log message
            if write_log:
                self.log(error_msg)

            # Raise an exception to stop the script
            raise RuntimeError(error_msg)

        # Otherwise, count this retry and try the call again
        n_retry += 1
        if write_log:
            self.log(f'Retry "epics.caget({cmd})" attempt {n_retry} of {max_retry}')

    # "epics.caget" succeeded. Continue with the rest of the script.
    if write_log:
        self.log(ret)

and something similar for the epics.caput call as well.

@jesusvasquez333 jesusvasquez333 added the bug Something isn't working label Jan 25, 2021
@jesusvasquez333 jesusvasquez333 added enhancement New feature or request and removed bug Something isn't working labels Feb 2, 2021
@swh76
Collaborator

swh76 commented Sep 14, 2021

@slacrherbst Hey Ryan; sorry, I lost this thread. I had thought we agreed that it wasn't possible to catch EPICS exceptions, but maybe that's not right. Should we implement this proposed fix?

@agustiner
Member

So there are basically three cases of "Where is my PV":

  1. The PV times out after 5 retries (e.g. Estimate Phase Delay Fails waiting for PVs to update #673)
  2. caget or caput throws (e.g. Virtual Circuit Disconnect)
  3. More generally, pyepics throws (e.g. Epics OverflowError #620)

So in a way, it would be wise to beef up pysmurf against the perils of pyepics. I'll see what I can do.
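
The three cases above could be told apart inside a single wrapper, roughly as sketched here. The `PVOutcome` enum, the `classify_call` function, and the `StandInCAError` class are hypothetical names for illustration. In real code the `ca_errors` tuple would hold the pyepics exception types (for example the CASeverityException seen in the traceback in this issue), and a None return is assumed to mean the call timed out, as the `_caget` comment in that traceback suggests.

```python
from enum import Enum


class PVOutcome(Enum):
    OK = 'ok'
    TIMEOUT = 'timeout'          # case 1: the call returned None
    CA_ERROR = 'ca_error'        # case 2: caget/caput raised a CA exception
    OTHER_ERROR = 'other_error'  # case 3: any other exception from pyepics


class StandInCAError(Exception):
    """Hypothetical stand-in for a pyepics Channel Access exception."""


def classify_call(func, *args, ca_errors=(StandInCAError,), **kwargs):
    """Run func and return a (PVOutcome, value_or_exception) pair."""
    try:
        ret = func(*args, **kwargs)
    except ca_errors as e:
        return PVOutcome.CA_ERROR, e
    except Exception as e:
        return PVOutcome.OTHER_ERROR, e
    if ret is None:
        return PVOutcome.TIMEOUT, None
    return PVOutcome.OK, ret


def _raise(exc):
    # Small helper so the simulated failures can be written as plain calls.
    raise exc


outcomes = [
    classify_call(lambda: 3.14)[0],
    classify_call(lambda: None)[0],
    classify_call(_raise, StandInCAError('Virtual circuit disconnect'))[0],
    classify_call(_raise, OverflowError('value too large'))[0],
]
```

Separating the outcomes this way would let the client choose a different response per case, such as retrying on a timeout but logging and re-raising on an unexpected exception; which policy fits each case is left open here.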

@agustiner agustiner added the SLAC label Nov 5, 2021