ConcurrencyContainer not working on branch "ros2-devel"? #88

Open
tiko5000 opened this issue Jun 28, 2023 · 9 comments

@tiko5000

I am trying to implement a simple state machine with a single concurrency container, but it fails to execute:

 Start
  |
  \/
 Container
  |
  ----------------------------
  |                          |
  \/                         \/
  LOG_A                      LOG_B
  |                          |
  \/                         \/
  finished                   failed
 

This is the statemachine implementation:

concurrency_test_sm.py:


#!/usr/bin/env python
# -*- coding: utf-8 -*-
###########################################################
#               WARNING: Generated code!                  #
#              **************************                 #
# Manual changes may get lost if file is generated again. #
# Only code inside the [MANUAL] tags will be kept.        #
###########################################################

from flexbe_core import Behavior, Autonomy, OperatableStateMachine, ConcurrencyContainer, PriorityContainer, Logger
from flexbe_states.log_state import LogState
# Additional imports can be added inside the following tags
# [MANUAL_IMPORT]

# [/MANUAL_IMPORT]


'''
Created on Wed Jun 28 2023
@author: concurrency_test
'''
class concurrency_testSM(Behavior):
	'''
	concurrency_test
	'''


	def __init__(self, node):
		super(concurrency_testSM, self).__init__()
		self.name = 'concurrency_test'

		# parameters of this behavior

		# references to used behaviors
		OperatableStateMachine.initialize_ros(node)
		ConcurrencyContainer.initialize_ros(node)
		PriorityContainer.initialize_ros(node)
		Logger.initialize(node)
		LogState.initialize_ros(node)

		# Additional initialization code can be added inside the following tags
		# [MANUAL_INIT]
		
		# [/MANUAL_INIT]

		# Behavior comments:



	def create(self):
		# x:30 y:365, x:130 y:365
		_state_machine = OperatableStateMachine(outcomes=['finished', 'failed'])

		# Additional creation code can be added inside the following tags
		# [MANUAL_CREATE]
		
		# [/MANUAL_CREATE]

		# x:30 y:365, x:130 y:365, x:230 y:365, x:330 y:365, x:430 y:365
		_sm_container_0 = ConcurrencyContainer(outcomes=['finished', 'failed'], conditions=[
										('finished', [('A', 'done')]),
										('failed', [('B', 'done')])
										])

		with _sm_container_0:
			# x:165 y:142
			OperatableStateMachine.add('A',
										LogState(text="A", severity=Logger.REPORT_HINT),
										transitions={'done': 'finished'},
										autonomy={'done': Autonomy.Low})

			# x:308 y:137
			OperatableStateMachine.add('B',
										LogState(text="B", severity=Logger.REPORT_HINT),
										transitions={'done': 'failed'},
										autonomy={'done': Autonomy.Low})



		with _state_machine:
			# x:241 y:89
			OperatableStateMachine.add('Container',
										_sm_container_0,
										transitions={'finished': 'finished', 'failed': 'failed'},
										autonomy={'finished': Autonomy.Inherit, 'failed': Autonomy.Inherit})


		return _state_machine


	# Private functions can be added inside the following tags
	# [MANUAL_FUNC]
	
	# [/MANUAL_FUNC]

When executed with the option "Block transitions which require at least 'Low' autonomy", the console output is:

[12:20:41 PM] Onboard engine just started.
[12:20:46 PM] --> Preparing new behavior...
[12:20:46 PM] BE Starting [concurrency_test : 1208022996]
[12:20:46 PM] A
[12:20:46 PM] B
[12:20:46 PM] ConcurrencyContainer Container returning outcome failed (request inner sync)
[12:20:46 PM] Behavior execution for concurrency_test: 1208022996 failed! [-]
    exceptions must derive from BaseException
[12:20:46 PM] Traceback (most recent call last): [+]
[12:20:46 PM] No behavior active.
[12:20:46 PM] Onboard engine just started.
[12:20:46 PM] --- Behavior Engine finished - ready for more! ---
[12:20:52 PM] Onboard engine just started.

A and B are printed, which is fine.
I would expect to see A and B in the "Behavior Execution" in the "Runtime Control".
There I would expect to be able to select "done" outcome from either A or B.
But the Behavior finished by itself, without waiting for Operator Input.

Am I missing something?

@dcconner
Member

I just pushed a major update to flexbe ros2-devel before I saw this. I'll try to verify this test tomorrow, but in the meantime, I'd love for you to give the new version a try. Check the change logs, as there are significant changes. The new version seems more stable, maintains sync better, and uses less CPU resources.

@dcconner dcconner self-assigned this Jun 29, 2023
@tiko5000
Author

Thanks for the notice. I gave it a try but still encounter some unexpected behavior.
Here are my test cases:

  1. Simple Concurrency Container with 2 Log-States - Outcome is A(done) || B(done)
  • Behavior:
    • The behavior stops right after it was started. The "Runtime Control" does not show any states and no outcome can be selected manually.

Screenshot from 2023-06-29 09-04-16
Screenshot from 2023-06-29 09-09-37

Log:

[7:09:33 AM] Onboard engine just started.
[7:09:43 AM] --> Mirror - received updated structure
[7:09:43 AM] --> Preparing new behavior...
[7:09:43 AM] Received a new mirror structure for checksum 1383782741
[7:09:43 AM] BE Starting [Concurrency_Test : 1383782741]
[7:09:43 AM] A
[7:09:43 AM] B
[7:09:43 AM] ConcurrencyContainer Container returning outcome finished (request inner sync)
[7:09:43 AM] Behavior execution for Concurrency_Test: 1383782741 failed! [-]
    exceptions must derive from BaseException
[7:09:43 AM] No behavior active.
[7:09:43 AM] Onboard engine just started.
[7:09:43 AM] Traceback (most recent call last): [+]
[7:09:43 AM] --- Behavior Engine finished - ready for more! ---
[7:09:43 AM] Mirror built for checksum 1383782741.
[7:09:43 AM] Executing mirror...
[7:09:45 AM] Onboard engine just started.
[7:09:45 AM] Onboard engine just started, stopping currently running mirror.
[7:09:45 AM] Mirror finished with result preempted
[7:09:45 AM] --- Behavior Mirror ready! ---
[7:09:56 AM] Onboard engine just started.
  2. Simple Concurrency Container with 2 Log-States - Outcome is A(done) && B(done)
  • Is this legit at all? If A is done, B will be preempted anyway, right? So the Concurrency Container will never have a proper outcome "finished"?

  • Behavior:

    • starting the behavior works, the "Runtime Control" shows State A and outcome "done" can be selected.
    • The log shows "OCS is possibly out of sync" every second
    • When clicking on "done" of state A after the behavior was started, the behavior stops as expected.
    • After that the log shows
      • ConcurrencyContainer Container returning outcome finished
      • Behavior execution for Concurrency_Test: 1708774436 failed! [-] exceptions must derive from BaseException
  • also experienced but cannot reproduce reliably:

    • When clicking on "done" of State A a few seconds after the behavior was started, the CPU usage increases, the sync bar gets yellow. The behavior is stuck and clicking on "done" of State A does not end the behavior.
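For reference, the difference between the two test cases comes down to how the `conditions` list is grouped. The sketch below is a plain-Python model, not FlexBE code: `container_outcome` is a hypothetical helper, and it assumes the convention visible in the generated code above, where each entry is `(container_outcome, [(state, state_outcome), ...])` and all pairs within one entry must match. Writing "||" as two separate entries for the same outcome is likewise an assumption about how the editor generates it.

```python
from typing import Dict, List, Optional, Tuple

Condition = Tuple[str, List[Tuple[str, str]]]


def container_outcome(conditions: List[Condition],
                      returned: Dict[str, str]) -> Optional[str]:
    """Return the first container outcome whose pairs are all satisfied.

    `returned` maps state label -> outcome for states that have finished.
    """
    for outcome, required in conditions:
        if all(returned.get(state) == state_outcome
               for state, state_outcome in required):
            return outcome
    return None


# A(done) || B(done): two separate entries for the same outcome (assumed).
OR_CONDITIONS = [('finished', [('A', 'done')]),
                 ('finished', [('B', 'done')])]

# A(done) && B(done): one entry listing both pairs (assumed).
AND_CONDITIONS = [('finished', [('A', 'done'), ('B', 'done')])]

only_a = {'A': 'done'}
both = {'A': 'done', 'B': 'done'}

print(container_outcome(OR_CONDITIONS, only_a))   # finished
print(container_outcome(AND_CONDITIONS, only_a))  # None
print(container_outcome(AND_CONDITIONS, both))    # finished
```

The model does not cover preemption, so it does not answer whether B is preempted once A is done; it only shows which combinations of returned outcomes would satisfy each grouping.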

Screenshot from 2023-06-29 09-04-16
Screenshot from 2023-06-29 09-06-28
Screenshot from 2023-06-29 09-12-57

Log:

[7:06:52 AM] --- Behavior Mirror ready! ---
[7:06:52 AM] Onboard engine just started.
[7:07:03 AM] Onboard engine just started.
[7:07:14 AM] Onboard engine just started.
[7:07:25 AM] Onboard engine just started.
[7:07:35 AM] --> Preparing new behavior...
[7:07:35 AM] --> Mirror - received updated structure
[7:07:35 AM] Received a new mirror structure for checksum 1708774436
[7:07:35 AM] BE Starting [Concurrency_Test : 1708774436]
[7:07:35 AM] A
[7:07:35 AM] B
[7:07:35 AM] Mirror built for checksum 1708774436.
[7:07:35 AM] Executing mirror...
[7:07:36 AM] OCS is possibly out of sync - onboard state is /Container/B [-]
        Check UI and consider manual re-sync!
        (mismatch may be temporarily understandable for rapidly changing outcomes) 1
[...]
[7:07:43 AM] OCS is possibly out of sync - onboard state is /Container/B [-]
        Check UI and consider manual re-sync!
        (mismatch may be temporarily understandable for rapidly changing outcomes) 1
[7:07:43 AM] ConcurrencyContainer Container returning outcome finished (request inner sync)
[7:07:43 AM] Behavior execution for Concurrency_Test: 1708774436 failed! [-]
    exceptions must derive from BaseException
[7:07:43 AM] No behavior active.
[7:07:43 AM] Onboard engine just started.
[7:07:43 AM] Traceback (most recent call last): [+]
[7:07:43 AM] --- Behavior Engine finished - ready for more! ---
[7:07:43 AM] Onboard behavior failed!
[7:07:43 AM] Mirror finished with result preempted
[7:07:43 AM] --- Behavior Mirror ready! ---
[7:07:43 AM] No onboard behavior is active.

@dcconner
Member

A couple of notes, then I'll put together an example for the tutorial. I think this is normal and expected behavior.

You have "autonomy low", which is typical of log states. That means they finish and move to next state. Both A & B finish after one execution and return, which causes the concurrency to return immediately. Because you have the output of concurrency tied to statemachine finished, the behavior is done. Because both are log states, they both return after one execution call, so it doesn't matter if || or &&.
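A toy model of that sequence, in plain Python rather than the FlexBE API: `ToyLogState`, `ToyWaitState` and `run_container` are hypothetical names, and the loop assumes the container ticks every child once per cycle and returns as soon as any child reports an outcome.

```python
from typing import Any, Dict, Optional


class ToyLogState:
    """Returns 'done' on its first execute call, like a log state."""

    def __init__(self, text: str) -> None:
        self.text = text

    def execute(self) -> Optional[str]:
        print(self.text)
        return 'done'


class ToyWaitState:
    """Returns 'done' only after `ticks` execute calls."""

    def __init__(self, ticks: int) -> None:
        self.remaining = ticks

    def execute(self) -> Optional[str]:
        self.remaining -= 1
        return 'done' if self.remaining <= 0 else None


def run_container(children: Dict[str, Any], max_cycles: int = 100) -> int:
    """Tick all children each cycle; return the cycle at which one finished."""
    for cycle in range(1, max_cycles + 1):
        outcomes = {name: child.execute() for name, child in children.items()}
        if any(outcome is not None for outcome in outcomes.values()):
            return cycle
    raise RuntimeError('no child returned within max_cycles')


# Two log states: both return in the very first cycle.
print(run_container({'A': ToyLogState('A'), 'B': ToyLogState('B')}))  # 1

# Two waiting states: the container runs until the faster one finishes.
print(run_container({'A': ToyWaitState(5), 'B': ToyWaitState(3)}))    # 3
```

With two log states, both outcomes are available in the same cycle, which is why the choice between || and && makes no visible difference in this test.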

A limitation of the current version of FlexBE UI is that it only shows one of the active states in concurrency.

Try changing the required autonomy level of output so that it will pause.

The failed and "BaseException" issue is unexpected, and I'll be looking in to that today.

@dcconner
Member

There seems to be an issue where exiting the concurrency container and then immediately exiting the behavior causes the exception.
If I add a log state after the concurrency container, it no longer gives the exception.

There also seems to be an issue with blocking transitions inside the concurrency container, so I'll need to look into that more.

Thanks for reporting.

There is also the known issue of the FlexBE UI only showing one state inside the concurrency container.

@tiko5000
Author

Ok great, thanks for the fast handling of the issue.

I already set the required autonomy level of the states inside the concurrency container to high and started the behavior with Block transitions which require at least 'High' autonomy.

But the outcome of A can be forced in the Runtime Control only if the outcome of the concurrency container is A(done) && B(done). If the outcome of the concurrency container is A(done) || B(done), the concurrency container still finishes immediately, even if a state is added after the concurrency container.

@dcconner
Member

dcconner commented Jul 24, 2023

I have spent a bit of time looking at the internals of how flexbe handled concurrency containers, and at the sync issues I saw on ROS 2.

I have tested a significant modification to FlexBE and posted it as ros2-pre-release branches on both flexbe_app and flexbe_behavior_engine.

These two must be used consistently as they do require an API change.

See relevant change logs

I have also developed and introduced a new https://github.com/FlexBE/flexbe_turtlesim_demo release with several detailed examples related to concurrency containers. Specifically, see Examples 3 and 4.

A brief discussion of changes follows. I would appreciate any testing and feedback of these changes. There is still some cleanup to do on them, but barring objections I plan to introduce these changes into an Iron release this fall.

The old approach only set the "current state" to the initial state of a concurrency container. This state would still show as active even if it had finished and another state was active.

The new approach introduces a "state id" hash code for every state using a masked 23-bit hash code. This hash code is known to both the onboard and mirror sides. The lower 8 bits are set to the outcome (this allows 255 outcomes on a state, which is likely way more than anyone needs, but until we clearly need more than 23 bits to encode the state id I chose to use 8 bits for outcome mapping).

Instead of reporting only the outcome changes, the new system reports an array of "current active states" for sync, and each outcome encodes both the outcome and state id using a 32-bit value.
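The packing described above can be sketched as follows. This is an illustration under stated assumptions, not the flexbe_behavior_engine implementation: the use of `zlib.crc32` as the hash, the exact bit layout (`state_id << 8 | outcome`), and the helper names are all hypothetical; only the 23-bit state id and 8-bit outcome widths come from the description.

```python
import zlib
from typing import Tuple

STATE_ID_BITS = 23
OUTCOME_BITS = 8
STATE_ID_MASK = (1 << STATE_ID_BITS) - 1   # 0x7FFFFF
OUTCOME_MASK = (1 << OUTCOME_BITS) - 1     # 0xFF


def state_id(path: str) -> int:
    """Hash a state path to a 23-bit id (CRC32 is an assumed stand-in)."""
    return zlib.crc32(path.encode('utf-8')) & STATE_ID_MASK


def encode(sid: int, outcome_index: int) -> int:
    """Pack the state id (upper bits) and outcome index (lower 8 bits)."""
    if not 0 <= sid <= STATE_ID_MASK:
        raise ValueError('state id does not fit in 23 bits')
    if not 0 <= outcome_index <= OUTCOME_MASK:
        raise ValueError('outcome index does not fit in 8 bits')
    return (sid << OUTCOME_BITS) | outcome_index


def decode(value: int) -> Tuple[int, int]:
    """Split a packed value back into (state id, outcome index)."""
    return value >> OUTCOME_BITS, value & OUTCOME_MASK


sid = state_id('/Container/A')
packed = encode(sid, 0)
assert decode(packed) == (sid, 0)
assert packed < 2 ** 31  # 23 + 8 bits fit in a signed 32-bit integer
```

Because both sides can compute the same hash from the state path, only the packed integer has to travel over the wire. Masking to 23 bits makes collisions possible in principle; how the real implementation deals with them is not covered in this thread.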

This requires a slight increase in bandwidth, but I judge the reliability increases worthwhile.

The new approach reports returns from individual states and containers to help keep the mirror consistent and identify sync issues and recovery.

If an internal state returns but another remains active, the FlexBE UI will change. It currently shows the "deepest" active state. Currently only that state can be preempted, but with the new changes we expect to support operator preemption at any level. As part of these changes, the OperatableStateMachine is now a pseudo manually transitionable state. This is a temporary hack during development. Long term, we will introduce a new ManuallyTransitionableStateMachine to mimic the state hierarchy.

Please test the new branches with your system and the Turtlesim tutorials mentioned above, and give me any feedback on the performance.

@pschillinger

@pschillinger
Member

Thanks for the pointer @dcconner! First as a disclaimer, I'm not yet familiar with every single technical detail of the ros2-devel and ros2-pre-release branches, so I might need to revise or refine during the next days what I say now.

What I can say regarding the way transitions worked in concurrency containers so far is that the transition behavior as described above is indeed as expected, even though admittedly not the most intuitive. I would mainly attribute this to an initial design limitation on my side. In other words, FlexBE did not include concurrency initially, and the concurrent execution of states was added on top under the constraint (mainly dictated by the API between the engine and the GUI) that there is always a single active state to be operated.

What this means is briefly summarized in the tutorial on Parallel State Execution:

Nevertheless, there is always one main state in a concurrency container, indicated by the same notation as the initial state of a state machine, which works as described in the next section. In general, any of the states can be set to the main state.
[...]
During execution, the main state of a concurrency container is monitored in the GUI as known from state machines. If this state is a state machine itself, outcomes of inner states can be forced or blocked by the selected autonomy level as usual.
All other states not being the main state are running in the background. Their state of execution is not monitored, even if they are state machines. Consequently, they have no knowledge about the autonomy level and cannot be controlled manually. This might change in the future, but for now, this is how it works.

What this implies in consequence is, as observed in the initial example, that the outcomes of background states are not blocked by the autonomy level and might return immediately if the respective state dictates so. At least this is the expected part. Where it gets messier now is that, due to the fact that background states are not aware of the GUI, background states won't send transition notifications to the GUI, thus the GUI has to assume it might have gotten out of sync whenever a concurrency container returns an outcome (i.e., this happens when the CC outcome was triggered by a background state). This is also related to the observed warning of being potentially out of sync, a monitoring done by the behavior mirror to be precise but resulting from this fact.
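A minimal sketch of that distinction, again as a toy model rather than FlexBE code. The `Level` enum, `is_blocked`, and `outcome_passes` are hypothetical; the rule assumed here is the one named by the UI option, that a transition is blocked when its required autonomy is at least the selected blocking level, and that only the main state consults it.

```python
from enum import IntEnum


class Level(IntEnum):
    """Illustrative autonomy levels; the values are assumed for this sketch."""
    OFF = 0
    LOW = 1
    HIGH = 2
    FULL = 3


def is_blocked(required: Level, block_at: Level) -> bool:
    """Model of 'Block transitions which require at least <block_at> autonomy'."""
    return required >= block_at


def outcome_passes(required: Level, block_at: Level, is_main: bool) -> bool:
    """Background states ignore the autonomy level; the main state obeys it."""
    if not is_main:
        return True
    return not is_blocked(required, block_at)


# Main state A waits for the operator; background state B returns regardless.
print(outcome_passes(Level.LOW, Level.LOW, is_main=True))    # False
print(outcome_passes(Level.LOW, Level.LOW, is_main=False))   # True
print(outcome_passes(Level.LOW, Level.HIGH, is_main=True))   # True
```

Under this model, an A(done) || B(done) container is satisfied by the unblocked background state B alone, which matches the report that the || case finishes immediately while the && case still lets the operator force A.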

Long story short, an improved mechanism to handle outcomes as proposed by @dcconner sounds promising to me and might be designed with a more intuitive handling of shared autonomy in the context of concurrency. I will need to do some testing myself to support with more details, though. I hope I can allocate some time for this next weekend.

@dcconner
Member

There is now a rolling-pre-release branch for flexbe_behavior_engine that has rebased from latest humble and rolling releases, and added some additional features. I'm going to leave the ros2-pre-release as is for now, but rolling-pre-release branch is the preferred branch for testing now. Still use ros2-pre-release for the flexbe_app for now.

@dcconner
Member

dcconner commented May 2, 2024

The iron, rolling, and ros2-devel branches have the concurrency container and state id changes.
Please use those branches.
For consistency, you need version 4.0+ of the UI and 3.0+ of the flexbe_behavior_engine.
