101: How GenMOS Works in a Nutshell
GenMOS is a gRPC-based client-server system that aims to enable most robots today with a movable viewport to perform object search in 3D environments. It is designed to be robot-independent and environment-agnostic, and to be integrated in a “plug-and-play” fashion.
Here, we briefly explain how GenMOS works at a high level, followed by a concrete example (with code snippets) that illustrates how GenMOS is middleware-independent.
[High-level overview] Below is the system architecture of GenMOS. Using this system involves two phases: (A) Initialization and (B) Task Runtime.
(A) Initialization: the client sends the server (1) the configuration of the search agent, and (2) an initial point cloud of the 3D search region. Upon receiving these, the server (1) instantiates a POMDP agent for the upcoming search task, and (2) builds an octree occupancy map (similar to OctoMap) from the point cloud, which is used to simulate occlusion in the search region.
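To make the occupancy map concrete, below is a minimal sketch (not GenMOS's actual implementation) of discretizing a point cloud into occupied voxels, the kind of cells an octree map organizes hierarchically so that occupancy (and hence occlusion) can be queried at multiple resolutions:

```python
import numpy as np

def occupied_voxels(points, res=0.05):
    """A minimal sketch, not GenMOS's actual code: discretize an (N, 3)
    point cloud (in meters) into the set of voxel coordinates it occupies
    at resolution `res` (meters per voxel)."""
    cells = np.unique(np.floor(np.asarray(points) / res).astype(int), axis=0)
    return {tuple(c) for c in cells}
```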
(B) Task Runtime: the system architecture diagram above shows what happens in this phase. Iteratively, at each decision step:
- The client (1) requests an action to be planned by the server, (2) executes the action on the physical robot, and (3) sends the server object detection results and a point cloud from the robot’s perception module, to update the server’s belief state and its model of the search region.
- The server can be viewed as a virtual POMDP agent. It maintains four things: the octree occupancy map, the octree belief, the robot pose, and a view position graph. Conceptually, these four make up the POMDP agent’s “belief state” $b_t$, which is used by a planner such as POUCT. The planner outputs an action, which is either a 6D viewpoint to move to, or a FIND action, meaning the agent wants to declare certain objects found (a planning sketch follows this list).
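As a sketch of that planning step, here is how a POUCT planner is typically driven in pomdp_py; the parameter values are illustrative (in GenMOS they come from the planner configuration the client sends), and `agent` stands for the server's instantiated POMDP agent:

```python
import pomdp_py

# Monte-Carlo tree search planner over the agent's current belief b_t.
planner = pomdp_py.POUCT(max_depth=20, discount_factor=0.95,
                         num_sims=500, exploration_const=100,
                         rollout_policy=agent.policy_model)
action = planner.plan(agent)  # either a 6D viewpoint move or FIND
```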
As you can see, the GenMOS server handles the search task logic. In fact, it has its own coordinate system of the search space (a discretized 3D grid). The GenMOS client acts as a “bridge” between a specific real robot platform and the server. Since GenMOS is a gRPC system, it provides a set of client API calls, which makes it easier to perform gRPC requests to the server.
So to deploy GenMOS to any particular robot platform, one just needs a GenMOS client that wraps around the middleware of the robot. This client handles reading & processing raw sensor data, as well as executing a planned action such as moving to a particular viewpoint. Below are links to two example clients written for ROS and Viam:
- ROS: [genmos_ros.py](https://github.com/zkytony/genmos_object_search/blob/main/ros/src/genmos_ros/genmos_ros.py#L48)
- Viam: [genmos_viam.py](https://github.com/zkytony/genmos_object_search/blob/main/viam/genmos_viam.py#L56)
As illustrated in the example below, the way to write a GenMOS client is highly similar across middlewares, making it easy to integrate GenMOS with a new one.
[Example] I will mainly describe (A) how GenMOS is integrated with ROS (or any other middleware), and (B) how that enabled a Spot to perform object search. Along the way, I will note how this procedure adapts to Viam or other middlewares.
Before we dive in, let’s clarify what a user (someone who wants to make their robot search) needs to do. There are two types of users:
- User Type 1: The user’s robot uses a middleware (e.g. ROS) that we (the developers) have already written a client for. Then all they need to do is build a set of launch files (for the robot system, the search task, action execution, etc.).
- User Type 2: The user has a new middleware we have not covered. Then they need to write a `genmos_<middleware>.py` script that contains the client-side logic to drive the task forward (a hypothetical skeleton follows this list). See the `run` function below, which implements this iterative process of task execution and is quite robot-agnostic. Arguably, this task logic could be put inside the server; the benefit of driving the task progress on the client side is to give the particular robot more freedom in how it wants to proceed, and I subconsciously treated the server as a “planning server” instead of a “task server”.
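For illustration, such a script could be structured like the hypothetical skeleton below; the class name is a placeholder, and everything except the GenMOS client calls is middleware-specific (see the real ROS and Viam clients linked above for actual code):

```python
# Hypothetical skeleton for a new middleware's GenMOS client.
class GenMOSMyMiddleware:
    def __init__(self):
        self._genmos_client = None  # placeholder for the GenMOS gRPC client

    def setup(self):
        # Connect to the robot through the middleware's SDK;
        # load the agent and planner configurations.
        ...

    def run(self):
        # (0) createAgent / createPlanner gRPC calls; then loop:
        # (1) plan an action, (2) execute it on the robot,
        # (3) wait for observations and update the belief, (4) book-keeping.
        ...
```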
(A) GenMOS integration with ROS (or any other middleware). As highlighted above, we write a wrapper around ROS. This wrapper class, GenMOSROS ([code](https://github.com/zkytony/genmos_object_search/blob/main/ros/src/genmos_ros/genmos_ros.py#L48)), effectively creates a GenMOS client instance that makes gRPC calls to talk to the GenMOS server. While running, GenMOSROS acts as a ROS node.
There are just three methods of interest: `__init__`, `setup`, and `run`.
```python
# genmos_ros.py
class GenMOSROS:
    def __init__(self, name="genmos_ros"):
        self.name = name
        self._genmos_client = None

    def setup(self):
        ...

    def run(self):
        ...
```
`__init__` is very simple — it just saves a name and creates a placeholder for the GenMOS client.
The `setup` function (1) initializes the ROS node and `self._genmos_client`, (2) loads configurations for the POMDP search agent, and (3) prepares the ROS node by initializing several publishers, each of which publishes either visualization markers or an action (to be handled by a separate action execution node, specific to the robot).
Note that here, I didn’t explicitly create subscribers. Instead, I’ve written a utility function (WaitForMessages, [link](https://github.com/zkytony/genmos_object_search/blob/main/ros/src/genmos_ros/ros_utils.py#L691)) that wraps [message_filters](http://wiki.ros.org/message_filters), which allows easily subscribing to and synchronizing messages from multiple topics. This makes listening to ROS topics behave like a blocking RPC call. Below, I paste the code to show its structure, omitting some details; a simplified sketch of such a utility follows the snippet.
```python
# inside GenMOSROS
def setup(self):
    #############################################
    # (1) Initialize ROS node and gRPC client
    #############################################
    rospy.init_node(self.name)
    self._genmos_client = GenMOSClient()

    #############################################
    # (2) Load configurations (from the ROS parameter server)
    #############################################
    config = rospy.get_param("~config")  # access parameters together as a dictionary
    self.config = config
    self.agent_config = config["agent_config"]

    #############################################
    # (3) Prepare the ROS node -- create publishers.
    #############################################
    self._action_pub = rospy.Publisher(
        "~action", KeyValAction, queue_size=10, latch=True)
    self._octbelief_markers_pub = rospy.Publisher(
        "~octree_belief", MarkerArray, queue_size=10, latch=True)
```
The `run` function holds the logic of interacting with the server iteratively until the task finishes. It essentially first requests the server to instantiate the agent, and then goes through a task execution loop:
(0) Instantiate the agent & planner
(1) Plan an action
(2) Execute the action and wait for it to be done.
(3) Wait & upon receiving new observations, update agent’s belief
(4) Book-keeping, visualization & check if the task is over.
For more on the steps of running a POMDP agent, check out [this tutorial on the Tiger POMDP domain](https://h2r.github.io/pomdp-py/html/examples.tiger.html#solve-the-pomdp-instance) in pomdp_py.
```python
# inside GenMOSROS
def run(self):
    #############################################
    # (0) First, instantiate an agent and planner
    #############################################
    self._genmos_client.createAgent(
        header=proto_utils.make_header(), config=self.agent_config,
        robot_id=self.robot_id)

    # Send point cloud to the server (necessary for the agent to be created)
    self.update_search_region()

    # wait for agent creation
    rospy.loginfo("waiting for genmos agent creation...")
    self._genmos_client.waitForAgentCreation(self.robot_id)
    rospy.loginfo("agent created!")

    # create planner
    response = self._genmos_client.createPlanner(
        config=self.planner_config,
        header=proto_utils.make_header(),
        robot_id=self.robot_id)
    rospy.loginfo("planner created!")

    #############################################
    # THE TASK EXECUTION LOOP
    #############################################
    for step in range(self.config["task_config"]["max_steps"]):
        #############################################
        # (1) Plan an action - 'PlanAction' gRPC call
        #############################################
        action_id, action_pb = self.plan_action()

        #############################################
        # (2) Execute the action and wait for it to be done (robot-specific)
        #############################################
        self.execute_action(action_id, action_pb)
        ros_utils.WaitForMessages([self._action_done_topic], [std_msgs.String],
                                  allow_headerless=True, verbose=True)
        rospy.loginfo(typ.success("action done."))

        #############################################
        # (3) Wait for new observations; upon receiving them, update the
        #     agent's belief. Waiting for observations is robot-specific;
        #     updating the belief involves the 'ProcessObservation' gRPC call.
        #############################################
        response_observation, response_robot_belief, detections_pb =\
            self.wait_observation_and_update_belief(action_id)

        #############################################
        # (4) Book-keeping & check if the task is over
        #############################################
        print(f"Step {step}:")  # ... print info
        robot_belief_pb = response_robot_belief.robot_belief
        objects_found = set(robot_belief_pb.objects_found.object_ids)
        self.objects_found.update(objects_found)

        # visualize FOV and belief
        self.get_and_visualize_belief()

        # Check if we are done
        if objects_found == set(self.agent_config["targets"]):
            rospy.loginfo("Done!")
            break
        time.sleep(1)
```
You can imagine that for different middlewares, the implementation of, e.g., `wait_observation_and_update_belief` would be different. Below, let’s compare how we wait for observations through ROS and through Viam. Note that this is a middleware-specific detail, completely separate from GenMOS’s core object search logic.
- Wait for observation with ROS ([source code](https://github.com/zkytony/genmos_object_search/blob/main/ros/src/genmos_ros/genmos_ros.py#L326)):

```python
def wait_for_observation(self):
    """We wait for the robot pose (PoseStamped) and the object detections
    (vision_msgs.Detection3DArray).
    Returns:
        a tuple: (detections_pb, robot_pose_pb, objects_found_pb)"""
    # robot pose may be much higher in frequency than object detection.
    robot_pose_msg, object_detections_msg = ros_utils.WaitForMessages(
        [self._robot_pose_topic, self._object_detections_topic],
        [geometry_msgs.PoseStamped, vision_msgs.Detection3DArray],
        queue_size=self.obqueue_size, delay=self.obdelay, verbose=True).messages
    # ...
```
- Wait for observation with Viam ([source code](https://github.com/zkytony/genmos_object_search/blob/main/viam/genmos_viam.py#L475)):

```python
async def wait_for_observation(self):
    """We wait for the robot pose and the object detections.
    Returns:
        a tuple: (detections_pb, robot_pose_pb, objects_found_pb)"""
    # TODO: future viam: time sync between robot pose and object detection
    robot_pose_viam = await viam_utils.viam_get_ee_pose(
        self.viam_robot, arm_name=self.viam_names["arm"])
    robot_pose = self.process_viam_pose(robot_pose_viam)
    robot_pose_pb = proto_utils.robot_pose_proto_from_tuple(robot_pose_viam)
    # Note: right now we only get 2D detection
    detections = await viam_utils.viam_get_object_detections2d(
        self.viam_robot,
        camera_name=self.viam_names["color_camera"],
        detector_name=self.viam_names["detector"],
        confidence_thres=constants.DETECTION2D_CONFIDENCE_THRES)
```
Basically, these are just different utility functions. With ROS, we waited for synchronized messages from two topics; with Viam, we called two individual utility functions (notice that the Viam SDK uses Python asyncio).
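One practical consequence of asyncio: the Viam client's entry point must run inside an event loop. Assuming a client class analogous to GenMOSROS (the name GenMOSViam here is a placeholder for this sketch), it would be driven roughly like this:

```python
import asyncio

async def main():
    client = GenMOSViam()  # placeholder name, analogous to GenMOSROS
    await client.setup()   # Viam SDK calls must be awaited
    await client.run()     # the task execution loop, with awaits inside

asyncio.run(main())
```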
(B) How that enabled a Spot to perform object search.
Here, we describe how the launch files are structured to run object search on Spot. The structure is basically identical for MOVO, a different robot we tried that also uses ROS.
Note that we directly interfaced with the Spot SDK to get images, execute navigation commands, etc. Those wrapper functions do not involve ROS at all. For example, [this function](https://github.com/zkytony/robotdev/blob/master/spot/ros_ws/src/rbd_spot_perception/src/rbd_spot_perception/graphnav.py#L344) tells Spot to navigate to a pose using GraphNav, and [this function](https://github.com/zkytony/robotdev/blob/master/spot/ros_ws/src/rbd_spot_action/src/rbd_spot_action/arm.py#L111) tells Spot to move its arm to a pose, with the body naturally following the arm, using one of Spot’s built-in RPC calls. We mainly involve ROS here for visualization with RViz. (We also published object detection results through ROS.)
There are three launch files:
- System: [genmos_spot_system.launch](https://github.com/zkytony/genmos_object_search/blob/main/ros/launch/spot/experiments/genmos_spot_system.launch) Launches GraphNav localization, merging of point clouds from the robot’s cameras, and object detection. These components are independent of object search.
- Search Task: [genmos_spot_search.launch](https://github.com/zkytony/genmos_object_search/blob/main/ros/launch/spot/experiments/genmos_spot_search.launch) Launches the GenMOSROS client discussed above, which drives the search task. Also launches task-specific nodes such as point cloud filtering for the local search region.
- Action execution: [genmos_spot_action_executor.launch](https://github.com/zkytony/genmos_object_search/blob/main/ros/launch/spot/experiments/genmos_spot_action_executor.launch) Launches a node that internally uses our SpotSDK wrapper functions to make Spot move its arm and body to a planned viewpoint.
So when running this system, one typically opens five terminal tabs:
- Run the system launch file
- Run the GenMOS server
- Run the search task launch file
- Run the action execution launch file
- Run RViz
The GenMOS gRPC methods are general and robot-independent. You can find the Protobuf definitions here: [genmos_object_search.proto](https://github.com/zkytony/genmos_object_search/blob/main/genmos_object_search/protos/genmos_object_search/grpc/genmos_object_search.proto#L12)
```protobuf
service GenMOS {
    rpc CreateAgent(CreateAgentRequest) returns (CreateAgentReply) {}

    // Get agent status
    rpc GetAgentCreationStatus(GetAgentCreationStatusRequest) returns (GetAgentCreationStatusReply) {}

    // More fine-grained rpcs for specific functionality
    // Update search region by interpreting sensor data such as point cloud.
    rpc UpdateSearchRegion(UpdateSearchRegionRequest) returns (UpdateSearchRegionReply) {}

    // Process observation such as object detection or language; update agent belief
    rpc ProcessObservation(ProcessObservationRequest) returns (ProcessObservationReply) {}

    // Get current belief about where the object is over the search region
    rpc GetObjectBeliefs(GetObjectBeliefsRequest) returns (GetObjectBeliefsReply) {}

    // Get current belief about the robot state
    rpc GetRobotBelief(GetRobotBeliefRequest) returns (GetRobotBeliefReply) {}

    // Initialize a planner
    rpc CreatePlanner(CreatePlannerRequest) returns (CreatePlannerReply) {}

    // Call the planner to return an action for execution
    rpc PlanAction(PlanActionRequest) returns (PlanActionReply) {}

    // Receive messages from the server
    rpc ListenServer(ListenServerRequest) returns (stream ListenServerReply) {}

    // Reset
    rpc Reset(ResetRequest) returns (ResetReply) {}
}
```
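To give a sense of what calling these RPCs looks like outside the provided client wrapper, here is a minimal sketch using generated Python stubs; the module names, server address, and request fields are assumptions for illustration, and the repo's generated code may be organized differently:

```python
import grpc
# Assumed names for the generated protobuf/gRPC modules:
from genmos_object_search.grpc import genmos_object_search_pb2 as gpb
from genmos_object_search.grpc import genmos_object_search_pb2_grpc as gpb_grpc

channel = grpc.insecure_channel("localhost:50051")  # assumed server address
stub = gpb_grpc.GenMOSStub(channel)
# Ask the server's planner for the next action for a given robot:
reply = stub.PlanAction(gpb.PlanActionRequest(robot_id="robot0"))
```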
- Different from libraries such as OpenCV, OMPL, or PyTorch, which provide middleware-independent functions that take generic inputs and outputs, GenMOS is a middleware-independent system that separates the task logic from the actual robot system.