<<design_decisions>>
GenMOS is an object search package that performs hierarchical object search: 3D local search combined with 2D global search.
- genmos_object_search is a middleware-independent package that is intended to be run as a gRPC server and offers its functionality through handling gRPC calls. As such, code in genmos_object_search is ROS-independent.
- Messages in gRPC should by default contain world-frame coordinates, while utility functions in grpc/utils convert those quantities to POMDP frames (and back). Quantities that are fed into POMDP models should by default be in the POMDP frame.
- The 2D search region, designed for global search, is fundamentally an occupancy grid map. It can be built from a point cloud or from an existing occupancy grid map; a sketch of the point-cloud path is given below.
- The 3D search region, designed for local search, is fundamentally an occupancy octree. It can be built from a point cloud.
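As a rough illustration, a 2D search region could be derived from a point cloud by projecting points within a height band down to grid cells. This is a minimal sketch assuming numpy; the function name, thresholds, and return type are illustrative, not the package's actual API.

```python
import numpy as np

def occupancy_grid_from_point_cloud(points, grid_size=0.1,
                                    floor_cut=0.05, ceiling_cut=1.5):
    """Project an (N, 3) point cloud down to a 2D occupancy grid.

    Points whose height falls between floor_cut and ceiling_cut (meters)
    are treated as obstacles; everything else is ignored. Returns the
    set of integer (x, y) grid cells that are occupied.
    """
    mask = (points[:, 2] > floor_cut) & (points[:, 2] < ceiling_cut)
    cells = np.floor(points[mask, :2] / grid_size).astype(int)
    return set(map(tuple, cells))
```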
There is no direct specification of reachable viewpoints.
Instead, one could specify unreachable viewpoints through:
- Inference from the search region: all obstacles (2D) and occupied nodes (3D) are unreachable points. <<reachable_viewpoints_design_1>>
- Providing certain parameters in the configuration dict, such as "unreachable position inflation radius" and "planning region" (which limits the search space considered during planning); a sketch of such a configuration is given after this list.
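A hypothetical configuration fragment is sketched below; apart from the two parameters quoted above, the key names and values are illustrative assumptions, not the package's actual schema.

```python
# Hypothetical configuration fragment (illustrative key names):
search_region_config = {
    # "unreachable position inflation radius": positions within this
    # radius (meters) of an obstacle are also treated as unreachable
    "inflation_radius": 0.3,
    # "planning region": limits the search space considered during planning
    "planning_region": {"center": [0.0, 0.0], "size": [4.0, 4.0]},
}
```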
The global search agent's action space is based on topological graph nodes. The local search agent's action space is based on either relative moves or a set of sampled viewpoints within the local region (like a topological graph, but fully connected).
When the action is a viewpoint, it will respect the reachable viewpoints specification (see above)
Don't use grid map labels for the purpose of plain reachability; the obstacles set is sufficient for that - see the design decision on reachable viewpoints (reachable_viewpoints_design_1).
The inflation radius of obstacles can be implemented as a function in GridMap2 - it is simply a matter of adding more obstacles close to existing ones, as sketched below.
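A minimal sketch of such inflation over a set of grid cells; the function name and the Chebyshev-neighborhood choice are assumptions for illustration, not GridMap2's actual implementation.

```python
def inflate_obstacles(obstacles, radius):
    """Grow the obstacle set by `radius` grid cells.

    Every cell within Chebyshev distance `radius` of an existing
    obstacle becomes an obstacle too; the inflated cells are then
    treated as unreachable exactly like the originals.
    """
    inflated = set(obstacles)
    for (x, y) in obstacles:
        for dx in range(-radius, radius + 1):
            for dy in range(-radius, radius + 1):
                inflated.add((x + dx, y + dy))
    return inflated
```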
- MosAgent is the high-level class for a multi-object search agent. It does not assume whether the world is 2D or 3D; the code is general.
- SloopMosAgent is a SLOOP agent whose underlying OOPOMDP is a MosAgent. Because MosAgent is not grounded to any world representation, I have left _init_oopomdp unimplemented. TODO: Is SloopMosAgent still valid? 09/06/22
- MosAgentBasic2D is just a 2D local search agent (with primitive actions); the corresponding SloopMosAgentBasic2D will probably not be used for real robot experiments. TODO: Is SloopMosAgentBasic2D still valid? 09/06/22
- MosAgentTopo2D is an agent with a topo map action space. This should also not be too much work. With that, I can implement SloopMosAgentTopo2D, which is our global search agent. TODO: Is SloopMosAgentTopo2D still valid? 09/06/22
- MosAgentBasic3D is a 3D local search agent with primitive actions. MosAgentTopo3D inherits MosAgentBasic3D and differs in using a topo graph-based action space.
- Parameters for sensors are in metric units when specified by the user (for a real robot scenario), but are converted into POMDP coordinate units when passed in for creating an agent. This conversion is handled by us.
The POMDP agent uses a discretized coordinate frame, where the coordinates are integers (could be positive or negative). This frame is translated and scaled with respect to the world frame. There is no rotation difference between them.
Code inside genmos_object_search/oopomdp works by default in the POMDP frame (except for, e.g., SearchRegion, which connects the two). Code inside genmos_object_search/grpc assumes that client and server communicate with messages that by default contain coordinates in the world frame; a conversion sketch is given below.
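Since the two frames differ only by a translation and a scale, the conversion reduces to the sketch below; the function names and the choice of rounding are illustrative assumptions, not the actual utilities in grpc/utils.

```python
import numpy as np

def world_to_pomdp(pos_world, origin, res):
    """World-frame position (meters) -> integer POMDP coordinates.

    `origin` is the world-frame position of the POMDP frame's origin
    and `res` is the size of one POMDP grid cell in meters; there is
    no rotation between the two frames.
    """
    scaled = (np.asarray(pos_world) - np.asarray(origin)) / res
    return tuple(np.round(scaled).astype(int))

def pomdp_to_world(pos_pomdp, origin, res):
    """Integer POMDP coordinates -> world-frame position (meters)."""
    return tuple(np.asarray(pos_pomdp) * res + np.asarray(origin))
```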
The camera model by default looks at +x (both for 2D and 3D).
The POMDP agent allows the robot to have uncertainty over its pose.
It expects that a localization module on the system outputs estimates
of the robot pose, with uncertainty represented by a covariance
matrix. The POMDP agent therefore models the belief over the robot
pose as a Gaussian. During planning, the POMDP agent samples robot
poses from this belief. See RobotObservationModel. A minimal sketch
of this convention is given below.
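A minimal sketch of the Gaussian pose belief, assuming numpy; the class and method names are illustrative, not the package's actual belief classes.

```python
import numpy as np

class GaussianPoseBelief:
    """Belief over the robot pose, parameterized by the localization
    module's mean and covariance (e.g. over (x, y, yaw))."""
    def __init__(self, mean, cov):
        self.mean = np.asarray(mean)
        self.cov = np.asarray(cov)

    def sample(self, rng=np.random):
        # planning draws robot poses from this Gaussian
        return rng.multivariate_normal(self.mean, self.cov)
```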
- Belief is handled by pomdp_py.OOBelief. There is no need for an additional belief class.
A 3D agent should receive 3D object detections and 3D robot pose estimations. A 2D agent should receive 2D object detections and 2D robot pose estimations.
There are two types of robot observations we care about.
The first type can be thought of as observations about real robot
localization. This is done by an on-board localization module, which
outputs mean and covariance of the current robot pose. These observations
are used to update the robot’s belief about its own pose in the POMDP
model. We require RobotLocalization
to represent the pose estimation
observation, and RobotObservation
to capture the observation of the
localization as well as other attributes (created based on a ProcessObservationRequest
).
The second type can be thought of as observations sampled during
planning (MCTS). These observations are generated based on the
state sampled from the belief. For these, RobotObservation
or RobotObservationTopo may be used, depending on the agent.
Even though both are related to updating the agent's model of the world (and/or the belief), we separate them into two RPC methods. This clarifies and simplifies the implementation, as the two are concerned with quite distinct issues and are likely called at different frequencies.
Although pomdp_detection_from_proto allows specifying position
and rotation precisions when converting an object detection from
the world frame to the POMDP frame, we do not provide a way
to configure those precisions, because the default settings
are already appropriate for the POMDP model (positions are
integers, and rotation/size precision of 0.001 is fine-grained);
the sketch below illustrates this default.
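Illustrative only; pomdp_detection_from_proto applies its own defaults, and the function below merely spells out the stated convention.

```python
import numpy as np

def quantize_detection(position, rotation, size):
    """Apply the default precisions: integer positions in the POMDP
    frame, and three decimal places for rotation and size."""
    pos = tuple(np.round(np.asarray(position)).astype(int))
    rot = tuple(round(float(r), 3) for r in rotation)
    sz = tuple(round(float(s), 3) for s in size)
    return pos, rot, sz
```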
Regarding the observation of objects in the 3D object search model,
there are three types. ObjectDetection is what the robot
receives (i.e., what the gRPC server receives). Voxel is used when
building a volumetric observation from a set of object detections.
ObjectVoxel is a Voxel specific to one object. A simplified
illustration of how the three relate is given below.
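The sketch below is a simplified illustration of how the three types might relate, not the package's actual class definitions; the attributes and labels are assumptions.

```python
class ObjectDetection:
    """What the robot (i.e. the gRPC server) receives."""
    def __init__(self, objid, pose):
        self.objid = objid
        self.pose = pose

class Voxel:
    """One cell of a volumetric observation built from detections."""
    FREE = "free"
    def __init__(self, pos, label):
        self.pos = pos        # voxel coordinates in the POMDP frame
        self.label = label    # e.g. FREE or an object id

class ObjectVoxel(Voxel):
    """A Voxel interpreted with respect to one specific object."""
    def __init__(self, objid, pos, label):
        super().__init__(pos, label)
        self.objid = objid
```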
<<hierarchical-planning>>
- Hierarchical planning works as follows. First, a global agent is created based on the client's agent configuration in the "CreateAgent" request. The global agent should be a MosAgentTopo2D or a MosAgentTopo3D. The agent config should contain both "sensors"/"detectors" and "sensors_local"/"detectors_local" fields, as the sensor model may be different for the global POMDP agent and the local POMDP agent.
Then, the client sends a CreatePlanner request, specifying "HierPlanner" as the planner. The HierPlanner will recognize the agent with robot_id in the request as the global agent. Suppose robot_id="hrobot0".
Then, the client sends a PlanAction request. The HierPlanner performs planning for the global agent. If the output is a non-stay action, it is returned to the client for execution. If the output is a stay action, then:
(1) the server creates a placeholder for agent "hrobot0_local";
(2) the server sends the client a message requesting an UpdateSearchRegion for "hrobot0_local" - this is necessary in order to provide the search region needed to create the local search agent;
(3) upon receiving UpdateSearchRegion for "hrobot0_local", the local search agent is created;
(4) the HierPlanner is given the local search agent and plans an action for this agent, to be executed by the client.
A sketch of this control flow is given below.
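Roughly, the server-side control flow for one PlanAction request could look like the sketch below; every name here (plan_action_hier, notify_client, is_stay) is a placeholder for illustration, not the actual API.

```python
def plan_action_hier(global_agent, local_agents, planner, notify_client):
    """One PlanAction cycle under HierPlanner (sketch of steps (1)-(4))."""
    action = planner.plan(global_agent)
    if not is_stay(action):
        return action                            # non-stay: client executes it
    local_id = global_agent.robot_id + "_local"  # e.g. "hrobot0_local"
    if local_id not in local_agents:
        local_agents[local_id] = None            # (1) placeholder for the agent
        notify_client(local_id)                  # (2) ask the client for an
        return None                              #     UpdateSearchRegion
    # (3) has happened: UpdateSearchRegion created the local agent;
    # (4) plan for the local agent; the client executes the result.
    return planner.plan(local_agents[local_id])

def is_stay(action):
    # placeholder predicate; the real package has its own stay action type
    return getattr(action, "name", "") == "stay"
```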
The client should be aware of the local agent, and know that it can request, for example, GetObjectBeliefs with the robot_id field equal to "hrobot0_local". However, PlanAction requests should be sent regarding the global agent "hrobot0"; the local agent does not accept planning requests directly.
- To use hierarchical planning, in "agent_config" (part of the config for the CreateAgent rpc),
"agent_type" should be set to "hierarchical" and "agent_class" should be "MosAgentTopo2D".
Note that "agent_type" should only be either "hierarchical" or "local". A hypothetical config fragment is sketched below.
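Only "agent_type", "agent_class", and the sensors/detectors fields are named in these notes; the remaining structure below is an illustrative assumption.

```python
agent_config = {
    "agent_type": "hierarchical",      # "hierarchical" or "local"
    "agent_class": "MosAgentTopo2D",   # the global agent's class
    "sensors": [...],                  # global agent's sensor models
    "detectors": [...],
    "sensors_local": [...],            # may differ for the local agent
    "detectors_local": [...],
}
```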
Each PlanActionReply contains an 'action_id', which is used to:
- Inform the server that the action execution has finished
- Label a ProcessObservationRequest as related to the action
The server runs the planner, holds the agent's beliefs, etc. So ideally, the server is a powerful machine; it is likely to be remote.
The client talks to the server. It also interacts with the robot - the server doesn't do that; the server just cares about POMDP matters.
So, you can imagine setting up the genmos_object_search server on a static desktop machine with good hardware, and running the client on a laptop that you carry when you have a mission with the robot.
You will be able to visualize what is necessary to know what's going on in planning and in the belief state. That's the intended use case for this package.
<<slp-visualization>> The client wants to know what’s going on. The client may not use RViZ.
Visualization in genmos_object_search covers the following aspects:
- visualize the search region (both local and global)
- visualize the belief state (local and global)
- visualize the plan or planned action
- visualize the planning process
- visualize the FOV and observations
Visualization functions in ros_utils.py that begin with make_* should be
general. These functions just return visualization markers for the
poses and headers that are given, and don't make any assumption about what
frames those poses are in. Users of these functions should carefully pass
in the appropriate header and pose - what is visualized is what you pass in.
A sketch of this convention is given below.
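For illustration, a make_*-style function might look like the following; the function name, marker type, and styling are assumptions, but the key point is that the header and pose are used exactly as given.

```python
from visualization_msgs.msg import Marker

def make_pose_marker(header, pose, marker_id=0, scale=0.2):
    """Build an RViz marker from exactly the header and pose given;
    no frame assumption and no coordinate conversion."""
    marker = Marker()
    marker.header = header      # whatever frame the caller chose
    marker.id = marker_id
    marker.type = Marker.ARROW
    marker.action = Marker.ADD
    marker.pose = pose          # visualized as-is
    marker.scale.x = scale
    marker.scale.y = 0.05
    marker.scale.z = 0.05
    marker.color.r = 1.0
    marker.color.a = 1.0        # alpha must be non-zero to be visible
    return marker
```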
Visualization functions that do not begin with make_* do not follow the
above convention. For example, viz_msgs_for_robot_pose_proto will return
an RViz marker and a tf2 message that account for the default rotation
difference between the camera in ROS and the camera in GenMOS.
The following are part of the original objective, yet were not completed in time for the ICRA submission - they still require decent, non-trivial effort. For example, considering correlations between objects on top of the octree belief representation for 3D object search is yet to be solved. We leave these for future work:
- [ ] Allows specification of correlation between objects
- [ ] Allows incremental update of the underlying search region
- [ ] Permits the use of spatial language over the 2D global search region.
- [ ] In fact, allows resolution of spatial language tuples incrementally, as unknown landmarks essentially serve as correlated objects. (A demo/experiment of this would be more impressive.)