## Social Navigation

In the social navigation task, a robot is tasked with finding and following a human. The goal is to train a neural network policy that takes as input (1) Spot's arm depth image, (2) the humanoid detector sensor output, and (3) Spot's stereo depth cameras, and outputs linear and angular velocities.

### Observation
The observation space of the social nav policy is defined under `habitat.gym.obs_keys` with the prefix `agent_0` in `habitat-lab/habitat/config/benchmark/multi_agent/hssd_spot_human_social_nav.yaml`. In this yaml, `agent_0_articulated_agent_arm_depth` is the robot's arm depth camera, and `agent_0_humanoid_detector_sensor` is a humanoid detector that returns either a human's segmentation mask or bounding box given an arm RGB camera. For `humanoid_detector_sensor`, see `HumanoidDetectorSensorConfig` in `habitat-lab/habitat/config/default_structured_configs.py` to learn how to configure the sensor (e.g., whether it should return a bounding box or a segmentation mask). Finally, `agent_0_spot_head_stereo_depth_sensor` is Spot's body stereo depth image.

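As a sketch, the detector's output type can also be switched without editing the yaml by passing Hydra overrides; the two flags below appear in the training command later in this section, and their meanings are read off the field names in `HumanoidDetectorSensorConfig`:

```bash
# Return an image-shaped detection, encoded as a bounding box rather
# than a segmentation mask (interpretation follows the config field names).
python -u -m habitat_baselines.run \
  --config-name=social_nav/social_nav.yaml \
  benchmark/multi_agent=hssd_spot_human_social_nav \
  habitat.task.lab_sensors.humanoid_detector_sensor.return_image=True \
  habitat.task.lab_sensors.humanoid_detector_sensor.is_return_image_bbox=True
```
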
If you want to add or swap observation sensors, you can do so by adding sensors to `habitat.gym.obs_keys`. For example, you can provide a humanoid GPS to the policy's input by adding `agent_0_nav_goal_sensor` to `habitat.gym.obs_keys` in `hssd_spot_human_social_nav.yaml`. Note that every observation key in `habitat.gym.obs_keys` must correspond to a sensor defined in `/habitat/task/lab_sensors`. As another example, you can add an arm RGB sensor by adding `agent_0_articulated_agent_arm_rgb` to `habitat.gym.obs_keys` in `hssd_spot_human_social_nav.yaml`, as sketched below.

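A minimal sketch of that last example as a command-line override instead of a yaml edit. This assumes Hydra's list-override syntax and that the default `obs_keys` list contains exactly the three sensors described above; check your `hssd_spot_human_social_nav.yaml` before copying:

```bash
# Hypothetical: re-declare habitat.gym.obs_keys with the arm RGB camera
# appended. The quotes keep the brackets away from the shell.
python -u -m habitat_baselines.run \
  --config-name=social_nav/social_nav.yaml \
  benchmark/multi_agent=hssd_spot_human_social_nav \
  "habitat.gym.obs_keys=[agent_0_articulated_agent_arm_depth,agent_0_humanoid_detector_sensor,agent_0_spot_head_stereo_depth_sensor,agent_0_articulated_agent_arm_rgb]"
```
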
### Action
The action space of the social nav policy is defined under `/habitat/task/actions@habitat.task.actions.agent_0_base_velocity: base_velocity_non_cylinder` in `habitat-lab/habitat/config/benchmark/multi_agent/hssd_spot_human_social_nav.yaml`. The action consists of linear and angular velocities. You can learn more about the hyperparameters of this action under `BaseVelocityNonCylinderActionConfig` in `habitat-lab/habitat/config/default_structured_configs.py`.

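These hyperparameters can also be set as Hydra overrides; the sketch below uses the same flags and values that appear in the full training command later in this section:

```bash
# Raise the linear/angular velocity caps and allow dynamic sliding.
python -u -m habitat_baselines.run \
  --config-name=social_nav/social_nav.yaml \
  benchmark/multi_agent=hssd_spot_human_social_nav \
  habitat.task.actions.agent_0_base_velocity.longitudinal_lin_speed=10.0 \
  habitat.task.actions.agent_0_base_velocity.ang_speed=10.0 \
  habitat.task.actions.agent_0_base_velocity.allow_dyn_slide=True
```
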
### Reward
The reward function of the social nav policy is defined in `social_nav_reward`. It encourages the robot to find the human as quickly as possible, and then to maintain a safe distance from the human once found. You can learn more about the hyperparameters of this reward function under `SocialNavReward` in `habitat-lab/habitat/config/default_structured_configs.py`.

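The reward terms can likewise be tuned from the command line; as a sketch, the values below are the ones used in the training command later in this section (flag meanings are read off the `SocialNavReward` field names):

```bash
# Reward-shaping overrides: facing reward and distance, a small
# per-collision penalty, and geodesic rather than Euclidean distance.
python -u -m habitat_baselines.run \
  --config-name=social_nav/social_nav.yaml \
  benchmark/multi_agent=hssd_spot_human_social_nav \
  habitat.task.measurements.social_nav_reward.facing_human_reward=3.0 \
  habitat.task.measurements.social_nav_reward.facing_human_dis=3.0 \
  habitat.task.measurements.social_nav_reward.count_coll_pen=0.01 \
  habitat.task.measurements.social_nav_reward.use_geo_distance=True
```
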
### Command
We have released a [checkpoint](https://huggingface.co/datasets/ai-habitat/hab3_episodes/tree/main/checkpoint) trained with the command below. To reproduce it, run multi-agent training in which the Spot robot's policy is a learned low-level navigation policy and the humanoid's policy is a fixed (non-trainable) policy that navigates a sequence of navigation targets (please make sure `tensorboard_dir`, `video_dir`, `checkpoint_folder`, and `eval_ckpt_path_dir` are the paths you want):

```bash
python -u -m habitat_baselines.run \
  --config-name=social_nav/social_nav.yaml \
  benchmark/multi_agent=hssd_spot_human_social_nav \
  habitat_baselines.evaluate=False \
  habitat_baselines.num_checkpoints=5000 \
  habitat_baselines.total_num_steps=1.0e9 \
  habitat_baselines.num_environments=24 \
  habitat_baselines.tensorboard_dir=tb_social_nav \
  habitat_baselines.video_dir=video_social_nav \
  habitat_baselines.checkpoint_folder=checkpoints_social_nav \
  habitat_baselines.eval_ckpt_path_dir=checkpoints_social_nav \
  habitat.task.actions.agent_0_base_velocity.longitudinal_lin_speed=10.0 \
  habitat.task.actions.agent_0_base_velocity.ang_speed=10.0 \
  habitat.task.actions.agent_0_base_velocity.allow_dyn_slide=True \
  habitat.task.actions.agent_0_base_velocity.enable_rotation_check_for_dyn_slide=False \
  habitat.task.actions.agent_1_oracle_nav_randcoord_action.lin_speed=10.0 \
  habitat.task.actions.agent_1_oracle_nav_randcoord_action.ang_speed=10.0 \
  habitat.task.actions.agent_1_oracle_nav_action.lin_speed=10.0 \
  habitat.task.actions.agent_1_oracle_nav_action.ang_speed=10.0 \
  habitat.task.measurements.social_nav_reward.facing_human_reward=3.0 \
  habitat.task.measurements.social_nav_reward.count_coll_pen=0.01 \
  habitat.task.measurements.social_nav_reward.max_count_colls=-1 \
  habitat.task.measurements.social_nav_reward.count_coll_end_pen=5 \
  habitat.task.measurements.social_nav_reward.use_geo_distance=True \
  habitat.task.measurements.social_nav_reward.facing_human_dis=3.0 \
  habitat.task.measurements.social_nav_seek_success.following_step_succ_threshold=400 \
  habitat.task.measurements.social_nav_seek_success.need_to_face_human=True \
  habitat.task.measurements.social_nav_seek_success.use_geo_distance=True \
  habitat.task.measurements.social_nav_seek_success.facing_threshold=0.5 \
  habitat.task.lab_sensors.humanoid_detector_sensor.return_image=True \
  habitat.task.lab_sensors.humanoid_detector_sensor.is_return_image_bbox=True \
  habitat.task.success_reward=10.0 \
  habitat.task.end_on_success=True \
  habitat.task.slack_reward=-0.1 \
  habitat.environment.max_episode_steps=1500 \
  habitat.simulator.kinematic_mode=True \
  habitat.simulator.ac_freq_ratio=4 \
  habitat.simulator.ctrl_freq=120 \
  habitat.simulator.agents.agent_0.joint_start_noise=0.0
```

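While training runs, the reward and other scalars can be watched with TensorBoard pointed at the `tensorboard_dir` configured above:

```bash
tensorboard --logdir tb_social_nav
```
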
You should expect a reward (learning) curve similar to the following:

![](https://github.com/facebookresearch/habitat-lab/assets/25346716/)

In addition, under the following Slurm batch script settings:

```bash
#SBATCH --gres gpu:4
#SBATCH --cpus-per-task 10
#SBATCH --nodes 1
#SBATCH --ntasks-per-node 4
#SBATCH --mem-per-cpu=6GB
```

we observe the following training wall-clock time versus reward:

![](https://github.com/facebookresearch/habitat-lab/assets/25346716/)

We also observe the following training FPS:

![](https://github.com/facebookresearch/habitat-lab/assets/25346716/)

To evaluate the trained Spot robot's policy over 500 episodes, run the following (please make sure `video_dir` and `eval_ckpt_path_dir` are the paths you want and that the checkpoint is present):

```bash
python -u -m habitat_baselines.run \
  --config-name=social_nav/social_nav.yaml \
  benchmark/multi_agent=hssd_spot_human_social_nav \
  habitat_baselines.evaluate=True \
  habitat_baselines.num_checkpoints=5000 \
  habitat_baselines.total_num_steps=1.0e9 \
  habitat_baselines.num_environments=12 \
  habitat_baselines.video_dir=video_social_nav \
  habitat_baselines.checkpoint_folder=checkpoints_social_nav \
  habitat_baselines.eval_ckpt_path_dir=checkpoints_social_nav/social_nav_latest.pth \
  habitat.task.actions.agent_0_base_velocity.longitudinal_lin_speed=10.0 \
  habitat.task.actions.agent_0_base_velocity.ang_speed=10.0 \
  habitat.task.actions.agent_0_base_velocity.allow_dyn_slide=True \
  habitat.task.actions.agent_0_base_velocity.enable_rotation_check_for_dyn_slide=False \
  habitat.task.actions.agent_1_oracle_nav_randcoord_action.human_stop_and_walk_to_robot_distance_threshold=-1.0 \
  habitat.task.actions.agent_1_oracle_nav_randcoord_action.lin_speed=10.0 \
  habitat.task.actions.agent_1_oracle_nav_randcoord_action.ang_speed=10.0 \
  habitat.task.actions.agent_1_oracle_nav_action.lin_speed=10.0 \
  habitat.task.actions.agent_1_oracle_nav_action.ang_speed=10.0 \
  habitat.task.measurements.social_nav_reward.facing_human_reward=3.0 \
  habitat.task.measurements.social_nav_reward.count_coll_pen=0.01 \
  habitat.task.measurements.social_nav_reward.max_count_colls=-1 \
  habitat.task.measurements.social_nav_reward.count_coll_end_pen=5 \
  habitat.task.measurements.social_nav_reward.use_geo_distance=True \
  habitat.task.measurements.social_nav_reward.facing_human_dis=3.0 \
  habitat.task.measurements.social_nav_seek_success.following_step_succ_threshold=400 \
  habitat.task.measurements.social_nav_seek_success.need_to_face_human=True \
  habitat.task.measurements.social_nav_seek_success.use_geo_distance=True \
  habitat.task.measurements.social_nav_seek_success.facing_threshold=0.5 \
  habitat.task.lab_sensors.humanoid_detector_sensor.return_image=True \
  habitat.task.lab_sensors.humanoid_detector_sensor.is_return_image_bbox=True \
  habitat.task.success_reward=10.0 \
  habitat.task.end_on_success=False \
  habitat.task.slack_reward=-0.1 \
  habitat.environment.max_episode_steps=1500 \
  habitat.simulator.kinematic_mode=True \
  habitat.simulator.ac_freq_ratio=4 \
  habitat.simulator.ctrl_freq=120 \
  habitat.simulator.agents.agent_0.joint_start_noise=0.0 \
  habitat_baselines.load_resume_state_config=False \
  habitat_baselines.test_episode_count=500 \
  habitat_baselines.eval.extra_sim_sensors.third_rgb_sensor.height=1080 \
  habitat_baselines.eval.extra_sim_sensors.third_rgb_sensor.width=1920
```

The evaluation is expected to produce values similar to those below:

```bash
Average episode social_nav_reward: 1.8821
Average episode social_nav_stats.has_found_human: 0.9020
Average episode social_nav_stats.found_human_rate_after_encounter_over_epi: 0.6423
Average episode social_nav_stats.found_human_rate_over_epi: 0.4275
Average episode social_nav_stats.first_encounter_steps: 376.0420
Average episode social_nav_stats.follow_human_steps_after_first_encounter: 398.6340
Average episode social_nav_stats.avg_robot_to_human_after_encounter_dis_over_epi: 1.4969
Average episode social_nav_stats.avg_robot_to_human_dis_over_epi: 3.6885
Average episode social_nav_stats.backup_ratio: 0.1889
Average episode social_nav_stats.yield_ratio: 0.0192
Average episode num_agents_collide: 0.7020
```

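If you save the evaluator's stdout to a file (say `eval_social_nav.log`, a placeholder name), the averaged metrics can be pulled back out with a standard one-liner, assuming they are printed in exactly the `Average episode ...` format shown above:

```bash
# Collect and sort the per-episode averages printed at the end of evaluation.
grep "^Average episode" eval_social_nav.log | sort
```
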
## Social Rearrangement