Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question regarding orientation as observation #12

Open
supersglzc opened this issue Dec 6, 2024 · 2 comments
Open

Question regarding orientation as observation #12

supersglzc opened this issue Dec 6, 2024 · 2 comments
Labels
good first issue Good for newcomers question Further information is requested

Comments

@supersglzc
Copy link

supersglzc commented Dec 6, 2024

Hi Daneil @Danfoa,

First of all thank you for the amazing work! I am trying to build an equivariant policy. I have to include the orientation information in the observation space. I did this by first converting the object orientation from quaternion to euler angles and specify this part of observation as
rep_euler_xyz = G.representations['euler_xyz']
in_field_type = FieldType(gspace, [..., rep_euler_xyz, ...])].

However, after training the policy, it cannot work as expected. If I remove the orientation in my observation space, but only left with object positions and joint positions/velocities, everything works fine.

I am using C2 group for simple reflection about the y-axis plane. Is the problem could be my euler angles are in radian or I have to use G.representations['SO3_flat'] and convert the quaternion to a rotation matrix?

Thanks,
Steven

@supersglzc
Copy link
Author

@Danfoa Hi Daniel, I would like to follow up on this question and very appreciate your help!

@Danfoa
Copy link
Owner

Danfoa commented Dec 23, 2024

Hi @supersglzc,

Apologies for the late response.

The main issue with the orientation configuration is that it evolves in the Lie group SO(3), which means that symmetries do not transform these elements as they do in a vector space.

In practice, this means that the elements of SO(3) are transformed by the adjoint group action (instead of the left group action), which is defined by:

$$ g \triangleright \mathbf{R}_B := \rho _{\mathbb{R}^3}(g) \mathbf{R}_B \rho _{\mathbb{R}^3}(g^{-1}) $$

See, for instance, the example code here:

# Compute new robot base configuration -----------------------------------------------------------------------
# gXB_w = rep_Ed(g) @ RB_w @ np.linalg.inv(rep_Ed(g))
gXB = rep_Ed(g) @ XB @ rep_Ed(g).T
orbit_XB_w[g] = gXB # Add new robot base configuration (homogenous matrix) to the orbit of base configs.
orbit_ori_euler_xyz[g] = Rotation.from_matrix(gXB[:3, :3]).as_euler("xyz", degrees=True)
# If people use euler xyz angles to represent the orientation of the robot base, we can also compute the
# symmetric states of the robot base orientation:
g_euler_xyz = rep_euler_xyz(g) @ base_ori_euler_xyz
# Check the analytic transformation to elements of SO(3) is equivalent to the transformation in euler xyz angles
g_euler_xyz_true = Rotation.from_matrix(gXB[:3, :3]).as_euler("xyz", degrees=True)
assert np.allclose(g_euler_xyz, g_euler_xyz_true, rtol=1e-6, atol=1e-6)

(PS: Run this script with your robot, it should directly verify if the euler rep for your robot is appropiatedly defined)

Hence, the main issue you face is that to include the orientation information in the input of a neural network (NN), you typically need a "flat" representation of the orientation. This is not only a requirement of most NNs but also necessary because escnn works with "left" group actions.

The way I see it, you have two options:

  1. Work with Euler angles
    You can feed your NN with the Euler angle (3,) representation of the orientation configuration. The group action acting on this vector-valued observable is provided by morphosymm, as shown in the example above.
    The issue with this approach is that Euler angles introduce the known singularities of orientation representation. This means that for certain orientations of the robot, the NN will receive a (2,) dimensional orientation representation instead of (3,) (see here).

  2. Work with the gravity vector
    Another option you have is to pass to the input of the network the vector of gravity w.r.t to the robot base frame. This vector is already encoding the "observable" part of the orientation that a robot can sense without any visual exteroception. That is, the gravity vector (an element of R3, which transforms accordingly to rep_R3) encodes the roll and pitch of the robot, and its usually directly measurable from the IMU.

  3. Work with rotation matrices
    The other option is to work directly with the (3x3 = 9,) dimensional representation of a rotation matrix. As shown in this paper, neural networks tend to better process orientation configurations when orientations are represented in a continuous manner (i.e., quaternions or rotation matrices).
    This is essentially because these redundant configurations encode the periodicity of rotation angles in the architecture, which is not captured by the (3,) Euler angle representation. In the paper, they discuss using only the first two column vectors of the rotation matrix—a (6,) dimensional representation that is sufficient to describe the entire rotation configuration to the NN (as the third vector is orthogonal to the first two) in a continuous fashion. This is the representation I usually use for NNs.

The representation rep_SOflat acts on the (9,) flattened version of a rotation matrix

$$ \mathbf{R} = \begin{bmatrix} \mathbf{x} & \mathbf{y} & \mathbf{z} \end{bmatrix}, $$

where flattening is done by stacking the column vectors (x, y, z) below each other:

$$ \mathbf{r}_f = \mathrm{flatten}(\mathbf{R}) = \begin{bmatrix} \mathbf{x}^\top & \mathbf{y}^\top & \mathbf{z}^\top \end{bmatrix}^\top \in \mathbb{R}^9. $$

This flat vector transforms according to the provided representation rep_SO3flat which is a flattened representation of the adjoint action:

$$ g \triangleright \mathbf{R}_B := \rho _{\mathbb{R}^3}(g) \mathbf{R}_B \rho _{\mathbb{R}^3}(g^{-1}) \equiv \rho _{\mathbb{SO}(3) _{flat}}(g) \mathbf{r}_f = (\rho _{\mathbb{R}^3}(g) \otimes \rho _{\mathbb{R}^3}(g)) \mathbf{r}_f $$

See how this representation is defined here.

If you use the (6,) representation (i.e., [x.T, y.T].T), you only need the restriction of rep_SOflat to the first 6 dimensions, ensuring this (6,) is a G-stable subspace. Which in practice, for the symmetry groups we deal with in the repo, is almost always true.

Let me know if this helps :)

@Danfoa Danfoa added good first issue Good for newcomers question Further information is requested labels Dec 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants