Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not throw an error when the origin may be outside the polytope in EPA #632

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

hongkai-dai
Copy link
Contributor

@hongkai-dai hongkai-dai commented Feb 25, 2025

Turns out the origin can be slightly outside of the polytope due to numerical errors, so we do not throw the error any more.

We repeatedly incur this error, as mentioned in RobotLocomotion/drake#21673


This change is Reviewable

Copy link
Contributor Author

@hongkai-dai hongkai-dai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 1 unresolved discussion (waiting on @SeanCurtis-TRI)


a discussion (no related file):
+@SeanCurtis-TRI for feature review please, thanks!

@hongkai-dai hongkai-dai force-pushed the remove_origin_outside_error branch from 1b96827 to db3b692 Compare February 25, 2025 03:50
…EPA.

Turns out the origin can be slightly outside of the polytope due to numerical errors, so we do not throw the error any more.
@hongkai-dai hongkai-dai force-pushed the remove_origin_outside_error branch from db3b692 to dc1f97e Compare February 26, 2025 05:47
Copy link
Contributor

@SeanCurtis-TRI SeanCurtis-TRI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @hongkai-dai)


a discussion (no related file):

Previously, hongkai-dai (Hongkai Dai) wrote…

+@SeanCurtis-TRI for feature review please, thanks!

There's an old adage where the a patient visits a doctor, saying, "Doctor, it hurts when I do this." The doctor responds, "Well, don't do that."

This PR feels like the doctor's response. We're removing guardrails without clear indication that it's a net benefit. A polytope origin that lies outside the polytope generally seems like a bad thing, and allowing it to lie outside to an arbitrary distance seems likewise a bad thing.

Clearly, removing the tests that detected badness stops (possibly) bad scenarios from blowing up, it's not clear that it doesn't introduce other problems. It would be good to run this change against a whole bunch of simulation to see what happens.


include/fcl/narrowphase/detail/convexity_based_algorithm/gjk_libccd-inl.h line 1708 at r2 (raw file):

        -ccdVec3Dot(&face_normals[i], &nearest_edge->vertex[0]->v.v);
    // If the origin lies *on* the edge, then it also lies on the two adjacent
    // faces. Rather than failing on strictly *positive* signed distance, we

I'm a bit concerned about taking off all of the brakes. We're not asserting things being outside and we're allowing the query point to be arbitrarily outside the polytope.

For example, I restored this test and included distance measurement values (and thresholds). For all of your new unit tests, the planar_threshold was about 4.4e-16 (not surprising). However, the regression tests missed by the following amounts:

Regression 7: 2.17881e-15
Regression 10: 5.94691e-09
Regression 11: 5.39735e-05

(7) could simply the cost that arises from not including the bits lost to frame transforms in the epsilon (but this is merely speculative; I'd have to revisit the code to figure out how q is defined w.r.t. the polytope).
(10) loses several orders of magnitude of precision, so it's difficult to assess the origin of the inside/outside disparity there.
(11) While 10^-5 isn't huge in an absolute sense, in this context, that disparity feels massive. It doesn't seem that a misclassification of that size can simply be attributed to mathematically equivalent, but different numerically sensitive calculations. That just feels like two different calculations.

I'm particularly worried because if the origin does measure outside a polytope's face, then the edge could definitely be the nearest feature because were "outside" and not "inside". Thus, the logic here seems to fall apart of the origin isn't appreciably inside the polytope.

Removing this test seems to paper over the more fundamental problem of the origin being in the wrong place?


include/fcl/narrowphase/detail/convexity_based_algorithm/gjk_libccd-inl.h line 1728 at r2 (raw file):

  // functionally zero, then the edge must *truly* be the nearest feature.
  // If it isn't, then it must be one of the adjacent faces.
  const bool is_edge_closest_feature = nearest_edge->dist < kEps * kEps;

nit: Revisiting this for the first time in years, I'm forced to ask a question that I don't see obviously answered in this code:

Surely, if the nearest feature was misclassified as edge, then that would be detected by the fact that the unsigned distance to at least one of the adjacent faces is less, yes?

So, what if we simply squared distances to the faces and then picked whichever feature had the smallest squared distance (even if that squared distance came from an origin lying outside the face)?


include/fcl/narrowphase/detail/convexity_based_algorithm/gjk_libccd-inl.h line 1735 at r2 (raw file):

    // neighboring face that is closest to the origin.
    polytope->nearest_type = CCD_PT_FACE;
    // Note origin_to_face_distance is the *signed* distance and it is

nit: This comment hasn't be literally true for a while. We allowed slightly positive values (up to scaled epsilon).

Now it's not remotely true. We could easily end up with the query point being outside one face and inside the other. It could be more outside than inside, but this comparison test would select the face we lie outside of (even though it's farther.

Copy link
Contributor

@SeanCurtis-TRI SeanCurtis-TRI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @hongkai-dai)


a discussion (no related file):
BTW It's distinctly possible that the CI failure is related to this change. I noticed the CI history oscillated around as you updated the PR. I tried retriggering w.r.t. the current test, but the test_fcl_simple test still times out. (I'm really not sure why, but I haven't investigated it.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants