-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
recheckAt does not always reproduce a test failure #487
Comments
When minimizing, since the issue only happens randomly, I found it helpful to use the following harness, which repeatedly runs the test and when it finds a failure then reruns with that seed and shrink path; it then reports how often hedgehog failed to replay. {-# LANGUAGE LambdaCase #-}
module Main (main) where
import Control.Monad (replicateM, unless)
import Control.Monad.Trans.Except (ExceptT (ExceptT), runExceptT)
import Data.Functor (void, (<&>))
import Data.List.Extra (enumerate)
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as M
import Data.Void (Void, absurd)
import Hedgehog (Property, Seed, withSkip)
import Hedgehog.Internal.Property (
Property (propertyConfig, propertyTest),
ShrinkPath,
Skip (SkipToShrink),
TestCount,
)
import Hedgehog.Internal.Report (
FailureReport (failureShrinkPath),
Report (reportSeed, reportTests),
Result (..),
reportStatus,
)
import Hedgehog.Internal.Runner (checkReport)
import qualified Hedgehog.Internal.Seed as Seed
import Numeric (showFFloat)
import Numeric.Natural (Natural)
import System.Exit (die)
import Tests.Refine (tasty_replay_broken)
import Prelude
main :: IO ()
main = do
let n = 1000
rs <- replicateM (fromIntegral n) runAndRecheck
let cs = count rs
void $ M.traverseWithKey (\ri c -> putStrLn $ showPad ri <> " : " <> show c) cs
let t = (cs M.! RecheckPass) + (cs M.! RecheckDefeat)
let p = (100 :: Double) * fromIntegral t / fromIntegral n
if t > n `div` 4
then putStrLn $ "This tickled non-replay bug " <> showFFloat (Just 2) p "% > 25%"
else die "This did not tickle non-replay bug much"
-- Bounded & Enum: we explicitly give counts of 0 for those which did not appear
count :: (Bounded a, Enum a, Ord a) => [a] -> Map a Natural
count as = M.unionsWith (+) $ M.fromList [(a, 0) | a <- enumerate] : fmap (`M.singleton` 1) as
-- This runs the test once with a random seed, and
-- - if it fails then rechecks it with the reported skip/shrink, reporting whether it finds an error again
-- - if it passes or gives up, report that
runAndRecheck :: IO RRInfo
runAndRecheck = either id absurd <$> runExceptT go
where
go :: ExceptT RRInfo IO Void
go = do
seed <- Seed.random
shrink <-
ExceptT $
runProp seed tasty_replay_broken <&> \case
Passed -> Left RunPass
Defeat -> Left RunDefeat
Fail tc sp -> Right $ SkipToShrink tc sp
-- This is essentially "recheckAt", with the skip/shrink info from above
ExceptT $
fmap Left $
runProp seed (withSkip shrink tasty_replay_broken) <&> \case
Passed -> RecheckPass
Defeat -> RecheckDefeat
Fail _ _ -> RecheckRefind
data RRInfo
= RunPass
| RunDefeat
| RecheckPass
| RecheckDefeat
| RecheckRefind -- rechecking finds /an/ error, not asserted /the same/ error
deriving (Show, Eq, Ord, Enum, Bounded)
showPad :: RRInfo -> String
showPad ri = let s = show ri in s <> replicate (13 - length s) ' '
data RunInfo
= Passed
| Defeat
| Fail TestCount ShrinkPath
runProp :: Seed -> Property -> IO RunInfo
runProp seed prop = do
report <- checkReport (propertyConfig prop) 0 seed (propertyTest prop) $ const $ pure ()
let testcount = reportTests report
let seed' = reportSeed report
-- check my understanding
unless (seed == seed') $ die "seed /= seed'"
pure $ case reportStatus report of
GaveUp -> Defeat
OK -> Passed
Failed x -> Fail testcount $ failureShrinkPath x which will give output similar to
meaning that, we checked the property 1000 times and
and out of the ones that found a failure (in this case, all of them) we re-ran them with the reported seed and shrink path and found (i.e.
We then report the percentage that tickled this bug (if greater than an arbitrary 25% threshold) |
Weird. I implemented this feature, so I'll try to look into this at some point (feel free to @ me if I disappear for a week). I don't have an immediate guess what's going on. But I am curious what happens if you pass in a seed but not a shrink path. Seems like it will always fail, but will the same seed always give the same counterexample? Will it always give the same shrink path? (You can test with the env var It would also be helpful to confirm that earlier versions don't exhibit this bug. If you try this on 1.1.2, it should give you a |
Oh, actually, here's a guess: discards might throw it off. I think I didn't test with those when implementing. From a quick look at If I'm right about this:
|
@moodmosaic, you have often said that one difference between the Haskel and F# implantations of Hedgehog is that the F# implementation uses SplitMix. I think that means the F# implementation doesn't have this bug (as is currently suspected). The F# implementation of efficient recheck doesn't have to skip inputs that passed the test or were discarded. Instead, SplitMix creates a seed used to generate each shrink tree. The initial input to be tested is the root of this tree. Therefore, the seed in the recheck details (both before and after implementing efficient recheck) is the seed that generates the shrink tree of the failing input. |
From a quick hacking of my harness above, this looks true
as does this, although I only checked we get a failure, not necessarily the same one
I'm not sure what you mean by this -- I'm guessing "if there were discards, and you replay with the reported skip, then there is no guarantee what will happen". If so, then I agree, but note that it can "not produce a failure" by giving up (not only passing)
This seems not true, unfortunately. I have seen rare examples (~ 0.1% of the time) which find a failure with some discards and then passes when replaying with the modified
FTR, my updated harness is
|
Fixes an issue where Hedgehog sometimes fails to replay correctly in the presence of discards, particularly in our `tasty_available_actions_accepted` test: hedgehogqa/haskell-hedgehog#487. Signed-off-by: George Thomas <[email protected]>
I don't think this is related to SplitMix specifically. haskell-hedgehog does use a splitting PRNG (which I think might actually be SplitMix but I don't think it matters). But for rechecking, haskell-hedgehog takes in the seed that it started testing with, then does a bunch of splits to get to the seed that generates the shrink tree. It sounds like in F#, rechecking simply takes in the seed that generates the shrink tree. That's also a fine way to do things, but the haskell-hedgehog approach means you can use the same seed for multiple tests. E.g. if you do
Yeah, you read that right. Giving up is indeed an expected possible outcome there.
Huh. So it sounds like my hypothesis explains most but not all of the errors? That's weird again... I should run your harness myself and see if I can figure it out, but I'm not sure when I'll get to it. Again, feel free to @ me if it takes a while. |
Oh, I misquoted you. QuickCheck doesn't use SplitMix but Hedgehog (both Haskel and F#) do.
Yes, thank you for correcting me.
Yes, everything you said up to here is correct. What is the average of getting multiple failure messages with the same seed? |
(Assuming you mean advantage) It lets you reproduce multiple test failures at once if you're passing in a seed on the command line (which I usually do). Maybe not super helpful, especially if combined with shrink paths. But it also gives me compatibility with hspec, which is the framework I use (via hspec-hedgehog). If I get a failure with --seed, I don't need to change the --seed to reproduce it. And when hspec's failure report tells me to use a --seed, that's the --seed I do in fact need to use. I don't think either of those would be true if I'd gone for F#'s approach. (Actually, I'm not sure I could reproduce failures at all from the command line, if I'd gone for that. I think hspec-hedgehog ignores the HEDGEHOG_SEED environment variable.) |
True. I think it's more likely in (or around)
Yes, we used to have our repo for SplitMix, then Jakob created SplitMix.hs and later we ported that in F#. Any bugfixes in SplitMix.hs |
Okay! I think the continued failures are because when you try to skip to a test higher than the test limit, we hit the |
Closes hedgehogqa#487. In hedgehogqa#454 we introduced test skipping, with the idea that a failed test could report a way for you to jump back to reproduce it without all the preceding tests. But it didn't work if any of the preceding tests had been discarded, because each discard also changes the seed and the size. Users could manually add the discard count to the test count in the `Skip`, but that's no fun. Plus, it wouldn't work if the test count plus discard count exceeded the test limit, because that would generate a success without running any tests. So now a `Skip` (other than `SkipNothing`) includes a `DiscardCount` as well as a `TestCount`. It's rendered in the compressed path as `testCount/discardCount`, or just `testCount` if `discardCount` is 0. The exact sequence of passing tests and discards doesn't affect the final seed or size, so the counts are all we need. This changes an exposed type, so PVP requires a major version bump.
Oops, sorry! |
Yes, it does. Thanks! |
Closes hedgehogqa#487. In hedgehogqa#454 we introduced test skipping, with the idea that a failed test could report a way for you to jump back to reproduce it without all the preceding tests. But it didn't work if any of the preceding tests had been discarded, because each discard also changes the seed and the size. Users could manually add the discard count to the test count in the `Skip`, but that's no fun. Plus, it wouldn't work if the test count plus discard count exceeded the test limit, because that would generate a success without running any tests. So now a `Skip` (other than `SkipNothing`) includes a `DiscardCount` as well as a `TestCount`. It's rendered in the compressed path as `testCount/discardCount`, or just `testCount` if `discardCount` is 0. The exact sequence of passing tests and discards doesn't affect the final seed or size, so the counts are all we need. This changes an exposed type, so PVP requires a major version bump.
Closes #487. In #454 we introduced test skipping, with the idea that a failed test could report a way for you to jump back to reproduce it without all the preceding tests. But it didn't work if any of the preceding tests had been discarded, because each discard also changes the seed and the size. Users could manually add the discard count to the test count in the `Skip`, but that's no fun. Plus, it wouldn't work if the test count plus discard count exceeded the test limit, because that would generate a success without running any tests. So now a `Skip` (other than `SkipNothing`) includes a `DiscardCount` as well as a `TestCount`. It's rendered in the compressed path as `testCount/discardCount`, or just `testCount` if `discardCount` is 0. The exact sequence of passing tests and discards doesn't affect the final seed or size, so the counts are all we need. This changes an exposed type, so PVP requires a major version bump.
In hedgehog 1.2, a new feature
recheckAt
was added (#454) to skip directly to a failing test. However this sometimes fails: I have a property that fails, and says "This failure can be reproduced by runningrecheckAt (Seed _ _) "..." <property>
", but running this command reports "gave up after 0 discards, passed 4 tests" (or, sometimes reports the test passed).I have seen this on a few different tests in a real-life codebase, but certainly not all (i.e. it seems that there is a problem with only some properties). I have never seen anything similar on older releases of hedgehog.
I managed to minimize to the following generator and property (note that
recheckAt
fails to re-find a failure around 80% of the time):A representative failure is
Note that sometimes (but more rarely)
recheckAt
will show the test passed.The text was updated successfully, but these errors were encountered: