You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When Zui calls on zq in a way that's expected to fail (such as in Preview & Load when trying to read a file that is not of a format that Zed can read via auto-detect) the expected error message sometimes is not surfaced.
The repro in the video below is taken on a Linux VM run in VirtualBox. The file that's being opened in Zui is the same soccer-ball.png that's opened by this e2e test which has been described as "flaky":
test('bad data displays an error message',async()=>{
awaitapp.dropFile(getPath('soccer-ball.png'));
awaitapp.attached(/Format Detection Error/i);
expect(app.locate('button','Load').isDisabled);
});
In the video I begin the process to load the file in Preview & Load several times. While the auto-detect failure is shown in the app most times, you can see on both the second and final attempts that no error message is shown.
Repro.mp4
What led me to repro this manually is that I'd been studying that "flaky test" shown above via a branch where I ran just that test repeatedly in CI and found it failed 24 times out of 100 in a manner similar to what I showed in my manual repro video.
Based on what's shown here I suspect this may share a root cause with a separate Jest test that's been known to fail occasionally in CI that's been on my list to study:
On the same Linux VM where I did the repro/video above I triggered a failure via this loop:
$ num=1 ; success=0; fail=0; while true; do yarn nx run zed-node:test --skip-nx-cache; if [ $? -eq 0 ]; then success=$(expr $success + 1); else fail=$(expr $fail + 1); fi ; num=$(expr $num + 1) ; echo "success=$success fail=$fail" ; done
To get to a quicker repro, I deleted all the other tests so just this one "zq with a bad zed" test would run, and I also first started a cat /dev/random > /dev/null & for every core on my VM since keeping the system under load seems to force the repro quicker (presumably because heavy load aggravates timing issues). With this approach I got 3 repros out of 218 runs, with this being an example of a failure:
FAIL zed-node zed-node/src/zq.test.ts
✕ zq with a bad zed (112 ms)
● zq with a bad zed
expect(received).rejects.toThrowError()
Received promise resolved instead of rejected
Resolved to value: []
16 | });
17 |
> 18 | await expect(promise).rejects.toThrowError('error parsing Zed');
| ^
19 | });
20 |
at expect (../../node_modules/expect/build/index.js:105:15)
at Object.<anonymous> (src/zq.test.ts:18:9)
The rejection of the promise appears to be a prerequisite for seeing this error, which leads me to guess that in my manual repro in Preview & Load the lack of visible error message is similarly due to the promise resolving like we see here.
Conclusion
When I reviewed the latter test failure with @jameskerr in the past he acknowledged that the code in zq.ts that deals with the callouts to zq is a little complex and probably tricky to debug. But now that we've confirmed a user-facing bug is reproducible and it's not just a "flaky test", it seems worthwhile to invest some cycles to look closer at this.
The text was updated successfully, but these errors were encountered:
As an idea to further debug this, I've been testing on the zui-3021-debug branch where I've created a wrapper script around the calls to zq. I figured with this approach I could separate out the stdout and stderr returned from zq and then pass them upstream to the calling app. A couple things I thought this might enable:
I could log the stdout, stderr, and received command line arguments to tmp files that I could examine after successful/failed runs just to check for any unexpected differences.
I could play around with the order/timing in which stdout and stderr were passed upstream in an attempt to aggravate timing problems, in the event that's the root cause of the problem observed here.
tl;dr
The mere presence of the wrapper script seems to make the test pass consistently.
Details
https://github.com/brimdata/zui/actions/runs/8090373897 was a baseline run where I'd just created the branch based off a recent tip of main and had made just the changes in 8a2f27e to run only the bad data displays an error message e2e test in a loop with zq being invoked directly as normal. This happened to produce 24 failures out of 100 runs just like the previous baseline I'd described in the opening comments of this issue.
tl;dr
When Zui calls on
zq
in a way that's expected to fail (such as in Preview & Load when trying to read a file that is not of a format that Zed can read via auto-detect) the expected error message sometimes is not surfaced.Details
Repro is with Zui commit 57afe04.
The repro in the video below is taken on a Linux VM run in VirtualBox. The file that's being opened in Zui is the same soccer-ball.png that's opened by this e2e test which has been described as "flaky":
zui/packages/zui-player/tests/pool-loads.spec.ts
Lines 42 to 46 in 57afe04
In the video I begin the process to load the file in Preview & Load several times. While the auto-detect failure is shown in the app most times, you can see on both the second and final attempts that no error message is shown.
Repro.mp4
What led me to repro this manually is that I'd been studying that "flaky test" shown above via a branch where I ran just that test repeatedly in CI and found it failed 24 times out of 100 in a manner similar to what I showed in my manual repro video.
Based on what's shown here I suspect this may share a root cause with a separate Jest test that's been known to fail occasionally in CI that's been on my list to study:
zui/packages/zed-node/src/zq.test.ts
Lines 106 to 115 in 57afe04
On the same Linux VM where I did the repro/video above I triggered a failure via this loop:
To get to a quicker repro, I deleted all the other tests so just this one "zq with a bad zed" test would run, and I also first started a
cat /dev/random > /dev/null &
for every core on my VM since keeping the system under load seems to force the repro quicker (presumably because heavy load aggravates timing issues). With this approach I got 3 repros out of 218 runs, with this being an example of a failure:The rejection of the promise appears to be a prerequisite for seeing this error, which leads me to guess that in my manual repro in Preview & Load the lack of visible error message is similarly due to the promise resolving like we see here.
Conclusion
When I reviewed the latter test failure with @jameskerr in the past he acknowledged that the code in zq.ts that deals with the callouts to
zq
is a little complex and probably tricky to debug. But now that we've confirmed a user-facing bug is reproducible and it's not just a "flaky test", it seems worthwhile to invest some cycles to look closer at this.The text was updated successfully, but these errors were encountered: