-
Notifications
You must be signed in to change notification settings - Fork 43
Fix/125 cannot create frame session (context canceled) #129
Conversation
@robingustafsson, do you have any idea why the session gets closed first: Closed session ID:
And then,
Resulting in: |
Can't say for sure from just the those log lines you've posted. The close call says that we received a Regarding the resulting error, the only way to arrive at that error message So could this be in relation to the destruction of an iframe at the same time as a new iframe is created perhaps and we handle that poorly right now? 🤔 |
@robingustafsson I’ll try to find out the target tomorrow and let you know. Thanks for pointing it out!
Thanks! I read about the article you linked in the code about isolated contexts, and yeah, I was suspicious of an iframe handling issue too. I’ll first try to create a test (if not exists) if I can. |
@robingustafsson Weird thing:
What I did:
|
160cbbe
to
0c5984a
Compare
I fixed the problem, but I couldn't write a unit test because of the dependencies, and I didn't want to use a mocking framework either. So, I'll go with the integration test route. |
2341385
to
ff9db39
Compare
ff9db39
to
4f95b1b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good, great that you fix all my go noob mistakes (make()
etc.) 🙌😅
You mention two deprecated events, afaict Playwright doesn't handle these events either so should not be necessary to support Tom's script. I'd suggest running the equivalent of Tom's script in Playwright (I did that a lot with other scripts, analyzing the CDP traffic differences with xk6-browser).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! This does indeed work much better with the Vistaprint script. It's a shame it's difficult to add a test for this, as we could really use some. Maybe some smaller unit tests for specific method changes could be added?
Although I'm not sure it fixes the Cannot find context with specified id
issue, since it failed for me with that error:
DEBU[0106] sid:85A2399926F5DB8BFC14DD8BB958EE64 tid:CA8F61C073936905D509C09E1BDA2141 wsURL:"ws://127.0.0.1:33215/devtools/browser/e750ffc1-e59e-4dd6-8e2c-5808bc23e6bb", msg err:Cannot find context with specified id (-32000) category="Connection:send" elapsed="0 ms" goroutine=117
ERRO[0106] unable to evaluate expression: Cannot find context with specified id (-32000)
running at reflect.methodValueCall (native)
default at file:///home/ivan/Downloads/tom-12nov-original.js:198:26(9)
at go.k6.io/k6/js/common.Bind.func1 (native)
at customerReviewed (file:///home/ivan/Downloads/tom-12nov-original.js:195:17(34))
at file:///home/ivan/Downloads/tom-12nov-original.js:23:19(26) executor=per-vu-iterations scenario=default source=stacktrace
I'll make another pass or two of this soon, just left a few comments and suggestions for now.
Also, please try to leave large logging changes for separate PRs, since they clutter the purpose of a "fix" PR. Refactorings are mostly fine if they're relevant to the fix, but otherwise those also belong in separate PRs. I mostly review commit by commit anyway, but it's difficult to see the final picture if the code is changed in the final diff.
976a25d
to
b8fee6a
Compare
b8fee6a
to
5c75a5f
Compare
Also refactor getting a context with a world name.
Provide an evaluateOptions type for clarifying the options when calling evaluate on an execution context. Also for ease of debugging.
Sleeping will last for a short amount of time as setting execution contexts happens pretty fast. I don't think sleeping here will be a problem, but I have a solution for not sleeping here as well. But I'm not sure whether it's necessary, as it can complicate this code.
088ce24
to
8138da4
Compare
8138da4
to
f074351
Compare
This log is faulty because sometimes pendingDocument is nil. I don't think adding an extra check in the log will help us. I added it before for convenience.
Browser close happens and then we keep processing events from CDP. Instead of doing that, now we first try to kill the browser process and immediately tell CDP to close the browser. This way CDP stops emitting us more events. We also skip errors from FrameSession if browser is already closed.
dfc56c9
to
77949eb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely done, this works much better than on main
from my tests.
Just left some minor things and nitpicks.
common/frame.go
Outdated
ec := f.executionContexts[mainWorld] | ||
handle, err = ec.EvaluateHandle(f.ctx, pageFunc, args...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too we could use a nil check.
Since this is commonly used, let's add a Frame.getContext(world)
method, and Frame.hasContext(world)
could call it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Yeah, but we'll still need to check for nil and type the same errors. I'm not sure what benefit getContext
will provide us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what benefit
getContext
will provide us.
Encapsulate the usage of f.executionContextMu
. But yeah, we'd still need to check for nil.
This is not a blocker for me, just a suggestion.
Resolves #129 (comment) Resolves #129 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. It seems we hit #161 in CI.
Were you able to reproduce the issues reported on the forum? It would be great if we could add E2E tests for them in a follow-up PR, and probably do the E2E/example tests split at that point as well, which Robin suggested in a PR a few weeks ago.
😒 Any ideas on reproducing these errors in an E2E test without using an external website? I’d appreciate your help because I couldn’t reproduce these problems without using external websites.
Yes, I used the website the user tests on. |
Ah, it's the same situation as with Vistaprint. 😞 I hoped some of these could be easily replicated. We'd need to spend time reverse engineering each site and building a minimal test case that reproduced it, which would vary in complexity/effort. Alternatively, we could try doing a WARC dump using something like ArchiveBox, and then testing against that. It probably won't work for the more complex cases, but should for a lot of them, and it would avoid spending time on re-implementing each scenario ourselves. Or by using some recording/replay proxy, maybe warcprox, though it doesn't inspire much confidence. Actually, pywb has recording capabilities as well and seems more reliable. This seems like the way to go. Failing all of the above, we could just test against the sites themselves, and configure CI to run these jobs on, say, PR merge and release only, to minimize any impact it could have on them. This could of course break at any point, and we'd have to keep updating the tests, so probably not worthwhile. 😄 |
Thanks 😄 It seems like a lot of dead-ends. I can try WARC dump and then keep cleaning up the dump until the error doesn't fade away. Maybe we can have a reproducible result and then create a Go/JS test for it 🤷 😄 |
Not really, I think pywb looks promising. I'll create an issue. |
So pywb -> Clean up the dump -> Go/Js test? |
I don't think any cleanup would be needed, we could load the archive as is. But let's discuss that in the issue whenever we start looking into this. I can't say for sure what the process is without giving it a try. :) |
Resolves #129 (comment) Resolves #129 (comment)
Update (2021-12-23)
I added a few more bug fixes after receiving another script that fails with this error (from a user in our community). Now we no longer try to attach a frame or worker if the browser is already closed. These commits solved the user's issues: 77949eb and a3aea00.
Update (2021-12-20)
This PR only focuses on fixing and gracefully handling "cannot create frame session" problem in #125 with this afe4550 commit. Later on, within other commits, it fixes related data race issues, makes some refactorings, and cleans things up.
The problem was about the propagation of #49 into this one. We throw an error when we cannot find a context with a given id while a new
FrameSession
is trying to boot up. So the extension is already in a terminating state butFrameSession
doesn't know about it. This PR preventsFrameSession
to rethrow if the context was canceled and instead log.However, it's not focusing on fixing #49 as it's a whole world of a different problem, and we don't have a good fix for it yet (see: 6382e42 and d4e3900).
First Update
In this PR:
Skipped creating frame sessions for frames with empty URLs (such as about:blank)With these, this PR seemed to fix the problems to some extent:
clickBusinessCards
—16% of the whole scriptcustomerReviewed
group—%67 of the whole scriptI discovered some reasons why it still fails.
I noticed we receive deprecated events like:
Page.frameScheduledNavigation
Page.frameClearedScheduledNavigation
So, I believe the last steps don't work because of these and we can create new issues for handling them. So I let them return as errors:
Related: #19