-
Notifications
You must be signed in to change notification settings - Fork 43
Get DevTools URL from data dir file instead of by parsing stdout #563
Conversation
This is slightly more elegant than parsing stdout, but more importantly, shouldn't cause race conditions where we miss the "DevTools listening on" line, since it doesn't start a goroutine. A drawback is that it could take slightly longer to return the URL because it's reading from disk and does several read attempts with delays, but in my tests the difference is negligible. In total, it allots 500ms to read the file, which should hopefully be plenty even on IO-starved machines (VMs, CI, etc.). IMO if this takes more time than that then it's likely the environment will not be fit for running a test. Fixes #491
f *os.File | ||
fpath = filepath.Join(dataDir, "DevToolsActivePort") | ||
maxReadAttempts = 10 | ||
readAttemptDelayMS = 50 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, test-tip
failed with panic: launching browser: getting DevTools URL: unable to read file "/tmp/xk6-browser-data-2308635430/DevToolsActivePort" in 500ms
. 😞
I guess we could just increase it to 5s total to be safe (so 100 attempts)? Even 1 or 2s might be too low for some machines.
From this line I gather that currently it is thought that the reason why xk6-browser can't find the port to connect to is because:
|
@imiric We can have the best of both worlds:
This way, we can get the URL faster, and only if it fails can we try to get it from the file. WDYT? |
@mstoykov That's the theory, but it wasn't confirmed since this issue is so difficult to reproduce. It could also be that it just takes more than 30s to start the browser process, which is very excessive, but not out of the question on VMs with bursty/unreliable IO like in CI or EC2. In that case, this won't help and we should increase the number of attempts/timeout, but I would still argue this approach is cleaner than parsing stdout. |
Is there something else reading the output? Because if it isn't - there is no way that we can miss it. If anything the current code should continue to read the output as otherwise the buffer used to save it (in the OS) might overfill. This likely isn't a problem with the current usage where (from my observations) the browser does not output more then 3-4 lines at the very beginning of the execution. So it seems to me that
I meant to open an issue ot log the actual output of the browser 🤔 Maybe that will help with diagnosing this? |
@inancgumus I'd rather have one, preferably simpler, way of doing this. Besides, I'm not certain whether the file is written before or after that line is printed to stdout, so reading the file might not make that big of a difference. At least it doesn't from my local tests. I'll try to confirm with |
var err error | ||
f, err = os.Open(fpath) //nolint:gosec // false positive, https://github.com/securego/gosec/issues/439 | ||
if errors.Is(err, os.ErrNotExist) { | ||
time.Sleep(time.Duration(readAttemptDelayMS) * time.Millisecond) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might we need to use a select statement here with time.Timer
and also watch for the context
being cancelled?
if err != nil { | ||
return nil, err | ||
} | ||
|
||
wsURL, err := parseDevToolsURL(ctx, stdout) | ||
wsURL, err := getDevToolsURL(dataDir.Dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the getDevToolsURL
was called prior to calling execute
(which will require some goroutine orchestration so that execute is only called once getDevToolsURL
has started waiting for the file) then we should avoid any possible race conditions, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, that could be a solution as well. Though a) we're not 100% sure this is caused by a race, see the discussion above, and b) starting a goroutine with proper orchestration and parsing stdout is arguably messier than just reading a file.
According to
Still, this approach could have a slight delay (~50ms), depending on when we succeed reading the file. Judging by the feedback, you all prefer the stdout parsing approach, and we should drop this? Since a race is unlikely as stdout would be buffered, as Mihail says, then the only explanation is that it truly does take >30s to start the browser in some cases. That's out of our hands to fix here, but maybe we can run on more stable and better performing EC2 instances to minimize the chances of this happening. |
I think it would be good to find a way to replicate this issue. Let me know if you need me to test any changes on my Windows laptop. |
This is slightly more elegant than parsing stdout, but more importantly, shouldn't cause race conditions where we miss the "DevTools listening on" line, since it doesn't start a goroutine.
A drawback is that it could take slightly longer to return the URL because it's reading from disk and does several read attempts with delays, but in my tests the difference is negligible. In total, it allots 500ms to read the file, which should hopefully be plenty even on IO-starved machines (VMs, CI, etc.). IMO if this takes more time than that then it's likely the environment will not be fit for running a test. But we can always increase it if you prefer.
I didn't add tests for it, since pretty much all our tests check this functionality indirectly, but let me know if I should add some.
Also note that I haven't confirmed this definitely fixes grafana/k6#4425, since I wasn't able to reproduce the issue in WSL2 (though did run into some others... 😓), and the issue is very rare on Linux/EC2/Cloud, but I'm confident it does, and we can make a Cloud release and test this thoroughly to have more assurance.