fix(web): enhanced timer for prediction algorithm #7037
Conversation
User Test Results: User tests are not required.
I've re-run the lm-worker test multiple times now; the unit test addressed by these changes hasn't failed yet in at least 5 attempts. ...not that this prevented BrowserStack from failing to run against one of the configured test browser configs on some of those attempts. 😞
This comment (#7036 (comment)) on the base branch also occurs here, including for the LMLayer's in-browser tests.

It's a functional change internally. I'd like to see some user tests.
I think I understand what this is doing. I don't have enough context to understand why this would be fixing the inconsistency in predictions, though. So if you can add extra context comments in the code, that'd be great.
@@ -582,17 +644,22 @@ namespace correction {
    let batcher = new BatchingAssistant();
    const timer = new ExecutionTimer(maxTime*3, maxTime);
Why maxTime*3 vs maxTime? Is that likely to be a constant multiplier? If so, we should build that into the class, I should think.
... because I needed to set the max true-system-time threshold to something larger than the max execution time, and something based on the max execution time seemed to make the most sense. I just picked '3' for the coefficient because there'd be nearly no chance of that being too short for the execution threshold to be exceeded.

Could probably be maxTime*2 pretty safely, and I could entertain arguments for other values. Maybe even 1.5; we don't expect heavy "thrashing" behavior from constant OS context switches.

The key interpretation: we're willing to wait up to 0.1 sec for predictions if execution gets delayed due to context switches, but in the common case (no context switches) we should only need to wait 0.033 sec. That 0.033 sec value has long been set via a constant.

Using the * [coefficient] approach will dynamically adjust that maximum wait upward if we're given more execution time, but for the common case, there should be no observable effect.
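For concreteness, here's roughly how I picture the two thresholds interacting. This is an illustrative sketch only; the parameter names and internals are my assumptions, not the actual ExecutionTimer implementation:

```typescript
// Illustrative sketch; names and internals are assumptions, not the
// actual ExecutionTimer code from this PR.
class TwoThresholdTimerSketch {
  private readonly start = Date.now();
  private excludedTime = 0; // total time attributed to suspected context switches

  constructor(
    private readonly trueTimeLimit: number,     // e.g. maxTime * 3 (wall-clock cap, ~100ms)
    private readonly executionTimeLimit: number // e.g. maxTime (on-processor cap, ~33ms)
  ) {}

  // Called when an interval gets flagged as a context-switch outlier.
  excludeInterval(ms: number): void {
    this.excludedTime += ms;
  }

  // "Execution time" = wall-clock time minus excluded outlier intervals.
  get executionTime(): number {
    return (Date.now() - this.start) - this.excludedTime;
  }

  // The search halts when either budget is exhausted: normally the
  // execution budget, with the wall-clock cap bounding total user-facing
  // wait if context switches keep stretching things out.
  shouldStop(): boolean {
    const trueTime = Date.now() - this.start;
    return this.executionTime >= this.executionTimeLimit
        || trueTime >= this.trueTimeLimit;
  }
}
```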
OK. The numbers are fairly off-the-cuff, but we need to start somewhere. Measuring and testing on real low-end Android devices may be helpful.
* Base assumption: OS context switches generally last at least 16ms. (Based on
* a window.setTimeout() usually not evaluating for at least
* that long, even if set to 1ms.)
Basis: try running this code in your browser's developer console a few times:

// Logs how long a setTimeout scheduled for 1ms actually takes to fire.
var now = Date.now(); var timer = window.setTimeout(() => {console.log(Date.now() - now)}, 1);
Of course, keep in mind that this code will be triggered via user-input interrupt, so it can easily be triggered while the relevant thread would otherwise be asleep, possibly in the middle of whatever timer mechanism the OS uses to periodically context-switch, "off the beat" in a sense. As a result, you will see notably lower values than 16ms... but you'll likely also see some higher values.
Example runs on Win 10 / Firefox: [console output screenshot]
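To see the spread directly, a looped variant of the probe (my own variation, not from the thread) can log several samples back-to-back:

```typescript
// Variation on the one-liner above: collect several samples of how long
// a setTimeout(..., 1) actually takes to fire, to see the spread of delays.
function sampleTimeoutDelay(remaining: number): void {
  if (remaining <= 0) {
    return;
  }
  const start = Date.now();
  window.setTimeout(() => {
    console.log(`setTimeout(1) fired after ${Date.now() - start}ms`);
    sampleTimeoutDelay(remaining - 1);
  }, 1);
}

sampleTimeoutDelay(10); // expect mostly small values, with occasional spikes
```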
If a 23ms context-switch pause occurred during a prediction search, with no other interruptions... that'd cut execution time from 33ms to just 10ms without the special handling added here. If it were paused even longer, well... good luck operating with literally 0ms allotted execution time. As noted by #5467 / #5479, it can happen with some regularity.
While we don't know the exact conditions of the BrowserStack servers, devices, etc used for unit testing... I wouldn't be surprised if they have at least a few VMs set up per server; if so, there's a pretty solid chance of significant context-switching happening on those systems.
https://developer.mozilla.org/en-US/docs/Web/API/setTimeout#reasons_for_delays_longer_than_specified may also be helpful background.
On Windows, this may also be relevant:
The resolution of the GetTickCount function is limited to the resolution of the system timer, which is typically in the range of 10 milliseconds to 16 milliseconds. The resolution of the GetTickCount function is not affected by adjustments made by the GetSystemTimeAdjustment function.
I've got no idea what sort of behaviour we are going to see on mobile devices for these scenarios. Sounds like some testing on low-end devices is seriously needed.
What did you have in mind for this? Do we have access to suitable devices for such tests? I assume that we'd need physical devices to test this; after all, this PR's changes are aimed at handling OS behavior that'd be very difficult to emulate directly. Or... is that why it's been given approval - for a 'test in the wild' on alpha? Not the sort of thing we'd usually do, so unless explicitly stated otherwise, I'll avoid merging for now.
Yeah, I am struggling with this. We do have a couple of old devices in the Keyman office there, don't we?

I'm not sure what else we can do here. So if possible, test on those old devices in the office there. And then go ahead and merge.
(Of course, the alternative is to avoid timing-based approaches altogether and go with a 'number of iterations' approach, where we determine a suitable number based on empirical testing of what would be a typically acceptable response time on some old devices. And then, as devices get faster over time, we can gradually raise the threshold. A sketch of that alternative follows.)
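A rough sketch of what that iteration-capped alternative might look like; the cap value, names, and structure here are placeholders, not anything from the PR:

```typescript
// Hypothetical sketch of the iteration-capped alternative: no timing at
// all; the search simply runs a fixed number of steps. The cap would be
// tuned empirically against acceptable response times on old devices.
const MAX_SEARCH_ITERATIONS = 500; // placeholder; determined by device testing

// `step` advances the correction search by one iteration and returns
// true once the search has produced its results.
function runBoundedSearch(step: () => boolean): void {
  for (let i = 0; i < MAX_SEARCH_ITERATIONS; i++) {
    if (step()) {
      return; // completed within the iteration budget
    }
  }
  // Budget exhausted; the search would surface its best results so far.
}
```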
Changes in this pull request will be available for download in Keyman version 16.0.60-alpha
Hopefully addresses some aspects of #6985.
This PR is primarily designed to address some CI unit-testing stability issues. Since our browser-based unit testing is handled via an external service, it's quite possible that our unit tests run alongside others in a manner that causes context switching. Noting that we've had issues arise from context switching before - see #5467 / #5479 - some of our prediction instability may well arise from the same underlying problem; a user often does have background threads and processes running on their mobile device, after all.
Given the similarity between this and intermittent failures where predictive text produces no results for input that otherwise yields suggestions, this may, if my suspicions are correct, actually help mitigate some aspects of #6985.
The inner class ExecutionTimer here is an attempt to determine the predictive-text search's active execution time (that is, "actual time on the processor") by detecting possible context-switching time intervals and ignoring them. If the longest interval observed at any point is at least twice as long as the next two longest intervals observed, we'll call that an outlier. "Zero-length intervals" will also be ignored, since otherwise even a 1ms interval would automatically be an outlier.

Note that excluded outliers aren't considered for any future outliers. The expectation/assumption is that each measured loop iteration should take a consistent amount of time... unless affected by context switching. To facilitate this, I've set things up for the timer (as used) to avoid tracking time in the search's yield state.

@keymanapp-test-bot skip
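To make the outlier rule above concrete, here's a minimal sketch under my reading of it; the interval-list structure and function names are illustrative only, not the PR's actual ExecutionTimer internals:

```typescript
// Minimal sketch of the outlier rule described above; illustrative only.
function longestIsOutlier(intervals: number[]): boolean {
  // Zero-length intervals are ignored entirely; otherwise even a 1ms
  // interval would automatically register as an outlier.
  const nonZero = intervals.filter(t => t > 0);
  if (nonZero.length < 3) {
    return false; // not enough data yet to judge the longest interval
  }

  // Sort longest-first, then compare the longest against the next two.
  const [longest, second, third] = [...nonZero].sort((a, b) => b - a);
  return longest >= 2 * second && longest >= 2 * third;
}

// How a caller might use it: returns the time this interval contributes
// to the execution-time tally. An interval flagged as an outlier
// contributes nothing and is dropped from the list, so excluded outliers
// can't influence future outlier checks.
function recordInterval(intervals: number[], interval: number): number {
  intervals.push(interval);
  // Only the longest-so-far interval can be flagged as an outlier.
  if (interval === Math.max(...intervals) && longestIsOutlier(intervals)) {
    intervals.splice(intervals.indexOf(interval), 1);
    return 0;
  }
  return interval;
}
```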
I think we should be fine without user testing... though I could see an argument for a mild bit of user testing for predictive text in the mobile apps if someone wanted to push for it.