
fix(web): enhanced timer for prediction algorithm #7037

Merged
mcdurdin merged 3 commits into master from fix/web/pred-text-stability on Sep 9, 2022

Conversation

jahorton
Contributor

@jahorton jahorton commented Aug 4, 2022

Hopefully addresses some aspects of #6985.

This PR is primarily designed to address some CI unit testing stability issues. Since our browser-based unit testing is handled via an external service, it's quite possible that our unit tests may be run alongside others in a manner that causes context-switching. Noting that we've had issues arise from context switching before - see #5467 / #5479 - it's quite possible that some of our prediction instability is arising from the same underlying problem - a user often does have background threads and processes running on their mobile device, after all.

Noting the similarity between this and the intermittent failures where predictive text produces no results even though identical text otherwise produces suggestions: if my suspicions are correct, this may actually help mitigate some aspects of #6985.

The inner class ExecutionTimer here is an attempt to determine the predictive-text search's active execution time (that is, "actual time on the processor") by detecting possible context-switching time intervals and ignoring them. If the longest interval observed at any point is at least twice as long as the next two longest intervals combined, we'll call that an outlier. Zero-length intervals will also be ignored, since otherwise even a 1 ms interval would automatically be an outlier.

  • [1, 1, 4]: the '4' is considered an outlier
  • [1, 2, 4]: the '4' isn't an outlier - (1 + 2) * 2 > 4, after all.

Note that excluded outliers aren't considered when detecting future outliers. The expectation/assumption is that each measured loop iteration should take a consistent amount of time... unless affected by context switching. To facilitate this, I've set things up for the timer (as used) to avoid tracking time in the search's yield state.
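The outlier rule above can be sketched as a standalone helper (hypothetical function and names; this is not the actual ExecutionTimer code from the PR):

```typescript
/**
 * Hypothetical sketch of the outlier rule described above. Given the
 * measured interval durations (in ms), reports whether the longest one
 * should be treated as a context-switch outlier: it is flagged when it is
 * at least twice the combined length of the next two longest intervals.
 * Zero-length intervals are ignored so that a lone 1 ms interval is not
 * flagged automatically.
 */
function longestIsOutlier(intervals: number[]): boolean {
  // Sort descending, ignoring zero-length intervals.
  const sorted = intervals.filter(t => t > 0).sort((a, b) => b - a);
  if (sorted.length < 3) {
    return false; // not enough data to judge an outlier
  }
  const [longest, second, third] = sorted;
  return longest >= 2 * (second + third);
}

console.log(longestIsOutlier([1, 1, 4])); // true:  4 >= 2 * (1 + 1)
console.log(longestIsOutlier([1, 2, 4])); // false: 2 * (2 + 1) > 4
```

Sorting descending keeps the check order-independent, matching the "next two longest intervals observed" wording.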

@keymanapp-test-bot skip

I think we should be fine without user testing... though I could see an argument for a mild bit of user testing for predictive text in the mobile apps if someone wanted to push for it.

@jahorton jahorton added this to the A16S7 milestone Aug 4, 2022
@keymanapp-test-bot keymanapp-test-bot bot added the user-test-missing User tests have not yet been defined for the PR label Aug 4, 2022
@jahorton
Contributor Author

jahorton commented Aug 5, 2022

I've re-run the lm-worker test multiple times now - the unit test that's addressed by these changes hasn't failed yet out of at least 5 attempts.

...not that this prevented BrowserStack from failing to run against one of the configured test browser configs for some of those attempts. 😞

@mcdurdin mcdurdin modified the milestones: A16S7, A16S8 Aug 7, 2022
@jahorton jahorton marked this pull request as ready for review August 8, 2022 04:58
@jahorton
Contributor Author

jahorton commented Aug 8, 2022

This comment (#7036 (comment)) on the base branch also occurs here, including for the LMLayer's in-browser tests.

@mcdurdin
Member

mcdurdin commented Aug 8, 2022

I think we should be fine without user testing... though I could see an argument for a mild bit of user testing for predictive text in the mobile apps if someone wanted to push for it.

It's a functional change internally. I'd like to see some user tests.


@mcdurdin mcdurdin left a comment


I think I understand what this is doing. I don't have enough context to understand why this would be fixing the inconsistency in predictions, though. So if you can add extra context comments in the code, that'd be great.

common/web/lm-worker/src/correction/distance-modeler.ts (3 outdated, resolved review threads)
@@ -582,17 +644,22 @@ namespace correction {

let batcher = new BatchingAssistant();

const timer = new ExecutionTimer(maxTime*3, maxTime);
Member


Why maxTime*3 vs maxTime? Is that likely to be a constant multiplier? If so, we should build that into the class I should think.

Contributor Author

@jahorton jahorton Aug 9, 2022


... because I needed to set the max true-system-time threshold to something larger than the max execution time, and basing it on the max execution time seemed to make the most sense. I just picked '3' for the coefficient because there'd be nearly no chance of the wall-clock cap being too short for the execution threshold to be reached.

Could probably be maxTime*2 pretty safely, and I could entertain arguments for other values. Maybe even 1.5; we don't expect heavy "thrashing" behavior from constant OS context switches.

The key interpretation: we're willing to wait up to 0.1 sec for predictions if execution gets delayed due to context switches, but for the common case we should only need to wait 0.033 sec (when no context switches occur). That 0.033 sec value has long been set via a constant.

Using the * [coefficient] approach will dynamically adjust that maximum wait up if we're given more execution time, but for the common case, there should be no observable effect.
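As a sketch of that interpretation (hypothetical helper names; the real logic lives in ExecutionTimer in distance-modeler.ts):

```typescript
// Hypothetical sketch of the two budgets described above: the search stops
// once active (on-processor) time exceeds the execution budget, or once
// total wall-clock time exceeds that budget times the coefficient (3 in the
// PR as written), so a heavily context-switched search still terminates.
const TRUE_TIME_COEFFICIENT = 3; // off-the-cuff; 2 or even 1.5 might work as well

function shouldStop(activeMs: number, wallClockMs: number, maxExecutionMs: number): boolean {
  const maxTrueTimeMs = maxExecutionMs * TRUE_TIME_COEFFICIENT;
  return activeMs >= maxExecutionMs || wallClockMs >= maxTrueTimeMs;
}

console.log(shouldStop(10, 50, 33));  // false: within both budgets
console.log(shouldStop(10, 120, 33)); // true: wall-clock cap (99 ms) exceeded
```

With no context switching, active time and wall-clock time track each other, so only the execution budget matters; the wall-clock cap only bites when the thread is repeatedly preempted.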

Member


OK. The numbers are fairly off-the-cuff, but we need to start somewhere. Measuring and testing on real low-end Android devices may be helpful.

Comment on lines +617 to +619
* Base assumption: OS context switches generally last at least 16ms. (Based on
* a window.setTimeout() usually not evaluating for at least
* that long, even if set to 1ms.)
Contributor Author

@jahorton jahorton Aug 9, 2022


Basis: try running this code in your browser's developer mode a few times:

var now = Date.now();
var timer = window.setTimeout(() => { console.log(Date.now() - now); }, 1);

Of course, keep in mind that this code will be triggered via user-input interrupt, so it can easily be triggered while the relevant thread would otherwise be asleep, possibly in the middle of whatever timer mechanism the OS uses to periodically context-switch, "off the beat" in a sense. As a result, you will see notably lower values than 16ms... but you'll likely also see some higher values.

Example runs on Win 10 / Firefox:

[screenshot: console output from several runs, showing observed delays both below and above 16ms]
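For repeated measurements, the one-liner above can be wrapped in a loop; this sketch also runs under Node, where setTimeout exists without the window prefix:

```typescript
// Measures how long a nominal 1 ms setTimeout actually takes to fire.
// Observed values vary with OS timer resolution and scheduling, which is
// what motivates the ~16 ms context-switch assumption discussed above.
function measureTimeoutDelay(): Promise<number> {
  const start = Date.now();
  return new Promise<number>(resolve => {
    setTimeout(() => resolve(Date.now() - start), 1);
  });
}

async function main() {
  for (let i = 0; i < 5; i++) {
    console.log(`observed delay: ${await measureTimeoutDelay()} ms`);
  }
}
main();
```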

If a 23ms context-switch pause occurred during a prediction search, with no other interruptions... that'd cut execution time from 33ms to just 10ms without the special handling added here. If it were paused even longer, well... good luck operating with literally 0ms allotted execution time. As noted by #5467 / #5479, it can happen with some regularity.

While we don't know the exact conditions of the BrowserStack servers, devices, etc used for unit testing... I wouldn't be surprised if they have at least a few VMs set up per server; if so, there's a pretty solid chance of significant context-switching happening on those systems.

Member


https://developer.mozilla.org/en-US/docs/Web/API/setTimeout#reasons_for_delays_longer_than_specified may also be helpful background.

On Windows, this may also be relevant:

The resolution of the GetTickCount function is limited to the resolution of the system timer, which is typically in the range of 10 milliseconds to 16 milliseconds. The resolution of the GetTickCount function is not affected by adjustments made by the GetSystemTimeAdjustment function.

I've got no idea what sort of behaviour we are going to see on mobile devices for these scenarios. Sounds like some testing on low-end devices is seriously needed.

@jahorton
Contributor Author

I've got no idea what sort of behaviour we are going to see on mobile devices for these scenarios. Sounds like some testing on low-end devices is seriously needed.

What did you have in mind for this? Do we have access to suitable devices for such tests? I assume that we'd need physical devices to test this; after all, this PR's changes are aimed at handling OS behavior that'd be very difficult to emulate directly.

Or... is that why it's been given approval - for a 'test in the wild' on alpha? Not the sort of thing we'd usually do, so unless explicitly stated otherwise, I'll avoid merging for now.

Base automatically changed from fix/web/ui-module-test-stability to master August 16, 2022 02:56
@mcdurdin
Member

What did you have in mind for this? Do we have access to suitable devices for such tests? I assume that we'd need physical devices to test this; after all, this PR's changes are aimed at handling OS behavior that'd be very difficult to emulate directly.

Yeah, I am struggling with this. We do have a couple of old devices in the Keyman office there, don't we?

Or... is that why it's been given approval - for a 'test in the wild' on alpha? Not the sort of thing we'd usually do, so unless explicitly stated otherwise, I'll avoid merging for now.

I'm not sure what else we can do here. So if possible, test on those old devices in the office there. And then go ahead and merge.

@mcdurdin
Member

Of course, the alternative is to avoid timing-based approaches altogether and go with a 'number of iterations' approach, where we determine a suitable number based on empirical testing of what would be a typically acceptable response time on some old devices. (And then, as devices get faster over time, we can gradually raise the threshold.)
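A sketch of that alternative (hypothetical names and budget value; the real number would come from the empirical testing described):

```typescript
// Iteration-count budgeting: immune to context-switch timing skew, since it
// counts units of search work rather than elapsed time.
const ITERATION_BUDGET = 10000; // hypothetical; to be calibrated on low-end devices

// `step` performs one unit of search work and returns false once exhausted.
function runBudgetedSearch(step: () => boolean): number {
  let iterations = 0;
  while (iterations < ITERATION_BUDGET && step()) {
    iterations++;
  }
  return iterations; // number of iterations actually performed
}
```

A slower device simply spends more wall-clock time on the same budget, trading response time for a consistent search depth across devices.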

@mcdurdin mcdurdin added this to the A16S9 milestone Aug 19, 2022
@mcdurdin mcdurdin modified the milestones: A16S9, A16S10 Sep 5, 2022
@mcdurdin mcdurdin merged commit 13a3df3 into master Sep 9, 2022
@mcdurdin mcdurdin deleted the fix/web/pred-text-stability branch September 9, 2022 19:45
@keyman-server
Collaborator

Changes in this pull request will be available for download in Keyman version 16.0.60-alpha
