Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

autocomplete: Add "recent senders criterion" to user-mention autocomplete #828

Merged

Conversation

sm-sayedi
Copy link
Collaborator

@sm-sayedi sm-sayedi commented Jul 19, 2024

Besides recent activity in DMs, activity in the current topic/stream is also considered (as the first criterion) when suggesting users in @-mention autocomplete.

Note: This is the third PR in the series of PRs #608 is divided into, coming after #692. Next PR in the series: #849.

Fixes part of: #228

@sm-sayedi sm-sayedi added the integration review Added by maintainers when PR may be ready for integration label Jul 19, 2024
@sm-sayedi sm-sayedi requested a review from gnprice July 19, 2024 20:28
Copy link
Member

@gnprice gnprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sm-sayedi for all your work on this!

For the model code, small comments below.

Then in the test code, please do a new self-review of your changes, in light of the comments on #692 and the commits I added to that branch. In particular look to:

  • make each test case compact so it's easy to scan through them — in this case there are already helpers that go a long way, but the formatting can be adjusted for more compactness

  • think of all the different scenarios you can that should be tested:

    • scenarios where the code's behavior should vary
    • scenarios where, given the way you did write the code, the path taken through the control flow will vary
    • possible bugs that an implementation of this could have if someone made a plausible mistake, or if part of your code were accidentally deleted or disabled, and scenarios that would reproduce those bugs

    and make sure that for each scenario, there's a test that covers it and where the reader can easily confirm that it does so.

/// If [a] is non-null and [b] is null, returns a positive number.
/// If both are null, returns zero.
@visibleForTesting
static int compareNullable(int? a, int? b) => switch ((a, b)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the name of this, I can't tell whether null comes before all the integers, or after all of them, or something else (equal to zero?). That makes it hard to tell whether or why this function is appropriate to use in any given context.

The name should be something that indicates that, and the summary line in the doc should indicate it too.

It could be something direct about the behavior: like compareNullBeforeInt, or compareNullAsLeast, or something like that.

Or maybe better is to give a name that reflects why it is that two methods on this class both want it: the point is that these aren't just any ints, but specifically are the greatest message IDs in two sets. When the set of message IDs is empty, its value is mathematically $-\infty$, negative infinity; we're representing that as null but it should therefore be treated as more negative than any number.

So the name could be compareRecentMessageIds. The doc can say:

/// Compares [a] to [b], with null less than all integers.
///
/// The values should represent the most recent message ID in each of two
/// sets of messages, with null meaning the set is empty.
///
/// Return values are as with [Comparable.compareTo].

And that's it — the summary line, plus the reference to [Comparable.compareTo] to resolve any ambiguity about what "compare" means, completely specifies the behavior so there's no need to transcribe the implementation into English.

Comment on lines 280 to 281
static int compareNullable(int? a, int? b) => switch ((a, b)) {
(int a, int b) => a.compareTo(b),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: block body, like at #692 (comment)

StreamNarrow() => (narrow.streamId, null),
TopicNarrow() => (narrow.streamId, narrow.topic),
// The CombinedFeedNarrow case should be impossible anyway.
DmNarrow() || CombinedFeedNarrow() => (null, null),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move the assert above about CombinedFeedNarrow to inside here ­— that way we're splitting up the cases for narrow's type in just one place.

To make that comfortable, this will probably want to become a switch statement instead of expression.

Comment on lines 216 to 218
if (result != 0) {
return result;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: on one line, same way as below

Comment on lines 242 to 247
final aMessageId = recentSenders.latestMessageIdOfSenderInTopic(
streamId: streamId, topic: topic, senderId: userA.userId);
final bMessageId = recentSenders.latestMessageIdOfSenderInTopic(
streamId: streamId, topic: topic, senderId: userB.userId);

final result = -compareNullable(aMessageId, bMessageId);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think this is a little easier to read if these first two variables are inlined:

Suggested change
final aMessageId = recentSenders.latestMessageIdOfSenderInTopic(
streamId: streamId, topic: topic, senderId: userA.userId);
final bMessageId = recentSenders.latestMessageIdOfSenderInTopic(
streamId: streamId, topic: topic, senderId: userB.userId);
final result = -compareNullable(aMessageId, bMessageId);
final result = -compareNullable(
recentSenders.latestMessageIdOfSenderInTopic(
streamId: streamId, topic: topic, senderId: userA.userId),
recentSenders.latestMessageIdOfSenderInTopic(
streamId: streamId, topic: topic, senderId: userB.userId));

Not really any denser that way, and there's a little fewer names flying around to have to track.

@sm-sayedi
Copy link
Collaborator Author

Thank you for the review @gnprice. Changes pushed. Please have a look.

@sm-sayedi sm-sayedi requested a review from gnprice July 24, 2024 12:24
sm-sayedi and others added 20 commits July 24, 2024 16:45
In @-mention autocomplete, users are suggested based on:
  1. Recent activity in the current topic/stream.
  2. Recent DM conversations.

Note: By "recent activity" we mean recent messages sent to a topic/stream.

Fixes part of: zulip#228
I think this is a bit more idiomatic than setting a record which we
promptly destructure; and it doesn't come out any longer.
Without this, the method's other unit tests (the other tests in this
group) pass even if the method were to ignore its `topic` parameter
and only use the per-stream data.
The full names of these tests were like this:

  MentionAutocompleteView sorting users results MentionAutocompleteView.compareRecentMessageIds both a and b are non-null
  MentionAutocompleteView sorting users results MentionAutocompleteView.compareRecentMessageIds one of a and b is null
  MentionAutocompleteView sorting users results MentionAutocompleteView.compareRecentMessageIds both of a and b are null
  MentionAutocompleteView sorting users results MentionAutocompleteView.compareByRecency favor user most recent in topic

To see the names, one can run a command like:

  $ flutter test test/model/autocomplete_test.dart -r expanded

We can cut the second mention of MentionAutocompleteView.
…rrow

If for example one substitutes some other narrow like `topicNarrow` in
this `checkResultsIn` call, the test still passes.  That's because now
the autocomplete does return results... but they don't match the
arbitrary `expected` list that was passed in, so the inner `check`
call fails, and the outer one succeeds because of that exception.

Instead, check more specifically that the MentionAutocompleteView.init
call throws.
The [List.sort] method is documented as being not necessarily stable,
meaning that elements that compare equal could end up in an arbitrary
order in the result.  Here, that means that users which aren't
distinguished by recency in the topic or stream or in DM conversations
can appear in any order in the autocomplete results.

Several of these test cases were therefore vulnerable to flaking
because users 2 and 4 have no DMs and (in these tests) no messages
in the stream or topic.  I think what they cover is now also covered
by the "ranking across signals" tests above, together with the tests
for RecentSenders and RecentDmConversations that check those data
structures' handling of events.  So just cut the tests.
It's good to have an end-to-end test here that the actual autocomplete
results reflect the sorting that's tested by all the unit tests in the
rest of this file: a test that makes a MentionAutocompleteView, then
gives it a query, and inspects the results.

The several such test cases that were here, though, don't exercise any
distinct scenarios when it comes to that end-to-end logic; they differ
from each other only in ways that are exercised by the unit tests
above, plus the tests for RecentSenders and RecentDmConversations that
check those data structures' handling of events.  So pick just one of
these test cases and cut the rest.
Now that many of these helpers have just one call site, they can
be simplified.
Zulip user IDs (like channel/stream IDs, message IDs, and so on)
are positive integers.  So avoiding zero helps keep the test data
realistic.
…nd test

This is nearly NFC, but also adds a `dispose` call to tidy things up
after getting the results.
…test

This way it's possible to look at the expected results at the end of
this test and compare them with the test data that leads to those
results, without scanning past a lot of other code that isn't related
to why these particular results are the right answers.
This test was relying on its `topic` variable (whose value is
"topic") being different from the topic chosen by default by
eg.streamMessage.  It does happen to be different, but it could
just as well not have been.  Because the difference matters --
the test would fail if they happened to be the same -- the test
case itself should be explicit about that.
@gnprice gnprice force-pushed the issue-228-@-mention-recent-senders-criterion branch from b816843 to 3444a68 Compare July 25, 2024 04:58
Copy link
Member

@gnprice gnprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the revision!

This didn't quite get the tests to the kind of tests that I was hoping for. I decided the best way this time to communicate that would be by demonstration — so I wound up spending some time today writing tests of the kind I want to see, and revising these tests in that direction. I'll push those changes to the tip of the PR branch.

For each idea that might otherwise have been a separate comment on the PR, I made a separate commit. As a result this ended up being a longer series of separate commits added to the tip of the branch than the number of commits would have been if I were writing the same tests from scratch:

adeb012 autocomplete [nfc]: Format second half of compareByRecency same way as first
a0d0248 autocomplete [nfc]: Set streamId and topic as variables directly
3c7096f autocomplete test [nfc]: Tighten formatting in compareByRecency tests
a5b3cc8 autocomplete test: Check both directions in compareByRecency tests
6e72a5b autocomplete test [nfc]: Tighten names of compareByRecency tests
d91ecca autocomplete test: Check compareByRecency uses per-topic recency
72caacf autocomplete test [nfc]: Deduplicate in test names
4762e4e autocomplete [nfc]: Expose debugCompareUsers, for testing
bfce142 autocomplete test: Unit tests of between-signals logic, for each narrow type
37e643d autocomplete test: Fix test that almost can't fail, of CombinedFeedNarrow
3823ec4 autocomplete test: Cut tests that could flake due to indeterminate sort
a14ea8b autocomplete test: Highlight one end-to-end test, cut the others
1e781dd autocomplete test [nfc]: Do constant-folding in end-to-end test
739fa84 autocomplete test: Fix user IDs of zero
2b8d848 autocomplete test: Separate getting results from checking in end-to-end test
0288ee4 autocomplete test: Test ranking applies in presence of filters, too
b436fb5 autocomplete test [nfc]: Bring details next to details in end-to-end test
363dafa autocomplete test [nfc]: Cut unneeded MessageListView in end-to-end test
3444a68 autocomplete test: Make distinct topic explicitly distinct

Please take a look through each of those commits, including the commit messages, to see the points I had in mind. I also left one GitHub comment below, for an aspect that the commits don't go into full detail on.

With that, this code all looks good; and I also did some quick manual testing and confirms it works end-to-end in the app. So, merging! Thanks again @sm-sayedi for all your work building this feature #228, and @rajveermalviya @hackerkid @chrisbobbe for the previous reviews on this series of PRs.

Comment on lines 634 to 636
test('DmNarrow, with no topic/stream message history', () async {
await prepareStore();
await checkResultsIn(dmNarrow, expected: [3, 0, 1, 2, 4]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to interpret most of the test cases in this "suggests relevant users" group — for example, why is 3, 0, 1, 2, 4 the correct result to expect here, and why is 0, 3, 1, 2, 4 the correct result to expect in some of the tests above? The details that cause the right answers to be those lists and not something else are separated from here by about 100 lines of code, and they require some careful working-through to verify.

In general it's best for a test to be self-contained — all the key details that tell the story of the test, and determine why one value is expected instead of another, live in the test case's own source code, or failing that very close above it. That way

  • a reader can look at the test and confirm it's checking for the right answer instead of a wrong answer;
  • a reader can look at the test and understand what it's testing, and look at the ensemble of tests to think about things that aren't yet tested;
  • in the future when changing something about the code under test and the test breaks, a reader can understand what the test was trying to say so they can decide whether the failure signals a bug in their change or is an intended effect of their change, and so they can update the test if necessary so it continues to check what it was intended to check.

We use a lot of test helper functions to enable test cases to be concise and focused, because that also aids in all three of the points above. But when doing so, the key is to move out the boring details (the things that can be summarized as the helpers just doing things that ought to be done), and keep in the details that are part of the test's story, or that are necessary in order to make sense of other details in the test. So for example as long as this ranking [3, 0, 1, 2, 4] is here, the details of messages and dmConversations that give rise to it should be here or near here too.

@gnprice gnprice merged commit 3444a68 into zulip:main Jul 25, 2024
1 check passed
@sm-sayedi
Copy link
Collaborator Author

Thank you @gnprice for your time and patience in improving the PR. I read through all the commits and learned a lot. I believe this will improve my limited ability to write good tests.

gnprice added a commit to gnprice/zulip-flutter that referenced this pull request Aug 10, 2024
Zulip user IDs are positive integers.  So avoiding zero here
helps keep the test data realistic.

This follows up on 739fa84 (in zulip#828), which made a similar fix
for one test case.
chrisbobbe pushed a commit that referenced this pull request Aug 12, 2024
Zulip user IDs are positive integers.  So avoiding zero here
helps keep the test data realistic.

This follows up on 739fa84 (in #828), which made a similar fix
for one test case.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integration review Added by maintainers when PR may be ready for integration
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants