Tracking: Pagination #2449
Hi! I've been playing with dropshot since @sunshowers mentioned it to me. I'm generally positive about it, but the pagination interface feels a bit awkward to me. I suspect that you had a different, higher-level approach in mind from what I want to do when this was designed, and that I may be running into that impedance mismatch! I wanted to share it here where it appears that you're making design considerations. If this is not the best place to share this, I apologize in advance! Note that I'm not currently using an openapi-generated client to access this endpoint, which may be a factor in my impressions of this interface.

The context for my application is that I have an endpoint that performs a sqlite full text query with a simple `q` search parameter. In addition to the list of things you've collected above, a couple of things that feel surprising/awkward to me are:
$ curl 'http://192.168.1.92:4242/search/songs?page=eyJ2IjoidjEiLCJwYWdlX3N0YXJ0Ijp7InEiOiJsb3ZlIiwib2Zmc2V0IjozfX0=' -s | jq .
{
  "request_id": "ea77c161-5b23-49ed-9716-79ff2ccf6609",
  "message": "unable to parse query string: missing field `q`"
}

For context/clarity, here are my type definitions:

#[derive(Debug, Deserialize, JsonSchema)]
struct SongSearchParams {
    q: String,
}

#[derive(Debug, Deserialize, Serialize)]
struct SongSearchPage {
    q: String,
    offset: usize,
}

#[endpoint {
    method = GET,
    path = "/search/songs"
}]
async fn search_songs(
    request_context: RequestContext<ServerState>,
    query: Query<PaginationParams<SongSearchParams, SongSearchPage>>,
) -> Result<HttpResponseOk<ResultsPage<SongMetadata>>, HttpError>;

I'm not sure that there is benefit to me in trying to conform to the pagination interface as it stands now; the bifurcation of the query logic around the two parameter structs feels awkward compared to what I'd naturally write:

#[derive(Debug, Deserialize, JsonSchema)]
struct SongSearchParams {
    q: String,
    limit: Option<u64>,
    offset: Option<u64>,
}

#[endpoint {
    method = GET,
    path = "/search/songs"
}]
async fn search_songs(
    request_context: RequestContext<ServerState>,
    query: Query<SongSearchParams>,
) -> Result<HttpResponseOk<Vec<SongMetadata>>, HttpError>;

Am I missing something important that led to the current shape of the pagination interface? |
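(For anyone reading along: here is a minimal sketch, not from this thread, of how the paginated handler above might thread `q` through the page selector so that clients only send `page_token` on follow-up requests. It assumes Dropshot's `WhichPage`/`ResultsPage`/`page_limit` API and the `SongSearchParams`/`SongSearchPage`/`SongMetadata`/`ServerState` types defined above; `run_fts_query` is a hypothetical helper.)

```rust
use dropshot::{
    endpoint, HttpError, HttpResponseOk, PaginationParams, Query, RequestContext,
    ResultsPage, WhichPage,
};

// SongSearchParams, SongSearchPage, SongMetadata, and ServerState are as
// defined in the comment above; `run_fts_query` is a hypothetical helper
// that runs the sqlite full-text query.

#[endpoint {
    method = GET,
    path = "/search/songs"
}]
async fn search_songs(
    request_context: RequestContext<ServerState>,
    query: Query<PaginationParams<SongSearchParams, SongSearchPage>>,
) -> Result<HttpResponseOk<ResultsPage<SongMetadata>>, HttpError> {
    let pag_params = query.into_inner();
    let limit = request_context.page_limit(&pag_params)?.get() as usize;

    // On the first page the client supplies `q`; on later pages the page
    // selector decoded from `page_token` carries both `q` and the offset,
    // so the client does not send `q` again.
    let (q, offset) = match &pag_params.page {
        WhichPage::First(params) => (params.q.clone(), 0),
        WhichPage::Next(page) => (page.q.clone(), page.offset),
    };

    let items: Vec<SongMetadata> =
        run_fts_query(request_context.context(), &q, offset, limit).await?;

    // Derive the next page's selector from the parameters we just used.
    let scan = (q, offset);
    let page = ResultsPage::new(items, &scan, |_last: &SongMetadata, s: &(String, usize)| {
        SongSearchPage { q: s.0.clone(), offset: s.1 + limit }
    })?;
    Ok(HttpResponseOk(page))
}
```

Because `SongSearchPage` carries `q`, a follow-up request needs only `page_token` (and optionally `limit`), which is what the "missing field `q`" error above was getting at.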
I'd defer to the folks who came up with that interface, but I think it's intentionally designed to encourage cursor-based pagination instead of `limit`/`offset` pagination. |
Yeah, so the consumer model we were going for is described in this part of the docs. The rationale is described in this comment. Certainly, my experience has been that limit/offset pagination is strongly discouraged for the reasons @jgallagher mentioned, which are described in more detail in the linked blog post. To summarize: limit/offset leads to invisible inconsistencies (you can miss items during a scan and have no way to know that that's happened), and the implementation usually doesn't scale well (the database generally has to walk past every skipped row on each request).
I think this is the right outcome: we want to make it hard to accidentally do this, but if someone understands the tradeoffs and wants to do it, it works.
You shouldn't have to pass `q` on subsequent requests; the page selector encoded into the token is what carries it forward.
You used the query parameter `page`, but the parameter Dropshot expects is `page_token`. @jgallagher wrote:
This is true! To get specific: using what that blog post calls keyset pagination, if a client scans a collection in the usual way, it can be sure that it saw any items that existed for the duration of the scan and whose position in the sorted collection (relative to others) didn't change during the scan (e.g., if you're sorting by name, you can think of this as "it wasn't renamed"). There are still plenty of caveats: if an item is added, removed, or reordered (renamed) during the scan, the client may or may not see it. That's usually manageable, though.

Limit/offset is much worse. Say a client has retrieved 3 of 10 pages (pages 1, 2, and 3) and then an item is removed from page 1. The client will proceed to request page 4 and will miss the item that was at the top of page 4, which is now the bottom of page 3. The element that the client misses was not changed by the mutation at all: it was not added, removed, or reordered/renamed during the scan, yet the client missed it. That's pretty hard to deal with. |
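(To make the keyset-versus-offset difference concrete, here is a small, self-contained sketch against an in-memory SQLite database. It is not from the thread: it assumes the `rusqlite` crate as a dependency, and the `songs` table and its rows are invented for illustration.)

```rust
use rusqlite::{params, Connection, Result};

fn main() -> Result<()> {
    let conn = Connection::open_in_memory()?;
    conn.execute_batch(
        "CREATE TABLE songs (name TEXT PRIMARY KEY);
         INSERT INTO songs (name) VALUES ('a'), ('b'), ('c'), ('d'), ('e');",
    )?;
    let limit = 2;

    // Offset pagination: the database walks past all skipped rows, and a
    // deletion earlier in the collection silently shifts later pages.
    let mut by_offset = conn.prepare(
        "SELECT name FROM songs ORDER BY name LIMIT ?1 OFFSET ?2",
    )?;
    let page2: Vec<String> = by_offset
        .query_map(params![limit, limit], |row| row.get(0))?
        .collect::<Result<_>>()?;
    println!("offset page 2: {page2:?}");

    // Keyset pagination: the cursor is the last name already seen, so
    // earlier deletions don't change what this request returns.
    let mut by_keyset = conn.prepare(
        "SELECT name FROM songs WHERE name > ?1 ORDER BY name LIMIT ?2",
    )?;
    let page2: Vec<String> = by_keyset
        .query_map(params!["b", limit], |row| row.get(0))?
        .collect::<Result<_>>()?;
    println!("keyset page 2: {page2:?}");
    Ok(())
}
```

With the keyset query, the cursor is the last `name` the client saw, so deleting a row earlier in the sort order does not shift what the next request returns; with `OFFSET` it does.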
Ah, I see my mistake. I think this is because the example code for the server doesn't really show those parameters, and the docs around the pagination types, beyond the brief mention in the "Support for paginated resources" section, don't really mention `page_token` by name.
I understand the rationale, thanks for clarifying! My perception is that the design is (justifiably) opinionated about the approach, while also being very hands-off in terms of pointing people to the intended path in the docs, which presents as a bit of a wide gulf for the casual/external consumer to adopt. If you don't mind me (literally) typing this comment from my armchair(!), my suggestion would be to briefly explain in the docs that the intent is to use something like a cursor-based approach because of the performance characteristics of a simplistic limit/offset implementation.

I think it would also be very helpful to include an appropriate RDBMS example in the repo, because the existing pagination examples are all a bit far removed from how most "real" data sources work, which I think adds to the difficulty in adopting this interface more smoothly.

Stating things a bit more succinctly: I think the current docs and examples don't really support/serve someone coming in cold who wants to hook up what they might perceive to be a "simple" RDBMS-backed query, so it would be great to give a bit more of a directed example with just a little bit of exposition about why, so that it is easier for them to fall into the pit of success. |
I hear that. Any ideas to improve it? When writing this, I felt like it was hard to describe more concisely than a complete example, but that's complicated enough to be a whole program, which means it's sort of outside the main docs. (Is there a spot where the docs refer to it as …?) |
Hmm. The intent of the existing examples was to stay small and dependency-free. We could have one that actually accesses a real RDBMS? That'd presumably add more deps, and a lot of the code would not be relevant for other RDBMSes, but maybe that's worth it if it's clearer how to use it? |
Replying to myself: the more complicated pagination example includes, in a block comment, an example of running through it with curl. The simple pagination example just says "Try passing different values of the …".
Going through pages of results in the external API is not very good right now. The biggest problem is that you can only go forward. If you have a cursor pointing to a page, there is simply no way to access the previous page without starting over at the beginning of the collection and paging through. Details are in the Dropshot issue, but most likely we need to be able to ask for the page before a given cursor, which would require splitting the existing `page_token` param into two: `starting_from` (same behavior as the current `page_token`) and `ending_before`.

Another problem is that we will always give a `next_page` cursor regardless of whether there are actually any more items. As discussed in the Dropshot issue, we need to give the consumer (Nexus) a way of telling Dropshot whether there should be a `next_page` param, and it would be up to Nexus to determine the criterion. A very simple one that would require changing very little about our code would be to only include `next_page` if the current page's size equals the current page size limit. This would have a `1/limit` chance of being wrong, which is not that bad. A more robust but more complex approach would involve fetching `limit + 1` results for a given page so we would actually know whether there are more items. The extra precision here is somewhat undermined by the fact that the actual number of items available could change between requests regardless.

This mostly depends on work that would take place in Dropshot, so fittingly both of these issues are captured there.
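(A rough sketch, not taken from either issue, of the `limit + 1` lookahead described above: fetch one extra row and use its presence to decide whether a next-page token should be offered. `Page`, `page_with_lookahead`, `fetch_rows`, and the string cursor are all placeholders.)

```rust
/// Result of fetching one page plus one extra row to detect whether more exist.
struct Page<T> {
    items: Vec<T>,
    has_next: bool,
}

/// Fetch `limit + 1` rows starting from `cursor`; keep `limit` of them and
/// remember whether an extra row existed. `fetch_rows` stands in for whatever
/// query the real code runs.
fn page_with_lookahead<T>(
    fetch_rows: impl Fn(Option<&str>, usize) -> Vec<T>,
    cursor: Option<&str>,
    limit: usize,
) -> Page<T> {
    let mut items = fetch_rows(cursor, limit + 1);
    let has_next = items.len() > limit;
    items.truncate(limit);
    Page { items, has_next }
}

fn main() {
    // Pretend data source: five integers, with the cursor encoding an index.
    let data: Vec<u32> = (0..5).collect();
    let fetch = |cursor: Option<&str>, n: usize| {
        let start: usize = cursor.map(|c| c.parse().unwrap()).unwrap_or(0);
        data.iter().skip(start).take(n).copied().collect::<Vec<_>>()
    };
    let page = page_with_lookahead(fetch, Some("3"), 2);
    // Exactly `limit` items came back (3 and 4) but the lookahead row was
    // absent, so no next-page token should be offered.
    assert_eq!(page.items, vec![3, 4]);
    assert!(!page.has_next);
}
```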
Tasks
`time_created` (#2343)