Implement more optimal scheduling for pages stream #5
This would make sense especially when iterating on the
It's a bit unclear from the documentation whether
PS. This is a nice project! There's definitely use for this kinda crate.
Yes, this is exactly what I'm trying to solve in this issue: the new implementation should reduce (but not eliminate) fetch delays at page boundaries.
Yes, certainly. And in general, for non-cursor pagination patterns you should always prefer
Thank you! I'm already at the point where I cannot imagine how to work with AWS/Azure APIs or scrape sites without the help of this crate. It just reduces all the mess into a bunch of combinators or into a simple while loop :)
Thanks for the explanation @a1akris.
The one thing I'm a bit worried about with this is memory usage. If the processing of elements is slower than the page fetching and the iterated collection is huge, will I end up with an unbounded number of pages in memory? If this is correct, then I'd feel better about having an argument for throttling the page fetching. E.g. something like
Good question! The argument you're referring to already exists:

Example

let mut items = std::pin::pin!(client.pages(4, req, Limit::None));
// Here the stream schedules 4 (requests_ahead_count) futures
// and as soon as the first one completes it schedules the 5th request
// and returns you the first page to process
let page1 = items.next().await;
// While we're here, no extra futures can be scheduled in the background,
// but futures that have already been scheduled are executing.
process_page(page1); // sleep for 300 seconds
// The 2nd request was certainly completed in 300 seconds, so you
// get the second page immediately but before yielding results the stream
// schedules the 6th request
let page2 = items.next().await;
// Get 3rd page, schedule 7th request.
let page3 = items.next().await;
// and so on...
let page4 = process_page(items.next().await);

TL;DR
If
Thanks again for the detailed explanation! Ya that makes sense. Perhaps a similar note about memory usage could be added to the docs too?
Currently the pages stream is an equivalent of a hand-written loop that can look something like the sketch below. The loop sends the request, awaits the response, and processes the results sequentially. However, there is a more optimal way of doing the same thing which should be fairly easy to implement by modifying the current request_next_page impl. If there exists a next page, we can send a request for it immediately, even before yielding the current response results, and at the next stream evaluation we will await the already progressing future, so awaiting the next response and processing the current results happen in parallel!
Here is a list of primitives which can help to achieve the desired behavior:
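For illustration only, a rough sketch of the scheduling idea described above, using the same kind of hypothetical Client/Request/Page stand-ins and assuming a tokio runtime: the request for the next page is spawned before the current results are processed, so the next fetch makes progress while processing happens (a bare, un-spawned future would not run until awaited):

use std::sync::Arc;

// Hypothetical stand-ins, not the crate's actual API.
struct Client;
struct Request;
struct Page {
    items: Vec<u32>,
    next: Option<Request>,
}

impl Client {
    async fn get_page(&self, _req: Request) -> Page {
        Page { items: Vec::new(), next: None }
    }
}

fn process(_item: u32) {}

async fn run_prefetching(client: Arc<Client>, first_req: Request) {
    // Kick off the first request as a background task.
    let mut in_flight = Some(tokio::spawn({
        let client = Arc::clone(&client);
        async move { client.get_page(first_req).await }
    }));

    while let Some(handle) = in_flight.take() {
        // Await the fetch that was started on the previous iteration
        // (or just above, for the first page).
        let page = handle.await.expect("fetch task panicked");

        // Schedule the next fetch *before* processing the current results,
        // so the network round trip overlaps with processing.
        in_flight = page.next.map(|req| {
            let client = Arc::clone(&client);
            tokio::spawn(async move { client.get_page(req).await })
        });

        for item in page.items {
            process(item);
        }
    }
}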