
GH-1246. Asynchronously initialize cache before reading #1247

Draft
wants to merge 3 commits into base: master

Conversation


@kotman12 kotman12 commented Feb 20, 2025

Linked issue #1246

The current CachedModeledFrameworkImpl doesn't manage cache initialization for you. A perfect example of the kind of code you can expect to see in the wild is in the CachedModeledFramework tests themselves, i.e. blocking on semaphores that pin the reading thread to keep it from performing cache-dependent operations (although, as far as I can tell, exactly which operations are cache-dependent is not really guaranteed, so this is arguably cumbersome). Either way, this is fine in a lot of cases...
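The blocking workaround described above might look something like this minimal sketch (class, method, and path names are illustrative stand-ins, not Curator's actual API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for a cache-backed client to show the antipattern.
class BlockingInitDemo {
    static final CountDownLatch initialized = new CountDownLatch(1);
    static final Map<String, String> cache = new ConcurrentHashMap<>();

    // Simulates the cache's "initialized" listener firing after warm-up.
    static void onCacheInitialized() {
        cache.put("/example", "value");
        initialized.countDown();
    }

    // The awkward pattern: pin the reading thread until the latch opens,
    // defeating the non-blocking style of the surrounding API.
    static String blockingRead(String path) {
        try {
            initialized.await();      // reader is parked here until warm-up
            return cache.get(path);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted waiting for cache", e);
        }
    }
}
```

The issue is that `blockingRead` parks a whole thread inside an otherwise fully asynchronous API, and every caller has to know to do this.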

However, I propose an additional InitializedCachedModeledFramework implementation which asynchronously waits for the cache initialization trigger and only then proceeds to read from the cache. I implemented something similar for my personal use case, where I couldn't rely on, e.g., readThrough to handle the uninitialized case, because readThrough cannot disambiguate between a znode that is missing because it truly is absent from ZooKeeper and one that is missing because the cache hasn't initialized. In my case the znode wouldn't always exist in ZooKeeper, so using readThrough would result in a lot of wasted calls to ZooKeeper, greatly reducing the benefit of the cache in the first place.

To reiterate, InitializedCachedModeledFramework has a couple of benefits over the existing implementation:

  1. No more possibility of a misleading NoNodeException when reading from CachedModeledFramework before the cache has warmed. (I say misleading because the node may exist... just not in the cache)
  2. No more temptation to add blocking semaphores in front of this non-blocking interface.
  3. IMO generally less confusion about how to properly use this otherwise great(!) feature.
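A rough sketch of the proposed behavior (again with hypothetical names, not the actual patch): reads are chained off an initialization future instead of blocking a thread, so the returned future simply doesn't complete until the cache has warmed.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of the idea behind InitializedCachedModeledFramework.
class AsyncInitDemo {
    final CompletableFuture<Void> initialized = new CompletableFuture<>();
    final Map<String, String> cache = new ConcurrentHashMap<>();

    // Simulates the cache's "initialized" event firing after warm-up.
    void onCacheInitialized(Map<String, String> warmed) {
        cache.putAll(warmed);
        initialized.complete(null);
    }

    // Non-blocking: the returned future completes only after warm-up, so an
    // empty result truly means "no such node", never "cache not warmed yet".
    CompletableFuture<Optional<String>> read(String path) {
        return initialized.thenApply(ignore -> Optional.ofNullable(cache.get(path)));
    }
}
```

Once the initialization future has completed, subsequent reads chain off an already-completed future and resolve immediately, so the steady-state overhead is small.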

@kotman12 kotman12 force-pushed the Asynchronously-Initialize-Cache-Before-Reading branch from 753ba51 to 9891901 on February 21, 2025 21:24
@kotman12 kotman12 changed the title Asynchronously initialize cache before reading [Cirator 1246] Asynchronously initialize cache before reading Feb 26, 2025
@kotman12 kotman12 changed the title [Cirator 1246] Asynchronously initialize cache before reading [Curator 1246] Asynchronously initialize cache before reading Feb 26, 2025
@kotman12 kotman12 changed the title [Curator 1246] Asynchronously initialize cache before reading [CURATOR 1246] Asynchronously initialize cache before reading Feb 26, 2025
@kotman12 kotman12 marked this pull request as ready for review February 26, 2025 23:52
@kotman12 kotman12 changed the title [CURATOR 1246] Asynchronously initialize cache before reading [CURATOR-1246] Asynchronously initialize cache before reading Feb 26, 2025
@tisonkun tisonkun changed the title [CURATOR-1246] Asynchronously initialize cache before reading GH-1246. Asynchronously initialize cache before reading Feb 27, 2025

@tisonkun tisonkun left a comment


Thanks for your contribution @kotman12!

My extra consideration is whether the original behavior has some benefits, or whether we can simply replace the old implementation with the new one. Compatibility is also a consideration, though we can bump a major version if needed and worthwhile.

Anyway, this patch itself LGTM.

@tisonkun tisonkun requested a review from kezhuw March 1, 2025 09:30

kotman12 commented Mar 1, 2025

Thanks for taking a look @tisonkun! I think I agree... IMO in the majority of cases the "initialized" variety is what you want, even though there may be some minimal overhead from going through/checking two futures instead of one. I don't know at what scale that matters, if at all, but I didn't benchmark it, so I didn't want to jump to conclusions.

I'd be happy to refactor CachedModeledFrameworkImpl itself instead of adding another implementation. A benefit of that approach is that the tests would be a lot simpler, though I'm not sure whether such a change would need to be benchmarked first. At any rate, it shouldn't introduce any backwards incompatibilities, so I'm not sure a major version bump is even needed to change CachedModeledFrameworkImpl to manage initialization.

A third option is to make InitializedCachedModeledFramework the default cached implementation returned by ModeledFramework::cached. This would make the user-friendlier implementation more discoverable while still giving hardcore users the option to use the original, uninitialized version.

Of course I'll defer to the more experienced community here.


kotman12 commented Mar 5, 2025

@tisonkun / @kezhuw please take a look at #1250 when you get a chance. I believe it is a less heavyweight implementation of the same thing we are doing here. Please don't merge this PR because while implementing #1250 I realized there are still gaps here. Rather than fix them I'll kindly first refer you to #1250 since if that is merged this one can be closed. I believe it should be backwards compatible and nearly identical from a performance standpoint.

@kotman12 kotman12 marked this pull request as draft March 5, 2025 19:49