LRU cache on __getitem__ may be a performance problem #105

Open
sigprof opened this issue Sep 10, 2024 · 1 comment
sigprof commented Sep 10, 2024

The Dotty.__getitem__(self, item) method is decorated with @lru_cache(maxsize=32). When applied to a method, the lru_cache implementation uses the self argument as part of the cache key, which results in at least one call to its __hash__ method, and possibly some calls to __eq__ as well. The implementation of __hash__ for Dotty is hash(str(self)), and that str(self) ends up computing str(self._data), which is rather expensive for a large dict; in fact, it is likely to be more expensive than running the __getitem__ code without caching.
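The effect above can be sketched with a minimal stand-in class (SlowHashDict is hypothetical, not part of dotty_dict; it just mimics a __hash__ that stringifies the whole underlying dict, as Dotty's does):

```python
from functools import lru_cache

class SlowHashDict:
    """Minimal stand-in for Dotty: hashing stringifies the whole dict."""

    def __init__(self, data):
        self._data = data
        self.hash_calls = 0  # instrumentation, to count __hash__ invocations

    def __hash__(self):
        # Mimics Dotty's hash(str(self)): cost grows with len(self._data).
        self.hash_calls += 1
        return hash(str(self._data))

    def __eq__(self, other):
        return isinstance(other, SlowHashDict) and self._data == other._data

    @lru_cache(maxsize=32)
    def __getitem__(self, item):
        # The actual work being "sped up" is a cheap plain-dict lookup.
        return self._data[item]

d = SlowHashDict({f"k{i}": i for i in range(10_000)})
for i in range(100):
    d[f"k{i}"]

# lru_cache builds its key from (self, item), so every lookup, hit or miss,
# hashes `self`, i.e. stringifies the whole 10k-entry dict.
print(d.hash_calls)
```

Running this shows at least one full-dict stringification per lookup, which is exactly the cost the cache was supposed to avoid.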

For some real-world data, in QMK firmware the caching of Dotty.__getitem__ actually slows down CLI commands such as qmk find -f 'split.enabled=true' by more than 50% (25 vs. 39 seconds on my machine).

Should the __getitem__ caching be completely removed, or are there some important cases where it provides a meaningful speedup? If that's the case, maybe the caching could be made optional somehow.

sigprof commented Sep 10, 2024

#45, which added that caching, claims that it's “up to 3-4 times faster”… I guess it depends heavily on the particular use case (small dicts with lots of repeated lookups will give completely different results compared to large dicts with mostly unique lookups). So this kind of caching really needs to be tunable, but the existing API does not provide any way to reconfigure the cache, apart from accessing Dotty.__getitem__.__wrapped__ directly.
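One way such tunability could look (a hypothetical sketch, not dotty_dict's API: CachedLookup and cache_size are invented names) is an opt-in, per-instance cache. Because the cached callable is a bound method, self is captured in the closure rather than hashed as part of the key:

```python
from functools import lru_cache

class CachedLookup:
    """Hypothetical sketch: opt-in, per-instance lookup cache instead of a
    class-level @lru_cache keyed on (self, item)."""

    def __init__(self, data, cache_size=None):
        self._data = data
        if cache_size:
            # lru_cache wraps the *bound* method, so the cache key is only
            # (item,); self is never hashed on lookup.
            self._lookup = lru_cache(maxsize=cache_size)(self._raw_lookup)
        else:
            # cache_size=None disables caching entirely.
            self._lookup = self._raw_lookup

    def _raw_lookup(self, item):
        return self._data[item]

    def __getitem__(self, item):
        return self._lookup(item)

plain = CachedLookup({"a": 1})                  # caching disabled
cached = CachedLookup({"a": 1}, cache_size=32)  # caching opted in
print(plain["a"], cached["a"])  # 1 1
```

A design like this would let callers with large dicts and mostly unique lookups turn the cache off, while small-dict, repeated-lookup workloads could keep it.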
