LRU cache on __getitem__ may be a performance problem #105

Open
sigprof opened this issue Sep 10, 2024 · 1 comment
sigprof commented Sep 10, 2024

The Dotty.__getitem__(self, item) method is decorated with @lru_cache(maxsize=32). When applied to a method, the lru_cache implementation uses the self argument as part of the cache key, which results in at least one call to its __hash__ method, and possibly some calls to __eq__ as well. The implementation of __hash__ for Dotty is hash(str(self)), and that str(self) ends up computing str(self._data), which is rather expensive for a large dict; in fact, it is likely to be more expensive than running the __getitem__ code without caching.
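The effect above can be sketched with a minimal stand-in class (SlowHashDict is hypothetical, not part of dotty_dict; it just mimics a __hash__ that stringifies the whole underlying dict, as Dotty's does):

```python
from functools import lru_cache

class SlowHashDict:
    """Minimal stand-in for Dotty: hashing stringifies the whole dict."""

    def __init__(self, data):
        self._data = data
        self.hash_calls = 0  # instrumentation, to count __hash__ invocations

    def __hash__(self):
        # Mimics Dotty's hash(str(self)): cost grows with len(self._data).
        self.hash_calls += 1
        return hash(str(self._data))

    def __eq__(self, other):
        return isinstance(other, SlowHashDict) and self._data == other._data

    @lru_cache(maxsize=32)
    def __getitem__(self, item):
        # The actual work being "sped up" is a cheap plain-dict lookup.
        return self._data[item]

d = SlowHashDict({f"k{i}": i for i in range(10_000)})
for i in range(100):
    d[f"k{i}"]

# lru_cache builds its key from (self, item), so every lookup, hit or miss,
# hashes `self`, i.e. stringifies the whole 10k-entry dict.
print(d.hash_calls)
```

Running this shows at least one full-dict stringification per lookup, which is exactly the cost the cache was supposed to avoid.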

For some real-world data, in QMK firmware the caching of Dotty.__getitem__ actually slows down CLI commands such as qmk find -f 'split.enabled=true' by more than 50% (25 vs. 39 seconds on my machine).

Should the __getitem__ caching be completely removed, or are there some important cases where it provides a meaningful speedup? If that's the case, maybe the caching could be made optional somehow.

sigprof commented Sep 10, 2024

#45, which added that caching, claims that it's “up to 3-4 times faster”… I guess it depends heavily on the particular use case (small dicts with lots of repeated lookups will give completely different results compared to large dicts with mostly unique lookups). So this kind of caching really needs to be tunable, but the existing API does not provide any way to reconfigure the cache, apart from accessing Dotty.__getitem__.__wrapped__ directly.
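One way such tunability could look (a hypothetical sketch, not dotty_dict's API: CachedLookup and cache_size are invented names) is an opt-in, per-instance cache. Because the cached callable is a bound method, self is captured in the closure rather than hashed as part of the key:

```python
from functools import lru_cache

class CachedLookup:
    """Hypothetical sketch: opt-in, per-instance lookup cache instead of a
    class-level @lru_cache keyed on (self, item)."""

    def __init__(self, data, cache_size=None):
        self._data = data
        if cache_size:
            # lru_cache wraps the *bound* method, so the cache key is only
            # (item,); self is never hashed on lookup.
            self._lookup = lru_cache(maxsize=cache_size)(self._raw_lookup)
        else:
            # cache_size=None disables caching entirely.
            self._lookup = self._raw_lookup

    def _raw_lookup(self, item):
        return self._data[item]

    def __getitem__(self, item):
        return self._lookup(item)

plain = CachedLookup({"a": 1})                  # caching disabled
cached = CachedLookup({"a": 1}, cache_size=32)  # caching opted in
print(plain["a"], cached["a"])  # 1 1
```

A design like this would let callers with large dicts and mostly unique lookups turn the cache off, while small-dict, repeated-lookup workloads could keep it.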
