Integration Issues of Machine Learning Model with CacheLib #291
-
Hello,

We are a group of undergraduate students at the University of Toronto working on a capstone project focused on incorporating machine learning into cache replacement policies. My team and I are exploring the integration of a machine learning model as a new cache replacement strategy within CacheLib. However, several points go beyond our basic understanding of caching, so we cannot simply follow existing practice; we would therefore like to ask for guidance and for confirmation of our guesses.

After reviewing CacheLib's documentation and codebase, we understand that introducing a new MM container (similar to existing implementations like MMLru, MMTinyLFU, and MM2Q) under cachelib/allocator/ might be the way forward (a rough sketch of what we have in mind follows at the end of this message). Given the complexity of CacheLib's interface, however, we are seeking confirmation that this approach aligns with correct practice, along with any additional insights or recommendations about parts we may have overlooked.

Additionally, we are trying to determine whether CacheLib supports set associativity within its caching mechanisms, which is crucial for our ML model's requirements. We have not been able to find details about cache set information or associativity levels in CacheLib. Our preliminary understanding suggests a potential link between CacheLib's pooling/slab concepts and set associativity, although pooling seems more dynamic and variable than traditional fixed associativity. Could you clarify this relationship or provide further details on CacheLib's associativity model? If the pooling mechanism really is a more flexible form of set associativity, would it be possible to configure the pooling strategy so that the cache behaves equivalently to a set-associative one?

Another critical component of our model involves inputs such as the Program Counter (PC) and memory addresses of operations, which don't seem to be readily available in trace data. Is there a mechanism within CacheLib, or a recommended approach, to access these values for feeding into our ML model?

Our project focuses exclusively on DRAM cache replacement policies, omitting admission policies at other cache levels such as NVM.

We greatly appreciate your time and assistance. Insights from experts like yourselves are invaluable in navigating these challenges and will contribute significantly to the success of our project. Thank you for your support.

Best regards,
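P.S. For concreteness, below is a rough standalone sketch of the container shape we are imagining, modeled on our reading of MMLru. The method names (recordAccess, add, remove, plus an eviction-candidate accessor) follow what we saw in cachelib/allocator/MMLru.h, but the real containers are templated over the item type and intrusive list hooks, so this mock only shows the direction we mean, not CacheLib's actual interface:

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Standalone mock of the MM-container shape we believe a new policy must
// provide, based on our reading of MMLru -- NOT actual CacheLib code.
// Plain string keys are used here to keep the sketch self-contained.
class MMModelBased {
 public:
  // Called on every access so the policy can update per-item state.
  void recordAccess(const std::string& key) { ++accessCount_[key]; }

  // Called when an item enters the container.
  bool add(const std::string& key) { return items_.insert(key).second; }

  // Called on explicit removal or after eviction.
  bool remove(const std::string& key) {
    accessCount_.erase(key);
    return items_.erase(key) > 0;
  }

  // Called under memory pressure: pick the item the model scores as
  // least valuable. score() is a stub standing in for ML inference.
  std::optional<std::string> getEvictionCandidate() const {
    std::optional<std::string> victim;
    double worst = std::numeric_limits<double>::infinity();
    for (const auto& key : items_) {
      const double s = score(key);
      if (s < worst) {
        worst = s;
        victim = key;
      }
    }
    return victim;
  }

 private:
  // Placeholder: a real policy would run the trained model here.
  double score(const std::string& key) const {
    auto it = accessCount_.find(key);
    return it == accessCount_.end() ? 0.0 : static_cast<double>(it->second);
  }

  std::unordered_set<std::string> items_;
  std::unordered_map<std::string, uint64_t> accessCount_;
};
```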
-
Hi Da,

While I am not a Meta employee - I am a PhD student at CMU who collaborates with Meta on machine learning for flash cache admission & prefetching policies - I hope that my answers will be of some use to you.

First, it's great that you're doing a project on caching using CacheLib! Using a production library comes with some additional difficulties but also increases the utility of your work.

Yes, MMX (X = LRU, etc.) is where the eviction policies are located.

You mentioned inputs such as the PC -- please correct me if I'm wrong, but this leads me to assume you are thinking about caches like the CPU cache, page cache, or buffer cache, which are transparent caches. CacheLib is designed to be called explicitly from an application, and its use cases (for which there are traces available) include key-value caches, CDN caches, and bulk storage caches. If you did know this but still want to proceed anyway, then since you explicitly interact with CacheLib on each access (see the ItemHandle API and the find method), you would have to collect such information about the call site in your application (or a wrapper around CacheLib) and supply it explicitly to your ML policy; a minimal sketch of such a wrapper follows below.

Some advice from another student: if this is for an architecture class and you are strictly looking at CPU caches, you may wish to consider a CPU cache simulator. And if you haven't already, you will also want to consider papers such as https://proceedings.mlr.press/v80/hashemi18a/hashemi18a.pdf, https://proceedings.mlr.press/v119/liu20f/liu20f.pdf, and https://dl.acm.org/doi/abs/10.1145/3466752.3480114 -- the latter two come with code, e.g., https://github.com/google-research/google-research/tree/master/cache_replacement and https://github.com/CMU-SAFARI/Pythia.

If you are willing to look beyond CPU caches at key-value, CDN, or bulk storage caches (which I think would be great for a capstone project on ML for caching!), I would suggest starting out by modifying CacheBench, paying attention to files such as the workload generators; a minimal config example is also below.

All the best,

PS also see: https://cachelib.org/docs/Cache_Library_User_Guides/Developing_for_Cachebench/#anatomy-of-cachebench
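To make the wrapper idea concrete, here is a minimal standalone sketch. Nothing in it is CacheLib's actual API: a std::unordered_map stands in for the cache, and the GCC/Clang builtin __builtin_return_address(0) captures the caller's program counter at the wrapper boundary. In a real integration, the lookup would go through CacheLib's find instead, and the feature record would hold whatever your model needs:

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// One sample per cache access, mirroring the PC + address inputs you
// mentioned; in a KV cache a key hash stands in for a memory address.
struct AccessFeature {
  uint64_t pc;       // call-site program counter
  uint64_t keyHash;  // stand-in for a memory address
  bool hit;
};

// Illustrative wrapper; in a real integration the map would be a
// CacheLib cache and lookups would go through its find() method.
class InstrumentedCache {
 public:
  // noinline so __builtin_return_address(0) names the *caller's*
  // site, not an inlined frame (GCC/Clang-specific builtin).
  __attribute__((noinline))
  std::optional<std::string> find(const std::string& key) {
    const auto pc = reinterpret_cast<uint64_t>(__builtin_return_address(0));
    auto it = cache_.find(key);
    const bool hit = (it != cache_.end());
    log_.push_back({pc, std::hash<std::string>{}(key), hit});
    if (!hit) {
      return std::nullopt;
    }
    return it->second;
  }

  void put(const std::string& key, std::string value) {
    cache_[key] = std::move(value);
  }

  // Features collected for the ML eviction policy to consume.
  const std::vector<AccessFeature>& features() const { return log_; }

 private:
  std::unordered_map<std::string, std::string> cache_;
  std::vector<AccessFeature> log_;
};
```

The noinline attribute matters: if the compiler inlines the wrapper, the recorded return address no longer identifies the application call site.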
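And if you take the CacheBench route, it is driven by a JSON config. A minimal synthetic-workload config looks roughly like the following (field names follow the CacheBench documentation linked above, but the supported options evolve, so treat this as a starting point and check the docs):

```json
{
  "cache_config": {
    "cacheSizeMB": 512
  },
  "test_config": {
    "numOps": 1000000,
    "numThreads": 8,
    "numKeys": 200000,
    "keySizeRange": [1, 8, 64],
    "keySizeRangeProbability": [0.5, 0.5],
    "valSizeRange": [32, 1024, 8192],
    "valSizeRangeProbability": [0.5, 0.5],
    "getRatio": 0.8,
    "setRatio": 0.2
  }
}
```

You would then run something like ./cachebench --json_test_config config.json, and swap in your own trace or workload generator from there.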