feat - support mla kvcache store #888
base: main
Conversation
Hi @baowendin, thanks for the contribution, the kernel looks good to me overall.
Would you mind trying Triton instead? I expect we can get similar performance to CUDA. In the future, we hope all elementwise / data-movement kernels can be written in Triton, to save maintenance overhead.
(force-pushed from db20a81 to 61fb997, then from 61fb997 to f7bd89f)
Hi, I have formatted the code with pre-commit, but since I'm not familiar with Triton, I can't rewrite it in Triton this time. Maybe next time?
It would be great to add a benchmark like: https://github.com/flashinfer-ai/flashinfer/blob/main/benchmarks/bench_append_paged_kv_cache.py
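For an append kernel like this, the number such a benchmark reports is effective bandwidth. As a rough sketch (the helper below is hypothetical; only the dimensions ckv_dim=512 and kpe_dim=64 come from this PR, everything else is an illustrative assumption, not flashinfer's API):

```python
# Hypothetical helper: estimate effective bandwidth of an MLA KV-cache
# append, in the spirit of bench_append_paged_kv_cache.py. The dims
# (ckv_dim=512, kpe_dim=64) come from this PR; the function itself is
# an illustrative sketch, not part of flashinfer.

def append_bandwidth_gbps(nnz: int, elapsed_ms: float,
                          ckv_dim: int = 512, kpe_dim: int = 64,
                          dtype_bytes: int = 2) -> float:
    """GB/s for appending nnz tokens of compressed-KV + positional keys.

    Each token is read once from the input and written once to the
    paged cache, hence the factor of 2.
    """
    bytes_moved = 2 * nnz * (ckv_dim + kpe_dim) * dtype_bytes
    return bytes_moved / (elapsed_ms * 1e-3) / 1e9

# e.g. 8192 tokens appended in 0.05 ms
print(round(append_bandwidth_gbps(8192, 0.05), 1))  # → 377.5
```

Comparing this figure against the device's peak memory bandwidth shows how close the kernel is to being purely memory-bound.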
```diff
@@ -16,6 +16,7 @@
 #ifndef FLASHINFER_PAGE_CUH_
 #define FLASHINFER_PAGE_CUH_

+#include <assert.h>
```
Would you mind changing them to FLASHINFER_CHECK? (defined in flashinfer/include/flashinfer/exception.h, line 41 in 341ae09: `#define FLASHINFER_CHECK(condition, message) \`)
`assert` would only work when you compile the program in debug mode, not release mode.
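The same pitfall exists in Python: `assert` statements are stripped under `python -O`, just as C's `assert()` is compiled out when NDEBUG is defined in release builds. A small sketch of the difference between `assert` and an always-on check (the `check` helper below is a hypothetical stand-in for what a FLASHINFER_CHECK-style macro does in C++):

```python
# Sketch of why an explicit check beats assert: Python's `assert`
# vanishes under `python -O`, like C's assert() under NDEBUG.
# `check` is a hypothetical analogue of FLASHINFER_CHECK: it runs
# regardless of optimization level.

def check(condition: bool, message: str) -> None:
    """Raise on failure even when assertions are disabled."""
    if not condition:
        raise RuntimeError(message)

def validate_append_len(nnz: int) -> bool:
    # assert nnz > 0                       # silently skipped under -O
    check(nnz > 0, "nnz must be positive")  # always enforced
    return True

print(validate_append_len(8))   # → True
try:
    validate_append_len(0)
except RuntimeError as e:
    print(e)                    # → nnz must be positive
```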
```python
import flashinfer


def test_append_mla_paged_kv_cache():
```
Can you design some stronger test cases? I apologize that the previous test_page.py is very weak (most of the tests around the append-page APIs are performed in C++ unit tests, and in the future we should move them to Python).
By "stronger" I mean more cases (nnz / append length), data types, page_size, etc.
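The sweep the reviewer describes could be enumerated like this (a sketch; the parameter values are illustrative choices, and the actual flashinfer append call is deliberately left out):

```python
import itertools

# Illustrative parameter grid for stronger append-kv-cache tests:
# sweep append lengths (nnz), data types, and page sizes, as suggested.
NNZS = [1, 7, 128, 4096]
DTYPES = ["float16", "bfloat16"]
PAGE_SIZES = [1, 16, 64]

def append_test_cases():
    """Return every (nnz, dtype, page_size) combination to exercise."""
    return list(itertools.product(NNZS, DTYPES, PAGE_SIZES))

print(len(append_test_cases()))  # → 24 combinations (4 * 2 * 3)
```

In a pytest suite the same grid is usually expressed with stacked `@pytest.mark.parametrize` decorators, which report each combination as its own test case.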
Okay, I'll fix these problems and add more tests later this week.
Summary
Related to #877.
This PR implements the MLA KV-cache store. It passes a correctness test for ckv_dim=512 and kpe_dim=64, but no performance testing has been done yet. I sincerely hope somebody familiar with CUDA can help improve performance.