Stars
Asterinas is a secure, fast, and general-purpose OS kernel, written in Rust and providing a Linux-compatible ABI.
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Large Language Model (LLM) Systems Paper List
Reverse Engineering: Decompiling Binary Code with Large Language Models
CPU inference for the DeepSeek family of large language models in pure C++
FlashInfer: Kernel Library for LLM Serving
Curated collection of papers in MoE model inference
JAX bindings for the flash-attention3 kernels
Fast and memory-efficient exact attention
Custom Linux scheduler for concurrency fuzzing, written in Java with hello-ebpf
FlagGems is an operator library for large language models implemented in Triton Language.
My learning notes and code for ML systems (ML SYS).
Perceptual video quality assessment based on multi-method fusion.
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
Course materials for MIT6.5940: TinyML and Efficient Deep Learning Computing
GitHub page for "Large Language Model-Brained GUI Agents: A Survey"
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
Port of OpenAI's Whisper model in C/C++
Building blocks for foundation models.
A visualized debugging framework to aid in understanding the Linux kernel.