Starred repositories
Analyze computation-communication overlap in V3/R1.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Flash Attention tutorial written in Python, Triton, CUDA, and CUTLASS
Accessible large language models via k-bit quantization for PyTorch.
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
An autoregressive character-level language model for making more things
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
Repository of Jupyter notebook tutorials for teaching the Deep Learning Course at the University of Amsterdam (MSc AI), Fall 2023
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
LangGPT: Empowering everyone to become a prompt expert! 🚀 Structured prompts, the language of GPT
Auto-Editor: Efficient media analysis and rendering
C, C++ and Python Code for Exercises and Solutions
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
Neural Networks: Zero to Hero
Tool for visually profiling Core ML models, compatible with both package and compiled versions, including reasons for unsupported operations on the Neural Engine
High-efficiency floating-point neural network inference operators for mobile, server, and Web
On-device AI across mobile, embedded and edge for PyTorch
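This project generates images from text. Based on the open-source model Stable Diffusion V1.5, it produces models that can run on a phone's CPU and NPU, along with the accompanying model runtime framework.
PainterEngine is an application/game engine with a software renderer. It can be ported to any platform that supports C.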
Code release for the book "Efficient Training in PyTorch"
This repository contains the official implementation of the research paper "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training" (CVPR 2024)
Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).