lgyStoic
🎯 Focusing
  • Shenzhen

Starred repositories

Analyze computation-communication overlap in V3/R1.

898 116 Updated Mar 3, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 4,854 474 Updated Mar 8, 2025

flash attention tutorial written in python, triton, cuda, cutlass

Cuda 294 31 Updated Jan 3, 2025

FlashMLA: Efficient MLA decoding kernels

C++ 11,219 784 Updated Mar 1, 2025

Accessible large language models via k-bit quantization for PyTorch.

Python 6,771 672 Updated Mar 8, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

6,699 195 Updated Mar 4, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,771 169 Updated Mar 7, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 12,539 827 Updated Mar 7, 2025

An autoregressive character-level language model for making more things

Python 2,903 761 Updated Jun 4, 2024

Clean, minimal, accessible reproduction of DeepSeek R1-Zero

Python 11,041 1,406 Updated Feb 1, 2025

Repository of Jupyter notebook tutorials for teaching the Deep Learning Course at the University of Amsterdam (MSc AI), Fall 2023

Jupyter Notebook 2,738 614 Updated Oct 27, 2024

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Python 140,876 28,226 Updated Mar 8, 2025

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

Python 9,492 541 Updated Sep 7, 2024

LangGPT: Empowering everyone to become a prompt expert! 🚀 Structured Prompt, Language of GPT, structured prompts

Jupyter Notebook 8,411 681 Updated Dec 13, 2024

LLM&VLM Tutorial

Python 1,723 1,581 Updated Feb 6, 2025

Auto-Editor: Efficient media analysis and rendering

Python 3,152 442 Updated Mar 8, 2025

C, C++ and Python Code for Exercises and Solutions

C++ 507 185 Updated Dec 17, 2019
C++ 244 38 Updated Sep 15, 2023

PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.

Python 1,648 245 Updated Mar 28, 2024

Neural Networks: Zero to Hero

Jupyter Notebook 13,382 1,852 Updated Aug 18, 2024

Tile primitives for speedy kernels

Cuda 2,130 123 Updated Mar 9, 2025

Tool for visual profiling Core ML models, compatible with both package and compiled versions, including reasons for unsupported operations on the Neural Engine

Swift 28 1 Updated Jun 18, 2024

High-efficiency floating-point neural network inference operators for mobile, server, and Web

C 1,978 400 Updated Mar 9, 2025

On-device AI across mobile, embedded and edge for PyTorch

C++ 2,577 469 Updated Mar 9, 2025

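This project generates images from text. Based on the open-source model Stable Diffusion V1.5, it produces models that can run on a phone's CPU and NPU, along with the accompanying model runtime framework.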

C++ 143 24 Updated Mar 29, 2024

PainterEngine is an application/game engine with a software renderer. PainterEngine can be ported to any platform that supports C.

C 2,752 297 Updated Feb 20, 2025

Code release for book "Efficient Training in PyTorch"

Python 46 6 Updated Oct 13, 2024

This repository contains the official implementation of the research paper, "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training" CVPR 2024

Python 842 61 Updated Nov 22, 2024

Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.

Python 13,155 2,918 Updated Feb 27, 2025

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda 2,761 285 Updated Mar 4, 2025