lgyStoic

🎯

Focusing

Garry Ling lgyStoic

focus on arm!

6 followers · 109 following

Shenzhen

Achievements

Lists (2)

✨ Inspiration

🚀 My stack

Starred repositories

deepseek-ai / profile-data

Analyze computation-communication overlap in V3/R1.

898 116 Updated Mar 3, 2025

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 4,854 474 Updated Mar 8, 2025

66RING / tiny-flash-attention

Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS

Cuda 294 31 Updated Jan 3, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient MLA decoding kernels

C++ 11,219 784 Updated Mar 1, 2025

bitsandbytes-foundation / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.

Python 6,771 672 Updated Mar 8, 2025

deepseek-ai / open-infra-index

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

6,699 195 Updated Mar 4, 2025

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,771 169 Updated Mar 7, 2025

kvcache-ai / ktransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 12,539 827 Updated Mar 7, 2025

karpathy / makemore

An autoregressive character-level language model for making more things

Python 2,903 761 Updated Jun 4, 2024

Jiayi-Pan / TinyZero

Clean, minimal, accessible reproduction of DeepSeek R1-Zero

Python 11,041 1,406 Updated Feb 1, 2025

phlippe / uvadlc_notebooks

Repository of Jupyter notebook tutorials for teaching the Deep Learning Course at the University of Amsterdam (MSc AI), Fall 2023

Jupyter Notebook 2,738 614 Updated Oct 27, 2024

huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Python 140,876 28,226 Updated Mar 8, 2025

bigscience-workshop / petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

Python 9,492 541 Updated Sep 7, 2024

langgptai / LangGPT

LangGPT: Empowering everyone to become a prompt expert! 🚀 Structured Prompt, Language of GPT, structured prompts

Jupyter Notebook 8,411 681 Updated Dec 13, 2024

InternLM / Tutorial

LLM&VLM Tutorial

Python 1,723 1,581 Updated Feb 6, 2025

WyattBlue / auto-editor

Auto-Editor: Efficient media analysis and rendering

Python 3,152 442 Updated Mar 8, 2025

HandsOnOpenCL / Exercises-Solutions

C, C++ and Python Code for Exercises and Solutions

C++ 507 185 Updated Dec 17, 2019

MegEngine / MegPeak

C++ 244 38 Updated Sep 15, 2023

OpenPPL / ppq

PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.

Python 1,648 245 Updated Mar 28, 2024

karpathy / nn-zero-to-hero

Neural Networks: Zero to Hero

Jupyter Notebook 13,382 1,852 Updated Aug 18, 2024

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda 2,130 123 Updated Mar 9, 2025

fguzman82 / CoreMLProfiler

Tool for visual profiling Core ML models, compatible with both package and compiled versions, including reasons for unsupported operations on the Neural Engine

Swift 28 1 Updated Jun 18, 2024

google / XNNPACK

High-efficiency floating-point neural network inference operators for mobile, server, and Web

C 1,978 400 Updated Mar 9, 2025

pytorch / executorch

On-device AI across mobile, embedded and edge for PyTorch

C++ 2,577 469 Updated Mar 9, 2025

XiaoMi / StableDiffusionOnDevice

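This project generates images from text. Based on the open-source Stable Diffusion V1.5 model, it produces models that can run on a phone's CPU and NPU, and includes the accompanying model runtime framework.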

C++ 143 24 Updated Mar 29, 2024

matrixcascade / PainterEngine

PainterEngine is an application/game engine with a software renderer; it can be ported to any platform that supports C

C 2,752 297 Updated Feb 20, 2025

ailzhang / EfficientPyTorch

Code release for book "Efficient Training in PyTorch"

Python 46 6 Updated Oct 13, 2024

apple / ml-mobileclip

This repository contains the official implementation of the research paper, "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training" CVPR 2024

Python 842 61 Updated Nov 22, 2024

PaddlePaddle / PaddleDetection

Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.

Python 13,155 2,918 Updated Feb 27, 2025

DefTruth / CUDA-Learn-Notes

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda 2,761 285 Updated Mar 4, 2025

Starred topics

slam-algorithms

Raspberry Pi

superresolution