Run Large Language Models locally. Better than edge computing: it's already in your browser 💪
- 💰 No Fees: No keys, no costs, no quotas
- 🏎️ Fast Inference: Runs on WASM with WebGPU acceleration
- 🔒 Privacy First: Pure client-side processing
- 🏕️ Offline Ready: Download model once, use anywhere
- 🔄 Streaming: Token-by-token output with minimal latency (see the sketch after this list)
- 📱 Device Agnostic: Just needs a modern browser with sufficient memory for the model
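To make the streaming claim concrete, here is a minimal sketch using transformers.js (assuming the v3 package name, `@huggingface/transformers`; the model id is a placeholder for whatever the app actually ships):

```js
import { pipeline, TextStreamer } from '@huggingface/transformers';

// Placeholder model id; any text-generation model with ONNX weights works.
const generator = await pipeline('text-generation', 'Xenova/distilgpt2');

// Forward each token as soon as it is decoded,
// instead of waiting for the full completion.
const streamer = new TextStreamer(generator.tokenizer, {
  skip_prompt: true,
  callback_function: (token) => console.log(token),
});

await generator('Why is the sky blue?', { max_new_tokens: 64, streamer });
```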
The application is built with vanilla JavaScript on top of emerging web standards and a few focused libraries:
- WebAssembly (WASM): Core runtime for model inference
- WebGPU: Hardware acceleration for supported devices
- Web Workers: Offloads model inference off the main thread to prevent UI blocking (wiring sketched after this list)
- transformers.js: Runs transformer models directly in the browser
- onnxruntime-web: Optimized inference engine
- Model Loading: LRU cache of at most 3 resident models, with quantization fallback from 4-bit to 8-bit (cache sketch after this list)
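The worker split looks roughly like this; the file names and message shapes are illustrative, not the app's actual protocol. The model loads and runs entirely inside a module worker, and the page only exchanges messages with it:

```js
// main.js: the UI thread never touches the model, so it never blocks.
const worker = new Worker(new URL('./llm-worker.js', import.meta.url), { type: 'module' });

worker.onmessage = ({ data }) => {
  document.querySelector('#output').textContent = data.text;
};

worker.postMessage({ prompt: 'Explain WebGPU in one sentence.' });
```

```js
// llm-worker.js: inference happens entirely off the UI thread.
import { pipeline } from '@huggingface/transformers';

const generator = await pipeline('text-generation', 'Xenova/distilgpt2');

self.onmessage = async ({ data }) => {
  const [result] = await generator(data.prompt, { max_new_tokens: 128 });
  self.postMessage({ text: result.generated_text });
};
```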
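And a hedged sketch of the loading policy above; the `getModel` helper is an assumption for illustration, not the app's real API. At most three pipelines stay resident, the least recently used one is evicted, and 4-bit weights fall back to 8-bit when loading fails:

```js
import { pipeline } from '@huggingface/transformers';

const MAX_MODELS = 3;
const cache = new Map(); // insertion order doubles as recency order

// Hypothetical helper: return a cached pipeline or load a new one.
async function getModel(modelId, device = 'wasm') {
  if (cache.has(modelId)) {
    const model = cache.get(modelId);
    cache.delete(modelId); // re-insert so this entry becomes most recent
    cache.set(modelId, model);
    return model;
  }

  let model;
  try {
    model = await pipeline('text-generation', modelId, { device, dtype: 'q4' });
  } catch {
    // 4-bit weights missing or failed to load: retry with 8-bit
    model = await pipeline('text-generation', modelId, { device, dtype: 'q8' });
  }

  if (cache.size >= MAX_MODELS) {
    const lru = cache.keys().next().value; // oldest entry = least recently used
    await cache.get(lru)?.dispose?.();     // free backend memory if supported
    cache.delete(lru);
  }

  cache.set(modelId, model);
  return model;
}
```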
| Feature | Chrome | Firefox | Safari | Edge |
| ------- | ------ | ------- | ------ | ---- |
| WASM    | ✅     | ✅      | ✅     | ✅   |
| WebGPU  | ✅     | 🚧      | 🚧     | ✅   |

🚧 = partial: experimental or behind a flag.
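Since WebGPU support is still uneven, the safe pattern is to feature-detect and fall back to plain WASM. A minimal sketch, assuming transformers.js's `device` option:

```js
import { pipeline } from '@huggingface/transformers';

// Prefer WebGPU when the browser exposes a working adapter.
async function pickDevice() {
  if (navigator.gpu) {
    try {
      if (await navigator.gpu.requestAdapter()) return 'webgpu';
    } catch {
      // adapter request failed: fall through to WASM
    }
  }
  return 'wasm';
}

const device = await pickDevice();
const generator = await pipeline('text-generation', 'Xenova/distilgpt2', { device });
```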