v0.7: LoRA Merging (linear, TIES, DARE) per request
🎉 Enhancements
- Merge multiple LoRA adapters per request (linear, TIES, DARE) by @tgaddair in #212 (see the merge sketch after this list)
- Added EETQ quantization by @flozi00 in #195
- Added HQQ JIT quantization by @flozi00 in #147
- Added Bloom dynamic adapter loading by @tgaddair in #187
- Added pbase adapter_source and exposed api_token in the client by @tgaddair in #181 (usage sketch after this list)
- Added Cloudflare R2 source by @llama-shepard in #198
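
The merge strategies in #212 combine per-layer LoRA delta weights (ΔW = B · A) before they are applied to the base model. Below is a minimal conceptual sketch of the three strategies in PyTorch; the function names and signatures are illustrative assumptions, not LoRAX's internal API.

```python
import torch

def linear_merge(deltas: list[torch.Tensor], weights: list[float]) -> torch.Tensor:
    """Linear: weighted sum of the adapter deltas."""
    return sum(w * d for w, d in zip(weights, deltas))

def ties_merge(deltas: list[torch.Tensor], weights: list[float],
               density: float = 0.5) -> torch.Tensor:
    """TIES: trim low-magnitude values, elect a sign per parameter,
    and average only the values that agree with the elected sign."""
    trimmed = []
    for d in deltas:
        k = max(1, int(density * d.numel()))
        # The magnitude of the k-th largest entry is the keep threshold.
        threshold = d.abs().flatten().kthvalue(d.numel() - k + 1).values
        trimmed.append(torch.where(d.abs() >= threshold, d, torch.zeros_like(d)))
    stacked = torch.stack([w * t for w, t in zip(weights, trimmed)])
    sign = stacked.sum(dim=0).sign()  # majority sign per parameter
    agree = torch.where(stacked.sign() == sign, stacked, torch.zeros_like(stacked))
    counts = (agree != 0).sum(dim=0).clamp(min=1)
    return agree.sum(dim=0) / counts

def dare_merge(deltas: list[torch.Tensor], weights: list[float],
               drop_rate: float = 0.9) -> torch.Tensor:
    """DARE: randomly drop a fraction of each delta, rescale the
    survivors by 1 / (1 - drop_rate), then take the weighted sum."""
    merged = torch.zeros_like(deltas[0])
    for w, d in zip(weights, deltas):
        mask = (torch.rand_like(d) >= drop_rate).to(d.dtype)
        merged += w * (d * mask) / (1.0 - drop_rate)
    return merged
```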
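And a hedged usage sketch for the client changes in #181, assuming the lorax Python client's generate() accepts adapter_source and api_token parameters as the PR title suggests; the adapter name and endpoint below are hypothetical, so check the client docs for exact signatures.

```python
from lorax import Client  # pip install lorax-client

client = Client("http://127.0.0.1:8080")  # endpoint URL is illustrative

response = client.generate(
    "What is the capital of France?",
    adapter_id="my-org/my-adapter",      # hypothetical adapter name
    adapter_source="pbase",              # new source added in #181
    api_token="<PREDIBASE_API_TOKEN>",   # token pass-through from #181
)
print(response.generated_text)
```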
🐛 Bugfixes
- Fixed Phi for new HF format by @tgaddair in #192
- Fixed OpenAI stream response data by @tgaddair in #193
- Fixed OpenAI response format by @tgaddair in #184
- Fixed RoPE and YaRN scaling by @tgaddair in #202 (see the sketch after this list)
- Check for the base model earlier in the adapter function by @noyoshi in #196
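
For context on the RoPE fix in #202: rotary embeddings compute per-dimension rotation angles from inverse frequencies, and scaling strategies such as linear interpolation (and YaRN's frequency-dependent variant) stretch the usable context window. A minimal conceptual sketch of linear scaling, not the repo's kernel code:

```python
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0,
                scaling_factor: float = 1.0) -> torch.Tensor:
    """Rotation angles of shape (len(positions), dim // 2).

    Linear scaling divides positions by scaling_factor to interpolate
    a longer context into the trained range; YaRN instead rescales the
    frequencies non-uniformly (not shown here).
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    return torch.outer(positions.float() / scaling_factor, inv_freq)
```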
🔧 Maintenance
- Upgraded to pytorch==2.2.0 by @tgaddair in #217
- Upgraded the exllama kernel by @flozi00 in #209
- Added a model cache to avoid running out of storage by @magdyksaleh in #201 (eviction sketch below)
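
One way such a cache can avoid exhausting disk is least-recently-used eviction over downloaded model directories. The helper below is a hypothetical illustration of that idea, not the implementation in #201:

```python
import shutil
from pathlib import Path

def dir_size(d: Path) -> int:
    """Total size in bytes of all files under a directory."""
    return sum(f.stat().st_size for f in d.rglob("*") if f.is_file())

def evict_lru_models(cache_dir: Path, max_bytes: int) -> None:
    """Delete least-recently-used model directories until the cache
    fits under max_bytes."""
    dirs = sorted((d for d in cache_dir.iterdir() if d.is_dir()),
                  key=lambda d: d.stat().st_atime)  # oldest access first
    total = sum(dir_size(d) for d in dirs)
    for d in dirs:
        if total <= max_bytes:
            break
        total -= dir_size(d)
        shutil.rmtree(d)
```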
New Contributors
- @llama-shepard made their first contribution in #198
Full Changelog: v0.6.0...v0.7.0