v0.7: LoRA Merging (linear, TIES, DARE) per request
🎉 Enhancements
- Merge multiple LoRA adapters per request (linear, TIES, DARE) by @tgaddair in #212 (see the merge sketch after this list)
- Added EETQ quantization by @flozi00 in #195
- Added HQQ JIT quantization by @flozi00 in #147
- Added Bloom dynamic adapter loading by @tgaddair in #187
- Added pbase adapter_source and exposed api_token in the client by @tgaddair in #181 (usage sketch after this list)
- Added Cloudflare R2 source by @llama-shepard in #198
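
The merge strategies in #212 combine per-layer LoRA delta weights (ΔW = B · A) before they are applied to the base model. Below is a minimal conceptual sketch of the three strategies in PyTorch; the function names and signatures are illustrative assumptions, not LoRAX's internal API.

```python
import torch

def linear_merge(deltas: list[torch.Tensor], weights: list[float]) -> torch.Tensor:
    """Linear: weighted sum of the adapter deltas."""
    return sum(w * d for w, d in zip(weights, deltas))

def ties_merge(deltas: list[torch.Tensor], weights: list[float],
               density: float = 0.5) -> torch.Tensor:
    """TIES: trim low-magnitude values, elect a sign per parameter,
    and average only the values that agree with the elected sign."""
    trimmed = []
    for d in deltas:
        k = max(1, int(density * d.numel()))
        # The magnitude of the k-th largest entry is the keep threshold.
        threshold = d.abs().flatten().kthvalue(d.numel() - k + 1).values
        trimmed.append(torch.where(d.abs() >= threshold, d, torch.zeros_like(d)))
    stacked = torch.stack([w * t for w, t in zip(weights, trimmed)])
    sign = stacked.sum(dim=0).sign()  # majority sign per parameter
    agree = torch.where(stacked.sign() == sign, stacked, torch.zeros_like(stacked))
    counts = (agree != 0).sum(dim=0).clamp(min=1)
    return agree.sum(dim=0) / counts

def dare_merge(deltas: list[torch.Tensor], weights: list[float],
               drop_rate: float = 0.9) -> torch.Tensor:
    """DARE: randomly drop a fraction of each delta, rescale the
    survivors by 1 / (1 - drop_rate), then take the weighted sum."""
    merged = torch.zeros_like(deltas[0])
    for w, d in zip(weights, deltas):
        mask = (torch.rand_like(d) >= drop_rate).to(d.dtype)
        merged += w * (d * mask) / (1.0 - drop_rate)
    return merged
```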
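And a hedged usage sketch for the client changes in #181, assuming the lorax Python client's generate() accepts adapter_source and api_token parameters as the PR title suggests; the adapter name and endpoint below are hypothetical, so check the client docs for exact signatures.

```python
from lorax import Client  # pip install lorax-client

client = Client("http://127.0.0.1:8080")  # endpoint URL is illustrative

response = client.generate(
    "What is the capital of France?",
    adapter_id="my-org/my-adapter",      # hypothetical adapter name
    adapter_source="pbase",              # new source added in #181
    api_token="<PREDIBASE_API_TOKEN>",   # token pass-through from #181
)
print(response.generated_text)
```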
🐛 Bugfixes
- Fixed Phi for new HF format by @tgaddair in #192
- Fixed OpenAI stream response data by @tgaddair in #193
- Fixed OpenAI response format by @tgaddair in #184
- Fixed RoPE and YaRN scaling by @tgaddair in #202 (see the sketch after this list)
- Check for the base model earlier in the adapter function by @noyoshi in #196
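
For context on the RoPE fix in #202: rotary embeddings compute per-dimension rotation angles from inverse frequencies, and scaling strategies such as linear interpolation (and YaRN's frequency-dependent variant) stretch the usable context window. A minimal conceptual sketch of linear scaling, not the repo's kernel code:

```python
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0,
                scaling_factor: float = 1.0) -> torch.Tensor:
    """Rotation angles of shape (len(positions), dim // 2).

    Linear scaling divides positions by scaling_factor to interpolate
    a longer context into the trained range; YaRN instead rescales the
    frequencies non-uniformly (not shown here).
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    return torch.outer(positions.float() / scaling_factor, inv_freq)
```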
🔧 Maintenance
- Upgraded to pytorch==2.2.0 by @tgaddair in #217
- Upgraded the exllama kernel by @flozi00 in #209
- Added a model cache to avoid running out of storage by @magdyksaleh in #201 (eviction sketch below)
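
One way such a cache can avoid exhausting disk is least-recently-used eviction over downloaded model directories. The helper below is a hypothetical illustration of that idea, not the implementation in #201:

```python
import shutil
from pathlib import Path

def dir_size(d: Path) -> int:
    """Total size in bytes of all files under a directory."""
    return sum(f.stat().st_size for f in d.rglob("*") if f.is_file())

def evict_lru_models(cache_dir: Path, max_bytes: int) -> None:
    """Delete least-recently-used model directories until the cache
    fits under max_bytes."""
    dirs = sorted((d for d in cache_dir.iterdir() if d.is_dir()),
                  key=lambda d: d.stat().st_atime)  # oldest access first
    total = sum(dir_size(d) for d in dirs)
    for d in dirs:
        if total <= max_bytes:
            break
        total -= dir_size(d)
        shutil.rmtree(d)
```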
New Contributors
- @llama-shepard made their first contribution in #198
Full Changelog: v0.6.0...v0.7.0