v0.10.0: Speculative decoding adapters and SGMV + BGMV
🎉 Enhancements
- Added support for Medusa speculative decoding adapters by @tgaddair in #372
- Added Medusa adapters per request by @tgaddair in #454
- Support jointly trained Medusa + LoRA adapters by @tgaddair in #482
- Adds prompt lookup decoding (n-gram speculation) by @tgaddair in #375
- Use SGMV for prefill, BGMV for decode by @tgaddair in #464
- Added phi3 by @tgaddair in #445
- Added support for C4AI Command-R (cohere) by @tgaddair in #411
- Add DBRX by @tgaddair in #423
- Refactor adapter interface to support adapters other than LoRA (e.g., speculative decoding) by @tgaddair in #359
- Initializing server with an adapter sets it as the default by @tgaddair in #370
- Implement Seed Parameter Support for OpenAI-Compatible API Endpoints by @GirinMan in #374 (see the second sketch after this list)
- lorax launcher now has `--default-adapter-source` by @noyoshi in #419
- enh: Make client's handling of error responses more robust and user-friendly by @jeffreyftang in #418
- Support both medusa v1 and v2 by @tgaddair in #421
- Use default HF Hub token when checking for base model info by @noyoshi in #428
- Added `adapter_source` and `api_token` to the completions API by @tgaddair in #446 (see the sketch below)
- Increase max stop sequences by @tgaddair in #453
- Support LORAX_USE_GLOBAL_HF_TOKEN by @tgaddair in #462
- Allow setting `temperature=0` by @tgaddair in #467
- Merge medusa segments by @tgaddair in #471
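
Two of the enhancements above are easiest to see from the client side. First, a minimal sketch of per-request adapter loading with the Python client, exercising the new `adapter_source` and `api_token` request fields (#446) and greedy decoding via `temperature=0` (#467). The server URL, adapter name, and token are placeholders, and the same `adapter_id` field also accepts per-request Medusa speculative-decoding adapters (#454):

```python
from lorax import Client

# Assumes a LoRAX server running locally; adapter ID and token are placeholders.
client = Client("http://127.0.0.1:8080")

response = client.generate(
    "Write a haiku about fast inference.",
    adapter_id="some-org/some-lora-adapter",  # hypothetical adapter name
    adapter_source="hub",                     # pull the adapter from the HF Hub
    api_token="hf_...",                       # token for private adapters (#443)
    max_new_tokens=64,
    temperature=0,                            # now valid: greedy decoding (#467)
)
print(response.generated_text)
```

Second, the OpenAI-compatible endpoints now honor the `seed` parameter (#374). A sketch using the `openai` client pointed at a local LoRAX deployment; the base URL and model name are assumptions:

```python
from openai import OpenAI

# LoRAX exposes an OpenAI-compatible API; assumes a local deployment at /v1.
client = OpenAI(api_key="EMPTY", base_url="http://127.0.0.1:8080/v1")

resp = client.chat.completions.create(
    model="some-org/some-lora-adapter",  # hypothetical adapter ID for this request
    messages=[{"role": "user", "content": "Say hello."}],
    seed=42,  # reproducible sampling across requests (#374)
)
print(resp.choices[0].message.content)
```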
🐛 Bugfixes
- Fix CUDA compile when using long sequence lengths by @tgaddair in #363
- Fix CUDA graph compile with speculative decoding by @tgaddair in #381
- Fix mixtral for speculative decoding by @tgaddair in #382
- Fix import of EntryNotFoundError by @tgaddair in #401
- Fix warmup when using speculative decoding by @tgaddair in #402
- fix: assign bias directly by @thincal in #398
- fix: Enable ignoring botocore ClientError during download_file by @jeffreyftang in #404
- Fix Pydantic v2 `adapter_id` and `merged_adapters` validation by @claudioMontanari in #408
- fix: Suppress pydantic warning over model_id field in DeployedModel by @jeffreyftang in #409
- Fix phi by @noyoshi in #410
- fix: Missing / in pbase endpoint by @jeffreyftang in #415
- Print correct number of key-value heads on dimension assertion by @dstripelis in #414
- Fix request variable by @Infernaught in #416
- fix: Rename _get_slice to get_slice by @tgaddair in #424
- fix: Hack for llama3 eos_token_id by @tgaddair in #427
- fix: checking the base_model_name_or_path of adapter_config and early return if null by @thincal in #431
- fix: use logits to calculate alternative tokens by @JTS22 in #425
- Fixed default pbase endpoint url by @tgaddair in #435
- fix: Downloading private adapters from HF by @tgaddair in #443
- Fix Outlines compatibility with speculative decoding by @tgaddair in #447
- fix: Handle edge case where allowed tokens are out of bounds by @tgaddair in #449
- Fix special tokens showing up in the response by @tgaddair in #450
- Fix Medusa + LoRA by @tgaddair in #455
- Ensure Llama 3 stops on all EOS tokens by @arnavgarg1 in #456
- Reuse session per class instance by @gyanesh-mishra in #468
📝 Docs
- Fix chat completion and docs by @GirinMan in #358
- Added batch processing example by @tgaddair in #386
- Medusa docs by @tgaddair in #459
- Updated supported base models in docs by @arnavgarg1 in #458
- Docs for private HF models by @tgaddair in #460
- Auth header docs by @tgaddair in #461
🔧 Maintenance
- Add CNAME file for Docs by @martindavis in #364
- Update tagging logic and add flake8 linter by @magdyksaleh in #365
- Apply black formatting by @tgaddair in #376
- Switch formatting and linting to ruff by @tgaddair in #378
- Style: change line length to 120 and enforce import sort order by @tgaddair in #383
- Bump pydantic version to >2, <3 by @claudioMontanari in #405
- refactor: set config into weights for quantization feature support more easily by @thincal in #400
- Update Predibase integration to support v2 API by @jeffreyftang in #403
- logging by @magdyksaleh in #436
- revert by @magdyksaleh in #437
- Upgrade to CUDA 12.1 and PyTorch 2.3.0 by @tgaddair in #472
- int: Bump Lorax Client to 3.9 by @gyanesh-mishra in #486
- Bump lorax client v0.6.0 by @tgaddair in #488
New Contributors
- @GirinMan made their first contribution in #358
- @martindavis made their first contribution in #364
- @thincal made their first contribution in #398
- @claudioMontanari made their first contribution in #405
- @dstripelis made their first contribution in #414
Full Changelog: v0.9.0...v0.10.0