Dynamic scaling (Thanks ajinkia 😄 ) (#746)
magdyksaleh authored Jan 22, 2025
1 parent 223c554 commit 05b2d63
Showing 1 changed file with 1 addition and 1 deletion.
1 changed file: server/lorax_server/layers/fp8.py (1 addition, 1 deletion)

@@ -14,7 +14,7 @@ def apply_fp8_linear(
     input_scale_ub: Optional[torch.Tensor] = None,
     qbias: Optional[torch.Tensor] = None,
 ) -> torch.Tensor:
-    qinput, x_scale = ops.scaled_fp8_quant(input, input_scale, scale_ub=input_scale_ub, use_per_token_if_dynamic=False)
+    qinput, x_scale = ops.scaled_fp8_quant(input, input_scale, scale_ub=input_scale_ub, use_per_token_if_dynamic=True)

     output = ops.cutlass_scaled_mm(
         qinput, qweight, out_dtype=input.dtype, scale_a=x_scale, scale_b=weight_scale, bias=qbias
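The one-flag change above switches `ops.scaled_fp8_quant` from computing a single dynamic scale for the whole activation tensor to computing one scale per token (row). The intuition: with per-tensor scaling, a single outlier token inflates the shared scale and coarsens quantization for every other token; per-token scaling lets each row use its own range. Below is a minimal NumPy sketch of that difference, not the actual kernel: it uses the FP8 E4M3 max magnitude (448) and uniform rounding as a simplified stand-in for real FP8 rounding, and the function name `quant_dequant` is invented for illustration.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # max representable magnitude in float8 E4M3

def quant_dequant(x: np.ndarray, per_token: bool) -> np.ndarray:
    """Simulate dynamic quantization (scale derived from the data itself),
    then dequantize, so the rounding error of each scheme can be compared.
    Uniform rounding is a simplification; real FP8 has non-uniform steps."""
    if per_token:
        # one scale per row (token): an outlier row doesn't hurt the others
        amax = np.abs(x).max(axis=1, keepdims=True)
    else:
        # one scale shared by the whole tensor
        amax = np.abs(x).max()
    scale = amax / FP8_E4M3_MAX
    q = np.clip(np.round(x / scale), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q * scale

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8)).astype(np.float32)
x[0] *= 100.0  # one "outlier" token with much larger activations

err_tensor = np.abs(quant_dequant(x, per_token=False) - x).mean()
err_token = np.abs(quant_dequant(x, per_token=True) - x).mean()
assert err_token < err_tensor  # per-token scaling tracks each row's range
```

The trade-off is extra work (an `amax` reduction per row instead of per tensor) and a vector of scales fed to the scaled matmul (`scale_a` in `cutlass_scaled_mm`), in exchange for noticeably better accuracy when token magnitudes vary.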
