Commit
Update LayerNorm with Welford online algorithm (#1374)
The Welford online algorithm improves the numerical stability of the variance computation compared to the naive or two-pass algorithms.

Simple test case:

```python
import torch
import torch.nn as nn

B, D = 1, 5
x = torch.tensor([[1e15, 1e15 + 1, 1e15 + 2, 1e15 + 3, 1e15 + 4]], dtype=torch.float32).to("xpu")
layernorm = nn.LayerNorm(D, elementwise_affine=False)
y = layernorm(x)
print("LayerNorm Output:\n", y)
```

Output now (Welford):

```python
tensor([[0., 0., 0., 0., 0.]], device='xpu:0')
```

Output before (two-pass):

```python
tensor([[nan, nan, nan, nan, nan]], device='xpu:0')
```

---------

Co-authored-by: Yu, Guangye <[email protected]>
Co-authored-by: Yutao Xu <[email protected]>
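For readers unfamiliar with the technique, the following is a minimal scalar sketch of Welford's single-pass update in plain Python. It is not the XPU kernel changed by this commit; the function name `welford_mean_var` and the choice of population (biased) variance are assumptions made for illustration only.

```python
def welford_mean_var(values):
    """Return (mean, population variance) computed in a single pass.

    Illustrative sketch only; `welford_mean_var` is a hypothetical helper,
    not a function from PyTorch or from this commit.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean           # deviation from the old mean
        mean += delta / count      # update the running mean
        m2 += delta * (x - mean)   # old deviation times new deviation
    if count == 0:
        raise ValueError("need at least one value")
    return mean, m2 / count


# Large offset with small spread, computed in Python floats (float64).
data = [1e15 + i for i in range(5)]
mean, var = welford_mean_var(data)
print(mean, var)
```

Each update works with deviations from the running mean rather than with raw squared values, so the accumulator `m2` stays on the scale of the spread of the data rather than on the scale of its magnitude.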