Hey,

Thanks for your great work! I have a question about the BreadCrumbs sparsification implementation in mergekit/mergekit/sparsify.py, lines 61 to 100 at commit 57e7d14:
"""Masks out smallest values in addition to large outliers.
The `gamma` proportion of the largest weights are first removed, then the
smallest weights are removed to achieve the desired density.
Args:
tensor (torch.Tensor): The tensor to sparsify.
density (float): The proportion of weights to retain.
gamma (float): Percent of largest weights to remove.
"""
ifdensity>=1:
returntensor
num_elems=tensor.numel()
target_n=int(density*num_elems)
n_top=int(gamma*num_elems)
n_bot=num_elems-target_n-n_top
ifn_bot<0:
# cut down on the number of large weights to remove in
# order to hit the target density
n_top+=n_bot
n_bot=0
w=tensor.abs().view(-1)
ifw.device.type=="cpu":
w=w.float()
indices=torch.sort(w, descending=False).indices
mask=torch.zeros_like(tensor)
mask.view(-1)[indices[n_bot:-n_top]] =1
ifrescale:
res=rescale_sum(tensor, mask)
else:
res=tensor*mask
returnres
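For context, `rescale_sum` is a helper defined earlier in the same file; a rough sketch of it (approximate, see the file itself for the exact version) plus a small usage example:

# Sketch of the rescale_sum helper from earlier in sparsify.py (approximate):
# rescales the surviving weights so the masked tensor's total absolute sum
# matches the original tensor's.
def rescale_sum(tensor: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    masked = tensor * mask
    org_sum = tensor.abs().sum()
    new_sum = masked.abs().sum()
    if org_sum >= 1e-8 and new_sum >= 1e-8:
        masked = masked * (org_sum / new_sum)
    return masked


# Worked numbers: with density=0.5 and gamma=0.01 on 4096 elements,
# target_n=2048, n_top=40 outliers are dropped, n_bot=2008 smallest are
# dropped, and the middle 2048 (= 50%) survive.
t = torch.randn(4096)
sparse = magnitude_outliers(t, density=0.5, rescale=False, gamma=0.01)
print((sparse != 0).float().mean().item())  # -> 0.5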
From the Model-Breadcrumbs paper, the top-beta and bottom-gamma pruning appears to be done per layer, independently, within a task vector. In your toolkit's implementation, however, it looks like the pruning is applied globally across all layers of a task vector. Wouldn't that prune incorrectly (relative to what the paper describes) when the weight statistics differ substantially across layers? A toy sketch of the concern follows below.
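To make the concern concrete, here is a toy sketch (hypothetical names and shapes, not code from the repo): when one layer's weights have a much larger typical magnitude, a single global magnitude threshold keeps mostly weights from that layer, while per-layer thresholding keeps the target fraction in each.

# Toy illustration of per-layer vs. global magnitude pruning. All names and
# shapes here are hypothetical.
import torch

torch.manual_seed(0)
layer_a = torch.randn(1000)          # typical magnitude ~1
layer_b = torch.randn(1000) * 10.0   # typical magnitude ~10


def keep_mask(w: torch.Tensor, density: float) -> torch.Tensor:
    """Keep the top-`density` fraction of weights by magnitude."""
    k = int(density * w.numel())
    idx = w.abs().topk(k).indices
    mask = torch.zeros_like(w)
    mask[idx] = 1
    return mask


density = 0.5

# Per-layer pruning: each layer keeps 50% of its own weights.
mask_a = keep_mask(layer_a, density)
mask_b = keep_mask(layer_b, density)

# Global pruning: one threshold over the concatenated task vector.
joint = torch.cat([layer_a, layer_b])
mask_joint = keep_mask(joint, density)
kept_a_global = int(mask_joint[:1000].sum())
kept_b_global = int(mask_joint[1000:].sum())

print(f"per-layer: kept {int(mask_a.sum())} in A, {int(mask_b.sum())} in B")
print(f"global:    kept {kept_a_global} in A, {kept_b_global} in B")
# Globally, most survivors come from layer B, since its magnitudes dominate.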
Please correct me if I am misunderstanding something. Thanks