About Model-Breadcrumbs merge implementation #455

Open
vishaal27 opened this issue Nov 9, 2024 · 0 comments

@vishaal27
Hey,

Thanks for your great work! I have a question about the BreadCrumbs sparsification implementation in:

```python
def magnitude_outliers(
    tensor: torch.Tensor, density: float, rescale: bool, gamma: float = 0.01
):
    """Masks out smallest values in addition to large outliers.

    The `gamma` proportion of the largest weights are first removed, then the
    smallest weights are removed to achieve the desired density.

    Args:
        tensor (torch.Tensor): The tensor to sparsify.
        density (float): The proportion of weights to retain.
        gamma (float): Percent of largest weights to remove.
    """
    if density >= 1:
        return tensor
    num_elems = tensor.numel()
    target_n = int(density * num_elems)
    n_top = int(gamma * num_elems)
    n_bot = num_elems - target_n - n_top
    if n_bot < 0:
        # cut down on the number of large weights to remove in
        # order to hit the target density
        n_top += n_bot
        n_bot = 0
    w = tensor.abs().view(-1)
    if w.device.type == "cpu":
        w = w.float()
    indices = torch.sort(w, descending=False).indices
    mask = torch.zeros_like(tensor)
    mask.view(-1)[indices[n_bot:-n_top]] = 1
    if rescale:
        res = rescale_sum(tensor, mask)
    else:
        res = tensor * mask
    return res
```
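
For reference, here is a quick usage sketch of the function on a single tensor (my own example, not from the repo); `rescale=False` sidesteps the `rescale_sum` helper defined elsewhere in the codebase:

```python
import torch

# Usage sketch: assumes the magnitude_outliers function quoted above is in scope.
t = torch.randn(10, 10)  # 100 elements
sparse = magnitude_outliers(t, density=0.5, rescale=False, gamma=0.01)

# With density=0.5 and gamma=0.01: n_top = 1 largest-magnitude weight and
# n_bot = 49 smallest-magnitude weights are zeroed; the middle 50 survive.
print((sparse != 0).sum().item())  # 50 (barring exact zeros/ties in randn)
```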

From the Model-Breadcrumbs paper, they appear to apply the top-beta and bottom-gamma pruning per layer, independently within each task vector. In your toolkit's implementation, however, the top-beta and bottom-gamma pruning seems to be done globally across all layers of a task vector. Wouldn't this potentially prune incorrectly (relative to what the paper describes) if the per-layer weight statistics differ substantially across layers?
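
To make the distinction concrete, here is a minimal sketch of the two schemes using the `magnitude_outliers` function quoted above. The layer names, shapes, and scales are illustrative, not taken from the paper or the toolkit:

```python
import torch

# Illustrative task vector: two layers with very different weight scales.
task_vector = {
    "layer1.weight": torch.randn(64, 64),        # small-magnitude deltas
    "layer2.weight": torch.randn(64, 64) * 100,  # much larger deltas
}

# Per-layer pruning (my reading of the paper): each tensor is sparsified
# independently, so the top/bottom cutoffs come from that layer's own stats.
per_layer = {
    name: magnitude_outliers(delta, density=0.5, rescale=False, gamma=0.01)
    for name, delta in task_vector.items()
}

# Global pruning (what the implementation seems to do): concatenate all
# layers, prune once with shared cutoffs, then split back. layer2's larger
# magnitudes dominate both thresholds, so layer1 is almost entirely pruned.
flat = torch.cat([delta.view(-1) for delta in task_vector.values()])
flat_sparse = magnitude_outliers(flat, density=0.5, rescale=False, gamma=0.01)

kept_layer1 = (flat_sparse[: 64 * 64] != 0).float().mean().item()
print(f"fraction of layer1 kept under global pruning: {kept_layer1:.2f}")  # ~0.02
```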

Please correct me if I am misunderstanding something. Thanks!
