Hey,

Thanks for your great work! I have a question about the BreadCrumbs sparsification implementation in mergekit/mergekit/sparsify.py, lines 61 to 100 at commit 57e7d14:
"""Masks out smallest values in addition to large outliers.
The `gamma` proportion of the largest weights are first removed, then the
smallest weights are removed to achieve the desired density.
Args:
tensor (torch.Tensor): The tensor to sparsify.
density (float): The proportion of weights to retain.
gamma (float): Percent of largest weights to remove.
"""
ifdensity>=1:
returntensor
num_elems=tensor.numel()
target_n=int(density*num_elems)
n_top=int(gamma*num_elems)
n_bot=num_elems-target_n-n_top
ifn_bot<0:
# cut down on the number of large weights to remove in
# order to hit the target density
n_top+=n_bot
n_bot=0
w=tensor.abs().view(-1)
ifw.device.type=="cpu":
w=w.float()
indices=torch.sort(w, descending=False).indices
mask=torch.zeros_like(tensor)
mask.view(-1)[indices[n_bot:-n_top]] =1
ifrescale:
res=rescale_sum(tensor, mask)
else:
res=tensor*mask
returnres
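For context, `rescale_sum` is a helper defined earlier in the same file; a rough sketch of it (approximate, see the file itself for the exact version) plus a small usage example:

# Sketch of the rescale_sum helper from earlier in sparsify.py (approximate):
# rescales the surviving weights so the masked tensor's total absolute sum
# matches the original tensor's.
def rescale_sum(tensor: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    masked = tensor * mask
    org_sum = tensor.abs().sum()
    new_sum = masked.abs().sum()
    if org_sum >= 1e-8 and new_sum >= 1e-8:
        masked = masked * (org_sum / new_sum)
    return masked


# Worked numbers: with density=0.5 and gamma=0.01 on 4096 elements,
# target_n=2048, n_top=40 outliers are dropped, n_bot=2008 smallest are
# dropped, and the middle 2048 (= 50%) survive.
t = torch.randn(4096)
sparse = magnitude_outliers(t, density=0.5, rescale=False, gamma=0.01)
print((sparse != 0).float().mean().item())  # -> 0.5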
From the Model-Breadcrumbs paper, the top-beta and bottom-gamma pruning appears to be done per layer, independently, within a task vector. In your toolkit's implementation, however, it looks like the pruning is applied globally across all layers of a task vector. Wouldn't that prune incorrectly (relative to what the paper describes) when the weight statistics differ substantially across layers? A toy sketch of the concern follows below.
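To make the concern concrete, here is a toy sketch (hypothetical names and shapes, not code from the repo): when one layer's weights have a much larger typical magnitude, a single global magnitude threshold keeps mostly weights from that layer, while per-layer thresholding keeps the target fraction in each.

# Toy illustration of per-layer vs. global magnitude pruning. All names and
# shapes here are hypothetical.
import torch

torch.manual_seed(0)
layer_a = torch.randn(1000)          # typical magnitude ~1
layer_b = torch.randn(1000) * 10.0   # typical magnitude ~10


def keep_mask(w: torch.Tensor, density: float) -> torch.Tensor:
    """Keep the top-`density` fraction of weights by magnitude."""
    k = int(density * w.numel())
    idx = w.abs().topk(k).indices
    mask = torch.zeros_like(w)
    mask[idx] = 1
    return mask


density = 0.5

# Per-layer pruning: each layer keeps 50% of its own weights.
mask_a = keep_mask(layer_a, density)
mask_b = keep_mask(layer_b, density)

# Global pruning: one threshold over the concatenated task vector.
joint = torch.cat([layer_a, layer_b])
mask_joint = keep_mask(joint, density)
kept_a_global = int(mask_joint[:1000].sum())
kept_b_global = int(mask_joint[1000:].sum())

print(f"per-layer: kept {int(mask_a.sum())} in A, {int(mask_b.sum())} in B")
print(f"global:    kept {kept_a_global} in A, {kept_b_global} in B")
# Globally, most survivors come from layer B, since its magnitudes dominate.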
Please correct me if I am misunderstanding something. Thanks