This repository has been archived by the owner on May 4, 2020. It is now read-only.

Add Sparsified sgd #75

Open · wants to merge 8 commits into develop
Conversation

negar-foroutan
Collaborator

This adds an implementation of sparsified SGD (the sparse version of SGD): an optimizer class, a scheduler function, and a communication function.
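For context, the core idea of sparsified SGD is to transmit only the k largest coordinates of the (scaled) gradient and keep the untransmitted part in a local residual ("memory") buffer. A minimal sketch of that mechanism follows; the class and method names are illustrative, not the PR's actual API.

import torch
from torch.optim import Optimizer

class SparsifiedSGDSketch(Optimizer):
    """Illustrative top-k sparsified SGD with a residual buffer (not the PR's exact class)."""

    def __init__(self, params, lr=0.1, sparse_grad_size=10):
        defaults = dict(lr=lr)
        super(SparsifiedSGDSketch, self).__init__(params, defaults)
        self.num_coordinates = sparse_grad_size
        # One residual buffer per parameter, holding coordinates not yet transmitted.
        for group in self.param_groups:
            for p in group['params']:
                self.state[p]['memory'] = torch.zeros_like(p.data)

    def sparsify_gradient(self, p, lr):
        """Add the scaled gradient to the memory and extract only the top-k coordinates."""
        memory = self.state[p]['memory']
        memory.add_(p.grad.data * lr)
        flat = memory.view(-1)
        k = min(self.num_coordinates, flat.numel())
        _, indices = flat.abs().topk(k)
        values = flat[indices].clone()
        flat[indices] = 0.0  # what is sent is removed from the residual
        return indices, values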

optimizer.step()

if options.model_name == 'logistic_regression' and options.train_validate:
    t = options.runtime['current_epoch'] * options.train_num_samples_per_device + batch_idx * options.batch_size
Member

Single-letter variable names are usually not great for readability (except for something like i as a counter in a for loop); more descriptive variable names make the code easier to read.

Collaborator Author

I agree, I'll fix it.
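For example, the snippet above could spell out what t counts; the name below is only a guess at the intent:

num_samples_seen = (options.runtime['current_epoch'] * options.train_num_samples_per_device
                    + batch_idx * options.batch_size)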


for weight in estimated_weights:
    w = weight.squeeze()
    batch_loss = np.log(1 + np.exp(-target * (data @ w)))
Member

This is a softmargin loss, right? Could use https://pytorch.org/docs/stable/nn.html#softmarginloss and https://pytorch.org/docs/stable/torch.html#torch.matmul here.

Especially since NumPy ops run on the CPU, not the GPU, so the earlier .cuda() call would just waste time.
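A minimal sketch of the suggested change, assuming data, target, and the weights are torch tensors on the same device:

import torch
import torch.nn.functional as F

for weight in estimated_weights:
    w = weight.squeeze()
    logits = torch.matmul(data, w)  # stays on the GPU if the inputs are there
    # Elementwise soft-margin loss: log(1 + exp(-target * logits))
    batch_loss = F.soft_margin_loss(logits, target, reduction='none')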

Collaborator Author

Yes, I'll use the softmargin loss instead, thanks.


train_loss = global_average(loss, num_samples).item()

l2_loss = sum(weight.norm(2) ** 2 for weight in estimated_weights).item()
Member

Collaborator Author

Since we just need to compute the L1 and L2 norms of a tensor here, I think using loss functions would make it more complicated than necessary.
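For reference, the regularization terms can be computed directly on the tensors (a sketch, assuming estimated_weights is a list of tensors; l1_loss is the assumed counterpart of the existing l2_loss):

l1_loss = sum(weight.norm(1) for weight in estimated_weights).item()
l2_loss = sum(weight.norm(2) ** 2 for weight in estimated_weights).item()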


        self.num_coordinates = sparse_grad_size

    def __setstate__(self, state):
Member

The superclass method is used automatically if the child doesn't override it, so wrapping it like this is unnecessary.
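A quick illustration of the point with generic class names (not the PR's classes):

class Base:
    def __setstate__(self, state):
        self.__dict__.update(state)

class Child(Base):
    pass  # Base.__setstate__ is inherited, so a wrapper that only calls super() adds nothing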

Collaborator Author

Yes, you're right. I forgot to remove it.

params_sparse_tensors = []

for ind, param in enumerate(model.parameters()):
    self.state[param]['memory'] += param.grad.data * lr[0]
Member

If a layer or similar gets manually assigned a different tensor, the dict key here would change (it is effectively id(parameter), i.e. the Python object identity). This could be prevented by using model.named_parameters(), which returns a unique name for each parameter along with its value; that name could then be used as the dict key.
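A sketch of that alternative, keeping a separate buffer dict keyed by parameter name (the model, loss, and memory names below are stand-ins, not the PR's code):

import torch

model = torch.nn.Linear(4, 1)
lr = [0.1]
torch.nn.functional.mse_loss(model(torch.randn(8, 4)), torch.zeros(8, 1)).backward()

# Residual buffers keyed by a stable parameter name instead of object identity.
memory = {name: torch.zeros_like(p.data) for name, p in model.named_parameters()}

for name, param in model.named_parameters():
    memory[name] += param.grad.data * lr[0]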

Collaborator Author

I can't use the name as the dict key here: the optimizer's state dict is keyed by the parameter object itself, not by its name.
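For context, torch.optim.Optimizer stores per-parameter state in self.state, which is keyed by the parameter tensors themselves:

import torch

model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for p in model.parameters():
    opt.state[p]['memory'] = torch.zeros_like(p.data)  # the key is the parameter object, not a string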
