Add Sparsified SGD #75
base: develop
Conversation
optimizer.step()

if options.model_name == 'logistic_regression' and options.train_validate:
    t = options.runtime['current_epoch'] * options.train_num_samples_per_device + batch_idx * options.batch_size
Single-letter variable names are usually not great for readability (except for something like i as a counter in a for loop); more descriptive variable names make the code easier to read.
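As a hedged example (the name samples_processed is just an illustration, not a name from the PR), the line above could read:

    # a descriptive name makes it clear this counts samples seen so far on this worker
    samples_processed = (
        options.runtime['current_epoch'] * options.train_num_samples_per_device
        + batch_idx * options.batch_size
    )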
I agree, I'll fix it.
for weight in estimated_weights:
    w = weight.squeeze()
    batch_loss = np.log(1 + np.exp(-target * (data @ w)))
This is a soft-margin loss, right? You could use https://pytorch.org/docs/stable/nn.html#softmarginloss and https://pytorch.org/docs/stable/torch.html#torch.matmul here. Especially since numpy ops run on the CPU, not the GPU, the .cuda() call earlier would just waste time.
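A minimal sketch of that suggestion, assuming data, target, and estimated_weights are already torch tensors with the shapes used in the hunk above:

    import torch

    criterion = torch.nn.SoftMarginLoss(reduction='none')  # per-sample log(1 + exp(-y * x))

    for weight in estimated_weights:
        w = weight.squeeze()
        logits = torch.matmul(data, w)          # stays on the GPU if data and w are CUDA tensors
        batch_loss = criterion(logits, target)  # same value as the numpy expression above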
Yes, I'll use the softmargin loss instead, thanks.
train_loss = global_average(loss, num_samples).item()

l2_loss = sum(weight.norm(2) ** 2 for weight in estimated_weights).item()
Same as above: you could use https://pytorch.org/docs/stable/nn.html#l1loss and https://pytorch.org/docs/stable/nn.html#torch.nn.MSELoss here.
As we just need to calculate the L1 and L2 norms of a tensor here, I think using loss functions would make it more complicated.
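For context, a small sketch (with a stand-in tensor) comparing the two options being discussed here:

    import torch

    weight = torch.randn(10, 5)          # stand-in tensor, just for illustration
    zeros = torch.zeros_like(weight)

    # Loss-module route suggested above: compare against an all-zero target.
    l2_via_loss = torch.nn.MSELoss(reduction='sum')(weight, zeros)  # == squared L2 norm
    l1_via_loss = torch.nn.L1Loss(reduction='sum')(weight, zeros)   # == L1 norm

    # Direct norm route used in the PR: no extra zero tensor needed.
    l2_direct = weight.norm(2) ** 2
    l1_direct = weight.norm(1)

    assert torch.allclose(l2_via_loss, l2_direct)
    assert torch.allclose(l1_via_loss, l1_direct)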
self.num_coordinates = sparse_grad_size

def __setstate__(self, state):
The superclass method is called automatically if the child class doesn't define it, so wrapping it like this is unnecessary.
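A minimal illustration of that Python behaviour, with generic names unrelated to the PR:

    class Base:
        def __setstate__(self, state):
            self.__dict__.update(state)

    class Child(Base):
        # No __setstate__ here: attribute lookup falls back to Base.__setstate__
        # automatically, so a wrapper that only calls super().__setstate__(state)
        # adds nothing.
        pass

    child = Child()
    child.__setstate__({'value': 1})
    assert child.value == 1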
Yes, you're right. I forgot to remove it.
params_sparse_tensors = []

for ind, param in enumerate(model.parameters()):
    self.state[param]['memory'] += param.grad.data * lr[0]
If a layer or similar gets manually assigned a different tensor, the dict key here would change (it's id(parameter), which maps to the Python object identifier). This could be prevented by using model.named_parameters(), which returns a unique name for each parameter along with its value; that name could then be used as the dict key.
I cannot use the name as the dict key, because the key here is not the name of the parameter; it's the parameter itself.
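For context, a rough sketch of the pattern being discussed (this mirrors how torch.optim keeps per-parameter state; the memory buffer name follows the hunk above, the rest is assumed):

    import torch

    model = torch.nn.Linear(4, 1)
    lr = 0.1
    state = {}  # keyed by the parameter tensor itself, as in torch.optim.Optimizer.state

    for param in model.parameters():
        # one error-feedback memory buffer per parameter
        state[param] = {'memory': torch.zeros_like(param.data)}

    # after a backward pass, accumulate the scaled gradient into the memory
    model(torch.randn(8, 4)).sum().backward()
    for param in model.parameters():
        state[param]['memory'] += param.grad.data * lr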
This adds an implementation of sparsified SGD (the sparse variant of SGD). An optimizer class, a scheduler function, and a communication function are added here.
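For readers unfamiliar with the technique, here is a rough, simplified sketch of one sparsified-SGD update with error-feedback memory (a random-k variant; an illustration under stated assumptions, not the PR's actual optimizer, though num_coordinates and the memory buffer follow the hunks above):

    import torch

    def sparsified_sgd_step(param, memory, lr, num_coordinates):
        # Accumulate the scaled gradient into the residual memory (error feedback).
        memory += lr * param.grad.data

        # Select a random subset of coordinates to apply (and, in the distributed
        # setting, to communicate) this step.
        flat = memory.view(-1)
        idx = torch.randperm(flat.numel())[:num_coordinates]

        # Build the sparse update and subtract the transmitted part from the memory.
        sparse_update = torch.zeros_like(flat)
        sparse_update[idx] = flat[idx]
        flat[idx] = 0.0

        # Apply the sparse update to the parameter.
        param.data -= sparse_update.view_as(param.data)
        return sparse_update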