Hello,
The work I am doing involves training and using the Intel NDNS Baseline Solution.
When I attempt to load the model state dict from a checkpoint, PyTorch throws the following error:
```
File "/debug_space/Framework/singlerank/run.py", line 345, in <module>
    model, criterion, optimiser, scheduler, history, start_epoch = load_objects(
File "/debug_space/Framework/singlerank/run.py", line 100, in load_objects
    start_epoch, model, optimiser, scheduler = utils.io.checkpoint.load(
File "/debug_space/Framework/utils/io/checkpoint.py", line 117, in load
    model.load_state_dict(checkpoint["model_state_dict"])
RuntimeError: Error(s) in loading state_dict for SDNN:
    size mismatch for blocks.1.delay.delay: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1]).
    size mismatch for blocks.2.delay.delay: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1]).
```
Looking at the code and a variable watch, I find that the Delay's shape and contents are only initialised during the forward pass, so a freshly constructed model still holds the placeholder shape `torch.Size([1])` when `load_state_dict` runs.
If I am loading from a checkpoint, which would be the better solution:

- Remove the delay parameters from the saved checkpoint, or
- Initialise the Delay when creating the model object by running a dummy tensor through it, then load the state dict afterwards?
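For reference, here is a minimal sketch of the second option. `LazyDelay` is a hypothetical stand-in for the lazily-initialised delay block described above, not the real lava-dl implementation; the point is only that one dummy forward pass with the training input shape materialises the parameter before `load_state_dict` is called:

```python
import torch

class LazyDelay(torch.nn.Module):
    """Toy module mimicking a delay parameter that only takes its
    real shape on the first forward pass (hypothetical stand-in)."""

    def __init__(self):
        super().__init__()
        # Placeholder of shape [1] until the first forward pass
        # reveals the true feature size.
        self.delay = torch.nn.Parameter(torch.zeros(1))

    def forward(self, x):
        if self.delay.numel() == 1 and x.shape[-1] != 1:
            # Re-create the parameter with the real shape.
            self.delay = torch.nn.Parameter(torch.zeros(x.shape[-1]))
        return x + self.delay

model = LazyDelay()

# Option 2: a dummy forward pass with the expected input shape
# initialises the delay parameter to [512] before loading.
with torch.no_grad():
    model(torch.zeros(1, 512))

# Stands in for checkpoint["model_state_dict"]; values are arbitrary.
checkpoint = {"delay": torch.full((512,), 0.5)}
model.load_state_dict(checkpoint)  # shapes now match, no size-mismatch error
```

The first option would instead filter the checkpoint, e.g. `{k: v for k, v in sd.items() if not k.endswith("delay.delay")}` loaded with `strict=False`, but that discards the trained delay values, so the dummy forward pass seems preferable if the delays matter for inference.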
Thanks for the help