Hello,
The work I am doing involves training and using the Intel NDNS Baseline Solution.
When I attempt to load the model state dict from a checkpoint, PyTorch throws the following error:
```
File "/debug_space/Framework/singlerank/run.py", line 345, in <module>
    model, criterion, optimiser, scheduler, history, start_epoch = load_objects(
File "/debug_space/Framework/singlerank/run.py", line 100, in load_objects
    start_epoch, model, optimiser, scheduler = utils.io.checkpoint.load(
File "/debug_space/Framework/utils/io/checkpoint.py", line 117, in load
    model.load_state_dict(checkpoint["model_state_dict"])
RuntimeError: Error(s) in loading state_dict for SDNN:
    size mismatch for blocks.1.delay.delay: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1]).
    size mismatch for blocks.2.delay.delay: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1]).
```
Looking at the code and a variable watch, I find that the Delay's shape and contents are only initialised during the forward pass, so a freshly constructed model still holds the placeholder shape `torch.Size([1])` when `load_state_dict` runs.
If I am loading from a checkpoint, which would be the better solution:

- Remove the delay parameters from the saved checkpoint, or
- Initialise the Delay when creating the model object by running a dummy tensor through it, then load the state dict afterwards?
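For reference, here is a minimal sketch of the second option. `LazyDelay` is a hypothetical stand-in for the lazily-initialised delay block described above, not the real lava-dl implementation; the point is only that one dummy forward pass with the training input shape materialises the parameter before `load_state_dict` is called:

```python
import torch

class LazyDelay(torch.nn.Module):
    """Toy module mimicking a delay parameter that only takes its
    real shape on the first forward pass (hypothetical stand-in)."""

    def __init__(self):
        super().__init__()
        # Placeholder of shape [1] until the first forward pass
        # reveals the true feature size.
        self.delay = torch.nn.Parameter(torch.zeros(1))

    def forward(self, x):
        if self.delay.numel() == 1 and x.shape[-1] != 1:
            # Re-create the parameter with the real shape.
            self.delay = torch.nn.Parameter(torch.zeros(x.shape[-1]))
        return x + self.delay

model = LazyDelay()

# Option 2: a dummy forward pass with the expected input shape
# initialises the delay parameter to [512] before loading.
with torch.no_grad():
    model(torch.zeros(1, 512))

# Stands in for checkpoint["model_state_dict"]; values are arbitrary.
checkpoint = {"delay": torch.full((512,), 0.5)}
model.load_state_dict(checkpoint)  # shapes now match, no size-mismatch error
```

The first option would instead filter the checkpoint, e.g. `{k: v for k, v in sd.items() if not k.endswith("delay.delay")}` loaded with `strict=False`, but that discards the trained delay values, so the dummy forward pass seems preferable if the delays matter for inference.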
Thanks for the help