Can we load a torchscript model and continue training/finetuning the loaded model? #1172

guangster · 2021-08-18T16:14:34Z

guangster
Aug 18, 2021

It seems pytorch allows a Torchscript model to continue to train when loaded outside python, for example in c++ as discussed here https://github.com/pytorch/pytorch/issues/17614.
Is something like this possible with DJL?

I tried to generate a Torchscript model from huggingface following the script here https://huggingface.co/transformers/torchscript.html#saving-a-model
and loading this with DJL.

Then I added my task-specific loss and trained for a few iterations. There were no errors but I noticed only the task head was updated while the language model (the Torchscript model) did not seem to update (same result if I run forward before and after training).

There are no errors from the Pytorch engine. I tried to follow the c++ example from pytorch but I am not sure how to access the parameters as this person did in c++ here https://github.com/pytorch/pytorch/issues/17614#issuecomment-769151466. If I examine the Block object I loaded, I could not find the parameters as the c++ example shows.

Any help is greatly appreciated =)

frankfliu · 2021-08-18T16:29:41Z

frankfliu
Aug 18, 2021

DJL support transfer learning for PyTorch, you can take a look this example: https://github.com/deepjavalibrary/djl/blob/master/examples/src/main/java/ai/djl/examples/training/transferlearning/TrainAmazonReviewRanking.java

1 reply

guangster Aug 18, 2021
Author

Thanks for the quick response! I took a look at this example. However here the language model is added as a frozen forward-only block according to here https://github.com/deepjavalibrary/djl/blob/master/examples/src/main/java/ai/djl/examples/training/transferlearning/TrainAmazonReviewRanking.java#L146-L159.

So rather than fine-tuning the entire architecture, I think the code is only updating the task head (the classification layers) while the loaded text embedding layer (the traced distilbert model in this case) is frozen.

saudet · 2021-08-27T07:32:11Z

saudet
Aug 27, 2021

We can access the C++ API with the JavaCPP Presets for PyTorch, so it should be possible to do that:
https://github.com/bytedeco/javacpp-presets/tree/master/pytorch

2 replies

guangster Sep 1, 2021
Author

Thanks for this suggestion! I spent a few days looking into JavaCPP Presets for PyTorch however I'm not sure if this is possible in Java since I couldn't figure out how to even load the weights in the first place.

I couldn't find a way to load torchscript model here. This access to torch.jit seems to be missing (no JIT or load(...) from looking at the javadoc and looking at the code). So we can't do something like this in C++. (Interestingly, DJL does support this somehow).
// Deserialize the ScriptModule from a file using torch::jit::load(). module = torch::jit::load(argv[1]);
If I try to use the python's pattern, neither torch.load or torch.load_state_dict are present, and it seems only torch.pickle_load(...) is exposed.

saudet Sep 1, 2021

The C++ JIT classes and functions of TorchScript have only been mapped recently.
We can easily try them out with the snapshots though: http://bytedeco.org/builds/
And this sample code works fine: https://github.com/bytedeco/javacpp-presets/blob/master/pytorch/samples/ExampleApp.java

guangster · 2021-09-01T14:00:19Z

guangster
Sep 1, 2021
Author

Hi, I made some changes to loading using the PtEngine's loader. Now I can see the blocks from the loaded model appearing as PtSymbolBlock. However loading the model this way causes checkGradients() to fail. When I looked at them closer, it looks like the loaded block is not returning gradient to DJL properly.

Specifically this block of code
https://github.com/deepjavalibrary/djl/blob/master/api/src/main/java/ai/djl/training/Trainer.java#L327-L337
is not adding any NDArray to grads. So the next few line fails (NDArrays.stack(list) is trying to stack an empty list). Any ideas?

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Can we load a torchscript model and continue training/finetuning the loaded model? #1172

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 3 comments 3 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Can we load a torchscript model and continue training/finetuning the loaded model? #1172

guangster Aug 18, 2021

Replies: 3 comments · 3 replies

frankfliu Aug 18, 2021

guangster Aug 18, 2021 Author

saudet Aug 27, 2021

guangster Sep 1, 2021 Author

saudet Sep 1, 2021

guangster Sep 1, 2021 Author

guangster
Aug 18, 2021

Replies: 3 comments 3 replies

frankfliu
Aug 18, 2021

guangster Aug 18, 2021
Author

saudet
Aug 27, 2021

guangster Sep 1, 2021
Author

guangster
Sep 1, 2021
Author