Safer Checkpoint File Format #3181

mbway · 2024-04-05T19:02:07Z

🚀 Feature Request

I have been investigating if it is possible to store checkpoints in a format that does not depend on pickle, as pickles are inherently unsafe as they allow for arbitrary code execution. The .pt file format created by torch.save uses pickle internally.

Would this functionality be something that you would be interested in? If so I can post a PR with my implementation for consideration.

Implementation

My solution is a custom .safe format that is a .tar archive containing:

my_checkpoint.safe
├── tensors.safetensors
├── ndarrays.npz
└── other.json

To save a state dictionary, I pre-process it so anything not json-serializable gets converted to a dictionary like {"__safe_obj_type": "tensor", "id": "123"} for example. The __safe_obj_type key is then used during loading to restore the object back to what it was before. The gathered tensors and numpy arrays are stored as safetensors and npz respectively and anything else is stored in the json file.

At least for the state-dict I was trying to serialize I found that only a few non-json-serializable data types needed to be supported: torch.tensor, np.ndarray, datetime.timedelta, tuple Though a few others may be needed if this was implemented beyond a proof of concept.

The text was updated successfully, but these errors were encountered:

mvpatel2000 · 2024-04-08T15:16:58Z

@mbway Thanks for flagging this -- we're currently actively working on a major redesign of checkpointing that will include migrating to a safer format. We've been hesitant to change how checkpointing works as we deeply care about backwards compatibility, so we have been buffering up a long list of features we will now include in the revamp. You should start seeing things roll out this month.

CC: @eracah it might be good to publicly share a version of the roadmap as a RFC when its ready

If you already have a PR, please feel free to post it -- it's probably helpful for us to view a design and gain some inspiration. Given the refactor though, we likely wouldn't accept it as this seems like a big change.

mbway added the enhancement New (engineering) enhancements, such as features or API changes. label Apr 5, 2024

mbway mentioned this issue Apr 11, 2024

draft: add prototype for safer checkpoint format #3186

Open

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Safer Checkpoint File Format #3181

Safer Checkpoint File Format #3181

mbway commented Apr 5, 2024

mvpatel2000 commented Apr 8, 2024

Safer Checkpoint File Format #3181

Safer Checkpoint File Format #3181

Comments

mbway commented Apr 5, 2024

🚀 Feature Request

Implementation

mvpatel2000 commented Apr 8, 2024