
FSDP2 and peft #2344

Open
psinger opened this issue Jan 23, 2025 · 4 comments

@psinger

psinger commented Jan 23, 2025

Hey, sorry if this is the wrong place. Feel free to move it to discussion.

I am trying to get PEFT working with FSDP2 and am wondering if someone else has already attempted that.

The issue is that I'm always getting errors along the lines of:
RuntimeError: aten.mm.default: got mixed torch.Tensor and DTensor, need to convert all torch.Tensor to DTensor before calling distributed operators!
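For context, this is the generic DTensor dispatch error that fires whenever an aten op receives a mix of sharded and plain tensors. A minimal, hypothetical illustration of that error class (not my actual setup; assumes PyTorch >= 2.5, where distribute_tensor and Shard live under torch.distributed.tensor, and running under torchrun):

```python
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import Shard, distribute_tensor

dist.init_process_group("gloo")
mesh = init_device_mesh("cpu", (dist.get_world_size(),))

weight = distribute_tensor(torch.randn(16, 16), mesh, [Shard(0)])  # DTensor
x = torch.randn(4, 16)                                             # plain torch.Tensor

# Mixing the two in a single op triggers:
# RuntimeError: aten.mm.default: got mixed torch.Tensor and DTensor, ...
y = torch.mm(x, weight)
```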

Happy for any pointers.

@BenjaminBossan
Member

Could you please provide environment info (package versions, hardware, etc.), a reproducer, and the full error message? In case you're using accelerate (maybe indirectly via the transformers Trainer), please note that FSDP2 is not supported yet.

@psinger
Author

psinger commented Jan 23, 2025

No, I'm not using accelerate. But I'm following torchtune:
https://github.com/pytorch/torchtune/blob/main/recipes/lora_dpo_distributed.py

This is more of an exploratory question: has someone successfully run FSDP2 with PEFT explicitly? I am not finding any info out there on it.

I think it mostly boils down to which layers you're sharding, and at what point, roughly along the lines of the sketch below.
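For illustration, here is a rough sketch of the ordering I have in mind, loosely following the torchtune recipe. It assumes PyTorch >= 2.4 (where fully_shard lives under torch.distributed._composable.fsdp), a process group initialized via torchrun on a single node, and a decoder-style HF model; the model name and module paths are just examples:

```python
import torch
import torch.distributed as dist
from torch.distributed._composable.fsdp import fully_shard
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank())  # single-node assumption

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
peft_model = get_peft_model(
    base, LoraConfig(r=8, target_modules=["q_proj", "v_proj"])
)

# Shard each transformer block *after* LoRA injection, so the frozen base
# weights and the lora_A/lora_B params inside a block become DTensors
# together. My suspicion is that sharding before get_peft_model (or only
# sharding the base Linears) leaves plain torch.Tensor adapter params next
# to DTensor base weights, which is the mixed-tensor error above.
for layer in peft_model.base_model.model.model.layers:
    fully_shard(layer)
fully_shard(peft_model)
```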

@BenjaminBossan
Member

> But I'm following torchtune:
> https://github.com/pytorch/torchtune/blob/main/recipes/lora_dpo_distributed.py

Unless you've rewritten that script, note that it uses torchtune, not PEFT.

> This is more of an exploratory question: has someone successfully run FSDP2 with PEFT explicitly?

Personally, I haven't tried it. I'll probably do that once FSDP2 is supported by accelerate.

@psinger
Author

psinger commented Jan 23, 2025

That's why I'm asking here: I tried using PEFT with pretty much their script. It seems to work for them with their own LoRA implementation.
