FSDP2 and peft #2344
Comments
Could you please provide environment info (package versions, hardware, etc.), a reproducer, and the full error message? In case you're using accelerate (maybe indirectly via transformers), please mention that too.
No, I'm not using accelerate, but I'm following the torchtune script. This is more of an exploratory question: has someone successfully run FSDP2 with PEFT explicitly? I am not finding any info out there on it. I think it mostly boils down to which layers etc. you're sharding at what point (see the sketch below).
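(A minimal sketch of the ordering that usually matters here, assuming a torchtune-style recipe: inject the LoRA adapters first, then apply FSDP2's fully_shard to each transformer block and finally to the root, so base and adapter weights are converted to DTensors together. The model name, target modules, and module path are illustrative assumptions, and fully_shard lives under torch.distributed._composable.fsdp in PyTorch releases current at the time of this issue.)

```python
import torch
import torch.distributed as dist
from torch.distributed._composable.fsdp import fully_shard  # FSDP2 API
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Illustrative model and target modules; adjust to your architecture.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# 1) Inject the LoRA adapters BEFORE sharding, so fully_shard sees base
#    and adapter parameters at the same time.
peft_model = get_peft_model(model, LoraConfig(r=8, target_modules=["q_proj", "v_proj"]))

# 2) Shard each transformer block, then the root module. The attribute
#    path below is for Llama-style models wrapped by PEFT.
for block in peft_model.base_model.model.model.layers:
    fully_shard(block)
fully_shard(peft_model)
```

Launched with, e.g., torchrun --nproc_per_node=2 train.py.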
Unless you've rewritten that script, note that it uses torchtune, not PEFT.
Personally, I haven't tried it. I'll probably do that once FSDP2 is supported by accelerate.
That's why I'm asking here. I tried using PEFT with pretty much their script; it seems to work for them with their LoRA implementations.
Hey, sorry if this is the wrong place; feel free to move it to a discussion.
I am trying to get PEFT working with FSDP2 and am wondering whether someone else has attempted that already?
The issue is that I'm always getting errors along the lines of:
RuntimeError: aten.mm.default: got mixed torch.Tensor and DTensor, need to convert all torch.Tensor to DTensor before calling distributed operators!
Happy for any pointers.
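(For context on the error itself: DTensor ops require every tensor argument to be a DTensor, so any op that receives one sharded parameter together with a plain torch.Tensor fails exactly like this; a common way to get there is creating new parameters, such as LoRA adapters, only after fully_shard has converted the existing ones. A minimal illustration, assuming a multi-GPU run launched via torchrun; torch.distributed.tensor is the public import path in recent PyTorch, while older releases use torch.distributed._tensor.)

```python
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import distribute_tensor, Shard

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# A weight sharded across ranks (a DTensor) next to a plain local tensor.
mesh = init_device_mesh("cuda", (dist.get_world_size(),))
w = distribute_tensor(torch.randn(16, 16, device="cuda"), mesh, [Shard(0)])
x = torch.randn(4, 16, device="cuda")  # plain torch.Tensor

# Raises: RuntimeError: aten.mm.default: got mixed torch.Tensor and DTensor ...
y = x @ w.T
```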