Issue with finetuning with Corda #2317
Comments
Thanks for trying out CorDA and reporting your problem. To diagnose further, could you please provide a bit more info:
@BenjaminBossan sorry for being a bit imprecise. I have now figured out that it seems to be just one of my seeds that causes problems for CorDA; I updated my initial comment with a full training script. The attached folder promptsource_custom.zip is also needed for the training script; it contains the finetuning templates. Since the performance overall seems to be a bit lacking for my dataset, I was also wondering if the authors have suggestions on which CorDA hyperparameters I could change, maybe increasing the number of samples?
I've noticed that the length per question in your dataset is quite short. If the length per question is 10x shorter than in our experiment and the sample count remains the same, the covariance matrix might be low-rank. As we apply damp for both KPM and IPM, the damp might then be too large, distorting the resulting LoRA matrices. You can try increasing the sample count to about 2560 based on the average length per question, or using cross attention (like our official example) if needed.
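A minimal sketch of what increasing the sample count could look like on the caller's side; `tokenized_dataset` is a placeholder for the already-tokenized calibration dataset, and the count of 2560 should be adjusted to the average length per question:

```python
from torch.utils.data import DataLoader
from transformers import default_data_collator

# Draw ~2560 calibration samples instead of 256 so the layer-wise covariance
# matrices accumulated during CorDA preprocessing are less likely to be low-rank.
calib_dataset = tokenized_dataset.shuffle(seed=0).select(range(2560))
calib_dataloader = DataLoader(calib_dataset, batch_size=1, collate_fn=default_data_collator)
```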
@5eqn thanks a lot for your answer! I will try increasing the sample count. One additional question regarding that: in my script, am I defining the sample count correctly?
Yes, it's correct.
Hi @sirluk, it seems that only one seed (the grey curve) causes the problem. It occurs due to numerical error, because the process involves SVD and matrix inversion. One suggestion to detect such an issue is to compare the initial performance (e.g. on wiki or some benchmark, before finetuning) of the pre-trained model and the PEFT model initialized with CorDA. They should be as close as possible.
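A minimal sketch of such a sanity check, assuming a causal LM and a dataloader of tokenized batches; `base_model`, `peft_model` (already CorDA-initialized), and `eval_dataloader` are placeholders:

```python
import torch

@torch.no_grad()
def mean_lm_loss(model, dataloader, device="cuda"):
    # Average language-modeling loss over a few held-out batches.
    model.eval()
    losses = []
    for batch in dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        out = model(**batch, labels=batch["input_ids"])
        losses.append(out.loss.item())
    return sum(losses) / len(losses)

# The two numbers should be almost identical; a large gap points to the
# numerical issue (SVD / matrix inverse) described above.
print("pre-trained:", mean_lm_loss(base_model, eval_dataloader))
print("corda-init: ", mean_lm_loss(peft_model, eval_dataloader))
```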
@iboing thanks for your additional suggestions! Increasing the sample count unfortunately didn't work, as I was running into OOM errors (also when increasing from 256 to 512). But I will try assembling the dataset differently.
@5eqn @iboing
Hi @sirluk, thanks for pointing this out! I've fixed this in our development branch by removing the unnecessary attributes that CorDA preprocessing leaves on the modules:

```python
# Drop the temporary buffers attached during CorDA preprocessing so they
# no longer hold memory once the PEFT model has been initialized.
for name, module in peft_model.base_model.named_modules():
    if hasattr(module, "sample_count"):
        del module.sample_count
    if hasattr(module, "covariance_matrix"):
        del module.covariance_matrix
    if hasattr(module, "mean"):
        del module.mean
    if hasattr(module, "std"):
        del module.std
    if hasattr(module, "corda_method"):
        del module.corda_method
    if hasattr(module, "rank"):
        del module.rank
    if hasattr(module, "eigens"):
        del module.eigens
```
Thanks for this fruitful discussion. Regarding this optimization, could a PR be created for PEFT too? It would probably also help address the memory issue I had with CorDA that I reported earlier. I think it would also be great if the example docs could be updated to contain the insights discussed above regarding the size of the sample dataset.
Sure, I'll open the PR after appending documentation. |
System Info
peft master branch (commit 8d3039b)
Who can help?
@BenjaminBossan @5eqn
Hi, I would like to try out CorDA for my finetuning use case, but looking at the loss curves something seems to be going wrong, so I just wanted to verify that I implemented CorDA correctly.
This is the relevant code snippet from my script. I have a tokenized dataset which I wrap with a dataloader with a batch size of 1 to pass to the `preprocess_corda` function. Once `preprocess_corda` is done computing, I can just instantiate the PEFT model as usual with the required config, correct? Would greatly appreciate some feedback.
Information
Tasks
An officially supported task in the examples folder
Reproduction
Expected behavior
I tried to follow the CorDA example in the documentation and thought it should work like this.