The paper on the Bone structure has been updated #2312

Open
wants to merge 8 commits into base: main

JL-er
Contributor

@JL-er JL-er commented Jan 8, 2025

The new version of the paper is more reliable and easier for readers to understand.
https://arxiv.org/abs/2409.15371

@BenjaminBossan
Member

Thanks for the update.

I haven't compared the new paper, but checking the PR: Is DiSHA a method that could be used on its own, but is not available in PEFT, or is it the union of Bone and Bat? What I'd like to avoid is for users to be confused when they want to use DiSHA but don't find any method with that name in PEFT. Maybe this could be clarified in the description.

@JL-er
Contributor Author

JL-er commented Jan 8, 2025

Simply put, DiSHA is the overall framework, while Bone and Bat are just subsets of it. So I even want to rename Bone to DiSHA, and then choose either Bone or Bat during init_weights.

@BenjaminBossan
Member

> So I even want to rename Bone to DiSHA, and then choose either Bone or Bat during init_weights.

When it comes to the PEFT code, let's avoid any renaming, as this would break backwards compatibility.

@JL-er
Contributor Author

JL-er commented Feb 6, 2025

> > So I even want to rename Bone to DiSHA, and then choose either Bone or Bat during init_weights.
>
> When it comes to the PEFT code, let's avoid any renaming, as this would break backwards compatibility.

So how should I modify the description of Bone? The DiSHA paper explains in detail how Bone came about, whereas the theoretical support provided in the older Bone paper is insufficient.

@BenjaminBossan
Member

I think rewriting the descriptions of the method as you did is okay, as the existing code is not affected by it. I just think you should highlight that in the PEFT code base, the method is still referred to as Bone and that users should use BoneConfig etc. Also, I think it's worth updating the docstrings, for instance the one for BoneConfig, to mention the relationship to DiSHA. Imagine a user reads the new docs and then searches for DiSHA in the code base: they would currently not get any hits.

In theory, we can also allow renaming the method in code. However, for that, we would need a long deprecation period, so that users who use BoneConfig know that they should switch to DishaConfig. If the prefixes of the adapter weights are changed, we also need a way to convert existing checkpoints. I think it is not worth the hassle and we should keep the existing names.
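
For illustration only, the renaming path described above could look roughly like the sketch below. It is not PEFT code: `DishaConfig` here is a hypothetical stand-in dataclass, the `BoneConfig` defined in the sketch is a toy alias rather than the real `peft.BoneConfig`, and `convert_state_dict` together with the `bone_`/`disha_` prefixes and the example key names are assumptions made up for this example.

```python
import warnings
from dataclasses import dataclass


@dataclass
class DishaConfig:
    """Hypothetical new config name (an assumption, not part of PEFT)."""

    r: int = 64
    # Assumed to mirror the thread: True selects Bone, "bat" selects Bat.
    init_weights: object = True


class BoneConfig(DishaConfig):
    """Toy alias for the old name, kept alive during a deprecation period."""

    def __init__(self, *args, **kwargs):
        # Tell existing users which name to switch to, but keep working.
        warnings.warn(
            "BoneConfig is deprecated; use DishaConfig instead.",
            FutureWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)


def convert_state_dict(state_dict, old_prefix="bone_", new_prefix="disha_"):
    """Return a copy of state_dict with renamed adapter weight prefixes.

    Only dot-separated key segments that start with old_prefix are renamed;
    all other keys and all values are left untouched.
    """
    converted = {}
    for key, value in state_dict.items():
        parts = [
            new_prefix + part[len(old_prefix):] if part.startswith(old_prefix) else part
            for part in key.split(".")
        ]
        converted[".".join(parts)] = value
    return converted


# Usage: the old name still works but warns, and old checkpoint keys are mapped.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_config = BoneConfig(r=8)

old_checkpoint = {"layers.0.q_proj.bone_block": 1, "layers.0.q_proj.weight": 2}
new_checkpoint = convert_state_dict(old_checkpoint)
```

Even in this toy form, both pieces (the warning alias and the key conversion) would have to be maintained for the whole deprecation period, which is the overhead the comment above weighs against keeping the existing names.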
