Comparison of Different Fine-Tuning Techniques for Conversational AI #2310

Open
ImamaDev opened this issue Jan 7, 2025 · 5 comments
Labels
contributions-welcome, good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

@ImamaDev

ImamaDev commented Jan 7, 2025

Feature request

It would be incredibly helpful to have a clear comparison or support for various fine-tuning techniques specifically for conversational AI. This feature could include insights into their strengths, limitations, and ideal use cases, helping practitioners choose the right approach for their needs.

Here’s a list of techniques to consider:

- LoRA
- AdaLoRA
- Bone
- VeRA
- X-LoRA
- LN Tuning
- VB-LoRA
- HRA (Householder Reflection Adaptation)
- IA3 (Infused Adapter by Inhibiting and Amplifying Inner Activations)
- Llama-Adapter
- CPT (Context-aware Prompt Tuning)
- etc.
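One axis such a comparison could cover is how many parameters each method actually trains. The sketch below is a simplified back-of-the-envelope calculation for a single linear layer, not PEFT code. It assumes LoRA learns low-rank factors B (d_out x r) and A (r x d_in), VeRA trains only two scaling vectors (lengths d_out and r) on top of frozen shared random matrices, and IA3 trains one vector that rescales the layer output. Bias terms and model-specific details are ignored, and the layer size and rank are arbitrary example values.

```python
# Hypothetical illustration: trainable-parameter counts for adapting a single
# d_out x d_in linear layer with different methods (simplified formulas).


def full_finetune_params(d_out: int, d_in: int) -> int:
    """Full fine-tuning updates every weight (bias ignored for simplicity)."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, r: int) -> int:
    """LoRA learns B (d_out x r) and A (r x d_in), i.e. r * (d_out + d_in)."""
    return r * (d_out + d_in)


def vera_params(d_out: int, r: int) -> int:
    """VeRA keeps shared random A and B frozen and trains two scaling
    vectors: one of length d_out and one of length r."""
    return d_out + r


def ia3_params(d_out: int) -> int:
    """IA3 trains a single scaling vector; here we assume it rescales the
    layer's output, giving d_out parameters."""
    return d_out


if __name__ == "__main__":
    d_out, d_in, r = 4096, 4096, 8  # assumed sizes, e.g. one attention projection
    counts = {
        "full": full_finetune_params(d_out, d_in),
        "lora": lora_params(d_out, d_in, r),
        "vera": vera_params(d_out, r),
        "ia3": ia3_params(d_out),
    }
    for name, n in counts.items():
        share = 100 * n / counts["full"]
        print(f"{name:>5}: {n:>10,d} trainable params ({share:.4f}% of full)")
```

Parameter count is only one dimension; runtime, memory, and task performance do not follow directly from it, which is why an empirical comparison is still needed.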

Motivation

With the growing number of fine-tuning techniques for conversational AI, it can be challenging to identify the most suitable approach for specific use cases. A comprehensive comparison of these techniques—highlighting their strengths, limitations, and ideal scenarios—would save time, reduce trial-and-error, and empower users to make informed decisions. This feature would bridge the gap between research and practical application, enabling more effective model customization and deployment.

Your contribution

I’d be happy to collaborate on this! While I might not have a complete solution right now, I’m willing to contribute by gathering resources, reviewing papers, or helping organize comparisons. If others are interested in teaming up, we could work together on a PR to make this feature happen. Let’s connect and brainstorm how we can tackle this effectively!

@BenjaminBossan
Member

Thanks for coming up with this proposal. Indeed, this has been on our backlog for a long time. As you can imagine, providing objective and useful information on this is a huge undertaking, since relying on the results reported in papers can often be problematic.

As a long-term project, we plan to provide some kind of benchmark that compares all these methods in terms of runtime, memory usage, performance, etc., but I can't give a concrete date yet.
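As a rough illustration of the kind of harness such a benchmark might use, here is a minimal sketch in plain Python. It is not part of PEFT: `profile_run`, `compare`, and the placeholder training functions are hypothetical names. It records wall-clock runtime, peak Python heap memory via `tracemalloc`, and whatever metric the training function returns. For real fine-tuning runs, GPU memory would need a framework-specific counter such as `torch.cuda.max_memory_allocated()`.

```python
import time
import tracemalloc
from typing import Callable, Dict


def profile_run(train_fn: Callable[[], float]) -> Dict[str, float]:
    """Run a (hypothetical) training function once and record wall-clock
    runtime, peak Python heap usage, and the metric it returns.

    Note: tracemalloc only tracks memory allocated by the Python interpreter,
    not GPU memory.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        metric = train_fn()
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return {"runtime_s": elapsed, "peak_mem_mb": peak / 2**20, "metric": metric}


def compare(methods: Dict[str, Callable[[], float]]) -> Dict[str, Dict[str, float]]:
    """Profile every method on the same task and collect the results."""
    return {name: profile_run(fn) for name, fn in methods.items()}


if __name__ == "__main__":
    # Placeholder "training" functions standing in for real fine-tuning runs;
    # they only allocate some memory and return a dummy score.
    def fake_method_a() -> float:
        buf = [0.0] * 100_000
        return len(buf) / 1e5

    def fake_method_b() -> float:
        buf = [0.0] * 1_000_000
        return len(buf) / 1e6

    results = compare({"method_a": fake_method_a, "method_b": fake_method_b})
    for name, stats in results.items():
        print(
            f"{name}: {stats['runtime_s']:.4f}s, "
            f"{stats['peak_mem_mb']:.2f} MiB peak, metric={stats['metric']:.3f}"
        )
```

Keeping the training step behind a single callable means the same measurement code can be reused unchanged for every method, so only the adapter configuration differs between runs.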

In the meantime, we have started to be more rigorous when new methods are added, requiring a clear description of their best use cases. There is still a lot of room for improvement, especially for methods that were added some time ago.

If you (and others) want to contribute, I think a good place to start would be to go through the individual methods in the PEFT docs and help improve the descriptions. If we can make them more uniform, with more details on the best use cases, pros, and cons, this would already be a nice improvement.


There are other places that could benefit from such a clean up, e.g. the description of all the LoRA initialization methods.

@BenjaminBossan BenjaminBossan added the good first issue (Good for newcomers), help wanted (Extra attention is needed), and contributions-welcome labels Jan 7, 2025
@sparsh2
Contributor

sparsh2 commented Jan 7, 2025

I would be interested to contribute as well

@BenjaminBossan
Member

> I would be interested to contribute as well

Thanks for the offer. As mentioned, as a first step, we could use some help with updating the "blurbs" of the PEFT methods. For this, it's often sufficient to read a couple of sections from the paper. If anyone wants to work on one such method, please announce it here so that there is no duplicate work.

@imcoza

imcoza commented Jan 15, 2025

How about having a sample fine-tuning script for each method and comparing different approaches for different tasks?

@BenjaminBossan
Member

> How about having a sample fine-tuning script for each method and comparing different approaches for different tasks?

I'm not 100% sure what you mean, but let's start with a single task and then we can expand from there. We haven't come up with such a task yet, but we have some criteria:

  1. It should be a task that is supported by all methods (most likely language model fine-tuning)
  2. The task should be reasonably realistic and practical
  3. The task should not take too long to run and should not require expensive hardware
  4. Training code should be easy to adapt for real training (it should serve as an example)

Maybe we can find something from the TRL examples that we can adopt.

Projects
None yet
Development

No branches or pull requests

4 participants