I fine-tuned Llama 3.1 8B for 1 epoch on 36,000 samples, with sample lengths ranging from 1,000 to 20,000 tokens.
The average sample is only around 2,000 tokens, though, and 1,600 samples are over 5,000 tokens long.
I'm training on completions only.
I am teaching it my own custom prompt format.
There are over 10,000 samples where the completion is over 1,000 tokens long.
I'm using LoRA with rank 128 and alpha 256.
My batch size is 1 with gradient accumulation of 8, so the effective batch size is 8.
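For reference, a minimal sketch of what this setup corresponds to, assuming a trl + peft stack (the data file, the "text" field, and the "### Response:" marker are placeholders for the actual custom format, and exact argument names vary between trl versions):

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoTokenizer
from trl import DataCollatorForCompletionOnlyLM, SFTConfig, SFTTrainer

model_id = "meta-llama/Llama-3.1-8B"                 # base model, not instruct
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# Hypothetical file; each record has a "text" field holding prompt + completion
# already rendered in the custom format.
dataset = load_dataset("json", data_files="train.jsonl")["train"]

peft_config = LoraConfig(
    r=128,                                           # rank 128
    lora_alpha=256,                                  # alpha 256
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Completions-only training: mask out everything before the response marker
# so loss is only computed on the completion tokens.
collator = DataCollatorForCompletionOnlyLM(
    response_template="### Response:",               # placeholder marker
    tokenizer=tokenizer,
)

args = SFTConfig(
    output_dir="llama31-lora",
    num_train_epochs=1,
    per_device_train_batch_size=1,                   # batch size 1
    gradient_accumulation_steps=8,                   # effective batch size 8
    max_seq_length=20_000,                           # longest samples reach ~20k tokens
    packing=False,                                   # packing breaks completion-only masking
)

trainer = SFTTrainer(
    model=model_id,
    args=args,
    train_dataset=dataset,
    peft_config=peft_config,
    data_collator=collator,
)
trainer.train()
```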
Loss
The train loss and eval loss seemed to do OK.
On average, train loss went from over 1.4 to 1.23, and eval loss went from 1.18 to 0.96.
Testing it
But when I actually run inference (even on a sample that was in the training data), the model starts to repeat itself very quickly:
For example:
I woke up with a start. I was sweating. I looked at the clock. It was 3:00 AM. I looked at the phone. I had 100 notifications.
I looked at the first one. It read "DO NOT LOOK AT THE MOON".
I looked at the second one. It read "It's a beautiful night tonight. Look outside."
I looked at the third one. It read "It's a beautiful night tonight. Look outside."
I looked at the fourth one. It read "It's a beautiful night tonight. Look outside."
I looked at the fifth one. It read "It's a beautiful night tonight. Look outside."
...
And it goes on and on.
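For reference, a minimal sketch of what the inference step corresponds to, assuming the LoRA adapter is loaded with transformers + peft and decoded greedily (the adapter path and prompt string are placeholders):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.1-8B"
adapter_dir = "llama31-lora"                         # placeholder adapter path

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_dir)

# Placeholder prompt; the real prompt uses the custom format the model was trained on.
prompt = "### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding makes the looping easy to reproduce deterministically.
output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```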
I can easily make it write other stories that seem fine for a few sentences, but they then start to repeat themselves in some way after a while.
So is something wrong with fine-tuning on longer outputs?
Or do I still not have enough data?
Or does fine-tuning a base model just require a lot more data?