Hello
Looking at Llama 3-8B, it has 32 heads of shape (1, 32, 7, 7), so around 50,176 terms in total.
If this were flattened and compressed somehow down to the input size PixArt-Sigma expects (300 tokens?), like a latent text-embedding space, could it be used to train a model from scratch? Would it be any good?
I guess the masking would also need to be changed?
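For concreteness, here is a rough sketch of the kind of flatten-and-compress step I have in mind. It is only an illustration, not anything from this repo: the `AttnToTokenCompressor` class is hypothetical, and the dimensions (32 layers/heads, a 7-token prompt, 300 output tokens, a 1152-dim conditioning space) are my assumptions.

```python
# Hypothetical sketch: compress a stack of LLaMA-style attention maps of shape
# (layers, heads, seq, seq) into a fixed 300-token embedding that a
# PixArt-like cross-attention block could consume as conditioning.
# All dimensions below are assumptions from the question, not from this repo.
import torch
import torch.nn as nn


class AttnToTokenCompressor(nn.Module):
    def __init__(self, layers=32, heads=32, seq=7, num_tokens=300, d_model=1152):
        super().__init__()
        in_dim = layers * heads * seq * seq          # 32*32*7*7 = 50,176 values
        self.proj = nn.Linear(in_dim, num_tokens * d_model)
        self.num_tokens = num_tokens
        self.d_model = d_model

    def forward(self, attn):
        # attn: (B, layers, heads, seq, seq)
        flat = attn.flatten(start_dim=1)             # (B, 50176)
        tokens = self.proj(flat)                     # (B, 300 * d_model)
        return tokens.view(-1, self.num_tokens, self.d_model)  # (B, 300, d_model)


# Dummy tensor standing in for real Llama 3-8B attention maps on a 7-token prompt.
attn = torch.randn(1, 32, 32, 7, 7)
emb = AttnToTokenCompressor()(attn)
print(emb.shape)  # torch.Size([1, 300, 1152])
```

Since all 300 output tokens would now carry information (there is no padding), the usual text-embedding attention mask would presumably have to change as well, e.g. to an all-ones mask.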