Remove duplicate code between GPT2 and CoCa #63

Open
1 task
spravil opened this issue Mar 4, 2024 · 4 comments
Comments

@spravil
Member

spravil commented Mar 4, 2024

  • Use the attention module in modalities.nn in GPT2 (a hedged sketch of what this could look like is shown below)
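
A minimal sketch of what re-using a shared attention module in GPT2 could look like. The class name, constructor arguments, and the idea of importing it from modalities.nn are assumptions for illustration, not the actual modalities API:

```python
# Hedged sketch: GPT2 block delegating to a shared attention module instead of
# keeping its own copy of the attention code. MultiHeadAttention stands in for
# the (assumed) shared module in modalities.nn.
import torch
from torch import nn


class MultiHeadAttention(nn.Module):
    """Assumed shared attention component (illustrative, not the real modalities.nn API)."""

    def __init__(self, n_embd: int, n_head: int, bias: bool = True):
        super().__init__()
        assert n_embd % n_head == 0, "embedding size must be divisible by head count"
        self.attn = nn.MultiheadAttention(n_embd, n_head, bias=bias, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq_len = x.size(1)
        # boolean causal mask: True entries are masked out
        causal_mask = torch.triu(
            torch.ones(seq_len, seq_len, device=x.device, dtype=torch.bool), diagonal=1
        )
        out, _ = self.attn(x, x, x, attn_mask=causal_mask, need_weights=False)
        return out


class GPT2Block(nn.Module):
    """Transformer block that re-uses the shared attention module rather than duplicating it."""

    def __init__(self, n_embd: int, n_head: int):
        super().__init__()
        self.ln_1 = nn.LayerNorm(n_embd)
        self.attn = MultiHeadAttention(n_embd, n_head)  # shared, not duplicated
        self.ln_2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(self.ln_1(x))
        x = x + self.mlp(self.ln_2(x))
        return x
```

The same block could then back both GPT2 and CoCa, which is the point of removing the duplication.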
@lhahn-iis
Contributor

Some points on this, after @flxst and I took a brief look into GQA:

  • We found that both CoCa and GPT2 are currently implemented around one "central" config object, from which they each initialize the sub-components relevant for the respective model. Since the work here would also involve some refactoring, it might make sense to introduce a similar nested structure that re-uses analogous sub-components (like the attention implementation).
  • Right now, we mimicked the test infrastructure of your changes, @spravil, in the CoCa branch, but have the attention utility under "models". It would be good to unify this as well, though that is probably implicitly covered by the first point.
  • We currently do some config validations, but mostly in the individual components' code rather than in the related config objects (mainly because there is no alignment between sub-components and sub-configs yet). We might want to think about a unified way to do these validations; they mostly ensure that the head count is valid in relation to the embedding size. We also have to think about having those checks both in the configs and in the sub-components, since modalities might also be used as a library, with people using the sub-components directly instead of going through dedicated config objects. A hedged sketch of such config-level validation follows this list.
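
A hedged sketch of nested configs that share a sub-config, with the head-count / embedding-size check living on the config itself. The class and field names (AttentionConfig, GPT2Config, CoCaConfig, n_embd, n_head) are illustrative assumptions, not the actual modalities config schema:

```python
# Hedged sketch (pydantic v2): validation on the shared sub-config instead of
# being scattered across the individual components.
from pydantic import BaseModel, model_validator


class AttentionConfig(BaseModel):
    n_embd: int
    n_head: int

    @model_validator(mode="after")
    def check_head_count(self) -> "AttentionConfig":
        # the validation mentioned above: head count must divide the embedding size
        if self.n_embd % self.n_head != 0:
            raise ValueError(
                f"n_embd ({self.n_embd}) must be divisible by n_head ({self.n_head})"
            )
        return self


class GPT2Config(BaseModel):
    attention: AttentionConfig
    n_layer: int


class CoCaConfig(BaseModel):
    text_attention: AttentionConfig
    vision_attention: AttentionConfig


# GPT2Config(attention=AttentionConfig(n_embd=768, n_head=12), n_layer=12)  # ok
# AttentionConfig(n_embd=768, n_head=10)  # raises ValueError
```

If modalities is used as a library and the sub-components are instantiated directly, the same check would still need to live in (or be shared with) the component's `__init__`, which is exactly the duplication concern raised above.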

@flxst
Member

flxst commented Mar 15, 2024

We should probably work on this as soon as the related PRs (#16 and #74) are merged into main.

@lhahn-iis
Contributor

> We should probably work on this as soon as the related PRs (#16 and #74) are merged into main.

Good point. We can maybe talk about it on Monday during the hackathon.

@mali-git
Member

@spravil, what is the status here?
