Too many parameters when using Mamba #697

Open · sydat2701 opened this issue Feb 26, 2025 · 0 comments

@sydat2701

Hi authors,
I'm using Mamba in my 3D segmentation model. The Mamba block receives input of shape (B, L, D), where L is relatively small (e.g., 30) and D is large (e.g., 1600). When I count the trainable parameters with `sum(p.numel() for p in model.parameters() if p.requires_grad)`, I get an unexpectedly large number (300M–1B). Despite this high parameter count, however, I can train the model with a batch size of up to 12 on a single NVIDIA 3090 (24 GB VRAM) without memory issues, and training is also quite fast. This seems unusual.
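
For reference, here is a back-of-the-envelope estimate of where the parameters go. This is a minimal sketch, not code from my actual model: it assumes the standard `mamba_ssm` `Mamba` layer with its defaults (`expand=2`, `d_state=16`, `d_conv=4`, `dt_rank="auto"`), and the names below follow that implementation:

```python
# Rough per-block parameter count for a Mamba layer.
# Sketch only: assumes mamba_ssm defaults (expand=2, d_state=16,
# d_conv=4, dt_rank="auto") and that implementation's bias choices.
import math

d_model = 1600                      # D from my input shape (B, L, D)
expand, d_state, d_conv = 2, 16, 4
d_inner = expand * d_model          # 3200
dt_rank = math.ceil(d_model / 16)   # 100, the "auto" rule

in_proj  = d_model * 2 * d_inner                # joint x/z input projection (no bias)
conv1d   = d_inner * d_conv + d_inner           # depthwise conv weights + bias
x_proj   = d_inner * (dt_rank + 2 * d_state)    # projection to dt, B, C (no bias)
dt_proj  = dt_rank * d_inner + d_inner          # dt projection + bias
A_log    = d_inner * d_state                    # log-parametrized state matrix
D_skip   = d_inner                              # per-channel skip scale
out_proj = d_inner * d_model                    # output projection (no bias)

total = in_proj + conv1d + x_proj + dt_proj + A_log + D_skip + out_proj
print(f"~{total / 1e6:.1f}M parameters per block")  # ~16.2M at d_model=1600
```

Under those assumptions, D = 1600 gives roughly 16M parameters per block, dominated by the input/output projections, so a model stacking a few dozen such blocks would plausibly land in the 300M–1B range I observe; the count scales with D but not with L.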

I have double-checked my model implementation and confirmed that the code is correct. Do you have any insights into what might be causing this?

Thank you for your support!
