Hi authors,
I'm using Mamba for my 3D segmentation model. The Mamba block receives input with shape (B, L, D), where L is relatively small (e.g., 30) and D is large (e.g., 1600). When I compute the number of trainable parameters with `sum(p.numel() for p in model.parameters() if p.requires_grad)`, I get an unexpectedly large count (300M to 1B). However, despite this high parameter count, I can train the model with a batch size of up to 12 on a single NVIDIA 3090 (24 GB VRAM) without memory issues, and training is also quite fast. This seems unusual.
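For reference, here is a minimal sketch of how I count the parameters, plus a per-module breakdown I use for debugging (the helper names here are mine, just for illustration):

```python
import torch.nn as nn

def count_trainable_parameters(model: nn.Module) -> int:
    # Total number of trainable parameters in the model.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

def count_parameters_by_module(model: nn.Module) -> dict:
    # Trainable parameters per top-level child module, to see
    # which submodule dominates the overall count.
    return {
        name: sum(p.numel() for p in child.parameters() if p.requires_grad)
        for name, child in model.named_children()
    }
```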
I have double-checked my model implementation and confirmed that the code is correct. Do you have any insights into what might be causing this?
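As a sanity check, a single standalone Mamba block with the same model dimension can also be counted in isolation (this assumes the `mamba_ssm` package's `Mamba` class with its default `d_state`, `d_conv`, and `expand` settings, which may differ from my actual configuration):

```python
from mamba_ssm import Mamba

# One block with d_model = 1600, as in my model; the remaining
# hyperparameters are left at the library defaults (assumption).
block = Mamba(d_model=1600)
num_params = sum(p.numel() for p in block.parameters() if p.requires_grad)
print(f"single Mamba block: {num_params:,} trainable parameters")
```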
Thank you for your support!