Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug with number of layers for scan_layers=False in deepseek #1301

Open
deval281shah opened this issue Feb 22, 2025 · 4 comments
Open

Bug with number of layers for scan_layers=False in deepseek #1301

deval281shah opened this issue Feb 22, 2025 · 4 comments

Comments

@deval281shah
Copy link

Hi,

I tried to run a proxy model for deepseekv3 model with scan_layer=False, and it runs only two layers.

I think there is an error in this logic and an inner loop is missing in MaxText/models.py: :

dense_layer = RemattedBlockLayers[0]
moe_layer = RemattedBlockLayers[1]
num_moe_layers = cfg.num_decoder_layers - cfg.first_num_dense_layers
layers = [dense_layer, moe_layer]
layer_prefix = ["dense_layers", "moe_layers"]
num_layers = [cfg.first_num_dense_layers, num_moe_layers]
for index in range(len(layers)):
              y = layers[index](config=cfg, mesh=mesh, name=f"{layer_prefix[index]}_{index}", quant=self.quant)(
                  y,
                  decoder_segment_ids,
                  decoder_positions,
                  deterministic,
                  model_mode,
              )

I think there should be another for loop just after the first for loop that iterates through "range(num_layers[index])".

dense_layer = RemattedBlockLayers[0]
moe_layer = RemattedBlockLayers[1]
num_moe_layers = cfg.num_decoder_layers - cfg.first_num_dense_layers
layers = [dense_layer, moe_layer]
layer_prefix = ["dense_layers", "moe_layers"]
num_layers = [cfg.first_num_dense_layers, num_moe_layers]
for index in range(len(layers)):
     for index_j in range(num_layers[index]):
              y = layers[index](config=cfg, mesh=mesh, name=f"{layer_prefix[index]}_{index_j}", quant=self.quant)(
                  y,
                  decoder_segment_ids,
                  decoder_positions,
                  deterministic,
                  model_mode,
              )
@RissyRan
Copy link
Collaborator

RissyRan commented Feb 25, 2025

Thanks for spotting issue! Yes, you are right. Could you help draft a PR with one line fix? Thank you!

@deval281shah
Copy link
Author

deval281shah commented Feb 26, 2025

I have submitted a PR (changed file: models.py)

@RissyRan
Copy link
Collaborator

Cool! Just have a minor comment. Once you fill in the PR description, and we should be good to go! For this small changes, if you could have a quick run locally that will be perfect, but unit test and compile tests in github runners should be sufficient.

@RissyRan
Copy link
Collaborator

RissyRan commented Mar 3, 2025

We will track this change in #1337

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants