Use `scan` and `hostoffloading` for llama model #123

zpcore · 2025-02-25T19:04:38Z

Port `scan` and host offloading to the llama model, based on @tengyifei's prototypes in 1 and 2.

The sharding schema in `torchprime/torch_xla_models/configs/model/scaling/llama-fsdp.yaml` also plays well with the scan code.

Currently there is a NaN issue when scan is used with the flash attention kernel, related to pytorch/xla#8734. That issue needs to be resolved before this change produces correct output.
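
For reviewers less familiar with scan, here is a minimal pure-Python sketch of the idea, not the torch_xla implementation used in this PR. It assumes the common `scan(fn, init, xs)` convention where `fn(carry, x)` returns `(new_carry, y)`. `toy_layer` and `layer_params` are hypothetical stand-ins for a decoder layer and its stacked per-layer parameters.

```python
# Pure-Python illustration of scan semantics over identical layers.
# This is a sketch under the assumptions above, not the code in this PR.
from typing import Callable, List, Sequence, Tuple


def scan(
    fn: Callable[[float, Tuple[float, float]], Tuple[float, float]],
    init: float,
    xs: Sequence[Tuple[float, float]],
) -> Tuple[float, List[float]]:
    """Apply fn once per element of xs, threading the carry through."""
    carry = init
    ys = []
    for x in xs:
        carry, y = fn(carry, x)
        ys.append(y)
    return carry, ys


def toy_layer(hidden: float, params: Tuple[float, float]) -> Tuple[float, float]:
    """Hypothetical stand-in for one decoder layer: hidden * weight + bias."""
    weight, bias = params
    new_hidden = hidden * weight + bias
    # The carry is the hidden state passed to the next layer;
    # the second value is the per-layer output collected by scan.
    return new_hidden, new_hidden


def unrolled(hidden: float, params_list: Sequence[Tuple[float, float]]) -> float:
    """The equivalent hand-written loop over layers, for comparison."""
    for params in params_list:
        hidden, _ = toy_layer(hidden, params)
    return hidden


# One (weight, bias) pair per layer; all layers share the same structure.
layer_params = [(2.0, 1.0), (0.5, 0.0), (3.0, -1.0)]

final_hidden, per_layer = scan(toy_layer, 1.0, layer_params)
```

The scanned and unrolled versions compute the same result. The motivation for scan in a compiled setting is that the layer body can be traced and compiled once instead of once per layer, which requires every layer to have the same structure.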

zpcore · 2025-02-26T02:35:30Z

torchprime/torch_xla_models/configs/model/llama-3-8b.yaml

@@ -20,3 +20,4 @@ attention_dropout: false
 attention_bias: false
 flash_attention: true
 rope_theta: 500000.0
+scan_decoder_layers: true


Move this to the default YAML file.

zpcore added 5 commits February 13, 2025 21:52

Support run trainer locally

2a81ec2

nit

9a56545

update docker command

4a24ba6

initial runnable version

252cf49

support hostoffloading

1f2c3e3
