Transformer encoder -> Transformer decoder
In Section 11.9.3 (Decoder-Only), it should say "GPT pretraining with a Transformer decoder" instead of "GPT pretraining with a Transformer encoder", just as depicted in Fig. 11.9.6.
MassEast authored Jun 19, 2024
1 parent 23d7a5a commit 9695d46
Showing 1 changed file with 1 addition and 1 deletion.
@@ -270,7 +270,7 @@ as its backbone :cite:`Radford.Narasimhan.Salimans.ea.2018`.
 Following the autoregressive language model training
 as described in :numref:`subsec_partitioning-seqs`,
 :numref:`fig_gpt-decoder-only` illustrates
-GPT pretraining with a Transformer encoder,
+GPT pretraining with a Transformer decoder,
 where the target sequence is the input sequence shifted by one token.
 Note that the attention pattern in the Transformer decoder
 enforces that each token can only attend to its past tokens
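
The corrected passage describes standard autoregressive language-model pretraining: the target sequence is the input shifted by one token, and the decoder's causal attention pattern lets each token attend only to its past. A minimal PyTorch sketch of those two ideas (not part of the commit; the token values and variable names are illustrative assumptions):

```python
import torch

# Toy token ids standing in for a tokenized training sequence (values are made up).
tokens = torch.tensor([3, 17, 42, 8, 25, 9])

# Autoregressive LM pretraining: the target sequence is the input shifted by one token.
inputs, targets = tokens[:-1], tokens[1:]

# Causal attention pattern of the Transformer decoder: position i may attend
# only to positions j <= i (its past tokens), via a lower-triangular mask.
seq_len = inputs.shape[0]
causal_mask = torch.tril(torch.ones(seq_len, seq_len)).bool()

print(inputs)       # tensor([ 3, 17, 42,  8, 25])
print(targets)      # tensor([17, 42,  8, 25,  9])
print(causal_mask)  # True on and below the diagonal, False above it
```

This is why the decoder/encoder distinction matters here: an encoder's unmasked self-attention would let each position see future tokens, making next-token prediction trivial, whereas the decoder's causal mask keeps the pretraining objective meaningful.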
