Transformer encoder -> Transformer decoder
In Section 11.9.3 (Decoder-Only), it should say "GPT pretraining with a Transformer decoder" instead of "GPT pretraining with a Transformer encoder", just as depicted in Fig. 11.9.6.
MassEast authored Jun 19, 2024
1 parent 23d7a5a commit 9695d46
Showing 1 changed file with 1 addition and 1 deletion.
@@ -270,7 +270,7 @@ as its backbone :cite:`Radford.Narasimhan.Salimans.ea.2018`.
 Following the autoregressive language model training
 as described in :numref:`subsec_partitioning-seqs`,
 :numref:`fig_gpt-decoder-only` illustrates
-GPT pretraining with a Transformer encoder,
+GPT pretraining with a Transformer decoder,
 where the target sequence is the input sequence shifted by one token.
 Note that the attention pattern in the Transformer decoder
 enforces that each token can only attend to its past tokens
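
The corrected passage describes standard autoregressive language-model pretraining: the target sequence is the input shifted by one token, and the decoder's causal attention pattern lets each token attend only to its past. A minimal PyTorch sketch of those two ideas (not part of the commit; the token values and variable names are illustrative assumptions):

```python
import torch

# Toy token ids standing in for a tokenized training sequence (values are made up).
tokens = torch.tensor([3, 17, 42, 8, 25, 9])

# Autoregressive LM pretraining: the target sequence is the input shifted by one token.
inputs, targets = tokens[:-1], tokens[1:]

# Causal attention pattern of the Transformer decoder: position i may attend
# only to positions j <= i (its past tokens), via a lower-triangular mask.
seq_len = inputs.shape[0]
causal_mask = torch.tril(torch.ones(seq_len, seq_len)).bool()

print(inputs)       # tensor([ 3, 17, 42,  8, 25])
print(targets)      # tensor([17, 42,  8, 25,  9])
print(causal_mask)  # True on and below the diagonal, False above it
```

This is why the decoder/encoder distinction matters here: an encoder's unmasked self-attention would let each position see future tokens, making next-token prediction trivial, whereas the decoder's causal mask keeps the pretraining objective meaningful.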
