
Commit

Rewritten logits line to avoid confusion
AlbertDominguez authored Aug 15, 2024
1 parent ad1a661 commit 2e93fe4
Showing 1 changed file with 2 additions and 2 deletions.
solution.py: 4 changes (2 additions & 2 deletions)
@@ -808,9 +808,9 @@ def show_samples(dataset, title, predictions=None, num_samples=10):
 * one convolution, size 3x3, 32 output feature maps, padding=1, followed by a ReLU activation function
 * one downsampling layer, size 2x2, via max-pooling
 * one fully connected (linear) layer with 64 units (the previous feature maps need to be flattened for that), followed by a ReLU activation function
-* one fully connected (linear) layer with 10 units, **without any activation function**. This will be the logits of the network.
+* one fully connected (linear) layer with 10 units, **without any activation function**. The output of this layer will be the logits.
-The fact that we do not add any activation function in the output is because certain loss functions in PyTorch (e.g. [`nn.CrossEntropyLoss`](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html)) expect the logits of the network and already apply the activation function in a more efficient manner when computing the loss, offering speedup and more numerical stability compared to explicitly adding it. Therefore, one should not to add an activation function in the output layer when using these loss functions during training (always double check what is the expected input for the loss function you want to use!).
+The fact that we do not add any activation function in the output is because certain loss functions in PyTorch (e.g. [`nn.CrossEntropyLoss`](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html)) expect the logits given by the network and already apply the activation function in a more efficient manner when computing the loss, offering a speedup and more numerical stability compared to explicitly adding it. Therefore, one should not add an activation function in the output layer when using these loss functions during training (always double-check what the expected input is for the loss function you want to use!).
 Each layer above has a corresponding `torch` implementation (e.g., a convolutional layer is implemented by [`nn.Conv2d`](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html), and the linear layer by [`nn.Linear`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html), which you have used before in Task 3). Please find the other necessary modules by browsing the [torch.nn documentation](https://pytorch.org/docs/stable/nn.html)! Flattening can be achieved by using the [`nn.Flatten` module](https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html) with its default parameters.
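For reference, a minimal sketch of the architecture described in the changed text, assuming 28x28 single-channel inputs (an assumption not stated in the diff) and using `nn.Sequential` purely for brevity:

```python
import torch
from torch import nn

# Sketch of the exercise architecture; the 1x28x28 input shape is an assumption,
# so adjust the flattened size (32 * 14 * 14) if your images differ.
model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1),  # 1x28x28 -> 32x28x28
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),   # 32x28x28 -> 32x14x14
    nn.Flatten(),                  # -> 32 * 14 * 14 = 6272
    nn.Linear(32 * 14 * 14, 64),
    nn.ReLU(),
    nn.Linear(64, 10),             # raw logits: no activation here, as discussed above
)

# nn.CrossEntropyLoss consumes the logits directly (it applies log-softmax internally).
loss_fn = nn.CrossEntropyLoss()
x = torch.randn(8, 1, 28, 28)          # dummy batch of 8 images
targets = torch.randint(0, 10, (8,))   # dummy class labels
loss = loss_fn(model(x), targets)
```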
