What is Entropy in Deep Learning?
In deep learning, entropy usually refers to information entropy, a concept from information theory introduced by Claude Shannon in 1948. It measures the uncertainty or randomness in a probability distribution.
Intuition:
If you're very uncertain about something (like predicting the class of an image), the entropy is high.
If you're confident about your prediction (for example, the model is very sure it’s a cat), the entropy is low.
Mathematical Definition (Shannon Entropy)
Given a probability distribution $p = [p_1, p_2, ..., p_n]$, the entropy $H(p)$ is calculated as:
$$H(p) = -\sum_{i=1}^{n} p_i \log p_i$$
This formula sums the "surprise" or "information content" from each possible outcome.
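To make this concrete, here is a minimal NumPy sketch (the helper name shannon_entropy is ours, not a library function) that computes the entropy of a confident distribution and of a uniform one, using the natural logarithm:
import numpy as np

def shannon_entropy(p):
    """Shannon entropy of a discrete distribution p, in nats (natural log)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # treat 0 * log(0) as 0
    return -np.sum(p * np.log(p))

print(shannon_entropy([0.98, 0.01, 0.01]))  # confident prediction -> low entropy (~0.11)
print(shannon_entropy([1/3, 1/3, 1/3]))     # uniform guess -> high entropy (~1.10)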
Entropy in Deep Learning: Where and Why?
In deep learning, entropy is most commonly used in the loss function, particularly:
Cross Entropy Loss
- Used for classification problems.
- Compares the true distribution (labels) vs. the predicted distribution (from softmax).
- Encourages the model to reduce uncertainty and improve prediction accuracy.
Cross-Entropy Formula:
$$H(y, \hat{y}) = -\sum_{i=1}^{n} y_i \log(\hat{y}_i)$$
Where:
- $y_i$ is the true label (usually one-hot encoded).
- $\hat{y}_i$ is the predicted probability for class $i$.
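For example, with a one-hot label $y = [1, 0, 0]$ and a predicted distribution $\hat{y} = [0.7, 0.2, 0.1]$, only the term for the true class survives, so the loss is $-\log(0.7) \approx 0.357$ (using the natural logarithm).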
Cross-Entropy Loss Implementation from Scratch
import numpy as np
def cross_entropy_loss(y_true, y_pred, epsilon=1e-15):
    """
    Compute the cross-entropy loss between true labels and predicted probabilities.

    Args:
        y_true (ndarray): Ground truth labels (one-hot encoded). Shape (N, C)
        y_pred (ndarray): Predicted probabilities from the model (softmax output). Shape (N, C)
        epsilon (float): Small value to avoid log(0)

    Returns:
        float: The average cross-entropy loss over all samples
    """
    # Clip predictions to prevent log(0) errors
    y_pred = np.clip(y_pred, epsilon, 1. - epsilon)
    # Take the log of the predictions
    log_preds = np.log(y_pred)
    # Multiply true labels by log predictions element-wise,
    # then take the negative and average over all samples
    loss = -np.sum(y_true * log_preds) / y_true.shape[0]
    return loss
Test Code: Try It Out
Let's test this function using:
- One-hot encoded true labels
- Predicted softmax probability vectors
# Simulated predictions from a model (already passed through softmax)
predicted_probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6]
])
# Ground truth labels (one-hot encoded)
true_labels = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1]
])
# Compute and print cross-entropy loss
loss = cross_entropy_loss(true_labels, predicted_probs)
print(f"Cross-Entropy Loss: {loss:.4f}")
PyTorch Example: Entropy and Cross Entropy
import torch
import torch.nn as nn
import torch.nn.functional as F
# Simulated model output (logits) and true labels
logits = torch.tensor([[2.0, 1.0, 0.1]], requires_grad=True) # Raw output from a neural net
labels = torch.tensor([0]) # Class 0 is the correct one
# Apply softmax to get probabilities
probs = F.softmax(logits, dim=1)
print("Predicted probabilities:", probs)
# Calculate entropy manually
entropy = -torch.sum(probs * torch.log(probs))
print("Entropy:", entropy.item())
# Now use PyTorch's CrossEntropyLoss
criterion = nn.CrossEntropyLoss()
loss = criterion(logits, labels)
print("Cross-entropy loss:", loss.item())
Output:
Predicted probabilities: tensor([[0.6590, 0.2424, 0.0986]], grad_fn=<SoftmaxBackward0>)
Entropy: 0.8467
Cross-entropy loss: 0.4170
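As a sanity check (a small sketch reusing the variables from the snippet above), nn.CrossEntropyLoss is equivalent to applying log_softmax to the logits followed by the negative log-likelihood loss, which here reduces to $-\log(0.6590) \approx 0.417$:
# Cross-entropy picks out -log of the predicted probability of the true class
manual_loss = -torch.log(probs[0, labels[0]])
print("Manual cross-entropy:", manual_loss.item())  # ~0.4170, same as above

# Equivalent decomposition: log_softmax followed by negative log-likelihood
nll_loss = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print("log_softmax + NLL:", nll_loss.item())  # ~0.4170 as well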
References
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. https://www.deeplearningbook.org/
- Shannon, C. E. (1948). A Mathematical Theory of Communication. https://ieeexplore.ieee.org/document/6773024