What is Information Theory?
Information theory is a mathematical framework developed by Claude Shannon in the 1940s to understand how information is measured, transmitted, and compressed.
At its core, it deals with questions like:
- How much information is in a message?
- How can we represent that information efficiently?
- How can we reduce noise when transmitting information?
Key Concept: Entropy
Entropy is a measure of uncertainty or unpredictability.
Think of it like this:
- A fair coin (50% heads, 50% tails) has high entropy because it’s unpredictable.
- A coin that always lands on heads has zero entropy because it’s completely predictable.
In deep learning, entropy measures how much uncertainty there is in a model's predicted probability distribution.
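A quick way to see this is to compute entropy directly. The sketch below uses plain Python (no deep learning library) and the standard Shannon entropy formula in bits:

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # fair coin      -> 1.0 bit  (maximum uncertainty)
print(entropy([1.0, 0.0]))   # always heads   -> 0.0 bits (completely predictable)
print(entropy([0.9, 0.1]))   # biased coin    -> ~0.47 bits (mostly predictable)
```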
What is the Information Bottleneck?
Imagine you're trying to compress an image to send over the internet. You want to remove unnecessary parts (like background noise) but keep the important content (like a person's face).
This is the idea behind the Information Bottleneck.
In deep learning:
- A neural network tries to learn a mapping from input (X) to output (Y).
- The Information Bottleneck principle says: try to compress the input X into a hidden representation T, such that T contains as little information about X as possible while still keeping enough information to predict Y well.
This forces the model to focus only on relevant information and ignore noise.
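Formally (following Tishby et al.), this trade-off is written as a Lagrangian, where $I(\cdot;\cdot)$ denotes mutual information and $\beta$ controls how much prediction accuracy we are willing to trade for compression:

$$\min_{p(t \mid x)} \; I(X; T) \;-\; \beta \, I(T; Y)$$

Minimizing $I(X;T)$ squeezes out detail about the input, while the $-\beta\, I(T;Y)$ term rewards keeping whatever is useful for predicting $Y$.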
Example: Classifying Handwritten Digits (MNIST)
Imagine a neural network classifying digits from the MNIST dataset.
- Input (X): Raw pixel values from an image.
- Output (Y): Digit label (0–9).
- We want the hidden layers (T) to keep only what’s needed to guess the digit (like shape), and discard irrelevant things (like handwriting style or stroke thickness).
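To make this concrete, here is a minimal sketch of such a classifier, assuming PyTorch; the layer sizes, and in particular the narrow 16-unit layer standing in for T, are purely illustrative.

```python
import torch.nn as nn

# A small MNIST classifier with a deliberately narrow hidden layer.
# The narrow layer plays the role of T: it cannot keep every pixel detail,
# so it is pushed to retain only what helps predict the digit.
model = nn.Sequential(
    nn.Flatten(),            # X: 28x28 image -> 784 pixel values
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 16),      # T: bottleneck representation (16 units, illustrative)
    nn.ReLU(),
    nn.Linear(16, 10),       # Y: logits for the digit classes 0-9
)
```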
Information Bottleneck in Neural Networks
Inside the network, information flows as: input X → compressed representation T (the bottleneck) → prediction Y.
Overtraining or keeping too much detail in T can lead to:
- Overfitting (memorizing noise)
- Poor generalization
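One common way to impose this compression explicitly is the variational approach of Alemi et al. (2016), which adds a KL penalty on a stochastic representation T. Below is a rough sketch of that loss, assuming PyTorch; the names `mu` and `log_var` (Gaussian parameters produced by your encoder for q(t|x)) and the weight `beta` are illustrative.

```python
import torch
import torch.nn.functional as F

def vib_loss(logits, labels, mu, log_var, beta=1e-3):
    """Variational IB-style loss: fit Y, but penalize information kept in T.

    mu, log_var: Gaussian parameters of the encoder q(t|x) (shape: batch x dim).
    beta: trade-off weight; larger beta pushes T to be more compressed.
    """
    ce = F.cross_entropy(logits, labels)  # how well T predicts Y
    # KL( q(t|x) || N(0, I) ): a variational upper bound on the information T keeps about X
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp(), dim=1).mean()
    return ce + beta * kl
```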
References
- Tishby et al. (2000) – The Information Bottleneck Method: https://arxiv.org/abs/physics/0004057
- Tishby and Zaslavsky (2015) – Deep Learning and the Information Bottleneck Principle: https://arxiv.org/abs/1503.02406
- Alemi et al. (2016) – Deep Variational Information Bottleneck: https://arxiv.org/abs/1612.00410