What is a Convolution Layer in Deep Learning?
A convolution layer is a building block of Convolutional Neural Networks (CNNs). It's mostly used to process image data.
Instead of connecting every pixel of the input to every neuron (as in a fully connected layer), a convolution layer slides a small filter (kernel) across the image and extracts features like edges, textures, or patterns.
Key Terms
- Input: The image or feature map (e.g., 5x5 pixels).
- Kernel (Filter): A small matrix (e.g., 3x3 or 5x5) that moves across the image.
- Stride: How many steps the filter moves at a time.
- Padding: Adding extra pixels around the image to control the output size.
- Feature Map: The result of the convolution operation.
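These terms map directly onto the arguments of a 2D convolution layer in a framework such as PyTorch. The sketch below is only an illustration (the layer sizes are chosen arbitrarily, and it assumes PyTorch is installed):

```python
import torch
import torch.nn as nn

# 3 input channels (RGB), 8 filters (kernels) of size 3x3,
# stride 1, and padding 1 so the output keeps the same height and width
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, stride=1, padding=1)

x = torch.randn(1, 3, 32, 32)   # a batch containing one 32x32 RGB image
feature_map = conv(x)           # the feature map produced by the convolution

print(feature_map.shape)        # torch.Size([1, 8, 32, 32])
```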
How Convolution Works
Let’s walk through an example with no padding and stride = 1.
1. Input: 5x5 Matrix
Input:
[ 9, 4, 1, 6, 5 ]
[ 1, 1, 1, 0, 2 ]
[ 1, 2, 1, 1, 3 ]
[ 2, 1, 0, 3, 0 ]
[ 1, 4, 2, 5, 6 ]
2. Kernel: 3x3
Kernel:
[ 1, 2, 0 ]
[ 0, 1, 4 ]
[ 1, 0, 1 ]
3. Convolution Operation (3x3)
At each position of the kernel:
- Multiply each input value by the overlapping kernel value
- Sum the products to get one output value
Example at the top-left corner (first 3x3 area):
Input Patch:
[ 9, 4, 1 ]
[ 1, 1, 1 ]
[ 1, 2, 1 ]
Calculation:
9*1 + 4*2 + 1*0 + 1*0 + 1*1 + 1*4 + 1*1 + 2*0 + 1*1
= 9 + 8 + 0 + 0 + 1 + 4 + 1 + 0 + 1
= 24
Slide the kernel across the whole input to produce the output feature map.
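To make the sliding window concrete, here is a small NumPy sketch (assuming NumPy is available) that reproduces this example. Note that deep learning frameworks actually compute cross-correlation (the kernel is not flipped), which is exactly what the hand calculation above does:

```python
import numpy as np

inp = np.array([[9, 4, 1, 6, 5],
                [1, 1, 1, 0, 2],
                [1, 2, 1, 1, 3],
                [2, 1, 0, 3, 0],
                [1, 4, 2, 5, 6]])

kernel = np.array([[1, 2, 0],
                   [0, 1, 4],
                   [1, 0, 1]])

# No padding, stride 1: a 5x5 input and a 3x3 kernel give a 3x3 feature map
out_size = inp.shape[0] - kernel.shape[0] + 1
feature_map = np.zeros((out_size, out_size), dtype=int)

for i in range(out_size):
    for j in range(out_size):
        patch = inp[i:i + 3, j:j + 3]               # 3x3 window of the input
        feature_map[i, j] = np.sum(patch * kernel)  # multiply overlapping numbers, then sum

print(feature_map[0, 0])  # 24, matching the hand calculation above
print(feature_map)
```

The printed 3x3 feature map is exactly what the output size formula in the next section predicts.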
Output Size Formula
Output Size = (Input Size - Kernel Size + 2 x Padding) / Stride + 1
For the 5x5 input above with a 3x3 kernel, stride 1, and no padding: (5 - 3 + 0) / 1 + 1 = 3, so the feature map is 3x3.
Example: 5x5 Kernel
Same 5x5 input, now with a 5x5 kernel and no padding: (5 - 5 + 0) / 1 + 1 = 1, so there is only one position to apply the kernel and the output is a single value (1x1).
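A tiny helper function (purely illustrative, not part of any library) makes the formula easy to reuse:

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output size of a convolution along one spatial dimension."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

print(conv_output_size(5, 3))  # 3 -> a 3x3 feature map
print(conv_output_size(5, 5))  # 1 -> a single output position
```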
Why Use Convolution Instead of Fully Connected Layers?
1. Parameter Efficiency
Let’s say you have a 32x32 RGB image (i.e., 32x32x3 = 3072 inputs).
- Fully Connected Layer (FC):
Every pixel connects to every neuron.
With 100 neurons: Parameters = 3072 x 100 = 307,200
- Convolution Layer (3x3 kernel, 3 input channels, 32 filters):
Parameters per filter = 3 x 3 x 3 = 27
Total = 27 x 32 = 864
Way fewer parameters (307,200 → 864)!
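To double-check these counts, a quick PyTorch sketch (assuming PyTorch is installed) reports the same weight totals; note that both layers also add one bias per output, which the counts above omit:

```python
import torch.nn as nn

# Fully connected layer: every one of the 3072 inputs connects to each of the 100 neurons
fc = nn.Linear(32 * 32 * 3, 100)

# Convolution layer: 32 filters of size 3x3 over 3 input channels
conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)

print(fc.weight.numel())    # 307200
print(conv.weight.numel())  # 864
```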
2. Spatial Hierarchy
Convolution preserves spatial information (nearby pixels are processed together by the same kernel), while fully connected layers flatten the image and discard that structure.
How to Calculate Parameters in a Convolution Layer
For each filter:
Parameters per filter = Kernel Height x Kernel Width x Input Channels
Then multiply by the number of filters:
Total Parameters = Parameters per filter x Number of Filters
(Add one parameter per filter if a bias term is used.)
Example:
- Input Channels = 3 (RGB image)
- Kernel Size = 5x5
- Number of Filters = 64
Parameters per filter = 5 x 5 x 3 = 75
Total Parameters = 75 x 64 = 4,800
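As a quick sanity check, a PyTorch sketch with the bias disabled (to match the weight-only count above) reports the same total:

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=5, bias=False)

per_filter = 5 * 5 * 3             # 75 weights per filter
total = per_filter * 64            # 4800 weights in total
print(conv.weight.numel(), total)  # 4800 4800
```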
How to Calculate MACs (Multiply-Accumulate Operations)
Each kernel application performs:
MACs per output value = Kernel Height x Kernel Width x Input Channels
Multiply by:
- Number of output positions (H x W)
- Number of filters
Example:
- Input: 32x32x3
- Kernel: 3x3
- Filters: 16
- Output: 30x30 (no padding)
MACs per output value = 3 x 3 x 3 = 27
Total MACs = 27 x 30 x 30 x 16 = 388,800
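Putting the numbers together in plain Python (variable names are just for illustration):

```python
# MAC count for the example above
kernel_h, kernel_w, in_channels = 3, 3, 3
out_h, out_w, num_filters = 30, 30, 16

macs_per_output = kernel_h * kernel_w * in_channels         # 27 multiply-adds per output value
total_macs = macs_per_output * out_h * out_w * num_filters

print(macs_per_output, total_macs)  # 27 388800
```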