Showing posts with the label CNN

Image Augmentation in Computer Vision using PyTorch Transforms v2

Why Image Augmentation is Essential in Deep Learning

In computer vision, image augmentation plays a critical role in improving the generalization of deep neural networks. By artificially expanding the diversity of the training dataset through transformations that preserve the label, image augmentation helps reduce overfitting and increases model robustness. Especially for convolutional neural networks (CNNs) and vision transformers (ViTs), which learn hierarchical and spatial features, input variability introduced by augmentation forces the model to learn more invariant and meaningful representations. This is analogous to improving the mutual information between relevant features and output predictions while discarding noise.

Common Image Augmentation Techniques and Parameter Descriptions

1. RandomHorizontalFlip

Purpose: Introduces horizontal symmetry by flipping the image left-to-right with a certain probability.

from torchvision.transforms import v2 as transforms
transform ...
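To make the "flip with a certain probability" idea concrete, here is a minimal, library-free sketch of what a random horizontal flip does to an image stored as a list of rows. The function name `random_horizontal_flip` and its `rng` parameter are hypothetical names chosen for this illustration. This is not torchvision's implementation, only the same idea written out in plain Python.

```python
import random


def random_horizontal_flip(image, p=0.5, rng=random):
    """Return a copy of `image`, mirrored left-to-right with probability `p`.

    `image` is assumed to be a list of rows (each row a list of pixel values).
    `rng` is any object exposing a `random()` method returning a float in [0, 1).
    """
    if rng.random() < p:
        # Reverse every row: the leftmost pixel becomes the rightmost one.
        return [row[::-1] for row in image]
    # No flip: return an unchanged copy so the caller's data is never mutated.
    return [row[:] for row in image]


if __name__ == "__main__":
    sample = [
        [1, 2, 3],
        [4, 5, 6],
    ]
    # With p=1.0 the flip always happens; with p=0.0 it never does.
    print(random_horizontal_flip(sample, p=1.0))
    print(random_horizontal_flip(sample, p=0.0))
```

The label stays valid for most natural images after such a flip, which is why it is a common default. It would be a poor choice for data where left and right carry meaning, such as text or digits.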

Convolution Operator and Layer Explained in Deep Learning

What is a Convolution Layer in Deep Learning?

A convolution layer is a building block of Convolutional Neural Networks (CNNs). It's mostly used to process image data. Instead of connecting every pixel of the input to every neuron (as in a fully connected layer), a convolution layer slides a small filter (kernel) across the image and extracts features like edges, textures, or patterns.

Key Terms

Input: The image or feature map (e.g., 6x6 pixels).
Kernel (Filter): A small matrix (e.g., 3x3 or 5x5) that moves across the image.
Stride: How many steps the filter moves at a time.
Padding: Adding extra pixels around the image to control the output size.
Feature Map: The result of the convolution operation.

How Convolution Works

Let's walk through an example with no padding and stride = 1.

1. Input: 5x5 Matrix

Input: [
[9, 4, 1, 6, 5],
[1, 1, 1, 0, 2],
[1, 2, 1, 1, 3],
[2, 1, 0, 3, 0],
[1, 4, 2, 5, 6]
]

2. Kernel: ...
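The sliding-window operation described above can be sketched in plain Python. The input rows below are the ones listed in the excerpt (five rows of five values). The excerpt's kernel is cut off, so the 3x3 `KERNEL` used here is a hypothetical vertical-edge filter chosen only for illustration, and `conv2d_valid` is an assumed helper name. As in most deep learning libraries, the kernel is applied without flipping (cross-correlation).

```python
INPUT = [
    [9, 4, 1, 6, 5],
    [1, 1, 1, 0, 2],
    [1, 2, 1, 1, 3],
    [2, 1, 0, 3, 0],
    [1, 4, 2, 5, 6],
]

# Hypothetical kernel (NOT from the original post): responds to left/right contrast.
KERNEL = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]


def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` with no padding and return the feature map."""
    in_h, in_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    # Output size with no padding: (input - kernel) // stride + 1
    out_h = (in_h - k_h) // stride + 1
    out_w = (in_w - k_w) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel element-wise with the current window and sum.
            total = 0
            for a in range(k_h):
                for b in range(k_w):
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        feature_map.append(row)
    return feature_map


if __name__ == "__main__":
    for row in conv2d_valid(INPUT, KERNEL):
        print(row)
    # → [8, 0, -7]
    # → [2, 0, -3]
    # → [1, -2, -6]
```

With a 5x5 input, a 3x3 kernel, stride 1, and no padding, the feature map is (5 - 3) / 1 + 1 = 3 values on each side, which matches the 3x3 result printed above.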