AI Practitioner

Image Augmentation in Computer Vision using PyTorch Transforms v2

Why Image Augmentation is Essential in Deep Learning

In computer vision, image augmentation plays a critical role in improving the generalization of deep neural networks. By artificially expanding the diversity of the training dataset through transformations that preserve the label, image augmentation helps reduce overfitting and increases model robustness. Especially for convolutional neural networks (CNNs) and vision transformers (ViTs), which learn hierarchical and spatial features, input variability introduced by augmentation forces the model to learn more invariant and meaningful representations. This is analogous to improving the mutual information between relevant features and output predictions while discarding noise.

Common Image Augmentation Techniques and Parameter Descriptions

1. RandomHorizontalFlip

Purpose: Introduces horizontal symmetry by flipping the image left-to-right with a certain probability.

from torchvision.transforms import v2 as transforms
transform ...

Convolution Operator and Layer Explained in Deep Learning

What is a Convolution Layer in Deep Learning?

A convolution layer is a building block of Convolutional Neural Networks (CNNs). It's mostly used to process image data. Instead of connecting every pixel of the input to every neuron (as in a fully connected layer), a convolution layer slides a small filter (kernel) across the image and extracts features like edges, textures, or patterns.

Key Terms

Input: The image or feature map (e.g., 6x6 pixels).
Kernel (Filter): A small matrix (e.g., 3x3 or 5x5) that moves across the image.
Stride: How many steps the filter moves at a time.
Padding: Adding extra pixels around the image to control the output size.
Feature Map: The result of the convolution operation.

How Convolution Works

Let's walk through an example with no padding and stride = 1.

1. Input: 5x5 Matrix

Input: [
  [9, 4, 1, 6, 5],
  [1, 1, 1, 0, 2],
  [1, 2, 1, 1, 3],
  [2, 1, 0, 3, 0],
  [1, 4, 2, 5, 6]
]

2. Kernel: ...
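Since the excerpt stops before the kernel is shown, here is a minimal pure-Python sketch of the sliding-window operation on the input matrix above. The 3x3 vertical-edge kernel is an assumption chosen for illustration, not the kernel from the post, and `conv2d` is a hypothetical helper name. With no padding and stride 1, the output size is (5 - 3) / 1 + 1 = 3 in each dimension.

```python
def conv2d(image, kernel, stride=1):
    """No-padding 2D cross-correlation (what deep learning calls convolution)."""
    n_rows, n_cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = (n_rows - k_rows) // stride + 1
    out_cols = (n_cols - k_cols) // stride + 1
    output = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            # Multiply the kernel element-wise with the patch under it, then sum.
            total = 0
            for a in range(k_rows):
                for b in range(k_cols):
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output

# Input matrix from the example above.
image = [
    [9, 4, 1, 6, 5],
    [1, 1, 1, 0, 2],
    [1, 2, 1, 1, 3],
    [2, 1, 0, 3, 0],
    [1, 4, 2, 5, 6],
]

# Assumed kernel (hypothetical): responds to left-to-right intensity changes.
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

feature_map = conv2d(image, kernel, stride=1)
# feature_map == [[8, 0, -7], [2, 0, -3], [1, -2, -6]]
```

For instance, the top-left output value comes from the top-left 3x3 patch: the left column sums to 9 + 1 + 1 = 11, the right column to 1 + 1 + 1 = 3, giving 11 - 3 = 8.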

Posts

Image Augmentation in Computer Vision using PyTorch Transforms v2

Convolution Operator and Layer Explained in Deep Learning

Building an MCP Agent with UV, Python & mcp-use

RF-DETR: Overcoming the Limitations of DETR in Object Detection

Image Classification with ResNet-18: Training, Validation, and Inference using PyTorch
