Learn about the z-test and p-value in statistics with detailed examples and Python code. Understand how they apply to Machine Learning and Deep Learning for model evaluation.

## What is a P-Value?

The p-value is a probability that measures the strength of the evidence against the null hypothesis. Specifically, it is the probability of observing a test statistic (like the z-score) at least as extreme as the one computed from your sample, assuming that the null hypothesis is true. A smaller p-value indicates stronger evidence against the null hypothesis.

Common thresholds to reject the null hypothesis are:

- p < 0.05: statistically significant
- p < 0.01: highly significant

## Python Example of Z-Test

Let's assume we want to test whether the mean of a sample differs from a known population mean:

```python
import numpy as np
from scipy import stats

# Sample data
sample = [2.9, 3.0, 2.5, 3.2, 3.8, 3.5]
mu = 3.0     # Population mean
sigma = 0.5  # Population std dev
```
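The snippet above stops at the setup, so here is a minimal sketch of the remaining computation, assuming a two-tailed one-sample z-test that reuses `sample`, `mu`, and `sigma` from above:

```python
# z-score: how many standard errors the sample mean lies from mu
z = (np.mean(sample) - mu) / (sigma / np.sqrt(len(sample)))

# Two-tailed p-value: twice the upper-tail probability of |z|
# under the standard normal distribution
p_value = 2 * stats.norm.sf(abs(z))

print(f"z = {z:.3f}, p-value = {p_value:.3f}")
if p_value < 0.05:
    print("Reject the null hypothesis: the sample mean differs from mu.")
else:
    print("Fail to reject the null hypothesis.")
```

For this particular sample the p-value comes out well above 0.05, so the test fails to reject the null hypothesis.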
Distance metrics play a crucial role in machine learning, especially in tasks like clustering, classification, and recommendation systems. In this blog, we will explore popular distance metrics including Cosine, Euclidean, Mahalanobis, Hellinger, Jaccard, Manhattan, Correlation, Dice, and Hamming distances. We will also provide PyTorch implementations for each metric.

## 1. Cosine Distance

Measures dissimilarity as one minus the cosine of the angle between two non-zero vectors. Often used in text similarity and document clustering.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([4.0, 5.0, 6.0])

# cosine_similarity expects a batch dimension, hence unsqueeze(0)
cosine_distance = 1 - torch.nn.functional.cosine_similarity(x.unsqueeze(0), y.unsqueeze(0))
```

## 2. Euclidean Distance

Represents the straight-line distance between two points in Euclidean space.

```python
euclidean_distance = torch.dist(x, y, p=2)
```

## 3. Mahalanobis Distance

Accounts for the correlation between variables and scales distances accordingly. Useful in anomaly detection.
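The Mahalanobis distance needs the inverse covariance matrix of the feature space, which cannot be meaningfully estimated from the two vectors alone. Here is a minimal sketch, assuming a hypothetical reference dataset `data` from which the covariance is estimated (replace it with your actual observations):

```python
# Hypothetical reference sample used to estimate the 3x3 covariance
# of the feature space
data = torch.randn(100, 3)

cov = torch.cov(data.T)          # torch.cov expects variables as rows
cov_inv = torch.linalg.inv(cov)  # inverse covariance matrix

diff = (x - y).unsqueeze(0)      # shape (1, 3)
# d(x, y) = sqrt((x - y)^T * cov_inv * (x - y))
mahalanobis_distance = torch.sqrt(diff @ cov_inv @ diff.T).squeeze()
```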