Learn about the z-test and the p-value in statistics with detailed examples and Python code, and understand how they apply to Machine Learning and Deep Learning for model evaluation.
What is a P-Value?
The p-value is a probability that measures the strength of the evidence against the null hypothesis. Specifically, it is the probability of observing a test statistic (like the z-score) at least as extreme as the one computed from your sample, assuming that the null hypothesis is true.
A smaller p-value indicates stronger evidence against the null hypothesis. Common thresholds to reject the null hypothesis are:
- p < 0.05: statistically significant
- p < 0.01: highly significant
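As a minimal sketch of this decision rule in code (assuming scipy is available; the z-score of 2.1 is an arbitrary illustrative value, not taken from the examples below):
from scipy import stats
z = 2.1 # hypothetical z-score, used only to illustrate the decision rule
p_value = 2 * stats.norm.sf(abs(z)) # two-tailed p-value from the standard normal
alpha = 0.05 # chosen significance level
if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject the null hypothesis")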
Python Example of Z-Test
Let’s assume we want to test whether the mean of a sample differs from a known population mean:
import numpy as np
from scipy import stats
# Sample data
sample = [2.9, 3.0, 2.5, 3.2, 3.8, 3.5]
mu = 3.0 # Population mean
sigma = 0.5 # Population std deviation
n = len(sample)
x_bar = np.mean(sample)
# Calculate z-score
z = (x_bar - mu) / (sigma / np.sqrt(n))
p_value = 2 * (1 - stats.norm.cdf(abs(z))) # Two-tailed p-value
print("Z-score:", z)
print("P-value:", p_value)
Using Z-Test and P-Value in ML/DL
In Machine Learning (ML) and Deep Learning (DL), z-tests and p-values help validate experimental results, such as whether a new model significantly outperforms a baseline model. Without statistical testing, we might mistake random fluctuations in performance for real improvements.
- Compare two models: Test if the performance difference between two models (e.g., accuracy) is statistically significant.
- A/B testing: Evaluate changes in algorithms, UI components, or features based on user interactions (a two-proportion z-test sketch follows this list).
- Feature selection: Check whether the mean of a feature differs significantly between classes, which may indicate predictive power.
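For the A/B-testing case, a common approach is a two-proportion z-test on conversion rates. The sketch below is only an illustration; the conversion counts and visitor totals for variants A and B are hypothetical.
import numpy as np
from scipy import stats
# Hypothetical A/B test counts (illustrative values, not real data)
conv_a, n_a = 120, 2400 # conversions and visitors for variant A
conv_b, n_b = 150, 2500 # conversions and visitors for variant B
p_a = conv_a / n_a
p_b = conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b) # pooled conversion rate under the null hypothesis
se = np.sqrt(p_pool * (1 - p_pool) * (1/n_a + 1/n_b))
z = (p_a - p_b) / se
p_value = 2 * stats.norm.sf(abs(z)) # two-tailed p-value
print("Z-score:", z)
print("P-value:", p_value)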
Example: Comparing Two Models
Let’s compare the accuracy of two models over multiple runs:
acc_model_a = [0.83, 0.85, 0.82, 0.84, 0.86]
acc_model_b = [0.79, 0.78, 0.80, 0.77, 0.81]
mean_a = np.mean(acc_model_a)
mean_b = np.mean(acc_model_b)
sd_a = np.std(acc_model_a, ddof=1) # Sample std deviation of model A
sd_b = np.std(acc_model_b, ddof=1) # Sample std deviation of model B
n_a, n_b = len(acc_model_a), len(acc_model_b)
# Two-sample z-statistic: difference in means divided by its standard error
z = (mean_a - mean_b) / np.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
p = 2 * (1 - stats.norm.cdf(abs(z))) # Two-tailed p-value
print("Z-Score:", z)
print("P-Value:", p)
Conclusion
The z-test and p-value are essential statistical tools for validating model improvements, experimental hypotheses, and performance evaluations. In ML/DL pipelines especially, applying these tests helps ensure that your decisions are backed by statistical evidence rather than by random fluctuations in performance.