Cosine Similarity vs. Cosine Distance Explained with PyTorch Examples | Applications in Deep Learning
1. What is Cosine Similarity?
Cosine similarity is a metric used to measure the similarity in direction between two vectors, regardless of their magnitude. It is widely used in tasks like text similarity analysis, sentence embedding comparison, and image embedding evaluation. The key idea is that the metric focuses on the angle (or direction) rather than the vector length.
Formula:
cos_sim(A, B) = (A · B) / (||A|| * ||B||)
Here, A and B are input vectors, · denotes the dot product, and ||A|| is the norm (magnitude) of vector A. The cosine similarity value ranges from -1 to 1. A value close to 1 means the vectors are pointing in a similar direction, while a value close to -1 indicates they are pointing in opposite directions.
2. What is Cosine Distance?
Cosine distance is derived from cosine similarity and represents the dissimilarity between vectors. It is defined as follows:
cos_dist(A, B) = 1 - cos_sim(A, B)
The cosine distance ranges from 0 to 2. A value closer to 0 means the two vectors are more similar. This form is often used in clustering and distance-based models.
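A quick sanity check of this range, using a minimal from-scratch helper (the 2-D vectors here are toy examples chosen for illustration):

```python
import math

def cosine_distance(a, b):
    # 1 minus the cosine similarity (assumes neither vector is all zeros).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (na * nb)

print(cosine_distance([1, 0], [2, 0]))   # same direction -> 0.0
print(cosine_distance([1, 0], [0, 1]))   # orthogonal    -> 1.0
print(cosine_distance([1, 0], [-1, 0]))  # opposite      -> 2.0
```

Note that the first case also shows the magnitude-invariance mentioned earlier: [1, 0] and [2, 0] differ in length but have distance 0.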
3. Applications in Deep Learning / Machine Learning
- Natural Language Processing (NLP): Compare similarity between sentence embeddings (e.g., semantic search, FAQ retrieval).
- Computer Vision: Measure similarity between image embeddings (e.g., image retrieval, visual clustering).
- Recommendation Systems: Calculate similarity between users and items to improve personalized recommendations.
- Clustering Algorithms: Use cosine distance as the distance metric between data points or cluster centroids.
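As a concrete sketch of the semantic-search use case above: rank a small corpus by similarity to a query embedding. The FAQ entries and their 3-dimensional "embeddings" here are entirely made up for illustration; in practice the vectors would come from an embedding model.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical embeddings for a tiny FAQ corpus.
faq = {
    "reset password": [0.9, 0.1, 0.0],
    "delete account": [0.2, 0.9, 0.1],
    "billing issue":  [0.1, 0.2, 0.95],
}
query = [0.85, 0.2, 0.05]  # hypothetical embedding of the user's question

# Rank FAQ entries by cosine similarity to the query.
ranked = sorted(faq, key=lambda k: cosine_similarity(query, faq[k]), reverse=True)
print(ranked[0])  # "reset password" points in the closest direction
```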
4. Python Implementation from Scratch
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the two norms
    # (assumes neither vector is all zeros).
    dot_product = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot_product / (norm_a * norm_b)

def cosine_distance(a, b):
    return 1 - cosine_similarity(a, b)
# Example
vec1 = [1, 2, 3]
vec2 = [4, 5, 6]
sim = cosine_similarity(vec1, vec2)
dist = cosine_distance(vec1, vec2)
print(f"Cosine Similarity: {sim:.4f}")
print(f"Cosine Distance: {dist:.4f}")
Output:
Cosine Similarity: 0.9746
Cosine Distance: 0.0254
5. PyTorch-Based Implementation Example
import torch
import torch.nn.functional as F

def cosine_similarity_torch(x, y):
    # L2-normalize along the last dimension (F.normalize adds a small eps
    # to guard against zero vectors), then take the dot product.
    x = F.normalize(x, dim=-1)
    y = F.normalize(y, dim=-1)
    return torch.sum(x * y, dim=-1)

def cosine_distance_torch(x, y):
    return 1 - cosine_similarity_torch(x, y)
# Example
vec1 = torch.tensor([1.0, 2.0, 3.0])
vec2 = torch.tensor([4.0, 5.0, 6.0])
sim = cosine_similarity_torch(vec1, vec2)
dist = cosine_distance_torch(vec1, vec2)
print("PyTorch Cosine Similarity:", sim.item())
print("PyTorch Cosine Distance:", dist.item())
Output:
PyTorch Cosine Similarity: 0.9746317863464355
PyTorch Cosine Distance: 0.025368213653564453
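PyTorch also provides this as a built-in. As a cross-check, torch.nn.functional.cosine_similarity should agree with the hand-rolled version above (its eps argument, default 1e-8, guards against division by zero):

```python
import torch
import torch.nn.functional as F

vec1 = torch.tensor([1.0, 2.0, 3.0])
vec2 = torch.tensor([4.0, 5.0, 6.0])

# Built-in: normalizes internally and computes the dot product along dim.
sim = F.cosine_similarity(vec1, vec2, dim=-1)
print("Built-in Cosine Similarity:", sim.item())  # ~0.9746
```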
6. Summary
- Cosine Similarity measures how similar two vectors are in terms of direction and ranges from -1 to 1.
- Cosine Distance is derived by subtracting cosine similarity from 1, making it suitable for use in distance-based algorithms.
- Both metrics can be implemented easily using libraries such as PyTorch, NumPy, or Scikit-learn.
- Widely used in sentence embedding comparison, image search, clustering, and recommendation systems in deep learning and machine learning workflows.
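For instance, the same computation can be sketched with NumPy primitives (reusing the toy vectors from the examples above):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Same formula: dot product divided by the product of the L2 norms.
sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
dist = 1 - sim
print(f"NumPy Cosine Similarity: {sim:.4f}")  # 0.9746
print(f"NumPy Cosine Distance: {dist:.4f}")   # 0.0254
```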