## Leptonica: Cosine Similarity for Pix comparison

Lately I have been researching various methods of image comparison and classification. One simple method often used for comparing documents is cosine similarity: the dot product of two vectors divided by the product of their Euclidean norms. Wikipedia has more information on the formal definition. Here each image is treated as a vector of pixel values, and the images are assumed to be binary (1 bpp), so every component is either 0 or 1.
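Written out for two images flattened into vectors $A$ and $B$ of $n$ pixels each, the standard definition is:

$$
\cos(A, B) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}
$$

For binary pixels $A_i^2 = A_i$, so the numerator counts the pixels that are ON in both images and each squared norm counts the ON pixels in one image. Treating each image as the set of its ON pixels, with $|A|$ the number of ON pixels in $A$:

$$
\cos(A, B) = \frac{|A \cap B|}{\sqrt{|A|\,|B|}}
$$

Because all components are nonnegative, the result lies in $[0, 1]$. For example, if $A$ has 4 ON pixels, $B$ has 9, and 3 are ON in both, the similarity is $3 / \sqrt{4 \cdot 9} = 3/6 = 0.5$.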

```cpp
#include <leptonica/allheaders.h>
#include <cmath>

// Stores the smaller of the two widths and the smaller of the two heights.
void getSmallestDimensions(Pix* pixA, Pix* pixB, int &width, int &height) {
    l_int32 w1 = 0, h1 = 0;
    l_int32 w2 = 0, h2 = 0;
    pixGetDimensions(pixA, &w1, &h1, NULL);
    pixGetDimensions(pixB, &w2, &h2, NULL);
    width = (w1 < w2) ? w1 : w2;
    height = (h1 < h2) ? h1 : h2;
}

// Cosine similarity of two 1 bpp images over their overlapping top-left region.
// Returns 0.0 if either region has no ON pixels, and -1.0 (not a valid
// similarity for binary images) if an input is not 1 bpp.
double cosineSimilarity(Pix* pixA, Pix* pixB) {
    if (pixGetDepth(pixA) != 1 || pixGetDepth(pixB) != 1) {
        return -1.0;
    }
    int width = 0;
    int height = 0;
    getSmallestDimensions(pixA, pixB, width, height);

    // Leptonica packs 1 bpp rows 32 pixels per l_uint32 word, MSB first,
    // so pixels are read with GET_DATA_BIT rather than by byte index.
    l_uint32** linePtrs_A = (l_uint32**) pixGetLinePtrs(pixA, NULL);
    l_uint32** linePtrs_B = (l_uint32**) pixGetLinePtrs(pixB, NULL);
    if (!linePtrs_A || !linePtrs_B) {
        LEPT_FREE(linePtrs_A);
        LEPT_FREE(linePtrs_B);
        return -1.0;
    }

    double numerator = 0.0;      // sum of A * B
    double denominator_A = 0.0;  // sum of A^2
    double denominator_B = 0.0;  // sum of B^2
    // One pass accumulates the dot product and both squared norms.
    for (int i = 0; i < height; ++i) {
        for (int k = 0; k < width; ++k) {
            int val_A = GET_DATA_BIT(linePtrs_A[i], k);
            int val_B = GET_DATA_BIT(linePtrs_B[i], k);
            numerator += val_A * val_B;
            denominator_A += val_A * val_A;
            denominator_B += val_B * val_B;
        }
    }
    // The arrays returned by pixGetLinePtrs are owned by the caller.
    LEPT_FREE(linePtrs_A);
    LEPT_FREE(linePtrs_B);

    double denominator = std::sqrt(denominator_A) * std::sqrt(denominator_B);
    if (denominator == 0.0) {
        return 0.0;  // at least one region is entirely OFF
    }
    return numerator / denominator;
}
```
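Because the Leptonica version depends on the library's packed 1 bpp layout, it can be useful to check it against a plain reference. The following is a minimal sketch, not part of Leptonica or the original code: the name `cosineSimilarityRef` and the `BinaryImage` type are hypothetical, and each image is assumed to be stored unpacked as rows of 0/1 `int` values. Like `getSmallestDimensions`, it compares only the overlapping top-left region.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical unpacked binary image: a vector of rows, each pixel 0 (OFF) or 1 (ON).
using BinaryImage = std::vector<std::vector<int>>;

// Reference cosine similarity over the overlapping top-left region.
// Returns 0.0 when either region has no ON pixels (zero norm).
double cosineSimilarityRef(const BinaryImage& a, const BinaryImage& b) {
    std::size_t height = std::min(a.size(), b.size());
    double dot = 0.0;     // sum of A * B
    double normA2 = 0.0;  // sum of A^2
    double normB2 = 0.0;  // sum of B^2
    for (std::size_t i = 0; i < height; ++i) {
        // For rectangular images this is the smaller of the two widths.
        std::size_t width = std::min(a[i].size(), b[i].size());
        for (std::size_t k = 0; k < width; ++k) {
            int va = a[i][k] & 0x1;
            int vb = b[i][k] & 0x1;
            dot += va * vb;
            normA2 += va * va;
            normB2 += vb * vb;
        }
    }
    double denominator = std::sqrt(normA2) * std::sqrt(normB2);
    return denominator == 0.0 ? 0.0 : dot / denominator;
}
```

For instance, with the 3-row images `{{1,1,1,1},{0,0,0,0},{0,0,0,0}}` and `{{1,1,1,0},{1,1,1,1},{1,1,0,0}}` (4 and 9 ON pixels, 3 of them shared), `cosineSimilarityRef` returns 0.5, which is 3 / sqrt(4 · 9). Returning 0.0 for an all-OFF region is a design choice that avoids dividing by zero.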