Project 1

Project Description

The goal of this project is to reproduce a color image by aligning its separate B, G, and R color channels. The core of the project is trying different metrics for scoring alignments and implementing techniques that make the alignment search more efficient.

Naive Procedure

For my naive approach, I found that a metric called mutual information (MI) worked best. In image processing, MI is used to align two images by measuring how much information they share. It is computed from the joint probability distribution of pixel intensities in the two images, typically estimated with a joint histogram. Because MI depends on the statistical relationship between intensities rather than on direct pixel-to-pixel differences, it is robust for aligning images from different modalities, such as grayscale and infrared. The two images are considered aligned at the displacement that maximizes their mutual information. This makes MI well suited to tasks like medical imaging and multi-modal image registration.
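As a rough illustration of the metric described above, here is a minimal sketch of a histogram-based MI estimate. The function name `mutual_information` and the choice of 32 bins are assumptions made for this example, not necessarily what was used to produce the results in this report.

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Estimate mutual information (in nats) between two equally shaped images."""
    # Joint histogram of paired pixel intensities, normalized to a joint probability.
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal distribution of image a
    py = pxy.sum(axis=0, keepdims=True)  # marginal distribution of image b
    nz = pxy > 0                         # skip empty cells to avoid log(0)
    # MI = sum over cells of p(x, y) * log(p(x, y) / (p(x) * p(y))).
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))
```

A higher value means the intensities of one image are more predictable from the other, so an aligned pair should score higher than a misaligned one.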

I searched over displacements in the range [-15, 15] pixels in both the X and Y directions for the green and red channels. For each displacement, I calculated the mutual information score against the blue channel, and I chose the displacement with the highest score.
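The exhaustive search described above can be sketched as follows. This is an illustration under stated assumptions: `best_displacement` and `neg_ssd` are hypothetical names, the (dx, dy) ordering is assumed, shifting is done with `np.roll` (which wraps pixels around the edges), and a negative sum of squared differences stands in for the mutual information score so the block stays short. Any score where higher is better can be passed in instead.

```python
import numpy as np

def neg_ssd(a, b):
    """Stand-in score (higher is better): negative sum of squared differences.
    Assumes float arrays; unsigned integer images should be converted first."""
    return -float(np.sum((a - b) ** 2))

def best_displacement(channel, reference, score=neg_ssd, radius=15):
    """Try every (dx, dy) in [-radius, radius]^2 and return the best-scoring one."""
    best, best_score = (0, 0), -np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # np.roll wraps pixels around the edges, so wrapped rows and
            # columns contribute to the score unless they are cropped away.
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            s = score(shifted, reference)
            if s > best_score:
                best, best_score = (dx, dy), s
    return best
```

With a radius of 15, this evaluates 31 x 31 = 961 candidate displacements per channel, which is manageable for small images but grows quickly with image size and search range.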

Cathedral: Green Offset: (2, 5); Red Offset: (3, 12)

Monastery: Green Offset: (2, -3); Red Offset: (2, 3)

Tobolsk: Green Offset: (2, 3); Red Offset: (3, 6)

Optimized Procedure - Pyramids

To handle much larger images, where an exhaustive search over displacements would be far too slow, I implemented an image pyramid. I iterate over versions of the image downscaled by factors of 8, 4, 2, and 1. Starting at the coarsest level (factor 8), I search a fixed set of displacements, keep the one with the best alignment score, and move on to the next level. This is faster because the estimate from the previous level, scaled up to the new resolution, already gives an approximate location of the best displacement, so each subsequent level only needs a small search range around it. Larger search ranges are affordable on the more heavily downscaled images because the alignment metric is cheaper to compute at lower resolution.
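The coarse-to-fine procedure above can be sketched as follows. Everything here is an assumption for illustration rather than the code behind the reported results: the names `downscale`, `score`, and `pyramid_align` are hypothetical, the search radii (4 at the coarsest level, 2 afterwards) are arbitrary, downscaling is done by block averaging, shifting uses `np.roll` (which wraps around the edges), and a negative sum of squared differences stands in for the real metric.

```python
import numpy as np

def downscale(img, factor):
    """Shrink by an integer factor by averaging factor x factor blocks."""
    h = (img.shape[0] // factor) * factor
    w = (img.shape[1] // factor) * factor
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def score(a, b):
    """Stand-in score (higher is better): negative sum of squared differences."""
    return -float(np.sum((a - b) ** 2))

def pyramid_align(channel, reference, factors=(8, 4, 2, 1),
                  coarse_radius=4, fine_radius=2):
    """Return (dx, dy) at full resolution, refined from coarse to fine.
    Assumes each factor divides the previous one."""
    dx = dy = 0
    prev = None
    for factor in factors:
        if prev is not None:
            # Rescale the previous estimate to this level's resolution.
            scale = prev // factor
            dx, dy = dx * scale, dy * scale
        radius = coarse_radius if prev is None else fine_radius
        c, r = downscale(channel, factor), downscale(reference, factor)
        best, best_score = (dx, dy), -np.inf
        # Search only a small window around the current estimate.
        for ddy in range(-radius, radius + 1):
            for ddx in range(-radius, radius + 1):
                cand = (dx + ddx, dy + ddy)
                s = score(np.roll(c, (cand[1], cand[0]), axis=(0, 1)), r)
                if s > best_score:
                    best, best_score = cand, s
        dx, dy = best
        prev = factor
    return dx, dy
```

With these radii the search evaluates 81 candidates at the coarsest level and 25 at each of the three finer levels, yet it can reach full-resolution displacements of several dozen pixels, since each coarse pixel at factor 8 corresponds to 8 full-resolution pixels.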

The metric I used for this case is structural similarity. The Structural Similarity Index (SSIM) is a powerful metric for image alignment because it compares the perceived structure of two images (their luminance, contrast, and local correlation) rather than just pixel-wise differences. It is widely used in applications such as image registration, compression, and quality assessment. Computing the SSIM score is significantly faster than computing mutual information, and it also gave more accurate alignments than simpler metrics such as normalized cross-correlation (NCC) and Euclidean distance.
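To make the metric concrete, here is a minimal sketch of SSIM computed over the whole image as a single window. This is a simplification: the standard formulation averages the same expression over small local windows, which is what `skimage.metrics.structural_similarity` in scikit-image does. The function name `global_ssim` and the `data_range` default of 1.0 (images scaled to [0, 1]) are assumptions for this example.

```python
import numpy as np

def global_ssim(a, b, data_range=1.0):
    """SSIM treating the whole image as one window (simplified, not windowed)."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    # Standard stabilizing constants from the SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return float(numerator / denominator)
```

The score is 1 for identical images and drops as the structure of the two images diverges, so the best displacement is the one with the highest score.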

Another important point is that the image borders can distort the alignment scores. By computing the metric only on the interior of each image, excluding a 20-pixel margin along every border, I was able to get more accurate displacements.
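One way to apply this is to crop both images before scoring them. The helper below is a small sketch; the name `crop_interior` is hypothetical, and it assumes the same margin on all four sides.

```python
import numpy as np

def crop_interior(img, margin=20):
    """Return the image with `margin` pixels removed from every side."""
    h, w = img.shape[:2]
    if 2 * margin >= min(h, w):
        raise ValueError("margin is too large for this image")
    # Explicit end indices so that margin=0 returns the full image.
    return img[margin:h - margin, margin:w - margin]
```

The crop would be applied to both the shifted channel and the reference channel just before the metric is evaluated, so that border artifacts do not influence the score.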

Church: Green Offset: (26, 4); Red Offset: (60, -4)