Image Comparison

Comparing two images separated by a small distance (stereoscopic vision) or a small time (detection of movement) gives rich information about movement and depth.

Image comparison may also be used for recognition.

To compare two images we need to know which point on one image corresponds to which point on the other.

Purpose
The purpose of this page is to demonstrate that creating a vision system is theoretically possible.

This page summarises the basic theory for creating a vision system.

Image Mapping Vectors
The array of mapping vectors describes the relationship between the two images. A source point and the point it is mapped to should be similar in colour, and each vector should be similar to its adjacent vectors.

The whole problem may be considered one of data compression. Inductive inference tells us that the smallest (simplest) description is the most likely to be correct.

So if the source point and the destination point are similar in colour then little information is needed to describe the colour of one image in terms of the other.

Colour Distance
A function is needed to measure the difference between one colour and another. Colour distance is described at,


 * Color Distance

However, for the purposes of image comparison, image brightness is largely determined by the angle and amount of incident light, so we should give less weight to lightness.

A simple model is,


 * $$d(a, b) = \sqrt{ (\frac{a_l-b_l}{w_l})^2 + (\frac{a_s-b_s}{w_s})^2 + (\frac{a_h*a_s-b_h*b_s}{w_h})^2 } $$

Normalisation of colour over the whole picture may be useful in some circumstances.
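As a sketch, the colour distance model above might be implemented as follows; the weight values here are illustrative assumptions, chosen so that lightness contributes least.

```python
import math

def colour_distance(a, b, w_l=4.0, w_s=1.0, w_h=1.0):
    """Weighted distance between colours a and b, each given as a
    (lightness, saturation, hue) tuple. Dividing by a larger weight
    reduces a component's influence, so the large w_l de-emphasises
    lightness. The weight values are illustrative assumptions."""
    a_l, a_s, a_h = a
    b_l, b_s, b_h = b
    return math.sqrt(((a_l - b_l) / w_l) ** 2
                     + ((a_s - b_s) / w_s) ** 2
                     + ((a_h * a_s - b_h * b_s) / w_h) ** 2)
```

With these weights a lightness difference costs considerably less than an equal difference in saturation.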

Vector Distance
In compressing the vector data we can consider the vectors surrounding a given vector. We can use their average as a predictor for the vector. The distance of the vector from the average of its neighbours is the vector distance.
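A minimal sketch of this smoothness measure, assuming a dense field of 2-D mapping vectors stored in a NumPy array:

```python
import numpy as np

def vector_distance(field, x, y):
    """Distance of the mapping vector at (x, y) from the average of its
    4-connected neighbours, used as a smoothness (prediction) cost.
    field has shape (height, width, 2)."""
    neighbours = [field[y - 1, x], field[y + 1, x],
                  field[y, x - 1], field[y, x + 1]]
    predicted = np.mean(neighbours, axis=0)
    return float(np.linalg.norm(field[y, x] - predicted))
```

A uniform field has zero vector distance everywhere, matching the intuition that a constant mapping is cheap to describe.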

Mapped colour
We choose a source point, then map it into the destination image using the mapping vector. In general this doesn't give an exact pixel address. Instead the colour at that point must be interpolated from the surrounding pixels.
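Bilinear interpolation is one standard way to do this; a minimal sketch for a single-channel image:

```python
import numpy as np

def sample_bilinear(image, x, y):
    """Interpolate the value at a non-integer position (x, y) from the
    four surrounding pixels (bilinear interpolation)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0, x0] + fx * image[y0, x0 + 1]
    bottom = (1 - fx) * image[y0 + 1, x0] + fx * image[y0 + 1, x0 + 1]
    return (1 - fy) * top + fy * bottom
```

For a multi-channel image the same interpolation is applied per channel.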

Measure of Total Information
We sum the squares of the weighted colour and vector distances over all points in the source image to give a total information distance. This distance is related to the information content of the representation. We need to choose the mapping vectors to minimise the information content, which is achieved by minimising the total image distance.
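Assuming the per-point colour and vector distances have already been computed, the total might be sketched as follows (the weight parameters are illustrative assumptions):

```python
import numpy as np

def total_distance(colour_dists, vector_dists, w_colour=1.0, w_vector=1.0):
    """Total information distance: weighted sum of squared colour and
    vector (smoothness) distances over all source points. The mapping
    that minimises this total is taken as the most likely one."""
    colour_dists = np.asarray(colour_dists, dtype=float)
    vector_dists = np.asarray(vector_dists, dtype=float)
    return float(np.sum((w_colour * colour_dists) ** 2)
                 + np.sum((w_vector * vector_dists) ** 2))
```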

Complex Numbers
In describing the effects of rotation it is useful to use complex numbers to represent image position.


 * $$d = r * (v + b) + a - v\!$$

gives


 * $$d_x = r_x (v_x + b_x) - r_y (v_y + b_y) + a_x - v_x \!$$
 * $$d_y = r_x (v_y + b_y) + r_y (v_x + b_x) + a_y - v_y \!$$
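The component expansion can be checked against ordinary complex arithmetic for arbitrary sample values:

```python
# Check the component expansion of d = r*(v + b) + a - v against
# Python's built-in complex arithmetic; the sample values are arbitrary.
r, v, b, a = 1.1 + 0.2j, 0.3 - 0.4j, 0.5 + 0.6j, -0.7 + 0.8j
d = r * (v + b) + a - v

d_x = r.real * (v.real + b.real) - r.imag * (v.imag + b.imag) + a.real - v.real
d_y = r.real * (v.imag + b.imag) + r.imag * (v.real + b.real) + a.imag - v.imag

assert abs(d.real - d_x) < 1e-12 and abs(d.imag - d_y) < 1e-12
```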

Now $$r$$ represents rotation and scale,


 * $$r = h * (\cos(\theta) + i \sin(\theta)) \!$$

For small $$\theta\!$$,
 * $$\cos(\theta) \approx 1 \!$$
 * $$\sin(\theta) \approx \theta = \delta \!$$

Taking partial derivatives gives,


 * $$\frac{\Delta d_x}{\Delta v_x} = h - 1 \!$$
 * $$\frac{\Delta d_x}{\Delta v_y} = -\delta * h \!$$
 * $$\frac{\Delta d_y}{\Delta v_x} = \delta * h \!$$
 * $$\frac{\Delta d_y}{\Delta v_y} = h - 1 \!$$
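These approximations can be checked numerically. Since $$d$$ is linear in $$v$$, a finite difference recovers the exact derivatives, which should be close to the small-angle formulas:

```python
import math

# Check the small-angle derivative approximations for
# d = r*(v + b) + a - v with r = h*(cos(delta) + i*sin(delta)).
h, delta = 1.05, 0.01
r = h * complex(math.cos(delta), math.sin(delta))
b, a = 0.2 + 0.3j, 0.1 - 0.1j

def d(v):
    return r * (v + b) + a - v

eps = 1e-6
v0 = 0.4 + 0.5j
dd_dvx = (d(v0 + eps) - d(v0)) / eps           # vary v_x
dd_dvy = (d(v0 + eps * 1j) - d(v0)) / eps      # vary v_y

assert abs(dd_dvx.real - (h - 1)) < 1e-3       # dd_x/dv_x ~ h - 1
assert abs(dd_dvy.real - (-delta * h)) < 1e-3  # dd_x/dv_y ~ -delta*h
assert abs(dd_dvx.imag - (delta * h)) < 1e-3   # dd_y/dv_x ~ delta*h
assert abs(dd_dvy.imag - (h - 1)) < 1e-3       # dd_y/dv_y ~ h - 1
```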

Rotation Matrix
I will borrow the formula for a rotation of angle $$\theta\!$$ around an axis $$u\!$$.


 * Rotation Matrix


 * $$\begin{align} Q_{\bold{u}}(\theta) &{}= \begin{bmatrix} u_x^2 (1-c_{\theta}) + c_{\theta} & u_x u_y (1-c_{\theta}) - u_z s_{\theta} & u_x u_z (1-c_{\theta}) + u_y s_{\theta} \\ u_x u_y (1-c_{\theta}) + u_z s_{\theta} & u_y^2 (1-c_{\theta}) + c_{\theta} & u_y u_z (1-c_{\theta}) - u_x s_{\theta} \\ u_x u_z (1-c_{\theta}) - u_y s_{\theta} & u_y u_z (1-c_{\theta}) + u_x s_{\theta} & u_z^2 (1-c_{\theta}) + c_{\theta} \end{bmatrix} \end{align}$$

Now take a three-dimensional point $$p\!$$. The position of the point after scaling and rotating is,


 * $$P = Q_{\bold{u}}(\theta) * (p + b) + a\!$$

and after expanding (using Matrix Multiplication),


 * $$P_x = (u_x^2 (1-c_{\theta}) + c_{\theta}) * (p_x + b_x) + (u_x u_y (1-c_{\theta}) - u_z s_{\theta}) * (p_y + b_y) + (u_x u_z (1-c_{\theta}) + u_y s_{\theta}) * (p_z + b_z) + a_x\!$$
 * $$P_y = (u_x u_y (1-c_{\theta}) + u_z s_{\theta}) * (p_x + b_x) + (u_y^2 (1-c_{\theta}) + c_{\theta}) * (p_y + b_y) + (u_y u_z (1-c_{\theta}) - u_x s_{\theta}) * (p_z + b_z) + a_y\!$$
 * $$P_z = (u_x u_z (1-c_{\theta}) - u_y s_{\theta}) * (p_x + b_x) + (u_y u_z (1-c_{\theta}) + u_x s_{\theta}) * (p_y + b_y) + (u_z^2 (1-c_{\theta}) + c_{\theta}) * (p_z + b_z) + a_z\!$$

For a small angle $$\theta$$ use the approximation,


 * $$c_{\theta} = \cos \theta \approx 1 - \frac{\theta^2}{2} \approx 1 \!$$
 * $$s_{\theta} = \sin \theta \approx \theta \!$$

this gives,


 * $$P_x = -u_z \theta (p_y + b_y) + u_y \theta (p_z + b_z) + p_x + b_x + a_x\!$$
 * $$P_y = u_z \theta (p_x + b_x) - u_x \theta (p_z + b_z) + p_y + b_y + a_y\!$$
 * $$P_z = - u_y \theta (p_x + b_x) + u_x \theta (p_y + b_y) + p_z + b_z + a_z\!$$
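A numerical check that the small-angle formulas agree with the full rotation matrix, for a small $$\theta$$ (the sample values are arbitrary):

```python
import numpy as np

def rotation_matrix(u, theta):
    """Rotation matrix for angle theta about unit axis u (Rodrigues form)."""
    c, s = np.cos(theta), np.sin(theta)
    ux, uy, uz = u
    return np.array([
        [ux*ux*(1-c) + c,    ux*uy*(1-c) - uz*s, ux*uz*(1-c) + uy*s],
        [ux*uy*(1-c) + uz*s, uy*uy*(1-c) + c,    uy*uz*(1-c) - ux*s],
        [ux*uz*(1-c) - uy*s, uy*uz*(1-c) + ux*s, uz*uz*(1-c) + c],
    ])

# Compare the exact transform P = Q(p + b) + a with the small-angle
# approximation, for a small theta.
u = np.array([0.0, 0.0, 1.0])   # rotate about z
theta = 0.01
p = np.array([1.0, 2.0, 3.0])
b = np.array([0.1, 0.2, 0.3])
a = np.array([0.5, 0.0, 0.0])

P_exact = rotation_matrix(u, theta) @ (p + b) + a
q = p + b
P_approx = np.array([
    -u[2]*theta*q[1] + u[1]*theta*q[2] + q[0] + a[0],
     u[2]*theta*q[0] - u[0]*theta*q[2] + q[1] + a[1],
    -u[1]*theta*q[0] + u[0]*theta*q[1] + q[2] + a[2],
])
assert np.allclose(P_exact, P_approx, atol=1e-3)
```

The error is of order $$\theta^2\!$$, so it shrinks quickly as the angle gets smaller.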

For stereoscopic vision there is no rotation, just a displacement in the x direction by the eye separation (taking $$b = 0\!$$ and $$a_y = a_z = 0\!$$),


 * $$P_x = p_x + a_x\!$$
 * $$P_y = p_y\!$$
 * $$P_z = p_z\!$$

Projection onto View
A simplified formula (a pinhole camera with unit focal length) for the projection of a point $$p$$ onto the image is,


 * $$v_x = \frac{p_x}{p_z}$$ or $$v_x v_z = p_x \!$$
 * $$v_y = \frac{p_y}{p_z}$$ or $$v_y v_z= p_y \!$$
 * $$v_z = p_z \!$$

and,


 * $$V_x = \frac{P_x}{P_z}$$ or $$V_x P_z = P_x \!$$
 * $$V_y = \frac{P_y}{P_z}$$ or $$V_y P_z= P_y \!$$

expanding this out to remove all reference to $$p$$ gives,


 * $$V_x = \frac{-u_z \theta (v_y v_z + b_y) + u_y \theta (v_z + b_z) + v_x v_z + b_x + a_x}{-u_y \theta (v_x v_z + b_x) + u_x \theta (v_y v_z + b_y) + v_z + b_z + a_z}$$
 * $$V_y = \frac{u_z \theta (v_x v_z + b_x) - u_x \theta (v_z + b_z) + v_y v_z + b_y + a_y}{-u_y \theta (v_x v_z + b_x) + u_x \theta (v_y v_z + b_y) + v_z + b_z + a_z}$$

Rotate around Z
For a rotation about the z axis ($$u_x = u_y = 0, u_z = 1\!$$), taking $$b_z = a_z = 0\!$$,

 * $$V_x = \frac{- \theta (v_y v_z + b_y) + v_x v_z + b_x + a_x}{v_z}$$
 * $$V_y = \frac{\theta (v_x v_z + b_x) + v_y v_z + b_y + a_y}{v_z}$$

Solving for $$v_z$$
Solving each of the projection equations for $$v_z\!$$ gives,

 * $$v_z = \frac{u_y \theta b_x V_x - u_x \theta b_y V_x - b_z V_x - a_z V_x -u_z \theta b_y + u_y \theta b_z + b_x + a_x}{-u_y \theta v_x V_x + u_x \theta v_y V_x + V_x + u_z \theta v_y - u_y \theta - v_x}$$
 * $$v_z = \frac{u_y \theta b_x V_y - u_x \theta b_y V_y - b_z V_y - a_z V_y + u_z \theta b_x - u_x \theta b_z + b_y + a_y}{-u_y \theta v_x V_y + u_x \theta v_y V_y + V_y - u_z \theta v_x + u_x \theta -v_y}$$
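As a check of the algebra, we can compute $$V_x$$ from the projected mapping formula and then recover $$v_z$$ with the closed-form solution (the sample values are arbitrary; the axis is unit length):

```python
# Forward: compute V_x from the projected mapping formula, then
# recover v_z with the closed-form solution and compare.
ux, uy, uz = 0.36, 0.48, 0.8     # unit axis
theta = 0.01
bx, by, bz = 0.1, 0.2, 0.3
ax, ay, az = 0.5, 0.0, 0.1
vx, vy, vz = 0.2, -0.3, 4.0

num = -uz*theta*(vy*vz + by) + uy*theta*(vz + bz) + vx*vz + bx + ax
den = -uy*theta*(vx*vz + bx) + ux*theta*(vy*vz + by) + vz + bz + az
Vx = num / den

vz_rec = ((uy*theta*bx*Vx - ux*theta*by*Vx - bz*Vx - az*Vx
           - uz*theta*by + uy*theta*bz + bx + ax)
          / (-uy*theta*vx*Vx + ux*theta*vy*Vx + Vx
             + uz*theta*vy - uy*theta - vx))
assert abs(vz_rec - vz) < 1e-9
```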

Stereoscopic Imaging
Displacing the two images by a distance in the x direction allows depth information to be derived, giving depth perception. Rotation about the x and y axes should not occur between a stereoscopic pair of images.

For stereoscopic vision,


 * $$V_x v_z = v_x v_z + a_x \!$$
 * $$V_y = v_y \!$$


 * $$v_z = \frac{a_x}{V_x - v_x} \!$$
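This is the familiar depth-from-disparity relation; a minimal sketch:

```python
def depth_from_disparity(Vx, vx, eye_separation):
    """Depth v_z from the horizontal disparity between the two views:
    v_z = a_x / (V_x - v_x), where a_x is the eye separation."""
    return eye_separation / (Vx - vx)

# A nearer point produces a larger disparity, hence a smaller depth.
near = depth_from_disparity(0.30, 0.2, 0.1)
far = depth_from_disparity(0.25, 0.2, 0.1)
assert near < far
```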

Recognition
Image comparison may be used for recognition. A stored image compared with a target image would give an array of mapping vectors.

These mapping vectors may then be analysed for,
 * Position
 * Rotation

Object Model
From this model, surfaces corresponding to objects may be identified and recognised, leading to an object view of the world.

Conclusion
A numerical solution is needed to create the mapping vectors and to interpret the vectors to construct an object model.

There is a huge amount of processing required. Using multiple conventional processors it should be possible to do this in real time.

A massively parallel architecture, such as a neural network, is better suited to this task.

But in theory it appears possible to construct a vision system that can perceive the world as objects. The hardest part is not the theory; it is more of an engineering problem.

Links

 * Intelligence and Reasoning