Image convolutions and transformations

Linear filters (convolutions)

$$\text{dst}(x,y) = \sum_{\substack{0 \leq x' < c \\ 0 \leq y' < r}} K(x',y') \cdot \text{src}(x + x' - x_a,\ y + y' - y_a)$$

where $K$ is the kernel, $c$ is the number of columns in the kernel, $r$ is the number of rows in the kernel, $x_a$ is the $x$ position of the anchor (in the kernel), and $y_a$ is the $y$ position of the anchor.

| [[./images/lena-blur10x10.jpg]] | [[./images/lena-blur20x20.jpg]] | [[./images/lena-blur20x1.jpg]] |
| Kernel is 10x10, all values equal to 0.01 | Kernel is 20x20, all values equal to 0.0025 | Kernel is 20x1, all values equal to 0.05 |
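
OpenCV's =filter2D= computes exactly the correlation formula above, with the anchor defaulting to the kernel center. A minimal sketch in Python producing the 10x10 box blur (image path and output name are assumptions):

#+BEGIN_SRC python
import cv2
import numpy as np

src = cv2.imread("images/lena.jpg")

# 10x10 kernel, all values 0.01; the anchor defaults to the kernel center.
kernel = np.full((10, 10), 0.01, dtype=np.float32)

# ddepth=-1 keeps the output depth the same as the input.
dst = cv2.filter2D(src, -1, kernel)
cv2.imwrite("lena-blur10x10.jpg", dst)
#+END_SRC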

We can also achieve a sharpening effect.

| [[./images/sharpen-convolution.png]] | [[./images/lena-sharpen.jpg]] |
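
A common sharpening kernel boosts the center pixel and subtracts its neighbors; the kernel pictured above may use different values, but a sketch of the idea is:

#+BEGIN_SRC python
import cv2
import numpy as np

src = cv2.imread("images/lena.jpg")

# Center-weighted kernel minus the 4-neighbors; weights sum to 1,
# so overall brightness is preserved while edges are exaggerated.
kernel = np.array([[ 0, -1,  0],
                   [-1,  5, -1],
                   [ 0, -1,  0]], dtype=np.float32)
dst = cv2.filter2D(src, -1, kernel)
#+END_SRC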

[[./images/3D_Convolution_Animation.gif]]

From Wikipedia

And we can achieve an edge effect.

| [[./images/edge-convolution.png]] | [[./images/lena-edges.jpg]] |
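
A standard choice is a Laplacian-style kernel whose weights sum to zero, so flat regions map to zero and intensity changes stand out (again, the pictured kernel's exact values may differ):

#+BEGIN_SRC python
import cv2
import numpy as np

src = cv2.imread("images/lena.jpg", cv2.IMREAD_GRAYSCALE)

# Weights sum to zero: flat regions go to zero, edges remain.
# (With ddepth=-1 on an 8-bit image, negative responses clip to 0.)
kernel = np.array([[0,  1, 0],
                   [1, -4, 1],
                   [0,  1, 0]], dtype=np.float32)
dst = cv2.filter2D(src, -1, kernel)
#+END_SRC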

Erosion

$$\text{dst}(x,y) = \min_{(x',y')\,:\,K(x',y') \neq 0} \text{src}(x+x', y+y')$$

| [[./images/lena-erode4x4.jpg]] | [[./images/lena-binary.jpg]] | [[./images/lena-binary-erode4x4.jpg]] |
| Erosion with a 4x4 kernel of 1's | Lena in binary | Binary image eroded by 4x4 kernel of 1's |
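
In OpenCV this is =erode=; a sketch (the kernel size and image path are assumptions):

#+BEGIN_SRC python
import cv2
import numpy as np

src = cv2.imread("images/lena.jpg", cv2.IMREAD_GRAYSCALE)

# 4x4 kernel of 1's: each pixel becomes the minimum over its 4x4 neighborhood.
kernel = np.ones((4, 4), dtype=np.uint8)
eroded = cv2.erode(src, kernel)
#+END_SRC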

Dilation

$$\text{dst}(x,y) = \max_{(x',y')\,:\,K(x',y') \neq 0} \text{src}(x+x', y+y')$$

| [[./images/lena-dilate4x4.jpg]] | [[./images/lena-binary.jpg]] | [[./images/lena-binary-dilate4x4.jpg]] |
| Dilation with a 4x4 kernel of 1's | Lena in binary | Binary image dilated by 4x4 kernel of 1's |
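
Dilation is the dual operation, =dilate=, with the same kind of kernel:

#+BEGIN_SRC python
import cv2
import numpy as np

src = cv2.imread("images/lena.jpg", cv2.IMREAD_GRAYSCALE)

# 4x4 kernel of 1's: each pixel becomes the maximum over its 4x4 neighborhood.
kernel = np.ones((4, 4), dtype=np.uint8)
dilated = cv2.dilate(src, kernel)
#+END_SRC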

Remapping

$$\text{dst}(x,y) = \text{src}(\text{map}_x(x,y), \text{map}_y(x,y))$$
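
The maps give, for each destination pixel, where to sample in the source. As a sketch of OpenCV's =remap=, the following maps flip an image horizontally (the flip is just an illustrative choice):

#+BEGIN_SRC python
import cv2
import numpy as np

src = cv2.imread("images/lena.jpg")
h, w = src.shape[:2]

# map_x(x,y) = w-1-x and map_y(x,y) = y flips the image horizontally.
map_x, map_y = np.meshgrid(np.arange(w, dtype=np.float32),
                           np.arange(h, dtype=np.float32))
map_x = (w - 1) - map_x

dst = cv2.remap(src, map_x, map_y, cv2.INTER_LINEAR)
#+END_SRC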

Affine transformations

An affine transformation is any transformation that can be specified as a linear transformation (matrix multiplication) plus a translation (vector addition).

Affine transformations include: rotations, translations, scaling, and combinations of these.

\begin{equation} A = \begin{bmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{bmatrix} \quad B = \begin{bmatrix} b_{00} \\ b_{10} \end{bmatrix} \end{equation}

\begin{equation} M = [A;B] = \begin{bmatrix} a_{00} & a_{01} & b_{00} \\ a_{10} & a_{11} & b_{10} \end{bmatrix} \end{equation}

Then we take a point, $[x, y]^T$, add an extra $z$ dimension set to $1$ (“homogeneous coordinates”), and multiply:

\begin{equation} T = M \times [x, y, 1]^T = \begin{bmatrix} a_{00}x + a_{01}y + b_{00} \\ a_{10}x + a_{11}y + b_{10} \end{bmatrix} \end{equation}

Examples

The identity transformation:

\begin{equation} M = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \end{equation}

Scale the image to half its size:

\begin{equation} M = \begin{bmatrix} 0.5 & 0 & 0 \\ 0 & 0.5 & 0 \end{bmatrix} \end{equation}
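
Applied with OpenCV's =warpAffine= (a sketch; with no translation column, the scaled image lands in the top-left corner of the output):

#+BEGIN_SRC python
import cv2
import numpy as np

src = cv2.imread("images/lena.jpg")
h, w = src.shape[:2]

# The 2x3 half-scale matrix from above; zero translation, so the
# result sits in the top-left corner of the output canvas.
M = np.float32([[0.5, 0, 0],
                [0, 0.5, 0]])
dst = cv2.warpAffine(src, M, (w, h))
#+END_SRC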

Here is a 90-degree rotation:

\begin{eqnarray} M &=& \begin{bmatrix} \cos(\pi/2) & \sin(\pi/2) & (1-\cos(\pi/2))x_c - \sin(\pi/2)y_c \\ -\sin(\pi/2) & \cos(\pi/2) & \sin(\pi/2)x_c + (1-\cos(\pi/2))y_c \end{bmatrix} \\ &=& \begin{bmatrix} 0 & 1 & x_c - y_c \\ -1 & 0 & x_c + y_c \end{bmatrix} \end{eqnarray}

where $x_c, y_c$ is the location of the center of the image.

Here is a 45-degree rotation:

\begin{equation} M = \begin{bmatrix} \cos(\pi/4) & \sin(\pi/4) & (1-\cos(\pi/4))x_c - \sin(\pi/4)y_c \\ -\sin(\pi/4) & \cos(\pi/4) & \sin(\pi/4)x_c + (1-\cos(\pi/4))y_c \end{bmatrix} \end{equation}

| [[./images/lena-scaled.jpg]] | [[./images/lena-scaled-trans.jpg]] | [[./images/lena-45-rotate.jpg]] |
| Scaled, no translation | Scaled, with translation | 45-degree rotation plus translation |
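
OpenCV's =getRotationMatrix2D= builds exactly this rotate-about-a-center matrix, so we rarely write it out by hand:

#+BEGIN_SRC python
import cv2

src = cv2.imread("images/lena.jpg")
h, w = src.shape[:2]

# Rotate 45 degrees about the image center (x_c, y_c), scale 1.0.
M = cv2.getRotationMatrix2D((w / 2, h / 2), 45, 1.0)
dst = cv2.warpAffine(src, M, (w, h))
#+END_SRC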

Perspective transformations

\begin{equation} M = \begin{bmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \\ a_{20} & a_{21} & 1.0 \end{bmatrix} \end{equation}

\begin{equation} [x', y', w]^T = M \times [x, y, 1]^T = \begin{bmatrix} a_{00}x + a_{01}y + a_{02} \\ a_{10}x + a_{11}y + a_{12} \\ a_{20}x + a_{21}y + 1.0 \end{bmatrix} \end{equation}

Then we divide out $w$, yielding $[\frac{x'}{w}, \frac{y'}{w}, 1]^T$. We therefore draw the point $(\frac{x'}{w}, \frac{y'}{w})$.

The example on the left uses this matrix:

\begin{equation} M = \begin{bmatrix} 0.53009 & -0.47993 & 79.0 \\ -0.31048 & 0.50159 & 59.0 \\ -0.00094 & -0.00131 & 1.0 \end{bmatrix} \end{equation}

Because a perspective transformation is a $3 \times 3$ matrix with eight unknowns, we can determine the transformation between any two images from only four 2D point correspondences.
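
OpenCV solves this directly from four correspondences with =getPerspectiveTransform= (the point values below are made up for illustration):

#+BEGIN_SRC python
import cv2
import numpy as np

src = cv2.imread("images/lena.jpg")
h, w = src.shape[:2]

# Four source points and where each should land (illustrative values).
src_pts = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
dst_pts = np.float32([[79, 59], [w - 40, 20], [w - 15, h - 30], [25, h - 70]])

M = cv2.getPerspectiveTransform(src_pts, dst_pts)

# warpPerspective applies M and performs the divide-by-w for us.
dst = cv2.warpPerspective(src, M, (w, h))
#+END_SRC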

We could, for example, take a skewed view from a surveillance camera and recover the actual ground plane being viewed (assuming we are monitoring a flat area) just by finding the ground coordinates of four pixels in the video.

Of course, surveillance videos often have low resolution, so we probably cannot pick out the exact ground coordinates of any pixels. We can instead approximate the perspective transformation; that is, we can find the transformation that minimizes the error. Four point correspondences are no longer enough; we need many, perhaps 50 or 100 (the more, the better), and perhaps we can find them automatically. Different transformations can then be tested, and we pick the one that minimizes the error between the chosen ground coordinates and the ground coordinates predicted by the transformation. This learning can be achieved with a single-layer neural network.
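
OpenCV's =findHomography= implements one such error-minimizing fit over many noisy correspondences (the single-layer network described above is another route). A sketch with synthetic points:

#+BEGIN_SRC python
import cv2
import numpy as np

# Synthetic correspondences: 50 random points, mapped with added noise.
rng = np.random.default_rng(0)
src_pts = rng.uniform(0, 640, size=(50, 2)).astype(np.float32)
dst_pts = src_pts + rng.normal(0, 1.0, size=(50, 2)).astype(np.float32)

# Fit the 3x3 matrix minimizing reprojection error; RANSAC discards outliers.
M, inliers = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
#+END_SRC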

Source code

Grab the source code for all of these examples: https://github.com/joshuaeckroth/teach-computer-vision