CSC 161

Image Convolution

Summary
You will write a program and supporting functions to perform general image filtering operations.
Objectives
  • Reinforce good functional decomposition
  • Learn to use 2D arrays as parameters
  • Practice working with 2D arrays and loops
  • Learn to account for data type limitations
  • See interesting effects of mathematical image processing

Introduction

In the last project, you created a variety of interesting image transformations. While most of them were limited to the information at a single pixel, the Prewitt operator (used to find edges) used a small neighborhood around each pixel. In this assignment we generalize this image processing operation to use an arbitrarily-sized neighborhood surrounding a pixel.

The name for this operation is convolution: the resulting color at each pixel is a linear combination of the surrounding pixels. Convolution plays a central role in today's computer vision and machine-learning technology for image and object recognition; the backbone of the convolutional neural network is the convolution operation, which identifies local image regions that match properties of the kernel. Sometimes the kernel is called a filter and the operation filtering.

The kernel is a small grid of weights used to calculate the contribution of each pixel to the final value. For example, to locate edges, you used the following two kernels:

KHoriz =
 1  0 -1
 2  0 -2
 1  0 -1

KVert =
 1  2  1
 0  0  0
-1 -2 -1

The next section will detail how the convolution is to be calculated.

Convolution

For every pixel in an image, imagine overlaying the kernel centered at that pixel. As an example, consider a 3×3 window around a pixel located at row and column [r, c]; the coordinates of the pixels in this neighborhood would be:

[r-1, c-1] [r-1, c] [r-1, c+1]
[r, c-1] [r,c] [r, c+1]
[r+1, c-1] [r+1, c] [r+1, c+1]

To calculate the result of the convolution at [r, c] for a specific color channel (i.e., Red, Green, or Blue), one multiplies each kernel entry by the color brightness of the pixel at the corresponding location and sums the resulting products.

For example, if the brightnesses at the locations above were given by the values

a b c
d e f
g h i

and the kernel contained the values

r s t
u v w
x y z

then the convolution at that location (for that channel) would be given by the value

a×r + b×s + c×t + d×u + e×v + f×w + g×x + h×y + i×z

The same so-called sum-product is then calculated at every location in the image, for each color channel.
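As a concrete sketch of this sum-product for a single channel, the following C function multiplies a 3×3 kernel against a 3×3 patch of brightnesses. The names patch and kernel here are illustrative only, not part of MyroC:

```c
/* Sum-product of a 3x3 kernel with a 3x3 patch of channel brightnesses.
 * The names `patch` and `kernel` are illustrative only. */
double
sumProduct3x3 (const double patch[3][3], const double kernel[3][3])
{
  double total = 0.0;
  for (int row = 0; row < 3; row++)
    for (int col = 0; col < 3; col++)
      total += patch[row][col] * kernel[row][col];
  return total;
} // sumProduct3x3
```

In the a-through-z notation above, the first loop iteration contributes a×r, and the last contributes i×z.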

Although a different kernel might be used for each color channel, we will simplify matters by using the same kernel for each color channel.

One must decide what to do when the kernel "overhangs" the edge of the image. Because there are no image pixels to act as multipliers, we choose to skip calculations at these locations in the output image and ensure they are set to black.

Finally, because of the limited range of the unsigned char type used to store the results, it can be useful to include a bias term that also gets added to the sum-product.

Assignment

Implement your program in a file called convolve.c

To simplify the definition of the kernel center as well as the resulting mathematical and iterative operations, you may take as a precondition that the kernel's size along each dimension is odd.

0. Handling Borders

Because we want the unassignable borders of the resulting image to be all black, you may use the following function to accomplish this. It utilizes the standard library routine memset, taking advantage of the fact that the 2D array of Pixels is stored contiguously in memory, and that the three color channels which make up each Pixel are also contiguous, allowing us to simply set a big chunk of memory to zero.

/* Zero out all pixels of a picture
 *
 * Preconditions:
 *  pic points to a valid Picture
 * Postconditions:
 *  All pixels are zeroed to black:
 *    pic->pix_array[row][col].X = 0, for X one of R, G, or B, and
 *      0 <= row < 266 (from MyroC.h)
 *      0 <= col < 427 (from MyroC.h)
 */
void
zeroPic (Picture *pic) {
  memset (pic->pix_array, 0, sizeof (pic->pix_array));
} // zeroPic

Be sure to give appropriate citation for the function in your submission. Note that memset requires string.h, and is now favored over bzero, which is deprecated.

1. Data Range

Write the function

unsigned char trim (double val)

which takes a double value and converts it to an unsigned char if it is within the range 0–255, or clips it to the nearer extreme (0 or 255) if it is outside that range.

This utility will allow us to compute the sum-product above as a double value without worry for overflow, converting the result for storage in a Pixel after the arithmetic computation has been done.
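A minimal sketch of one way to write trim, clamping before the narrowing conversion:

```c
/* Clamp a double into the range [0, 255], then convert to unsigned char.
 * Values within range are truncated toward zero by the conversion. */
unsigned char
trim (double val)
{
  if (val < 0.0)
    return 0;
  if (val > 255.0)
    return 255;
  return (unsigned char) val;
} // trim
```

Without this clamping, converting an out-of-range double to unsigned char is undefined behavior in C, so the checks must come before the cast.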

2. Multiply and Add

Write the function

Pixel
multiplyAdd (const Picture * pic, int center_row, int center_col,
              int kernel_height, int kernel_width,
              double kernel[][kernel_width], double bias)

This function performs the core of the convolution operation by centering the kernel at the Pixel located at (center_row, center_col) in pic and calculating the sum-product of the kernel with the brightnesses in each color channel.

The bias term should be added to the sum-product before the result is passed to trim.

Note that when centered at the location given, the kernel should be entirely within the image boundaries. Your procedure must verify this with an assertion. (Hint: Use the kernel's width and height as well as the central coordinates to calculate where the left/top and bottom/right edges of the kernel will be.)

Note: you can use just one pair of nested loops—over the image region covered by the kernel—to accumulate the (separate) totals for each color channel.

Hint: Calculate half the kernel size (i.e., height or width), center the kernel atop location (center_row,center_col) and loop from negative half size to positive half size to calculate an offset from the center coordinate, while also using this loop variable to calculate an index into the kernel.
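The hint can be illustrated on a single-channel image of plain doubles; this is only a sketch of the loop structure under that simplifying assumption, not the required Pixel-based function:

```c
/* Sum-product of an odd-sized kernel centered at (center_row, center_col)
 * in a single-channel image, plus a bias term.
 * Precondition: the kernel lies entirely inside the image. */
double
centeredSumProduct (int height, int width, double image[height][width],
                    int kernel_height, int kernel_width,
                    double kernel[kernel_height][kernel_width],
                    int center_row, int center_col, double bias)
{
  int half_h = kernel_height / 2;   // integer division; sizes are odd
  int half_w = kernel_width / 2;
  double total = bias;
  for (int dr = -half_h; dr <= half_h; dr++)        // offsets from center
    for (int dc = -half_w; dc <= half_w; dc++)
      total += image[center_row + dr][center_col + dc]
               * kernel[dr + half_h][dc + half_w];  // shift offset to index
  return total;
} // centeredSumProduct
```

Note how dr and dc serve double duty: added to the center coordinates they index the image, and shifted by the half sizes they index the kernel.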

3. Convolution

With all the pieces set in place, you can write the final convolution function:

Picture
convolution (const Picture * pic, int kernel_height, int kernel_width,
             double kernel[][kernel_width], double bias)

The function should initialize the result image to be returned and then iteratively apply multiplyAdd to all the valid locations.
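Again using a single-channel image of doubles as a stand-in for MyroC's Picture, the overall structure might look like the following sketch. Note how the loop bounds keep the kernel inside the image, leaving a zeroed (black) border:

```c
#include <string.h>

/* Convolve a single-channel image with an odd-sized kernel, writing the
 * results into out; border cells the kernel cannot cover remain zero.
 * Plain double arrays stand in for MyroC's Picture type. */
void
convolveGray (int height, int width, double in[height][width],
              int kernel_height, int kernel_width,
              double kernel[kernel_height][kernel_width],
              double bias, double out[height][width])
{
  memset (out, 0, sizeof (double) * height * width);  // black border
  int half_h = kernel_height / 2;
  int half_w = kernel_width / 2;
  for (int r = half_h; r < height - half_h; r++)      // valid rows only
    for (int c = half_w; c < width - half_w; c++) {   // valid columns only
      double total = bias;
      for (int dr = -half_h; dr <= half_h; dr++)
        for (int dc = -half_w; dc <= half_w; dc++)
          total += in[r + dr][c + dc] * kernel[dr + half_h][dc + half_w];
      out[r][c] = total;
    }
} // convolveGray
```

In your actual program the inner accumulation belongs in multiplyAdd and the result of each location passes through trim; this sketch only shows how the outer loops and border zeroing fit together.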

4. Main

Write a program in main that loads one or more JPEG pictures of your choosing. Apply at least five different kernels that yield a visually interesting (or at least noticeable) result on the image to which each is applied. Display the original(s) and each of the filtered images. Save all the resulting filtered images, using a file name with no path information (that is, there should be no "/" characters in it); for example

rSavePicture(&pic, "example.jpg");

At least one kernel used must be one not provided here. You need not have invented it yourself (you can explore online resources for ideas), but it should produce visually interesting results on the image to which it is applied. Include a short comment in the program next to the definition of the kernel describing the filter and how/why it achieves the intended effect.

Examples

First we give a detailed worked example of the computation to be done, followed by several sample kernels one might use to produce interesting effects.

Computation

Suppose the following pixel neighborhood is encountered, with the R; G; B values of each pixel given in that order:

(34; 192; 64)  (48; 197; 57)  (52; 210; 52)
(33; 192; 60)  (42; 197; 59)  (47; 210; 55)
(35; 248; 65)  (44; 198; 67)  (52; 212; 62)
Using the 3×3 kernel KHoriz from above with a bias of 128, the three calculated sum-products with bias would be
R = 34×1 + 48×0 + 52×(–1) + 33×2 + 42×0 + 47×(–2) + 35×1 + 44×0 + 52×(–1) + 128
= 65
G = 192×1 + 197×0 + 210×(–1) + 192×2 + 197×0 + 210×(–2) + 248×1 + 198×0 + 212×(–1) + 128
= 110
B = 64×1 + 57×0 + 52×(–1) + 60×2 + 59×0 + 55×(–2) + 65×1 + 67×0 + 62×(–1) + 128
= 153

After passing these results through trim (which leaves them alone since they are all between 0 and 255), the resulting pixel values are 65; 110; 153.

Identity, Scaling, and Shifting

A kernel with a 1 at the center and zeros elsewhere produces the same image as output. A kernel with any other value at the center (likewise surrounded by zeros) would multiplicatively scale the image brightness by that value. A kernel with a 1 away from the center and zeros elsewhere will translate the output image.

These properties can be useful for testing.
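For instance, three test kernels built on this pattern might be declared as follows (the shift direction noted in the comment assumes the sum-product as defined above, with no kernel flipping):

```c
/* Identity: output equals input. */
double identity[3][3] = { {0, 0, 0}, {0, 1, 0}, {0, 0, 0} };

/* Scaling: output brightness is 1.5 times the input. */
double brighten[3][3] = { {0, 0, 0}, {0, 1.5, 0}, {0, 0, 0} };

/* Shift: each output pixel pulls from the pixel to its left,
 * translating the image one pixel to the right. */
double shift_right[3][3] = { {0, 0, 0}, {1, 0, 0}, {0, 0, 0} };
```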

Image Blurring

Noise in an image or other fine details can be smoothed out by taking the average of the pixels in a local neighborhood. The simplest way to do this is to use equal weights of 1/N² in an N×N kernel, known as a box filter.
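A box filter of any odd size can be filled programmatically; a brief sketch:

```c
/* Fill an n x n kernel with the uniform box-filter weight 1/(n*n). */
void
makeBoxFilter (int n, double kernel[n][n])
{
  double weight = 1.0 / (n * n);
  for (int row = 0; row < n; row++)
    for (int col = 0; col < n; col++)
      kernel[row][col] = weight;
} // makeBoxFilter
```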

A more refined technique downweights the contributions of pixels that are farther from the central pixel. The Gaussian filter takes the shape of a 2D Gaussian (the normal/bell curve) with a particular scale/size, controlled by a parameter σ. A compact example with σ=1, sampled for a 3×3 kernel, is:

0.0556 0.1246 0.0556
0.1246 0.2791 0.1246
0.0556 0.1246 0.0556

Note that the kernel is rotationally symmetric, which eliminates visual artifacts that can arise with the box filter. The entries of both blurring kernels sum to one, which is important for maintaining the total brightness of the filtered image.
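The normalization claim is easy to sanity-check in code. Summing the sampled Gaussian above gives approximately (not exactly) one, because its printed entries are rounded:

```c
/* Sum of all entries of a height x width kernel; blurring kernels
 * should sum to (approximately) 1. */
double
kernelSum (int height, int width, double kernel[height][width])
{
  double sum = 0.0;
  for (int row = 0; row < height; row++)
    for (int col = 0; col < width; col++)
      sum += kernel[row][col];
  return sum;
} // kernelSum
```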

Image Sharpening

The so-called "unsharp mask" technique adds a fraction of the difference between a blurred version of the image and the original back to an image in order to further enhance the places that change most when blurred (i.e., image details).

Figures: Original Image; Blurred Image; Sharpened Image

The kernel for the unsharp mask is a combination of the two steps, and may be written as

 0      –α      0
–α    4α + 1   –α
 0      –α      0

where α is a sharpening factor (of your choice). You should note that the kernel is a combination of the 3×3 identity and a blurring kernel very similar to the Gaussian given above.

Oriented edge detectors

The Prewitt derivative operators (given above) identify strong brightness changes along a particular direction. Note that unlike the blurring kernels, these kernels sum to zero, so that when there is no change in an image the response is zero.

Figures: Original Image; Horizontal Derivative KHoriz (Vertical Edges); Vertical Derivative KVert (Horizontal Edges)

Because the filter responses may be positive or negative, centered around zero, we must add a bias to place the results back into a visible range. With the unsigned char ranging from 0–255, using a bias of 128 will center the responses so that zero appears gray, strong "positive" edges appear bright (toward white), and strong "negative" edges appear dark (toward black).

Spot detection

The so-called Laplacian operator is built from the second derivatives of a 2D Gaussian. Whereas the Gaussian looks like a bell when viewed as a surface, the Laplacian has the shape of an inverted sombrero. Thus, it responds most strongly to a bright band surrounding a dark center. (It also responds, less strongly, to any edge.)

0.25 0.50 0.25
0.50 –3.00 0.50
0.25 0.50 0.25
Figures: Original Image; Laplacian

Because this differential operator's kernel also sums to zero, a bias of 128 will aid in visualization.

Grading

The assignment is worth 30 points and will be evaluated according to the general grading guidelines in addition to the requirements above.

A five point penalty will apply to any submission that includes personally identifying information anywhere other than the references/honesty file.