• CHAPTER 11


  •   
  • FileName: ch11.pdf [read-online]
    • Abstract: most heavily used image compression methods rely on a variant of the discrete Fourier. transform called the discrete cosine transform. This transform interpolates data using basis. functions that are all cosine functions, and involves only real computations. ...

Download the ebook

CHAPTER 11
Compression
HE information age exists because of the ease with which information travels. The fact that sound
T and images can be sent around the world so easily is due to ingenious ways of compressing the
data which represents them. In this chapter we describe methods of data compression, and as
applications of the methods, we describe some modern ways to shrink sound and images files.
11.1 The Discrete Cosine Transform
most heavily used image compression methods rely on a variant of the discrete Fourier
T HE
transform called the discrete cosine transform. This transform interpolates data using basis
functions that are all cosine functions, and involves only real computations.
The discrete cosine transform (DCT) is often used to compress small blocks of an image, as
small as 8 × 8 pixels. The compression is lossy, meaning that some information from the block is
ignored. The key feature of the DCT is that it helps organize the information so that the part that is
ignored is the part that the human eye is least sensitive to. More precisely, the DCT will show us
how to interpolate the data with a set of basis functions that are in descending order of importance
as far as the human visual system is concerned. Then we can drop the less important interpolation
terms, like a newspaper editor cuts a long story on deadline.
We will apply what we have learned about the DCT to compress full images. Using the added
tools of quantization and entropy-based coding, each 8 × 8 block of an image can be reduced to a bit
stream that is stored with bit streams from the other blocks of the image. The complete bit stream
is decoded when the image needs to be uncompressed and displayed, by reversing the encoding
process. We will describe this approach, called Baseline JPEG, the default method for storing JPEG
images.
The Modified Discrete Cosine Transform (MDCT) is the current standard for compression of
sound files. We will introduce MDCT and investigate its application for coding and decoding, which
provides the core technology of file formats such as MP3 and AAC. Finally, in analogy with the Fast
Fourier Transform of Chapter 10, we will show how to develop fast versions of the DCT and MDCT.
11.1.1 One-dimensional DCT
Let n be a positive integer. The one-dimensional discrete cosine transform of order n is defined by
the n × n matrix C whose entries are

2 1/ 2 if i = 0, j = 0, . . . , n − 1 ,
(C)ij = 1
i(j+ 2 )π (11.1)
n cos if i > 0, j = 1, . . . , n − 1
n
398 11.1 T HE D ISCRETE C OSINE T RANSFORM
or  1 1 1 

2

2
··· √
2
cos 2nπ
cos 3π ··· cos (2n−1)π
 
 2n 2n 
2
cos 2π cos 6π cos 2(2n−1)π

C= 
2n 2n ··· 2n
 (11.2)
n . . .


 .
. .
. .
.


cos (n−1)π
2n cos (n−1)3π
2n ··· cos (n−1)(2n−1)π
2n
When working with two-dimensional images, the convention is to begin with 0 instead of 1. The
notation will be easier if we extend this convention to matrix numbering, as you see we have done
in (11.1). For Chapter 11 only, we will use subscripts for n × n matrices that go from 0 to n − 1.
We will treat only the case where n is even in the following discussion.
Definition 11.1 The discrete cosine transform (DCT) of x = (x0 , . . . , xn−1 )T is the n-dimensional
vector y = (y0 , . . . , yn−1 ), where
y = Cx. (11.3)
Note that C is a real orthogonal matrix, meaning that its transpose is its inverse:
cos (n−1)π
 
1 π
√ cos 2n ··· 2n
 12 3π (n−1)3π 
2 2
 √ cos 2n ··· cos 2n 
C −1 = C T = . (11.4)

 . . .
n . . .
 . . . 

1 (2n−1)π (n−1)(2n−1)π

2
cos 2n · · · cos 2n
The rows of an orthogonal matrix are pairwise orthogonal unit vectors. The orthogonality of C
follows from the fact that the columns of C T are the unit eigenvectors of the real symmetric n × n
matrix  
1 −1
 −1 2 −1 
 
 −1 2 −1 
(11.5)
 
 .. .. .. 

 . . . 

 −1 2 −1 
−1 1
Exercise 11.1.7 asks the reader to verify this fact.
This puts us right back in the context of Assumption 10.1 of Chapter 10. The Orthogonal
Function Interpolation Theorem 10.8 applied to the discrete cosine matrix C implies:
Theorem 11.2 DCT Interpolation Theorem. Let tj = j for j = 0, . . . , n − 1 and let x =
(x0 , . . . , xn−1 ) be a vector of n real numbers. Define y = (y0 , . . . , yn−1 ) = Cx, where C is the discrete
cosine transform. Then the real function
√ n−1
1 2 k(2t + 1)π
Pn (t) = √ y0 + √ yk cos
n n 2n
k=1
satisfies Pn (j) = xj for j = 0, . . . , n − 1.
Chapter 11 C OMPRESSION 399
Thus the n × n matrix C transforms n data points into n interpolation coefficients. Like the discrete
Fourier transform, the discrete cosine transform gives coefficients for a trigonometric interpolation
function. Unlike the DFT, the DCT uses cosine terms only, and is defined entirely in terms of real
arithmetic.
1 x0
x1 x3
1 2 3
−1 x2
Figure 11.1: DCT interpolation and least squares approximation. The data points are (j, xj )
where x = (1, 0, −1, 0). The DCT interpolating function P4 (t) of (11.7) is shown as a solid curve
along with the least squares DCT approximation function P3 (t) of (11.8) as a dotted curve.
Example 11.1 Use the DCT to interpolate the points (0, 1), (1, 0), (2, −1), (3, 0).
The order-4 DCT multiplied by the data x = (1, 0, −1, 0) is
 1 1 1 1    

2

2

2 2
√ 1 0.0000
1 cos π cos 3π cos 5π cos 7π   0   0.9239 
  −1  ≈  1.0000
 8 8 8 8     (11.6)
2 cos 2π
8 cos 6π
8 cos 10π
8 cos 14π
8

cos 3π
8 cos 9π
8 cos 15π
8 cos 21π
8
0 −0.3827
According to Theorem 11.2, the function
1 1 (2t + 1)π 2(2t + 1)π 3(2t + 1)π
P4 (t) = · 0 + √ 0.9239 cos + cos − 0.3827 cos (11.7)
2 2 8 8 8
interpolates the four data points. The function P4 (t) is plotted in Figure 11.1.
Just as the DCT Interpolation Theorem 11.2is an immediate consequence of Theorem 10.8, the
least squares result Theorem 10.9 of Chapter 10 shows how to find a DCT least squares approxi-
mation of the data, using only part of the basis functions. Because of the orthogonality of the basis
functions, this can be accomplished by simply dropping the higher frequency terms.
Theorem 11.3 DCT Least Squares Approximation Theorem. Let tj = j for j = 0, . . . , n − 1 and
let x = (x0 , . . . , xn−1 ) be a vector of n real numbers. Define y = (y0 , . . . , yn−1 ) = Cx, where C
is the discrete cosine transform. Let 0 ≤ m ≤ n be another integer. Then the choice of coefficients
y0 , . . . , yn−1 in
√ m−1
1 2 k(2t + 1)π
Pm (t) = √ y0 + √ yk cos
n n 2n
k=1
n−1
minimizes the squared approximation error j=0 (Pm (j) − xj )2 of the n data points.
400 11.1 T HE D ISCRETE C OSINE T RANSFORM
0 S POTLIGHT ON: Orthogonality
The idea behind least squares approximations is that finding the shortest distance from a point to a plane
(or subspace in general) means constructing the perpendicular from the point to the plane. This con-
struction is carried out by the normal equations, as we saw in Chapter 2. In Chapters 10 and 11, this
concept is applied again and again to approximate as closely as possible with a relatively small set of ba-
sis functions, resulting in compression. The message of the later chapters is to carefully choose the basis
functions to be orthogonal, as reflected in the rows of the discrete transform matrix. Then the normal
equations become computationally very simple.
Referring to Example 11.1, if we require the best least squares approximation to the same four
data points, but use the three basis functions
(2t + 1)π 2(2t + 1)π
1, cos , cos
8 8
only, the solution is
1 1 (2t + 1)π 2(2t + 1)π
P3 (t) = · 0 + √ 0.9239 cos + cos . (11.8)
2 2 8 8
Figure 11.1 compares the least squares solution P3 with the interpolating function P4 .
Exercises 11.1
11.1.1. Write out the 2 × 2 DCT matrix and the general two-term DCT interpolation function P2 (t). Then find the
1D-DCT transform and the interpolating function for the data points. (a) [3, 3] (b) [2, −2] [Ans: The DCT
1 1 1 1 (2t + 1)π √
matrix is C = √ , and P2 (t) = √ y0 + y1 cos (a) y = [3 2, 0], P2 (t) = 3 (b) y =
√ 2 √1 −1 2 4
[0, 2 2], P2 (t) = 2 2 cos(2t + 1)π/4]
11.1.2. For n = 2, describe the m = 1 least squares DCT approximation in terms of the input data x = (x0 , x1 ).
11.1.3. Carry out the trigonometry to establish equations (11.10) and (11.11).
11.1.4. Find the 1D-DCT of the following data vectors x, and find the corresponding interpolating function Pn (t) for the
data points (i, xi ), i = 0, . . . , n − 1. You may state your answers in terms of the a, b, c defined in (11.11).
(a) [1, 0, −1, 0]
(b) [1, 1, 1, 1]
(c) [2, 0, −2, 0]
(d) [1, 2, 3, 4]
[Ans.: (a) y = [0, a(b + c), 1, a(c − b)], P4 (t) = a[a(b + c) cos(2t + 1)π/8 + cos 2(2t + 1)π/8 + a(c −
b) cos 3(2t + 1)π/8] (b) y = [2, 0, 0, 0], P4 (t) = 1 (c) y = [0, 2a(b + c), 2, 2a(c − b)], P4 (t) = 2a[a(b +
c) cos(2t + 1)π/8 + cos 2(2t + 1)π/8 + a(c − b) cos 3(2t + 1)π/8] (d) y = [10a, −3b − c, 0, b − 3c], P4 (t) =
5a + a[−(3b + c) cos(2t + 1)π/8 + (b − 3c) cos 3(2t + 1)π/8]]
11.1.5. Find the DCT least squares approximation with m = 2 terms for the data in Exercise 11.1.4.
Chapter 11 C OMPRESSION 401
11.1.6. Find the 1D-DCT of the data vectors (a) [1, 0, −1, 0, 1, 0, −1, 0] (b) [0.3, 0.6, 0.8, 0.5, 0.6, 0.4, 0.2, 0.3]
11.1.7. Show that the columns of C T are the unit eigenvectors of the matrix T in (11.5).
11.1.8.
Computer Problems 11.1
11.1.1.
11.2 Two-dimensional DCT and image compression
discrete cosine transform lies at the core of modern compression techniques for images and
T HE
audio. To discuss the former, we need to develop a two-dimensional version of the DCT. Later
we show how to transform and quantize rectangular grids of pixel values that make up an image.
11.2.1 The two-dimensional discrete cosine transform
The two-dimensional discrete cosine transform is simply the one-dimensional DCT applied in two
dimensions, one after the other. It can be used to interpolate or approximate data given on a two-
dimensional grid, in a straightforward analogy to the one-dimensional case. In the context of image
processing, the two-dimensional grid represents a block of pixel values, say grayscale intensities or
color intensities.
s
x x x x
30 31 32 33
3
x20 x21 x22 x23
2
x x x x
10 11 12 13
1
x x x x
00 01 02 03
0
0 1 2 3 t
Figure 11.2: Two-dimensional grid of data points. The 2D-DCT can be used to interpolate
function values on a square grid, such as pixel values of an image.
In Chapter 11 only, we will list the vertical coordinate first, and the horizontal coordinate sec-
ond, when referring to a two-dimensional point, as shown in Figure 11.2. Although radical, our
goal is to be completely consistent with the usual matrix convention, where the i index of entry xij
changes along the vertical direction, and j along the horizontal. A major application of this section
is to pixel files representing images, which are most naturally viewed as matrices of numbers.
Figure 11.2 shows a grid of (s, t) points in the two-dimensional plane with assigned values
xij at each rectangular grid point (si , tj ). For concreteness, we will often use the integer grid
si = {0, 1, . . . , n − 1} (remember, along the vertical axis) and tj = {0, 1, . . . , n − 1} along the
horizontal axis. The purpose of the 2D-DCT is to construct an interpolating function F (s, t) that fits
the n2 points (si , tj , xij ) for i, j = 0, . . . , n − 1. The 2D-DCT accomplishes this in an optimal way
402 11.2 T WO - DIMENSIONAL DCT AND IMAGE COMPRESSION
from the point of view of least squares, meaning that the fit degrades gracefully as basis functions
are dropped from the interpolating function.
The 2D-DCT is the one-dimensional DCT applied successively to both horizontal and vertical
directions. Consider the matrix X consisting of the values xij as in Figure 11.2. To apply the 1D-
DCT in the horizontal s-direction, we first need to transpose X, then multiply by C. The resulting
columns are the 1D-DCT’s of the rows of X. Each column of CX T corresponds to a fixed ti . To do
a 1D-DCT in the t-direction means moving across the rows, so again transposing and multiplying
by C yields
C(CX T )T = CXC T . (11.9)
Definition 11.4 The two-dimensional discrete cosine transform (2D-DCT) of the n × n matrix X is
the matrix Y = CXC T , where C is defined in (11.1).
If instead we perform the DCT vertically first, then horizontally, the result is the transpose of the
definition. See Exercise 11.2.3.
Example 11.2 Find the 2D discrete cosine transform of the data in Figure 11.3(a).
It is helpful to notice, using elementary trigonometry, that the 4 × 4 DCT matrix can be viewed as
 1 1 1 1   

2

2

2

2
a a a a
1  cos π cos 3π cos 5π cos 7π  = a b
  c −c −b 
C=√  8 8 8 8  (11.10)
2 cos 2π8 cos 6π8 cos 10π
8 cos 14π
8
  a −a −a a 
cos 3π8 cos 9π8 cos 15π
8 cos 21π
8
c −b b −c
where
√ √
π 1 π 2+ 2 3π 2− 2
a = cos = √ , b = cos = , c = cos = . (11.11)
4 2 8 2 8 2
From the definition, the 2D-DCT is the matrix
    
a a a a 1 1 1 1 a b a c
 b c −c −b  1 0 0 1   a c −a −b
Y = CXC T = a 

 a 
 a −a −a a  1 0 0 1   a −c −a b 
c −b b −c 1 1 1 1 a −b a −c
 
3 0 1 0
 0 0 0 0 
=
 1
 (11.12)
0 −1 0 
0 0 0 0
The inverse of the 2D-DCT is easy to express in terms of the DCT matrix C. Since Y = CXC T ,
the X is recovered as X = C T Y C.
Definition 11.5 The inverse two-dimensional discrete cosine transform of the n × n matrix Y is the
matrix X = C T Y C.
Chapter 11 C OMPRESSION 403
1 1 1 1 1.25 0.75 0.75 1.25
3 3
1 0 0 1 0.75 0.25 0.25 0.75
2 2
1 0 0 1 0.75 0.25 0.25 0.75
1 1
1 1 1 1 1.25 0.75 0.75 1.25
0 0
0 1 2 3 0 1 2 3
(a) (b)
Figure 11.3: Two-dimensional data for Example 11.2. (a) The 16 data points (i, j, xij ). (b)
Values of the least squares approximation (11.15) at the grid points.
There is a close connection between inverting a transform like the 2D-DCT and interpolation.
After all, the goal of interpolation is to recover the original data points from functions that are
constructed with the interpolating coefficients that came out of the transform. Recall that C is an
orthogonal matrix, so that C −1 = C T . The definition of 2D-DCT, Y = CXC T , can be rewritten as
a fact about interpolation, X = C T Y C, since in this equation the xij are being expressed in terms
of products of cosines.
To write a useful expression for the interpolating function, note that the definition of C in (11.1)
can be viewed as
2 i(2j + 1)π
(C)ij = ai cos (11.13)
n 2n
for i, j = 0, . . . , n − 1, where

1/ 2 if i = 0,
ai ≡
1 if i > 0.
According to the rules of matrix multiplication, the equation X = C T Y C translates to
n−1 n−1
T
xij = Cik ykl Clj
k=0 l=0
n−1 n−1
= Cki ykl Clj
k=0 l=0
n−1 n−1
2 k(2i + 1)π l(2j + 1)π
= ykl ak al cos cos . (11.14)
n 2n 2n
k=0 l=0
This is a statement about interpolation:
404 11.2 T WO - DIMENSIONAL DCT AND IMAGE COMPRESSION
Theorem 11.6 2D-DCT Interpolation Theorem. Let sj = j, ti = i for j, i = 0, . . . , n − 1 and let
X = (xij ) be a matrix of n2 real numbers. Let Y = (ykl ) be the two-dimensional discrete cosine

transform of X. Define a0 = 1/ 2 and ak = 1 for k > 0. Then the real function
n−1 n−1
2 k(2s + 1)π l(2t + 1)π
Pn (s, t) = ykl ak al cos cos
n 2n 2n
k=0 l=0
satisfies Pn (i, j) = xij for i, j = 0, . . . , n − 1.
Returning to Example 11.2, the only nonzero interpolation coefficients are y00 = 3, y02 = y20 = 1,
and y22 = −1. Writing out the interpolation function in the Theorem 11.6 yields
2 1 1 2(2t + 1)π 1 2(2s + 1)π 2(2s + 1)π 2(2t + 1)π
P4 (s, t) = y00 + √ y02 cos + √ y20 cos + y22 cos cos
4 2 2 8 2 8 8 8
1 1 1 2(2t + 1)π 1 2(2s + 1)π 2(2s + 1)π 2(2t + 1)π
= (3) + √ (1) cos + √ (1) cos + (−1) cos cos
2 2 2 8 2 8 8 8
3 1 (2t + 1)π 1 (2s + 1)π 1 (2s + 1)π (2t + 1)π
= + √ cos + √ cos − cos cos .
4 2 2 4 2 2 4 2 4 4
Checking the interpolation, we get for example
3 1 1 1
P4 (0, 0) = + + − =1
4 4 4 4
and
3 1 1 1
P4 (1, 2) = − − − = 0.
4 4 4 4
The constant term y00 /n of the interpolation function is called the ”DC” component of the expansion
(for ”direct current”, not ”discrete cosine”). It is the simple average of the data; the nonconstant
terms contain the fluctuations of the data about this average value. In this example, the average of
the 12 ones and 4 zeros is y00 /4 = 3/4.
Least squares approximations with the 2D-DCT are done in the same way as with the 1D-DCT.
The ”higher-frequency” basis functions, those whose coefficients have larger indices, are dropped
from the interpolating function. Consider the example we have been working on. The best least
squares fit to the basis functions
i(2s + 1)π j(2t + 1)π
cos cos
8 8
for i + j ≤ 2 is given by dropping the i = j = 2 term, leaving
3 1 (2t + 1)π 1 (2s + 1)π
P (s, t) = + √ cos + √ cos . (11.15)
4 2 2 4 2 2 4
This least squares approximation is shown in Figure 11.3(b).
The Matlab command for the one-dimensional DCT of a vector x is
>> y=dct(x)
Chapter 11 C OMPRESSION 405
The input to dct can be a vector or a matrix. In the latter case, the DCT of each column is re-
turned. Surprisingly, Matlab does not compute the DCT by matrix multiplication by the appropriate
DCT matrix C, but by way of the Fast Fourier Transform, to maximize efficiency. We will investi-
gate this approach in the last section of the chapter. Of course, Matlab can be forced to display the
n × n DCT matrix with the command
>> C=dct(eye(n))
To have Matlab carry out the 2D-DCT, we fall back to our definition of it in (11.9). Therefore
for a matrix X, the command
>> Y=dct(dct(X’)’)
computes the 2D-DCT with two applications of the 1D-DCT.
11.2.2 Image compression
The concept of orthogonality, as represented in the discrete cosine transform, is crucial to perform-
ing image compression. Images consist of pixels, each represented by a number (or a vector, for
color images). The convenient way that methods like the DCT can carry out least squares approx-
imation makes it easy to reduce the number of bits needed to represent the pixel values, while
degrading the picture only slightly, and perhaps imperceptibly to human viewers.
Figure 11.4: Gray-scale image. Each pixel is represented by an integer between 0 and 255. The
pixels form a 64 by 64 grid.
Figure 11.4 shows a grayscale rendering of a 64 × 64 array of pixels. The grayness of each pixel
is represented by one byte, a collection of 8 bits representing 0 to 255, with 0 being black and 255
being white. We can think of the information shown in the figure as a 64 × 64 array of integers.
Represented in this way, the picture requires 8(64)2 = 215 = 32K bits of information.
Matlab imports grayscale or RGB values of images using standard image formats. For example,
given a grayscale file picture.jpg, the commands
>> picture = imread(’picture.jpg’);
>> picture = double(picture);
406 11.2 T WO - DIMENSIONAL DCT AND IMAGE COMPRESSION
puts the matrix of grayscale values into the variable picture. If the image is RGB color, the array
variable will have a third dimension to index the three colors. We will restrict attention to grayscale
for most of the section. Extensions to color are easy, and we indicate the details at the end.
To begin, we simplify the problem to a single 8 × 8 block of pixels, as shown in Figure 11.5(a).
This block was taken from an area of Figure 11.4 just to the left of the nose. Figure 11.5(b) shows the
one-byte integers that represent the grayscales of the 64 pixels. In Figure 11.5(c) we have subtracted
256/2 = 128 from the pixel numbers to make them approximately centered around zero. This step
is not essential, but better use of the 2D-DCT will result because of this centering.
8 8 8
204 206 213 209 199 193 185 200 76 78 85 81 71 65 57 72
7 7 7
210 215 206 197 204 193 165 171 82 87 78 69 76 65 37 43
6 6 6
228 195 212 214 124 104 161 197 100 67 84 86 −4 −24 33 69
5 5 5
212 199 200 95 46 168 202 189 84 71 72 −33 −82 40 74 61
4 4 4
203 199 103 41 80 180 194 181 75 71 −25 −87 −48 52 66 53
3 3 3
179 88 46 72 164 120 68 72 51 −40 −82 −56 36 −8 −60 −56
2 2 2
127 102 97 169 217 190 174 115 −1 −26 −31 41 89 62 46 −13
1 1 1
79 133 158 189 192 201 196 190 −49 5 30 61 64 73 68 62
0 0 0
0 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 8
(a) (b) (c)
Figure 11.5: Example of 8 × 8 block. (a) Grayscale view (b) Grayscale pixel values (c) Grayscale
pixel values minus 128.
To compress the 8 × 8 pixel block shown, we will transform the matrix of grayscale pixel values
−49 5 30 61 64 73 68 62
 
 −1 −26 −31 41 89 62 46 −13 
 
 51 −40 −82 −56 36 −8 −60 −56 


Use: 0.299