JPEG Gems
by
bkenwright@xbdev.net
So you have your Red, Green and Blue values… the first thing
to do is convert them to Y, U, and V (alternatively called Y for Lumminance, Cb
(or U) and Cr (or V) for Chrominance).
The Y component contains the details of the image, while the U and V the
colour details. If you display the Y
component you’ll find that it’s the equivalent black and white version of the
image…. Its like viewing a colour tv picture on a black and white tv J
------------------------------------------------------------------------------------------------------------------------------
[Y Cb Cr] -> [R G
B]
------------------------------------------------------------------------------------------------------------------------------
R = Y + 1.402 * (Cr – 128)
G = Y – 0.34414 * (Cb-128) – 0.71414 * (Cr – 128)
B = Y + 1.772 * (Cb – 128)
So I bet your wondering how you would code such a simple few
equations… well I’ll show you a simple example which you could use:
void convert_yuv_to_rgb(int y, int u, int v,
unsigned char* r, unsigned
char* g, unsigned char* b)
{
float red,
green, blue;
red = y +
1.402*(v-128);
if
(red<0) red = 0;
if
(red>255) red=255;
green ==
y-0.34414*(u-128)-0.71414*(v-128);
if( green
< 0) green = 0;
if( green
> 255) green = 255;
blue =
y+1.772*(u-128);
if( blue
< 0) blue = 0;
if( blue
> 255) blue = 255;
*r =
(unsigned char) red;
*g =
(unsigned char) green;
*b =
(unsigned char) blue;
}
Pretty simple ey?…
Level shifting
Well the jpeg format, after converting either from YUV to
RGB for a decoder or from RGB to our YUV for encoding shifts the values to a
signed value. So for example, if we are
decoding a Jpeg, we must subtract 128 from all our YUV values before converting
them to RGB.
Again I’ll show you a snippet of code which demonstrates
such a simple principle.
Note: I’ll assume that our image is in an 8x8 block of
integers, so I pass in a pointer to this block and shift it up by 128. As where passing in a pointer to the image
array, note that the changes in the function will be in place after we leave
the function.
void shift_block(int block[8][8] )
{
for(int y=0;
y<8; y++)
{
for( int x=0; x<8; x++)
{
block[x][y]
+= 128;
}
}
}
Yup its really that simple…. we can call this function and
pass it our 8x8 block of YUV values and it will shift all the values up by 128.
------------------------------------------------------------------------------------------------------------------------------
8x8 Discrete Cosine
Transform (DCT).
------------------------------------------------------------------------------------------------------------------------------
Well I better show you the official DCT and IDCT (the
forward and reverse) Discrete Cosine Transform.
FDCT:
C(u,
v) 7 7 2*x+1 2*y+1
F(u,v) = ---------
* sum sum f(x,y) *
cos ( ------- * u * PI ) * cos ( ------- * v * PI )
4 x=0 y=0 16 16
u,v = 0, 1, …. 7.
{
½ when u=v=0
C(u,v) = {
1/sqrt(2) when u=0, v!=0
{
1/sqrt(2) when u!=0, v=0
{
1 otherwise
IDCT:
1 7
7 2*x+1 2*y+1
f(x, y) = ---- *
sum sum c(u, v) * F(u, v) * cos ( ------- * u * PI ) * cos ( ------- * v * PI )
4 u=0 v=0 16 16
x, y = 0, 1, 2… 7.
Just when things where going so well along comes this and
where stuck…. Well its not as hard as its looks, if you break it up into little
pieces we can write a nice simple function which will do this for us
nicely. I’ll show you how to write a
IDCT function (an inverse cosine transform function).
void idct ( int in_block[8][8], int out_block[8][8])
{
for(int
y=0; y<8; y++)
{
for(int x=0; x<8; x++)
{
out_block[x][y] = f( x, y, in_block);
}
}
}
int f(int x, int y, int block[8][8])
{
float
sum=0;
for( int
u=0; u<8; u++)
{
for(int
v=0; v<8; v++)
{
sum += ( C(u) * C(v) ) * block[u][v] * cos(
((2*x+1) * u * PI) / 16) * cos( ((2*y+1)
* v * PI) / 16);
}
}
return
(int) ((1.0/4.0) * sum);
}
float C(int u)
{
if (u == 0)
return (1.0/squrt(2));
else
return 1.0;
}
Well it wasn’t’ that bad was it? Its not very optimized but you get the jist of it…. You should be
able to decode a DCT transformed array now back to its original form without to
much trouble. I must mention you could
later on optimize these few functions so that it uses matrix’s… but that’s another
story.
------------------------------------------------------------------------------------------------------------------------------
ZigZag - De-ZigZag?
------------------------------------------------------------------------------------------------------------------------------
Well when I first come across this idea of how the data was
arranged … so that it was stored in a zig-zag fashion I thought it was going to
be a nightmare to encode and decode, involving loops and loads of if’s and else’s…
but it’s a lot simpler than you think to do.
A simple example of how you would go around decoding a
zig-zag encoded 8x8 block is as follows:
int zigzag_array[64] =
{
0, 1, 5, 6, 14, 15,
27, 28,
2, 4, 7, 13, 16, 26,
29, 42,
3, 8, 12,
17, 25, 30, 41,
43,
9, 11,
18, 24, 31, 40, 44,
53,
10, 19, 23, 32, 39, 45, 52, 54,
20, 22, 33, 38, 46, 51, 55,
60,
21, 34, 37, 47, 50,
56, 59, 61,
35, 36, 48, 49, 57, 58,
62, 63,
};
void de_zigzag(int out_block[64], int in_block[64])
{
for(int i=0;
i<64; i++)
{
out_block[ zig_zag_array[i] ] =
in_block[i];
}
}
Hmmm… how does that work, well if you follow it through you’ll
see just how simple it is and how easy it can be to de-zig zag your 64 elements
which you have read in from your jpeg.
But wait, I don’t want you to wonder about converting the 64
element array into a 8x8 array… I’ll show you how I would do it.
void transform_array(int in_array[64], int out_array[8][8])
{
int cc = 0;
for( int
y=0; y<8; y++)
{
for(
int x=0; x<8; x++)
{
out_array[x][y] =
in_array[cc];
cc++;
}
}
}
------------------------------------------------------------------------------------------------------------------------------
Quantization – lossy file
format
------------------------------------------------------------------------------------------------------------------------------
Well without the Quantization part of the jpeg file format,
the image would be restored to its original glory…but at the loss of a less
compressed file. We usually extract the
quantization table from the jpeg file, and is simply a 64 array of
integers. When generating the jpeg we
divide the 64 array by our image values… so that certain higher frequency parts
of the image are lost (e.g. if we divide 10 by 100 for example this will be
rounded off to zero).
For decoding of the image, we simply multiply each component
by the value in the 64 image array.
void dequantize_block( int block[64], int quant_block[64] )
{
for( int c=0;
c<64; c++)
{
block[c]
= block[c] * quant_block[c];
}
}
------------------------------------------------------------------------------------------------------------------------------
Huffman Tables and
codes.
------------------------------------------------------------------------------------------------------------------------------
When the jpeg file format is created, the actual codes that
you need to decode the image from the data is not stored in the jpeg file. What is stored is the lengths and there
values to be… from this you must generate the codes for each one yourself.
This is what you get from a jpg file:
List of 16 int which represent length in bits and how many
there are…
No Lengths: 1 2
3 4 5 6 7
8 9 10 11 12
13 14 15 16
Its difficult to understand, but for a typical jpeg file you
would have:
No Lengths: 0, 0, 3,
0, 4, 0 , 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0
What this tells us is that we have 3 codes of length 3, and
4 codes of length 5… a total of 7 codes which would follow next.
So you have the code and you have the length but how do you
generate the codes?… and what do these codes look like?
Here’s an example of some data that we get form the file and
what we generate:
|----------get from the jpg file----------------|
|--------generate with our algorithm---------|
value length in bits code (hex) code(binary)
0 2 0 00
1 3 2 010
2 3 3 011
3 3 4 100
4 3 5 101
5 3 6 110
6 4 e 1110
7 5 1e 11110
8 6 3e 111110
9 7 7e 1111110
a 8 fe 11111110
b 9 1fe 111111110
Those are typical values you could get for a DC Table in a
jpeg file.. and in fact are one’s I actually took from a simple jpeg file.
Generating code!! … it’s a lot harder than it looks.. its
simply this, you pass in the number of values you have, and there lengths and
you’ll get there corresponding codes… this function does exactly that.
struct stBlock
{
int value;
int length // in
bits.
unsigned short
int code; // 2 byte code
};
void gen_huf_codes( int num_codes, stBlock* array
{
int hufcounter =
0;
int
codelengthcounter = 1;
for(int cc=0;
cc< num_codes; cc++)
{
if(
array[cc].length = = codelengthcounter
)
{
array[cc].code
= hufcounter;
hufcounter++;
}
else
{
hufcounter = ( hufcounter << 1 );
codelengthcounter ++;
}
}
}
for example, if num_codes is 24, then array would be stBlock
array[24];… e.g. an array of 24 elements.
The above function would be passed the array of stBlock elements then it
would generate the equivalent Huffman code for each value.
------------------------------------------------------------------------------------------------------------------------------
Software Solutions!
------------------------------------------------------------------------------------------------------------------------------
Well you may be thinking that if we are going to read or
write from a file 1 “bit” at a time … how are we to do this? And all how do we go about checking for
values etc? To keep your mind occupied
and give you some relatively simple examples of how you would go about getting
your data etc, and some other useful functions that you may use.
During various stages of reading the jpg file you’ll either
need to read a word (2 bytes) a byte (8 bits) or 1 bit … so that its nice and
simple… I put together 3 functions that do each of these
unsigned short int get_word(FILE* fp)
{
unsigned
char byte_buf[2];
fread(
&byte_buf, 2, 1, fp);
return (
(byte_buf[0] << 8) | ( byte_buf[1] >> 1) );
}
Remember that where reading the data in using big-endian…which
means that bytes are arranged the opposite away around compared to the
little-endian way that the intel uses.
So we have to swap our values around… for example instead of having
0x1155 we’d shift that around so it would be 0x5511.
unsigned char get_byte(FILE* fp)
{
unsigned
char byte_buf;
fread(&byte_buf,
1, 1, fp);
return
byte_buf;
}
Well its simple enough you could skip this function if you
want, but I think its nice and tidy.
unsigned int current_bit = 0;
unsigned char g_byte;
unsigned char get_bit(FILE* fp)
{
unsigned
char byte_len = 8;
unsigned
char mask;
if(
current_bit == 0 )
{
current_byte = get_c(fp);
if( current_byte = = 0xff )
{
unsigned char byte_buf = get_c(f);
if( byte_buf = = 0 )
// do nothing and just carry on
else
// if we get here an error has occurred somewhere onlong the line?
}
mask = 1 << (byte_len – current_bit
– 1);
current_bit ++;
if( current_bit = = byte_len ) current_bit = 0;
if( (current_byte & mask) = = mask )
return 1;
else
return 0;
}
}
Very important to remember that we have the global variable
which must hold our data, as we are still reading in 1 byte at a time but using
a binary mask, so we are in fact checking each bit one at a time as we shift
the mask along.
Something you’ll defenetly need to do, after you usually
decode your Huffman value from your data bits that you’ve read in, it tells you
how many bits to read in after that, so we need a function which will read in “n”
number of bits.
int get_num_bits(FILE* fp, int length)
{
char bit;
unsigned int
number=0;
unsigned int mask;
mask = 1
<< (length – 1);
for( int u = 0;
u<length; u++)
{
bit = get_bit(fp); // using one of the
functions I defined just above.
if( bit = =
1)
{
number |=1;
};
mask = mask
>> 1;
}
if( number < 1 <<(length-1) )
/*negative number*/
return
number + ( (-1) length) + 1;
else
/*positive number*/
return
number;
}
And my mind has gone blank!…lol… well I think that covers
most of the main parts of decoding a jpeg file…. It should be pretty easy to
see how the pieces go together once you have analysed the data that you’re
getting from the various jpeg markers.
The best way to go around decoding a jpeg is to set up a
switch statement, so that you read in a section marker, e.g. (0xffc4 – DHT –
Define Huffman Table) then you would read in that data and put it into an array
structure which you can use later on when you need it (e.g. in the processing
of the Start Of Scan section).
The reason I put the various parts into nice little
functions above is so that you can see how to go about decoding the jpg format
and also when you make your first jpg decoder its good to actually view the
various sections and see what data they produce.
Sequence of events you may follow in decoding a jpeg file:
[1]. First the DC
coefficient decode:
(a). Fetch a valid hufman code (you check if it
exists in the Huffman dc table).
(b). See at what category this hufman code corresponds
(e.g. positive or negative etc).
(c). Fetch N bits from the file, and determine
what value is represented.
(d). DC += Difference.
(e). Write DC into the 64 array vector….. (e.g.
array[0] = DC ).
[2]. The 63 other AC
coefficients to decode next:
(a). Fetch a valid Huffman code (check it in the
AC Huffman table).
(b). Decode that Huffman code : The Huffman code
corresponds to ( 4 upper bits, 4 lower
bits ) which means, the 4 upper bits means how many zero’s in the array, and the
4 lower bits is the num of bits to read next to get our value.
(c). Fetch N bits (which we found out in (b) )
and determine what value is represented by it.
(d). Write in the number of zero’s form the upper
nibble (e.g. 4 bits we got from (b).)
(c). Increment the AC counter with number of
zeros
(d). Write the AC value we got into the array in
its correct position. (e.g.
vector[21]=AC_value ).
+For the rest of the decoding parts+
[3]. De-quantize the
64 element array : “e.g. for (i=0; i<64; i++) array[i] *= quant_array[i];
[4]. Re-order from the
zig-zag, the 64 elements.
[5]. Convert from a
64 element array to a 8x8 array.
[6]. Apply the
Inverse DCT to our 8x8 block.
[7]. Level shift
(e.g. add 128 to each block element).
[8]. Transform YcbCr
to the RGB
Wow.. we have decoded a jpg image file!
Well it’s a lot of work but once you’ve gone over all the
details once or twice and looked at a few of the coding examples you’ll soon
get the gist of it.
Notes:
EOB – End Of Block – [ 4 upper bits, 4 lower bits] = [0, 0]…
which means all the rest of the elements in the 64 array are zero.
Sampling factor, you may have different values of Y, Cb and
Cr for each image section, for example… the top left corner of the image could
have different horiz and vert frequencies for the Y, Cb, and Cr which would
mean that you might need 4 Y blocks and 1 Cb block and 1 Cr block…?
How does that work ?
Well if Y has a horiz and vert sampling freq of 2, then it means each “block”
is four 8x8 arrays… now if the corresponding Cb and Cr would then have
frequencies of 1, it would mean that for each Cb and Cr they would have to be
stretched to fit across each Y… so in effect where stretching our (8x8) Cb and
Cr across a 16x16 wide block.
Why do this? Well
its all to do with the fact that the Y block contains all the information of
our image, while the Cb and Cr contains the colour.. if we lost our colour components it wouldn’t’ be to bad as we’d
just have a black and white image…we could still make out what it was in the
image… but if we loose the Y block we would not be even able to tell what the
image was.