JPEG Gems

by bkenwright@xbdev.net

So you have your Red, Green and Blue values… the first thing to do is convert them to Y, U, and V (alternatively called Y for Lumminance, Cb (or U) and Cr (or V) for Chrominance). The Y component contains the details of the image, while the U and V the colour details. If you display the Y component you’ll find that it’s the equivalent black and white version of the image…. Its like viewing a colour tv picture on a black and white tv J

------------------------------------------------------------------------------------------------------------------------------

[Y Cb Cr] -> [R G B]

------------------------------------------------------------------------------------------------------------------------------

R = Y + 1.402 * (Cr – 128)

G = Y – 0.34414 * (Cb-128) – 0.71414 * (Cr – 128)

B = Y + 1.772 * (Cb – 128)

So I bet your wondering how you would code such a simple few equations… well I’ll show you a simple example which you could use:

void convert_yuv_to_rgb(int y, int u, int v,

unsigned char* r, unsigned char* g, unsigned char* b)

{

float red, green, blue;

red = y + 1.402*(v-128);

if (red<0) red = 0;

if (red>255) red=255;

green == y-0.34414*(u-128)-0.71414*(v-128);

if( green < 0) green = 0;

if( green > 255) green = 255;

blue = y+1.772*(u-128);

if( blue < 0) blue = 0;

if( blue > 255) blue = 255;

*r = (unsigned char) red;

*g = (unsigned char) green;

*b = (unsigned char) blue;

}

Pretty simple ey?…

Level shifting

Well the jpeg format, after converting either from YUV to RGB for a decoder or from RGB to our YUV for encoding shifts the values to a signed value. So for example, if we are decoding a Jpeg, we must subtract 128 from all our YUV values before converting them to RGB.

Again I’ll show you a snippet of code which demonstrates such a simple principle.

Note: I’ll assume that our image is in an 8x8 block of integers, so I pass in a pointer to this block and shift it up by 128. As where passing in a pointer to the image array, note that the changes in the function will be in place after we leave the function.

void shift_block(int block[8][8] )

{

for(int y=0; y<8; y++)

{

for( int x=0; x<8; x++)

{

block[x][y] += 128;

}

Yup its really that simple…. we can call this function and pass it our 8x8 block of YUV values and it will shift all the values up by 128.

------------------------------------------------------------------------------------------------------------------------------

8x8 Discrete Cosine Transform (DCT).

------------------------------------------------------------------------------------------------------------------------------

Well I better show you the official DCT and IDCT (the forward and reverse) Discrete Cosine Transform.

FDCT:

C(u, v) 7 7 2*x+1 2*y+1

F(u,v) = --------- * sum sum f(x,y) * cos ( ------- * u * PI ) * cos ( ------- * v * PI )

4 x=0 y=0 16 16

u,v = 0, 1, …. 7.

{ ½ when u=v=0

C(u,v) = { 1/sqrt(2) when u=0, v!=0

{ 1/sqrt(2) when u!=0, v=0

{ 1 otherwise

IDCT:

1 7 7 2*x+1 2*y+1

f(x, y) = ---- * sum sum c(u, v) * F(u, v) * cos ( ------- * u * PI ) * cos ( ------- * v * PI )

4 u=0 v=0 16 16

x, y = 0, 1, 2… 7.

Just when things where going so well along comes this and where stuck…. Well its not as hard as its looks, if you break it up into little pieces we can write a nice simple function which will do this for us nicely. I’ll show you how to write a IDCT function (an inverse cosine transform function).

void idct ( int in_block[8][8], int out_block[8][8])

{

for(int y=0; y<8; y++)

{

for(int x=0; x<8; x++)

{

out_block[x][y] = f( x, y, in_block);

}

int f(int x, int y, int block[8][8])

{

float sum=0;

for( int u=0; u<8; u++)

{

for(int v=0; v<8; v++)

{

sum += ( C(u) * C(v) ) * block[u][v] * cos( ((2*x+1) * u * PI) / 16) * cos( ((2*y+1) * v * PI) / 16);

}

return (int) ((1.0/4.0) * sum);

}

float C(int u)

{

if (u == 0)

return (1.0/squrt(2));

else

return 1.0;

}

Well it wasn’t’ that bad was it? Its not very optimized but you get the jist of it…. You should be able to decode a DCT transformed array now back to its original form without to much trouble. I must mention you could later on optimize these few functions so that it uses matrix’s… but that’s another story.

------------------------------------------------------------------------------------------------------------------------------

ZigZag - De-ZigZag?

------------------------------------------------------------------------------------------------------------------------------

Well when I first come across this idea of how the data was arranged … so that it was stored in a zig-zag fashion I thought it was going to be a nightmare to encode and decode, involving loops and loads of if’s and else’s… but it’s a lot simpler than you think to do.

A simple example of how you would go around decoding a zig-zag encoded 8x8 block is as follows:

int zigzag_array[64] =

{

0, 1, 5, 6, 14, 15, 27, 28,

2, 4, 7, 13, 16, 26, 29, 42,

3, 8, 12, 17, 25, 30, 41, 43,

9, 11, 18, 24, 31, 40, 44, 53,

10, 19, 23, 32, 39, 45, 52, 54,

20, 22, 33, 38, 46, 51, 55, 60,

21, 34, 37, 47, 50, 56, 59, 61,

35, 36, 48, 49, 57, 58, 62, 63,

};

void de_zigzag(int out_block[64], int in_block[64])

{

for(int i=0; i<64; i++)

{

out_block[ zig_zag_array[i] ] = in_block[i];

}

Hmmm… how does that work, well if you follow it through you’ll see just how simple it is and how easy it can be to de-zig zag your 64 elements which you have read in from your jpeg.

But wait, I don’t want you to wonder about converting the 64 element array into a 8x8 array… I’ll show you how I would do it.

void transform_array(int in_array[64], int out_array[8][8])

{

int cc = 0;

for( int y=0; y<8; y++)

{

for( int x=0; x<8; x++)

{

out_array[x][y] = in_array[cc];

cc++;

}

------------------------------------------------------------------------------------------------------------------------------

Quantization – lossy file format

------------------------------------------------------------------------------------------------------------------------------

Well without the Quantization part of the jpeg file format, the image would be restored to its original glory…but at the loss of a less compressed file. We usually extract the quantization table from the jpeg file, and is simply a 64 array of integers. When generating the jpeg we divide the 64 array by our image values… so that certain higher frequency parts of the image are lost (e.g. if we divide 10 by 100 for example this will be rounded off to zero).

For decoding of the image, we simply multiply each component by the value in the 64 image array.

void dequantize_block( int block[64], int quant_block[64] )

{

for( int c=0; c<64; c++)

{

block[c] = block[c] * quant_block[c];

}

------------------------------------------------------------------------------------------------------------------------------

Huffman Tables and codes.

------------------------------------------------------------------------------------------------------------------------------

When the jpeg file format is created, the actual codes that you need to decode the image from the data is not stored in the jpeg file. What is stored is the lengths and there values to be… from this you must generate the codes for each one yourself.

This is what you get from a jpg file:

List of 16 int which represent length in bits and how many there are…

No Lengths: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Its difficult to understand, but for a typical jpeg file you would have:

No Lengths: 0, 0, 3, 0, 4, 0 , 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

What this tells us is that we have 3 codes of length 3, and 4 codes of length 5… a total of 7 codes which would follow next.

So you have the code and you have the length but how do you generate the codes?… and what do these codes look like?

Here’s an example of some data that we get form the file and what we generate:

|----------get from the jpg file----------------| |--------generate with our algorithm---------|

value length in bits code (hex) code(binary)

0 2 0 00

1 3 2 010

2 3 3 011

3 3 4 100

4 3 5 101

5 3 6 110

6 4 e 1110

7 5 1e 11110

8 6 3e 111110

9 7 7e 1111110

a 8 fe 11111110

b 9 1fe 111111110

Those are typical values you could get for a DC Table in a jpeg file.. and in fact are one’s I actually took from a simple jpeg file.

Generating code!! … it’s a lot harder than it looks.. its simply this, you pass in the number of values you have, and there lengths and you’ll get there corresponding codes… this function does exactly that.

struct stBlock

{

int value;

int length // in bits.

unsigned short int code; // 2 byte code

};

void gen_huf_codes( int num_codes, stBlock* array

{

int hufcounter = 0;

int codelengthcounter = 1;

for(int cc=0; cc< num_codes; cc++)

{

if( array[cc].length = = codelengthcounter )

{

array[cc].code = hufcounter;

hufcounter++;

}

else

{

hufcounter = ( hufcounter << 1 );

codelengthcounter ++;

}

for example, if num_codes is 24, then array would be stBlock array[24];… e.g. an array of 24 elements. The above function would be passed the array of stBlock elements then it would generate the equivalent Huffman code for each value.

------------------------------------------------------------------------------------------------------------------------------

Software Solutions!

------------------------------------------------------------------------------------------------------------------------------

Well you may be thinking that if we are going to read or write from a file 1 “bit” at a time … how are we to do this? And all how do we go about checking for values etc? To keep your mind occupied and give you some relatively simple examples of how you would go about getting your data etc, and some other useful functions that you may use.

During various stages of reading the jpg file you’ll either need to read a word (2 bytes) a byte (8 bits) or 1 bit … so that its nice and simple… I put together 3 functions that do each of these

unsigned short int get_word(FILE* fp)

{

unsigned char byte_buf[2];

fread( &byte_buf, 2, 1, fp);

return ( (byte_buf[0] << 8) | ( byte_buf[1] >> 1) );

}

Remember that where reading the data in using big-endian…which means that bytes are arranged the opposite away around compared to the little-endian way that the intel uses. So we have to swap our values around… for example instead of having 0x1155 we’d shift that around so it would be 0x5511.

unsigned char get_byte(FILE* fp)

{

unsigned char byte_buf;

fread(&byte_buf, 1, 1, fp);

return byte_buf;

}

Well its simple enough you could skip this function if you want, but I think its nice and tidy.

unsigned int current_bit = 0;

unsigned char g_byte;

unsigned char get_bit(FILE* fp)

{

unsigned char byte_len = 8;

unsigned char mask;

if( current_bit == 0 )

{

current_byte = get_c(fp);

if( current_byte = = 0xff )

{

unsigned char byte_buf = get_c(f);

if( byte_buf = = 0 )

// do nothing and just carry on

else

// if we get here an error has occurred somewhere onlong the line?

}

mask = 1 << (byte_len – current_bit – 1);

current_bit ++;

if( current_bit = = byte_len ) current_bit = 0;

if( (current_byte & mask) = = mask )

return 1;

else

return 0;

}

Very important to remember that we have the global variable which must hold our data, as we are still reading in 1 byte at a time but using a binary mask, so we are in fact checking each bit one at a time as we shift the mask along.

Something you’ll defenetly need to do, after you usually decode your Huffman value from your data bits that you’ve read in, it tells you how many bits to read in after that, so we need a function which will read in “n” number of bits.

int get_num_bits(FILE* fp, int length)

{

char bit;

unsigned int number=0;

unsigned int mask;

mask = 1 << (length – 1);

for( int u = 0; u<length; u++)

{

bit = get_bit(fp); // using one of the functions I defined just above.

if( bit = = 1)

{

number |=1;

};

mask = mask >> 1;

}

if( number < 1 <<(length-1) )

/*negative number*/

return number + ( (-1) length) + 1;

else

/*positive number*/

return number;

}

And my mind has gone blank!…lol… well I think that covers most of the main parts of decoding a jpeg file…. It should be pretty easy to see how the pieces go together once you have analysed the data that you’re getting from the various jpeg markers.

The best way to go around decoding a jpeg is to set up a switch statement, so that you read in a section marker, e.g. (0xffc4 – DHT – Define Huffman Table) then you would read in that data and put it into an array structure which you can use later on when you need it (e.g. in the processing of the Start Of Scan section).

The reason I put the various parts into nice little functions above is so that you can see how to go about decoding the jpg format and also when you make your first jpg decoder its good to actually view the various sections and see what data they produce.

Sequence of events you may follow in decoding a jpeg file:

[1]. First the DC coefficient decode:

(a). Fetch a valid hufman code (you check if it exists in the Huffman dc table).

(b). See at what category this hufman code corresponds (e.g. positive or negative etc).

(c). Fetch N bits from the file, and determine what value is represented.

(d). DC += Difference.

(e). Write DC into the 64 array vector….. (e.g. array[0] = DC ).

[2]. The 63 other AC coefficients to decode next:

(a). Fetch a valid Huffman code (check it in the AC Huffman table).

(b). Decode that Huffman code : The Huffman code corresponds to ( 4 upper bits, 4 lower bits ) which means, the 4 upper bits means how many zero’s in the array, and the 4 lower bits is the num of bits to read next to get our value.

(c). Fetch N bits (which we found out in (b) ) and determine what value is represented by it.

(d). Write in the number of zero’s form the upper nibble (e.g. 4 bits we got from (b).)

(c). Increment the AC counter with number of zeros

(d). Write the AC value we got into the array in its correct position. (e.g. vector[21]=AC_value ).

+For the rest of the decoding parts+

[3]. De-quantize the 64 element array : “e.g. for (i=0; i<64; i++) array[i] *= quant_array[i];

[4]. Re-order from the zig-zag, the 64 elements.

[5]. Convert from a 64 element array to a 8x8 array.

[6]. Apply the Inverse DCT to our 8x8 block.

[7]. Level shift (e.g. add 128 to each block element).

[8]. Transform YcbCr to the RGB

Wow.. we have decoded a jpg image file!

Well it’s a lot of work but once you’ve gone over all the details once or twice and looked at a few of the coding examples you’ll soon get the gist of it.

Notes:

EOB – End Of Block – [ 4 upper bits, 4 lower bits] = [0, 0]… which means all the rest of the elements in the 64 array are zero.

Sampling factor, you may have different values of Y, Cb and Cr for each image section, for example… the top left corner of the image could have different horiz and vert frequencies for the Y, Cb, and Cr which would mean that you might need 4 Y blocks and 1 Cb block and 1 Cr block…?

How does that work ? Well if Y has a horiz and vert sampling freq of 2, then it means each “block” is four 8x8 arrays… now if the corresponding Cb and Cr would then have frequencies of 1, it would mean that for each Cb and Cr they would have to be stretched to fit across each Y… so in effect where stretching our (8x8) Cb and Cr across a 16x16 wide block.

Why do this? Well its all to do with the fact that the Y block contains all the information of our image, while the Cb and Cr contains the colour.. if we lost our colour components it wouldn’t’ be to bad as we’d just have a black and white image…we could still make out what it was in the image… but if we loose the Y block we would not be even able to tell what the image was.