www.xbdev.net
xbdev - software development
Tuesday September 17, 2024
Home | Contact | Support | Tutorials - Tips - Secrets - for C and C++.. It does all the hard work for us..
>>

Optimising Your C

by bkenwright@xbdev.net

 

Everyone when learning C or even C++, should know the basics of optimising your code... as no matter how fast your CPU gets, you always want to squeeze more and more out of it....  Make your CPU really perform for you.  So we'll go through some common optimisation techniques that would be useful to just know...or maybe add to some of your code...

 

 

Lightning Fast Multiplication and Divide

 

Now when someone first said to me, how can you improve this line of code:

int i=2;

i = i*4;

 

Looking at it you think, its i*4....its a single line...it can't possibly be improved any more...can it?  Well this is where your secret knowledge of binary arithmetic comes in handy, as a binary shift left (<<), or shift right (>>) uses less CPU time than an arithmetic multiplication.

Shifting left (<<) is multiplying by the power of 2, and shifting right (>>) is dividing by the power of 2....

 

Looking at the powers of 2..

2

4

8

16

32

64

128

258

....

 

So for our little example above, we have:

    i = i*4;

and we change it to:

    i = i<<2;

 

Shifting it once, is the same as multiplying it by 2, shifting it two bits, is the same as multiplying it by 4 etc.

 

Lets use a real world example... now in the days of dos, or in console game programming such as GBA or XBOX possibly, you can access memory directly, and that case memory is seen as a single array of pixels.  And to set a pixel at (x, y) you need to do this:

screen[y*480 + x] = colour;

 

But 480 isn't a power of 2?....Well this is where the magic comes in.... we can break 480 into 2 numbers,

 

512-32 = 480

y*(512-32) is the same as y*(480)

multiply out gives:

y*512 - y*32

 

And using our otimisation secret we get:

(y<<9) - (y<<5)

 

So putting it all together gives:

screen[(y<<9) - (y<<5) + x] = colour;

 

As remember binary shifts are extremely simple operations, and hence extremely fast...faster then the multiplication operation.

 

 

Loops....

Not inside the loop...

One of the first things you need to remember with loops, is try and not put things inside the loop that you don't have to.... a good and a bad example will help demonstrate this:

 

Bad example:

// bad loop example

for(i=0; i<10; i++)

{

      int j = 2; // putting declarations in the loop when you don't have to

      aa[i] = j;

}

 

Better example:

// better loop example

int j = 2;

for(i=0; i<10; i++)

{

      aa[i] = j;

}

 

 

Of course we could put aa[i]=2 etc in the loop...but I was just demonstrating, that you should try and not put declarations or anything in the loop if you can avoid it.

 

Loop unrolling..

Now taking this next loop as an example of how unrolling our code, can improve its speed

//simple loop

int indx=0;

for(i=0; i<=40; i++)

{

      aa[indx++].value = true;

}

 

Using the wonderful trick of unrolling, we can improve the speed of our code.

//unrolling the loop to improve speed

int indx=0;

for(i=0; i<=40; i+=2)

{

      aa[indx++].value = true;

      aa[indx++].value = true;

}

 

Why is this faster?... as each time around the loop we do a test to see if its finished, but with our improved code, where only doing it half as many times.  As the code would consist of a conditional jump and a jump condition.

 

 

Loop flipping..

At first this might seem a bit tricky to understand, but its to do with how the cpu does loops...

 

// using loop without-flipping

int idx=0;

for(i=0; i<80; i+=2)

{

      aa[indx++].value = true;

      aa[indx++].value = true;

}

 

// using loop with-flipping

int idx=0;

i=0;

do

{

      aa[indx++].value = true;

      aa[indx++].value = true;

      i+=2;

}while(i<80);

 

Flipping as we've done above, gets rid of the initial conditional jump at the very start of our loop (first time), and gets ride of one of those jump instructions from inside the loop. We only have one conditional jump instruction at the end.

 

With flipping and unrolling we've improved our execution speed!

 

NOTE: On a 486 upwards... an ADD costs one cycle, while an IMUL costs between 13 and 42 cycles!.  Thats worth noting :)

 

Further Loop Unrolling

 

[Under Construction]

 

 

Rounding Numbers the Fast Way

A nice example of where you may want to round a number, is when maybe you want to have your graphics snap to grid?..hmmm??

This speed boost, will only work with numbers that are a power of 2 as with most optimisation tricks... We do this by AND'ing our number with a bit mask.

 

An example is always the best way I think... If you wanted to round a number to the nearest value of 4, you AND the number with 0xFC.  In binary this looks like 1111 1100. 

 

Modulus %..speed up

Well this optimisation trick, only works on modulus numbers that are a power of 2, and is relatively easy to implement.

How it works, is you take the modulus number and subtract 1 from it, you then AND it with your number...

e.g.

28 % 8 = 4

Is the same as:

28 & (8-1) = 4

 

Again it might seem a bit worrying how this work, but if you look at it at the binary level...it soon starts to fall into place.  For example, 8-1 gives 7, and the binary of 7 is111.

So

28 -> 11100 binary

7   -> 00111 binary (e.g. 8-1)

      & 00100  -> 4

 

Using #define to optimise your code

This is a sweet optimisation trick, as each time you call a function its costing you a small amount of processor time...even if its small.  But if your calling this function from a loop, or the function is very small then why not use the #define statement.  This will put your code inline and uses the pre-processor.

example:

 

#define fixetoint(x) ((x) << 8 )

 

 

Register Variables ...eax...ebx...

There is a keyword in C called "register" which informs the compiler to try and have this variable use one of the internal CPU registers and not a a memory location.  And as you should know, nothings faster than the CPU registers...so you would do you code like this:

 

register int i;

 

But remember if you over use the register keyword it can also slow your code down!  But used wisely can give a 10-15% speed improvement :)

 

Sweet Tricks

Usually with experience, and of course time, you learn all sorts of sneaky tricks that you can put into your code.  This is where we go over some of these here.

Lets say you want to swap two variables.... int a and int b.  How would you go about this?  Well the traditional way would be:

int a = 2;

int b = 3;

int temp;

 

temp = a;

a = b;

b = temp;

 

But this requires the use of another register/memory location called temp... How would we do it without temp? Here is the answer:

 

#define SWAP(a, b) \

            a ^= b;  \

            b ^= a;  \

            a ^= b;

 

Where of course, ^ is exclusive OR, and a^=b would be the same as a=a^b;

 

And for those that are a bit sceptical, we'll do a demo with some simple binary :-

a = 11

b = 10

 

a ^= b   =>  a=01

b ^= a   =>  b=11

a ^= b   =>  a=10

 

After the operation:

 

a=10

b=11

 

Almost like magic :)

 

 

A very similar method, to the above one for swapping... is as shown:

#define SWAP(a, b) \

                x = x-y;  \     // (comment: would be the difference)

                y = y+x; \     // (comment: alter y depending on the difference)

                x = y-x;        // (comment: alter x by the difference to know get y)

Of course, I prefair the XOR method myself.... but I thought I'd give it a quick mention, as I remember seeing it once... and its always worth knowing :)

 

 

Quick Division By 4

Sometimes it important to know if a number is exactly divisible by 4...especially with 32 bit machines.... a quick way to do this, without resorting to modulus is to use a bit of binary arithmetic:

 

UINT i;

 

if( i & 3 ) // if false, then the number will divide exactly into 4

 

Lets look at the last 2 bits to see what they mean:

 

00 - 0

01 - 1

10 - 2

11 - 3

 

5   = 0000 0101   //   5/4=1.25

8   = 0000 1000   //   8/4=2

50 = 0011 0010   //   50/4=12.5

60 = 0011 1100   //   60/4=14

 

Note: I mention it here...but the fact of the matter, if you look at the simplification of the modulus operation (%) earlier...you'll notice that its the same.

 

 

[More To Be Added]

"inline" keyword

 

 

 

 

 

 

 

 

 

 

 

 
Advert (Support Website)

 
 Visitor:
Copyright (c) 2002-2024 xbdev.net - All rights reserved.
Designated articles, tutorials and software are the property of their respective owners.