Wednesday July 2, 2025

Tutorials - Tips - Secrets - for C and C++..

It does all the hard work for us..

Optimising Your C

by bkenwright@xbdev.net

C/C++ Code Optimization Techniques

Everyone learning C or C++ should know the basics of code optimization. No matter how fast your CPU gets, you always want to squeeze more performance out of it. Let's go through some common optimization techniques that are useful to know and might improve your code.

Lightning Fast Multiplication and Division

Someone once asked me, "How can you improve this line of code?"

```c
int i = 2;
i = i * 4;
```

You might think it's already optimal, but binary arithmetic comes to the rescue! A binary shift left (`<<`) or right (`>>`) is typically cheaper for the CPU than an arithmetic multiplication or division.

- Shifting left (`<<`) by n multiplies by 2^n
- Shifting right (`>>`) by n divides by 2^n, rounding down (use unsigned or non-negative values; shifting negative signed values is not portable)

Powers of 2:

2, 4, 8, 16, 32, 64, 128, 256, ...

So our example becomes:

```c
i = i << 2;  // Equivalent to i * 4
```
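A minimal sketch of the same idea as reusable helpers. The names `mul_pow2` and `div_pow2` are hypothetical, and the sketch uses `unsigned` because that is where the shift and the arithmetic operator are guaranteed to agree.

```c
/* Hypothetical helpers: multiply or divide an unsigned value by 2^n
   using shifts. n must be smaller than the bit width of unsigned int. */
static unsigned mul_pow2(unsigned x, unsigned n)
{
    return x << n;   /* x * 2^n (wraps modulo UINT_MAX + 1, just like x * 4u) */
}

static unsigned div_pow2(unsigned x, unsigned n)
{
    return x >> n;   /* x / 2^n, rounded down */
}
```

For example, `mul_pow2(2, 2)` returns 8, the same as `2 * 4`, and `div_pow2(29, 2)` returns 7, the same as `29 / 4` in integer arithmetic.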

Real-world Example

In DOS or console game programming (GBA, XBOX), you might access video memory directly as an array of pixels. To set a pixel at (x, y) on a screen whose rows are 480 pixels wide:

```c
screen[y * 480 + x] = colour;
```

480 isn't a power of 2, but we can break it down:

```
512 - 32 = 480
y * (512 - 32) = y * 512 - y * 32
```

Using our optimization:

```c
screen[(y << 9) - (y << 5) + x] = colour;
```

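Two shifts and a subtraction were much cheaper than a multiply on older CPUs. On modern processors the gap is small, and an optimizing compiler picks between a multiply and a shift sequence by itself when the width is a compile-time constant, so it is worth measuring before hand-tuning.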
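A small sketch that wraps the shift form in a function and can be checked against the plain multiply. `pixel_offset` is a hypothetical name, and a linear framebuffer with 480-pixel rows is assumed, as in the text.

```c
#include <stddef.h>

/* Hypothetical helper: offset of pixel (x, y) in a linear framebuffer
   whose rows are 480 pixels wide. 480 = 512 - 32 = 2^9 - 2^5. */
static size_t pixel_offset(size_t x, size_t y)
{
    return (y << 9) - (y << 5) + x;   /* y * 512 - y * 32 + x == y * 480 + x */
}
```

For instance, `pixel_offset(7, 100)` is 48007, the same as `100 * 480 + 7`.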

Loop Optimization

Avoid Unnecessary Operations Inside Loops

Bad example:

```c
// Bad loop example
for (i = 0; i < 10; i++) {
    int j = 2;  // Declaration inside loop
    aa[i] = j;
}
```

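Better example, with the loop-invariant value set up once before the loop (for something as cheap as `int j = 2`, an optimizing compiler generates the same code either way; the habit pays off when the invariant work is expensive):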

```c
// Better loop example
int j = 2;
for (i = 0; i < 10; i++) {
    aa[i] = j;
}
```
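A costlier loop-invariant operation makes the difference easier to see. In this sketch (the function names are hypothetical), `strlen` in the loop condition is re-evaluated on every iteration, while the second version computes the length once before the loop.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Calls strlen() on every iteration: O(n^2) work for an n-character
   string, because the loop modifies s and the compiler usually cannot
   prove the length stays the same. */
static void upcase_slow(char *s)
{
    for (size_t i = 0; i < strlen(s); i++)
        s[i] = (char)toupper((unsigned char)s[i]);
}

/* Hoists the loop-invariant length out of the loop: strlen() runs once. */
static void upcase_fast(char *s)
{
    size_t len = strlen(s);
    for (size_t i = 0; i < len; i++)
        s[i] = (char)toupper((unsigned char)s[i]);
}
```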

Loop Unrolling

Original loop:

```c
int indx = 0;
for (i = 0; i < 40; i++) {
    aa[indx++].value = true;
}
```

Unrolled version:

```c
int indx = 0;
for (i = 0; i < 40; i += 2) {  // 40 is even, so no iteration is left over
    aa[indx++].value = true;
    aa[indx++].value = true;
}
```

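This can be faster because the loop condition is checked half as often. The unrolled loop only does the same work as the original when the iteration count is a multiple of the unroll factor; otherwise the leftover iterations must be handled separately.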
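A sketch of unrolling by 4 when the count is not known to be a multiple of 4. The `struct item` type and `set_all_unrolled` are hypothetical stand-ins for the `aa[...].value` array in the text; a short tail loop handles the leftover elements.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical element type matching the aa[...].value usage. */
struct item { bool value; };

/* Sets aa[0..n-1].value = true, unrolled by 4. The main loop handles
   groups of four; the tail loop handles the 0-3 elements left over
   when n is not a multiple of 4. */
static void set_all_unrolled(struct item *aa, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        aa[i].value = true;
        aa[i + 1].value = true;
        aa[i + 2].value = true;
        aa[i + 3].value = true;
    }
    for (; i < n; i++)          /* remainder */
        aa[i].value = true;
}
```

With n = 41, the main loop runs 10 times (40 elements) and the tail loop sets the last one.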

Loop Flipping

Original:

```c
int indx = 0;
for (i = 0; i < 80; i += 2) {
    aa[indx++].value = true;
    aa[indx++].value = true;
}
```

Flipped version:

```c
int indx = 0;
i = 0;
do {
    aa[indx++].value = true;
    aa[indx++].value = true;
    i += 2;
} while (i < 80);
```

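A `for` loop tests its condition before the first iteration, and a straightforwardly compiled one also needs an unconditional jump back to that test at the end of every iteration. The `do`/`while` form tests only at the bottom, so each iteration ends in a single conditional jump. This is only correct when the loop is known to run at least once, as it is here (0 < 80).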
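When the trip count might be zero, the flipped form needs one guard in front of it; this is roughly what compilers produce when they rotate a loop themselves. `struct item` and `set_all_flipped` are hypothetical names used only for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical element type for the aa array. */
struct item { bool value; };

/* Flipped (bottom-tested) loop for a count that might be zero.
   A single test up front replaces the top-of-loop check; inside the
   loop there is only the conditional jump at the bottom. */
static void set_all_flipped(struct item *aa, size_t n)
{
    if (n == 0)
        return;                 /* a do/while body always runs once */
    size_t i = 0;
    do {
        aa[i].value = true;
        i++;
    } while (i < n);
}
```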

Note: On a 486, an `ADD` takes 1 cycle while an `IMUL` takes 13-42 cycles! On modern x86 CPUs an integer multiply has a latency of only about 3 cycles, so the difference is far smaller.

Rounding Numbers the Fast Way

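To round a number down to a multiple of a power of 2, AND it with a bit mask that clears the low bits. For example, to round down to a multiple of 4:

```c
number & ~3  // ~3 = ...1111 1100 in binary: clears the two lowest bits
```

A literal mask such as `0xFC` (1111 1100) only works for 8-bit values, because it also clears every bit above bit 7.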
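A sketch extending the mask trick to rounding down, up, and to the nearest multiple. The function names are hypothetical, `m` is assumed to be a power of two, and unsigned arithmetic keeps every step well defined.

```c
/* Rounding to multiples of a power of two with masks.
   m must be a power of two (1, 2, 4, 8, ...). */
static unsigned round_down_pow2(unsigned x, unsigned m)
{
    return x & ~(m - 1);            /* clear the low bits */
}

static unsigned round_up_pow2(unsigned x, unsigned m)
{
    return (x + m - 1) & ~(m - 1);  /* may wrap if x is near UINT_MAX */
}

static unsigned round_nearest_pow2(unsigned x, unsigned m)
{
    return (x + m / 2) & ~(m - 1);  /* ties round up */
}
```

With m = 4, 13 rounds down to 12, up to 16, and to the nearest multiple 12. By contrast, the 8-bit mask gives `300 & 0xFC` = 44 rather than 300, because bit 8 is cleared.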

Fast Modulus Operation

For a modulus by a power of 2, applied to a non-negative (ideally unsigned) value:

```
28 % 8 = 4
```

Can be replaced with:

```
28 & (8 - 1) = 4  // 8 - 1 = 7 (binary 111) keeps the low three bits: 11100 & 00111 = 00100
```
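A minimal sketch of the masked modulus as a function. `mod_pow2` is a hypothetical name; the `assert` documents the assumption that the divisor is a power of two.

```c
#include <assert.h>

/* Hypothetical helper: x % m for a power-of-two m, using a mask.
   Only valid for unsigned (or non-negative) x. */
static unsigned mod_pow2(unsigned x, unsigned m)
{
    assert(m != 0 && (m & (m - 1)) == 0);   /* m must be a power of two */
    return x & (m - 1);
}
```

For a negative `int` the two forms differ: C's `%` truncates toward zero, so `-5 % 8` is `-5`, while `-5 & 7` is `3` on two's-complement machines.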

Using #define for Optimization

Small functions can be replaced with macros to avoid function call overhead:

```c
#define fixetoint(x) ((x) >> 8)  // 8.8 fixed-point to integer: drop the 8 fraction bits
```
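Assuming 8.8 fixed-point values (8 fractional bits, as a shift of 8 implies), a small set of conversion macros might look like this. All the names are hypothetical, and every argument is parenthesised so that expressions such as `INT_TO_FIXED(a + b)` expand correctly.

```c
/* Hypothetical 8.8 fixed-point helpers: the low 8 bits hold the fraction.
   Non-negative values are assumed; shifting negative values is not portable. */
#define FIX_SHIFT        8
#define INT_TO_FIXED(x)  ((x) << FIX_SHIFT)                 /* 3 -> 768 (3.0) */
#define FIXED_TO_INT(x)  ((x) >> FIX_SHIFT)                 /* drops the fraction */
#define FIXED_MUL(a, b)  (((long)(a) * (b)) >> FIX_SHIFT)   /* rescale the product */
```

For example, 1.5 is 384 in this format, and `FIXED_MUL(INT_TO_FIXED(3), 384)` gives 1152, which is 4.5.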

Register Variables

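Use the `register` keyword to suggest that the compiler keep a variable in a CPU register:

```c
register int i;
```

With older compilers, used wisely, this could give around a 10-15% speed improvement, but overuse can slow things down. Modern compilers do their own register allocation and generally ignore the hint; C++11 deprecated `register` and C++17 removed it as a storage class, though it is still valid C.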
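A minimal C sketch of `register` in use (`sum_ints` is a hypothetical function). It compiles only as C, since C++17 no longer accepts `register` as a storage class.

```c
/* Sums an array with the loop counter declared `register`.
   This is only a hint: the compiler may ignore it, and in C taking
   the address of a register variable (&i) is a compile error. */
static long sum_ints(const int *a, int n)
{
    register int i;
    long total = 0;
    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}
```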

Clever Tricks

Variable Swap Without Temporary Variable

Traditional method:

```c
int a = 2;
int b = 3;
int temp;

temp = a;
a = b;
b = temp;
```

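Version without a temporary variable, using XOR:

```c
#define SWAP(a, b) do { \
    (a) ^= (b);         \
    (b) ^= (a);         \
    (a) ^= (b);         \
} while (0)
```

The `do { ... } while (0)` wrapper makes the macro behave like a single statement (for example after an unbraced `if`). The trick fails if `a` and `b` are the same variable (the value becomes 0), and on modern CPUs it is usually no faster than the version with a temporary.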
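A function version of the XOR swap that guards against the aliasing case. `xor_swap` is a hypothetical name; it takes pointers so that it can detect when both arguments refer to the same object.

```c
/* XOR swap through pointers. The guard matters: if a and b point to the
   same object, the first XOR sets it to 0 and the value is lost. */
static void xor_swap(unsigned *a, unsigned *b)
{
    if (a == b)
        return;
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}
```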

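Alternative method using subtraction and addition:

```c
#define SWAP(a, b) do { \
    (a) = (a) - (b);    \
    (b) = (b) + (a);    \
    (a) = (b) - (a);    \
} while (0)
```

Use it with unsigned types: with signed `int`, the intermediate results can overflow, which is undefined behaviour in C.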
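A sketch showing why unsigned types make the subtraction swap safe: unsigned arithmetic wraps, so the swap still works at the extremes of the range. `sub_swap` is a hypothetical name.

```c
#include <limits.h>

/* Add/subtract swap on unsigned values. Unsigned arithmetic wraps modulo
   UINT_MAX + 1, so intermediate overflow is harmless; with signed int the
   same overflow would be undefined behaviour. */
static void sub_swap(unsigned *a, unsigned *b)
{
    if (a == b)
        return;                 /* same object: a - a would zero it */
    *a = *a - *b;
    *b = *b + *a;
    *a = *b - *a;
}
```

Swapping `UINT_MAX` and 1 works: the first subtraction gives `UINT_MAX - 1`, the addition restores `UINT_MAX` into `b`, and the last subtraction leaves 1 in `a`.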

Quick Check for Divisibility by 4

```c
unsigned int i = 60;
if ((i & 3) == 0) {
    // i is divisible by 4: its two lowest bits are both 0
}
```
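The same check generalises to any power of two 2^k: test whether the low k bits are all zero. `is_multiple_of_pow2` is a hypothetical helper, and k is assumed to be smaller than the bit width of `unsigned`.

```c
#include <stdbool.h>

/* Hypothetical helper: true if x is a multiple of 2^k.
   The low k bits of x are its remainder modulo 2^k. */
static bool is_multiple_of_pow2(unsigned x, unsigned k)
{
    return (x & ((1u << k) - 1)) == 0;
}
```

With k = 2 this is exactly the divisible-by-4 test: `is_multiple_of_pow2(60, 2)` is true and `is_multiple_of_pow2(50, 2)` is false.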

This works because the last two bits of a number hold its remainder after dividing by 4, so the number is divisible by 4 only when both are 0:

```
00 - 0
01 - 1
10 - 2
11 - 3
```

Examples:

```
5   = 0000 0101   // 5/4=1.25
8   = 0000 1000   // 8/4=2
50  = 0011 0010   // 50/4=12.5
60  = 0011 1100   // 60/4=15
```
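A small sketch for producing rows like the examples above, printing each value's low byte in binary together with its remainder modulo 4. `byte_to_binary` and `show_row` are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical helper: writes the low 8 bits of v as "dddd dddd"
   into out, which needs room for 10 chars including the terminator. */
static void byte_to_binary(unsigned v, char out[10])
{
    int pos = 0;
    for (int bit = 7; bit >= 0; bit--) {
        out[pos++] = (v >> bit) & 1u ? '1' : '0';
        if (bit == 4)
            out[pos++] = ' ';   /* separate the two nibbles */
    }
    out[pos] = '\0';
}

/* Prints the binary form and the remainder mod 4 (the low two bits). */
static void show_row(unsigned v)
{
    char bits[10];
    byte_to_binary(v, bits);
    printf("%-3u = %s  remainder %u\n", v, bits, v & 3u);
}
```

Calling `show_row` for 5, 8, 50 and 60 shows remainders 1, 0, 2 and 0, matching which of those values divide evenly by 4.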

Note: this is the modulus trick from earlier: for unsigned values, `i & 3` equals `i % 4`.

Another option is the `inline` keyword, which asks the compiler to expand a function in place at each call, avoiding the call overhead without the pitfalls of macros; however, the compiler still makes the final decision.
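As a sketch, here is an inline function in place of a swap macro. `swap_int` is a hypothetical name; `static inline` requires C99 or later.

```c
/* An inline function instead of a SWAP macro: each argument is evaluated
   once, the types are checked, and it works even when a and b point to
   the same object. The compiler may still decide not to inline it. */
static inline void swap_int(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}
```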
