xbdev - software development
Wednesday February 21, 2024
home | contact | Support

Optimising Your String Searching

by bkenwright@xbdev.net


Lets say you have a string....a big string... a text file for example, and want to search for a sub string....well a single example of how to do this would be:


#include <string.h>


void main()


      char textfile[] = "no thing as vaguenono as something.";


      char * place = strstr( textfile, "nono" );


      // place should point to "nono as something."; as its the start

      // of the first occurence of "nono"

      }// End main()


Its only a simple example.....but you get the idea :)  We are going to look at one method of optimising our search string routine...building a more optimised version.  It is based on a method developed by 'Boyer and Moore' and also in a number of optimisation books/sites.

One of the best recommendations of books is: C++ Foodtprint and Performance Optimisation by Rene Alexander.


The principle is simple - instead of going char by char and checking for a string search...we can jump ahead a calculated number of characters depending on the string where searching for.


Here is the code to take a loosky at:


Code: main.cpp


 *  Simple String Find Optimisation


 *  Saying: Understanding!



#include <stdio.h>

#include <string.h>



 *  A basic plateform independent C search sub string algorithm

 *  referenced from the linux 2.2.7 source code for

 *  demonstration purposes here


 *  my_strstr(..) and not strstr(..)

 *  Well I called it my_strstr(..) so that if you are using #includes(..)

 *  and its already defined ...might give errors...so this way we know

 *  its ours :)  As strstr(..) is also defined in string.h



char * my_strstr(const char * s1,const char * s2)


      int l1, l2;


      l2 = (int)strlen(s2);

      if (!l2)

            return (char *) s1;

      l1 = (int)strlen(s1);

      while (l1 >= l2) {


            if (!memcmp(s1,s2,l2))

                  return (char *) s1;



      return NULL;

}// End strstr(..)




 *  Our new set of functions for searching for our sub string.




static void quick_strstr_init(const char *str, unsigned char * skips )


      unsigned char search_len = strlen(str);


      search_len = (unsigned char)strlen(str);

      int search_len2 = search_len + 1;     // The char past the end of the string


      // For chars no in searchstr.

      for(int i=0; i<256; i++)

            skips[i] = search_len2;            // search_len + 1

                                           // We didn't use (search_len+1) here

                                             // in the loop, as where talking optimisation

                                             // here, and we keep as much as we can out

                                             // of the loop


      // For chars in serachstr, with double chars only

      // right most survives

      for( i=0; i<search_len ; i++)

            skips[str[i]] = search_len - i;   // pos back to front

                                             // We have a count down!...e.g

                                             // our string = "cat"

                                             // skip['c'] = 3

                                             // skip['a'] = 2

                                               // skip['t'] = 1

      // If we have multiple occurences of a letter, we use the smallest

      // skip value.  Don't want to mess up or optimisation with multiple

      // char offsets


}//End init



char * quick_strstr_search(const char *textf, char *searchstr, unsigned char * skips)


      int search_len      = (int)strlen(searchstr);      // Len of our sub string


      int len             = (int)strlen(textf);          // Length of text file

                                                         // where searching

      unsigned char * end = (unsigned char*)textf + len; // Pointer to the end of

                                                         // the text file.


      int len_searchstr   = search_len - 1;              // Sub search string minus

                                                         // 1, which points to the

                                                         // last char, and not the '\0'


      int j;                                             // Variable used in our loop


      // for(;;) generates slightly more otimisation than while(true)



            // main loop for comparing string

            j=len_searchstr;                                // len of search string

            while( textf[j] == searchstr[j] && --j >= 0);   // MAIN Search Loop


            // At this point we either have or have-not found our string after

            // searching the length of the sub string, so how far along do we move

            // in our main string where searching

            // j==-1 if we have

            // j!=-1 if we've not

            if( j!= -1 )


                  // Mismatch; align and check for end of textf

                  unsigned char * skipindex = (unsigned char*)textf + search_len;


                  if( skipindex >= end )

                        return NULL; // string not in textf


                  textf += skips[*skipindex];




                  //Match found

                  return (char*)textf;

            }//End if/else

      }//End for loop



      return NULL; // No sub string found


}//End search(..)



// This is the function we call.  And does all the work on initialising the

// skip array and doing calling the search function.  You could put it

// all into one function - but I thought it would be better to break it

// up like this - so you could see how it works more easily.

char * quick_strstr(char * szstr, char * szsubstr)


      // Quick Error Check at the start


            (szstr == NULL) ||

            (szsubstr==NULL) ||

            (strlen(szsubstr) > strlen(szstr))

         ) return NULL;


      // Do our magic search

      unsigned char skips[256];


      quick_strstr_init(szsubstr, skips);


      char * place = quick_strstr_search( szstr, szsubstr, skips);


      return place;

}// End quick_strstr(..)




void main()




      char textfile[] = "no thing as vaguenono as something.";


      char * place = quick_strstr( textfile, "nono" );

      if( place != NULL )



      printf("\nEnter to cont");



}//End main()




Where basically doing what we would do with strstr(..)...and search char by char....but we have added an if/else in the search code...which determines how far ahead we can jump depending on which char difference was found.




Advert (Support Website)

Copyright (c) 2002-2024 xbdev.net - All rights reserved.
Designated articles, tutorials and software are the property of their respective owners.