Checking the Performance of FindArray
FindArray Example
Let’s create a program that shows how a sample C++ compiler generates code for a function
named FindArray. Later, we will write an assembly language version of the function, attempting
to write more efficient code than the C++ compiler. The following FindArray function (in C++)
searches for a single value in an array of long integers:
bool FindArray( long searchVal, long array[], long count ) { for(int i = 0; i < count; i++) { if( array[i] == searchVal ) return true; } return false; }
Linking MASM to Visual C++
Let’s create a hand-optimized assembly language version of FindArray, named AsmFindArray.
A few basic principles are applied to the code optimization:
• Move as much processing out of the loop as possible.
• Move stack parameters and local variables to registers.
• Take advantage of specialized string/array processing instructions (in this case, SCASD).
We will use Microsoft Visual C++ (Visual Studio) to compile the calling C++ program and
Microsoft MASM to assemble the called procedure. Visual C++ generates 32-bit applications that
run only in protected mode. We choose Win32 Console as the target application type for the examples
shown here, although there is no reason why the same procedures would not work in ordinary
MS-Windows applications. In Visual C++, functions return 8-bit values in AL, 16-bit values in AX,
32-bit values in EAX, and 64-bit values in EDX:EAX. Larger data structures (structure values,
arrays, etc.) are stored in a static data location, and a pointer to the data is returned in EAX.
Our assembly language code is slightly more readable than the code generated by the C++
compiler because we can use meaningful label names and define constants that simplify the use
of stack parameters. Here is the complete module listing:
TITLE AsmFindArray Procedure (AsmFindArray.asm) .586 .model flat,C AsmFindArray PROTO, srchVal:DWORD, arrayPtr:PTR DWORD, count:DWORD .code ;----------------------------------------------- AsmFindArray PROC USES edi, srchVal:DWORD, arrayPtr:PTR DWORD, count:DWORD ; ; Performs a linear search for a 32-bit integer ; in an array of integers. Returns a boolean ; value in AL indicating if the integer was found. ;----------------------------------------------- true = 1 false = 0 mov eax,srchVal ; search value mov ecx,count ; number of items mov edi,arrayPtr ; pointer to array repne scasd ; do the search jz returnTrue ; ZF = 1 if found returnFalse: mov al,false jmp short exit returnTrue: mov al, true exit: ret AsmFindArray ENDP END
Checking the Performance of FindArray
Test Program It is interesting to check the performance of any assembly language code
you write against similar code written in C++. To that end, the following C++ test program
inputs a search value and gets the system time before and after executing a loop that calls
FindArray one million times. The same test is performed on AsmFindArray. Here is a listing
of the findarr.h header file, with function prototypes for the assembly language procedure and
the C++ function:
// findarr.h extern "C" { bool AsmFindArray( long n, long array[], long count ); // Assembly language version bool FindArray( long n, long array[], long count ); // C++ version }
Main C++ Module Here is a listing of main.cpp, the startup program that calls FindArray and
AsmFindArray:
// main.cpp - Testing FindArray and AsmFindArray. #include <iostream> #include <time.h> #include "findarr.h" using namespace std; int main() { // Fill an array with pseudorandom integers. const unsigned ARRAY_SIZE = 10000; const unsigned LOOP_SIZE = 1000000; long array[ARRAY_SIZE]; for(unsigned i = 0; i < ARRAY_SIZE; i++) array[i] = rand(); long searchVal; time_t startTime, endTime; cout << "Enter value to find: "; cin >> searchVal; cout << "Please wait. This will take between 10 and 30 seconds...\n"; // Test the C++ function: time( &startTime ); bool found = false; for( int n = 0; n < LOOP_SIZE; n++) found = FindArray( searchVal, array, ARRAY_SIZE ); time( &endTime ); cout << "Elapsed CPP time: " << long(endTime - startTime) << " seconds. Found = " << found << endl; // Test the Assembly language procedure: time( &startTime ); found = false; for( int n = 0; n < LOOP_SIZE; n++) found = AsmFindArray( searchVal, array, ARRAY_SIZE ); time( &endTime ); cout << "Elapsed ASM time: " << long(endTime - startTime) << " seconds. Found = " << found << endl; return 0; }
Assembly Code versus Nonoptimized C++ Code We compiled the C++ program to a
Release (non-debug) target with code optimization turned off. Here is the output, showing the
worst case (value not found):
Assembly Code versus Compiler Optimization Next, we set the compiler to optimize the
executable program for speed and ran the test program again. Here are the results, showing the
assembly code is noticeably faster than the compiler-optimized C++ code:
Pointers versus Subscripts
Programmers using older C compilers observed that processing arrays with pointers was more efficient
than using subscripts. For example, the following version of FindArray uses this approach:
bool FindArray( long searchVal, long array[], long count ) { long * p = array; for(int i = 0; i < count; i++, p++) if( searchVal == *p ) return true; return false; }
Running this version of FindArray through the Visual C++ compiler produced virtually the
same assembly language code as the earlier version using subscripts. Because modern compilers
are good at code optimization, using a pointer variable is no more efficient than using a subscript.
Here is the loop from the FindArray target code that was produced by the C++ compiler:
$L176: cmp esi, DWORD PTR [ecx] je SHORT $L184 inc eax add ecx, 4 cmp eax, edx jl SHORT $L176
Your time would be well spent studying the output produced by a C++ compiler to learn about
optimization techniques, parameter passing, and object code implementation. In fact, many computer
science students take a compiler-writing course that includes such topics. It is also important to
realize that compilers take the general case because they usually have no specific knowledge about
individual applications or installed hardware. Some compilers provide specialized optimization for a
particular processor such as the Pentium, which can significantly improve the speed of compiled
programs. Hand-coded assembly language can take advantage of string primitive instructions, as
well as specialized hardware features of video cards, sound cards, and data acquisition boards.