ACM Student Chapter Newsletter

Welcome to the Student Chapter of the ACM
Processors - Finding the Right Speed for Your Needs
Advanced Programming Techniques: Using Inline Assembler
Internet Entrepreneurs

Advanced Programming Techniques: Using Inline Assembler

By Zev Wolman

What every true programmer desires most is to produce the fastest, most efficient program possible from his code. He does this both as a challenge to himself and to impress his colleagues (although he most often just succeeds in thoroughly confusing them). In general, the lower the level of the language he writes in, the more efficient and powerful he can make his final program. It follows, then, that learning a low-level programming language can greatly increase the devoted programmer's arsenal of techniques. The most efficient language besides binary machine code itself is assembly language, often simply called assembler. Although it is rarely used today as a primary programming language, it is still a great way to produce the most powerful code.

What makes assembly language distinct is that it gives every machine code instruction a short, human-readable name, called a mnemonic. Each mnemonic therefore stands for a single machine instruction, the smallest task the processor can be told to carry out (although many instructions take more than one clock cycle to execute). It seems that people are still much better than compilers at putting these minuscule instructions together efficiently (although compilers are slowly catching up). It is this ability to manipulate code at the machine's own level that gives assembler its efficiency, power, and speed compared to high-level languages.
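
To make the one-to-one mapping between mnemonics and machine instructions concrete, here is a minimal C sketch of a toy "assembler" that knows exactly three instructions. The byte values are the standard 8086 encodings (mov ah, 09h and int 21h also appear in the Hello World program in this article). The names toy_assemble and toy_table are hypothetical, and a real assembler would parse operands rather than look up whole strings.

```c
#include <stddef.h>
#include <string.h>

/* One table entry: an 8086 mnemonic and the machine-code bytes it stands for. */
struct encoding {
    const char   *mnemonic;
    unsigned char bytes[3];
    size_t        length;
};

/* Hypothetical three-entry table; real assemblers derive these from operands. */
static const struct encoding toy_table[] = {
    { "mov ah, 09h",   { 0xB4, 0x09 },       2 }, /* MOV r8, imm8: B0 + reg, AH = 4 */
    { "mov dx, 1234h", { 0xBA, 0x34, 0x12 }, 3 }, /* MOV r16, imm16: B8 + reg, DX = 2; immediate stored little-endian */
    { "int 21h",       { 0xCD, 0x21 },       2 }, /* INT imm8 */
};

/* Copies the encoding of `mnemonic` into `out` and returns its length in
   bytes, or returns 0 if the mnemonic is not in the table. */
size_t toy_assemble(const char *mnemonic, unsigned char out[3])
{
    size_t i;
    for (i = 0; i < sizeof toy_table / sizeof toy_table[0]; i++) {
        if (strcmp(toy_table[i].mnemonic, mnemonic) == 0) {
            memcpy(out, toy_table[i].bytes, toy_table[i].length);
            return toy_table[i].length;
        }
    }
    return 0;
}
```

For example, toy_assemble("int 21h", buf) fills buf with the two bytes CD 21 and returns 2.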

The downside of assembler is that it is a difficult language to write and is nearly impossible to understand without extensive commenting. Assembler programs, while smaller and faster, are prone to errors that are almost always difficult to trace. Plus, assembler is almost completely non-portable: code written for one processor family will not run on another. Once upon a time, when computer memory and speed were more than most could afford, it was necessary to code entire programs in assembler to get adequately powerful results. But thanks to the amazing power of Moore's Law, computers have become fast enough and capacious enough that most programmers don't consider efficiency at all when coding their programs. (This would explain why every time I scrape enough money together to buy a bigger hard drive, somebody comes out with a program that fills it completely.)

But advances in technology have not made assembler completely obsolete. Certain types of programs that still demand the most efficient code, such as device drivers, are still often written, at least in part, in assembler. Additionally, most compilers let you link your code with modules written in assembler, so you can write efficient assembler procedures while writing the rest of the program in the language of your choice. But best of all, many compilers even let you insert assembler instructions directly into your own code (using a built-in "inline assembler"). What does this mean for you? A common rule of thumb says that about 90% of a program's work is done by only 10% of its code, which means that rewriting just a small portion of your code in assembler can drastically improve the performance of your entire program. Many C and C++ compilers support inline assembler. This is one reason C and C++ are often called mid-level languages: they offer you the best features of both high- and low-level languages.

Of course, by now you must be wondering how you can do this yourself. Before we start, it is extremely important to stress that you must know what you are doing. Under DOS nothing stops a faulty instruction from crashing the machine or corrupting data, so jumping in without proper preparation can do real damage (not to mention to your wallet and your sanity). You should have a working knowledge of Intel's 80x86 architecture (registers, etc.) and at least a basic knowledge of standard assembly language (Microsoft Assembler, Turbo Assembler). In Borland/Turbo C++, each assembler instruction is preceded by the keyword asm. (I have also seen compilers that insist that you use "_asm", but the idea is the same.) You can also write several assembler instructions at once by following the asm keyword with a pair of braces and putting the instructions inside them. Otherwise, you can write assembler instructions much as you would for a standalone assembler. The book used in the methodology course offered by Touro College, Mastering Turbo Assembler by Tom Swan, second edition, provides a thorough tutorial on assembler and rudimentary but adequate information about writing inline assembler code.

The following short program demonstrates inline assembler by displaying a "Hello World" message on the screen.

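int main(void)
{
    /* DOS function 09h stops at the '$'; "\r\n" moves to the start of the next line */
    char *message = "Hello World\r\n$";

    /* DS:DX must point at the string (assumes a near-data memory model such as small) */
    asm mov DX, [message]
    /* AH selects the DOS service: 09h = display a $-terminated string */
    asm mov AH, 09
    /* call the DOS services interrupt */
    asm int 21h

    return 0;
}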

The program starts out like any other C++ program. A pointer to char called message is created using regular C/C++ notation. (The extra \r and $ are needed because our program prints through a DOS function rather than through the C library. DOS function 09h displays an ASCII$ string, a string terminated by a $ instead of a null character, and unlike the C library's text-mode output it does not turn \n into a carriage return plus line feed, so the string must supply its own \r.) The second statement, asm mov DX, [message], is an inline assembler instruction that copies the contents of message (the address of the string's first character) into the DX register. We can do this even though message is a C/C++ variable and not an assembler variable. The next instruction, asm mov AH, 09, puts the value 9 into the AH register. The last instruction, asm int 21h, triggers interrupt 21h, the entry point to the services provided by DOS. The value in AH selects function 9, which displays the characters at the address in DX (more precisely, DS:DX) up to the terminating $. Notice that no #include statements are necessary, since the operating system provides our output routine (interrupt 21h, function 09h). When I compiled this program with Borland C++ 3.1 for DOS, the resulting .exe file was only 6264 bytes. In comparison, a program I wrote that does the same thing using cout was almost 24,000 bytes (and presumably runs much slower as well).
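
DOS itself is not available on most machines today, so as a hedged illustration here is a portable C sketch that mimics what function 09h does with an ASCII$ string. It is not DOS's actual code, and print_dollar_string is a hypothetical name. Unlike the real DOS service, the sketch also stops at a '\0', so a string that is missing its $ cannot make it read past the end of the buffer.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper imitating DOS INT 21h function 09h in portable C.
   Characters are handed to fputc one at a time, unchanged, until the first
   '$', which ends the string and is not printed. The function adds no line
   endings of its own, so the string must carry its own "\r\n".
   Deliberate difference from DOS: it also stops at '\0' for safety.
   Returns the number of characters written. */
size_t print_dollar_string(const char *s, FILE *out)
{
    size_t n = 0;
    while (s[n] != '$' && s[n] != '\0') {
        fputc((unsigned char)s[n], out);
        n++;
    }
    return n;
}
```

For example, print_dollar_string("Hello World\r\n$", stdout) writes the 13 characters before the $ and returns 13.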

Inline assembler offers several advantages over writing standalone assembler code, aside from the power and versatility of the language itself. It lets you use regular C/C++ variables the same way you would use assembler variables, or, if you prefer, you can still use assembler-type variables. And because the assembler lives inside an ordinary C/C++ program, you can use software such as Borland Profiler to locate the portions of your program that most need improvement, and concentrate your assembler efforts on those sections.
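
For readers using a modern compiler instead of Borland C++, here is a minimal sketch of the same idea, C variables used directly by an assembler instruction, written in GCC/Clang extended inline assembly. This is a different syntax from Borland's asm keyword: C expressions are bound to operands through constraints such as "+r" and "r", and operands are written in AT&T order (source first). It assumes an x86 or x86-64 target; add_with_asm is a hypothetical name, and the plain C branch is a fallback for other compilers.

```c
/* Adds two C ints with a single x86 instruction.
   "+r"(a): keep the C variable a in a register the asm both reads and writes.
   "r"(b):  supply the C variable b in another register.
   "cc":    the instruction changes the flags register. */
int add_with_asm(int a, int b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc"); /* a += b */
    return a;
#else
    return a + b; /* fallback for other compilers or processors */
#endif
}
```

For example, add_with_asm(2, 3) returns 5, and the compiler decides which registers actually hold a and b.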

Beyond those already mentioned for regular assembler, the limitations of inline assembler include the inability to define assembler-style labels, but this can be remedied by using C/C++ labels instead. Additionally, you don't get the full efficiency of hand-written assembler, since the compiler automatically inserts several instructions of its own (the function's entry and exit code) before and after each function, but the effect this has on your code is minimal. Also, mixing C/C++ and assembler syntax and variables can be confusing at times, but this can be overcome through practice and a good reference manual.
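
Jumps inside inline assembler still need labels. In Borland C++ the remedy, as described above, is to jump to ordinary C labels. GCC and Clang's extended inline assembly offers a different mechanism: numeric local labels such as 1:, referenced as 1b (nearest earlier label 1) or 1f (nearest later one). Because such labels may be defined more than once, they avoid duplicate-symbol errors if the compiler copies the asm statement while optimizing. The sketch below assumes an x86 or x86-64 target; sum_to_n is a hypothetical name, and the C loop is a fallback for other compilers.

```c
/* Sums 1 + 2 + ... + n with a hand-written x86 loop whose jump target is
   the numeric local label "1:". Assumes GCC or Clang on x86/x86-64; other
   compilers use the plain C loop. Like C unsigned arithmetic, the sum
   wraps around modulo 2^32 for very large n. */
unsigned int sum_to_n(unsigned int n)
{
    unsigned int total = 0;
    if (n == 0)
        return 0; /* the asm loop below needs n >= 1 */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("1:\n\t"
            "addl %1, %0\n\t" /* total += n */
            "decl %1\n\t"     /* n -= 1; sets the zero flag when n reaches 0 */
            "jnz 1b"          /* jump back to label 1 while n != 0 */
            : "+r"(total), "+r"(n)
            :
            : "cc");
#else
    while (n != 0) {
        total += n;
        n--;
    }
#endif
    return total;
}
```

For example, sum_to_n(10) returns 55 and sum_to_n(100) returns 5050.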

As the example shows, inline assembler can make your programs much smaller and more efficient. However, many of the pitfalls of pure assembler programming, such as non-portability and difficult debugging, still apply here. For exactly those reasons, the technique is generally frowned upon in most programming situations. But that shouldn't stop the true programmer at heart from indulging in it for his own pleasure. Happy coding.
