Nov 29
2009

Assembler to C

Posted by Walter Bright in Assembler Programming

Back in the bad old DOS days, to have a large, capable application meant writing it in assembler because:

1. hand written assembler was often more than twice as fast as C
2. with very tight memory, the smaller the app the more room there was for data
3. compilers in the early days were rather poor

Early successful apps like Lotus 1-2-3 (and of course, DOS itself) were written in assembler.

But as things got better (especially with the arrival of protected mode), the advantages of assembler started getting outweighed by its disadvantages, i.e. the high cost of developing in it. One was faced with a choice - rewrite the app from scratch, or translate it from assembler to C. Translating had the huge advantage that one didn't lose all the "lore" about how things need to work that gets embedded in the code.

I had the job at the time of converting a huge (and very successful) electronic schematic editor, DASH, from assembler into C. In C it could then be recompiled for 32 bits, and even ported to other platforms like the Sun workstations. The conversion took months, but was a big success.

Today, I am faced with the linker, Optlink. It was originally written in 16 bit assembler, and was also very successful. It survived the transition to 32 bits by being recoded, line by line, by its original author Steve Russell. The linker ran like lightning, and still does. No other linker remotely comes close. Every line of it is heavily optimized, and even scheduled. It's a marvel of assembler coding skill.

But being in assembler means it's stuck in the past: inflexible, unfixable, unportable. The time has come to do something about it.

It's sort of like keeping a P-51 fighter in flying condition these days - the engineers who designed it are all gone, the manufacturer no longer makes any parts for it, the drawings are gone, and the people who built them are gone. Pretty much, you're all on your own, figuring things out, machining any parts needed, etc. Of course, it's all worth it because a flying P-51 is a pretty sweet machine!

I figure the best thing to do is to translate it to C, as linkers have an awful lot of "lore" in them about how things work in the real world rather than how they are documented. I estimated that relearning that lore would take a huge amount of time, and would take an unacceptable toll on the users while they struggled with outlying problems. (Although few will be called upon to convert assembler to C, converting from one language to another often comes up and many of the issues are the same.)

Why use C instead of the D programming language? Certainly, D is usable for such low level coding and, when programming at this level, there isn't a practical difference between the two. The problem is that the system to build Optlink uses some old tools that only work with an old version of the object file format. The D compiler uses newer obj format features, while the C compiler still uses the old ones. It was just easier to use the C compiler rather than modify the D one. Once the source is all in C, it will be trivial to shift it over to D and the modern tools.

After a few false starts, I remembered the lessons from converting DASH, which are:

1. Convert a small slice at a time, then build and test. If it breaks, you can substitute back in the old asm code and then figure out where you went wrong, as the problem area is small.

2. Don't try and redesign or fix anything during this process. It's terribly hard sometimes to resist this urge, but it always ends up badly if you succumb. It becomes a hunk o' junk that doesn't work and you've no idea why.

3. Doing it a bit at a time means you always have a working version of the app. This is fairly motivating, rather than having a long desert of nothing working and no idea if you've committed some obscure and fundamental error.

4. While not fixing the design in the process, it is very helpful to add in comments as you figure things out, for the next guy who has to translate it from C onwards!

Problem areas:

1. Optlink did not link with any other library, not even the C standard library. So by using C, it had to be "naked" C that used nothing from the library, not even the startup code. This meant I had no printf, which turned out to be so unbearable I grabbed the printf source from the C library and hacked it up to work in Optlink (Optlink didn't have FILE or stdout).

2. The C compiler prepends _ to global identifiers, so they wouldn't match the Optlink ones. The easiest solution was to hack the C compiler to stop doing that.

3. Optlink used specially named sections. Fortunately, the C compiler had a switch to name the code sections to match.

4. Function calling sequence. Optlink uses about every function calling sequence imaginable, from registers to stack to even the flags register. Converting any function to C first required figuring out what its (usually undocumented) calling convention was, and what registers it used/destroyed, and recoding it to use the C convention.

5. It was usually easiest to use local variable names with the same name as the register they replace. So the conversion has a lot of locals with names like EAX, ESI, etc. This makes it easy to compare the two sets of sources. Once the C version is verified, more useful names can be used.

6. Gotos. Of course, assembler has no structured statements. Functions are a rat's nest of goto's. The easy way is to just leave the goto's intact in the C source. After it all works, then see about replacing them with structured control flow.

7. Accurately typing variables. In particular, C will helpfully multiply the offset to a pointer by the size of the type pointed to. In asm, this is done manually. So one has to figure out the type pointed to, divide the offset by that value, and put that in the C source. This is very easy to get wrong, so I like to verify by compiling then disassembling and comparing the instructions. That's easier than debugging random crashes.

8. Macros and conditional assembly. It's been a looong time since I've used MASM, and how that all works has gone out of my head. It's easier to cut through the chaff and just disassemble the output of the assembler to see what the real instructions are. This also helps to verify that you've got the offsets of struct members the same as in the assembler.

9. Optlink has no unit tests. Writing them would require understanding what the various functions do, and I won't know precisely what they are supposed to do until most of the program is converted.
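
Items 5 through 7 are easier to see in a small example. What follows is a minimal sketch, not Optlink code: the asm fragment in the comment is made up for illustration, and the name sum_table and its signature are hypothetical. The locals keep the register names, the goto's stay, and the byte offset in EDX is divided by the element size before it is used as a C index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical asm being translated (not from Optlink):
 *
 *   sum_table:              ; ESI -> table of dwords, ECX = count
 *           xor  EAX,EAX    ; running sum, returned in EAX
 *           xor  EDX,EDX    ; EDX = byte offset into the table
 *   L1:     test ECX,ECX
 *           jz   L2
 *           add  EAX,[ESI+EDX]
 *           add  EDX,4
 *           dec  ECX
 *           jmp  L1
 *   L2:     ret
 */
uint32_t sum_table(const uint32_t *ESI, uint32_t ECX)
{
    uint32_t EAX = 0;   /* locals named after the registers they replace */
    uint32_t EDX = 0;   /* still a byte offset, exactly as in the asm */

L1:
    if (ECX == 0)
        goto L2;

    /* asm: add EAX,[ESI+EDX]
     * C scales an index by sizeof(*ESI), so the byte offset must be
     * divided by the element size. The assert guards the assumption
     * that the offset is a whole number of elements. */
    assert(EDX % sizeof(uint32_t) == 0);
    EAX += ESI[EDX / sizeof(uint32_t)];

    EDX += 4;
    ECX--;
    goto L1;

L2:
    return EAX;
}
```

Called as sum_table(table, 4) on a table holding 1, 2, 3, 4, this gives 10. Writing ESI[EDX] instead would compile without complaint and quietly read every fourth dword, which is the kind of mistake that comparing the disassembly against the original catches quickly.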

Once it is all in C, then the old build system can be scrapped, the special compiler switches and hacks removed, it can be hooked up to the C standard library, and then built as a real C app.

Then, it can be easily converted to a D app and the possibilities are unlimited.

Conclusion

Most people would consider this an exercise in self-flagellation, but I enjoy doing it.

I don't know yet how this is all going to work out, as I am only part way through the process. It'll take a long time. But I'm looking forward to freeing Optlink from its heritage so it will be useful for another 10 years.

Early results, though, are that the C source code is about 30% smaller than the corresponding ASM code, and the C object code is about 30% larger than the ASM object code. That's in line with historical experience I've had working with C and assembler.

The difference in object code size is primarily due to the assembler having done register assignments that cross over multiple function calls. There's just a lot less pushing and popping of parameters. There are also a lot of functions with multiple entry points. I don't know of any current compiler that is able to enregister across a flow graph of function calls, or is able to tail merge multiple functions.
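
To make the multiple entry point case concrete, here is a rough sketch under stated assumptions. The struct outbuf, the functions put_byte and reset_and_put, and the asm in the comment are all hypothetical, invented for illustration and not taken from Optlink. In the asm, the second entry point simply falls through into the first with EDI and AL still sitting in their registers. In C, each entry point becomes its own function, and the wrapper has to pass the same values along again.

```c
#include <stddef.h>

/* Hypothetical output buffer; not an Optlink structure. */
struct outbuf {
    char   data[64];
    size_t count;
};

/* Hypothetical asm with two entry points into one body:
 *
 *   reset_and_put:              ; EDI -> outbuf, AL = byte to store
 *           mov  [EDI].count,0  ; extra work for this entry point only
 *                               ; ...then falls through, no call needed
 *   put_byte:
 *           ...store AL at data[count], bump count...
 *           ret
 */
void put_byte(struct outbuf *EDI, char AL)
{
    if (EDI->count < sizeof EDI->data)
        EDI->data[EDI->count++] = AL;
}

/* C functions cannot fall through into one another, so the second
 * entry point turns into a wrapper that hands both arguments to
 * put_byte a second time. */
void reset_and_put(struct outbuf *EDI, char AL)
{
    EDI->count = 0;
    put_byte(EDI, AL);
}
```

How much of that overhead survives depends on the compiler and its settings; the sketch only shows where the extra parameter passing comes from.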

Although I have not run any speed tests, I expect the performance of the non-I/O bound code to be about 30% slower. Since a linker tends to be I/O bound, the actual performance loss probably will be about 10%, which I can live with.

Thanks to David Held and Bartosz Milewski for reviewing this.



Comments (1)
written by Jack Woehr, November 30, 2009
Most people would consider this an exercise in self-flagellation, but I enjoy doing it.

Worthy of Steven Wright.
