Optimization: Your Worst Enemy

Optimization is your enemy

Some many years ago, as I indicated in the introduction, I worked on a large (16-processor) multiprocessor system. We were using custom-modified PDP-11 minicomputers, which were relatively slow. We were programming them in Bliss-11, which as far as I've been able to tell still holds the record for the world's tightest optimizing compiler (although I've seen some quite impressive optimizations in Microsoft C/C++). After doing some performance measurement, we determined that the paging algorithm was the outstanding performance bottleneck. So our first assumption was that the paging algorithm was faulty. We examined the code, and the paging algorithm maintainer rewrote it, taking our performance data into account, and had a new, and much faster, page management algorithm installed within a week.

Meanwhile, up at MIT, MULTICS was still running. They traced a serious performance problem to the paging system. Because it was written in a PL/1 like language, EPL, the assumption was that because it was written in a high-level language, the code was suboptimal, so they launched an effort to rewrite the page management algorithm in assembly code. A year later, the code was working, and was put into the production system. Performance dropped 5%. Upon inspection, it was found that the fundamental algorithm was at fault. They took the EPL code, rewrote the algorithm, and had the improved algorithm working and installed in a few weeks. The lesson: don't optimize something that is not the problem. Understand the problem first. Then, and only then, do the optimization. Otherwise, the optimization is a waste of time and may even make the performance worse.

In the Bliss compiler, the 'register' attribute on a variable said to the compiler "You will assign this variable to a register". In C, it means "I'd like you to assign this variable to a register". Many programmers decided that they should put certain variables in registers to get better code. But the Bliss compiler was very good; it had a very sophisticated register allocation system, and in the absence of direction from the programmer felt free to assign a variable to a register if that produced the best code. Explicitly assigning a variable to a register meant that the register was not available for common subexpressions, particularly those subexpressions implicit in data structure access. After a fair amount of experiments, we determined that almost invariably adding 'register' attributes to declarations produced significantly worse code than letting the compiler do the assignment. For inner-loop code, many hours of effort would usually result in a small improvement in performance, but it was generally understood by the programmers that unless you studied the generated machine code and did a number of calibrated experiments, any attempts at optimization would make the code worse.

If you've heard of the SPECmark performance, you may also be aware of how they are gamed. In particular, IBM wrote a program that took a FORTRAN program as input, such as the SPEC matrix-multiply benchmark, and produced another FORTRAN program as output, but one which was optimized for the cache architecture of the machine it would run on. A small number of parameters described all the cache strategies of each of the models of the RISC 6000 product line. The "optimized" original FORTRAN program performed on one machine at 45 SPECmarks. After being transformed, it performed on the same machine at over 900 SPECmarks. That's a factor of 20 performance improvement based solely on fourth-order effects, cache line hits. If you're doing image processing, particularly of large images, an awareness of cache issues (even relatively machine-independent) can buy you an order of magnitude performance.

You might also like...

Comments

Contribute

Why not write for us? Or you could submit an event or a user group in your area. Alternatively just tell us what you think!

Our tools

We've got automatic conversion tools to convert C# to VB.NET, VB.NET to C#. Also you can compress javascript and compress css and generate sql connection strings.

“Measuring programming progress by lines of code is like measuring aircraft building progress by weight.” - Bill Gates