Optimization: Your Worst Enemy


A classic blunder in optimization was committed some years ago by one of the major software vendors. We had their first interactive timesharing system, and it was a "learning experience" in a variety of ways. One such experience was that of the FORTRAN compiler group. Any compiler writer knows that the larger the hash table you use for a symbol table, the better your lookup performance will be. When you are writing a multipass compiler on a 32K mainframe, you end up using a relatively small symbol table, but you create a really, really good hash algorithm so that the probability of a hash collision is reduced. (Unlike a binary search, which is O(log n), a good hash table has constant-time performance up to a certain density, so as long as you keep the density below that threshold you can typically expect a cost of one or two probes to enter or look up a symbol, on average. A perfect hash table, which is usually precomputed for constant symbols, has a constant cost between 1.0 and 1.3 probes or thereabouts; if it gets to 1.5, you rework the hashing to get it lower.)
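
To make the density argument concrete, here is a minimal sketch of an open-addressed symbol table in C. The table size, the hash function, and the linear probing are illustrative choices of mine, not the compiler's actual scheme; the point is only that, as long as the table stays sparse, a lookup or insertion costs one or two probes.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative open-addressing symbol table; 1021 is just a convenient
   prime, and the sketch assumes the table never gets close to full. */
#define TABLE_SIZE 1021

static const char *table[TABLE_SIZE];

static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* Find or enter a symbol, reporting how many probes it took.  At low
   density the probe count stays at 1 or 2; as the table fills, collisions
   push it up, which is why you keep the density below a threshold. */
static int lookup(const char *name, int *probes)
{
    unsigned i = hash(name);
    *probes = 1;
    while (table[i] != NULL) {
        if (strcmp(table[i], name) == 0)
            return (int)i;               /* already present */
        i = (i + 1) % TABLE_SIZE;        /* linear probe on collision */
        (*probes)++;
    }
    table[i] = name;                     /* not present: enter it */
    return (int)i;
}

int main(void)
{
    int probes;
    lookup("COUNT", &probes);
    printf("entered COUNT in %d probe(s)\n", probes);
    lookup("COUNT", &probes);
    printf("found COUNT in %d probe(s)\n", probes);
    return 0;
}
```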

So anyway, this compiler group discovered that they no longer had 32K, or 128K, or 512K. Instead, they now had a 4GB virtual address space. "Hey, let's use a really big hash table!" you can hear them saying. "Like, how about a 1MB table?" So they did. But they also had a very sophisticated compiler technology designed for small and relatively dense hash tables. The result was that the symbols were fairly uniformly distributed over the 256 4K pages in that 1MB, which meant that every symbol access caused a page fault. The compiler was a dog. When they finally went back to a 64K symbol table, they found that although the algorithm had poorer "absolute" performance from a purely algorithmic viewpoint (taking many more instructions to look up a symbol), it ran over an order of magnitude faster, because it did not cause nearly as many page faults. So third-order effects do matter.
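
A rough back-of-the-envelope sketch of why that was fatal. The 256 pages come from the story's 1MB table and 4K pages; the 16-page resident set and the crude residency model are my own assumptions, purely for illustration. If symbols hash uniformly over 256 pages but only a handful of those pages can stay resident, almost every lookup faults.

```c
#include <stdio.h>
#include <stdlib.h>

/* Crude model, not a real pager: pretend the first 'resident' pages stay
   in memory and every reference to any other page faults. */
int main(void)
{
    const int pages    = 256;       /* 1MB table / 4K pages            */
    const int resident = 16;        /* assumed working-set budget       */
    const int lookups  = 100000;

    int faults = 0;
    for (int i = 0; i < lookups; i++) {
        int page = rand() % pages;  /* uniform hashing scatters symbols */
        if (page >= resident)
            faults++;
    }
    printf("%d of %d lookups fault (about %.0f%%)\n",
           faults, lookups, 100.0 * faults / lookups);
    return 0;
}
```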

Also, beware of C. No, not the speed of light. When we talk about performance, the algorithmic performance for n elements is expressed as a function C × f(n). Thus an n² algorithm is formally C × n², meaning that the performance is a constant multiple of the square of the number of elements being considered. We shorten this to O(n²), meaning "order of n²", and in common parlance just drop the "order of" designation. But never forget that the C is there.

Some years ago, I was doing a project that produced a summary set of listings, sorted in a variety of ways. In the first attempt (this was in the days before C and qsort) I just did an ordinary bubble sort, an O(n²) algorithm. After initial testing, I fed it some live data. Ten minutes after it had printed the console message "Starting reports", it had not yet produced any reports. A series of probes showed that most of the time was in the sort routine. OK, I was done in by my laziness. So I pulled out my trusty heapsort, an O(n log n) algorithm, and spent an hour modifying it to work in my application (remember, I said qsort did not yet exist). Having solved the problem, I started running it again. Seven minutes into the report phase, nothing had yet appeared. Some probes revealed something significant: it was spending most of its time in the equivalent of strcmp, comparing the strings. While I'd fixed the O issue, I had seriously neglected the C issue. So what I did was one sort of the composite symbol table, all of the names, and then assigned an integer to each symbol structure. Thereafter, when I had to sort a substructure, I just did an integer sort on its ordinal position. This reduced C to the point where less than 30 seconds were required to do the entire report phase. A second-order effect, but a significant one.
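
Here is a minimal sketch of that trick in modern C, using today's qsort (which, as noted, did not exist at the time); the struct symbol layout and the sample names are hypothetical. One expensive string sort assigns each name an ordinal rank, and every later sort compares only integers, which is what shrinks C.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol record: a name plus a precomputed ordinal rank. */
struct symbol {
    const char *name;
    int rank;        /* position of name in the one-time sort of all names */
};

/* Expensive comparison: every call walks the two strings. */
static int cmp_by_name(const void *a, const void *b)
{
    return strcmp(((const struct symbol *)a)->name,
                  ((const struct symbol *)b)->name);
}

/* Cheap comparison: a single integer compare, because the string work
   was done once, up front, when the ranks were assigned. */
static int cmp_by_rank(const void *a, const void *b)
{
    int ra = ((const struct symbol *)a)->rank;
    int rb = ((const struct symbol *)b)->rank;
    return (ra > rb) - (ra < rb);
}

int main(void)
{
    struct symbol tab[] = { { "gamma", 0 }, { "alpha", 0 }, { "beta", 0 } };
    size_t n = sizeof tab / sizeof tab[0];

    /* One expensive sort of the composite table by name... */
    qsort(tab, n, sizeof tab[0], cmp_by_name);
    /* ...then assign each symbol its ordinal position. */
    for (size_t i = 0; i < n; i++)
        tab[i].rank = (int)i;

    /* Every later sort of a substructure uses the cheap integer compare. */
    qsort(tab, n, sizeof tab[0], cmp_by_rank);

    for (size_t i = 0; i < n; i++)
        printf("%s\n", tab[i].name);
    return 0;
}
```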

So algorithmic performance, particularly paging performance, can matter. Unfortunately, we have proper tools neither for measuring paging hits nor for reorganizing code to minimize the paging of code pages.

Some performance tools measure the total time spent in user space and treat kernel time as free. This can mask the impact the application has on kernel time. For example, a few years ago we were measuring a program whose performance was exceptionally poor. No "hotspot" showed up in terms of program time wasted. However, at one point I was looking at the trace data and noticed that the input routine was called about a million times, which is not surprising when you are reading a megabyte of data, but something seemed odd to me. I looked. Each time it was called, it called the kernel to read a single byte of the file! I changed it to read 8K bytes and unpack the buffer itself, and got a factor of 30 performance improvement! Note the lesson here: kernel time matters, and kernel transitions matter. It is not accidental that the GDI in NT 4.0 is no longer a user-level process but is integrated into the kernel. The kernel transitions dominated the performance.
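
A minimal POSIX-flavored sketch of the before and after. The byte-counting bodies stand in for whatever the real input routine did with the data, and the 8K buffer size follows the story; nothing here is the original program's code.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* The slow pattern from the story: one kernel transition per byte. */
static long count_bytes_slow(int fd)
{
    char c;
    long n = 0;
    while (read(fd, &c, 1) == 1)    /* a system call for every single byte */
        n++;
    return n;
}

/* The fix: one kernel transition per 8K, unpacking the buffer in user space. */
static long count_bytes_fast(int fd)
{
    char buf[8192];
    ssize_t got;
    long n = 0;
    while ((got = read(fd, buf, sizeof buf)) > 0)
        n += got;                   /* the "unpack" step is just counting here */
    return n;
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    printf("slow: %ld bytes\n", count_bytes_slow(fd));
    lseek(fd, 0, SEEK_SET);                 /* rewind and do it again */
    printf("fast: %ld bytes\n", count_bytes_fast(fd));

    close(fd);
    return 0;
}
```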

So what to optimize is easy: those parts of the program that are consuming the time. But local optimizations that ignore global performance issues are meaningless. And first-order effects (total time spent in the allocator, for example) may not be the dominant effects. Measure, measure, measure.
