A guide to sorting

Quick Sort

The most commonly used sorting algorithm is the C function qsort. This is one of several algorithms known as "n log n sorts". The log function is officially log2 n, but the "2" is usually omitted in conversation.. By this measure of complexity, if it takes time T to sort k items, and I need to sort 1000k items, it will take approximately 9000T (1000 * log2 1000, noting that log2 1024 » 9). Compare that with n2 bubble sort, where the cost would be 1,000,000T. Pretty big difference!

qsort is a very sophisticated algorithm, and I don't plan to discuss how it works here. It is part of the C library. Its specification is sometimes a bit hard for a beginner to read:

void qsort(
    void *base,
    size_t num,
    size_t width,
    int (__cdecl *compare )(const void *, const void *)
);

This is designed to be a very general algorithm, and it was designed for C, not for C++. Thus there is no "template" that would generate a family of qsort functions; the size of the object must be provided explicitly. You also have to provide a function which performs the comparison on your objects. That's the last parameter. It is passed two pointers to the elements of the array. So if you have an array of integer values and want to sort it, you will be passed int * pointers as the parameters to the function.

The function itself returns an int value. If the value is negative, it means you have determined that the first value is less than the second. If the value is positive, it means the first value is greater than the second value. If the value is 0, it means the two values are equal.

This makes it very easy to compare integers. You can write a simple function:

int compareint(const void * v1, const void * v2)
{
    return *(int *)v1 - *(int *)v2;
} // compareint

Note that what happens here: you are passed the pointers to the elements of the array. You cast the pointers to int * pointers, then simply subtract. If v1 is less than v2, the result will be negative; if v1 is greater than v2, the result will be positive, and if they are the same, the value will be zero.

Suppose I have a struct of the form

typedef struct {
    CString name;
    UINT birthday;
} myData, * pmyData;

and I have an array pointed to, and I know that there are 178 elements in the array (count). Then I could invoke the qsort function by doing

pmyData info;
UINT count;

qsort(info, count, sizeof(myData, byname);

where the function byname is defined as

int byname(const void * v1, const void * v2);

But what does that function get? It gets pointers to the array elments. Remember that this is an array of myData objects. So each time the byname function is called, it gets a pointer to two of the elements of that array. So you will need to cast them to be pointers to those objects:

int byname(const void * v1, const void * v2)
{
    pmyData * data1 = (pmyData *)v1;
    pmyData * data2 = (pmyData *)v2;
    return _tcscmp(data1->name, data2->name);
} // byname

The function must return a value of < 0 if v1 < v2, 0 if v1 == v2, and > 0 if v1 > v2. _tcscmp, which is the Unicode-aware version of strcmp, does exactly this.

I've represented the birthday by a UINT, which is the year of the person's birth. Suppose I wanted to sort by age. I could write

int byname(const void * v1, const void * v2)
{
    pmyData * data1 = (pmyData *)v1;
    pmyData * data2 = (pmyData *)v2;
    return data1->birthday - data2->birthday; // NOT QUITE RIGHT!!!
} // byname

So this should subtract the two values. If data1->birthday < data2->birthday, then I should get a negative number, if they are equal, I should get 0, and if data1->birthday > data2->birthday I should have a positive number. So this should work. It actually gets a compiler warning.

Why? Because I declared the variable as UINT. You can't really subtract two UINTs and get a signed result. (OK, those of you who want to be pedantic will point out that the 2's complement arithmetic computes the correct bit pattern anyway, so it doesn't matter. But the compiler is unhappy).

There are a couple solutions. One is that I could declare the year of birth as an int, which would allow me negative years (if someone needs to use dates "B.C.E." (Before Christian Era) this could be useful. Or you could cast the values to int and do the arithmetic on an int, e.g.,

return (int)data1->birthday - (int)data2->birthday;

But I want more than a year. I want a day. So I instead choose to use the structure

typedef struct {
    CString name;
    COleDateTime birthday;
} myData, * pmyData;

Now I have represented the birthday as a COleDateTime value. Why not a CTime? Because CTime has an "epoch date" of 1-Jan-1970. It wouldn't represent my birthday (I had been programming for six years at that point) or anyone else's who had been born before that time. COleDateTime has an epoch value of 1 January 1900, and is in fact a floating-point value; in addition, it allows negative values. The specified range is 1-January-100 to 31-December-9999. I would now rewrite the code as:

int byage(const void * v1, const void * v2)
{
    pmyData * data1 = (pmyData *)v1;
    pmyData * data2 = (pmyData *)v2;
    COleDateTime deltaT = v1->birthday - v2->birthday;
    if(deltaT < 0)
        return -1;
    else
        if(deltaT > 0)
            return 1;
        else
            return 0;
}

This is subtle. Why couldn't I just write the same different operator I had before? Why did I have to do this complicated computation?

Because it is a floating-point number! Thus if I had two birth dates, one on the late afternoon (something.75) and another early in the next morning (something+1.20), the difference would be less than a day (.45), and would be truncated to an integer (0), so the two people would look as if they had been born on the same day! So instead I have to decode the value so that the resulting sign is properly accounted for!

You might also like...

Comments

Contribute

Why not write for us? Or you could submit an event or a user group in your area. Alternatively just tell us what you think!

Our tools

We've got automatic conversion tools to convert C# to VB.NET, VB.NET to C#. Also you can compress javascript and compress css and generate sql connection strings.

“Theory is when you know something, but it doesn't work. Practice is when something works, but you don't know why. Programmers combine theory and practice: Nothing works and they don't know why.”