Floating-Point in .NET Part I: Concepts and Format

Introduction

Floating-point arithmetic is generally considered a rather occult topic. Floating-point numbers are seen as somewhat fuzzy things whose exact values become more mysterious with every significant digit that is added. This attitude is somewhat surprising given the wide range of everyday applications that don't simply use floating-point arithmetic, but depend on it.

Our aim in this three-part series is to remove some of the mystery surrounding floating-point math, to show why it is important for most programmers to know about it, and to show how you can use it effectively when programming for the .NET platform. In this first part, we will cover some basic concepts of numerical computing: number formats, accuracy and precision, and round-off error. We will also cover the .NET floating-point types in some depth. The second part will list some common numerical pitfalls and show you how to avoid them. In the third and final part, we will see how Microsoft handled the subject in the Common Language Runtime and the .NET Base Class Library.

Here's a quick quiz. What is printed when the following piece of code runs? We calculate one divided by 103 in both single and double precision. We then multiply by 103 again, and compare the result to the value we started out with:

Console.WriteLine("((double)(1/103.0))*103 < 1 is {0}.", ((double)(1/103.0))*103 < 1);
Console.WriteLine("((float)(1/103.0F))*103 > 1 is {0}.", ((float)(1/103.0F))*103 > 1);

In exact arithmetic, the left-hand side of each comparison is equal to 1, and so the answer would be false in both cases. In actual fact, true is printed twice. Not only do we get results that don't match what we would expect mathematically, but two alternative ways of performing the exact same calculation give totally contradictory results!
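
To see where the two answers come from, it can help to separate the division from the multiplication. The following is a minimal sketch, not the quiz code itself: the class and method names (RoundTripQuiz, RoundTripDouble, RoundTripSingleWidened) are our own, and we assume a runtime that performs double arithmetic in IEEE 754 double precision. The single-precision quotient is deliberately widened to double before multiplying, so that the small error left by the division is not rounded away again.

```csharp
using System;

// Hypothetical helper class, introduced only for this illustration.
static class RoundTripQuiz
{
    // Divide 1 by 103 in double precision, then multiply by 103 again.
    public static double RoundTripDouble()
    {
        double reciprocal = 1.0 / 103.0;   // rounded to the nearest double
        return reciprocal * 103.0;         // rounded to the nearest double again
    }

    // Divide 1 by 103 in single precision, then multiply back in double
    // precision so the product is not rounded to a float a second time.
    public static double RoundTripSingleWidened()
    {
        float reciprocal = (float)(1.0F / 103.0F);  // rounded to the nearest float
        return (double)reciprocal * 103.0;          // widened before multiplying
    }

    static void Main()
    {
        Console.WriteLine("double round trip < 1 is {0}.", RoundTripDouble() < 1.0);
        Console.WriteLine("single round trip > 1 is {0}.", RoundTripSingleWidened() > 1.0);
    }
}
```

Under these assumptions both comparisons are true: the double-precision quotient is rounded slightly down, while the single-precision quotient is rounded slightly up, so the two products land on opposite sides of 1.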

This example is typical of the weird behavior of floating-point arithmetic that has given it a bad reputation. You will encounter this behavior in many situations. Without proper care, your results will be unexpected if not outright undesirable. For example, let's say the price of a widget is set at $4.99. You want to know the cost of 17 widgets. You could go about this as follows:

float price = 4.99F;
int quantity = 17;
float total = price * quantity;
Console.WriteLine("The total price is ${0}.", total);

You would expect the result to be $84.83, but what you get is $84.82999. If you're not careful, it could cost you money. Say you have a $100 item, and you give a 10% discount. Your prices are all in full dollars, so you use int variables to store prices. Here is what you get:

int fullPrice = 100;
float discount = 0.1F;
int finalPrice = (int)(fullPrice * (1 - discount));
Console.WriteLine("The discounted price is ${0}.", finalPrice);

Guess what: the final price is $89, not the expected $90. Your customers will be happy, but you won't. You've given them an extra 1% discount.
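
One common way to sidestep both surprises, sketched here as our own illustration rather than as the approach this series goes on to recommend, is to keep monetary amounts in the .NET decimal type, which stores values such as 4.99 and 0.1 exactly, and to round to whole dollars explicitly instead of truncating with a cast. The class and method names below (MoneyDemo, Total, DiscountedPrice) are hypothetical.

```csharp
using System;

// Hypothetical helper class, introduced only for this illustration.
static class MoneyDemo
{
    // Total cost of a number of items; decimal represents 4.99 exactly.
    public static decimal Total(decimal unitPrice, int quantity)
    {
        return unitPrice * quantity;
    }

    // Discounted price in whole dollars. The result is rounded to the
    // nearest dollar rather than truncated toward zero by the int cast.
    public static int DiscountedPrice(int fullPrice, decimal discountRate)
    {
        decimal discounted = fullPrice * (1m - discountRate);
        return (int)Math.Round(discounted, MidpointRounding.AwayFromZero);
    }

    static void Main()
    {
        Console.WriteLine("The total price is ${0}.", Total(4.99m, 17));
        Console.WriteLine("The discounted price is ${0}.", DiscountedPrice(100, 0.1m));
    }
}
```

With these inputs, Total returns exactly 84.83 and DiscountedPrice returns 90. The float versions above are left unchanged so that the behavior they demonstrate stays visible.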

There are other variations on the same theme. Mathematical equalities don't seem to hold. Calculations don't seem to conform to what we learnt in grade three. It all looks fuzzy and confusing. You can be assured, however, that underneath it all are solid and exact mathematical computations. The aim of this article is to expose the underlying math, so you can once again go out and multiply, add, and divide with full confidence.

About the author

Jeffrey Sax Canada

Jeffrey Sax has been writing numerical software for many years. He is founder and president of Extreme Optimization (http://www.extremeoptimization.com), a Toronto-based provider of numerical co...
