Here are some tricks that I have learnt from Umachandar Jaychandran, a former SQL Server MVP who now works in the SQL Server team at Microsoft.
Using BETWEEN
He suggested that rather than saying
o.OrderID = @orderID OR @orderID IS NULL
You should use:
o.OrderID BETWEEN coalesce(@orderID, @MinInt) AND coalesce(@orderID, @MaxInt)
Here @MinInt and @MaxInt are the smallest and biggest integer values possible. What you gain here is that the optimizer can use the index on OrderID, because if @orderid is NULL, the plan still yields the correct result.
Note that this trick does not work with nullable columns. If you were to use this in a query:
c.Region BETWEEN coalesce(@region, @MinRegion) AND coalesce(@region, @MaxRegion)
then the query:
EXEC search_orders @orderid = 10654
would not return any rows, because the customer on this order does not belong to a region.
How to Get Min and Max Values
How would you get the @MinInt and @MaxInt values? For some data types, you can hard-code the boundaries. This works for integer, binary and datetime columns, and with some care also for float/real, numeric/decimal and money columns.
However, for character data you cannot safely rely on hard values, because they are collation dependent. One way is to retrieve them from the table with
If you decide to skip the variables and put the MIN/MAX queries into the main query, there is a thing to watch out for:
o.CustomerID BETWEEN isnull(@custid, (SELECT MIN(CustomerID) FROM Customers)) AND isnull(@custid, (SELECT MAX(CustomerID) FROM Customers))
isnull is an older function similar to coalesce that accepts only two parameters. Why isnull and not coalesce here? Umachandar discovered that when he used coalesce, he got table scans and bad performance, but it worked well with isnull. Since, this is something can depend on subtle behaviour in the optimizer, I would encourage you evaluate both, if you use this in your search routines.
Using LIKE
As an alternative, Umachandar suggested that for character data, you could do:
c.City LIKE coalesce(@city, '%')
Again, this is a trick that only works for columns that are not nullable. (Customers.City is in fact nullable, but there are no NULL values for this column in Northwind.) Also be careful with the fixed-length data types char/nchar. I tried this:
c.Customer LIKE coalesce(@custid, '%')
but since @custid is nchar(5), the percent was padded with four spaces, and no customer id matches that.
Another potential problem is that you now permit users to enter %York
which may or may not be a good thing. The index on City is not very good for this search string.
search_orders_5
Using Umachandar's tricks, I wrote the procedure search_orders_5, which reads (abbreviated).
SELECT @minint = convert(int, 0x80000000), @maxint = convert(int, 0x7FFFFFFF) IF @custid IS NULL SELECT @mincustid = MIN(CustomerID), @maxcustid = MAX(CustomerID) FROM Customers SELECT ... WHERE o.OrderID BETWEEN coalesce(@orderid, @minint) AND coalesce(@orderid, @maxint) AND od.OrderID BETWEEN coalesce(@orderid, @minint) AND coalesce(@orderid, @maxint) AND o.OrderDate >= coalesce(@fromdate, '17530101') AND o.OrderDate <= coalesce(@todate, '99991231') AND od.UnitPrice >= coalesce(@minprice, 0) AND od.UnitPrice <= coalesce(@maxprice, 1E18) AND o.CustomerID BETWEEN coalesce(@custid, @mincustid) AND coalesce(@custid, @maxcustid) AND c.CustomerID BETWEEN coalesce(@custid, @mincustid) AND coalesce(@custid, @maxcustid) AND c.CompanyName LIKE coalesce(@custname + '%', '%') AND c.City LIKE coalesce(@city, '%') AND (c.Region = @region OR @region IS NULL) AND (c.Country = @country OR @country IS NULL) AND od.ProductID BETWEEN coalesce(@prodid, @minint) AND coalesce(@prodid, @maxint) AND p.ProductID BETWEEN coalesce(@prodid, @minint) AND coalesce(@prodid, @maxint) AND p.ProductName LIKE coalesce(@prodname + '%', '%') ORDER BY o.OrderID
To get the smallest and biggest possible integer values, I work from the bit patterns, which are easier to remember than the actual numbers. To make a small optimization, I only read the min and max values for the customer id, if @custid is NULL.
As you can see I do not use BETWEEN or LIKE for all columns. As we already have noted, we can not use this trick for Customers.Region, since there are NULL values in this column. And for Customers.Country there is no index anyway. (And like City, it permits NULL, even if there are on NULL values in Northwind.)
For the filters on Orders.Orderdate and Order Details.UnitPrice, I don't need to use BETWEEN, since we filter for a range anyway. The values I have used for the second parameter to
To give the optimizer more to work with, I have included tests against both tables for the joining columns OrderId, CustomerID and ProductID.
Did this Perform Well?
The result was very mixed. When I ran this batch in Northgale:
EXEC search_orders_5 @orderid = 11000 WITH RECOMPILE EXEC search_orders_5 @custid = 'ALFKI' WITH RECOMPILE EXEC search_orders_5 @prodid = 76 WITH RECOMPILE EXEC search_orders_5 @prodid = 76, @custid = 'RATTC' WITH RECOMPILE EXEC search_orders_5 @fromdate = '19980205', @todate = '19980209' WITH RECOMPILE EXEC search_orders_5 @city = 'Bräcke', @prodid = 76 WITH RECOMPILE
the searches on customer id alone and customer id together with product id, had instant response time, a lot better than what I got with search_orders_3. On the other hand, the search for product id alone took over five minutes in Northgale on SQL 2000 and almost 80 seconds on SQL 2005, which is far slower than search_orders_3. Confusingly when I reran the test six months later, the search on product id now took only 20 seconds on SQL 2000, which is better than search_orders_3 on this version.
What's more disturbing is that despite the WITH RECOMPILE, the query plan was always the same for a given combination of server/database. What you want for a search routine of this kind, is to pick different plans depending on input, whence the WITH RECOMPILE.
So are Umachandar's tricks no good then? It would be to jump to conclusion to say so. They could work well in a different context. Maybe I went over the board when I wrote search_orders_5 and used his tricks for almost all conditions. When we come to search_orders_7 we will see a case where BETWEEN does help us to get a better plan.
In any case, you should always benchmark the different methods you are contemplating with your tables and data. And don't forget that the right answer may be to combine methods.
Comments