Dynamic Search Conditions in T-SQL

Umachandar's Bag of Tricks

Here are some tricks that I have learnt from Umachandar Jaychandran, a former SQL Server MVP who now works in the SQL Server team at Microsoft.

Using BETWEEN

He suggested that rather than saying

o.OrderID = @orderID OR @orderID IS NULL

You should use:

o.OrderID BETWEEN coalesce(@orderID, @MinInt) AND
                  coalesce(@orderID, @MaxInt)

Here @MinInt and @MaxInt are the smallest and biggest integer values possible. What you gain here is that the optimizer can use the index on OrderID, because if @orderid is NULL, the plan still yields the correct result.

Note that this trick does not work with nullable columns. If you were to use this in a query:

c.Region BETWEEN coalesce(@region, @MinRegion) AND
                 coalesce(@region, @MaxRegion)

then the query:

EXEC search_orders @orderid = 10654

would not return any rows, because the customer on this order does not belong to a region.

How to Get Min and Max Values

How would you get the @MinInt and @MaxInt values? For some data types, you can hard-code the boundaries. This works for integer, binary and datetime columns, and with some care also for float/real, numeric/decimal and money columns.

However, for character data you cannot safely rely on hard values, because they are collation dependent. One way is to retrieve them from the table with SELECT MAX() and SELECT MIN(). But only do this on indexed columns!

If you decide to skip the variables and put the MIN/MAX queries into the main query, there is a thing to watch out for:

o.CustomerID BETWEEN
        isnull(@custid, (SELECT MIN(CustomerID) FROM Customers)) AND
        isnull(@custid, (SELECT MAX(CustomerID) FROM Customers))

isnull is an older function similar to coalesce that accepts only two parameters. Why isnull and not coalesce here? Umachandar discovered that when he used coalesce, he got table scans and bad performance, but it worked well with isnull. Since, this is something can depend on subtle behaviour in the optimizer, I would encourage you evaluate both, if you use this in your search routines.

Using LIKE

As an alternative, Umachandar suggested that for character data, you could do:

c.City LIKE coalesce(@city, '%')

Again, this is a trick that only works for columns that are not nullable. (Customers.City is in fact nullable, but there are no NULL values for this column in Northwind.) Also be careful with the fixed-length data types char/nchar. I tried this:

c.Customer LIKE coalesce(@custid, '%')

but since @custid is nchar(5), the percent was padded with four spaces, and no customer id matches that.

Another potential problem is that you now permit users to enter %York which may or may not be a good thing. The index on City is not very good for this search string.

search_orders_5

Using Umachandar's tricks, I wrote the procedure search_orders_5, which reads (abbreviated).

SELECT @minint = convert(int, 0x80000000),
       @maxint = convert(int, 0x7FFFFFFF)

IF @custid IS NULL
   SELECT @mincustid = MIN(CustomerID), @maxcustid = MAX(CustomerID)
   FROM Customers

SELECT ...
WHERE  o.OrderID BETWEEN coalesce(@orderid, @minint) AND
                         coalesce(@orderid, @maxint)
  AND  od.OrderID BETWEEN coalesce(@orderid, @minint) AND
                          coalesce(@orderid, @maxint)
  AND  o.OrderDate >= coalesce(@fromdate, '17530101')
  AND  o.OrderDate <= coalesce(@todate, '99991231')
  AND  od.UnitPrice >= coalesce(@minprice, 0)
  AND  od.UnitPrice <= coalesce(@maxprice, 1E18)
  AND  o.CustomerID BETWEEN coalesce(@custid, @mincustid) AND
                            coalesce(@custid, @maxcustid)
  AND  c.CustomerID BETWEEN coalesce(@custid, @mincustid) AND
                            coalesce(@custid, @maxcustid)
  AND  c.CompanyName LIKE coalesce(@custname + '%', '%')
  AND  c.City LIKE coalesce(@city, '%')
  AND  (c.Region = @region OR @region IS NULL)
  AND  (c.Country = @country OR @country IS NULL)
  AND  od.ProductID BETWEEN coalesce(@prodid, @minint) AND
                            coalesce(@prodid, @maxint)
  AND  p.ProductID  BETWEEN coalesce(@prodid, @minint) AND
                            coalesce(@prodid, @maxint)
  AND  p.ProductName LIKE coalesce(@prodname + '%', '%')
ORDER  BY o.OrderID

To get the smallest and biggest possible integer values, I work from the bit patterns, which are easier to remember than the actual numbers. To make a small optimization, I only read the min and max values for the customer id, if @custid is NULL.

As you can see I do not use BETWEEN or LIKE for all columns. As we already have noted, we can not use this trick for Customers.Region, since there are NULL values in this column. And for Customers.Country there is no index anyway. (And like City, it permits NULL, even if there are on NULL values in Northwind.)

For the filters on Orders.Orderdate and Order Details.UnitPrice, I don't need to use BETWEEN, since we filter for a range anyway. The values I have used for the second parameter to coalesce(), are the earliest and latest date supported by the datetime data type. For the money column I just assumed "reasonable values" for max and min.

To give the optimizer more to work with, I have included tests against both tables for the joining columns OrderId, CustomerID and ProductID.

Did this Perform Well?

The result was very mixed. When I ran this batch in Northgale:

EXEC search_orders_5 @orderid = 11000 WITH RECOMPILE
EXEC search_orders_5 @custid  = 'ALFKI' WITH RECOMPILE
EXEC search_orders_5 @prodid  = 76 WITH RECOMPILE
EXEC search_orders_5 @prodid  = 76, @custid = 'RATTC' WITH RECOMPILE
EXEC search_orders_5 @fromdate = '19980205', @todate = '19980209' WITH RECOMPILE
EXEC search_orders_5 @city = 'Bräcke', @prodid = 76 WITH RECOMPILE

the searches on customer id alone and customer id together with product id, had instant response time, a lot better than what  I got with search_orders_3. On the other hand, the search for product id alone took over five minutes in Northgale on SQL 2000 and almost 80 seconds on SQL 2005, which is far slower than search_orders_3. Confusingly when I reran the test six months later, the search on product id now took only 20 seconds on SQL 2000, which is better than search_orders_3 on this version.

What's more disturbing is that despite the WITH RECOMPILE, the query plan was always the same for a given combination of server/database. What you want for a search routine of this kind, is to pick different plans depending on input, whence the WITH RECOMPILE.

So are Umachandar's tricks no good then? It would be to jump to conclusion to say so. They could work well in a different context. Maybe I went over the board when I wrote search_orders_5 and used his tricks for almost all conditions. When we come to search_orders_7 we will see a case where BETWEEN does help us to get a better plan.

In any case, you should always benchmark the different methods you are contemplating with your tables and data. And don't forget that the right answer may be to combine methods.

You might also like...

Comments

Contribute

Why not write for us? Or you could submit an event or a user group in your area. Alternatively just tell us what you think!

Our tools

We've got automatic conversion tools to convert C# to VB.NET, VB.NET to C#. Also you can compress javascript and compress css and generate sql connection strings.

“Every language has an optimization operator. In C++ that operator is ‘//’”