Dynamic Search Conditions in T-SQL

Static SQL

Introduction

For a search routine with many parameters like our search_orders example, static SQL is not really an appetising choice – you will have to make trade-offs between performance and maintainability. To always get the best performance, you would have to use IF statements to only include the conditions you need:

IF @orderid IS NOT NULL
   SELECT ...
   WHERE  o.OrderID = @orderid
ELSE IF @fromdate IS NOT NULL AND @todate IS NULL AND
        @custid IS NULL ...
   SELECT ...
   WHERE  o.OrderDate >= @fromdate
ELSE IF @fromdate IS NOT NULL AND @todate IS NOT NULL AND
        @custid IS NULL ...

As you realise, this code would be impossible to maintain (and very boring to write).

If you are on SQL 7 or SQL 2000 and cannot give users direct SELECT permissions to the tables, you don't have much choice: you must use static SQL, although it's possible to use static SQL together with dynamic SQL in a hybrid solution that we will look at later. In the following sections, I will look at different methods that you can employ in static SQL. These methods are not mutually exclusive, and you can often get the best result by combining two or more of them.

Even if you are on SQL 2005, you should not dismiss the static solutions out of hand. Keep in mind that not all search problems are as complex as search_orders. If you have a very small set of search options, you may be able to produce a search solution with static SQL that performs perfectly well and still is manageable – more manageable than dynamic SQL. (After all, dynamic SQL in itself adds a level of complexity, with the risk of syntax errors not appearing until run-time.) Sometimes your "dynamic" search is only a lookup with alternate keys. This is a special case for which we will look at a perfectly good static solution in the section x = @x AND @x IS NOT NULL.

As with dynamic SQL: always be careful to test all input parameters to your procedure, both for correct results and for performance. To test for performance, you need tables with sizes resembling what you can expect in production.

x = @x OR @x IS NULL

This is the starting point for dynamic searches in static SQL, and it is also what the hybrid solutions build on. This simple technique yields one compact query. However, performance for large data sets will be poor. Here is search_orders_3 to demonstrate the idea:

CREATE PROCEDURE search_orders_3
                 @orderid   int          = NULL,
                 @fromdate  datetime     = NULL,
                 @todate    datetime     = NULL,
                 @minprice  money        = NULL,
                 @maxprice  money        = NULL,
                 @custid    nchar(5)     = NULL,
                 @custname  nvarchar(40) = NULL,
                 @city      nvarchar(15) = NULL,
                 @region    nvarchar(15) = NULL,
                 @country   nvarchar(15) = NULL,
                 @prodid    int          = NULL,
                 @prodname  nvarchar(40) = NULL AS

SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity,
       c.CustomerID, c.CompanyName, c.Address, c.City, c.Region,
       c.PostalCode, c.Country, c.Phone, p.ProductID,
       p.ProductName, p.UnitsInStock, p.UnitsOnOrder
FROM   Orders o
JOIN   [Order Details] od ON o.OrderID = od.OrderID
JOIN   Customers c ON o.CustomerID = c.CustomerID
JOIN   Products p ON p.ProductID = od.ProductID
WHERE  (o.OrderID = @orderid OR @orderid IS NULL)
  AND  (o.OrderDate >= @fromdate OR @fromdate IS NULL)
  AND  (o.OrderDate <= @todate OR @todate IS NULL)
  AND  (od.UnitPrice >= @minprice OR @minprice IS NULL)
  AND  (od.UnitPrice <= @maxprice OR @maxprice IS NULL)
  AND  (o.CustomerID = @custid OR @custid IS NULL)
  AND  (c.CompanyName LIKE @custname + '%' OR @custname IS NULL)
  AND  (c.City = @city OR @city IS NULL)
  AND  (c.Region = @region OR @region IS NULL)
  AND  (c.Country = @country OR @country IS NULL)
  AND  (od.ProductID = @prodid OR @prodid IS NULL)
  AND  (p.ProductName LIKE @prodname + '%' OR @prodname IS NULL)
ORDER  BY o.OrderID

The effect of all the @x IS NULL clauses is that if that input parameter is NULL, then that AND-condition is always true. Thus, the only conditions that are in effect are those where the search parameter has a non-NULL value.
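To see this in action, take a call such as:

EXEC search_orders_3 @custid = 'ALFKI'

With all other parameters at NULL, every other OR branch is always true, so logically the WHERE clause reduces to nothing more than o.CustomerID = @custid. As we shall see, however, the optimizer is not permitted to reason this way.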

For maintainability this is a good choice, as there is no extra complexity. But it is a poor choice if you want good performance. This procedure will never use any selective index; instead you will get a query plan as if you had not supplied any search parameters at all.

This may surprise readers who are acquainted with parameter sniffing. That is, when SQL Server builds a query plan for a stored procedure, it looks at the values of the input parameters and uses them as guidance for the plan. Thus, for a call like:

EXEC search_orders_3 @orderid = 11000

it would be reasonable to expect a plan that looks up the order id through the primary-key index on Orders. So why does this not happen? Keep in mind that the optimizer does not perform any flow analysis of the code. It knows that @orderid is 11000 on input, but it cannot exclude the possibility that by the point the query is executed, @orderid has been set to NULL along the way, in which case a plan with a key lookup would yield an incorrect result – something which is completely unacceptable to the optimizer.
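To see what the optimizer has to guard against, here is a contrived sketch (sniff_demo is a hypothetical procedure, not part of the example suite):

CREATE PROCEDURE sniff_demo @orderid int = NULL AS
   -- The optimizer sniffs @orderid = 11000 at compile time, but
   -- nothing stops the code from changing the parameter before
   -- the query is reached:
   IF @orderid = 11000
      SET @orderid = NULL
   SELECT o.OrderID, o.OrderDate
   FROM   Orders o
   WHERE  (o.OrderID = @orderid OR @orderid IS NULL)

A plan that blindly seeks the primary key on Orders for order 11000 would here return a single row, when in fact all orders should be returned.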

There is a second problem with parameter sniffing that the optimizer must take into account. Assume that SQL Server had in fact looked at the input parameters and opted to use the primary key on Orders. Recall that query plans are cached. So when the next guy submits a search on product id instead, the cached query plan would be flat wrong.

Forcing Recompilation Each Time

Since there is no good plan that fits all searches, it can be a good idea to force recompilation each time. This can be done in three different ways, all sketched in the example after this list:

  • Adding WITH RECOMPILE to the procedure definition.
  • Adding WITH RECOMPILE when calling the procedure.
  • On SQL 2005, adding the query hint OPTION (RECOMPILE) to the statement.
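Here is a sketch of all three forms (search_orders_r is just a hypothetical name for a recompiling variation of the procedure):

-- 1. In the procedure definition:
CREATE PROCEDURE search_orders_r
                 @orderid int = NULL WITH RECOMPILE AS
SELECT ...

-- 2. At the call:
EXEC search_orders_3 @orderid = 11000 WITH RECOMPILE

-- 3. As a query hint on the statement itself (SQL 2005):
SELECT ...
FROM   Orders o
WHERE  (o.OrderID = @orderid OR @orderid IS NULL)
OPTION (RECOMPILE)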

I tried this test batch:

exec search_orders_3 @orderid = 11000 WITH RECOMPILE
exec search_orders_3 @custid  = 'ALFKI' WITH RECOMPILE
exec search_orders_3 @prodid  = 76 WITH RECOMPILE
exec search_orders_3 @prodid  = 76, @custid = 'RATTC' WITH RECOMPILE
exec search_orders_3 @fromdate = '19980205', @todate = '19980209' WITH RECOMPILE
exec search_orders_3 @city = 'Bräcke', @prodid = 76 WITH RECOMPILE

I found that on SQL 2000, when I ran the batch in Northgale, I got the same plan for all cases but the last.

On SQL 2005, I got a different plan for each invocation, and the total execution time for the batch was considerably lower than when I ran the batch without WITH RECOMPILE. So forcing recompilation was certainly worthwhile. However, the plans were still not particularly effective: compared to the solutions with dynamic SQL, the execution time for the entire test batch was about four times longer.

As I discussed above, SQL Server does not perform any flow analysis of the procedure, so it cannot assume that the input parameters actually retain their values until execution reaches the SQL statement. Thus, this poor result is not very surprising. But what if we instead use the new hint OPTION (RECOMPILE)? This query hint instructs SQL Server to recompile the plan for the statement each time. (In SQL 2000, stored procedures are compiled as a whole; SQL 2005 is able to recompile at statement level.) That would mean that SQL Server should be able to look at the actual parameter values, take all possible shortcuts, and arrive at a plan which is just as good as what you get with dynamic SQL.

Alas, this does not happen. I found that when I added OPTION (RECOMPILE) to the query, I got exactly the same plan as when I added WITH RECOMPILE to the EXEC statement. It appears that the optimizer is still a bit nervous that the values may change during the query.

(This is not to say that OPTION (RECOMPILE) is useless. If you have a long stored procedure with a multi-criteria query in the middle of it, OPTION (RECOMPILE) can be very useful. Also, if the statement includes local variables, or the input parameters have changed values, SQL Server is able to use these values when building the plan.)
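For instance, here is a sketch of the local-variable case. Since the statement is recompiled when it is reached, the optimizer can see the value that @d holds at that moment:

DECLARE @d datetime
SELECT @d = dateadd(DAY, -7, getdate())

SELECT o.OrderID, o.OrderDate
FROM   Orders o
WHERE  o.OrderDate >= @d
OPTION (RECOMPILE)

Without the hint, @d would be an unknown value to the optimizer, which would have to fall back on a standard guess for the selectivity.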

Of course, if you force recompilation each time, there is no cache reuse at all. (In fact, the plan is not even put into cache.) But that is a trade-off you have to make. Cache reuse is good as long as it takes load off the server, but it's not good if it leads to queries running with suboptimal plans.

A Useless Method?

So if this method always yields a table scan, is it worthless and should never be used? Not really. Sometimes you can reasonably assume that the amount of data you are to search will not exceed, say, 10,000 rows. As long as the response times are acceptable to the users, and there are no locking issues, there is no reason to make your programming more complex than necessary. And there are plenty of cases where you need to add an extra optional filter, on a non-indexed column, to an already efficient query – a case where this method is an excellent choice. I would go as far as to say that this technique should be a staple in your T-SQL programming. You just need to know when it's not the right thing to use.
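As a sketch of that last case, assume that the query is already driven by a selective, indexed condition, and that Region is a non-indexed column on which we merely want an optional extra filter:

SELECT o.OrderID, o.OrderDate, c.CompanyName
FROM   Orders o
JOIN   Customers c ON o.CustomerID = c.CustomerID
WHERE  o.CustomerID = @custid                    -- Selective and indexed.
  AND  (c.Region = @region OR @region IS NULL)   -- Optional extra filter.

The optional condition does not disturb the plan; it is simply applied as a residual filter to the rows the index seek produces.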

In the section x = @x AND @x IS NOT NULL, I will come back to a "trick" with which you actually can make this method use indexes effectively under some circumstances.

Variation on this Theme

Rather than using OR as above, some people write:

o.OrderID = coalesce(@orderid, o.OrderID)

coalesce() is a function that takes a list of values as its argument and returns the first non-NULL value in the list, or NULL if there is no non-NULL value in the list. Thus, if @orderid is NULL, you get o.OrderID = o.OrderID, a complete no-op. You can see a full example of this in search_orders_3a.
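That is, the WHERE clause in search_orders_3a presumably reads along these lines (abbreviated here; see the actual procedure for all twelve conditions):

WHERE  o.OrderID    = coalesce(@orderid, o.OrderID)
  AND  o.OrderDate >= coalesce(@fromdate, o.OrderDate)
  AND  o.OrderDate <= coalesce(@todate, o.OrderDate)
  AND  o.CustomerID = coalesce(@custid, o.CustomerID)
  AND  c.Region     = coalesce(@region, c.Region)
  ...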

This yields even more compact code than using OR, but it has the same problems with regard to performance: SQL Server will not be able to use indexes efficiently. In fact, this sort of condition can make things even more difficult for the optimizer, even with recompilation hints. (I ran the six test searches in Northgale. On SQL 2005, search_orders_3a ran considerably slower than search_orders_3, whereas on SQL 2000 it was considerably faster!)

I like to put up a warning sign for coalesce, because there is a trap you can fall into. Run this:

EXEC search_orders_3 @orderid = 10654
EXEC search_orders_3a @orderid = 10654

The first call returns three rows, but the second returns no rows at all! Why? Because for the customer who placed order 10654, the column Region is NULL. When @region is NULL, the condition

c.Region = coalesce(@region, c.Region)

becomes in essence c.Region = NULL. But in SQL, NULL is not equal to NULL. NULL stands for "unknown value", and whether two unknown values are the same or different is of course unknown.
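You can see this for yourself with a one-line test (assuming the default setting ANSI_NULLS ON):

SELECT CASE WHEN NULL = NULL THEN 'equal' ELSE 'not equal' END

This prints 'not equal', because NULL = NULL evaluates to UNKNOWN, not to true, and the CASE expression falls through to the ELSE branch.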

Thus, in a correct implementation of search_orders_3a, I should have written the condition on Customers.Region differently – see the sketch at the end of this section. Since this is an accident waiting to happen, and the solution with coalesce gives no significant advantage, my verdict is that

x = @x OR @x IS NULL

is to be preferred over coalesce. But if you want to cram maximum performance out of this solution, you may want to experiment, and even mix. (And to increase your options, you could try isnull as well!)
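That is, the isnull flavour of the same condition would be:

o.OrderID = isnull(@orderid, o.OrderID)

isnull takes exactly two arguments, but behaves like coalesce here – including the NULL trap. And for a nullable column such as Customers.Region, the safe way out is the OR form, since when @region is NULL the entire condition is true no matter what c.Region contains:

  AND  (c.Region = @region OR @region IS NULL)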
