Dynamic Search Conditions in T-SQL

Using TEMP tables

Sometimes you can make your code simpler by using a temp table or two. We shall here look at two possibilities: the first is really miserable in terms of performance, while the second yields a search_orders procedure that performs quite well for being static SQL.

Deleting the Misfits

This is a very simplistic method that may be OK for small data sets, but it will never perform well, and you should never consider it as your main solution. It may occasionally be defensible to use this method to implement search conditions that would otherwise be difficult to handle, for instance conditions that require access to tables that would otherwise not be involved.

The idea is that you make a basic selection into a temp table, and then you delete the rows that do not match the user's search conditions from the temp table. Here is an extract of search_orders_6 where I use this technique:

SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity,
       c.CustomerID, c.CompanyName, c.Address, c.City, c.Region,
       c.PostalCode, c.Country, c.Phone, p.ProductID,
       p.ProductName, p.UnitsInStock, p.UnitsOnOrder
INTO   #temp
FROM   Orders o
JOIN   [Order Details] od ON o.OrderID = od.OrderID
JOIN   Customers c ON o.CustomerID = c.CustomerID
JOIN   Products p ON p.ProductID = od.ProductID

IF @orderid IS NOT NULL
   DELETE #temp WHERE OrderID <> @orderid

IF @fromdate IS NOT NULL
   DELETE #temp WHERE OrderDate < @fromdate

...

IF @region IS NOT NULL
   DELETE #temp WHERE Region <> @region OR Region IS NULL
...
SELECT * FROM #temp ORDER BY OrderID

The code is so simple to write that you could do it in your sleep. The only thing to watch out for is nullable columns like Region: with only the condition Region <> @region, rows where Region is NULL would escape the delete and wrongly remain in the result, since NULL <> @region does not evaluate to true. This is why the DELETE above also tests Region IS NULL.

If you decide to use this method, it is essential that you make sure that the basic selection that you put in the temp table is moderate in size. search_orders_6 is OK in Northwind, but absolutely not in Northgale, where the initial SELECT will generate 2.3 million rows in the temp table.

There is a second performance issue to consider that particularly applies to SQL 7 and SQL 2000: recompilations. SQL Server may decide to recompile a procedure if the statistics of a table involved in a query have changed. This is particularly likely to happen with temp tables. Sometimes this is a good thing, but for a procedure like search_orders_6 it is likely to be a waste of time, since the temp table does not have any indexes. And since the procedure is long and has many queries, the recompilation itself could degrade performance. Since SQL 2005 has statement-level recompilation, the cost of recompilation is smaller there. (But I have not studied that in detail yet.)

One way to avoid these recompilations on SQL 2000 and SQL 2005 is to use table variables instead. Table variables do not have statistics, and therefore cannot cause recompilations. Then again, if you have a DELETE that involves another table, the statistics on the temp table can be useful.
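For illustration, here is a sketch of what a table-variable version of the skeleton above could look like. This is not from the example files; the column list is cut down, and the types are taken from Northwind. Note that SELECT INTO cannot target a table variable, so you have to spell out the definition and use INSERT:

-- Sketch only; assumes the same parameters as search_orders_6.
DECLARE @temp TABLE (OrderID   int          NOT NULL,
                     OrderDate datetime     NOT NULL,
                     Region    nvarchar(15) NULL)
                     -- ...and so on for the other output columns.

INSERT @temp (OrderID, OrderDate, Region)
   SELECT o.OrderID, o.OrderDate, c.Region
   FROM   Orders o
   JOIN   Customers c ON o.CustomerID = c.CustomerID

IF @orderid IS NOT NULL
   DELETE @temp WHERE OrderID <> @orderid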

In this example I gathered all the output columns in the temp table. But I could have let it suffice with OrderID, OrderDate, UnitPrice, CustomerID and ProductID, that is, the search columns from the big tables Orders and Order Details. Then, for a deletion on city or product name, I would have joined with Customers or Products. The advantage of this is that I would have kept the temp table down in size. It would still perform unacceptably slowly in Northgale, though.
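Here is a hypothetical sketch of what such a delete could look like with the slimmed-down temp table (again, this is not in search_orders_6):

IF @city IS NOT NULL
   -- Delete orders whose customer is not in the requested city.
   -- As with Region, the IS NULL test is needed because City is nullable.
   DELETE t
   FROM   #temp t
   JOIN   Customers c ON c.CustomerID = t.CustomerID
   WHERE  c.City <> @city OR c.City IS NULL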

Filtering from the Bottom and Up

This example uses a mix of strategies. It is more a demonstration that there is no clear-cut solution for implementing these searches in static code. Rather, you have to use your imagination, and even more importantly, your knowledge about the data at hand.

The procedure in the spotlight this time is search_orders_7. This is how the procedure starts:

-- Temp-table definitions reconstructed for context; column types assumed from Northwind:
CREATE TABLE #custids  (custid nchar(5) NOT NULL PRIMARY KEY)

CREATE TABLE #products (ProductID    int          NOT NULL PRIMARY KEY,
                        ProductName  nvarchar(40) NOT NULL,
                        UnitsInStock smallint     NULL,
                        UnitsOnOrder smallint     NULL)

IF @orderid IS NULL
BEGIN
   INSERT #custids (custid)
      SELECT CustomerID
      FROM   Customers c
      WHERE  (c.CustomerID = @custid OR @custid IS NULL)
        AND  (c.CompanyName LIKE @custname + '%' OR @custname IS NULL)
        AND  (c.City = @city OR @city IS NULL)
        AND  (c.Region = @region OR @region IS NULL)
        AND  (c.Country = @country OR @country IS NULL)
END

INSERT #products(ProductID, ProductName, UnitsInStock, UnitsOnOrder)
   SELECT ProductID, ProductName, UnitsInStock, UnitsOnOrder
   FROM   Products
   WHERE  (ProductID = @prodid OR @prodid IS NULL)
     AND  ProductName LIKE coalesce(@prodname + '%', '%')

That is, I fill two temp tables: one for customers and one for products. To show you different strategies, for customers I save only the customer id in that temp table, but for products I include the lot. By only including CustomerID, I keep the temp table smaller, but I need to join against Customers for the final query too, whereas for Products, the temp table has all I need.

To simplify, I make no attempt to use indexes here. It depends on your business, but tables for customers and products are typically considerably smaller than order tables.

As you see, I don't search Customers if the user has provided an order id, since in this case the customer is given by the order. (Note that the same does not apply to products!)

The next statement in this procedure reads:

EXEC search_orders_7_sub @orderid, @fromdate, @todate, @minprice, @maxprice

This procedure (available in the same file as search_orders_7) features the WITH RECOMPILE option and looks like this in abbreviated form:

-- Procedure header reconstructed for context; parameter types assumed from Northwind:
CREATE PROCEDURE search_orders_7_sub @orderid   int      = NULL,
                                     @fromdate  datetime = NULL,
                                     @todate    datetime = NULL,
                                     @minprice  money    = NULL,
                                     @maxprice  money    = NULL
WITH RECOMPILE AS

IF @orderid IS NULL
BEGIN
   SELECT ...
   FROM   Orders o
   JOIN   [Order Details] od ON o.OrderID = od.OrderID
   JOIN   #custids t ON o.CustomerID = t.custid
   JOIN   Customers c ON o.CustomerID = c.CustomerID
                     AND t.custid = c.CustomerID
   JOIN   #products p ON p.ProductID = od.ProductID
   WHERE  o.OrderDate >= coalesce(@fromdate, '17530101')
     AND  o.OrderDate <= coalesce(@todate, '99991231')
     AND  od.UnitPrice >= coalesce(@minprice, 0)
     AND  od.UnitPrice <= coalesce(@maxprice, 1E18)
   ORDER  BY o.OrderID
END
ELSE
BEGIN
   SELECT ...
   FROM   Orders o
   JOIN   [Order Details] od ON o.OrderID = od.OrderID
   JOIN   Customers c ON o.CustomerID = c.CustomerID
   JOIN   #products p ON p.ProductID = od.ProductID
   WHERE  o.OrderID = @orderid
     AND  od.OrderID = @orderid
     AND  od.UnitPrice >= coalesce(@minprice, 0)
     AND  od.UnitPrice <= coalesce(@maxprice, 1E18)
END

As in search_orders_4, I use different code paths for when an order id is given and when it is not. You can see that only when I do not have an order id do I include both #custids and Customers in the join. On the other hand, I always use #products, but never Products, as I have all the data I need in the temp table.

Why are there two procedures? I wanted to make sure that there is a recompilation for the final query, since the best plan will depend on the contents of the temp tables, which is why I put the queries against the temp tables in a subprocedure created WITH RECOMPILE. (And as with search_orders_4a, I could have had one procedure for each of the two final SELECT statements, to minimise how much is recompiled each time.)

Of course, on SQL 2005 I could keep it all in one procedure and use OPTION (RECOMPILE) instead to achieve the same effect.
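That is, the final queries would carry the hint directly. A sketch of the first branch (with an abbreviated column list, as above):

SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity,
       c.CompanyName, p.ProductName
FROM   Orders o
JOIN   [Order Details] od ON o.OrderID = od.OrderID
JOIN   #custids t ON o.CustomerID = t.custid
JOIN   Customers c ON o.CustomerID = c.CustomerID
                  AND t.custid = c.CustomerID
JOIN   #products p ON p.ProductID = od.ProductID
WHERE  o.OrderDate >= coalesce(@fromdate, '17530101')
  AND  o.OrderDate <= coalesce(@todate, '99991231')
  AND  od.UnitPrice >= coalesce(@minprice, 0)
  AND  od.UnitPrice <= coalesce(@maxprice, 1E18)
ORDER  BY o.OrderID
OPTION (RECOMPILE)   -- SQL 2005 and later; recompiles this statement only.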

When I ran my standard test batch in Northgale:

EXEC search_orders_7 @orderid = 11000
EXEC search_orders_7 @custid = 'ALFKI'
EXEC search_orders_7 @prodid = 76
EXEC search_orders_7 @prodid = 76, @custid = 'RATTC'
EXEC search_orders_7 @fromdate = '19980205', @todate = '19980209'
EXEC search_orders_7 @city = 'Bräcke', @prodid = 76

Compared to search_orders_3, the result was better in all cases on SQL 2000, and in most cases significantly better. On SQL 2005, the searches on order id and date interval were up to 50-70 % slower, whereas the search on city and product was about three times faster. (Recall here that running search_orders_3 WITH RECOMPILE had much more effect on SQL 2005 than on SQL 2000.) On both versions, about every invocation had a different query plan.

An interesting observation concerns the condition on UnitPrice. As you can see, I use the max/min method from Umachandar's Bag of Tricks. When I instead had the condition as:

AND (od.UnitPrice >= @minprice OR @minprice IS NULL)
AND (od.UnitPrice <= @maxprice OR @maxprice IS NULL)

the query plan for the fifth search (@fromdate = '19980205', @todate = '19980209') was different on SQL 2000 and much slower; it took 65 seconds to run the procedure with these parameters. There is no apparent reason why it should be like that. It only goes to show that when you use static SQL, you have to try different methods and turn every stone in your endeavour to get the best performance.

Note: the "good" plan I got for the search on the date interval was a very complex plan which included parallelism. My workstation has a single hyper-threaded CPU, and thus parallelism is possible. I guess that you can see different plans on machines with fewer or more CPUs.

x = @x AND @x IS NOT NULL

Not all search routines are as complex as our search_orders. Sometimes the dynamic search is confined to alternate keys, of which the user always supplies one. In this case, it is not too painful to use IF statements to handle the three different cases. But maybe the column list in the SELECT statement contains complex expressions that you don't want to repeat. You know already that this won't do:

WHERE (key1 = @key1 OR @key1 IS NULL)
  AND (key2 = @key2 OR @key2 IS NULL)
  AND (key3 = @key3 OR @key3 IS NULL)

As you have seen, this will yield a table scan. But what do you think about this:

WHERE (key1 = @key1 AND @key1 IS NOT NULL)
   OR (key2 = @key2 AND @key2 IS NOT NULL)
   OR (key3 = @key3 AND @key3 IS NOT NULL)

The logic here is the reverse: give me all rows that match any of the given conditions. From a performance point of view, this may look like the same thing, but it can in fact perform very well. The odds are very good that SQL Server will generate a plan which seeks the three key indexes and then merges the results, either by index concatenation or some other method. Here is the real big scoop: thanks to the condition @x IS NOT NULL, SQL Server adds a filter with a startup expression to the plan, so if the corresponding variable is NULL, SQL Server will not access that index at all.
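To make this concrete, here is a minimal hypothetical procedure that applies the pattern to Orders. (The name order_lookup is mine, not from the example files; the pattern relies on OrderID, CustomerID and EmployeeID all being indexed, which they are in Northwind.)

CREATE PROCEDURE order_lookup @orderid int      = NULL,
                              @custid  nchar(5) = NULL,
                              @empid   int      = NULL AS

SELECT OrderID, OrderDate, CustomerID, EmployeeID
FROM   Orders
WHERE  (OrderID    = @orderid AND @orderid IS NOT NULL)
   OR  (CustomerID = @custid  AND @custid  IS NOT NULL)
   OR  (EmployeeID = @empid   AND @empid   IS NOT NULL)
-- If all parameters are NULL, no rows are returned.

Run it with one parameter at a time and look at the plan: the seeks on the indexes for the other two columns should sit behind filters with startup expressions.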

Searching Orders

This method is directly applicable when users want to get back rows that match any of the search conditions. Alas, most search functions are like our search_orders: users want to get back rows that match all conditions. Nevertheless, to improve performance for our dynamic search, we can use the above as a starting point for such a search, under some circumstances:

  • The user must supply at least one search condition; the alternative "return all" is not available.
  • All search columns are in the same table.
  • All search columns are indexed.

This leads to search_orders_10, which due to these restrictions has a different interface from the other search procedures:

CREATE PROCEDURE search_orders_10  @orderid   int      = NULL,
                                   @custid    nchar(5) = NULL,
                                   @fromdate  datetime = NULL,
                                   @todate    datetime = NULL,
                                   @empid     smallint = NULL,
                                   @shipdate  datetime = NULL AS

IF @fromdate IS NOT NULL AND @todate IS NULL
   SELECT @todate = MAX(OrderDate) FROM Orders
ELSE IF @fromdate IS NULL AND @todate IS NOT NULL
   SELECT @fromdate = MIN(OrderDate) FROM Orders

SELECT o.OrderID, o.OrderDate, o.EmployeeID, o.ShippedDate,
       od.UnitPrice, od.Quantity,
       c.CustomerID, c.CompanyName, c.Address, c.City, c.Region,
       c.PostalCode, c.Country, c.Phone, p.ProductID,
       p.ProductName, p.UnitsInStock, p.UnitsOnOrder
FROM   (SELECT TOP 99.9999999 PERCENT
               o.OrderID, o.OrderDate, o.EmployeeID, o.ShippedDate,
               o.CustomerID
        FROM   Orders o
        WHERE  (o.OrderID     = @orderid   AND @orderid   IS NOT NULL)
           OR  (o.CustomerID  = @custid    AND @custid    IS NOT NULL)
           OR  (o.EmployeeID  = @empid     AND @empid     IS NOT NULL)
           OR  (o.ShippedDate = @shipdate  AND @shipdate  IS NOT NULL)
           OR  (o.OrderDate BETWEEN @fromdate AND @todate AND
                     @fromdate IS NOT NULL AND @todate IS NOT NULL)
        ) AS o
JOIN   [Order Details] od ON o.OrderID = od.OrderID
JOIN   Customers c ON o.CustomerID = c.CustomerID
JOIN   Products p ON p.ProductID = od.ProductID
WHERE  (o.OrderID     = @orderid   OR @orderid   IS NULL)
  AND  (o.CustomerID  = @custid    OR @custid    IS NULL)
  AND  (o.EmployeeID  = @empid     OR @empid     IS NULL)
  AND  (o.ShippedDate = @shipdate  OR @shipdate  IS NULL)
  AND  (o.OrderDate  >= @fromdate  OR @fromdate  IS NULL)
  AND  (o.OrderDate  <= @todate    OR @todate    IS NULL)
ORDER BY o.OrderID, od.ProductID

The way search_orders_10 works is that in the derived table we run a query that builds the union of the search conditions, and then we filter the output from the derived table in the regular way. The point of this complex arrangement is that we know SQL Server is able to run the query for the derived table efficiently with the help of indexes. As for @fromdate and @todate, I will come back to that.

In case you've never seen a derived table before: a derived table is a virtual temp table within the query. It is not necessarily materialised, and SQL Server may recast the computation order, as long as the result is the same.
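A trivial example, just to show the shape:

SELECT d.OrderID, d.OrderDate
FROM   (SELECT OrderID, OrderDate
        FROM   Orders
        WHERE  EmployeeID = 3) AS d
WHERE  d.OrderDate >= '19970101'

Logically, d behaves like a table, but the optimizer is free to fold it into the surrounding query.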

However, in this particular case, we really want SQL Server to compute the derived table, and that is the reason for the TOP 99.9999999 PERCENT. To wit, when I ran the procedure in Northgale on SQL 2000 without the TOP clause, the optimizer saw through my intentions and generated the plan as if the derived table wasn't there, with poor performance as the result. Interestingly enough, SQL 2005 was less smart and computed the derived table even without the TOP clause.

If you run for instance:

exec search_orders_10 @custid = 'ALFKI'

and look at the query plan, you will see that all the indexes appear in the plan. But if you let the mouse hover over the Filter operators, you will see that they include a startup expression, meaning that if the filter condition is not satisfied, that part of the plan is not executed for this query.

When I compared the execution times for search_orders_10 to the other methods, I got execution times that were on par with those for dynamic SQL – for the three test cases that search_orders_10 supports, that is.

Since the same plan is good for all search conditions, there should not be any need for WITH RECOMPILE – or at least so one likes to think. But when I ran:

exec search_orders_10 @custid = 'ALFKI', @shipdate = '19970825',
                      @empid = 501

and added WITH RECOMPILE to force a new plan, I got a different plan which involved parallelism. It is not that surprising – the more conditions the user adds, the more rows the derived table will hit, and at some point a table scan will seem more efficient to the optimizer. This can be problematic if the first user that performs a search specifies this many parameters: his plan will be left in the cache for everybody else. I don't really have any good standard suggestion for handling this case. But you need to consider the possibility that the first user enters many search conditions, and then use your own good judgement to determine whether to use WITH RECOMPILE, take a gamble that it does not happen often, or implement some other creative solution.

Why the Restrictions?

But why all these restrictions? The restriction that the user must specify at least one search condition follows from the logic of the method: without any search conditions at all, the derived table will not return any rows.

The other two restrictions are based on performance. If not all search columns are indexed, SQL Server will not use startup expressions and merge the different indexes: since at least one of the conditions will require a scan, the optimizer will pick a scan for everything.

If the search conditions are in different tables, the optimizer does not try to merge the results from seeks in the different tables with startup expressions; instead you get a one-size-fits-all plan. The file for search_orders_10 includes search_orders_10a and search_orders_10b, which add a @prodid parameter and a @custname parameter respectively. Neither of them has the performance of search_orders_10. (And without the performance, it is difficult to see the point of the increased complexity of this method.)

Note: this is not something I "know" – it's only based on my own observations when testing with Northgale. Maybe you have better luck than I have. But don't stay up all night trying to get the optimizer to use startup expressions for search conditions in different tables!

Searching on Ranges

As you can see in the code above, I have special handling of @fromdate and @todate: if only one of them is given, I set the other to a boundary value to close the range. The reason for this is that a condition like:

o.OrderDate >= @fromdate AND @fromdate IS NOT NULL

would hit too many rows when @fromdate is given on its own, so SQL Server would probably resort to scanning the table. I say "probably", because I did in fact not even have the guts to try it...

Handling Complex Conditions

In the example I'm using in this article, we always need to access all tables. But in many situations, there might be tables you only need to access if a certain input parameter is given. Or depending on some input parameter, you might need to access different tables.

With dynamic SQL, these conditions are fairly straightforward to handle. When you build the query, you include only the tables needed to satisfy the current values of the input parameters.

Some of the methods for static SQL can easily be extended to deal with this case. I'm mainly thinking of using IF statements and using temp tables. But there might also be additional tricks, depending on the situation. Here I will only cover one case, for which Phillipp Sumi pointed out a good solution.

Selection Based on Existence in Another Table

Let's add one more parameter to our procedure: @suppl_country. If this parameter is provided, the procedure should only return information about products whose supplier comes from the given country. You could implement this by joining to the Suppliers table, but Phillipp Sumi's suggestion was to use an EXISTS clause in this way:

@suppl_country IS NULL OR EXISTS (SELECT *
                                  FROM   Suppliers s
                                  WHERE  s.SupplierID = p.SupplierID
                                    AND  s.Country    = @suppl_country)

The point is that if @suppl_country is NULL, SQL Server will not access the Suppliers table at all. To illustrate this I wrote search_orders_9, which for the sake of simplicity is an extension of search_orders_3. To see what is going on, run this in Northwind:

SET STATISTICS IO ON
go
EXEC search_orders_9 @country = 'Sweden'
EXEC search_orders_9 @suppl_country = 'Sweden' 

If you look at the query plans, you will find that these are identical. But if you look at the output from SET STATISTICS IO ON, you will find something interesting:

Table 'Suppliers'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0.
Table 'Products'. ...
...

Table 'Suppliers'. Scan count 2075, logical reads 4150, physical reads 0, read-ahead reads 0.
Table 'Products'. ...
...

(This output is for SQL 2000. It looks somewhat different, and somewhat confusing, on SQL 2005.) Note the first of the two lines for Suppliers: SQL Server is not touching the table at all. This is because the operator the optimizer uses, Nested Loops/Left Semi Join, permits SQL Server to first evaluate the variable, and thus find that there is no need to access the table.

This might remind you of short-circuit evaluation in a language like C++, but T-SQL does not have any short-circuit mechanism. In fact, you could change the order of the conditions in the OR expression, and SQL Server would still not access Suppliers.

The standard word of caution applies here as well: you need to test that you actually get the query plan you expect. SQL Server may opt to handle the EXISTS clause in some different way and always access the table.
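One way of doing that is to look at the estimated plan in text form, for instance:

SET SHOWPLAN_TEXT ON
go
EXEC search_orders_9 @suppl_country = 'Sweden'
go
SET SHOWPLAN_TEXT OFF
go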
