Dynamic Search Conditions in T-SQL

When Caching Is Not Really What You Want

Before we move on to static SQL, I like to discuss a problematic situation that you may encounter. This far I have preached the virtue of caching. If two users search on the same criterias, they get the same plan with search procedures like search_orders_1 or search_orders_cs that use parameterised queries. But consider these two invocations:

EXEC search_orders_1 @custid = 'ERNTC',
                     @fromdate = '19980205', @todate = '19980205'
EXEC search_orders_1 @custid = 'BOLSR',
                     @fromdate = '19980101', @todate = '19981231'

In Northgale, ERNTC is the top customer with 572 orders, whereas there is one single order for BOLSR. As you may guess, the best query plan for these two invocations are not the same. When searching for orders from a frequent customer in a short time span, it's probably better to the index on OrderDate, but when searching for orders from a infrequent customer in a broad time span, the index on CustomerID is likely to be better. How to deal with this? I will discuss a couple of tricks that you can employ.

Interpolating Some Values

While interpolating all parameter values into the string is a bad idea, it can sometimes make sense to interpolate some parameter values directly. Say that the search had included a @status parameter, and there had only been four possible values for Orders.Status, whereof Completed had accounted for 95 % of the values. A typical search may be for new orders, less than 1 % of the rows. In this case, it would be a very good idea to interpolate the value of the @status parameter into the query string, since different status values could call for different query plans. This applies not the least if there is an index on the Status column.

What about situation above, the combination of customer and search interval? Interpolating the customer id into the string is probably a bad idea, if you have millions of customers. That's one query plan per customer, so you would reduce the cache reuse drastically. The date interval is a better candidate, assuming that users would tend to use the same intervals during a day, for instance for the last seven days. You would still add quite a few more query plans to the cache though. But old plans for last week would age out by time, so unless the usage pattern is extremely diverse, this could work out.

Changing the Query Text

SQL Server looks up a query in the cache by hashing the query text. (Queries not in stored procedures and similar, that is.) This means that two queries with different text are different entries in the cache, even if they are logically equivalent. There are many ways to alter the query text, for instance you could do something like:

IF @fromdate IS NOT NULL AND @todate IS NOT NULL
BEGIN
   SELECT @sql = @sql + CASE WHEN @fromdate = @todate
                             THEN ''
                             WHEN datediff(DAY, @fromdate, @todate) <= 7
                             THEN ' AND 2 = 2 '
                             WHEN datediff(DAY, @fromdate, @todate) <= 30
                             THEN ' AND 3 = 3 '
                             ...

The advantage with changing the query text depending on the parameter values over interpolating them into the query string is that you get fewer plans in the cache, at the price of that in some cases you will not run with the best plan. To wit, you can only make some guesses where the breakpoint between two plans are. With some luck, the damage from this is limited. Even if the plan for a single day is not the best for the span of a week, it may still be decently useful. But you will have to know your data, and possibly tune as time goes.

Could this technique also be used to deal with the fact that different customers can have a very different number of orders? Probably not. Counting the number of orders for a customer before we construct the query is taking it too far in my opinion, and it could be more expensive than what you save in the other end.

As for how to altering the query text, there are many alternatives. With the current implementation in SQL Server, it even works with adding spaces or comments. And SQL Server MVP Adam Machanic pointed out to me that you can also play with the parameter list – as a matter of fact the list is also part of the query text.

Index Hints

I added this here, because I figured that sooner or later someone would ask Could you not use an index hint? Maybe there are situations where they would make sense, but then it would more be the case that if the user provides an order id, no matter which, then we always want the optimizer to use PK_Orders. I find it difficult to see that there are very many situations where it would be useful to add an index hint depending on the input value. As always, only add index hints, if you have identified a situation where the optimizer picks the wrong plan, and no other workaround is available.

OPTION (RECOMPILE)

In SQL 2005 (not in SQL 2000 and earlier) you can add this hint at the end of the query. This instructs SQL Server to recompile the query each time, and in this case SQL Server will not put the plan into cache.

If you feel that the input values can vary so wildly that you want a recompile each time, and yet want to use parameterised queries to be safe from SQL injection, then this option is a good choice. You could also add it selectively, so for instance you would not add it if an order id is given, but in all other cases.

You might also like...

Comments

Contribute

Why not write for us? Or you could submit an event or a user group in your area. Alternatively just tell us what you think!

Our tools

We've got automatic conversion tools to convert C# to VB.NET, VB.NET to C#. Also you can compress javascript and compress css and generate sql connection strings.

“Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” - Brian Kernighan