Dynamic Search Conditions in T-SQL

Using both Static and Dynamic SQL

In this section will we look at solutions that uses both static and dynamic SQL. The idea is that you put all references to the tables inside a view or a table-valued function, and then use dynamic SQL to access the view or function. In this case users do not need SELECT permissions on the tables, only on the view or the function.

Of these two, using an inline-function is probably the most useful, but I will start with views, because... I'll tell you later.

Using Views

This is based on a suggestion I got from .Net MVP Eric Garza. The idea is that you define a view that exposes the data that the users are permitted to see, and then grant them SELECT access on the view instead.

From this, I defined the view search_order_view:

CREATE VIEW search_order_view AS
   SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity,
          c.CustomerID, c.CompanyName, c.Address, c.City, c.Region,
          c.PostalCode, c.Country, c.Phone, p.ProductID,
          p.ProductName, p.UnitsInStock, p.UnitsOnOrder
   FROM   Orders o
   JOIN   [Order Details] od ON o.OrderID = od.OrderID
   JOIN   Customers c ON o.CustomerID = c.CustomerID
   JOIN   Products p ON p.ProductID = od.ProductID
go

Then I wrote search_orders_8, which is very similar to search_order_1, but where I changed lines 18-28 to read:

SELECT @sql = 'SELECT * FROM dbo.search_order_view WHERE 1 = 1 '

This does not change performance, because SQL Server does not save any query plan for a view. Rather SQL Server expands the view into the query as it was a macro, so basically you get the original query from search_orders_1.

Rather than defining a view, I could have defined a inline table function. Just like views, inline table functions are macros that are expanded into the query plan. You could think of them as parameterised views.

In our example scenario, search_orders_view is kind of pointless, because even if you deprive the users SELECT access to Orders, Order Details, Customers and Products, the users would still have access to all data. However, consider the case that you cannot give SELECT access to the users, because users are only allowed to see data for certain customers, and that this is defined in a table called UserAcceessCustomers:

CREATE TABLE UserAccessCustomers
             (UserName   sysname   NOT NULL,
              CustomerID nchar(10) NOT NULL,
              CONSTRAINT pk_uca PRIMARY KEY (UserName, CustomerID))

UserName here is simply the same as the user name in sysusers in the database. In such case you can write the view as:

CREATE VIEW search_order_view AS
   SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity,
          c.CustomerID, c.CompanyName, c.Address, c.City, c.Region,
          c.PostalCode, c.Country, c.Phone, p.ProductID,
          p.ProductName, p.UnitsInStock, p.UnitsOnOrder
   FROM   Orders o
   JOIN   [Order Details] od ON o.OrderID = od.OrderID
   JOIN   Customers c ON o.CustomerID = c.CustomerID
   JOIN   UsersAccessCustomers u ON u.CustomerID = c.CustomerID
                                AND u.CustomerID = o.CustomerID
   JOIN   Products p ON p.ProductID = od.ProductID
   WHERE  u.UserName = USER

USER on the last line is a built-in T-SQL function that returns the name of the current user in the current database.

Note: the above is how you normally implement row-level security in SQL Server. SQL Server MVP Steve Kass has found that it still is possible for a user to find information about data he is not supposed to see. As I mentioned above, a view is just a kind of macro, and the query optimizer is free to evaluate the tables in the expanded query in any order, something a non-privileged user can use to his advantage. It is not the case that he can access the data directly, but he can draw conclusions by looking at query plans, error messages etc. The attack requires a bit of skill, but if it is critical that users do not get the smallest glimpse of data they have no rights to see, you should not grant them SELECT permissions to views or inline functions like the one above. (A multi-step function has a fixed query plan of its own, so it should be safe from this attack. But then SQL Server would also have to materialise, if only in a table variable, the complete set of data that the user is entitled to see, which is likely to affect performance. And it would be pointless to combine a multi-step function with dynamic SQL.)

Eric Garza arrived at this solution from another angle: he wanted to deny users access on big tables so they could not submit queries that would scan the tables and block other users. Instead he provided a number of user-defined functions, where the users are required to specify certain input parameters, so whatever queries they submitted, he could be sure that SQL Server would use indexes.

Using Inline Table Functions

We now arrive at the solution that is the most challenging for me as an author, because while this method provides a solution to using dynamic SQL without direct SELECT permissions on the tables, it also requires you to have full understanding of how it works and why it works. Else you could end up implement it in a way that is not any more effective than purely static solutions. And to get the full power from this solution, you will need to break the rules.

This solution works on SQL 2000 and SQL 2005, but not SQL 7 which does not have user-defined functions.

This method was discovered by SQL Server MVP Marcello Poletti. He found that if he put a query like the one in search_orders_3 in an inline-table function, and then ran various SELECT statements against it, he would always get the optimal plan, as if he had used dynamic SQL. To get this in our order-search perspective, let's look at this function:

CREATE FUNCTION  search_orders_11fun (
                 @orderid   int          = NULL,
                 @fromdate  datetime     = NULL,
                 @todate    datetime     = NULL,
                 @minprice  money        = NULL,
                 @maxprice  money        = NULL,
                 @custid    nchar(5)     = NULL,
                 @custname  nvarchar(40) = NULL,
                 @city      nvarchar(15) = NULL,
                 @region    nvarchar(15) = NULL,
                 @country   nvarchar(15) = NULL,
                 @prodid    int          = NULL,
                 @prodname  nvarchar(40) = NULL) RETURNS TABLE AS
RETURN(
   SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity,
          c.CustomerID, c.CompanyName, c.Address, c.City, c.Region,
          c.PostalCode, c.Country, c.Phone, p.ProductID,
          p.ProductName, p.UnitsInStock, p.UnitsOnOrder
   FROM   Orders o
   JOIN   [Order Details] od ON o.OrderID = od.OrderID
   JOIN   Customers c ON o.CustomerID = c.CustomerID
   JOIN   Products p ON p.ProductID = od.ProductID
   WHERE  (o.OrderID = @orderid OR @orderid IS NULL)
     AND  (o.OrderDate >= @fromdate OR @fromdate IS NULL)
     AND  (o.OrderDate <= @todate OR @todate IS NULL)
     AND  (od.UnitPrice >= @minprice OR @minprice IS NULL)
     AND  (od.UnitPrice <= @maxprice OR @maxprice IS NULL)
     AND  (o.CustomerID = @custid OR @custid IS NULL)
     AND  (c.CompanyName LIKE @custname + '%' OR @custname IS NULL)
     AND  (c.City = @city OR @city IS NULL)
     AND  (c.Region = @region OR @region IS NULL)
     AND  (c.Country = @country OR @country IS NULL)
     AND  (od.ProductID = @prodid OR @prodid IS NULL)
     AND  (p.ProductName LIKE @prodname + '%' OR @prodname IS NULL)
)

If you invoke it as

SELECT * FROM search_orders_11fun(11000, NULL, NULL, NULL,
                                  NULL,  NULL, NULL, NULL,
                                  NULL,  NULL, NULL, NULL)
ORDER BY OrderID

And compare with the plan for

EXEC search_orders_1 @orderid = 11000

You will see that the plans are exactly the same. The same is true for these two:

SELECT * FROM search_orders_11fun(NULL, NULL, NULL, NULL,
                                  NULL, N'ALFKI', NULL, NULL,
                                  NULL, NULL, NULL, NULL)
ORDER BY OrderID

EXEC search_orders_1 @custid = N'ALFKI'

Why is this? Recall that inline-table functions are just macros, so running queries against the inline-function is just the same as running the query inside the table function directly with the parameters replaced by the actual values. (Except in one important aspect: permissions. Users only need SELECT permission on the table function; not on the tables.) So when we search for an order id, the optimizer works with this WHERE clause:

   WHERE  (o.OrderID = 11000 OR 11000 IS NULL)
     AND  (o.OrderDate >= NULL OR NULL IS NULL)
     AND  (o.OrderDate <= NULL OR NULL IS NULL)
     ...

and the optimizer is smart enough to ignore all that NULL IS NULL. This is what the optimizer should have done, when we added OPTION (RECOMPILE) in search_order_3, but didn't.

At this point, the reader may ask: what this has to do with dynamic SQL? After all, we have a perfect solution, haven't we? Yes, but you will need to invoke it in some way, and this is where the dynamic SQL comes in. Say that were you to put this in an application, and to make it simple you do (assuming .Net and SqlClient):

 cmd = "SELECT OrderID, ..
        FROM   search_orders_11fun(@orderid, @fromdate, @todate, ...

and then you go on defining the parameters, only passing the values the user actually have provided. Now you lose. This is going to work just as poorly as search_orders_3 without WITH RECOMPILE, because now you have a parameterised query for which the plan will be cached, and the plan must work for all search conditions, so it needs a safe catch-all plan. If you add OPTION (RECOMPILE) to the query (in SQL 2005), you get the same result as calling search_orders_3 WITH RECOMPILE. Because, essentially, you have now implemented search_orders_3 in a different way.

When you invoke the function you must pass literal NULL (or DEFAULT) for the parameters where the user has not supplied any values. Thus, the query string will be different depending on user input, and what is that if not dynamic SQL?

Equipped with this knowledge, I went ahead and wrote search_orders_11, which starts off:

SELECT @sql =
       'SELECT OrderID, OrderDate, UnitPrice, Quantity,
               CustomerID, CompanyName, Address, City, Region,
               PostalCode, Country, Phone, ProductID,
               ProductName, UnitsInStock, UnitsOnOrder
        FROM   dbo.search_orders_11fun (' +

        CASE WHEN @orderid IS NOT NULL
             THEN '@orderid'
             ELSE 'DEFAULT'
        END + ', ' +

and then uses sp_executesql to invoke the function. But when I looked at the query plans, they were far more complex than the ideal plans for search_orders_1. On SQL 2000, performance was still much better than for search_orders_3, but on SQL 2005, search_orders_11 was only marginally better than search_orders_3 run WITH RECOMPILE. What is going on? If you look closely, you will see that there is a difference to the direct invocation above. When we pass an order id, the optimizer works with this:

   WHERE  (o.OrderID = @orderid OR @orderid IS NULL)
     AND  (o.OrderDate >= NULL  OR NULL IS NULL)
     AND  (o.OrderDate <= NULL  OR NULL IS NULL)
     ...

Thus, the optimizer must pick a plan that works even if @orderid is NULL - the optimizer cannot know that we will never query the function in this way. And, again, adding OPTION (RECOMPILE) does not help.

So if we want to use this function efficiently, we must expand all values into the query string – something which is normally considered bad practice for two reasons: SQL injection and query-plan reuse. In the section on dynamic SQL, I was cool on using EXEC() and interpolated parameter values, but this is precisely what we need to use here. I wrote search_orders_11a, which starts off:

SELECT @sql =
       'SELECT OrderID, OrderDate, UnitPrice, Quantity,
               CustomerID, CompanyName, Address, City, Region,
               PostalCode, Country, Phone, ProductID,
               ProductName, UnitsInStock, UnitsOnOrder
        FROM   dbo.search_orders_11fun (' +

        CASE WHEN @orderid IS NOT NULL
             THEN str(@orderid)
             ELSE 'DEFAULT'
        END + ', ' +

        CASE WHEN @fromdate IS NOT NULL
             THEN quotename(@fromdatestr, '''')
             ELSE 'DEFAULT'
        END + ', ' +
        ...  

If you did not read the section Using EXEC() very closely, you need to go back and study the detailed comments on search_orders_2 to learn about all the pitfalls with interpolating parameter values into the query string.

When I compared search_orders_11a to search_orders_1, the execution plans were the same in all cases. However, search_orders_11a is a less efficient in one aspect: since we expand input parameters, there is a compilation almost each time and there is a risk that you litter the cache with very many similar plans. To avoid this, on SQL 2005 you can use OPTION (RECOMPILE) to prevent SQL Server from putting the plans into cache at all.

On SQL 2005, there is still one trap you can fall into, even with search_orders_11a: forced parameterisation. Normally, SQL Server auto-parameterises only simple queries, like SELECT ... FROM Orders WHERE OrderID = 11000, so they enter the cache, as if they had been submitted with sp_executesql and a parameter. A query like ours from search_orders_11fun is not likely to be victim to this. But on SQL 2005, a database can be set to forced parameterisation, and with this setting SQL Server parameterises about all ad-hoc queries that comes its way. Forced parameterisation is, in my opinion, a cover-up for poorly designed applications which use queries with  interpolated parameter values. For such an application, forced parameterisation can do miracles for performance. But for search_orders_11a, forced parameterisation is bad news: it converts the query on search_orders_11fun to the query produced by search_orders_11 with sp_executesql. Since you may not know what will happen with a database in production, this is a little devilish. There two possible measures to take to prevent any sort of auto-parameterisation 1) Use OPTION (RECOMPILE). 2) Use sp_executesql and a parameter list with an unused parameter. The difference is that in the second case, the query is put into cache (which is most cases is of dubious value).

Here I have used T-SQL stored procedures to compose the dynamic SQL, but you could of course build the queries against your function in the CLR, or in the client code. The important thing to keep in mind is that to get the best performance with this method you must break the rules: use queries with interpolated parameters.

The final conclusion about this method it that it is a very serious alternative on SQL 2000 when you cannot give users direct SELECT permissions on the tables. If you can giver users this permission on the other hand, pure dynamic SQL as presented in the beginning of this article is still better, since you get better protection against SQL injection and better query-plan reuse. On SQL 2005, you can always work around the permissions issue with certificates or EXECUTE AS, but you may feel that this method gives you less hassle.

Conclusion

You have now seen several ways to implement this kind of searches, both in dynamic SQL and static SQL. You have seen that with dynamic SQL you can get the best performance, while still keeping your code maintainable. With static SQL you have to make compromises to get something which performs decently, and still is maintainable. You have also seen that for static SQL, you need to be creative, and see which tricks that are best for the search you are about to implement. And you have seen how it is possible to combine static SQL with dynamic SQL to almost get the full powers of dynamic SQL, at the cost of violating what is normally good practice.

And let me stress once more that, no matter whether you go for dynamic or static SQL, you should test your procedure for all in input parameters and preferably some combinations too, both for correctness and for performance. And to test the performance, you need data which resembles production data. If you expect to have ten million orders and 50.000 customers in production, you cannot play with a toy database at the size of Northwind or even Northgale.

Feedback and Acknowledgements

My thanks to SQL Server MVPs Steve Kass, Marcello Poletti, Simon Sabin, Adam Machanic, and Umachandar Jaychandran, Phillipp Sumi, Thomas Joseph Olaes, Simon Hayes, Sloan Holliday, Alejandro Mesa and .Net MVP Eric Garza for their very valuable suggestions. Thanks to SQL Server MVP Frank Kalis for providing a translation in German.

If you have questions or comments on the contents in the article, feel free to mail me at [email protected]. If you are working with a specific problem and need help, you can mail me too. However, I would encourage you in such case to post your question on a public forum for SQL Server, as there are more people who can answer your questions. (And you may get your answer more rapidly!)

You might also like...

Comments

Contribute

Why not write for us? Or you could submit an event or a user group in your area. Alternatively just tell us what you think!

Our tools

We've got automatic conversion tools to convert C# to VB.NET, VB.NET to C#. Also you can compress javascript and compress css and generate sql connection strings.

“The generation of random numbers is too important to be left to chance.” - Robert R. Coveyou