In this section will we look at solutions that uses both static and dynamic SQL. The idea is that you put all references to the tables inside a view or a table-valued function, and then use dynamic SQL to access the view or function. In this case users do not need SELECT permissions on the tables, only on the view or the function.
Of these two, using an inline-function is probably the most useful, but I will start with views, because... I'll tell you later.
Using Views
This is based on a suggestion I got from .Net MVP Eric Garza. The idea is that you define a view that exposes the data that the users are permitted to see, and then grant them SELECT access on the view instead.
From this, I defined the view search_order_view:
CREATE VIEW search_order_view AS SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity, c.CustomerID, c.CompanyName, c.Address, c.City, c.Region, c.PostalCode, c.Country, c.Phone, p.ProductID, p.ProductName, p.UnitsInStock, p.UnitsOnOrder FROM Orders o JOIN [Order Details] od ON o.OrderID = od.OrderID JOIN Customers c ON o.CustomerID = c.CustomerID JOIN Products p ON p.ProductID = od.ProductID go
Then I wrote search_orders_8, which is very similar to search_order_1, but where I changed lines 18-28 to read:
SELECT @sql = 'SELECT * FROM dbo.search_order_view WHERE 1 = 1 '
This does not change performance, because SQL Server does not save any query plan for a view. Rather SQL Server expands the view into the query as it was a macro, so basically you get the original query from search_orders_1.
Rather than defining a view, I could have defined a inline table function. Just like views, inline table functions are macros that are expanded into the query plan. You could think of them as parameterised views.
In our example scenario, search_orders_view is kind of pointless, because even if you deprive the users SELECT access to Orders, Order Details, Customers and Products, the users would still have access to all data. However, consider the case that you cannot give SELECT access to the users, because users are only allowed to see data for certain customers, and that this is defined in a table called UserAcceessCustomers:
CREATE TABLE UserAccessCustomers (UserName sysname NOT NULL, CustomerID nchar(10) NOT NULL, CONSTRAINT pk_uca PRIMARY KEY (UserName, CustomerID))
UserName here is simply the same as the user name in sysusers in the database. In such case you can write the view as:
CREATE VIEW search_order_view AS SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity, c.CustomerID, c.CompanyName, c.Address, c.City, c.Region, c.PostalCode, c.Country, c.Phone, p.ProductID, p.ProductName, p.UnitsInStock, p.UnitsOnOrder FROM Orders o JOIN [Order Details] od ON o.OrderID = od.OrderID JOIN Customers c ON o.CustomerID = c.CustomerID JOIN UsersAccessCustomers u ON u.CustomerID = c.CustomerID AND u.CustomerID = o.CustomerID JOIN Products p ON p.ProductID = od.ProductID WHERE u.UserName = USER
USER on the last line is a built-in
Note: the above is how you normally implement row-level security in SQL Server. SQL Server MVP Steve Kass has found that it still is possible for a user to find information about data he is not supposed to see. As I mentioned above, a view is just a kind of macro, and the query optimizer is free to evaluate the tables in the expanded query in any order, something a non-privileged user can use to his advantage. It is not the case that he can access the data directly, but he can draw conclusions by looking at query plans, error messages etc. The attack requires a bit of skill, but if it is critical that users do not get the smallest glimpse of data they have no rights to see, you should not grant them SELECT permissions to views or inline functions like the one above. (A multi-step function has a fixed query plan of its own, so it should be safe from this attack. But then SQL Server would also have to materialise, if only in a table variable, the complete set of data that the user is entitled to see, which is likely to affect performance. And it would be pointless to combine a multi-step function with dynamic SQL.)
Eric Garza arrived at this solution from another angle: he wanted to deny users access on big tables so they could not submit queries that would scan the tables and block other users. Instead he provided a number of user-defined functions, where the users are required to specify certain input parameters, so whatever queries they submitted, he could be sure that SQL Server would use indexes.
Using Inline Table Functions
We now arrive at the solution that is the most challenging for me as an author, because while this method provides a solution to using dynamic SQL without direct SELECT permissions on the tables, it also requires you to have full understanding of how it works and why it works. Else you could end up implement it in a way that is not any more effective than purely static solutions. And to get the full power from this solution, you will need to break the rules.
This solution works on SQL 2000 and SQL 2005, but not SQL 7 which does not have user-defined functions.
This method was discovered by SQL Server MVP Marcello Poletti. He found that if he put a query like the one in search_orders_3 in an inline-table function, and then ran various SELECT statements against it, he would always get the optimal plan, as if he had used dynamic SQL. To get this in our order-search perspective, let's look at this function:
CREATE FUNCTION search_orders_11fun ( @orderid int = NULL, @fromdate datetime = NULL, @todate datetime = NULL, @minprice money = NULL, @maxprice money = NULL, @custid nchar(5) = NULL, @custname nvarchar(40) = NULL, @city nvarchar(15) = NULL, @region nvarchar(15) = NULL, @country nvarchar(15) = NULL, @prodid int = NULL, @prodname nvarchar(40) = NULL) RETURNS TABLE AS RETURN( SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity, c.CustomerID, c.CompanyName, c.Address, c.City, c.Region, c.PostalCode, c.Country, c.Phone, p.ProductID, p.ProductName, p.UnitsInStock, p.UnitsOnOrder FROM Orders o JOIN [Order Details] od ON o.OrderID = od.OrderID JOIN Customers c ON o.CustomerID = c.CustomerID JOIN Products p ON p.ProductID = od.ProductID WHERE (o.OrderID = @orderid OR @orderid IS NULL) AND (o.OrderDate >= @fromdate OR @fromdate IS NULL) AND (o.OrderDate <= @todate OR @todate IS NULL) AND (od.UnitPrice >= @minprice OR @minprice IS NULL) AND (od.UnitPrice <= @maxprice OR @maxprice IS NULL) AND (o.CustomerID = @custid OR @custid IS NULL) AND (c.CompanyName LIKE @custname + '%' OR @custname IS NULL) AND (c.City = @city OR @city IS NULL) AND (c.Region = @region OR @region IS NULL) AND (c.Country = @country OR @country IS NULL) AND (od.ProductID = @prodid OR @prodid IS NULL) AND (p.ProductName LIKE @prodname + '%' OR @prodname IS NULL) )
If you invoke it as
SELECT * FROM search_orders_11fun(11000, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL) ORDER BY OrderID
And compare with the plan for
EXEC search_orders_1 @orderid = 11000
You will see that the plans are exactly the same. The same is true for these two:
SELECT * FROM search_orders_11fun(NULL, NULL, NULL, NULL, NULL, N'ALFKI', NULL, NULL, NULL, NULL, NULL, NULL) ORDER BY OrderID EXEC search_orders_1 @custid = N'ALFKI'
Why is this? Recall that inline-table functions are just macros, so running queries against the inline-function is just the same as running the query inside the table function directly with the parameters replaced by the actual values. (Except in one important aspect: permissions. Users only need SELECT permission on the table function; not on the tables.) So when we search for an order id, the optimizer works with this WHERE clause:
WHERE (o.OrderID = 11000 OR 11000 IS NULL) AND (o.OrderDate >= NULL OR NULL IS NULL) AND (o.OrderDate <= NULL OR NULL IS NULL) ...
and the optimizer is smart enough to ignore all that NULL IS NULL. This is what the optimizer should have done, when we added OPTION (RECOMPILE) in search_order_3, but didn't.
At this point, the reader may ask: what this has to do with dynamic SQL? After all, we have a perfect solution, haven't we? Yes, but you will need to invoke it in some way, and this is where the dynamic SQL comes in. Say that were you to put this in an application, and to make it simple you do (assuming .Net and SqlClient):
cmd = "SELECT OrderID, .. FROM search_orders_11fun(@orderid, @fromdate, @todate, ...
and then you go on defining the parameters, only passing the values the user actually have provided. Now you lose. This is going to work just as poorly as search_orders_3 without WITH RECOMPILE, because now you have a parameterised query for which the plan will be cached, and the plan must work for all search conditions, so it needs a safe catch-all plan. If you add OPTION (RECOMPILE) to the query (in SQL 2005), you get the same result as calling search_orders_3 WITH RECOMPILE. Because, essentially, you have now implemented search_orders_3 in a different way.
When you invoke the function you must pass literal NULL (or DEFAULT) for the parameters where the user has not supplied any values. Thus, the query string will be different depending on user input, and what is that if not dynamic SQL?
Equipped with this knowledge, I went ahead and wrote search_orders_11, which starts off:
SELECT @sql = 'SELECT OrderID, OrderDate, UnitPrice, Quantity, CustomerID, CompanyName, Address, City, Region, PostalCode, Country, Phone, ProductID, ProductName, UnitsInStock, UnitsOnOrder FROM dbo.search_orders_11fun (' + CASE WHEN @orderid IS NOT NULL THEN '@orderid' ELSE 'DEFAULT' END + ', ' +
and then uses sp_executesql to invoke the function. But when I looked at the query plans, they were far more complex than the ideal plans for search_orders_1. On SQL 2000, performance was still much better than for search_orders_3, but on SQL 2005, search_orders_11 was only marginally better than search_orders_3 run WITH RECOMPILE. What is going on? If you look closely, you will see that there is a difference to the direct invocation above. When we pass an order id, the optimizer works with this:
WHERE (o.OrderID = @orderid OR @orderid IS NULL) AND (o.OrderDate >= NULL OR NULL IS NULL) AND (o.OrderDate <= NULL OR NULL IS NULL) ...
Thus, the optimizer must pick a plan that works even if @orderid is NULL - the optimizer cannot know that we will never query the function in this way. And, again, adding OPTION (RECOMPILE) does not help.
So if we want to use this function efficiently, we must expand all values into the query string – something which is normally considered bad practice for two reasons: SQL injection and query-plan reuse. In the section on dynamic SQL, I was cool on using
SELECT @sql = 'SELECT OrderID, OrderDate, UnitPrice, Quantity, CustomerID, CompanyName, Address, City, Region, PostalCode, Country, Phone, ProductID, ProductName, UnitsInStock, UnitsOnOrder FROM dbo.search_orders_11fun (' + CASE WHEN @orderid IS NOT NULL THEN str(@orderid) ELSE 'DEFAULT' END + ', ' + CASE WHEN @fromdate IS NOT NULL THEN quotename(@fromdatestr, '''') ELSE 'DEFAULT' END + ', ' + ...
If you did not read the section Using
When I compared search_orders_11a to search_orders_1, the execution plans were the same in all cases. However, search_orders_11a is a less efficient in one aspect: since we expand input parameters, there is a compilation almost each time and there is a risk that you litter the cache with very many similar plans. To avoid this, on SQL 2005 you can use OPTION (RECOMPILE) to prevent SQL Server from putting the plans into cache at all.
On SQL 2005, there is still one trap you can fall into, even with search_orders_11a: forced parameterisation. Normally, SQL Server auto-parameterises only simple queries, like SELECT ... FROM Orders WHERE OrderID = 11000
, so they enter the cache, as if they had been submitted with sp_executesql and a parameter. A query like ours from search_orders_11fun is not likely to be victim to this. But on SQL 2005, a database can be set to forced parameterisation, and with this setting SQL Server parameterises about all ad-hoc queries that comes its way. Forced parameterisation is, in my opinion, a cover-up for poorly designed applications which use queries with interpolated parameter values. For such an application, forced parameterisation can do miracles for performance. But for search_orders_11a, forced parameterisation is bad news: it converts the query on search_orders_11fun to the query produced by search_orders_11 with sp_executesql. Since you may not know what will happen with a database in production, this is a little devilish. There two possible measures to take to prevent any sort of auto-parameterisation 1) Use OPTION (RECOMPILE). 2) Use sp_executesql and a parameter list with an unused parameter. The difference is that in the second case, the query is put into cache (which is most cases is of dubious value).
Here I have used
The final conclusion about this method it that it is a very serious alternative on SQL 2000 when you cannot give users direct SELECT permissions on the tables. If you can giver users this permission on the other hand, pure dynamic SQL as presented in the beginning of this article is still better, since you get better protection against SQL injection and better query-plan reuse. On SQL 2005, you can always work around the permissions issue with certificates or EXECUTE AS, but you may feel that this method gives you less hassle.
Conclusion
You have now seen several ways to implement this kind of searches, both in dynamic SQL and static SQL. You have seen that with dynamic SQL you can get the best performance, while still keeping your code maintainable. With static SQL you have to make compromises to get something which performs decently, and still is maintainable. You have also seen that for static SQL, you need to be creative, and see which tricks that are best for the search you are about to implement. And you have seen how it is possible to combine static SQL with dynamic SQL to almost get the full powers of dynamic SQL, at the cost of violating what is normally good practice.
And let me stress once more that, no matter whether you go for dynamic or static SQL, you should test your procedure for all in input parameters and preferably some combinations too, both for correctness and for performance. And to test the performance, you need data which resembles production data. If you expect to have ten million orders and 50.000 customers in production, you cannot play with a toy database at the size of Northwind or even Northgale.
Feedback and Acknowledgements
My thanks to SQL Server MVPs Steve Kass, Marcello Poletti, Simon Sabin, Adam Machanic, and Umachandar Jaychandran, Phillipp Sumi, Thomas Joseph Olaes, Simon Hayes, Sloan Holliday, Alejandro Mesa and .Net MVP Eric Garza for their very valuable suggestions. Thanks to SQL Server MVP Frank Kalis for providing a translation in German.
If you have questions or comments on the contents in the article, feel free to mail me at [email protected]. If you are working with a specific problem and need help, you can mail me too. However, I would encourage you in such case to post your question on a public forum for SQL Server, as there are more people who can answer your questions. (And you may get your answer more rapidly!)
Comments