This article was originally published at http://www.sommarskog.se/dyn-search.html and was republished here with permission.
A very common requirement in an information system is to have a function (or several functions) where the users are able to search the data by selecting freely among many possible criteria. This is a tough challenge, because not only must you produce the desired output, but you must also keep the response time within acceptable limits, at least for common searches. And on top of that, the code must be maintainable, so that you can meet new needs and requirements.
In this text I will look at various techniques to solve this problem. There are two main alternatives: dynamic SQL and static SQL. There are also hybrid solutions that use both. As soon as the number of possible search conditions is more than just a handful, dynamic SQL is the most effective solution in terms of performance, development and maintenance. On SQL 2000 and earlier, to use dynamic SQL in its pure form, you need to give the users direct SELECT permissions on the tables involved, and this is far from always permissible. SQL 2005 permits ways around this issue.
I will first look at using dynamic SQL, and try to point out some traps that you should avoid. I will then cover techniques for static SQL to give you a set of methods and tricks that you can combine to implement your search functions. Finally, I will present two hybrid solutions that use both dynamic and static SQL and in this way evade the permissions problem.
This text applies to all versions of SQL Server from SQL 7 and on.
Throughout this text, we will work with implementing a stored procedure that retrieves information about orders in the Northwind and Northgale databases, as well as the customers and the products of the orders. Northwind is a sample database that comes with SQL 7 and SQL 2000. Northwind does not ship with SQL 2005, but you can retrieve it from Microsoft's web site. Northgale is a bigger database that I have derived from Northwind, more about it below.
This is the interface that we expose to the user (or more probably to a GUI or middle-layer programmer):
CREATE PROCEDURE search_orders
                 @orderid   int          = NULL,
                 @fromdate  datetime     = NULL,
                 @todate    datetime     = NULL,
                 @minprice  money        = NULL,
                 @maxprice  money        = NULL,
                 @custid    nchar(5)     = NULL,
                 @custname  nvarchar(40) = NULL,
                 @city      nvarchar(15) = NULL,
                 @region    nvarchar(15) = NULL,
                 @country   nvarchar(15) = NULL,
                 @prodid    int          = NULL,
                 @prodname  nvarchar(40) = NULL AS

SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity,
       c.CustomerID, c.CompanyName, c.Address, c.City, c.Region,
       c.PostalCode, c.Country, c.Phone, p.ProductID,
       p.ProductName, p.UnitsInStock, p.UnitsOnOrder
FROM   Orders o
JOIN   [Order Details] od ON o.OrderID = od.OrderID
JOIN   Customers c ON o.CustomerID = c.CustomerID
JOIN   Products p ON p.ProductID = od.ProductID
WHERE  ???
ORDER  BY o.OrderID
You see in the SELECT list what information the user gets. Here is a specification of the parameters:
|@orderid||Retrieve this order only.|
|@fromdate||Retrieve orders made on @fromdate or later.|
|@todate||Retrieve orders made on @todate or earlier.|
|@minprice||Retrieve only order details that cost at least @minprice.|
|@maxprice||Retrieve only order details that cost at most @maxprice.|
|@custid||Retrieve only orders from this customer.|
|@custname||Retrieve only orders from customers whose name starts with @custname.|
|@city||Retrieve only orders from customers in this city.|
|@region||Retrieve only orders from customers in this region.|
|@country||Retrieve only orders from customers in this country.|
|@prodid||Retrieve only order details with this product.|
|@prodname||Retrieve only order details with a product starting with @prodname.|
If the user leaves out a search condition, that condition should not apply to the search. Thus, a plain
EXEC search_orders should return all orders in the database.
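To make the interface concrete, here are a couple of sample invocations. (The parameter values are illustrative only; whether a particular order ID or customer ID exists depends on the database you run against.)

-- Retrieve a single order by its ID.
EXEC search_orders @orderid = 11000

-- All orders from one customer within a date interval.
EXEC search_orders @custid = N'ALFKI', @fromdate = '19980101',
                   @todate = '19981231'

-- Orders for products whose name starts with 'Chai',
-- from customers in London.
EXEC search_orders @prodname = N'Chai', @city = N'London'

Note that callers name only the parameters they care about; all others fall back to their NULL defaults and are ignored by the search.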
This example is fairly simple, since each condition can be implemented with a single predicate using =, <=, >= or LIKE. In many real-life situations you have more complex conditions that affect which tables or columns you need to access. I have purposely left such conditions out of search_orders, in order to keep the focus on the main issues, and I only touch on more complex conditions in a few places. My hope is that when you have read this article, you will have better ground to stand on, so that you will be able to tackle the more complex conditions you run into on your own.
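As a foretaste of the static-SQL techniques discussed later, the classic way to make each single-predicate condition optional is to OR it with a NULL test on the parameter. This is only a sketch of the WHERE clause for search_orders; the performance implications of this pattern are examined later in the article:

WHERE (o.OrderID = @orderid OR @orderid IS NULL)
  AND (o.OrderDate >= @fromdate OR @fromdate IS NULL)
  AND (o.OrderDate <= @todate OR @todate IS NULL)
  AND (od.UnitPrice >= @minprice OR @minprice IS NULL)
  AND (od.UnitPrice <= @maxprice OR @maxprice IS NULL)
  AND (c.CustomerID = @custid OR @custid IS NULL)
  AND (c.CompanyName LIKE @custname + '%' OR @custname IS NULL)
  -- ...and so on for the remaining parameters.

When a parameter is NULL, its parenthesised condition is always true, so that condition drops out of the search, which is exactly the behaviour the specification calls for.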
Sometimes you may want to search for a list of values, like @prodid = '14,56,76'. The way to deal with this is to unpack the list with a table-valued function. This is not something I cover here, as I have an article, Arrays and Lists in SQL Server, on my web site that is devoted to this topic.
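To sketch the idea only: you would crack the list into a table and join against it. The function name and parameter below are placeholders, not something defined in this article; the Arrays and Lists article presents several real implementations you could substitute:

-- list_to_int_tbl is a hypothetical table-valued function that
-- turns '14,56,76' into a one-column table of int values.
SELECT od.OrderID, od.ProductID, od.UnitPrice, od.Quantity
FROM   [Order Details] od
JOIN   dbo.list_to_int_tbl(@prodids) AS l ON od.ProductID = l.number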
It is not uncommon that there is a requirement for the user to be able to specify how the output is to be sorted. As this article is long enough as it is, I will not dwell on this topic beyond this brief note. If you are using dynamic SQL, you would of course build the ORDER BY clause dynamically as well.
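A minimal sketch of building the ORDER BY dynamically, assuming a hypothetical extra parameter @sortcol that is not part of the search_orders interface above: validate the requested column against a fixed list before appending it, and never concatenate user input into the query string unchecked.

-- @sortcol is a hypothetical parameter, e.g. 'OrderDate'.
-- @sql is assumed to already hold the dynamically built query.
IF @sortcol NOT IN (N'OrderID', N'OrderDate', N'CustomerID')
   RAISERROR('Illegal sort column.', 16, 1)
ELSE
BEGIN
   SELECT @sql = @sql + N' ORDER BY o.' + @sortcol
   EXEC sp_executesql @sql   -- parameters passed as shown later
END

Because @sortcol is checked against a whitelist, a caller cannot smuggle arbitrary SQL into the ORDER BY clause.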
In the text I discuss a number of different implementations of search_orders, unimaginatively named search_orders_1 etc. Some of them are included in whole in this text, others only in part. All are available in the dynsearch directory on my web site. (The numbering of the procedures is somewhat out of order with the text. Originally there were seven of them, but as the article has evolved over time I've added more, and I have not renumbered the existing procedures.)
To know what I'm talking about when it comes to performance, I have benchmarked all my search procedures for six different search conditions. While I refer to my results in this article, I am not disclosing any numbers, because they are not generally applicable. How good or bad a solution is in a certain situation depends on available indexes, data distribution etc. Throughout this article I will stress that if you care about performance, it is necessary that you run your own benchmarks with production-size data. This applies not least if you are constrained to static SQL only.
The Northwind database is very small, so it is very difficult to draw conclusions about the performance from queries in this database. A query that one would expect to use an index, may in fact use a scan, simply because at those small volumes, the scan is cheaper.
For this reason I have composed the Northgale database. Northgale has the same tables and indexes as Northwind, but I have exploded the data so that instead of 830 orders, there are 344035 of them. To install Northgale, run Northgale.sql. The data in Northgale is taken from Northwind, so this database must be in place. To install Northgale, you need 4.6 GB of disk space on SQL 2005 and 2.6 GB on SQL 2000. Once installed, the database takes up 520 MB. (Log space needed when inserting the data is reclaimed at the end of the script.) By default, the database is installed in the same directory as the master database, but you can edit the script to change that.
I've exploded the data by cross-joining the tables. For IDs, I've composed combinations, so there are now, for instance, 6640 customers rather than the 91 in the original Northwind. I have also generated new customer and product names by permuting the parts of the names. However, I have not created new cities, countries or regions, so a search on a city alone can yield very many hits.
Keep in mind that Northgale too is a small database by today's standards. For instance, it easily fits entirely into cache on a reasonably equipped server. A poorly written query that requires a scan of, say, the Orders table still returns within a few seconds. It is hopefully big enough to give a sense of how good or bad different solutions are, but I would advise you not to draw far-reaching conclusions. It is also worth pointing out that, because of the way the database was composed, the distribution of data is a bit skewed.
When you implement a dynamic search function, you should always benchmark your solution against your production database, as well as inspect the query plans for common search cases.