User-defined SQL Server operations

This article was originally published in VSJ, which is now part of Developer Fusion.
SQL Server's Transact-SQL includes a number of functions that cover the generalities of data manipulation, but what happens when you want to go beyond the pre-provided functions? One answer is to write your own user-defined functions (UDFs). Currently, this option is only available if you're using SQL Server 2000, and there are a number of potential problems that you need to be aware of. Despite this, UDFs can increase your control over your data, and as long as you know the restrictions, you can work around them.

In general, if you want to work on the data within a SQL Server database, you have two choices. You can write code in another development environment, and enter your SQL commands as text containing SQL statements, then use some method to pass those SQL statements through for SQL Server to carry out. Alternatively, you can write directly in the programming language of SQL Server – Transact-SQL. Within Transact-SQL, you write code in procedures, and these procedures are stored as part of the database, hence their name of stored procedures.

Once created, other applications can make use of stored procedures, so instead of writing a complex SQL statement as embedded text in Visual Basic, your VB program can just tell SQL Server to run a stored procedure.

A UDF is nothing more than a Transact-SQL module that, like any function, takes a number of parameters and returns a value. The function exists as a permanent object in your database. In some ways a UDF is more limited than a stored procedure. It doesn't support output parameters, nor can you use it to make modifications to the database or move database cursors, apart from those that are local to the function.

So why would you choose to use a UDF rather than a stored procedure? The main reason is that you can call the UDF from within a SQL statement, so you can call it from a query, for example. You can also call UDFs using an Execute statement, in the same way you would call a stored procedure. Just to confuse the issue, SQL Server 2000 includes a number of built-in UDFs.

SQL Server 2000 supports two main types of UDFs – scalar and table. Scalar functions return a single answer whenever the function is called. A table function returns its answer in the form of what is effectively a temporary table that can then be used wherever a 'real' table is allowed. You might create a UDF that returns the employee ID and name for all the employees in a department, where the department code is supplied as the function's parameter.

Getting started

No matter which type of function you want, you create it in roughly the same way, using Transact SQL's Create Function statement. Here's a function that returns the text Hello World:

CREATE FUNCTION HiWorld ()
RETURNS varchar(12)
AS
BEGIN
	declare @retmess varchar(12)
	set @retmess='Hello world'
	RETURN @retmess
END
This should be reasonably self-evident. SQL expects its variables to be typed, so we've got a variable length character variable called @retmess. You have to tell Transact-SQL the type that the function will return, then give the body of the function, surrounded by BEGIN and END.

If you're not used to programming directly within SQL Server, you first you need to know what software you can use to write your Transact SQL UDF. First, the bad news. SQL Server 2000 will only let you write your UDFs in Transact SQL. So you can't nip off and write them from the Visual Basic development environment, for example. When Yukon (the next version of SQL Server) arrives things will probably be more civilised, but future goodies don't really help those of us programming right now.

If you're writing a lot of Transact SQL, you'll probably have invested in a suitable editor. If not, for the moment your best choice will be to use the SQL Server Enterprise Manager. Make sure you're viewing the database you want to add the UDF to (the sample database Pubs is the one I used). Highlight the User-Defined Functions item, right click, and choose 'New User Defined Function'. This gets you into an editor where you can fill in the blanks in a function that already has the basic command structure. It also has the advantage that you get a syntax checker.

Unfortunately, this function as it stands isn't particularly useful. It returns the string value Hello World (honest). It's just you don't see anything if you run it.

The most immediate way to see what's happening is to run the SQL Query Analyser (from the Enterprise Manager's Tools menu).

Screenshot
Testing the query in SQL Server's Query Analyser (Click to enlarge)

Make sure the results pane is visible (from the Window menu). Then, in the query pane, type:

Use Pubs
Print dbo.hiworld()
Press the f5 key to update the results, and you'll see the text 'Hello world' appear. The dbo preface to the name tells SQL Server the owner of the function, and is sometimes optional – if you hit problems by referencing a function that should work but doesn't, check to see who is referenced as the owner in SQL Server.

So how do you use a user defined function? There are various ways, but one of the most common is from a stored procedure. You can create a stored procedure in a similar way to creating a user defined function – by choosing the Stored Procedures entry in the database within SQL Server Enterprise Manager, right clicking, and choosing New Stored Procedure.

CREATE PROCEDURE testudf AS
print dbo.hiworld()
GO
This just prints the results returned by the user defined function hiworld. You can test this from the query analyser:
Use pubs
Exec dbo.testudf
Of course, most of the UDFs you're likely to want to write will do a lot more sophisticated things than return a text message, but at least you've written and run one. I don't advise being too over-enthusiastic about this victory with your colleagues, as they're likely to remind you that you used to be able to do the same thing in Qbasic in a single line, and that Qbasic cost nothing. Using £10,000 of database server and several different interfaces to get the same result isn't likely to be seen as a great advance in technology use.

Most functions are likely to take the values of the input parameters, manipulate them in some way, then output the results. Here is another user-defined function that takes the length and width of a rectangle and prints the area:

CREATE FUNCTION rectangle
(@length integer, @width integer)
RETURNS integer AS
BEGIN
	return (@length * @width)
END
And here is a stored procedure to find out the area of a rectangle 3 by 4:
CREATE PROCEDURE testudf AS
print dbo.rectangle(3,4)
GO
Most UDFs are used to return not a single value, though, but a temporary table. Here's a UDF that returns all the rows from the Sales table in Pubs for the book with a particular title ID:
CREATE FUNCTION getbook
(@titleID varchar (10) )
RETURNS TABLE AS
RETURN (
	SELECT * FROM dbo.Sales
	WHERE Title_ID= @TitleID
	)
This is what is known as an inline table function, hence the rather different structure of the body of the function without a begin or end. You can test this function from the SQL Query Analyser by typing:
USE pubs
SELECT * FROM dbo.getbook('BU1032')
In the results pane, you should see two whole lines of data returned.

So how can you make use of the UDF from your Visual Basic (or similar) application? The first step is to ensure your project has a connection to the database you're interested in. You can either do this using Visual Basic's data connection tools, or you can hard-code the connection into your program. Make sure you have the most recent ADO library included in your project references.

You can set up links from Visual Basic's Data View window (available from the View menu). This shows the current data links and data environment connections in your project, and you can also add new data links. The provider you want (for SQL Server) is OLE DB for SQL Server. Attach to the Pubs database, and you should be able to expand the data link to show the current contents of the database, including the stored procedures, though not the UDFs.

However, including the connection information in your code gives the most complete picture, so that's the method I'll use.

First of all, the dumb way – with a hard coded parameter to show the store IDs selling the book with the ID BU1032:

Private Sub Command1_Click()
	Dim cn As New ADODB.Connection
	Dim cmd As New ADODB.Command
	Dim rs As ADODB.Recordset
	cn.Open "Provider=sqloledb;" & _
		"Data Source=thorn;" & _
		"Initial Catalog=pubs;" & _
		"User Id=kay;Password=;"

	cmd.ActiveConnection = cn
	cmd.CommandText = "select * & _
		from dbo.getbook('BU1032')"
	Set rs = cmd.Execute()
	rs.MoveFirst
	Do
		MsgBox (rs!stor_id)
		rs.MoveNext
	Loop Until rs.EOF
	rs.Close
	cn.Close
End Sub
This is attached to a command button on a form.

Screenshot
The results in Visual Basic (Click to enlarge)

You should be able to recognise most elements. We're using ADO (nice and efficient as a data retrieval method), and the connection string is hard coded to use the server called Thorn, Pubs as the catalog, and the login ID of Kay. The command text is identical to that entered in the query analyser. To show you the data, I've used a message box that prints the store ID – with just two rows returned, this isn't going to be too time-consuming.

One minor point to notice is the use of the Do.. Loop to move through the records, rather than the alternative of using a loop along the lines of:

Numrec=rs.recordcount
For rcount=1 to numrec
What you need to know is that recordcount isn't a supported method when you're retrieving data from SQL. The reason is that SQL will pass records across in batches, so there's no way for VB to know how many records you'll eventually get. Instead, recordcount will rather unhelpfully return –1 as a value.

Having a function that can't take a value isn't a lot of good. Instead, we'll use a text box called gettitle where the user can enter the title ID they're interested in:

strcomm = "select * from" & _
	dbo.getbook('"
strcomm = strcomm & gettitle.Text & _
	"')"
cmd.CommandText = strcomm
Using scalar functions in VB is equally possible, though you do have to use VB's recordset in a way that seems a bit strange. The command:
Select something as colref
Creates a column in your recordset named colref. So if you have the following code, you get a recordset consisting of a single record, with a single column named mycol:
cmd.CommandText = & _
	"select dbo.rectangle(4,3)" & _
	as mycol"
Set rs = cmd.Execute()

rs.MoveFirst
Do
	MsgBox (rs!mycol)
	rs.MoveNext
Loop Until rs.EOF
This gets you a message box with the value 12. Add a couple of text boxes to your form, and use those for entries, and your function can be a bit more general.
cmd.CommandText = "select " & _
	"dbo.rectangle("
strcomm=strcomm & getlength.text & ","
strcomm=strcomm & getwidth.text & _
	") as mycol"

Performance issues

UDFs are a great idea, and once you've seen how easy they are to write and use, it's tempting to see them as the solution to all sorts of problems, but there are some performance issues to watch out for, particularly in the case of scalar UDFs. The reason is that in SQL Server, they emulate cursor type processing.

In other words, they work on a row-by-row basis, and SQL Server is optimised for working with sets such as tables or relations. Table functions provide a much better performance, so it's always worth seeing whether you can write your scalar UDF as a table function instead.

Having said that, it's worth remembering that the performance hit you get will depend on the size of your result set and the type of thing you're trying to calculate. If you're including a UDF in a Select statement that returns a few thousand rows, the performance overhead is hardly going to be a killer. If you're going to get a huge result set, you will notice the difference. Where you use your UDF will also have an effect – if it has to go in the Where clause of your Select, it will be used against the entire input data, so will also slow things down noticeably.

Table Functions

SQL Server 2000 supports two types of table functions – inline and multi-statement. In an inline function, you have a single Select statement for the body of the function, and you leave SQL to work out the details of the columns and the data types of the table. In a multi-statement function, you give the names of the columns and their data types. You also have the option of specifying a table alias. So our getbook function could be rewritten as:

CREATE FUNCTION getbuk
	(@titleID varchar (10) )
RETURNS @newbk TABLE
	(stor_id char(4)
	)
AS
BEGIN
	INSERT @newbk
	SELECT stor_id FROM dbo.sales
	WHERE Title_ID= @TitleID
	RETURN
END

Screenshot
Creating a UDF in SQL Server's Enterprise Server Manager (Click to enlarge)

While multi-line functions can offer more options for processing data, inline UDFs are nearly always more efficient than the same function written as a multi-statement UDF, so it's worth trying to make sure you write your table UDF as an inline function.

Conclusion

SQL Server's user defined functions aren't necessarily something you'd use every day, but you should remember they exist, because once in a while, they can rescue you from horrendous coding problems. They're more actively supported in Visual Studio .NET, and it's almost certain that the next release of SQL Server will remove some of their shortcomings.


Kay Ewbank, editor of Enterprise Server Magazine, is a highly experienced database analyst who has followed the development of database technology from dBase through to today's SQL servers.

You might also like...

Comments

Contribute

Why not write for us? Or you could submit an event or a user group in your area. Alternatively just tell us what you think!

Our tools

We've got automatic conversion tools to convert C# to VB.NET, VB.NET to C#. Also you can compress javascript and compress css and generate sql connection strings.

“The greatest performance improvement of all is when a system goes from not-working to working.” - John Ousterhout