What is a lock?The reason databases need locks is to handle circumstances where two people want to work with the same database record at the same time. If two people try to change the same record, how can you decide who’s changes should be saved? Similarly, if one user tries to see the information in a record while another user is changing it, the ‘wrong’ information might be viewed. So databases lock records.
TransactionsWhether you know it or not, actions that you carry out on your databases are transactions. A transaction is a complete action on a database. It can be small and simple – an UPDATE statement, for example – or it can be more complex. When you want a more complex set of actions, the transaction is started with the statement:
BEGIN TRANSACTION…and ended by:
COMMITNo matter how complex, whenever you ask SQL to carry out a transaction, a lock is applied, and it is maintained throughout the time the transaction is happening. This is fine when you have a transaction that finishes in a reasonable time, but if you have a long transaction, the recordset that the transaction refers to will remain locked for the full time that the transaction is active. The idea behind transactions and locking is to ensure that the application always receives valid data in multi-user applications.
When you open most non-database files, you lock the entire file for the whole time it’s open. You could do this with a database, but it generally only happens when major structural work is going on. More generally, you would lock a table, a page, or a row. There is also a concept called an Extent that consists of eight pages. There are two main lock types – exclusive and shared – along with several less common types. You don’t need to worry about the most common types of lock so long as your application is working as required; SQL Server will work out the type of lock you need based on the SQL statements that you use, either directly in your application or by calling a stored procedure.
If your application asks to update, insert or delete data, SQL Server will issue an exclusive lock. The problem with exclusive locks is that while the transaction is running, any other requests to view the data such as a report will fail. A shared lock is the type that is issued when your application wants to view data. If you have a Select statement, you’ll get a Shared lock. This means that any other request to view the data will be accepted, but no requests for modifying data will work. Shared locks are one area where you can improve the performance of your application by controlling the locks.
Update locksThe next most common lock type is an Update lock. This is used when a transaction starts with a Select, but follows on with a request to change some data. For example, if you have a SQL statement such as:
UPDATE something WHERE condition…what happens is that the WHERE condition is evaluated first, and that only needs a shared lock. But as soon as the WHERE condition is evaluated, the UPDATE needs an exclusive lock. So a different type of lock called an Update lock is applied. This is more or less equivalent to an exclusive lock, because only one transaction at a time can obtain an Update lock.
The way that SQL Server works is to monitor what locks are in use, and adjust them to get the best performance. Before we start modifying the locking mechanisms, you need to know more about the ways that SQL Server attempts to maximise performance. Essentially, SQL Server uses the smallest locking type it can, so will start off using row level locks if possible. If your application asks for more records than would be sensible for row level locking, SQL Server changes up to a page level lock, and so on up all the way up through the lock levels.
This is an area where the way you write your code can have dramatic effects on the performance of SQL Server for other users accessing the database at the same time as your application. If you write a transaction that is going to pull in thousands of rows at once, SQL Server will escalate your locking level and other users won’t be able to access any of those rows. If you want to carry out some operation that is going to have an effect on an entire table, see if there’s a way to write your code so it breaks the operation into several smaller chunks. If you want to alter the discount level for every single customer from customer number 000001 to 999999, at least do it through customers 000001 to 100000, then 100001 to 200000 and so on. It would make more sense to pull in 1000 customers at a time – it wouldn’t alter your transaction time that much, but it would leave other customers available for access. Another thing to think about is that if you have a query that takes a long time, at least look into the possibility of adding an index to reduce table scans where the entire table is checked through for a particular value.
Controlling lockingWhile SQL Server is pretty good in general at modifying the locking levels and types as required, there are cases where you might want to take control over the locking yourself. The main situation where this likely is where you are using your server for more than one role at once. There’s no doubt that in an ideal world, you’d have separate servers taking orders and creating reports. That way, your reports and long complex queries could issue shared locks on all the tables in your database without stopping your order processing system in its tracks. But we don’t all work in ideal circumstances, and many of us work with servers that have to fulfil multiple roles. In this case, SQL Server can’t necessarily optimise locking, because it can’t know whether the order processing is more important, or whether a particular report is required immediately so has higher priority.
The way that you modify how SQL Server will issue locks is by making use of locking hints. This rather odd terminology refers to a set of keywords that you can add to your SQL statements to tell SQL Server what type of lock you’d like it to use. While the name hint suggests that it’s only a suggestion, in fact if you add a locking hint it overrides the current transaction isolation level for the session.
SQL Server’s default lock hint type is called Read Committed. Essentially, this is designed to return only data that is completely accurate. This means that if any transactions that might change data are in process, the Read Committed transaction will be blocked until the transaction that changes data completes.
This is just what you want if you really need accuracy. If you’re producing a report showing the products ordered by a particular customer, you want to know that you’re seeing exactly what they’ve bought. However, a lot of reports are really designed to show a more general picture. If you want to see a report showing average sales per day over the last month, one or two sales that are in process when the report is running might be important, but you might want the option of taking an executive decision to ignore current transactions and live with the version currently in the tables. After all, reports showing ongoing counts are likely to be run on a fairly regular basis so the data will show up next time. The way you can achieve this is to make use of SQL Server’s Read Uncommitted lock hint. This is also known as No Lock, and when you issue a No Lock command, it returns all the rows, even those that are locked using an exclusive lock. No Lock is the fastest possible option because there’s no way it can be held up no matter what else is happening. You specify No Lock using a With command on your Select. For example:
SELECT lastname FROM contacts WITH (NOLOCK)There’s another advantage to No Lock – it doesn’t hold up your real time transactions because it doesn’t issue a Shared Lock when reading records. That means that any user attempting to edit or add records will be able to go ahead while your report is running. This is particularly useful if your database is running on a dual use system where input and reporting is happening on the same system. There’s nothing more irritating to users than having their new records blocked because the marketing manager is running his favourite report.
The next useful locking hint you might want to use is READ PAST. Whereas No Lock ignores any locks, Read Past skips locked rows. It doesn’t wait to try to read them, but it doesn’t include their data either. This can be just as useful as No Lock depending on the way your reports work. If you want only data that has been committed, and would like to miss out any data that is currently being changed, use Read Past:
SELECT lastname FROM contacts WITH (READPAST)The things to note about Read Past are firstly that while it skips row locks, it will wait for anything locked using a page lock. It’s also more obtrusive than No Lock because it puts a shared lock on the data it reads, so will block transactions that are attempting to change data.
SQL Server 2005 – SnapshotsSo far, we’ve had to make a compromise between speed and accuracy. In SQL Server 2005, this compromise can be avoided by using the new Snapshot isolation feature.
SQL Server 2005’s snapshot facility works at the database level and provides a backup and restore method for pending transactions. However, as database developers you can use it to overcome the problems caused by the need to lock data that is being edited.
The way snapshots work is that you tell SQL Server that you want the database to allow snapshots. From that point onwards, when a change to the database is initiated, the original version of the page is stored in a separate copy in a system database. This means that if the wrong change is made, the copy containing the original data can be restored over the changed data.
The disadvantage of the snapshot option is that unlike the methods we’ve looked at so far, it requires a change in the way the overall database is run, so you’ll have to negotiate with the database administrator. However, it offers so many advantages that you should be able to show that it is worthwhile.
While the snapshot is there mainly as a low-overhead backup, it can be used alongside the original database as the data source for reports and queries. Obviously, you don’t make changes to the snapshot. So when you run a report, you run it on the main database, but if you hit a record that’s being changed by another transaction, you use the data from the snapshot. This means you can be sure there’ll never be any locking conflicts with other users changing data. It has the added advantage that because you don’t issue a shared lock, your reports won’t be blocking other users from entering data.
SQL Server’s database snapshot facility actually relies on a technology called row level versioning (RLV that’s new to SQL Server but was already in Oracle). RLV is required for the snapshots, for Multiple Active Result Sets (MARS), and for the online index rebuild feature. The way RLV works is that when a request comes in that data will be changed, a copy of the current data is written to a temporary store in a system database. If the transaction is committed, the temporary data is deleted. If the transaction is cancelled and rolled back, the data from the temporary store is copied over the changed data. RLV improves transaction performance because a transaction asking to read the data being modified can use the data in the temporary store rather than waiting until the transaction modifying the data has completed.
Because of this new technology, SQL Server 2005 adds several new lock hint types, the first of which is Read Committed Snapshot. This gets enabled for the entire database in a single statement:
ALTER DATABASE vsjdb SET READ_COMMITTED_SNAPSHOT ON…which means it’s easy to test to find out whether the performance of your application improves.
Read Committed Snapshot is different from Read Committed because it reads the last committed value for the rows from the snapshot copy rather than the main database. This avoids blocking statements in other apps or from users who want to write data.
One thing that you need to be aware of if you’re using Read Committed Snapshot is that when you have transactions with multiple statements, the transaction checks for committed data in the snapshot at the start of each statement. So if, for instance, you had a transaction that included several SELECT statements on the same row in a table, and someone changed the row while your transaction was running, you’d get the new value part way through the transaction. If that’s not what you want to happen, you need to use a different lock hint – Allow Snapshot Isolation. This ensures that you keep the same data for the whole of a transaction. Whereas you set Read Committed Snapshot for the whole of the database, to use Allow Snapshot Isolation you have to enable it both at the database level and then again inside each transaction where you want to use it. You set it at the database level using:
ALTER DATABASE vsjdb SET ALLOW_SNAPSHOT_ISOLATION ONTo use it, a transaction has to include a SET statement:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT BEGIN vsjTransact SELECT lastname FROM contactsThis is obviously an overhead, but a fairly minor one.
ConclusionIn most database applications, you can get away with ignoring the locking, but a few simple experiments might show you how to improve the performance of your application quite significantly.
There’s no one answer to ‘which lock hint is best’, because which method you choose will depend on the way your server is configured, how much spare disk space you have, and the exact requirements of your users. The overall message, though, is that understanding what assumptions SQL Server makes about how to lock your records can save you hours of processing time.
Kay Ewbank is the Editor of the magazine Server Management and is a highly experienced database analyst who has followed the development of database technology from dbase through to today’s SQL servers.