MySQL table options

This article was originally published in VSJ, which is now part of Developer Fusion.
If you're designing an application based on MySQL, a key consideration is the format you're going to use for your data tables. The choices you have depend on which version of MySQL you're using - if you're working with an older release, it might be worth upgrading to a more recent version in order to get a wider choice.

Originally, MySQL used the ISAM file format, and if you're still using a version of MySQL prior to 3.23, ISAM will be the default format. ISAM stands for Indexed Sequential Access Method. The main disadvantage of ISAM from the viewpoint of MySQL is that the tables it creates are specific to the operating system they were created under. So, if you create under Windows, you won't be able to use the tables under Linux.

The more modern default is MyISAM, and it is less restrictive, as you can use the tables from one operating system under another. You shouldn't just accept the default MyISAM format, however, as there are three varieties of MyISAM tables to choose from - Dynamic, Static, and Compressed. Static MyISAM gives you the classic database table layout. All the columns in the tables must be of a fixed width, so that each record occupies a known space. This naturally wastes a lot of space, but offers fast and efficient data access, and a better chance of retrieving some of the information if something goes wrong, as you can work out where individual records start and end. The other point to watch for is an upper limit of 4GB on the table size.

Dynamic MyISAM is the next option, and this is what you get if you define a table with any VarChar, Text, or Blob fields. The main advantage over the Static MyISAM is that you use less space, because your record lengths vary depending on how much information you enter per field. However, it's not as fast as Static MyISAM, and the performance degrades over time, because the table becomes increasingly fragmented as you delete or change information. The reason for this is that when you add a record, it is allocated a specific size in the data table, sufficient to hold all the information entered. If that record has more information added later, the record might not fit back into the same space, so it will be moved and a hole will be left. There is a chance of another record fitting into the 'spare' space, but over time your database will be full of holes. If you are using Dynamic MyISAM, make sure you run the command Optimize Table on a regular basis.

The third type of table, Compressed MyISAM, can be achieved by taking either a static or a dynamic MyISAM table and running it through myisamchk, the compression program that comes with MySQL. The advantage you get is shrinkage of around a half, but the disadvantage is that you end up with a read-only table. Another potential drawback is that whenever a record is retrieved, it has to be decompressed, though this doesn't usually cause problems so long as the machine you're running it on is fast enough.

While MyISAM has a number of advantages, MySQL supports several other types of table, each with its own particular strengths. Three of the extra types come with transaction support, so you can use commit and rollback to ensure multiple stage transactions run correctly.

One possibility for transactions is Berkeley DB (BDB). However, BDB tables don't provide equivalent performance to MyISAM, and you can't use the full text indexing facility of MySQL if you choose BDB. A very new option is Gemini, but this is still in the beta experimental stage.

The best choice if you're looking for transaction support is InnoDB. This is newer than BDB, but offers better performance, and row-level locking is supported. InnoDB also supports non-locking reads on Selects, rather like Oracle. These locking options give better performance and better multi-user support because the records are less likely to be locked out in a page-level lock. If you decide to use InnoDB, it's important to keep up-to-date with the improvements and patches being offered by MySQL, and some users have reported problems using Blobs with InnoDB.

The final two table types are Merge and Heap. Heap tables are extremely fast because they are stored in RAM rather than on disk, and they use hash indexes. They are great for temporary tables where you want the fastest possible performance, but fairly obviously are not a lot of use otherwise. Merge tables are the final table type of interest. They are essentially collections of MyISAM tables in a virtual union to improve performance or to get over the 2GB size limit.

Transactions save the world

The support for transactions in InnoDB means you can hand over all the problems of multiple stage operations to the database. The classic example is when you're handling orders and payments. The customer makes a request for a product, and so long as the item is in stock, you check their credit card to see whether they can pay for it. What you want to do is to decrease the stock level by 1 at the point you make the check on stock levels, to 'hold' the item while you check the credit card - otherwise you could end up selling the same item twice. If the credit card isn't authorised, you need to put the 'held' item back into stock. By using a transaction that says:

Is the item in stock - if so
	Begin
	Decrease the number in stock by 1
	Check the credit card
	If it isn't authorised Then
		Rollback
	Else
		Debit the credit card
		Commit
	End If
The Rollback undoes everything that's happened since the Begin. Commit makes the changes permanent.

Make InnoDB available

If you installed MySQL without knowing you were going to use InnoDB format tables, the chances are you haven't set it up with the correct parameters. You need to check the my.cnf or my.ini configuration files and specify some configuration parameters. Look in the [mysqld] section, and make sure you've got an entry for the innodb_data_file_path. This is used to specify the names and the sizes of your data files. If you're using MySQL 4.0.x, you don't actually have to specify the data file path, as there's a default option.

You can read more details of how to set up MySQL for use with InnoDB in the documentation, available on the MySQL Web site.

Set up your tables

In order to carry out transaction processing, you need to create data tables of the right format. The way to do this is to specify the type when creating the table using the TYPE qualifier:

CREATE TABLE vsjProds (vsjID INT AUTO_INCREMENT,
	numInStock INT, PRIMARY KEY (vsjID)) TYPE=InnoDB
If you've already got tables in MyISAM format, you can convert them to InnoDB format relatively easily using the ALTER TABLE command:
ALTER TABLE vsjISAM TYPE InnoDB
Most databases have quite a lot of tables, and if this is true in your case, it's probably easier to use a script. If you're using MySQL under Linux or Unix, you get a pre-written script called mysql_convert_table_format. This will convert all the tables in a particular database to the specified format:
mysql_convert_table_format -type=InnoDB dbname
Once you've got at least one table in InnoDB format, you can execute some test transactions so you can check out what happens and make sure it is all working as expected.

The simplest way to do this is to view the same table in two mysql monitor windows, which I'll refer to as win1 and win2. Assuming you have a table called vsjProds in the database vsjdb, we're going to add some data, then change it differently in the two windows, and see what happens. In the first window, type the command:

USE vsjdb;
INSERT INTO vsjProds (numInStock) VALUES (20);
SELECT * FROM vsjProds;
You should see the result:
VsjID	numInStock
1		20
Now swap to Win2, and type:
SELECT * FROM vsjProds;
You should also see the result:
VsjID	numInStock
1		20
Now we are going to make some changes using transaction processing. Transactions in SQL start with the command BEGIN, and end with either a COMMIT for a successful transaction, or a ROLLBACK for an unsuccessful transaction. We'll look at a successful transaction first of all. In Win2, type:
BEGIN;
UPDATE vsjProds
	SET numInStock=15 WHERE vsjID=1;
SELECT * FROM vsjProds;
You should see the result:
VsjID	numInStock
1		15
If you now swap back to Win1, though, and do the Select, you'll get the old result of numInStock=20, because the BEGIN starts the transaction, and you haven't yet had a COMMIT to end the transaction in Win2. If you swap back to Win2 and type:
COMMIT;
...then retry the SELECT in Win1, you'll see the matched result.

So what effect does ROLLBACK have? At the moment, a SELECT in either window will show the stock level set at 15. In Win1, enter the code:

BEGIN;
UPDATE vsjProds
	SET numInStock=10 WHERE vsjID=1;
SELECT * FROM vsjProds;
You should see the result:
VsjID	numInStock
1		10
Win2 would still show a stock level of 15, because the transaction has not been completed.

We'll suppose our credit card authorisation has failed, and we want to put the stock level back to its previous level, so in Win1, enter:

ROLLBACK;
If you now do a Select, you'll see the stock level set back to 15 in both windows, because the entire transaction was aborted.

So far we have altered things in one window, but not in the other. There is nothing to stop you changing things in more than one place, though.

At the moment, numInStock on the product with ID=1 is 15. In Win1, enter the code:

BEGIN;
UPDATE vsjProds
	SET numInStock=numInStock-1 WHERE vsjID=1;
SELECT * FROM vsjProds;
You should see:
VsjID	numInStock
1		14
If you try a Select in Win2, you will see the stock level still reported as 15, because the transaction in Win1 isn't completed. What we are now going to do is to start a second transaction in Win2, so enter the code:
BEGIN;
UPDATE vsjProds
	SET numInStock=numInStock-2 WHERE vsjID=1;
What will happen is that you won't get the statement returned as OK, because the transaction in Win1 has locked the record you are trying to update, so you cannot alter it at the same time. If you now swap back to Win1 and enter:
COMMIT;
...and then swap back to Win2, you'll see the statement flagged as OK to show that your Win2 transaction has started. At this point, a Select in Win1 will show a stock level of 14, because Win1's transaction has completed, but Win2's hasn't, but in Win2, the stock level will be shown as 12, because the transaction there is underway, and the current reality is that two has been subtracted from the stock level. You can now finish the transaction in Win2. If you finish with a COMMIT, both windows will return a stock level of 12, if you finish with a ROLLBACK, both will return a stock level of 14.

Locking alternatives

You might be feeling rather worried by the way that uncompleted transactions are being handled. In InnoDB, the default behaviour is to let you carry out Selects on records that are in reality locked by an ongoing transaction, and the results you are shown don't reflect the transaction that is being worked through. That's why when you've started the transaction in one, then do a Select in both, you get different results in the two windows. This can cause big problems if, for example, you're really trying to enter orders - you could see an item as being in stock when an ongoing transaction elsewhere has really removed the remaining items and it's actually out of stock.

There are various options to overcome this. If you put the qualifier:

LOCK IN SHARE MODE
...on the end of your Selects, your Selects won't appear until any transactions that have currently locked the item have completed. So, if you were to start a transaction in Win1, then enter in Win2:
SELECT * FROM vsjProds LOCK IN SHARE MODE;
...you wouldn't see the result until you'd entered either COMMIT or ROLLBACK in Win1. This is known as a shared lock.

So what happens if you use a shared lock in the middle of a transaction? It will lock the records so they can't be changed or deleted by other processes until the end of your transaction. The locked records can be read by any process, however, even if those other processes have also used shared locks. But, any attempt to change the locked records is blocked until the transaction is completed.

In other circumstances, you might want to be more stringent about letting other processes see your locked records. The InnoDB format supports SELECT FOR UPDATE, which gives your selected records an exclusive lock. What this means is that another process could see your locked records if they used a 'normal' Select, but won't get to see them if they used a Select...Lock in Share Mode.

Avoiding Deadlock

Whenever you are using transactions, one worry is that you'll end up in deadlock. This happens when transaction 1 locks record A, then requests record B in order to finish. At the same time, transaction 2 locks record B, then requests record A in order to finish. Neither can finish, so both records are locked forever.

Fortunately, the MySQL InnoDB format is supposed to recognise this and work around it. The workaround is achieved by having an automatic Rollback on the transaction that triggered the deadlock. What isn't clear is how complex the deadlock situation can be - if it's a case of transaction 1 locking A and requesting B, transaction 2 locking B and requesting C, transaction 3 locking C and requesting A, for example, will deadlock be automatically recognised? Personally, I'd still use the maximum wait time setting in the configuration file by adding:

Set-variable=innodb_lock_wait_timeout=x
...where x is the number of seconds to wait.

Conclusion

Transactions can save lots of headaches and erroneous results in multi-user situations, and the MySQL InnoDB format is a great way to get a transactional database.


Kay Ewbank is a database consultant, author of many articles on advanced data related topics and editor of Enterprise Server Magazine (ESM).

You might also like...

Comments

Contribute

Why not write for us? Or you could submit an event or a user group in your area. Alternatively just tell us what you think!

Our tools

We've got automatic conversion tools to convert C# to VB.NET, VB.NET to C#. Also you can compress javascript and compress css and generate sql connection strings.

“The trouble with programmers is that you can never tell what a programmer is doing until it's too late.” - Seymour Cray