The DataAdapter is not really a new idea born with ADO.NET. Its roots come not from Tao, but from the DOW – the Data Object Wizard first introduced in Visual Basic 6.0, but subsequently abandoned in light of the innovations in Visual Studio.NET. The DOW was used to generate a class that fetched a rowset and defined the UPDATE, DELETE and INSERT SQL needed to manage the rowset. Basically, the Visual Basic.NET DataAdapter does the same thing. Unlike the Data Object Wizard, the DataAdapter permits you to define both the input and action queries, with a few more options. It also sports a facility to automatically generate the action queries – but regrettably, only for "home" database challenges.
One of the most significant differences between COM-based ADO "classic" (ADOc) and ADO.NET is how updates are handled. In ADOc, action SQL was generated on the fly, controlled by the Update Criteria and other properties. This did not require an extra round trip to generate the SQL, but it did require a bulkier SELECT FOR BROWSE to return the schema information needed to construct the action commands. ADO.NET takes another approach – it leaves the generation of the update commands up to the developer. This means that developers have far more flexibility in how updates are posted to the data. As we'll see later, this new flexibility also means developers have to take on additional responsibilities once assumed by ADOc.
Choosing between the DataAdapter and the DataReader

When you first start working with the ADO.NET DataAdapter, you might be puzzled as to how best to use it with existing data access architectures. Developers, yearning for better performance, often lean toward the low-level DataReader approach – rolling their own update code or using the Command methods to execute stored procedures that update the source tables. They're not convinced that the extra features exposed by the DataAdapter and its associated DataSet and DataTable are worth the extra CPU cycles. I'm not in that clique. I think it's far easier to code the DataAdapter, and in most cases the loss in performance is easily outweighed by the ability to let ADO.NET handle many of the tedious and trouble-prone tasks of connection management, query execution, rowset population and DataTable construction. In addition, once the DataTable is created, the additional functionality exposed by its methods and properties further simplifies and stabilises the code – without further cost to the developer's or the code's performance.
Getting the best performance from the DataAdapter

The disconnected architecture implemented by ADO.NET assumes that your code returns a set of rows to the client and posts changes to the in-memory copy of the data. These changes are not reflected in the database until you submit a "batch" of updates using the DataAdapter Update method. No, it's not really a batch in the traditional sense, but that's what Microsoft chose to call it. What really happens is that ADO.NET walks the set of rows passed to the Update method and executes the UpdateCommand, DeleteCommand or InsertCommand based on each row's RowState value. Each execution requires a round trip.
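That row-by-row dispatch can be sketched as follows. This is a minimal illustration requiring a live SQL Server, not one of the article's own listings: the connection string, SQL and parameter sizes are all assumptions.

```vbnet
' Sketch only: assumed connection string, table and column sizes.
Dim cn As New SqlConnection("server=.;database=pubs;integrated security=sspi")
Dim da As New SqlDataAdapter("SELECT pubid, pubname FROM publishers", cn)

' Tell the DataAdapter how to post each kind of change.
da.UpdateCommand = New SqlCommand( _
    "UPDATE publishers SET pubname = @pubname WHERE pubid = @pubid", cn)
da.UpdateCommand.Parameters.Add("@pubname", SqlDbType.VarChar, 40, "pubname")
da.UpdateCommand.Parameters.Add("@pubid", SqlDbType.Char, 4, "pubid")

da.InsertCommand = New SqlCommand( _
    "INSERT publishers (pubid, pubname) VALUES (@pubid, @pubname)", cn)
da.InsertCommand.Parameters.Add("@pubid", SqlDbType.Char, 4, "pubid")
da.InsertCommand.Parameters.Add("@pubname", SqlDbType.VarChar, 40, "pubname")

da.DeleteCommand = New SqlCommand("DELETE publishers WHERE pubid = @pubid", cn)
da.DeleteCommand.Parameters.Add("@pubid", SqlDbType.Char, 4, "pubid")

Dim ds As New DataSet()
da.Fill(ds)
' ... the user edits, adds and deletes rows in ds.Tables(0) ...

' Update walks the rows: each Modified, Added or Deleted row costs one
' round trip as the matching action command is executed.
da.Update(ds.Tables(0))
```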
That said, I also think some developers lose sight of what makes a program perform and scale to its full potential. It does not help that all too many of the examples you'll encounter suggest that you use the DataAdapter to fetch all of the rows (and columns) from a database table and construct appropriate action queries to enable the Update method. While this approach is valid for "toy" databases, it flies in the face of the constraints of scalable, high-performance, real-world systems. If you're working with home databases or those in your office with only a few hundred rows, and you aren't concerned with multi-user issues (and never plan to be), whether or not you fetch all of the rows from the table won't make much difference. However, in most business applications, especially those that have to scale up to support dozens to thousands of users and work with thousands to millions of rows, how you fetch and update the data is critical to a successful application.
The role of the CommandBuilder

One of the most respected product managers at Microsoft quipped that the CommandBuilder should be renamed the "CommandDon'tUseBuilder". I agree. The CommandBuilder is a class used to construct the action SQL that changes the data specified in the DataAdapter SelectCommand. That is, the CommandBuilder uses the SelectCommand.CommandText to query the database engine for additional schema information using the undocumented SET NO_BROWSETABLE option (or other mechanisms as dictated by the data provider). If you choose to use the CommandBuilder, you'll discover the SQL it generates is pretty crude, in that the approach it takes to concurrency management is simplistic and inefficient and, more importantly, not functional in a variety of common situations – not to mention the need for a costly additional round trip to the server. I show where the CommandBuilder can and cannot be used in the subsequent discussion.
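For completeness, here is what using it looks like – a minimal sketch, assuming an existing SqlConnection cn and a simple single-table SelectCommand (the only shape the CommandBuilder handles):

```vbnet
' Sketch only: the CommandBuilder works only against a simple
' single-table SELECT, and costs an extra schema round trip.
Dim da As New SqlDataAdapter( _
    "SELECT pubid, pubname, city, state FROM publishers", cn)
Dim cb As New SqlCommandBuilder(da)

Dim ds As New DataSet()
da.Fill(ds)
' ... change rows in ds.Tables(0) ...
da.Update(ds.Tables(0))   ' executes the CommandBuilder-generated action SQL
```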
Fetching Data with the DataAdapter

There are several approaches to fetching data using the DataAdapter – only a few of the more common techniques are discussed here. Each of these techniques lends itself to addressing specific issues and data access challenges, and offers opportunities for better performance. The following discussion illuminates these techniques and balances their advantages and challenges.
- Return a rowset generated by an SQL SELECT that references a single table, view or stored procedure.
- Return two or more rowsets generated by a "batch" SQL query containing more than one SELECT referencing one or more data tables or views.
- Return a rowset generated from a JOIN product that references several related tables.
- Return a rowset generated from a composite query. That is, the data is returned from more than one data source.
```vbnet
da = New SqlDataAdapter( _
    "SELECT pubid, pubname, city, state " & _
    "FROM publishers WHERE state = @StateWanted " & _
    "SELECT pubid, isbn, title, price FROM titles WHERE pubid IN " & _
    "(SELECT pubid FROM publishers WHERE state = @StateWanted)", cn)
da.SelectCommand.Parameters.Add("@StateWanted", SqlDbType.VarChar, 2)
dsPublisherTitle = New DataSet()   ' Create new DataSet
```

The Fill method makes sure the connection is open, executes the query, constructs any needed DataTables, completes rowset population (fetches the rows into the DataTable Rows collection), and closes the connection. The figure below illustrates this mechanism:
Figure: Using the SelectCommand to manage an SQL query

…and here is an example of the code:
```vbnet
da.SelectCommand.Parameters("@StateWanted").Value = txtStateWanted.Text
da.Fill(dsPublisherTitle)
DataGrid1.DataSource = dsPublisherTitle.Tables(0)
DataGrid2.DataSource = dsPublisherTitle.Tables(1)
lblTables.Text = dsPublisherTitle.Tables.Count.ToString
lblRows.Text = dsPublisherTitle.Tables(0).Rows.Count.ToString & " : " & _
    dsPublisherTitle.Tables(1).Rows.Count.ToString
```

Depending on how the DataAdapter is configured, the Fill method either creates a new DataTable for each rowset returned, or cleverly "updates" the DataTable objects already constructed in a selected DataSet. But before we get to that most powerful achievement, let's step through some of the interesting fetch scenarios. The scenario that makes the most sense for you depends on where the data is located and whether or not you need to use the Update method to perform updates on a selected database table.
The Single-Table Fetch Approach

As I said, most of the examples you'll encounter (in sources other than my books) illustrate use of the DataAdapter by coding the SelectCommand CommandText as "SELECT * FROM Authors". This simple query masks several important issues. First, it does not limit the number of rows or columns returned from the data table. This approach is fine for home databases, but cripples an application dealing with serious amounts of data. If you want to build a scalable production application that extracts rows from a 100,000 (or even 1,000) row table, you'll need to fetch selected columns, focused (and limited) by a WHERE clause. This simplistic approach also assumes that you can return the information you want without the benefit of data from other tables. Sure, there are lots of cases where a simple query will do. But as I frequently reiterate, SELECT * is evil. It blindly returns all columns from the query product whether you need them or not, and assumes that changes to the database table post-deployment won't affect your application. This is called optimistic programming, or programming by wishful thinking. Ironically, this approach is the one most likely to work with the DataAdapter's CommandBuilder – used to generate the action queries to update the fetched data.
The Multiple Resultset Approach

Assuming the single-table query approach won't work for your application, you should examine the outcome of a "multiple resultset" query. Suppose you avoid the lure of the DataAdapter configuration wizard and create a DataAdapter using the code in our first listing, above.
This query returns two independent rowsets which might be logically related. In this case, ADO.NET constructs two DataTables (once the Fill method is executed), one containing selected Publisher names and the other the associated Titles. It's up to my own client-side code to relate these two tables if I want to let ADO.NET manage and bind controls to show these relationships. The code shown in our second listing does just that – it executes the query and returns the two DataTables, which are bound to separate DataGrid controls.
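Relating the two tables takes a single call; a minimal sketch, assuming the dsPublisherTitle DataSet from the earlier listing has already been filled:

```vbnet
' Sketch only: relate the two DataTables so bound controls can
' navigate from a publisher to its titles. "PubTitles" is just an
' illustrative relation name.
dsPublisherTitle.Relations.Add("PubTitles", _
    dsPublisherTitle.Tables(0).Columns("pubid"), _
    dsPublisherTitle.Tables(1).Columns("pubid"))
```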
This approach assumes that you don't expect to update the source tables, that you expect to use the DataAdapter Update method to update only one of the source tables, or that you plan to provide your own update routines and not use the Update method. In this case the CommandBuilder will be able to construct the action commands for you – but only for the first data table, specified by the initial SELECT.
The JOIN Product Approach

Of course, you can still use the server's ability to JOIN two or more database tables or views together to produce a composite SQL product, and the DataAdapter can execute this query and construct a DataTable from the product's rowset. No, ADO.NET does not treat this rowset differently from any other – one DataTable is created for each unique rowset returned by the server. Yes, this means that you can return a rowset from a JOIN product and another from a single-table query or another JOIN product, and use these to generate multiple DataTables as necessary.
Unlike the previous approach, the CommandBuilder won't be able to construct the action Commands so you'll have to fall back on your own ad hoc action commands or stored procedures to make changes to the base tables. For many developers, this is what they've had to do for years anyway.
The Composite Query Approach

One of the more interesting (and powerful) approaches you can use is construction of "composite" queries that return rowsets from more than one data source. The trick here is to use more than one DataAdapter to specify different data sources, drawing rowsets from the same or different servers, from different providers, or from non-database data sources. Once the data is downloaded into DataTable objects, it's a simple matter to construct relationships between them. Updating is also straightforward in this case, and if the SelectCommand is simple enough the CommandBuilder can be used to generate the action commands (not that I recommend doing so). Ideally, you could use stored procedures on each data source system to perform updates to the data table associated with each DataAdapter.
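A composite fetch might be sketched like this. Everything here is an illustrative assumption – the connection strings, the Jet table LocalRoyalties and the DataAdapter pairing are not drawn from the article's listings:

```vbnet
' Sketch only: two DataAdapters, two providers, one DataSet.
' Requires Imports System.Data.SqlClient and System.Data.OleDb.
Dim cnSql As New SqlConnection( _
    "server=Prod1;database=pubs;integrated security=sspi")
Dim cnJet As New OleDbConnection( _
    "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=royalties.mdb")

Dim daPublishers As New SqlDataAdapter( _
    "SELECT pubid, pubname FROM publishers", cnSql)
Dim daRoyalties As New OleDbDataAdapter( _
    "SELECT pubid, royalty FROM LocalRoyalties", cnJet)

Dim ds As New DataSet()
daPublishers.Fill(ds, "Publishers")
daRoyalties.Fill(ds, "Royalties")

' Relate the tables even though they came from different sources.
ds.Relations.Add("PubRoyalties", _
    ds.Tables("Publishers").Columns("pubid"), _
    ds.Tables("Royalties").Columns("pubid"))
```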
This approach is bound to become one of the most popular, as it deals with so many issues quite neatly. Regardless of the SQL product's source, you can define a suitable DataAdapter to focus the update operation on a specific table. That is, you define a DataAdapter UpdateCommand, InsertCommand and DeleteCommand to change a specific row in a specific data table – you define a separate DataAdapter for each database table you wish to update. Just because a DataSet contains DataTables drawn from disparate data sources, that's no reason you can't update any or all of the base database tables as needed.
How Fill manages Inbound Rowsets

Note that the code in our second listing, above, reports the number of DataTables and rows created in each. If I run the code again, the Fill is repeated and ADO.NET adds any rows returned to the existing tables created in the first pass instead of creating a new DataTable. This is because ADO.NET sees that the schema for the inbound rowset matches the schema of the existing DataTable(s). This is especially useful behaviour if the DataSet is expected to contain a composite resultset. For example, if we wanted to fetch publishers and titles from both Washington and California, we could change the input parameter from "WA" to "CA" and execute Fill again.
But what if the schema changes? Well, if the inbound rowset's table is the same, but the SELECT columns change, ADO.NET alters the schema of the DataTable to match. This means existing DataTable columns that don't exist in the inbound rowset's schema are simply set to NULL in all existing DataTable rows. In a similar way, new columns in the inbound rowset's schema that don't match existing DataTable schema columns are added to the DataTable Columns collection. That's what's so unique about ADO.NET and how it handles data structures. Unlike ADO classic's Recordset, whose structure is frozen once it's open, the DataTable object can easily be morphed to reflect changes whenever needed.
One important point: if you expect to have ADO.NET handle parent/child cascading deletes, you'll want to set the MissingSchemaAction property to MissingSchemaAction.AddWithKey. This changes the way ADO.NET submits the SelectCommand to the server. For example, in the case of SQL Server, if you leave the MissingSchemaAction property set to its default state (MissingSchemaAction.Add), the example query (see our first listing, above) executes the following on the server, as shown by the Profiler:
```sql
exec sp_executesql N'SELECT pubid, pubname, city, state
FROM publishers WHERE state = @StateWanted
SELECT pubid, isbn, title, price FROM titles WHERE pubid IN
(SELECT pubid FROM publishers WHERE state = @StateWanted)',
N'@StateWanted varchar(2)', @StateWanted = 'CA'
```
However, if you set MissingSchemaAction.AddWithKey, ADO.NET adds two additional clauses to the query that is about to be executed:
With AddWithKey set, the Profiler shows:

```sql
exec sp_executesql N'SET FMTONLY OFF; SET NO_BROWSETABLE ON;
SELECT pubid, pubname, city, state
FROM publishers WHERE state = @StateWanted
SELECT pubid, isbn, title, price FROM titles WHERE pubid IN
(SELECT pubid FROM publishers WHERE state = @StateWanted)',
N'@StateWanted varchar(2)', @StateWanted = 'CA'
```

These SET statements tell the engine to return not only the rowset, but additional schema information including keys and timestamps as hidden output columns. These values are used to populate the DataTable primary key constraint.
As an alternative, you can explicitly set the primary key constraint on each DataTable, which avoids returning extra information in the query. This ensures that incoming records that match existing records are updated instead of appended. The Visual Studio .NET documentation says this is accomplished by having the SQL Server .NET Data Provider append a FOR BROWSE clause to the statement being executed. Based on the Profiler trace, this is clearly not what the SqlClient .NET data provider sends. However, the SET FMTONLY OFF; SET NO_BROWSETABLE ON preamble that it does append must be accomplishing the same thing.
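Setting the key yourself takes only a line or two; a minimal sketch, assuming the dsPublisherTitle DataSet from the earlier listings:

```vbnet
' Sketch only: declare the primary key on the client so a second Fill
' merges matching rows instead of appending duplicates – no extra
' schema round trip required.
Dim tbl As DataTable = dsPublisherTitle.Tables(0)
tbl.PrimaryKey = New DataColumn() {tbl.Columns("pubid")}
```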
After some research I discovered a KB article (Q178366) that reveals what's going on:
"SET NO_BROWSETABLE ON is an undocumented option performed for Remote Data Service (RDS) ActiveX Data Connector (ADC) connections to SQL Server. Enabling this option makes every SELECT statement act as though FOR BROWSE had been appended to the statement, but bypasses the temporary table that FOR BROWSE normally pipes the results through. The net effect is to add keys and timestamps to the query as hidden output columns so the client can update specific rows (updateable cursors) without separate trips to the server to pick up the meta-data and munging the query to get the appropriate columns."

The second part of this article will discuss how to update data once the DataAdapter has populated the DataTable objects it manages through a DataSet "container". We've laid most of the needed foundations here to permit you to manage updates in a number of ways. All that remains is getting changes posted to the DataTable(s), submitting those changes to the data server, and dealing with concurrency violations.
William Vaughn is an industry-recognised author who has worked with mainframe, minicomputer and personal computer systems as a developer, manager, trainer, marketer, support person and writer for over 30 years. His area of specialisation is focused on data access and especially Visual Basic and SQL Server. He speaks at conferences all over the world including VSLive, Developer Connections, and DevWeek, and has written articles for publications such as SQL Server Magazine and MSDN Magazine.