Related Topics: XML Magazine, Java Developer Magazine

XML: Article

Java Feature — JDBC 4.0

XML, performance, and more

It's been over three years since the JDBC Expert Group held its first meeting to gather requirements, requests, and pipe dreams for the JDBC 4.0 specification. In that meeting, we discussed a wide variety of topics, including performance enhancements, clarifications on the existing JDBC 3.0 specification, and Ease of Development features. Unbelievably, everything but the kitchen sink ended up making it into the release. In this article, we'll look at several key features that made the enhancement list for JDBC 4.0, and we'll discuss why those features are important.

At the time of this publication, the JDBC 4.0 specification should be close to shipping as part of Java SE 6.0. The key goals of the JDBC Expert Group were to align with the most important features of the SQL 2003 specification, to provide constructs that improve developer productivity (sometimes called Ease of Development or EOD features), to fine tune pooling constructs, and to improve scalability. While many aspects of the JDBC 3.0 specification were somewhat limited, the new additions to the JDBC 4.0 specification apply to a wider audience. Additionally, the most common complaint about the JDBC specification is that it did not provide enough specifics. This vagueness often led to different implementations between JDBC driver vendors. In addition to the wealth of new features, JDBC 4.0 includes hundreds of fixes for bugs that have been addressed and clarifications stated to eliminate ambiguity among implementations.

XML Support
One of the most useful new features in JDBC 4.0 is support for the SQL 2003 XML data type. Support for the XML data type is also the feature that has changed the most since the initial public draft of the JDBC 4.0 specification. This draft introduced support for XML data types in the database, Java XML bindings, and SQL/XML extensions to the SQL grammar. Based on feedback from both the JDBC and XML communities, the JDBC interfaces for working with XML data have been enhanced and expanded significantly.

Today, applications must use either JDBC driver extensions or the Clob and Blob interfaces to transfer XML data to or from the database. Using JDBC driver extensions makes it hard to write a portable database application. Using the Clob and Blob interfaces limits the application to working with string representations of XML data and, in many cases, requires vendor-specific extensions to the SQL grammar to tell the database whether to return the data as a character or binary representation of the XML string data.

Now, the new JDBC data type, SQLXML, is part of the JDBC 4.0 specification. Applications can use the getTypeInfo() method to determine if their database supports a native XML data type. For example, using getTypeInfo() against a Microsoft SQL Server 2000 instance does not return a result row corresponding to the SQLXML data type, indicating that there is no native XML data type available for that particular database. In contrast, using getTypeInfo() against a Microsoft SQL Server 2005 instance returns a result row, indicating that an XML data type is available. Additionally, it returns information indicating that the native type name is XML. From this information, applications can create tables that contain columns of the XML data type.

To allow applications to populate data into XML columns and retrieve data from those columns, JDBC has been expanded to include native Java bindings for XML. In the initial public draft, the JDBC Expert Group defined bindings for Java strings and StAX. The thinking at the time was that this subset of XML representations could be used to easily construct other XML representations. However, feedback from the community made it clear that there are many applications today that rely on DOM and SAX and that any JDBC XML solution must provide support for those and other XML representations. The expert group went back and totally reworked the SQLXML interface definition.

The definition of the SQLXML interface in the proposed final draft of the JDBC 4.0 specification now includes support for generating and retrieving a character or binary representation of XML data as a Java String, a character stream, or a binary stream. More importantly, the SQLXML interface includes methods for working with the Source and Result interfaces defined in the javax.xml.transform package. These methods provide flexibility for supporting any XML representation for which there is a javax.xml.transform Source or Result implementation. Additionally, supporting the Source and Result interfaces allows a JDBC driver to easily become an end point of XSLT transforms or XPath evaluations.

At a minimum, the JDBC 4.0 specification requires JDBC drivers to support the Source and Result interfaces shown in Table 1.

To create a Java construct that can be used to process XML data, an application can create a SQLXML object using the Connection.createSQLXML() method. The object that is created does not contain any data initially. Data is added to the object using one of the following methods:

  • Calling setString()
  • Using the Writer returned from setCharacterSteam()
  • Using the OutputStream returned from setBinary-Stream()
  • Associating a Result with the SQLXML object using setResult()
Listings 1 and 2 illustrate how an application can use these techniques to insert a row containing XML data. The examples used in this section work on a table that has a column named id of type INTEGER and a column named xmlCol of type XML.

Similarly, applications can retrieve XML data from a SQLXML object using the getString( ), getCharacterSteam( ), getBinaryStream( ), and getSource( ) methods. Listing 3 illustrates how an application can query a column of the SQLXML data type, create a SQLXML Java binding using the getSQLXML() method on the result set, and retrieve a StAXSource object that is used to process the XML data.

SQL 2003 also includes extensions to the SELECT syntax that allow you to construct XML results from tabular columns. Listing 4 shows a simple example of how to create a SELECT statement that produces a result set containing two columns: a CustId column of type integer and a CustInfo column of type SQLXML.

The SELECT statement uses the new SQL/XML extension XMLELEMENT to process multiple base columns into a single XML result column. JDBC 4.0 also has been expanded to support using database metadata methods to determine which SQL/XML constructs are supported on the connection. Applications can execute any supported SELECT statement with SQL/XML extensions to produce SQLXML result columns that can be processed using the new XML Java bindings.

Connection and Statement Pooling Enhancements
Today, if you have a JDBC application deployed inside an application server or Web server, there's a good chance it uses connection pooling, statement pooling, or a combination of both to obtain better application performance. Pooling is great - except it's not tunable, it's hard to map end users back to connections in the pool, and if a connection ever becomes invalid inside the pool, removing only that connection from the pool is nearly impossible. JDBC 4.0 addresses all these drawbacks.

The architecture of many Java application servers or Web servers dictates that database interaction occurs as a result of an incoming message, a user clicking a button in a Web browser, or through some other real-time event. In each of these occurrences, a database connection is usually established, one or more SQL statements are executed, results are processed, and the connection is closed. In this type of architecture, the response time for the application is limited by the response time of the connection attempt. That is, connecting to a database is one of the most performance-expensive operations that a JDBC application can do. A connection involves multiple network round-trips between a JDBC driver and a database server to perform the following actions:

  • Establish memory on behalf of the database user in the database server
  • Authenticate the user
  • Negotiate details between the JDBC driver and the database server, such as code page settings, getting the database version, and determining the optimal database protocol packet size
To limit this overhead, most Java environments use a JDBC connection pool. A connection pool manager establishes multiple database connections at system startup and keeps these connections available. When an application needs to establish a connection, the pool manager assigns one of the pre-allocated connections to the application instead of going through the time-consuming process of establishing a new physical connection to the database. When the application is finished with the connection, the connection is returned to the pool manager's cache of connections. In this type of environment, the JDBC driver and the database are not aware that the application connected and disconnected. When a connection is established, the pool manager loans out a connection and when a connection is closed, the loaned connection is returned to the pool. To the JDBC driver and to the database, the connection has been active since the system was started.

Connection pooling works great until there is a problem you need to investigate. When the response time of your database queries takes minutes instead of milliseconds, your application server suddenly starts to run out of memory or CPU cycles and your database appears to be "hung." Another possibility is that when you try to monitor the status of your applications, you find that "some JDBC connection" is using all the CPU and that the facilities available to help you find the culprit are not very good. Once a JDBC connection is established, the tracking mechanism between the physical connection and the application's use of the logical connection is lost. The connection pool manager assigns physical connections in the pool to any application that meets authentication requirements; the pool manager does not keep any statistics on the application requesting a connection, and the connection itself is a black box to the application. In other words, if you are using a monitoring tool and see that a JDBC connection is "bogging down the system," it's not possible to track down which JDBC application is actually invoking the driver.

As we stated earlier, connection pooling is usually provided by the application/Web server, so a connection request from an application is not sent to a driver, but is instead sent to the pooling component inside the server. For a JDBC driver to associate an application to a connection, it must be involved in the connection establishment process, which does not happen when using JDBC 3.0-compliant connection pool managers.

JDBC 4.0 has added the setClientInfo() and getClientInfo() methods to the connection interface to solve many of the problems mentioned above. After connecting, an application can call setClientInfo() to associate client-specific information to the JDBC connection object, such as application name, site name, and department name for the JDBC connection. The setClientInfo() request is executed in the JDBC driver and not in the connection pool manager. The JDBC driver then passes along this information to the database server. In this way, monitoring tools can retrieve client information for specific database connections from either the JDBC driver or from the database server to help pinpoint where problems are occurring.

Another problem we often see in today's popular JDBC connection pool implementations is that there is no good way for a connection pool manager to determine if a database connection has become unusable. Typically, if a pool manager detects that any single connection in the pool has become invalid, the pool manager terminates all connections in the pool regardless of whether they're usable. After the pool is flushed, the pool manager re-initiates the pool with new connections. Flushing the pool is a drastic process that results in the potential loss of business logic, poor performance, and, typically, irate users. Some pool managers erroneously use the Connection.isClosed() method to check the state of a database connection, but the intention of isClosed() is to check if a connection is open or closed, not to determine if the connection is still usable. A new method has been added, Connection.isValid(), to allow pool managers to specifically request from the driver if a connection is still usable. If a connection is invalid, the pool manager can discard only the marred connection rather than the contents of the entire pool.

In addition to the connection pooling mechanism previously discussed, JDBC 3.0 introduced a statement pooling mechanism to cache prepared statements. The statement pool manager, which can be part of the JDBC driver or part of the application/Web server, keeps prepared SQL queries in a cache that can be reused by applications. In the same way that a connection pool manager keeps performance-expensive database connections in a pool for "loan," the statement pool manager keeps SQL prepared queries in a cache that can be loaned out when an application attempts to use a SQL query that matches one in the pool.

JDBC statement pooling provides performance gains for JDBC applications that execute the same SQL statements multiple times in the life of the application. Most applications have a certain set of SQL statements that are executed multiple times and a few SQL statements that are executed only once or twice during the life of the application. Unfortunately, existing JDBC statement pooling implementations give no weight to a SQL statement that's executed 100 times versus one that's executed only twice. Either a statement goes into the pool, potentially displacing another statement from the pool, or there is no pool. JDBC 4.0 provides a more granular level of statement pooling by allowing applications to provide directives to the pool manager about whether a SQL statement should be pooled.

The PreparedStatement interface has been expanded by the addition of two new methods: isPoolable() and setPoolable(). The isPoolable() method returns a Boolean flag that denotes whether the SQL statement identified on the PreparedStatement object should be pooled (by default, a statement is poolable when it's created). Applications specifically can request that a statement not be pooled by calling setPoolable(false). Using these constructs, application architects gain more control over the performance aspects of their JDBC applications. Queries that are reused are pooled and provide optimal performance, and queries that are used infrequently do not affect the pool.

National Character Set Support
When the JDBC 1.0 specification was developed, the primary goal was to make the specification fit the Java model of programming and make it easy to use. When the topic of national language sets was discussed, it was decided to postpone introducing these types into the specification because they were complex to explain, not well understood, and the hope was that JDBC drivers could mask most differences, for example, those between NCHAR and CHAR. Besides, Java was Unicode anyway.

It turns out that most JDBC drivers do need to know when sending character data to a database if the type the database server is expecting is a Unicode type (or National Character). It also turns out that most JDBC drivers can't figure out what the database is expecting without expensive network round-trips to the database server. JDBC provides only one type of binding using setString(), setCharacterStream(), and setClob(). If a SQL parameter corresponds to an NCHAR type, the application binds the parameter using setString(). Similarly, if the parameter corresponds to a CHAR type, the application uses setString() also.

To compensate for this deficiency, JDBC drivers typically adopted one of the following three strategies:

  • Always send the character data to the database server in "safe" mode, which typically results in a performance penalty when the data doesn't match the format expected on the database server
  • Provide connection properties that must be set to correspond to the types being used on the server - clearly a problem when both types are used
  • Assume all data is ANSI and can possibly fail with data corruption if the database types are Unicode (or National types)
None of these options benefit application designers, so the JDBC 4.0 specification now provides mechanisms to denote National Type characters - (setNString(), setNCharacterStream(), setNClob(), and setObject()); and ANSI characters - (setString(), setCharacterStream(), setClob(), and setObject()).

Other New Features
We've touched on only three new features of the JDBC 4.0 specification, but the specification contains over 20 new features as well as hundreds of valuable specification clarifications. The JDBC 4.0 specification also includes support for extended SQLException hierarchies, a new ROWID data type, improved management of Clob and Blob objects, an improved mechanism for installing and recognizing JDBC drivers on a system, and more.

One of the innovations removed late in the specification process were the Ease of Development features, including annotation support. Look for annotation support soon after the release of Java SE 6.0. Take a good look at the JDBC 4.0 specification at for details about the features we've mentioned in this article as well as all the features we didn't have space to include. As you start to use JDBC 4.0, remember that some of the new features are targeted for JDBC pooling components usually available in an application server or Web server while others are targeted for JDBC drivers. Check out the components you're using to make sure that they support the parts of the specification you're interested in and start reaping the benefits of JDBC 4.0.

More Stories By John Goodson

As vice-president of product operations, John Goodson leads the product strategy, direction, and development efforts at DataDirect Technologies. For more than 10 years, he has worked closely with Sun and Microsoft on the development and evolution of database connectivity standards including J2EE, JDBC, .NET, ODBC, and ADO. His active memberships in various standards committees, including the JDBC Expert Group, have helped Goodson's team develop the most
technically advanced data connectivity technologies. He holds a BS in computer science from Virginia Tech.

More Stories By Mark Biamonte

Mark Biamonte is Program Manager for DataDirect?s Connect for JDBC product. He is responsible for defining the technical features and future direction of the Connect for JDBC product. Mark has over 20 years of experience designing computer hardware and software. He has been working with Java and JDBC for over 5 years and database APIs for over 7 years. He is currently an active member of the JDBC Expert Group defining the next version of the JDBC specification. Mark holds a master of science in electrical engineering from Worcester Polytechnic Institute in Worcester, Mass and a bachelor of science in electrical engineering from the University of Vermont in Burlington, Vt.

Comments (1) View Comments

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.

Most Recent Comments
Rob Bygrave 11/17/06 06:20:03 PM EST

Nice article and some good points.

Yes, these look useful. It should be noted that on the java client side you can determine the calling code/method by looking at the stack trace and this can be done automatically by the pool. This maybe available to diagnose problems for some.

I guess some pools are as dire as the 'Typical' one you describe but maybe isValid is less useful than it appears.

Specifically I would have hoped most pools would use a simple light query (e.g. select count(*) from dual) to test connections to make sure they are valid. Certainly I expect existing pools to detect dead connections and remove them individually (rather than flush the entire pool).

Perhaps isValid is better in that it is more standard and perhaps more performant than executing a simple query - but perhaps a query is better in that it is exercising the database as well. This after all, only has to be done on Connections that need to be checked for validity (when they have errored) or occassionally to check for Database down.

Q: Can you use isValid to check for the Database down event? Its not clear to me that this is included as a intended use of isValid? If isValid doesn't do this then are we not better using a simple light query so that we are also detecting database down events?

JDBC statement pooling
Its nice to standardise the API to tune the statement pooling but this is of course possible right now. Pools can right now use a LRU PreparedStatement cache and provide tuning for it. Being LRU then the frequently used statements stay in the cache so your statement thats executed twice should not kick out the one executed 100 times.

I am interested in JDBC4's RowId but I'm looking for more information on it. Specifically on what its intended use is? The example in the Spec 15.10.3 why would I use RowId instead of just binding the ID value? I have to know its the ID by the DML (the example sheds no light for me).

I am also interested to see which databases provide a 'Physical ROWID'. I know Oracle does but do any other Databases make such a Physical ROWID available?

I am interested in using Physical RowId from an ORM perspective.