XML Databases
By Danny Kalev
Date: September 27, 2002
A relational database stores information in tables, with the ability to
relate information from one table to information in another table. The
primary data unit in relational databases is a column, which may contain
one or more fields. While this model operates well under traditional
data oriented systems, it isn't ideal for manipulating XML data.
Database vendors have come up with several solutions to this problem. I
will discuss two representative products that are available on Linux.
Data-centric and Document Centric XML
Document-centric XML documents typically contain essays, poems, letters
etc. In these documents, the primary data unit is the entire document,
rather than individual fields. By contrast, data-centric documents can
be an employee's information or a catalog item. Human beings are not
intended to read such documents in their raw form, but rather they are
created and used by software application. In such documents, individual
nodes contain meaningful information even when used outside the context
of their document, an employee's name for example.
XML-enabled Databases
Relational databases that offer special capabilities for dealing with
XML data are known as "XML-enabled databases". They map XML data to
traditional rows and columns internally. The simplest mapping model
simply stores the original XML document in a table column as a Large
OBject, or LOB. This mapping model isn't ideal for performing
sophisticated queries based on specific elements of an XML document
since the database engine isn't aware of its structure. IBM's DB2 uses
"side tables" to solve this problem. A side table contains additional
indexes that point to specific elements in the original XML document.
Alternatively, a side table contains the navigation scheme of the
original document (known as "XPath" in XML-parlance). However, neither
of these methods is truly relational. If we wish to change a single
node, we need to retrieve the entire document value, change it, and
write it back. This approach is therefore more suitable for
document-centric applications.
Object-relational Storage
A second mapping model breaks the elements of an XML document into
individual table rows, or fields. This way, the database is aware of the
original document's structure. Oracle's 9i database implements this
approach. The DBA can decide which documents should be mapped to a table
schema and which documents should be stored as LOBs. Equipped with an
array of XML-related software tools such as XML Class Generator, XML SQL
Utility and XML Parser, Oracle's 9i handles document-centric and
data-centric XML equally well.
Bruce Spencer,
baspence@us.ibm.com