kiranreddys.com
Kiran's personal website
Blog   Forums   Sitemap   Social    Phone
 

Archive for September 2008

Storing XML in RDBMS

As XML is the de-facto standard for data exchange, systems need to store large volumes of XML documents and as I discussed earlier RDBMS is the best system to store and manage data.

DBMS provide various storage options, mechanisms and methods to store XML data in the Database.

Storing XML as Non-Native Character Data
In RDBMS you can store entire XML either as a character type (VARCHAR) or as a single value of large object (CLOB). When storing as character type, it can be split into multiple rows. XML stored into CLOBs or VARCHAR columns is not parsed.

Storing XML in non-native XML type gives very high performance because no parsing takes place in the process. Parsing can be done only at retrieval time. Storing XML data in not-native type is simple and easy.

 

Figure 1 Storing XML as non-native character data

Figure
1 Storing XML as non-native character data

Storing in to VARCHAR or CLOBs is easy to implement but DBMS does not know XML data structure and it’s not be parsed and no DOM object is created hence querying and searching of data is not possible or very difficult.

This type of storage is reasonable choice for following requirements
• When DBMS does not support native XML types
• Element search is not required
• Full documents insert and retrieval is more common operation rather than accessing partial data or individual elements of data.
• No updates or deletes required for XML elements
• XML validation is not required
• XML parsing and random access is not required
• When entire Documents need to be stored and retrieved in encrypted format

Storing XML as Non-Native Relational Values
Storing XML as Relational values is removing entire XML format and converting all the values in to relational values and saving them in to normal RDBMS tables.

In this approach it discards all the XML structure element tags and coverts all the elements, text and attributes in to relational values, normally this involves very complex table structure and multiple inserts or updates and involves significant time.

Figure 2 Storing XML as non-native relational values
Figure 2 Storing XML as non-native relational values

With this approach you loose all the XML capabilities and simply uses RDBMS capabilities, depending on the level of complexity and hierarchy of XML elements it might requires large number of relational tables and columns, defining such complex relational table schema could be very complicated and cumbersome; mapping XML elements and attributes in to relational fields is also a very complicated task.

This type of storage is reasonable choice for following requirements
• XML structure is very simple and data is regular and repetitive
• DBMS data retrieval performance is high priority
• Decomposition and reconstruction of XML to/form relational values is simple and straight forward
• Query data with plain SQL

Following example shows simple XML structure that can be easily converted in to relational values

<CLASS CLSID="007" CLSNAME=“Database“>
<STUDENT>
<STDNO>U012345</STDNO>
<FSTNAME>NAVEEN</FSTNAME>
<LASTNAME>REDDY</LASTNAME>
<DOB>1978-01-01</DOB>
</STUDENT>
<STUDENT>
<STDNO>U011119</STDNO>
<FSTNAME>RONCHE</FSTNAME>
<LSTNAME>UEL</LSTNAME>
<DOB>1988-05-05</DOB>
</STUDENT>
</CLASS>

Example 2 Class Students XML Document

The XML in the above example can be easily converted in to relational values and relational values can reconstruct back to XML Document.

Above example requires only two database tables as follows

CLASS

CLSID

CLSNAME

007

Database

009

Networks

STUDENT

STDNO

FSTNAME

LSTNAME

DOB

CLSID

U012345

NAVEEN

REDDY

1978-01-01

007

U011119

RONCHE

UEL

1988-05-05

007

Example 3 Class and Students tables

Once the values are inserted in to relational tables, you can use all the DBMS SQL capabilities; for example you can retrieve all the students’ details that are in the Database class with following simple SQL.

SELECT STDNO, FSTNAME, LSTNAME, DOB, CLSID
FROM CLASS C, STUDENT S
WHERE
C.CLSID = S. CLSID
AND CLSNAME = ‘Database’
Example 4 SQL for Database class students

Results from the above SQL query can re-construct back to XML Document by using any programming language. Many main stream programming languages like JAVA, C++ and C# provide native XML writers and readers.

Storing XML in a Native XML Data Type

Many popular DBMS provides native XML storage with native XML Data types which preserve XML structure.

Figure 3 Storing XML as a native XML type
Figure 3 Storing XML as a native XML type

XML enabled DBMS provides not only XML types but also provides all the XML capabilities such as Xquery, XPath etc.. You can define a table schema and a create table in a simple way; for example you can create table with the following create statement:

CREATE TABLE COURSE (COURSEID INTERGER, COURCE_DETAILS XMLTYPE);

Following example Course details XML document can be directly inserted in to COURSE_DETAILS column and you can easily query individual elements with XQuery/XPath

<COURSE COURSEID="15" COURSENAME=ORACLE“>
STUDENT>
<STDNO>1</STDNO>
FIRSTNAME>IAN</FIRSTNAME>
LASTNAME>PETTIT</LASTNAME>
BIRTHDATE>1978-01-01</BIRTHDATE>
STUDENT>
<STUDENT>
<STDNO>2</STDNO>
FIRSTNAME>MARK</FIRSTNAME>
LASTNAME>RAMPARKASH</LASTNAME>
BIRTHDATE>1988-05-05</BIRTHDATE>
STUDENT>
</COURSE>
Example 5 Course XML Document

Even though DBMS stores XML in its native form, different Database Management Systems store XML documents using different technologies internally, some databases might convert data in to relational and XQuery into SQL (this is basically either storing XML as non-native character objects or storing XML as non-native relational values) but more databases are now using native XML types and its technologies like DOM (Document Object Model).

XML type supported databases that store native XML types follow XML Data model or XQuery Data model. This is completely different with previous non-native XML storage. Stored XML represents hierarchical structure and DBMS parses XML before inserting it to XMLType.

XML enabled native XML database systems uses trees representation of elements as the fundamental storage and processing model. Some DBMS requires serialization to retrieve full XML document but DBMS like Oracle, SQL server full document can be retrieved without serialization.

This type of storage is reasonable choice for following requirements
• High query performance is required
• XML Schema is very complex and mapping is very difficult
• Need to query individual data elements or partial document.
• Changing evolving XML Schema
• Parsing and validation is required
• Require to manipulate data elements in the XML document
• The internal structure of XML documents must be preserved