An investigation of query techniques for semistructured data

Publication Type:
Thesis
Issue Date:
2007
Filename | Description | Size | Format
01Front.pdf | contents and abstract | 1.52 MB | Adobe PDF
02Whole.pdf | thesis | 37.59 MB | Adobe PDF
NO FULL TEXT AVAILABLE. Access is restricted indefinitely.

Many organizations have managed their data using relational database management solutions for more than 20 years. Most of that data was initially structured and fitted the relational data model well. Support for unstructured data was added later to enable the storage of documents, images, and similar data objects. Relational Database Management Systems (RDBMS) and, more recently, Object-Relational Database Management Systems (ORDBMS) have successfully addressed the requirements of data management for structured and unstructured data.

More recently, other kinds of data have become widely available electronically, for example the loosely structured information used in World Wide Web (Web) applications. Challenges arise when managing large amounts of complex information stored in many different databases and files across multiple computer systems. Many newer applications store such data in a loosely structured format. These datasets (e.g. Web pages, oceanographic data and health care data) are usually large and structurally complex, yet they frequently lack the level of structure needed to model the data and build a schema in the traditional database sense. Such data is referred to as semistructured data.

XML (eXtensible Markup Language) has emerged as a standard format for the exchange of data. Semistructured data formatted in XML is present in most organizations today, and its significance is likely to increase in the future. The XML data format provides a way of regularizing the storage of semistructured data, and data encoded in XML exhibits the classic characteristics of semistructured data. We used XML as the data format to represent semistructured data in this research.

Queries over multiple large volumes of complex semistructured data are particularly difficult to implement at present because of the limitations of XML query techniques. Querying such data requires an approach that can integrate multiple semistructured datasets and implement suitable query techniques.

In this thesis, we investigate existing techniques that can be used to store and query semistructured data, and we propose a method that involves schema integration and transformation to enable the storage of semistructured data in an ORDBMS. First, we describe a method that can be used to integrate multiple sets of XML data and then analyse both semantic and structural schema integration issues. Second, we describe a method for transforming XML schemas into object-relational schemas in order to store XML data in an object-relational database. Third, we propose a method of integrating multiple sets of XML data by using a predefined vocabulary to resolve schema integration. Finally, we demonstrate a practical way of transforming an XML schema into an object-relational schema and illustrate it with a simple example. The resulting database allows repeated queries over the datasets, fully utilizing the features of the object-relational query engine.
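The two ideas named in the abstract, resolving schema differences through a predefined vocabulary and mapping the integrated structure to an object-relational type, can be pictured with a minimal sketch. This is not the method from the thesis, whose full text is not available here. The vocabulary entries, the sample XML, the function names, the blanket VARCHAR(100) column type, and the Oracle-style CREATE TYPE ... AS OBJECT syntax are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical predefined vocabulary: source tag -> shared term.
VOCABULARY = {
    "patient": "person",
    "client": "person",
    "surname": "family_name",
    "lastName": "family_name",
    "dob": "birth_date",
    "birthDate": "birth_date",
}

def normalize(element):
    """Return a flat dict of child values keyed by vocabulary terms."""
    record = {}
    for child in element:
        term = VOCABULARY.get(child.tag, child.tag)
        record[term] = (child.text or "").strip()
    return record

def integrate(*documents):
    """Collect records from several XML documents under shared terms."""
    merged = {}
    for xml_text in documents:
        root = ET.fromstring(xml_text)
        for element in root:
            term = VOCABULARY.get(element.tag, element.tag)
            merged.setdefault(term, []).append(normalize(element))
    return merged

def to_object_type(term, records):
    """Emit an illustrative (assumed Oracle-style) object type definition."""
    attributes = sorted({name for record in records for name in record})
    columns = ",\n".join(f"  {name} VARCHAR(100)" for name in attributes)
    return f"CREATE TYPE {term}_t AS OBJECT (\n{columns}\n);"

# Two made-up sources that describe the same concept with different tags.
SOURCE_A = "<records><patient><surname>Ng</surname><dob>1980-01-02</dob></patient></records>"
SOURCE_B = "<list><client><lastName>Silva</lastName><birthDate>1975-05-06</birthDate></client></list>"

integrated = integrate(SOURCE_A, SOURCE_B)
ddl = to_object_type("person", integrated["person"])
print(ddl)
```

Here both sources end up under the shared term "person" with the same attribute names, so a single object type can describe them. A real transformation would also need to handle nesting, repeated elements, data types, and constraints, which this sketch leaves out.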
Please use this identifier to cite or link to this item: