An investigation of query techniques for semistructured data
- Publication Type: Thesis
- Issue Date: 2007
Closed Access
Filename | Description | Size
---|---|---
01Front.pdf | contents and abstract | 1.52 MB
02Whole.pdf | thesis | 37.59 MB
Many organizations have managed their data using relational database management
solutions for more than 20 years. Most of the data was initially structured and fitted
well with the relational data model. Support for unstructured data was added later to
enable the storage of documents, images, and similar data objects. Relational
Database Management Systems (RDBMS) and, more recently, Object-Relational
Database Management Systems (ORDBMS) have successfully addressed the
requirements of data management for structured and unstructured data. More recently,
other kinds of data have become widely available electronically, for example loosely
structured information used in World Wide Web applications. Challenges arise
when managing large amounts of complex information stored in many different
databases and files across multiple computer systems. Many newer applications store
such data in a loosely structured format. These datasets (e.g. Web pages,
oceanographic data and health care data) frequently lack the level of structure needed
to model the data and build a schema in the traditional database sense, and they
are usually large with complex structures. Such data is referred to as
semistructured data.
XML (eXtensible Markup Language) has emerged as a standard format for the
exchange of data. Semistructured data formatted in XML is present in most
organizations today and its significance is likely to increase in the future. The XML
data format provides a way of regularizing the storage of semistructured data, and data
encoded in XML exhibits the classic characteristics of semistructured data. We used
XML as the data format to represent semistructured data in this research. Queries over
multiple large, complex semistructured datasets are particularly difficult to
implement at present because of the limitations of XML query techniques. Querying
such data requires an approach that can integrate multiple semistructured datasets and
implement suitable query techniques. In this thesis, we investigate existing techniques
that can be used to store and query semistructured data and propose a method that
involves schema integration and transformation to enable the storage of
semistructured data in an ORDBMS. First, we describe a method that can be used to
integrate multiple sets of XML data and then analyse both semantic and structural
schema integration issues. Second, we describe a method for transforming XML
schemas into object-relational schemas in order to store XML data in an
object-relational database. Third, we propose a method of integrating multiple sets of XML
data by using a predefined vocabulary to resolve schema integration conflicts. Finally, we
demonstrate a practical way of transforming an XML schema into an object-relational
schema and illustrate it with a simple example. The resulting database
allows repeated queries over the datasets, fully utilizing the features of the
object-relational query engine.
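The two steps named in the abstract, vocabulary-based integration and schema transformation, can be illustrated with a minimal sketch. It is not the thesis's method, whose details are not available here. The vocabulary entries, the sample records, the helper names `normalize` and `to_object_type`, and the choice of a PostgreSQL-style composite type with `VARCHAR(100)` attributes are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical predefined vocabulary: source tag -> shared (canonical) term.
VOCABULARY = {
    "patient": "person",
    "fullName": "name",
    "dob": "birth_date",
}

def normalize(elem):
    """Return a copy of elem with every tag renamed through VOCABULARY."""
    out = ET.Element(VOCABULARY.get(elem.tag, elem.tag), dict(elem.attrib))
    out.text = elem.text
    for child in elem:
        out.append(normalize(child))
    return out

def to_object_type(elem):
    """Emit a composite-type definition for a flat element (assumed syntax)."""
    cols = ",\n".join(f"  {child.tag} VARCHAR(100)" for child in elem)
    return f"CREATE TYPE {elem.tag}_t AS (\n{cols}\n);"

# Two illustrative sources that describe the same concept with different tags.
source_a = ET.fromstring(
    "<patient><fullName>Ann Lee</fullName><dob>1970-01-01</dob></patient>"
)
source_b = ET.fromstring(
    "<person><name>Bo Chan</name><birth_date>1982-05-17</birth_date></person>"
)

# After normalization both records share one structure.
merged = [normalize(source_a), normalize(source_b)]
ddl = to_object_type(merged[0])
print(ddl)
```

Once both sources share the same element names, a single type definition covers them. A real transformation would also have to handle nesting, repeated elements, attributes and data types, which this sketch leaves out.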