In this dissertation, I present a table-driven streaming XML (Extensible Markup Language) parsing and searching technique, called TDX, and investigate related techniques. TDX expedites XML parsing, validation, and searching by pre-recording the states of an XML parser in tabular form and by using an efficient runtime streaming parsing engine based on a two-stack push-down automaton. The parsing tables are automatically produced from XML schemas or from WSDL (Web Services Description Language) service descriptions. Because the schema constraints and XPath expressions are pre-encoded in a parsing table, the approach effectively implements a schema-specific XML parser and/or query processor that combines parsing, validation, and search into a single pass. Moreover, because the runtime parsing engine is independent of XML schemas and XPath query expressions, parsing tables can be populated on-the-fly to the runtime engine. TDX thus eliminates the recompilation and redeployment that schema-specific parsers require when a schema changes. Similarly, different XPath queries can be preprocessed at compile time and populated on-the-fly to the TDX searching engine without runtime overhead. To construct the parsing tables, we developed a set of mapping rules that translate XML schemas to augmented grammars. The augmented grammars support the full expressive power of the W3C XML Schema by introducing permutation phrase grammars and multi-occurrence phrase grammars, and they are suitable for constructing a predictive parsing table. The predictive parsing table constructed from the augmented grammars can be integrated into the parser at any time to maximize performance, or be populated on-the-fly at runtime to address schema changes efficiently.
Because the parsing and searching tables are pre-processed at compile time, and table lookups at runtime are deterministic and take constant time, TDX efficiently implements a single-pass, predictive validating parser without backtracking or function-call overhead. Our experimental results show a significant performance improvement compared to widely used XML parsers, both validating and non-validating, and to XML query processors.
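The general idea of a table-driven predictive validator can be illustrated with a minimal sketch. This is not the TDX engine: it uses a single stack rather than the two-stack push-down automaton described above, and the toy grammar, the token spellings (`<book>`, `#text`, and so on), and the names `BOOK_TABLE` and `validate` are all assumptions made for illustration. In TDX the table would be generated from an XML schema; here it is written by hand.

```python
# Hypothetical hand-written predictive parsing table for a toy "book" schema.
# Key: (nonterminal, lookahead token); value: the production to expand.
BOOK_TABLE = {
    ("Book", "<book>"): ["<book>", "Title", "Authors", "</book>"],
    ("Title", "<title>"): ["<title>", "#text", "</title>"],
    ("Authors", "<author>"): ["Author", "Authors"],  # one or more authors follow
    ("Authors", "</book>"): [],                      # empty production ends the list
    ("Author", "<author>"): ["<author>", "#text", "</author>"],
}

END = "$"  # end-of-input marker (assumed not to occur as a real token)


def validate(tokens, table, start):
    """Check a token stream against a predictive parsing table in one pass.

    The engine knows nothing about the schema; all schema knowledge lives in
    `table`, so a different table can be supplied at runtime.
    """
    nonterminals = {nonterminal for nonterminal, _ in table}
    stream = list(tokens) + [END]
    stack = [END, start]
    pos = 0
    while stack:
        top = stack.pop()
        lookahead = stream[pos]
        if top in nonterminals:
            production = table.get((top, lookahead))
            if production is None:
                return False  # no table entry: the input violates the schema
            stack.extend(reversed(production))  # leftmost symbol ends up on top
        elif top == lookahead:
            pos += 1  # terminal matched: consume one token
        else:
            return False  # terminal mismatch
    return pos == len(stream)


valid_doc = ["<book>", "<title>", "#text", "</title>",
             "<author>", "#text", "</author>", "</book>"]
missing_title = ["<book>", "<author>", "#text", "</author>", "</book>"]

print(validate(valid_doc, BOOK_TABLE, "Book"))
print(validate(missing_title, BOOK_TABLE, "Book"))
```

Each step is a single dictionary lookup followed by a push or a match, so the loop never backtracks, and passing a different table to `validate` changes the accepted schema without touching the engine code.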
A Dissertation submitted to the Department of Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy.
Bibliography Note
Includes bibliographical references.
Advisory Committee
Robert A. van Engelen, Professor Directing Dissertation; Erlebacher Gordon, University Representative; Xiuwen Liu, Committee Member; Xin Yuan, Committee Member; Zhenhai Duan, Committee Member.
Publisher
Florida State University
Identifier
FSU_migr_etd-5297
Use and Reproduction
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). The copyright in theses and dissertations completed at Florida State University is held by the students who author them.