You are here

Spatial Approximate String Search

Title: Spatial Approximate String Search.
Name(s): Yao, Bin, author
Li, Feifei, professor directing thesis
Mio, Washington, university representative
Srinivasan, Ashok, committee member
Kumar, Piyush, committee member
Department of Computer Science, degree granting department
Florida State University, degree granting institution
Type of Resource: text
Genre: Text
Issuance: monographic
Date Issued: 2011
Publisher: Florida State University
Place of Publication: Tallahassee, Florida
Physical Form: computer
online resource
Extent: 1 online resource
Language(s): English
Abstract/Description: Many applications are featured with both text and location information, which leads to a novel type of search: spatial approximate string search (Sas). The Sas is gaining attention from the database community only recently. A large variety of problems remain open. Spatial and text data have been independently studied for decades. Spatial data have many unique features that are drastically different from text data. As a result, most of the existing techniques for string processing are either inapplicable or inefficient when adapted to spatial databases. We have investigated four issues in the general area of Sas. They are: (i) Spatial approximate string search in Euclidean space (Esas); (ii) Spatial approximate string search on road networks (Rsas); (iii) Selectivity Estimation for Esas Range Queries; (iv) Multi-Approximate-Keyword Routing query on road networks. For efficiently answering spatial approximate string queries in Euclidean space, we propose a novel index structure, MhR-tree, which is based on the R-tree augmented with the min-wise signature and the linear hashing technique. The min-wise signature for an index node u keeps a concise representation of the union of q-grams from strings under the subtree of u. We analyze the pruning functionality of such signatures based on set resemblance between the query string and the q-grams from the sub-trees of index nodes. For the Rsas, we propose a novel method, RsasSol, which is superior to the base line algorithm in practice. The RsasSol is a combination technique of inverted list and reference points. Extensive experiments on large real data sets demonstrate the efficiency and effectiveness of our methods. We also discuss how to estimate range query selectivity in Euclidean space since the query selectivity estimation is always important for the query optimization. Based on the spatial uniformity principle and the minimum number of neighborhoods principle for strings, we define a partitioning metric to facilitate the construction of buckets. We present a novel adaptive algorithm for finding balanced partitions using both the spatial and string information stored in the R-tree, which works well in practice. Lastly, we go further beyond the basic kNN and range queries for Sas and consider an interesting application of Sas on the road network: find the shortest path with keywords constraints on the road network. We propose both exact and approximate solutions for this issue. For small set of query keywords (m ≤ 6), our exact solutions can quickly find the top-k shortest paths. For large set of query keywords, our approximate solutions can still return the answer on real time with good approximate quality.
Identifier: FSU_migr_etd-0988 (IID)
Submitted Note: A Dissertation submitted to the Department of Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy.
Degree Awarded: Summer Semester, 2011.
Date of Defense: June 24, 2011.
Keywords: Routing, R-tree, Selectivity Estimation
Bibliography Note: Includes bibliographical references.
Advisory Committee: Feifei Li, Professor Directing Thesis; Washington Mio, University Representative; Ashok Srinivasan, Committee Member; Piyush Kumar, Committee Member.
Subject(s): Computer science
Persistent Link to This Record:
Owner Institution: FSU

Choose the citation style.
Yao, B. (2011). Spatial Approximate String Search. Retrieved from