Suffix array of alignment: A practical index for similar data

Joong Chae Na, Heejin Park, Sunho Lee, Minsung Hong, Thierry Lecroq, Laurent Mouchard, Kunsoo Park

Research output: Chapter in Book/Report/Conference proceedingConference contribution

14 Scopus citations

Abstract

The suffix tree of alignment is an index data structure for similar strings. Given an alignment of similar strings, it stores all suffixes of the alignment, called alignment-suffixes. An alignment-suffix represents one suffix of a string or suffixes of multiple strings starting at the same position in the alignment. The suffix tree of alignment makes good use of similarity in strings theoretically. However, suffix trees are not widely used in biological applications because of their huge space requirements, and instead suffix arrays are used in practice. In this paper we propose a space-economical version of the suffix tree of alignment, named the suffix array of alignment (SAA). Given an alignment ρ of similar strings, the SAA for ρ is a lexicographically sorted list of all the alignment-suffixes of ρ. The SAA supports pattern search as efficiently as the generalized suffix array. Our experiments show that our index uses only 14% of the space used by the generalized suffix array to index 11 human genome sequences. The space efficiency of our index increases as the number of the genome sequences increases. We also present an efficient algorithm for constructing the SAA.

Original languageEnglish
Title of host publicationString Processing and Information Retrieval - 20th International Symposium, SPIRE 2013, Proceedings
PublisherSpringer Verlag
Pages243-254
Number of pages12
ISBN (Print)9783319024318
DOIs
StatePublished - 2013 Jan 1
Event20th International Symposium on String Processing and Information Retrieval, SPIRE 2013 - Jerusalem, Israel
Duration: 2013 Oct 72013 Oct 9

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume8214 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Other

Other20th International Symposium on String Processing and Information Retrieval, SPIRE 2013
CountryIsrael
CityJerusalem
Period13/10/713/10/9

Keywords

  • Alignments
  • Indexes for similar data
  • Suffix arrays

Fingerprint Dive into the research topics of 'Suffix array of alignment: A practical index for similar data'. Together they form a unique fingerprint.

  • Cite this

    Na, J. C., Park, H., Lee, S., Hong, M., Lecroq, T., Mouchard, L., & Park, K. (2013). Suffix array of alignment: A practical index for similar data. In String Processing and Information Retrieval - 20th International Symposium, SPIRE 2013, Proceedings (pp. 243-254). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8214 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-02432-5_27