Using multiple indexes for efficient subsequence matching in time-series databases

Seung Hwan Lim, Hee Jin Park, Sang Wook Kim

Research output: Chapter in Book/Report/Conference proceedingConference contribution

14 Citations (Scopus)

Abstract

Time-series subsequence matching is an operation that searches for such data subsequences whose changing patterns are similar to a query sequence from a time-series database. This paper addresses a performance issue of time-series subsequence matching. First, we quantitatively examine the performance degradation caused by the window size effect, and then show that the performance of subsequence matching with a single index is not satisfactory in real applications. We claim that index interpolation is a fairly effective tool to resolve this problem. Index interpolation performs subsequence matching by selecting the most appropriate one from multiple indexes built on windows of their distinct sizes. For index interpolation, we need to decide the sizes of windows for multiple indexes to be built. In this paper, we solve the problem of selecting optimal window sizes in the perspective of physical database design. For this, given a set of pairs (length, frequency) of query sequences to be performed in a target application and a set of window sizes for building multiple indexes, we devise a formula that estimates the overall cost of all the subsequence matchings. By using this formula, we propose an algorithm that determines the optimal window sizes for maximizing the performance of entire subsequence matchings. We formally prove the optimality as well as the effectiveness of the algorithm. Finally, we perform a series of experiments with a real-life stock data set and a large volume of a synthetic data set to show the superiority of our approach.

Original languageEnglish
Title of host publicationDatabase Systems for Advanced Applications - 11th International Conference, DASFAA 2006, Proceedings
Pages65-79
Number of pages15
StatePublished - 2006 Jul 7
Event11th International Conference on Database Systems for Advanced Applications, DASFAA 2006 - Singapore, Singapore
Duration: 2006 Apr 122006 Apr 15

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume3882 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Other

Other11th International Conference on Database Systems for Advanced Applications, DASFAA 2006
CountrySingapore
CitySingapore
Period06/04/1206/04/15

Fingerprint

Subsequence
Time series
Interpolation
Interpolate
Degradation
Query
Database Design
Size Effect
Synthetic Data
Costs
Experiments
Resolve
Optimality
Entire
Distinct
Target
Series
Estimate
Experiment

Cite this

Lim, S. H., Park, H. J., & Kim, S. W. (2006). Using multiple indexes for efficient subsequence matching in time-series databases. In Database Systems for Advanced Applications - 11th International Conference, DASFAA 2006, Proceedings (pp. 65-79). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3882 LNCS).
Lim, Seung Hwan ; Park, Hee Jin ; Kim, Sang Wook. / Using multiple indexes for efficient subsequence matching in time-series databases. Database Systems for Advanced Applications - 11th International Conference, DASFAA 2006, Proceedings. 2006. pp. 65-79 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{ee5cb3611e5a4f7e9920562f4d0abd9f,
title = "Using multiple indexes for efficient subsequence matching in time-series databases",
abstract = "Time-series subsequence matching is an operation that searches for such data subsequences whose changing patterns are similar to a query sequence from a time-series database. This paper addresses a performance issue of time-series subsequence matching. First, we quantitatively examine the performance degradation caused by the window size effect, and then show that the performance of subsequence matching with a single index is not satisfactory in real applications. We claim that index interpolation is a fairly effective tool to resolve this problem. Index interpolation performs subsequence matching by selecting the most appropriate one from multiple indexes built on windows of their distinct sizes. For index interpolation, we need to decide the sizes of windows for multiple indexes to be built. In this paper, we solve the problem of selecting optimal window sizes in the perspective of physical database design. For this, given a set of pairs (length, frequency) of query sequences to be performed in a target application and a set of window sizes for building multiple indexes, we devise a formula that estimates the overall cost of all the subsequence matchings. By using this formula, we propose an algorithm that determines the optimal window sizes for maximizing the performance of entire subsequence matchings. We formally prove the optimality as well as the effectiveness of the algorithm. Finally, we perform a series of experiments with a real-life stock data set and a large volume of a synthetic data set to show the superiority of our approach.",
author = "Lim, {Seung Hwan} and Park, {Hee Jin} and Kim, {Sang Wook}",
year = "2006",
month = "7",
day = "7",
language = "English",
isbn = "3540333371",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "65--79",
booktitle = "Database Systems for Advanced Applications - 11th International Conference, DASFAA 2006, Proceedings",

}

Lim, SH, Park, HJ & Kim, SW 2006, Using multiple indexes for efficient subsequence matching in time-series databases. in Database Systems for Advanced Applications - 11th International Conference, DASFAA 2006, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 3882 LNCS, pp. 65-79, 11th International Conference on Database Systems for Advanced Applications, DASFAA 2006, Singapore, Singapore, 06/04/12.

Using multiple indexes for efficient subsequence matching in time-series databases. / Lim, Seung Hwan; Park, Hee Jin; Kim, Sang Wook.

Database Systems for Advanced Applications - 11th International Conference, DASFAA 2006, Proceedings. 2006. p. 65-79 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3882 LNCS).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

TY - GEN

T1 - Using multiple indexes for efficient subsequence matching in time-series databases

AU - Lim, Seung Hwan

AU - Park, Hee Jin

AU - Kim, Sang Wook

PY - 2006/7/7

Y1 - 2006/7/7

N2 - Time-series subsequence matching is an operation that searches for such data subsequences whose changing patterns are similar to a query sequence from a time-series database. This paper addresses a performance issue of time-series subsequence matching. First, we quantitatively examine the performance degradation caused by the window size effect, and then show that the performance of subsequence matching with a single index is not satisfactory in real applications. We claim that index interpolation is a fairly effective tool to resolve this problem. Index interpolation performs subsequence matching by selecting the most appropriate one from multiple indexes built on windows of their distinct sizes. For index interpolation, we need to decide the sizes of windows for multiple indexes to be built. In this paper, we solve the problem of selecting optimal window sizes in the perspective of physical database design. For this, given a set of pairs (length, frequency) of query sequences to be performed in a target application and a set of window sizes for building multiple indexes, we devise a formula that estimates the overall cost of all the subsequence matchings. By using this formula, we propose an algorithm that determines the optimal window sizes for maximizing the performance of entire subsequence matchings. We formally prove the optimality as well as the effectiveness of the algorithm. Finally, we perform a series of experiments with a real-life stock data set and a large volume of a synthetic data set to show the superiority of our approach.

AB - Time-series subsequence matching is an operation that searches for such data subsequences whose changing patterns are similar to a query sequence from a time-series database. This paper addresses a performance issue of time-series subsequence matching. First, we quantitatively examine the performance degradation caused by the window size effect, and then show that the performance of subsequence matching with a single index is not satisfactory in real applications. We claim that index interpolation is a fairly effective tool to resolve this problem. Index interpolation performs subsequence matching by selecting the most appropriate one from multiple indexes built on windows of their distinct sizes. For index interpolation, we need to decide the sizes of windows for multiple indexes to be built. In this paper, we solve the problem of selecting optimal window sizes in the perspective of physical database design. For this, given a set of pairs (length, frequency) of query sequences to be performed in a target application and a set of window sizes for building multiple indexes, we devise a formula that estimates the overall cost of all the subsequence matchings. By using this formula, we propose an algorithm that determines the optimal window sizes for maximizing the performance of entire subsequence matchings. We formally prove the optimality as well as the effectiveness of the algorithm. Finally, we perform a series of experiments with a real-life stock data set and a large volume of a synthetic data set to show the superiority of our approach.

UR - http://www.scopus.com/inward/record.url?scp=33745515351&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:33745515351

SN - 3540333371

SN - 9783540333371

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 65

EP - 79

BT - Database Systems for Advanced Applications - 11th International Conference, DASFAA 2006, Proceedings

ER -

Lim SH, Park HJ, Kim SW. Using multiple indexes for efficient subsequence matching in time-series databases. In Database Systems for Advanced Applications - 11th International Conference, DASFAA 2006, Proceedings. 2006. p. 65-79. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).