Searching correlated objects in a long sequence

Ken C.K. Lee, Wang-chien Lee, Donna Jean Peuquet, Baihua Zheng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

A sequence, which appears in many applications (e.g., event logs and text documents), is an ordered list of objects. Exploring correlated objects in a sequence can reveal useful knowledge about the objects, e.g., event causality in event logs and word phrases in documents. In this paper, we introduce the correlation query, which finds correlated pairs of objects that often appear close to each other in a given sequence. A correlation query is specified by two control parameters: the distance bound, which sets the required closeness of objects, and the correlation threshold, which sets the minimum correlation strength of result pairs. Instead of processing the query by scanning the sequence multiple times (an approach called the Multi-Scan Algorithm, MSA), we propose the One-Scan Algorithm (OSA) and the Index-Based Algorithm (IBA). OSA accesses the queried sequence only once, while IBA takes the correlation threshold into account during execution and effectively eliminates unneeded candidates from detailed examination. An extensive set of experiments is conducted to evaluate all these algorithms. Among them, IBA significantly outperforms the others and is the most efficient.
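The abstract defines the query only by its two parameters. It gives neither the correlation measure nor pseudocode. What follows is therefore a minimal Python sketch of how a correlation query could be answered in a single pass over the sequence. It assumes, hypothetically, that the correlation strength of a pair (a, b) is the number of occurrence pairs of a and b lying at most `distance_bound` positions apart, divided by freq(a) × freq(b). The function name `correlated_pairs` and this measure are illustrative assumptions, not the paper's OSA or IBA.

```python
from collections import Counter, deque


def correlated_pairs(seq, distance_bound, threshold):
    """Answer a correlation query over `seq` in a single pass.

    Hypothetical sketch: the correlation strength of (a, b) is ASSUMED to be
    (# occurrence pairs of a and b at most distance_bound apart)
    / (freq(a) * freq(b)). This is an illustrative measure, not the paper's.
    Objects must be hashable and mutually orderable (e.g. strings).
    """
    freq = Counter()        # occurrences of each object
    pair_count = Counter()  # close co-occurrences, keyed by sorted pair
    window = deque()        # the last `distance_bound` objects seen

    for obj in seq:
        freq[obj] += 1
        # Every object still in the window lies within distance_bound of obj.
        for prev in window:
            if prev != obj:
                pair_count[tuple(sorted((prev, obj)))] += 1
        window.append(obj)
        if len(window) > distance_bound:
            window.popleft()

    # Apply the correlation threshold only after the scan (no pruning here).
    result = {}
    for (a, b), n in pair_count.items():
        score = n / (freq[a] * freq[b])  # in [0, 1], since n <= freq[a] * freq[b]
        if score >= threshold:
            result[(a, b)] = score
    return result


if __name__ == "__main__":
    seq = ["x", "y", "x", "y", "z", "w", "z"]
    print(correlated_pairs(seq, distance_bound=1, threshold=0.5))
    # {('x', 'y'): 0.75, ('w', 'z'): 1.0}
```

Keeping a sliding window of the last `distance_bound` objects is what lets this sketch read the sequence once. A multi-scan approach would instead revisit the sequence for each candidate pair. The threshold is checked only after the scan, so this sketch does not do any of the candidate elimination the abstract attributes to IBA.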

Original language: English (US)
Title of host publication: Scientific and Statistical Database Management - 20th International Conference, SSDBM 2008, Proceedings
Pages: 436-454
Number of pages: 19
DOI: https://doi.org/10.1007/978-3-540-69497-7_28
State: Published - Aug 14 2008
Event: 20th International Conference on Scientific and Statistical Database Management, SSDBM 2008 - Hong Kong, China
Duration: Jul 9 2008 to Jul 11 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5069 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Name: 20th International Conference on Scientific and Statistical Database Management, SSDBM 2008
Country: China
City: Hong Kong
Period: 7/9/08 to 7/11/08

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Lee, K. C. K., Lee, W., Peuquet, D. J., & Zheng, B. (2008). Searching correlated objects in a long sequence. In Scientific and Statistical Database Management - 20th International Conference, SSDBM 2008, Proceedings (pp. 436-454). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5069 LNCS). https://doi.org/10.1007/978-3-540-69497-7_28
Lee, Ken C.K.; Lee, Wang-chien; Peuquet, Donna Jean; Zheng, Baihua. / Searching correlated objects in a long sequence. Scientific and Statistical Database Management - 20th International Conference, SSDBM 2008, Proceedings. 2008. pp. 436-454 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{d21d01bcac2445d4aac0a4e5cf067942,
title = "Searching correlated objects in a long sequence",
abstract = "Sequence, widely appearing in various applications (e.g. event logs, text documents, etc) is an ordered list of objects. Exploring correlated objects in a sequence can provide useful knowledge among the objects, e.g., event causality in event log and word phrases in documents. In this paper, we introduce correlation query that finds correlated pairs of objects often appearing closely to each other in a given sequence. A correlation query is specified by two control parameters, distance bound, the requirement of object closeness, and correlation threshold, the minimum requirement of correlation strength of result pairs. Instead of processing the query by scanning the sequence multiple times, that is called Multi-Scan Algorithm (MSA), we propose One-Scan Algorithm (OSA) and Index-Based Algorithm (IBA). OSA accesses a queried sequence once and IBA considers correlation threshold in the execution and effectively eliminates unneeded candidates from detail examination. An extensive set of experiments is conducted to evaluate all these algorithms. Among them, IBA, significantly outperforming the others, is the most efficient.",
author = "Lee, {Ken C.K.} and Wang-chien Lee and Peuquet, {Donna Jean} and Baihua Zheng",
year = "2008",
month = "8",
day = "14",
doi = "10.1007/978-3-540-69497-7_28",
language = "English (US)",
isbn = "3540694765",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
volume = "5069",
pages = "436--454",
booktitle = "Scientific and Statistical Database Management - 20th International Conference, SSDBM 2008, Proceedings",

}

Lee, KCK, Lee, W, Peuquet, DJ & Zheng, B 2008, Searching correlated objects in a long sequence. in Scientific and Statistical Database Management - 20th International Conference, SSDBM 2008, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5069 LNCS, pp. 436-454, 20th International Conference on Scientific and Statistical Database Management, SSDBM 2008, Hong Kong, China, 7/9/08. https://doi.org/10.1007/978-3-540-69497-7_28

Searching correlated objects in a long sequence. / Lee, Ken C.K.; Lee, Wang-chien; Peuquet, Donna Jean; Zheng, Baihua.

Scientific and Statistical Database Management - 20th International Conference, SSDBM 2008, Proceedings. 2008. p. 436-454 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5069 LNCS).

TY  - GEN
T1  - Searching correlated objects in a long sequence
AU  - Lee, Ken C.K.
AU  - Lee, Wang-chien
AU  - Peuquet, Donna Jean
AU  - Zheng, Baihua
PY  - 2008/8/14
Y1  - 2008/8/14
N2  - Sequence, widely appearing in various applications (e.g. event logs, text documents, etc) is an ordered list of objects. Exploring correlated objects in a sequence can provide useful knowledge among the objects, e.g., event causality in event log and word phrases in documents. In this paper, we introduce correlation query that finds correlated pairs of objects often appearing closely to each other in a given sequence. A correlation query is specified by two control parameters, distance bound, the requirement of object closeness, and correlation threshold, the minimum requirement of correlation strength of result pairs. Instead of processing the query by scanning the sequence multiple times, that is called Multi-Scan Algorithm (MSA), we propose One-Scan Algorithm (OSA) and Index-Based Algorithm (IBA). OSA accesses a queried sequence once and IBA considers correlation threshold in the execution and effectively eliminates unneeded candidates from detail examination. An extensive set of experiments is conducted to evaluate all these algorithms. Among them, IBA, significantly outperforming the others, is the most efficient.
AB  - Sequence, widely appearing in various applications (e.g. event logs, text documents, etc) is an ordered list of objects. Exploring correlated objects in a sequence can provide useful knowledge among the objects, e.g., event causality in event log and word phrases in documents. In this paper, we introduce correlation query that finds correlated pairs of objects often appearing closely to each other in a given sequence. A correlation query is specified by two control parameters, distance bound, the requirement of object closeness, and correlation threshold, the minimum requirement of correlation strength of result pairs. Instead of processing the query by scanning the sequence multiple times, that is called Multi-Scan Algorithm (MSA), we propose One-Scan Algorithm (OSA) and Index-Based Algorithm (IBA). OSA accesses a queried sequence once and IBA considers correlation threshold in the execution and effectively eliminates unneeded candidates from detail examination. An extensive set of experiments is conducted to evaluate all these algorithms. Among them, IBA, significantly outperforming the others, is the most efficient.
UR  - http://www.scopus.com/inward/record.url?scp=49049102115&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=49049102115&partnerID=8YFLogxK
U2  - 10.1007/978-3-540-69497-7_28
DO  - 10.1007/978-3-540-69497-7_28
M3  - Conference contribution
SN  - 3540694765
SN  - 9783540694762
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 436
EP  - 454
BT  - Scientific and Statistical Database Management - 20th International Conference, SSDBM 2008, Proceedings
ER  -

Lee KCK, Lee W, Peuquet DJ, Zheng B. Searching correlated objects in a long sequence. In Scientific and Statistical Database Management - 20th International Conference, SSDBM 2008, Proceedings. 2008. p. 436-454. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-540-69497-7_28