Alphabet size selection for symbolization of dynamic data-driven systems

An information-theoretic approach

Soumalya Sarkar, P. Chattopadhyay, Asok Ray, Shashi Phoha, Mark Levi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

Symbolic time series analysis (STSA) is built upon the concept of symbolic dynamics, which deals with discretization of dynamical systems in both space and time. STSA has led to the development of a pattern recognition tool in the paradigm of dynamic data-driven application systems (DDDAS), where a time series of sensor signals is partitioned to obtain a symbol sequence that, in turn, is used to construct probabilistic finite state automata (PFSA). Although modeling of PFSA from symbol sequences has been widely reported, comparatively little effort has been devoted to finding an appropriate alphabet size for partitioning a time series so that symbol sequences are generated optimally. This paper addresses this issue and proposes an information-theoretic data-partitioning procedure for extracting low-dimensional features from time series. The key idea is to partition the time series optimally by maximizing the mutual information between the input state probability vector and the pattern classes. The proposed procedure is validated on two examples. The first elucidates the underlying concept of data partitioning for parameter identification in a Duffing system with sinusoidal input excitation. The second uses time series of chemiluminescence data to predict lean blow-out (LBO) in a laboratory-scale combustor. The classification performance of data partitioning is analyzed for both examples.
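The partition-then-select idea above can be illustrated with a small Python sketch. It is not the authors' implementation; it assumes (1) maximum-entropy (equal-frequency) partitioning, (2) a depth-1 model in which each PFSA state is a single symbol, so the state probability vector is simply the normalized symbol histogram, (3) equal class priors, and (4) a hypothetical stopping tolerance `tol`, used because finer partitions tend to keep adding mutual information. All function names (`mep_boundaries`, `select_alphabet_size`, etc.) are illustrative.

```python
import numpy as np


def mep_boundaries(x, alphabet_size):
    """Maximum-entropy (equal-frequency) partitioning (an assumption here):
    interior cell boundaries at equally spaced quantiles of the pooled data."""
    qs = np.linspace(0.0, 1.0, alphabet_size + 1)[1:-1]
    return np.quantile(x, qs)


def symbolize(x, boundaries):
    """Map each sample to the index (0 .. alphabet_size-1) of its cell."""
    return np.searchsorted(boundaries, x, side="right")


def state_probability_vector(symbols, alphabet_size):
    """Depth-1 assumption: each state is a single symbol, so the state
    probability vector is the normalized symbol histogram."""
    counts = np.bincount(symbols, minlength=alphabet_size).astype(float)
    return counts / counts.sum()


def entropy_bits(p):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def mutual_information(series_by_class, alphabet_size):
    """I(S; C) in bits between state S and class label C,
    assuming equal class priors (hypothetical simplification)."""
    pooled = np.concatenate([s for group in series_by_class for s in group])
    bounds = mep_boundaries(pooled, alphabet_size)
    # p(s | c): state probability vectors averaged over each class's series
    cond = np.array([
        np.mean([state_probability_vector(symbolize(s, bounds), alphabet_size)
                 for s in group], axis=0)
        for group in series_by_class
    ])
    prior = np.full(len(series_by_class), 1.0 / len(series_by_class))
    marginal = prior @ cond  # p(s) = sum_c p(c) p(s | c)
    h_cond = sum(pc * entropy_bits(row) for pc, row in zip(prior, cond))
    return entropy_bits(marginal) - h_cond  # H(S) - H(S | C)


def select_alphabet_size(series_by_class, sizes=range(2, 11), tol=0.01):
    """Return the first size whose successor adds less than `tol` bits of MI.
    `tol` is a hypothetical stopping rule, not taken from the paper."""
    sizes = list(sizes)
    mi = [mutual_information(series_by_class, k) for k in sizes]
    for i in range(1, len(sizes)):
        if mi[i] - mi[i - 1] < tol:
            return sizes[i - 1], mi
    return sizes[-1], mi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0, 20 * np.pi, 2000)
    # Two synthetic "pattern classes": noisy sinusoids with different amplitudes
    classes = [
        [np.sin(t) + 0.3 * rng.standard_normal(t.size) for _ in range(5)],
        [1.5 * np.sin(t) + 0.3 * rng.standard_normal(t.size) for _ in range(5)],
    ]
    best, mi = select_alphabet_size(classes)
    print("selected alphabet size:", best)
    print("MI (bits) per size:", np.round(mi, 3))
```

Under these assumptions, `select_alphabet_size` stops at the first alphabet size beyond which enlarging the alphabet adds less than `tol` bits of class information; a larger `tol` favors coarser, lower-dimensional features.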

Original language: English (US)
Title of host publication: ACC 2015 - 2015 American Control Conference
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 5194-5199
Number of pages: 6
Volume: 2015-July
ISBN (Electronic): 9781479986842
DOI: 10.1109/ACC.2015.7172150
State: Published - Jul 28 2015
Event: 2015 American Control Conference, ACC 2015 - Chicago, United States
Duration: Jul 1 2015 to Jul 3 2015

Other

Other: 2015 American Control Conference, ACC 2015
Country: United States
City: Chicago
Period: 7/1/15 to 7/3/15

Fingerprint

Time series
Information systems
Time series analysis
Finite automata
Chemiluminescence
Combustors
Pattern recognition
Identification (control systems)
Dynamical systems
Sensors

All Science Journal Classification (ASJC) codes

  • Electrical and Electronic Engineering

Cite this

Sarkar, S., Chattopadhyay, P., Ray, A., Phoha, S., & Levi, M. (2015). Alphabet size selection for symbolization of dynamic data-driven systems: An information-theoretic approach. In ACC 2015 - 2015 American Control Conference (Vol. 2015-July, pp. 5194-5199). [7172150] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ACC.2015.7172150
@inproceedings{e517f52ba45543eaa4649040532464d7,
title = "Alphabet size selection for symbolization of dynamic data-driven systems: An information-theoretic approach",
author = "Soumalya Sarkar and P. Chattopadhyay and Asok Ray and Shashi Phoha and Mark Levi",
year = "2015",
month = "7",
day = "28",
doi = "10.1109/ACC.2015.7172150",
language = "English (US)",
volume = "2015-July",
pages = "5194--5199",
booktitle = "ACC 2015 - 2015 American Control Conference",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN

T1 - Alphabet size selection for symbolization of dynamic data-driven systems

T2 - An information-theoretic approach

AU - Sarkar, Soumalya

AU - Chattopadhyay, P.

AU - Ray, Asok

AU - Phoha, Shashi

AU - Levi, Mark

PY - 2015/7/28

Y1 - 2015/7/28

UR - http://www.scopus.com/inward/record.url?scp=84940936378&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84940936378&partnerID=8YFLogxK

U2 - 10.1109/ACC.2015.7172150

DO - 10.1109/ACC.2015.7172150

M3 - Conference contribution

VL - 2015-July

SP - 5194

EP - 5199

BT - ACC 2015 - 2015 American Control Conference

PB - Institute of Electrical and Electronics Engineers Inc.

ER -