### Abstract

Estimation of a generating partition is critical for symbolization of measurements from discrete-time dynamical systems, where a sequence of symbols from a (finite-cardinality) alphabet may uniquely specify the underlying time series. Such symbolization is useful for computing measures (e.g., Kolmogorov-Sinai entropy) to identify or characterize the (possibly unknown) dynamical system. It is also useful for time series classification and anomaly detection. The seminal work of Hirata, Judd, and Kilminster (2004) derives a novel objective function, akin to a clustering objective, that measures the discrepancy between a set of reconstruction values and the points from the time series. They cast estimation of a generating partition as the minimization of their objective function. Unfortunately, their proposed algorithm is nonconvergent, with no guarantee of finding even locally optimal solutions with respect to their objective. The difficulty lies in a heuristic nearest-neighbor symbol assignment step. Here, we instead develop a novel, locally optimal algorithm for their objective. We apply iterative nearest-neighbor symbol assignments with guaranteed discrepancy descent, by which joint, locally optimal symbolization of the entire time series is achieved. While most previous approaches frame generating partition estimation as a state-space partitioning problem, we recognize that minimizing the Hirata et al. (2004) objective function does not induce an explicit partitioning of the state space, but rather of the space consisting of the entire time series (effectively, clustering in a (countably) infinite-dimensional space). Our approach also amounts to a novel type of sliding block lossy source coding. Improvement, with respect to several measures, is demonstrated over popular methods for symbolizing chaotic maps. We also apply our approach to time-series anomaly detection, considering both chaotic maps and a failure application in a polycrystalline alloy material.
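The idea of "iterative symbol assignments with guaranteed discrepancy descent" can be illustrated with a toy sketch. This is not the authors' algorithm and not necessarily the exact Hirata et al. (2004) objective. The sketch assumes a simplified sliding-block code: each sample x[t] is reconstructed by the mean of all samples that share its length-(2k+1) symbol window, and the discrepancy is the total squared reconstruction error. The helper names `discrepancy` and `symbolize`, the window half-width `k`, and the brute-force acceptance rule are hypothetical choices made for illustration. A single-symbol change is kept only when the exactly re-evaluated discrepancy strictly decreases, so the objective can never increase.

```python
from collections import defaultdict
import random


def discrepancy(x, s, k):
    """Squared-error discrepancy of a simplified sliding-block code (illustrative).

    Each interior sample x[t] (k <= t < len(x) - k) is reconstructed by the
    mean of all interior samples sharing its symbol window s[t-k : t+k+1];
    for a fixed assignment, the mean is the optimal value under squared error.
    """
    groups = defaultdict(list)
    for t in range(k, len(x) - k):
        groups[tuple(s[t - k:t + k + 1])].append(x[t])
    total = 0.0
    for vals in groups.values():
        m = sum(vals) / len(vals)
        total += sum((v - m) ** 2 for v in vals)
    return total


def symbolize(x, n_symbols, k=1, max_sweeps=20, seed=0):
    """Greedy coordinate descent over symbol assignments (hypothetical helper).

    A change to s[t] is kept only if the re-evaluated discrepancy strictly
    decreases, so the objective never increases. The search stops when a full
    sweep finds no improving single-symbol change (up to a small tolerance)
    or after max_sweeps sweeps.
    """
    rng = random.Random(seed)
    s = [rng.randrange(n_symbols) for _ in x]  # random initial symbols
    best = discrepancy(x, s, k)
    for _ in range(max_sweeps):
        improved = False
        for t in range(len(x)):
            keep = s[t]
            for a in range(n_symbols):
                if a == keep:
                    continue
                s[t] = a
                d = discrepancy(x, s, k)
                if d < best - 1e-12:  # accept only a strict improvement
                    best, keep, improved = d, a, True
            s[t] = keep  # restore the best symbol found for position t
        if not improved:
            break
    return s, best


if __name__ == "__main__":
    # Toy data: the fully chaotic logistic map x_{t+1} = 4 x_t (1 - x_t).
    x = [0.3]
    for _ in range(119):
        x.append(4.0 * x[-1] * (1.0 - x[-1]))
    symbols, d = symbolize(x, n_symbols=2, k=1)
    print("final discrepancy:", d)
```

Re-evaluating the whole objective for every candidate symbol costs O(T) per trial and is only meant to make the descent guarantee easy to check. A practical implementation would update only the windows touched by s[t].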

| Original language | English (US) |
|---|---|
| Pages (from-to) | 2500-2529 |
| Number of pages | 30 |
| Journal | Neural Computation |
| Volume | 30 |
| Issue number | 9 |
| DOIs | 10.1162/neco_a_01101 |
| State | Published - Sep 1 2018 |

### All Science Journal Classification (ASJC) codes

- Arts and Humanities (miscellaneous)
- Cognitive Neuroscience

### Cite this

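Ghalyan, NF, Miller, DJ & Ray, A 2018, 'A locally optimal algorithm for estimating a generating partition from an observed time series and its application to anomaly detection', *Neural Computation*, vol. 30, no. 9, pp. 2500-2529. https://doi.org/10.1162/neco_a_01101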

**A locally optimal algorithm for estimating a generating partition from an observed time series and its application to anomaly detection.** / Ghalyan, Najah F.; Miller, David Jonathan; Ray, Asok.

Research output: Contribution to journal › Letter

TY - JOUR

T1 - A locally optimal algorithm for estimating a generating partition from an observed time series and its application to anomaly detection

AU - Ghalyan, Najah F.

AU - Miller, David Jonathan

AU - Ray, Asok

PY - 2018/9/1

Y1 - 2018/9/1

AB - Estimation of a generating partition is critical for symbolization of measurements from discrete-time dynamical systems, where a sequence of symbols from a (finite-cardinality) alphabet may uniquely specify the underlying time series. Such symbolization is useful for computing measures (e.g., Kolmogorov-Sinai entropy) to identify or characterize the (possibly unknown) dynamical system. It is also useful for time series classification and anomaly detection. The seminal work of Hirata, Judd, and Kilminster (2004) derives a novel objective function, akin to a clustering objective, that measures the discrepancy between a set of reconstruction values and the points from the time series. They cast estimation of a generating partition as the minimization of their objective function. Unfortunately, their proposed algorithm is nonconvergent, with no guarantee of finding even locally optimal solutions with respect to their objective. The difficulty lies in a heuristic nearest-neighbor symbol assignment step. Here, we instead develop a novel, locally optimal algorithm for their objective. We apply iterative nearest-neighbor symbol assignments with guaranteed discrepancy descent, by which joint, locally optimal symbolization of the entire time series is achieved. While most previous approaches frame generating partition estimation as a state-space partitioning problem, we recognize that minimizing the Hirata et al. (2004) objective function does not induce an explicit partitioning of the state space, but rather of the space consisting of the entire time series (effectively, clustering in a (countably) infinite-dimensional space). Our approach also amounts to a novel type of sliding block lossy source coding. Improvement, with respect to several measures, is demonstrated over popular methods for symbolizing chaotic maps. We also apply our approach to time-series anomaly detection, considering both chaotic maps and a failure application in a polycrystalline alloy material.

UR - http://www.scopus.com/inward/record.url?scp=85051641717&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85051641717&partnerID=8YFLogxK

U2 - 10.1162/neco_a_01101

DO - 10.1162/neco_a_01101

M3 - Letter

C2 - 29894657

AN - SCOPUS:85051641717

VL - 30

SP - 2500

EP - 2529

JO - Neural Computation

JF - Neural Computation

SN - 0899-7667

IS - 9

ER -