CAREER: Optimization Based Methods for Robust Pattern Recognition in Time-Series Data

Project: Research project

Project Details


This research proposes new mathematical and algorithmic tools for identifying patterns in, and effectively

mining, time-series data, i.e. sequences of data points typically measured at successive time instants

spaced at uniform intervals. A wide variety of real-world data sources, such as speech and audio,

biomedical signals, health care records, network traffic and stock market data, manifest as time series,

and their analysis is of significant interest to both government and industry. The growth of such

data sources has only been accelerated by the digital revolution, viz. the abundance of audio-video

streams on the internet, the storage of large amounts of chronological health care records in electronic

databases and the continuous generation of new time-series data from advances in sensing. Automated software

tools that can find patterns in a large time-series sequence, help in fast and scalable retrieval, and

categorize large time-series collections are hence highly desirable. The proposed research develops

such software (algorithmic) tools with a particular focus on robustness and scalability. The problem of

robustness refers to the fact that time-series that may have the 'same appeal' to a human consumer, e.g.

different versions of the same song/video, may not necessarily be digitally identical. Hence, robust

techniques are needed that can withstand distortions which do not change the essence of the time-series

content. Scalability requires that the pattern-matching techniques be fast and easy to implement, so that

the solutions can be deployed to mine large collections. Further, to prepare the next generation of

engineers in electrical engineering and computer science, the project includes a strong educational

component. At the heart of this educational component is an edutainment game in which human players, i.e.

students with varying levels of academic preparation (high school, undergraduate and graduate), compete

against a computer algorithm in a video piracy challenge. The game is aimed at making the learning

process more interactive, particularly for undergraduate students.

A serious practical challenge in mining time-series data for emerging applications is the ability to

withstand distortions; that is, instances of the 'same underlying' time series are often observed

under noise, amplitude and/or time scaling and other miscellaneous operations. Many existing

techniques for time-series comparison do not provide distortion robustness, and the ones that do

often come at a substantial computational cost. Further, existing algorithmic techniques enable

control of key properties of time-series features such as robustness and uniqueness only at an

intuitive, often heuristic level. The proposed research advocates judicious selection of time-series

extrema and aims to break the classical trade-off between computational efficiency in time-series

feature extraction and comparison vs. enabling robustness to distortions. Unlike existing methods,

which employ pre-processing time-series filters 'inspired' by intuition, explicit optimization of the

filter is proposed in the sense of cost functions that capture key feature attributes such as robustness

and uniqueness of the extracted extrema. Optimal extrema extraction will be investigated in two

different setups: (a) a deterministic framework where example training time-series are used in the

optimization, and (b) a statistical framework where stochastic models on time-series are used. A

variety of related sub-problems also emerge, namely: (a) connections to edge detection problems

in image processing and vision, (b) encoding and comparisons of time-series extrema, and (c)

extensions to finding robust extrema under non-linear operations on the time-series. The research

plan is to juxtapose the development of the algorithmic tools with two real-world applications: (1)

multimedia fingerprinting, and (2) biomedical time-series analysis. Additionally, software tools,

namely edutainment games, will be developed based on these applications; these will play a crucial

role in enhancing the PI's research and classroom teaching. Research results will be disseminated

via articles in leading journals and conferences, and via online MATLAB software toolboxes.
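
As a rough illustration of the extrema-based idea described above, the following Python sketch smooths a series with a pre-processing filter and keeps the positions of its local extrema as features. The moving-average filter, its width, the toy sinusoid and all function names are illustrative assumptions, not the project's method; the proposal is to obtain the filter by explicit optimization, which this sketch does not attempt.

```python
import math


def moving_average(x, width=9):
    """Centered moving average; windows are truncated at the series boundaries.

    Assumption: a hand-picked stand-in for the optimized pre-processing filter.
    """
    half = width // 2
    smoothed = []
    for i in range(len(x)):
        window = x[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def local_extrema(x):
    """Indices of strict local maxima and minima (interior points only)."""
    return [
        i for i in range(1, len(x) - 1)
        if (x[i] > x[i - 1] and x[i] > x[i + 1])
        or (x[i] < x[i - 1] and x[i] < x[i + 1])
    ]


def extrema_features(x, width=9):
    """Hypothetical feature extractor: extrema positions of the filtered series."""
    return local_extrema(moving_average(x, width))


# Toy data (assumed): a sinusoid with a 40-sample period and an amplitude-scaled copy.
clean = [math.sin(2 * math.pi * t / 40) for t in range(200)]
scaled = [4.0 * v for v in clean]

print(extrema_features(clean))
print(extrema_features(clean) == extrema_features(scaled))
```

Because the filter is linear and a positive gain preserves the ordering of neighbouring samples, the extrema positions are unaffected by amplitude scaling. Robustness to noise and time scaling depends on the choice of filter, which is the part the proposed research aims to optimize.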

Effective start/end date: 5/1/15 to 4/30/20


  • National Science Foundation: $500,000.00

