Project Details
Description
This research proposes new mathematical and algorithmic tools for identifying patterns in, and effectively
mining, time-series data, i.e. sequences of data points typically measured at successive time instants
spaced at uniform intervals. A large variety of real-world data sources, such as speech and audio,
biomedical signals, health care records, network traffic and stock market data, manifest as time series,
and their analysis is of significant interest to both government and industry. The explosion of such
data sources has been further fueled by the digital revolution, viz. the abundance of audio-video
streams on the internet, the storage of large amounts of chronological health care records in electronic
databases and the continuous generation of new time-series data from advances in sensing. Automated software
tools that can find patterns in a large time-series sequence, help in fast and scalable retrieval, and
categorize large time-series collections are hence highly desirable. The proposed research develops
such software (algorithmic) tools with a particular focus on robustness and scalability. The problem of
robustness refers to the fact that time-series that may have the 'same appeal' to a human consumer, e.g.
different versions of the same song/video, may not necessarily be digitally identical. Hence, robust
techniques are needed that can withstand distortions which do not change the essence of the time-series
content. Scalability requires that the pattern-matching techniques be fast and easy to implement, so that
the solutions can be deployed to mine large collections. Further, to prepare the next generation of
engineers in electrical engineering and computer science, the project includes a strong educational
component. At the heart of this educational component is an edutainment game in which human players, i.e.
students with varying levels of academic preparation (high school, undergraduate and graduate), compete
against a computer algorithm in a video piracy challenge. The game is aimed at making the learning
process more interactive, particularly for undergraduate students.
A serious practical challenge in mining time-series data for emerging applications is the ability to
withstand distortions: instances of the 'same underlying' time series are often observed
under noise, amplitude and/or time scaling, and other miscellaneous operations. Many existing
techniques for time-series comparison do not provide distortion robustness, and the ones that do
often come at a substantial computational cost. Further, existing algorithmic techniques enable
control of key properties of time-series features such as robustness and uniqueness only at an
intuitive, often heuristic level. The proposed research advocates judicious selection of time-series
extrema and aims to break the classical trade-off between computational efficiency in time-series
feature extraction and comparison vs. enabling robustness to distortions. Whereas existing methods
employ pre-processing time-series filters 'inspired' by intuition, the proposed approach explicitly optimizes
the filter with respect to cost functions that capture key feature attributes such as robustness
and uniqueness of the extracted extrema. Optimal extrema extraction will be investigated in two
different setups: a.) a deterministic framework where example training time-series are used in the
optimization, and b.) a statistical framework where stochastic models on time-series are used. A
variety of related sub-problems also emerge, namely: a.) connections to edge detection problems
in image processing and vision, b.) encoding and comparisons of time-series extrema, and c.)
extensions to finding robust extrema under non-linear operations on the time-series. The research
plan is to juxtapose the development of the algorithmic tools with two real-world applications: 1.)
multimedia fingerprinting, and 2.) biomedical time-series analysis. Additionally, software tools,
namely edutainment games, will be developed based on these applications and will play a crucial
role in enhancing the PI's research and classroom teaching. Research results will be
disseminated via articles in leading journals and conferences, and via online MATLAB software toolboxes.
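
As a rough illustration of the extrema-based idea described above, the following sketch smooths a time series with a pre-processing filter and then records the positions of its local extrema. It is not the project's method: the moving-average filter is a hand-picked stand-in for the optimized filter the description proposes, and the function names (`smooth`, `local_extrema`), the filter width, and the test signal are all assumptions made for this example. Python with NumPy is used purely for convenience.

```python
import numpy as np


def smooth(x, width=5):
    """Moving-average low-pass filter.

    Assumption: a simple hand-picked filter, standing in for the
    optimized pre-processing filter proposed in the description.
    """
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")


def local_extrema(x):
    """Return (maxima, minima) index arrays of strict local extrema."""
    mid, left, right = x[1:-1], x[:-2], x[2:]
    # A strict maximum is larger than both neighbours; a minimum is smaller.
    maxima = np.where((mid > left) & (mid > right))[0] + 1
    minima = np.where((mid < left) & (mid < right))[0] + 1
    return maxima, minima


# Hypothetical test signal: two superimposed sinusoids.
t = np.linspace(0.0, 1.0, 400, endpoint=False)
signal = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 7 * t)

# Amplitude-scaled copy of the same underlying time series.
scaled = 2.0 * signal

max_a, min_a = local_extrema(smooth(signal))
max_b, min_b = local_extrema(smooth(scaled))

# Extrema positions are unchanged by positive amplitude scaling.
print(np.array_equal(max_a, max_b), np.array_equal(min_a, min_b))
```

Because the filter is linear, scaling the amplitude by a positive constant leaves the extrema positions where they were, which is why such positions are a natural candidate for a distortion-robust feature. Noise and time scaling are harder cases, and this sketch does not address them.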
Status | Finished |
---|---|
Effective start/end date | 5/1/15 → 4/30/20 |
Funding
- National Science Foundation: $500,000.00