We investigate whether a mapping between text and time series data is feasible, so that data mining problems in text can find their counterparts in time series (and vice versa). As preliminary work, we present the T3 (Text To Time series) framework, which uses different combinations of granularity (e.g., character or word level) and n-grams (e.g., unigram or bigram). To assign an appropriate numeric value to each character, T3 adopts different space-filling curves (e.g., linear, Hilbert, and Z orders) based on the keyboard layout. When we applied the T3 approach to the "record linkage" problem, T3 achieved comparable accuracy with considerable speed-up, despite the lossy transformation.
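To make the character-level, unigram case concrete, the following is a minimal sketch of how keyboard keys might be numbered along a linear order or a Z order and how a string would then become a numeric series. The key rows, the function names (`linear_order`, `z_order`, `text_to_series`), and the `-1` placeholder for unmapped characters are illustrative assumptions, not definitions taken from the paper.

```python
# Hypothetical sketch: layout, names, and numbering are assumptions,
# not the T3 authors' implementation.
KEYBOARD_ROWS = ["1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"]


def linear_order(rows):
    """Number keys row by row, left to right (a linear curve)."""
    values = {}
    index = 0
    for row in rows:
        for key in row:
            values[key] = index
            index += 1
    return values


def z_order(rows):
    """Number keys by interleaving the bits of (row, column), i.e. a Morton code."""

    def interleave(r, c):
        code = 0
        for bit in range(4):  # 4 bits are enough for up to 16 rows or columns
            code |= ((c >> bit) & 1) << (2 * bit)      # column bits on even positions
            code |= ((r >> bit) & 1) << (2 * bit + 1)  # row bits on odd positions
        return code

    return {
        key: interleave(r, c)
        for r, row in enumerate(rows)
        for c, key in enumerate(row)
    }


def text_to_series(text, values, unknown=-1):
    """Map each character to its curve value; unmapped characters get `unknown`."""
    return [values.get(ch, unknown) for ch in text.lower()]


if __name__ == "__main__":
    print(text_to_series("record", linear_order(KEYBOARD_ROWS)))
    print(text_to_series("record", z_order(KEYBOARD_ROWS)))
```

Under these assumptions, two strings can be compared through their numeric series rather than character by character. A word-level or bigram variant would change only the unit passed to the mapping step.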
|Original language||English (US)|
|Journal||CEUR Workshop Proceedings|
|State||Published - 2009|
|Event||3rd Alberto Mendelzon International Workshop on Foundations of Data Management, AMW 2009 - Arequipa, Peru|
Duration: May 12 2009 → May 15 2009
All Science Journal Classification (ASJC) codes
- Computer Science(all)