Indexing earth mover's distance over network metrics

Ting Wang, Shicong Meng, Jiang Bian

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

The Earth Mover's Distance (EMD) is a well-known distance metric for data represented as probability distributions over a predefined feature space. Supporting EMD-based similarity search has attracted intensive research effort. Despite the extensive literature, most existing solutions are optimized for L-p feature spaces (e.g., Euclidean space), whereas in a range of applications the relationships between features are better captured by networks. In this paper, we study the problem of answering k-nearest neighbor (k-NN) queries under network-based EMD metrics (NEMD). We propose Oasis, a new access method that leverages the network structure of the feature space to enable efficient NEMD-based similarity search. Specifically, Oasis employs three novel techniques: (i) Range Oracle, a scalable model that estimates the range of the k-th nearest neighbor under NEMD; (ii) Boundary Index, a structure that efficiently fetches the candidates within a given range; and (iii) Network Compression Hierarchy, an incremental filtering mechanism that prunes false-positive candidates to avoid unnecessary computation. Extensive experiments on both synthetic and real data sets confirm that Oasis significantly outperforms state-of-the-art methods in query processing cost.
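To make the filter-and-refine idea behind k-NN search under a network-based EMD concrete, the following is a minimal sketch and not the Oasis method itself. It assumes the simplest possible feature network, a path whose nodes are joined by weighted edges, where the EMD has a closed form: the mass that must cross each edge is the absolute difference of the two prefix sums. Merging consecutive nodes into supernodes drops the terms of the contracted edges, so the distance on the compressed path is a lower bound on the exact distance. The names `path_emd`, `compress`, `lower_bound`, `knn`, and the `block` parameter are hypothetical and chosen only for illustration.

```python
from itertools import accumulate
import heapq


def path_emd(p, q, edge_weights):
    """EMD between distributions p and q over nodes 0..n-1 of a path network.

    edge_weights[i] is the length of the edge joining node i and node i+1.
    The mass crossing edge i equals |prefix_p[i] - prefix_q[i]|.
    """
    assert len(p) == len(q) == len(edge_weights) + 1
    prefix_p = accumulate(p)
    prefix_q = accumulate(q)
    return sum(w * abs(a - b) for w, a, b in zip(edge_weights, prefix_p, prefix_q))


def compress(p, edge_weights, block):
    """Merge each run of `block` consecutive nodes into one supernode.

    Edges inside a supernode are contracted; only the edges between
    consecutive supernodes keep their original weight.
    """
    masses = [sum(p[i:i + block]) for i in range(0, len(p), block)]
    weights = [edge_weights[i - 1] for i in range(block, len(p), block)]
    return masses, weights


def lower_bound(p, q, edge_weights, block):
    """EMD on the compressed path; never exceeds path_emd(p, q, edge_weights)."""
    cp, cw = compress(p, edge_weights, block)
    cq, _ = compress(q, edge_weights, block)
    return path_emd(cp, cq, cw)


def knn(query, database, edge_weights, k, block=2):
    """Filter-and-refine k-NN: visit candidates by ascending lower bound and
    stop once the bound reaches the current k-th best exact distance."""
    bounds = sorted(
        (lower_bound(query, p, edge_weights, block), i)
        for i, p in enumerate(database)
    )
    best = []  # max-heap of (-distance, index), holding at most k entries
    exact_calls = 0
    for lb, i in bounds:
        if len(best) == k and lb >= -best[0][0]:
            break  # no remaining candidate can beat the current k-th distance
        d = path_emd(query, database[i], edge_weights)
        exact_calls += 1
        if len(best) < k:
            heapq.heappush(best, (-d, i))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i))
    return sorted((-neg_d, i) for neg_d, i in best), exact_calls
```

`knn` returns the k nearest entries as `(distance, index)` pairs together with the number of exact distance evaluations, which shows how many candidates the lower bound pruned. On a general network the exact distance would require solving a transportation problem rather than the prefix-sum formula used here.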

Original language: English (US)
Article number: 6963483
Pages (from-to): 1588-1601
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 27
Issue number: 6
DOI: 10.1109/TKDE.2014.2373359
State: Published - Jun 1 2015

Fingerprint

Earth (planet)
Query processing
Probability distributions
Costs
Experiments

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

Wang, Ting ; Meng, Shicong ; Bian, Jiang. / Indexing earth mover's distance over network metrics. In: IEEE Transactions on Knowledge and Data Engineering. 2015 ; Vol. 27, No. 6. pp. 1588-1601.
@article{9276a55ce46c4fd7b04e8784470c6171,
title = "Indexing earth mover's distance over network metrics",
abstract = "The Earth Mover's Distance (EMD) is a well-known distance metric for data represented as probability distributions over a predefined feature space. Supporting EMD-based similarity search has attracted intensive research effort. Despite the plethora of literature, most existing solutions are optimized for L-p feature spaces (e.g., Euclidean space); while in a spectrum of applications, the relationships between features are better captured using networks. In this paper, we study the problem of answering k-nearest neighbor (k-NN) queries under network-based EMD metrics (NEMD). We propose Oasis, a new access method which leverages the network structure of feature space and enables efficient NEMD-based similarity search. Specifically, Oasis employs three novel techniques: (i) Range Oracle, a scalable model to estimate the range of k-th nearest neighbor under NEMD, (ii) Boundary Index, a structure that efficiently fetches candidates within given range, and (iii) Network Compression Hierarchy, an incremental filtering mechanism that effectively prunes false positive candidates to save unnecessary computation. Through extensive experiments using both synthetic and real data sets, we confirmed that Oasis significantly outperforms the state-of-the-art methods in query processing cost.",
author = "Ting Wang and Shicong Meng and Jiang Bian",
year = "2015",
month = "6",
day = "1",
doi = "10.1109/TKDE.2014.2373359",
language = "English (US)",
volume = "27",
pages = "1588--1601",
journal = "IEEE Transactions on Knowledge and Data Engineering",
issn = "1041-4347",
publisher = "IEEE Computer Society",
number = "6",
}

TY - JOUR

T1 - Indexing earth mover's distance over network metrics

AU - Wang, Ting

AU - Meng, Shicong

AU - Bian, Jiang

PY - 2015/6/1

Y1 - 2015/6/1

AB - The Earth Mover's Distance (EMD) is a well-known distance metric for data represented as probability distributions over a predefined feature space. Supporting EMD-based similarity search has attracted intensive research effort. Despite the plethora of literature, most existing solutions are optimized for L-p feature spaces (e.g., Euclidean space); while in a spectrum of applications, the relationships between features are better captured using networks. In this paper, we study the problem of answering k-nearest neighbor (k-NN) queries under network-based EMD metrics (NEMD). We propose Oasis, a new access method which leverages the network structure of feature space and enables efficient NEMD-based similarity search. Specifically, Oasis employs three novel techniques: (i) Range Oracle, a scalable model to estimate the range of k-th nearest neighbor under NEMD, (ii) Boundary Index, a structure that efficiently fetches candidates within given range, and (iii) Network Compression Hierarchy, an incremental filtering mechanism that effectively prunes false positive candidates to save unnecessary computation. Through extensive experiments using both synthetic and real data sets, we confirmed that Oasis significantly outperforms the state-of-the-art methods in query processing cost.

UR - http://www.scopus.com/inward/record.url?scp=84929484575&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84929484575&partnerID=8YFLogxK

U2 - 10.1109/TKDE.2014.2373359

DO - 10.1109/TKDE.2014.2373359

M3 - Article

AN - SCOPUS:84929484575

VL - 27

SP - 1588

EP - 1601

JO - IEEE Transactions on Knowledge and Data Engineering

JF - IEEE Transactions on Knowledge and Data Engineering

SN - 1041-4347

IS - 6

M1 - 6963483

ER -