Toward an accurate analysis of range queries on spatial data

Ning An, Ji Jin, Anand Sivasubramaniam

Research output: Contribution to journal (Article)

7 Citations (Scopus)

Abstract

Analysis of range queries on spatial (multidimensional) data is both important and challenging. Most previous analysis attempts have made certain simplifying assumptions about the data sets and/or queries to keep the analysis tractable. As a result, they may not be universally applicable. This paper proposes a set of five analysis techniques to estimate the selectivity and number of index nodes accessed in serving a range query. The underlying philosophy behind these techniques is to maintain an auxiliary data structure, called a density file, whose creation is a one-time cost, which can be quickly consulted when the query is given. The schemes differ in what information is kept in the density file, how it is maintained, and how this information is looked up. It is shown that one of the proposed schemes, called Cumulative Density (CD), gives very accurate results (usually less than 5 percent error) using a diverse suite of point and rectangular data sets that are uniform or skewed, and a wide range of query window parameters. The estimation takes a constant amount of time, which is typically lower than 1 percent of the time that it would take to execute the query, regardless of data set or query window parameters.
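
The abstract does not spell out how the density file is laid out, so the following is only an illustrative sketch of the general idea: precompute cumulative point counts over a grid once, then estimate the number of points in a query window with a fixed number of lookups. The grid layout, the function names build_cumulative_grid and estimate_count, and the snapping of the window to cell boundaries are assumptions made for this example, not the paper's CD scheme.

```python
def build_cumulative_grid(points, cells, extent=1.0):
    """One-time pass: cum[i][j] = number of points in grid cells [0, i) x [0, j).

    Assumes 2D points in the square [0, extent] x [0, extent] (hypothetical setup).
    """
    counts = [[0] * cells for _ in range(cells)]
    for x, y in points:
        i = min(int(x / extent * cells), cells - 1)
        j = min(int(y / extent * cells), cells - 1)
        counts[i][j] += 1

    # 2D prefix sums with a leading row and column of zeros.
    cum = [[0] * (cells + 1) for _ in range(cells + 1)]
    for i in range(cells):
        for j in range(cells):
            cum[i + 1][j + 1] = (counts[i][j] + cum[i][j + 1]
                                 + cum[i + 1][j] - cum[i][j])
    return cum


def estimate_count(cum, cells, x_lo, y_lo, x_hi, y_hi, extent=1.0):
    """Estimate the points inside a query window using four table lookups.

    The window is snapped to the nearest cell boundaries, so the result is
    an approximation whose error depends on the grid resolution.
    """
    def boundary(v):
        return max(0, min(cells, round(v / extent * cells)))

    i0, i1 = boundary(x_lo), boundary(x_hi)
    j0, j1 = boundary(y_lo), boundary(y_hi)
    return cum[i1][j1] - cum[i0][j1] - cum[i1][j0] + cum[i0][j0]


def estimate_selectivity(cum, cells, window, total, extent=1.0):
    """Selectivity = estimated matching points / total points."""
    return estimate_count(cum, cells, *window, extent=extent) / total
```

In this sketch the lookup cost is independent of both the data set size and the window size, which mirrors the constant-time estimation the abstract reports; accuracy on skewed data would depend on the chosen grid resolution.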

Original language: English (US)
Pages (from-to): 305-323
Number of pages: 19
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 15
Issue number: 2
DOI: 10.1109/TKDE.2003.1185836
State: Published - Jan 1 2003

Fingerprint

Data structures
Costs

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

@article{04eaa0cef0d14bec88dba8ff2557c689,
title = "Toward an accurate analysis of range queries on spatial data",
abstract = "Analysis of range queries on spatial (multidimensional) data is both important and challenging. Most previous analysis attempts have made certain simplifying assumptions about the data sets and/or queries to keep the analysis tractable. As a result, they may not be universally applicable. This paper proposes a set of five analysis techniques to estimate the selectivity and number of index nodes accessed in serving a range query. The underlying philosophy behind these techniques is to maintain an auxiliary data structure, called a density file, whose creation is a one-time cost, which can be quickly consulted when the query is given. The schemes differ in what information is kept in the density file, how it is maintained, and how this information is looked up. It is shown that one of the proposed schemes, called Cumulative Density (CD), gives very accurate results (usually less than 5 percent error) using a diverse suite of point and rectangular data sets that are uniform or skewed, and a wide range of query window parameters. The estimation takes a constant amount of time, which is typically lower than 1 percent of the time that it would take to execute the query, regardless of data set or query window parameters.",
author = "Ning An and Ji Jin and Anand Sivasubramaniam",
year = "2003",
month = "1",
day = "1",
doi = "10.1109/TKDE.2003.1185836",
language = "English (US)",
volume = "15",
pages = "305--323",
journal = "IEEE Transactions on Knowledge and Data Engineering",
issn = "1041-4347",
publisher = "IEEE Computer Society",
number = "2",

}

Toward an accurate analysis of range queries on spatial data. / An, Ning; Jin, Ji; Sivasubramaniam, Anand.

In: IEEE Transactions on Knowledge and Data Engineering, Vol. 15, No. 2, 01.01.2003, p. 305-323.

TY - JOUR

T1 - Toward an accurate analysis of range queries on spatial data

AU - An, Ning

AU - Jin, Ji

AU - Sivasubramaniam, Anand

PY - 2003/1/1

Y1 - 2003/1/1

AB - Analysis of range queries on spatial (multidimensional) data is both important and challenging. Most previous analysis attempts have made certain simplifying assumptions about the data sets and/or queries to keep the analysis tractable. As a result, they may not be universally applicable. This paper proposes a set of five analysis techniques to estimate the selectivity and number of index nodes accessed in serving a range query. The underlying philosophy behind these techniques is to maintain an auxiliary data structure, called a density file, whose creation is a one-time cost, which can be quickly consulted when the query is given. The schemes differ in what information is kept in the density file, how it is maintained, and how this information is looked up. It is shown that one of the proposed schemes, called Cumulative Density (CD), gives very accurate results (usually less than 5 percent error) using a diverse suite of point and rectangular data sets that are uniform or skewed, and a wide range of query window parameters. The estimation takes a constant amount of time, which is typically lower than 1 percent of the time that it would take to execute the query, regardless of data set or query window parameters.

UR - http://www.scopus.com/inward/record.url?scp=0037341265&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0037341265&partnerID=8YFLogxK

U2 - 10.1109/TKDE.2003.1185836

DO - 10.1109/TKDE.2003.1185836

M3 - Article

AN - SCOPUS:0037341265

VL - 15

SP - 305

EP - 323

JO - IEEE Transactions on Knowledge and Data Engineering

JF - IEEE Transactions on Knowledge and Data Engineering

SN - 1041-4347

IS - 2

ER -