Unbiased estimation of size and other aggregates over hidden web databases

Arjun Dasgupta, Xin Jin, Bradley Jewell, Nan Zhang, Gautam Das

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

44 Citations (Scopus)

Abstract

Many websites provide restrictive form-like interfaces which allow users to execute search queries on the underlying hidden databases. In this paper, we consider the problem of estimating the size of a hidden database through its web interface. We propose novel techniques which use a small number of queries to produce unbiased estimates with small variance. These techniques can also be used for approximate query processing over hidden databases. We present theoretical analysis and extensive experiments to illustrate the effectiveness of our approach.

Original language: English (US)
Title of host publication: Proceedings of the 2010 International Conference on Management of Data, SIGMOD '10
Pages: 855-866
Number of pages: 12
DOI: https://doi.org/10.1145/1807167.1807259
State: Published - Jul 23 2010
Event: 2010 International Conference on Management of Data, SIGMOD '10 - Indianapolis, IN, United States
Duration: Jun 6 2010 to Jun 11 2010

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Other

Other: 2010 International Conference on Management of Data, SIGMOD '10
Country: United States
City: Indianapolis, IN
Period: 6/6/10 to 6/11/10

Fingerprint

World Wide Web
Query processing
Websites
Experiments

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Cite this

Dasgupta, A., Jin, X., Jewell, B., Zhang, N., & Das, G. (2010). Unbiased estimation of size and other aggregates over hidden web databases. In Proceedings of the 2010 International Conference on Management of Data, SIGMOD '10 (pp. 855-866). (Proceedings of the ACM SIGMOD International Conference on Management of Data). https://doi.org/10.1145/1807167.1807259
@inproceedings{1aa99ecf829c4dc7930ee3c6faadd1ce,
title = "Unbiased estimation of size and other aggregates over hidden web databases",
author = "Arjun Dasgupta and Xin Jin and Bradley Jewell and Nan Zhang and Gautam Das",
year = "2010",
month = "7",
day = "23",
doi = "10.1145/1807167.1807259",
language = "English (US)",
isbn = "9781450300322",
series = "Proceedings of the ACM SIGMOD International Conference on Management of Data",
pages = "855--866",
booktitle = "Proceedings of the 2010 International Conference on Management of Data, SIGMOD '10",
}


