Unbiased estimation of size and other aggregates over hidden web databases

Arjun Dasgupta, Xin Jin, Bradley Jewell, Nan Zhang, Gautam Das

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

46 Scopus citations

Abstract

Many websites provide restrictive form-like interfaces which allow users to execute search queries on the underlying hidden databases. In this paper, we consider the problem of estimating the size of a hidden database through its web interface. We propose novel techniques which use a small number of queries to produce unbiased estimates with small variance. These techniques can also be used for approximate query processing over hidden databases. We present theoretical analysis and extensive experiments to illustrate the effectiveness of our approach.
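To make the estimation problem concrete, here is a minimal sketch of one simple way to get an unbiased size estimate from a top-k interface: walk down a query tree at random and weight the final count by the inverse of the probability of reaching that query. It is an illustration under stated assumptions, not the authors' algorithm, which the abstract does not spell out. The `TopKInterface` class, its `query` method, the Boolean attributes, and the prefix-style queries are all hypothetical stand-ins for a real web form.

```python
import random
from typing import List, Optional, Tuple

Row = Tuple[int, ...]


class TopKInterface:
    """Hypothetical stand-in for a hidden database behind a top-k search form."""

    def __init__(self, rows: List[Row], k: int) -> None:
        self._rows = list(set(rows))  # assumption: tuples are distinct
        self.k = k                    # assumption: k >= 1
        self.queries_issued = 0

    def query(self, prefix: Row) -> Optional[int]:
        """Return the match count if at most k tuples match, else None (overflow)."""
        self.queries_issued += 1
        matches = sum(1 for r in self._rows if r[: len(prefix)] == prefix)
        return matches if matches <= self.k else None


def drill_down_estimate(db: TopKInterface, rng: random.Random) -> float:
    """One random walk: fix attributes one at a time until the query stops overflowing."""
    prefix: Row = ()
    while True:
        count = db.query(prefix)
        if count is not None:
            # This query is reached with probability 2^(-depth), so dividing
            # the count by that probability keeps the estimate unbiased.
            return count * 2.0 ** len(prefix)
        # Overflow: pick the next attribute's value uniformly at random.
        prefix = prefix + (rng.randint(0, 1),)


def estimate_size(db: TopKInterface, walks: int, seed: int = 0) -> float:
    """Average several independent walks to reduce variance."""
    rng = random.Random(seed)
    return sum(drill_down_estimate(db, rng) for _ in range(walks)) / walks
```

Calling `estimate_size(db, walks=1000)` averages 1000 walks. Each walk is unbiased because the non-overflowing queries at the ends of the walks partition the database, so the inverse-probability weights sum to the true size in expectation. Walks that end in an empty query contribute zero, which is one source of variance in this naive version. Because tuples are distinct and k is at least 1, a query that fixes every attribute never overflows, so each walk terminates.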

Original language: English (US)
Title of host publication: Proceedings of the 2010 International Conference on Management of Data, SIGMOD '10
Pages: 855-866
Number of pages: 12
DOI: https://doi.org/10.1145/1807167.1807259
State: Published - Jul 23 2010
Event: 2010 International Conference on Management of Data, SIGMOD '10 - Indianapolis, IN, United States
Duration: Jun 6 2010 to Jun 11 2010

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Other

Other: 2010 International Conference on Management of Data, SIGMOD '10
Country: United States
City: Indianapolis, IN
Period: 6/6/10 to 6/11/10

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Cite this

Dasgupta, A., Jin, X., Jewell, B., Zhang, N., & Das, G. (2010). Unbiased estimation of size and other aggregates over hidden web databases. In Proceedings of the 2010 International Conference on Management of Data, SIGMOD '10 (pp. 855-866). (Proceedings of the ACM SIGMOD International Conference on Management of Data). https://doi.org/10.1145/1807167.1807259