HDSampler

Revealing data behind web form interfaces

Anirban Maiti, Arjun Dasgupta, Nan Zhang, Gautam Das

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

A large number of online databases are hidden behind the web. Users of these systems can form queries through web forms to retrieve a small sample of the database. Sampling such hidden databases is widely desired for understanding the nature and quality of the data stored in them. We have developed HDSampler, which to the best of our knowledge is the first practical system for sampling structured hidden web databases. It enables efficient sampling of the databases and accurate answering of aggregate queries, providing analysts with valuable information for data analytics and helping to power a multitude of third-party applications such as web mashups and meta-search engines. For the purpose of this demo, we present an instance of HDSampler on Google Base, a content-rich hidden web database maintained by Google. Using HDSampler, the demo reveals a snapshot of the marginal distribution of various attributes of Google Base in a matter of minutes.

Original language: English (US)
Title of host publication: SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems
Pages: 1131-1133
Number of pages: 3
DOIs: https://doi.org/10.1145/1559845.1560001
State: Published - Dec 4 2009
Event: International Conference on Management of Data and 28th Symposium on Principles of Database Systems, SIGMOD-PODS'09 - Providence, RI, United States
Duration: Jun 29 2009 → Jul 2 2009

Other

Other: International Conference on Management of Data and 28th Symposium on Principles of Database Systems, SIGMOD-PODS'09
Country: United States
City: Providence, RI
Period: 6/29/09 → 7/2/09

Fingerprint

Sampling
Search engines
World Wide Web

All Science Journal Classification (ASJC) codes

  • Software

Cite this

Maiti, A., Dasgupta, A., Zhang, N., & Das, G. (2009). HDSampler: Revealing data behind web form interfaces. In SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems (pp. 1131-1133). https://doi.org/10.1145/1559845.1560001
@inproceedings{fee6f1a438e34b2cbb7ed25da9996a97,
title = "HDSampler: Revealing data behind web form interfaces",
author = "Anirban Maiti and Arjun Dasgupta and Nan Zhang and Gautam Das",
year = "2009",
month = "12",
day = "4",
doi = "10.1145/1559845.1560001",
language = "English (US)",
isbn = "9781605585543",
pages = "1131--1133",
booktitle = "SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems",

}

TY - GEN

T1 - HDSampler

T2 - Revealing data behind web form interfaces

AU - Maiti, Anirban

AU - Dasgupta, Arjun

AU - Zhang, Nan

AU - Das, Gautam

PY - 2009/12/4

Y1 - 2009/12/4

UR - http://www.scopus.com/inward/record.url?scp=70849103063&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70849103063&partnerID=8YFLogxK

U2 - 10.1145/1559845.1560001

DO - 10.1145/1559845.1560001

M3 - Conference contribution

SN - 9781605585543

SP - 1131

EP - 1133

BT - SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems

ER -
