K-selection query over uncertain data

Xingjie Liu, Mao Ye, Jianliang Xu, Yuan Tian, Wang-chien Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

This paper studies a new query on uncertain data, called the k-selection query. Given an uncertain dataset of N objects, where each object is associated with a preference score and a presence probability, a k-selection query returns k objects such that the expected score of the "best available" objects is maximized. This query is useful in many application domains such as entity web search and decision making. In evaluating k-selection queries, we need to compute the expected best score (EBS) for candidate k-selection sets and search for the optimal selection set with the highest EBS. These operations are costly due to the extremely large search space. In this paper, we identify several important properties of k-selection queries, including EBS decomposition, query recursion, and EBS bounding. Based upon these properties, we first present a dynamic programming (DP) algorithm that answers the query in O(k · N) time. Further, we propose a Bounding-and-Pruning (BP) algorithm that exploits effective search space pruning strategies to find the optimal selection without accessing all objects. We evaluate the DP and BP algorithms using both synthetic and real data. The results show that the proposed algorithms outperform the baseline approach by several orders of magnitude.
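The EBS computation and a k · N dynamic program can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not the authors' implementation: it assumes that objects are present independently, that scores are non-negative, that a possible world in which no selected object is present scores 0, and that the EBS of a set whose top-scoring member has score s and probability p equals p·s + (1 − p)·EBS(rest of the set). The function names expected_best_score and k_selection_dp are hypothetical, and the BP algorithm is not sketched because the abstract gives no details of its bounds.

```python
def expected_best_score(objs):
    """EBS of a set of (score, probability) pairs.

    Walk the objects from highest to lowest score; an object is the
    "best available" one only if every higher-scored object is absent.
    """
    ebs = 0.0
    all_better_absent = 1.0
    for score, prob in sorted(objs, reverse=True):
        ebs += all_better_absent * prob * score
        all_better_absent *= 1.0 - prob
    return ebs


def k_selection_dp(objs, k):
    """Return (best EBS, chosen objects) using at most k objects.

    best[i][t] is the highest EBS reachable with at most t objects
    taken from objs[i:], where objs is sorted by descending score.
    Filling the table takes O(k * N) steps after the initial sort.
    """
    objs = sorted(objs, reverse=True)
    n = len(objs)
    best = [[0.0] * (k + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        score, prob = objs[i]
        for t in range(1, k + 1):
            skip = best[i + 1][t]
            take = prob * score + (1.0 - prob) * best[i + 1][t - 1]
            best[i][t] = max(skip, take)

    # Trace the decisions back to recover one optimal selection.
    chosen, t = [], k
    for i in range(n):
        if t == 0:
            break
        score, prob = objs[i]
        take = prob * score + (1.0 - prob) * best[i + 1][t - 1]
        if take >= best[i + 1][t]:
            chosen.append(objs[i])
            t -= 1
    return best[0][k], chosen
```

For example, k_selection_dp([(10, 0.2), (8, 0.5), (5, 0.9), (3, 1.0)], 2) selects the objects scoring 8 and 5, with EBS 0.5·8 + 0.5·0.9·5 = 6.25. The high-scoring but unlikely object (10, 0.2) is left out, which shows why simply taking the top-k scores is not enough.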

Original language: English (US)
Title of host publication: Database Systems for Advanced Applications - 15th International Conference, DASFAA 2010, Proceedings
Pages: 444-459
Number of pages: 16
Edition: PART 1
DOI: https://doi.org/10.1007/978-3-642-12026-8_34
State: Published - Dec 28 2010
Event: 15th International Conference on Database Systems for Advanced Applications, DASFAA 2010 - Tsukuba, Japan
Duration: Apr 1 2010 to Apr 4 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 1
Volume: 5981 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 15th International Conference on Database Systems for Advanced Applications, DASFAA 2010
Country: Japan
City: Tsukuba
Period: 4/1/10 to 4/4/10

Fingerprint

Uncertain Data
Query
Dynamic programming
Pruning
Search Space
Decision making
Decomposition
Web Search
Recursion
Baseline
Object
Decompose
Evaluate

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Liu, X., Ye, M., Xu, J., Tian, Y., & Lee, W. (2010). K-selection query over uncertain data. In Database Systems for Advanced Applications - 15th International Conference, DASFAA 2010, Proceedings (PART 1 ed., pp. 444-459). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5981 LNCS, No. PART 1). https://doi.org/10.1007/978-3-642-12026-8_34
@inproceedings{08f7f66cfdb549e8ba400602ad64c75d,
title = "K-selection query over uncertain data",
author = "Xingjie Liu and Mao Ye and Jianliang Xu and Yuan Tian and Wang-chien Lee",
year = "2010",
month = "12",
day = "28",
doi = "10.1007/978-3-642-12026-8_34",
language = "English (US)",
isbn = "3642120253",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
number = "PART 1",
pages = "444--459",
booktitle = "Database Systems for Advanced Applications - 15th International Conference, DASFAA 2010, Proceedings",
edition = "PART 1",
}


TY - GEN
T1 - K-selection query over uncertain data
AU - Liu, Xingjie
AU - Ye, Mao
AU - Xu, Jianliang
AU - Tian, Yuan
AU - Lee, Wang-chien
PY - 2010/12/28
Y1 - 2010/12/28
UR - http://www.scopus.com/inward/record.url?scp=78650478588&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78650478588&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-12026-8_34
DO - 10.1007/978-3-642-12026-8_34
M3 - Conference contribution
SN - 3642120253
SN - 9783642120251
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 444
EP - 459
BT - Database Systems for Advanced Applications - 15th International Conference, DASFAA 2010, Proceedings
ER -
