TY - GEN
T1 - Efficient computation of regret-ratio minimizing set
T2 - 2017 ACM SIGMOD International Conference on Management of Data, SIGMOD 2017
AU - Asudeh, Abolfazl
AU - Nazi, Azade
AU - Zhang, Nan
AU - Das, Gautam
N1 - Publisher Copyright: © 2017 ACM.
PY - 2017/5/9
Y1 - 2017/5/9
N2 - Finding the maxima of a database based on a user preference, especially when the ranking function is a linear combination of the attributes, has been the subject of recent research. A critical observation is that the convex hull is the subset of tuples that can be used to find the maxima of any linear function. However, in real-world applications the convex hull can be a significant portion of the database, and thus its performance is greatly reduced. Computing a subset limited to r tuples that minimizes the regret ratio (a measure of the user's dissatisfaction with the result from the limited set versus the one from the entire database) is therefore of interest. In this paper, we make several fundamental theoretical as well as practical advances in developing such a compact set. In the case of two-dimensional databases, we develop an optimal linearithmic-time algorithm by leveraging the ordering of skyline tuples. In the case of higher dimensions, the problem is known to be NP-complete. As one of the main results of this paper, we develop an approximation algorithm that runs in linearithmic time and guarantees a regret ratio within any arbitrarily small, user-controllable distance from the optimal regret ratio. A comprehensive set of experiments on both synthetic and publicly available real datasets confirms the efficiency, quality of output, and scalability of our proposed algorithms.
AB - Finding the maxima of a database based on a user preference, especially when the ranking function is a linear combination of the attributes, has been the subject of recent research. A critical observation is that the convex hull is the subset of tuples that can be used to find the maxima of any linear function. However, in real-world applications the convex hull can be a significant portion of the database, and thus its performance is greatly reduced. Computing a subset limited to r tuples that minimizes the regret ratio (a measure of the user's dissatisfaction with the result from the limited set versus the one from the entire database) is therefore of interest. In this paper, we make several fundamental theoretical as well as practical advances in developing such a compact set. In the case of two-dimensional databases, we develop an optimal linearithmic-time algorithm by leveraging the ordering of skyline tuples. In the case of higher dimensions, the problem is known to be NP-complete. As one of the main results of this paper, we develop an approximation algorithm that runs in linearithmic time and guarantees a regret ratio within any arbitrarily small, user-controllable distance from the optimal regret ratio. A comprehensive set of experiments on both synthetic and publicly available real datasets confirms the efficiency, quality of output, and scalability of our proposed algorithms.
UR - http://www.scopus.com/inward/record.url?scp=85021234742&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85021234742&partnerID=8YFLogxK
U2 - 10.1145/3035918.3035932
DO - 10.1145/3035918.3035932
M3 - Conference contribution
AN - SCOPUS:85021234742
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 821
EP - 834
BT - SIGMOD 2017 - Proceedings of the 2017 ACM International Conference on Management of Data
PB - Association for Computing Machinery
Y2 - 14 May 2017 through 19 May 2017
ER -