TY - JOUR
T1 - Crawling Hidden Objects with kNN Queries
AU - Yan, Hui
AU - Gong, Zhiguo
AU - Zhang, Nan
AU - Huang, Tao
AU - Zhong, Hua
AU - Wei, Jun
N1 - Funding Information:
Hui Yan and Zhiguo Gong were supported in part by the Fund of Science and Technology Development of the Macau Government under FDCT/106/2012/A3 and FDCT/116/2013/A3, and in part by the University of Macau Research Committee under MYRG105-FST13-GZG and MYRG2015-00070-FST. Nan Zhang was supported in part by the US National Science Foundation under grants 0852674, 0915834, 1117297, and 1343976, and by the US Army Research Office under grant W911NF-15-1-0020.
Publisher Copyright:
© 2015 IEEE.
PY - 2016/4/1
Y1 - 2016/4/1
N2 - Many websites offering Location-Based Services (LBS) provide a kNN search interface that returns the top-k nearest-neighbor objects (e.g., the nearest restaurants) for a given query location. This paper addresses the problem of efficiently crawling all objects from an LBS website through the public kNN web search interface it provides. Specifically, we develop crawling algorithms for 2D and higher-dimensional spaces, respectively, and demonstrate through theoretical analysis that the overhead of our algorithms can be bounded by a function of the number of dimensions and the number of crawled objects, regardless of the underlying distribution of the objects. We also extend the algorithms to leverage scenarios where certain auxiliary information about the underlying data distribution is available, e.g., the population density of an area, which is often positively correlated with the density of LBS objects. Extensive experiments on real-world datasets demonstrate the superiority of our algorithms over state-of-the-art competitors in the literature.
AB - Many websites offering Location-Based Services (LBS) provide a kNN search interface that returns the top-k nearest-neighbor objects (e.g., the nearest restaurants) for a given query location. This paper addresses the problem of efficiently crawling all objects from an LBS website through the public kNN web search interface it provides. Specifically, we develop crawling algorithms for 2D and higher-dimensional spaces, respectively, and demonstrate through theoretical analysis that the overhead of our algorithms can be bounded by a function of the number of dimensions and the number of crawled objects, regardless of the underlying distribution of the objects. We also extend the algorithms to leverage scenarios where certain auxiliary information about the underlying data distribution is available, e.g., the population density of an area, which is often positively correlated with the density of LBS objects. Extensive experiments on real-world datasets demonstrate the superiority of our algorithms over state-of-the-art competitors in the literature.
UR - http://www.scopus.com/inward/record.url?scp=84963781075&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84963781075&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2015.2502947
DO - 10.1109/TKDE.2015.2502947
M3 - Article
AN - SCOPUS:84963781075
SN - 1041-4347
VL - 28
SP - 912
EP - 924
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 4
M1 - 7335622
ER -