With the increasing popularity of mobile devices and wireless networks, location-based services have become widely adopted by consumers. Because business points of interest (POIs) and their associated information change over time, it is critical to keep the data correct at a reasonable verification cost. In this article, we propose new machine learning approaches for detecting outdated POI information (e.g., location/address and POI name) via web-derived features. We represent POI information as <address, name> pairs and formulate the detection of outdated POI pairs in two forms: a classification problem and a ranking problem. We evaluate the proposed methods on a real-world dataset crawled from Yellow Pages websites. For POI pairs with a one-to-one relationship between addresses and names (Address-to-one-POI), the supervised approaches achieve 95.7% accuracy. For POI pairs with a one-to-many relationship between addresses and names (Address-to-many-POIs), the best accuracy is 65.8%. By exploiting the strengths of different classifiers through tri-training, we improve performance to an accuracy of 72.8%.
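The distinction between the Address-to-one-POI and Address-to-many-POIs cases can be illustrated with a minimal sketch. It assumes that a pair belongs to the one-to-many case whenever its address string appears with more than one distinct name; the article may define the groups differently (for example, after address normalization). The names `PoiPair` and `split_by_cardinality`, and the sample records, are hypothetical and not taken from the article.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Tuple


class PoiPair(NamedTuple):
    """One <address, name> pair (hypothetical representation)."""
    address: str
    name: str


def split_by_cardinality(
    pairs: Iterable[PoiPair],
) -> Tuple[List[PoiPair], List[PoiPair]]:
    """Split pairs into Address-to-one-POI and Address-to-many-POIs groups.

    Assumption: an address is "one-to-many" when it occurs with more
    than one distinct name in the input.
    """
    names_by_address = defaultdict(set)
    for pair in pairs:
        names_by_address[pair.address].add(pair.name)

    one_poi: List[PoiPair] = []
    many_pois: List[PoiPair] = []
    for address, names in names_by_address.items():
        target = one_poi if len(names) == 1 else many_pois
        # Sort names so the output order is deterministic.
        target.extend(PoiPair(address, name) for name in sorted(names))
    return one_poi, many_pois


# Made-up records, for illustration only.
sample = [
    PoiPair("1 Main St", "Cafe A"),
    PoiPair("2 Mall Rd", "Shop B"),
    PoiPair("2 Mall Rd", "Shop C"),
]
one_poi, many_pois = split_by_cardinality(sample)
```

Here the single pair at "1 Main St" falls into the one-to-one group, while both pairs at "2 Mall Rd" fall into the one-to-many group, the case for which the abstract reports lower accuracy.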
All Science Journal Classification (ASJC) codes
- Earth and Planetary Sciences(all)