Estimating the web robot population

Yang Sun, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

2 Scopus citations

Abstract

In this research, capture-recapture (CR) models are used to estimate the population of web robots from the web server access logs of different websites. Each robot is treated as an individual randomly surfing the web, and each website is treated as a trap that records the robots visiting it. We use maximum likelihood estimation to fit the observed data. The estimates indicate that there are 3,860 identifiable robot User-Agent strings and 780,760 IP addresses being used by web robots around the world. We also examine the origin of the named robots by their IP addresses. The results suggest that over 50% of web robot IP addresses are from the United States and China.
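The abstract does not say which CR model was fitted, so the following is only a minimal sketch under an assumed model: the closed-population model M_t, in which every website (trap) has its own capture probability. Under M_t the maximum likelihood estimate of the population size N solves Darroch's equation prod_j (1 - n_j/N) = 1 - M/N, where n_j is the number of distinct robots seen by site j and M is the number of distinct robots seen by any site. The function name `mt_population_mle`, the tolerance parameters and the toy User-Agent strings are hypothetical illustrations, not taken from the paper.

```python
from math import prod


def mt_population_mle(captures, tol=1e-9, max_iter=200):
    """Sketch: estimate population size N under capture-recapture model M_t.

    captures: list of sets, one per trap (assumed here: one per website log),
    each holding the identifiers (e.g. robot User-Agent strings) seen there.
    Returns the continuous root of Darroch's M_t likelihood equation.
    """
    n = [len(c) for c in captures]        # distinct robots caught by each trap
    M = len(set().union(*captures))       # distinct robots caught by any trap
    if sum(n) <= M:
        raise ValueError("no recaptures: the MLE of N is unbounded")

    # M_t MLE condition: prod_j (1 - n_j / N) = 1 - M / N, with N >= M.
    def h(N):
        return prod(1 - nj / N for nj in n) - (1 - M / N)

    # h(M) >= 0 and h(N) < 0 for large N, so bisection brackets the root.
    lo, hi = float(M), 2.0 * M
    while h(hi) > 0:                      # widen the bracket if needed
        hi *= 2
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol * hi:
            break
    return (lo + hi) / 2


# Hypothetical toy logs from two websites (User-Agent strings are made up).
site_a = {"botA", "botB", "botC", "botD"}
site_b = {"botC", "botD", "botE", "botF", "botG", "botH"}
print(round(mt_population_mle([site_a, site_b]), 3))  # → 12.0
```

With only two traps the M_t equation reduces to the Lincoln-Petersen estimator n_1 n_2 / m (here 4 × 6 / 2 = 12), which is a useful sanity check. A real analysis over many sites would also need to consider heterogeneous catchability among robots, which M_t ignores.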

Original language: English (US)
Title of host publication: Proceedings of the 19th International Conference on World Wide Web, WWW '10
Pages: 1189-1190
Number of pages: 2
DOI: https://doi.org/10.1145/1772690.1772868
State: Published, 2010
Event: 19th International World Wide Web Conference, WWW2010, Raleigh, NC, United States
Duration: Apr 26, 2010 to Apr 30, 2010

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Computer Science Applications

Cite this

Sun, Y., & Giles, C. L. (2010). Estimating the web robot population. In Proceedings of the 19th International Conference on World Wide Web, WWW '10 (pp. 1189-1190). https://doi.org/10.1145/1772690.1772868