The impact of user corrections on a crawl-based digital library: A CiteSeerX perspective

Jian Wu, Kyle Williams, Madian Khabsa, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

CiteSeerX is a crawl-based digital library search engine providing free access to more than 4 million academic papers. Since metadata in the digital library is obtained through automatic extraction, it is inevitable that errors will occur. CiteSeerX offers a feature allowing registered users to correct paper metadata including titles, authors, abstracts, publication years, venues, etc. We claim that user corrections, as a form of crowd-collaboration, provide a useful and efficient way to improve metadata quality and the impact of the digital library. As evidence to support this claim, we investigate user corrections from the last 5 years and analyze: the nature of the corrections; the quality of the corrections; and the impact of the corrections on downloads.

Original language: English (US)
Title of host publication: CollaborateCom 2014 - Proceedings of the 10th IEEE International Conference on Collaborative Computing
Subtitle of host publication: Networking, Applications and Worksharing
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 171-176
Number of pages: 6
ISBN (Electronic): 9781631900433
DOI: https://doi.org/10.4108/icst.collaboratecom.2014.257563
State: Published - Jan 19 2015
Event: 10th IEEE/EAI International Conference on Collaborative Computing, CollaborateCom 2014 - Miami, United States
Duration: Oct 22 2014 → Oct 25 2014

Publication series

Name: CollaborateCom 2014 - Proceedings of the 10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing

Other

Other: 10th IEEE/EAI International Conference on Collaborative Computing, CollaborateCom 2014
Country: United States
City: Miami
Period: 10/22/14 → 10/25/14

Fingerprint

Digital libraries
Metadata
Search engines

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Computer Science Applications
  • Software

Cite this

Wu, J., Williams, K., Khabsa, M., & Giles, C. L. (2015). The impact of user corrections on a crawl-based digital library: A CiteSeerX perspective. In CollaborateCom 2014 - Proceedings of the 10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing (pp. 171-176). [7014562] (CollaborateCom 2014 - Proceedings of the 10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.4108/icst.collaboratecom.2014.257563
Wu, Jian; Williams, Kyle; Khabsa, Madian; Giles, C. Lee. / The impact of user corrections on a crawl-based digital library: A CiteSeerX perspective. CollaborateCom 2014 - Proceedings of the 10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing. Institute of Electrical and Electronics Engineers Inc., 2015. pp. 171-176 (CollaborateCom 2014 - Proceedings of the 10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing).
@inproceedings{e07b6b3edc544bbd90011c5b599888e4,
title = "The impact of user corrections on a crawl-based digital library: A CiteSeerX perspective",
abstract = "CiteSeerX is a crawl-based digital library search engine providing free access to more than 4 million academic papers. Since metadata in the digital library is obtained through automatic extraction, it is inevitable that errors will occur. CiteSeerX offers a feature allowing registered users to correct paper metadata including titles, authors, abstracts, publication years, venues, etc. We claim that user corrections, as a form of crowd-collaboration, provide a useful and efficient way to improve metadata quality and the impact of the digital library. As evidence to support this claim, we investigate user corrections from the last 5 years and analyze: the nature of the corrections; the quality of the corrections; and the impact of the corrections on downloads.",
author = "Jian Wu and Kyle Williams and Madian Khabsa and Giles, {C. Lee}",
year = "2015",
month = "1",
day = "19",
doi = "10.4108/icst.collaboratecom.2014.257563",
language = "English (US)",
series = "CollaborateCom 2014 - Proceedings of the 10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "171--176",
booktitle = "CollaborateCom 2014 - Proceedings of the 10th IEEE International Conference on Collaborative Computing",
address = "United States",
}

Wu, J, Williams, K, Khabsa, M & Giles, CL 2015, The impact of user corrections on a crawl-based digital library: A CiteSeerX perspective. in CollaborateCom 2014 - Proceedings of the 10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing, 7014562, CollaborateCom 2014 - Proceedings of the 10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing, Institute of Electrical and Electronics Engineers Inc., pp. 171-176, 10th IEEE/EAI International Conference on Collaborative Computing, CollaborateCom 2014, Miami, United States, 10/22/14. https://doi.org/10.4108/icst.collaboratecom.2014.257563

The impact of user corrections on a crawl-based digital library: A CiteSeerX perspective. / Wu, Jian; Williams, Kyle; Khabsa, Madian; Giles, C. Lee.

CollaborateCom 2014 - Proceedings of the 10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing. Institute of Electrical and Electronics Engineers Inc., 2015. p. 171-176 7014562 (CollaborateCom 2014 - Proceedings of the 10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing).

TY  - GEN
T1  - The impact of user corrections on a crawl-based digital library
T2  - A CiteSeerX perspective
AU  - Wu, Jian
AU  - Williams, Kyle
AU  - Khabsa, Madian
AU  - Giles, C. Lee
PY  - 2015/1/19
Y1  - 2015/1/19
N2  - CiteSeerX is a crawl-based digital library search engine providing free access to more than 4 million academic papers. Since metadata in the digital library is obtained through automatic extraction, it is inevitable that errors will occur. CiteSeerX offers a feature allowing registered users to correct paper metadata including titles, authors, abstracts, publication years, venues, etc. We claim that user corrections, as a form of crowd-collaboration, provide a useful and efficient way to improve metadata quality and the impact of the digital library. As evidence to support this claim, we investigate user corrections from the last 5 years and analyze: the nature of the corrections; the quality of the corrections; and the impact of the corrections on downloads.
AB  - CiteSeerX is a crawl-based digital library search engine providing free access to more than 4 million academic papers. Since metadata in the digital library is obtained through automatic extraction, it is inevitable that errors will occur. CiteSeerX offers a feature allowing registered users to correct paper metadata including titles, authors, abstracts, publication years, venues, etc. We claim that user corrections, as a form of crowd-collaboration, provide a useful and efficient way to improve metadata quality and the impact of the digital library. As evidence to support this claim, we investigate user corrections from the last 5 years and analyze: the nature of the corrections; the quality of the corrections; and the impact of the corrections on downloads.
UR  - http://www.scopus.com/inward/record.url?scp=84923013374&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84923013374&partnerID=8YFLogxK
U2  - 10.4108/icst.collaboratecom.2014.257563
DO  - 10.4108/icst.collaboratecom.2014.257563
M3  - Conference contribution
AN  - SCOPUS:84923013374
T3  - CollaborateCom 2014 - Proceedings of the 10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing
SP  - 171
EP  - 176
BT  - CollaborateCom 2014 - Proceedings of the 10th IEEE International Conference on Collaborative Computing
PB  - Institute of Electrical and Electronics Engineers Inc.
ER  -

Wu J, Williams K, Khabsa M, Giles CL. The impact of user corrections on a crawl-based digital library: A CiteSeerX perspective. In CollaborateCom 2014 - Proceedings of the 10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing. Institute of Electrical and Electronics Engineers Inc. 2015. p. 171-176. 7014562. (CollaborateCom 2014 - Proceedings of the 10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing). https://doi.org/10.4108/icst.collaboratecom.2014.257563