Classifying and ranking search engine results as potential sources of plagiarism

Kyle Williams, Hung Hsuan Chen, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

8 Scopus citations

Abstract

Source retrieval for plagiarism detection involves using a search engine to retrieve candidate sources of plagiarism for a given suspicious document so that more accurate comparisons can be made. An important consideration is that only documents that are likely to be sources of plagiarism should be retrieved so as to minimize the number of unnecessary comparisons made. A supervised strategy for source retrieval is described whereby search results are classified and ranked as potential sources of plagiarism without retrieving the search result documents and using only the information available at search time. The performance of the supervised method is compared to a baseline method and shown to improve precision by up to 3.28%, recall by up to 2.6% and the F1 score by up to 3.37%. Furthermore, features are analyzed to determine which of them are most important for search result classification with features based on document and search result similarity appearing to be the most important.
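
As a rough illustration of the idea in the abstract, the sketch below scores search results using only information available at search time (here, the result title and snippet), keeps those above a threshold, ranks them, and then computes precision, recall and F1 against a set of known sources. It is not the authors' method: the paper describes a trained supervised classifier, whereas this sketch substitutes a single hand-written similarity feature (Jaccard word overlap) and a fixed threshold. All names (`SearchResult`, `jaccard`, `rank_candidates`, `precision_recall_f1`) and the threshold value are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchResult:
    """Hypothetical container for what a search engine returns per hit."""
    url: str
    title: str
    snippet: str


def tokens(text):
    """Lowercased whitespace tokens as a set (a deliberately crude tokenizer)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard word overlap between two strings; 0.0 if either is empty."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def rank_candidates(suspicious_text, results, threshold=0.2):
    """Keep results whose title+snippet similarity meets the threshold,
    ordered from most to least similar. The result documents themselves
    are never downloaded. The threshold is an assumed value and stands in
    for the decision of a trained classifier."""
    scored = [
        (jaccard(suspicious_text, r.title + " " + r.snippet), r)
        for r in results
    ]
    kept = [(score, r) for score, r in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in kept]


def precision_recall_f1(retrieved_urls, true_source_urls):
    """Standard set-based precision, recall and F1."""
    retrieved, truth = set(retrieved_urls), set(true_source_urls)
    true_positives = len(retrieved & truth)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

In this sketch, discarding a result before download is what avoids an unnecessary comparison. Because F1 is the harmonic mean of precision and recall, over-filtering (lower recall) and under-filtering (lower precision) are both penalized.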

Original language: English (US)
Title of host publication: DocEng 2014 - Proceedings of the 2014 ACM Symposium on Document Engineering
Publisher: Association for Computing Machinery, Inc
Pages: 97-106
Number of pages: 10
ISBN (Electronic): 9781450329491
DOIs: https://doi.org/10.1145/2644866.2644879
State: Published - Jan 1 2014
Event: 2014 ACM Symposium on Document Engineering, DocEng 2014 - Fort Collins, United States
Duration: Sep 16 2014 to Sep 19 2014

Publication series

Name: DocEng 2014 - Proceedings of the 2014 ACM Symposium on Document Engineering

Other

Other: 2014 ACM Symposium on Document Engineering, DocEng 2014
Country: United States
City: Fort Collins
Period: 9/16/14 to 9/19/14

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Computer Science Applications

Cite this

Williams, K., Chen, H. H., & Giles, C. L. (2014). Classifying and ranking search engine results as potential sources of plagiarism. In DocEng 2014 - Proceedings of the 2014 ACM Symposium on Document Engineering (pp. 97-106). (DocEng 2014 - Proceedings of the 2014 ACM Symposium on Document Engineering). Association for Computing Machinery, Inc. https://doi.org/10.1145/2644866.2644879