Optimising and automating the choice of search strings when investigating possible plagiarism

Mike Child, Fintan Culwin

Research output: Contribution to conference › Paper

Abstract

This paper describes how to optimise the use of Internet search engines when investigating a document for possible non-original content. Services such as Turnitin do not guarantee to identify all non-original content, so tutors must conduct manual searches when suspicion of non-originality remains. Previous studies have suggested that the investigator should manually select memorable phrases from the paper and submit them to a general search engine. The studies in this paper demonstrate that selecting phrases at random is just as effective. Several corpora of documents were obtained from a number of different academic areas, and several phrases were obtained from each. Strings of increasing length, starting with a single word, were taken from these phrases and submitted to specialised and general search engines, and the number of hits was recorded. A common finding of these searches was that, in almost all cases, strings of six words were sufficiently distinct to uniquely identify the document that the string was taken from. One consequence of this is that totally automated tools are possible for this search-engine-based non-originality detection technique.
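The procedure outlined in the abstract (pick a phrase at random, query strings of increasing length, record the hits) can be illustrated with a minimal Python sketch. This is not the authors' tool. The function names `random_phrase`, `hits_by_prefix_length` and `toy_hit_counter` are hypothetical, and the hit counter is an assumed stand-in that searches a small in-memory corpus rather than a real search engine.

```python
import random
from typing import Callable, List, Tuple


def random_phrase(words: List[str], length: int, rng: random.Random) -> List[str]:
    """Pick `length` consecutive words starting at a random position."""
    if len(words) < length:
        raise ValueError("document is shorter than the requested phrase length")
    start = rng.randrange(len(words) - length + 1)
    return words[start:start + length]


def hits_by_prefix_length(
    phrase: List[str], count_hits: Callable[[str], int]
) -> List[Tuple[int, int]]:
    """Query prefixes of 1, 2, ... words and record (length, hit count)."""
    return [(n, count_hits(" ".join(phrase[:n]))) for n in range(1, len(phrase) + 1)]


def toy_hit_counter(corpus: List[str]) -> Callable[[str], int]:
    """Assumed stand-in for a search engine: counts the corpus documents
    that contain the query as an exact run of whole words."""
    normalised = [" " + " ".join(doc.lower().split()) + " " for doc in corpus]

    def count_hits(query: str) -> int:
        padded = " " + " ".join(query.lower().split()) + " "
        return sum(padded in doc for doc in normalised)

    return count_hits


# Illustrative data only; these sentences are not from the paper's corpora.
corpus = [
    "the quick brown fox jumps over the lazy dog near the river bank",
    "the quick brown bear sleeps under the old oak tree all winter",
    "a slow green turtle crosses the quiet road at dawn",
]
rng = random.Random(0)
phrase = random_phrase(corpus[0].split(), 6, rng)
results = hits_by_prefix_length(phrase, toy_hit_counter(corpus))
for n, hits in results:
    print(n, hits)
```

In this toy setting the hit count can only fall or stay level as the string grows, since every document that contains a longer string also contains its shorter prefixes. A real search engine reports estimated counts, so that property is not guaranteed there.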
Original language: English
Publication status: Published - 2010
Event: 4th International Plagiarism Conference
Duration: 1 Jan 2010 → …

Conference

Conference: 4th International Plagiarism Conference
Period: 1/01/10 → …

Keywords

  • Plagiarism, academic integrity, non-originality analysis, internet search engines
