The Effectiveness of Plagiarism Detection Software

As you'd expect, it's not very good:

But this measure [Turnitin] captures only the most flagrant form of plagiarism, where passages are copied from one document and pasted unchanged into another. Just as shoplifters slip the goods they steal under coats or into pocketbooks, most plagiarists tinker with the passages they copy before claiming them as their own. In other words, they cloak their thefts by scrambling the passages and right-clicking on words to find synonyms. This isn't writing; it is copying, cloaking and pasting; and it's plagiarism.

Kerry Segrave is a right-clicker, changing "cellar of store" to "basement of shop." Similarly, he changes goods to items, articles to goods, accomplice to confederate, neighborhood to area, and women to females. He is also a scrambler, changing "accidentally fallen" to "fallen accidentally"; "only with" to "with only"; and "Leon and Klein" to "Klein and Leon." And he scrambles phrases within sentences; in other words, the phrases of his sentences are sometimes reordered.
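A minimal sketch of why this tinkering works, assuming a detector that compares overlapping three-word sequences ("shingles") instead of whole passages: every synonym swap or reordering breaks each shingle that spans the changed words, so the measured overlap falls quickly. The helper names and sample sentences are hypothetical, and this is not a description of how Turnitin actually works.

```python
def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping n-word sequences in text (lowercased, edge punctuation stripped)."""
    words = [w.strip('.,;:"\'').lower() for w in text.split()]
    words = [w for w in words if w]
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a: set, b: set) -> float:
    """Fraction of shingles the two texts share (Jaccard similarity)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical sentences modeled on the edits described above.
original = "The goods were found in the cellar of the store by Leon and Klein"
right_clicked = "The items were found in the basement of the shop by Leon and Klein"
scrambled = "The items were found in the basement of the shop by Klein and Leon"

src = shingles(original)
print(jaccard(src, shingles(original)))                 # 1.0
print(round(jaccard(src, shingles(right_clicked)), 2))  # 0.2
print(round(jaccard(src, shingles(scrambled)), 2))      # 0.09
```

Three synonym swaps cut the overlap from 100% to 20%, and a single name swap at the end roughly halves what is left.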

[...]

Turnitin offers another product called WriteCheck that allows students to "check [their] work against the same database as Turnitin." I signed up and submitted the early pages of Shoplifting. WriteCheck matched many of Shoplifting's phrases to those of the New York Times articles in its library of student papers. Remember, I submitted them as a student paper to help Turnitin find them; now WriteCheck has them too! WriteCheck warned me that "a significant amount of this paper is unoriginal" and advised me to revise it. After a few hours of right-clicking and scrambling, I resubmitted it, and WriteCheck said it was okay, now cleansed of easily recognizable plagiarism.

Turnitin is playing both sides of the fence, helping instructors identify plagiarists while helping plagiarists avoid detection. It is akin to selling security systems to stores while allowing shoplifters to test whether putting tagged goods into bags lined with aluminum thwarts the detectors.

Posted on September 19, 2011 at 6:35 AM • 37 Comments

Comments

D0R • September 19, 2011 6:44 AM

Well, as I always say, Turnitin is playing both sides of the fence, helping instructors identify plagiarists while helping plagiarists avoid detection. It is akin to selling security systems to stores while allowing shoplifters to test whether putting tagged goods into bags lined with aluminum thwarts the detectors.

Baz • September 19, 2011 7:21 AM

There's another interesting plagiarism detection system at http://churnalism.com/. It shows news articles that are barely altered press releases, resulting in lazy journalism like this one, where the same article appears in multiple papers almost word for word: http://churnalism.com/leq85/ Unlike in education, there seems to be no incentive for newspaper editors to stop this, so journalists don't even try to hide it.

Rory • September 19, 2011 7:33 AM

I think the analogy at the end here is unfair. In the store security example, you would be cloaking the item until you could eventually walk out with the valuable object unguarded. Now suppose the system responds to this and gets better at catching these subtle efforts; the shoplifter still eventually gets out with someone else's product. But in the writing case, at some point it just will not be plagiarism: the copy would be so thoroughly reworked that it would become its own thing. Highly unoriginal, but altered enough that it is no longer close to the original.

Another Kevin • September 19, 2011 7:42 AM

Fortunately, for high-school and undergraduate papers, a system like Turnitin doesn't need to be perfect, or even nearly so. At some point, the amount of recasting needed to pass the screen - even with checking - approaches the amount of work needed simply to make the paper original. This is particularly true because these papers aren't presenting original research; rather, they are exercises in the mechanics of research. There is only so much that can be said on their usual subjects. A paper that makes sense, cites sources and isn't directly lifted from something else is probably the best that an instructor can expect.

Automated detection of plagiarism will do for the "copy and paste" form of plagiarism. At the level of the typical term paper, however, which covers already well-trodden ground, a skillfully crafted paraphrase will be nearly indistinguishable from "original" work.

What concerns me more is the problem of false positives. I've seen this in practice: a couple of correctly-cited direct quotations from another source, and the system starts complaining that the student's paper is plagiarizing that source. The problem is particularly severe when the purpose of the paper is to criticize that source, and the student has included extensive quotations to illustrate the argument.

Habitual cheating catches up with the student eventually: in the long run, only the student loses. There isn't that much harm in letting a few clever cheaters through the screen. There's considerably more harm in accidental damage to the reputations of the honest.

As long as student assessment consists of having the student regurgitate the ideas in the textbook while attempting to support them with a different set of sources and in a different set of words, there will be an opportunity to game the system by somehow processing the ideas without learning them. The fact that the Internet simultaneously makes it easier to find sources to copy and makes it easier to find the source that someone has copied from doesn't appear to change the balance very much. The fact that the tools are fallible is not the fault of the tools. The tools can't replace the judgment and experience of a teacher, only augment it.

Pete S • September 19, 2011 7:46 AM

You are crediting students with too much nous. Most I have seen in 10 years plus are just copying whole swathes of text and claiming it as their own work.

This is really evident when marking multiple responses to an essay question. You get to read the same source text many times. That makes it relatively easy to flag the text as plagiarised.

Scott herbert • September 19, 2011 7:48 AM

The problem is that it's a balancing act. POS (part-of-speech) analysis is much more effective at finding plagiarists than just comparing the hash of a sentence with a stored copy (the way the smart money thinks Turnitin does it). However, it's much more expensive in terms of computational power and storage.
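A minimal sketch of the cheap end of that trade-off, the exact-sentence hashing described above (an assumption about how such a system might work, not Turnitin's documented design): each sentence is normalized and hashed, and a submission is scored by how many of its sentence hashes are already stored. Normalization absorbs trivial changes in spacing and punctuation, but one changed or reordered word produces a completely different hash. The function names and sentences are hypothetical.

```python
import hashlib
import re

def sentence_hashes(text: str) -> set[str]:
    """Split text into sentences, normalize case/punctuation/whitespace, and hash each one."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hashes = set()
    for s in sentences:
        normalized = " ".join(re.findall(r"[a-z0-9']+", s.lower()))
        if normalized:
            hashes.add(hashlib.sha256(normalized.encode("utf-8")).hexdigest())
    return hashes

def matched_fraction(submission: str, corpus_hashes: set[str]) -> float:
    """Fraction of the submission's distinct sentences whose hash is already stored."""
    sub = sentence_hashes(submission)
    if not sub:
        return 0.0
    return len(sub & corpus_hashes) / len(sub)

stored = sentence_hashes(
    "She had accidentally fallen into the cellar. Only with help did she escape."
)
copied = "She had accidentally fallen into the cellar!  Only with help did she escape."
tinkered = "She had fallen accidentally into the basement. With only help did she escape."

print(matched_fraction(copied, stored))    # 1.0: punctuation and spacing are normalized away
print(matched_fraction(tinkered, stored))  # 0.0: a swapped or reordered word changes the hash
```

Storing one fixed-size hash per sentence is cheap in both computation and storage, which is exactly why it is attractive and exactly why right-clicking defeats it.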

Winter • September 19, 2011 8:18 AM

An amusing anecdote:

Student submits work. Teacher checks and finds it is a copy of a work on the internet. Student graded F. Student insists her work is original and written by her, and only her.

In the end, it turned out that the teacher had forgotten to check who had written the work on the internet: the very student she had accused of plagiarism. The student had uploaded it to an official site for term papers.

The same thing happened with some writers in the Netherlands. The author accused of plagiarism was the original author; several other writers had copied from her work. They had just mixed up the publication dates.

Note, too, that some universities have a policy under which students can be punished for plagiarizing their own work. My university tried to label as "fraud" the practice of a student submitting the same answer (i.e., code) to identical questions in different courses: fraud punishable like any other exam fraud.

Paeniteo • September 19, 2011 8:34 AM

@Winter: "a student would submit the same answer to identical questions in different courses"

The *real* problem here is not the question whether the student commits fraud or not...

wiredog • September 19, 2011 8:41 AM

In college I took a paper I'd written in high school, re-typed it, and submitted it. Got a better grade. I always considered that self-plagiarism, but felt no guilt.

Thomas B. • September 19, 2011 9:32 AM

> I've seen this in practice: a couple of correctly-cited direct quotations ... and the system starts complaining...

I was under the impression Turnitin told you which passages were copied, so the system would make such cases easy to distinguish.

Even so, I'm greatly offended at Turnitin's erosion of authors' rights.

Imagine a professor who announced, "If any student writes a noteworthy piece, I will publish it under my own name with no attribution, and keep all the profits. If you refuse this arrangement, you'll be kicked out of school."

Seem abhorrent? The Turnitin case is only one step removed. Professors submit student works to a third party for its profit. Little comfort, introducing the unfettered ability to transfer rights around.

Stupid Security Questions • September 19, 2011 9:35 AM

@Winter: That's actually pretty ridiculous. First of all, 'plagiarism' is defined pretty clearly everywhere I look as stealing the words or ideas of another person and passing them off as your own. Reusing your own ideas decidedly does not count as that.

As a writer I have once or twice lifted passages from my earlier works because I had a particular phrase that I liked and that was well suited to the new situation. Sometimes that old material was previously published (or submitted) and sometimes it was just something unfinished I had lying around. That's absolutely not unethical in any way, unless I'm selling it to two different publishers and promising both that they're the only ones I sold it to. Submitting an exam or assignment involves no such agreement, since nothing is being sold.

As Paeniteo suggests, two courses having questions that can be answered by the exact same code means that: a) you're having your students do at best mindless busywork and at worst problems that aren't really related to the course, and b) your curriculum is probably poorly designed.

Anonymous 1 • September 19, 2011 9:43 AM

Another Kevin:

> What concerns me more is the problem of false positives. I've seen this in practice: a couple of correctly-cited direct quotations from another source, and the system starts complaining that the student's paper is plagiarizing that source. The problem is particularly severe when the purpose of the paper is to criticize that source, and the student has included extensive quotations to illustrate the argument.
The plagiarism detection software only claims to mark passages that appear identical to other documents, and it specifically states that the marker is to examine the context to determine whether they are plagiarised (i.e., properly cited quotes and bibliography entries get flagged, but the person marking realises there's no problem there).

Another Kevin:

> Habitual cheating catches up with the student eventually: in the long run, only the student loses. There isn't that much harm in letting a few clever cheaters through the screen. There's considerably more harm in accidental damage to the reputations of the honest.
I'd say that's mostly true, but in cases where there are quotas, a cheater may very well take the place of an honest student who actually deserved to be there.

Pete S:

> You are crediting students with too much nous. Most I have seen in 10 years plus are just copying whole swathes of text and claiming it as their own work.
Most probably are like that, but some will find a way around it (and many of them probably tell their friends).

Glenn Maynard • September 19, 2011 9:46 AM

This is just the usual misunderstanding of plagiarism: the idea that if you take someone else's stuff and make enough small changes, it becomes yours and isn't plagiarism anymore.

The real problem is in reinforcing that notion--that's more damaging in the long term than cheating on papers during college.

maybe cheating works • September 19, 2011 9:58 AM

I think cheating students benefit more from it than several posters suggest. When an education system is degrading into a gateway hurdle, and employment depends more on your grade point and who you know than what you know... plagiarism, and all other forms of cheating might be a very successful strategy. There are some fields for which it might not succeed. But how many of us have noticed that even in technical fields, our actual work does not depend upon anything we learned in school? I don't advocate cheating. I was too arrogant myself to ever descend to that level, and I think it ultimately does impoverish everyone who does it, in an intellectual and spiritual sense. But it may actually benefit them in a pecuniary sense.

Anonymous 1 • September 19, 2011 10:40 AM

From what I've heard, a student's personality is what really determines whether they cheat, and there's very little relation between how well a student is doing and their likelihood of cheating.

askme • September 19, 2011 10:48 AM

Not just students: My alma mater's President.

For the record, he is one of the most genuine gents I've met, with more integrity than most. He just took a shortcut and forgot to reference.

Duane Gran • September 19, 2011 10:53 AM

If the security is weak, the institutional support to punish cheaters is even weaker. I've heard enough from professors to sense that it is a difficult and thankless task to punish plagiarism in the classroom. In word, it is the darkest deed committed in the academy, but in practice it is not often penalized.

Poser of Brucedom Currently Being Tracked • September 19, 2011 12:18 PM

I agree with Kevin. My mom taught me how to plagiarize. I take something from somewhere and write it again until I'm sure nobody would catch the plagiarism.

Later I recognized what she'd done. My mom was smart.

David • September 19, 2011 12:18 PM

Given the threats massed against those accused of plagiarism, I think it's only appropriate that a student have some means to safeguard themselves.

MarkH • September 19, 2011 1:49 PM

A few of the comments have referred to defining plagiarism to include submitting one's original work for more than one academic assignment.

Whatever one may think of the merits of this idea, I recall that it was part of Columbia University's plagiarism policy in the mid-1970s. I just looked up Columbia's present policy; in a list of 13 examples of academic dishonesty, the second item is:

Self-plagiarism (the submission of one piece of work in more than one course without explicit permission of the instructors involved)
(emphasis added).

Chris • September 19, 2011 4:06 PM

I remember that as a student at the University of Virginia, so-called "double submission" was an honor offense, serious violations of which would get you kicked out of the university. I found the idea that plagiarizing oneself was an honor offense stupid, and still do.

ed • September 19, 2011 4:54 PM

@MarkH

In software engineering courses, that would be called "sensible reuse of tested code", and would be graded higher than reinventing the wheel. Assuming the course level has progressed beyond learning the mechanics of software wheels.

Jeremy • September 19, 2011 5:46 PM

I took a couple of writing classes in high school ("Imaginative Writing" and "College Composition") where the instructors informed the class that students taking both were allowed to re-submit a single piece written for one course in the other, but only one. I thought that was a rather peculiar rule.

Since the courses were teaching students to write, rather than testing their knowledge of the subjects written about, requiring new work for each assignment does kind of make sense. But if that's your argument, why allow even a single reuse?

Maria Helm • September 19, 2011 8:06 PM

My son's school actually encourages students to use WriteCheck. Their statement is that students can identify passages within their work which require citation, and then make sure they have provided those citations. They advise students that their teachers will also be using it for the same purpose.

anony • September 20, 2011 4:56 AM

> The Turnitin case is only one step removed. Professors submit student works to a third party for its profit. Little comfort, introducing the unfettered ability to transfer rights around.

@ThomasB: Indeed! I remember returning to school for a postgrad degree after years in the private sector. I was quickly informed that departmental policy for submission of papers was via TurnItIn.

I took the time to read the terms of service and was appalled by the policy: To pass a course I was required to use the software, but using the software meant giving TurnItIn full digital publishing and distribution rights. As the papers I was submitting were drafts of papers later published in peer reviewed journals, I bluntly refused, and caused a huge stink -- I successfully argued that if my previous papers could pass peer review in top-ranked journals, then my current papers were above petty plagiarism, and if they forced me to give up authorship rights to TurnItIn, I would use the bureaucracy to apply the same rules to the researchers on the faculty.

The best researchers of the faculty got wind of this, read the TurnItIn ToS and immediately asked: "We force our grad students to DO WHAT?!" and that was the end of that...

George • September 20, 2011 12:16 PM

Interesting. I have noticed Turnitin's bot crawling my Web site many times. By not taking active steps to block it, have I perhaps inadvertently "agreed" to Turnitin's terms of service and granted them ownership of my Web site? I can't imagine any court affirming that scenario, but courts seem to be willing to give Corporate Persons greater rights than those of non-corporate persons.

Maybe I will block that crawler. It seems rather aggressive, and there doesn't seem to be any benefit to letting it consume my bandwidth (and content). If a student plagiarizes my content, he or she will be the one who gets the F, not me!

Onkel Bob • September 20, 2011 6:34 PM

As a TA, I loved Turnitin, not for plagiarism detection but for letting me grade papers anywhere I had an internet connection. At the time I was commuting coast to coast, so not having to lug papers through airports was a real benefit. The software also has sophisticated markup features, which let me communicate with students without problems. As for plagiarism, my assignments all but precluded it, and the students loved or hated me for it.

Scott • September 22, 2011 1:30 PM

Two points:

1. This just tells me that programs like Turnitin aren't a panacea; they have an effective scope. Granted, they only catch very low-hanging fruit, but: A) I wouldn't trust any single product that claims to solve all security issues, and B) what's wrong with catching low-hanging fruit?

2. I don't think it's a bad thing that tools can be used to determine whether a given paper will pass Turnitin's muster. I don't believe in "security through obscurity."

PDF FTW • September 22, 2011 8:59 PM

I have to use Turnitin for some extramural papers I'm doing, and I'm not terribly happy about it. So I've started submitting PDFs of photocopies of my assignments. Strictly speaking, it's against the varsity's rules, but no one seems to have noticed yet, although the 0% correlation score does get some odd comments.

Dirk Praet • September 23, 2011 2:42 AM

Since it's a business, they will work all sides of the fence. Since it's software, it will have flaws. I can live with both.

But what I have a real problem with is their ToS. How can anyone be asked to voluntarily give up authorship rights on an original piece of work for the sole purpose of trying to detect plagiarism? I guess that would work well in the music industry too.

Isotec • September 23, 2011 11:18 AM

While I agree that the software has limitations, it was actually surprising to me (as a former college instructor) how many students do use flagrant forms of plagiarism. Students will copy and paste entire passages from Wikipedia and try to pass them off as their own. I feel like the issue is time and laziness; most of my students wouldn't even have taken the time to check whether Turnitin caught their plagiarism. They'd rather risk expulsion.

Alex • September 27, 2011 7:39 PM

I actually REFUSED to allow my writing to be submitted to Turnitin due to their keeping copies of the work and profiting from it. Fortunately my professors agreed.

From what I've seen, the professors and medical journals need to be the ones questioned about plagiarism. One of my profs got nailed for plagiarizing part of a textbook he published! Nailed by the original author, NOT the school -- the Uni calls him a "Distinguished Professor Emeritus". More like disgraced, in my book. If any undergrad tried the same trick, they'd be kicked to the street. Never mind that the prof was SELLING plagiarised work.

Dan Smith • September 28, 2011 4:45 PM

What concerns me about this "evolution" toward better and better detection software with broader criteria is this: there are only a limited (though sometimes large) number of ways of saying things about a given subject, especially a given book, article, or person, and especially when the student's information comes from teachers all teaching the same thing and from primary and secondary sources all saying the same thing.

By Bayesian reasoning, at some point it becomes a virtual certainty that papers will be deemed "plagiarism" even though the student wrote them himself. It's inevitable.
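The Bayesian point can be made concrete with Bayes' rule. The sketch below uses purely illustrative numbers (5% of papers plagiarized and a detector that catches 90% of them; neither figure comes from any real product) to show how broader criteria, which raise the false-positive rate, shrink the chance that a flagged paper was really plagiarized, and how often at least one honest paper gets flagged.

```python
def posterior_plagiarism(base_rate: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(paper is plagiarized | paper is flagged), by Bayes' rule."""
    p_flagged = sensitivity * base_rate + false_positive_rate * (1 - base_rate)
    return sensitivity * base_rate / p_flagged

def chance_any_honest_flagged(false_positive_rate: float, honest_papers: int) -> float:
    """Probability that at least one of n honest papers is flagged, assuming independent checks."""
    return 1 - (1 - false_positive_rate) ** honest_papers

# Illustrative numbers only: 5% of papers plagiarized, 90% of those caught.
for fpr in (0.01, 0.05):
    print(f"FPR {fpr:.0%}: P(plagiarized | flagged) = {posterior_plagiarism(0.05, 0.9, fpr):.2f}")

print(f"P(some honest paper flagged among 100 at 1% FPR) = {chance_any_honest_flagged(0.01, 100):.2f}")
```

With these made-up figures, raising the false-positive rate from 1% to 5% drops the share of flagged papers that are really plagiarized from about 0.83 to about 0.49, and even at 1% there is roughly a 63% chance that some honest paper in a batch of 100 gets flagged.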

Dan Smith • September 28, 2011 4:47 PM

About ten years ago a student copied straight from the internet into a paper. What clued me in was that he also copied the car advertising in the middle of the article.

Adele Pappa • May 5, 2012 1:16 PM

I'm really suspicious of a good number of my students' research papers. SafeAssign doesn't show any plagiarized passages, but when I read them I sense that they got their material from somewhere other than their own heads. I also noticed some strange word usages, which suggest to me that there was some creative use of the thesaurus going on.

I downloaded the documents they submitted to SafeAssign, opened them in Notepad, and saw that they are full of programming codes. Some show only the codes and no text from the research paper at all! My son says it looks like Unicode, which, I've read, is used by students to circumvent SafeAssign's plagiarism detection functions.

Is there any reason a document submitted to SafeAssign would have extensive Unicoding in it? I opened one of my own documents in Notepad and there wasn't a single code mark. In one of the documents there is the name of a company that sells software that converts files from one format to another, which I thought was really strange as well.
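The comment doesn't say exactly which trick was used, but one widely discussed way to fool text matching with Unicode is to substitute look-alike characters from another script (for example Cyrillic "о", U+043E, for Latin "o"), so the text looks normal to a reader but no longer matches its source. A minimal sketch, assuming that trick, flags words that mix Latin letters with letters from another script using Python's standard `unicodedata` module; the function names are hypothetical.

```python
import unicodedata

def script_of(ch: str) -> str:
    """Rough script label: the first word of the character's Unicode name."""
    try:
        return unicodedata.name(ch).split()[0]  # e.g. 'LATIN', 'CYRILLIC', 'GREEK'
    except ValueError:
        return "UNKNOWN"  # code points with no Unicode name

def suspicious_words(text: str) -> list[str]:
    """Return words that mix Latin letters with letters from another script."""
    flagged = []
    for word in text.split():
        scripts = {script_of(ch) for ch in word if ch.isalpha()}
        if "LATIN" in scripts and len(scripts) > 1:
            flagged.append(word)
    return flagged

honest = "The shoplifter hid the goods under a coat."
# Hypothetical disguised copy: two Cyrillic o's (U+043E) in "goods", a Cyrillic a (U+0430).
disguised = "The shoplifter hid the g\u043e\u043eds under \u0430 coat."

print(suspicious_words(honest))     # []
print(suspicious_words(disguised))  # flags only "goods", whose two o's are Cyrillic
```

This only catches mixed-script words: the lone Cyrillic "а" above slips through, as would a word written entirely in look-alikes. A more thorough check would compare text against the Unicode confusables data defined in Unicode Technical Standard #39.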


Schneier on Security is a personal website. Opinions expressed are not necessarily those of Co3 Systems, Inc.