Tuesday, February 17, 2015

Finding cheaters using multiple-choice comparisons

Summary

An interesting method by which I found out that people were cheating on my final exam.

Background

I use different versions of midterm examinations to discourage cheating in my population biology class (~200 students). When the course first started, I did the same thing for the final exam, but it was a little more complicated, because the final exam is administered by the registrar's office, not by me and my teaching team.
At some point, somebody advised me not to bother with versions: the registrar's office is supposed to be professional about administration, and they usually mix people who are taking different exams in the same room, so I stopped bothering with different versions for the final exam for a year or two. I do it again now, and you'll see why.

The incident

In the year in question, my exam was given in two separate medium-sized rooms. My class was alone in these two rooms. I received a report from the invigilators in Room 1 about suspicious behaviour. They had warned a couple of students for acting strangely, and then warned them again. They weren't prepared to say that they were sure that the students were cheating, but wanted me to compare their answer sheets. In retrospect, they should have left the students alone until they were ready to sign a complaint against them (or until they had cheated enough to have it proved against them).

My response

The final is entirely multiple choice. I got the results files from the scantron office. I figured that I wouldn't quite know what to do with a comparison just between these two kids (unless the tests were identical), and that it would be just about as easy (and far more informative) to compare everybody to everybody else. It's still kind of hard for me to get used to the fact that we have computers now and can really do stuff like this. I calculated the number of identical right answers and the number of identical wrong answers for each pair of students (~18K pairs), and plotted it out.
(cplot.Rout-0.png)
The line corresponds to forty total shared answers (two students having identical test papers). This did not happen. But there were four points near the line that looked like clear outliers to me:
(cplot.Rout-1.png)
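The comparison behind these plots can be sketched in a few lines. What follows is a minimal Python sketch, not the R code used for the figures; the function names (`shared_counts`, `pair_table`) and the toy answer key and responses are made up for illustration, and skipping blank answers is an assumption about how unanswered questions should be treated.

```python
from collections import Counter
from itertools import combinations


def shared_counts(answers_a, answers_b, key):
    """Return (shared_right, shared_wrong) for two students' answer lists."""
    right = wrong = 0
    for a, b, k in zip(answers_a, answers_b, key):
        # Assumption: a blank (None) never counts as a shared answer.
        if a is not None and a == b:
            if a == k:
                right += 1
            else:
                wrong += 1
    return right, wrong


def pair_table(responses, key):
    """Count how many pairs of students land on each (right, wrong) point."""
    table = Counter()
    for a, b in combinations(responses, 2):
        table[shared_counts(a, b, key)] += 1
    return table


# Toy data (hypothetical): 4 students, 5 questions.
key = list("ABCDA")
responses = [
    list("ABCDA"),
    list("ABCCB"),
    list("ABCCB"),
    list("BBCDA"),
]
table = pair_table(responses, key)

# Points on the "identical papers" line: right + wrong == number of questions.
on_line = [point for point in table if sum(point) == len(key)]
print(table)
print(on_line)
```

A pair whose two counts sum to the number of questions has identical answer sheets and sits on the line in the plot; in the toy data that is the second and third students.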

The follow up

I wasn't sure what to do next, but the registrar's office knew. They make seating maps during exams. They didn't offer to help out, but I was allowed to go and examine the maps.
The results were amazing.
  • All four of the identified pairs were seated adjacent to each other (three pairs were side by side, and the fourth pair had one student behind the other). The probability that this might have happened by chance is beyond ridiculous.
  • None of the four identified pairs were seated in the room where the alert invigilators hassled the pair of cheaters. This might have been by chance, but I doubt it. Likely the invigilators in the other room were visibly less alert.
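To get a rough sense of how unlikely the seating result is, here is a back-of-the-envelope sketch in Python. Everything about the layout is assumed rather than taken from the seating maps: two rooms, each a full 10 x 10 grid, with "adjacent" meaning directly beside or directly behind, and with seating unrelated to who matched whom. The function name `adjacent_pair_probability` is hypothetical.

```python
from math import comb


def adjacent_pair_probability(rows, cols, n_rooms):
    """Chance that a randomly chosen pair of students sits in adjacent seats.

    Assumes every room is a full rows x cols grid and that 'adjacent' means
    directly side by side or directly front-to-back.
    """
    side_by_side = rows * (cols - 1)
    front_back = (rows - 1) * cols
    adjacent_pairs = n_rooms * (side_by_side + front_back)
    students = n_rooms * rows * cols
    return adjacent_pairs / comb(students, 2)


# Hypothetical layout: two rooms of 10 x 10 seats, roughly 200 students.
p_one = adjacent_pair_probability(rows=10, cols=10, n_rooms=2)

# Treating the four pairs as independent is an approximation.
p_all_four = p_one ** 4

print(f"one pair adjacent: {p_one:.4f}; all four adjacent: {p_all_four:.1e}")
```

Under those assumptions a given pair has roughly a 2% chance of being adjacent, and four independent pairs all being adjacent comes out on the order of one in ten million. The independence step is an approximation, and friends choosing to sit together would raise these numbers.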
I talked to the academic integrity office, and various experts, and figured out that it really was impossible to be sure who had cheated in the side-by-side pairs. I did put all 6 of them through a bit of an ordeal, though, and at least half of them deserved it. I was also unable to convict the person in front of the front-back pair (although it's hard to see how that one would have worked without collusion). The person in the back of the front-back pair denied all knowledge, but received a zero for the exam grade plus a confidential, temporary notation of my finding at the integrity office (the strongest punishment I was allowed to give). They promised to fight it, but never did.

Postscript

I now use versioning, but I'm starting to discover that this does not necessarily prevent cheating, either. I may have more adventures to report, soon.
   I definitely get the feeling that the person I caught cheated their way through Mac. The initial response to my call was pretty relaxed. They did get an F in my class (I couldn't give an automatic F for the class, but the exam zero was sufficient). They retook the class and passed, expunging the F, and graduating presumably with a clean record.
   I have heard a lot of anecdotal reports of people dealing with cheating informally (or not at all). It's kind of depressing. My impression is that Mac has a cheating problem, and we need to fight back.

Code


The code used to produce these plots in R is shown here.
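
For readers who don't use R, the core idea of tracking identifiers and listing the most suspicious matches can be sketched in Python. This is an illustration rather than a port of the R code: `top_matches`, the student IDs, and the answers are hypothetical, and ranking by total shared answers with ties broken by shared wrong answers is one plausible ordering, not necessarily the one the R code uses.

```python
from itertools import combinations


def top_matches(answers_by_id, key, n=5):
    """Return the n pairs with the most shared answers, keeping student IDs."""
    rows = []
    pairs = combinations(sorted(answers_by_id.items()), 2)
    for (id_a, ans_a), (id_b, ans_b) in pairs:
        # Keep only the questions where both students gave the same answer.
        same = [(a, k) for a, b, k in zip(ans_a, ans_b, key) if a == b]
        right = sum(a == k for a, k in same)
        wrong = len(same) - right
        rows.append((id_a, id_b, right, wrong))
    # Most shared answers first; ties broken by more shared wrong answers.
    rows.sort(key=lambda r: (r[2] + r[3], r[3]), reverse=True)
    return rows[:n]


# Made-up IDs and answers for a 5-question exam.
key = "ABCDA"
answers_by_id = {"s01": "ABCDA", "s02": "ABCCB", "s03": "ABCCB", "s04": "BBCDA"}
result = top_matches(answers_by_id, key, n=3)
for id_a, id_b, right, wrong in result:
    print(id_a, id_b, right, wrong)
```

The first row returned is the pair closest to (or on) the identical-papers line, which is the pair worth checking against the seating map.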

61 comments:

  1. Given the way the main bulge appears skewed, I think it might look better if you plotted shared wrong answers vertically and shared total answers (right+wrong) horizontally. Once you do that, the unsuspicious part of the plot appears very much like a (2-d) normal distribution. Is that what theory says we should get in this case?

    2. A 2-D normal distribution can be diagonal. What you propose is a linear transformation to make the distribution straight (isotropic), but a linear transform of a normal dist is a normal dist, therefore the above dist is normal.

  3. It would be interesting to create a dataset out of this:
    Node n = [A's answers, B's answers, distance(A,B)]

    If there is indeed a mathematical relationship between the way two people answer questions and how far apart they are (i.e. cheaters that sit next to each other behave very similarly), then that relationship should be detectable by using a regressor.

    What if you made this dataset, and trained a machine-learning algorithm on 80% of the dataset (excluding the outliers)? You could use a random forest to detect the outliers that you suspect of cheating:
    http://artax.karlin.mff.cuni.cz/r-help/library/CORElearn/html/rfOutliers.html

    Disclaimer: I am in no way saying that you came to the wrong conclusion. I am interested from a purely ML point of view; is there a direct relationship between how similarly people answer and how close people sit next to each other?

  4. I just ran this code for my recent exam (intro majors biology, 165 students, 52 questions). I did have a point on the line. What would be the easiest way to identify which two rows from my exam data were identical?

  5. Jordan: I have added code to track the actual identifiers. I think I removed some of this code while anonymizing. I also added a function that prints out the top suspicious matches. Check the updated version of the page.

    1. Perfect! This worked like a charm. Thanks for such a quick response.

  6. Daniel:

    I'm pretty sure you could make a reasonable theory for either version being closer to (independent) normal. But I also think it would be worth doing to see what the actual pattern tells you about how the kids are behaving.

  8. Interesting graph. Thanks for the insights. As an old prof who never gave a multiple choice test, I found that cheating in multi-step mathematical problem homework or exams was always easier to spot in wrong answers than in correct answers.

    A constructive comment: this graph would flunk visual communications 101.
    1. The use of the phrase "shared data" as the axes labels is very misleading on a cheating-related graph.
    2. It has no title.
    3. From the legend, it appears that the dot diameters were used to scale the circles, rather than the areas, which our perceptual system uses to make judgements.

    j

    1. I didn't use 'the phrase "shared data"' and I'm not sure what you are suggesting I have misled people about.

      I agree in general about area-based scaling. The purpose of this picture is to identify the outliers. Since there can be over 300 observations at one point, area-based scaling would render the peripheral points hard to see, and the outliers effectively invisible. I have added such a plot to the page.

      Finally: if you had really intended to be constructive, you could have skipped the clause after your assertion that your comments were constructive.

  9. love the way the distribution is represented, it's very visually pleasing.

  10. It occurs to me that you could also use this type of analysis to determine if you've got bad questions that everyone is getting wrong.

    1. This was my thought too. Use this tool as an evaluation of your test instead of trying to catch cheaters. If you have people in the room that aren't doing their jobs to watch the crowd then that is where your problem is.

    3. That's a good point. The policy is that I'm supposed to investigate and recommend discipline when I suspect cheating. There is also the idea of a deterrent effect. I hear a lot of complaining that McMaster has "a culture of cheating".

      But the idea of prevention rather than punishment has its appeal. OTOH, if I weren't willing to punish, I guess I'd have to give up on for-credit assignments. OTTH, I may have to do that anyway -- apparently there is at least one Facebook page dedicated to sharing information about my assignments.

    4. Yes, I do use the coding tool to evaluate my questions as well. Typically, I don't have any questions that a majority gets wrong, but I often have hard questions that 30 or 40% of people get wrong. I just recently wrote something that decodes versioned exams so that I can examine answer distributions for those.

  11. There is a disturbing lack of mathematics here considering the gravity of the accusations. What statistical tests were done to determine who the outliers were? Was it just by visual inspection of the graph?

    As well, it does not seem to me that similar answers are necessarily an indication of cheating. Two people who choose to sit beside one another during a final are likely friends who study together.

    1. I agree. I'm not comfortable with a statistical analysis being used as proof of cheating. That's something to take pretty seriously and 95% sure doesn't cut it.

    2. >What statistical tests were done to determine who the outliers were? Was it just by visual inspection of the graph?

      You have a line that gives you maximum possible values and you already know to check specifically for a pair with a large number of correlating wrong answers (as a basis to be suspicious).

      >Two people who choose to sit beside one another during a final are likely friends who study together.

      That's plausible and would certainly reduce the odds of cheating for certain types of problems. That said, from what I gathered reading the blog, I believe the seating is assigned.

      My concern with this particular method would be that it would only catch the worst cheaters. The cheater's answers must correlate quite a bit with a single one of his peers - and not diversifying where you pull answers from is just bad form. The only justifications for copying most of your answers from one person would be (1) the peer is known to the class as very intelligent (in which case, this method wouldn't even catch the cheater, as the cheater would also score well), and (2) collusion. Collusion is problematic, as Dr. Dushoff realized, because now instead of one suspect, you have two - and they both must be guilty but they'll both claim innocence. The slim possibility that one person is actually innocent, however, will keep you from punishing both, which is a real predicament.

      That said, this was a very interesting read and a great start to solving a fairly persistent problem in college. I'd be interested in seeing more datasets and improvements.

    3. Agreed. This method makes me very uncomfortable. If the proctors didn't catch it then it's done.

  12. Great post, but I can't help but use this graph to show my students a good example of a bad graph!

  13. "I did put all 6 of them through a bit of an ordeal, though, and at least half of them deserved it."

    This part of the post is extremely disturbing. The implication here is that some students were targeted and put through an ordeal despite being innocent. The fact that you are so excited to exercise such callous indifference to your students seriously calls into question whether you are actually fit to be a professor.

    You have tried extremely hard to prove that your students are cheaters and you likely could have failed students who did nothing wrong (i.e. another student copied their test without their knowledge). It is your prerogative to make sure that cheating is not occurring, not for the student to ensure that no one can sneak a look at his or her test.

    In the end, some of your students may have failed the test, but it is really you who has failed as a professor. Oh, the irony.

    1. You've misunderstood him, I think: he gave them a grilling, a harsh talking-to; there were no actual academic effects for any of the side-by-side pairs.

      If one of them was actually innocent then he'd know it and know the stern telling-off didn't apply to him, so it's nothing to worry about.

    2. Is this what happened? If so, Jonathan would benefit from clearing that up because I also misunderstood him.

    3. "The person in the back of the front-back pair denied all knowledge, but received a zero for the exam grade plus a confidential, temporary notation of my finding at the integrity office (the strongest punishment I was allowed to give). They promised to fight it, but never did."

      " I definitely get the feeling that the person I caught cheated their way through Mac. The initial response to my call was pretty relaxed. They did get an F in my class (I couldn't give an automatic F for the class, but the exam zero was sufficient)."

      From a legal perspective, if this were your standard of proof in any respectable criminal hearing (as a judge), you would find yourself in hot water very, very quickly. It seems to me that this may be a creation of both your ego and frustration, which you took out on potentially innocent students. Also, did it ever occur to you that the relaxed response to the call was due to the student thinking that he did not commit the act, and therefore had nothing to worry about? Yes, cheating happens. Is this proof that these (x) students cheated? No.

    4. I'm pretty convinced that the student I convicted cheated. There were other anomalies from their other tests, for example. They denied being friendly with the person in front of them. They did not make any claims about studying with others. They said that they would appeal, and then did not.

      There were many, many pairs of students who studied together and did or did not sit together (they did not have complete control over who they sat with), but these four pairs seemed like clear outliers on the graph, and all four pairs were adjacent!

      I avoided using an explicit statistical test on the answers because I think that that could have greatly exaggerated the improbability of answer matching, for reasons that have been stated here. I found the existence of visual outliers, combined with perfect, independent verification from the seating chart, to be far more convincing. I did not consider the possibility of there being a strong correlation between study patterns and seating patterns, but now that I do, I don't really buy it.

    5. "They denied being friendly with the person in front of them."

      Did you consider that in high-pressure situations, it is fairly common for students to lie (e.g. tests can generate cheating due to increased pressure)? It's very common for people under investigation to spout whatever answer they think will make them seem less guilty. In this case, a friendship might, from their perspective, make collusion seem more likely.

      "They did not make any claims about studying with others."

      The same reasoning applies... In addition, students should not be required to prove their innocence. Their guilt should be proven by their professor. It's also a possibility that the students simply did not think to bring up the "studying together" in hopes of reducing an administrator's/professor's suspicion of them cheating. Lastly, it's entirely possible, as I stated before, that this is in fact a coincidence and they were not "friends" (a very ambiguous term) and did not study together at all.

      "They said that they would appeal, and then did not."

      Victims of an incorrect or false judgment more often than not fail to appeal. The reasons for this phenomenon are extensive and well documented. The lack of appeal can simply show a lack of faith in the appeal system, or dismay at disproportionate power within the legal (or, in this case, academic) setting. However, the most common reason is often simply that they have just had a very negative experience and may not want to waste any additional time or effort on someone who has clearly already presumed them to be guilty.

      I am just wondering why you would not simply accept and live with your suspicions after a moderate and reasonable inspection into the matter ("grilling" does not sound very moderate or reasonable to me) and change back to versioning your tests the next semester? It seems like you made a choice to change your format, did not like the potential results, and proceeded to adopt a relatively poorly thought-out yet exceedingly draconian system of academic punishment... If students pay to attend your course and fail to take advantage of your instruction, that should be squarely on their shoulders. I do, however, have a problem with your reasoning, as it seems to be exceptionally narrow and focused simply on your perspective. I am certainly not saying that these students without a doubt refrained from cheating. However, there still remains a clearly identifiable level of doubt as to their guilt. You (the institution) accepted their money and punished them, with this reasonable doubt being addressed by seeking to support your own conclusion with this narrow and therefore problematic reasoning.

    6. I did my best to follow the McMaster policy. The final decision is not mine, and I'm pretty comfortable with my choices.

      That said, the argument above that I should work harder on proctoring also has merit.

      _That_ said, I've done that, too. My most recent test had no computer-caught cheaters, and some proctor-caught cheaters.

    7. Again, the real issue here is not with your statistical analysis. The issue lies with your attitude toward your students, a lesson that seems to be lost on you. You clearly are not giving them the benefit of the doubt, and you are treating them as if they are guilty until proven innocent. A good professor (or teacher or mentor) would know that this attitude is extremely inappropriate and detrimental to your students. There is no consideration of how you might affect a student whom you erroneously accuse of cheating. As someone who was wrongly accused of cheating multiple times in college (and nearly expelled as a result), I can tell you it is an extremely stressful situation to go through.

      Your inability to understand the human component of education and your blind belief in your statistical process truly calls into question whether you should be in such a position of influence over young people who are seeking an education. A quick glance at your CV shows that you have no teaching credentials (only a Ph.D., which is a research degree). You would best serve your students by swallowing your pride and sense of self-infallibility, realizing that you too have something to learn, and understanding what it means to truly be a teacher.

      Hint: There is more to life than statistics.

    8. Having trouble following here. I guess you're aware that I did give 7 out of the 8 students the benefit of the doubt.

      I'm a bit at a loss how you conclude that: I treated them as if they were guilty; had no consideration of the effects of the "ordeal" on them; am unable to understand human components; and have blind faith in my statistics. I don't think any of these are true, but perhaps you can explain your reasoning.

    9. I'm surprised at how many people are trying to tear you down for having a guilty-until-proven-innocent attitude while teaching a college course - my guess is that most of these commenters are students... and all of the personal attacks are immature and irrelevant.

      I think this is extremely interesting and a very practical application of a powerful tool. It's by no means meant to be used as the sole determining factor for cheating, but allows for a quick analysis of multiple choice responses to aid in finding cheaters. You honestly didn't do anything I wouldn't do.

  14. Possible scenario: Two students study a lot together for the exam. Then of course they sit next to each other during the exam, because they are close friends, and arrive at the exam together. Their scores--including wrong answers--end up being very similar....because they studied together. You know, learned and mislearned the same stuff, in the same way.

  16. I would do it a bit more simply. Why not count the number of shared answers between students, then rank them based on the number of shared incorrect responses?

  17. Did this account for mismatched incorrect answers, or were all incorrect answers grouped together? I.e., if the correct answer was A and one person guessed B and the other guessed C, were they still lumped together as possibly colluding?

  18. Very interesting analysis - I really like the approach. Like others have mentioned (though I must say not very professionally or constructively), this does not produce a definitive answer as to which students are and are not cheating. However, that does not mean that this is without use - there is a lot of value in identifying students that are likely to cheat again. Semesters are likely to have multiple exams of a similar format, and applying this analysis consistently across exams will likely reveal the pervasive, consistent cheaters. One point may be coincidence, but having a single (or a few) students show up as outliers across exams isn't. At this point, it's up to the professor to make his or her best judgment call on approaching the student.

    This statistical application is not a silver bullet, but it certainly gives the professor some ammo.

  19. Yeah this professor is clearly a scumbag.

    1. Wow. Way to take this to an uncomfortable place. He did an interesting analysis and wanted to share that with us. I happen to think he overstepped a little, but with the numbers we're talking about it's EXTREMELY unlikely that the people he picked out weren't cheating. I think we're looking at someone who is frustrated with a decline in academic integrity and maybe overreacted a little.

      Certainly no reason to resort to name calling.

  20. Your cheating stats program is a reasonable first crack at the problem of catching cheaters but as many have pointed out, it lacks statistical rigor. A different department at our university than the one I teach for gives multiple choice question exams and uses a very advanced statistics program in their 1000+ student classes. The main issue with your graph is that when the number of identical right answers is very high, there is a high probability of a false accusation if you aren't careful with the analysis. Good students are supposed to get the right answers. What this other department does is solely compare the combination of identical wrong answers. To rule out that students can have common misunderstandings, it computes the likelihood of having that many identical wrong answers given the way the rest of the class performed. I don't have all the specifics of how their program runs, but this approach would give a much lower rate of false accusations at the expense of missing a few cheaters. For any program, one can compute a receiver operating characteristic (ROC) curve and quantify this tradeoff and pick a suitable operating point.

  21. This is only a way to find potential cheaters. This test would show you what a cheater's test would look like and allow you to narrow it down. It does not mean that everyone who falls into this statistical area is a cheater.

    This is an understanding I think any stats teacher should instantly have. Also, 40 questions means a very high number of shared correct answers, which directly increases your chances of a false positive.

    This is only statistically significant if you showed me a large population of students along with a high number of test questions and compared the classes to one another, showing that a particular class had offset numbers.

  22. First off, how do you distinguish between collusion and one person copying off another's test without their knowledge? While I think you may have identified some potential cheaters, the lack of a statistically rigorous approach when dealing with such a serious accusation with huge potential consequences is concerning. Two of the alleged cheating pairs had a high number of correct scores. This reduces the amount of information that can be extracted from the remaining incorrect scores. This is especially true if the assumption is that all incorrect responses on a question are equally likely to be selected. My days of multiple choice exams are long behind me, but I recall that it was usually easy to eliminate at least one answer on a multiple choice test and sometimes two. This has a potentially enormous effect on the probability of picking the same wrong answer. Rather than confronting the alleged cheaters, why not go back and run this analysis on previous tests or on future tests? Cheaters are likely to be repeat offenders and a pattern of suspicious results would make the case airtight. By jumping the gun you have at best scared the alleged cheaters and forced them to come up with less obvious cheating methods. At worst, you may have accused innocent students of a serious ethical and moral violation.

    1. There were many hundreds of pairs of students with 32 correct answers in common (the highest number for any of my pairs). Of those hundreds, there were _zero_ with 5 incorrect answers in common, _zero_ with 6 incorrect answers in common and _one_ pair with 7 incorrect answers in common. That sort of evidence is not as weak as you think.

      My response to that was to investigate it by talking to the students, which I believe I am officially _required_ to do when I believe I have evidence of serious misconduct. I don't think that was jumping the gun; since it was a final exam, waiting was not an option; and I'm not absolutely convinced that waiting and trying to entrap them would have even been ethical, let alone good.

      Running it on past exams would have been a good idea, and I have some memory of trying it: I think there was a problem with statistical power (many fewer questions on the midterms), but I also think I was running out of time and energy.

    2. Oh, and about the collusion. Was I not clear that I let 7 of 8 people off after one interview because I recognize that I can't make that distinction with confidence?

    3. In my view the strength of the evidence of cheating is unclear. There are multiple hidden assumptions whose impact on the probabilities that these could be chance correlations is unknown.

  23. I cannot believe someone would spend this much energy and brainpower to catch some students who cheated in an exam. Instead you could have used the same amount of energy to find out if your method of teaching is working for all of your students, or if the questions in the exam are good and fair, or some other useful things. Something that would help the current students and future students. Don't act so obsessive about student cheating in exams. They are not criminals; they are normal people under a lot of pressure, and sometimes they do stupid things, but that does not mean they deserve to be treated the way you were hoping for.

  24. It seems that by combining the spatial position of each student relative to the others (using a Self Organizing Map?) with the neighbor distance score for each pair of students, this method could maybe be automated to detect cheating on multiple-choice type exams (or at least parts of the exam; some other parts may consist of free-form entry, but then plagiarism detection could be used for these). That's very interesting, thank you for reporting this! If there's no academic paper on the subject, you should consider writing a short paper on it.

  25. Has any court of law passed judgment in these types of matters?

  26. Hi Jonathan, I used your method and code to detect some cheating in my course this semester. Details are provided here: http://rynesherman.com/blog/adventures-in-cheater-detection/ Thanks for your work on this!

  27. I'm assigned an online course. I just did this with one of my exams. There are far too many falling on the line of complete matching. Who would imagine students being dishonest when online and not proctored?

  31. Thank you very much for posting this! After I received an allegation that two students were cheating on my final exam, I adapted your analysis into Stata (since as an economist I never learned R). I jittered the points instead of using different-sized circles and created confidence regions based on the assumption of bivariate normality (which I know isn't right, but it's halfway-decent). Anyway, these students were right on the 99% confidence contour for the final exam, but when I ran the same analysis on the second midterm, they were insane outliers (out of 35 questions, 21 shared right answers and 12 shared wrong answers). If you believe the bivariate normality, they were at p < 1 in 10 trillion.

    And of course by doing this, I found a group of four students who cheated on all three exams, creating more work for me and the deans. Sigh.

    1. PS I already give four versions of exams. Now I'm going to move to versioning with assigned seats, any bags at the front of the lecture hall, etc, etc.

    2. Thanks for commenting! I don't think normality has much to do with it. I don't expect the data to follow the theory (and I've been much attacked by people who think I do expect that). But insane outliers are insane outliers, and it's very hard to approach that line by chance.

      It is discouraging to me that conscientiousness rarely seems to go unpunished, and also discouraging to have to police all of the students because of the actions of a few. But it's our duty to the good students and the system to be preventive when we can, and to catch the cheaters when we haven't been preventive enough.
