Wednesday, March 4, 2015

Posted this on reddit, but I don't think it got "red"

Man. My blog is only ever read by people I know, and I certainly would have been more careful if I'd had any idea this was going to blow up. Friends have suggested I should clarify some points:

* Nobody suffered permanent marks on their transcript or lost marks from this

* One person lost a bunch of time retaking the course; the others had to go through the ordeal of being grilled by me and were then told that I had decided not to follow up.

* I did not enjoy the grilling. Didn't really enjoy any of it after the amazing seating discovery (which may be one of the reasons it took me many years to post).

* It is not at all clear that three or four of these people were innocent: on the occasions where we've observed cheating in person, it has been collaborative.

* I followed up according to my understanding of the McMaster rules. If I find evidence of dishonesty, I'm expected to report it, investigate it, and then report my findings.

The part about grilling them was written cavalierly because, until now, I've mostly been criticized (though lightly) for letting 7 of the 8 off scot-free. Had I expected this to reach a wide audience, I would have written more carefully.

What else? In this test, I think people were broken into big blocks by last name, but were then free to sit where they wanted. The questions were MC questions (often tricky), and I wouldn't expect a large fraction of common wrong answers from people who studied together. It's hard to be sure, though, which is one of the reasons I looked for outliers, instead of using statistics, and then verified with the seating chart. I honestly hadn't thought about the correlation between sitting together and studying together, but if any of the eight people had offered that as a defense, I would have considered it.

Tuesday, February 17, 2015

Finding cheaters using multiple-choice comparisons

Summary

An interesting method by which I found out that people were cheating on my final exam.

Background

I use different versions of midterm examinations to discourage cheating in my population biology class (~200 students). When the course started, I used to do the same thing for the final exam, but it was a little more complicated, because the final exam is administered by the registrar's office, not by me and my teaching team.
At some point, somebody advised me not to bother with versions: the registrar's office is supposed to be professional about administration, and they usually mix people who are taking different exams in the same room, so I stopped bothering with different versions for the final exam for a year or two. I do it again now, and you'll see why.

The incident

In the year in question, my exam was given in two separate medium-sized rooms. My class was alone in these two rooms. I received a report from the invigilators in Room 1 about suspicious behaviour. They had warned a couple of students for acting strangely, and then warned them again. They weren't prepared to say that they were sure that the students were cheating, but wanted me to compare their answer slates. In retrospect, they should have left the students alone until they were ready to sign a complaint against them (or until they had cheated enough to have it proved against them).

My response

The final is entirely multiple choice. I got the results files from the scantron office. I figured that I wouldn't quite know what to do with a comparison just between these two kids (unless the tests were identical), and that it would be just about as easy (and far more informative) to compare everybody to everybody else. It's still kind of hard for me to get used to the fact that we have computers now and can really do stuff like this. I calculated the number of identical right answers and the number of identical wrong answers for each pair of students (~18K pairs), and plotted it out.
(cplot.Rout-0.png)
The line corresponds to forty total shared answers (two students having identical test papers). This did not happen. But there were four points near the line that looked like clear outliers to me:
(cplot.Rout-1.png)
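
The all-pairs comparison can be sketched in a few lines of Python. This is only an illustration, not the R code behind the plots: the function name `shared_answer_counts`, the student labels, and the four-question toy data are all made up, and blank responses are not treated specially.

```python
from itertools import combinations


def shared_answer_counts(answers, answer_key):
    """Count identical right and identical wrong answers for every pair.

    answers: dict mapping a student label to a string of responses,
             one character per question (hypothetical format)
    answer_key: string of correct responses, same length
    Returns a dict mapping (student_a, student_b) to (shared_right, shared_wrong).
    """
    counts = {}
    for (a, resp_a), (b, resp_b) in combinations(sorted(answers.items()), 2):
        shared_right = 0
        shared_wrong = 0
        for x, y, correct in zip(resp_a, resp_b, answer_key):
            if x != y:
                continue  # the two students answered differently
            if x == correct:
                shared_right += 1
            else:
                shared_wrong += 1
        counts[(a, b)] = (shared_right, shared_wrong)
    return counts


# Hypothetical toy data: 3 students, 4 questions (not from the real exam).
answer_key = "ABCD"
answers = {"s1": "ABCA", "s2": "ABCA", "s3": "ABDD"}
counts = shared_answer_counts(answers, answer_key)

# The pair with the most shared wrong answers is the one worth a closer look.
most_suspicious = max(counts, key=lambda pair: counts[pair][1])
print(most_suspicious, counts[most_suspicious])
```

With n students there are n(n-1)/2 pairs, so a class of a couple of hundred gives tens of thousands of points to plot, each one a (shared right, shared wrong) count.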

The follow up

I wasn't sure what to do next, but the registrar's office knew. They make seating maps during exams. They didn't offer to help out, but I was allowed to go and examine the maps.
The results were amazing.
  • All four of the identified pairs were seated adjacent (three pairs were side by side, and the fourth pair had one student behind the other). The probability that this might have happened by chance is beyond ridiculous.
  • None of the four identified pairs were seated in the room where the alert invigilators hassled the pair of cheaters. This might have been by chance, but I doubt it. Likely the invigilators in the other room were visibly less alert.
I talked to the academic integrity office, and various experts, and figured out that it really was impossible to be sure who had cheated in the side-by-side pairs. I did put all 6 of them through a bit of an ordeal, though, and at least half of them deserved it. I was also unable to convict the person in front of the front-back pair (although it's hard to see how that one would have worked without collusion). The person in the back of the front-back pair denied all knowledge, but received a zero for the exam grade plus a confidential, temporary notation of my finding at the integrity office (the strongest punishment I was allowed to give). They promised to fight it, but never did.

Postscript

I now use versioning, but I'm starting to discover that this does not necessarily prevent cheating, either. I may have more adventures to report, soon.
   I definitely get the feeling that the person I caught cheated their way through Mac. The initial response to my call was pretty relaxed. They did get an F in my class (I couldn't give an automatic F for the class, but the exam zero was sufficient). They retook the class and passed, expunging the F, and graduating presumably with a clean record.
   I have heard a lot of anecdotal reports of people dealing with cheating informally (or not at all). It's kind of depressing. My impression is that Mac has a cheating problem, and we need to fight back.

Code


The code used to produce these plots in R is shown here.

Monday, December 8, 2014

Third Aunt

If I tell you that Corinthia is my second cousin, once removed, there are two major problems:
  • You probably don't know what the heck that means
  • If you do know, you still don't know whether she is one generation older, or one generation younger.
This nomenclature has been around for a long time, and it's just not working. So we need a new system. From now on, we'll all use the system outlined below.

Thursday, December 4, 2014

Factoring integers using the complex plane

Please see my wiki for a comprehensible version of this post.

I was pretty pleased that I factored a recent Composite of the day entirely in the complex plane.
The number was 9509, which I noticed immediately is 97²+10². Since I know Fermat's theorem on sums of two squares (a prime has at most one such representation), and I know that the Composite of the day is composite, I knew there should be another way to write it as the sum of two squares. A little bit of counting (100+193+191) showed that it is also equal to 95²+22².

For some reason, I know that if I use those two summations to write complex integers with norm (squared modulus) 9509, their greatest common factor will also divide 9509.
So I said, (97+10i) - (95+22i) = 2-12i. The norm of that is 2²+12²=148. The factors of 2 must be irrelevant (since the Cotd is odd), so 37 should be the number we're looking for.

Similarly, (97-10i) - (95+22i) = 2-32i. The norm of that is 2²+32²=1028. Again, discarding factors of 2, we're left with the prime 257.

And the number is now factored (37 × 257 = 9509), by finding complex integers with the right norm and manipulating them in the complex plane.
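
The arithmetic for this one number is easy to check by machine. The sketch below represents each complex integer as a plain `(re, im)` pair and takes the sum of the squares of the two parts; the helper names are made up for illustration, and this is a check of the 9509 example only, not a general-purpose factoring routine.

```python
def norm(z):
    """Sum of squares of the real and imaginary parts of an (re, im) pair."""
    re, im = z
    return re * re + im * im


def subtract(z, w):
    """Difference of two complex integers given as (re, im) pairs."""
    return (z[0] - w[0], z[1] - w[1])


def conjugate(z):
    """Complex conjugate of an (re, im) pair."""
    return (z[0], -z[1])


def odd_part(n):
    """Strip all factors of 2 from a positive integer."""
    while n % 2 == 0:
        n //= 2
    return n


n = 9509
z1 = (97, 10)  # 97² + 10² = 9509
z2 = (95, 22)  # 95² + 22² = 9509
assert norm(z1) == n and norm(z2) == n

first = odd_part(norm(subtract(z1, z2)))              # from 2 - 12i
second = odd_part(norm(subtract(conjugate(z1), z2)))  # from 2 - 32i
print(first, second, first * second == n)  # prints: 37 257 True
```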


Pretty wild, huh?

Tuesday, July 22, 2014

Geostationary orbits

I was thinking about gravity and space elevators today. They still confuse me. So I decided to see if I could work out the height of a geostationary orbit in my head, while walking to school. This despite the fact that I don't remember any of the classical mechanics stuff I learned in college, but with the powerful ally of ... dimensional analysis, which is approximately the coolest thing ever. I got it badly wrong, and later figured out why.

What we know

I chose to start with g = 10 m/s² and r = 6.4 Mm (instead of, say, 9.8 and 6) because I thought I would want to take the square root of their product.

C = 14 ks is the characteristic time of the Earth's rotation (how long it takes a point on the surface to travel the length of the radius, which is one day divided by 2π). In my experience, the characteristic time (not the period) tends to be the quantity that gives you the right answer in dimensional analysis.

The simple answer

We have too much information for a good dimensional analysis (too many ways of combining our quantities to get the right units). But there does seem to be a natural, straightforward way to do it.
√(gr) = 8 km/s is a speed. This should have something to do with something. Multiply by 14 ks to get 112 Mm. This seems way too high, though, so I should think this through more carefully.

I wonder if 8km/s is orbital velocity or escape velocity and the dimensional analysis discovered that by accident.

The right answer

If we're moving away from the surface of the Earth, we have to respect our knowledge that gravity goes as 1/r² to make use of g, so we need to construct K = gr². Translating the prefixes back to km gives us one extra 1000, so we have 400,000 km³/s².
The radius of the earth doesn't really directly affect the orbit. In fact, I used it only because I know it, and I don't know the mass of the Earth or the gravitational constant G. This means that the right answer must be made from K and C, which means in turn that there's only one way to do it: (KC²)^(1/3).

This is a pain to calculate mentally.

The computer claims it's 43Mm, which still seems way too high, so what's up?

Checking Wikipedia, it turns out that what's up is the geostationary orbit, which really is 42 Mm above the center of the Earth (or 36 Mm above your head). Which is wild, but score one at least for dimensional analysis.
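
For anyone who would rather not do the cube root mentally, here is a minimal Python sketch of both estimates. It uses the rounded inputs from the post (g = 10 m/s², r = 6.4 Mm, C = 14 ks); the variable names are made up for illustration. The combination (KC²)^(1/3) is also what you get by setting the centripetal acceleration R/C² equal to the gravitational acceleration K/R².

```python
# Rounded inputs, in SI units
g = 10.0   # m/s^2, surface gravity
r = 6.4e6  # m, radius of the Earth
C = 14e3   # s, characteristic time of rotation (about one day / (2*pi))

# "Simple answer": a speed times a time
naive = (g * r) ** 0.5 * C

# "Right answer": K = g*r^2 stands in for G*M, then combine with C
K = g * r ** 2
orbit_radius = (K * C ** 2) ** (1 / 3)
altitude = orbit_radius - r

print(f"naive estimate: {naive / 1e6:.0f} Mm")
print(f"orbit radius:   {orbit_radius / 1e6:.0f} Mm from the center")
print(f"altitude:       {altitude / 1e6:.0f} Mm above the surface")
```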




This post was developed on WorkingWiki at Geostationary orbits. The version there may be newer (or have better links).

Monday, July 14, 2014

Soccer mania!

I obviously need to work on my blogging skills.

It looks like nobody figured out my Sucker Bet question, but I think a lot of people glanced at it (and the early comments), and thought that they had.  Of course, it may be that somebody figured it out after I posted the answer, because after all what would you say if that happened? Nonetheless, it's worth another glance, IMHO.

In honor of the World Cup, I'm posting this "self-generating puzzle". How many Group B results can be worked out from the information on this page alone? It's a lot of fun to read the sports pages and find phrases or tables that work as puzzles by themselves, although I rarely have time anymore.

Here's another self-generating sports puzzle. Making reasonable assumptions about how sportswriters write, how many results can be inferred from the following sentence?

After last night's win, the Broncos have won 2 of their last 3 games, and 5 of their last 7.

I don't think I am the one who discovered this puzzle, but I was unable to figure out who did by searching usenet archives.

Tuesday, July 8, 2014

Sucker bet

Walt sent me this puzzle (reworded from this set of great puzzles from Communications of the ACM; you can get access through a university or library).

Alice and Bob roll 2 standard 6-sided dice, note their sum, and repeat. Alice wins if a 7 is rolled, and then followed immediately by another 7. Bob wins if an 8 is followed immediately by a 7. They continue rolling until somebody wins. Who has the better odds of winning?

Of course the answer is the non-intuitive one. Can you figure out why? As a person with a long-time fondness for craps (I know lots of people who are no good at probability, except when it involves two dice), this seems to me like the ultimate sucker bet to offer someone.

More discussion (and a link to three ways to think about the answer) on JD's notebook.
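
If you would rather check your answer than take anyone's word for it, the game can be treated as a small Markov chain with three states: "fresh" (the last roll was neither 7 nor 8), "after a 7", and "after an 8". The sketch below solves for Alice's chance exactly using fractions; the function names are made up for illustration, and it is one possible way to set the problem up rather than the method from the linked discussion.

```python
from fractions import Fraction


def two_dice_probability(total):
    """Probability that two fair six-sided dice sum to `total`."""
    ways = sum(1 for a in range(1, 7) for b in range(1, 7) if a + b == total)
    return Fraction(ways, 36)


def alice_win_probability():
    """Alice's chance of winning, starting from a fresh state."""
    p7 = two_dice_probability(7)
    p8 = two_dice_probability(8)
    other = 1 - p7 - p8
    # Unknowns: s, a7, a8 = Alice's chance from "fresh", "after 7", "after 8".
    #   after 8:  a8 = p8*a8 + other*s        (a 7 here means Bob wins)
    #   after 7:  a7 = p7 + p8*a8 + other*s   (a 7 here means Alice wins)
    #   fresh:    s  = p7*a7 + p8*a8 + other*s
    k = other / (1 - p8)  # from the first equation, a8 = k*s
    # The second equation reduces to a7 = p7 + a8 = p7 + k*s.
    # Substitute both into the third equation and solve for s.
    return p7 * p7 / (1 - other - (p7 + p8) * k)


alice = alice_win_probability()
bob = 1 - alice
print("Alice:", alice, "Bob:", bob)
```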