Mathematics, biology and recreation

Friday, March 20, 2020

Sero-surveys

You can comment on the tweeted version of this article

It’s time for serious consideration of sero-surveys.

Lockdown cannot continue forever. Governments and societies will be facing tough choices about COVID soon. These choices should be made with as much information as possible.

A critical place where there are big gaps in the information puzzle is the infection-fatality rate (IFR). Population-level surveys, done with stratified sampling, have a huge upside potential in gaging the severity of COVID infection.

It’s important to distinguish between case-fatality and infection-fatality. Case-fatality is a proportion of individuals with identified COVID disease. Infection-fatality is a proportion of all individuals with infection.

Both disease and infection can be hard to define, but disease is typically much harder to pin down, with definitions and their applications changing with place and time. Both ratios are important, but it is IFR which is critical for planning.

Unless an effective vaccine is rapidly deployed, the majority of people around the globe will fight off coronavirus infection and develop antibodies within the next few years. We want to know the proportion of everyone that is likely to be suffer serious consequences, or die.

Now is a critical time. It is only now becoming possible to contemplate large-scale sero-surveys, and there is not much time left before we have to make hard choices about balancing between disease control and vital economic and social concerns.

Doing these surveys will not be easy. Effort spent gathering information about people infected in the past will inevitably take some attention away from those infected in the present or near future. Even were the long-term benefit more clear, this is not an easy choice. International cooperation will also be important. Places that already have large numbers of cases have more information to provide, but less resources to spare on gathering information for the future.

Nonetheless, gathering this information could be immeasureably valuable in planning. We need to weigh this value now.

Wednesday, September 27, 2017

1/19

I was so excited the other night when I realized I had figured out something new and interesting about the decimal expansion of 1/19. Something I hadn’t noticed, despite the fact that 1/19 has been one of my favorite decimal expansions for decades.

Powers of 5

I’ll start at the beginning. Repeating decimals tend to have all sorts of cool patterns and properties. A nice simple one about 1/19 is that you can generate it from the powers of 5. More or less.

1/19 is 5/95 = 5/(100-5) ≡ 5/(H-5), where H=100. If we define h=1/H and multiply, then:

1/19 = 5h/(1-5h) = 5 + 5²h + 5³ h² + …

We can try to write that out as:

. 05 25 125 625 3125 … .

Each block here represents a pair of decimal digits, so it’s clear that something is wrong: 125 and later entries don’t fit. We could arrange them to overlap properly and add:

. 05 26 31 56 … .

But that’s pretty unwieldy, and just gets worse and worse.

The better way is “pre-emptive carrying”. Instead of saying “5×5 = 25”, we say “5×5 + the 1 that we know we’re going to carry in a moment = 26” – and then just carry on as if we always had a 26. Then we multiply 26×5=130, give back the 1 at the beginning, and borrow the 1 that we’re about to generate (when we multiply by 5 the next time) to get 31. Pretty wild, but it always works:

1/19 = 
.
05        (×5 + 1 = 26) (the one is pre-emptively carried from 131)
26        (×5 + 1 = 131) (the one is pre-emptively carried from 157)
31        (×5 + 2 = 157) (the two is pre-emptively carried from 289)
57        (×5 + 4 = 289) …
89        (×5 + 2 = 447)
47        (×5 + 1 = 236)
36        (×5 + 4 = 184)
84        (×5 + 1 = 421)
21        (×5 + 0 = 105)
05
…

Checking

We can tell it’s right because it repeats with the right period (it has to be 18 or a factor of 18, more on that later (probably not, unless someone asks me)), and because it produces the same answer as the other crazy methods we’re going to try later.

We can also tell that it’s right using bc

> echo "scale=24; 1/19" | bc -l | perl -pe "s/[0-9]{2}/ $&/g"
. 05 26 31 57 89 47 36 84 21 05 26 31

Bizarrely, we can’t tell it’s right using R:

> print(1/19, digits=20)
[1] 0.052631578947368418131

R is perfectly willing to give you up to 22 digits, but only seems to get the first 16 right, ever.

Another power example

Another cool example of pre-emptive carrying, and this sort of series, is 1/49. Try it yourself. This code gives a nice-looking version of the answer, for comparison (or for the lazy)

echo "scale=44; 1/49" | bc -l | perl -pe "s/[0-9]{2}/ $&/g"

Dividing by 4

Anyway, that’s not the cool part. I’ve known that forever, thanks largely to Tim Koch.

Another nice way to derive the expansion for 1/19 is by expanding 1/19=21/399.

1/19 = 21/(4H-1) = 21h/(4-h) = 21/4 h + 21/4² h² + …

It would be a huge mess to do this just by blocks, something like:

. 0525 013125 00328125 …

But we can do it neatly with another trick: putting off the remainder. 21/4 = 5, with remainder 1, so we write 05 and then save the 1. For the next step, we put the 1 in front of 05 and 105/4 = 26r1. And so on.

1/19 = 
.         21/4 = 5r1
05        105/4 = 26r1
26        126/4 = 31r2 (the 1 is the remainder from the line above)
31        231/4 = 57r3 (the 2 is the remainder from the line above)
57        357/4 = 89r1 …
89        189/4 = 47r1
47        147/4 = 36r3
36        336/4 = 84r0
84        084/4 = 21r0
21        021/4 = 05r1
05
…

Fibonacci style

Same answer. What are the odds‽

But still not the part that got me excited recently. I figured that out decades ago as well.

The reason I’m excited, is that I was recently thinking about this conversation from 1999 (of course). It talks about how:

1/89 = 1/(T² - T - 1) (where T=ten), has an expansion that follows the rules of the Fibonacci numbers(each number is based on the sum of the two before it). So that’s cool, but 1/89 is kind of a long expansion, and expanding things in groups of 1 (instead of 2) decimals feels a bit ticky. By the way, there’s also a one-digit way to do 1/19: since 19=2T-1, you can expand it by dividing each digit by two (exactly parallel to dividing pairs of digits by 4).

So the other day, in the shower, I was wondering about 9899 = H² - H -1, and whether it has any nice factors. Imagine my surprise when I found out that 1/19 = 521/9899! That means that we can write the expansion 1/19 using Fibonacci rules.

1/19 = 
.         start with the 5 from 521
05        (+ 21 = 26) (the 21 is from 521)
26        (+ 5 = 31)
31        (+ 26 = 57)
57        (+ 31 + 1 = 89) (the one is pre-emptively carried from the next sum)
89        (+ 57 + 1 = 147) …
47        (+ 89 = 136)
36        (+ 47 + 1 = 84)
84        (+ 36 + 1 = 121)
21        (+ 84 = 105)
05
…

Yet another fun way to generate the same decimal! I can die happy now.

Monday, January 30, 2017

Pythagorean triples plot

Talking with David Earn and Ben Bolker about simple, mathy programs that make nice pictures (for pedagogical reasons).

David suggested plotting Pythagorean triples. He said to reduce them to rationals, but I couldn’t figure out a good way to do that (the natural way just produces points on a circle), so I stuck with the integers.

This code plots (a, b) from Pythagorean triples

a^{2} + b^{2} = c^{2}

, for

c \leq 50

. The circles get bigger at rate

\sqrt{c}

(chosen arbitrarily and kind of looks nice). The box goes from (-50, 50) on each axis. It has some nice patterns.
Plot of points from Pythagorean triples

pdf version. (Looks nice, embiggens nicely.)

There are two natural ways to interpret these points: as pairs of integers, or (cooler) as complex integers (see also my post about factoring in the complex plane). There are three natural ways to think of the points that are not on the axes, and three corresponding ways to decide which points to plot on the axes:

If we think of the points as representing Pythagorean triples, in the classic sense, the axes should have no points. This is the first way I made the plot, because that was what was suggested.
If we think of the points as representing points on the integer lattice that are an integral distance from the origin, the points off the axes remain unchanged, but we should put a point at each lattice point on the axes.
Finally, if we think of the points as representing square numbers in the complex plane, we should put points at square locations on one of the axes (conventionally, the horizontal axis), but not the other.

Now that I write it down, I realize I didn’t actually do any of those things! Sorry! Feel free to tweak the code, but this way looks nice. I simply extended a standard Pythagorean triple definition to not exclude 0 (I kind of thought that matched way 3, but now it doesn’t.)

Based partly on suggestions from Ethan Bolker, I colored the points:

black for “imprimitive” triples (ones which can be generated as integer multiples of other triples)
blue for primitive triples which can be factored into other triples in the complex plane
red for triples which are prime even in the complex plane (ie., they represent complex integers which are squares of primes)

Now arise more questions about how to color points on the axes. But I have to get this post off my desk before I go crazy!

Wednesday, March 4, 2015

Posted this on reddit, but I don't think it got "red"

Man. My blog is only ever read by people I know, and I certainly would have been more careful if I'd had any idea this was going to blow up. Friends have suggested I should clarify some points:

* Nobody suffered permanent marks on their transcript or lost marks from this

* One person lost a bunch of time retaking the course, the others had to go through an ordeal being grilled by me and were then told that I had decided not to follow up.

* I did not enjoy the grilling. Didn't really enjoy any of it after the amazing seating discovery (which may be one of the reasons it took me many years to post).

* It is not at all clear that three or four of these people were innocent: on the occasions where we've observed cheating in person, it has been collaborative.

* I followed up according to my understanding of the McMaster rules. If I find evidence of dishonesty, I'm expected to report it, investigate it, and then report my findings.

The reason the part about grilling them was written cavalierly is that, until now, I've mostly been criticized (though lightly) for letting 7 of the 8 off scot free. Had I expected this to reach a wide audience, I would have written more carefully.

What else? In this test, I think people were broken into big blocks by last name, but were then free to sit where they wanted. The questions were MC questions (often tricky), and I wouldn't expect a large fraction of common wrong answers from people who studied together. It's hard to be sure, though, which is one of the reasons I looked for outliers, instead of using statistics, and then verified with the seating chart. I honestly hadn't thought about the correlation between sitting together and studying together, but if any of the eight people had offered that as a defense, I would have.

Tuesday, February 17, 2015

Finding cheaters using multiple-choice comparisons

Summary

An interesting method by which I found out that people were cheating on my final exam.

Background

I use different versions of midterm examinations to discourage cheating in my population biology class (~200 students). When the course started, I used to do the same thing for the final exam, but it was a little more complicated, because the final exam is administered by the registrar's office, not by me and my teaching team.
At some point, somebody advised me not to bother with versions: the registrar's office is supposed to be professional about administration, and they usually mix people who are taking different exams in the same room, so I stopped bothering with different versions for the final exam for a year or two. I do it again now, and you'll see why.

The incident

In the year in question, my exam was given in two separate medium-sized rooms. My class was alone in these two rooms. I received a report from the invigilators in Room 1 about suspicious behaviour. They had warned a couple of students for acting strangely, and then warned them again. They weren't prepared to say that they were sure that the students were cheating, but wanted me to compare their answer slates. In retrospect, they should have left the students alone until they were ready to sign a complaint against them (or until they had cheated enough to have it proved against them).

My response

The final is entirely multiple choice. I got the results files from the scantron office. I figured that I wouldn't quite know what to do with a comparison just between these two kids (unless the tests were identical), and that it would be just about as easy (and far more informative) to compare everybody to everybody else. It's still kind of hard for me to get used to the fact that we have computers now and can really do stuff like this. I calculated the number of identical right answers and the number of identical wrong answers for each pair of students (~18K pairs), and plotted it out.
$(cplot.Rout-0.png)$
The line corresponds to forty total shared answers (two students having identical test papers). This did not happen. But there were four points near the line that looked like clear outliers to me:
$(cplot.Rout-1.png)$

The follow up

I wasn't sure what to do next, but the registrar's office knew. They make seating maps during exams. They didn't offer to help out, but I was allowed to go and examine the maps.
The results were amazing.

All four of the identified pairs were seated adjacent (three pairs were side by side, and the fourth pair had one student behind the other). The probability that this might have happened by chance is beyond ridiculous.

None of the four identified pairs were seated in the room where the alert invigilators hassled the pair of cheaters. This might have been by chance, but I doubt it. Likely the invigilators in the other room were visibly less alert.

I talked to the academic integrity office, and various experts, and figured out that it really was impossible to be sure who had cheated in the side-by-side pairs. I did put all 6 of them through a bit of an ordeal, though, and at least half of them deserved it. I was also unable to convict the person in front of the front-back pair (although it's hard to see how that one would have worked without collusion). The person in the back of the front-back pair denied all knowledge, but received a zero for the exam grade plus a confidential, temporary notation of my finding at the integrity office (the strongest punishment I was allowed to give). They promised to fight it, but never did.

Postscript

I now use versioning, but I'm starting to discover that this does not necessarily prevent cheating, either. I may have more adventures to report, soon.
I definitely get the feeling that the person I caught cheated their way through Mac. The initial response to my call was pretty relaxed. They did get an F in my class (I couldn't give an automatic F for the class, but the exam zero was sufficient). They retook the class and passed, expunging the F, and graduating presumably with a clean record.
I have heard a lot of anecdotal reports of people dealing with cheating informally (or not at all). It's kind of depressing. My impression is that Mac has a cheating problem, and we need to fight back.

Code

The code used to produce these plots in R is shown here.

Monday, December 8, 2014

Third Aunt

If I tell you that Corinthia is my second cousin, once removed, there are two major problems:

You probably don't know what the heck that means

If you do know, you still don't know whether she is one generation older, or one generation younger.

This nomenclature has been around for a long time, and it's just not working. So we need a new system. From now on, we'll all use the system outlined below.

Thursday, December 4, 2014

Factoring integers using the complex plane

Please see my wiki for a comprehensible version of this post.

I was pretty pleased that I factored a recent Composite of the day entirely in the complex plane.
The number was 9509, which I noticed immediately is 97²+10². Since I know Fermat's little theorem, and I know that the Composite of the day is composite, I knew there should be another way to write it as the sum of two squares. A little bit of counting (100+193+191) showed that it is also equal to 95²+22².

For some reason, I know that if I use those two summations to write complex integers with modulus 9509, their greatest common factor will also divide 9509.
So I said, (97+10i) - (95+22i) = 2-12i. The modulus of that is 2²+12²=148. The factors of 2 must be irrelevant (since the Cotd is odd), so 37 should be the number we're looking for.

Similarly, (97-10i) - (95+22i) = 2-32i. The modulus of that is 1048. Again, discarding factors of 2, we're left with the prime 257.

And the number is now factored, by finding complex integers with the right modulus and manuipulating them in the complex plane.

Pretty wild, huh?