huh... A quick calculation of sqrt(67 * 67 - 65 * 65) ≈ 16mm means that if I assume
(a) the true PD was 65mm
(b) the erroneous 67mm was the diagonal resulting from making a purely vertical error
(c) I'm just as likely to make vertical errors as horizontal
then maybe I should have split the difference and said 66mm. Because it takes a lot of vertical error to change the outcome, whereas horizontal error in measuring the position of the pupil itself translates essentially one-for-one into error in my resulting measurement of PD.
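(Sanity-checking that arithmetic with a throwaway Python snippet, using the numbers above:)

    import math

    true_pd = 65.0     # assumed true PD, in mm
    measured = 67.0    # the suspiciously large reading, in mm

    # how big a purely vertical slip would have to be for a 65mm PD
    # to show up as a 67mm diagonal
    vertical_error = math.sqrt(measured**2 - true_pd**2)
    print(vertical_error)   # about 16.2mm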
In fact, let me do some elementary probability reasoning.
Suppose the measured pupil positions are independent Gaussians L and R where L has mean (-θ/2, 0) and variance σ^2 and R has mean (θ/2, 0) and variance also σ^2. I'm interested in the likelihood that |L - R|^2 = k^2. Notice L = X1 - E/2 and R = E/2 + X2 for E = (θ, 0) and two IID Gaussians X1 and X2, and we're interested in the likelihood that
|X1 - X2|^2 + θ^2 = k^2 (ignoring the cross term between X1 - X2 and E), in other words that |X1 - X2| = sqrt(k^2 - θ^2)
Alright, I know how to think about the difference of two Gaussian distributed variables in R^2 with mean zero, because it's the same thing as their sum. So I'm just talking about the likelihood that
X = sqrt(k^2 - θ^2)
where X is Gaussian with variance 2σ^2.
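(A quick simulation sketch with a made-up σ, just to reassure myself that each coordinate of X1 - X2 really does have variance 2σ^2:)

    import numpy as np

    rng = np.random.default_rng(0)
    sigma = 1.5   # made-up per-coordinate standard deviation, in mm

    # independent 2D measurement errors, one per pupil
    x1 = rng.normal(0.0, sigma, size=(100_000, 2))
    x2 = rng.normal(0.0, sigma, size=(100_000, 2))

    # each coordinate of the difference should have variance close to 2*sigma^2
    print((x1 - x2).var(axis=0), 2 * sigma**2)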
So the probability of sampling a PD of k when the true PD is θ and the pupil-measurement variance is σ^2 is
exp(-(1/2)(x/(sqrt(2)σ))^2) / (2σ sqrt(pi))   with x = sqrt(k^2 - θ^2)
= exp((θ^2 - k^2)/4σ^2) / (2σ sqrt(pi))
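(Checking that simplification numerically with made-up values of θ, k, and σ: the Gaussian density with variance 2σ^2, evaluated at x = sqrt(k^2 - θ^2), should match the expression above.)

    import numpy as np
    from scipy.stats import norm

    theta, k, sigma = 65.0, 67.0, 1.5   # made-up values, just to check the algebra
    x = np.sqrt(k**2 - theta**2)

    # density of a zero-mean Gaussian with variance 2*sigma^2, evaluated at x ...
    direct = norm.pdf(x, loc=0.0, scale=np.sqrt(2) * sigma)
    # ... versus the simplified expression
    simplified = np.exp((theta**2 - k**2) / (4 * sigma**2)) / (2 * sigma * np.sqrt(np.pi))
    print(direct, simplified)   # the two should agree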
For two independent samples k1 and k2 I'd get
exp((2θ^2 - k1^2 - k2^2)/4σ^2) / (2σ sqrt(pi))^2
and I want to maximize this with respect to θ and σ --- or, equivalently, maximize its log, which up to additive constants is
(2θ^2 - k1^2 - k2^2)/4σ^2 - 2 ln σ
When I differentiate by σ and set to zero I get
(k1^2 + k2^2) = 4σ^2 + 2θ^2
and when I differentiate by θ I get...
θ/σ^2 = 0
which appears absurd.
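(Double-checking that calculus with sympy, since at this point I don't trust myself:)

    import sympy as sp

    theta, sigma, k1, k2 = sp.symbols('theta sigma k1 k2', positive=True)

    # the two-sample log-likelihood from above, up to additive constants
    loglik = (2*theta**2 - k1**2 - k2**2) / (4*sigma**2) - 2*sp.log(sigma)

    # setting this derivative to zero reproduces k1^2 + k2^2 = 4σ^2 + 2θ^2
    print(sp.simplify(sp.diff(loglik, sigma)))
    # and this one simplifies to θ/σ^2, which is only zero at θ = 0
    print(sp.simplify(sp.diff(loglik, theta)))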
Waitaminnit. Back in that likelihood expression exp((θ^2 - k^2)/4σ^2) / (2σ sqrt(pi)) I can make the likelihood as big as I want by making θ arbitrarily big.
Oh! Because I forgot to include the constraint that I can't possibly noisily observe (as k) a PD smaller than the true PD (θ). So my likelihood as a function of θ is exp((θ^2 - k^2)/4σ^2) / (2σ sqrt(pi)) as long as θ ≤ k, and 0 otherwise. With both samples the likelihood is nonzero only when θ ≤ min(k1, k2), and on that range it only grows as θ grows, so it is maximized when θ is as big as allowed. So indeed
The max likelihood hypothesis is the minimum of the observed samples
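(And a quick numerical sanity check of that conclusion, plugging in my two readings and a few made-up noise levels:)

    import numpy as np

    def likelihood(k, theta, sigma):
        # approximate likelihood of observing PD k when the true PD is theta;
        # zero for theta > k, since in this model noise can only inflate the measurement
        if theta > k:
            return 0.0
        return np.exp((theta**2 - k**2) / (4 * sigma**2)) / (2 * sigma * np.sqrt(np.pi))

    k1, k2 = 65.0, 67.0                  # the two readings, in mm
    thetas = np.linspace(60, 70, 1001)   # candidate true PDs

    for sigma in (0.5, 1.0, 2.0, 5.0):   # made-up noise levels
        joint = [likelihood(k1, t, sigma) * likelihood(k2, t, sigma) for t in thetas]
        print(sigma, thetas[np.argmax(joint)])   # the argmax lands on 65.0 every time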
Hah! I expected the math to give me some weird answer in the neighborhood of 66mm, but it seems to be telling me (in the absence of any Bayesian priors, anyhow) that I did the "right" thing by going with 65mm.