*come up* with an efficient M-H-like sampling algorithm that actually *is* differentially private.

The counterexample is interesting, though. In plain English, the take-home moral is this:

A long long road that is just a *tiny* bit bumpier than you expected can make you arbitrarily unlikely to get where you're going on time. This is basically because you're slowed down by that tiny bit the entire time, and so your chance of arriving on exactly the right day (if the trip is a year long) might be dramatically lowered if your speed is cut down by, say, a factor of 364/365. If you thought you had a good chance of getting there on time before, you still have a good chance of getting there no more than a day late... but maybe a really bad chance of getting there on time. And if getting there at just the right hour, minute, and second really matters, then you're totally screwed.

Here's the real deal:

Say you're trying to do sampling from some large finite subset [0,2N] of the integers. You assume, as usual for M-H, that the probability of picking integer x is supposed to be proportional to e^-V(x) for some "potential function" V, so that things with more potential energy are less likely to be picked. Take V(x) to be 0 on all the even integers x=2n, and say it's some constant k on all the odd integers x=2n+1. This is our "bumpy road". The steady-state distribution will have just as much probability mass at 2N as it does at 0, but the chain will have a hard time getting there quickly --- more to the point, the difficulty is very, very sensitive to the bumpiness k of the road.
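As a quick sanity check on the setup (just a sketch, with N and k picked arbitrarily for illustration), the stationary distribution pi(x) ∝ e^-V(x) really does put equal mass on the two endpoints 0 and 2N, since both are even:

```python
import numpy as np

# Illustrative parameters (not from the argument itself).
N = 5          # state space is {0, 1, ..., 2N}
k = 2.0        # height of the "bumps" on the odd integers

x = np.arange(2 * N + 1)
V = np.where(x % 2 == 0, 0.0, k)   # potential: 0 on evens, k on odds

# Stationary distribution of an M-H chain targeting pi(x) ∝ exp(-V(x)).
pi = np.exp(-V)
pi /= pi.sum()

print(pi[0], pi[2 * N])   # equal: the endpoints carry the same mass
```

So the destination is just as likely as the starting point in the long run; the whole problem is how long it takes to get there.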

For suppose our proposal function in M-H just flips a coin between "try to go left (if you can)" and "try to go right (if you can)", and we initialize with all the probability mass at point 0. Running M-H will cause a wave of probability mass to creep to the right. Let's look at its leading edge. After one time step, half of the mass will propose to climb up the k-potential hill to get to point 1. Only a fraction exp(-k) of it will make it, for a total of (1/2)exp(-k). On time step 2, half of *these* will propose to go right again, and all of them will succeed, falling easily down the potential slope. So a total of (1/4)exp(-k) is at point 2 after two steps of the algorithm. After the next two steps, the mass will have to climb up (and fall down) another potential hill, leaving us with (1/16)exp(-2k) at point 4 after 4 steps. Inductively, after 2n steps, we'll see (1/2^{2n})exp(-nk) mass at the 2n^th point.
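This leading-edge calculation is easy to check numerically. Here's a sketch that builds the exact transition matrix for the chain just described (the boundary convention, that a blocked or rejected proposal stays put, is an assumption on my part) and propagates the mass forward:

```python
import numpy as np

N, k = 50, 2.0
S = 2 * N + 1
V = np.where(np.arange(S) % 2 == 0, 0.0, k)

# Transition matrix: propose left/right with prob. 1/2 each,
# accept with the M-H probability min(1, exp(V(x) - V(y))).
P = np.zeros((S, S))
for x in range(S):
    for y in (x - 1, x + 1):
        if 0 <= y < S:
            P[x, y] = 0.5 * min(1.0, np.exp(V[x] - V[y]))
    P[x, x] = 1.0 - P[x].sum()   # blocked/rejected proposals stay put

# All mass starts at point 0; run 2n steps of the chain.
p = np.zeros(S)
p[0] = 1.0
n = 4
for _ in range(2 * n):
    p = p @ P

# The only way to be at point 2n after 2n steps is to move right every
# step: 2n coin flips and n uphill acceptances, i.e. (1/2^{2n}) exp(-nk).
print(p[2 * n], (0.5 ** (2 * n)) * np.exp(-n * k))   # the two agree
```

The agreement is exact, not approximate, because the rightmost occupied point can only have been reached by the single monotone path.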

But this is terrible (for my conjecture about the sensitivity of M-H, at least)! If we increase k by just a little bit Δk, then the probability of travelling 2n points to the right in 2n steps shrinks by a factor of exp(-nΔk). Sure, *eventually* the algorithm will converge on a distribution that is only 2-sensitive in the potential function, but in the meantime it is n-sensitive, and in principle allows violating the privacy of anyone whose data went into constructing that potential.
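To put numbers on that last point (a sketch; the perturbation Δk and the privacy budget ε below are arbitrary illustrative values): the privacy loss at the leading-edge outcome is the log of the likelihood ratio between the two chains, which is n·Δk, so it grows linearly and eventually exceeds any fixed budget.

```python
# Bumping the potential by dk multiplies the probability of landing at
# point 2n after 2n steps by exp(-n * dk), so the privacy loss (log of
# the likelihood ratio at that outcome) is n * dk.
dk = 0.01       # small perturbation of the bump height k (illustrative)
epsilon = 0.5   # a differential-privacy budget (illustrative)

for n in (10, 100, 1000):
    privacy_loss = n * dk
    print(n, privacy_loss, privacy_loss > epsilon)
```

No matter how small Δk is, once n exceeds ε/Δk the finite-time output distribution blows through the budget.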