March 25th, 2010

beartato phd

(no subject)

Had a good meeting, discussed various issues about the paper. This Metropolis-Hastings puzzle is still bugging me, but I really don't know how to make any progress on it for the time being, so I guess I'll let it go.

Wrote some code to actually implement the private k-means algorithm I found in Blum-Dwork-McSherry-Nissim's SuLQ paper. Turns out for clusters that have on the order of 10,000 people, ε=0.05 seems to leave you with enough signal-to-noise that you can find centers pretty well --- at least if you only need a few rounds of k-means. I only did 2 for my little pile of decently-well-separated synthetic Gaussian-mixture data.

Come to think of it, k-means is another iterative-approximation sort of algorithm where unlimited repetition seems like it ought to be completely acceptable from a privacy point of view, but it isn't obvious how to prove it, much less make that proof manifest in a typing derivation.