Notes from a Medium-Sized Island


[Nov. 14th, 2004|11:17 am]
Some thoughts that came out of considering the problem we're probably doing for the Machine Learning project:

Consider Metropolis-Hastings estimation of the expectation of f(x) over a probability density function p(x). I'll write this as M(f(x),p(x)). This is, modulo some concerns about ergodicity and the proposal function, equal to the true expectation E(f). If p is very "flat" and f is very "peaked" (i.e. f is very close to zero over most of the support of p, and very big in magnitude in a small area), markovianly bouncing around p may only infrequently find the (very high, narrow) peak of f. I'd like to shift the probability function to home in on where f is big, to be sure to sample it. So I reason (letting g(x) = sqrt(f(x)), which requires f to be nonnegative):

∫ f(x)p(x) dx = ∫ g(x) g(x)p(x) dx
= (∫ g(y)p(y) dy) ∫ g(x) (g(x)p(x) / ∫ g(y)p(y) dy) dx
= M(g(y),p(y)) M(g(x), g(x)p(x))

(The last step holds since I can throw away constant factors in the probability density function when doing Metropolis-Hastings.) Now I'm left with M(g(y),p(y)), which is just as vulnerable to g's peakedness (g, being the square root of f, is almost as peaked), and M(g(x), g(x)p(x)), which is this magically smarter M-H process that knows to do a sort of probabilistic gradient ascent on g. If I make the approximation that E(g)^2 = E(f), which I think is reasonable as long as f's effective support is pretty small relative to how fast p changes (and I assumed p was pretty flat, so I should be okay), I get

E(f) = M(g(y),p(y)) M(g(x), g(x)p(x))
E(f) = E(g)M(g(x), g(x)p(x))
E(g)^2 ~= E(g)M(g(x),g(x)p(x))
E(g) ~= M(g(x),g(x)p(x))
E(g)^2 ~= M(g(x),g(x)p(x))^2
E(f) ~= M(g(x),g(x)p(x))^2
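The first line of that chain, E(f) = M(g(y),p(y)) M(g(x),g(x)p(x)), is an exact identity, which a crude grid-quadrature check confirms without any M-H machinery at all (the particular f and p below are my own picks, not anything from the derivation):

```python
import math

# Sanity check of the exact identity E_p[f] = E_p[g] * E_{g*p}[g],
# with g = sqrt(f), using plain grid quadrature instead of M-H.
# Example functions (mine, for illustration): p is an unnormalized
# standard normal, f is a narrow bump at x = 3.
dx = 0.001
xs = [-10 + dx * i for i in range(20001)]          # grid on [-10, 10]
p  = [math.exp(-x * x / 2) for x in xs]            # "flat" density
f  = [math.exp(-50 * (x - 3) ** 2) for x in xs]    # "peaked" function
g  = [math.sqrt(v) for v in f]                     # g = sqrt(f)

def integral(vals):
    return sum(vals) * dx

e_f = integral([a * b for a, b in zip(f, p)]) / integral(p)   # E_p[f]
e_g = integral([a * b for a, b in zip(g, p)]) / integral(p)   # E_p[g]
e_g_tilted = (integral([a * b for a, b in zip(f, p)])         # E over the
              / integral([a * b for a, b in zip(g, p)]))      # g*p density
# e_g * e_g_tilted recovers e_f exactly; e_g ** 2 does not.
```

The product identity holds to floating-point precision even on the discrete grid, since the normalizing integrals cancel algebraically; it is only the later E(g)^2 = E(f) step that introduces an approximation.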

So the answer seems to be: run Metropolis-Hastings on a new (function, pdf) pair, where a square-root chunk has been ripped out of the function and shoved into the pdf, and when you're done, square the result. I'm eager to code this up this afternoon and see if it actually gives reasonable results.
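A minimal sketch of that recipe, assuming a random-walk proposal with a hand-tuned step size (the function names, test f and p, and the chain start at the peak are all my own choices, not anything fixed by the post):

```python
import math
import random

def mh_mean(h, density, x0, step=0.5, n=40_000, burn=2_000):
    # Random-walk Metropolis-Hastings: sample a density known only up
    # to a constant factor, then average h over the kept samples.
    x, total, kept = x0, 0.0, 0
    for i in range(n):
        prop = x + random.gauss(0.0, step)
        # Multiplied-out acceptance test avoids dividing by density(x).
        if random.random() * density(x) < density(prop):
            x = prop
        if i >= burn:
            total += h(x)
            kept += 1
    return total / kept

def peaked_estimate(f, p, x0):
    # The recipe above: rip sqrt(f) out of the integrand, shove it into
    # the (unnormalized) density, run M-H, then square the result.
    g = lambda x: math.sqrt(f(x))            # requires f >= 0
    return mh_mean(g, lambda x: g(x) * p(x), x0) ** 2

random.seed(1)
p = lambda x: math.exp(-x * x / 2)           # flat-ish density
f = lambda x: math.exp(-50 * (x - 3) ** 2)   # peak vanilla M-H rarely hits
est = peaked_estimate(f, p, x0=3.0)          # start the chain on the peak
```

Note this sketch reproduces the recipe as stated; per the postscript below, the final squaring step leans on the shaky E(g)^2 = E(f) approximation, so it is not an unbiased estimator of E(f).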


Wait, no, that quick estimation of E(g) as sqrt(E(f)) is totally wrong. Need to rethink the final few steps.
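Concretely, since Var(g) = E(g^2) - E(g)^2 by definition, E(g)^2 = E(f) - Var_p(g), so the approximation undershoots whenever g varies at all under p. A closed-form example (my own: standard normal p with f(x) = x^2, so g(x) = |x|) shows the gap can be large:

```python
import math

# Under a standard normal, E[f] = E[x^2] = 1, but g = sqrt(f) = |x|
# has E[g] = sqrt(2/pi), so E[g]^2 = 2/pi ~= 0.637: the shortfall is
# exactly Var(|x|), nowhere near negligible here.
e_f = 1.0                       # E[x^2] under N(0, 1)
e_g = math.sqrt(2.0 / math.pi)  # E[|x|] under N(0, 1), a standard fact
gap = e_f - e_g ** 2            # equals Var(|x|) by definition
```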

From: bhudson
2004-11-14 11:38 am (UTC)
when you see this much math in a post, it's time to wrap the author in a blanket, gently take him away, and murmur 'someone's been at the Machine Learning for too long'
From: krasnoludek
2004-11-14 10:05 pm (UTC)
No, wait! I can fit another formula in this post!
From: jcreed
2004-11-16 03:00 pm (UTC)
See? See? I told you all! Paul and I are totally sane! The rest of you are nuts!