Consider metropolis-hastings estimation of the expectation of f(x) over a probability density function p(x). I'll write this as M(f(x),p(x)). This is, modulo some concerns about ergodicity and the proposal function, equal to the true expectation E(f). If p is very "flat" and f is very "peaked", (i.e. f is very close to zero over most of the support of p, and is very big in magnitude in a small area) markovianly bouncing around p may infrequently find the (very high, narrow) peak of f. I'd like to shift the probability function to home in on where f is big, to be sure to sample it. So I reason (letting g(x) = sqrt(f(x))

∫ f(x)p(x) dx = ∫ g(x) g(x)p(x) dx

= (∫ g(y)p(y) dy) ∫ g(x) (g(x)p(x) / ∫ g(y)p(y) dy) dx

= M(g(y),p(y)) M(g(x), g(x)p(x))

Since I can throw away constant factors in the probability density function when doing metropolis-hastings. Now I'm left with M(g(y),p(y)), which is just as vulnerable to g's peakedness (since g is almost as peaked as f, being the square root of it) and M(g(x), g(x)p(x)) which is this magically smarter M-H process that knows to do a sort of probabilistic gradient ascent on g. If I make the approximation that E(g)^2 = E(f), which I think is reasonable as long as f's effective support is pretty small relative to how fast-changing p is (and I assumed p was pretty flat, so I should be okay) I get

E(f) = M(g(y),p(y)) M(g(x), g(x)p(x))

E(f) = E(g)M(g(x), g(x)p(x))

E(g)^2 ~= E(g)M(g(x),g(x)p(x))

E(g) ~= M(g(x),g(x)p(x))

E(g)^2 ~= M(g(x),g(x)p(x))^2

E(f) ~= M(g(x),g(x)p(x))^2

So the answer seems to be: run metropolis-hastings on a new (function, pdf) pair, where a square root chunk has been ripped out of the function and shoved into the pdf, and when you're done, square the result. I'm eager to try coding this up this afternoon and seeing if it actually gives reasonable results.

---

Wait, no, that quick estimation of E(g) as sqrt(E(f)) is totally wrong. Need to rethink the final few steps.