Here's a very simple version of the problem to be solved:

I have fuzzy knowledge about where something (a robot, maybe) is, on the real line. This fuzzy knowledge takes the form of a Gaussian distribution. The probability density that my robot is at point x is proportional to exp(-A(x - a)^2) for some constants A and a. I said "proportional to", because I'm going to be ignoring normalizing constants throughout. Now the robot reads from a noisy sensor, which says that its position is b. I know the sensor has Gaussian noise, so the probability it says b if the position really were x is proportional to exp(-B(x - b)^2).

What's my posterior belief about where the robot is? Bayes says Pr[position = x | sensor says b] = Pr[position = x and sensor says b] / Pr[sensor says b]

I note that Pr[sensor says b] is some big ugly integral, but it's a constant not depending on x, so I'm going to ignore it and compute Pr[position = x and sensor says b].

That's just (marking with ~ every time that I'm multiplying or dividing by a constant independent of x)

exp(-A(x - a)^2) exp(-B(x - b)^2)
= exp(-(A(x - a)^2 + B(x - b)^2))
= exp(-(Ax^2 - 2Aax + Aa^2 + Bx^2 - 2Bbx + Bb^2))
~ exp(-((A + B)x^2 - 2(Aa + Bb)x))
= exp(-(A + B)(x^2 - 2x(Aa + Bb)/(A + B)))
~ exp(-(A + B)(x - (Aa + Bb)/(A + B))^2)
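This algebra is easy to sanity-check numerically: the unnormalized product should differ from the claimed Gaussian only by a constant factor (the "~" constant we ignored). A minimal sketch, where the constants A, a, B, b are arbitrary example values:

```python
import numpy as np

# Check: exp(-A(x-a)^2) * exp(-B(x-b)^2) is proportional to
# exp(-(A+B)(x-m)^2) with m = (A*a + B*b) / (A + B).
# A, a, B, b are arbitrary illustrative constants, not from the text.
A, a = 2.0, 1.0
B, b = 0.5, 4.0

x = np.linspace(-5.0, 10.0, 501)
product = np.exp(-A * (x - a) ** 2) * np.exp(-B * (x - b) ** 2)

m = (A * a + B * b) / (A + B)
posterior = np.exp(-(A + B) * (x - m) ** 2)

# If the derivation is right, this ratio is the same at every x.
ratio = product / posterior
assert np.allclose(ratio, ratio[0])
```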

which is another Gaussian, whose mean (Aa + Bb)/(A + B) looks very nicely like a weighted average of the two input means, with weights that (we secretly know, even though we promised not to say "variance") are proportional to the inverses of the variances. ("Proportional" covers a factor of 2: exp(-A(x - a)^2) has variance 1/(2A), so A is half the inverse variance, and the factor of 2 cancels out of the weighted average anyway.)
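Working with inverse variances ("precisions") as the first-class quantity makes this update trivial: precisions add, and the mean is a precision-weighted average. A minimal sketch, where `fuse` is a hypothetical helper name and the example numbers are arbitrary:

```python
# Fuse two 1-D Gaussian beliefs given (mean, precision) pairs, where
# precision = inverse variance. "fuse" is a hypothetical name; the
# numbers below are arbitrary illustrative values.
def fuse(mean1, prec1, mean2, prec2):
    prec = prec1 + prec2                               # precisions ("votes") add
    mean = (prec1 * mean1 + prec2 * mean2) / prec      # precision-weighted average
    return mean, prec

# Sharp prior at 1 (precision 4) vs. noisy sensor reading at 4 (precision 1):
mean, prec = fuse(1.0, 4.0, 4.0, 1.0)
assert abs(mean - 1.6) < 1e-12   # pulled only slightly toward the noisy sensor
assert prec == 5.0               # every measurement makes us more certain
```

Note that the posterior precision never decreases: even a very noisy sensor adds a few "votes".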

And this generalizes fine to the n-dimensional case: carry around the inverse of the covariance matrix, standing for an n x n chart of the "number of votes".
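A sketch of that n-dimensional update, under the same "precisions add" rule; the means and covariances here are arbitrary illustrative values:

```python
import numpy as np

# n-dimensional fusion carrying the inverse covariance (precision) matrix.
# Precision matrices add, and the fused mean mu solves
#   (L1 + L2) mu = L1 mu1 + L2 mu2,
# the matrix analogue of (Aa + Bb)/(A + B). Example values are arbitrary.
mu1 = np.array([0.0, 0.0])
Sigma1 = np.array([[2.0, 0.5],
                   [0.5, 1.0]])
mu2 = np.array([3.0, 1.0])
Sigma2 = np.array([[1.0, -0.2],
                   [-0.2, 2.0]])

L1 = np.linalg.inv(Sigma1)
L2 = np.linalg.inv(Sigma2)

L = L1 + L2                                     # posterior precision matrix
mu = np.linalg.solve(L, L1 @ mu1 + L2 @ mu2)    # posterior mean

# The sum of the two quadratic forms differs from the fused quadratic
# form by a constant, exactly as in the 1-D derivation above.
def q(x, m, P):
    return float((x - m) @ P @ (x - m))

x1, x2 = np.array([1.0, 2.0]), np.array([-2.0, 0.5])
c1 = q(x1, mu1, L1) + q(x1, mu2, L2) - q(x1, mu, L)
c2 = q(x2, mu1, L1) + q(x2, mu2, L2) - q(x2, mu, L)
assert abs(c1 - c2) < 1e-9
```

In one dimension this collapses back to the scalar formula, and it is the same bookkeeping a Kalman filter's information-form update does.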

I wish someone had taught this whole business to me with inverse-variances as first-class concepts.