So a lot of questions you might like to ask out in the world are like "Is group A more Xish than group B", or, for a concrete made-up example:

(1) Are basketweavers more well-paid than harp players?

And sometimes you get the opportunity to refine this to "How *much* more (or less) Xish is group A than B?"

(2) What's the difference between the mean salary of basketweavers and the mean salary of harp players?

That is possible so long as your trait X can be easily measured as a number. If it's not easily quantified --- or perhaps is quantifiable in many different ways, and choosing the "best" is controversial --- then you've got a problem.

And even when you *can* answer (2), its answer is often a limited picture of the situation, even to the point of being deceptive. If I find out that basketweavers make $50k more per year than harp players, that sounds like a good reason to make sure my kids go to a good basketweaving school, unless I find out that the typical *variance* of salaries in both categories dwarfs that difference of means. So maybe I should be asking something like:

(3) What's the ratio between the answer to (2) and the standard deviation of the salaries of all basketweavers and harp players taken together?

The thing that jumps out at me as a types nerd is that the answer to (3) is unitless, while the answer to (2) is denominated in dollars. In some sense I feel this is a sign that (3) is a better, more complete, more finished, more "modular" question. I certainly don't mean to say that scalars are somehow morally better than non-scalar-unit-ed values, but there is some value in knowing you've already "factored out" the role of dollars in this situation. Indeed, the consumer of the answer to question (3) doesn't need to know whether salaries were measured in dollars or euros --- the answer would be the same in either case, assuming a consistent exchange rate.

Another way of saying it is that what's going on, when I move from question (2) to question (3), is that *I'm weakening how much I need to know about the trait X*. Question (2) requires that it's something quantitative and measurable, but question (3) only requires that it's known *up to some constant factor*.
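To make the "up to some constant factor" point concrete, here's a minimal sketch of questions (2) and (3) side by side. The salary figures and the exchange rate are made up for illustration, and taking "standard deviation of everyone taken together" to mean the population standard deviation of the combined list is my own reading of (3), not the only possible one.

```python
import math
from statistics import mean, pstdev

def mean_difference(a, b):
    """Question (2): difference of mean salaries, in whatever unit the data uses."""
    return mean(a) - mean(b)

def standardized_difference(a, b):
    """Question (3): the answer to (2) divided by the standard deviation
    of all salaries from both groups taken together (unitless)."""
    return mean_difference(a, b) / pstdev(list(a) + list(b))

# Made-up salaries in dollars.
basketweavers = [40_000, 55_000, 70_000, 120_000]
harp_players = [30_000, 45_000, 65_000, 95_000]

# Hypothetical constant exchange rate, dollars to euros.
rate = 0.9
basketweavers_eur = [s * rate for s in basketweavers]
harp_players_eur = [s * rate for s in harp_players]

# The answer to (2) changes with the unit...
print(mean_difference(basketweavers, harp_players))
print(mean_difference(basketweavers_eur, harp_players_eur))

# ...but the answer to (3) does not (up to floating-point rounding).
print(math.isclose(
    standardized_difference(basketweavers, harp_players),
    standardized_difference(basketweavers_eur, harp_players_eur),
))
```

The exchange rate cancels because it multiplies both the numerator and the denominator of (3) by the same constant.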

Going even further, I can ask questions of the schematic form "Given a random A, and a random B, what are the odds the A is more Xish than the B?"

(4) If I uniformly choose a basketweaver at random, and uniformly choose a harp player at random, what are the odds that the basketweaver has a higher salary?

This requires even less of X than question (3) does: I only need to be able to compare values of X for inequality. That is, X need only be a total order, not a vector space over the reals. This is relevant even for things like salaries which *are* numeric and easily subject to numeric operations like averaging, because we might argue about how valid it is to average quantities of money in the face of the nonlinearity of utility of money --- or we might dispute what the right utility discounting function is. But even when we disagree about those things, we agree on the answer to (4), assuming we both believe at least in the *monotonicity* of the utility of money.
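Here's a rough sketch of question (4) on made-up data, computed by brute force over every (basketweaver, harp player) pair. The function name is my own, and treating ties as "not higher" is an assumption; the point is just that only comparisons are used, so any strictly increasing utility function (I use `math.log` as a stand-in) gives the same answer.

```python
import math
from itertools import product

def prob_higher(a, b):
    """Question (4): probability that a uniformly chosen member of `a`
    has a strictly higher value than a uniformly chosen member of `b`.
    Ties count as "not higher" (an assumption of this sketch)."""
    wins = sum(1 for x, y in product(a, b) if x > y)
    return wins / (len(a) * len(b))

# Made-up salaries in dollars.
basketweavers = [40_000, 55_000, 70_000, 120_000]
harp_players = [30_000, 45_000, 65_000, 95_000]

print(prob_higher(basketweavers, harp_players))  # 0.625

# A stand-in for "utility of money": any strictly increasing function works.
utility = math.log
print(prob_higher(
    [utility(s) for s in basketweavers],
    [utility(s) for s in harp_players],
))  # 0.625
```

The all-pairs loop is quadratic in the group sizes, which is fine for a sketch but not for large samples.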

Anyhow, for my money, what I'd like to see for all those evopsych "oh I read a study somewhere" studies that conclude that people from group A (men, women, people of some race, old people, young people, fat people, thin people, etc., etc.) are more X (polite, friendly, aggressive, happy, interested in sex, honest, etc., etc.) than people from group B, is estimates of the numbers that come out of questions like (4). I have a gut feeling that even for many cases of legit outcomes where the null hypothesis is rejected, the (4)-number is actually quite close to 50%. And knowing which cases *are* like that and which aren't would be very desirable.
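As a back-of-the-envelope check on that gut feeling, here's a sketch under strong assumptions that are mine, not anything from a particular study: both groups are normally distributed with the same standard deviation, and the group means differ by `d` of those standard deviations. (That's the within-group standard deviation, which is slightly different from the "everyone taken together" one in question (3).) Under those assumptions the difference between a random A and a random B is normal with mean `d` and variance 2, in standard-deviation units, so the (4)-number is `Phi(d / sqrt(2))`, where `Phi` is the standard normal CDF.

```python
import math

def prob_higher_normal(d):
    """P(A > B) for independent normal A and B with equal standard deviation
    and means d standard deviations apart: Phi(d / sqrt(2)),
    written via erf as 0.5 * (1 + erf(d / 2))."""
    return 0.5 * (1.0 + math.erf(d / 2.0))

# How far from a coin flip is the (4)-number for various gaps between means?
for d in (0.0, 0.1, 0.2, 0.5, 1.0):
    print(f"d = {d:.1f}  ->  P(A > B) = {prob_higher_normal(d):.3f}")
```

With a gap of a fifth of a standard deviation, for instance, this comes out to about 0.56, and a gap like that can still reject the null hypothesis comfortably if the sample is large enough.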