Why Inferential Statistics?

Imagine we had a question: “Do men and women differ on X?”

No matter what “X” is—height, empathy, knowledge of 13th-century Spanish history, or anything else—we know that any given man will be different from any given woman, but what we don’t know is how men “on average” differ from women “on average.” That is, when we asked our initial question, we probably wanted to know how the mean for men compares with the mean for women. But we will never know the *actual* mean for men or the *actual* mean for women, because that would involve measuring more than 7 billion people! So we need to, somehow, get a sample of men and a sample of women, compare them, and draw a conclusion from that.
Let’s say we get a sample of 100 men and 100 women, and we ask them about Spanish history. In our sample, women average 68% and men average 63%. That *is* the result for our sample, and it *is* a rock-solid result. But, remember, we aren’t particularly interested in our sample – we are interested in “men vs. women,” not “men we happened to look at” vs. “women we happened to look at.” We want to use our sample to infer something about the larger population (and that is what puts the infer in *infer*ential statistics).
Making these inferences poses a serious challenge: any difference we see in our samples could be due to chance! Sure, our group of men differs from our group of women, but that doesn’t tell us much in itself, because if we picked two groups of men at random, they would also differ. This is a serious problem: given that *any* two samples will obviously differ from each other on virtually *everything* we try to measure (if we can measure in enough detail), how can we use samples to draw conclusions?
All is not lost, though, as a little intuition will tell us. Differences
found due to random chance are likely to be small, and are likely to be of a
very different size if we do the same test again. If we could repeat our test
again and again (with new samples), it would help us make better inferences: If
we got samples of 100 men and 100 women 20 times, and every time we found women
scoring 5 points higher than men, we would be much more confident in our
finding. While replication isn’t usually practical, we can use one sample to
guess what would happen if we replicated. And, our intuition can help us here
as well: If we find a small difference between groups after measuring only a
small number of people, that is more likely to be due to random chance than if
we find a big difference between groups after measuring a lot of people. Breaking that down: 1) large differences are less likely to be due to chance than small differences, and 2) the larger the sample, the less likely it is that any observed difference is due to chance.
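Points 1 and 2 are easy to check with a toy simulation: draw two samples from the *same* population and see how big the difference between their means tends to be, purely by chance. A minimal sketch in Python (the population mean of 65 and standard deviation of 15 are made-up numbers for illustration):

```python
import random
import statistics

random.seed(1)

def chance_difference(n, trials=2000):
    """Draw two samples of size n from the SAME population and
    return the average absolute difference between their means."""
    diffs = []
    for _ in range(trials):
        group_a = [random.gauss(65, 15) for _ in range(n)]
        group_b = [random.gauss(65, 15) for _ in range(n)]
        diffs.append(abs(statistics.mean(group_a) - statistics.mean(group_b)))
    return statistics.mean(diffs)

# Typical chance difference shrinks as the samples grow:
for n in (10, 100, 1000):
    print(n, round(chance_difference(n), 2))
```

With samples of 10, sizable differences between two groups from the same population are routine; with samples of 1000, they are rare, which is why a given difference is more convincing when it comes from a larger sample.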

If we could get a good mathematical handle on the “less likely” vs. “more likely” part of those claims, we could start to use our samples to make really good guesses about how replicable our results are. That is, we could use our single sample to reliably predict what would happen if we replicated our study a bunch of times. We already agreed above that *if* the result replicated over and over again, *then* we would be confident in drawing conclusions about the larger population. And now we know that we can use a single sample to draw conclusions about what would happen if we had many samples. Putting the last two sentences together: if we can get some math behind us, we can use our single sample to make reliable inferences about the larger population.
Thus, no matter what inferential statistic we use, the question is always something like: “What is the probability that we would find a difference *that* large in our sample just by chance?” When it is unlikely that our observed difference is due to chance, we feel confident it is real.
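That question is exactly what a permutation test estimates: shuffle the group labels many times and count how often a shuffled difference is at least as large as the one we observed. A minimal sketch in Python, using made-up quiz scores (not the 68%/63% figures from the example above):

```python
import random
import statistics

random.seed(1)

def permutation_p_value(group_a, group_b, trials=10000):
    """Estimate the probability of a mean difference at least as large
    as the observed one if group labels were assigned by chance."""
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = group_a + group_b
    n_a = len(group_a)
    hits = 0
    for _ in range(trials):
        random.shuffle(pooled)  # reassign "labels" at random
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return hits / trials

# Hypothetical quiz scores (percent correct) for two small groups:
women = [72, 65, 70, 68, 66, 71, 69, 64, 70, 67]
men = [60, 66, 62, 65, 58, 64, 63, 61, 67, 62]
print(permutation_p_value(women, men))
```

A small p-value means a difference that large almost never arises from random relabeling, so we would feel confident the difference is real; a large p-value means chance alone could easily produce it.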