Why Inferential Statistics?
Imagine we had a question: “Do men and women differ on X?” No matter what “X”
is (height, empathy, knowledge of 13th-century Spanish history, or anything
else), we know that any given man will differ from any given woman, but what
we don’t know is how men “on average” differ from women “on average.” That is,
when we asked our initial question, we probably wanted to know how the mean
for men compared with the mean for women. But we will never know the actual
mean for men or the actual mean for women, because that would involve
measuring more than 7 billion people! So, we need to, somehow, get a sample of
men and a sample of women, compare them, and draw a conclusion from that.
Let’s say we get a sample of 100 men and 100 women, and we ask them about
Spanish history. In our sample, women average 68% and men average 63%. That is the result for our sample and it is a rock-solid result. But, remember, we
aren’t particularly interested in our sample – we are interested in “men vs. women,”
not “men we happened to look at” vs. “women we happened to look at.” We want to
use our sample to infer something about the larger population (and that is what
puts the “infer” in inferential statistics).
Making these inferences poses a serious challenge: Any difference we see in
our samples could be due to chance! Sure, our group of men differs from our
group of women, but that doesn’t tell us much in itself, because if we picked two
groups of men at random, they would also differ. This is a serious problem: Given
that any two samples will differ from each other on virtually everything we try
to measure (if we can measure in enough detail), how can we use samples to draw
conclusions?
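To see the problem concretely, here is a minimal simulation sketch in Python. The population mean of 65 and spread of 15 are invented for illustration, as are the function and variable names. Both groups are drawn from the same population, so any gap between their means is pure chance.

```python
import random
import statistics

def chance_gap(rng, n=100, mu=65.0, sigma=15.0):
    """Gap between the means of two samples drawn from the SAME population.

    mu and sigma are made-up illustration values, not real data.
    """
    group_a = [rng.gauss(mu, sigma) for _ in range(n)]
    group_b = [rng.gauss(mu, sigma) for _ in range(n)]
    return statistics.mean(group_a) - statistics.mean(group_b)

rng = random.Random(0)  # fixed seed so the run is repeatable
gaps = [chance_gap(rng) for _ in range(1000)]

# How many of the 1000 pairs of samples had means that were not identical?
differing_pairs = sum(1 for g in gaps if g != 0)
# Average absolute size of the purely accidental gap.
typical_gap = statistics.mean([abs(g) for g in gaps])

print(f"{differing_pairs} of {len(gaps)} pairs had different means")
print(f"typical size of the chance gap: {typical_gap:.2f} points")
```

Running this shows that two samples practically never match exactly, even when nothing real separates them.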
All is not lost, though, as a little intuition will tell us. Differences
found due to random chance are likely to be small, and are likely to be of a
very different size if we do the same test again. If we could repeat our test
again and again (with new samples), it would help us make better inferences: If
we got samples of 100 men and 100 women 20 times, and every time we found women
scoring 5 points higher than men, we would be much more confident in our
finding. While replication isn’t usually practical, we can use one sample to
guess what would happen if we replicated. And, our intuition can help us here
as well: If we find a small difference between groups after measuring only a
small number of people, that is more likely to be due to random chance than if
we find a big difference between groups after measuring a lot of people. Breaking
that down: 1) Large differences are less likely to be due to chance than small
differences, and 2) the bigger the size of the sample, the more point 1 is
true.
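Those two points can be checked with a rough simulation. The sketch below again assumes an invented population (mean 65, spread 15) in which there is no real group difference, and estimates how often chance alone produces a gap of a given size. The function name and thresholds are illustrative choices, not part of any standard procedure.

```python
import random

def chance_of_gap(rng, n, threshold, trials=2000, mu=65.0, sigma=15.0):
    """Estimate how often two same-population samples of size n
    differ in their means by at least `threshold` points.

    mu and sigma are made-up illustration values, not real data.
    """
    hits = 0
    for _ in range(trials):
        mean_a = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        mean_b = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if abs(mean_a - mean_b) >= threshold:
            hits += 1
    return hits / trials

rng = random.Random(1)  # fixed seed so the run is repeatable

# Point 1: with 100 people per group, a small gap vs. a large gap.
small_gap_large_n = chance_of_gap(rng, n=100, threshold=1)
large_gap_large_n = chance_of_gap(rng, n=100, threshold=5)

# Point 2: the same large gap, but with only 10 people per group.
large_gap_small_n = chance_of_gap(rng, n=10, threshold=5)

print(f"gap >= 1 point,  n=100: {small_gap_large_n:.3f}")
print(f"gap >= 5 points, n=100: {large_gap_large_n:.3f}")
print(f"gap >= 5 points, n=10:  {large_gap_small_n:.3f}")
```

Under these assumptions, a 5-point gap turns up by chance far less often than a 1-point gap, and far less often with 100 people per group than with 10.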
If we could get a good mathematical handle on the “less likely” vs. “more
likely” part of those claims, we could start to use our samples to make really
good guesses about how replicable our results are. That is, we could use our
single sample to reliably predict what would happen if we replicated our study a
bunch of times. We already agreed above that if the result replicated over and
over again, then we would be confident in drawing conclusions about the larger
population. And we have just seen that, with the right math, a single sample
can tell us what would happen if we had many samples. Putting the last two
sentences together: If we can get some math behind us, we can use our single
sample to make reliable inferences about the larger population.
Thus, no matter what inferential statistic we use, the question is always
something like: “If there were really no difference in the population, what is
the probability that we would find a difference this large in our sample, just
by chance?” When that probability is low, we feel confident the difference is
real.
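One way to put a number on that question is a permutation test: shuffle the group labels many times and count how often the shuffled groups differ by at least as much as the real ones did. This is only one of several possible approaches, sketched here under stated assumptions. The scores are simulated (not real data), and the helper names are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, rng, n_perm=2000):
    """Share of random relabelings whose mean gap is at least as large
    (in absolute value) as the gap actually observed."""
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign scores to the two groups
        gap = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if gap >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_perm

rng = random.Random(2)  # fixed seed so the run is repeatable

# Simulated scores, NOT real data: 100 "women" and 100 "men".
women = [rng.gauss(68, 15) for _ in range(100)]
men = [rng.gauss(63, 15) for _ in range(100)]

p_value = permutation_p_value(women, men, rng)
print(f"observed gap: {mean(women) - mean(men):.2f} points")
print(f"estimated probability of a gap this large by chance: {p_value:.3f}")
```

A small value means shuffled labels rarely produce a gap as big as the observed one, which is the sense in which the difference is “unlikely to be due to chance.”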