A blog about problems in the field of psychology and attempts to fix them.

Sunday, May 31, 2015

Stats Help Part 2: Null and Alternative Hypotheses

Many struggle with discussions of null and alternative hypotheses. The logic behind phrasing research questions that way can be a bit unintuitive. The logic involves what you can or cannot prove given an if-then statement, and I'll put a paragraph about that down at the bottom. In the meantime, here is a much easier way to understand what is going on:

(If you missed it, here is Stats Help Part 1: Independent and Dependent Variable)

What Type of World do we Live in?
For the purpose of hypothesis testing, there are two types of worlds we could live in: A boring world, or an interesting world. In the boring world one thing doesn't affect another thing, and in the interesting world things affect each other. Alas, we can never be absolutely certain which world we are in. For example, you might think that a person's grades in college English classes affect the number of mistakes they catch when editing a memo your company is going to put out. Nothing you ever do will make you absolutely certain that there is such a relationship across all grades earned by all students in all English classes, and across all possible documents with all possible types of mistakes. That is a crucial point, so I will rephrase it in a more general way: When you have a hypothesis that involves very large and ambiguous populations, you will never be certain about what is true for the entire population.

What can you do then? You can gather a sample, and try to make an inference about the larger population. (That is why it is called "inferential statistics.") For example, you can gather 100 students who did poorly in English classes, and 100 students who did excellently in English classes, and you can have both groups correct several documents, quantifying their success as editors. The groups will have different success rates, to be sure, because any time you gather data in enough detail you will find a difference between your groups. But remember, you don't care about your samples for their own sake, you care about using the results to guess what type of world you live in. What are the options?

1) You could live in a boring world, a world in which English grades do not relate to editing skills. In such a world your samples would still have different scores, because of "sampling error", but you would expect those differences to be very small.

2) You could live in an interesting world, a world in which English grades do relate to editing skills. In such a world your samples should have larger differences.

Alas, we have no idea how big a difference we would expect if we lived in an interesting world, so it is really hard to say what we would expect to find if we were in that world. We can, however, predict how our samples should look if we were in the boring world. With the help of a few assumptions, we can say how likely we are to find group differences of different sizes given the size of our sample and the assumption that the world is boring. We call the probability of finding a difference at least as large as ours a "p-value", and small p-values indicate that it is very unlikely that we are living in a boring world.
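You can see this logic directly by simulating a boring world. The sketch below (with made-up numbers: two groups of 100 drawn from the same population of editing scores, and a hypothetical observed difference of 2 points) counts how often chance sampling error alone produces a group difference at least as large as the one observed:

```python
import random

random.seed(0)

def simulate_boring_world(n_per_group=100, n_sims=10_000, observed_diff=2.0):
    """Approximate a p-value by brute force: draw both groups from the
    SAME population (the boring world) many times, and count how often
    sampling error alone yields a difference at least as large as ours."""
    extreme = 0
    for _ in range(n_sims):
        # Both groups come from the identical distribution of editing
        # scores (mean 25 mistakes caught, standard deviation 5).
        a = [random.gauss(25, 5) for _ in range(n_per_group)]
        b = [random.gauss(25, 5) for _ in range(n_per_group)]
        diff = abs(sum(a) / n_per_group - sum(b) / n_per_group)
        if diff >= observed_diff:
            extreme += 1
    return extreme / n_sims  # fraction of boring worlds this extreme

p = simulate_boring_world()
print(p)
```

With these particular made-up numbers the simulated p-value comes out small, which is exactly the situation where we would start to doubt that we live in the boring world.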

Interpreting Results of a Study
If it is unlikely we live in a boring world, then we will reject that possibility. We usually do a study because we hope two variables relate, which means we are hoping for results unlikely to have come from a boring world. Thus we are usually hoping to find a low p-value. As a general rule, we hope for less than a 5% chance of seeing data like ours if the world were boring: p < .05. You could set other values, but by convention .05 is used by most disciplines under most circumstances.

If we see a large p-value, we say the result is not significant. The safe bet under such circumstances is to say that we "failed to reject the null hypothesis." We didn't prove there was no difference, but we really have no way of proving that there is absolutely no difference for certain. Rather, a high p-value means that even though we did find a difference between the groups (because you always do), the result is so small it could easily have been created by chance sampling error in a boring world. For example, if the p-value is .33, that means we would find a difference at least that big by random chance 1 out of 3 times when there was no relationship between the variables in the wider world. A better study in the future might reveal an effect, but our study did not find evidence for one.

If we see p < .05, we reject the null hypothesis and conclude that we live in an interesting world, where the variables we are interested in actually do affect each other. We call that a statistically significant result, which does not mean that our result is "important", it only means that it is unlikely to be due to random variation in a boring world.
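The whole decision rule from the last few paragraphs fits in a few lines. Here is a minimal sketch (the function name and wording are mine, not standard terminology from any library):

```python
ALPHA = 0.05  # the conventional cutoff discussed above

def interpret(p_value, alpha=ALPHA):
    """Map a p-value onto the two possible conclusions."""
    if p_value < alpha:
        return "reject the null hypothesis: likely an interesting world"
    return "fail to reject the null hypothesis: a boring world could explain this"

print(interpret(0.003))  # small p-value: reject
print(interpret(0.33))   # large p-value: fail to reject
```

Note that the second branch says "fail to reject," not "accept" — as described above, a large p-value never proves the boring world is true.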

We can never be completely certain of our conclusions, of course. No matter what our results are, we will still be making probabilistic claims, based on sample data. Also, having a significant result doesn't mean that the populations are different in exactly the same way our samples are. For example, if our good-grade students caught 10 more mistakes on average than our poor-grade students, we couldn't conclude that we would find an exactly 10 point difference if we measured hundreds of thousands of students, or millions of students. That said, if the result is statistically significant, we would conclude that good-grade students were some amount better, in the much larger world, which we could never completely measure.

The Logic Behind Focusing on the Null Hypothesis
Ok, so if you followed all that, and you want to understand the logic behind this way of setting up problems, we need to go back to the basic rules of logic that you probably haven't seen since high school geometry class. When you have an "if-then" statement, there are only two things you can do with it that count as good logic. One of those is called "modus tollens." The way that works: if I say "if A, then B" and I also tell you "not B," then you can conclude "not A." For example, if I tell you that "if it is raining I carry an umbrella" and I also tell you I am not carrying an umbrella, you can conclude that it is not raining.

The logic is not as airtight when we are making statistical claims, but its form is the same: If there were no effect, we would find a small difference between groups. We did not find a small difference between groups, so we conclude that there is an effect.
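If you want to convince yourself that modus tollens is valid, you can check every possible combination of true and false by brute force. This little sketch confirms that whenever both premises hold ("A implies B" and "not B"), the conclusion "not A" holds too:

```python
from itertools import product

# Enumerate all truth assignments for A and B. For each assignment where
# both premises are true — (A implies B), written as (not A or B), and
# (not B) — check that the conclusion (not A) is also true.
valid = all(
    (not a)                        # conclusion: not A
    for a, b in product([True, False], repeat=2)
    if ((not a) or b) and (not b)  # premises: A -> B, and not B
)
print(valid)  # True: modus tollens never leads from true premises to a false conclusion
```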
