A blog about problems in the field of psychology and attempts to fix them.

Friday, May 22, 2015

Stats Help: Dependent vs. Independent Variables - Understanding the Difference

The terms Dependent and Independent can be a bit unintuitive, and many stats students struggle with them. The easier term should be "dependent", and that term is easiest to understand in an experimental context. The dependent variable is what you measure at the end of an experiment.

In the Context of an Experiment

In an experiment you manipulate at least one variable and measure at least one variable. 

Let's say I brought people into the lab, and had them sit in rooms of three different colors (red, blue, green). After 20 minutes, I measured how angry they were. That experiment is testing whether anger level is dependent upon the color of the room. Thus "Anger" is the Dependent Variable, because it is the one that depends upon something else.

Note the role of experimental context: I could have done a different experiment. I could have brought people into a white room and tried to make them angry for 20 minutes, then given them a choice of which room to go into. That experiment would only make sense if I thought maybe the color of the selected room might be dependent on how angry the participant was. Thus, in this experiment, "Color of Room" is the dependent variable.

Once you know which variable is "dependent", the other is "independent." Easy as that!

Why that label? The values of the independent variables do not depend on the values of the other variables in your study.
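If it helps to see the two roles side by side, here is a minimal simulation of the room-color experiment. It is only a sketch: the 0-10 anger scale, the average anger per room, and the function names are all invented for illustration. The thing to notice is that the experimenter sets the room color, and anger is only ever measured.

```python
import random
import statistics

COLORS = ["red", "blue", "green"]

# Hypothetical "true" average anger (0-10 scale) in each room.
# These numbers are invented purely so the simulation has something to find.
ASSUMED_MEAN_ANGER = {"red": 6.0, "blue": 4.0, "green": 4.5}


def run_experiment(n_per_room=20, seed=1):
    """Return a list of (room_color, anger) pairs, one per participant."""
    rng = random.Random(seed)

    # Independent variable: the experimenter decides who sits in which room.
    assignments = COLORS * n_per_room
    rng.shuffle(assignments)

    results = []
    for color in assignments:
        # Dependent variable: measured after the participant sits in the room.
        anger = rng.gauss(ASSUMED_MEAN_ANGER[color], 1.5)
        anger = min(10.0, max(0.0, anger))  # keep the score on the 0-10 scale
        results.append((color, anger))
    return results


def mean_anger_by_color(results):
    """Average the dependent variable within each level of the independent variable."""
    return {
        color: statistics.fmean([a for c, a in results if c == color])
        for color in COLORS
    }


if __name__ == "__main__":
    for color, mean in mean_anger_by_color(run_experiment()).items():
        print(f"{color:>5}: mean anger = {mean:.2f}")
```

Nothing in the code ever computes a room color from an anger score. That one-way flow is what makes color the independent variable here.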

Outside an Experimental Context

What happens when it is not an experiment? Sometimes the answer is obvious. For example:

Why might you be studying the hair length of men and women? Would it be because you think a person's dangly bits depend upon the length of their hair? Or would it be because you think that the length of a person's hair (on average) depends on the type of dangly bits they have? Note that, while it is technically possible to experimentally change people's dangly bits, we are not doing that in our study, nor are we changing their hair length. That said, I think we can all be pretty certain that cutting someone's hair doesn't change what dangly bits they have. Thus, the only thing we could reasonably suspect is that hair length will be dependent upon gender, and so we have to label "Hair Length" as the dependent variable.

Sometimes, however, it is much less obvious which variable is which. For example:

Suppose you had national-level data on GDP per capita (the average economic output per person in a given country) and on the proportion of doctors in each country. With that data set, you could make an argument either way. Maybe you think there are more doctors when a country generates more money per citizen. Maybe you think having more doctors allows a workforce to be healthier and more productive. In this context, the correct way to use the labels will depend on how you think the causality works. Whichever one you think depends on the other, you call that one your dependent variable.

Thus, when you read a non-experimental study, and someone labels their dependent and independent variables, you should always pause for a second. In that pause, ask yourself, "Does that causal direction make sense? Do I believe that their Dependent Variable might depend on their Independent Variable?" If the story doesn't sound likely, continue reading with caution.
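To see why the data alone cannot settle the question, here is a rough sketch using made-up numbers (not real national data; the variable and function names are my own). The correlation between the two variables is the same whichever way you look at it, but the fitted line changes depending on which variable you choose to call dependent.

```python
import math
import statistics

# Made-up numbers for illustration only; these are not real national data.
gdp_per_capita = [5.0, 12.0, 20.0, 33.0, 41.0, 55.0]   # thousands of dollars
doctors_per_1000 = [0.4, 1.1, 1.9, 2.6, 3.2, 4.1]


def _sum_of_products(a, b):
    """Sum of (a - mean(a)) * (b - mean(b)) over paired values."""
    mean_a = statistics.fmean(a)
    mean_b = statistics.fmean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))


def correlation(a, b):
    """Pearson correlation; it does not care which variable is 'dependent'."""
    return _sum_of_products(a, b) / math.sqrt(
        _sum_of_products(a, a) * _sum_of_products(b, b)
    )


def fit_line(independent, dependent):
    """Least-squares line that predicts `dependent` from `independent`."""
    slope = _sum_of_products(independent, dependent) / _sum_of_products(
        independent, independent
    )
    intercept = statistics.fmean(dependent) - slope * statistics.fmean(independent)
    return slope, intercept


if __name__ == "__main__":
    print("correlation:", correlation(gdp_per_capita, doctors_per_1000))
    # Story 1: doctors depend on GDP per capita.
    print("doctors ~ GDP:", fit_line(gdp_per_capita, doctors_per_1000))
    # Story 2: GDP per capita depends on doctors.
    print("GDP ~ doctors:", fit_line(doctors_per_1000, gdp_per_capita))
```

Both fits run without complaint, and neither tells you which causal story is true. The choice of dependent variable is something the researcher brings to the data.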

Making and Reading Graphs

As a general rule, if you are making a graph, the independent variable is on the horizontal "X-Axis", and the dependent variable is on the vertical "Y-Axis". Thus, if you were reading a paper studying the relationship between "GDP per Capita" and "Doctors per 1,000 workers", and you saw a graph with GDPPC on the Y-axis, that suggests that the authors think GDPPC depends on the proportion of doctors in a given workforce.

(Graph linked from the Narragansett School Website)

Alternative Labels

Because these terms can be confusing, some authors use alternative terms. Wikipedia lists many options:

Alternatives to "Independent Variable" - predictor variable, regressor, control variable, manipulated variable, explanatory variable, exposure variable, risk factor, feature, input variable.

Alternatives to "Dependent Variable" - response variable, regressand, measured variable, responding variable, explained variable, outcome variable, experimental variable, output variable.

Part 2: Null Hypothesis Testing


  1. Let's imagine for the first example that the room color doesn't produce any anger change in the people. Is the anger still a dependent variable?

    1. Jorge, usually the terms label the intended function of the variables in the study. So, you would say something like: "I did a study with room color as the independent variable, and anger as the dependent variable, but the results were not statistically significant." My guess is that this convention arose because the terms are commonly used when proposing a study, and the final report is - in theory - tied back to what you proposed.
