Spelke and Susan Carey at Harvard, Renée Baillargeon at Illinois, and Karen Wynn at Yale are the matriarchs of the large literature using looking time to study cognition in infancy. Including their students, and others, many researchers are now active in this field, and my dissertation used looking time as its dependent variable. Other common labels in the literature include gaze duration, preferential looking time, orientation, ocular fixation, visual fixation, and attention. The history of this literature is fascinating, and the flaws in the current methods are deep. Looking time measures have a long history, but have only recently come to be used to assess infants’ insights into events. Thus, in an unusual twist, most of the criticisms of this literature are based on long traditions of empirical work that existed before the criticized work started, and more recent research supports the criticisms.
A Brief History of Looking Time as a Dependent Measure
Transitioning from Perceptual to Conceptual Investigation
Originally, LT was used purely to determine whether infants differentiated between simultaneously viewed objects. This method was first used to establish that infants had color vision, i.e., by demonstrating that infants looked for different lengths of time at objects which differed only in color (Staples, 1932; Valentine, 1914). Experimenters later developed methods for measuring LT during the sequential display of single items (e.g., Caron & Caron, 1968). The latter experiments confirmed that infants habituate to individual stimuli, and also demonstrated that LT is regulated by visual complexity (e.g., the density of a checkerboard pattern). Because infants looked longer at objects they had not previously been exposed to, they were said to prefer novelty. “Novelty”, in this context, referred quite explicitly to perceptual novelty. However, later research showed that adults and older children looked longer at objects they found to be conceptually novel, such as the front half of an airplane attached to the back half of an elephant (Berlyne, 1958; Faw, 1970; Faw & Nunnally, 1967). This research was considered key in supporting the “information-conflict resolution model of visual selection”, which claimed that longer looking indicated cognitive recognition of a mismatch.
Later, researchers began studying infants’ tracking behavior (following a moving object with their eyes) and found that infants anticipated the reemergence of a moving object that was witnessed to pass behind an occluder (Gardner, 1969, as cited in Bower, Broughton, & Moore, 1971). Researchers exploited this behavioral tendency by making the movement disjointed (i.e., the object did not leave the occluder on the same trajectory, nor in line with the same trajectory, at which it was traveling when it became occluded). This produced a condition that an adult would claim “is ‘impossible’ in the sense that no single object could make such a movement” (Bower et al., 1971, p. 186). Infants showed less anticipation in the “impossible” conditions and typically showed a “rapid checking back and forth between moving object and screen edge.” This work strengthened researchers’ belief that they could tap into infants’ conceptual knowledge by examining looking behavior.
It was not long before researchers found even better methods for examining infant knowledge using pure LT measures. The first of such tests examined whether infants used Gestalt grouping principles. Newly hatched chicks had been shown to use Gestalt principles in operant paradigms, and these experiments had been replicated (using operant conditioning) in infants (Bower, 1966). To test whether LT experiments could find the same results, researchers took advantage of infants’ tendency to habituate to objects (i.e. prefer novelty). In a habituation paradigm infants are repeatedly pre-exposed to objects or events until looking time to those stimuli or events goes down by some criterion amount (usually 50% of the first few trials). Researchers then show infants two scenes, both of which conform to the perceptual properties of the original display, but which differ conceptually.
The original experiment to use this method (Kellman & Spelke, 1983) habituated infants to a moving partially-occluded rod, and then showed infants either a solid moving rod or two moving rods which lined up with each other but had a gap where the occluder had been. Infants dishabituated to the scene with the broken rod, but did not dishabituate to the scene with the solid rod. This result means that the scene with the solid rod was experienced as more similar to the original scene than the scene with the broken rod. While this experiment was initially seen as simply testing whether infants’ perceptions obey Gestalt principles, it was later taken as initial validation for using LT to measure complex cognition.
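For readers who have never seen a habituation criterion spelled out, here is a minimal sketch of the "looking time goes down by 50% of the first few trials" rule described above. The function name, the three-trial window, and the choice to compare non-overlapping windows are my assumptions for illustration; labs differ in these details, and this is not any particular lab's procedure.

```python
from typing import Optional, Sequence


def habituation_trial(looking_times: Sequence[float],
                      window: int = 3,
                      criterion: float = 0.5) -> Optional[int]:
    """Return the 1-based trial on which the criterion is first met, else None.

    Assumed rule (illustrative only): the infant counts as habituated once
    the summed looking time over the most recent `window` trials falls to
    `criterion` (e.g., 50%) of the summed looking time over the first
    `window` trials.
    """
    if len(looking_times) < 2 * window:
        return None  # not enough trials to compare two separate windows

    baseline = sum(looking_times[:window])

    # Slide a window over later trials; it never overlaps the baseline trials.
    for end in range(2 * window, len(looking_times) + 1):
        recent = sum(looking_times[end - window:end])
        if recent <= criterion * baseline:
            return end
    return None


# Hypothetical looking times in seconds, one per habituation trial.
declining = [40, 35, 30, 28, 22, 18, 15, 12]
steady = [40, 35, 30, 38, 36, 33]
```

With the made-up numbers above, the declining infant meets the criterion on the eighth trial, while the steady infant never does. Note how much rides on the arbitrary window size and percentage.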
The Violation-of-Expectation Paradigm
It is most common to date the start of the violation-of-expectation paradigms to Baillargeon, Spelke, and Wasserman’s experiment published in 1985. These researchers used a habituation procedure to demonstrate that 5-month-old infants had object permanence. They first habituated infants to a rotating screen. The screen started out flat on the table “pointing” towards the infant; then it rotated, keeping what was initially its back edge on the table, until it was lying flat while pointing in the opposite direction. Infants were then shown two types of scenarios, both of which began with the screen in the same starting position, but with a box placed behind it. In the first scenario, the screen started rotating and stopped rotating at the place where it would have hit the box. In the second, the screen continued rotating until lying flat on the table, as it had during habituation trials. The back-and-forth rotation was repeated for the duration of the trial, i.e., as long as infants continued looking. In the first scenario, the screen exhibited perceptually distinct movement, while in the second scenario, the screen’s movement was perceptually similar to the habituation trials. Yet, infants dishabituated more to the second scenario. This was taken to mean that infants found the second scene conceptually different from the original, which would require them to remember the box while it was out of sight. That is, the authors argued, infants need to mentally represent the box as occupying the space behind the screen, despite a lack of sensation emanating from the box, and recognize that the screen was passing through where the box should have been; in other words, they needed object permanence and a sense of how solid objects interact.
Hundreds of later experiments elaborated the violation-of-expectation paradigm both in terms of procedures and in terms of the cognitive abilities being investigated. Habituation is not used much, because it is cumbersome. Now you just show infants two things, and see which they look at longer. There has been no critical evaluation of the different methods, but the same types of results are seen in the published literature, so who cares? The commonality of all methodological variants is a three-stage combination of planning, implementation, and data interpretation. First, experimenters arrange a series of events such that one outcome is expected by adults and another is not. Next, infants are shown the scenes, and the duration of their looking is compared to control conditions. Finally, conclusions are drawn by comparing the results with those predicted based on adults’ expectations. That is, if infants look longer than expected at the events adults found unexpected, then it is concluded that infants’ conceptual understanding of the events is the same as adults’. (Except in studies that involve listening, in which case they look less at unexpected things. Don't ask why.) This has resulted in the conclusion that infants understand not only object permanence, but also gravity, linguistic patterns, inertia, rules of social interaction, basic math, and many other things that high school students have considerable difficulty with. The main players claim that these abilities are 'innate', present at birth, with no need to develop. Real, present-day nativism, at the extreme.
Why be Suspicious of Looking Time Measures?
Well, first one might be suspicious of what exactly an experimenter means when they conclude that an infant understands inertia. One might also point out that all abilities develop, and that nativist-empiricist debates have been passé for at least half a century in psychology (and over a century in some fields). Putting that aside, however, there are several empirical reasons to be suspicious of these measures.
Reason to be Suspicious #1 – Looking measures and reaching measures disagree.
The first looking-time studies of object permanence in occlusion conditions led to different conclusions from search experiments using occlusion, but it is only recently that the robustness and pervasiveness of differences between these measures have been shown. There is now an array of tasks in which infants’ looking and reaching behavior evidence different amounts of knowledge, including studies of familiarity/novelty preferences (Shinskey & Munakata, 2005), the A-not-B error (Ahmed & Ruffman, 1997; Diamond, 1985; Hofstadter & Reznick, 1996), and knowledge of solidity (Keen, 2003). The difference in performance between search and looking-time tasks has also been found in early and mid-childhood (B. M. Hood, Cole-Davies, & Dias, 2003; Langer, Gillette, & Arriaga, 2003), where looking dissociates from other measures of knowledge (i.e., verbal response, Garnham & Ruffman, 2001). Stranger still, studies of older children often show them failing search tasks that looking-time studies suggest infants have the knowledge to pass (B. M. Hood, Santos, & Fieselman, 2000; Kim & Spelke, 1999; Vilette, 2002; Wynn, 1992, 2000). A few experiments (in Hauser’s lab) found similar dissociations between looking and reaching measures in macaques and tamarins. Returning focus to studies of object permanence, several studies have now shown that behavioral limitations cannot explain infants' failure to retrieve hidden objects, undermining the primary rationale for using looking time in these situations (Shinskey, Bogartz, & Poirier, 2000; Shinskey, 2002; Shinskey & Munakata, 2001; Munakata, McClelland, Johnson, & Siegler, 1997).
Reason to be Suspicious #2 – Better controlled experiments don’t work.
Interpretations of LT research that attribute conceptual insights to infants have been broadly criticized for not paying enough attention to the physical properties of displays (e.g., Haith, 1998), for not accounting for familiarity and novelty effects (e.g., Rivera et al., 1999), and for not controlling adequately for the sequences of events infants see (e.g., Bogartz, Shinskey, & Speaker, 1997). Basically, if you control for the perceptual factors known to drive looking time (from all the research done in the 1960s and 1970s), or if you just put in better control conditions, then the ‘infants are brilliant’ story starts to fall apart. Further, dynamic systems models can capture critical aspects of infants’ behavior in LT experiments simply by assigning weighted values to perceptual display factors based on infants’ experiences during the course of an experiment (e.g., Thelen, Schöner, Scheier, & Smith, 2001). (If you are a dynamic systems fan, Andrew has a good discussion of a relevant 2006 study here.) Some of these models have led to exciting new predictions that were confirmed. Together there is much evidence to suggest that infants’ looking-times are an additive function of the perceivable properties of the display and the amount of time infants have seen each of those properties, with no need to implicate conceptual distinctions.
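To make the "additive function" idea concrete, here is a toy sketch. It is emphatically not the Thelen et al. model or any published model; the function name, the exponential decay of interest with exposure, the baseline, and every number are assumptions I made up for illustration. The only point is that a sum over perceptual features, each discounted by how long the infant has already seen it, can produce longer looking at one display than another without any conceptual term.

```python
import math
from typing import Dict, Iterable


def predicted_looking_time(features: Iterable[str],
                           exposure_s: Dict[str, float],
                           weights: Dict[str, float],
                           baseline_s: float = 2.0,
                           decay: float = 0.05) -> float:
    """Toy additive model of looking time, in seconds (hypothetical).

    Each perceptual feature of the display contributes its weight,
    discounted exponentially by the seconds of prior exposure to it.
    """
    total = baseline_s
    for feature in features:
        seen = exposure_s.get(feature, 0.0)  # unseen features count fully
        total += weights.get(feature, 0.0) * math.exp(-decay * seen)
    return total


# Hypothetical feature weights and prior exposure (all values invented).
weights = {"pattern_a": 10.0, "pattern_b": 10.0, "object": 5.0}
exposure = {"pattern_a": 60.0}  # pattern_a was shown during familiarization

familiar_display = predicted_looking_time(["pattern_a", "object"], exposure, weights)
novel_display = predicted_looking_time(["pattern_b", "object"], exposure, weights)
```

In this sketch the display containing the unseen pattern gets the longer predicted looking time, purely because of exposure history.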
Reason to be Suspicious #3 – If infants are so brilliant, why not kids?
It has long been noted that older children seem to lack knowledge that LT researchers claim to find in infants (e.g., addition skills, understanding of momentum, etc.). Recently, the looking times of older children have been examined in conjunction with other methods of knowledge testing. This research uncovered that older children (2.5 years) show similar looking patterns to infants when presented with possible and impossible events, but they do not perform any better than chance on search tasks to find occluded objects (B. M. Hood, Cole-Davies, & Dias, 2003; Keen, 2003; Langer et al., 2003). If the nativist stance is clung to, this leads to the seemingly contradictory conclusion that “core principles demonstrated in infancy do not constrain the search behavior of older children” (B. M. Hood et al., 2003, p. 62). Yes, that is a quote. Rather than reject their hypothesis, and conclude that infants are probably not as smart as they thought, the authors would rather conclude that there is some mysterious process blocking infant knowledge from affecting childhood behavior.
Reason to be Suspicious #4 – Unmentioned methods.
I have not witnessed anything egregious myself, but some of the stories I have heard from people who visit the major infant-testing labs are scary. For example, in studies designed to measure infant attention, researchers will bang on the displays and say the infant’s name to get them to pay attention before trials start. Once you start allowing things like that, it is impossible to control for experimenter bias, no matter how subtle or unintentional. Because I haven’t seen it myself, I won’t say much more (confessions or witness testimony welcome below).
Reason to be Suspicious #5 – The experiments are sensitive to wacky things, and often don’t replicate.
Before I got involved in this literature, I never really appreciated this whole “file drawer” problem that everyone kept talking about in hushed voices. I knew lots of people with experiments stuck in file drawers, but they were mostly inconclusive studies, or studies for personal interest, or just things they never got around to writing up. That is not the story here, and I fully understand the file drawer problem now. There is an amazing array of failures to replicate results in the infant looking time literature. Sometimes this is due to the “honest” process of “pilot testing.” Now, admittedly, it is hard to set up an experiment so that infants will cooperate, so pilot testing is legitimately needed… sometimes a lot. You never know what aspects of a display will distract infants, how long they will stay focused, etc., until you try it out. However, you can catch glimpses of how odd this can get even in the published literature. For example, in one published study there is a footnote stating that the object used in that experiment (a toy carrot) had a face, because the authors tried to do the experiment with a faceless carrot, and it didn’t work. Basically, the experimenters want you to believe they have discovered an innate, core ability that all infants possess… but in the same breath they must admit that it doesn’t work if the carrot lacks a face! Presumably, much of this "pilot work", stuck in file drawers, reveals similar issues. I won't belabor the problems with ecological validity.
In addition to all this, there are troves of theses, dissertations, and unpublished manuscripts, wherein infants’ looking times did not evidence amazing knowledge, and so journals were not interested. (Adele Diamond is still my hero for allowing my work through, against reviewers’ initial recommendations.) I can only speculate about intentionally buried studies; surely some exist, but who knows how many. On top of this, there are even a few published failures to replicate. In addition to showing that the original violation-of-expectation study fell apart with better controls, my dissertation adviser (Susan Rivera) also published a failure to replicate Wynn’s famous demonstration that young infants could add and subtract. Rivera had intended to replicate Wynn’s finding, so she could add better control conditions; as with her past work, Rivera was attempting to attack the study with the extension, not with the replication. However, after months of continuous communication with Wynn’s lab, she was unable to replicate the finding.
One might ask why Rivera agreed to have me do a dissertation in her lab using looking time measures after those experiences. I can’t say. It has worked out fairly well though overall, and I would otherwise be unable to share this information with you.
Reason to be Suspicious #6 – Looking is a behavior.
There is much to be said here, enough for an entire additional post. I will only mention that there are things worth studying besides how long an infant stares. Much of the time in these experiments, when infants look away from the displays, it is not a mere drifting of the eyes out of boredom (as one might expect given the "habituation" story). Instead infants look from one thing to another; they social reference; they lean, to aid vision; etc. All this is very worthy of study, and the criticisms above are not meant to apply to studies of infant anticipatory looking behavior nor other studies in which researchers measure infants' active visual tracking of moving objects.
Researchers necessarily use arbitrary criteria to start and stop trials in infant looking studies. For example, they might stop a trial if the infant looks away for 2 seconds. This all seems reasonable, but when you watch the videos it becomes weird. An infant stares intently at the display for 5 seconds, then turns sharply to look at its mother, then turns back sharply to fixate on the display. If that look-away is for 2.1 seconds, the trial is over. Meanwhile another infant is lounged back in the seat, vaguely looking ahead at the display for 10 seconds, then its attention drifts off to the side for a bit, then the infant shifts body position and its head is pointed back at the display. If the drifting occurred for only 1.9 seconds, the trial continues. Weird stuff.
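The two infants above can be run through a minimal sketch of a look-away rule. The function name, the data format (ordered pairs of "looking or not" and duration), the use of "at least 2 seconds" as the cutoff, and the 8 seconds of looking I tacked on after each look-away are all assumptions for illustration, not any lab's actual coding scheme.

```python
from typing import List, Tuple


def score_trial(segments: List[Tuple[bool, float]],
                max_look_away_s: float = 2.0) -> Tuple[float, bool]:
    """Return (looking time in seconds, whether a look-away ended the trial).

    `segments` is an ordered list of (is_looking, duration_s) pairs.
    Assumed rule: the trial ends at the first look-away lasting at least
    `max_look_away_s`; looking after that point is never counted.
    """
    total = 0.0
    for is_looking, duration in segments:
        if is_looking:
            total += duration
        elif duration >= max_look_away_s:
            return total, True  # trial terminated by the look-away
    return total, False


# Intent infant: 5 s of fixation, a sharp 2.1 s glance at mother, back to the display.
intent = score_trial([(True, 5.0), (False, 2.1), (True, 8.0)])

# Lounging infant: 10 s of vague looking, a 1.9 s drift, head pointed back at the display.
lounging = score_trial([(True, 10.0), (False, 1.9), (True, 8.0)])
```

Under these assumptions the intent infant is credited with 5 seconds and the lounging infant with 18, a gap produced largely by 0.2 seconds of look-away.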
In the face of my desire to see the LT literature go away, I would argue that there is the potential for a very exciting field of research waiting to emerge - the study of looking as a behavior.
P.S. I am headed out to central California (Fairfield / Davis) for a bit starting next week. I have noticed that a few readers are from the bay area. If anyone wants to try to get together, shoot me an email.
P.P.S. I apologize that the quality of citations drops towards the end. Most of this is from my dissertation, and much appears in the resulting article.