A blog about problems in the field of psychology and attempts to fix them.

Saturday, December 24, 2011

Holiday Special - A Year of Scandals in Psychology

Feeling a bit of the holiday spirit, I wanted to reflect a bit about the three public scandals in scientific psychology this year, and on some of the responses to the scandals. The first was a carry-over from last year: accusations of fraud that culminated in Marc Hauser's resignation from Harvard. The second was the high-profile publication of Daryl Bem's article arguing for "Psi" (i.e. psychic) phenomena. The third was the exposure of Diederik Stapel's serial data forgery. Each of these cases has its nuances, and individual morals, but I think there is an overarching moral we might all meditate on as the new year begins.


Marc Hauser

Marc Hauser's fall from grace was fast and furious. Hauser came out of Bucknell's Animal Behavior program about 20 years before I did, and had made his way onto the Harvard faculty about a decade later. At the time, it was still extraordinarily rare for Harvard to tenure its assistant professors; back in the day, or so I am told, they would actually advertise that a tenured position was open, and force their assistant professors to compete with all comers. Can you imagine the pressure? Hauser was one of the first in many years to make it through from a new hire to professor. He had a stellar record of achievement, with publications mostly falling into the realm of comparative psychology or primatology, but also a healthy dose of developmental work. Though I have only met Hauser briefly, and he seemed like a nice guy, I was never a big fan of his work. His experiments were perfectly good, but they mostly consisted of taking the worst of cognitive psychology's theoretical problems and gifting them to comparative psychology (especially when he came under the influence of Liz Spelke and Susan Carey, and started replicating their infant experiments with monkeys). His non-experimental work was not very original, but it was well presented (his Wild Minds book typifies this). In full disclosure, it is likely that my mixed feelings about him also involved a jealousy factor - that one of the most prominent comparative psychologists in the world, and certainly the one with the most prestigious academic position, did work I didn't much like. At any rate...

Last year it was announced that an internal investigation had found him guilty of several charges of misconduct regarding published and submitted papers. Suspicions had been raised by his graduate students, as well as a few colleagues, and Harvard had taken the accusations seriously. Hauser took a year's leave, and resigned at the end of it. Alas, the information available to the scientific public is horribly inadequate, as Neuroskeptic points out. Based on the public information, Hauser's infractions could range anywhere from shameless misrepresentation of his experimental results (a major sin), to ill-conceived and biased coding practices (a mistake that would only be unforgivable due to his academic rank), to simple mismanagement of data (a completely forgivable sin). A report in the Chronicle suggests a combination of those activities. And we are all still waiting to hear if there will be federal charges, as much of the research under suspicion was federally funded.

My understanding is that Hauser has left academia to work with troubled youths, and I will bet he is an asset to whichever youth group he ends up working with. He is a smart guy, who ran a large lab well for many years. I could imagine him returning to psychology some day, though he will certainly never enjoy the prominence he once had. It is worth noting that in no case has Hauser been accused of outright making up experiments. (In one much-discussed instance, there is a question as to whether he ran a specific control condition, but there is no doubt that he ran the experimental condition.) Also, only a small number of Hauser's several hundred publications are in question, and after the smoke clears people will realize that he made contributions, even if he also made some mistakes.

Now, the part that no one is really talking about is how many of Hauser's publications had other Harvard faculty on them. The initial accusations in this case arose when students went to look at the raw data, the video cassettes, the recordings of the experiments actually being done, and became suspicious. I'm not sure why no one is talking about this point more: For this type of fraud to occur (if it occurred), it must be the case that Hauser's co-authors, colleagues, and students never sat down and looked at the data. Or if they did, they kept their mouths shut. This is a cultural problem, and it is a major one. I'm not saying that we should all be suspicious that our co-authors are fabricating data, but that we should all be interested in seeing the actual effects of experiments - not seeing if there is a star next to a graph in SPSS, but seeing the phenomenon of interest occur. Even a small amount of casual wandering around a lab should reveal this type of problem: "Oh hey, can I watch while you run a few subjects to get a feel for the procedure?" "How does the coding work, can you show me?" "Can I see the videos for a few monkeys in each condition, just to see what the effect looks like?" If Hauser is guilty of something really scandalous, how many of Hauser's co-authors failed to do even the slightest bit of this casual diligence? More on this below.

Daryl Bem

There is a soft spot in my heart for Daryl Bem. He is an emeritus professor at Cornell University who wrote an article about how to get published in Psychological Bulletin that has some tremendous advice in it, and that I have passed along to many others. (This one, here.) Bem has also done a wide variety of well-respected work in social psychology, and has also done some less respected work about the existence of psychic phenomena. I should note from the start that no one doubts that Bem ran the studies he says he ran and found the results he says he found. That said, he says he found evidence for precognition, i.e. cognitive awareness of events before they occur. The study was published in the top social psychology journal, which led to a major outcry. (It was published online-first late last year.) In defense of their decision to publish, the editors claimed that if the methods were solid, the topic of interest, the reviews good, etc., then it met their editorial criteria and they felt obligated to accept. They also allowed a critical comment to be published immediately following Bem's article, along with a rejoinder. Of course, many disagree about whether the editors really did their job. Andrew Wilson, for example, takes them head on here and here.

This month's issue of Review of General Psychology had an article by LeBel and Peters titled "Fear for the Future of Empirical Psychology: Bem's Evidence of Psi as a Case Study of Deficiencies in Modal Research Practice." It is highly recommended. They show how standard practice in a psychology lab can easily lead to interpretation biases in favor of a chosen theory. As they put it:
The present commentary adopts Bem’s article as a case study for this discussion simply because it makes the tension between confidence in methods versus results unusually obvious and difficult to ignore. (p. 357)
They argue that standard practice in psychology over-emphasizes conceptual replications, rather than rigorous replications; under-emphasizes verification of experimental equipment and methods; and relies too strongly on the crudest forms of null-hypothesis testing. These problems conspire to favor a proliferation of weak theory, rather than the hard work of building and testing strong theory (see Andrew again). This all leads to our hypotheses seeming, a priori, as if they are logically necessary (to explain the data), rather than tentatively useful (as one of a myriad of possible explanations that might lead to further work).
This aura of necessity makes psychological hypotheses inherently appear central to the overall knowledge system, regardless of what they actually claim. (p. 375)
When the dust settles on this one, we will realize that there is no embarrassment here for Bem. He has earned the freedom to research what he wants, he came up with some very clever methods, and he spent eight years implementing them. The only thing Bem can really be criticized for was his obviously hand-wavy reference to quantum mechanics as a way to explain the phenomenon... but it is quite likely he put that in only after the reviewers ordered him to. The scandal, if there is one, is about our discipline, the particular journal, and its editors. Regarding these issues, Paul Meehl wrote a tremendous skewering of soft psychology twenty years ago that every psychologist should have to read, and the current article in The Review does a darn good job too. In the end, quite aside from the issue of whether or not this study should have been published in the first place, several failures to replicate have now been made public, and, assuming failures to replicate are given the same opportunities as confirmations, that is how science naturally corrects itself. The fact that the editors who published the original paper refused to even consider publishing a failure to replicate is certainly the most embarrassing part of the whole thing. The moral of this story is that replication is a crucial part of science, that it doesn't occur nearly often enough, and that the "file drawer" problem is a very real problem. More below.
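To make the file-drawer point concrete, here is a minimal simulation sketch (in Python, with parameter values I made up purely for illustration, not taken from any of the studies discussed). A thousand hypothetical labs each test an effect that is truly zero; if only the "significant" results see print, the resulting literature looks unanimous about a phenomenon that does not exist.

```python
import random
import statistics

# A minimal sketch of the "file drawer" problem: many labs test a true null
# effect, but only the significant results are published. All parameter
# values here are hypothetical, chosen only for illustration.
random.seed(1)

N_LABS, N_PER_GROUP = 1000, 20
Z_CUTOFF = 2.0  # |z| > ~2 approximates p < .05, two-tailed

published_effects = []
for _ in range(N_LABS):
    # Both groups are drawn from the SAME distribution: the true effect is zero.
    control = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
    treatment = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
    diff = statistics.mean(treatment) - statistics.mean(control)
    se = (statistics.variance(control) / N_PER_GROUP
          + statistics.variance(treatment) / N_PER_GROUP) ** 0.5
    if abs(diff / se) > Z_CUTOFF:             # "significant" -> published
        published_effects.append(abs(diff))   # everything else: file drawer

print(f"published: {len(published_effects)} of {N_LABS} studies")
print(f"mean published |effect|: {statistics.mean(published_effects):.2f} sd units")
# Roughly 5% of the null studies cross the threshold, and every one of them
# reports a sizable "effect". Without published failures to replicate,
# nothing in the literature corrects the picture.
```

The point is not that any lab in the simulation cheated; every one of them behaved honestly. The distortion comes entirely from what gets published.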


Diederik Stapel

This story broke more recently, and many details remain unknown. But even with only what we know now, this is certainly the worst of the scandals. Stapel has apparently been completely fabricating studies and publishing them in at least 30 papers over the last fifteen years, while rocketing up the academic ladder from newly minted Ph.D. to Dean, and garnering a "Career Trajectory Award" in the process! Possibly several scores of papers will be retracted before all is done. He even supplied students with fabricated data, which likely ended up in their dissertations. It is so bad that he returned his own Ph.D. Because they take these things seriously in the Netherlands, Stapel might even be in for some jail time.

Earlier this month, the Chronicle ran a back-page piece focused on the Stapel case, written by Alan Kraut, executive director of the Association for Psychological Science (APS). Kraut tries to throw a positive light on the case by noting that Stapel was undone by suspicions raised by students and colleagues, and arguing that this is a positive example of scientists policing each other. Sigh... If the best thing you can say about the arrest of a criminal is that some of his friends noticed he was up to suspicious things after knowing him for only 13 years... that doesn't seem too positive to me. Kraut further tries to support his position by promoting two recent publications showing that there is a range of ways in which psychologists can subtly (and often unintentionally) manipulate data, a flexibility that "allows presenting anything as significant", and that such minor forms of misconduct are rampant in the field. Oh good, that really helps the case.
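One practice often cited in this literature is "optional stopping": peeking at the data as subjects accumulate and stopping the moment the test comes out significant. Here is a minimal sketch of it (in Python, with made-up sample sizes and a made-up peeking schedule); it illustrates how an honest-feeling habit inflates false positives, and is not a claim about what any particular researcher did.

```python
import random
import statistics

# A minimal sketch of "optional stopping": test after every few subjects and
# stop as soon as the result looks significant. Sample sizes and the peeking
# schedule are made up purely for illustration.
random.seed(2)

def z_stat(a, b):
    """Crude two-sample z statistic; |z| > 1.96 approximates p < .05."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se

N_SIMULATIONS, false_positives = 2000, 0
for _ in range(N_SIMULATIONS):
    a = [random.gauss(0, 1) for _ in range(10)]  # the true effect is zero
    b = [random.gauss(0, 1) for _ in range(10)]
    while len(a) <= 45:                          # peek at n = 10, 15, ..., 45
        if abs(z_stat(a, b)) > 1.96:             # stop the moment it looks good
            false_positives += 1
            break
        a += [random.gauss(0, 1) for _ in range(5)]
        b += [random.gauss(0, 1) for _ in range(5)]

print(f"false positive rate: {false_positives / N_SIMULATIONS:.0%}")
# The nominal rate is 5%; repeated peeking pushes the realized rate well
# above that, with no fabrication anywhere in the process.
```

No single step in that procedure looks dishonest, which is exactly why such practices are rampant, and why an outright forged data set can hide so comfortably among them.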

He then makes some suggestions, such as "requiring" scientists to stick with their "original plans" for data collection and report all collected data, and requiring complete data sets to be placed in online repositories. The first set of suggestions is silly, because unlike late-stage medical trials, experiments in psychology often adjust on the fly for completely legitimate reasons, especially during early phases. I'm not sure I have ever had an experiment that ended in exactly the form I had originally proposed. Further, most psychologists would happily give honest narratives of the processes that lead to their final experiments, if only journals would publish honest reports. Journals want stories cleaned up, they want articles that claim to have found results which were predicted from the start, they don't want to hear about the messy details of how the science of psychology really works. I am not ashamed to admit that I have rewritten papers after rejection for these reasons, and had the sanitized versions accepted. It is the journals that want lying, not the authors. The latter suggestion, data archiving, is just absurd. First off, most raw data would be utterly useless if made public. More importantly, however, if I were willing to fudge data, and I was any good at it, I would be quite willing to post it online. Remember that the exact case in question involves a guy who fabricated data (not results, but the data files themselves) and gave them out to colleagues and graduate students to analyze. Not that an online repository is a bad idea (especially in the case of large, publicly funded studies that have the potential for far more insights than any one lab can discover), but the idea that repositories will stop forgery is absurd.

Before I start going into the larger moral of all these stories, meditate for a second on this: Do any of you believe that a chemist could have run a mothballed research lab for 13 years, doing no experiments at all, garnering awards and promotions, supervising graduate students, publishing over 130 professional publications, and none of his colleagues would have noticed? Do you really believe that not a single senior faculty member would have wandered into his lab and noticed that the experiments were not being done? Wouldn't they have asked to see a new technique in action (or at least sent a grad student over to learn it)? Or, for another line of questioning: How important could an empirical finding in chemistry be, if no other chemists want to replicate it? How useful could a new synthesis or technique be if no one else is going to try to use it more or less exactly as the discoverer did?

The Solution

I know I'm a social scientist, but this seems like a simple social problem with a simple set of social solutions. The problem is in what we reward, and the solutions are all straightforward pleas for sanity. We reward absurd levels of academic productivity (e.g. papers written, grants garnered, Ph.D.s supervised), and the venues that act as gatekeepers for these prizes desire unrealistically clean research stories that result in press-worthy super-results. Frankly, there is NO WAY to advance within this system without lying, in press. Yes, that's correct, I am accusing EVERY prominent research psychologist of dishonesty. Most of them in minor, professionally accepted ways; but pervasively, and that is part of the problem. They write introductions to tell sanitized rather than accurate stories; they apply for research grants to do work that will be mostly finished by the time the funding arrives, then use the money to fund the next project, which the funding agencies and their reviewers know nothing about; they put their names on papers when they cannot vouch for the accuracy of the reported results; they promote and support colleagues they know little about; etc. The unrealistic expectations of the field demand these lies, and there is no other way to advance to top levels.

How do we solve these problems?

1) We refuse to reward people for unrealistic productivity. We fail to believe that anyone can really author thirty experimental papers, twelve book chapters, and five funded grant applications a year. Putting your name on an experimental paper should indicate that you are in a position to vouch for the claims made. But what if your only role on the experiment is as a statistical consultant, or you are an author because the studies happened to take place in a lab that has your name on the door? Even then, by putting your name on the paper you are vouching that the experiments were done, that the data was coded in a proper way, that it was accurately entered, that it was analyzed correctly, and that the results are interpreted in a reasonable way. That is not impossible or unrealistic, it just takes time. Authorship should indicate more than a vaguely supervisory role, and the maximum believable level of professional productivity should reflect that. (I am also a fan of the less extreme position, common in some other fields, where the acknowledgements of a paper explicitly state the role of each author, making it clear what they are vouching for.)

2) Except in extreme circumstances, we do not give top rewards to experiments that are unlikely to be replicated - not conceptually, but as done by the researcher. We then need to prominently report when highly publicized results fail to replicate, and have adequate mid-level venues to report simple replications or replications with minor extensions. Both types of reports need to be viewed as services important enough to the field, and necessary enough for the accumulation of knowledge, that they are not beneath the role of good journals. By the way, the extreme circumstances that would justify an exception are when researchers have done something that is logistically extremely difficult to replicate. Those circumstances usually involve a very large team of well-coordinated people (e.g. a team of people studying the same New Zealanders for 40 years), so forgery is unlikely. Even in those circumstances, it is reasonable to check with those who would be in a position to verify that the reported work was actually done.

3) Gatekeepers at funding agencies and journals need to seriously readjust their attitudes. Grant reviewers need to stop demanding so much pilot work that studies are likely to be finished before the funding for them arrives. More journal reviewers need to value honest reports of research practices. More journal editors need to value solid incremental science. I have nothing against the pressure to "publish or perish", but when that is combined with the other pressures mounted by these gatekeepers, it becomes pressure to "lie or perish." If that culture is not changed, psychology will only find itself mired in more scandals, and scientific advancement will stall out completely.

4) Researchers should be expected to have time built into their schedules to notice when things around them are awry. Supervisors should check what their students are doing, and everyone should check what their colleagues are doing. Everyone should have the time to replicate important results. This is an essential part of research in every other science that I know of. People want to see techniques work, see data being collected, see it being analyzed, and often they want to try it themselves. They do not want to do this constantly, because much of the work is boring, but they are naturally curious about what their colleagues are doing. It used to be that "facts" in chemistry were established by bringing together a group of scientists to witness a repeat showing of the claimed chemical reactions. We don't need to go back that far in psychology, but we need to move in that direction. The amount of production we demand of our top researchers excludes the opportunity for these casual, but essential, checks.

Conclusion

All these suggestions would reduce the perceived "productivity" level of top researchers, but only because they would limit illusory productivity by forcing due diligence. The reward structure in psychology would have to be recalibrated so that we were incredibly suspicious of people who appeared to be impossibly productive. Certainly some people will be able to do more than others, and top producers should be rewarded, but illusory productivity should not.

In Hauser's case, responsible people threw up the red flags that eventually led to his resignation. In Bem's case, the unusually controversial nature of his publication led to one of the painfully rare instances of psychology, as a field, actually going through the process of responsible scientific result-checking. In Stapel's case, being surrounded by colleagues who had more free time, and different professional expectations, would have made the entire scandal impossible. We need to change business as usual. The moral of all three scandals that hit psychology this year is that people need to slow down enough to do deliberate, thorough, responsible science.

----

Don't miss Part Two, covering 2012's year of scandals!

2 comments:

  1. I really enjoyed this post (as I have been really affected by these stories too). A couple of things: Stapel is in the Netherlands. Not too far from Sweden, of course (where I am), but distinctly different. For one, we have much more living space here....

    And, two, similar things have happened in much harder sciences, over a long time.

    This book:
    Plastic Fantastic: How the Biggest Fraud in Physics Shook the Scientific World
    Eugenie Samuel Reich

    chronicles how Jan Hendrik Schön manufactured data at Bell Labs. Data that had him hailed as a super-star, before the fraud was discovered. Sure, a shorter time, no manufactured data given to doctoral students (but plenty of wasted time for other post-docs attempting to replicate). But, Bell Labs. Physics. It may be the incentives in science. Perhaps. Or perhaps we just don't know.

  2. Ase,
    Thanks for the comment! I went back and fixed the country, sorry about that.

    I know these things happen in the "harder" sciences too. Though I suspect they are rarer, I don't have any proof. There is even suspicion that Mendel faked his heredity data!

    In any case, I didn't know about this particular instance. Pretty impressive. A brief skimming of the web shows that debate about the obligations of co-authors became a central theme in the discussion - I still don't know why it isn't playing a bigger role in psychology's discussion.

    Also, I think you are correct that the incentive structure that rewards these practices is common to most all of modern science. Psychology, however, seems to have fewer checks against it.
