A blog about problems in the field of psychology and attempts to fix them.

Tuesday, January 15, 2013

Holiday Special - A Year of Scandals 2012

A bit delayed but... it's that time of year again. Time to look back on the Year of Scandals in psychology. (Last year's edition of holiday joy can be found here, and here.) This past year has seen a number of problems with our field come to light. It has also seen a rise in public consciousness of these problems, and a host of suggested solutions. Much is still up in the air, but it does feel like we are moving in the right direction. In addition, there is growing awareness of some wider problems in science and in academia.

What salacious stuff happened this year?

Dirk Smeesters and Lawrence Sanna
Scandals 1a and 1b: What do a marketing researcher at Erasmus University Rotterdam and a social psychologist at the University of Michigan in Ann Arbor have in common? Both resigned rather abruptly after investigations initiated by a mysterious "data detective", later identified as Uri Simonsohn of the University of Pennsylvania. Even after his identity was revealed, Uri refused to say why he had suspected fraud, or what procedures he used to confirm and report his suspicions, at least until his methods were published. It turns out that Uri was pretty nice about everything. He looked through the literature until something popped out to him as potentially suspicious (i.e., the report of super-significant, really strong effects in a literature filled with mostly small or moderate effects). Then he looked closely at the reported results. It was just the sort of thing a responsible member of the field might do; he just dug one little step deeper than most people would have. For example, it might seem obvious (if one suspected fraud) to check whether the reported standard deviations look realistic, given the other reports in the literature. Uri did that, but he also looked at the variation in reported standard deviations. He found, for example, that across several of Sanna's papers the standard deviations were nigh identical. He still responded nicely and responsibly; he sent a letter to the authors stating his concerns, and asked if he could see their raw data. After he got the data, he became more concerned, finding, for example, nigh-identical ranges in the raw scores for different sub-groups... as if someone had artificially generated the data with a program that lets you specify mean, range, and standard deviation. Still he was nice! He pointed all this out to the authors to see if they had a way to explain the similarities.
In one case he contacted the home university when an author became unresponsive; in the other, the university contacted him with an investigation already in progress.
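The "too-similar standard deviations" red flag can be illustrated with a small simulation. To be clear, this is my own sketch of the intuition, not Simonsohn's published procedure: the function name, the equal-sample-size assumption, and the normality assumption are all mine. The idea is to ask how often honestly sampled data would produce reported SDs as tightly clustered as the suspicious ones.

```python
import random
import statistics

def sd_spread_pvalue(reported_sds, n_per_group, n_sims=5000, seed=1):
    """Estimate how often honest sampling would yield standard
    deviations as similar as the reported ones.

    reported_sds : SDs reported across independent studies/groups
    n_per_group  : sample size behind each SD (assumed equal here)
    Returns the fraction of simulations whose spread of SDs
    (the SD of the SDs) is at least as small as the observed spread.
    """
    rng = random.Random(seed)
    observed_spread = statistics.stdev(reported_sds)
    mean_sd = statistics.mean(reported_sds)
    hits = 0
    for _ in range(n_sims):
        sim_sds = []
        for _ in range(len(reported_sds)):
            # Draw an honest sample whose population SD matches the reports.
            sample = [rng.gauss(0, mean_sd) for _ in range(n_per_group)]
            sim_sds.append(statistics.stdev(sample))
        if statistics.stdev(sim_sds) <= observed_spread:
            hits += 1
    return hits / n_sims
```

With small samples, sample SDs should bounce around quite a bit; four reported SDs like 2.01, 2.02, 2.00, 2.01 from n = 15 groups almost never happen by chance, while a more varied set like 1.5, 2.5, 1.8, 2.3 is unremarkable.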

Not only did this expose two fraudsters; once everything came to light, it was clear that they had been quite easy to catch, and that reasonable vigilance on the part of their colleagues (in this case, the people presumably reading and evaluating their papers) would never have let them get away with it... at least not so easily.

Diederik Stapel - Redux
Scandal two: Speaking of pervasive problems that could have been stopped by routine curiosity... recently the final report was released from the investigation into Diederik Stapel, the social psychologist who resigned last year after it was revealed that he forged his way to massive success. The final report didn't just expose the depth of Stapel's fraud; it also went to pretty hefty lengths to criticize the "sloppy" research culture and cult-of-personality thinking that made his fraud possible:
It is almost inconceivable that co-authors who analysed the data intensively, or reviewers of the international ‘leading journals’, who are deemed to be experts in their field, could have failed to see that a reported experiment would have been almost infeasible in practice, did not notice the reporting of impossible statistical results, such as a series of t-values linked with clearly impossible p-values, and did not spot values identical to many decimal places in entire series of means in the published tables. Virtually nothing of all the impossibilities, peculiarities and sloppiness mentioned in this report was observed by all these local, national and international members of the field, and no suspicion of fraud whatsoever arose.
.... The Committees can reach no conclusion other than that from the bottom to the top there was a general neglect of fundamental scientific standards and methodological requirements.
...Not infrequently reviews were strongly in favour of telling an interesting, elegant, concise and compelling story, possibly at the expense of the necessary scientific diligence. (p. 53)
That is page 53, out of 104. Ugh! Does it get better on the next page? Nope:
...This raises the question as to whether leading journals are sufficiently critical regarding publications that make no essential contribution to theory building in the field. 
Time and again journals and experienced researchers in the domain of social psychology accepted that Mr Stapel’s hypotheses had been confirmed in a single experiment, with extremely large effect sizes. Evidently he alone was in a position to achieve the precise manipulations needed to make the subtle effects visible. People accepted, if they even attempted to replicate the results for themselves, that they had failed because they lacked Mr Stapel’s skill. However, there was usually no attempt to replicate, and certainly not independently. The few occasions when this did happen systematically, and failed, were never revealed, because this outcome was not publishable. (p. 54)
The effects of this report will be felt for some time. Hopefully.

Milena Penkowa
Scandal three: Alright... I don't like it when people conflate neuroscientists with psychologists, but because they do, it looks bad for us when scandals hit neuroscience. Penkowa, a Danish neuroscientist, has a story that seems like a hodgepodge of last year's scandals. Like Hauser, she was turned in by graduate students who could not replicate her findings. Like Stapel, she had a stellar career trajectory with high-profile cases and big prizes, and now there is suspicion she never even ran many of the experiments in question. Like both, initially no blame, not even a touch of it, was pointed at the long list of co-authors. Also as with Stapel, these charges came to light a few years ago, but the report only came out this year. Milena resigned her position in 2010; the recent report concludes that at least 15 of her articles contained 'deliberate scientific malpractice'.

You can prove anything! (p < .05)
Sort-of Scandal four: After decades of people complaining about the way we publish in psychology, including the types of things we continually let researchers get away with, there is finally some traction. After 2011 saw Simmons, Nelson, and Simonsohn show the horrific extent to which commonly accepted practices could be pushed, many wondered what would come next. Their article was blunt enough that the mainstream has finally taken notice, making this very old set of complaints into a modern scandal. It has created enough attention this year alone, in fact, to justify subheadings...

Perspectives on Psychological Science
November saw an issue of Perspectives on Psychological Science dedicated to the growing crisis, with articles from old and new players who have mostly been shut out of mainstream venues in the past. It features such great titles as "A Vast Graveyard of Undead Theories" and "Scientific Misconduct and the Myth of Self-Correction in Science", along with more understated titles for deeply insightful articles, like "Why Science Is Not Necessarily Self-Correcting". I should note again that a blog was set up by the editor of POPS to encourage discussion of this issue (and future issues). If you are interested, you should go use it.
Simonsohn strikes again
If taking down two fraudulent scientists doesn't seem like enough good deeds for a year, it turns out that Uri had something even more powerful up his sleeve. As I previewed following the APA convention, Uri, Leslie John, and Joseph Simmons have developed a statistical tool to detect "p-value fishing" in meta-analyses. This occurs when people have a bunch of data, then keep messing around with it until the result sneaks past the .05 threshold most journals and reviewers want before they will even consider publication. This new meta-analytic tool doesn't uncover data forgery, as Uri's more on-the-ground sleuthing did, but rather detects pervasive dishonesty in the data-analysis phase. We have yet to see the full effect of this method, but my suspicion is that several individual researchers, and a few whole literatures, will be brutally destroyed by this technique over the next few years. If nothing else, it will make people much more sensitive to what data should look like if a hypothesis is correct. (It has made me more sensitive, at least.)
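The intuition behind the tool can be sketched in a few lines. This is my own toy illustration, not the published method (which involves formal tests on the distribution of significant p-values): a genuine effect produces mostly very small p-values among significant results (a right-skewed curve), while fishing on a null effect piles results up just under .05 (a left-skewed curve).

```python
def p_curve_right_skew_fraction(p_values):
    """Toy version of the p-curve intuition: among significant
    results (p < .05), return the fraction below .025.
    Well above 0.5 suggests genuine evidential value;
    well below 0.5 suggests p-value fishing."""
    significant = [p for p in p_values if p < 0.05]
    if not significant:
        raise ValueError("no significant p-values to analyse")
    return sum(1 for p in significant if p < 0.025) / len(significant)
```

A literature reporting mostly p = .001 to .02 looks healthy by this crude measure; one reporting mostly p = .044 to .049 looks fished.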

That happens all the time!
If you don't believe me that Uri's new methods will have major effects, then you missed Masicampo and Lalande's article in The Quarterly Journal of Experimental Psychology. Examining articles published in Psych Science, JPSP, and JEP: General, they identified over 3,000 p-values between .01 and .10. A surprisingly high number of those p-values fell between .045 and .05. Of those, a surprisingly high number fell between .04875 and .05. We all know why this happens: we have all been trained to nudge a p-value below .05. We also know that, because there wasn't really any way to prove this type of behavior, people who engaged in it got rewarded, while the rest of us had no choice but to stew. Times, they are a-changin'.

The IRB couldn't pass the IRB
Sort-of Scandal five: Moving away from fraud, and towards other broad disciplinary issues: In a very nice paper that has mostly flown under the radar, Dyck and Allen from Griffith University in Australia asked the question "Is mandatory research ethics reviewing ethical?" To answer it, they quite brilliantly applied the same criteria review boards use to evaluate research proposals. As they put it:
"When the standards that review boards use to evaluate research proposals are applied to review board practices, it is clear that review boards do not respect researchers or each other, lack merit and integrity, are not just and are not beneficent."
They further point out the lack of reliability and the immense social cost. Most insightfully, to my mind, they point out that the entire basis for IRB review (that the researcher cannot be trusted) inherently undermines the method the IRB uses to assess studies (asking the researcher to pre-report what they will do). I hope this paper will get a lot more attention, as a deeper discussion of research review boards has been needed for some time.

The Solution
Last year, after reporting on the major scandals, I set about suggesting a solution. Luckily, this year, I feel no obligation to come up with a new set of recommendations! For one thing, my old recommendations still hold (and a revised version of that solution is currently under review as a letter to the American Psychologist). For another, one of the things worth reporting was a massive influx of concern from the field as a whole, and greater awareness of the potential solutions that those "in the know" have been advocating for some time.

So... I guess we will just have to wait and find out if anything has changed by next year.

P.S. Let me know if I missed anything!
