A group of psychologists recently attempted to replicate 100 studies published in psychology journals in 2008. The results are in, and they aren't great: roughly a third of the results were replicated at a statistically significant level. Before I unpack some of the implications, I should add an important qualifier, namely that this does not mean that those studies should not have been published, or that the investigators did something wrong! A study isn't conducted and published in order to provide the final word, but rather to invite the rest of the community to scrutinize the work and attempt to replicate it. If a conscientiously conducted and well-designed experiment gives a statistically significant result (positive or negative), that result is fair game for dissemination, discussion, and further scrutiny. This replication effort is not evidence that science is failing, but rather evidence that science is being done right. We need more replication efforts, not fewer studies.
That said, it should give pause to people who point to a single study and make prescriptions. I've written before about the cult of This One Study, and the difficulty of reproducing so many psychology studies points up the follies of that cult. Moreover, as challenging as it is to get reproducible results in psychology, many educational research studies that I read do an even worse job of blinding participants and maintaining proper controls. Some of that is inevitable, given the nature of classroom work, but some of it is the result of people carelessly playing with too many variables at once. If the person teaching the experimental class is enthused about trying a new educational method and transmits that enthusiasm, and also varies the topics and assignments, while the person teaching the "control" group is not doing anything to get fired up and refreshed, you can't treat this as a single-variable experiment. Moreover, educational experiments often tap into deep wells of cultural baggage; will the experiment get the same effect if it is conducted with people from the opposite side of that cultural divide? Finally, it has been pointed out to me by at least one person trained in statistics and educational psychology that when you have one class section taught by the New Method and another section taught by the Old Method, even if you have hundreds of students in each section you still have n=1 in both the experimental and control groups, not n=hundreds, if your goal is to study methods rather than students. "n=1" is a fancy statistical term for "anecdote." Despite all of this, educational research tends to lead to a lot of preaching and prescription.
Strangely enough (or maybe not strangely at all, depending on your degree of cynicism), I suspect that Right-Thinking People will delightedly cite this finding that psychology experiments are not always reproducible as a way of reminding people that they don't necessarily know what they think they know, but will also continue to cite their favorite studies as rhetorical cudgels. Consistency conshmistency!