I'm in the middle of Chapter 3 of Jussim's book. Right now he is covering the Pygmalion study by Rosenthal and Jacobson (1968), which found that if teachers were told that one student had strong potential to improve and another had less, then the two students would go on to improve by different amounts, even if the descriptions were attached to the students at random, without any evidentiary basis. Jussim critiques the study and its limitations and discusses the complexity of the follow-up studies, and I am not really qualified to sort through the detailed strengths and weaknesses of a myriad of psychology experiments and meta-analyses. Instead, I want to muse on the attraction of these studies in the wider academic community outside of the psychology department.
First, let me note that this kind of study is indeed attractive to academics. When I read his description of the study it vaguely rang a bell in my mind, but I could easily be confusing it with some other study that got a lot of attention. My wife, however, had definitely heard of it while studying for her elementary school teaching credential. Indeed, given the potential implications of a study like this for how people teach, one would hope that prospective teachers do learn about this work, provided that it is reliable.
However, Jussim notes that the follow-up work painted a more complicated picture than the original study. This should not be surprising to anybody who has ever engaged in any sort of scholarly activity. Academic research is complicated enough, and real-world phenomena even more so. The maze of studies that did and didn't replicate at varying levels of statistical significance and under varying conditions tells me that this is a complicated phenomenon of human behavior, a question that is worth getting to the bottom of. As a non-specialist, I want to read the final review article on efforts to sort out this phenomenon, not the first high-profile study. I know that in my own line of work, on optics in biology, the first impressive deployment of a new imaging technique always relies on a particular apparatus looking at a particular specimen, and achieving the promised benefits in a robust manner always takes a lot of hard work by a lot of people in a lot of different specialties, sometimes collaborating and sometimes competing to push each other harder.
Of course, nobody wants to hear "First there was a promising idea, then hundreds of people worked for years to make it work reliably." Especially not for human interactions, where we tell ourselves that since we don't need expensive equipment it really ought to be simple and reduce to some short script, so just tell us what to do and make it easy. We all know about the original stereotype threat study, so let's just move some questions to the end of the test and change the description at the beginning of the test. We know about the study of Rosenthal and Jacobson in the 1960s, so let's just get teachers psyched up to believe that all of their students will improve, and BOOM! Everybody learns more! (And the people who are most receptive to This One Study seem to be the ones who most enjoy the "Let's get the audience fired up!" part of a workshop.) We all heard about the values affirmation study, so let's just give everybody a short essay assignment and erase the achievement gap. It's all so seductive: Simple hack, big results! Small intervention, big improvement! Eat this one food, lose weight! (My belly could serve as This One Piece Of Evidence showing that a kitchen full of fruit doesn't automatically lead to weight loss.) Indeed, Jussim cites a good follow-up article showing how the social zeitgeist was thirsty for the results of the Pygmalion study, and people were willing to overlook the flaws.
Well, first of all, follow-up efforts almost always show that the original dramatic result is not the whole story, and not everybody gets the same effect. Suppose your original study is done in, say, the environment of a particularly progressive academic department, where everyone believes that doing particular tricks will work. Even if you really do blind the participants to the significance of that essay assignment, or of putting that question at the end vs. the beginning of the test, a lot of students may just generally be primed to respond to those things. Put them in a warm, fuzzy environment, and maybe a warm, fuzzy writing assignment is meaningful to them. This is why replication is so important: It doesn't matter if you have hundreds of students in your study. If the control and experimental groups are both taught in an otherwise identical environment by the same person (which they should be) then you may have a large sample for the purpose of asking "Was this effect significant for this experiment?" but you have n=1 for the purpose of asking "Is this intervention robust and useful for the typical instructor?" That's not a criticism of the study; it's just a reinforcement of the basic scientific principle that replication is everything.
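For the quantitatively inclined, here is a toy simulation of that n=1 point. It is only a sketch under made-up assumptions: the average effect, the spread of the true effect from one instructor to the next, the class size, and the function names are all hypothetical numbers and labels chosen for illustration, not anything taken from Jussim or from the studies mentioned above. Each "site" is one instructor running a clean randomized comparison with lots of students, but the true effect is allowed to differ from site to site.

```python
import random
import statistics


def simulate_site(true_effect, n_students, rng):
    """Observed treated-minus-control difference in mean score at one site."""
    # Scores are in standard-deviation units (an assumption of this sketch).
    control = [rng.gauss(0.0, 1.0) for _ in range(n_students)]
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n_students)]
    return statistics.fmean(treated) - statistics.fmean(control)


def run(n_sites=20, n_students=500, mean_effect=0.1, site_sd=0.3, seed=0):
    """Simulate many sites; all default numbers are hypothetical."""
    rng = random.Random(seed)
    # Each site (instructor plus environment) gets its own true effect.
    true_effects = [rng.gauss(mean_effect, site_sd) for _ in range(n_sites)]
    observed = [simulate_site(e, n_students, rng) for e in true_effects]
    # Sampling error of one site's estimate: two groups of n_students,
    # each with unit variance, so the difference has variance 2 / n_students.
    within_se = (2.0 / n_students) ** 0.5
    # Spread of the estimates across sites, i.e. across instructors.
    between_sd = statistics.stdev(observed)
    return observed, within_se, between_sd


if __name__ == "__main__":
    observed, within_se, between_sd = run()
    print(f"standard error inside one site: {within_se:.3f}")
    print(f"spread of estimates across sites: {between_sd:.3f}")
```

With these invented settings, the uncertainty inside any one site is small because each site has plenty of students, while the estimates differ much more from site to site. A single site can therefore give a confident answer to "Was this effect significant for this experiment?" and still say very little about the typical instructor. How much real effects actually vary between classrooms is exactly what the toy model assumes rather than shows.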
Also, a lot of these studies get discussed in the context of racial and gender inequities, where there are centuries or millennia of cultural baggage attached to both the students and their teachers and the entire environment surrounding them. I'm sorry, but I simply don't believe that simple little hacks can make giant dents in these problems outside of rarefied environments. I don't care how big your sample of students is; until you sample a wide variety of environments, beyond just instructors who are already hyper-sensitized, I won't believe your claim that the effects of millennia of male dominance in Western culture and centuries of racial inequality in the US can be meaningfully blunted in the classroom by reordering a few questions and assigning a short essay or whatever. Call me anti-science if you will, but that's my prior.
Mind you, I don't really object to moving the demographic questions to the end of the test, and if you want to assign an essay on values affirmation then knock yourself out. These interventions are cheap and at the very least harmless, but I am unconvinced that simply knowing and following the recommendations of This One Study can make a huge dent in timeless problems. It's too good to be true. Next you'll be telling me that real estate prices never fall.
Speaking of real estate prices, self-fulfilling prophecies are definitely a real phenomenon in social science. Economists and marketers alike know that with enough hype the price of a good can indeed rise, at least for a while. (And if you're smart enough, do you care if it later drops? After all, you already cashed in! You did cash in your chips, right? Um, right?) But even hype tends to be expensive and unpredictable. Not every ad campaign succeeds, and the ones that do often cost a lot of money. An expensive ad campaign is, like, work and stuff, whereas just doing This One Thing is easy.