Saturday, May 30, 2015

I'm in the middle of Chapter 3 of Jussim's book. Right now he is covering the Pygmalion study by Rosenthal and Jacobson (1968), which found that if teachers were told that student 1 had strong potential to improve and student 2 had less potential to improve, then there would be a difference in improvement, even if the descriptions were attached to the students at random, without any evidentiary basis. Jussim has his critiques of the study and its limitations, and of the complicated picture painted by follow-up studies, and I am not really qualified to sort through the detailed strengths and weaknesses of a myriad of psychology experiments and meta-analyses. Instead, I want to muse on the attraction of these studies in the wider academic community outside of the psychology department.
First, let me note that it is indeed attractive to academics. When I read his description of the study it vaguely rang a bell in my mind, but I could easily be confusing it with some other study that got a lot of attention. My wife, however, had definitely heard of it when she was studying for her elementary school teaching credential. Indeed, given the potential implications of a study like this for how people teach, one would hope that prospective teachers do learn about this work, provided that it is reliable.
However, Jussim notes that the follow-up work painted a more complicated picture than the original study. This should not be surprising to anybody who has ever engaged in any sort of scholarly activity. Academic research is complicated enough, and real-world phenomena even more so. The maze of studies that did and didn't replicate, at varying levels of statistical significance and under varying conditions, tells me that this is a complicated phenomenon of human behavior, a question worth getting to the bottom of. As a non-specialist, I want to read the final review article on efforts to sort out this phenomenon, not the first high-profile study. I know that in my own line of work, on optics in biology, the first impressive deployment of a new imaging technique always relies on a particular apparatus looking at a particular specimen, and achieving the promised benefits in a robust manner always takes a lot of hard work by a lot of people in a lot of different specialties, sometimes collaborating and sometimes competing to push each other harder.
Of course, nobody wants to hear "First there was a promising idea, then hundreds of people worked for years to make it work reliably." Especially not for human interactions, where we tell ourselves that since we don't need expensive equipment it really ought to be simple and reduce to some short script, so just tell us what to do and make it easy. We all know about the original stereotype threat study, so let's just move some questions to the end of the test and change the description at the beginning of the test. We know about the study of Rosenthal and Jacobson in the 1960's, so let's just get teachers psyched up to believe that all of their students will improve, and BOOM! Everybody learns more! (And the people who are most receptive to This One Study seem to be the ones who most enjoy the "Let's get the audience fired up!" part of a workshop.) We all heard about the values affirmation study (.pdf), so let's just give everybody a short essay assignment and erase the achievement gap. It's all so seductive: Simple hack, big results! Small intervention, big improvement! Eat this one food, lose weight! (My belly could serve as This One Piece Of Evidence showing that a kitchen full of fruit doesn't automatically lead to weight loss.) Indeed, Jussim cites a good follow-up article showing how the social zeitgeist was thirsty for the results of the Pygmalion study, and people were willing to overlook the flaws.
Well, first of all, follow-up efforts almost always show that the original dramatic result is not the whole story, and not everybody gets the same effect. If your original study is done in, say, a particularly progressive academic department, where everyone believes that these particular tricks will work, then even if you really do blind the participants to the significance of that essay assignment, or of putting that question at the end vs. the beginning of the test, a lot of students may just generally be primed to respond to those things. Put them in a warm, fuzzy environment, and maybe a warm, fuzzy writing assignment is meaningful to them. This is why replication is so important: It doesn't matter if you have hundreds of students in your study. If the control and experimental groups are both taught in an otherwise identical environment by the same person (which they should be), then you may have a large sample for the purpose of asking "Was this effect significant for this experiment?" but you have n=1 for the purpose of asking "Is this intervention robust and useful for the typical instructor?" That's not a criticism of the study, that's just a reinforcement of the basic scientific principle that replication is everything.
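To make that n=1 point concrete, here's a toy simulation of my own (all the numbers are made up; this is not data from any real study): hundreds of students in a single classroom pin down the effect in that classroom quite precisely, but tell you nothing about how much the effect swings from classroom to classroom.

```python
# Toy simulation of the "large sample, but n=1 classrooms" problem.
# All numbers are invented for illustration.
import random

random.seed(42)

def classroom_outcomes(n_students, classroom_effect):
    # Each student's measured improvement = this classroom's true
    # intervention effect + student-level noise.
    return [classroom_effect + random.gauss(0, 1.0) for _ in range(n_students)]

def mean(xs):
    return sum(xs) / len(xs)

# Suppose the intervention's true effect varies across classrooms
# (instructor enthusiasm, department culture, student population...):
# population-average effect 0.2, classroom-to-classroom spread 0.5.
MEAN_EFFECT, CLASSROOM_SD = 0.2, 0.5

# One big study in a single (possibly atypical) classroom:
lone_effect = random.gauss(MEAN_EFFECT, CLASSROOM_SD)
big_study = classroom_outcomes(500, lone_effect)
print(f"one classroom, n=500:    estimated effect = {mean(big_study):+.2f}")

# Twenty small replications in twenty different classrooms:
reps = [mean(classroom_outcomes(25, random.gauss(MEAN_EFFECT, CLASSROOM_SD)))
        for _ in range(20)]
print(f"20 classrooms, n=25 each: estimated effect = {mean(reps):+.2f}")
# The single big study nails down ITS classroom's effect, but can land
# far from the population average; only replication reveals the spread.
```

The single-classroom study has twenty-five times as many students, yet the twenty scattered replications are the ones that tell you whether the intervention travels.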
Also, a lot of these studies get discussed in the context of racial and gender inequities, where there are centuries or millennia of cultural baggage attached to both the students and their teachers and the entire environment surrounding them. I'm sorry, but I simply don't believe that simple little hacks can make giant dents in these problems outside of rarefied environments. I don't care how big your sample of students is: until you sample a wide variety of environments, beyond just instructors who are already hyper-sensitized, I won't believe your claim that the effects of millennia of male dominance in Western culture and centuries of racial inequality in the US can be meaningfully blunted in the classroom by reordering a few questions and assigning a short essay or whatever. Call me anti-science if you will, but that's my prior.
Mind you, I don't really object to moving the demographic questions to the end of the test, and if you want to assign an essay on values affirmation then knock yourself out. These interventions are cheap and at the very least harmless, but I am unconvinced that simply knowing and following the recommendations of This One Study can make a huge dent in timeless problems. It's too good to be true. Next you'll be telling me that real estate prices never fall.
Speaking of real estate prices, self-fulfilling prophecies are definitely a real phenomenon in social science. Economists and marketers alike know that with enough hype the price of a good can indeed rise, at least for a while. (And if you're smart enough, do you care if it later drops? After all, you already cashed in! You did cash in your chips, right? Um, right?) But even hype tends to be expensive and unpredictable. Not every ad campaign succeeds, and even the ones that do tend to cost a lot of money. An expensive ad campaign is, like, work and stuff, whereas just doing This One Thing is easy.
Wednesday, May 27, 2015
Next book: Social Perception and Social Reality by Lee Jussim
The next book that I'll be blogging is Social Perception and Social Reality by Lee Jussim. The thing that's most attractive to me about the book is that it is contrarian, something of a corrective to the pop psychology that gets the most circulation in educated circles. Spend any significant time among academics and you'll hear about famous studies of bias (not just about socially and morally significant topics like race and gender, but also about prior assumptions on smaller matters), fooling yourself, false perceptions, etc. Academics collect tidbits about bias in the same way that geeks collect Star Wars paraphernalia. All of these things are real, all of them need to be kept in mind, and having the humility to recognize that you don't always perceive accurately is important for personal growth. Moreover, from a scholarly perspective, finding the mismatches between perception and reality is surely essential to understanding the human mind (one of the fundamental goals of psychological science).
Jussim's point is that focusing solely on misperception and ignoring accurate perception would be like medical researchers who only study disease and never study how a healthy body works. It isn't good for the science, and it wouldn't be good for society if all we ever heard reported from the medical journals was "There are millions of different ways that you can get sick" and never "Exercise is healthy."
My own annoyance is that the narrative on bias, misperception, etc. is almost a bias of its own. What gets the most circulation in nerdy, educated circles is umpteen different studies on how wrong you are, followed by a suggested "life hack" or quick fix to correct it. My prior is that if we are prone to fooling ourselves, and if bias and delusion are everywhere, surely the embrace of quick fixes is itself a delusion. We probably get a lot of things right, and I suspect that fixing the things that we often get wrong is a bit harder than just doing This One Thing From An Article.
Anyway, enough of my rant. Let's dive into the book.
Tuesday, May 26, 2015
Science vs. Engineering, "This one study" vs. Practice
Any reader of this blog (assuming that I actually have readers...) can tell that I'm a curmudgeon who is skeptical of reports of Great New Ways Of Teaching. My default assumption is that teaching is largely a matter of timeless issues and problems, and that there are no easy substitutes for lots of time and attention and study, and for patient attempts to solve a problem many different ways and to understand something from many different perspectives. I believe that the acquisition of skills and knowledge and insights is difficult, time-consuming, and painful, while also having sweet rewards and attractions that keep us coming back for more. The difficulty of finding genuinely new ideas, genuine innovations, is not a new problem.
This tends to get me into a lot of arguments at lunch with colleagues. One genuinely good thing about the people whom I tend to argue with is that they are optimistic and open-minded, and since I believe in balance I believe that it is important to have their perspective around. I'm right and they're wrong and we thereby get balance :) More seriously, my reason for dissenting from their optimism and open-mindedness, even while understanding its virtues, is that I think many of our disagreements ultimately come down to the difference between science and engineering. (Bear with me for a few paragraphs before I get back to the science vs. engineering point.)
A typical lunch argument will involve somebody enthusiastically noting some recent study whose results they hope to incorporate into their own practice of teaching or mentoring in the near future (though we have also argued over studies of things like diets and lifestyle, perennial topics of news reports that begin with "According to a new study..."). I am generally skeptical that going from a small study to large adoption will yield the promised improvements. Part of my skepticism is, perhaps obviously, rooted in my own personality and prejudices, but there's more to it. Studies of new techniques for solving some problem in human interaction are almost always either small-scale experiments undertaken by teams putting a lot of effort and attention into them (notoriously vulnerable to the Hawthorne Effect), or else studies of cohorts that are larger but composed primarily of early adopters, who will tend to adopt things differently than the wider population will. When you are talking about teaching, where the human interaction is everything, the difference between an early adopter and a more cautious person is crucial to implementation.
One famous example from physics is the 1998 Hake study (journal link) (free .pdf), which compared 62 different introductory physics courses and found that the more interactive courses tended to show better improvements on the Force Concept Inventory (FCI). That basic finding has stood the test of time, and I will note that I incorporate many interactive elements into many of my courses. However, the study also found gains as high as 60% or better in some interactive courses (on a scale that is described in the study; read it for more details). My understanding from talking to people in the field of physics education research is that those gains are not terribly common today: anything above 40% is really good, and even above 30% is decent. My hypothesis for the difference is that if you were using interactive teaching techniques in the 1990's you were an enthusiastic early adopter, and maybe even willing to plow ahead in the face of resistance. If you are using interactive teaching techniques today, you may simply be following peer pressure, doing what is expected of you (in many but not all institutions), and not necessarily an enthusiast. If the human element matters in the classroom, then you either restrict your hiring to early adopter types (assuming you can find enough of them) or you look for principles and practices that have a more essential, timeless element to them, fundamentals that will work even if the instructor is not possessed of an early adopter mindset.
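For readers who don't want to dig into the paper: the scale in question is, as I understand it, Hake's average normalized gain, i.e. the fraction of the possible pre-to-post improvement that a class actually achieved. A quick sketch, with made-up example scores:

```python
def normalized_gain(pre_pct, post_pct):
    # Hake's average normalized gain <g>: the class's actual improvement
    # as a fraction of the maximum possible improvement from its pretest.
    # Scores are class-average percentages on the FCI.
    return (post_pct - pre_pct) / (100.0 - pre_pct)

# Made-up example: a class averaging 40% on the pretest and 70% on the
# posttest has realized half of its possible improvement.
print(normalized_gain(40, 70))  # 0.5, i.e. a "50% gain" in the sense above
```

Dividing by the headroom rather than reporting the raw point change is what lets courses with very different pretest scores be compared on one axis.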
What does this have to do with science vs. engineering? The purpose of a good scientific study is to generate knowledge and understanding. The purpose of engineering is to translate ideas into practice. If I went to my lunch group and said "I just read this one study and the researchers made a photovoltaic with [insert great performance numbers here]! Why don't companies just make that and sell it?" people would look at me funny. If I went to my colleagues and said "I just read this one study and the researchers successfully treated tumors in preliminary trials, so why don't all doctors just do that?" they would wonder if I have a brain tumor. The road from a scientific study showing proof of concept to something that can be adopted and work in widespread practice is a long one. In the hypotheticals that I outlined, an engineer who said "Well, it's just a problem of consumer attitudes, we don't need to examine whether this [product, practice, drug, whatever] is robust" would quickly find himself at the unemployment office. Making something robust is the essence of engineering. However, if you look at This One Study and wonder if it will scale well outside of special environments or early adopter groups or whatever, people say that you have the wrong attitude.
I have a deep respect for social science. I respect it enough to understand that the road from genuine scholarly insight to human practice in the field can be just as long as the road from laboratory physics innovation to product in your home.