Physicist at Large: 4/1/18

Saturday, April 7, 2018

Judas Iscariot, Administrator

Last weekend I saw Jesus Christ Superstar for the first time. I really, really liked it. But then, as I was singing Judas' big musical number "Superstar" to myself, I realized that the lyrics can be interpreted as the words of a university administrator. To wit:

Every time I look at you I don't understand,
Why you let the things you did get so out of hand,
You'd have managed better if you'd had it planned,

Translation: "You need to engage in strategic planning."

Why'd you choose such a backwards time and such a strange land?

Academics love to talk about how horrifying it would be to live in a small town or non-coastal city.

If you'd come today you could have reached a whole nation,
Israel in 4 B.C. had no mass communication.

Translation: "We could solve all of our problems if we jumped onto the latest digital fads and offered more online classes."

Don't you get me wrong! Don't you get me wrong!
Don't you get me wrong! Don't you get me wrong!
I only wanna know! I only wanna know!
I only wanna know! I only wanna know!

Translation: "Please submit reports. We need to know that you are spending your time on teaching, not wasting your time on irrelevant busywork."

Jesus Christ, Jesus Christ,
Who are you? What have you sacrificed?

Translation: "In this time of tightened budgets (for everything except ballooning administrative ranks), we need to see evidence of shared sacrifice. Are you doing more with less? Can you submit a report on that?"

Jesus Christ, Superstar,
Do you think you're what they say you are?

Translation: "We've reviewed your student evaluations. Please submit a self-reflection in response, so that we can put it in your file."

Tell me what you think about your friends at the top,
Who'd you think (besides yourself) was the pick of the crop?

Translation: "These performance reviews needed to be done yesterday. We'll need you to work on them. Please fill out the rubrics I'm attaching."

Buddha: Was he where it's at? Is he where you are?
Could Mohammed move a mountain? Or was that just PR?

Translation: "We are pleased to announce our new diversity initiative."

Did you mean to die like that? Was that a mistake?

Translation: "Could it be that your problems are a result of lack of proper strategic planning on your part?"

Or did you know your messy death would be a record breaker?

Translation: "Or could it be that your budgetary problems were a deliberate sabotage that you'd use to demonstrate need when requesting more resources?"

On the other hand, it seems that Judas' thirty pieces of silver were not worth a whole lot, whereas university administrators make far more money than the people who engage in such trivial tasks as teaching and research.

Measuring Success, Chapter 9

This chapter, by Rebecca Zwick (researcher at ETS and professor emerita in the education school at UC Santa Barbara, aka U Can Study Buzzed, aka my beloved alma mater), looks at the tangible outcomes from test-optional and top-percentile admissions. "Top percentile" admissions let in anyone who graduates in the top X% of a public high school class. This has certain obvious virtues (anyone who makes it to the top must be more driven than most around them), it has the potential to increase diversity (the top students in a poor, minority neighborhood get in on the same footing as the top students in a rich, white school), and it does so in a race-blind manner. Anyway, Zwick looks at the data, and here are a few take-aways:

1) These admissions policies don't seem to hurt graduation rates or college grades much, if at all. This is consistent with the finding in earlier chapters that kids with high grades but low scores (and we can reasonably assume that such kids are common among those who don't report scores or get in because they graduated at the top of a school in a disadvantaged neighborhood) do pretty well.

2) On the other hand, there is still sorting: Students who don't report test scores tend not to major in STEM. That isn't a bad thing, IMHO. If a kid has a great art portfolio then they should go to college and major in art, regardless of what their SAT math score is. OTOH, that kid probably shouldn't do physics if their SAT math score is abysmal. The previous sentence is only disparaging if you place physics on a pedestal that towers over the arts, and I don't place it on such a pedestal.

Anyway, these findings are reassuring, because there's something of a cottage industry in newspaper articles about minority kids who do well in a non-challenging high school but then flounder at a flagship. Such kids surely exist, and definitely deserve some compassionate counseling on alternatives, but they are apparently not a major factor in the big picture, which means they are not a major impediment to diversifying large cohorts.

3) The gains for diversity are nowhere near what people were hoping. When you go test-optional you have to look at resumes and essays and letters, all of which are at least as susceptible to manipulation and response to class and culture as anything on the SAT.

4) In a humorous aside, regarding the legality of affirmative action and alternatives to affirmative action, the author quotes Justice Ruth Bader Ginsburg as saying that "only an ostrich" would perceive top-percent admissions plans as race-neutral. Whether or not the effects actually match the intent, they are designed and scrutinized in a discussion about race, with everyone hoping to achieve a diverse outcome without mandating a diverse outcome. Say what you will for or against such agendas, but I admire Ginsburg's rhetorical flourish.

Thursday, April 5, 2018

Measuring Success, Chapter 8

This is an interesting chapter. Most of it is actually a republished chapter from Crossing the Finish Line, a 2011 book by Bowen, Chingos, and McPherson. They used large data sets from 21 flagships and 4 state-wide systems to look at a very large cohort of students who started college in 1999, and see what predicted success. They found that high school grades were more important than test scores, though test scores still have some predictive value. This makes sense to me: The ability to succeed at sustained tasks is distinct from (though not wholly unrelated to) the ability to do well on a test. Also, a person who isn't strong at the things measured on tests might still find areas where they can succeed, while a person who can't devote themselves to regular academic work will have trouble succeeding at anything, even if they have certain mental traits.

After that portion of the chapter, some researchers at the College Board re-run that analysis with more recent data, and find similar trends, though the predictive power of grades has gone down while the predictive power of tests has gone up. This is consistent with a hypothesis of grade inflation (per an earlier chapter).

Monday, April 2, 2018

Measuring Success, Chapters 4 and 5

Chapter 4 was a case study from one school that used merit scholarships to boost enrollment by good students, and included SAT scores in the mix. They focused more on enrollment than performance after enrollment, so I ignored it.

Chapter 5 is on "discrepant" students. We've all known This One Person who got bad test scores and good grades, and This One Other Person who got good scores and bad grades. The single most important thing any academic can consider in any conversation on this topic is whether a new rule would be fair to This One Person and This One Other Person. But for the bad people (like me) who want to look beyond This One Person, chapter 5 looks at college performance on the large scale, not just the anecdotes. And it appears that the people with better high school grades than scores have outcomes that are almost as good as those of people with similar grades.

On the surface, this would seem to undermine the case for using test scores. However, these people have scores that are SUBSTANTIALLY worse than their high school grades might suggest. If you just look at people with test scores that are within the normal range for people with those grades, differences in performance between people with similar grades but different scores (albeit not outrageously different scores) are indeed correlated with scores.

Interestingly, the people with substantially better grades than scores tend to be women and minorities. Again, at first glance this might seem to lead to a slam-dunk case against tests, but (1) we're talking about the outliers, not the people in the normal band (i.e. there are still plenty of women and minorities whose test scores are not discrepant, and the test scores continue to have predictive power for them) and (2) while women and minorities are more likely to have test scores that are substantially worse than high school grades than the other way around, the situation seems to reverse if we look at college grades for women and minorities in the "normal" (non-discrepant) band of grades and scores. So, complicated things are complicated.

On the other hand, people with poor grades but good scores do somewhat worse than people with similar scores (not surprising), because "smart but lazy" is a thing. (On the other hand, sometimes people with bad scores deliberately take easy classes, and sometimes people with good scores deliberately take hard classes. Complicated things are complicated.) Again, on the surface this would suggest that only grades matter, but we're talking about outliers. If you look at people who are closer to the normal range, differences in scores still have predictive power.

Interestingly, there's some evidence that people with better scores than grades tend to go into harder majors than people with better grades than scores, and this confounds some of the analysis. This is not surprising to me; STEM does have some epically smart but lazy people. (Yes, I'm sure that somewhere out there is a supremely lazy literature major with a perfect SAT score and horrible grades, but those people are more commonly STEM majors.)

My main take-aways are:

1) Test scores matter but they aren't the ONLY things that matter. (Duh.) This point has been agreed on by just about everyone who's ever suggested using test scores for decisions. Yes, I'm sure that somewhere out there is a literal straw man who has suggested eliminating grades from consideration and ONLY looking at scores, but that guy (and you just know it's a guy) is ignored by everyone else.

2) For the discrepant students at the extremes, grades tell us more than scores do. We should probably err on the side of rewarding work, while not completely ignoring tests.

Measuring success, chapters 2 and 3

Chapter 2 is a very detailed breakdown of data on college grades (freshman and 4-year) and college completion rates versus SAT score and high school GPA. The message from many analyses is clear: In every band of high school GPAs the SAT has real predictive power for performance in college, by multiple measures.

Of course, one cannot discuss this without discussing equity, so they make the same point as the previous chapter: The SAT actually over-predicts college performance for minorities (because performance is also affected by disadvantages that are NOT fully captured in scores). More importantly, they cite a different study than the previous chapter cited. This gives me somewhat greater confidence about the point, when multiple investigators can cite a plethora of studies rather than everyone rallying around the same study. We should always be suspicious of narratives built around This One Study.

Chapter 3 has one big point: Grade inflation is real, and it's more prominent at schools serving affluent and white kids than schools serving poor kids from disadvantaged minority groups. Consequently, if you base admissions decisions on grades rather than test scores you won't actually accomplish your equity goals. Their main evidence for inflation is that they look at how high school grades have trended upward (in many but not all schools) for students in comparable bands of SAT scores. Since the College Board does a lot of work to try to make SAT scores comparable across years, this strongly suggests that grading is getting more lenient (in some but not all schools). And the schools at which grading is getting more lenient (based on this line of analysis) are whiter and more affluent than the schools with less grade inflation.
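The bookkeeping behind that argument is simple enough to sketch. The snippet below is only an illustration of the idea of comparing GPAs within fixed SAT bands across years. It is not the authors' analysis; the records, the two years, the 200-point band width, and the helper names (`sat_band`, `mean_gpa_by_band_and_year`, `gpa_drift`) are all invented for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (year, SAT total, high school GPA) records.
# These numbers are made up purely to show the bookkeeping.
records = [
    (1998, 1010, 3.0), (1998, 1050, 3.2), (1998, 1290, 3.6), (1998, 1310, 3.6),
    (2016, 1020, 3.3), (2016, 1040, 3.5), (2016, 1280, 3.8), (2016, 1330, 3.9),
]

def sat_band(score, width=200):
    """Return the (low, high) SAT band containing this score."""
    low = (score // width) * width
    return (low, low + width - 1)

def mean_gpa_by_band_and_year(rows):
    """Average GPA for each (SAT band, year) cell."""
    buckets = defaultdict(list)
    for year, sat, gpa in rows:
        buckets[(sat_band(sat), year)].append(gpa)
    return {key: mean(values) for key, values in buckets.items()}

def gpa_drift(table, first_year, last_year):
    """Change in mean GPA within each SAT band between two years."""
    drift = {}
    for band, year in table:
        if year == first_year and (band, last_year) in table:
            drift[band] = table[(band, last_year)] - table[(band, first_year)]
    return drift

drift = gpa_drift(mean_gpa_by_band_and_year(records), 1998, 2016)
for band in sorted(drift):
    print(f"SAT {band[0]}-{band[1]}: mean GPA change {drift[band]:+.2f}")
```

If SAT scores are comparable across years, a positive drift within a band means that students with the same measured preparation are getting higher grades. Repeating the tally separately for groups of schools would be the natural next step for asking where the drift is largest.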

Not having examined the data myself, I am obviously not in a position to weigh in on the validity of this work, but if other authors have found similar things in multiple independent analyses then we should seriously consider the implications.

Sunday, April 1, 2018

Next book: Measuring Success

The next book that I'll blog about is Measuring Success, an edited volume with 11 chapters, 3 editors, and 26 contributors. The book is about the predictive validity of standardized tests in college admissions. This book poses something of a dilemma for me. On the one hand, it is rich in data and citations to the peer-reviewed literature. On the other hand, the editors include Lynn Letukas and Ben Wildavsky from the College Board (the organization that produces the SAT) and someone from a research center that includes the College Board on its client list.

I think the proper response is to take this with a big grain of salt. Nothing here should be taken as a priori truth, but it can be taken with a suitable dose of "Assuming that similar results are found in multiple, independent investigations..."

The first chapter is probably the least rigorous, simply because it's an introductory chapter titled "Eight Myths about Standardized Admissions Testing", hence several points are touched on briefly (though many of the same themes are visited in more depth in subsequent chapters). The putative myths are:

1) Standardized tests are very poor predictors of freshman grades: The authors concede that weak correlations are found if you look only at the set of admitted students, but if you correct for the fact that the proper comparison includes the students who were not admitted but nonetheless went to college elsewhere (or were admitted but chose other schools) the correlations improve. This is a point that I've made before and with quantitative detail. Moreover, students with weaker preparation often choose different majors than students with stronger preparation, so their grades might not be comparable. But the authors include data showing that when you look at a wider pool and control for common college curricula and also high school GPA, the correlation between SAT score and college grades improves considerably, reaching 0.8 (versus 0.35 in poorly-controlled studies).

2) Tests do not predict anything beyond grades: The authors show data indicating that students with higher test scores take more advanced college classes than those with lower scores, and enter majors that reflect their higher test scores (e.g. verbal versus math).

3) Alternatives to testing are superior: The authors reference work on various alternative measures, and show that often the sample sizes are small and correlations are weak. Moreover, even if there are superior sources of data for admissions decisions, I'm not sure why one would ONLY use one source of information. Why not build a multi-variate model? And since tests are not as vulnerable to the subjective biases of raters (e.g. interviewers, letter writers, essay readers) the claim of superiority over tests seems to be an extraordinary one, requiring extraordinary evidence. (Or, at a minimum, a very careful articulation of what could count as "superiority"--is lack of bias not one of the desiderata?)

4) Tests do not measure anything relevant to life after college: Here the authors cite correlations between test scores and quality of graduate work as evaluated by faculty (i.e. evaluations of the quality and quantity of research output) as well as work performance after graduate study. However, this is a weak point, because it is focused on the Miller Analogies Test as an admissions test for students preparing to work as counselors, rather than more widely-used tests for undergraduate admissions (e.g. SAT, ACT) or graduate admissions (the GRE is the main game in town here).

Still, the authors are psychologists, so counseling programs would seem to be near to their hearts. I'll give them one fumble here.

5) Beyond a threshold tests have little predictive power: In other words, this is an argument that above a threshold you can't use tests to distinguish decent from great performance. And, of course, it is true that tests can't do that with perfect predictive power. However, the authors cite evidence from large studies (6,656 students in one study, 15,040 in another, and 150,000 in the third) that tests have non-trivial predictive power even at the upper end of the talent pool. Intuitively this makes sense: If test scores only had threshold predictive power then the correlations under point 1 would probably not be as large.

6) Tests only measure socioeconomic status: This one was an eye-opener. They show mean SAT score varying from 1300 (on a 2400 scale) for the lowest income bracket (<$20,000/year household income) to 1700 for the highest bracket (more than $200,000/year household income). That variation isn't trivial, but it is also hardly enough to generate the correlations seen earlier, especially when you take into account the wide variation within brackets. More importantly, even when controlling for family income, the predictive power of SAT scores remains quite strong.

7) Tests are biased: Here the authors are careful to unpack what "bias" means. If a test is biased against group X and favorable to group Y, then if we take a bunch of students with the same test score and look at their college performance, the X students should do better than the Y students. In other words, such an outcome would tell us that X students are doing better than their score would predict while Y students are doing worse, so admitting based on the test gives X students a disadvantage (they're being treated the same as weaker Y students). However, SAT scores slightly over-predict college grades for minority students. The over-prediction makes sense to me, since disadvantage is multi-faceted, and there are aspects of it that cannot be fully captured by family income. If disadvantage matters and is related to ethnicity then I would expect minority students with a given family income and same academic preparation to fare slightly worse (on average) because they face burdens that otherwise-similar white students do not face.

8) Coaching produces large gains: The authors show data suggesting that gains from test prep and coaching are over-stated. This makes sense to me on a few levels. First, the students who avail themselves of test prep include a substantial pool of students who did poorly on their first try, and regression to the mean is surely a factor here. Second, this pool includes kids who did not make the minimal effort to familiarize themselves with the test beforehand. A bit of effort to get familiar with the task at hand is a modest task, one that does not require expensive tutors, but expensive tutors will nonetheless be happy to collect a fee for helping one with that modest task. There are no doubt gains from minimal due diligence, and gains from re-testing after some coaching may in part reflect that modest due diligence. The open question is whether those gains reflect much beyond that, i.e. reflect things that a kid couldn't do without substantial resources.

Besides, if shilling is a concern, then taking claims about the value of coaching at face value amounts to trusting marketing materials from Kaplan, Princeton Review, etc. That's a dubious thing.
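The regression-to-the-mean point under myth 8 can be checked with a toy simulation. This is a sketch under invented assumptions, not anything from the book: each observed score is a stable "ability" plus independent test-day noise, the means, spreads, and the 900-point retake cutoff are made up, and there is no coaching effect anywhere in the model.

```python
import random
from statistics import mean

def simulate_retest(n_students=20000, noise_sd=60.0, cutoff=900.0, seed=1):
    """Toy model: observed score = stable ability + independent test-day noise.

    Every number here is invented for illustration. Nobody in this model
    is coached; the second sitting differs from the first only by luck.
    """
    rng = random.Random(seed)
    gains = []
    for _ in range(n_students):
        ability = rng.gauss(1000.0, 100.0)
        first = ability + rng.gauss(0.0, noise_sd)
        second = ability + rng.gauss(0.0, noise_sd)
        # Only students with a disappointing first score "sign up" to retake.
        if first < cutoff:
            gains.append(second - first)
    return mean(gains), len(gains)

avg_gain, n_retakers = simulate_retest()
print(f"{n_retakers} low scorers retook the test; average gain {avg_gain:.1f} points")
```

Selecting people because their first score was low guarantees that, on average, bad luck is over-represented in that first score, so the retest average goes up even with zero coaching. Any claimed coaching gain has to be measured against that baseline, not against zero.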

Now, that said, all of the points made here are worthy of follow-up. The subsequent chapters have more in-depth analysis that we need to examine. The first chapter is suggestive motivation but hardly conclusive evidence.
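One of those points, the range-restriction argument under myth 1, is easy to explore with a toy simulation. The sketch below rests on loudly stated assumptions that are not from the book: standardized scores and grades are drawn with a true correlation of 0.6, and "admission" simply takes the top 20% by score, which is a much cruder selection rule than any real admissions office uses.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate_range_restriction(n=20000, rho=0.6, admit_fraction=0.2, seed=2):
    """Compare the score-grade correlation in the full pool and among admits.

    rho and admit_fraction are invented for illustration.
    """
    rng = random.Random(seed)
    scores, grades = [], []
    for _ in range(n):
        s = rng.gauss(0.0, 1.0)
        # Build grades so that their population correlation with scores is rho.
        g = rho * s + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        scores.append(s)
        grades.append(g)
    full_r = pearson(scores, grades)
    # Admit only the top slice of scorers, then recompute the correlation.
    cutoff = sorted(scores)[int(n * (1.0 - admit_fraction))]
    admitted = [(s, g) for s, g in zip(scores, grades) if s >= cutoff]
    restricted_r = pearson([s for s, _ in admitted], [g for _, g in admitted])
    return full_r, restricted_r

full_r, restricted_r = simulate_range_restriction()
print(f"full pool r = {full_r:.2f}, admitted students only r = {restricted_r:.2f}")
```

The correlation among admitted students comes out well below the correlation in the full pool, even though the underlying relationship is identical for everyone. That is the sense in which a weak correlation among enrolled students understates what the test would say about the whole applicant pool.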
