OK, kids, gather around and learn some statistics. Suppose that you're well-meaning and conscientious and want to make sure that colleges are doing the best possible job in admitting students and selecting them only on the basis of meaningful variables, not meaningless garbage input. So you recommend that every college--EVERY!!!--do a study of whether its admissions criteria actually predict success, and particularly whether standardized tests predict success on their campus.
There are two problems with this. One is actually an ethical problem that gets to the heart of equity concerns: Why should predictors of success be the only factors in determining who gets an opportunity to study at a college? You might argue that predictors of success are pretty important in justifying that investment of time, money, and other resources, or you might argue that giving a chance to people who are less likely to succeed is a justifiable endeavor. We can't decide between those propositions empirically. We can use empirical studies to determine whether a test, or a GPA, or an essay, or an interview, or whatever "is" a statistically sound predictor of success, but we can't use empirical studies to decide whether opportunities "ought" to be extended only on the basis of statistically likely success. That "is"/"ought" distinction is something philosophers have discussed at least since Hume. One could determine that a given test or whatever "is" a sound predictor but that a student "ought" to be admitted in spite of a low score (or whatever) because of a commitment to opportunity. Whether you agree or disagree with that action probably depends on just how weak the student is, but resolving that conundrum is ultimately a value judgment.
But there's a second problem: Let's say that you've decided that you care about the statistical power of different predictors of success. Your "ought" questions are resolved, but answering the "is" questions is tricky. If a student gets into your college with low grades, or low test scores, or weak extracurricular accomplishments, or weak letters of recommendation, or whatever it may be, then they're probably either strong by some other measure or else they're from a rich family and your development office has a lot of internal political clout. The second issue is beyond our scope here, but the first issue is a big one. If many of the students with low test scores are strong by some other measure, and many of the students with high test scores are not as strong by some other measure, then it's hard to do apples-to-apples comparisons. If one group does better than another, you won't know whether it's because of how they differ on one measure or how they differ on another. If both groups perform similarly, you won't know whether the differences between them are actually meaningless, or whether the differences simply cancel out. If the differences are meaningless, then you shouldn't look at test scores (or whatever) at all. If the differences cancel out, then you should absolutely look at test scores, but also look at other variables.
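That selection effect is easy to see in a toy simulation (all numbers here are made up purely for illustration). Suppose test scores and "other" strengths are completely independent in the applicant pool, but admission requires the combined profile to clear a bar, so a weak test score can be offset by strength elsewhere. Among the admitted students, the two measures end up strongly negatively correlated even though they're unrelated in the pool — a version of what statisticians call Berkson's paradox, and exactly why within-campus comparisons aren't apples to apples:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Two admissions measures, independent by construction in the
# applicant pool (standardized: mean 0, sd 1).
test_score = rng.standard_normal(n)
other_measure = rng.standard_normal(n)  # essays, recommendations, etc.

# Admit anyone whose *combined* profile clears a bar: a weak test
# score can be compensated by strength on the other measure.
admitted = (test_score + other_measure) > 1.5

r_pool = np.corrcoef(test_score, other_measure)[0, 1]
r_admitted = np.corrcoef(test_score[admitted], other_measure[admitted])[0, 1]

print(f"correlation in applicant pool: {r_pool:+.2f}")    # near zero
print(f"correlation among admitted:    {r_admitted:+.2f}")  # clearly negative
```

The admission rule is the whole trick: conditioning on the sum of two independent variables makes them negatively related within the selected group.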
Now, some of you may be thinking "Wait, why are you assuming people will only be strong by one measure and not another?" I'm not assuming that everyone fits that dichotomy; I'm just assuming that the people who are weakest by one measure could only get in by compensating in some other way, while people who are average or stronger by that measure have more latitude on the other criteria. Of course there will be people who are strong by multiple measures, and looking at them will only tell you that being talented and accomplished and well-prepared by multiple measures is a good thing (but we already knew that). But there's a good chance that the people who are strongest by multiple measures will be poorly represented at your school, because they got into a more prestigious place. So the range of students that you can observe at most schools will be limited to a narrow band composed mostly (but not exclusively) of people who are decent by multiple measures but not amazing, and people who are strong on one measure and weak on another. The result is a phenomenon called "range restriction" in the statistical literature.
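Range restriction itself can be sketched in a few lines. Assume (purely for illustration — the 0.5 is a made-up number, not an estimate from real admissions data) that test scores correlate 0.5 with later success across the whole applicant pool. A school that enrolls only a narrow band of test scores sees a far smaller correlation in its own students, even though the test works fine pool-wide:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Assumed pool-wide correlation of 0.5 between test score and later
# success; the noise term is scaled so success also has variance 1.
test = rng.standard_normal(n)
success = 0.5 * test + np.sqrt(1 - 0.5**2) * rng.standard_normal(n)

# A school that only enrolls a narrow band of test scores observes a
# restricted range of the predictor.
enrolled = (test > 0.5) & (test < 1.5)

r_full = np.corrcoef(test, success)[0, 1]
r_restricted = np.corrcoef(test[enrolled], success[enrolled])[0, 1]

print(f"correlation in full pool:     {r_full:.2f}")
print(f"correlation in enrolled band: {r_restricted:.2f}")  # much smaller
```

Nothing about the test changed between the two numbers; the school just can't see most of the variation that makes the test informative.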
There are ways to attempt to correct for range restriction, but those techniques work best if you have a decent understanding of the relationship between your sample and the wider pool. The best approach is to look at a wider pool that includes not only your students but also the people who scored worse (or better) than most of your students and went elsewhere. This is why each school shouldn't attempt its own DIY social science; instead, people should look at the wider literature and at larger studies.
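One standard such technique is Thorndike's Case II correction for direct range restriction. A minimal sketch, with made-up illustrative numbers — and note that the key input, the ratio of the predictor's standard deviation in the wider pool to its standard deviation in your sample, is precisely what a single school can't estimate from its own students alone:

```python
import math

def correct_for_range_restriction(r_restricted: float, sd_ratio: float) -> float:
    """Thorndike's Case II correction for direct range restriction.

    r_restricted: correlation observed in the restricted (enrolled) sample.
    sd_ratio: SD of the predictor in the wider pool divided by its SD in
              the restricted sample -- this must come from data on the
              wider applicant pool, which one school rarely has.
    """
    r, u = r_restricted, sd_ratio
    return (r * u) / math.sqrt(1 + r * r * (u * u - 1))

# Illustrative only: a correlation of 0.16 observed in a narrow enrolled
# band, where the pool's SD on the test is 3.6x the enrolled sample's SD.
print(correct_for_range_restriction(0.16, 3.6))  # ~0.5
```

The correction can recover a substantial pool-wide correlation from a weak-looking within-school one, but it is only as good as that SD ratio — which is the statistical version of the point above: you need the wider pool, so rely on larger studies rather than each campus's DIY analysis.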