NRC: Nonsensical Ranking Clowns

As those of you in American academia have probably heard by now, this week the National Research Council finally released its 2005 rankings of American PhD programs, only five years behind schedule.  This time, the rankings have been made 80% more scientific by the addition of error bars.  Among the startling findings:

  • In electrical and computer engineering, UCLA and Purdue are ahead of Carnegie Mellon.
  • In computer science, UNC Chapel Hill is ahead of the University of Washington.
  • In statistics, Iowa State is ahead of Berkeley.

However, before you base any major decisions on these findings, you should know that a few … irregularities have emerged in the data used to generate them.

  • According to the NRC data set, 0% of graduates of the University of Washington’s Computer Science and Engineering Department had “academic plans” for 2001-2005.  (In reality, 40% of their graduates took faculty positions during that period.)  NRC also reports that UW CSE has 91 faculty (the real number is about 40).  Most of the illusory “faculty,” it turned out, were industrial colleagues who don’t supervise students, and who thereby drastically and artificially brought down the average number of students supervised.  See here and here for more from UW itself.
  • According to the NRC, 0% of MIT electrical engineering faculty engage in interdisciplinary work.  NRC also reports that 24.62% of MIT computer science PhDs found academic employment; the actual number is twice that (49%).
  • The more foreign PhD students a department had, the higher it scored.  This had the strange effect that the top departments were punished for managing to recruit more domestic students, who are the ones in much shorter supply these days.
  • The complicated regression analysis used to generate the scoring formula led to the percentage of female faculty in a given department actually counting against that department’s reputation score (!).  See the toy sketch just after this list for how that kind of sign flip can happen.
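
To see concretely how a regression-derived weight can behave so counterintuitively, here’s a toy illustration in Python (made-up data and variable names, not the NRC’s actual model or inputs): once predictors are correlated with one another, the sign of a variable’s weight in the joint regression need not match the sign of its simple correlation with the outcome.

    import numpy as np

    # Toy illustration only (NOT the NRC's model or data): a regression-derived
    # weight can come out negative even when the variable, viewed on its own,
    # correlates positively with the outcome.
    rng = np.random.default_rng(0)
    n = 500
    pubs = rng.normal(size=n)                                   # hypothetical "publications" factor
    pct_female = 0.8 * pubs + rng.normal(scale=0.6, size=n)     # correlated with pubs in this toy
    reputation = 1.0 * pubs - 0.3 * pct_female + rng.normal(scale=0.3, size=n)

    # Marginal view: pct_female looks good for reputation...
    print(np.corrcoef(pct_female, reputation)[0, 1])            # positive

    # ...but the joint regression assigns it a negative weight.
    X = np.column_stack([np.ones(n), pubs, pct_female])
    coef, *_ = np.linalg.lstsq(X, reputation, rcond=None)
    print(coef)                                                 # roughly [0, 1.0, -0.3]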

Ever since the NRC data were released from the parallel universe in which they were gathered, bloggers have been having a field day with them—see for example Dave Bacon and Peter Woit, and especially Sariel Har-Peled’s Computer Science Deranker (which ranks CS departments by a combined formula, consisting of 0% of the NRC scores and 100% a random permutation of departments).
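
For the curious, that “combined formula” is simple enough that a toy mimic fits in a few lines of Python (my own sketch with placeholder department names and scores, not Sariel’s actual code):

    import random

    # 0% weight on the NRC scores, 100% weight on a random permutation.
    nrc_scores = {"Dept A": 4.7, "Dept B": 3.9, "Dept C": 4.2}   # placeholder scores

    departments = list(nrc_scores)
    random.shuffle(departments)                                  # the 100% component
    combined = {d: 0.0 * nrc_scores[d] + 1.0 * (i + 1)           # the 0% component, for completeness
                for i, d in enumerate(departments)}

    for dept, score in sorted(combined.items(), key=lambda kv: kv[1]):
        print(int(score), dept)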

Yet despite the fact that many MIT departments (for some reason not CS) took a drubbing, I actually heard some of my colleagues defend the rankings, on the following grounds:

  1. A committee of good people put a lot of hard work into generating them.
  2. The NRC is a prestigious body that can’t be dismissed out of hand.
  3. Now that the rankings are out, everyone should just be quiet and deal with them.

But while the Forces of Doofosity usually win, my guess is that they’re going to lose this round.  Deans and department heads—and even the Computing Research Association—have been livid enough about the NRC rankings that they’ve denounced them with unusual candor, and the rankings have already been thoroughly eviscerated elsewhere on the web.

Look: if I really needed to know what (say) the best-regarded PhD programs in computer science were, I could post my question to a site like MathOverflow—and in the half hour before the question was closed for being off-topic, I’d get vastly more reliable answers than the ones the NRC took fifteen years and more than four million dollars to generate.

So the interesting questions here have nothing to do with the “rankings” themselves, and everything to do with the process and organization that produced them.  How does Charlotte Kuh, study director of the NRC’s Assessment of Research Doctorate Programs, defend the study against what now looks like overwhelming evidence of Three-Stooges-level incompetence?  How will the NRC recover from this massive embarrassment, and in what form should it continue to exist?

The NRC, as I had to look up, is an outfit jointly overseen by the National Academy of Sciences (NAS), the National Academy of Engineering (NAE), and the Institute of Medicine (IOM).  Which reminded me of the celebrated story about Richard Feynman resigning his membership in the NAS.  When asked why, Feynman explained that, when he was in high school, there was an “honor club” whose only significant activity was debating who was worthy of joining the honor club. After years in the NAS, he decided it was no different.

Now that I write that, though, an alternative explanation for the hilarious problems with the NRC study occurs to me.  The alternative theory was inspired by this striking sentence from an Inside Higher Ed article:

When one of the reporters on a telephone briefing about the rankings asked Ostriker [the chairman of the NRC project committee] and his fellow panelists if any of them would “defend the rankings,” none did so.

So, were these joke rankings an elaborate ruse by the NRC, meant to discredit the whole idea of a strict linear order on departments and universities?  If so, then I applaud the NRC for its deviousness and ingenuity in performing a much-needed public service.

29 Responses to “NRC: Nonsensical Ranking Clowns”

  1. Jay Gischer Says:

    So many things during the Bush administration got delayed or messed up due to some sort of political meddling that I can’t help but wonder if that’s at work here. And that administration was really unfriendly to science as well.

    Or maybe it’s a funding issue, or some bureaucratic eddy current left over.

  2. Xamuel Says:

    Very Serious People ™ publish study with no bearing on reality, news at 11 😉

  3. Scott Says:

    Xamuel: My first reaction was to ignore it too! But then everyone around me kept bringing it up and discussing it so seriously and respectfully and saying “yes, yes, we understand the issues, but you can’t just say outright that the emperor is naked” and oh my god just make it stop make it go away…

    Which, now that I think about it, seems to be a recurring theme on this blog. 🙂

  4. Mark Reitblatt Says:

    Thank you for posting this. The only appropriate action by any university, college or department (especially those highly ranked in this farce) is to publicly repudiate these rankings and refuse to use them in any capacity. I’m ashamed to say that my graduate school has already put them on its homepage. I’ve already sent an email (snap email, not as well composed as it should have been) to my dean asking the College of Engineering to consider taking a stance against the rankings. Or at least to avoid promoting them.

    The culture of trying to fit everything into a linear ranking is bad enough. We don’t need utterly flawed and stunningly poor numbers running around making everything worse.

  5. Jules Says:

    Deans and department chairs want to kill off any ranking system, because they do not want prospective graduate students to get this data. I hope that they are not successful now, because you can be sure they won’t put anything in its place.

    For example, UW does not report any of the NRC-requested data on its website. If you email them for it, they will not respond. Now they complain that their faculty count is wrong when they themselves inflated the count. Cry me a river.

    As to MIT, I don’t know what “found academic employment” means precisely. What counts is how many PhDs found tenure-track positions. I doubt this is 50%.

  6. Scott Says:

    Jules, I agree that it couldn’t hurt for departments to publish more data, but the idea that UW would purposefully inflate its faculty count from 40 to 91 (when that could only hurt them in the rankings) is silly on its face. Many departments (including mine) independently reported that the instructions from the NRC were complicated and confusing; the simplest explanation is that whoever filled out the form at UW just misunderstood what they were supposed to list.

    But all this misses the larger point: it’s NRC’s responsibility to apply basic sanity checks to the data they’re basing their rankings on—something they manifestly failed to do. Indeed, even when departments pointed out glaring errors to them, they changed the data reluctantly or not at all.

    By analogy, imagine the director of a $20 million cancer study announcing that a new drug under investigation cured cancer in (-50+2i)% of patients. When the audience bursts into laughter, the director exclaims, “but -50+2i is what the computer said! the patients must’ve filled out their forms wrong! it’s not our fault!”

  7. Jules Says:

    It was really just some administrator, and who knows what their motivation was or if it was a mistake? I don’t know how much NRC can do, when UW and most other departments are reluctant to release the data they need.

    Obama has been disappointing in many ways, but are you going to vote for Republicans this November? Right now we don’t have any alternatives to the NRC, imperfect as it is, and its opponents have no interest in constructing anything.

  8. Scott Says:

    Jules: Don’t worry, no plans to vote Republican in November! 🙂 As a registered Democrat, I’m as sensitive as anyone to the “sure it sucks, but what’s the alternative?” argument. However, unlike you, I simply don’t see the problem of creating “definitive rankings” of departments as one that needs solving, even supposing it were solvable (which it isn’t). If you want to know which departments are well-regarded in your area, just ask a decent sampling of students or anyone else who’s active in the field, and they’ll tell you. Or find some discussion boards on the web. Or even US News and World Report would be better than the NRC farce.

    To illustrate, let me answer the question of where the main centers for North American CS theory are, in a way that almost anyone active in the field would mostly agree with.

    The three biggest hubs for “general theory” in North America right now are Berkeley, MIT/Harvard/MSR, and Princeton/IAS/surroundings. Many other places are extremely strong in various parts of theory: for example, Cornell (esp. for science of networks), U. of Washington (+MSR nearby), CMU, Georgia Tech, Chicago/TTI, U. of Toronto, Stanford (esp. now that it has Luca Trevisan), Waterloo and Caltech (esp. for quantum computing), UT Austin, NYU, Columbia, Yale, and no doubt others that I’m forgetting.

    Having said that, if you’re a prospective grad student, the best advice I can give is to place as little weight on overall “reputation” as you can (for all of us, that will of course be a nonzero amount), and to look mainly for the place that has the specific people doing the things that most excite you.

  9. anon Says:

    This part made me giggle:

    “…0% of MIT electrical engineering faculty engage in interdisciplinary work.”

  10. Paul Carpenter Says:

    “look mainly for the place that has the specific people doing the things that most excite you.”

    Of course, a resource for working that out would be helpful as well.

  11. Jay Says:

    It is known that the incompetent are unaware that they are incompetent. They would be the most likely to use and trust this list, relieving CMU, MIT, UCB, etc. of the burden of dealing with these wunderkinds. Yah?

    Incompetent study article: http://www.nytimes.com/library/national/science/health/011800hth-behavior-incompetents.html

  12. Anonymous Says:

    The raw data is more important than the rankings, but the rankings give impetus for collecting the data.

    The rankings also have more applications than you or I might imagine. Prospective graduate students can easily figure out the top departments. However, it is harder to get information on schools lower in the rankings, and also on schools outside your expertise and network. If I wanted to get a handle on the approximate quality of the various physics departments my niece is looking at, I wouldn’t know where to look.

    Despite its missteps, the NRC fills a necessary role.

  13. Michael Mitzenmacher Says:

    Hi Scott. I agree with your opinion on the rankings, and have been surprised by how much people have been talking about them in a way that suggests that, even though they know they’re bogus, they care. Since I’m now “Area Dean” I realize I’m supposed to care about these things, but it’s hard to work up the energy when everyone knows the methodology (at least in terms of the data-gathering) was fundamentally flawed. I’d rather worry about how to make CS at Harvard actually concretely better than about what this set of ratings says.

  14. Scott Says:

    Hi Michael, I accept that people (prospective grad students, for example) can have a legitimate use for reliable information about the strengths and weaknesses of various departments. However, I think far more useful than the (quasi-)linear rankings of NRC and US News would be more “textured” information, like the following:

    1. Waterloo is arguably the most exciting place on earth for quantum information science (and is also fantastic in computational geometry, applied cryptography, and several other areas), but has essentially nothing in mainstream complexity theory (extractors, derandomization, etc.).

    2. Harvard’s theory group might be small, but it has some amazing powerhouses—like Valiant, Vadhan, and especially Mitzenmacher—and also benefits from being in one of the “theory hubs” of North America.

    I’d also add that, if you do want to play the parlor game of ranking departments, then at least don’t do it the NRC way, by spending years of labor and 4 million dollars on a labyrinthine regression analysis of garbage data.

  15. Michael Mitzenmacher Says:

    Totally agree Scott. With point 1, clearly a problem with “rankings” is that they tend to obscure textured information of the type you’re talking about. I don’t blame rating agencies (of any type) for not including such detailed textured information, but would hope to make clear to any student that that’s exactly the sort of information they should be finding out (and that individual universities should be aiming to provide).

    With point 2, obviously I agree with your assessment of our theory group. 🙂

    I wonder whether Sariel can get a grant for his ranking system. I’d bet he’d keep it updated continuously for the next n years for a mere half million or so.

  16. Raoul Ohio Says:

    This is a particularly ludicrous page in the steady march of what you might call “bean counters demanding that the unmeasurable be measured”.

    It is surely futile to fight, but it sure is fun to give it a shot once in a while. I fondly recall the first time, maybe around 1980, that I had to fill out a form listing how many hours I spent on each activity each week for a university administrator. I put down one hour/week for something like “filling out dumb forms for moron administrators”. My chairman subsequently requested that I not do that again, but remarked that he wished he had also said it.

  17. anonymous Says:

    One clear advantage of going to a highly ranked department is the quality of the other graduate students. You’ll both learn a lot from them, and they’ll be a source of academic contacts down the road.

  18. John Sidles Says:

    “Rage against Doofosity” is of course a generic good idea; “Take action against Doofosity” is an even better idea … but very difficult to put into practice (obviously).

    A strikingly effective collective action against doofosity is evident in the journal Post-Autistic Economics Review — recently renamed Real-World Economics Review — which a web search will easily find.

    It’s great fun — and a well-posed intellectual challenge too — to contemplate the on-line historical essay “A Brief History of the Post-Autistic Economics Movement” (www.paecon.net) with a view toward imagining how a post-autistic TCS movement might arise, and in particular, imagining what kinds of research a post-autistic journal Real-World Computer Science Review might publish.

    Because this exercise demands considerable mental elasticity, it is helpful to warm up by supposing that the NRC and Shtetl Optimized (henceforth SI) have jointly articulated a Great Truth in asserting “The three biggest hubs for ‘general theory’ in North America right now are Berkeley, MIT/ Harvard/ MSR, and Princeton/ IAS/ surroundings.” Here the assertion is from SI; the NRC analysis broadly supports it; and despite squabbles over minor points, it is evident that SI and the NRC broadly agree.

    Now, if the SI/NRC “three hub” assertion is a Great Truth, its opposite must also be a Great Truth, and we are therefore led to consider the merits (if any) of ranking the SI/NRC’s “three hubs” as being more harmful than any other institutions to the practice of TCS.

    One rationale for a lowest-possible “three hub” ranking is prima facie strong: the STEM enterprise in North America (including TCS) has been broadly stagnant for two generations; and the elite institutions which lead this enterprise must bear responsibility for its stagnation. And from this starting point, it is straightforward to repurpose the tropes of post-autistic economics to post-autistic TCS.

    My own view is that in these matters it is highly desirable that everyone not think alike. If we suppose that some folks shall respect the mainstream of neoclassical economics and/or the mainstream of TCS — as defined de facto by the brand of TCS practiced at the SI/NRC’s “three hubs” — and yet other folks regard this same mainstream TCS as the hidebound worship of dogmatized Shibboleths by academic Boeotians … well … as Mark Twain said … “It is difference of opinion that makes horse races.” And surely, everyone enjoys a horse race!

    So perhaps it would be a good thing if TCS/QIT had more horse races — aka, less worship of Shibboleths by Boeotians? — following the lead of the economists in this regard.

  19. anon Says:

    Some divergence from the NRC rankings: take a look at what Chomsky said about the Tea Party
    http://www.onpointradio.org/2010/09/noam-chomsky-america

  20. Mathias Says:

    Whether inflating the faculty count from 40 to 91 hurts a university depends on the ranking methodology. One could imagine certain factors, such as the student/faculty ratio, to which an inflated faculty count would contribute positively.

    In any case, I have to agree with Jules: unless the departments start publishing their raw data, there’s no alternative to rankings. If, however, they made the data publicly available, there would be amazing possibilities for integration and use.

  21. Raoul Ohio Says:

    Mathias,

    The problem is that most of the data is soft, for many reasons. In the physical sciences, math, and computing, you can usually get solid data that is well defined and whose uncertainty can be estimated. Thus you can do lots of analysis and reach some reasonable conclusions.

    In the soft sciences, the data is poorly defined, nobody agrees on how to measure it, everyone can bend it, etc. And the practitioners tend to be poor at math. This is why so many bizarre conclusions are reached.

    As an example of data for a hot-topic, big-money issue: how would you measure teacher effectiveness in the public schools? Fairly, accurately, correctly? If you think you can do it, check and see whether you have joined the Tea Party.

  22. L Says:

    The data is soft, definitely, but it seems that it has to be. I mean, what exactly are you measuring when you measure the quality of a university? The quality of the students? The quality of the teaching? The quality of the research? Some random permutation of the above coupled with a bunch of miscellaneous factors seems to be what they’re trying to calculate. Opinions would be welcome (possibly from mathematicians or philosophers) on how one could feasibly rank the vast variety of departments across the US.

    I say this from one of the four departments across the country that achieved a perfect 1-1 score; none of the current ranking systems is in any way authoritative, and all just provide a vague idea of which schools are ok. Presumably the data can be corrected (it will be, won’t it?), but the root of the problem lies in the methodology (and I’m not even going to ask how it took them 5 years to do what should have taken a few months).

    Scott, there has to be some way to build an online resource of the kind you mentioned; I mean, a lot of the data is online (papers published, professors employed) and presumably a look at a) the researchers going into Perimeter (and their histories and reputations), b) the work coming out of there, and some more factors could give you a fair idea of the quality of the QIS research there.

    Anyway, I agree that overall rankings are rather pointless unless you’re an undergrad just looking for a big-name school; the departmental rankings are somewhat better, but a more nuanced approach would be ideal (but still not perfect).

  23. Mathias Says:

    Let me try to suggest some data that might be useful and hard:

    – Faculty count
    – Student count
    – Fraction of students getting a job in academia (distinguish between teaching and research universities) and industry
    – Starting salary of graduates
    – External funding (distinguish between NSF, NIH, NEH, industry)
    – Number of RAs, TAs, stipend holders among the PhD students
    – Average salary for RAs, TAs
    – Number of Fields Medalists, Turing Award recipients, etc., during the last x years
    – Number of graduate student offices in the basement
    – Number of graduate student offices with windows
    – Average number of desks per graduate office
    – Wireless coverage on campus
    – etc…

    Faculty names could be linked to publication repositories to compute the number of publications, citations, and h-indices; the universities’ locations to cost-of-living data; and so on…

    The point here is that *people* could use the data and build their own applications on top of it. Prospective students would be able to weigh factors higher that matter most to them. Usually, that works much better than assigning these tasks to bureaucratic committees more worried about politics and status than anything.
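
    For instance, here is a minimal sketch of such an application (hypothetical departments, criteria, and weights, purely for illustration); a real version would of course normalize each criterion before weighting and pull the numbers from the published data:

        # Each prospective student supplies their own weights over "hard" criteria.
        departments = [
            {"name": "Dept A", "pct_academic_jobs": 0.45, "funding_per_faculty": 310_000, "offices_with_windows": 0.9},
            {"name": "Dept B", "pct_academic_jobs": 0.30, "funding_per_faculty": 450_000, "offices_with_windows": 0.5},
        ]
        my_weights = {"pct_academic_jobs": 5.0, "funding_per_faculty": 1e-5, "offices_with_windows": 2.0}

        def score(dept, weights):
            # Weighted sum of whichever criteria this user cares about.
            return sum(w * dept[k] for k, w in weights.items())

        for d in sorted(departments, key=lambda d: score(d, my_weights), reverse=True):
            print(d["name"], round(score(d, my_weights), 2))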

  24. blank Says:

    http://www.phds.org offers ranking based on user specified criteria. i have no idea where they get the data.

  25. L Says:

    http://www.phds.org gets the data from the NRC rankings, so it’s exactly as flawed as they are.

    In general, I like the idea of user specified criteria, but I’d be wary of trying to sum the data in any meaningful way. In particular, a lot of what makes graduate school successful (or not) is the student’s dynamics with his advisor/group; whether he has enough freedom (or enough direction), and in general can collaborate well (I realize the specifics will depend on the subject). I don’t really see how that could be determined ahead of time.

    Mathias, that data looks good (although the stipends would have to be weighted by location; it’s a lot more expensive to live in NY than Texas), and I suppose the point is that the various values could be weighted based on the user’s own preference to a much higher degree of specialization than is currently offered.

    As long as you’re weighing factors that students value, though, you might as well add stuff like proximity to the beach, proximity to a big city, range of sports teams ….. I suppose that the point of the inclusion of a parameter like “diversity” in the NRC ranking was not that all students value living in a colour-blind campus environment, but rather that a multi-ethnic campus says something about the reputation of the university worldwide.

    Anyway, this is a great idea; build a layer of regularly updated data, and allow people to search and customize.

    Incidentally, this shouldn’t be something geared entirely towards grad students; postdocs and faculty are in need of a similar resource, although I suppose by that point you’ve probably developed a good understanding of where’s where in your field ….

  26. Raoul Ohio Says:

    Most of these criteria are soft, e.g., is “range of sports teams” a good or bad thing? How about location in a big city — half the people love it, half hate it. Even ratings based on things like publication counts can be gamed.

    Ranking universities is a lot like ranking religions.

  27. L Says:

    That’s the point of customization, I suppose; for some people, location in a big city is good, and they should be able to personalize their rankings to reflect that. Similarly with some of the “hard” rankings; for some people a high fraction of students going into industry would be ideal, and others would prefer the opposite.

    Another important criterion, especially for the hard sciences, should be related to the equipment accessible. Some research simply can’t be done without the highest-end machines: MBEs, SEMs, LHCs, etc.

    Publication counts could be gamed, certainly, but the advantage of an online dataset is that everyone has to play with the same rulebook; in the NRC rankings there were big problems with the definition of “faculty”.

  28. Raoul Ohio Says:

    L,

    I agree that what you are suggesting would be a good and useful thing, although it would be really hard to develop the software and automate collection of data.

    My point is that it can never be definitive and produce a linear order of what university is best.

  29. Hopefully Anonymous Says:

    Jules, Scott (Prof. Aaronson), has Obama really been “disappointing in many ways”?

    Because he has actually surpassed my expectations. Part of the gap is that his administrative talent circle seems much smarter to me than the folks left on the outside to critique their performance.

    One can claim Obama isn’t perfect, and one can point to people who might be better administrators than Obama (Bloomberg? Energy Secretary Chu?) but I don’t think one can in good faith claim to be “disappointed” by Obama. I’m astonishingly impressed by his performance as a rational optimizer in a complexly self-destructive political geography.