In higher education discussions, everyone wants to know how a particular college or university is performing across a range of metrics. For metrics such as graduation rates and enrollment levels, this isn’t a big problem. Each freestanding college (typically meaning it has its own accreditation and institutional governance structure) has to report this information to the U.S. Department of Education’s Integrated Postsecondary Education Data System (IPEDS) each year. But other metrics are more challenging to use and interpret because they can cover multiple campuses—something I dig into in this post.
In the 2015-16 academic year, there were 7,409 individual colleges (excluding administrative offices) in the 50 states and Washington, DC that reported data to IPEDS and were uniquely identified by a UnitID number. A common mistake that analysts make is to assume that all federal higher education (or even all IPEDS) data metrics represent just one UnitID, but that is not always the case. Enter researchers’ longtime nemesis—the OPEID.
OPEIDs are assigned by the U.S. Department of Education’s Office of Postsecondary Education (OPE) to reflect each postsecondary institution that has a program participation agreement to participate in federal student aid programs. However, some colleges within a system of higher education share a program participation agreement, in which one parent institution has a number of child institutions for financial aid purposes.
Parent/child relationships can generally be identified using OPEID codes; parent institutions typically have OPEIDs ending in “00,” while child institutions typically have OPEIDs ending in another value. These reporting relationships are fairly prevalent: based on OPEID values, there were approximately 5,744 parent and 1,665 child institutions in IPEDS in the 2015-16 academic year. For-profit college chains typically report using parent/child relationships, while a number of public college and university systems also aggregate institutional data to the OPEID level. For example, Penn State and Rutgers have parent/child relationships while the University of Missouri and the University of Wisconsin do not.
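The parent/child rule above is simple enough to express in code. A minimal sketch, assuming OPEIDs are stored as eight-character strings (the OPEIDs shown here are hypothetical):

```python
def opeid_role(opeid: str) -> str:
    """Classify an OPEID as parent or child by its last two digits.

    Parent institutions typically hold OPEIDs ending in "00";
    any other ending typically marks a child institution.
    """
    return "parent" if opeid.endswith("00") else "child"

print(opeid_role("00123400"))  # parent
print(opeid_role("00123402"))  # child
```

Note the hedge in “typically”: a handful of institutions do not follow the suffix convention cleanly, so a careful analysis should spot-check the classification rather than trust the suffix alone.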
In the case of a parent/child relationship, all data that come from the Office of Federal Student Aid or from the National Student Loan Data System are aggregated up across a number of colleges. This includes all data on student loan repayment rates, earnings, and debt from the College Scorecard as well as student loan default rates that are currently used for accountability purposes. Additionally, some colleges report finance data at the OPEID level on a seemingly chaotic basis, which can only be discovered by combing through the data to see whether child institutions are missing values. For example, Penn State always reports at the parent level, while Rutgers has reported at the parent level and the child level on different occasions over the last 15 years. Ozan Jaquette and Edna Parra have pointed out in some great research that failing to address parent/child issues can result in estimates from IPEDS or Delta Cost Project data being inaccurate (although trend data are generally reasonable).
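That “combing through the data” step can be partly automated: flag child institutions whose finance fields are empty, since those are the ones most likely rolled up into a parent record. A hedged sketch using pandas, with hypothetical UnitIDs, OPEIDs, and a made-up `total_expenses` column standing in for any IPEDS finance variable:

```python
import pandas as pd

# Hypothetical IPEDS extract: one row per UnitID, with its OPEID and a
# finance value that may be missing when a child reports through its parent.
ipeds = pd.DataFrame({
    "unitid": [1001, 1002, 1003, 1004],
    "opeid": ["00123400", "00123401", "00567800", "00567801"],
    "total_expenses": [5_000_000, None, 2_000_000, 750_000],
})

# Child OPEIDs typically do not end in "00".
is_child = ~ipeds["opeid"].str.endswith("00")

# Child institutions with no finance value likely report at the parent level.
rolled_up = ipeds[is_child & ipeds["total_expenses"].isna()]
print(rolled_up["unitid"].tolist())  # [1002]
```

A missing value is only suggestive, not proof, of parent-level reporting—as the Rutgers example shows, the same institution can switch reporting levels across years, so this check should be run year by year.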
If UnitIDs and OPEIDs were not enough, the Equality of Opportunity Project (EOP) dataset added a new term—super-OPEIDs—to researchers’ jargon. This innovative dataset, compiled by economists Raj Chetty, John Friedman, and Nathaniel Hendren, uses federal income tax records to construct social mobility metrics for 2,461 institutions of higher education based on pre-college family income and post-college student income. (I used this dataset last month in a blog post looking at variations in marriage rates across four-year colleges.) However, the limitation of this approach is that the researchers have to rely on the names of the institutions on tax forms, which are sometimes aggregated beyond UnitIDs or OPEIDs. Hence, the super-OPEID.
The researchers helpfully included a flag for super-OPEIDs that combined multiple OPEIDs (the variable name is “multi” in the dataset, for those playing along at home). There are 96 super-OPEIDs that have this multiple-OPEID flag, including a number of states’ public university systems. The full list can be found in this spreadsheet, but I wanted to pull out some of the most interesting pairings. Here are a few:
–Arizona State And Northern Arizona University And University Of Arizona
–University Of Maryland System (Except University College) And Baltimore City Community College
–Minnesota State University System, Century And Various Other Minnesota Community Colleges
–SUNY Upstate Medical University And SUNY College Of Environmental Science And Forestry
–Certain Colorado Community Colleges
To get an idea of how many colleges (as measured by UnitIDs) have their own super-OPEID, I examined the number of colleges that did not have a multiple-OPEID flag in the EOP data and did not have any child institutions based on their OPEID. This resulted in 2,143 colleges having their own UnitID, OPEID, and super-OPEID—meaning that none of their data across these sources are combined with those of other institutions. (This number would likely be higher if all colleges were in the EOP data, but some institutions were either too new or too small to be included in the dataset.)
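The filtering logic just described can be sketched in a few lines of pandas. This is an illustration, not the original analysis code: the four rows, the column names, and the assumption that the first six OPEID digits link a parent to its children are all hypothetical or simplified.

```python
import pandas as pd

# Hypothetical merged file: one row per college (UnitID) with its OPEID
# and the EOP "multi" flag (1 = the super-OPEID combines multiple OPEIDs).
df = pd.DataFrame({
    "unitid": [1, 2, 3, 4],
    "opeid": ["00111100", "00111101", "00222200", "00333300"],
    "multi": [0, 0, 0, 1],
})

# Six-digit stems of OPEIDs that appear with child records (non-"00" endings).
parents_with_children = set(df.loc[~df["opeid"].str.endswith("00"), "opeid"].str[:6])

standalone = df[
    df["opeid"].str.endswith("00")                      # a parent (or standalone) OPEID
    & ~df["opeid"].str[:6].isin(parents_with_children)  # no child records attached
    & (df["multi"] == 0)                                # super-OPEID covers one OPEID
]
print(standalone["unitid"].tolist())  # [3]
```

Only UnitID 3 survives: UnitID 1 has a child (UnitID 2), UnitID 2 is itself a child, and UnitID 4 sits inside a combined super-OPEID. In the real data this is the filter that yields the 2,143 standalone colleges.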
I want to close by noting the limitations of both the EOP and Federal Student Aid/College Scorecard data for analytic purposes, as well as highlighting the importance of the wonky terms UnitID, OPEID, and super-OPEID. Analysts should carefully note when data are being aggregated across separate UnitIDs (particularly when different types of colleges are being combined) and consider omitting colleges where aggregation may be a larger concern across OPEIDs or super-OPEIDs.
For example, earnings data from the College Scorecard would be fine for the University of Maryland-College Park (as the dataset just reflects those earnings), but social mobility data would include a number of other institutions. Users of these data sources should also describe their strategies in their methods discussions in enough detail that others could replicate their decisions.
Thanks to Sherman Dorn at Arizona State University for inspiring this blog post via Twitter.