The U.S. Department of Education’s Integrated Postsecondary Education Data System (IPEDS) is an invaluable resource in the field of higher education. While it is the foundation of much of my research, the data are self-reported by colleges and occasionally include errors or implausible values. A good example of these issues is this recent Wall Street Journal analysis of the finances of flagship public universities. When the Journal’s reporting team started asking questions, colleges often said that their IPEDS submissions were incorrect. That’s not good.
I received grants from Arnold Ventures over the summer to fund two new projects. One of them is examining the growth in master’s degree programs over time and the implications for students and taxpayers. (More on the other project sometime soon.) This led me to work with my sharp graduate research assistant Faith Barrett to dive into IPEDS program completions data.
As we worked to get the data ready for analysis, we noticed a surprisingly large number of master’s programs apparently being discontinued. Colleges can report zero graduates in a given year if a program still exists, so we assumed that programs with no data at all (instead of a reported zero) were discontinued. But when we looked at the years immediately following an apparent discontinuation, graduates often showed up again. This suggests that a gap in the data between years with reported graduates is usually not a true discontinuation but a reporting problem: either a data entry error (failing to enter a positive number of graduates) or a failure to report zero graduates for a still-active program. This is not great news for IPEDS data quality.
We then took this a step further by looking for evidence that programs that seem to disappear and reappear actually continued to operate. We used the Wayback Machine (https://archive.org/web/) to look at institutional websites by year to see whether an apparently discontinued program appeared to be active in the years without graduates. We found consistent evidence from websites that programs continued to exist during their hiatus in IPEDS data. For example, the Mental and Social Health Services and Allied Professions master’s program at Rollins College reported 25 graduates in 2013 and 24 in 2014, reported no data at all for 2015, and then reported 30 graduates in 2016, 26 in 2017, 27 in 2018, 26 in 2019, and 22 in 2020. The program also had an active website throughout the period, providing more evidence of a data error.
The table below shows the number of master’s programs (defined at the 4-digit Classification of Instructional Programs level) for each year between 2005 and 2020 after we dropped all programs that never reported any graduates during this period. The “likely true discontinuations” column consists of programs that never reported any graduates to IPEDS following a year of missing data. The “likely false discontinuations” column consists of programs that reported graduates to IPEDS in subsequent years, meaning that most of these are likely institutional reporting errors. These likely false discontinuations made up 31% of all discontinuations during the period, suggesting that data quality is not a trivial issue.
Number of active programs and discontinuations by year, 2005-2020.
Year | Number of programs | Likely true discontinuations | Likely false discontinuations |
---- | ------------------ | ---------------------------- | ----------------------------- |
2005 | 20,679 | 195 | 347 |
2006 | 21,167 | 213 | 568 |
2007 | 21,326 | 567 | 445 |
2008 | 21,852 | 436 | 257 |
2009 | 22,214 | 861 | 352 |
2010 | 22,449 | 716 | 357 |
2011 | 22,816 | 634 | 288 |
2012 | 23,640 | 302 | 121 |
2013 | 24,148 | 368 | 102 |
2014 | 24,766 | 311 | 89 |
2015 | 25,170 | 410 | 97 |
2016 | 25,808 | 361 | 66 |
2017 | 26,335 | 344 | 35 |
2018 | 26,804 | 384 | 41 |
2019 | 27,572 | 581 | 213 |
2020 | 27,883 | 742 | 23 |
For the purposes of our analyses, we will recode the years of missing data for these likely false discontinuations to zero graduates. This likely understates the number of graduates for some of these programs, but this conservative approach at least fixes the issue of programs disappearing and reappearing when they should not. Stay tuned for more fun findings from this project!
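For readers who want to replicate this kind of cleaning, here is a minimal sketch of the recoding rule in pandas. It assumes a wide program-by-year panel with NaN wherever a program reported nothing; the program names and graduate counts below are made up for illustration and are not our actual IPEDS extract.

```python
import numpy as np
import pandas as pd

# Toy completions panel in wide form: one row per program, one column per
# year, NaN where the program reported nothing to IPEDS. These numbers
# are hypothetical, not the actual IPEDS data.
df = pd.DataFrame(
    {
        2013: [25.0, 12.0],
        2014: [24.0, np.nan],
        2015: [np.nan, np.nan],
        2016: [30.0, np.nan],
    },
    index=pd.Index(["Program A", "Program B"], name="program"),
)

def recode_false_discontinuations(row):
    """Recode missing years that precede the last reported year to zero.

    A gap with graduates reported in a later year is a likely false
    discontinuation, so it becomes 0. Missing years after the last report
    are left as NaN: with no later graduates, the gap looks like a likely
    true discontinuation.
    """
    row = row.copy()
    last = row.last_valid_index()
    if last is not None:
        row[row.isna() & (row.index < last)] = 0
    return row

cleaned = df.apply(recode_false_discontinuations, axis=1)
# Program A's 2015 gap is recoded to 0 (graduates appear again in 2016);
# Program B's trailing gaps stay missing (a likely true discontinuation).
```

The same `last_valid_index` logic can also be used to tabulate likely true versus likely false discontinuations by year, as in the table above.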
There are two broader takeaways from this post. First, researchers relying on program-level completions data should carefully check for likely data errors such as the ones that we found and figure out how best to address them in their own analyses. Second, this is yet another reminder that IPEDS data are not audited for quality and contain quite a few errors. As IPEDS data continue to be used to make decisions for practice and policy, it is essential to improve the quality of the data.