The SDR provides data on the characteristics of science, engineering, and health research doctorate degree holders from U.S. academic institutions who are under the age of 76.
The SDR provides demographic, education, and career history information from individuals with a U.S. research doctoral degree in a science, engineering, or health field. The SDR is sponsored by the National Center for Science and Engineering Statistics within the National Science Foundation and by the National Institutes of Health. Conducted since 1973, the SDR is a unique source of information about the educational and occupational achievements and career movement of U.S.-trained doctoral scientists and engineers in the United States and abroad.
Westat was the data collection contractor for the 2021 SDR.
Reference period. The week of 1 February 2021.
Next release date. TBD.
The Survey of Doctorate Recipients (SDR), conducted by the National Center for Science and Engineering Statistics (NCSES) within the National Science Foundation, provides data on the characteristics of science, engineering, and health (SEH) doctorate degree holders. It samples individuals who have earned an SEH research doctoral degree from a U.S. academic institution and are less than 76 years of age. The SDR provides data useful in assessing the supply and characteristics of U.S.-trained SEH doctorates employed in educational institutions, private industry, professional organizations, and government in the United States, as well as in other countries worldwide.
The 2021 SDR made two types of changes to the data collection instruments. First, for all modes of data collection, the survey included new questions to gauge the effects of the coronavirus pandemic on employment, specifically on labor force status, number of hours worked per week, salary, benefits, telecommuting options, and total earned income. The second change applied to the electronic instruments only. The Web and computer-assisted telephone interview (CATI) instruments included dependent interviewing (DI) methods for a targeted number of items within the employment question series to reduce respondent burden.
Reference period. The week of 1 February 2021.
Response unit. Individuals with an SEH research doctorate degree from a U.S. academic institution.
Population size. Approximately 1,185,700 individuals.
Sample size. A total of 125,938 individuals.
The SDR target population includes individuals that meet the following criteria:
Sampling frame. The Doctorate Records File (DRF), constructed from the annual Survey of Earned Doctorates, which is a census survey of all recipients of U.S. research doctoral degrees.
The SDR uses a fixed panel design with a sample of new doctoral graduates added to the panel in each biennial survey cycle. For the 2021 SDR, all 2019 sample members who remained age eligible were retained for the 2021 cycle. As with prior survey cycles, a sample of 10,000 new graduates who had earned their degrees between 1 July 2017 and 30 June 2019 was added. The new graduate sample used the same design and stratification first introduced in 2019, with strata defined by detailed field of study, gender, and underrepresented minority status.
The SDR uses a trimodal data collection approach: self-administered online survey, self-administered paper questionnaire (via mail), and CATI.
The data collected in the SDR are subject to both editing and imputation procedures. The SDR uses both logical imputation and statistical (hot-deck) imputation as part of the data processing effort.
Because the SDR is based on a complex sampling design and subject to nonresponse bias, sampling weights are created for each respondent to support unbiased population estimates. The final analysis weights account for the following:
Estimates of sampling errors associated with this survey were calculated using replicate weights.
Any doctoral graduates missing from the DRF, which is derived from the SED, would create undercoverage in the SDR. Reporting errors in the SED could lead to incorrect classification of doctorates as having or not having earned an SEH research doctorate, which could result in either overcoverage or undercoverage.
The weighted and unweighted response rates for the 2021 SDR were each 65%. Analyses of SDR nonresponse trends were used to develop nonresponse weighting adjustments to minimize the potential for nonresponse bias in the SDR estimates. A hot-deck imputation method was used to compensate for item nonresponse.
The SDR is subject to reporting errors from differences in the interpretation of questions. Although three modes of response were offered (Web, mail, and CATI), almost 99% of respondents completed the survey via the Web instrument. As such, reporting error due to mode differences is substantially reduced.
Data from 1993 to present are available at the SDR website, https://www.nsf.gov/statistics/srvydoctoratework/.
Year-to-year comparisons can be made among the 1993 to 2021 survey cycles because many of the core questions remained the same. Small but notable differences exist across some survey years, such as the collection of occupation data based on more recent versions of the occupation taxonomy. Also, the SDR target population definition has changed over time as follows:
Caution is recommended when interpreting or analyzing trends that span pre- and post-1991 surveys, pre- and post-2010 surveys, and pre- and post-2015 surveys given the noted changes in the survey design and target population.
Data from the SDR are published in NCSES InfoBriefs and data tables, available at https://www.nsf.gov/statistics/srvydoctoratework/. Information from this survey is also included in Science and Engineering Indicators and Women, Minorities, and Persons with Disabilities in Science and Engineering.
The SDR public use data are available in the SESTAT data tool and in downloadable files through the NCSES data page. Access to restricted data for researchers interested in analyzing microdata can be arranged through a licensing agreement. For more information on licensing, see https://ncses.nsf.gov/about/licensing.
1 The Web and CATI instruments included DI methods for a targeted number of items within the employment question series. With DI, sample member responses from 2019 were preloaded into the 2021 SDR questionnaire and displayed for the respondent. For each of the DI questions, sample members first answered “yes” or “no” to indicate if the information displayed from their 2019 response still applied to the 2021 reference period. If not, the sample member provided updated information on the subsequent screen. Only sample members who participated in 2019 and reported working in both the 2019 and 2021 cycles were eligible for DI.
Purpose. The Survey of Doctorate Recipients (SDR), conducted by the National Center for Science and Engineering Statistics (NCSES) within the National Science Foundation (NSF), provides data on the characteristics of science, engineering, and health (SEH) doctorate degree holders. A research doctorate is a doctoral degree that (1) requires the completion of an original intellectual contribution in the form of a dissertation or an equivalent culminating project (e.g., a published manuscript) and (2) is not primarily intended as a degree for the practice of a profession. The most common research doctorate degree is the PhD. The SDR samples individuals who have earned an SEH research doctorate from a U.S. academic institution and are younger than 76 years. The SDR provides data useful in assessing the supply and characteristics of the U.S.-trained SEH doctorates employed in educational institutions, private industry, professional organizations, and governments in the United States, as well as in other countries worldwide.
The SDR is designed to provide demographic, education, and career history information about individuals who earned a research doctorate in an SEH field from a U.S. academic institution. The SDR is closely related to another survey of scientists and engineers conducted by NCSES: the National Survey of College Graduates (NSCG, https://www.nsf.gov/statistics/srvygrads/). These two surveys share a common reference date, and they use similar questionnaires and data processing guidelines.
Some of the education and demographic information in the SDR comes from the Survey of Earned Doctorates (SED, https://www.nsf.gov/statistics/srvydoctorates/), an annual census of research doctorates earned in the United States. The SED provides the sampling frame for the SDR through its annual update of the longstanding Doctorate Records File (DRF), a cumulative listing of all U.S.-earned doctorate recipients dating back to 1920.
These technical notes provide an overview of the 2021 SDR. Complete details are provided in the 2021 SDR Methodology Report, available upon request from the SDR Survey Manager.
Data collection authority. The information collected in the SDR is solicited under the authority of the National Science Foundation Act of 1950, as amended, the America COMPETES Reauthorization Act of 2010, and the Confidential Information Protection and Statistical Efficiency Act of 2002. The Office of Management and Budget control number is 3145-0020 and expires on 31 July 2024.
Survey contractor. Westat, Rockville, MD.
Survey sponsor. The SDR is sponsored by NCSES with support from the National Institutes of Health.
Major changes to the recent cycle. In 2021, NCSES introduced two changes to the SDR survey. First, NCSES added new content to capture the effects of the coronavirus pandemic on the doctoral-trained SEH workforce. The new content was intended to measure impacts on salary, income, labor force status, and benefits. As a result, the questions in the historical salary and income series reflect some modifications, which should be taken into account in trend analyses that use these variables. Please see the 2021 SDR Methodology Report for more details about the coronavirus-related questionnaire modifications.
Second, for the electronic modes of response, eligible sample members could respond to a targeted set of six employment items via a dependent interview approach. With dependent interviewing, the survey instrument displayed the unedited response from the 2019 cycle for the targeted survey questions and asked the sample member if that response was still correct as of the reference date (1 February 2021). If yes, the instrument moved to the next applicable survey question. If the sample member indicated the response was no longer correct as of the reference date, the instrument presented the traditional (nondependent interviewing) version of the same question for the respondent to answer. The paper version of the survey did not reflect dependent interviewing methods.
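The dependent interviewing flow described above can be sketched as a small decision function. This is a hypothetical illustration, not the SDR instrument code: `confirm_still_correct` and `ask_traditional_question` stand in for screens in a Web or CATI instrument, and the eligibility rule mirrors the one stated in this document (participated in 2019 and reported working in both 2019 and 2021).

```python
from typing import Callable, Optional


def is_di_eligible(responded_2019: bool, working_2019: bool, working_2021: bool) -> bool:
    # DI applies only to 2019 participants who reported working in both cycles.
    return responded_2019 and working_2019 and working_2021


def dependent_interview_item(
    prior_response: Optional[str],
    eligible: bool,
    confirm_still_correct: Callable[[str], bool],
    ask_traditional_question: Callable[[], str],
) -> str:
    """Return the 2021 answer to one targeted employment item (hypothetical sketch)."""
    # Ineligible members, or members with no usable 2019 answer, see the
    # traditional (nondependent) version of the question.
    if not eligible or prior_response is None:
        return ask_traditional_question()
    # Show the unedited 2019 answer; "yes" means it still applies on the
    # reference date, so the prior value is kept and the instrument moves on.
    if confirm_still_correct(prior_response):
        return prior_response
    # "No": present the traditional version of the same question.
    return ask_traditional_question()


# Example: a 2019 respondent whose job changed by 1 February 2021.
answer = dependent_interview_item(
    prior_response="Postdoctoral researcher",
    eligible=is_di_eligible(True, True, True),
    confirm_still_correct=lambda shown: False,
    ask_traditional_question=lambda: "Assistant professor",
)
print(answer)  # Assistant professor
```

The paper questionnaire corresponds to always taking the first branch, since it did not use dependent interviewing.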
Initial survey year. 1973.
Reference period. The week of 1 February 2021.
Response unit. Individuals with an SEH research doctorate from a U.S. academic institution.
Sample or census. Sample.
Population size. Approximately 1,185,700 individuals; 1,023,600 residing in the United States and 162,100 residing outside the United States.
Sample size. 125,938 individuals.
Target population. The SDR target population includes individuals that meet the following criteria:
Sampling frame. The SDR uses the DRF, constructed from the annual SED, as its sampling frame. Based on the information available in the DRF, individuals who did not meet the age criterion were dropped from the frame. For individuals who completed more than one SEH research doctorate, only the information on the first degree earned was used for sampling eligibility.
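The two frame-construction rules above (keep only each person's first SEH research doctorate, and drop people who are not younger than 76 on the reference date) can be sketched as follows. The record layout and field names are hypothetical; the actual DRF structure is not described in these notes.

```python
from dataclasses import dataclass
from datetime import date

REFERENCE_DATE = date(2021, 2, 1)  # 2021 SDR reference date
AGE_LIMIT = 76                     # eligible only if younger than 76


@dataclass
class DrfRecord:
    # Hypothetical fields for illustration only.
    person_id: str
    birth_date: date
    degree_date: date
    seh_research_doctorate: bool


def age_on(birth: date, on: date) -> int:
    """Completed years of age on a given date."""
    before_birthday = (on.month, on.day) < (birth.month, birth.day)
    return on.year - birth.year - int(before_birthday)


def build_frame(records: list[DrfRecord]) -> list[DrfRecord]:
    """Keep each person's first SEH research doctorate, then drop age-ineligible people."""
    first_degree: dict[str, DrfRecord] = {}
    for rec in records:
        if not rec.seh_research_doctorate:
            continue
        kept = first_degree.get(rec.person_id)
        if kept is None or rec.degree_date < kept.degree_date:
            first_degree[rec.person_id] = rec
    return [
        rec for rec in first_degree.values()
        if age_on(rec.birth_date, REFERENCE_DATE) < AGE_LIMIT
    ]
```

Deduplicating before the age check does not change the result, because age is a property of the person rather than of the degree.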
Sample design. The SDR uses a fixed panel design with a sample of new doctoral graduates added to the panel in each biennial survey cycle. For the 2021 SDR, all 2019 sample members who remained age eligible were retained for the 2021 cycle. As with prior survey cycles, a sample of 10,000 new graduates who had earned their degrees from 1 July 2017 to 30 June 2019 was added. As in the 2017 and 2019 survey cycles, the new graduate sample was selected using stratification cells defined by detailed field of study, gender, and underrepresented minority indicator.
The resulting 2021 SDR sample of 125,938 cases consisted of 115,938 age-eligible cases from the 2019 SDR and 10,000 cases from the new cohort of graduates from academic years 2018 and 2019. The overall sampling rate was about 1 in 10 (10.6%), although sampling rates varied across strata.
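The sample and population figures given in these notes can be checked with simple arithmetic. The overall rate below divides the sample size by the estimated population size, which reproduces the reported 10.6%; it says nothing about the rates in individual strata, which varied.

```python
# Figures from the population and sample descriptions in these notes.
population_us = 1_023_600      # residing in the United States
population_abroad = 162_100    # residing outside the United States
continuing_cases = 115_938     # age-eligible cases retained from the 2019 SDR
new_graduate_cases = 10_000    # new cohort, academic years 2018 and 2019

population_total = population_us + population_abroad
sample_total = continuing_cases + new_graduate_cases
overall_rate = sample_total / population_total

print(f"{population_total:,}")               # 1,185,700
print(f"{sample_total:,}")                   # 125,938
print(f"{overall_rate:.1%}")                 # 10.6%
print(f"about 1 in {1 / overall_rate:.1f}")  # about 1 in 9.4
```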
Data collection. The data collection period for the SDR has historically been 6 months, but NCSES extended the 2021 data collection by one month, for a total field period of 7 months. The SDR used a trimodal data collection approach: self-administered online survey (Web), self-administered paper questionnaire (via mail), and computer-assisted telephone interview (CATI). All individuals in the sample were started in the Web mode if a mail or e-mail address was available. After an initial survey invitation via postal mail and e-mail, the data collection protocol included sequential contacts by postal mail, telephone, and e-mail that ran throughout the data collection period. At any time during data collection, sample members could choose to complete the survey using any of the three modes. Nonrespondents to the initial survey invitation received follow-up with alternate survey modes.
Quality assurance procedures were in place at each data collection step (address updating, printing, package assembly and mailing, questionnaire receipt, data entry, coding, CATI, and post-data collection processing). Active data collection ended in February 2022. The online survey closed 28 February 2022, and receipt of hard-copy questionnaires ended on 2 March 2022.
Mode. Almost 99% of the participants completed the survey through the Web, 0.6% through mail, and 0.6% through CATI. Web participation increased from 93% in the 2019 cycle because of continued emphasis on Web-based participation in the starting phase of data collection.
Response rates. Response rates were calculated on complete responses, that is, from instruments with responses to all critical items. Critical items are those containing information needed to report labor force participation, including employment status, job title, and job description, as well as location of residency on the reference date. The overall unweighted response rate was 65%; the weighted response rate was also 65%. These response rates are about 3 percentage points lower than those achieved in the 2019 SDR.
Of the 125,938 persons in the 2021 SDR sample, 80,295 completed the survey. Among those who completed the survey, 71,213 respondents were residing in the United States on the survey reference date and contributed to the U.S. SEH doctoral population estimates. An additional 9,082 persons completed the survey, but they were residing outside of the United States on the survey reference date. This group contributed to the estimates of the internationally residing U.S.-trained SEH doctoral population.
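The completion counts above can be cross-checked in a few lines. Dividing completes by all sampled cases gives about 63.8%, slightly below the reported 65%. A plausible reason, which is an assumption here and not something these notes state, is that the formal response rate removes cases found to be ineligible (for example, deceased sample members) from the denominator. The 2021 SDR Methodology Report is the authority on the exact formula.

```python
sample_size = 125_938
completes = 80_295          # complete responses (all critical items answered)
completes_us = 71_213       # residing in the United States on the reference date
completes_abroad = 9_082    # residing outside the United States

# The two residence groups partition the completed cases.
assert completes_us + completes_abroad == completes

# Completes over all sampled cases, with no eligibility adjustment.
raw_completion_share = completes / sample_size
print(f"{raw_completion_share:.1%}")  # 63.8%
```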
Data editing. The Web and CATI SDR instruments were combined into a single code base for 2021, reducing mode differences and facilitating harmonization. Mail questionnaire data were scanned, and data were captured via Optical Mark Recognition (OMR) and Optical Character Recognition (OCR). The OMR and OCR technology also applied editing controls that flagged unclear responses or responses that did not fit the expected response type (e.g., multiple responses in a select-one type question). Telephone callbacks were used to obtain additional information for incomplete mail responses. Responses from paper and electronic modes were merged into a single database and fully harmonized prior to the subsequent coding, editing, and cleaning needed to create an analytical database.
Following established NCSES guidelines for coding SDR survey data, including verbatim responses, staff were trained in conducting a standardized review and coding of occupation and education information, “other/specify” verbatim responses including verbatim items pertaining to the coronavirus modifications, state and country geographical information, and postsecondary institution information. For standardized coding of occupation, the respondent's reported job title, duties and responsibilities, and other work-related information from the questionnaire were reviewed by trained coders who corrected known respondent self-reporting errors to obtain the best occupation codes. The education code for the field of study of a newly earned degree or for the first bachelor's degree earned if not reported previously was assigned solely based on the verbatim response for that degree field.
Imputation. Item nonresponse for key employment items—such as employment status, sector of employment, and primary work activity—ranged from 0.0% to 1.8%. Nonresponse to questions about income was higher: nonresponse to salary was 10.5%, and nonresponse to earned income was 12.9%. Personal demographic data, such as sex, marital status, citizenship, ethnicity, and race, had variable item nonresponse rates, with sex at 0.0%, birth year at 0.2%, marital status at 8.0%, citizenship at 6.9%, ethnicity at 0.1%, and race at 0.5%. Item nonresponse was addressed using random imputation and hot-deck imputation methods.
Logical imputation often was accomplished as a part of editing. In the editing phase, the answer to a question with missing data was sometimes determined by the answer to another question. In some circumstances, editing procedures found inconsistent data that were blanked out and therefore subject to statistical imputation. During sample frame construction for the SDR, some missing demographic variables, such as race and ethnicity, were imputed before sample selection by using other existing information from the sampling frame. All sample members with imputed values for race or ethnicity were given the opportunity to report these data if they responded in the Web or CATI modes.
Respondents with missing race or ethnicity data who did not take the opportunity to report these data were assigned values for race or ethnicity through hot-deck procedures during post-data processing.
Most SDR variables were subjected to hot-deck imputation, with each variable having its own class and sort variables chosen by regression modeling to identify nearest neighbors for imputed information.
However, imputation was not performed on verbatim-based variables. For some variables, such as day of birth, no set of class and sort variables was reliably related to or suitable for predicting the missing value. In these instances, random imputation was used so that the distribution of imputed values was similar to the distribution of reported values without using class or sort variables.
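The two statistical imputation approaches described above can be sketched in simplified form. In this sketch, hot-deck imputation groups records into imputation classes and fills a missing value from the reported record nearest on a single numeric sort variable; the SDR selected class and sort variables for each item through regression modeling, a step not shown here. Random imputation draws uniformly from reported values, so imputed values follow the reported distribution in expectation. All variable names are hypothetical.

```python
import random
from collections import defaultdict


def hot_deck_impute(records, target, class_vars, sort_var):
    """Fill missing `target` values from the nearest donor in the same class."""
    cells = defaultdict(list)
    for rec in records:
        cells[tuple(rec[v] for v in class_vars)].append(rec)
    for members in cells.values():
        # Donors are records with a reported value (fixed before any filling).
        donors = [r for r in members if r[target] is not None]
        if not donors:
            continue  # a production system would collapse or widen the class
        for rec in members:
            if rec[target] is None:
                donor = min(donors, key=lambda d: abs(d[sort_var] - rec[sort_var]))
                rec[target] = donor[target]
                rec[target + "_imputed"] = True
    return records


def random_impute(records, target, seed=2021):
    """Fill missing values by drawing at random from the reported values."""
    rng = random.Random(seed)
    reported = [r[target] for r in records if r[target] is not None]
    for rec in records:
        if rec[target] is None:
            rec[target] = rng.choice(reported)
            rec[target + "_imputed"] = True
    return records


people = [
    {"field": "biology", "birth_year": 1980, "salary": 90_000},
    {"field": "biology", "birth_year": 1982, "salary": None},
    {"field": "biology", "birth_year": 1960, "salary": 150_000},
    {"field": "engineering", "birth_year": 1981, "salary": 120_000},
]
hot_deck_impute(people, "salary", class_vars=["field"], sort_var="birth_year")
print(people[1]["salary"])  # 90000 (nearest biology donor by birth year)
```

Flagging imputed values, as the `_imputed` keys do here, lets analysts separate reported from imputed data.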
Weighting. Because the SDR is based on a complex sampling design and subject to nonresponse bias, sampling weights were created for each respondent to support unbiased population estimates. The final analysis weights account for the following:
The final sample weights enable data users to derive survey-based estimates of the SDR target population. The variable name on the SDR public use data files for the SDR final sample weight is WTSURVY.
Detailed information on weighting is contained in the 2021 SDR Methodology Report, available upon request from the SDR Survey Manager.
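As a rough guide to using the final weight, each respondent's WTSURVY value can be read as the number of population members that respondent represents, so population counts are sums of weights and proportions are ratios of weighted sums. In the sketch below, WTSURVY is the only documented variable name; the other fields are hypothetical stand-ins for variables on the public use files.

```python
def weighted_total(rows, condition, weight="WTSURVY"):
    """Estimated population count: sum of final weights over matching rows."""
    return sum(row[weight] for row in rows if condition(row))


def weighted_share(rows, condition, domain=lambda row: True, weight="WTSURVY"):
    """Weighted proportion of a domain (e.g., U.S. residents) meeting `condition`."""
    denominator = weighted_total(rows, domain, weight)
    numerator = weighted_total(rows, lambda row: domain(row) and condition(row), weight)
    return numerator / denominator


# Hypothetical respondent rows; only WTSURVY is a documented variable name.
rows = [
    {"WTSURVY": 10.0, "employed": True, "us_resident": True},
    {"WTSURVY": 30.0, "employed": False, "us_resident": True},
    {"WTSURVY": 60.0, "employed": True, "us_resident": False},
]
print(weighted_total(rows, lambda r: r["employed"]))  # 70.0
print(weighted_share(rows, lambda r: r["employed"], domain=lambda r: r["us_resident"]))  # 0.25
```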
Variance estimation. The successive difference replication method (SDRM) was used to develop replicate weights for variance estimation. The theoretical basis for the SDRM is described in Wolter (1984) and in Fay and Train (1995). As with any replication method, successive difference replication involves constructing a number of subsamples (replicates) from the full sample and computing the statistic of interest for each replicate. The mean square error of the replicate estimates around their corresponding full sample estimate provides an estimate of the sampling variance of the statistic of interest. The 2021 SDR produced 104 sets of replicate weights. Please contact the SDR Survey Manager to obtain the SDR replicate weights and the replicate weight user guide.
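The replication procedure above can be sketched as: compute the statistic once with the full-sample weight (θ̂) and once with each of the R replicate weights (θ̂_r), then combine the squared deviations as v(θ̂) = c · Σ (θ̂_r − θ̂)². The scaling factor c = 4/R used as the default here is the value commonly paired with successive difference replication (the American Community Survey, for example, uses 4/80); treat it as an assumption and confirm the factor and the replicate weight names in the SDR replicate weight user guide.

```python
import math


def replicate_variance(full_estimate, replicate_estimates, factor=None):
    """factor * sum over replicates of (theta_r - theta_0)^2.

    The default factor of 4 / R is an assumption to check against the
    SDR replicate weight user guide.
    """
    r = len(replicate_estimates)
    if factor is None:
        factor = 4.0 / r
    return factor * sum((t - full_estimate) ** 2 for t in replicate_estimates)


def replicate_standard_error(rows, statistic, full_weight, replicate_weights, factor=None):
    """Evaluate `statistic` under the full weight and under each replicate weight."""
    theta0 = statistic(rows, full_weight)
    thetas = [statistic(rows, w) for w in replicate_weights]
    return math.sqrt(replicate_variance(theta0, thetas, factor))


def weighted_count(rows, weight):
    # Example statistic: estimated population count under a given weight.
    return sum(row[weight] for row in rows)
```

With the 2021 files, `replicate_weights` would be the list of 104 replicate weight column names, whatever they are called in the restricted files.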
Disclosure protection. To protect against the disclosure of confidential information provided by SDR respondents, the estimates presented in SDR data tables are rounded to the nearest 50, although calculations of percentages are based on unrounded estimates.
Data table cell values based on counts of respondents that fall below a predetermined threshold are deemed to be sensitive to potential disclosure, and the letter “D” indicates this type of suppression in a table cell.
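The two disclosure rules above can be illustrated with a small table-cell formatter: counts are shown rounded to the nearest 50, percentages are computed from unrounded estimates, and cells based on too few respondents show “D”. The respondent threshold is not published in these notes, so `min_respondents` is a placeholder, and rounding ties upward is an assumed convention.

```python
import math


def round_to_nearest(value, base=50):
    """Round to the nearest multiple of `base`; ties round up (an assumption)."""
    return int(math.floor(value / base + 0.5)) * base


def table_cell(estimate, respondent_count, min_respondents):
    """Display value for one count cell; min_respondents is a placeholder threshold."""
    if respondent_count < min_respondents:
        return "D"
    return f"{round_to_nearest(estimate):,}"


def percent(part, total):
    # Percentages use the unrounded estimates, not the displayed values.
    return 100.0 * part / total


print(table_cell(1_224, respondent_count=40, min_respondents=10))  # 1,200
print(table_cell(1_224, respondent_count=4, min_respondents=10))   # D
print(percent(1_224, 2_000))  # 61.2 (from the rounded 1,200 it would be 60.0)
```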
Sampling error. SDR estimates are subject to sampling errors. Estimates of sampling errors associated with this survey were calculated using replicate weights and are included in each table of estimates. Data table estimates with a coefficient of variation (that is, the standard error divided by the estimate) that exceeds a predetermined threshold are deemed unreliable and are suppressed. The letter “S” indicates this type of suppression in a table cell.
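The reliability rule above can be sketched as a CV check. The coefficient of variation is the standard error divided by the estimate; `max_cv` is a placeholder because the NCSES threshold is not stated in these notes, and treating a zero estimate as unreliable is an assumption made only to avoid dividing by zero.

```python
def coefficient_of_variation(estimate, standard_error):
    """CV = standard error / estimate."""
    return standard_error / abs(estimate)


def reliability_check(estimate, standard_error, max_cv):
    """Return "S" if the CV exceeds max_cv (a placeholder), else the estimate unchanged."""
    # CV is undefined for a zero estimate; suppressing it is an assumption.
    if estimate == 0 or coefficient_of_variation(estimate, standard_error) > max_cv:
        return "S"
    return estimate


print(coefficient_of_variation(200.0, 50.0))       # 0.25
print(reliability_check(200.0, 80.0, max_cv=0.3))  # S
```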
Coverage error. Coverage error occurs in sample estimates when the sampling frame does not accurately represent the target population and is a type of nonsampling error. The initial SDR sampling frame is the DRF, which is derived from the SED, a census survey of research doctorates awarded annually in the United States. To the extent that the DRF does not include all awarded research doctorates, the SDR would suffer from undercoverage. Reporting errors in the SED could lead to incorrect classification of doctorates as having or not having earned an SEH research doctorate, which could result in either overcoverage or undercoverage.
Nonresponse error. The weighted and unweighted response rates for the 2021 SDR were each 65%. Results from the research and analysis of SDR nonresponse trends have been used in the development of the nonresponse weighting adjustments to minimize the potential for nonresponse bias in the SDR estimates. In addition, as noted above, most item nonresponse was addressed using hot-deck imputation methods and random imputation for a few items when applicable.
Measurement error. The SDR is subject to reporting errors arising from differences in the interpretation of questions and from differences across response modes (Web, mail, and CATI).
Data comparability. Year-to-year comparisons can be made among the 1993 to 2021 survey cycles because many of the core questions remained the same. Small but notable differences exist across some survey cycles, however, such as the collection of occupation data being based on the different versions of the occupation taxonomy. Also, due to variation in the month of the reference date in some survey cycles, seasonal differences may occur when making comparisons across cycles and decades. Thus, use caution when interpreting cross-cycle and cross-decade comparisons. In addition, the definition of the SDR survey target population has experienced the following changes over time:
Caution is recommended when considering any analysis of trends that span pre- and post-1991 surveys, pre- and post-2010 surveys, and pre- and post-2015 surveys because of the changes in the survey design and target population.
Overlap in sample cases across survey cycles allows for longitudinal analysis using SDR data. To link cases on the SDR public use data files across survey cycles, use the unique identification variable REFID.
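Linking cases across cycles amounts to an inner join on REFID. In the sketch below, REFID is the only documented variable name; the sector field and the sample values are hypothetical, and the choice of weight for longitudinal estimates is outside the scope of this example.

```python
def link_cycles(earlier, later, key="REFID"):
    """Pair records for people present in both cycles (an inner join on REFID)."""
    later_by_id = {row[key]: row for row in later}
    return [(row, later_by_id[row[key]]) for row in earlier if row[key] in later_by_id]


# Hypothetical extracts from two public use files; only REFID is a documented name.
sdr_2019 = [{"REFID": "A1", "sector": "academic"}, {"REFID": "B2", "sector": "industry"}]
sdr_2021 = [{"REFID": "A1", "sector": "industry"}, {"REFID": "C3", "sector": "government"}]

for before, after in link_cycles(sdr_2019, sdr_2021):
    print(before["REFID"], before["sector"], "->", after["sector"])  # A1 academic -> industry
```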
Changes in survey coverage and population.
Changes in data processing.
Changes in questionnaire.
Changes in reporting procedures or classification.
Employer location. Survey question A9 includes the location of the principal employer, and data were based primarily on responses to this question. Individuals not reporting place of employment were classified by their last mailing address.
Field of doctorate. The doctoral field is as specified by the respondent in the SED at the time of degree conferral. The more than 200 SED coded fields were subsequently recoded to the 77 field-of-study codes used in the SDR questionnaire. (See table A-1 for a list and cross-classification of the 77 SDR detailed fields of degree, based on the NCSES Taxonomy of Disciplines, or TOD, with the more than 200 fine fields of degree reported in the SED sampling frame.)
Full-time and part-time employment. Full-time (working 35 hours or more per week) and part-time (working fewer than 35 hours per week) employment status is for the principal job only and not for all jobs held in the labor force. For example, an individual could work part time in his or her principal job but full time in the labor force. Full-time and part-time employment status is not comparable to data reported before 2006, when no distinction was made between the principal job and the other jobs held by the individual.
Involuntarily out-of-field rate. Involuntarily out-of-field rate is the percentage of employed individuals who reported, for their principal job, working in an area not related to the first doctoral degree at least partially because a job in their doctoral field was not available.
Labor-force participation rate. The labor-force participation rate is the ratio (E + U) / P, where E (employed) + U (unemployed; not-employed and actively seeking work) = the total labor force, and P = population, defined as all noninstitutionalized SEH doctorate holders less than 76 years of age during the week of 1 February 2021 and who earned their doctorate from a U.S. institution.
Occupation data. The occupational classification of the respondent was based on his or her principal job (including job title) held during the reference week—or on his or her last job held, if not employed in the reference week (survey questions A5 and A6 as well as A19 and A20). Also used in the occupational classification was a respondent-selected job code (survey questions A7 and A21). (See table A-2 for a list and classification of occupations reported in the SDR.)
Race and ethnicity. Ethnicity is defined as Hispanic or Latino or not Hispanic or Latino. Values for those selecting a single race include American Indian or Alaska Native, Asian, Black or African American, Native Hawaiian or Other Pacific Islander, and White. Those persons who report more than one race and who are not of Hispanic or Latino ethnicity also have a separate value. Race and ethnicity data are from the SED and prior rounds of the SDR. The most recently reported race and ethnicity data are given precedence.
Salary. Median annual salaries are reported for the principal job, rounded to the nearest $1,000, and computed for full-time employed scientists and engineers. For individuals employed by educational institutions, no accommodation was made to convert academic year salaries to calendar year salaries. Users are advised that, due to changes in the salary question after 1993, salary data for 1995–2019 are not strictly comparable with 1993 salary data. In 2021, changes to the salary series allowed sample members to identify increases or decreases in their salary due to the coronavirus pandemic. Although the core salary question did not change, additional items were added that may influence how sample members responded to the salary item. Similar changes were implemented in the earnings series. Please see the 2021 SDR Methodology Report for more details regarding these changes.
Sector of employment. Employment sector is a derived variable based on responses to questionnaire items A13, A14, and A15. Questionnaire item A13 (type of principal employer) includes a separate response “In a non-U.S. government at any level” as of the 2015 survey. In the data tables, the category 4-year educational institutions includes 4-year colleges or universities, medical schools (including university-affiliated hospitals or medical centers), and university-affiliated research institutes. Other educational institutions includes 2-year colleges, community colleges, technical institutes, precollege institutions, and other educational institutions (which respondents reported verbatim in the survey questionnaire). Users should note that prior to 2008 these other educational institutions that were written as verbatim by respondents were grouped with 4-year educational institutions rather than with 2-year colleges. Private, for-profit includes respondents who were self-employed in an incorporated business. Self-employed includes respondents who were self-employed or were a business owner in a non-incorporated business.
Unemployment rate. The unemployment rate (RU) is the ratio U / (E + U), where U = unemployed (not employed and actively seeking work), and E (employed) + U = the total labor force.
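Several of the definitions above can be computed together from weighted respondent records: the labor force participation rate (E + U) / P, the unemployment rate U / (E + U), the involuntarily out-of-field rate among the employed, and the median principal-job salary of full-time workers (35 or more hours) rounded to the nearest $1,000. This is a sketch under stated assumptions: the field names other than WTSURVY are hypothetical, the rows are assumed to cover the whole population P, and the weighted median takes the first value at which cumulative weight reaches half the total, which is one of several conventions.

```python
import math

FULL_TIME_HOURS = 35  # full time = 35 or more hours per week in the principal job


def weighted_sum(rows, keep, weight="WTSURVY"):
    return sum(r[weight] for r in rows if keep(r))


def labor_force_rates(rows, weight="WTSURVY"):
    """Participation (E + U) / P and unemployment U / (E + U), weighted."""
    e = weighted_sum(rows, lambda r: r["status"] == "employed", weight)
    u = weighted_sum(rows, lambda r: r["status"] == "unemployed", weight)
    p = weighted_sum(rows, lambda r: True, weight)
    return {"participation": (e + u) / p, "unemployment": u / (e + u)}


def involuntarily_out_of_field_rate(rows, weight="WTSURVY"):
    """Percentage of the employed flagged as involuntarily out of field."""
    employed = [r for r in rows if r["status"] == "employed"]
    flagged = weighted_sum(employed, lambda r: r["invol_out_of_field"], weight)
    return 100 * flagged / weighted_sum(employed, lambda r: True, weight)


def median_full_time_salary(rows, weight="WTSURVY"):
    """Weighted median principal-job salary of the full-time employed, to nearest $1,000."""
    pairs = sorted(
        (r["salary"], r[weight])
        for r in rows
        if r["status"] == "employed" and r["hours"] >= FULL_TIME_HOURS
    )
    half = sum(w for _, w in pairs) / 2
    cumulative = 0.0
    for salary, w in pairs:
        cumulative += w
        if cumulative >= half:
            return int(math.floor(salary / 1000 + 0.5)) * 1000
    raise ValueError("no full-time employed records")
```

Because full-time status is based on the principal job only, a person with several part-time jobs totaling 35 or more hours is still excluded from the salary median here, consistent with the definition above.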
Fay RE, Train GF. 1995. Aspects of survey and model-based postcensal estimation of income and poverty characteristics for states and counties. American Statistical Association Proceedings of the Section on Government Statistics 154–59.
Wolter K. 1984. An investigation of some estimators of variance for systematic sampling. Journal of the American Statistical Association 79(388):781–90.
1 In 2019, NCSES excluded sample members who had not responded in either 2015 or 2017 from the 2019 continuing sample. A supplemental sample of 14,564 SEH doctoral degree holders eligible for the 2015 sample but not previously selected was added in 2019 to support a revised sample stratification design.
2 Item nonresponse rates reflect data missing after logical imputation or editing, but before hot-deck imputation, for all variables except sex, predicted respondent location, ethnicity, race, and citizenship at birth. Demographic and location variables completed by logical imputation during frame construction were also counted as nonresponse, as well as those filled in by hot-deck imputation.
Recommended data tables
Flora Lan of the National Center for Science and Engineering Statistics (NCSES) developed and coordinated this report under the leadership of Emilda B. Rivers, NCSES Director; Vipin Arora, former NCSES Deputy Director; John Finamore, NCSES Chief Statistician; and Gary Anderson, Acting NCSES Program Director. Wan-Ying Chang (NCSES) reviewed the report.
Under contract with NCSES, the Westat statistical team led by Shelley Brock compiled the tables in this report.
NCSES thanks the doctorate recipients for their generous time and effort in contributing to the information included in this report.
National Center for Science and Engineering Statistics (NCSES). 2023. Survey of Doctorate Recipients, 2021. NSF 23-319. Alexandria, VA: National Science Foundation. Available at https://ncses.nsf.gov/pubs/nsf2319.
For additional information about this survey or the methodology, contact