
Survey of Doctorate Recipients, Longitudinal Data: 2015–19

NSF 22-326 | June 1, 2022 | Daniel Foley

General Notes

The Survey of Doctorate Recipients (SDR), conducted by the National Center for Science and Engineering Statistics within the National Science Foundation and by the National Institutes of Health, provides demographic, education, and career history information from individuals with a U.S. research doctoral degree in a science, engineering, or health (SEH) field. This report contains technical documentation for the longitudinal subsample of the 2015 SDR (the LSDR 2015–25 panel), which is designed to provide information about employment changes over a 10-year period (2015–25) among the population of U.S.-trained SEH doctorate holders who were less than 65 years of age in 2015. The first release of data from the LSDR 2015–25 panel includes survey data from three cycles of the SDR: 2015, 2017, and 2019.

 

Technical Notes

Survey Overview

Purpose. The Survey of Doctorate Recipients (SDR), conducted by the National Center for Science and Engineering Statistics (NCSES) within the National Science Foundation (NSF) and by the National Institutes of Health, provides data on the characteristics of science, engineering, and health (SEH) doctorate degree holders. A research doctorate is a doctoral degree that (1) requires the completion of an original intellectual contribution in the form of a dissertation or an equivalent culminating project (e.g., a published manuscript) and (2) is not primarily intended as a degree for the practice of a profession. The most common research doctorate degree is the PhD. The SDR samples individuals who have earned an SEH research doctorate from a U.S. academic institution and are less than 76 years of age. The longitudinal subsample of the 2015 SDR (LSDR 2015–25 panel) was selected from among those who were 65 or younger as of 1 February 2015 and responded to the 2015 SDR.

Some of the education and demographic information in the SDR comes from the Survey of Earned Doctorates (SED, https://www.nsf.gov/statistics/srvydoctorates/), an annual census of research doctorates earned in the United States. The SED provides the sampling frame for the SDR through its annual update of the longstanding Doctorate Records File (DRF), a cumulative listing of all U.S.-earned doctorate recipients dating back to 1920.

The technical notes in the SDR data tables for 2015, 2017, and 2019 (https://www.nsf.gov/statistics/srvydoctoratework/#tabs-2) provide overviews of the baseline survey and the two follow-up SDR surveys, which form the basis of the LSDR panel's longitudinal data for 2015–19. These technical notes also describe each survey cycle's data collection authority, survey contractor, survey authority, and major changes.

Key Survey Information

Frequency. Biennial.

Initial survey year. 2015.

Reference period. The week of 1 February 2015, 2017, and 2019.

Response unit. Individuals with an SEH research doctorate from a U.S. academic institution.

Sample or census. Sample.

Population size. Approximately 860,300 individuals; 85% resided in the United States during all three survey reference periods.

Sample size. 40,148 individuals.

Key variables.

  • Demographics (e.g., age, sex, race, ethnicity, and citizenship)
  • Educational history
  • Employment status
  • Field of degree
  • Occupation

Survey Design

Target population. The target population of the LSDR 2015–25 panel includes individuals who meet the following criteria:

  • Earned an SEH research doctorate from a U.S. academic institution prior to 1 July 2013.
  • Are not institutionalized or terminally ill on 1 February 2015.
  • Are less than 65 years of age as of 1 February 2015.

Sampling frame. The sampling frame of the LSDR 2015–25 panel consists of 66,270 eligible 2015 SDR respondents.

Sample design. Stratified sampling with proportional allocation was implemented. Strata were defined by crossing four 2015 survey outcome variables: employment sector, age group, underrepresented minority indicator, and sex. In addition, seven 2015 survey outcome variables were used as implicit stratification variables in selecting the sample: field of degree, residential location, citizenship when awarded degree, sex, race and ethnicity, disability indicator, and years since degree. The stratification was designed to strengthen reporting by baseline employment characteristics and for minority groups.

The overall sampling rate was about 1 in 20 (4.7%), although sampling rates varied across strata.
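The allocation arithmetic can be illustrated with a minimal Python sketch of textbook proportional allocation, using largest-remainder rounding so stratum sample sizes sum exactly to the total. This is not the LSDR production procedure: the stratum labels and sizes are hypothetical, and because the text notes that sampling rates varied across strata, the production allocation should not be assumed to match this sketch. The overall rate quoted above is consistent with the sample size divided by the population size (40,148 / 860,300).

```python
from fractions import Fraction


def proportional_allocation(stratum_sizes: dict[str, int], n: int) -> dict[str, int]:
    """Allocate n sample units to strata in proportion to stratum size.

    Exact shares are floored, and the leftover units go to the strata with the
    largest fractional remainders, so the allocations sum exactly to n.
    """
    total = sum(stratum_sizes.values())
    exact = {h: Fraction(n * size, total) for h, size in stratum_sizes.items()}
    alloc = {h: int(share) for h, share in exact.items()}  # floor (shares are >= 0)
    leftover = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical strata; these are not the LSDR strata or frame counts.
    print(proportional_allocation({"A": 500, "B": 300, "C": 200}, 101))
    # Overall sampling rate implied by the reported sample and population sizes.
    print(f"{40_148 / 860_300:.1%}")  # about 4.7%
```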

Data Collection and Processing Methods

Technical notes in the SDR data tables for 2015, 2017, and 2019 detail methods used in each survey cycle for data collection, quality assurance procedures, data editing and coding, and item-level missing data imputation (https://www.nsf.gov/statistics/srvydoctoratework/#tabs-2).

Of the 40,148 persons in the sample for the LSDR 2015–25 panel, 6,147 did not complete the 2017 SDR, 10,164 did not complete the 2019 SDR, and 3,897 completed neither the 2017 nor the 2019 SDR. Of the 40,148 sampled cases, the 36,399 individuals who responded in at least one cycle after 2015, or who became permanently ineligible in 2017 or 2019, were included in the released 2015–19 LSDR data file.
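The case counts above fit together by inclusion-exclusion. The Python sketch below assumes that all counts refer to the same 40,148 sampled cases and that the 3,897 cases counted as missing both follow-ups are included in each single-cycle count. Under those assumptions it derives how many cases missed at least one follow-up, how many completed both, and how many released cases, under the stated inclusion rule, completed neither follow-up but became permanently ineligible. These are arithmetic implications, not published counts.

```python
sampled = 40_148
missed_2017 = 6_147   # did not complete the 2017 SDR
missed_2019 = 10_164  # did not complete the 2019 SDR
missed_both = 3_897   # completed neither the 2017 nor the 2019 SDR
released = 36_399     # cases in the 2015-19 LSDR data file

# Inclusion-exclusion: cases that missed at least one follow-up cycle.
missed_any = missed_2017 + missed_2019 - missed_both
completed_both = sampled - missed_any
responded_at_least_once = sampled - missed_both

# Released cases outside the "responded at least once" group; under the stated
# inclusion rule these completed neither follow-up but became permanently ineligible.
implied_ineligible_nonrespondents = released - responded_at_least_once

print(missed_any, completed_both, responded_at_least_once, implied_ineligible_nonrespondents)
```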

Imputation. For sample cases who did not complete the 2017 or the 2019 SDR form, hot-deck imputation was used to impute a selected set of 32 critical items. The donor pool for the imputation comprised all cases in the sampling frame that responded to all three SDR cycles (2015, 2017, and 2019), regardless of whether they were selected into the LSDR panel. Imputed values for other item-level missing data within each survey cycle were retained and not re-imputed.
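The general hot-deck technique fills a missing record with observed values copied from a similar responding case. The text does not say how donors were matched to recipients, so this minimal Python sketch makes two assumptions. First, imputation cells are built from hypothetical baseline variables such as `sector`. Second, every critical item is copied from a single randomly chosen donor in the same cell, which keeps the imputed items mutually consistent. It illustrates the technique only and is not the LSDR production procedure.

```python
import random
from collections import defaultdict


def hot_deck_impute(recipients, donors, cell_keys, critical_items, seed=0):
    """Random hot-deck imputation within imputation cells.

    recipients, donors: lists of dicts, one per person.
    cell_keys: hypothetical variables (known for everyone) that define cells.
    critical_items: items copied, all from the same donor, into each recipient.
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for d in donors:
        pool[tuple(d[k] for k in cell_keys)].append(d)

    imputed = []
    for r in recipients:
        cell = tuple(r[k] for k in cell_keys)
        # Fall back to the whole donor pool if the recipient's cell has no donors.
        candidates = pool.get(cell) or donors
        donor = rng.choice(candidates)
        filled = dict(r)  # copy so the input record is left unchanged
        for item in critical_items:
            filled[item] = donor[item]
        imputed.append(filled)
    return imputed


if __name__ == "__main__":
    donors = [
        {"sector": "academic", "status": "employed"},
        {"sector": "industry", "status": "unemployed"},
    ]
    recipients = [{"id": 1, "sector": "industry", "status": None}]
    print(hot_deck_impute(recipients, donors, ["sector"], ["status"]))
```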

Weighting. Because the SDR is based on a complex sampling design and is subject to nonresponse bias, sampling weights were created for each respondent to support unbiased population estimates. The final analysis weights from the 2015 SDR were used to compute the initial base weight for the LSDR 2015–25 panel. The 2015 final analysis weights account for differential sampling rates, unknown eligibility, and nonresponse, and they align the sample distribution with the DRF distribution with respect to gender, race and ethnicity, degree year, and degree field. The initial panel base weight is the 2015 final weight divided by the probability of selection into the longitudinal sample.
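The base-weight formula stated above can be written as a one-line function. In this minimal sketch the weight and selection probability in the example are hypothetical values, not LSDR figures.

```python
def panel_base_weight(final_weight_2015: float, selection_prob: float) -> float:
    """Initial LSDR panel base weight: 2015 final weight / panel selection probability."""
    if not 0 < selection_prob <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return final_weight_2015 / selection_prob


if __name__ == "__main__":
    # Hypothetical case: 2015 final weight of 12.5, selected into the panel with probability 0.05.
    print(panel_base_weight(12.5, 0.05))
```

Dividing by the selection probability inflates each panel member's weight so that, summed over the panel, the base weights estimate the same population the 2015 weights represented.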

Raking adjustments were made to the panel base weights to compensate for cases that were not included in the 2015–19 LSDR final data file while aligning the weighted sample distribution with the population totals. The raking variables include field of degree, year of award, sex, race and ethnicity, citizenship status, residence as of 1 February 2015 (in or outside the U.S.), along with labor force status, employment sector, and disability status. The final longitudinal sample weights enable data users to derive survey-based estimates of the LSDR target population.
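Raking (iterative proportional fitting) repeatedly scales the weights so that each control variable's weighted category totals match their targets, cycling through the variables until the adjustment factors stop changing. In this minimal Python sketch the variables, categories, and targets are hypothetical; the production raking used the variables listed above and population totals that are not given here.

```python
from collections import defaultdict


def rake(weights, margins, targets, max_iter=100, tol=1e-10):
    """Rake weights to known marginal totals by iterative proportional fitting.

    weights: starting weights, one per case.
    margins: {variable: [category of each case]}.
    targets: {variable: {category: population total}}; each variable's totals
             should sum to the same grand total.
    """
    w = list(weights)
    for _ in range(max_iter):
        largest_change = 0.0
        for var, labels in margins.items():
            # Current weighted total in each category of this control variable.
            current = defaultdict(float)
            for wi, cat in zip(w, labels):
                current[cat] += wi
            # Scale every case so this variable's totals hit their targets.
            factors = {cat: targets[var][cat] / total for cat, total in current.items()}
            for f in factors.values():
                largest_change = max(largest_change, abs(f - 1.0))
            w = [wi * factors[cat] for wi, cat in zip(w, labels)]
        # Stop once a full pass leaves every factor essentially equal to 1.
        if largest_change < tol:
            break
    return w


if __name__ == "__main__":
    margins = {"sex": ["F", "F", "M", "M"], "sector": ["acad", "ind", "acad", "ind"]}
    targets = {"sex": {"F": 30, "M": 70}, "sector": {"acad": 40, "ind": 60}}
    print(rake([1, 1, 1, 1], margins, targets))
```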

Variance estimation. The successive difference replication method (SDRM) was used to develop replicate weights for variance estimation. The theoretical basis for the SDRM is described in Wolter (1984) and in Fay and Train (1995). As with any replication method, successive difference replication involves constructing a number of subsamples (replicates) from the full sample and computing the statistic of interest for each replicate. The mean square error of the replicate estimates around their corresponding full sample estimate provides an estimate of the sampling variance of the statistic of interest. The final 2015–19 LSDR data file includes 104 sets of replicate weights.
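Given replicate estimates, the variance follows from v(θ̂) = c · Σ_r (θ̂_r − θ̂)², where θ̂ is the full-sample estimate, θ̂_r is the estimate computed with replicate weight r, and c is a scaling constant. For successive difference replication the conventional constant is 4/R for R replicates (4/104 for a file with 104 sets of replicate weights). That value is an assumption in this sketch and should be checked against the data file documentation, so the multiplier is left overridable.

```python
import math


def replicate_standard_error(full_estimate, replicate_estimates, multiplier=None):
    """Standard error of a statistic from its replicate estimates.

    By default uses the successive difference replication multiplier 4/R,
    where R is the number of replicates (assumed; verify for the LSDR file).
    """
    r = len(replicate_estimates)
    if multiplier is None:
        multiplier = 4.0 / r
    squared_deviations = sum((t - full_estimate) ** 2 for t in replicate_estimates)
    return math.sqrt(multiplier * squared_deviations)


if __name__ == "__main__":
    # Hypothetical: 104 replicate estimates, each computed with one replicate weight column.
    replicates = [101.0] * 52 + [99.0] * 52
    print(replicate_standard_error(100.0, replicates))
```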

Disclosure protection. To protect against the disclosure of confidential information provided by SDR respondents, the estimates presented in data tables are rounded to the nearest 50, although calculations of percentages are based on unrounded estimates.
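The rounding rule can be illustrated with hypothetical values. This sketch rounds to the nearest 50 with halves rounded up; the tie rule is an assumption, and Python's built-in `round` is avoided because it rounds halves to even. It also shows that a percentage computed from unrounded estimates can differ from one computed from the published, rounded counts.

```python
import math


def round_to_nearest_50(x: float) -> int:
    """Round a weighted estimate to the nearest 50 (halves round up; tie rule assumed)."""
    return int(math.floor(x / 50 + 0.5)) * 50


def percent(part: float, whole: float) -> float:
    """Percentage computed from unrounded estimates."""
    return 100.0 * part / whole


if __name__ == "__main__":
    part, whole = 1_234.4, 5_012.6  # hypothetical unrounded weighted estimates
    print(round_to_nearest_50(part), round_to_nearest_50(whole))  # published counts
    print(round(percent(part, whole), 1))  # from unrounded values, not 1,250 / 5,000
```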

Data table cell values based on counts of respondents that fall below a predetermined threshold are deemed to be sensitive to potential disclosure, and the letter “D” indicates this type of suppression in a table cell.

Survey Quality Measures

Sampling error. SDR estimates are subject to sampling errors. Estimates of sampling errors associated with this survey were calculated using replicate weights and are included in each table of estimates. Data table estimates with a coefficient of variation (that is, the standard error divided by the estimate) that exceeds a predetermined threshold are deemed unreliable and are suppressed. The letter “S” indicates this type of suppression in a table cell.
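The coefficient of variation is the standard error divided by the estimate, so larger values indicate less precise estimates. The sketch below flags a table cell with “D” or “S”. The respondent-count and CV thresholds are hypothetical placeholders because the predetermined NCSES thresholds are not stated here, and checking the disclosure rule before the reliability rule is likewise an assumption.

```python
def coefficient_of_variation(estimate: float, standard_error: float) -> float:
    """CV = standard error / estimate."""
    return standard_error / estimate


def cell_flag(estimate, standard_error, n_respondents, min_respondents=10, max_cv=0.30):
    """Return "D", "S", or None for a table cell.

    min_respondents and max_cv are hypothetical thresholds, not NCSES values.
    """
    if n_respondents < min_respondents:
        return "D"  # too few respondents: potential disclosure risk
    if estimate != 0 and coefficient_of_variation(estimate, standard_error) > max_cv:
        return "S"  # CV above threshold: estimate deemed unreliable
    return None


if __name__ == "__main__":
    print(cell_flag(1_000, 100, 50), cell_flag(1_000, 400, 50), cell_flag(1_000, 100, 3))
```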

Coverage error. Coverage error, a type of nonsampling error, occurs in sample estimates when the sampling frame does not accurately represent the target population. The initial SDR sampling frame is the DRF, which is derived from the SED, a census of research doctorates awarded annually in the United States. To the extent that the DRF does not include all awarded research doctorates, the SDR would suffer from undercoverage. Reporting errors in the SED could also cause some doctorate recipients to be misclassified as not having earned an SEH research doctorate, resulting in further undercoverage.

Nonresponse error. The weighted response rate for the baseline survey (2015 SDR) was 66%; the unweighted response rate was 68%. Findings from research on SDR nonresponse trends were used to develop the nonresponse weighting adjustments, which are designed to minimize the potential for nonresponse bias in SDR estimates. As described above, the raking adjustment further aligns the LSDR sample with the target population to account for panel attrition.
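To show how weighted and unweighted response rates can differ, this simplified sketch treats every case as known-eligible. It computes the unweighted rate as respondents over eligible cases and the weighted rate as the respondents' share of the total base weight. Production response rates also account for cases of unknown eligibility, so this is an illustration only, and the weights in the example are hypothetical.

```python
def response_rates(cases):
    """Return (unweighted, weighted) response rates.

    cases: (base_weight, responded) pairs for eligible sample cases only; cases
    of unknown eligibility are ignored in this simplification.
    """
    cases = list(cases)
    unweighted = sum(1 for _, responded in cases if responded) / len(cases)
    weighted = sum(w for w, responded in cases if responded) / sum(w for w, _ in cases)
    return unweighted, weighted


if __name__ == "__main__":
    # A heavily weighted nonrespondent pulls the weighted rate below the unweighted rate.
    print(response_rates([(1.0, True), (1.0, True), (3.0, False), (1.0, False)]))
```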

Measurement error. The SDR is subject to reporting errors arising from differences in how respondents interpret questions and from differences across response modes (Web, mail, and computer-assisted telephone interviewing [CATI]).

Data Comparability and Changes

Data comparability. The 2015–19 LSDR data file is the first release of longitudinal data from the LSDR 2015–25 panel. The survey questions remained the same across the three survey cycles. The 2015 SDR sample design improved population coverage in the 2015, 2017, and 2019 survey cycles to include all SEH doctorates awarded by U.S. institutions regardless of the academic year of award or the graduate’s post-graduation residency location.

Changes in survey coverage and population. None.

Changes in data processing. None.

Definitions

Field of doctorate. The doctoral field is as specified by the respondent in the SED at the time of degree conferral. The more than 200 SED coded fields were subsequently recoded to the 77 field-of-study codes used in the SDR questionnaire. (See SDR 2019 technical table A-1, which lists the 77 SDR detailed fields of degree and cross-classifies them with the more than 200 fine fields of degree, based on the NCSES Taxonomy of Disciplines, reported in the SED sampling frame.)

Full-time and part-time employment. Full-time (working 35 hours or more per week) and part-time (working less than 35 hours per week) employment status is for the principal job only and not for all jobs held in the labor force. For example, an individual could work part time in his or her principal job but full time in the labor force. Full-time and part-time employment status is not comparable to data reported before 2006, when no distinction was made between the principal job and the other jobs held by the individual.

Involuntarily out-of-field rate. The involuntarily out-of-field rate is the percentage of employed individuals who reported that their principal job was in an area not related to their first doctoral degree at least partially because a job in their doctoral field was not available.
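As a weighted estimate, the rate divides the weight of employed people whose principal job is unrelated to their first doctorate at least partly because no job in their field was available by the weight of all employed people. A minimal sketch, assuming a hypothetical record layout with boolean fields:

```python
def involuntarily_out_of_field_rate(people):
    """Weighted percentage of employed people who are involuntarily out of field.

    people: dicts with hypothetical keys "weight", "employed", "job_related"
    (principal job related to first doctorate), and "no_field_job_available".
    """
    employed = [p for p in people if p["employed"]]
    denominator = sum(p["weight"] for p in employed)
    numerator = sum(
        p["weight"]
        for p in employed
        if not p["job_related"] and p["no_field_job_available"]
    )
    return 100.0 * numerator / denominator


if __name__ == "__main__":
    people = [
        {"weight": 3, "employed": True, "job_related": True, "no_field_job_available": False},
        {"weight": 1, "employed": True, "job_related": False, "no_field_job_available": True},
        {"weight": 1, "employed": True, "job_related": False, "no_field_job_available": False},
        {"weight": 5, "employed": False, "job_related": False, "no_field_job_available": False},
    ]
    print(involuntarily_out_of_field_rate(people))
```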

Labor-force participation rate. The labor-force participation rate is the ratio (E + U) / P, where E (employed) + U (unemployed; not-employed and actively seeking work) = the total labor force, and P = population, defined as all noninstitutionalized SEH doctorate holders who were less than 76 years of age on the survey reference date and who earned their doctorate from a U.S. institution.
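With survey weights, E, U, and P become weighted counts. A minimal sketch, assuming a hypothetical status coding of "employed", "unemployed", and "not_in_labor_force" for every person in P:

```python
LABOR_FORCE = {"employed", "unemployed"}


def labor_force_participation_rate(weights, statuses):
    """(E + U) / P computed with weighted counts."""
    population = sum(weights)
    labor_force = sum(w for w, s in zip(weights, statuses) if s in LABOR_FORCE)
    return labor_force / population


if __name__ == "__main__":
    print(labor_force_participation_rate([2, 1, 1], ["employed", "unemployed", "not_in_labor_force"]))
```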

Occupation data. The occupational classification of the respondent was based on his or her principal job (including job title) held during the reference week—or on his or her last job held, if not employed in the reference week (survey questions A5 and A6 as well as A19 and A20). Also used in the occupational classification was a respondent-selected job code (survey questions A7 and A21). (See SDR 2019 technical table A-2 for a list and classification of occupations reported in the SDR.)

Race and ethnicity. Ethnicity is defined as Hispanic or Latino or not Hispanic or Latino. Values for those selecting a single race include American Indian or Alaska Native, Asian, Black or African American, Native Hawaiian or Other Pacific Islander, and White. Persons who report more than one race and who are not of Hispanic or Latino ethnicity also have a separate value. Race and ethnicity data are from the SED and prior rounds of the SDR. The most recently reported race and ethnicity data are given precedence.
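The precedence rule can be sketched as choosing the value from the most recent report that actually answered the item. The (year, source, value) tuple layout is hypothetical, and the sketch does not model how ties between sources reported in the same year would be resolved.

```python
def most_recent_report(reports):
    """Return the race/ethnicity value from the most recent report that has one.

    reports: (year, source, value) tuples, e.g. from the SED and earlier SDR
    rounds; value is None when the item was not reported.
    """
    answered = [r for r in reports if r[2] is not None]
    if not answered:
        return None
    return max(answered, key=lambda r: r[0])[2]


if __name__ == "__main__":
    history = [(2008, "SED", "Asian"), (2013, "SDR", None), (2015, "SDR", "More than one race")]
    print(most_recent_report(history))
```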

Salary. Median annual salaries are reported for the principal job, rounded to the nearest $1,000, and computed for full-time employed SEH doctorate holders only. For individuals employed by educational institutions, no accommodation was made to convert academic year salaries to calendar year salaries. Users are advised that, due to changes in the salary question after 1993, salary data for 1995–2019 are not strictly comparable with 1993 salary data.
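The salary statistic can be sketched as a weighted median over full-time principal jobs (35 or more hours per week, per the full-time definition above), rounded to the nearest $1,000. The record layout is hypothetical, and weighted-median conventions vary: this sketch takes the lowest salary at which the cumulative weight reaches half the total, without interpolation, so it may not reproduce published values exactly.

```python
import math


def median_full_time_salary(records):
    """Weighted median principal-job salary for full-time workers, nearest $1,000.

    records: (weight, hours_per_week, salary) tuples for employed persons
    (hypothetical layout). Returns None if no one works full time.
    """
    full_time = sorted((salary, w) for w, hours, salary in records if hours >= 35)
    if not full_time:
        return None
    half = sum(w for _, w in full_time) / 2
    cumulative = 0.0
    for salary, w in full_time:
        cumulative += w
        if cumulative >= half:
            return int(math.floor(salary / 1000 + 0.5)) * 1000
    return None


if __name__ == "__main__":
    records = [(1, 40, 90_400), (1, 40, 101_600), (2, 50, 120_000), (5, 20, 30_000)]
    print(median_full_time_salary(records))  # the 20-hour record is excluded
```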

Sector of employment. Employment sector is a derived variable based on responses to questionnaire items A13, A14, and A15. As of the 2015 survey, questionnaire item A13 (type of principal employer) includes a separate response, “In a non-U.S. government at any level.” In the data tables, the category 4-year educational institutions includes 4-year colleges or universities, medical schools (including university-affiliated hospitals or medical centers), and university-affiliated research institutes. Other educational institutions include 2-year colleges, community colleges, technical institutes, precollege institutions, and other educational institutions that respondents reported verbatim in the survey questionnaire. Users should note that before 2008, these verbatim-reported other educational institutions were grouped with 4-year educational institutions rather than with 2-year colleges. Private, for-profit includes respondents who were self-employed in an incorporated business. Self-employed includes respondents who were self-employed or were a business owner in a non-incorporated business.

Unemployment rate. The unemployment rate (RU) is the ratio U / (E + U), where U = unemployed (not-employed and actively seeking work), and E (employed) + U = the total labor force.
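The unemployment rate uses the same weighted counts as the labor-force participation rate but divides by the labor force rather than the population. A minimal sketch, assuming the same hypothetical status coding ("employed", "unemployed", "not_in_labor_force"):

```python
def unemployment_rate(weights, statuses):
    """U / (E + U) computed with weighted counts."""
    unemployed = sum(w for w, s in zip(weights, statuses) if s == "unemployed")
    labor_force = sum(w for w, s in zip(weights, statuses) if s in ("employed", "unemployed"))
    return unemployed / labor_force


if __name__ == "__main__":
    print(unemployment_rate([2, 1, 1], ["employed", "unemployed", "not_in_labor_force"]))
```

People outside the labor force affect neither the numerator nor the denominator.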

References

Fay RE, Train GF. 1995. Aspects of survey and model-based postcensal estimation of income and poverty characteristics for states and counties. American Statistical Association Proceedings of the Section on Government Statistics, 154–59.

Wolter K. 1984. An investigation of some estimators of variance for systematic sampling. Journal of the American Statistical Association 79(388):781–90.

 

Acknowledgments and Suggested Citation

Acknowledgments

Daniel Foley, Wan-Ying Chang, and Lynn Milan of the National Center for Science and Engineering Statistics (NCSES) developed and coordinated this report under the leadership of Emilda B. Rivers, NCSES Director; Vipin Arora, NCSES Deputy Director; John Finamore, NCSES Chief Statistician; and Rebecca L. Morrison, (acting) NCSES Program Director. Jock Black (NCSES) reviewed the report.

Under NCSES contract with Westat, the Westat statistical team led by Shelley Brock compiled the data file for this report. Publication processing support was provided by Christine Hamel, Tanya Gore, and Joe Newman (NCSES).

NCSES thanks the doctorate recipients for their generous time and effort in contributing to the information included in this report.

Suggested Citation

National Center for Science and Engineering Statistics (NCSES). 2022. Survey of Doctorate Recipients, Longitudinal Data: 2015–19. NSF 22-326. Alexandria, VA: National Science Foundation. Available at https://ncses.nsf.gov/pubs/nsf22326/.

 

Contact Us

Report Author

Lynn Milan
Survey Manager
Human Resources Statistics Program, NCSES
Tel: (703) 292-2275
E-mail: lmilan@nsf.gov

NCSES

National Center for Science and Engineering Statistics
Directorate for Social, Behavioral and Economic Sciences
National Science Foundation
2415 Eisenhower Avenue, Suite W14200
Alexandria, VA 22314
Tel: (703) 292-8780
FIRS: (800) 877-8339
TDD: (800) 281-8749
E-mail: ncsesweb@nsf.gov