The NPRA Survey collects information on research and experimental development performed by tax-exempt nonprofit organizations in the United States.
The Nonprofit Research Activities (NPRA) module of the Annual Business Survey measures research and experimental development (R&D) performance and funding at U.S. tax-exempt nonprofit organizations.
This survey was conducted by the Census Bureau in partnership with the National Center for Science and Engineering Statistics within the National Science Foundation.
Status | Active |
---|---|
Frequency | Annual |
Reference Period | FY 2021 |
Next Release Date | October 2024 |
The Nonprofit Research Activities (NPRA) module of the Annual Business Survey collects information on research and experimental development (R&D) performed or funded by nonprofit organizations in the United States.
Not applicable.
Frequency. Annual.
Initial survey year. FY 2020. (The questions in the FY 2020 survey were first developed and used on the FY 2016 NPRA survey, which was the first national survey of R&D activities in the U.S. nonprofit population since 1997. Due to the differences between the previous surveys and the current module, FY 2020 is considered the initial survey year for the annual series.)
Reference period. FY 2021.
Response unit. Organizations.
Sample or census. Sample.
Population size. 40,650 nonfarm businesses filing Internal Revenue Service (IRS) tax form 990 as tax-exempt organizations and with payroll of $500,000 or more were in scope for the nonprofit R&D module.
Sample size. 8,050 organizations.
Key variables of interest are listed below.
Included are all nonfarm businesses filing Internal Revenue Service (IRS) tax forms as nonprofit organizations with an annual payroll of $500,000 or more.
The sampling frame was constructed from the final 2020 Business Register and the Exempt Organizations Business Master File Extract (EO BMF). The Business Register is the Census Bureau’s comprehensive database of U.S. businesses. Organizations were excluded from the frame if they were outside the scope of the survey (e.g., churches, government organizations, educational institutions, or organizations located outside the United States). A financial threshold was also imposed to increase the efficiency of reaching organizations that perform research.
The nonprofit R&D frame is stratified by state and primary nonprofit activity (hospitals, other health care, science and technology, and all others), and is systematically sampled within each stratum. A standard type of estimation for stratified systematic sampling is used. Certainty cases have a selection probability of one and a sampling weight of one and represent only themselves. Specifically, firms were selected with certainty based on the following criteria:
The nonprofit R&D sample consisted of 8,050 organizations; 2,150 were selected with certainty.
The remaining 5,900 noncertainty cases were selected using the stratified systematic random sample selection described above. The maximum sample weight was 6.9.
The survey was mailed to 8,050 nonprofit organizations in July 2022. Organizations were sent a letter informing them of their requirement to report. The letter also provided instructions on how to access the survey and submit online. There were three mail follow-ups and four separate e-mail follow-ups conducted to increase response. The collection period closed 30 December 2022.
Prior to tabulating the data, response data were reviewed and edited to correct reporting errors. R&D data were tabulated for records reporting $50,000 or more in R&D expenditures. Survey analysts reviewed the R&D reported by the survey respondents by evaluating the ratio of reported R&D to expenses and by checking information on each organization's website.
Additional data errors were detected and corrected using an automated data edit system designed to review the data for reasonableness and consistency. The editing process interactively performed corrections by using standard procedures to fix detectable errors. Quality control techniques were used to verify that operating procedures were carried out as specified.
Where possible, missing data were imputed using previous survey data or other publicly available documents, such as annual reports and financial statements. Weights were used to compensate for unequal probabilities of selection, to adjust for unit nonresponse, and to calibrate sample estimates of expenses to match total expenses on the frame. Measures of sampling variability were estimated using the delete-a-group jackknife variance estimator.
Sampling error is the difference between estimates obtained from the sample and results theoretically obtainable from a comparable complete enumeration of the sampling frame. This error results because only a subset of the sampling frame is measured in a sample survey. For published estimates from NPRA, standard errors are produced for estimated percentages, while relative standard errors (RSEs) are produced for all other estimates. Tables of the estimated measures of sampling variability corresponding to each data table are available upon request.
Coverage error occurs when the frame fails to completely enumerate the population of interest. There can be both undercoverage error, where units are not included in the frame, and overcoverage error, where units included in the frame are out of scope for the population of interest. The NPRA module uses the prior-year Business Register to construct the frame, so any subsequent change to a business that would alter whether it falls within the survey scope is a potential source of coverage error. Prior to tabulation, survey unit information is updated with the most recent available Business Register data to mitigate this source of error.
Unit nonresponse is handled by multiplying each organization's sampling weight by a nonresponse adjustment factor, which adjusts the weighted reported and imputed data. Detailed descriptions of the adjustments for nonresponse are available in the Technical Notes.
The most common source of measurement error was reporting in different monetary units (for example, reporting whole dollars rather than thousands of dollars). This was corrected during data processing. Another source of error involved incorrect inclusion of organizations already represented in other R&D data collections. The R&D of these respondents was set to 0 where it was determined their R&D was already represented in other survey responses. These cases included nonprofit organizations managing federal laboratories and some university-affiliated hospitals.
Data are available at https://ncses.nsf.gov/surveys/nonprofit-research-activities/.
The questions in the FY 2020 module were first developed and used on the FY 2016 Nonprofit Research Activities Survey, which was the first national survey of R&D activities in the U.S. nonprofit population since 1997. Due to the differences between the previous surveys and the current module, data from the previous surveys and the current module are not comparable for trend analysis.
NPRA data will be published in NCSES InfoBriefs and data tables available at https://ncses.nsf.gov/surveys/nonprofit-research-activities/.
The NPRA module contains confidential data that are protected under Title 13 and Title 26 of the United States Code. Two types of data are currently available: public-use tabular statistics and restricted microdata. Public-use tabular statistics can be obtained on the NCSES website (https://ncses.nsf.gov/) and by contacting NCSES. Restricted microdata will be available at any of the 15 secure Research Data Centers administered by the Center for Economic Studies (CES) at the Census Bureau. Researchers interested in accessing microdata can apply for a restricted-use license by submitting a proposal to the CES, which evaluates proposals based on their benefit to the Census Bureau, scientific merit, feasibility, and risk of disclosure. To learn more about the Research Data Centers and how to apply, please visit the CES page on research with restricted-use data. For additional information about the application process, including how to initiate a project, please contact the administrator at the primary site where the research will be conducted. Per the Federal Cybersecurity Enhancement Act of 2015, the data are protected from cybersecurity risks through screening of the systems that transmit the data.
Purpose. The NPRA module of the ABS collects information on R&D performed or funded by tax-exempt nonprofit organizations in the United States. The nonprofit sector is one of four sectors (business, government, higher education, and other private nonprofit) that fund or perform R&D.
NCSES combines nonprofit sector data with data from the other sectors to estimate total national R&D expenditures. Results of the research activities data collected from nonprofit organizations will be used to report updated, valid, and reliable estimates of U.S. nonprofit R&D in National Patterns of R&D Resources and the Bureau of Economic Analysis system of national accounts.
The data collected will also be incorporated into the National Science Board’s biennial report, Science and Engineering Indicators. The R&D data from the nonprofit module will be reported in the Organization for Economic Cooperation and Development (OECD) periodic publications and used for international comparisons of R&D efforts. NCSES also anticipates professional associations will use data from the nonprofit R&D module. Likely users in this category include, but are not limited to, the Science Philanthropy Alliance, the Association of Independent Research Institutes, and the Health Research Alliance.
Data collection authority. Title 13, United States Code, Sections 8(b), 131, and 182; Title 42, United States Code, Section 1861-76 (NSF Act of 1950, as amended); and Section 505 within the America COMPETES Reauthorization Act of 2010, authorize this collection. Sections 224 and 225 of Title 13 require mandatory response. Office of Management and Budget No. 0607-1004.
Survey sponsor. NCSES within NSF.
Survey collection and tabulation agent. The survey is conducted annually by the Census Bureau in accordance with an interagency agreement with NCSES.
Frequency. Annual.
Initial survey year. FY 2020. (The questions in the FY 2020 module were first developed and used on the FY 2016 Nonprofit Research Activities Survey, which was the first national survey of R&D activities in the U.S. nonprofit population since 1997. Due to the differences between the previous surveys and the current module, FY 2020 is considered the initial survey year for the annual series.)
Reference period. FY 2021. The fiscal year referred to throughout this report was the nonprofit organization’s fiscal year, which varied across the nonprofit population; for the majority of organizations reporting R&D performance, this fiscal year ended in either September or December 2021.
Response unit. Organizations.
Sample or census. Sample.
Population size. 40,650 organizations.
Sample size. 8,050 organizations.
Target population. Included are all nonfarm businesses with at least one in-scope location filing Internal Revenue Service (IRS) Form 990 as a tax-exempt organization and with annual payroll of $500,000 or more.
Sampling frame. The sampling frame was constructed from the final 2020 Business Register and the Exempt Organizations Business Master File Extract (EO BMF). The Business Register is the Census Bureau’s comprehensive database of U.S. businesses. Business Register data are compiled from a combination of business tax returns, data collected from the economic census, and data from other Census Bureau surveys. The Business Register includes sole proprietorships, partnerships, and corporations reporting business activity to the IRS. The EO BMF is a publicly available list from the IRS of all organizations that are exempt from filing federal income taxes.
The Business Register contains establishments that are out of scope for the nonprofit R&D module. These establishments are removed from the sampling universe. They include:
Information on industry classification, receipts, payroll, and employment was extracted from the Business Register during the frame construction. Nonprofit status for each establishment was determined by matching the Business Register to the IRS nonprofit list.
The sampling frame is stratified by primary type of organization, primary nonprofit activity, and state. Details on how firm-level sampling units are assigned to these strata follow.
The sample is selected at the firm level, so firms with multiple establishments were aggregated into one firm-level unit for sampling. Firm payroll, receipts, and employment are set to the respective sums across all establishments within the firm. Firm industry is assigned hierarchically by aggregate payroll: first the NAICS sector with the highest aggregate payroll, then the 3-digit NAICS code with the highest aggregate payroll within that sector, then the 4-digit NAICS code with the highest aggregate payroll within that 3-digit code, and finally the 6-digit NAICS code with the highest aggregate payroll within that 4-digit code. Firm industry is set to sector 55 (management of companies and enterprises) only if there are no establishments within the firm that belong to other sectors. Firm National Taxonomy of Exempt Entities (NTEE) code is set to the NTEE code with the highest aggregate payroll within the NTEE group (first character of the NTEE code) with the highest aggregate payroll.
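The aggregation and hierarchical code assignment described above can be sketched in Python. This is a minimal illustration, not the Census Bureau's production code. The function names, the dictionary keys (`naics`, `ntee`, `payroll`, `receipts`, `employment`), and the example values are hypothetical. The sketch treats the first two NAICS digits as the sector, which is a simplification because some sectors span several 2-digit prefixes, and it breaks payroll ties in favor of the first code encountered, which the source does not specify.

```python
from collections import defaultdict


def pick_by_payroll(establishments, key, lengths):
    """Descend a code hierarchy, keeping the prefix with the most payroll at each level."""
    # Ignore establishments with a missing code (NTEE is not always available).
    candidates = [e for e in establishments if e.get(key)]
    winner = None
    for length in lengths:
        totals = defaultdict(float)
        for est in candidates:
            totals[est[key][:length]] += est["payroll"]
        if not totals:
            return None
        # Assumption: ties go to the first prefix seen.
        winner = max(totals, key=totals.get)
        candidates = [e for e in candidates if e[key][:length] == winner]
    return winner


def assign_firm_naics(establishments):
    """Sector, then 3-digit, 4-digit, and 6-digit NAICS by aggregate payroll."""
    # Sector 55 is assigned only when no establishment belongs to another sector.
    outside_55 = [e for e in establishments if not e["naics"].startswith("55")]
    return pick_by_payroll(outside_55 or establishments, "naics", (2, 3, 4, 6))


def aggregate_firm(establishments):
    """Collapse a firm's establishments into one firm-level sampling unit."""
    return {
        "payroll": sum(e["payroll"] for e in establishments),
        "receipts": sum(e["receipts"] for e in establishments),
        "employment": sum(e["employment"] for e in establishments),
        "naics": assign_firm_naics(establishments),
        # NTEE group (first character), then the full three-character code.
        "ntee": pick_by_payroll(establishments, "ntee", (1, 3)),
    }


# Hypothetical firm with five establishments.
firm = aggregate_firm([
    {"naics": "622110", "ntee": "E22", "payroll": 500, "receipts": 900, "employment": 8},
    {"naics": "621111", "ntee": "E30", "payroll": 300, "receipts": 400, "employment": 5},
    {"naics": "621112", "ntee": None, "payroll": 350, "receipts": 450, "employment": 6},
    {"naics": "541715", "ntee": "U50", "payroll": 700, "receipts": 800, "employment": 9},
    {"naics": "551114", "ntee": None, "payroll": 2000, "receipts": 0, "employment": 20},
])
```

In this example the sector 55 establishment has the largest payroll but is set aside because other sectors are present. Sector 62 wins on aggregate payroll, and within it the 621 branch outweighs 622 even though the single largest establishment in the sector is coded 622110, so the firm is coded 621112.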
After firm-level units are created and firm-level codes are assigned, firms are removed from the sampling frame if any of the following are true:
Education services firms are out of scope for NPRA because the R&D for those organizations is measured by the Higher Education Research and Development (HERD) Survey. Firms that are also in the sampling frame of the Business Enterprise Research and Development (BERD) Survey are removed because their R&D is measured by that survey. The remaining criteria identify firms that are unlikely to perform R&D activities.
All records in the nonprofit sample universe are assigned a primary nonprofit activity stratum as follows:
Firms are assigned to state strata based on the organization’s physical location. When physical location is unavailable, state is assigned based on mailing address. Organizations that operate in more than one state are assigned to a multistate category.
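As a rough illustration of the state assignment rule, the following Python sketch applies the three cases in order. The function name and its inputs are hypothetical, and it assumes that "operating in more than one state" can be read from the physical locations of a firm's establishments, which the source does not spell out.

```python
def assign_state_stratum(physical_states, mailing_state=None):
    """Return the state stratum label for one firm.

    physical_states: state codes of the firm's physical locations
                     (entries may be None when a location is unavailable).
    mailing_state:   state code from the mailing address, used as a fallback.
    """
    states = sorted({s for s in physical_states if s})
    if len(states) > 1:
        # Firms operating in more than one state share one category.
        return "multistate"
    if len(states) == 1:
        return states[0]
    # No physical location on file: fall back to the mailing address.
    return mailing_state
```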
Sample design. The nonprofit R&D frame is stratified by state and primary type of organization (hospitals, other health care, science and technology, and all others). Within these strata, some nonprofits were selected with certainty based on the following criteria:
The nonprofit R&D sample consisted of 8,050 organizations; 2,150 were selected with certainty.
The remaining 5,900 noncertainty cases were selected using the stratified systematic random sample selection. The maximum sample weight was 6.9.
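For readers unfamiliar with systematic selection, the sketch below shows one common way to draw a systematic sample within a single stratum using a fractional sampling interval and a random start. It is an assumption-laden illustration rather than the procedure actually used for NPRA: the function name, the ordering of units within the stratum, and the fractional-interval method are all hypothetical choices. The sampling weight returned is the inverse of the selection probability, so a stratum sampled in full has a weight of one, as certainty cases do.

```python
import random


def systematic_sample(stratum_units, n, start=None):
    """Select n units from an ordered stratum list.

    Returns (selected units, sampling weight). The optional `start` argument
    fixes the random start so that a draw can be reproduced.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    population = len(stratum_units)
    if n >= population:
        # Every unit is taken: selection probability one, weight one.
        return list(stratum_units), 1.0
    interval = population / n                 # fractional sampling interval
    if start is None:
        start = random.random() * interval    # random start within the first interval
    picks = [
        stratum_units[min(int(start + i * interval), population - 1)]
        for i in range(n)
    ]
    return picks, interval                    # weight = N / n


# Ten hypothetical units, four selected, with a fixed start for reproducibility.
sample, weight = systematic_sample(list(range(10)), 4, start=0.5)
```

With ten units and a sample of four, the interval is 2.5, so each selected unit carries a sampling weight of 2.5.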
Data collection. The survey was mailed to 8,050 nonprofit organizations in July 2022. Organizations were sent a letter informing them of their requirement to report under Title 13, United States Code, Sections 224 and 225. The letter also provided instructions on how to access the survey and submit online. There were three mail follow-ups and four separate e-mail follow-ups conducted to increase response. The collection period closed 30 December 2022.
Mode. The data were collected using an electronic instrument.
Response rates.
Check-in rate. The check-in rate is defined as the unweighted number of surveys that were submitted online by in-scope organizations, divided by the unweighted total number of all in-scope organizations in the sample. Response to individual questions did not factor into this metric. At the close of the collection period, the check-in rate was 84%.
Unit response rate (URR). Unit response is defined as an organization providing total expenses or employment and answering the R&D performed and R&D funded questions. URR is the number of unit respondents in the sample (numerator) divided by the total sample size (denominator), expressed as a percentage.
For the nonprofit R&D module, the URR was 81%.
Item response rates. The distribution of values reported by sample organizations in the nonprofit module is highly skewed. Thus, rather than report unweighted item response rates, total quantity response rates are calculated, which are based on weighted data.
Total quantity response rate (TQRR). For a given published estimate other than count or ratio estimates, TQRR is the percentage of the weighted estimate that is based on data reported by units in the sample or on data that were obtained from other sources and determined to be equivalent in quality to reported data. In this calculation, these data are weighted by sampling weights only, not by nonresponse weights. The TQRR for total expenditures for R&D performed by nonprofit organizations in the United States in 2021 was 80%.
Total quantity nonresponse rate (TQNR). For a given published estimate, TQNR, defined as 100% minus TQRR, is calculated for each tabulation cell of the nonprofit module, except for cells that contain count or ratio estimates. TQNR measures the combined effect of the procedures used to handle unit and item nonresponse on the weighted nonprofit estimates. Detailed imputation rates are available upon request.
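One way to read the TQRR and TQNR definitions is sketched below in Python. This is an interpretation, not the official computation. It assumes the numerator is the sum of reported or equivalent-quality values weighted by sampling weights only, and the denominator is the published estimate computed with final weights over all tabulated values, including imputed ones. The record keys and the example values are hypothetical.

```python
def total_quantity_response_rate(records):
    """TQRR (percent) for one published estimate under the assumptions above."""
    # Numerator: reported or equivalent-quality data, sampling weights only.
    numerator = sum(
        r["sampling_weight"] * r["value"]
        for r in records
        if r["reported_or_equivalent"]
    )
    # Denominator: the weighted estimate, using final (adjusted) weights.
    estimate = sum(r["final_weight"] * r["value"] for r in records)
    if estimate == 0:
        raise ValueError("TQRR is undefined for a zero estimate")
    return 100.0 * numerator / estimate


def total_quantity_nonresponse_rate(records):
    """TQNR is defined as 100% minus TQRR."""
    return 100.0 - total_quantity_response_rate(records)


# Hypothetical tabulation cell: one certainty respondent, one noncertainty
# respondent, and one noncertainty unit whose value was imputed.
cell = [
    {"sampling_weight": 1.0, "final_weight": 1.0, "value": 600, "reported_or_equivalent": True},
    {"sampling_weight": 4.0, "final_weight": 5.0, "value": 50, "reported_or_equivalent": True},
    {"sampling_weight": 4.0, "final_weight": 5.0, "value": 30, "reported_or_equivalent": False},
]
tqrr = total_quantity_response_rate(cell)
tqnr = total_quantity_nonresponse_rate(cell)
```

Here the reported quantity is 800 against a weighted estimate of 1,000, so the illustrative cell has a TQRR of 80% and a TQNR of 20%.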
Data editing. Prior to tabulating the data, response data were reviewed and edited with both automated and manual procedures to correct reporting errors. R&D data were tabulated for records reporting $50,000 or more in R&D expenditures.
Survey analysts reviewed the R&D reported by the respondents by evaluating the ratio of reported R&D to expenses and by checking information on the organization’s website. Respondents were asked for clarifications or corrections on any reporting issues found.
Additional data errors were detected and corrected using an automated data edit system designed to review the data for reasonableness and consistency. The editing process interactively performed corrections by using standard procedures to fix detectable errors. Quality control techniques were used to verify that operating procedures were carried out as specified.
During the editing for FY 2021, errors in reporting for FY 2020 were discovered and revisions were made to correct these. For accurate historical data, use only the most recently released data tables.
Imputation.
Item nonresponse. If detailed R&D data were not reported by a nonprofit and could not be inferred by survey analysts, they were imputed in the same ratio as reported by other nonprofits in the same sample stratum. These imputations are reflected in the reported imputation rates.
Unit nonresponse. Estimates produced from the ABS include adjustments to account for organizations that did not respond to the survey (unit nonresponse). If available, data from public tax filings, annual reports, or audits were used to impute expenses and R&D for select nonprofits known to have performed large amounts of R&D based on public information or prior surveys. Otherwise, unit nonresponse is handled by adjusting weighted reported data as follows. Each organization’s sampling weight is multiplied by a nonresponse adjustment factor. To calculate the adjustment factors, each organization in the sample that is eligible for tabulation is assigned to an adjustment cell. The adjustment cells for nonprofits are based on certainty or noncertainty sampling strata and NAICS sector. For NAICS sector, there are three categories: healthcare, science and technology, and all others. For a given adjustment cell, the nonresponse adjustment factor is the ratio of the sum of the sampling weights for all organizations in the cell to the sum of the sampling weights for all organizations in the cell with reported data. For the nonresponse adjustment, an organization is considered a respondent if it satisfies the definition of unit response given in the URR section above.
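The adjustment-factor calculation can be illustrated with a short Python sketch. The function names, record keys, and example weights are hypothetical. The treatment of a cell with no respondents, which is left without a factor here, is also an assumption, since the source does not say how such cells are handled.

```python
from collections import defaultdict


def nonresponse_factors(units):
    """Return {adjustment cell: factor} for tabulation-eligible sample units."""
    total = defaultdict(float)       # sum of sampling weights, all units in the cell
    responding = defaultdict(float)  # sum of sampling weights, respondents only
    for u in units:
        total[u["cell"]] += u["weight"]
        if u["responded"]:
            responding[u["cell"]] += u["weight"]
    # Assumption: a cell with no respondents gets no factor and would need
    # to be combined with another cell before adjustment.
    return {c: total[c] / responding[c] for c in total if responding[c] > 0}


def adjusted_weight(unit, factors):
    """Sampling weight times the cell's factor; nonrespondents get zero."""
    if not unit["responded"]:
        return 0.0
    return unit["weight"] * factors[unit["cell"]]


# Hypothetical sample: cells are (certainty status, sector category).
units = [
    {"cell": ("noncertainty", "healthcare"), "weight": 4.0, "responded": True},
    {"cell": ("noncertainty", "healthcare"), "weight": 4.0, "responded": False},
    {"cell": ("noncertainty", "healthcare"), "weight": 2.0, "responded": True},
    {"cell": ("certainty", "science and technology"), "weight": 1.0, "responded": True},
    {"cell": ("certainty", "science and technology"), "weight": 1.0, "responded": False},
]
factors = nonresponse_factors(units)
final_weights = [adjusted_weight(u, factors) for u in units]
```

Within each cell, the adjusted weights of the respondents sum to the original sampling weights of all units in the cell, which is the property the ratio is designed to preserve.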
Weighting. The survey data are weighted for sampling and unit nonresponse.
Industry classification. Nonprofits are classified into one of four types of organization at the time of sampling: hospitals, other healthcare, science and technology, and all other organizations. Organizations are tabulated based on their classification at the time of sampling, with hospitals and other healthcare collapsed into one tabulation group called healthcare. Classification is based on both the 2017 NAICS (https://www.census.gov/naics/) and the NTEE. NTEE code is not available for all organizations.
Organizations with more than one domestic establishment are assigned a single industry classification using a hierarchical system based on the largest payroll. For NAICS, the hierarchy is largest payroll sector, largest payroll 3-digit NAICS (within the largest sector), largest payroll 4-digit NAICS (within the largest 3-digit), and largest payroll 6-digit NAICS (within the largest 4-digit). For NTEE, the hierarchy is largest payroll NTEE group (first letter of the code), then largest payroll full three-character NTEE code (within the largest group).
Organizations are first classified as hospitals if the first two characters of their NTEE code are “E2” or the first three characters of their NAICS code are “622.” Remaining organizations are classified as science and technology if the first four characters of their NAICS code are “5417.” Remaining organizations are classified as other healthcare if the first character of their NTEE code is “E” or the first 2 digits of their NAICS code are “62.” All remaining organizations are classified as other.
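Because the order of these rules matters, a small Python sketch may help. The function names are hypothetical. The prefixes and the precedence come from the paragraph above, and the mapping to tabulation groups follows the collapse of hospitals and other healthcare into healthcare described earlier in this section.

```python
def classify_organization(naics, ntee=None):
    """Return the type of organization, applying the rules in the stated order."""
    naics = naics or ""
    ntee = (ntee or "").upper()   # NTEE is not available for all organizations
    if ntee.startswith("E2") or naics.startswith("622"):
        return "hospitals"
    if naics.startswith("5417"):
        return "science and technology"
    if ntee.startswith("E") or naics.startswith("62"):
        return "other healthcare"
    return "other"


def tabulation_group(org_type):
    """Hospitals and other healthcare are tabulated together as healthcare."""
    if org_type in ("hospitals", "other healthcare"):
        return "healthcare"
    return org_type
```

For example, an organization with NAICS 541715 and NTEE code E30 is classified as science and technology, because the 5417 rule is applied before the general NTEE "E" rule. The same NAICS code paired with NTEE code E21 is classified as a hospital, because the hospital rule is applied first.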
Variance estimation. This survey uses the delete-a-group jackknife variance estimator. The delete-a-group jackknife variance estimator requires that every sampling stratum contains at least two sampled firms. Sampling strata that do not meet this requirement are collapsed as needed to create a new set of variance estimation strata that satisfies this requirement.
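A textbook form of the delete-a-group jackknife is sketched below in Python to show why each stratum needs at least two sampled firms. It is not the production estimator. The function name, the record keys, and the round-robin assignment of units to groups in input order are assumptions made for a deterministic illustration; in practice units are typically ordered randomly before groups are assigned. For each group g, the units in g are dropped and the remaining weights in every affected stratum h are scaled by n_h / (n_h - n_hg), and the variance is (G - 1) / G times the sum of squared deviations of the replicate estimates from the full-sample estimate.

```python
from collections import defaultdict


def dagjk_variance(units, n_groups):
    """Return (weighted total, delete-a-group jackknife variance).

    units: dicts with 'stratum', 'weight', and 'y'.
    """
    if n_groups < 2:
        raise ValueError("need at least two groups")
    by_stratum = defaultdict(list)
    for u in units:
        by_stratum[u["stratum"]].append(u)

    assigned = []                  # (stratum, group, weight, y)
    group_size = defaultdict(int)  # (stratum, group) -> n_hg
    counter = 0
    for stratum, members in by_stratum.items():
        if len(members) < 2:
            # A one-unit stratum cannot be reweighted after deletion.
            raise ValueError(f"stratum {stratum!r} has fewer than two units; collapse it first")
        for u in members:
            g = counter % n_groups   # round-robin assignment (assumption)
            counter += 1
            assigned.append((stratum, g, u["weight"], u["y"]))
            group_size[(stratum, g)] += 1

    full = sum(w * y for _, _, w, y in assigned)
    squared = 0.0
    for g in range(n_groups):
        replicate = 0.0
        for stratum, group, w, y in assigned:
            if group == g:
                continue             # deleted unit: replicate weight is zero
            n_h = len(by_stratum[stratum])
            n_hg = group_size.get((stratum, g), 0)
            replicate += w * n_h / (n_h - n_hg) * y
        squared += (replicate - full) ** 2
    return full, (n_groups - 1) / n_groups * squared


# Hypothetical stratum of four units, each with weight 2, split into four groups.
example = [{"stratum": "A", "weight": 2.0, "y": y} for y in (1.0, 2.0, 3.0, 4.0)]
total, variance = dagjk_variance(example, 4)
```

In this small example the estimated total is 20 and the variance is 80/3, which matches the usual with-replacement variance formula for an estimated total. The relative standard error is the square root of the variance divided by the estimate.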
The estimates produced from the NPRA module are subject to both sampling and nonsampling errors.
Sampling error. Sampling error is the difference between estimates obtained from the sample and results theoretically obtainable from a comparable complete enumeration of the sampling frame. This error results because only a subset of the sampling frame is measured in a sample survey. For published estimates from NPRA, standard errors are produced for estimated percentages, while relative standard errors (RSEs) are produced for all other estimates. Tables of the estimated measures of sampling variability corresponding to each data table are available upon request.
Coverage error. Coverage error occurs when the frame fails to completely enumerate the population of interest. There can be both undercoverage error, where units are not included in the frame, and overcoverage error, where units included in the frame are out of scope for the population of interest. The NPRA module uses the prior-year Business Register to construct the frame, so any subsequent change to a business that would alter whether it falls within the survey scope is a potential source of coverage error. Prior to tabulation, the survey unit information is updated with the most recent available Business Register data to mitigate this source of error.
Nonresponse error. Nonresponse error refers to the differences in key estimates between units (i.e., organizations) in the sampling frame that were sampled for data collection and those that responded. For unit nonresponse, multiple follow-ups were conducted with nonresponding organizations to mitigate nonresponse error. The final survey weights incorporated nonresponse adjustments to reduce nonresponse bias in the final estimates.
Nonresponse bias for survey estimates cannot be directly measured using only survey data. However, the impact of nonresponse is incorporated into the variability of survey estimates under the assumption that the data are missing at random. Missing at random for this survey means that, conditional on the nonresponse adjustments, the propensity for an organization to respond is not related to R&D performance.
To minimize item nonresponse, organizations were encouraged to report estimates of expenditures when actual dollar amounts could not be provided. This approach reduces item nonresponse error risk but may introduce measurement error. Imputation was conducted to help mitigate item nonresponse error.
Measurement error. The most common source of measurement error was reporting in different units (e.g., reporting in whole dollars rather than in thousands of dollars). This was corrected during data processing. Another source of error involved incorrect inclusion of organizations already represented in different R&D data collections. The R&D of these respondents was set to 0 if it was determined their R&D was already represented in other R&D survey responses. These cases included nonprofit organizations managing federal laboratories and some university-affiliated hospitals.
The questions in the FY 2020 module were first developed and used on the FY 2016 Nonprofit Research Activities Survey, which was the first national survey of R&D activities in the U.S. nonprofit population since 1997. Due to the differences between the previous surveys and the current module, data from the previous surveys and the current module are not comparable for trend analysis.
The Nonprofit Research Activities (NPRA) FY 2021 module of the 2022 Annual Business Survey (ABS) is conducted by the Census Bureau in partnership with the National Center for Science and Engineering Statistics (NCSES) within the National Science Foundation. The Census Bureau has reviewed this data product to ensure appropriate access, use, and disclosure avoidance protection of the confidential source data (Project No. P-7504866, Disclosure Review Board [DRB] approval number: CBDRB-FY23-0334).
Ronda Britt of the National Center for Science and Engineering Statistics (NCSES) developed and coordinated this report under the guidance of Amber Levanon Seligson, NCSES Program Director, and under the leadership of Emilda B. Rivers, NCSES Director; Christina Freyman, NCSES Deputy Director; and John Finamore, NCSES Chief Statistician. Jock Black reviewed the report. In partnership with NCSES, the Census Bureau conducted the survey and prepared the tables.
National Center for Science and Engineering Statistics (NCSES). 2023. Nonprofit Research Activities: FY 2021. NSF 24-304. Alexandria, VA: National Science Foundation. Available at https://ncses.nsf.gov/surveys/nonprofit-research-activities/2021.
For additional information about this survey or the methodology, contact