Data collection. The data collection period for the 2022 NTEWS Pilot was 26 weeks (21 April 2022 to 24 October 2022), or approximately six months. The NTEWS Pilot used a trimodal data collection approach: a self-administered online survey (Web), a self-administered paper questionnaire (via mail), and a computer-assisted telephone interview (CATI). The survey was offered in English and Spanish. Sample persons were contacted through the following methods to request that they complete the survey:
- By mail, with multiple invitations and reminders to complete the survey online, and with some mailings including the paper questionnaire
- Through the CATI operation, which was also used for the nonresponse follow-up (NRFU) stage
- By calling twice using an automatic dialer phone tree, with a reminder message to complete the survey
Throughout the collection, Census staff searched for new addresses and phone numbers for sample persons who received multiple undeliverable mailings or had unproductive phone numbers.
Quality assurance procedures were in place to monitor key data collection activities and ensure operations progressed in a timely manner and according to plan. These activities included printing, mail package assembly, mailout, questionnaire check-in, data keying, coding, telephone questionnaire assistance (TQA), scanning questionnaires into the document database, and post-data collection processing.
Mode. About 57% of respondents completed the survey by Web, 33% by mail, 8% by CATI, and 2% by TQA. Of those who completed by Web, 65% were on a computer, 32% were on a smartphone, and 3% were on a tablet.
Each of the three data collection instruments (Web, mail, and CATI) was also available in Spanish for sample persons to use while responding. About 15% of all CATI responses were conducted in Spanish, 7% of all mail responses used the Spanish questionnaire, and 3% of all Web responses used at least one Spanish question screen.
Response rates. Response rates were calculated based on complete responses. To be considered complete, a response must have answered the following critical items:
- Working for pay or profit
- Looking for work
- The name of the main job or the description of the main job
- Educational attainment
- Current high school enrollment
- Birthdate (to determine age)
- Living in the U.S.
The unweighted and weighted response rates adjusted for estimated ineligible individuals for the 2022 NTEWS Pilot were 39% and 44%, respectively. Of the roughly 43,000 sample persons, approximately 15,500 completed the survey.
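Under the figures above, the adjusted unweighted rate can be reconstructed with simple arithmetic. The eligibility estimate in this sketch is a hypothetical placeholder; the text reports only the sample size, the approximate number of completes, and the final rates.

```python
# Hedged sketch: reconstructing an unweighted response rate adjusted for
# estimated ineligibles. The eligibility rate below is ASSUMED for
# illustration; it is not given in the source text.

sample_persons = 43_000      # approximate sample size from the text
completes = 15_500           # approximate completed surveys from the text
est_eligibility_rate = 0.92  # hypothetical estimated share of eligible persons

adjusted_rate = completes / (sample_persons * est_eligibility_rate)
print(f"{adjusted_rate:.0%}")
```

Removing estimated ineligibles from the denominator is what raises the raw completion ratio (about 36%) toward the reported 39% unweighted rate.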
Data editing. Initial editing rules, specific to the mode of capture, were applied to response data to check internal consistency and valid response ranges. The Web survey captured most survey responses and had internal editing controls where appropriate. The Integrated Computer Assisted Data Entry (iCADE) system processed the mailed paper surveys. Responses from the three modes were merged for the subsequent coding, editing, and cleaning necessary to create an analytical database.
Coding for the NTEWS Pilot survey data took open-text, verbatim answers and converted them to a standardized code. Separate coding operations corresponding to survey questions were used: field of study (i.e., NCES Classification of Instructional Programs), credential (certifications and licenses), other-specify (e.g., other reason why a respondent earned a degree or certificate, how often a respondent with a credential needs to renew it, and name of the organization or agency that issued the credential), work experience program, industry, and occupation. Each operation focused on a specific set of variables to be coded from the survey questions.
Imputation. All missing items except critical items were imputed. Imputation is performed for several reasons. Many data users prefer data files without missing data, especially users unfamiliar with techniques for analyzing missing data. In addition, some statistical packages cannot process an observation with missing variable responses. Eliminating missing data can also reduce nonresponse bias. However, imputation can affect variance estimates and possibly introduce other bias. For this reason, imputed values are flagged in the data file so that users have the option not to use them. Users can request the imputation-flag file from the Survey Manager.
The NTEWS Pilot used a combination of logical and statistical imputation. During the editing process, logical imputation was used, where feasible, to determine the answer to a question with missing data based on the answer to another related question. In some circumstances, edit checks found inconsistent data, which were removed and then subjected to statistical imputation through the hot deck procedure. Hot deck imputation splits individuals into cells of similar cases using class variables, such as employment status or education level. Within a particular cell, all observations are sorted using a list of sort variables, such as number of hours worked per week or age group. An observation with a missing value is then given the same response as the nearest observation in sort order that is not missing a value, called the donor. The purpose is to find a donor who is most like the respondent and most likely to have responded the same way. The donor is sometimes referred to as the nearest neighbor.
Class and sort variables were selected so that donors and recipients would be as alike as possible with respect to the variable being imputed. Class variables were chosen primarily from the filter variables for the questionnaire skip patterns. Sort variables were selected using stepwise regression models to determine significant predictors for the item to be imputed. Potential class and sort variables came from responses to the 2018 ACS or from NTEWS Pilot items that were already imputed or never missing, along with paradata and recodes of ACS and NTEWS Pilot variables. Sort variables were primarily listed in order of their significance in the regression model. However, some variables were demoted down the sort order, which prevented them from dominating the sort and helped maintain consistency.
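A minimal sketch of the hot deck procedure described above, assuming a single class variable (education level) and a single sort variable (age group). The variable names and values are illustrative only, not the actual NTEWS class and sort lists.

```python
# Hedged hot deck sketch: within each class cell, sort observations and
# fill each missing value from the nearest preceding donor in sort order.
from collections import defaultdict

records = [
    {"id": 1, "educ": "HS", "age_grp": 2, "earnings": 30000},
    {"id": 2, "educ": "HS", "age_grp": 3, "earnings": None},   # recipient
    {"id": 3, "educ": "BA", "age_grp": 1, "earnings": 52000},
    {"id": 4, "educ": "BA", "age_grp": 2, "earnings": None},   # recipient
]

def hot_deck(records, class_var, sort_var, target):
    # Split individuals into cells of similar cases using the class variable.
    cells = defaultdict(list)
    for r in records:
        cells[r[class_var]].append(r)
    for cell in cells.values():
        # Sort within the cell using the sort variable.
        cell.sort(key=lambda r: r[sort_var])
        donor_value = None
        for r in cell:
            if r[target] is not None:
                donor_value = r[target]   # most recent donor in sort order
            elif donor_value is not None:
                r[target] = donor_value   # nearest-neighbor fill
    return records

hot_deck(records, "educ", "age_grp", "earnings")
```

A production implementation would use multiple class and sort variables, collapse sparse cells, and handle recipients that precede any donor; this sketch shows only the core donor-selection logic.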
The item nonresponse rates reflect data missing after logical imputation or editing but before statistical imputation. These rates provide information on the percentage of data that was statistically imputed for each questionnaire item. For key variables, such as employment status, current employment, educational background, licenses and certifications, and certificates, the weighted item nonresponse rates ranged from 2.5% to 10.2%. Nonresponse to questions deemed sensitive was higher: nonresponse was 20.6% for the type of visa of non-U.S. citizens and 10.9% for earnings. Nonresponse was also higher for information that may be difficult to recall, such as the last month and year worked for those who were not employed (17.3% and 12.1%, respectively). Imputation rates frequently varied by educational attainment; generally, variables showed higher imputation rates for lower education levels (i.e., individuals with a high school diploma or less).
Imputation was not performed on critical items or verbatim-based variables. For some missing demographic information, the NTEWS Pilot imported the corresponding data from the ACS, which had already performed its own imputation.
Weighting. Sampling weights were created for each respondent to support population estimates because the NTEWS Pilot is based on a complex sampling design and is subject to nonresponse. The final analysis weights account for several factors, including the following:
- Adjustments to account for undercoverage for recent immigrants
- Adjustment for incorrect names or incomplete address information on the sampling frame
- Unequal sample selection probabilities to produce base weights
- Post-stratification adjustment to control back to the frame totals within cells defined by crossing the sampling cells (Hispanic origin, non-Hispanic White/Other race flag, disability status, age group, and occupation group), with small cells collapsed
- Adjustment to account for the removal of duplicate cases
- Adjustment to account for non-locatability and unit nonresponse during data collection
- Trimming for extreme weights
- Raking adjustments to reallocate the trimmed weights back to the pre-trim totals
- Post-stratification adjustments that ratio-adjust the weights to ensure consistency with the population control totals for key demographic characteristics
The final analysis weights enable data users to derive survey-based estimates of the NTEWS Pilot target population. The variable name on the NTEWS Pilot public use data file for the NTEWS Pilot final analysis weight is FSNTW2201.
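The trimming and reallocation steps listed above can be sketched as a simplified, single-pass adjustment that caps extreme weights and returns the trimmed mass proportionally to the remaining weights. The actual procedure rakes the trimmed weights back to pre-trim totals and may iterate; the cap and weight values here are illustrative.

```python
# Hedged sketch: trim extreme weights to a cap, then reallocate the trimmed
# mass proportionally over the untrimmed weights so the total is preserved.
# Single pass only; a production raking procedure iterates to convergence.

def trim_and_reallocate(weights, cap):
    trimmed = [min(w, cap) for w in weights]
    excess = sum(weights) - sum(trimmed)          # mass removed by trimming
    untrimmed_sum = sum(t for w, t in zip(weights, trimmed) if w <= cap)
    return [t + excess * t / untrimmed_sum if w <= cap else t
            for w, t in zip(weights, trimmed)]

weights = [1.0, 2.0, 10.0]                        # one extreme weight
adjusted = trim_and_reallocate(weights, cap=6.0)  # total of 13.0 is preserved
```

Preserving the pre-trim total is what keeps the weighted population estimate unchanged while reducing the variance contribution of extreme weights.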
Variance estimation. The successive difference replication method (SDRM) was used for variance estimation. The theoretical basis for the SDRM is described in Wolter (1984) and in Fay and Train (1995). As with any replication method, successive difference replication involves constructing numerous subsamples (replicates) from the full sample and computing the statistic of interest for each replicate. The NTEWS Pilot used 80 replicates; each replicate underwent the same weighting adjustment as the full sample.
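A sketch of the replicate-based computation, assuming the standard successive difference replication estimator v(theta_hat) = (4/R) * sum_r (theta_hat_r - theta_hat)^2 with R = 80 replicates; the replicate estimates here are placeholders, not survey values.

```python
# Hedged sketch of the SDRM variance formula with R replicate estimates.
# Each replicate estimate theta_r is computed from a replicate-weighted
# subsample; only the final variance formula is shown here.

def sdrm_variance(full_estimate, replicate_estimates):
    R = len(replicate_estimates)
    return (4.0 / R) * sum((t - full_estimate) ** 2
                           for t in replicate_estimates)

# If all 80 replicate estimates equal the full-sample estimate,
# the estimated variance is zero.
assert sdrm_variance(10.0, [10.0] * 80) == 0.0
```

The factor of 4 reflects the +/-0.5 perturbations used to form the successive difference replicate weights.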
Disclosure protection. The estimates presented in the NTEWS Pilot data tables are rounded to the nearest 1,000 to protect against disclosure of confidential information provided by NTEWS Pilot respondents.
Data table cell values based on counts of respondents that fall below a predetermined threshold are deemed sensitive to potential disclosure, and the letter “D” indicates this type of suppression in a table cell.
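The two disclosure rules above can be sketched as a single publication step; the suppression threshold used here is a hypothetical placeholder, since the predetermined threshold is not stated in this text.

```python
# Hedged sketch: suppress cells based on too few respondents ("D"), and
# round published estimates to the nearest 1,000. The threshold of 50 is
# an ASSUMED illustrative value, not the actual NTEWS threshold.

def publish_cell(count, threshold=50):
    if count < threshold:
        return "D"                          # suppressed: potential disclosure
    return int(count / 1000 + 0.5) * 1000   # round half up to nearest 1,000

print(publish_cell(12_600))  # 13000
print(publish_cell(30))      # D
```

Explicit half-up rounding is used here because Python's built-in round() applies banker's rounding, which would round some midpoint values down.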