A list of demonstration projects is below. These projects were supported through two funding mechanisms by way of Congress and the Office of Management and Budget. This list will be updated as projects are initiated or completed.
A Framework to Establish a Data Linkage Program
Objective: This project explores linking federal survey data with alternative data sources, such as administrative or third-party data. This project seeks to create a standardized framework that agencies can use to develop linkage programs. It will result in a final report summarizing steps and best practices to create a linkage program. The project ends in July 2026.
Artificial Intelligence–Ready Data Products to Facilitate Discovery and Use
Objective: This project explores how to make agencies' statistical data products more readily ingestible by artificial intelligence (AI) technologies. It will produce an AI readiness assessment as a shared resource for any agency looking to test the machine understandability of its public data products and an AI readiness prototype tool to transform public data products into machine-understandable, AI-ready data. The project ends in April 2026.
Approaches for Assessing and Communicating the Quality of National Statistical Data
Objective: This project will hold a workshop with experts from multiple countries who will discuss data quality frameworks for national statistics and statistical data, how national statistics are designated or certified and communicated to policymakers and the public, and requirements for regular or periodic evaluations. The workshop will inform the Office of Management and Budget and the broader statistical community about how different countries have addressed the quality issues that arise when using a greater diversity of data sources and about the methods they have used to generate national statistics that are more timely, granular, and policy relevant. The project ends in June 2025.
Artificial Intelligence for Enhancing Data Quality, Standardization, and Integration
Objective: This project aims to develop a set of data processing tools using artificial intelligence to enhance data standardization and integration activities. The project will begin with interviews of key stakeholders in the federal statistical system to identify current best practices, data processing gaps, and confidentiality concerns. It will then prototype a user-friendly toolkit and user interface for a future National Secure Data Service, providing an accessible and unified system for agencies addressing data quality. The project ends in April 2026.
Assessing the Challenges for Federal Surveys through Alternative Data Sources: A Case Study
Objective: This project explores two modernization efforts for federal survey programs: (1) integrating multiple surveys into one survey and (2) using alternative data sources. Many federal surveys are experiencing ongoing challenges, including declining response rates, increased costs, and decreasing resources to support survey operations. Given these challenges, agencies are examining their survey portfolios to identify possible areas of integration or contraction. This work will use the National Center for Science and Engineering Statistics as a case study to explore these areas. It will result in a series of final reports outlining best practices and frameworks developed for these two survey modernization efforts. The project ends in July 2027.
Building Capacity for State, Local, and Territorial Governments to Use Administrative Data for Evidence-Building
Objective: This project explores how nonfederal administrative databases could be used to produce new data products. It will prototype a tool to help jurisdictional governments ingest, visualize, and explore their own administrative data, and it will provide a report that can be used as a roadmap for other state, local, and territorial governments. This project focuses on workforce inequalities as a use case, although the tool and roadmap could be used to answer other kinds of jurisdictional government questions. The project ends in February 2026.
Creating and Validating Synthetic Data
Objective: This project explores two methods of producing synthetic versions of a large-scale restricted use microdata file (the National Center for Science and Engineering Statistics' Annual Business Survey). The two synthetic files will be compared for accuracy and quality, with one selected to undergo disclosure review for public release. This dataset will then be used in an evidence-building project, and its accuracy will be tested using verification metrics. Lessons learned will inform future possibilities for creating synthetic data files to support a tiered access model for the National Secure Data Service. The project ends in March 2026.
Creation of Synthetic Data, and Development and Use of Verification Metrics
Objective: This project explores the creation of a synthetic data file, demonstrates examples of uses of synthetic data for evidence-building, and tests the use of verification metrics in validating estimates produced from synthetic data. The National Center for Science and Engineering Statistics' Survey of Earned Doctorates, an annual census conducted since 1957 of all individuals receiving a research doctorate from an accredited U.S. institution in a given academic year, serves as the case study for this work. Lessons learned will inform future possibilities for creating synthetic data to support a tiered access model for the National Secure Data Service. The project ends in October 2025.
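As a rough illustration of what a verification metric can look like, the sketch below compares an estimate computed from a confidential file with the same estimate computed from its synthetic counterpart. The two measures shown (relative error of the mean and confidence interval overlap) are common choices in the synthetic data literature and are assumptions here; the source does not say which metrics the project uses. All function names and data values are hypothetical.

```python
import math
import statistics


def mean_ci(values, z=1.96):
    """Normal-approximation interval for the mean (z=1.96 gives roughly 95%)."""
    m = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m - z * se, m + z * se


def interval_overlap(ci_a, ci_b):
    """Average share of each interval covered by their intersection; 0.0 if disjoint."""
    lo = max(ci_a[0], ci_b[0])
    hi = min(ci_a[1], ci_b[1])
    if hi <= lo:
        return 0.0
    inter = hi - lo
    return 0.5 * (inter / (ci_a[1] - ci_a[0]) + inter / (ci_b[1] - ci_b[0]))


def relative_error(confidential, synthetic):
    """Absolute difference in means, as a fraction of the confidential mean."""
    true_mean = statistics.fmean(confidential)
    return abs(statistics.fmean(synthetic) - true_mean) / abs(true_mean)


# Made-up toy values standing in for one survey variable (not real data).
confidential = [5.0, 6.0, 7.0, 8.0, 9.0]
synthetic = [5.5, 6.5, 7.5, 8.5, 9.5]

print("relative error:", relative_error(confidential, synthetic))
print("interval overlap:", interval_overlap(mean_ci(confidential), mean_ci(synthetic)))
```

An overlap near 1 and a relative error near 0 would suggest the synthetic estimate supports the same conclusion as the confidential one; how close is close enough is a policy choice this sketch does not settle.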
Data Access Alternatives: Artificial Intelligence–Supported Interfaces
Objective: This project seeks to develop and pilot an artificial intelligence (AI) “chatbot” that answers natural language user questions based on public data products from federal statistical agencies. In the first part of the pilot, the team is building a Retrieval Augmented Generation (RAG)–based system that is compatible with and builds on the open-source framework behind Google’s Data Commons. The chatbot will focus on three types of data products that represent how statistical agencies publish public data: (1) public use files, (2) data tables, and (3) analytical reports. These features are designed to make public data more accessible, useful, and relevant for a broad range of users, including those in science, policy, and journalism. In addition to a pilot tool, this project will record lessons learned about the size of input data tables, about making statistical data “AI ready,” and about engineering issues encountered while building the pilot tool. The project ends in August 2025.
Data Integration to Estimate STEM Attrition and Workforce Supply: A Pilot Approach (Two Projects)
Objective: These projects seek to develop an analytic approach that researchers, policymakers, and other interested parties can replicate when analyzing data from different sources (e.g., survey data, administrative data, and state and local data). These projects use an evidence-building question as a use case, seeking to understand the impact of science, technology, engineering, and mathematics (STEM) attrition on future STEM workforce supply. These projects will result in a framework for replicating the study's approaches to using disparate data sources to answer a question. One project ends in March 2026, and the second ends in June 2026.
Data Protection Toolkit Use Case Analysis
Objective: This project conducted a use case analysis on the Federal Committee on Statistical Methodology's (FCSM’s) Data Protection Toolkit, holding interviews with 15 individuals working for federal agencies, state governments, and other institutions. The project resulted in feedback on the Data Protection Toolkit and recommendations for improvement. The project ended in January 2024.
Engaging Policy Stakeholders to Inform a Future National Secure Data Service
Objective: This project seeks to identify the data needs of federal policy stakeholders as future users of a National Secure Data Service using a human-centered design approach. It will conduct a landscape analysis of the data needs within the federal policy ecosystem and conduct a detailed case study with the National Science Board. This project will result in recommendations for the navigation and data concierge services needed by policy stakeholders and a prototype service framework or policy toolkit. The project ends in November 2025.
Establish a National Secure Data Service Website
Objective: This project will design, build, and maintain a public website, a core component of the National Secure Data Service (NSDS). It will provide updates on research, development, and testing activities and serve as a prototype for the website that will become the "front door" for NSDS 1.0 and beyond. The project ends in May 2027.
Establish Secure Compute Environment
Objective: This project builds a secure compute environment, a core component of the National Secure Data Service. The secure compute environment allows approved researchers to access, link, and analyze data for approved projects and enables testing and use of state-of-the-art privacy-enhancing technologies. The secure compute environment will undergo operational testing in early 2025 with an operational test bed available in the summer of 2025. The project ends in August 2026.
Evaluation of Noise Infusion for Large-Scale Demographic Sample Survey
Objective: This project seeks to evaluate noise infusion for a sample survey. It will investigate different methods for noise infusion to evaluate data quality with each method and explore public messaging surrounding noise infusion. The project will result in a noise-infused sample survey with documentation of methodology and data quality assessment. The project ends in August 2025.
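To make the idea of noise infusion concrete, here is a minimal sketch of one possible method: multiplying each record's value by a random factor that moves it up or down by a bounded amount, then checking how much a weighted total shifts. This is an illustrative assumption, not the method the project evaluates; the function names, the 5% to 15% perturbation range, and the toy values are all hypothetical.

```python
import random


def noise_factor(rng, low=0.05, high=0.15):
    """Draw a multiplier that shifts a value up or down by between low and high."""
    magnitude = rng.uniform(low, high)
    return 1 + magnitude if rng.random() < 0.5 else 1 - magnitude


def infuse_noise(values, rng, low=0.05, high=0.15):
    """Return a perturbed copy of values; every record is moved by at least low."""
    return [v * noise_factor(rng, low, high) for v in values]


def weighted_total(values, weights):
    """Survey-weighted total of a variable."""
    return sum(v * w for v, w in zip(values, weights))


# Made-up toy records and survey weights (not real data).
rng = random.Random(2025)  # fixed seed so the sketch is repeatable
values = [100.0, 250.0, 40.0, 900.0]
weights = [10.0, 4.0, 25.0, 1.5]

noisy = infuse_noise(values, rng)
before = weighted_total(values, weights)
after = weighted_total(noisy, weights)
print("relative change in weighted total:", abs(after - before) / before)
```

Because no record is left unchanged, individual values are protected, while upward and downward shifts partly cancel in aggregates. Comparing estimates before and after, as in the last lines, is one simple way to assess the data quality cost of a given method.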
Expanding Equitable Access to Restricted-Use Data through Federal Statistical Research Data Centers
Objective: This project will design and build a secure compute environment to be used as part of an overall effort to build a linkage and access infrastructure supporting the National Secure Data Service Demonstration project. The compute environment will increase the ability to process and analyze data, maintain data security, and expand research access, while also allowing for the implementation and testing of privacy-preserving technologies as required under Section 10375 of the CHIPS and Science Act.
Federated Data Usage Platform (Two Projects)
Objective: These projects seek to prototype a data usage platform to illuminate instances of how federal data are being used across a wide variety of audiences and use cases. These prototypes will inform the development of a data usage platform dashboard that federal agencies can use as a shared service within the National Secure Data Service. Both projects end in September 2025.
Informing Evidence-Building Capacity among State, Local, Territorial, and Tribal Governments within a National Secure Data Service
Objective: This project explores how a National Secure Data Service (NSDS) could support capacity for evidence-building among state, local, territorial, and tribal governments. "Capacity building" here refers to skill building, continuous learning opportunities, and access to infrastructure and tools. This project will conduct a needs analysis with all 50 states as well as local, tribal, and territorial governments. The project will produce three reports: (1) a needs analysis by group, (2) a gap analysis by group, and (3) recommendations for a future NSDS. The project ends in August 2026.
Models for a Data Concierge Service for a National Secure Data Service
Objective: This project explores models for a data concierge service, conducting an environmental scan of service request types that federal agencies receive and interviews of federal data providers and data users to inform a data concierge service. It will result in two or more models for a data concierge service as well as resource needs for each and potential staffing requirements. The project ends in March 2025.
National Vital Statistics System: New Opportunities for Interoperable Data
Objective: This project explored the National Vital Statistics System ecosystem as a way to inform shared services in a future National Secure Data Service (NSDS) because of the system’s experience with data interoperability, implementation of governance considerations and authorized roles and responsibilities, and tiered data access structure. The project ended in September 2024 and resulted in a final report outlining considerations for a future NSDS.
Policy-Relevant Classification Techniques for Federal Data Standardization within a Data Concierge Model
Objective: This project is developing and testing a machine-based approach to building policy-relevant classification systems for use by stakeholders. This will be done through (1) a literature review of the methods agencies and other stakeholders currently use to classify patent data into technology classifications and (2) convening stakeholders and experts to gather information on the CHIPS and Science Act, patent classifications, and relevant techniques in machine learning and artificial intelligence. The project ends in July 2025.
Privacy Preserving Technologies Phase 1: Environmental Scan
Objective: This project conducted an environmental scan to understand the current landscape of privacy-enhancing technologies, resulting in a report documenting the analysis. The results of this project have informed project testing and piloting using privacy-enhancing technologies (such as privacy-preserving record linkage and synthetic data generation), which inform the National Secure Data Service secure compute environment and Capacity Building Center. The project ended in January 2024.
Secure Compute Environment Environmental Scan
Objective: This project conducted an environmental scan of secure compute environments. More than 20 federal stakeholders were interviewed about the benefits, challenges, and requirements for successful use of a secure compute environment within the federal space. The project produced a final report detailing findings to inform the requirements for the National Secure Data Service secure compute environment build. The project ended in July 2024.
Synthetic Data Generation with Large Real-World Data
Objective: This project explores how synthetic data generation, a type of privacy-enhancing technology, works with large real-world data (that is, data sets with over 30 billion rows of data) in a secure super compute environment. It will produce a framework to inform a synthetic data toolkit that will include, but not be limited to, methods to assess privacy risk and data utility and open-source artificial intelligence (AI) methods for generating synthetic data. This is a joint project between the National AI Research Resource (NAIRR) pilot and the National Secure Data Service (NSDS) Demonstration project. These are independent initiatives with expected synergies, as reflected in the CHIPS and Science Act requirement that the NSDS Demonstration project consult with the NAIRR Task Force in NSDS development. The project ends in August 2026.
Utilizing Privacy Preserving Record Linkage to Link Data from Two Federal Statistical Agencies
Objective: This project explores the development of a data sharing agreement between two federal statistical agencies that have not previously developed data sharing relationships, deploys a commercial privacy-preserving record linkage (PPRL) tool to link data from these two agencies, and uses a secure environment to analyze the resulting linked data file. It will inform linkages across the federal government by developing agreements and deploying PPRL as a model to improve the availability, quality, accessibility, and interoperability of data sharing. The project ends in September 2025.
Utilizing Privacy Preserving Record Linkage with Parent Agency Data and Statistical Agency Data to Inform Programs and Policies
Objective: This project explores the development of a data sharing agreement between a federal statistical agency and its parent agency, deploys an open-source privacy-preserving record linkage (PPRL) tool to perform the linkage, and uses a secure environment to analyze the resulting linked data file. This project will inform linkages across the federal government, including within-agency collaborations, by developing agreements and deploying PPRL as a model to improve the availability, quality, accessibility, and interoperability of data sharing. The project ends in September 2025.
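The general idea behind privacy-preserving record linkage, as used in the two PPRL projects above, can be sketched with a deliberately simplified keyed-hash approach: each party converts identifying fields into tokens using a shared secret key, and only the tokens are compared. This is an assumption made for illustration; it is not the commercial or open-source tool the projects deploy, and production PPRL tools typically add error-tolerant matching that this exact-match sketch lacks. The record layout, function names, and key are hypothetical.

```python
import hashlib
import hmac


def normalize(name, dob):
    """Standardize case and whitespace so trivial differences do not block a match."""
    return f"{' '.join(name.lower().split())}|{dob}"


def tokenize_records(records, key):
    """Map each record's keyed-hash token to its local ID; run separately by each party."""
    tokens = {}
    for rec in records:
        message = normalize(rec["name"], rec["dob"]).encode("utf-8")
        token = hmac.new(key, message, hashlib.sha256).hexdigest()
        tokens[token] = rec["id"]
    return tokens


def match_tokens(tokens_a, tokens_b):
    """Pair up local IDs whose tokens agree; no names or birth dates are exchanged."""
    shared = tokens_a.keys() & tokens_b.keys()
    return sorted((tokens_a[t], tokens_b[t]) for t in shared)


# Made-up toy records (not real people); the key would be agreed on in advance.
shared_key = b"hypothetical-shared-secret"
agency_a = [
    {"id": "A1", "name": "Jane  Q. Public", "dob": "1980-01-02"},
    {"id": "A2", "name": "John Doe", "dob": "1975-03-04"},
]
agency_b = [
    {"id": "B7", "name": "jane q. public", "dob": "1980-01-02"},
    {"id": "B8", "name": "Richard Roe", "dob": "1990-05-06"},
]

links = match_tokens(tokenize_records(agency_a, shared_key),
                     tokenize_records(agency_b, shared_key))
print(links)
```

Using a keyed hash (HMAC) rather than a plain hash means that someone without the key cannot rebuild tokens from guessed names and birth dates, which is one reason a data sharing agreement covering key management matters alongside the tool itself.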