Technical Appendix

Industry Data, Methodology, and Terminology

This thematic report uses a variety of data sources, including U.S. data on industry value added from the U.S. Bureau of Economic Analysis (BEA), U.S. employment and occupation data from the U.S. Census Bureau’s American Community Survey (ACS), international industry production data from IHS Markit, trade data from the Organisation for Economic Co-operation and Development (OECD), and various data sources on artificial intelligence (AI) and biotechnology. This appendix provides information on the classification of industries and the main data sources used in this report. Internationally comparable data are generally compiled from multiple national sources and vary in quality and reliability, so international comparisons should be made with caution. In addition, most data used in this report are periodically revised to reflect new information and methodological improvements.

Classification of Industries Based on Research and Development Intensity

This report defines knowledge and technology intensive (KTI) industries using an OECD taxonomy of economic activities based on research and development (R&D) intensity developed by Galindo-Rueda and Verger (2016). The OECD taxonomy clusters industries into five R&D intensity groups (high, medium-high, medium, medium-low, and low) based on a measure of R&D performance intensity computed as the ratio of each industry’s business R&D expenditures to the industry’s value-added output. KTI industries are the industries in the high and medium-high R&D intensity groups. The full classification of industries is presented in Table SAKTI-1.

Table SAKTI-1. OECD classification of industries, by R&D intensity

(List of industries and percent)

ISIC, Rev.4 = International Standard Industrial Classification, Revision 4; IT = information technology; nec = not elsewhere classified; OECD = Organisation for Economic Co-operation and Development.

Note(s):

R&D intensity is measured as the ratio of global R&D expenditures to the global value-added output of each industry. The global R&D and value-added totals exclude several economies, including China and India, because of incomplete or missing industry value-added and R&D data. Industries are classified according to ISIC, Rev.4.

Source(s):

Galindo-Rueda F, Verger F, OECD Taxonomy of Economic Activities Based on R&D Intensity, OECD Science, Technology and Industry Working Papers, 2016/04, OECD Publishing, Paris (2016).

Science and Engineering Indicators

Industries are classified according to the United Nations’ International Standard Industrial Classification of All Economic Activities, Revision 4 (ISIC, Rev.4). This classification delineates the economic activities of industries based on similarities in inputs and factors of production, the process and technology of production, and characteristics and use of outputs. For economic units that engage in several types of independent activities, considerable proportions of the activities of the unit may be included in more than one class of ISIC. If the unit cannot be split into separate statistical units based on these activities, it is classified based on the principal activity, the activity that contributes most to the value added of the unit.

The R&D-intensity measure captures the R&D directly performed by industries but does not capture any R&D embedded in an industry’s purchases of intermediate inputs and capital goods. For any given industry, the R&D intensity is computed as a weighted average of corresponding industry R&D intensities for a core sample of countries using value added in purchasing power parities as weights. Because this taxonomy is based on average R&D intensities across countries, the R&D intensities for industries in individual countries are likely to vary from the average.
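As an illustration of the weighting described above, the following minimal Python sketch computes a value-added-weighted average of country-level R&D intensities for a single industry. The function name and the country figures are hypothetical and are not taken from the OECD analysis.

```python
def weighted_rd_intensity(country_data):
    """Value-added-weighted average R&D intensity for one industry.

    country_data: list of (rd_intensity, value_added_ppp) pairs,
    one pair per country in the core sample.
    """
    total_weight = sum(value_added for _, value_added in country_data)
    if total_weight <= 0:
        raise ValueError("total value added must be positive")
    weighted_sum = sum(intensity * value_added
                       for intensity, value_added in country_data)
    return weighted_sum / total_weight


# Hypothetical figures for three countries (illustration only):
# (R&D intensity, value added in PPP terms)
sample = [(0.10, 200.0), (0.20, 100.0), (0.05, 100.0)]
industry_intensity = weighted_rd_intensity(sample)
print(f"Weighted R&D intensity: {industry_intensity:.4f}")
```

When R&D expenditures and value added are expressed in the same purchasing power parity units, this weighted average is algebraically the same as dividing pooled R&D expenditures by pooled value added.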

The OECD taxonomy is sensitive to the choice of the group of countries. The OECD analysis underlying this classification does not fully capture global production and R&D because of the exclusion of several economies that have incomplete data, including Brazil, India, and China. Because these economies may have large global production shares and R&D intensities that may be substantially different from the average of the core sample of economies included in the OECD analysis, their exclusion may result in differences in classification of R&D intensity.

The R&D intensities are also sensitive to the national accounting conventions used to compile and report the output data. The output data underlying the taxonomy were compiled and reported under the 1993 System of National Accounts (SNA), which predates the capitalization of R&D expenditures introduced with the 2008 SNA. Treating R&D expenditures as an investment directly increases measured value added and therefore lowers the R&D intensities; the more R&D intensive the industry, the larger the reduction. In a series of robustness checks, the authors estimate that under the 2008 SNA standard, the R&D intensities drop by 20% in the high group, 6% in the medium-high group, 3% in the medium and medium-low groups, and 1% in the low group. The classification of industries into these groups, however, remains stable.
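The direction of this effect can be illustrated with a stylized calculation. The sketch below assumes, purely for illustration, that capitalization raises value added by the full amount of R&D expenditures, so that an intensity r becomes r / (1 + r). This simplification and the sample intensities are assumptions and do not reproduce the authors' robustness checks.

```python
def intensity_after_capitalization(intensity):
    """R&D intensity if value added rises by the full R&D amount.

    Stylized assumption: new value added = old value added + R&D,
    so R / (VA + R) = r / (1 + r) where r = R / VA.
    """
    return intensity / (1.0 + intensity)


def relative_drop(intensity):
    """Proportional decline in intensity under the same assumption."""
    return 1.0 - intensity_after_capitalization(intensity) / intensity


# Hypothetical pre-capitalization intensities (illustration only)
for r in (0.25, 0.05, 0.01):
    print(f"intensity {r:.2%} -> {intensity_after_capitalization(r):.2%}, "
          f"relative drop {relative_drop(r):.1%}")
```

Under this assumption, the proportional decline grows with the initial intensity, which is consistent with the larger drops reported for the more R&D-intensive groups.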

U.S. Value Added, by Industry and State

The U.S. industry production data come from the Industry and Regional Economic Accounts of BEA, which publishes annual value added by industry statistics for over 130 industries and annual gross output by industry statistics for over 400 industries. These industries are based on the North American Industry Classification System (NAICS)—specifically, the 2012 revision of NAICS. The value-added data discussed in this report are presented in nominal values; the value of production is measured at current market prices, and it is not adjusted for inflation.

For KTI industries, the U.S. data need to be converted from the 2012 NAICS basis to the ISIC, Rev.4, basis. The concordance used is presented in Table SAKTI-2. For most industries, this adjustment of value added is straightforward. In some cases, however, the value added of a detailed industry is not available in the published data and needs to be estimated. In these cases, value added for the detailed industries was estimated from the corresponding gross output ratios as follows:

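\[
VA_{\text{detailed}} = \frac{GO_{\text{detailed}}}{GO_{\text{aggregate}}} \times VA_{\text{aggregate}}
\]

That is, value added (VA) for the detailed industry equals gross output (GO) for the detailed industry divided by gross output for the aggregate industry, multiplied by value added for the aggregate industry.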

This approach assumes that the ratio of intermediate consumption relative to industry output for the detailed industry is the same as the ratio of the intermediate consumption to the industry output for the aggregate industry.
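The estimation described above can be sketched in a few lines of Python. The function name and the dollar figures are hypothetical and serve only to show the arithmetic and the assumption behind it.

```python
def estimate_detailed_value_added(go_detailed, go_aggregate, va_aggregate):
    """Estimate value added for a detailed industry from gross output ratios."""
    if go_aggregate <= 0:
        raise ValueError("aggregate gross output must be positive")
    return go_detailed / go_aggregate * va_aggregate


# Hypothetical figures in billions of dollars (illustration only)
go_detailed, go_aggregate, va_aggregate = 50.0, 200.0, 80.0
va_detailed = estimate_detailed_value_added(go_detailed, go_aggregate, va_aggregate)

# The estimate implies the same intermediate-consumption share of output
# for the detailed industry as for the aggregate industry.
ic_share_aggregate = (go_aggregate - va_aggregate) / go_aggregate
ic_share_detailed = (go_detailed - va_detailed) / go_detailed

print(f"Estimated value added: {va_detailed:.1f}")
print(f"Intermediate consumption share, aggregate: {ic_share_aggregate:.2f}")
print(f"Intermediate consumption share, detailed:  {ic_share_detailed:.2f}")
```

The two intermediate-consumption shares coincide by construction, which is the assumption stated in the text.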

Table SAKTI-2. 2012 NAICS to ISIC, Rev.4, concordance for U.S. value added, by industry data

(List of industries)

ISIC, Rev.4 = International Standard Industrial Classification, Revision 4; IT = information technology; NAICS = North American Industry Classification System; nec = not elsewhere classified.

a The value added for these detailed industries has been estimated using gross output ratios. See Technical Appendix for more details.

b Other machinery includes Optical instrument and lens manufacturing and Industrial mold manufacturing industries. In the ISIC, Rev.4, classification, Optical instrument and lens manufacturing is under the Computer, electronic, and optical products industry, whereas Industrial mold manufacturing is under Fabricated metal products, except machinery and equipment. Value added for these detailed industries has been estimated using gross output ratios. See Technical Appendix for more details.

c Motor vehicle body, trailer, and parts manufacturing includes Motor vehicle metal stamping. In the ISIC, Rev.4, classification, Motor vehicle metal stamping is under Fabricated metal products, except machinery and equipment. Value added for Motor vehicle metal stamping has been estimated using gross output ratios. See Technical Appendix for more details.

d All other transportation equipment manufacturing includes Ship building and repairing and Boat building. These industries have been excluded from the Organisation for Economic Co-operation and Development classification of the medium-high R&D intensive industry Manufacture of railroad, military vehicles, and transport nec (ISIC 302, 304, and 309). Value added for these detailed industries has been estimated using gross output ratios. See Technical Appendix for more details.

Source(s):

This concordance was developed using the Census Bureau 2012 NAICS to ISIC, Rev.4, concordance available at: https://www.census.gov/naics/. More information on 2012 NAICS classification is available at: https://www.census.gov/naics/. More information on ISIC, Rev.4, is available at: https://unstats.un.org/unsd/classifications/Econ/Structure.

Science and Engineering Indicators

Employment in U.S. KTI Industries

This report uses the 2019 ACS 1-year Public Use Microdata Sample (PUMS) to estimate employment in U.S. KTI industries. The PUMS is a subsample of respondents in the ACS and contains modifications to protect respondent confidentiality in the microdata. These data were pulled from the U.S. Census Bureau’s File Transfer Protocol site on 20 December 2020. This report uses information on the respondent’s occupation, industry of employment, sex, race or ethnicity, nativity, and citizenship status.

Occupations are grouped into science, technology, engineering, and mathematics (STEM) and non-STEM based on Census occupational codes. (See Indicators 2022 report “The STEM Labor Force of Today: Scientists, Engineers, and Skilled Technical Workers” for details on how specific Census occupational codes feed into STEM and non-STEM.) STEM workers are further divided into two groups based on educational attainment: STEM workers with a bachelor’s degree or above, and STEM workers without a bachelor’s degree. This latter group is also known as the skilled technical workforce. The current analysis includes workers aged 16–75 in all occupations, except for those in military occupations (Census occupational codes 9800–9830) or those currently attending grade school.
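The inclusion rules and worker groups described above can be expressed as a short Python sketch. The record fields and the set of STEM occupation codes are hypothetical placeholders; the actual list of STEM codes is documented in the Indicators 2022 report cited above and is not reproduced here.

```python
from dataclasses import dataclass

# Census occupational codes 9800-9830 (inclusive) are military occupations.
MILITARY_OCC_CODES = range(9800, 9831)

# Placeholder set for illustration only; not the actual STEM code list.
HYPOTHETICAL_STEM_CODES = {1000, 1300}


@dataclass
class Worker:
    age: int
    occ_code: int
    has_bachelors_or_above: bool
    attending_grade_school: bool


def in_scope(worker):
    """Apply the age, military, and grade-school exclusions."""
    return (16 <= worker.age <= 75
            and worker.occ_code not in MILITARY_OCC_CODES
            and not worker.attending_grade_school)


def classify(worker):
    """Return the worker group, or None if the worker is out of scope."""
    if not in_scope(worker):
        return None
    if worker.occ_code not in HYPOTHETICAL_STEM_CODES:
        return "non-STEM"
    if worker.has_bachelors_or_above:
        return "STEM, bachelor's degree or above"
    return "skilled technical workforce"


print(classify(Worker(30, 1000, True, False)))
print(classify(Worker(40, 1000, False, False)))
print(classify(Worker(25, 9810, True, False)))
```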

Industry employment in the PUMS is reported by a modified version of the NAICS. To identify workers employed in KTI industries in the PUMS, the modified NAICS codes are assigned to ISIC industries. Excluding computer, electronic, and optical products and machinery and equipment, all KTI-related ISIC industries directly align with the modified NAICS in the PUMS. For example, the ISIC industry aircraft aligns with the modified NAICS industries 33641M1 and 33641M2 (Table SAKTI-3).

Table SAKTI-3. Concordance for KTI employment data

(List of industries and rate of partial employment)

na = not applicable.

ISIC, Rev.4 = International Standard Industrial Classification, Revision 4; IT = information technology; KTI = knowledge and technology intensive; NAICS = North American Industry Classification System; nec = not elsewhere classified; pt = partial assignment.

a Descriptions of the modified NAICS codes used in the Public Use Microdata Sample (PUMS) can be found here: https://www.census.gov/programs-surveys/acs/microdata/documentation.html.

b Modified NAICS codes are used in the PUMS. These codes are generally the 4-digit NAICS code, but some industries combine multiple 3- or 4-digit NAICS codes, and these modified codes are alphanumeric.

c The rate of partial employment is calculated from detailed 6-digit NAICS employment estimates from the 2017 Economic Census.

Source(s):

This concordance was developed using the Census Bureau 2012 NAICS to ISIC, Rev.4, concordance available at: https://www.census.gov/naics/. More information on 2012 NAICS classification is available at: https://www.census.gov/naics/. More information on ISIC, Rev.4, is available at: https://unstats.un.org/unsd/classifications/Econ/Structure.

Science and Engineering Indicators

Two ISIC industries align to multiple modified NAICS industries because some of the modified NAICS industries are too coarse to allow for direct matches.

  • Computer, electronic, and optical products (ISIC industry C26). This ISIC industry concords with the modified NAICS industries 3341, 3345, 334M1, and 334M2 and with part of 3333. Part of the modified NAICS industry 3333, commercial and service industry machinery manufacturing, is assigned to ISIC industry C26, and another part of it is assigned to C28. (More details about ISIC industry C28 appear below.) The NAICS 4-digit industry 3333 includes NAICS 6-digit industries 333314 (optical instrument and lens manufacturing), 333316 (photographic and photocopying equipment manufacturing), and 333318 (other commercial and service industry machinery manufacturing). The NAICS industry 333314 is part of ISIC C26, and the other NAICS industries are part of C28. This partial assignment is denoted with a “pt” designation in the modified NAICS code column and the NAICS column in Table SAKTI-3.
  • Machinery and equipment (ISIC industry C28). This ISIC industry concords with modified NAICS industries 331, 3332, 333MS, and 3336 and with parts of NAICS 3333, 3335, and 332MZ. Table SAKTI-3 shows the partial assignments of modified NAICS industries 3333, 3335, and 332MZ.

Employment information from the 2017 Economic Census was used to prorate estimated employment based on the PUMS for modified NAICS industries partially assigned to multiple ISIC industries (i.e., those denoted with “pt” in Table SAKTI-3). For example, the 2017 Economic Census reported that about 20% of employment in NAICS industry 3333 (commercial and service industry machinery manufacturing) was 6-digit NAICS industry 333314 (optical instrument and lens manufacturing). About 80% of NAICS industry 3333 was the other 6-digit NAICS industries (333316 and 333318). Hence, 20% of estimated employment for modified NAICS industry 3333 based on the PUMS is assigned to ISIC industry C26, and 80% to ISIC industry C28. Similar calculations were applied to estimated employment based on PUMS for modified NAICS industries 3335 and 332MZ.
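The proration described above amounts to multiplying each partially assigned industry's estimated employment by its assignment shares. The Python sketch below uses the approximate 20% and 80% shares given in the text for modified NAICS industry 3333; the employment figure and the function name are hypothetical.

```python
# Approximate shares from the 2017 Economic Census as described in the text.
PARTIAL_SHARES = {
    "3333": {"C26": 0.20, "C28": 0.80},
}


def prorate_employment(employment_by_naics, shares):
    """Split PUMS-based employment estimates across ISIC industries."""
    totals = {}
    for naics_code, employment in employment_by_naics.items():
        for isic_code, share in shares[naics_code].items():
            totals[isic_code] = totals.get(isic_code, 0.0) + employment * share
    return totals


# Hypothetical PUMS-based employment estimate (illustration only)
pums_estimates = {"3333": 10_000}
prorated = prorate_employment(pums_estimates, PARTIAL_SHARES)
print(prorated)
```

The same function would apply to modified NAICS industries 3335 and 332MZ once their shares are added to the dictionary; those shares are not stated in the text and are therefore omitted.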

IHS Markit Comparative Industry Service

The international industry production data are drawn from a proprietary database compiled by IHS Markit. The IHS Comparative Industry Service (CIS) Forecast Database provides consistent coverage for over 70 countries and over 100 industrial sectors for several macroeconomic indicators, including the value-added data presented in this report. The primary data sources on industry output are the National Income Accounts from countries’ national statistical agencies and data from cross-national organizations, including the Industrial Structure Statistics from the OECD Structural Analysis (STAN) database, the International Yearbook of Industrial Statistics from the United Nations Industrial Development Organization (UNIDO), and the National Accounts Statistics from the United Nations System of National Accounts (UNSNA).

IHS Markit compiles the data in the CIS database using a tiered approach, where data from OECD, UNSNA, and UNIDO form the foundation for all sectors and most countries. These are harmonized data that provide consistency and comparability across countries and across time. The OECD STAN database provides data for the OECD member countries. For countries not included in the OECD database, and for industries whose coverage in STAN is not sufficiently detailed, IHS Markit combines the OECD data with the UNSNA and UNIDO databases. The UNSNA contains national accounts data for most countries in the world and facilitates macroeconomic comparisons among national economies at the broad industry level. Data availability, however, varies across countries and fiscal years because not all UN member countries are able to provide a complete set of data. The UNIDO data set provides highly disaggregated data for the manufacturing sector. Both the UN and OECD data are further supplemented by other international sources such as the International Labor Organization and Eurostat and by individual country sources.

For some countries or economies, IHS Markit collects more granular and timely data directly from their national statistical agencies. These countries or economies include the United States (BEA), Brazil (Statistics Brazil), China (China National Bureau of Statistics), and Taiwan (Taiwan Directorate-General of Budget, Accounting and Statistics; National Statistics). Finally, IHS Markit brings the collected data forward in time as needed using data from individual country sources, global trade associations, and other sources to provide timely measures of industry-level business activity.

The value-added data discussed in this report are presented in current dollars. For countries outside the United States, value added is recorded in the local currency and converted at the prevailing nominal market exchange rate. This choice comes with some limitations, particularly when an economy’s currency exchange rates are not market determined.
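The conversion described above is a single division when the exchange rate is quoted as local currency units per U.S. dollar. The following Python sketch uses hypothetical values and a hypothetical function name; it illustrates the arithmetic only and is not IHS Markit's procedure.

```python
def to_current_dollars(value_local, local_units_per_usd):
    """Convert nominal local-currency value added to current U.S. dollars."""
    if local_units_per_usd <= 0:
        raise ValueError("exchange rate must be positive")
    return value_local / local_units_per_usd


# Hypothetical example: 1,400 billion local currency units at a
# market exchange rate of 7.0 local units per U.S. dollar.
value_usd = to_current_dollars(1400.0, 7.0)
print(f"Value added: {value_usd:.1f} billion current dollars")
```

Because the market rate enters directly, any movement in the exchange rate changes the dollar figure even when local-currency value added is unchanged.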

Trade in Value-Added Database

The trade in value-added indicators come from the 2021 OECD Trade in Value Added (TiVA) database. The OECD TiVA statistics are derived from OECD’s Inter-Country Input-Output (ICIO) tables, which provide a globally balanced view of inter-country, inter-industry flows of intermediate and final goods and services. The ICIO tables are constructed using national annual Supply and Use Tables (SUTs) or, if necessary, benchmark Input-Output Tables (IOTs), harmonized to a common format and common industry list, linked with balanced bilateral trade in goods and services, and constrained under countries’ latest SNA main aggregates and industry output and value-added time series. The SUTs and IOTs are sourced from national statistical agencies and harmonized by OECD. They show production of output by all sectors and the allocation of domestic output among intermediate and final uses, including exports. The linkage of input-output data with trade data captures the bilateral exchanges of intermediate goods and services.

For countries in the TiVA database, the overall trade balance is consistent with official national accounts figures. However, the bilateral trade estimates may differ from those reported by national statistical agencies. By necessity, asymmetries in reported bilateral trade (exports of a product reported by country A to country B may differ from country B’s reported imports from country A) need to be balanced. There are many reasons for asymmetries, including differences across countries in treatment of various aspects of trade such as re-exports and transit trade. The international statistics community continues its work to improve consistency in measuring international trade flows, particularly in services, where there are substantial differences across national statistics.

The most recent update (2021) of the TiVA database covers 66 economies, including the OECD countries, the European Union (EU) and the G20 countries, and several East and Southeast Asian economies and South American countries for the years 1995–2018. The input-output tables from which the TiVA indicators are derived are based on the 2008 SNA. Indicators are available for 45 industries within a hierarchy based on ISIC, Rev.4.

Along with its advantages, trade in value added presents some measurement challenges. The value added of companies with diversified businesses may be assigned to the single industry that accounts for the largest share of the company’s business. A company classified as manufacturing may include services (and vice versa). Disentangling the domestic and foreign content in global value chains is further complicated by fragmentation of production within multinational enterprises and trade in inputs that are further traded as more processed inputs. Quality and availability of data (e.g., SUTs, IOTs, and SNA) in many countries, and inconsistencies in national statistics within countries, are major challenges when constructing global IOTs.

PitchBook Venture Capital Data

The venture capital data shown in the report are from PitchBook Data, Inc., a private-sector financial services company that collects financial and business data on the Web and provides subscription-based data (https://pitchbook.com/). PitchBook classifies companies by industry and industry vertical: an industry is a “broad group of companies that operate in the same general space,” whereas “an industry vertical is more specific and describes a group of companies that focus on a shared niche or specialized market spanning multiple industries” (PitchBook 2021). AI is defined as an industry vertical, and biotechnology is defined as an industry. Companies are classified in one primary industry but can also be classified in secondary industries. Industry verticals are listed in Table SINV-100 in the Indicators 2022 report “Invention, Knowledge Transfer, and Innovation”—where the data are used more extensively than in this report.

Within PitchBook’s proprietary industry classification system, biotechnology is an industry within the health care sector and the pharmaceuticals and biotechnology industry group. It is the broad area of biology, involving living systems and organisms (e.g., bacteria, mammalian cells, T cells) to develop or make products. Companies in this category are researching and using biological systems to develop new drugs and therapies for medical patients. Pharmaceutical companies, which are within the same industry group, differ from biotechnology companies in that these companies are primarily involved in manufacturing and distributing drugs, generally from chemical and synthetic processes. Drug discovery is another industry within this industry group, and these companies research and develop new drugs. Many of the companies classified in the pharmaceutical and drug discovery primary industries are also classified in the biotechnology secondary industry. Therefore, this analysis includes companies that are listed in the biotechnology primary industry and secondary industry.

AI is defined for PitchBook investment reporting as companies developing technologies that enable computers to autonomously learn, deduce, and act through utilization of large data sets. The technology enables development of systems that collect and store massive amounts of data and analyze that content to make decisions based on probability and statistical analysis. Applications for AI and machine learning include speech recognition, computer vision, robotic control, and accelerating processes in the empirical sciences where large data sets are essential, such as gene sequencing in life sciences (PitchBook/NVCA 2019).

For the biotechnology industry and AI industry vertical, the analysis includes all completed venture capital deals at all stages (i.e., pre/accelerator/incubator, angel, seed, early-stage venture capital, later-stage venture capital, other stages), all rounds, and all series (i.e., seed, series A–series D, and later series). All investor types are included except government and sovereign wealth funds, which are not considered private investors.

AI and Biotechnology Patents

AI and biotechnology patent data are from OECD Science, Technology and Patents (2022), United States Patent and Trademark Office (USPTO) (2022), and the Indicators 2022 report “Invention, Knowledge Transfer, and Innovation.”

OECD collects data on patent families for the five largest patent offices in the world (IP5 patent families)—European Patent Office, Japan Patent Office, Korean Intellectual Property Office, State Intellectual Property Office of the People’s Republic of China, and USPTO. IP5 patent families refer to patents that have been filed in at least two IP offices worldwide, one of which is among the IP5. This avoids double counting of patents filed in multiple jurisdictions. (For more details, see Indicators 2020 report “Invention, Knowledge Transfer, and Innovation.”) This report uses OECD’s definitions of AI-related patents (OECD 2017) and biotechnology patents (OECD 2021).
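The IP5 patent family definition above can be written as a simple membership test. In the Python sketch below, the office labels are shorthand chosen for illustration, and the function name is hypothetical.

```python
# Shorthand labels for the five offices named in the text.
IP5_OFFICES = {"EPO", "JPO", "KIPO", "SIPO", "USPTO"}


def is_ip5_family(filing_offices):
    """True if the family was filed in at least two distinct offices,
    at least one of which is an IP5 office."""
    distinct_offices = set(filing_offices)
    return len(distinct_offices) >= 2 and bool(distinct_offices & IP5_OFFICES)


# Hypothetical patent families (illustration only)
print(is_ip5_family(["USPTO", "EPO"]))         # two offices, both IP5
print(is_ip5_family(["USPTO"]))                # only one office
print(is_ip5_family(["OfficeX", "OfficeY"]))   # two offices, neither IP5
```

Counting each qualifying family once, rather than each filing, is what avoids double counting inventions filed in multiple jurisdictions.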

This report also presents data on patents granted by USPTO. Data on AI patents granted by USPTO are reported in Toole et al. (2020). USPTO patents on biotechnology are reported in Table SINV-59 in the Indicators 2022 report “Invention, Knowledge Transfer, and Innovation.” These data use the World Intellectual Property Organization definition of biotechnology (Schmoch 2008), which differs from the OECD definition.

Emsi Burning Glass Job Postings Data

The proprietary Emsi Burning Glass (Emsi) data set includes over 30,000 skills from millions of online sources, including job posting websites and resumes. Emsi retrieves information from over 100,000 websites, including company career sites, national and local job boards, and job posting aggregators. More information on Emsi’s methodology for job postings can be found on Emsi’s website (https://kb.emsidata.com/methodology/job-posting-analytics-documentation/).

For this analysis, skills related to biotechnology and AI were entered into the Emsi searchable database to extract job postings that featured one of the highlighted skills. The specific skills used for biotechnology and AI are listed at the end of this section.

It is important to note that the number of job postings does not reflect the actual number of hires. A job posting, regardless of industry, might not result in a hire. It is also possible that one job posting may be used to fill several vacancies at a company. Further, some websites may post duplicates of a job posting. To reduce duplicate postings for this analysis, Emsi data were filtered to remove postings that originated from staffing companies. This thematic report analyzes data from October 2016 to September 2021.

Using the Emsi Application Programming Interface, the top 25 skills were extracted for both AI and biotechnology. These skill groupings are based on a review of job postings collected by Emsi, and they include only skills categorized by Emsi as hard skills. More information on Emsi’s skill clustering methodology can be found on Emsi’s website (https://skills.emsidata.com/faqs#how-are-skills-selected).

The top 25 AI-related skills identified by Emsi are algorithms, Apache Spark, artificial neural networks, big data, blockchain, computer vision, data science, deep learning, distributed computing, Internet of Things, machine learning, machine learning algorithms, mathematical modeling, natural language processing, Pandas (Python package), predictive analytics, predictive modeling, R (programming language), robotic process automation, Scala (programming language), speech recognition, statistical modeling, TensorFlow, time series, and unstructured data.

The top 25 biotechnology-related skills identified by Emsi are biopharmaceuticals, biostatistics, case report forms, clinical pharmacy, clinical research, clinical study design, clinical trial management systems, clinical trials, drug development, drug discovery, electronic data capture, good clinical practices, International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use Guidelines, key opinion leader development, life sciences, medical affairs, medical devices, medical guidelines, nondisclosure agreements (intellectual property law), pharmaceuticals, pharmacovigilance, pre-clinical development, regulatory filings, scientific literature, and Title 21 of the Code of Federal Regulations.
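The selection steps described in this section (matching on a listed skill, dropping postings from staffing companies, and restricting to October 2016 through September 2021) can be sketched in Python as follows. The record layout, field names, and sample postings are hypothetical and do not represent the Emsi data schema or its Application Programming Interface.

```python
from datetime import date

WINDOW_START = date(2016, 10, 1)  # October 2016
WINDOW_END = date(2021, 9, 30)    # September 2021


def select_postings(postings, skills):
    """Keep in-window, non-staffing postings that list at least one skill."""
    wanted = {skill.lower() for skill in skills}
    selected = []
    for posting in postings:
        if posting["from_staffing_company"]:
            continue  # filtered out to reduce duplicate postings
        if not (WINDOW_START <= posting["posted"] <= WINDOW_END):
            continue  # outside the analysis period
        if wanted & {skill.lower() for skill in posting["skills"]}:
            selected.append(posting)
    return selected


# Hypothetical postings (illustration only)
sample_postings = [
    {"id": 1, "posted": date(2019, 5, 1), "from_staffing_company": False,
     "skills": ["Machine Learning", "SQL"]},
    {"id": 2, "posted": date(2019, 5, 1), "from_staffing_company": True,
     "skills": ["Deep Learning"]},
    {"id": 3, "posted": date(2015, 1, 1), "from_staffing_company": False,
     "skills": ["TensorFlow"]},
    {"id": 4, "posted": date(2020, 2, 1), "from_staffing_company": False,
     "skills": ["Welding"]},
]

ai_skills = ["machine learning", "deep learning", "TensorFlow"]
ai_postings = select_postings(sample_postings, ai_skills)
print([posting["id"] for posting in ai_postings])
```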

References

Galindo-Rueda F, Verger F. 2016. OECD Taxonomy of Economic Activities Based on R&D Intensity. OECD Science, Technology and Industry Working Papers No. 2016/04. Paris: OECD Publishing. Available at https://doi.org/10.1787/5jlv73sqqp8r-en. Accessed 18 April 2019.

Organisation for Economic Co-operation and Development (OECD). 2017. OECD Science, Technology and Industry Scoreboard 2017: The Digital Transformation. Paris: OECD Publishing. Available at https://www.oecd.org/sti/oecd-science-technology-and-industry-scoreboard-20725345.htm. Accessed 11 November 2022.

Organisation for Economic Co-operation and Development (OECD). 2021. Methodological Work and Publications—Patents in Selected Fields: Biotechnology. Available at https://www.oecd.org/sti/inno/intellectual-property-statistics-and-analysis.htm#method. Accessed 18 November 2021.

Organisation for Economic Co-operation and Development, World Trade Organization (OECD/WTO). N.d. Trade in Value-Added: Concepts, Methodologies, and Challenges. Joint OECD-WTO Note. Available at https://www.oecd.org/sti/ind/49894138.pdf. Accessed 27 June 2019.

PitchBook, National Venture Capital Association (PitchBook/NVCA). 2019. Venture Monitor 4Q 2018. Seattle, WA: PitchBook Data, Inc. Available at https://pitchbook.com/news/reports/4q-2018-pitchbook-nvca-venture-monitor. Accessed 14 March 2019.

PitchBook. 2021. “What Are Industry Verticals?” Available at https://pitchbook.com/what-are-industry-verticals. Accessed 29 September 2021.

Schmoch U. 2008. Concept of a Technology Classification for Country Comparisons: Final Report to the World Intellectual Property Organisation (WIPO). Available at https://www.wipo.int/export/sites/www/ipstats/en/statistics/patents/pdf/wipo_ipc_technology.pdf. Accessed 18 November 2021.

Toole A, Pairolero N, Giczy A, Forman J, Pulliam C, Such M, Chaki K, Orange D, Homescu A, Frumkin J, Chen Y, Gonzales V, Hannon C, Melnick S, Nilsson E, Rifkin B. 2020. Inventing AI: Tracing the Diffusion of Artificial Intelligence with U.S. Patents. Alexandria, VA: U.S. Patent and Trademark Office. Available at https://www.uspto.gov/about-us/news-updates/new-benchmark-uspto-study-finds-artificial-intelligence-us-patents-rose-more. Accessed 15 October 2021.

Notes