Author manuscript; available in PMC: 2021 Oct 20.
Published in final edited form as: Toxicol Sci. 2020 Apr 1;174(2):189–209. doi: 10.1093/toxsci/kfaa014

Profiling the ToxCast Library With a Pluripotent Human (H9) Stem Cell Line-Based Biomarker Assay for Developmental Toxicity

Todd J Zurlinden *, Katerine S Saili *, Nathaniel Rush *, Parth Kothiya *, Richard S Judson *, Keith A Houck *, E Sidney Hunter , Nancy C Baker , Jessica A Palmer §, Russell S Thomas *, Thomas B Knudsen *,1
PMCID: PMC8527599  NIHMSID: NIHMS1648031  PMID: 32073639

Abstract

The Stemina devTOX quickPredict platform is a human pluripotent stem cell-based assay that predicts developmental toxicity potential based on changes in cellular metabolism following chemical exposure [Palmer, J. A., Smith, A. M., Egnash, L. A., Conard, K. R., West, P. R., Burrier, R. E., Donley, E. L. R., and Kirchner, F. R. (2013). Establishment and assessment of a new human embryonic stem cell-based biomarker assay for developmental toxicity screening. Birth Defects Res. B Dev. Reprod. Toxicol. 98, 343–363]. Using this assay, we screened 1065 ToxCast phase I and II chemicals in single-concentration or concentration-response for the targeted biomarker (ratio of ornithine to cystine secreted or consumed from the media). The dataset from the Stemina (STM) assay is annotated in the ToxCast portfolio as STM. Major findings from the analysis of the ToxCast_STM dataset include (1) 19% of 1065 chemicals yielded a prediction of developmental toxicity, (2) assay performance reached 79%–82% accuracy with high specificity (> 84%) but modest sensitivity (< 67%) when compared with in vivo animal models of human prenatal developmental toxicity, (3) sensitivity improved as more stringent weights of evidence requirements were applied to the animal studies, and (4) statistical analysis of the most potent chemical hits on specific biochemical targets in ToxCast revealed positive and negative associations with the STM response, providing insights into the mechanistic underpinnings of the targeted endpoint and its biological domain. The results of this study will be useful for improving our ability to predict in vivo developmental toxicants based on in vitro data and in silico models.

Keywords: predictive toxicology, developmental toxicity, embryonic stem cells


In 2007, the National Research Council published Toxicity Testing in the 21st Century: A Vision and a Strategy (National Research Council, 2007). This report addressed the potential for automated high-throughput screening (HTS) and high-content screening (HCS) assays and technologies to identify chemically induced biological activity in human cells and to develop predictive models of in vivo biological response that would ignite a shift from traditional animal endpoint-based testing to human pathway-based risk assessment (Collins et al., 2008). Concurrent with the NRC 2007 report, the U.S. Environmental Protection Agency (USEPA) launched the ToxCast research program that utilized statistical methods and machine learning algorithms in combination with HTS/HCS data for profiling biological pathways and building bioactivity signatures predictive of toxicity (Judson et al., 2010, 2016; Kavlock et al., 2012; Richard et al., 2016). An abundance of HTS/HCS data has since fueled the building and testing of integrative models for “encoding the toxicological blueprint of active substances that interact with living systems” (Juberg et al., 2017; Sturla et al., 2014).

Impetus for the research and application of HTS/HCS assays is bolstered by the regulatory need to fill information gaps on potential hazards that chemicals might pose to human health and the environment and to identify and implement appropriate health-protective risk management measures under the Registration, Evaluation, and Authorization of Chemicals (REACh) (European Parliament, Council of the European Union, 2006) and The Frank R. Lautenberg Chemical Safety for the 21st Century Act (amended Toxic Substances Control Act) in the United States (US Public Law 114–182, 2016). Under amended Toxic Substances Control Act, for example, the USEPA must encourage and facilitate “… the use of scientifically valid test methods and strategies that reduce or replace the use of vertebrate animals while providing information of equivalent or better scientific quality and relevance that will support regulatory decisions …” and consider the impacts of chemicals and chemical mixtures to “… potentially exposed or susceptible subpopulation … who, due to either greater susceptibility or greater exposure, may be at greater risk than the general population of adverse health effects from exposure to a chemical substance or mixture, such as infants, children, pregnant women, workers, or the elderly.” (US Public Law 114–182, 2016). REACh regulation cites identification of derived no effect levels “for each relevant human population (eg, workers, consumers and humans liable to exposure indirectly via the environment) and possibly for certain vulnerable sub-populations (eg, children, pregnant women) …” and the need to “… to replace, reduce or refine testing on vertebrate animals” (European Parliament, Council of the European Union, 2006). These regulations highlight the need for in vitro assays and in silico models that can be used to evaluate the developmental toxicity potential of chemicals in screening and prioritization contexts, with less reliance on animal testing.

The in vivo protocol commonly used to test for prenatal developmental toxicity (ie, OECD TG 414) is designed for a health-protective effects assessment based on observation of fetal malformations and variations in a study designed to produce a dose-response. The in vivo developmental studies are costly, animal resource intensive, and potentially different in cross-species responses (Knudsen and Daston, 2018; Leist et al., 2014). As such, HTS/HCS-based methodologies should consider novel in vitro data and in silico models that can effectively and efficaciously profile chemicals for critical effects on human development and as well point to mechanistic pathways. Some of the most promising nonanimal alternatives exploit the self-organizing potential of embryonic stem cells (ESCs) to recapitulate developmental processes that may be sensitive to chemical exposure (Bremer and Hartung, 2004; Luz and Tokar, 2018). Endpoints that provide mechanistic support for tissue-specific developmental processes include cardiomyocyte differentiation (Chandler et al., 2011; Genschow et al., 2002; Seiler and Spielmann, 2011), gene expression (Panzica-Kelly et al., 2013; Pennings et al., 2011), metabolic profiling (Kleinstreuer et al., 2011; Palmer et al., 2013; West et al., 2010), regulatory, gene-specific biomarkers (Kameoka et al., 2014; Le Coz et al., 2015), stem cell migration (Xing et al., 2015), axial patterning (Warkus and Marikawa, 2017), and histodifferentiation in 3D organoids (Huch and Koo, 2015). For example, the validated mEST (Genschow et al., 2002) monitors emergence of beating cardiomyocytes from pluripotent murine ESCs as the targeted read-out (in parallel with cytotoxicity) to discriminate nonteratogens from weak teratogens and strong teratogens. 
Because the cardiopoietic lineage is ultimately dependent on heterogeneous interactions with other cell lineages in the culture, either via “embryoid bodies” (Seiler and Spielmann, 2011) or dense monolayers (Chandler et al., 2011), the cardiogenic read-out is an effective surrogate for complex pathways in teratogenicity. These examples show the diversity of alternative test modalities amenable to ESC-based methodologies for developmental hazard prediction in embryogeny.

Assays currently represented in the ToxCast portfolio evaluate hundreds of biochemical targets, dozens of signaling pathways, and a broad range of cellular effects. To increase the diversity of HTS assays used to predict developmental toxicants, we describe the addition of a human stem cell-based platform to the ToxCast portfolio based on the devTOX quickPredict (devTOXqP) platform (Palmer et al., 2013). This assay, contracted from Stemina Biomarker Discovery, utilizes undifferentiated H9 human embryonic stem cells (hESCs) and measures relative changes in 2 metabolites, ornithine (ORN) and cystine (CYSS), targeting the ORN/CYSS ratio as a biomarker for developmental toxicity (Palmer et al., 2013, 2017). Ornithine is a nonproteinogenic amino acid that functions in several biochemical pathways including ammonia detoxification in the urea cycle, pyrimidine synthesis via ornithine transcarbamylase, and polyamine synthesis via ornithine decarboxylase. Ornithine is initially absent from the medium but released from viable cells; as such, decreased cellular release reflects general metabolic states for these pathways. Cystine is initially present in the medium and used by cells in glutathione production; as such, the change connected to decreased CYSS uptake likely reflects a change in cellular glutathione synthesis and redox balance. Ultra-performance liquid chromatography–high-resolution mass spectrometry (UPLC-HRMS) measures these metabolites in the conditioned medium of H9 hESCs maintained in a pluripotent state during a 3-day chemical exposure.

Although it is not known if bidirectional changes in ORN and CYSS are coupled directly to the same pathways or indirectly to cellular metabolic state, an imbalance dropping the ORN/CYSS ratio below a critical level (eg, < 0.88) has positive predictive value (PPV) for a chemical’s potential to invoke teratogenicity (Palmer et al., 2013). The teratogenicity potential, as defined by Palmer et al. (2013) for pluripotent H9 hESCs and more recently the potential for developmental toxicity in induced human pluripotent stem cells (iPSCs) (Palmer et al., 2017), is a concentration-based comparison between the ORN/CYSS ratio relative to cell viability established using a 23-pharmaceutical compound training set (Palmer et al., 2013). Predictive performance has been shown by the assay provider for 80 diverse chemical compounds at 85% accuracy (0.89 specificity and 0.82 sensitivity) based on observed developmental toxicity in rodents or humans (Zhu et al., 2016).

Here, we provide a comprehensive description and analysis of the ToxCast_STM platform. Results are shown for 1065 chemicals (unique structures) from the phase I and II ToxCast library (Richard et al., 2016). We describe the ToxCast assay annotation, hereafter referred to simply as “STM,” with regard to (1) data processing through the ToxCast pipeline (tcpl) R package version 2.0.1 (Filer et al., 2017), (2) quality control metrics and performance ratings in predictivity across published benchmark compounds of developmental toxicity (Augustine-Rauch et al., 2016; Daston et al., 2014; Genschow et al., 2002; West et al., 2010; Wise, 2016), (3) a broader evaluation of STM assay performance when anchored to prenatal developmental toxicity endpoints collated in the ToxRefDB database for pregnant rat and rabbit studies (Knudsen et al., 2009; Martin et al., 2009; Watford et al., 2019), and (4) an initial analysis of sensitive and insensitive pathways in the assay relative to 440 biochemical features in the ToxCast NovaScreen (NVS) dataset (Knudsen et al., 2011; Sipes et al., 2013).

MATERIALS AND METHODS

ToxCast chemical library.

EPA’s ToxCast chemical library has been constructed iteratively using criteria including chemical nomination and procurement, dimethyl sulfoxide (DMSO) solubility, and suitability for testing in automated or semiautomated systems. Drivers for procurement included availability of animal toxicity data, mechanistic knowledge to support model development predicting toxicity, and chemicals of heightened regulatory concern for which data are lacking. The ToxCast chemical inventory file is available at the following link: ftp://newftp.epa.gov/comptox/Sustainable_Chemistry_Data/Chemistry_Dashboard/2018/September/ (last accessed February 6, 2020). For a detailed description of the library, see Richard et al. (2016).

The present evaluation addresses the phase I and II ToxCast library, which includes pesticides accompanied by guideline animal studies (OCSPP 870 series, and some NTP studies that are guideline-like), data-poor industrial chemicals, and over a hundred pharmaceutical compounds. This list has 1065 unique structures and 13 duplicates for a total of 1078 samples tested here. Chemical compounds were commercially procured, diluted in DMSO to a stock concentration of up to 100mM (approximately 30% of the chemicals were provided at concentrations lower than 100mM), and plated by Evotec (US), Inc (Watertown, Massachusetts). Aliquots from the stock plates were first diluted with 100% DMSO (Sigma-Aldrich, St Louis, Missouri) to a concentration 1000 times the highest test concentration (HTC) (if necessary) and then diluted 1:1000 in the cell culture media for testing. The final concentration of 0.1% DMSO was a major determinant of the HTC because DMSO itself decreases the ORN:CYSS ratio in hESCs at concentrations above 0.2% (Palmer, unpublished data) and adversely impacts mESCs at concentrations > 0.25% (Adler et al., 2006). We coded stock plates to blind the assay provider to chemical identity. For 7 chemical samples, neat compound was procured from Evotec to enable retesting at concentrations above the range achievable by stock dilutions.
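The dilution arithmetic described above can be illustrated with a short Python sketch. It assumes a two-step scheme (stock diluted in 100% DMSO to 1000 times the HTC, then 1:1000 into medium); the function name `dilution_plan` and its return fields are hypothetical and are not part of the study's workflow.

```python
def dilution_plan(htc_um: float, stock_mm: float = 100.0, fold: int = 1000) -> dict:
    """Sketch of the two-step dilution for a given highest test concentration (HTC).

    htc_um   : desired HTC in the culture medium, in micromolar
    stock_mm : DMSO stock concentration in millimolar (up to 100 mM in the text)
    fold     : final dilution of the DMSO intermediate into medium (1:1000)
    """
    intermediate_um = htc_um * fold   # intermediate in 100% DMSO, 1000x the HTC
    stock_um = stock_mm * 1000.0      # convert mM to uM
    if intermediate_um > stock_um:
        # Such cases would need neat compound rather than a stock dilution.
        raise ValueError("HTC is not reachable by diluting this stock")
    return {
        "intermediate_mM": intermediate_um / 1000.0,
        "stock_dilution_factor": stock_um / intermediate_um,
        "final_dmso_percent": 100.0 / fold,  # 100% DMSO diluted 1:1000
    }
```

With a 100mM stock, an HTC of 100μM uses the stock undiluted, and any HTC above 100μM raises an error, which corresponds to the cases where neat compound had to be procured.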

Pluripotent H9 hESC culture.

H9 cells (NIH code WA09, WiCell Research Institute, Inc, Madison, Wisconsin) were used as approved for federally funded research and selected because of their commercial availability, genetic stability (normal female karyotype), and scientific legacy (hundreds of publications). Derivation and characterization of the H9 cell line was originally reported by Thomson et al. (1998). Cells were handled as described (Palmer et al., 2013). Briefly, cells were maintained under feeder-free conditions with mTeSR1 media (StemCell Technologies, Vancouver, Canada) on Matrigel hESC-Qualified Matrix (Corning, Bedford, Massachusetts) coated 6-well plates. Cultures were incubated at 37°C in a humidified atmosphere of <5% CO2. Differentiated colonies were removed daily through aspiration to maintain the undifferentiated stem cell population. Differentiation was based on visual inspection; there is typically < 5% differentiation in a culture, and only highly pure undifferentiated H9 cell populations were used for these experiments. Cultures were passaged using Versene (Life Technologies, Grand Island, New York) or ReLeSR (StemCell Technologies) at 85%–90% confluency, karyotyped approximately every 10 passages, and the absence of mycoplasma was routinely confirmed with the MycoAlert Mycoplasma Detection Kit (Lonza, Rockland, Maine).

All treatments were carried out in Matrigel-coated 96-well plates. H9 cells were plated with a seeding density of 100000 cells per well in mTeSR1 medium containing 10μM Y27632 Rho-associated kinase inhibitor (ATCC, Manassas, Virginia) to increase plating efficiency. Y27632 was removed prior to compound addition at 24h after plating. The passage number of H9 cells used over the course of this study ranged from 31 to 48; anything above passage 40 was karyotyped within 10 passages prior to use in the assay.

Chemical exposure.

H9 hESCs maintained in a pluripotent state were exposed to test compound for 72h with chemical replenishment with media replacement every 24h. Cell-conditioned media from the final 24-h treatment period was collected for analysis of the targeted biomarker, and cell viability of the corresponding cell layer was assessed as described below. The chemical library was tested in blinded fashion. Plate design and sample workflow is summarized in Figure 1 for the single-concentration screen (no cell viability measures, n=3) and/or 8-point concentration-response series (with cell viability measure, n=3). Each test plate included 1μM Methotrexate (MTX; Selleck Chemicals, Houston, Texas) as a positive reference, 5 nM MTX as a negative reference, 0.1% DMSO as the neutral (vehicle) control, and sample-level media blanks.

Figure 1.

Workflow for the ToxCast STM dataset. “Samples” indicates chemicals tested in triplicate from stock plates; “Compounds” indicates chemicals entered into the dataset. The total number of samples (1373) reflects all duplicate measures and single-concentration screens that collapse to 1065 chemical records. Plate design is mapped for single-concentration (upper) and concentration-response (lower) series; individual plate-level controls (included negative control [5nM Methotrexate, green wells], positive control [1μM Methotrexate, red wells], and neutral control [dimethyl sulfoxide; DMSO, gray wells]). “Records” refers to individual chemical entries into the ToxCast data pipeline (tcpl) at level 0 from which the virtual plate diagram is reconstructed for QA purposes. Records from chemicals tested in the concentration-response series were processed in tcpl to level 6 and entered into invitrodb (Filer et al., 2017); records from chemicals tested at a single-concentration were entered directly to invitrodb from tcpl level 0 and pipelined to level 2. All subsequent data analysis was performed from the processed data at level 6 for concentration-response and level 2 for single-concentration-response.

The dosing strategy for the initial screen was guided by the “cytotoxicity point” previously reported across 38 different ToxCast assays (Judson et al., 2016). We initially selected 141 chemicals for concentration-response testing; the remaining 924 chemical samples were tested at a single concentration. Of those, 252 were retested in concentration-response to confirm a response or adjust the concentration range. The HTC for the initial screen was, for most samples, set to 1, 10, or 100 μM so as not to exceed the chemical-specific median AC50 cytotoxicity point based on Z-score as defined by Judson et al. (2016) unless otherwise limited by compound availability. Other considerations for setting the HTC in concentration-response evaluation included outcome of the single-concentration screen, relevant data from ToxRefDB prenatal developmental toxicity animal studies, and reference compounds used by other studies on alternative test platforms for developmental toxicity (Augustine-Rauch et al., 2016; Daston et al., 2014; Genschow et al., 2002; West et al., 2010). In all, 379 chemicals were tested in concentration-response and the remaining 686 were negatives at the initial screen. The latter group includes 14 chemicals for which the HTC was an order of magnitude below the ToxCast lower bounds of the median cytotoxicity burst (LBC) (see Supplementary Table S1 for details).

Biosample processing.

H9 cell-conditioned media from the final 24-h treatment period was collected for analysis of the targeted biomarker and cell viability was measured from the corresponding cell layer. Cell viability was measured using the CellTiter-Fluor assay (Promega, Madison, Wisconsin) based on proteolytic cleavage of a substrate to fluorescent signal proportional to the number of living cells (Niles et al., 2007). The cell viability Relative Fluorescence Unit (RFU) was background corrected and normalized to mean RFU of the neutral control (0.1% DMSO). The collected H9-cell-conditioned media samples were processed for targeted biomarker analysis as described (Palmer et al., 2013). Briefly, spent media samples were deproteinized (40% acetonitrile) and processed for UPLC-HRMS. Data acquisition was performed using 4 separate UPLC-HRMS systems, consisting of an Agilent 1290 Infinity LC system (Agilent Technologies) interfaced with an Agilent high-resolution mass spectrometer (models G6520A, G6520B, G6530A, and G6224A). A Waters Acquity UPLC BEH Amide column (2.1mm × 50mm, 1.7μm particle size; Waters, Milford, Massachusetts) maintained at 40°C was used for separation of metabolites using a 6.5min solvent gradient with 0.1% formic acid in water and 0.1% formic acid in acetonitrile (1.0ml/min flow rate). Data were acquired using MassHunter Acquisition software (version B 04.00, Agilent Technologies). The extracted ion chromatogram (EIC) areas for ORN, CYSS, and their respective spike-in C13-standards were determined using the Agilent MassHunter Quantitative Analysis software, version B.05.00 or newer (Agilent Technologies), and data were normalized as described in Palmer et al. (2017).

ToxCast annotations.

All raw data and metadata were loaded into a central database for ToxCast using standard nomenclature to pipeline the data as outlined by Filer et al. (2017) into invitrodb (v3 pending release, March 2020) under the Stemina DevToxqP assay source identifier (asid) for the ToxCast platform, designated STM_H9 (asid 14); specifically, the 2 assay identifiers (aid) STM_H9_secretome (aid 428) and STM_H9_viability (aid 437), representing data measures for the conditioned media and cell monolayer, respectively. This included identifiers for tracking each chemical sample (spid), plate identifier (apid), well position (row, column), micromolar concentration tested (dose), well quality (0=fail, 1=pass), and well type (wllt). The well type identifiers included media blank “b” lacking H9 cells, neutral control “n” 0.1% DMSO, test compound “t,” negative control “m” 0.005μM MTX, and positive control “p” 1.0μM MTX. Invitrodb_v3 stores raw data values (rval) for the following assay components (acid): peak area for C13-cystine (acid 1023) and C13-ornithine (acid 1024) tracers, peak area for measured cystine (acid 1025) and cystine standardized to the C13 tracer (acid 1026), cystine normalized to the plate median value of the neutral controls (acid 1027), peak area for measured ornithine (acid 1028) and ornithine standardized to the C13 tracer (acid 1029), ornithine normalized to the plate median value of the neutral controls (acid 1030), the ORN:CYSS ratio calculated from DMSO-normalized values (acid 1031), the targeted biomarker prediction (acid 1032, which is an empty placeholder), background-corrected RFU from the CellTiter-Fluor assay (acid 1113), and cell viability normalized to mean RFU of the DMSO control (acid 1114). Virtual plate maps reconstructed from the ORN/CYSS ratio (Figure 1) were visually inspected to confirm consistency in each well before entering the data into invitrodb_v3.
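The chain of assay components listed above (peak area, tracer-standardized value, plate-normalized value, ratio) can be sketched in Python as follows. This is an illustration only: it assumes that "standardized to the C13 tracer" means a simple quotient of peak areas, which the text does not state explicitly, and all function names are hypothetical. The well-type code "n" for the neutral control follows the annotation above.

```python
from statistics import median


def standardize(peak_area: float, tracer_area: float) -> float:
    """Assumed form of tracer standardization: analyte area / C13 tracer area."""
    return peak_area / tracer_area


def normalize_to_neutral(values: list, well_types: list) -> list:
    """Divide every well by the plate median of the neutral-control ("n") wells."""
    neutral = [v for v, w in zip(values, well_types) if w == "n"]
    reference = median(neutral)
    return [v / reference for v in values]


def orn_cyss_ratio(orn_norm: list, cyss_norm: list) -> list:
    """ORN:CYSS ratio computed from the DMSO-normalized values, well by well."""
    return [o / c for o, c in zip(orn_norm, cyss_norm)]
```

Under this sketch, a test well releasing half the ornithine of the neutral controls while leaving 25% more cystine in the medium would have a ratio of 0.5/1.25 = 0.4.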

The corresponding assay endpoint identifiers (aeids) analyzed in the Results presented here address the 4 main assay component features for the predictive model: the decrease in media ornithine reflecting reduced cellular release (STM_H9_ornithineISnorm_perc_dn, aeid 1689), the increase in media cystine reflecting less utilization (STM_H9_cystineISnorm_perc_up, aeid 1682), the ORN/CYSS ratio reflecting a decrease in the ORN:CYSS ratio as the primary biomarker (STM_H9_OrnCyssISnorm_ratio_dn, aeid 1693), and normalized cell viability (STM_H9_Viability_norm, aeid 1858). The processed STM dataset described and analyzed here comprised 79398 individual data points across 1065 unique chemical structures.

ToxCast Data Pipeline (tcpl).

Individual data points from the 379-chemical concentration-response series were processed through the ToxCast Data Pipeline (tcpl) (Filer et al., 2017). Level 0 of tcpl is the entry point for rval from the 4 key assay component features. All individual data points for the same chemical were concatenated at this level and processed through 6 levels of data processing. Level 1 flagged the DMSO-normalized samples for plate position effects, any missing replicate(s), or samples with poor well quality. The latter was found for individual wells across 30 chemicals where the highest concentration caused significant loss of cell viability and no measurable ORN. Generally, the few invalid samples reflected extreme cytotoxicity at higher concentrations. All are included in the concentration-response assessment, and a teratogenic index could still be computed if lower concentrations remained in the noise belt. If not, then we retested the samples at a lower concentration range and all of the data would be included in the tcpl profile. For plotting purposes, we assigned a minimal recorded ORN value (0.001) that was well below the limits of detection of the metabolite (0.01). This was applied only to the 30 data points having poor well quality. Level 2 transformed the individual replicate data points into an appropriate “response unit” computed on a log2 scale. Level 3, typically a normalization step in tcpl, applied no additional normalization but instead inverted the log2 data to graph response profiles in a manner consistent with other ToxCast assay platforms, ie, responses increase from a baseline of zero. At this level, tcpl calculated a baseline median absolute deviation (bmad) of the response variable using only the lowest 2 test compound concentrations from all test wells, concentrations where we expect no activity for the vast majority of chemicals. 
Note that the 731 chemicals with STM data calls based solely on single-concentration screen data (eg, that had not been tested to date in a definitive concentration series) are pipelined to level 2, with single-concentration hit calls summarized based on a global threshold of 3*bmad across all tested compounds, but not plate-level controls. For the multiconcentration series, the same 3*bmad was implemented as a noise belt distribution in level 4 that calculated the parameters for automated curve-fitting models. Three curve-fitting models were applied (Constant, Hill, and Gain-Loss) and the winning model, selected by Akaike Information Criterion (AIC), progressed to level 5 for graphing. Level 6 then applied warning flags for curve-fitting issues or data quality concerns such as local plate effects, single point hits, and noisy data. Chemicals with a measured response on the ORN/CYSS ratio greater than 3*bmad were classified as “active” at the concentrations tested here. Because “inactive” results (hitc = 0) are not stored in the database, these were replaced by an arbitrary value of 10^6 μM post-tcpl for computing purposes, consistent with other assays in the ToxCast database.
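The response transformation and the 3*bmad noise belt described above can be sketched in Python. This is a simplified illustration, not the tcpl implementation: it assumes the bmad uses the 1.4826 consistency constant (the default of R's `mad` function), and it reduces the hit call to "any response above the cutoff", whereas tcpl also takes the winning curve-fit model into account. Function names are hypothetical.

```python
import math
from statistics import median


def response(ratio: float) -> float:
    """Inverted log2 response: a drop in the ORN/CYSS ratio gives a positive value."""
    return -math.log2(ratio)


def bmad(baseline_responses: list, constant: float = 1.4826) -> float:
    """Baseline median absolute deviation.

    baseline_responses should hold responses from the 2 lowest concentrations
    of all test wells; the scaling constant is an assumption (see lead-in).
    """
    center = median(baseline_responses)
    return constant * median(abs(r - center) for r in baseline_responses)


def is_active(responses: list, cutoff: float) -> bool:
    """Simplified hit call: any response exceeds the 3*bmad cutoff."""
    return max(responses) > cutoff
```

A typical use would compute `cutoff = 3 * bmad(baseline)` once for the assay endpoint and then apply `is_active` to each chemical's response series.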

Developmental toxicity (in vivo) correlation analysis.

To assess STM model performance against in vivo developmental toxicity, we identified an anchoring set of 42 benchmark compounds from the ToxCast phase I/II inventory. An initial list was compiled having overlap with previous literature studies that aimed to evaluate alternative methods for developmental toxicity (Augustine-Rauch et al., 2016; Daston et al., 2014; Genschow et al., 2002; West et al., 2010; Wise, 2016). Note that 18 of the 42 compounds were in the training set and 2 of the compounds were in the test set from the original devTOX publication (Palmer et al., 2013). From this list, we selected those chemicals having traditional (pre-2015) US FDA (Food and Drug Administration) categories for potential risk to the developing fetus if pregnant women are exposed (or in a few cases, well-defined developmental toxicants from the National Toxicology Program). A final set of positives (n=26) and negatives (n=16) was developed using evidence from teratology cohorts tested under EPA Health Effects Test Guidelines OPPTS 870.3700 (https://ntp.niehs.nih.gov/testing/types/devrepro/index.html, last accessed February 6, 2020) (now OCSPP 870.3700 even though it has not been corrected on regulations.gov). STM model performance was computed from 2 × 2 binary classification tables of true positive (TP), false positive (FP), false negative (FN), and true negative (TN) conditions (Powers, 2011). Overall (regular) accuracy (Rand ACC) was computed alongside the TP and TN rates: sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), and Rand ACC = (TP + TN)/(TP + FP + FN + TN). Balanced accuracy (BAC) was computed as the average of PPV = TP/(TP + FP) and negative predictive value (NPV) = TN/(TN + FN); BAC = (PPV + NPV)/2. Model performance was assessed with Matthews correlation coefficient (Matthews, 1975). Because the testing protocol was independent of the original training by Palmer et al. (2013), the subset of 20 compounds in the 42-compound benchmark set that were used to develop the model (Palmer et al., 2013) are not used here as training chemicals; rather, they confer confidence in the testing strategy because the contractor was blinded to their identity.
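The performance measures defined above follow directly from the four cells of the 2 × 2 table, as the following Python sketch shows. The function name is hypothetical. BAC is computed here as the text defines it, the mean of PPV and NPV, which differs from the more common definition as the mean of sensitivity and specificity. The Matthews correlation coefficient uses its standard formula.

```python
import math


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Binary classification metrics from a 2 x 2 confusion table."""
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "ppv": ppv,
        "npv": npv,
        "bac": (ppv + npv) / 2,  # as defined in the text above
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }
```

For purely illustrative counts (not study data) of TP=8, FP=2, FN=4, TN=6, this gives a sensitivity of about 0.67, a specificity of 0.75, an accuracy of 0.70, and a BAC of 0.70.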

To evaluate a broader correlation of STM models to in vivo animal studies, we used prenatal developmental toxicity study type (870.3700) downloaded from the ToxRefDB v1.0 data release (https://www.epa.gov/chemical-research/toxcast-toxrefdb-datarelease, October 16, 2015, accessed July 25, 2017). This included chemical-endpoint-specific dosage for the Lowest Effect Level (LEL) in mg/kg/day if the endpoint was tested (or assumed to have been tested) and a treatment-related effect was observed (Knudsen et al., 2009). Lowest Effect Level values were culled from the “Endpoint_summary” download file. The “study-endpoint-category” calls reflected developmental (dLEL) and maternal (mLEL) endpoints for each available study, else “NULL” meaning no LEL was observed at the highest dosage tested (assigned an LEL of 10^6 mg/kg/day for computing purposes). For the 1049 rat and rabbit studies, 924 ToxRefDB study records (approximately 400 chemicals) were oral route, 60 records (16 chemicals) inhalation, 48 records (16 chemicals) direct, 42 records (16 chemicals) dermal, and 4 records (2 chemicals) “other.” “Direct administration” in ToxRefDB refers to routes of exposure other than oral, dermal, or inhalation (eg, intraperitoneal, intramuscular, and intravenous). Bioavailability might be questionable for the dermal or inhalation study records with a negative outcome for developmental toxicity in ToxRefDB. This situation applied to 35 of 1049 records, of which 30 records correlated with a negative response (TN) in the STM response and only 5 records correlated with a positive response (FP). There may be overlap wherein some chemicals may have records on alternate routes of exposure. The list of chemicals is provided in Supplementary Table S2. For the few inhalation studies having ppm exposure unit, we used a 1:1 mass conversion rather than assuming a particular breathing rate to get at this volume. In all, 3496 records were obtained for dLEL, mLEL, or NULL entries. These records collapsed into 791 outcomes (424 rat, 331 rabbit, 33 mouse, 2 hamster, and 1 dog). Given the preponderance of rat and rabbit studies, we focused only on outcomes from those 2 species to build the endpoint classifier model (400 rat OR rabbit, 389 rat, 323 rabbit, and 203 rat AND rabbit). This removed outcomes reported in mouse/hamster/dog from consideration, even if they may have been reported in rat/rabbit as well.

We first assigned levels of evidence for developmental toxicity for each individual ToxRefDB record as follows: “clear evidence” (dLEL ≤ 200mg/kg/day; dLEL < mLEL), “some evidence” (dLEL ≤ 200mg/kg/day; dLEL ≤ mLEL), “equivocal evidence” (dLEL < 10^3 mg/kg/day; dLEL > mLEL), and “no evidence” demonstrated by data from a study with appropriate experimental design but no developmental effects observed (eg, no dLEL or dLEL ≥ 1000mg/kg/day) (see Supplementary Table S2). The 200mg/kg/day cutoff was arbitrary, selected for convenience to match the arbitrary in vitro cutoff of 200μM for the Teratogenicity Index (TI) reflecting the critical change in the ORN/CYSS biomarker. Replicate studies conducted with the same chemical compound and species were collapsed into a single dLEL or mLEL value with the evidence hierarchy: clear > no > some > equivocal. Essentially, “equivocal evidence” indicates that a dLEL was determined for the compound, in a rat or rabbit study, but there is not enough evidence to attribute the observed fetal endpoint to a specific developmental effect as opposed to a maternal effect. It is important that “evidence” not be equated to “importance.” A maternally mediated adverse developmental outcome is important, but perhaps falls outside the biological domain of the assay. The final number of compounds for in vivo anchoring was 401 from ToxRefDB and completed to 432 by adding information from the 11-overlapping compounds in the 42 benchmark compounds identified in the literature (described above) but not compiled in ToxRefDB.
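The evidence rules above can be read as a small decision function, sketched below in Python. This is one interpretation rather than the authors' code. It treats "some evidence" as the case dLEL = mLEL (since dLEL < mLEL is already "clear"), it uses the 10^6 mg/kg/day placeholder for a NULL LEL, and it returns `None` for the combination the text does not cover (200 < dLEL < 1000 with dLEL ≤ mLEL). The collapse step operates on evidence labels for simplicity. All names are hypothetical.

```python
NULL_LEL = 1e6  # placeholder in mg/kg/day for "no LEL observed" (assumed encoding)

# Hierarchy stated in the text: clear > no > some > equivocal
HIERARCHY = ["clear", "no", "some", "equivocal"]


def evidence_level(dlel: float, mlel: float):
    """Assign a level of evidence to one study record (LELs in mg/kg/day)."""
    if dlel >= 1000:
        return "no"          # no dLEL, or dLEL at or above 1000 mg/kg/day
    if dlel <= 200 and dlel < mlel:
        return "clear"
    if dlel <= 200 and dlel == mlel:
        return "some"
    if dlel > mlel:
        return "equivocal"   # dLEL below 1000 but above the maternal LEL
    return None              # not covered by the stated rules


def collapse(levels: list):
    """Collapse replicate-study levels using the stated hierarchy."""
    present = [lv for lv in levels if lv is not None]
    return min(present, key=HIERARCHY.index) if present else None
```

For example, a record with dLEL = 50 and mLEL = 200 mg/kg/day is "clear", whereas dLEL = 500 with mLEL = 100 mg/kg/day is "equivocal".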

To produce a call of TP, FP, FN, or TN for each chemical tested, we looked for rat-rabbit concordance where available. This was done by discordance (OR = hit in either species constituted a positive) and concordance (AND = same calls from both species). Stringency models were built onto the BM-42 calls as follows. Base model: defines any chemical with a dLEL as positive in either species, else negative, and the STM data accepts as positive any dTP < 1000; n=432. Low stringency model: defines a positive as CLEAR or SOME evidence, else negative (EQUIVOCAL, NO); the STM-positive data are filtered at an arbitrary cutoff of TI ≤ 200μM; n=432. Medium stringency: defines a positive when either species (rat OR rabbit) shows CLEAR evidence, else negative; n=285. High stringency: defines positive as a concordant response (rat AND rabbit) showing CLEAR or NO evidence in both cases; n=127. Note that the BM-42 set was defined as CLEAR and NO; 11 of the 42 have ToxRefDB records in rat, of which 4 were also tested in rabbit.
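The OR/AND logic of the stringency models can be illustrated with a short sketch. The function names are hypothetical and the handling of excluded chemicals is an assumption: the text does not say which records were dropped to reach n=285 for the medium model, so only the high-stringency function returns `None` for chemicals that do not qualify.

```python
from typing import Optional


def low_call(rat: Optional[str] = None, rabbit: Optional[str] = None) -> bool:
    """Low stringency: CLEAR or SOME evidence in either species is positive."""
    return any(level in ("clear", "some") for level in (rat, rabbit))


def medium_call(rat: Optional[str] = None, rabbit: Optional[str] = None) -> bool:
    """Medium stringency: CLEAR evidence in rat OR rabbit is positive."""
    return "clear" in (rat, rabbit)


def high_call(rat: Optional[str] = None, rabbit: Optional[str] = None) -> Optional[bool]:
    """High stringency: concordant CLEAR (positive) or NO (negative) in rat AND rabbit.

    Returns None when the chemical does not qualify (assumption: excluded).
    """
    if rat == "clear" and rabbit == "clear":
        return True
    if rat == "no" and rabbit == "no":
        return False
    return None


def contingency(stm_positive: bool, vivo_positive: bool) -> str:
    """Label the in vitro prediction against the in vivo anchor."""
    if vivo_positive:
        return "TP" if stm_positive else "FN"
    return "FP" if stm_positive else "TN"
```

A chemical with "some" evidence in rat and "no" evidence in rabbit is positive under the low model, negative under the medium model, and excluded from the high model.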

Biochemical (in vitro) correlation analysis.

To evaluate the biological perturbations underlying the targeted biomarker in the STM assay, we utilized the HTS results from the ToxCast NVS cell-free biochemical-assay platform (Knudsen et al., 2011; Sipes et al., 2013). The information for this analysis is provided in Supplementary Table S3. AC50 (half-maximal activity concentration) values for NVS aeids were selected from invitrodb_v3 hit-call matrix (August 10, 2018 release, accessed September 18, 2018) (https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data). Note that the public download has AC50s for chemical-assay relationships that are flagged as inactive because they do not meet the assay-specific efficacy threshold for bioactivity; the hit-call file converts them to inactive and inserts log10=3 for all inactive AC50s. The NVS dataset had AC50 values for activation or inhibition across 337 biochemical assays for substrate-product enzymatic activity or ligand-receptor binding, corresponding to 420 total features, due to the possibility of enzymatic activation/inhibition or nuclear receptor (NR) agonist/antagonist binding activity. This produced 2918 hit calls in the 420-feature × 1065 chemical matrix expressed as log10 micromolar AC50.

The NVS AC50 matrix was further consolidated to a “gene-score” feature matrix using chemical-specific Z-scores from the global ToxCast cytotoxicity burst (Judson et al., 2016) and the official gene symbol for the human protein function that is annotated in each particular ToxCast NVS assay. This filtered out any chemical-assay pair having an AC50 value within 3 standard deviations of the cytotoxicity burst (Leung et al., 2016). The resulting gene-score feature matrix, G(chemical, gene_dir), comprised the mean log10(AC50) for each homologous gene symbol and was assigned an “up” or “down” extension to account for directionality (dir) in terms of enzymatic activator/inhibitor or receptor ligand binding activities. Finally, this matrix was transformed to micromolar concentration potency terms using the following equation:
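The consolidation step can be sketched as a filter followed by a grouped mean. This is a minimal illustration, not the authors' code: the record layout, the function name, and the gene labels are hypothetical, and the sign convention (a pair is dropped when its Z-score is below the cutoff of 3) is an assumption about how "within 3 standard deviations of the cytotoxicity burst" was applied.

```python
from collections import defaultdict
from statistics import mean


def gene_scores(records, z_cutoff=3.0):
    """Build G(chemical, gene_dir) as the mean log10(AC50) per gene and direction.

    records: iterable of (chemical, gene, direction, log10_ac50, z_score).
    """
    grouped = defaultdict(list)
    for chemical, gene, direction, log10_ac50, z_score in records:
        if z_score < z_cutoff:
            continue  # assumption: too close to the cytotoxicity burst, drop it
        grouped[(chemical, f"{gene}_{direction}")].append(log10_ac50)
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical records: two assays map to the same gene and direction.
example = [
    ("chem1", "GENE_A", "down", 0.0, 4.2),
    ("chem1", "GENE_A", "down", 1.0, 3.5),
    ("chem1", "GENE_B", "up", 2.0, 1.0),  # filtered out by the Z-score cutoff
]
scores = gene_scores(example)
```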

$$\mathrm{GPS} = 10^{\log_{10} G},$$

where G is the logAC50 gene-score matrix and GPS is the transformed gene-potency score matrix (1062 chemicals × 267 NVS gene features) in the analysis. Binary calls from the ToxCast STM dataset (1=active STM response, 0=inactive STM response) were then used as the categorical endpoint to fit a multiple logistic regression model (Scikit-learn, v. 0.18.1; Python, v.2.7.13):

$$\operatorname{logit}(p) = w_0 + \sum_{i=1}^{N_{\mathrm{genes}}} w_i \, \mathrm{GPS}_i$$

where $w_i$ represents the log-odds coefficient term for each gene. These weights can be positive or negative to represent how strongly the gene association correlated to an active or inactive STM response, respectively.
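To make the two formulas concrete, the sketch below evaluates them for a single chemical. It is not the scikit-learn fit used in the study: the weights and gene labels are hypothetical, and the transform is read as 10 raised to the log10 gene score, so that a log10 value of 3 maps back to 1000μM.

```python
import math


def gene_potency(log10_score: float) -> float:
    """Back-transform a log10(AC50) gene score to micromolar terms."""
    return 10.0 ** log10_score


def stm_probability(w0: float, weights: dict, gps: dict) -> float:
    """Evaluate logit(p) = w0 + sum_i w_i * GPS_i and return p."""
    logit = w0 + sum(w * gps[gene] for gene, w in weights.items())
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical weights for two gene features (not fitted values).
weights = {"GENE_A_down": 0.02, "GENE_B_up": -0.01}
gps = {"GENE_A_down": gene_potency(1.0), "GENE_B_up": gene_potency(2.0)}
p_active = stm_probability(0.0, weights, gps)
```

A positive weight raises the predicted probability of an active STM response as the feature value grows, and a negative weight lowers it.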

To characterize the phenotypic importance for each gene in the dataset, we used the Human-Mouse: Disease Connection (HMDC) database (http://www.informatics.jax.org/humanDiseases.shtml, accessed July 31, 2018). Although rat is the preferred species in a tiered approach for developmental toxicity testing, performing evaluations in the rat and mouse or mouse and rabbit detected 80% of the 105 teratogens examined in the veterinary pharmaceutical products study by Hurtt et al. (2003). As such, the HMDC database provides a reasonable surrogate for modeling rat-human disease connections. The phenotypic weighting was conducted independently of the logistic regression described above and serves as a way to determine the biological relevance of each gene present in the dataset. HMDC integrates phenotype and disease model data from the MGI Mouse Phenotype Ontology (MP) with human gene-to-disease relationships from the National Center for Biotechnology Information, Online Mendelian Inheritance in Man (OMIM), and the Human Phenotype Ontology. We mapped all NVS gene symbols to the 28 available HMDC phenotype systems (extracted from the HMDC database) and assigned a bin score (b) based on evidence weighting levels from the number of HMDC entries (n) for any (n=1, b=1), some (n=2–6, b=2), or many (n>6, b=3) annotations, or no HMDC data (n=0, b=0). We then calculated a gene-specific MP weight by summing the bin score across all 28 MP categories for the 233 HMDC gene symbols that could be linked to a NVS feature. This list was standardized from 0 to the maximum value and used to weight each NVS target gene by relevance to a human-mouse disease system. It should be noted that each phenotype system has subordinate terms, not all of which may directly apply to developmental toxicity (the deeper level of ontology was not considered here).
Also note that a few NVS gene entries lacked an HMDC record: the normalized MP weighting schema does not filter out those genes but scales them at the low end of the evidence bar for a phenotypic system. The top and bottom 40 from the NVS gene targets list linked to the STM-positive and STM-negative responses, respectively, were annotated separately using the Functional Annotation Tool from the DAVID Bioinformatics Resources 6.8, NIAID/NIH (https://david.ncifcrf.gov/). Categories were selected for a minimum of 4 genes and maximum 30 genes from GO Direct (molecular function, cellular compartment, and biological process), OMIM, KEGG, BioCarta, Reactome, and INTERPRO at a Bonferroni-adjusted p-value ≤ .05 per category. Nothing from OMIM or BioCarta passed. All other redundancies were resolved manually by the lowest false discovery rate value (as these have lesser uncertainties across the specific annotation) or in some cases the most informative annotation record. The resulting annotation records were visualized in a Spearman correlation matrix based on summed weights of represented gene targets from the logistic regression model, colored by STM on one axis and developmental toxicity on the other.
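The HMDC bin scoring and MP weighting described above can be sketched as follows. The function names and the example counts are hypothetical, and standardization is assumed to mean division by the largest summed score, so that genes without any HMDC record sit at 0.

```python
def bin_score(n: int) -> int:
    """Map the number of HMDC entries (n) for one phenotype system to a bin score (b)."""
    if n == 0:
        return 0  # no HMDC data
    if n == 1:
        return 1  # any
    if n <= 6:
        return 2  # some
    return 3      # many


def mp_weights(counts: dict) -> dict:
    """Sum bin scores across phenotype systems and scale to the maximum.

    counts: {gene symbol: [entry count for each phenotype system]}.
    """
    raw = {gene: sum(bin_score(n) for n in ns) for gene, ns in counts.items()}
    top = max(raw.values(), default=0)
    return {gene: (value / top if top else 0.0) for gene, value in raw.items()}


# Hypothetical counts over 4 phenotype systems (the study used 28).
example_weights = mp_weights({
    "GENE_A": [0, 1, 3, 10],  # bins 0 + 1 + 2 + 3 = 6
    "GENE_B": [0, 0, 1, 0],   # bins sum to 1
    "GENE_C": [0, 0, 0, 0],   # no HMDC record
})
```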

RESULTS

The STM dataset currently holds concentration-response profiles for 379 chemicals processed through the tcpl pipeline, and single-concentration screens for another 686 chemicals that were inactive in the initial screen (Figure 1). All data files are available for download at https://doi.org/10.23645/epacomptox.11819265, and the processed data files from tcpl level 2 and level 6 have been submitted to Dryad (http://datadryad.org/, doi.org/10.5061/dryad.gqnk98shm). Results described here will focus on the 2 key output parameters, namely the targeted biomarker (ORN/CYSS ratio) and cell viability. Beyond the present study, some of the single-concentration samples will be retested in concentration-response and the information added to the ToxCast database for public access. Fewer than 10% of the chemicals were tested at < 10μM, and we might anticipate some activity in those had they been tested at higher concentrations.

Determination of STM Positivity

The bmad determined in tcpl is calculated as the median absolute deviation of the response variable from the lowest 2 test compound concentrations in the assay between tcpl levels 3 and 4 (Filer et al., 2017). Because of the way tcpl computes the noise belt, a point of departure represents the state of the platform in ToxCast. This threshold is dynamic and could change as more and more compounds are tested in the future. As such, the ORN/CYSS ratio = 0.76 represents the biomarker’s point of departure threshold (eg, teratogenicity Index, TI) in the current state of 1065 cases and is not a recommendation for general applicability. Figure 2 plots the distribution of tcpl data at level 3 for plate-level neutral controls (n=1176) and Methotrexate (MTX) reference for positive (1μM, n=589) and negative (5nM, n=590) response; and at level 4 for the lowest 2 test concentrations of ToxCast samples (n=2169). A threshold drawn at 3*bmad for each feature distinguished all MTX-positive and MTX-negative plate references. CYSS values showed wider variability than ORN values due to a minor change in the culture medium formulation that increased CYSS utilization over the course of the contracting period. Aside from a few outliers that were not removed when calculating bmad, the lowest 2 concentrations from all ToxCast test wells mimicked the neutral control and MTX-negative reference groups.
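The noise-belt calculation can be sketched in a few lines. This is an illustration rather than the tcpl implementation: the function names are hypothetical, and whether the reported bmad includes a consistency constant (such as the 1.4826 factor that R's `mad()` applies by default) is not stated here, so `scale` is left as a parameter.

```python
from statistics import median


def bmad(responses, scale: float = 1.0) -> float:
    """Median absolute deviation of baseline responses.

    responses: log2-fold values pooled from the lowest 2 test concentrations.
    """
    values = list(responses)
    center = median(values)
    return scale * median([abs(v - center) for v in values])


def outside_noise_belt(response: float, bmad_value: float, k: float = 3.0) -> bool:
    """True when a response departs from the 0 +/- k*bmad noise belt."""
    return abs(response) > k * bmad_value


baseline = [-0.2, -0.1, 0.0, 0.1, 0.2]  # hypothetical low-concentration responses
noise = bmad(baseline)
```

Because the belt is computed from every sample tested so far, adding compounds to the dataset can move the threshold, which is why the text describes it as dynamic.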

Figure 2.

Boxplots of log2-fold induction for STM response data (75% box/95% whiskers/outlier points). Tcpl response data and Mann-Whitney rank sum test comparison of these distributions versus plate-level neutral (DMSO) controls (n=1158), MTX-negative (n=581), and MTX-positive (n=580, p<.05) references; and tcpl baseline median absolute deviation (bmad) (n=2069). 3*bmad for each feature (dashed line) correctly classified all MTX-positive reference values (a few outliers elicited a response outside 3*bmad at 1 or both of the 2 lowest concentrations tested).

Feature plots for each chemical tested in concentration-response (n=379) are given in Supplementary Figures S1a–S1f. As an example, Figure 3 displays results for Methotrexate as part of the ToxCast library. The gray hatched zone indicates the global noise belt computed as a 3*bmad for each main feature. Due to the way tcpl calculates bmad, the threshold ORN/CYSS ratio (0.88) originally described (Palmer et al., 2013) fell within the noise belt. Consistent with other ToxCast assays, we used a statistical threshold for positivity that reflects the test concentration at which the activity departs from the noise belt (acb, activity concentration at baseline). This is indicated by the vertical yellow line in Figure 3 and for the ORN/CYSS ratio gives a TI that reflects the concentration threshold predicted for human developmental toxicity. Because several outliers widened the global noise belt, this in turn resulted in a right-shift in the TI relative to the default ORN/CYSS model from the original assay description (Palmer et al., 2013). For example, 3*bmad (log2) was 0.403, rendering the acb (2^−0.403) = 0.756 versus 0.88 in the default model. For normalized cell viability (percent control), the 3*bmad = 0.161, rendering the acb (2^−0.161) = 0.894. This threshold corresponds to a global decrease of 11% cell viability, which is taken as a positive effect for this feature but may or may not reflect overt cytotoxicity. Although minor effects on cell viability could account for changes in cellular ORN release and/or CYSS uptake measured in the ORN/CYSS ratio, altered cell growth and/or survival are potential modes of action in developmental toxicity. As such, compounds that impact the ORN/CYSS biomarker due to minimal effects on cell viability should not be discounted because of it. The blinded ToxCast Methotrexate sample registered a TI = 0.059μM and 11% loss of cell viability at 0.062μM, consistent with the MTX plate references.
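The threshold values quoted above follow from back-transforming the 3*bmad cutoffs out of log2 space. The short check below reproduces that arithmetic; the function name is hypothetical, and the reading of the cutoffs as negative log2-fold changes (a decrease relative to control) is an assumption consistent with the reported 0.756 and 0.894.

```python
def log2_fold_to_ratio(log2_fold: float) -> float:
    """Convert a log2-fold change into a ratio relative to control."""
    return 2.0 ** log2_fold


# A decrease of 3*bmad in log2 space for each feature.
orn_cyss_threshold = log2_fold_to_ratio(-0.403)   # about 0.756
viability_threshold = log2_fold_to_ratio(-0.161)  # about 0.894

# Percent loss of viable cells at the viability threshold (about 11%).
viability_loss_pct = 100.0 * (1.0 - viability_threshold)
```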

Figure 3.

Sample level 6 tcpl outputs on each feature in the ToxCast_STM dataset, exemplified by methotrexate. Automated curve-fits from DMSO-normalized triplicate measures modeled by 3 objective functions: constant (CNST), Hill (HILL), and gain-loss (GNLS). HILL parameters are value and standard deviation for the top asymptote (tp), log10(μM AC50) in the gain direction (ga) and Hill coefficient (gw) in the gain direction. GNLS is the product of 2 Hill models with a shared tp, log10(μM) AC50s in gain (ga) and loss (la) directions, and Hill coefficients in the gain (gw) and loss (lw) directions. The gray striped box denotes the feature noise belt computed from 0±3 bmad for the lowest 2 concentrations across all samples in the feature set. Model summary values include AIC, probability (P), and root mean square error (RMSE). The lowest AIC is selected as the winning model (red font), and the vertical yellow line denotes point of departure of activity (activity concentration at baseline, acb) from the noise belt (horizontal gray line); vertical blue and red lines depict the AC50s for the specific model plots. See Filer et al. (2017) for details on HIT-CALL (0=inactive, 1=active), fit-category (FITC, 01 to 68+), activity probability (ACTP), and efficacy cutoff (COFF). The y-axis indicates the log2-fold change in (A) ORN levels, (B) CYSS levels, (C) ORN/CYSS ratio, and (D) cell viability versus neutral (DMSO) control value.
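For readers unfamiliar with the curve models named in the caption, the sketch below writes out the Hill and gain-loss forms and the lowest-AIC selection. It follows the functional forms as commonly documented for tcpl (Filer et al., 2017) and should be read as an illustration, not as the pipeline's code; `lw` denotes the loss-direction Hill coefficient, and the AIC values in the example are hypothetical.

```python
def hill(logc: float, tp: float, ga: float, gw: float) -> float:
    """Hill model: response rises toward tp, half-maximal at log10 concentration ga."""
    return tp / (1.0 + 10.0 ** ((ga - logc) * gw))


def gnls(logc: float, tp: float, ga: float, gw: float, la: float, lw: float) -> float:
    """Gain-loss model: product of a gain Hill term and a loss Hill term sharing tp."""
    gain = 1.0 / (1.0 + 10.0 ** ((ga - logc) * gw))
    loss = 1.0 / (1.0 + 10.0 ** ((logc - la) * lw))
    return tp * gain * loss


def winning_model(aic_by_model: dict) -> str:
    """Select the model with the lowest AIC."""
    return min(aic_by_model, key=aic_by_model.get)


# Hypothetical AIC values for one chemical-feature pair.
best = winning_model({"CNST": 10.0, "HILL": 4.2, "GNLS": 6.0})
```

At a log10 concentration equal to `ga`, the Hill model returns half of `tp`, which is the AC50 marked by the vertical lines in the plots.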

To examine the impact of the right-shift in the tcpl-derived critical concentration for positivity, we next plotted the difference in hit calls between the original value (0.88) (Palmer et al., 2013) and the tcpl-derived value (0.76). We binned each chemical by measured maximum response into a distance from bmad. For example, chemicals in the first bin had a maximal response within 1 bmad, chemicals in the second bin had a maximal response within 2 bmads, and so forth. Quantitatively, the tcpl model shifted active calls to a slightly higher concentration versus the published default that fell within the noise belt. This shift, however, had little impact, resulting in the loss of only 2 chemicals from the active list (Maneb and 2-methoxy-5-nitroaniline) that displayed marginal efficacy.

Profiling the ToxCast Phase I/II Inventory

Level 2 and level 6 ToxCast STM data are contained in Supplementary Table S1. Because of the way ToxCast assays were contracted, multiwell stock plates (100μM where possible) permitted testing most chemicals at HTC = 100μM (due to potential interference with DMSO). The HTC was in some cases limited to < 100μM or < 10μM based on a low “median cytotoxicity burst” across dozens of cytotoxicity and cell stress assays in the ToxCast portfolio (Judson et al., 2016). On the other hand, tcpl’s automated curve-fitting extrapolated TI above the HTC of 100μM in several cases. Across the 1065 chemical samples tested, we observed a critical decrease in the ORN/CYSS ratio for 208; however, if we consider the response positive only if TI < 200μM as a reasonable cutoff, then this condition was met by 205 (19%) of the ToxCast phase I/II library. Concentration-response profiles are shown for ORN, CYSS, the ORN/CYSS ratio, and cell viability plots in Supplementary Figures S1a–S1f. We retested all compounds with activity in the single-concentration screen in concentration-response aside from tetracycline and 3,3’-dimethylbenzidine, which produced a weak effect on the ORN/CYSS ratio at 100μM but were included among the positives.

Concentration-dependent drops in the ORN/CYSS ratio were driven by ORN alone for 15 chemicals (7.3% of the positives, including thalidomide), by CYSS alone for 36 chemicals (17.6% of the positives), and by both metabolites in 147 cases (71.7% of the positives). A few chemicals (n=7) produced an overall effect on the ORN/CYSS ratio without scoring a hit on either metabolite alone due to noisy data flagged for individual metabolites. The reduction in CYSS utilization seemed to drive the ORN/CYSS response for most of the positive chemicals; minor variations were observed for a few compounds (see Supplementary Figures S1a and S1d). All-trans-Retinoic acid, for example, increased ORN production (an effect shown for retinoids as a class through inhibition of ornithine decarboxylase [Palmer et al., 2017]), but the active response was driven by a stronger reduction in CYSS utilization, indicating that different mechanisms may lead to changes in the ORN/CYSS ratio. For more complex concentration-response behavior, octyl gallate produced a transient spike in CYSS utilization (acb = 0.39μM) prior to concentrations that decreased CYSS utilization (acb = 1.83μM) (Supplementary Figs. S1a–f). The nonmonotonic changes in CYSS may foreshadow cytotoxicity but perhaps not as a general phenomenon.

The general relationship between TI (Supplementary Figure S1e) and H9 cell viability (Supplementary Figure S1f) is shown graphically for the 205 positives confirmed in the concentration-response series (Figure 4). Although cell viability is measured, it is not included in the prediction by the assay, which is based solely on the ORN/CYSS ratio response. Cell viability is provided as an additional endpoint to aid in the interpretation of the data. The 11% decline in viable H9 cell number (point of departure from the noise belt) could be due to impaired growth, reduced viability, or both. For example, 2 well-recognized human teratogens, Thalidomide (Figure 4, No. 8) and Methotrexate (Figure 4, No. 154), invoke a biomarker response at concentrations differently accompanied by changes in viable cell number. It should be noted that the highest concentration tested here approached within an order of magnitude or exceeded the lower bounds of the ToxCast cytotoxicity point (LBC) for 204 of 205 STM-positive compounds (99.5%) and 843 of 883 STM-negative compounds (95.3%). Approximately one-third of the STM-positive compounds triggered a critical drop in the ORN/CYSS ratio without altering cell viability at the concentrations tested here (sector A in Figure 4 and Table 1). This is exemplified by all-trans-retinoic acid and thalidomide (TI = 0.003 and 1.27μM, respectively). In the remaining two-thirds of the STM-positive compounds, an effect on the ORN/CYSS ratio cannot be generally dissociated from an 11% drop in cell viability. Methotrexate and cytarabine, for example, dropped the ORN/CYSS ratio (TI = 0.059 and 0.054μM, respectively) as well as cell viability (acb = 0.062 and 0.082μM, respectively).

Figure 4.

Stratification of 205 STM-positive chemicals (TI < 200μM) by teratogen index (TI), cell viability (CV), and lower bounds of the ToxCast cytotoxicity burst (LBC). Plots are −log10(acb) micromolar concentration for each chemical indicated by TI (blue line), 11% reduction in H9 cell viability (red line), and LBC (gray stippled line). (NOTE: color image available in the online version). Samples are ranked low (No. 1) to high (No. 205) (see Table 1 for chemical key), first by effect on the biomarker relative to no effect on H9 cell viability (sector A), then by biomarker potency (low to high) relative to an effect on H9 cell viability (sector B), and finally where the effect on H9 cell viability was more potent than the biomarker (sector C). The radial scale is −log10 μM from no effect (−log10 = −6, center) to potent effect (−log10 = +6 at the periphery). The stippled gray circle marks the 200μM cutoff for activity (inward designated inactive) on a −log10 scale.

Table 1.

Key for Positive Chemicals Plotted in Figure 4

Sector A Sector B Sector C

1* all-trans-Retinoic acid 73* Valproic acid 160 Phenanthrene
2 PharmaGSID_47333 74* Hydroxyurea 161 Endosulfan
3 Mirex 75 Fenarimol 162 Clodinafop-propargyl
4 Spiroxamine 76 Tetracycline 163* Indomethacin
5 SAR150640 77 3,3’-Dimethylbenzidine 164 Corticosterone
6 Aplaviroc hydrochloride 78 Propetamphos 165 Fipronil
7 3’-Azido-3’-deoxythymidine 79 Linuron 166 Hexaconazole
8* Thalidomide 80 2,4-Dinitrotoluene 167 Myclobutanil
9 7,12-Dimethylbenz(a)anthracene 81 Fluthiacet-methyl 168 Flumioxazin
10* Carbamazepine 82 Tebuconazole 169 Malathion
11 Tridemorph 83 4-Nonylphenol, branched 170 Ethofumesate
12* Rifampicin 84 Diisobutyl phthalate 171 Norethindrone
13 Darbufelone mesylate 85 Cyhalofop-butyl 172 Diniconazole
14 Chlorpropham 86 Endrin 173 2,3-Diaminotoluene
15 Nitrofurazone 87 Thidiazuron 174 Pioglitazone hydrochloride
16 Carbaryl 88 Diazinon 175* Dexamethasone sodium phosphate
17 AVE8488 89 4-(2-Methylbutan-2-yl)phenol 176 Oxytetracycline hydrochloride
18 GW473178E methyl benzene sulfonic acid 90 Nitrofen 177 Fenamidone
19 Elzasonan 91 Flumetralin 178 Zoxamide
20 PharmaGSID_47259 92 Fluridone 179 Testosterone propionate
21* Amiodarone hydrochloride 93 o,p’-DDT 180 SSR240612
22 Volinanserin 94 N,N’-Methylenebisacrylamide 181 Bifenazate
23 Besonprodil 95 Methyl methanesulfonate 182 3,4-Diaminotoluene
24 Dihexyl phthalate 96 4-Chlorobenzotrichloride 183 Fluazinam
25 Carbendazim 97 Fluazifop-P-butyl 184 1,2-Phenylenediamine
26 Tri-allate 98 4,4’-Oxydianiline 185 PD 0343701
27* Lovastatin 99 Triticonazole 186 Thiram
28 SAR102608 100 Propiconazole 187 2,4-Diaminotoluene
29 Prometon 101 Disulfoton 188 Sodium dimethyldithiocarbamate
30 Cycloate 102 Triclosan 189 Azathioprine
31 Dipentyl phthalate 103 Flutamide 190 SR125047
32 Ametryn 104 Genistein 191 Nitrofurantoin
33 N,N-Dimethyldecylamine oxide 105 Flusilazole 192 Milbemectin (mixture of 70% Milbemycin A4, 30% Milbemycin A3)
34 Pirinixic acid 106 Benzyl butyl phthalate 193* 5-Fluorouracil
35 PharmaGSID_48507 107 4-Vinyl-1-cyclohexene dioxide 194 Octyl gallate
36 Isazofos 108 Dibutyl phthalate 195 PharmaGSID_48505
37 Atrazine 109 Fludioxonil 196 Propargite
38 Tricresyl phosphate 110 1,3-Dinitrobenzene 197 N-Phenyl-1,4-benzenediamine
39 Diallyl phthalate 111 Pyrimethamine 198 Pyraclostrobin
40 Diethanolamine 112 5HPP-33 199 PharmaGSID_48519
41 Di(2-ethylhexyl) phthalate 113 Ketoconazole 200 Mercuric chloride
42 Triadimenol 114 4-Methylaniline 201 Gentian Violet
43 Nitrilotriacetic acid 115 Propazine 202 Disulfiram
44 2,4,7,9-Tetramethyl-5-decyne-4,7-diol 116 Fenaminosulf 203 Tebufenpyrad
45* Stavudine 117 N,N,N-Trimethyl(oxiran-2-yl)methanaminium chloride 204 PharmaGSID_48511
46 Isopropyl triethanolamine titanate 118 2-(Thiocyanomethylthio) benzothiazole 205 PharmaGSID_48166
47 Triadimefon 119 Methylene bis(thiocyanate)
48 Clomazone 120 Diquat dibromide monohydrate
49 Cyclanilide 121 Napropamide
50 Boscalid 122 SSR 241586 HCl
51 17α-Hydroxyprogesterone 123 Difenzoquat metilsulfate
52 Esfenvalerate 124 Octhilinone
53 Cymoxanil 125 CP-728663
54 Fluometuron 126* Busulfan
55 Flumiclorac-pentyl 127 UK-337312
56 2-tert-Butyl-5-methylphenol 128 SB236057A
57 Procymidone 129* Diphenhydramine hydrochloride
58 Coumaphos 130 Benomyl
59 Tributyl phosphate 131 Fluoxastrobin
60 2,4-Dinitrophenol 132 Clomiphene citrate (1:1)
61 Etridiazole 133 Sodium (2-pyridylthio)-N-oxide
62 Norflurazon 134 MK-274
63 Tralkoxydim 135 SR271425
64 Acibenzolar-S-methyl 136 Dodecyltrimethylammonium chloride
65 Diuron 137 SAR115740
66 Cyproconazole 138 Famoxadone
67 Dinoseb 139 Chlorpromazine hydrochloride
68 N-Nitrosodiphenylamine 140 Difenoconazole
69 Paclobutrazol 141 Captafol
70 1,3-Propane sultone 142 Didecyldimethylammonium chloride
71 Carminic acid 143 CI-959
72* MEHP 144 Cycloheximide
145 Cladribine
146 6-Thioguanine
147 Prometryn
148 Tributyltin chloride
149 Tributyltin methacrylate
150 Phenylmercuric acetate
151 Triphenyltin hydroxide
152 Triglycidyl isocyanurate
153* Cytarabine hydrochloride
154* Methotrexate
155 Rotenone
156 Pyridaben
157 TNP-470
158 Colchicine
159 Fenpyroximate (Z, E)
* Asterisk indicates the chemical is a positive in the BM-42 set.

Benchmarking STM Assay Performance for Teratogenicity

To evaluate ToxCast STM assay performance, we compared the in vitro classification with a set of 42 benchmark compounds that have been used by others to evaluate developmental toxicity alternatives (Augustine-Rauch et al., 2016; Daston et al., 2014; Genschow et al., 2002; West et al., 2010; Wise, 2016) (Table 2). Most of these compounds have information from FDA labels on potential risk to the developing fetus if pregnant women are exposed, ranging from safe for use during pregnancy (category A) to contraindicated during pregnancy (category X). Although the FDA labels are no longer used (since 2015), they have been used in evaluating alternatives to animal testing for developmental toxicity and, as such the category descriptors are given for convenience in parentheses as follows. (FDA category A where adequate and well-controlled studies have failed to demonstrate a risk to the fetus in the first trimester of pregnancy [and there is no evidence of risk in later trimesters]; category B, either animal-reproduction studies have not demonstrated a fetal risk but there are no controlled studies in pregnant women, or animal-reproduction studies have shown an adverse effect [other than a decrease in fertility] that was not confirmed in controlled studies in women in the first trimester [and there is no evidence of a risk in later trimesters]; category C where animal-reproduction studies have shown an adverse effect on the fetus and there are no adequate and well-controlled studies in humans, but potential benefits may warrant use of the drug in pregnant women despite potential risks; category D where there is positive evidence of human fetal risk based on adverse reaction data from investigational or marketing experience or studies in humans, but potential benefits may warrant use of the drug in pregnant women despite potential risks; and category X where studies in animals or human beings have demonstrated fetal abnormalities, or there is evidence of fetal risk based on 
human experience, or both, and the risk of the use of the drug in pregnant women clearly outweighs any possible benefit. The drug is contraindicated in women who are or may become pregnant.) The ToxCast STM assay based on defining a positive using the tcpl threshold 0.76 performed with an accuracy of Rand ACC = 78.6% (sensitivity 0.65, specificity 1.00, n=42) and BAC = 82.0% (PPV 1.00, NPV 0.64, n=42). This was consistent with the pharmaceutical-trained model from the assay provider (77% accuracy, 0.57 sensitivity, 1.00 specificity, n=23) (Palmer et al., 2013). The targeted biomarker outperformed 11% reduction in H9 cell viability as a predictor of teratogenicity (54% accuracy, 0.28 sensitivity, 0.94 specificity, n=42).

Table 2.

ToxCast STM Performance Anchored to 42 Benchmark Compounds

CASRN Chemical HTC (μM) CVa (μM) TIb (μM) Preg. Classc STM Classd

302-79-4 all-trans-Retinoic acid 10 NA 0.003 X TP
69-74-9 Cytarabine hydrochloride 1 0.083 0.054 D TP
59-05-2 Methotrexate 1 0.062 0.059 X TP
147-24-0 Diphenhydramine hydrochloride 100 3.76 0.588 B TP
50-35-1 Thalidomide 100 NA 1.27 X TP
51-21-8 5-Fluorouracil 100 1.45 2.02 D TP
298-46-4 Carbamazepine 100 NA 2.29 C TP
55-98-1 Busulfan 100 4.91 2.31 D TP
13292-46-1 Rifampicin 10 NA 2.46 C TP
19774-82-4 Amiodarone hydrochloride 10 NA 5.10 D TP
75330-75-5 Lovastatin 20 NA 6.67 X TP
3056-17-5 Stavudine 100 NA 32.5 C TP
2392-39-4 Dexamethasone sodium phosphate 100 21.8 37.7 C TP
53-86-1 Indomethacin 100 44.1 72.7 D TP
127-07-1 Hydroxyurea 1000 237 74.9 D TP
99-66-1 Valproic acid 1000 271 155 D TP
4376-20-9 MEHP 500 NA 167 D TP
57-41-0 5,5-Diphenylhydantoin 100 NA NA D FN
51-52-5 6-Propyl-2-thiouracil 100 NA NA D FN
10043-35-3 Boric acid 40.7 NA NA NTP FN
4449-51-8 Cyclopamine 10 NA NA D FN
6055-19-2 Cyclophosphamide monohydrate 20 NA* NA D FN
56-53-1 Diethylstilbestrol 10 NA NA X FN
107-21-1 Ethylene glycol 100000 NA NA NTP FN
57-30-7 Phenobarbital sodium 100 NA* NA D FN
81-81-2 Warfarin 100 NA NA X FN
69-72-7 Salicylic acid 10000 1795 513 C TN
103-90-2 Acetaminophen 100 NA* NA B TN
79-06-1 Acrylamide 36 NA NA NTP TN
50-78-2 Aspirin 100 NA* NA C TN
80-05-7 Bisphenol A 100 39.4 NA NTP TN
94-26-8 Butylparaben 100 NA NA GRAS TN
58-08-2 Caffeine 500 NA NA B TN
464-49-3 D-Camphor 20 NA NA C TN
131-11-3 Dimethyl phthalate 100 NA NA NTP TN
59-30-3 Folic acid 100 NA NA A TN
54-85-3 Isoniazid 8.8 NA* NA C TN
57-55-6 1,2-Propylene glycol 1,000,000 327,552 246,664 NTP TN
68-26-8 Retinol 10 NA NA A TN
81-07-2 Saccharin 100 NA NA A TN
134-03-2 Sodium L-ascorbate 20 NA* NA A TN
599-79-1 Sulfasalazine 100 NA* NA B TN
TP rate (sensitivity) 0.28 0.65
TN rate (specificity) 0.94 1.00
Balanced Accuracy e 53.7% 82.0%

Abbreviations: CV, cell viability; FN, false negative; FP, false positive; HTC, highest tested concentration; TI, teratogenicity index; TN, true negative; TP, true positive.

a Point of departure (acb) at 11% reduced cell number; asterisk (*) inferred inactivity from single-concentration screen.

b TI positivity set for a STM response; NA indicates no activity detected at the highest concentration tested.

c Anchors labeled by FDA pregnancy risk (categories A, B, C, D, and X); generally regarded as safe (GRAS); NTP class based on evidence from teratology cohort study in a rat, rabbit, or mouse.

d Consensus across published studies (Augustine-Rauch et al., 2016; Daston et al., 2014; Genschow et al., 2002; West et al., 2010; Wise, 2016) for teratogens and nonteratogens.

e Contingency analysis accepts TI ≤ 200μM as a STM-positive (predicted), else STM-negative.

Ten compounds in ToxCast phase I/II are found among 28 compounds on the Daston List (DL) for exposure-based comparisons of developmental toxicity with regards to Cmax for maternal plasma toxicokinetic studies at a dosage producing, or not producing, teratogenicity/embryolethality (Daston et al., 2014). We evaluated concordance of the STM model for these 10 compounds across 7 negative and 7 positive exposure-based dosimetry calls from the Daston et al. (2014) study. In 7 cases, the highest concentration exceeded what could be achieved by diluting from 100mM stock plates at the final DMSO concentration. Those were tested from neat compound in solid or solution form. Results are shown in Table 3. Teratogenicity index correctly called 6 of 7 DL-negatives and 5 of 7 DL-positives, again yielding 78.6% concordance. The 3 misses were 1,2-propylene glycol (FP), caffeine (FN), and ethylene glycol (FN). Evaluation of 1,2-propylene glycol yielded a TI of 246664μM although the result is not considered reliable given the exceedingly high (1M) test concentration required to achieve 850000μM for assessing DL-negativity (Daston et al., 2014). Daston List concordance improves to 84.6% if this FP call is discarded. Caffeine is negative in animal studies when consideration is given to appropriate control data (Wise, 2016). Model performance further improves to 91.7% if this FN call is disregarded. No explanation can be offered for missing ethylene glycol aside from the lack of in vitro bioactivation; however, the ToxRefDB result for this chemical is equivocal with regards to prenatal developmental toxicity in rats and rabbits. As such, the exposure-based DL model performed the same (78.6%) or better (> 84.6%) than the 42-benchmark compound model.
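The concordance figures in this paragraph can be checked with simple arithmetic. The sketch below reproduces them from the counts given in the text; the function and variable names are illustrative only.

```python
def concordance_pct(correct: int, total: int) -> float:
    """Percent of exposure-based calls matched by the STM response."""
    return 100.0 * correct / total


correct_calls = 6 + 5  # 6 of 7 DL-negatives plus 5 of 7 DL-positives
total_calls = 7 + 7

all_calls = concordance_pct(correct_calls, total_calls)
# Discard the 1,2-propylene glycol FP call.
without_fp = concordance_pct(correct_calls, total_calls - 1)
# Additionally disregard the caffeine FN call.
without_fp_and_fn = concordance_pct(correct_calls, total_calls - 2)
```

Discarding a miss removes it from the denominator only, so the number of correct calls stays at 11 while the total falls from 14 to 13 and then 12.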

Table 3.

ToxCast STM Performance Anchored to 10 DL Chemicals

CASRN | Chemical | HTC (μM) | CV (μM) | TI (μM) | DL(−) (μM) | DL(+) (μM) | STM Response(a)

302-79-4 | all-trans-Retinoic acid | 10 | NA | 0.0030 | 0.0017 | 0.03 | DL(−), DL(+)
57-55-6 | 1,2-Propylene glycol | 1,000,000 | 327,552 | 246,664 | 850,000 | | DL(−)*
94-26-8 | Butylparaben | 100 | NA | NA | 110 | | DL(−)
58-08-2 | Caffeine | 500 | NA | NA | 7.7 | 325 | DL(−)*
107-21-1 | Ethylene glycol | 100,000 | NA | NA | 1400 | 57,000 | DL(−)
127-07-1 | Hydroxyurea | 1000 | 237.4 | 74.9 | | 350 | DL(+)
4376-20-9 | MEHP | 500 | NA | 166.6 | 1 | 146 | DL(−), DL(+)
81-07-2 | Saccharin | 100 | NA | NA | 24 | | DL(−)
69-72-7 | Salicylic acid | 10,000 | 1795 | 513.4 | | 3000 | DL(+)
99-66-1 | Valproic acid | 1000 | 271.4 | 155.0 | | 800 | DL(+)

Abbreviations: CV, cell viability; DL(−) exposure-based negative from the Daston List (Daston et al., 2014); DL(+) exposure-based positive from the Daston List (Daston et al., 2014); HTC, highest tested concentration; TI, teratogen index.

a Concordance: TI correctly called 6 of 7 DL-negatives and 5 of 7 DL-positives yielding 78.6% accuracy; accuracy improves to 84.6% if the hit* is discarded due to the exceptionally high concentration required for 1,2-propylene glycol, and to 91.7% if the miss* is discarded due to ambiguities associated with Caffeine in animal studies.

Evaluating STM Assay Performance Against ToxRefDB Prenatal Studies

To evaluate a broader concordance of the ToxCast STM assay with in vivo animal studies, we culled the data from endpoint summary files in ToxRefDB prenatal studies that provide no effect level and LEL (Knudsen et al., 2009) and manually collapsed these data into evidence-based calls for developmental toxicity in rat and/or rabbit studies (see Materials and Methods for details). Each study attempts to achieve a maternal (mLEL) and fetal (dLEL) dosage for developmental parameters observed at term. Some endpoint categories (eg, placenta, resorptions, and postimplantation loss) do not strictly map to dLEL outcomes and are thus not reflected in this analysis. Information from Table 2 was carried across the performance models, although only 11 of 42 benchmark compounds had corresponding ToxRefDB data. Table 4 summarizes the results from 2 × 2 contingency models, and Supplementary Table S2 provides the data and models used for this analysis. In all, we identified 432 chemicals with STM data and endpoint effects data.

Table 4.

ToxCast STM Performance Anchored to ToxRefDB Prenatal Developmental Toxicity Studies (a)

Column group: Stringency Filter Applied to DevTox Anchor

| Condition (b) | Base (c,d) | Low (c,e) | Medium (c,f) | High (c,g) | BM-42 (c) |
|---|---|---|---|---|---|
| TP | 98 | 68 | 44 | 20 | 17 |
| FP | 24 | 52 | 38 | 16 | 0 |
| FN | 204 | 119 | 42 | 10 | 9 |
| TN | 106 | 193 | 161 | 81 | 16 |
| n | 432 | 432 | 285 | 127 | 42 |
| Sensitivity | 0.325 | 0.364 | 0.512 | 0.667 | 0.654 |
| Specificity | 0.815 | 0.788 | 0.809 | 0.835 | 1.000 |
| PPV | 0.803 | 0.567 | 0.537 | 0.556 | 1.000 |
| NPV | 0.342 | 0.619 | 0.793 | 0.890 | 0.640 |
| Rand ACC (%) | 47.2 | 60.4 | 71.9 | 79.5 | 78.6 |
| BAC (%) | 57.3 | 60.4 | 66.5 | 72.3 | 82.0 |
| MCC | 0.143 | 0.173 | 0.325 | 0.473 | 0.647 |

Abbreviations: BAC, balanced accuracy; MCC, Matthews correlation coefficient; NPV, negative predictive value; PPV, positive predictive value; Rand ACC, regular accuracy.

(a) Data from ToxRefDB (v1) endpoint summary file (Supplementary Table S2).

(b) Predicted condition (in vitro) against true condition (in vivo).

(c) Benchmark set of 42 reference compounds from Table 2, carried across the subsequent models (11 occur in ToxRefDB).

(d) Base model anchored STM positivity (TI < 1000μM) to any dLEL whether recorded in a rat or rabbit study.

(e) Low stringency model anchored STM positivity (≤ 200μM) to (dLEL ≤ mLEL; dLEL ≤ 200mg/kg/day) in either species (rat or rabbit).

(f) Medium stringency anchored STM positivity to clear evidence (dLEL < mLEL, dLEL ≤ 200mg/kg/day) in one species (rat or rabbit).

(g) High stringency model anchored STM positivity to a concordant response (dLEL < mLEL, dLEL ≤ 200mg/kg/day) in both species (rat AND rabbit) on top of the 42 compound benchmark set.
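The positive-call criteria in the Table 4 footnotes can be read as a set of nested predicates on the rat and rabbit LELs. The sketch below is one possible reading, not the procedure used to build Supplementary Table S2 (the calls there were collapsed manually). The dictionary layout, the function names, the treatment of a missing mLEL, and the reading of "one species" as "at least one species" are all assumptions made for illustration.

```python
from typing import Dict, Optional

# One study per species: LELs in mg/kg/day, or None when no effect level was recorded.
Study = Dict[str, Optional[float]]

CUTOFF = 200.0  # mg/kg/day positivity cutoff given in the table footnotes

def meets(study: Study, strict: bool) -> bool:
    """True if one species' study satisfies the dLEL criteria.

    strict=False uses dLEL <= mLEL (low stringency); strict=True uses dLEL < mLEL.
    Assumption: a missing mLEL means no maternal effect was observed, so any
    fetal effect is counted as occurring below maternal toxicity.
    """
    d, m = study.get("dLEL"), study.get("mLEL")
    if d is None or d > CUTOFF:
        return False
    if m is None:
        return True
    return d < m if strict else d <= m

def positive_tiers(rat: Study, rabbit: Study) -> Dict[str, bool]:
    """Which stringency tiers would count this chemical as an in vivo positive."""
    studies = (rat, rabbit)
    return {
        "base": any(s.get("dLEL") is not None for s in studies),
        "low": any(meets(s, strict=False) for s in studies),
        "medium": any(meets(s, strict=True) for s in studies),  # assumed: at least one species
        "high": all(meets(s, strict=True) for s in studies),
    }

# Hypothetical chemical: clear fetal effect in rat, weaker evidence in rabbit.
rat = {"dLEL": 100.0, "mLEL": 300.0}
rabbit = {"dLEL": 250.0, "mLEL": 250.0}
tiers = positive_tiers(rat, rabbit)
print(tiers)
```

The sketch covers only the positive side of each anchor. How negatives were assigned in each tier (for example, the ≥ 1000mg/kg/day criterion) is not encoded here.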

Fundamentally, ToxCast STM performed weakly against an unfiltered compilation of developmental toxicity studies that included any dLEL call in a rat or rabbit (Rand ACC = 47.2%, 0.33 sensitivity, 0.82 specificity, n=432). Concordance to the in vivo classification increased when the criteria for evidence of developmental toxicity became more stringent. Rand ACC increased to 60.4% (n=432), 71.9% (n=285), and 79.5% (n=127) for low, medium, and high stringency anchors, respectively (Table 4). The latter sets a true condition as dLEL < mLEL and dLEL ≤ 200mg/kg/day, and no developmental toxicity as ≥ 1000mg/kg/day concordantly in rat and rabbit studies. Therefore, ToxCast STM accuracy reaches 78.6% (Rand ACC) to 82.0% (BAC) when there is high confidence in the call for developmental toxicity.
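Most of the performance metrics in Table 4 can be recomputed from the four cells of each 2 × 2 contingency model. The sketch below uses the conventional definitions and the high stringency counts as input; the function name and dictionary keys are illustrative. Balanced accuracy is left out because the source does not spell out how it was computed.

```python
import math

def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard 2 x 2 contingency metrics (in vitro prediction vs in vivo truth)."""
    n = tp + fp + fn + tn
    return {
        "n": n,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy_pct": 100.0 * (tp + tn) / n,
        # Matthews correlation coefficient
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }

# High stringency column of Table 4.
high = metrics(tp=20, fp=16, fn=10, tn=81)
for name, value in high.items():
    print(f"{name}: {value:.3f}")
```

For these counts the sketch returns a sensitivity of 0.667, specificity of 0.835, accuracy of 79.5%, and MCC of 0.473, matching the tabulated values.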

Biochemical Stratification of the STM Response

We next sought to correlate the STM-positive and STM-negative responses to biochemical profiles from the ToxCast NVS dataset. This dataset was selected because it reflects potential molecular initiating events (MIEs) in chemical-target interaction, and within the different STM-positive and STM-negative compounds there are compounds with known MIEs that could serve as indicators for how well the correlation is picking up MIEs. A binary classification outcome model of the ORN/CYSS ratio was built with a logistic regression strategy using assay-specific AC50s from 420 NVS features. Figure 5 shows the workflow for this operation and Supplementary Table S3 provides the corresponding data.
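As a rough illustration of the logistic regression idea, the sketch below fits a single-feature model relating potency on one biochemical target to the binary STM call. It is not the authors' workflow (their Python code is in Supplementary Figure S2). The potency values, the choice of −log10(AC50) as the feature, the learning rate, and all names are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, lr=0.5, epochs=2000):
    """Fit p(y=1|x) = sigmoid(b0 + b1*(x - mean(x))) by batch gradient descent."""
    mean_x = sum(x) / len(x)
    xc = [xi - mean_x for xi in x]  # centering keeps the descent well conditioned
    b0 = b1 = 0.0
    n = len(xc)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(xc, y):
            err = sigmoid(b0 + b1 * xi) - yi  # per-sample gradient of the log-loss
            g0 += err
            g1 += err * xi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1, mean_x

# Hypothetical data: potency as -log10(AC50 in molar) on one NVS target, and the
# STM call (1 = positive, 0 = negative) for the same eight chemicals.
potency = [4.5, 5.0, 5.5, 6.0, 7.0, 8.0, 8.5, 9.0]
stm_call = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1, mean_x = fit_logistic(potency, stm_call)
p_weak = sigmoid(b0 + b1 * (5.0 - mean_x))    # predicted P(STM-positive) at 10 uM
p_strong = sigmoid(b0 + b1 * (9.0 - mean_x))  # predicted P(STM-positive) at 1 nM
print(f"slope={b1:.2f}, P(weak)={p_weak:.2f}, P(strong)={p_strong:.2f}")
```

A positive slope means that higher potency on the target goes with STM positivity, and a negative slope would place the target in the STM-negative domain. The study applied this kind of test across 420 NVS features rather than one.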

Figure 5.


Workflow used to identify biological pathways and processes demarcating sensitive and insensitive domains of the STM response (see Materials and Methods for details, Supplementary Figure S2 for the python code, and Supplementary Table S3 for corresponding data). Vertical arrows contain the number of chemical-NovaScreen (NVS) assay targets filtered through the workflow, leaving 82 potential molecular initiating events (MIEs) for functional annotation (see Supplementary Table S3 for details) and 68 MIEs that finally map to 60 statistically significant functional annotations (see Figure 6 for details). (NOTE: color image available in the on-line version).

The NVS chemical-assay matrix started with 2918 chemical-biochemical features in the NVS dataset connected to a ToxCast gene score, whereby assays were aggregated for the gene symbol most closely linked to a biochemical target where the cell-free AC50s fell 3 median absolute deviations (3*MAD) below the ToxCast cytotoxicity burst (Judson et al., 2016; Leung et al., 2016). For assay selection, logistic regression identified 267 molecular targets showing the strongest statistical concordance with either the STM-positive or the STM-negative compounds. The algorithm tries to maximize the distance between weak and strong potency, resulting in gene-potency scores (GPS) typically in the lower nanomolar range (eg, < 75nM). Supplementary Table S3 lists them in rank order. Perturbed ligand binding to NR3C1 (glucocorticoid receptor) and ESR1 (estrogen receptor) represented the top GPS values for the STM-positive and STM-negative responses, respectively.
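The cytotoxicity-burst filter can be sketched as a comparison between a target AC50 and a lower bound placed 3 MAD below the median cytotoxicity AC50. The version below computes the MAD per chemical on log10-transformed values, which is a simplifying assumption. The published ToxCast procedure (Judson et al., 2016) may differ in detail, and the example values and function names are hypothetical.

```python
from statistics import median

def mad(values):
    """Median absolute deviation (unscaled)."""
    m = median(values)
    return median([abs(v - m) for v in values])

def burst_lower_bound(cytotox_log_ac50s, n_mad=3.0):
    """Lower bound of the cytotoxicity burst: median minus n_mad * MAD."""
    return median(cytotox_log_ac50s) - n_mad * mad(cytotox_log_ac50s)

def below_burst(target_log_ac50, cytotox_log_ac50s):
    """True if a biochemical AC50 falls below the burst lower bound."""
    return target_log_ac50 < burst_lower_bound(cytotox_log_ac50s)

# Hypothetical log10(AC50 in uM) values from cytotoxicity assays for one chemical.
cytotox = [1.2, 1.5, 1.6, 1.8, 2.0]
lower = burst_lower_bound(cytotox)  # median 1.6, MAD 0.2, so about 1.0

print(below_burst(-1.0, cytotox))   # a 0.1 uM target hit, well below the burst
print(below_burst(1.4, cytotox))    # a 25 uM hit, inside the burst region
```

Only target hits that pass this kind of filter would be carried forward as candidate specific interactions rather than nonspecific effects near cytotoxic concentrations.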

For further gene reduction, we next weighted the 267-gene list by evidence for gene-to-disease connections across 28 phenotypic systems in the HMDC database. This rearranged the gene list by evidence for gene-to-disease associations from curated literature. Supplementary Table S3 gives the adjusted rank order, and Figure 5 shows the top 10 genes surfacing to the STM-positive and STM-negative responses, respectively, mapped to canonical pathways and biological processes. For example, receptor tyrosine kinases (RTKs) and nonreceptor tyrosine phosphatases joined NR3C1 glucocorticoid receptor as top STM-positive contenders, whereas several NRs and G-protein coupled receptors joined ESR1 as top STM-negative contenders. Further demarcation of potential pathways and processes selected the top 40 phenotype-weighted NVS targets in each domain for functional annotation via NIAID/NIH DAVID Bioinformatics Resources 6.8 (see Materials and Methods). Note that 2 gene products (HDAC6, PTPN9) were present in both up/down sides resulting in 82 annotation records. Best results were obtained when STM-positive and STM-negative lists were annotated separately, utilizing a minimum of 4 genes and maximum of 30 genes from GO Direct (molecular function, cellular compartment, and biological process), OMIM, KEGG, BioCarta, Reactome, and INTERPRO annotation systems. Annotation records were selected for Bonferroni-adjusted p-value ≤ .05 per category; redundancies were resolved manually by the lowest false discovery rate value (as these have lower uncertainty across the specific annotation) or in some cases the most informative annotation record. This resulted in 60 overlapping annotation records (Supplementary Table S3) visualized by Spearman correlation (Figure 6).
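The record selection rule described above (4 to 30 genes per record and a Bonferroni-adjusted p-value ≤ .05 within a category) can be sketched as follows. DAVID performs the adjustment internally over the terms it tests, so computing it over the records passed in is a simplification. The term names, p-values, and gene counts are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def select_records(records, min_genes=4, max_genes=30, alpha=0.05):
    """Keep annotation records that pass the size and significance filters.

    records: list of (name, raw_p, gene_count) tuples from ONE annotation category.
    """
    adjusted = bonferroni([p for _, p, _ in records])
    return [
        name
        for (name, _, gene_count), adj_p in zip(records, adjusted)
        if min_genes <= gene_count <= max_genes and adj_p <= alpha
    ]

# Hypothetical records within a single category (for example, one pathway database).
records = [
    ("term A", 0.001, 6),    # adjusted 0.004, kept
    ("term B", 0.02, 10),    # adjusted 0.08, fails significance
    ("term C", 0.0005, 3),   # significant but fewer than 4 genes
    ("term D", 0.004, 12),   # adjusted 0.016, kept
]
selected = select_records(records)
print(selected)
```

The manual step that resolved redundant records by lowest false discovery rate or by informativeness is a judgment call and is not represented here.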

Figure 6.


Spearman correlation matrix for 60 potential biological pathways and processes translated from biochemical targets in the ToxCast STM domain and annotated with the NIH DAVID bioinformatics resources. NovaScreen assay targets demarcated for the top 40 STM-positive and STM-negative response domains produced by the workflow operation shown in Figure 5. Each record had 4–30 targets represented and passed at a Bonferroni-adjusted p-value ≤ .05. Vertical axis: annotation records colored by the NVS biochemical domain (STM-positive in red and STM-negative in blue). Horizontal axis: annotation records colored by the developmental toxicity domain from ToxRefDB medium stringency model (positive in red and negative in blue).

The 60 annotation records summed for 68 potential MIEs showed an inconsistent relationship to the medium stringency anchor when colored by STM response on one axis and developmental toxicity on the other (eg, rat or rabbit outcomes, n=285) (Figure 6). We further collapsed the 60 records into several “keystone” pathways that accounted for 75% of the MIEs (51 of 68, Supplementary Table S3). Figure 7 maps 34 MIEs (NVS) that correlated with developmental toxicity and were detected (TP) or not (FN) by the STM response. The flow of regulatory pathway information points to AKT/FoxO signaling and focal adhesion as determinants in the applicability domain (RTK signaling), and to Ca++ second messenger generation (G-protein coupled receptor [GPCR] signaling) via G(q) pathways as well as NR-mediated gene expression as determinants of developmental toxicity outside the applicability domain. Aside from the glucocorticoid receptor (NR3C1), the pathway model provides the basis for a mechanistic speculation about where the STM response is lacking in sensitivity.

Figure 7.


Map of potential MIEs (NVS) to the applicability domain (STM) anchored to the medium stringency model of DevTox. Connections were largely made from information in GeneCards (https://www.genecards.org/). Molecular initiating events in dotted outline are implicit by pathway/group annotation from explicit MIEs (solid outline). Chemicals that hit MIEs in the red zone trip the STM biomarker and had some/clear evidence of DevTox in at least one species (rat or rabbit); those that hit MIEs in the green zone were positive for DevTox but missed by the STM assay under the conditions employed here (eg, false negatives [FNs]). The flow of regulatory information points to AKT/FoxO signaling in the TP domain, and G(q) signaling and nuclear receptor-mediated gene expression in the FN domain. (NOTE: color image available in the online version).

DISCUSSION

Profiling hESCs for their secreted metabolites has been proposed as an alternative testing platform for identifying compounds with potential developmental toxicity (Kleinstreuer et al., 2011; Palmer et al., 2013, 2017; West et al., 2010). Dynamic variations in metabolite abundance with functional changes in biochemical pathways and cellular metabolic response may be direct or secondary consequences of chemical exposure (Allison et al., 2015). Taking this into consideration, the profile of intermediary metabolites and small molecules released by hESCs to their environment (secretome) could lead to identification of the extent of adverse outcome pathways in the developing embryo. The ToxCast STM platform described here provides a potency read-out of a chemical compound’s exposure-based potential for developmental toxicity based on a critical imbalance in the targeted biomarker (decreased ORN/CYSS ratio detected in the H9 hESC conditioned medium) (Palmer et al., 2013). Testing the ToxCast phase I and II library of 1065 chemicals revealed a teratogenic index for 205 compounds consisting of mostly environmental and commercial chemicals and some pharmaceutical compounds. Despite the wide diversity of chemical structures (Richard et al., 2016), the resulting performance metrics in predicting in vivo developmental toxicity were consistent with those reported by the assay provider using pharmaceutical compounds (Palmer et al., 2013).

Major findings from an initial analysis of the ToxCast STM dataset may be summarized as follows: (1) 19% of 1065 chemicals tested here showed a positive biomarker response, yielding a prediction of developmental toxicity, (2) biomarker performance in general reached accuracies of 79% (Rand ACC) to 82% (BAC) with excellent to outstanding specificity (> 84%) but modest sensitivity (< 67%) when compared with in vivo animal models of human prenatal developmental toxicity, (3) sensitivity improved as more stringent criteria were applied to the animal studies, and (4) statistical analysis of the most potent chemical hits on specific biochemical targets in ToxCast revealed positive and negative associations with the STM response, providing insights into the mechanistic underpinnings of the targeted endpoint and its biological domain. The results must be interpreted with caution, insofar as the in vitro response is not a direct test of in vivo toxicity in absence of kinetics, metabolism, genetic diversity, and biological coverage.

1. Targeted biomarker. Recognizing the potential for FP and FN calls, the STM dataset has been analyzed and interpreted under the assumption of direct chemical-biological interaction of developmental toxicants with target proteins in the H9 cell culminating in a toxicodynamic response altering the ORN/CYSS ratio. Ornithine is an amino acid produced in the urea cycle by splitting urea from L-arginine. When transported by SLC25A15 into the mitochondrial matrix, ORN can be carbamylated to L-citrulline by ornithine transcarbamylase. Alternatively, ORN is metabolized by ornithine decarboxylase in the cytosol to putrescine, which is rate limiting in polyamine biosynthesis and thus important for the stabilization of newly synthesized DNA. Cystine taken up by cells from the medium is used in glutathione production, so decreased CYSS uptake likely reflects a change in cellular glutathione synthesis and redox balance. Decreasing the ORN/CYSS ratio reflects, therefore, an imbalance in H9 cells that may predict a chemical’s human teratogenic potential (Palmer et al., 2013, 2017). Using a benchmark set of 42 compounds compiled from literature on nonanimal alternatives, ToxCast STM perfectly classified STM-positives (1.00 PPV) but misclassified about a third as negatives (0.64 NPV). Absence of FPs bolsters confidence in the assay’s qualitative predictivity and adds quantitative value with an exposure-based prediction of teratogenicity; however, the relative insensitivity toward some chemicals showing evident developmental toxicity in guideline animal studies is an important factor in modeling assay performance that must be considered when using the data for health-protective hazard identification. For example, cyclopamine, cyclophosphamide, and diethylstilbestrol were classified as FNs. As noted in the Materials and Methods section, the HTC for each chemical was set based on information available for the cytotoxicity burst in ToxCast (Judson et al., 2016). The LBC (lower bounds of the median cytotoxicity burst) for cyclopamine, cyclophosphamide, and diethylstilbestrol was 15.7, > 1000, and 21.5μM, respectively. We tested cyclopamine up to 20μM and do not think its FN call was an underdosing issue. Although we only tested cyclophosphamide monohydrate to 20μM, a response would not be expected because this agent needs to be metabolically activated to its proximate teratogen (our contractor finds no response in human iPSCs testing this compound up to 300μM). For diethylstilbestrol, the HTC (10μM) was at the cytotoxicity burst; the contractor finds a response for this compound in iPSCs between 30 and 100μM, so it likely would be a TP if tested at higher concentrations.

2. Assay performance. Overall accuracy of the ToxCast STM platform was assessed using in vivo outcomes in the ToxRefDB database’s prenatal developmental toxicity studies for pregnant rats and/or rabbits (Knudsen et al., 2009) and 42 benchmark compounds with confident calls from the literature, including 20 compounds from the original development of the assay (Palmer et al., 2013). Our data yielded a BAC of 82% that was consistent with the original description of the assay (Palmer et al., 2013). In characterizing the performance of the STM assay across in vitro and in vivo cutoff models, we found a convenient cutoff for positivity at ≤ 200μM (in vitro), and ≤ 200mg/kg/day for positivity and ≥ 1000mg/kg/day for negativity in vivo. The median dLEL values for rat and rabbit developmental toxicity were 100 and 80mg/kg/day, respectively. A study by van Ravenzwaay et al. (2017) to evaluate prenatal developmental toxicity on 480 chemicals from the REACh and BASF databases reported median dLEL values of 320 and 65mg/kg/day for rat and rabbit studies, respectively, using ≤ 1000mg/kg/day as the cutoff for taking developmental toxicity into account. As such, the 200mg/kg/day cutoff for positivity is reasonable but could be more fully explored in future evaluation of compound-specific concentration (Cmax) dosimetry profiles predicted using high-throughput toxicokinetic models.

Expanding the 42-benchmark set with ToxCast chemicals having concordant ToxRefDB outcomes in rat and rabbit studies resulted in Rand ACC of 79.5% and BAC of 72.3% (n=127). The 127 chemical set is highlighted for its value in assessing performance metrics that correspond to the higher stringency chemical set, where we have more confidence about whether a chemical produces developmental toxicity or not based on consistent results across 2 species of animal prenatal developmental toxicity studies. These metrics are short of the assay’s performance of 85% BAC for 80 compounds cited in a recent review (Zhu et al., 2016). An improvement was not evident when the analysis was expanded with calls from one species only, rat or rabbit (Rand ACC = 71.9%, BAC = 66.5%, n=285). Relaxing the stringency of a positive call in that case would lessen the weight of evidence from ToxRefDB calls to tests in one species or species discordance based on the cutoffs used in our analysis for positivity (≤ 200mg/kg/day) and negativity (≥ 1000mg/kg/day). Given the caveat of species discordance in developmental toxicity findings for some chemicals (Carney et al., 2011; Hurtt et al., 2003; Janer et al., 2008; Knudsen et al., 2009; Rorije et al., 2012; Teixidó et al., 2018; Theunissen et al., 2016), and even more when using the base model, it may be that 82% BAC versus animal studies is quite good as a prediction of human developmental toxicity. Regulatory nonclinical safety testing of human pharmaceuticals typically requires embryo-fetal developmental toxicity testing in 2 species (1 rodent and 1 nonrodent). Discordance across species may be attributed to factors such as maternal toxicity, study design differences, pharmacokinetic differences, and pharmacologic relevance of species. 
If rat and rabbit studies combined are more predictive of human developmental toxicity, then an important question is whether using rat or rabbit as the benchmark for assessing model performance with hESC lines is truly health protective. An analysis of 379 pharmaceutical compounds having prenatal studies in both rat and rabbit animal models found both species to be equally sensitive by overall dLEL comparison, but selective developmental toxicity in one species was not uncommon, suggesting that the use of both species has a higher probability of detecting developmental toxicants than either one alone (Teixidó et al., 2018; Theunissen et al., 2016). Relaxing the stringency criteria for developmental toxicity would likely be used in a more conservative health-protective developmental hazard assessment for chemicals management programs that do not require detailed animal protocols. Given the tradeoff between assay sensitivity-specificity, further investigation is needed to determine how information from ToxCast or read across methods might guide concentration selection to optimize the assay for sensitivity (eg, low FNs) or specificity (eg, low FPs).

3. Assay sensitivity. Potential reasons for FNs in STM predictivity include limited in vitro solubility of chemicals, chemical degradation, lack of xenobiotic metabolism, incomplete biological coverage, or simply not testing high enough concentrations. Solubility and stability properties of ToxCast chemicals have been reported (Richard et al., 2016). These, as well as the caveats associated with biotransformation, may be considered with regards to negative predictivity on a case-by-case basis and will not be discussed here. As part of the ToxCast portfolio, we designed the highest tested concentration for each chemical using its median cytotoxicity burst across 37 cytotoxicity and cell stress assays as an initial guide for setting the HTC (Judson et al., 2016). Based on distance between the highest concentration tested here (ToxCast STM) and the lower bounds of the ToxCast cytotoxicity burst (Judson et al., 2016), we estimate that 40–60 inactive chemicals may have been tested at too low a concentration in this study. Applying the overall 19% hit rate from the assay profile obtained here suggests dose insufficiency might account for fewer than a dozen FNs in the tests conducted to date. Incomplete biological coverage will be discussed below as part of the qualification of this assay for the ToxCast dataset with regards to applicability for developmental hazard characterization.
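The "fewer than a dozen" estimate under Assay sensitivity follows from applying the 19% hit rate to the 40–60 chemicals that may have been underdosed. The sketch below makes that arithmetic explicit. It assumes, as the text implicitly does, that underdosed chemicals would hit at the same rate as the library overall, and the names are illustrative.

```python
HIT_RATE = 0.19              # overall fraction of STM-positive chemicals
UNDERDOSED_RANGE = (40, 60)  # inactive chemicals possibly tested too low

def expected_missed_hits(n_underdosed: int, hit_rate: float = HIT_RATE) -> float:
    """Expected number of positives missed through insufficient concentration."""
    return n_underdosed * hit_rate

low_estimate, high_estimate = (expected_missed_hits(n) for n in UNDERDOSED_RANGE)
print(f"{low_estimate:.1f} to {high_estimate:.1f}")  # 7.6 to 11.4
```

Both ends of the range fall below 12, consistent with the statement in the text.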

4. Potential MIEs. Although performance indicators can be successfully anchored to in vivo developmental effects data, statistical analysis of the most potent chemical hits on specific biochemical targets in ToxCast that demarcated potential positivity or negativity of the STM response can provide insights into potential MIEs and mechanistic support for the assay biomarker (ie, the ORN/CYSS ratio). Focusing on the ToxCast NVS dataset as an initial proof of concept, functional annotation identified 60 “keystone” pathways and processes from the top correlations in the STM-positive and STM-negative space, respectively. Annotated functions most strongly correlated with the sensitive domain of the assay were RTKs and their associated downstream kinases and phosphatases that regulate cell growth, differentiation, and survival. At the top of the list were several “class-III RTKs” characterized by conserved Ig-like repeats (FLT1, FLT4, KDR, and CSF1R) activated by ligand-induced dimerization leading to autophosphorylation and signaling through downstream phosphorylation cascades (SRC, RAS, RAP1, MAPK, PI3-AKT, and FOXO). Positivity is generally consistent with findings from a study in H9 cells using nuclear immunoreactivity for SOX17, a transcription factor of endoderm and hematopoietic differentiation, as a biomarker of positivity for teratogenicity (Kameoka et al., 2014). Those authors screened 302 kinase inhibitors that cover a majority of the human kinome. Their positives were enriched for some of the top kinase targets identified in our ToxCast STM model (PI3K, AURKA, CSNK1D, KDR, FLT3, and FLT1), further qualifying the relevance of the sensitive domain of the assay. Among the NR superfamily, only the glucocorticoid receptor (NR3C1) had a strong correlation with a STM-positive response.

Annotated functions most strongly correlated with the insensitive domain of the assay included NRs and GPCRs. Estrogen receptors (ESR1) stood out among the NRs having a very strong correlation to STM-negative response, where a positive response might have been expected. For example, 17-ethinylestradiol, 17-α-estradiol, 17-β-estradiol, and diethylstilbestrol all tested negative in the ToxCast STM assay despite showing sub-nanomolar AC50s in the ToxCast NVS human ESR1 ligand binding assay. An obvious question is why the ORN/CYSS ratio was unaffected following exposure to test concentrations (1–10μM) that were several orders of magnitude greater than biochemical AC50s. A related question is whether estrogen receptors are even expressed or functional in undifferentiated hESCs. In their quantitative polymerase chain reaction (qPCR)-based characterization of NR gene expression across 3 mouse and human ESC lines (including H9), Xie et al. (2009) did not detect ERα or ERβ expression in the undifferentiated cells. This absence is consistent with a query of the NIH Stem Cell Data Management System Database (StemCellDB) (https://stemcelldb.nih.gov/public.do). On the other hand, both ERα and ERβ transcripts as well as nuclear immunoreactivity are highly expressed in undifferentiated hESC from the Miz-hES1 cell line as well as embryoid bodies grown on murine feeder cells (Hong et al., 2004). For this reason, we cannot rule out the important concern that negativity may reflect an absence of ESR1 signaling dynamics.

G-protein coupled receptors at the top of the STM-negative list included muscarinic receptors (CHRM 1/3/5) and endothelin receptors (EDNRA and EDNRB) acting through G alpha (q) proteins to stimulate calcium signaling through phosphatidylinositol generation and phospholipase C activity. Negativity in response to chemicals with strong antagonist effects on the CHRM system is consistent with negligible expression of these receptors in the NIH StemCellDB, as well as studies showing undifferentiated H9 cells to be unresponsive to various neurotransmitters, including acetylcholine and substance P, that robustly increase intracellular calcium in differentiated hESCs (Carpenter et al., 2004). Notably, antagonists of the neurokinin receptors TACR1 (substance P) and TACR2 (neurokinin A) correlated with a STM-negative response versus a STM-positive response for TACR3 (neurokinin B). Because all 3 neurokinin receptors couple to G(q) signaling events via phosphatidylinositol-calcium second messenger (Regoli et al., 1994), we may speculate that differential GPCR expression explains at least some of the relative insensitivity of the STM biomarker to some chemicals in the ToxCast library.

Understanding the biochemical space not covered by ToxCast STM may lead to improved models for predictive developmental toxicity. Consider, for example, NVS assays for endothelin receptors EDNRA and EDNRB. Endothelin signaling through these GPCR systems is essential for normal neural crest development and disruption is teratogenic (Clouthier et al., 1998; Spence et al., 1999; Treinen et al., 1999). Two compounds with potent inhibitory effects on these NVS receptor assays (SB-217242 and SB-209670) were STM-negative. As such, statistical analysis of the STM response with regards to the NVS biochemical domain shows at least some teratogenic pathways will be missed by the ToxCast STM profile. A machine learning approach using the entire ToxCast assay portfolio will be needed as a follow-up study to more completely define the STM applicability domain and expand the predictive models with other in vitro data and in silico models relevant to prenatal developmental toxicity but otherwise missed by the ToxCast STM assay.

Deconvoluting the flow of regulatory information in the STM-positive keystone annotations suggests a convergence of TP signals on FOXO signaling. The mammalian “Forkhead box O” family of transcription factors (FOXO1, FOXO3, and FOXO4) are well established downstream targets of AKT-phosphorylation that, in turn, lead to nuclear export via 14-3-3 binding (Ro et al., 2013). FOXO is part of an energy-sensing system that controls stem cell self-renewal and differentiation relative to the metabolic state of the cell (Ochocki and Simon, 2013; Rafalski et al., 2012). FOXO1 is essential for the maintenance of hESC pluripotency (Zhang et al., 2011) and its inactivation is linked with AKT, serum- and glucocorticoid kinase, IKB kinase, and ERK pathways (Hagenbuchner and Ausserlechner, 2013). These pathways are all represented in the STM-positive domain. FOXO transcription factors act as metabolic sensors by virtue of redox modifications of their cysteine residues (Wang et al., 2013). ROS modulate FOXO activity at multiple levels, including posttranslational modifications (phosphorylation and acetylation), interaction with coregulators, alterations in subcellular localization, protein synthesis, and stability (Klotz et al., 2015). Important roles are proposed for FOXO signaling in several embryonic processes (Hosaka et al., 2004; Yeo et al., 2013), leading to the hypothesis that the ORN/CYSS ratio reflects the H9 metabolic cell state in a manner linked to hypophosphorylation and nuclear retention (FOXO1) or mitochondrial ROS homeostasis (FOXO3) (Ochocki and Simon, 2013; Yeo et al., 2013) in redox balance. Further investigation is required to determine if the ORN/CYSS ratio signifies the potential for a developmentally relevant alteration in FOXO switching.

In conclusion, the data presented here support the application of the Stemina devTOXqP platform for predictive toxicology (Palmer et al., 2013, 2017) and further demonstrate its value in ToxCast as a novel resource that can generate testable hypotheses aimed at characterizing potential pathways for teratogenicity and HTS prioritization of environmental chemicals for an exposure-based assessment of developmental hazard. These “hierarchical performance-based models” bring new and potentially valuable information on the ORN/CYSS biomarker for systematic and unbiased prediction of adverse fetal endpoints across a relatively large number of animal studies. This analysis, we believe, points out strengths and weaknesses in the translatability of the performance-based models for both scientific and regulatory purposes. The present analysis demonstrated the positive predictive capability of this assay and showed that balanced accuracy increases as the stringency criteria tighten for in vivo developmental toxicity (57%–82%). For an untested chemical, a positive test in this assay would indicate its likelihood of being a developmental hazard at a particular internal concentration, with the understanding that a negative response does not imply a nonhazard. Further analysis, both experimental and computational, will be necessary to determine whether the limitations in sensitivity reflect the robustness of in vivo endpoints used in the models or the biological domain of the hESC biomarker response in generating alerts for further testing.


ACKNOWLEDGMENTS

We gratefully acknowledge technical support from Michael Colwell and Laura Egnash (Stemina) as well as Ann Richard and David Murphy (NCCT). We thank our technical reviewers, Dr. Nicole Kleinstreuer (NTP/NIEHS/NIH) and Dr. Katie Paul-Friedman (NCCT/ORD/EPA), for their critical comments during the Agency’s clearance review of this manuscript. We also thank Madison Feshuk for compiling the assay annotations for release of this assay to EPA’s CompTox Chemicals Dashboard (anticipated March 2020).

FUNDING

US EPA under NCCT contract EP-D-13-055 with Stemina Biomarker Discovery (Madison, Wisconsin).

Footnotes

DATA AVAILABILITY

Supplementary data are available at https://doi.org/10.5061/dryad.gqnk98shm. The ToxCast_STM dataset is available for FTP download at https://doi.org/10.23645/epacomptox.11819265 (last accessed February 7, 2020).

DECLARATION OF CONFLICTING INTERESTS

The authors declare that they have no conflicts of interest to disclose in connection with this study. J.A.P. is an employee of Stemina Biomarker Discovery Inc.

Disclaimer: The views expressed in this article are those of the authors and do not necessarily reflect the views or policies of the U.S. Environmental Protection Agency. Mention of trade names or commercial products does not constitute endorsement or recommendation for use.

Published by Oxford University Press on behalf of the Society of Toxicology 2020. This work is written by US Government employees and is in the public domain in the US.

