Consensus score heatmaps for k = 5 to 8 clusters of the CCLE cell line expression data (A) and of the patient tumor specimens (B).
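Consensus clustering summarizes many clusterings of resampled data in a consensus matrix: entry (i, j) is the fraction of resampling runs that drew both samples i and j in which the two landed in the same cluster. The consensus score heatmaps visualize such matrices. The minimal Python sketch below assumes the per-run cluster labels are already available (the resampling scheme and base clustering algorithm are not specified in this legend), and the function name is hypothetical.

```python
import numpy as np

def consensus_matrix(label_runs, n_items):
    """Consensus matrix from repeated clusterings of subsampled data.

    label_runs: list of dicts mapping item index -> cluster label for one run;
    items not drawn in a run are simply absent from that dict.
    """
    together = np.zeros((n_items, n_items))  # runs where i, j share a cluster
    sampled = np.zeros((n_items, n_items))   # runs where i, j were both drawn
    for labels in label_runs:
        idx = np.array(sorted(labels))
        lab = np.array([labels[i] for i in idx])
        sampled[np.ix_(idx, idx)] += 1
        together[np.ix_(idx, idx)] += lab[:, None] == lab[None, :]
    # Pairs never drawn together get 0 rather than an undefined ratio
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(sampled > 0, together / sampled, 0.0)

# Usage: three toy runs over four items (labels are arbitrary per run)
runs = [{0: 0, 1: 0, 2: 1}, {0: 0, 1: 1, 3: 1}, {1: 0, 2: 0, 3: 0}]
m = consensus_matrix(runs, 4)
```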
Figure S2: WGCNA co-expression modules.
(A) Dendrogram generated by WGCNA unsupervised hierarchical clustering of genes from the SCLC dataset. A cutoff height of 0.95 yielded 14 gene modules. Thirteen modules contain co-expressed genes and are assigned colors, as is standard in the WGCNA package; the grey module contains all genes that could not be clustered at this cutoff. (B) Boxplots comparing the distribution of module eigengenes across the consensus clusters assigned in Figure 1. Only the Blue and Turquoise modules differed significantly between clusters.
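In WGCNA, a module eigengene is the first principal component of the module's standardized expression matrix, summarizing each module as one value per sample; panel (B) compares these values across clusters. The NumPy sketch below assumes a samples × genes matrix for a single module. WGCNA's own `moduleEigengenes` differs in details such as scaling, so this is illustrative only.

```python
import numpy as np

def module_eigengene(expr):
    """First principal component of one module's expression (samples x genes)."""
    expr = np.asarray(expr, dtype=float)
    # Standardize each gene to mean 0, variance 1 across samples
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0]  # PC1 sample scores, up to a scale factor
    # The SVD sign is arbitrary: orient PC1 to agree with average module expression
    if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene

# Usage with simulated data: 30 samples, 20 genes sharing one underlying signal
rng = np.random.default_rng(0)
signal = rng.normal(size=30)
expr = signal[:, None] + 0.1 * rng.normal(size=(30, 20))
me = module_eigengene(expr)
```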
Figure S3: Blue module network topology given by WGCNA.
Nodes are genes within the Blue module defined by WGCNA, and edges denote the topological overlap measure (TOM), a metric of the degree of co-expression/correlation between a pair of genes (27). An edge is drawn between a pair of genes if their TOM is significant, and edge thickness denotes the magnitude of the TOM. Node and font size denote intramodular connectivity (‘hubness’) (27) within the Blue module: the higher the connectivity, the larger the node and its label. Nodes in red denote well-known biomarkers of neuroendocrine and epithelial differentiation.
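For an adjacency matrix a (in WGCNA typically a soft-thresholded correlation, a_ij = |cor(x_i, x_j)|^β), the unsigned topological overlap is TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where l_ij = Σ_u a_iu a_uj measures shared neighbours and k_i = Σ_u a_iu is the connectivity. Computed on the module's own adjacency, k_i is the intramodular connectivity shown by node size. This NumPy sketch is not WGCNA's implementation. The soft-thresholding power in the usage is an assumed value, and the significance threshold used to draw edges is not specified here, so it is left out.

```python
import numpy as np

def tom_and_connectivity(adj):
    """Unsigned topological overlap and connectivity from a symmetric adjacency."""
    a = np.array(adj, dtype=float)      # copy so the caller's matrix is untouched
    np.fill_diagonal(a, 0.0)            # no self-adjacency
    k = a.sum(axis=1)                   # connectivity k_i = sum_u a_iu
    shared = a @ a                      # l_ij = sum_u a_iu * a_uj
    k_min = np.minimum.outer(k, k)
    tom = (shared + a) / (k_min + 1.0 - a)
    np.fill_diagonal(tom, 1.0)          # convention: a gene fully overlaps itself
    return tom, k

# Usage: soft-thresholded correlation adjacency, a_ij = |cor|**beta
rng = np.random.default_rng(0)
expr = rng.normal(size=(40, 6))         # hypothetical samples x genes
beta = 6                                # assumed soft-thresholding power
adj = np.abs(np.corrcoef(expr, rowvar=False)) ** beta
tom, k = tom_and_connectivity(adj)
```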
Figure S4: Turquoise module network topology given by WGCNA.
Same as Figure S3, except that the nodes in red denote well-known biomarkers of epithelial-mesenchymal transition (EMT) or mesenchymal differentiation.
Figure S5: Biological processes associated with Blue and Turquoise module genesets.
By comparative enrichment analysis of Gene Ontology (GO) pathways using BINGO and EnrichmentMap (28) in Cytoscape® (www.cytoscape.org), the Blue and Turquoise modules exhibit statistically significant differences (FDR < 0.05) in differentiation (upper part) and signaling (lower part) pathways. These differences are presented as a network in which nodes denote GO categories and edges denote GO connections between pathways. Solid red dots are “umbrella” nodes that connect distinct but related biological processes (manually encircled with dotted lines); the characteristics of the other dots are indicated in the box. The Blue module is enriched for epithelial and neuronal development and differentiation, neuronal signaling, axon guidance, neurotransmitter secretion, and cell-cell signaling. The Turquoise module is enriched for myeloid and neural crest differentiation and for MAPK, JAK-STAT, NF-kB, TGFbeta, and cytokine signaling cascades (TNF, VEGF, IL-6, IL-8), which are known to be associated with a mesenchymal/EMT phenotype. Statistically significant differences between the Blue and Turquoise modules were also found for metabolism, adhesion, transcription, proliferation, and apoptosis pathways (data not shown).
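The FDR < 0.05 cutoff refers to p-values adjusted for multiple testing across GO categories. BINGO offers the Benjamini-Hochberg correction, and a sketch of that adjustment follows. We assume, though the legend does not state it, that this was the correction used. The raw per-category p-values (for example from a hypergeometric test) are taken as given.

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # p_(i) * m / i for the i-th smallest p-value
    scaled = p[order] * m / np.arange(1, m + 1)
    # Step-up: each adjusted value is the minimum over all larger ranks
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted

# Usage: categories with q < 0.05 would be reported as significant
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = q < 0.05
```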
Figure S6: Pathway expression of the Blue and Turquoise modules given by comparative pathway enrichment analysis.
(A) Columns show the expression of genes in the differentiation and signaling pathways that distinguish the Blue and Turquoise modules (Figure S5), averaged within each of the 53 SCLC cell lines (rows, ordered by Blue eigengene as in Figure 2). The Blue module shows enrichment for neuronal signaling, axon guidance, neurotransmitter secretion, and cell-cell signaling. The Turquoise module shows enrichment for MAPK, JAK-STAT, NF-kB, TGFbeta, and cytokine signaling cascades (TNF, VEGF, IL-6, IL-8) that are known to be associated with a mesenchymal/EMT phenotype. (B) Gene set enrichment analysis (GSEA) (50) of pro-neural and mesenchymal glioblastoma subtype signatures (30) in SCLC cell lines. In this analysis, the 15,477 CCLE genes are rank-ordered (left to right) by the correlation of their expression with the top 9 (ML phenotype) versus the bottom 12 (NE phenotype) SCLC cell lines as ordered in (A). These rank-ordered genes are then assessed for enrichment of the pro-neural (bar code in upper panel, as indicated) and mesenchymal (bar code in lower panel, as indicated) glioblastoma subtype signatures (30). Upper panel: the Enrichment Score (ES) for the mesenchymal glioblastoma signature rises quickly in the part of the ranked gene list correlated with the SCLC ML cell lines and decreases thereafter. Lower panel: the ES for the pro-neural glioblastoma signature rises alongside genes correlated with the SCLC NE cell lines.
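The enrichment score in (B) comes from a running sum over the ranked gene list. The sum steps up at each signature gene, weighted by the magnitude of that gene's correlation, and steps down by a constant at every other gene. The ES is the maximum deviation of this sum from zero. A minimal sketch of that statistic follows, using the weighted formulation of Subramanian et al. with weight exponent p = 1. It omits the permutation-based significance testing done by the GSEA software, and the gene names in the usage are placeholders.

```python
import numpy as np

def enrichment_score(ranked_genes, correlations, gene_set, p=1.0):
    """Weighted running-sum enrichment score (GSEA-style).

    ranked_genes: genes ordered left to right by correlation with the phenotype;
    correlations: the matching correlation values; gene_set: signature genes.
    """
    in_set = np.array([g in gene_set for g in ranked_genes])
    weights = np.abs(np.asarray(correlations, dtype=float)) ** p
    n_miss = len(ranked_genes) - in_set.sum()
    # Step up at signature genes (weighted), step down uniformly at the rest
    hit_step = np.where(in_set, weights, 0.0) / weights[in_set].sum()
    miss_step = np.where(in_set, 0.0, 1.0 / n_miss)
    running = np.cumsum(hit_step - miss_step)
    es = running[np.argmax(np.abs(running))]  # maximum deviation from zero
    return es, running

# Usage with placeholder genes ranked from most to least correlated
genes = ["G1", "G2", "G3", "G4", "G5", "G6"]
corr = [0.9, 0.8, 0.5, -0.2, -0.6, -0.9]
es_top, curve = enrichment_score(genes, corr, {"G1", "G2"})
```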
Figure S7: Identification of transcription factors that regulate SCLC phenotypic states.
(A) Overview of the analysis used to generate the Boolean model network. To identify a global SCLC transcriptional regulatory network, ARACNE analysis (based on mutual information between genes) was performed on 53 SCLC cell lines and 15,477 genes in the CCLE dataset, yielding a network of 8,706 nodes (genes) and 27,224 edges (see Methods). This core SCLC network was analyzed using Fisher’s exact test to identify the top transcription factors (TFs) acting as master regulators of either the NE or the ML network (identified via WGCNA). These TFs were independently validated against the literature and against TF ChIP-Seq and TF-binding-site prediction databases via EnrichR (35), yielding a list of 76 TFs. Only the TFs with the highest variance across the SCLC cell lines were selected to build the Boolean model network. (B) Correlation heatmap of the 73 Blue-module, 184 Turquoise-module, and 23 common TF regulators (columns) with the 1,179 Blue and 3,471 Turquoise module genes (rows). Yellow-orange-red indicates positive correlation, suggesting positive regulation of the target gene, while green-blue indicates negative correlation, suggesting negative regulation. (C) Density histograms of the correlations between the Blue/Turquoise TF regulators and their targets in the Blue or Turquoise module, suggesting that individual TFs regulate the two modules differentially. (D) ARACNE network view of the top TFs shown in (B) and (C) (identified via master regulator analysis) that regulate the Blue module, the Turquoise module, or both. Node size reflects a TF’s connectivity, i.e. the number of targets it regulates. Edges are derived from ARACNE mutual information between co-expressed nodes.
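Master regulator analysis of this kind asks whether a TF's ARACNE targets overlap a WGCNA module more than chance would predict. A one-sided Fisher's exact test answers this through the hypergeometric upper tail. The sketch below uses only the standard library. The paper's Methods define the gene universe and contingency table, so the choices here (universe = all network genes, targets = a TF's ARACNE neighbours) are assumptions, and the function name is hypothetical.

```python
from math import comb

def fisher_enrichment_p(tf_targets, module_genes, universe):
    """One-sided Fisher's exact p-value that a TF's targets are over-represented
    in a module: hypergeometric P(overlap >= observed overlap)."""
    universe = set(universe)
    targets = set(tf_targets) & universe
    module = set(module_genes) & universe
    n_total, n_module, n_targets = len(universe), len(module), len(targets)
    overlap = len(targets & module)
    tail = sum(
        comb(n_module, i) * comb(n_total - n_module, n_targets - i)
        for i in range(overlap, min(n_module, n_targets) + 1)
    )
    return tail / comb(n_total, n_targets)

# Usage with placeholder gene names: a TF whose 5 targets are exactly the module
genes = [f"g{i}" for i in range(20)]
module = genes[:5]
p_full_overlap = fisher_enrichment_p(module, module, genes)
```

Ranking TFs by this p-value, separately for the NE and ML modules, would give a candidate list of master regulators.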
Figure S8: Inhibitory dominant dynamics show qualitatively similar attractors to threshold updates.
(A) Attractors identified using the inhibitory dominant update method (see Methods). We found 6 steady-state attractors and 5 distinct 2-state oscillating attractors (limit cycles). (B) Correlation scores of the attractors with the CCLE cell lines, as in Figure 3. The inhibitory dominant update method still identifies several attractors correlated with the NE and ML cell lines, although several cell lines show no significant correlation with any attractor. Each limit cycle oscillates between two similar states, suggesting that it can be regarded as either NE or ML rather than as a transition between the two.
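The two update schemes compared here can be sketched for a small signed network. The sketch assumes a weight matrix `w` with `w[i, j] > 0` if TF j activates TF i and `w[i, j] < 0` if it inhibits it, and synchronous updates. The rules below are assumptions made for illustration, not a transcription of the Methods. Under threshold updates, a node turns on if its summed input is positive, turns off if it is negative, and keeps its state if it is zero. Under inhibitory dominant updates, a node is on only if at least one activator and no inhibitor is on. Attractors are found by iterating from every initial state until a state repeats. This is only feasible for a handful of nodes; a network the size of the SCLC model would require sampling initial states.

```python
import itertools
import numpy as np

def threshold_update(state, w):
    """On if net input > 0, off if < 0, unchanged if exactly 0."""
    s = np.asarray(state)
    h = w @ s
    nxt = np.where(h > 0, 1, np.where(h < 0, 0, s))
    return tuple(int(v) for v in nxt)

def inhibitory_dominant_update(state, w):
    """On only if at least one activator is on and no inhibitor is on."""
    s = np.asarray(state)
    has_active_activator = ((w > 0).astype(int) @ s) > 0
    has_active_inhibitor = ((w < 0).astype(int) @ s) > 0
    nxt = has_active_activator & ~has_active_inhibitor
    return tuple(int(v) for v in nxt)

def find_attractors(w, update):
    """Enumerate all attractors (fixed points and limit cycles) exhaustively."""
    n = w.shape[0]
    attractors = set()
    for start in itertools.product((0, 1), repeat=n):
        seen = {}                       # state -> step at which it was first visited
        state = start
        while state not in seen:
            seen[state] = len(seen)
            state = update(state, w)
        # The trajectory re-entered `state`; everything from there on is the cycle
        path = list(seen)               # dicts preserve insertion order
        cycle = path[seen[state]:]
        k = cycle.index(min(cycle))     # rotate to a canonical starting state
        attractors.add(tuple(cycle[k:] + cycle[:k]))
    return attractors

# Usage: a toy two-node toggle switch (mutual inhibition plus self-activation)
w = np.array([[1, -1], [-1, 1]])
thr = find_attractors(w, threshold_update)
inh = find_attractors(w, inhibitory_dominant_update)
```

On this toy switch the threshold rule keeps the (1, 1) state as a fixed point, whereas the inhibitory dominant rule sends it to (0, 0). The example shows how the choice of rule can change the attractor set even for the same wiring.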
Figure S9: Derrida analyses show that network dynamics are ordered.
Derrida plots showing the average Hamming distance between two states after a single update of the Boolean network as a function of the Hamming distance between the two initial states, for (A) threshold updates and (B) inhibitory dominant updates. The solid blue line, on which the distance after the update equals the initial distance, separates pairs of states that move farther apart (above the line) from pairs that move closer together (below the line) after a single update. Chaotic trajectories tend to diverge and therefore lie above the line, while ordered trajectories converge and lie below it. These results suggest that the SCLC TF network dynamics are in an ordered regime.
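A Derrida curve can be estimated by sampling pairs of random states at a given initial Hamming distance, updating both once, and averaging the resulting distance. Points below the diagonal indicate ordered dynamics. The sketch assumes `update` maps a 0/1 NumPy state vector to its successor, for instance a wrapper around a network's synchronous update rule. The sample count and seed are arbitrary choices.

```python
import numpy as np

def derrida_curve(update, n, max_distance, samples=200, seed=0):
    """Mean Hamming distance after one update vs. initial Hamming distance."""
    rng = np.random.default_rng(seed)
    distances = np.arange(1, max_distance + 1)
    mean_after = []
    for d in distances:
        total = 0
        for _ in range(samples):
            s1 = rng.integers(0, 2, size=n)
            s2 = s1.copy()
            flip = rng.choice(n, size=d, replace=False)  # perturb exactly d nodes
            s2[flip] ^= 1
            total += int(np.sum(np.asarray(update(s1)) != np.asarray(update(s2))))
        mean_after.append(total / samples)
    return distances, np.array(mean_after)

# Usage: compare mean_after with distances (the diagonal); below means ordered
d, h = derrida_curve(lambda s: np.zeros_like(s), n=10, max_distance=5)
ordered = np.all(h < d)
```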
Figure S10: Statistical significance of TF network attractors.
We generated 10,000 random attractors to derive a null distribution of correlations, under the null hypothesis that the model's attractors are no more correlated with the data than random attractors. (A and C) The distribution of attractor correlations with the CCLE dataset for threshold updates (A) and inhibitory dominant updates (C). For each attractor, only the highest correlation was considered, to avoid saturating the distribution with poor correlations (i.e., if an attractor correlates with an NE cell line, it will almost certainly be anti-correlated or poorly correlated with the ML cell lines). The blue distribution shows the best correlations of the random attractors, while the green distribution shows the best correlations of the model’s attractors. By the Mann-Whitney U test, the model’s correlations are significantly higher than random (threshold: p-value = 9.5e-34; inhibitory dominant: p-value = 7.6e-9). (B and D) The distribution of attractor correlations with the CLCGP dataset for threshold updates (B) and inhibitory dominant updates (D). As in (A), only the highest correlations are considered. By the Mann-Whitney U test, the model’s correlations are significantly higher than random (threshold: p-value = 1.7e-23; inhibitory dominant: p-value = 1.7e-6).
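The comparison above can be sketched in two steps. First, take each attractor's best Pearson correlation across samples, implementing the highest-correlation-only rule. Second, compare the model and random attractors with a one-sided Mann-Whitney U test. The p-value here uses the normal approximation without tie or continuity corrections, so it will differ somewhat from exact or library implementations such as SciPy's `mannwhitneyu`. How the random attractors were drawn is not specified in this legend, so no generator for them is shown.

```python
import numpy as np
from math import erfc, sqrt

def best_correlations(attractors, samples):
    """Highest Pearson correlation of each attractor with any sample.

    attractors: (a x g) 0/1 matrix; samples: (s x g) expression over the same g
    nodes. Constant attractors have undefined correlation and should be dropped.
    """
    a = np.asarray(attractors, dtype=float)
    x = np.asarray(samples, dtype=float)
    a = a - a.mean(axis=1, keepdims=True)
    x = x - x.mean(axis=1, keepdims=True)
    a /= np.linalg.norm(a, axis=1, keepdims=True)
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    return (a @ x.T).max(axis=1)  # keep only the best match per attractor

def mann_whitney_greater(x, y):
    """One-sided Mann-Whitney U test (H1: x tends to exceed y), normal approximation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pooled = np.concatenate([x, y])
    order = np.argsort(pooled, kind="mergesort")
    sorted_vals = pooled[order]
    ranks = np.empty(pooled.size)
    i = 0
    while i < pooled.size:
        j = i
        while j + 1 < pooled.size and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # average rank for tied values
        i = j + 1
    n1, n2 = x.size, y.size
    u = ranks[:n1].sum() - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, 0.5 * erfc(z / sqrt(2))  # upper-tail P(Z >= z)

# Usage: best correlations for a toy attractor, then a toy model-vs-null comparison
best = best_correlations([[1, 0, 1, 0]], [[2, 0, 2, 0], [0, 1, 0, 1]])
u_stat, p_value = mann_whitney_greater([5, 6, 7], [1, 2, 3])
```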
Figure S11: Hybrid cells are not enriched for a stem cell marker.
Distribution of fluorescence intensities of CD133, a cancer stem cell marker, in NE, ML, and hybrid cell lines. Density estimates show no significant difference in the expression of this marker among the three subsets.
Table S1: Consensus clusters
A list of all the CCLE cell lines and CLCGP patient samples, and which cluster they were placed in by consensus clustering with k=2.
Table S2: WGCNA modules
A list of all genes used in these analyses, and which WGCNA gene co-expression module they belonged to.
Table S3: TF network
A comma-separated file of the TF network, in which each line has the form “SOURCE,TARGET,WEIGHT”.
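A file in this format can be loaded with the standard `csv` module into a map from each target to its regulators. The sketch assumes an optional header row and a numeric WEIGHT whose sign distinguishes activation from inhibition. That interpretation of the weight is an assumption, as is the hypothetical function name. The node names in the usage are made up and are not entries from Table S3.

```python
import csv
import io
from collections import defaultdict

def read_tf_network(handle):
    """Parse SOURCE,TARGET,WEIGHT lines into {target: {source: weight}}."""
    regulators = defaultdict(dict)
    for row in csv.reader(handle):
        if not row or row[0].strip().upper() == "SOURCE":
            continue  # skip blank lines and an optional header row
        source, target, weight = (field.strip() for field in row)
        regulators[target][source] = float(weight)
    return regulators

# Usage with made-up node names; a real file would be opened with open(path, newline="")
example = io.StringIO("SOURCE,TARGET,WEIGHT\nTF_A,TF_B,1\nTF_C,TF_B,-1\n")
net = read_tf_network(example)
```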
Table S4: Structural coherence
Coherence is a robustness metric describing how ordered a basin of attraction is, determined by the fraction of states in a basin that lie along the boundary of a different basin of attraction. Structural coherence is a normalized version of this metric, scaled between the coherence expected purely by random chance and the maximum possible coherence for a basin. Here we report estimates of random coherence (the coherence expected from the size of the basin alone), maximum coherence (the highest coherence possible), observed coherence (the actual calculated fraction of states along a basin boundary), and structural coherence (the observed coherence normalized between random (0) and maximum (1)). These results show that the SCLC network produces more robust attractors than would be expected by random chance, suggesting that the network robustly directs the system from a given initial condition to the appropriate attractor.
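As described above, structural coherence rescales the observed coherence so that the random expectation maps to 0 and the maximum maps to 1, that is, (observed - random) / (maximum - random). A one-function sketch follows. How the random and maximum coherence values were estimated is not reproduced here, and the numbers in the usage are illustrative, not values from Table S4.

```python
def structural_coherence(observed, random_expected, maximum):
    """Rescale observed coherence so random chance -> 0 and the maximum -> 1."""
    if maximum <= random_expected:
        raise ValueError("maximum coherence must exceed the random expectation")
    return (observed - random_expected) / (maximum - random_expected)

# Usage with illustrative numbers only
score = structural_coherence(0.8, 0.5, 0.9)
```

Values above 0 mean a basin is more coherent than its size alone would predict.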
Table S5: Statistical significance
Model predicted attractors are significantly more correlated to cell lines and patient samples than random attractors. This file reports the cell line most highly correlated to each attractor, that correlation value, and associated p-value.
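This table does not state how the p-values were computed. One common choice, shown here purely as an assumption, is an empirical p-value against the random-attractor null. It is the fraction of random attractors whose best correlation is at least as high as the observed one, with a +1 correction so the estimate is never exactly zero. The function name and the null values in the usage are hypothetical.

```python
import numpy as np

def empirical_p_value(observed, null_values):
    """(count of null values >= observed + 1) / (number of null values + 1)."""
    null = np.asarray(null_values, dtype=float)
    return (np.count_nonzero(null >= observed) + 1) / (null.size + 1)

# Usage with a tiny hypothetical null distribution of best correlations
null = [0.1, 0.2, 0.3, 0.4]
p_mid = empirical_p_value(0.35, null)
p_high = empirical_p_value(0.5, null)
```

With 10,000 random attractors, the smallest attainable value under this scheme is 1/10,001.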