Characterization of cancer-driving nucleotides (CDNs) across genes, cancer types, and patients

  1. Lingjie Zhang
  2. Tong Deng
  3. Zhongqi Liufu
  4. Xiangnyu Chen
  5. Shijie Wu
  6. Xueyu Liu
  7. Changhao Shi
  8. Bingjie Chen
  9. Zheng Hu
  10. Qichun Cai
  11. Chenli Liu
  12. Mengfeng Li
  13. Miles E Tracy
  14. Xuemei Lu
  15. Chung-I Wu  Is a corresponding author
  16. Hai-Jun Wen  Is a corresponding author
  1. State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, China
  2. Center for Excellence in Animal Evolution and Genetics, The Chinese Academy of Sciences, China
  3. GMU-GIBH Joint School of Life Sciences, Guangzhou Medical University, China
  4. CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, China
  5. Cancer Center, Clifford Hospital, Jinan University, China
  6. Cancer Research Institute, School of Basic Medical Sciences, Southern Medical University, China
  7. Department of Ecology and Evolution, University of Chicago, United States
8 figures, 5 tables and 1 additional file

Figures

Mutations in organismal evolution vs. cancer evolution.

(A, B) A hypothetical example of DNA sequence evolution in organisms vs. in cancer, with the same number of mutations. (C) Mutation distribution in two species in the organismal evolution of (A). (D, E) Two possible patterns of mutation distribution among 10 sequences in cancer evolution. (F) Another pattern of mutation distribution in cancer evolution, with a recurrent site but too few total mutations. Mutations of (F) are cancer-driving nucleotides (CDNs) missed in the conventional screens.

ΔUi analysis across six cancer types.

ΔUi, ranging between 0 and 1 (Tang et al., 2004; Chen et al., 2019a), is a measure of physico-chemical differences among the 20 amino acids (see the text). The most similar amino acids have ΔUi near 0 and the most dissimilar ones have ΔUi near 1. Each panel corresponds to one cancer type, with each horizontal bar representing the ΔUi distribution of one recurrence group. The numbers on the left of each panel are i values and those on the right are the numbers of sites. Note that the proportion of dark red segments increases as i increases. This figure shows that mutations at high-recurrence sites (larger i values) code for amino acids that are chemically very different from the wild type.

Distribution of cancer-driving nucleotides (CDNs) among genes.

(A) Out of 119 CDN-carrying genes (red bars), 87 have only one CDN. Among the rest, TP53 has the most CDNs, and three other genes have more than 10 CDNs each. (B) CDN number in TP53 among patients. The dark bars show the observed number of patients carrying the number of CDNs given on the X-axis. The gray bars show the expected patient distribution. Clearly, TP53 only needs to contribute one CDN to drive tumorigenesis. Hence, TP53 (and other canonical driver genes; see text), while prevalent, does not contribute disproportionately to the tumorigenesis of each patient.

Sharing of cancer-driving nucleotides (CDNs) across cancer types.

The X-axis shows imax, which is the largest i a CDN reaches among the 12 cancer types. The Y-axis shows the number of cancer types where the mutation also occurs. Each dot is a CDN, and the number of dots in the cloud is given. The blue and red dots denote, respectively, mutations classified as a CDN in one or multiple cancer types. Gray dots are non-CDNs. The table in the lower panel summarizes the number of sites and the number of genes harboring these sites.

Survival analysis of non-small cell lung cancer (NSCLC) patients based on EGFR mutation status.

Patient data were retrieved from the GENIE database (https://genie-public-beta.cbioportal.org/) and stratified into three groups based on EGFR mutation profiles: group 1 comprises patients with EGFR CDN mutations; group 2 includes patients with nonsynonymous mutations in EGFR that are not cancer-driving nucleotides (CDNs); the EGFRWT group consists of patients with no EGFR mutations (see ‘Methods’). Patients of groups 1 and 2 received EGFR-targeted therapies in accordance with the guidelines for managing EGFR mutant NSCLC (Passaro et al., 2022; Choudhury et al., 2023). Survival analysis using the Kaplan–Meier method revealed a significantly higher survival rate for group 1 patients than for group 2 and the EGFRWT group (p<0.001).
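As a reminder of how a Kaplan–Meier curve is built, the following is a minimal sketch of the product-limit estimator in plain Python. The function name `kaplan_meier` and the toy follow-up times are assumptions made for illustration only; they are not the GENIE data or the authors' analysis code, and the group comparison behind the reported p-value is not covered here.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each distinct event time.

    durations: follow-up time of each patient
    events:    1 if death was observed at that time, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e == 1)
    exits = Counter(durations)      # every patient leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the fraction surviving past time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]         # drop deaths and censored patients alike
    return curve

# Hypothetical toy data in months (NOT from the GENIE database)
durations = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"t = {t:>2}  S(t) = {s:.3f}")
```

Censored patients contribute to the at-risk count until they drop out but never trigger a step in the curve, which is why the estimate changes only at observed death times.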

Appendix 1—figure 1
The overlap of cancer drivers from IntOGen, Bailey et al. and CGC Tier 1 (Bailey et al., 2018; Sondka et al., 2018; Martínez-Jiménez et al., 2020).

Driver genes (dots) for 12 cancer types were extracted from each driver list, indicated by three different region colors. The area size of each region is proportional to the gene number, with 384 genes for IntOGen, 168 for Bailey et al. and 137 for CGC Tier 1. Genes with a significant positive selection signal in the merged mutation set are marked in red, while nonsignificant ones are colored in blue. Notably, genes shared across the three driver sets are largely those with a significant Ka/Ks > 1.

Appendix 2—figure 1
Noncanonical cancer driver genes (CDGs) in colon and lung cancer along with associated biological processes (Y-axis).

For each gene, we examine its annotation results from GO analysis and search for cancer-related evidence in the literature. Biological processes are summarized and curated in relation to cancer hallmarks. Each connection between gene ID and biological process is depicted by a blue block in the grid.

Appendix 2—figure 2
Top 10 noncanonical cancer driver genes (CDGs) with the highest enrichment records with IntOGen’s driver list from four enrichment analyses.

Panels (A–D) correspond to Gene Ontology, KEGG, Disease Ontology, and Reactome analyses, respectively. The X-axis represents the number of enrichment records for each gene, while genes are listed on the Y-axis according to their enrichment record number. Genes are colored by the number of analyses in which they appear in the top set: red (three hits), blue (two hits), and black (one hit).

Tables

Table 1
Mutation recurrences (Ais and Sis) in 12 cancer types.
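| | Lung | Breast | Central nervous system | Kidney | Upper aerodigestive tract | Colon | Endometrium | Prostate | Stomach | Urinary tract | Ovary | Liver | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Patients # | 1035 | 963 | 873 | 711 | 688 | 571 | 465 | 465 | 423 | 404 | 404 | 367 | 614 |
| *A0 | 22,540,623 | 21,683,136 | 20,783,835 | 22,247,653 | 21,580,444 | 20,601,026 | 20,766,001 | 21,300,810 | 20,892,755 | 21,628,918 | 22,278,124 | 22,618,059 | 21,576,782 |
| *S0 | 7,804,281 | 9,388,418 | 10,298,911 | 8,781,483 | 9,333,283 | 10,428,913 | 10,375,596 | 9,754,331 | 10,243,634 | 9,426,888 | 8,746,002 | 8,255,268 | 9,403,084 |
| A/S_0 | 2.89 | 2.31 | 2.02 | 2.53 | 2.31 | 1.98 | 2.00 | 2.18 | 2.04 | 2.29 | 2.55 | 2.74 | 2.29 |
| A1 | 195958 | 44696 | 25122 | 25669 | 66924 | 94634 | 78870 | 9583 | 78834 | 66153 | 21138 | 25731 | 61109 |
| S1 | 69393 | 16732 | 10182 | 9317 | 26151 | 38606 | 31982 | 3613 | 32538 | 26546 | 7227 | 9398 | 23474 |
| A/S_1 | 2.82 | 2.67 | 2.47 | 2.76 | 2.56 | 2.45 | 2.47 | 2.65 | 2.42 | 2.49 | 2.92 | 2.74 | 2.60 |
| A2 | 2946 | 233 | 287 | 56 | 489 | 1662 | 1052 | 29 | 1176 | 816 | 51 | 46 | 737 |
| S2 | 969 | 62 | 75 | 11 | 159 | 736 | 386 | 9 | 489 | 308 | 9 | 12 | 249 |
| A/S_2 | 3.04 | 3.76 | 3.83 | 5.09 | 3.08 | 2.26 | 2.73 | 3.22 | 2.40 | 2.65 | 5.67 | 3.83 | 2.74 |
| A3 | 99 | 18 | 42 | 14 | 28 | 91 | 52 | 6 | 79 | 60 | 9 | 9 | 42.3 |
| S3 | 21 | 2 | 6 | 1 | 5 | 28 | 11 | 0 | 14 | 9 | 0 | 0 | 8.08 |
| A/S_3 | 4.71 | 9 | 7 | 14 | 5.6 | 3.25 | 4.73 | 6:0 | 5.64 | 6.67 | 9:0 | 9:0 | 5.23 |
| Ai ≥3 | 178 | 51 | 84 | 18 | 77 | 148 | 142 | 14 | 124 | 100 | 26 | 23 | 82.1 |
| Ai ≥4 | 79 | 33 | 42 | 4 | 49 | 57 | 90 | 8 | 45 | 40 | 17 | 14 | 39.8 |
| A4 | 23 | 10 | 8 | 2 | 14 | 23 | 21 | 3 | 23 | 11 | 4 | 3 | 11.1 |
| A5 | 16 | 6 | 10 | 2 | 10 | 6 | 20 | 2 | 9 | 9 | 3 | 5 | 8.2 |
| A6-9 | 27 | 10 | 10 | 0 | 13 | 9 | 32 | 2 | 7 | 12 | 6 | 2 | 10.8 |
| A[10, 20) | 7 | 3 | 10 | 0 | 9 | 11 | 9 | 1 | 6 | 5 | 4 | 4 | 5.75 |
| A≥20 | 6 | 4 | 4 | 0 | 3 | 8 | 8 | 0 | 0 | 3 | 0 | 0 | 3 |
| Total | 202828 | 45669 | 26596 | 25841 | 68387 | 98931 | 81898 | 9706 | 81678 | 68297 | 21387 | 25944 | 63097 |
| SiteNbr | 22739705 | 21728116 | 20809328 | 22273396 | 21647934 | 20697470 | 20846065 | 21310436 | 20972889 | 21695987 | 22299339 | 22643859 | 21638710 |
| nE(u) | 9.07E-03 | 1.79E-03 | 1.00E-03 | 1.06E-03 | 2.83E-03 | 3.84E-03 | 3.15E-03 | 3.72E-04 | 3.27E-03 | 2.88E-03 | 8.28E-04 | 1.14E-03 | 2.6E-03 |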
  1. * See ‘Methods’ for the calculations of A0 and S0.

  2. Ai and Si are as defined in the text.

  3. ‘Total’ represents the total number of missense mutations, or Σ i·Ai (summed over i ≥ 1). ‘Site number’ (SiteNbr) refers to the count of missense sites. nE(u) is calculated based on synonymous mutations, representing the expected number of neutral mutations per site in a population of size n.
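The A/S rows of Table 1 can be recomputed from the Ai and Si counts. The sketch below does this for the Lung column and shows the case where Si = 0, for which no finite ratio exists; this is presumably why the table prints entries such as 6:0 rather than a number. The helper name `a_over_s` is an assumption for illustration and does not come from the article.

```python
# Lung column of Table 1: recurrence class i -> (A_i, S_i)
lung = {1: (195958, 69393), 2: (2946, 969), 3: (99, 21)}

# Prostate, class i = 3: six missense sites and no synonymous site
prostate_a3, prostate_s3 = 6, 0

def a_over_s(a_sites, s_sites):
    """Return A_i / S_i, or None when S_i = 0 and the ratio is undefined."""
    return None if s_sites == 0 else a_sites / s_sites

lung_ratios = {i: round(a_over_s(a, s), 2) for i, (a, s) in lung.items()}
print(lung_ratios)                          # compare with A/S_1, A/S_2, A/S_3 for Lung
print(a_over_s(prostate_a3, prostate_s3))   # None: reported as 6:0 in the table

# Consistency of the pooled classes for Lung: A(i>=3) = A3 + A(i>=4)
assert 99 + 79 == 178
```

The rising A/S ratio with i is the pattern the table is meant to convey: the higher the recurrence, the larger the share of missense sites relative to synonymous ones.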

Table 2
Excess of Ais of each i class.
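| Recurrences | Lung | Breast | Central nervous system | Kidney | Upper aerodigestive tract | Colon | Endometrium | Prostate | Stomach | Urinary tract | Ovary | Liver |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| *A1_o | 195958 | 44696 | 25122 | 25669 | 66924 | 94634 | 78870 | 9583 | 78834 | 66153 | 21138 | 25731 |
| *A1_e | 198627 | 38586 | 20532 | 23582 | 60316 | 76049 | 63860 | 7888 | 66194 | 60751 | 18396 | 25720 |
| Excess | -2669 | 6110 | 4590 | 2087 | 6608 | 18585 | 15010 | 1695 | 12640 | 5402 | 2742 | 11 |
| Ratio (%) | -1.36 | 13.67 | 18.27 | 8.13 | 9.87 | 19.64 | 19.03 | 17.69 | 16.03 | 8.17 | 12.97 | 0.04 |
| A2_o | 2946 | 233 | 287 | 56 | 489 | 1662 | 1052 | 29 | 1176 | 816 | 51 | 46 |
| A2_e | 1750 | 69 | 20 | 25 | 169 | 280 | 196 | 3 | 210 | 171 | 15 | 29 |
| Excess | 1195.61 | 164.36 | 266.72 | 31.01 | 320.48 | 1381.54 | 855.77 | 26.08 | 966.42 | 645.41 | 35.81 | 16.75 |
| Ratio (%) | 40.58 | 70.54 | 92.93 | 55.37 | 65.54 | 83.13 | 81.35 | 89.93 | 82.18 | 79.09 | 70.22 | 36.42 |
| A3_o | 99 | 18 | 42 | 14 | 28 | 91 | 52 | 6 | 79 | 60 | 9 | 9 |
| A3_e | 15.43 | 0.12 | 0.02 | 0.03 | 0.47 | 1.03 | 0.60 | 0.00 | 0.66 | 0.48 | 0.01 | 0.03 |
| Excess | 83.57 | 17.88 | 41.98 | 13.97 | 27.53 | 89.97 | 51.40 | 6.00 | 78.34 | 59.52 | 8.99 | 8.97 |
| Ratio (%) | 84.42 | 99.32 | 99.95 | 99.81 | 98.32 | 98.86 | 98.84 | 99.98 | 99.16 | 99.20 | 99.86 | 99.63 |
| A4_o | 23 | 10 | 8 | 2 | 14 | 23 | 21 | 3 | 23 | 11 | 4 | 3 |
| A4_e | 0.13593 | 0.00022 | 1.98E-05 | 2.81E-05 | 0.00132 | 0.00381 | 0.00185 | 4.00E-07 | 0.00210 | 0.00135 | 1.04E-05 | 3.78E-05 |
| Excess | 22.8641 | 9.99978 | 7.99998 | 1.99997 | 13.9987 | 22.9962 | 20.9981 | 3 | 22.9979 | 10.9987 | 3.99999 | 2.99999 |
| Ratio (%) | 99.41 | 100 | 100 | 100 | 99.99 | 99.98 | 99.99 | 100.00 | 99.99 | 99.99 | 100 | 100.00 |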
  1. * The notation of ‘o’ and ‘e’ following Ais represents the observed Ai and expected Ai.

  2. See ‘Methods’ for the calculation of expected Ais.

  3. Ratio is the proportion of observed sites in excess, that is, the proportion of putative CDNs in the observation.
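The Excess and Ratio rows of Table 2 follow from the observed and expected counts: Excess = Ai_o − Ai_e, and Ratio = Excess / Ai_o. The sketch below recomputes a few entries; the function name `excess_and_ratio` is hypothetical. Because the table prints rounded expected values, entries recomputed from it can differ from the printed ones in the last digit.

```python
def excess_and_ratio(observed, expected):
    """Return (excess sites, excess as a percentage of the observed sites)."""
    excess = observed - expected
    return excess, 100.0 * excess / observed

# (observed, expected) pairs taken from Table 2
lung_a1 = (195958, 198627)
breast_a1 = (44696, 38586)
lung_a4 = (23, 0.13593)

for label, (obs, exp) in [("Lung A1", lung_a1),
                          ("Breast A1", breast_a1),
                          ("Lung A4", lung_a4)]:
    excess, ratio = excess_and_ratio(obs, exp)
    print(f"{label}: excess = {excess:.5g}, ratio = {ratio:.2f}%")
```

At i = 1 the excess is small or even negative, whereas at i = 4 nearly every observed site is in excess of the neutral expectation, which is the basis for treating high-recurrence sites as putative CDNs.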

Table 3
Distribution of cancer-driving nucleotides (CDNs) among genes.
| CDN calls based on i*=3 | Lung | Breast | Central nervous system | Upper aerodigestive tract | Colon | Endometrium | Mean | Total | Overlap with the conventional set | Criteria of classification |
|---|---|---|---|---|---|---|---|---|---|---|
| # of patients (n) | 1035 | 963 | 873 | 688 | 571 | 465 | - | - | - | |
| CDN count | 178 | 50 | 83 | 77 | 148 | 142 | 113.3 | 495 | - | |
| # CDN-carrying genes (type I fulfills the convention of Ka/Ks > 1**; type II does not) | | | | | | | | | | |
| Type I (Ka/Ks >1**) | 10 | 8 | 12 | 13 | 10 | 21 | 12.33 | 45 | 95.7% | Conventional |
| Type II (Ka/Ks ~1) | 79 | 9 | 12 | 19 | 86 | 35 | 40 | 229 | 26.1% | This study only |
| All CDN genes | 89 | 17 | 24 | 32 | 96 | 56 | 52.33 | 258 | 47% | Both types |
| Genes with 1–2 CDNs (% all CDN genes) | 80 (89.9%) | 14 (82.4%) | 19 (79.2%) | 27 (84.4%) | 90 (93.8%) | 45 (80.4%) | 45.8 (85%) | 250 (96.9%) | | A subset of both types |
| Number of driver genes in three major CDG lists* | | | | | | | | | | Other criteria: variable (see legends) |
| IntOGen | 118 | 100 | 100 | 106 | 86 | 72 | 97 | 321 | | |
| Bailey et al. | 36 | 29 | 32 | 38 | 20 | 55 | 35 | 134 | | |
| CGC Tier 1 | 30 | 32 | 32 | 24 | 44 | 23 | 30.83 | 118 | | |
  1. * IntOGen, Bailey et al., and CGC Tier 1 are the three major CDG lists adopted here for comparison (Bailey et al., 2018; Sondka et al., 2018; Martínez-Jiménez et al., 2020).

  2. ‘Total’ refers to the cumulative number of unique genes identified across all six cancer types.

  3. Here, ** denotes significant Ka/Ks results with a corrected q-value < 0.1 based on dndscv analysis.

Table 4
Numbers of patients with cancer-driving nucleotides (CDNs) vs. number of patients with any non-synonymous mutations in the same genes.
| | Lung: CDN* (178) | Lung: Gene (89) | Breast: CDN (50) | Breast: Gene (17) | Central nervous system: CDN (83) | Central nervous system: Gene (24) | Upper aerodigestive tract: CDN (77) | Upper aerodigestive tract: Gene (32) | Colon: CDN (148) | Colon: Gene (96) | Endometrium: CDN (142) | Endometrium: Gene (56) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| n0 | 342 (33%) § | 53 (5.3%) | 492 (51.1%) | 415 (43.1%) | 235 (26.9%) | 163 (18.7%) | 268 (39%) | 140 (20.3%) | 102 (17.9%) | 42 (7.4%) | 42 (9%) | 14 (3%) |
| n1 | 411 (39.7%) | 70 (6.8%) | 379 (39.4%) | 395 (41%) | 359 (41.1%) | 306 (35.1%) | 268 (39%) | 229 (33.3%) | 159 (27.8%) | 79 (13.8%) | 108 (23.2%) | 59 (12.7%) |
| n2 | 192 (18.6%) | 84 (8.1%) | 73 (7.6%) | 114 (11.8%) | 225 (25.8%) | 293 (33.6%) | 101 (14.7%) | 171 (24.9%) | 140 (24.5%) | 93 (16.3%) | 169 (36.3%) | 101 (21.7%) |
| n>2 | 90 (8.7%) | 826 (79.8%) | 18 (1.9%) | 38 (3.9%) | 53 (6.1%) | 110 (12.6%) | 50 (7.3%) | 147 (21.4%) | 170 (29.8%) | 357 (62.5%) | 146 (31.4%) | 291 (62.6%) |
| Total n | 1035 | 1035 | 963 | 963 | 873 | 873 | 688 | 688 | 571 | 571 | 465 | 465 |
| Mean # | 1.06 | 7.19 | 0.61 | 0.78 | 1.12 | 1.44 | 0.93 | 1.63 | 1.96 | 4.6 | 2.17 | 3.7 |
  1. * ni designates the number of patients with i CDN mutations.

  2. The number in the parentheses is the total number of CDNs or genes.

  3. In the Gene columns, ni designates the number of patients with any nonsynonymous mutation in the same genes as in the CDN column.

  4. § There are 684 CDNs summed over all cancer types. The percentage is ni/Total n.
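The percentages in Table 4 are ni/Total n, as the footnote states. The sketch below recomputes them for the Lung CDN column and also derives a lower bound on the mean number of CDNs per patient by counting every patient in the n>2 class as carrying exactly three. That bound is an illustration added here, not a calculation from the article, and the variable names are assumptions.

```python
# Lung, CDN column of Table 4: patients carrying 0, 1, 2 or more than 2 CDNs
lung_cdn = {"n0": 342, "n1": 411, "n2": 192, "n>2": 90}
total_n = 1035
reported_mean = 1.06   # 'Mean #' entry for the Lung CDN column

# Percentage of patients in each class (ni / Total n)
pct = {k: round(100 * v / total_n, 1) for k, v in lung_cdn.items()}
print(pct)

# Lower bound on the mean: assume each 'n>2' patient has exactly 3 CDNs
lower_bound = (1 * lung_cdn["n1"] + 2 * lung_cdn["n2"] + 3 * lung_cdn["n>2"]) / total_n
print(round(lower_bound, 3))
assert lower_bound <= reported_mean
```

The reported mean sits just above this bound, which suggests that few lung cancer patients carry many more than three CDNs.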

Table 5
Gene numbers for different cancer hallmarks.
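| Hallmark | Gene number: All records | Gene number: Breast | Gene number: Colon |
|---|---|---|---|
| Angiogenesis | 78 | 8 | 6 |
| Cell division control | 107 | 12 | 10 |
| Cell replicative immortality | 44 | 4 | 3 |
| Change of cellular energetics | 70 | 10 | 4 |
| Escaping immune response to cancer | 51 | 1 | 1 |
| Escaping programmed cell death | 202 | 32 | 20 |
| Genome instability and mutations | 106 | 10 | 7 |
| Invasion and metastasis | 206 | 52 | 27 |
| Proliferative signaling | 176 | 40 | 20 |
| Senescence | 48 | 3 | 5 |
| Suppression of growth | 130 | 11 | 12 |
| Tumor-promoting inflammation | 54 | 2 | 3 |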
  1. Data downloaded from COSMIC (https://cancer.sanger.ac.uk/cosmic/download), see ‘Methods’.

Cite this article

Lingjie Zhang, Tong Deng, Zhongqi Liufu, Xiangnyu Chen, Shijie Wu, Xueyu Liu, Changhao Shi, Bingjie Chen, Zheng Hu, Qichun Cai, Chenli Liu, Mengfeng Li, Miles E Tracy, Xuemei Lu, Chung-I Wu, Hai-Jun Wen (2024)
Characterization of cancer-driving nucleotides (CDNs) across genes, cancer types, and patients
eLife 13:RP99341.
https://doi.org/10.7554/eLife.99341.3