Population structure analyses performed on the extended dataset including Tibetan, Sherpa, and lowland East-Asian individuals.

(A) Admixture analysis showed the best predictive accuracy when seven population clusters (K=7) were tested. Populations included in the dataset are labelled with the population names and acronyms reported in Supplement table 1. (B) Map showing the geographic location and the admixture proportions at K=7 of the high-altitude groups included in the extended dataset. The label Tibetans_WG indicates the whole-genome sequence data for individuals of Tibetan ancestry analysed in the present study. Additional information about the considered samples (e.g., number of individuals per group, reference study, and abbreviations used) is reported in Supplement table 1. (C) PCA plot of PC1 vs PC2 summarizing the genomic divergence between high-altitude Tibetan/Sherpa people and the cline of variation observed in lowland East-Asian populations. The enlarged square shows the clustering of Tibetan samples sequenced for the whole genome (i.e., blue dots) with Tibetan samples characterized by genome-wide data (i.e., light-blue squares).
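The PCA summarized in panel (C) can be sketched as below. This is a minimal illustration only. It assumes a fully called genotype matrix coded as 0/1/2 alternate-allele counts and a standard centering and scaling of each SNV before singular value decomposition. The function name `genotype_pca` and the simulated toy data are hypothetical, and the software and filtering settings actually used in the study may differ.

```python
import numpy as np


def genotype_pca(genotypes: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Return PC scores (n_individuals, n_components) for a genotype matrix.

    genotypes: (n_individuals, n_snvs) array of 0/1/2 alternate-allele counts,
    assumed to be fully called (no missing values).
    """
    g = np.asarray(genotypes, dtype=float)
    # Allele frequency of each SNV, estimated from the pooled sample.
    p = g.mean(axis=0) / 2.0
    # Drop monomorphic SNVs: they carry no information and would divide by zero.
    keep = (p > 0.0) & (p < 1.0)
    g, p = g[:, keep], p[keep]
    # Center each SNV by 2p and scale by its binomial standard deviation.
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    # Thin SVD of the normalized matrix: U * S gives the individual PC scores.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy usage with simulated data (not study data): two groups with very
# different allele frequencies should separate along PC1.
rng = np.random.default_rng(0)
group_a = rng.binomial(2, 0.1, size=(20, 500))
group_b = rng.binomial(2, 0.9, size=(20, 500))
scores = genotype_pca(np.vstack([group_a, group_b]))
print(scores.shape)  # (40, 2)
```

The sign of each PC is arbitrary in SVD, so only the relative placement of groups along PC1 and PC2 is meaningful.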

Distribution of VolcanoFinder statistics suggestive of putative adaptively introgressed loci across the EP300 and NOS2 genomic regions.

The x-axis reports the genomic position of each SNV, while the y-axis displays the corresponding statistics. The pink background indicates the chromosomal interval occupied by the considered genes, while the grey background identifies genes/pseudogenes possibly involved in transcriptional regulatory mechanisms (i.e., EP300AS1 and L3MBTL2 in the region downstream of EP300 and the LGALS9DP pseudogene in the region upstream of NOS2). The dashed red line marks the threshold used to filter for significant LR values. (A) Non-normalized LR (blue diamonds) and -logα (grey diamonds) values were collectively elevated across the entire EP300 genomic region, which also contained a considerable number of SNVs with significant normalized LR values (red stars). (B) Both non-normalized LR and -logα scores were elevated in the region upstream of NOS2, which also contained a remarkable number of SNVs with significant normalized LR values.
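The filtering step marked by the dashed threshold line amounts to selecting SNVs inside a gene interval whose normalized LR exceeds a cut-off. The sketch below assumes the VolcanoFinder output has already been parsed into position/score pairs. The `SnvScore` type, the `significant_in_interval` helper, the threshold value and the toy coordinates are all hypothetical, and VolcanoFinder itself is not called here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SnvScore:
    position: int          # genomic coordinate in bp
    lr_normalized: float   # normalized VolcanoFinder LR statistic


def significant_in_interval(
    snvs: List[SnvScore], threshold: float, start: int, end: int
) -> List[SnvScore]:
    """Return SNVs inside [start, end] whose normalized LR exceeds `threshold`.

    A strict '>' comparison is an assumption; the caption does not say whether
    values equal to the threshold count as significant.
    """
    return [
        s for s in snvs
        if start <= s.position <= end and s.lr_normalized > threshold
    ]


# Toy values (not study data) for a hypothetical gene interval at 1000-2000 bp.
snv_scores = [
    SnvScore(900, 12.0),   # outside the interval
    SnvScore(1200, 3.5),   # inside, below threshold
    SnvScore(1500, 9.1),   # inside, above threshold
    SnvScore(1999, 15.4),  # inside, above threshold
]
hits = significant_in_interval(snv_scores, threshold=5.0, start=1000, end=2000)
print([s.position for s in hits])  # [1500, 1999]
```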

Significant gene networks enriched for Denisovan-like derived alleles according to the Signet analysis.

(A) Partially overlapping significant networks belonging to the Cancer-related and Relaxin signalling pathways, including genes that mediate key functions in the modulation of cell proliferation and differentiation, the promotion of angiogenesis, and the regulation of vascular tone through NO induction. Among them, EP300 and genes functionally related to both the EP300 and NOS2 candidate AI loci are highlighted in red. Genes supported by both Signet and VolcanoFinder analyses as potentially introgressed loci, as well as the associations they establish in the networks, are highlighted in blue. (B) Gene network built by setting co-expression as the force function and by displaying the entire set of genes identified by the Signet algorithm as significantly enriched for Denisovan-like derived alleles. Genes whose variation pattern was supported by both VolcanoFinder and Signet analyses as shaped by archaic introgression (e.g., MAPK1) are displayed as dark red ellipses. EP300 and NOS2, which we shortlisted as the most convincing candidate AI loci according to the Signet approach, VolcanoFinder analyses, and evidence from previous studies supporting their adaptive role in high-altitude populations, are displayed as yellow diamonds. EPAS1, which was manually added to the network as a positive control (a locus previously shown to have undergone adaptive introgression in Tibetan and Sherpa populations), is represented as a pale pink rectangle. Genes included in pathways involved in angiogenesis and/or in the modulation of NO induction are shown as dark green circles, while the remaining significant genes are shown as light-blue circles. The distance between nodes reflects their tendency to be co-expressed, and all inferred connections have a confidence score ≥ 0.7.
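The ≥ 0.7 criterion for retaining connections in panel (B) can be illustrated with a simple edge filter. This sketch assumes the gene-gene associations are available as (gene, gene, score) triples. The helper name, gene labels and scores are placeholders, and neither the Signet algorithm nor the co-expression layout is reproduced.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def confident_network(
    edges: Iterable[Tuple[str, str, float]], min_score: float = 0.7
) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map keeping edges with score >= min_score."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for gene_a, gene_b, score in edges:
        if score >= min_score:
            adjacency[gene_a].add(gene_b)
            adjacency[gene_b].add(gene_a)
    return dict(adjacency)


# Placeholder gene names and scores, for illustration only.
toy_edges = [
    ("GENE_A", "GENE_B", 0.92),
    ("GENE_B", "GENE_C", 0.45),  # dropped: below 0.7
    ("GENE_A", "GENE_C", 0.70),  # kept: the cut-off is inclusive
]
print(confident_network(toy_edges))
```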

Representation of genetic distances between modern and archaic haplotypes.

(A) Heatmap displaying the divergence of Tibetan and Han Chinese NOS2 haplotypes from the Denisovan sequence. Haplotypes are reported in rows, while derived (i.e., black squares) and ancestral (i.e., white squares) alleles are displayed in columns. Haplotypes are ranked from top to bottom according to their number of pairwise differences from the Denisovan sequence. A total of 14 haplotypes belonging to individuals with Tibetan ancestry are plotted in the upper part of the heatmap (i.e., the first quartile of the haplotype distribution). These haplotypes account for 33% of the haplotypes inferred for Tibetans and show 32 or fewer pairwise differences from the Denisovan sequence. (B-D) Barplots showing the number and cumulative proportion of NOS2, EPAS1 and EGLN1 Tibetan and Han Chinese haplotypes in each haplotype class, defined according to the number of pairwise differences between modern and archaic sequences. In all plots, the x-axis reports the haplotype classes, while the primary and secondary y-axes indicate the number of haplotypes in each class (i.e., blue/red bars) and the cumulative proportion of haplotypes (i.e., blue/red lines), respectively. (B) NOS2 shows a pattern qualitatively comparable to that of EPAS1, with 46% of haplotype classes presenting a greater cumulative proportion of TIB haplotypes (i.e., blue line) than of CHB haplotypes (i.e., red line). (C) The EPAS1 plot represents the trend expected for genes involved in AI events mediated by hard selective sweeps, in which TIB haplotypes (i.e., blue bars) are over-represented in the haplotype classes with the fewest pairwise differences from the Denisovan sequence. Consistently, the cumulative proportion of Tibetan haplotypes is higher than that of CHB haplotypes in all the haplotype classes considered.
(D) At EGLN1, all haplotype classes showed a higher cumulative proportion of CHB haplotypes than of Tibetan haplotypes, with Han Chinese samples over-represented in the haplotype classes with few differences from the Denisovan sequence, as expected for genes not involved in AI events.
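The haplotype classes and cumulative proportions plotted in panels (B-D) can be sketched as follows. The sketch assumes phased haplotypes restricted to the same sites as the archaic sequence and coded as strings of '1' (derived) and '0' (ancestral) alleles. Defining one class per observed number of pairwise differences is an assumption (the study may bin classes differently), and all function names and toy haplotypes are hypothetical.

```python
from collections import Counter
from itertools import accumulate
from typing import List, Sequence, Tuple


def n_differences(haplotype: str, archaic: str) -> int:
    """Number of sites at which a modern haplotype differs from the archaic one."""
    if len(haplotype) != len(archaic):
        raise ValueError("haplotypes must cover the same sites")
    return sum(a != b for a, b in zip(haplotype, archaic))


def class_counts_and_cumulative(
    haplotypes: Sequence[str], archaic: str, classes: Sequence[int]
) -> Tuple[List[int], List[float]]:
    """Per-class haplotype counts and cumulative proportions, classes ascending."""
    counts = Counter(n_differences(h, archaic) for h in haplotypes)
    per_class = [counts.get(c, 0) for c in classes]
    total = len(haplotypes)
    cumulative = [running / total for running in accumulate(per_class)]
    return per_class, cumulative


def share_of_classes_higher(cum_a: Sequence[float], cum_b: Sequence[float]) -> float:
    """Fraction of classes where population A's cumulative proportion exceeds B's."""
    return sum(a > b for a, b in zip(cum_a, cum_b)) / len(cum_a)


# Toy haplotypes coded as '1' = derived, '0' = ancestral (not study data).
archaic = "1111"
tib = ["1111", "1110", "1100"]
chb = ["1100", "1000", "0000"]
# One class per observed number of differences, pooled across both groups.
classes = sorted({n_differences(h, archaic) for h in tib + chb})
_, cum_tib = class_counts_and_cumulative(tib, archaic, classes)
_, cum_chb = class_counts_and_cumulative(chb, archaic, classes)
print(share_of_classes_higher(cum_tib, cum_chb))  # 0.8
```

Sorting haplotypes with `key=lambda h: n_differences(h, archaic)` gives the top-to-bottom ranking used in a heatmap such as panel (A).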