Dynamic molecular evolution of a supergene with suppressed recombination in white-throated sparrows

  1. Hyeonsoo Jeong
  2. Nicole M Baran
  3. Dan Sun
  4. Paramita Chatterjee
  5. Thomas S Layman
  6. Christopher N Balakrishnan
  7. Donna L Maney (corresponding author)
  8. Soojin V Yi (corresponding author)
  1. School of Biological Sciences, Georgia Institute of Technology, United States
  2. Department of Psychology, Emory University, United States
  3. Department of Ecology, Evolution, Marine Biology, University of California, Santa Barbara, United States
  4. Department of Medicine Huddinge, Karolinska Institutet, Sweden
  5. Department of Biology, East Carolina University, United States
4 figures, 2 tables and 8 additional files

Figures

Figure 1 with 2 supplements
Genomic data from newly sequenced tan- and white-striped birds.

(A) Nucleotide diversity of macro-chromosomes for tan-striped (TS) and white-striped (WS) birds. White-striped birds (ZAL2/2m) show elevated nucleotide diversity for the ZAL2/2m inverted (INV, i.e. rearranged) regions (ZAL2/2m inv), while TS birds (ZAL2/2) show overall reduced nucleotide diversity for the inverted regions compared with other chromosomes. Note that panel (A) shows the comparison between morphs. The comparison between the ZAL2 and ZAL2m alleles is shown in Figure 2a. (B) Scatterplots of eigenvector 1 (PC1) and eigenvector 2 (PC2) from principal component analysis of all single-nucleotide variants (left panel). (C) Principal component analysis (PCA) excluding single nucleotide polymorphisms (SNPs) on the ZAL2 chromosomes (right panel). The sex chromosomes and the ZAL3 chromosome (which includes an additional chromosomal inversion) were excluded from both PCAs. Note that ‘location’ here refers to the site of collection or capture of the bird: Georgia (GA), Illinois (IL), or Maine (ME). Breeding locations for GA and IL birds are unknown.
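As a reading aid for panel (A), nucleotide diversity (π) is conventionally defined as the average number of pairwise differences per site among sampled sequences. The sketch below computes it for a toy alignment; the function name and the toy haplotypes are illustrative assumptions, and it is not the pipeline used to produce the figure.

```python
from itertools import combinations


def nucleotide_diversity(haplotypes):
    """Average pairwise differences per site among aligned haplotypes."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be aligned to the same length")

    total_diffs = 0
    n_pairs = 0
    for a, b in combinations(haplotypes, 2):
        # Count mismatching positions for this pair of sequences.
        total_diffs += sum(1 for x, y in zip(a, b) if x != y)
        n_pairs += 1
    return total_diffs / (n_pairs * length)


# Toy data (hypothetical): three aligned haplotypes of 8 bp.
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
pi = nucleotide_diversity(toy)
print(pi)
```

On real data, π would be computed per window from called variants and the number of callable sites rather than from full alignments.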

Figure 1—source data 1

Nucleotide diversity between tan- and white-striped birds.

Figure 1B and C: Supplementary file 1 (PCAs performed using variant call format (vcf) data from whole genome sequencing).

https://cdn.elifesciences.org/articles/79387/elife-79387-fig1-data1-v1.txt
Figure 1—figure supplement 1
The number of informative sites inside the ZAL2m rearrangement differed between morphs.

The number of informative sites in tan- vs white-striped birds is shown for the four largest chromosomes (macrochromosomes), computed using the same number of samples from tan-striped (TS) and white-striped (WS) birds (n=13 each). ZAL2 inv = inverted region in ZAL2 or ZAL2m; ZAL2 non-inv = non-inverted region in ZAL2 or ZAL2m.

Figure 1—figure supplement 2
Admixture tests showed no population substructure by geographic sampling location.

Inferred ancestral population fractions are shown for each bird as estimated by ADMIXTURE (K=2 to K=6 possible populations) for birds of each genotype and from different sampling locations. ADMIXTURE was run using all single nucleotide polymorphisms (SNPs) in the genome, excluding SNPs that met any of the following criteria: MAF <0.01, missing >20%, located inside the additional chromosomal polymorphism on ZAL3, or located in sex chromosomes. Note that ‘geographic location’ here refers to the site of collection or capture of the bird. Breeding locations for the GA and IL birds were unknown.
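The exclusion criteria listed above can be expressed as a simple per-SNP predicate. The following is a minimal sketch under stated assumptions: the `Snp` record, the chromosome labels, and the ZAL3 interval coordinates are hypothetical placeholders, not values from the study; only the two thresholds (MAF <0.01, missing >20%) come from the caption.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    chrom: str           # chromosome label (hypothetical naming)
    pos: int             # 1-based position
    maf: float           # minor allele frequency
    missing_rate: float  # fraction of samples with no genotype call


# Hypothetical labels and coordinates, used only for illustration.
SEX_CHROMS = {"Z", "W"}
ZAL3_POLYMORPHISM = ("ZAL3", 1_000_000, 5_000_000)


def keep_snp(snp, maf_min=0.01, missing_max=0.20):
    """Return True if the SNP passes all four exclusion criteria."""
    if snp.maf < maf_min:
        return False
    if snp.missing_rate > missing_max:
        return False
    if snp.chrom in SEX_CHROMS:
        return False
    chrom, start, end = ZAL3_POLYMORPHISM
    if snp.chrom == chrom and start <= snp.pos <= end:
        return False
    return True


snps = [
    Snp("ZAL1", 100, 0.25, 0.05),        # kept
    Snp("ZAL1", 200, 0.005, 0.05),       # dropped: MAF too low
    Snp("ZAL1", 300, 0.25, 0.30),        # dropped: too much missing data
    Snp("Z", 400, 0.25, 0.05),           # dropped: sex chromosome
    Snp("ZAL3", 2_000_000, 0.25, 0.05),  # dropped: inside ZAL3 polymorphism
]
kept = [s for s in snps if keep_snp(s)]
print(len(kept))
```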

Figure 2 with 1 supplement
Genetic divergence between ZAL2 and ZAL2m chromosomes.

(A) The scaffolds for the ZAL2m chromosome in the super-white (SWS) assembly tend to be more fragmented than those for the ZAL2 chromosome in the tan-striped (TS) assembly. ** p<0.001 (Mann-Whitney U-test); ns, not significant. (B) Fraction of structural variants (SV), both insertion and deletion events, for the four largest chromosomes, using the tan-striped assembly as a reference. The fraction of SV is computed as the total number of bases affected by variants divided by the length of the chromosome. (C) Number of fixed mutations derived in ZAL2 and ZAL2m in protein-coding regions. (D) Sliding window (window size of 20 genes with step size of 5 genes) analysis of the ratio of nonsynonymous to synonymous nucleotide diversity (πN/πS) within the ZAL2 and ZAL2m chromosomes. The ZAL2m outlier region is highlighted (colored background). (E) Site frequency spectrum of polymorphic sites. (F) Decay of linkage disequilibrium. (G) Proportion of the ZAL2m alleles expressed for each tissue set. The proportion of the ZAL2m alleles expressed is less than the null expectation of 0.5 for all tissues except nestling AMV after false discovery rate (FDR) correction. Hyp, hypothalamus; AMV, ventromedial arcopallium.
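The gene-based sliding window in panel (D) can be sketched as follows. The window size (20 genes) and step (5 genes) come from the caption; how the ratio is aggregated within a window is an assumption here (sum of nonsynonymous diversity over sum of synonymous diversity), as are the function name and the toy values.

```python
def sliding_pi_ratio(genes, window=20, step=5):
    """Compute a windowed piN/piS ratio over genes ordered along a chromosome.

    genes: list of (pi_n, pi_s) tuples, one per gene.
    Returns a list of (start_index, ratio); ratio is None if pi_s sums to 0.
    """
    results = []
    for start in range(0, len(genes) - window + 1, step):
        chunk = genes[start:start + window]
        pi_n = sum(g[0] for g in chunk)
        pi_s = sum(g[1] for g in chunk)
        # Assumption: ratio of sums, not mean of per-gene ratios.
        ratio = pi_n / pi_s if pi_s > 0 else None
        results.append((start, ratio))
    return results


# Toy data (hypothetical): 30 genes with identical diversity values.
toy_genes = [(0.001, 0.002)] * 30
windows = sliding_pi_ratio(toy_genes)
print(windows)
```

Using the ratio of sums avoids undefined values for individual genes with no synonymous polymorphism, which is one reason it is a common choice for windowed estimates.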

Figure 2—figure supplement 1
Allelic bias in expression was associated with the number of non-synonymous fixed differences.

Allelic bias in expression for each gene, averaged across sequencing batch and tissue (see Table 1, Materials and methods), is plotted by (A) the rate of fixed differences per base that are non-synonymous, (B) the rate of fixed differences per base that are synonymous, or (C) the number of fixed differences within 1 kb upstream of the transcription start site. Only the rate of non-synonymous fixed differences was associated with allelic bias in gene expression (χ²(1)=9.97, p=0.00159).

Figure 3 with 5 supplements
Genetic diversity and patterns of divergence across the rearranged region of the ZAL2m chromosome and in the ZAL2m outlier region.

(A) Tajima’s D and nucleotide diversity across the ZAL2 and ZAL2m chromosomes. The ZAL2m outlier region is highlighted (colored background). (B) Phylogenetic tree of randomly selected regions (left panel) and the ZAL2m outlier region (right panel). The ZAL2m chromosome shows multiple haplotype structures and has longer branch lengths within the population compared with ZAL2 chromosomes. (C) Single nucleotide polymorphism (SNP) genotype plot of a scaffold inside the ZAL2m outlier region (Scaffold NW_005189516.1, 1900001–1950001). The plot shows two haplogroups. Major allele SNPs (A, same genotype as the super-white ZAL2m/2m genome) are represented in purple, and minor allele SNPs (a, different from the super-white genome) in red. Tan indicates that there were no fixed SNPs to differentiate ZAL2 vs ZAL2m reads, resulting in missing data. (D) Genetic divergence (dXY) for a portion of the rearrangement. dXY between the ZAL2 chromosome and haplogroup 1 (H1) is plotted in light blue, between ZAL2 and haplogroup 2 (H2) in dark blue, and between H1 and H2 in light green.
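For panel (D), dXY is conventionally the average number of differences per site between one sequence drawn from each of two groups. The sketch below computes it for toy aligned haplotypes; the function name, group labels, and sequences are illustrative assumptions rather than the study's data or code.

```python
def dxy(group_x, group_y):
    """Average per-site differences between all cross-group sequence pairs."""
    if not group_x or not group_y:
        raise ValueError("both groups must contain at least one sequence")
    length = len(group_x[0])
    if any(len(s) != length for s in group_x + group_y):
        raise ValueError("all sequences must be aligned to the same length")

    total_diffs = 0
    for a in group_x:
        for b in group_y:
            total_diffs += sum(1 for p, q in zip(a, b) if p != q)
    return total_diffs / (len(group_x) * len(group_y) * length)


# Toy data (hypothetical): two ZAL2-like and two H1-like haplotypes of 8 bp.
zal2_toy = ["ACGTACGT", "ACGTACGT"]
h1_toy = ["ACGAACGA", "ACGAACGT"]
value = dxy(zal2_toy, h1_toy)
print(value)
```

The same function applies to any pair of groups, for example ZAL2 vs H2 or H1 vs H2.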

Figure 3—source data 1

RAxML bipartitions for scaffold 5189516.

Figure 3A: Supplementary file 1 (Tajima’s D and nucleotide diversity plots created from variant call format (vcf) data from whole genome sequencing).

https://cdn.elifesciences.org/articles/79387/elife-79387-fig3-data1-v1.txt
Figure 3—source data 2

RAxML bipartitions for scaffold 5190802.

https://cdn.elifesciences.org/articles/79387/elife-79387-fig3-data2-v1.txt
Figure 3—source data 3

Genotype data for scaffold 5189516.

https://cdn.elifesciences.org/articles/79387/elife-79387-fig3-data3-v1.txt
Figure 3—source data 4

dXY between ZAL2m haplotypes and ZAL2.

https://cdn.elifesciences.org/articles/79387/elife-79387-fig3-data4-v1.txt
Figure 3—figure supplement 1
No evidence of introgression in ZAL2m outlier region.

Phylogenetic tree of ZAL2m outlier region using only the exons of single-copy orthologous genes.

Figure 3—figure supplement 2
The D-statistic did not vary by haplotype.

50 kb sliding window estimates of the D-statistic resulting from ABBA-BABA tests using as ingroup genomes the ZAL2 chromosome, the Harris' sparrow chromosome 2, and the ZAL2m chromosome (P1, P2, and P3, respectively). We used the medium ground finch as the outgroup species (P4). Data from the ZAL2m outlier region are plotted for four individual birds.
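The D-statistic from an ABBA-BABA test is commonly computed as D = (ABBA − BABA) / (ABBA + BABA), where the site patterns are read in the order P1, P2, P3, P4 and the outgroup (P4) defines the ancestral allele. The sketch below counts these patterns for a single window using one haplotype per population; this simplification, the function name, and the toy sequences are assumptions, and the study's own estimates may have used allele frequencies instead.

```python
def d_statistic(p1, p2, p3, p4):
    """Patterson's D from four aligned haplotypes (P4 is the outgroup)."""
    abba = 0
    baba = 0
    for a, b, c, o in zip(p1, p2, p3, p4):
        if len({a, b, c, o}) != 2:
            continue  # use biallelic sites only
        if a == o and b == c and b != o:
            abba += 1  # P2 and P3 share the derived allele
        elif b == o and a == c and a != o:
            baba += 1  # P1 and P3 share the derived allele
    if abba + baba == 0:
        return None  # no informative sites in this window
    return (abba - baba) / (abba + baba)


# Toy data (hypothetical): 3 ABBA sites, 1 BABA site, 2 uninformative sites.
p1_toy = "AATAAA"
p2_toy = "TTAATA"
p3_toy = "TTTTTA"
p4_toy = "AAAAAA"
d = d_statistic(p1_toy, p2_toy, p3_toy, p4_toy)
print(d)
```

A positive D indicates excess allele sharing between P2 and P3, a negative D excess sharing between P1 and P3, and values near zero are consistent with no introgression.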

Figure 3—figure supplement 3
No difference in sequencing depth between haplotypes.

Boxplot of the average sequencing depth for individuals of haplotype 1 (H1) and haplotype 2 (H2).

Figure 3—figure supplement 4
The ZAL2m outlier region exhibited an excess of intermediate frequency minor alleles.

Site frequency spectra of polymorphic sites inside the ZAL2m outlier region are shown for both the ZAL2 and ZAL2m chromosomes. (A) shows all ZAL2/ZAL2m-linked single nucleotide polymorphisms (SNPs) and (B) excludes singleton SNPs.
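A folded (minor-allele) site frequency spectrum, with and without singletons as in panels (A) and (B), can be tabulated as in the sketch below. The input encoding (one string of 0/1 alleles per SNP), the function name, and the toy data are assumptions made for illustration.

```python
from collections import Counter


def minor_allele_sfs(snp_columns, drop_singletons=False):
    """Count SNPs by minor allele count.

    snp_columns: one string per SNP, each character '0' or '1' for one
    sampled chromosome.
    """
    sfs = Counter()
    for column in snp_columns:
        ones = column.count("1")
        minor = min(ones, len(column) - ones)
        if minor == 0:
            continue  # monomorphic in this sample
        if drop_singletons and minor == 1:
            continue
        sfs[minor] += 1
    return dict(sorted(sfs.items()))


# Toy data (hypothetical): five sites sampled on six chromosomes.
toy_sites = ["100000", "110000", "111000", "011111", "000000"]
sfs_all = minor_allele_sfs(toy_sites)
sfs_no_singletons = minor_allele_sfs(toy_sites, drop_singletons=True)
print(sfs_all, sfs_no_singletons)
```

An excess of intermediate-frequency minor alleles would appear as inflated counts in the higher minor-allele-count bins relative to the low ones.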

Figure 3—figure supplement 5
Neither sex nor geographic location of sample collection produced distinct patterns between haplogroups.

Scatterplots of eigenvector 1 (PC1) and eigenvector 2 (PC2) from principal component analysis of all single-nucleotide variants outside the ZAL2/2m inversion are shown. Colors show the haplogroup of the sample. In the left panel, shape indicates the sex of the sample and in the right panel, shape indicates the geographic sampling location. Note that the GA and IL birds were captured during migration, so their breeding location was unknown. SNP: single nucleotide polymorphisms.

Figure 4 with 3 supplements
Evidence for antagonistic selection driving ZAL2 and ZAL2m gene expression in the brain.

(A) shows the percentage of differentially expressed genes that reside inside the rearranged region on ZAL2 vs elsewhere in the genome. The percentage of differentially expressed genes inside vs outside the rearranged region of ZAL2 is higher than expected by chance (padj < 2.2 × 10⁻¹⁶ for all comparisons). (B) shows log2 ZAL2m expression ratios for genes that were more highly expressed in white-striped birds (W>T), genes more highly expressed in tan-striped birds (T>W), and those that do not significantly differ between morphs (T=W). (C) Log2 ZAL2m expression ratios are plotted vs the log2 ZAL2m H-statistic for each category of sample. Hyp, hypothalamus; AMV, ventromedial arcopallium. (D) Log2 ZAL2m expression ratios are plotted vs the log2 ZAL2 H-statistic.

Figure 4—source data 1

Percent of Differentially Expressed genes on ZAL2 vs rest of genome.

https://cdn.elifesciences.org/articles/79387/elife-79387-fig4-data1-v1.txt
Figure 4—source data 2

RNAseq allele specific expression data for brain in long format merged with morph bias and H-scan values.

https://cdn.elifesciences.org/articles/79387/elife-79387-fig4-data2-v1.txt
Figure 4—figure supplement 1
Genetic differentiation between ZAL2 and ZAL2m is reduced at the ends of the chromosomal arms.

Plots show the population differentiation in allele frequency (FST) between tan and white birds, the number of nucleotide substitutions per site (dXY) between ZAL2 and ZAL2m, and density of fixed differences (df) between ZAL2 and ZAL2m inside the rearranged region.

Figure 4—figure supplement 2
Both ZAL2 and ZAL2m have experienced selective sweeps.

The imputed p-values of the H-statistic (a measure of homozygosity, computed in 20 kb windows) are plotted along the position on ZAL2 inside the inversion for (A) ZAL2 and (B) ZAL2m. Colors refer to alternating scaffolds. A candidate region showing elevated H-statistics in four 20 kb windows on both ZAL2 and ZAL2m (Scaffold NW_005081582.1, 480–520 kb and 920–960 kb) is highlighted in blue. A ~6 Mbp region showing a long stretch of elevated H-statistics on ZAL2 is highlighted in red.

Figure 4—figure supplement 3
Population recombination rates in ZAL2 are likely higher in the chromosome end.

FastEprr was used to estimate the population recombination parameter, rho, for each genome assembly contig (50 kbp non-overlapping sliding window).

Tables

Table 1
List of RNA sequencing data sets.
| | Tissue | Sample size (WS/total) | Collection details | Source |
|---|---|---|---|---|
| Adult males | Brain (Hyp, AMV) | 9/20 | Collected early in the breeding season | Zinzow-Kramer et al., 2015; Sun et al., 2018; Accession: GSE77186 |
| Adult females | Brain (Hyp, AMV) | 6/11 | Collected early in the breeding season | Accession: PRJNA657006 |
| Nestlings (both sexes) | Brain (Hyp, AMV) | 16/32 | Collected from nests during the breeding season | Accession: PRJNA657006 |
| Adult males (all white-striped) | Heart and Liver | 20/20 | Collected during fall migration, then housed in captivity on either long or short days to simulate breeding vs non-breeding; collected at two time points during the day | Horton et al., 2019; Accession: GSE116989 |

Table 2
List of protein-coding genes inside the ZAL2m outlier region.
| Gene | Scaffold | Start | End | π ZAL2 | π ZAL2m | TaD ZAL2 | TaD ZAL2m | DXY |
|---|---|---|---|---|---|---|---|---|
| KCNS3 | NW_005081621.1 | 97089 | 110512 | 2.75E-04 | 8.43E-04 | -1.2953 | -0.8658 | 0.011281 |
| MSGN1 | NW_005081621.1 | 160375 | 160897 | NA | 6.10E-04 | NA | -0.1138 | 0.003056 |
| GEN1 | NW_005081621.1 | 175791 | 198245 | 3.52E-04 | 2.22E-03 | -1.1302 | 1.1728 | 0.011647 |
| SMC6 | NW_005081621.1 | 198452 | 244136 | 3.94E-04 | 2.27E-03 | -1.0088 | 0.8548 | 0.011901 |
| MYCN | NW_005081621.1 | 1179492 | 1184761 | 2.59E-04 | 6.18E-04 | -0.5898 | 0.2116 | 0.005173 |
| DDX1 | NW_005081621.1 | 1432697 | 1452535 | 3.41E-04 | 2.42E-03 | -1.5819 | 0.8752 | 0.014699 |
| NBAS | NW_005081621.1 | 1454601 | 1615580 | 2.93E-04 | 2.17E-03 | -1.6271 | 2.2291 | 0.012296 |
| TRIB2 | NW_005081621.1 | 2596178 | 2616950 | 3.45E-04 | 1.89E-04 | -1.0009 | -0.8376 | 0.011498 |
| LPIN1 | NW_005081621.1 | 3012153 | 3061203 | 2.73E-04 | 3.27E-04 | -1.7799 | -0.7252 | 0.015295 |
| GREB1 | NW_005081621.1 | 3100814 | 3165186 | 2.25E-04 | 1.72E-04 | -1.8299 | -0.7351 | 0.01458 |
| E2F6 | NW_005081582.1 | 24475 | 46577 | 1.90E-04 | 1.59E-04 | -1.5978 | -1.0219 | 1.4E-02 |
| ROCK2 | NW_005081582.1 | 50993 | 155170 | 2.62E-04 | 6.34E-04 | -1.46 | -0.2362 | 1.4E-02 |
| KCNF1 | NW_005081582.1 | 331424 | 333991 | 9.85E-05 | 2.41E-04 | -0.2519 | -0.8641 | 5.9E-03 |
| PDIA6 | NW_005081582.1 | 415853 | 431919 | 3.74E-04 | 1.49E-04 | -1.1579 | -1.2521 | 1.2E-02 |
| ATP6V1C2 | NW_005081582.1 | 431766 | 454886 | 2.56E-04 | 6.94E-04 | -1.7535 | -0.2391 | 1.4E-02 |

TaD, Tajima's D.

Cite this article

Hyeonsoo Jeong, Nicole M Baran, Dan Sun, Paramita Chatterjee, Thomas S Layman, Christopher N Balakrishnan, Donna L Maney, Soojin V Yi (2022) Dynamic molecular evolution of a supergene with suppressed recombination in white-throated sparrows. eLife 11:e79387. https://doi.org/10.7554/eLife.79387