Copy number variation and population-specific immune genes in the model vertebrate zebrafish

  1. Yannick Schäfer
  2. Katja Palitzsch
  3. Maria Leptin
  4. Andrew R Whiteley
  5. Thomas Wiehe (corresponding author)
  6. Jaanus Suurväli (corresponding author)
  1. Institute for Genetics, University of Cologne, Germany
  2. WA Franke College of Forestry and Conservation, University of Montana, United States
  3. Department of Biological Sciences, University of Manitoba, Canada
4 figures, 6 tables and 1 additional file

Figures

Figure 1
Structure of zebrafish NLRs and a map showing the origin of wild zebrafish samples.

(A) Generalized, schematic representation of the domain architecture of an NLR-C protein. Each box represents a translated exon. The N-terminal repeats, the death-fold domain, as well as the B30.2 domain only occur in subsets of NLR-C genes. The number of N-terminal repeats and leucine-rich repeats can vary. Domains that can be either present or absent in different NLRs are surrounded by square brackets. (B) Sampling sites for wild zebrafish. All sites are located near the Bay of Bengal. Final sequenced sample sizes are indicated in parentheses. The map is based on geographic data collected and published by AQUASTAT from the Food and Agriculture Organization of the United Nations (FAO, 2021). The population DP is marked with an asterisk because its analysis and results are presented only in figure supplements.

Figure 2 with 3 supplements
Total counts of NLRs found per individual, shown for each population.

Black diamonds on the box plots denote means, horizontal lines denote medians. Left side: two laboratory strains; right side: three wild populations.

Figure 2—figure supplement 1
Sequencing and assembly statistics of circular consensus sequence (CCS) reads from NLR exons.

(A1) Absolute numbers of CCS reads from NLR exons per sequenced individual. (A2) Lengths of the CCS reads that map to NLR genes. (B1) Absolute numbers of assembled contigs containing an NLR exon per sequenced individual. Triangles above TU mark the numbers of NLR exons found in the reference genome. (B2) Lengths of the individual assembled contigs that contain an NLR exon. Outliers are not shown in the box plots. Black diamonds on the box plots denote means, horizontal black lines denote medians.

Figure 2—figure supplement 2
Assembled NLRs in the reference genome GRCz11.

(A) Proportions of unique FISNA-NACHT and NLR-B30.2 sequences that were successfully mapped to the reference genome GRCz11 with a mapping quality of 60, by population. (B) Distribution of mapping qualities for all unique NLR sequences that aligned to GRCz11, showing that most map either with very high (60) or very low quality.

Figure 2—figure supplement 3
Identification of B30.2 domains associated with zebrafish NLRs.

(A) Nucleotide sequence logo (top, continued at bottom left) and amino acid logo (bottom right) for a small, highly conserved 47 bp exon that precedes B30.2 in zebrafish NLRs but not in other genes. The first nucleotide of the exon was removed to obtain the correct amino acid translation. Logos were created with WebLogo (Crooks et al., 2004); the height of each base represents its information content in bits. (B) Absolute numbers of contigs containing a B30.2 exon per sequenced individual, split by presence or absence of the NLR-specific exon. Black diamonds on the box plots mark the means. (C) Genomic distribution of FISNA-NACHT domains in the GRCz11 reference genome. (D) Genomic distribution of B30.2 domains in the GRCz11 reference genome.
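Panel (A) follows the usual sequence-logo convention: a column's information content is log2 of the alphabet size minus its Shannon entropy, and each letter's height is its frequency times that value. A minimal sketch of the per-column calculation for a DNA alignment is shown below. It applies no small-sample correction, and the function names and toy alignment are hypothetical, not taken from the study.

```python
import math
from collections import Counter


def column_information(column: str, alphabet: str = "ACGT") -> tuple[float, dict[str, float]]:
    """Information content (bits) and per-letter stack heights for one alignment column.

    Information = log2(alphabet size) - Shannon entropy of the observed letters.
    Each letter's height = its frequency * information. No small-sample correction.
    """
    counts = Counter(ch for ch in column.upper() if ch in alphabet)
    total = sum(counts.values())
    if total == 0:
        return 0.0, {}
    freqs = {base: n / total for base, n in counts.items()}
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    info = math.log2(len(alphabet)) - entropy
    return info, {base: p * info for base, p in freqs.items()}


def logo_columns(alignment: list[str]) -> list[tuple[float, dict[str, float]]]:
    """Apply column_information to every position of equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    return [column_information("".join(seq[i] for seq in alignment)) for i in range(length)]


# Hypothetical toy alignment, not the zebrafish exon
toy = ["ACGT", "ACGA", "ACTC", "ACAG"]
for position, (bits, heights) in enumerate(logo_columns(toy), start=1):
    print(position, round(bits, 3), {b: round(h, 3) for b, h in heights.items()})
```

A fully conserved column reaches the maximum of 2 bits, while a column with all four bases equally frequent contributes nothing to the logo.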

Figure 3 with 2 supplements
Copy number variation of NLR genes.

(A) Sequence data from each individual zebrafish (vertical axis) were aligned to FISNA-NACHT exon sequences of the pan-NLRome (horizontal axis). Grayscale intensity shows, for each NLR, the proportion of NLR-aligning data in each fish that matches this specific gene. Darker gray indicates a higher likelihood that this NLR is present in multiple copies in that individual; light gray indicates a single copy, and white indicates absence. For clarity, only the 1235 FISNA-NACHT exons for which at least one fish had a minimum of 10 mapped reads are shown. (B) Numbers of pan-NLRome sequences (based on FISNA-NACHT diagnosis) found in all three, two, or only one wild population. (C) Relative numbers of fish in which pan-NLRome sequences were found in wild populations. ‘Core’ pan-NLRome: genes found in at least 80% of the sample (from a total of 57 wild fish); ‘shell’: genes found in at least 20%; ‘cloud’: rare genes found in less than 20% of the sample. (D) Observed and estimated sizes of population-specific pan-NLRomes. Data points (filled circles and squares) show the average total number of NLR genes (identified via their FISNA-NACHT domain) discovered when investigating x fish. The dashed line is obtained by a non-linear fit of the data to the function given in Equation 2. For all populations, the hypothetical pan-NLRome size obtained by extrapolating x to infinity is finite (see Table 1).
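The core/shell/cloud categories in panel (C) depend only on the fraction of wild fish carrying each pan-NLRome sequence. A minimal sketch of that classification follows, assuming presence/absence is already available as a mapping from sequence ID to the set of fish carrying it (a hypothetical input format; the IDs below are invented for illustration).

```python
def classify_pan_nlrome(carriers: dict[str, set[str]], n_fish: int,
                        core_min: float = 0.8, shell_min: float = 0.2) -> dict[str, str]:
    """Label each sequence 'core', 'shell' or 'cloud' by the fraction of fish carrying it.

    carriers maps a pan-NLRome sequence ID to the set of fish in which it was detected.
    core: >= core_min of fish; shell: >= shell_min (but below core_min); cloud: the rest.
    """
    if n_fish <= 0:
        raise ValueError("n_fish must be positive")
    labels = {}
    for seq_id, fish in carriers.items():
        fraction = len(fish) / n_fish
        if fraction >= core_min:
            labels[seq_id] = "core"
        elif fraction >= shell_min:
            labels[seq_id] = "shell"
        else:
            labels[seq_id] = "cloud"
    return labels


# Hypothetical example with 57 wild fish, the sample size given in panel (C)
fish = [f"fish{i:02d}" for i in range(57)]
carriers = {
    "nlr_a": set(fish[:50]),  # 50/57, about 88% -> core
    "nlr_b": set(fish[:12]),  # 12/57, about 21% -> shell
    "nlr_c": set(fish[:3]),   # 3/57, about 5% -> cloud
}
print(classify_pan_nlrome(carriers, n_fish=len(fish)))
```

Keeping the thresholds as parameters makes it easy to check how sensitive the category sizes are to the 80% and 20% cut-offs.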

Figure 3—figure supplement 1
Comparison of copy number variation in FISNA-NACHT and NLR-B30.2 exons.

(A1, A2) Numbers of private and shared NLR sequences in wild populations. (B1, B2) Numbers of unique NLR sequences (each with one or more copies per individual) found in fish of the sequenced strains. Black diamonds on the box plots denote means, horizontal lines denote medians. (C1, C2) Population-specific pan-NLRomes and sets of NLR-B30.2 domains. Data points (filled circles and squares) show the average total number of NLR genes (as identified via their FISNA-NACHT domain) discovered in x individuals. The dotted lines show the result of non-linear curve fitting (detailed in ‘Materials and methods’).
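Counting private and shared sequences, as in panels (A1, A2), reduces to set operations over the sequences detected in each wild population. A minimal sketch with hypothetical sequence IDs (the population labels are those of the study; the contents are invented):

```python
from collections import Counter


def sharing_spectrum(by_population: dict[str, set[str]]) -> dict[int, int]:
    """Number of sequences found in exactly 1, 2, ... populations."""
    occurrences = Counter(seq for seqs in by_population.values() for seq in seqs)
    return dict(sorted(Counter(occurrences.values()).items()))


def private_sequences(by_population: dict[str, set[str]]) -> dict[str, set[str]]:
    """Sequences detected in one population and in none of the others."""
    private = {}
    for pop, seqs in by_population.items():
        others = set().union(*(s for p, s in by_population.items() if p != pop))
        private[pop] = seqs - others
    return private


# Hypothetical sequence IDs per wild population
wild = {
    "KG": {"a", "b", "c", "d"},
    "SN": {"a", "b", "e"},
    "CHT": {"a", "c", "f"},
}
print(sharing_spectrum(wild))  # {1: 3, 2: 2, 3: 1}
print(private_sequences(wild))
```

Here one sequence is shared by all three populations, two by exactly two, and each population holds one private sequence.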

Figure 3—figure supplement 2
Copy number variation of NLR genes, including the DP population.

(A) Sequence data from each individual zebrafish (x-axis) were aligned to FISNA-NACHT exon sequences of the pan-NLRome (y-axis). Grayscale intensity shows, for each NLR, the proportion of NLR-aligning data in each fish that matches this specific gene. Darker shades suggest that the NLR may be present in multiple copies; lighter shades indicate a single copy, and white means that the sequence was not present. For clarity, only the 1235 FISNA-NACHT exons for which at least one fish had at least 10 mapped reads are shown. (B) Relative numbers of fish in which the pan-NLRome sequences were found in wild populations. Sequences are classified as core (found in at least 80% of fish), shell (at least 20%) or cloud (less than 20%). (C) Numbers of unique NLR sequences (each with one or more copies per individual) found in fish of the sequenced strains. Black diamonds on the box plot denote means, horizontal lines denote medians. (D, E) Principal component analysis of NLR (FISNA-NACHT) copy numbers scaled per individual. The first two components appear to separate the data by differences between wild and laboratory zebrafish (PC1) and by geographic distance (PC2).
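The grayscale values in panel (A) are, for each fish, the share of its NLR-aligned reads that map to a given pan-NLRome sequence. Only sequences with at least 10 reads in at least one fish are displayed. A minimal sketch of that normalization and display filter follows, assuming a hypothetical read-count table keyed by fish and then by sequence. Computing each fish's denominator over all of its NLR-aligned reads, before filtering, is our reading of the caption.

```python
def nlr_proportions(read_counts: dict[str, dict[str, int]], min_reads: int = 10):
    """Per-fish proportions of NLR-aligned reads, restricted to well-covered sequences.

    read_counts[fish][seq] = reads from `fish` aligned to pan-NLRome sequence `seq`.
    Denominators use all NLR-aligned reads of a fish; the min_reads filter only
    decides which sequences are reported (kept if any fish has >= min_reads).
    """
    all_seqs = sorted({seq for counts in read_counts.values() for seq in counts})
    kept = [seq for seq in all_seqs
            if any(counts.get(seq, 0) >= min_reads for counts in read_counts.values())]
    proportions = {}
    for fish, counts in read_counts.items():
        total = sum(counts.values())
        proportions[fish] = {seq: (counts.get(seq, 0) / total if total else 0.0) for seq in kept}
    return kept, proportions


# Hypothetical counts for two fish
counts = {
    "fish1": {"n1": 30, "n2": 10, "n3": 2},
    "fish2": {"n1": 20, "n3": 5},
}
kept, props = nlr_proportions(counts)
print(kept)            # ['n1', 'n2']  (n3 never reaches 10 reads)
print(props["fish2"])  # {'n1': 0.8, 'n2': 0.0}
```

A value of 0.0 corresponds to the white cells (sequence absent); relatively large values are the dark cells that hint at extra copies.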

Figure 4 with 1 supplement
Single-nucleotide variation in NLR exons.

Pairwise nucleotide diversity (θπ) and Watterson’s estimator of the scaled mutation rate (θw) for FISNA-NACHT (A) and NLR-associated B30.2 (B) exons. (C) Proportion of exons without any single-nucleotide polymorphisms. (D) Ratio of θπ/θw. Only exons with at least one single-nucleotide polymorphism are shown. The dotted, horizontal line marks a ratio of 1, the expected value under neutrality and constant population size. The black diamonds on box plots denote means, horizontal lines denote medians.
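Both estimators in this figure can be computed from the allele count at each site of an exon. θπ is the mean number of pairwise differences, and θw is the number of segregating sites divided by a_n = 1 + 1/2 + ... + 1/(n−1). A minimal sketch follows, assuming n sampled sequences per exon and no missing data (both simplifications). It returns per-exon values; dividing by exon length would give per-base values.

```python
from typing import Optional


def watterson_a(n: int) -> float:
    """a_n = sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))


def theta_pi(allele_counts: list[int], n: int) -> float:
    """Mean pairwise differences: sum over sites of 2*j*(n-j) / (n*(n-1))."""
    return sum(2 * j * (n - j) / (n * (n - 1)) for j in allele_counts)


def theta_w(allele_counts: list[int], n: int) -> float:
    """Watterson's estimator: number of segregating sites / a_n."""
    segregating = sum(1 for j in allele_counts if 0 < j < n)
    return segregating / watterson_a(n)


def pi_w_ratio(allele_counts: list[int], n: int) -> Optional[float]:
    """theta_pi / theta_w; None for monomorphic exons (no segregating sites)."""
    tw = theta_w(allele_counts, n)
    return None if tw == 0 else theta_pi(allele_counts, n) / tw


# Hypothetical exon: 4 sequences, allele counts at three sites (last site monomorphic)
sites = [1, 2, 0]
print(theta_pi(sites, 4), theta_w(sites, 4), pi_w_ratio(sites, 4))
```

Returning None for monomorphic exons mirrors panel (D), where only exons with at least one polymorphism are shown. A ratio above 1 indicates more intermediate-frequency variation than expected under neutrality and constant population size.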

Figure 4—figure supplement 1
Single-nucleotide polymorphisms of different NLR exons shown by population, including DP.

(A) Nucleotide diversity (θπ) and Watterson estimator (θw) for FISNA-NACHT exons. (B) Nucleotide diversity (θπ) and Watterson estimator (θw) for NLR-associated B30.2 exons. (C) Proportion of exons which are completely monomorphic. (D) Ratio of θπ/θw. Only exons with at least one variant are shown. The black, dotted line marks a ratio of 1. The black diamonds on box plots denote means, horizontal lines denote medians.

Tables

Table 1
Values of fitted parameters and saturation limits for FISNA-NACHT and NLR-B30.2 exons, by population.
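| Population | FISNA-NACHT α | FISNA-NACHT β | FISNA-NACHT limit | FISNA-NACHT quantile* | NLR-B30.2 α | NLR-B30.2 β | NLR-B30.2 limit | NLR-B30.2 quantile* |
|---|---|---|---|---|---|---|---|---|
| TU | 178.274 | 1.43356 | 519.548 | 118 | 53.8579 | 1.40774 | 164.73 | 164 |
| CGN | 257.207 | 1.62786 | 569.367 | 23 | 78.7156 | 1.61283 | 177.246 | 25 |
| DP | 309.14 | 1.01231 | 25284 | 2930 | 69.3609 | 0.87454 | na | na |
| KG | 436.761 | 1.21522 | 2288.41 | 2060 | 145.715 | 1.14181 | 1113.23 | 6.41e6 |
| SN | 479.892 | 1.26093 | 2152.12 | 3907 | 145.548 | 1.10183 | 1514.35 | 3.75e9 |
| CHT | 416.712 | 1.18893 | 2451.81 | 1.12e5 | 135.677 | 1.11911 | 1218.54 | 1.41e8 |

\* Sample size required to capture 90% of the population’s pan-NLRome.

For DP, the required sample size refers to only 10% (instead of 90%) of its hypothetical pan-NLRome size.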
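Equation 2 is not reproduced in this section, so the following is an illustration under an explicit assumption, not the authors' fitting code. Suppose, hypothetically, that the k-th additional fish contributes on average α·k^(−β) new NLRs. The expected number discovered in x fish is then f(x) = α Σ_{k=1}^{x} k^(−β), and its limit as x → ∞ is α·ζ(β). That limit is finite only for β > 1, and the sample size needed to reach a given fraction of it does not depend on α. The sketch evaluates ζ as a direct sum plus an Euler–Maclaurin tail and finds the sample size by bisection. All function names are hypothetical.

```python
import math


def zeta_tail(n: int, s: float) -> float:
    """Euler-Maclaurin estimate of sum_{k=n+1}^inf k^(-s), valid for s > 1."""
    return n ** (1 - s) / (s - 1) - n ** (-s) / 2 + s * n ** (-s - 1) / 12


def zeta(s: float, head_terms: int = 1000) -> float:
    """Riemann zeta(s) for s > 1: exact head sum plus the Euler-Maclaurin tail."""
    if s <= 1:
        raise ValueError("the series diverges for s <= 1 (no finite limit)")
    head = math.fsum(k ** -s for k in range(1, head_terms + 1))
    return head + zeta_tail(head_terms, s)


def saturation_limit(alpha: float, beta: float) -> float:
    """Limit of the assumed curve f(x) = alpha * sum_{k<=x} k^(-beta) as x -> infinity."""
    return alpha * zeta(beta)


def sample_size(beta: float, fraction: float = 0.9) -> int:
    """Smallest x with f(x) >= fraction * limit under the assumed curve (alpha cancels)."""
    target_tail = (1 - fraction) * zeta(beta)
    lo, hi = 1, 1
    # Exponential search: every n <= old hi still misses the target.
    while zeta_tail(hi, beta) > target_tail:
        lo, hi = hi + 1, hi * 2
    # Bisection on the decreasing tail within [lo, hi].
    while lo < hi:
        mid = (lo + hi) // 2
        if zeta_tail(mid, beta) <= target_tail:
            hi = mid
        else:
            lo = mid + 1
    return lo


print(round(saturation_limit(178.274, 1.43356), 2), sample_size(1.43356))
```

With the TU FISNA-NACHT parameters from Table 1 (α = 178.274, β = 1.43356), this sketch gives a limit of about 519.55 and a 90% sample size of 118, in line with the tabulated values. That agreement does not establish that the assumed form is Equation 2.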

Key resources table

| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
|---|---|---|---|---|
| Strain (Danio rerio) | Cologne zebrafish; CGN; KOLN | Other | | 8 Cologne fish, AG Hammerschmidt, University of Cologne |
| Strain (D. rerio) | Tübingen zebrafish; TU | Other | | 8 Tübingen fish, AG Hammerschmidt, University of Cologne |
| Biological sample (D. rerio) | DP | Other | | 20 wild fish, Dandiapalli, India (22.22155, 84.79430) |
| Biological sample (D. rerio) | CHT | Other | | 20 wild fish, Chittagong, Bangladesh (22.47400, 91.78300) |
| Biological sample (D. rerio) | KG | Other | | 20 wild fish, Leturakhal, India (22.26189, 87.27881) |
| Biological sample (D. rerio) | SN | Other | | 20 wild fish, Santoshpur, India (22.93765, 88.55311) |
| Sequence-based reagent | Baits; RNA baits; hybridization baits | Daicel Arbor Biosciences | Cat# Mybaits-1-24 | Sequences available in Figure 2—source data 2 |
| Commercial assay or kit | MagAttract HMW DNA Kit | QIAGEN | Cat# 67563 | |
| Commercial assay or kit | NucleoSpin Tissue Kit | MACHEREY-NAGEL | Cat# 740952.50 | |
| Commercial assay or kit | NEBNext Ultra II DNA Library Prep Kit | New England Biolabs | Cat# E7645L | |
| Sequence-based reagent | NEBNext Multiplex Oligos for Illumina | New England Biolabs | Cat# E7335L | Index Primers Set 1 |
| Commercial assay or kit | Kapa HiFi Hotstart Readymix | Kapa Biosystems | Cat# 07958935001 | |
| Commercial assay or kit | PreCR Repair Mix | New England Biolabs | Cat# M0309L | |
| Commercial assay or kit | SMRTbell Template Prep Kit 1.0-SPv3 | Pacific Biosciences | Cat# 100-991-900 | |
| Other | GRCz11 | NCBI RefSeq | RefSeq:GCF_000002035.6 | Zebrafish reference genome |
| Other | M220 miniTUBE, Red | Covaris | Cat# 4482266 | Used to shear DNA on Covaris ultrasonicator |
| Other | DB MyOne Streptavidin C1 | Thermo Fisher Scientific | Cat# 65001 | Used to retrieve bait-bound DNA fragments |
| Other | AMPure XP | Beckman Coulter | Cat# A63881 | Size selection beads |
| Other | AMPure PB | Pacific Biosciences | Cat# 100-265-900 | PacBio-compatible size selection beads |
| Software, algorithm | lima | Pacific Biosciences | lima:v1.0.0; lima:v1.8.0; lima:v1.9.0; lima:v1.11.0 | |
| Software, algorithm | ccs | Pacific Biosciences | ccs:v4.2.0 | |
| Software, algorithm | pbmarkdup | Pacific Biosciences | pbmarkdup:v1.0.0 | |
| Software, algorithm | pbmm2 | Pacific Biosciences | pbmm2:v1.3.0 | |
| Software, algorithm | samtools | https://doi.org/10.1093/bioinformatics/btp352 | samtools:v1.7 | |
| Software, algorithm | EMBOSS | https://doi.org/10.1016/s0168-9525(00)02024-2 | EMBOSS:v6.6.0.0 | |
| Software, algorithm | HMMER | https://doi.org/10.1093/bioinformatics/btt403 | HMMER:v3.2.1 | |
| Software, algorithm | blastn | https://doi.org/10.1186/1471-2105-10-421 | blastn:v2.11.0+ | |
| Software, algorithm | hifiasm | https://doi.org/10.1038/s41592-020-01056-5 | hifiasm:v0.15.4-r347 | |
| Software, algorithm | get_homologues | https://doi.org/10.1128/AEM.02411-13 | get_homologues:x86_64-20220516 | |
| Software, algorithm | deepvariant | https://doi.org/10.1038/nbt.4235 | deepvariant:r1.0 | |
| Software, algorithm | GLnexus | https://doi.org/10.1101/343970 | Glnexus:v1.2.7-0-g0e74fc4 | |
| Software, algorithm | vcftools | https://doi.org/10.1093/bioinformatics/btr330 | vcftools:v0.1.16 | |

Appendix 1—table 1
Sequencing scheme for the zebrafish samples.

Libraries sequenced after the introduction of an improved (long run) sequencing chemistry are marked with LR. Samples that yielded no data after sequencing are marked with asterisks.

| Individuals | Library | Sequencer |
|---|---|---|
| TU01, TU02, TU03, TU06 | TU L1 | Sequel |
| TU08, TU10, TU12, TU14 | TU L2 | Sequel |
| CGN1, CGN2, CGN3, CGN4 | CGN L1 | Sequel |
| CGN5, CGN6, CGN7, CGN8 | CGN L2 | Sequel |
| DP07, DP09, DP10, DP12 | DP L1 | Sequel |
| DP15, DP20, DP23, DP24, DP25, DP28, DP31, DP34 | DP L2 | Sequel (LR) |
| DP03, DP05, DP13, DP16, DP21, DP29, DP31, DP33 | DP L3 | Sequel (LR) |
| KG35, KG41, KG42, KG43 | KG L1 | Sequel |
| KG03, KG05, KG07, KG12, KG14, KG15, KG18, KG19 | KG L2 | Sequel (LR) |
| KG20, KG22, KG24, KG26, KG29, KG32, KG33, KG44 | KG L3 | Sequel (LR) |
| SN21, SN23, (SN24*), SN26 | SN L1 | Sequel |
| SN03, SN04, SN08, SN09, SN10, SN11, SN12, SN24 | SN L2 | Sequel II (LR) |
| SN13, SN14, SN15, SN16, SN17, SN18, SN19, SN20 | SN L3 | Sequel II (LR) |
| CHT19, CHT23, CHT26, CHT28 | CHT L1 | Sequel |
| CHT01 - CHT07, (CHT13*) | CHT L2 | Sequel II (LR) |
| CHT08, CHT10 - CHT12, CHT14 - CHT16, (SN25*) | CHT L3 | Sequel II (LR) |

Appendix 1—table 2
PCR program used for barcoding.

For library amplification, the same program was used with 26 or 31 cycles.

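| Step | Temperature (°C) | Duration |
|---|---|---|
| Initialization | 98 | 4 min |
| Denaturation (×12 cycles) | 98 | 30 s |
| Annealing (×12 cycles) | 65 | 30 s |
| Elongation (×12 cycles) | 72 | 12 min |
| Final elongation | 72 | 20 min |
| Storage | 4 | |
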
Appendix 1—table 3
qPCR program for the evaluation of enrichment efficiency.

| Step | Temperature (°C) | Duration |
|---|---|---|
| Initialization | 95 | 12 min |
| Denaturation (×40 cycles) | 95 | 15 s |
| Annealing (×40 cycles) | 65 | 20 s |
| Elongation (×40 cycles) | 72 | 20 s |

Appendix 1—table 4
Sequences of qPCR primers used for evaluation of target enrichment.

| Gene | Direction | Sequence |
|---|---|---|
| il1 | + | 5’-tgg-tga-acg-tca-tca-tcg-cc-3’ |
| il1 | - | 5’-tcc-agc-acc-tct-ttt-tct-cca-a-3’ |
| foxo6 intron | + | 5’-agt-tct-gtg-tgg-gaa-cag-gg-3’ |
| foxo6 intron | - | 5’-gtg-cat-ctt-tag-cgt-tgg-ct-3’ |
| NLR group 1 | + | 5’-cct-gac-aca-ggt-caa-caa-aac-a-3’ |
| NLR group 1 | - | 5’-gat-tgt-ctt-ttc-ctt-cag-ccc-ag-3’ |
| NLR group 2 | + | 5’-tgg-att-ggg-ctg-aag-gga-aa-3’ |
| NLR group 2 | - | 5’-agg-ttc-agt-cct-tta-gtc-tct-gg-3’ |
| NLR group 3 | + | 5’-ctg-ctg-gag-gtg-aaa-gat-cag-ac-3’ |
| NLR group 3 | - | 5’-gat-tgt-tga-gca-gtg-agc-agg-a-3’ |
| NLR group 4 | + | 5’-tac-ctg-gac-aag-aca-aag-cca-3’ |
| NLR group 4 | - | 5’-ctc-ctt-ctc-ttc-agc-cca-gtc-3’ |

Cite this article

Yannick Schäfer, Katja Palitzsch, Maria Leptin, Andrew R Whiteley, Thomas Wiehe, Jaanus Suurväli (2024) Copy number variation and population-specific immune genes in the model vertebrate zebrafish. eLife 13:e98058. https://doi.org/10.7554/eLife.98058