Total number of sequences considered from different regions of the human genome. Sequence listings are available at https://github.com/rginiunaite/CGI-NMI-sequences.

Left column: locations of the Watson strand phosphates for 100 aligned nucleosome structures, projected onto a plane perpendicular to the nucleosome central axis. Top row: 100 experimental PDB nucleosome structures (not all with independent sequences). Red points mark phosphates at local minima of radial distance, which are used to identify bound indices. Bottom row: analogous data for 100 predicted minimal energy nucleosomal configurations for sequences drawn from human genome CpG islands. The phosphates with bound indices, which are constrained during the optimisation, are coloured red. Right column: standard deviations over sequence of the radial distance of all phosphates against index along the Watson strand. Top: PDB structures; bottom: model computations. Bound indices are marked with solid red vertical lines. Dashed black vertical lines mark indices of bound complementary (Crick) strand phosphates.
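The radial-distance criterion in this caption can be illustrated with a minimal sketch. It assumes that the central axis is known (here the z-axis through the origin, which is an assumption) and that a bound index is any interior strict local minimum; the helper names `radial_distances` and `local_minima` are hypothetical, and the exact criterion used for the figure is not stated in the caption.

```python
import math

def radial_distances(points, axis_point=(0.0, 0.0, 0.0), axis_dir=(0.0, 0.0, 1.0)):
    """Distance of each 3D point from a line (the assumed central axis)."""
    norm = math.sqrt(sum(c * c for c in axis_dir))
    u = tuple(c / norm for c in axis_dir)
    out = []
    for p in points:
        d = tuple(pc - ac for pc, ac in zip(p, axis_point))
        along = sum(dc * uc for dc, uc in zip(d, u))           # component along the axis
        perp = tuple(dc - along * uc for dc, uc in zip(d, u))  # in-plane component
        out.append(math.sqrt(sum(c * c for c in perp)))
    return out

def local_minima(values):
    """Interior indices whose value is strictly below both neighbours."""
    return [i for i in range(1, len(values) - 1)
            if values[i] < values[i - 1] and values[i] < values[i + 1]]

# Toy phosphate positions (hypothetical coordinates, not taken from any PDB entry).
radii = radial_distances([(3.0, 4.0, 7.0), (0.0, 2.0, -1.0), (6.0, 8.0, 0.0)])
candidate_bound_indices = local_minima(radii)
```

Noisy radial profiles would probably need smoothing or a minimum-separation rule before the minima are interpreted as bound indices; the sketch omits both.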

Spectra of nucleosome wrapping energies and logarithms of probability densities for the optimal nucleosomal configurations for 147 bp sequences (a,c) generated randomly and (b,d) drawn from the human genome, grouped by the indicated ranges of numbers of CpG dinucleotide steps: dots are averages, bars standard deviations over sequence. For methylated and hydroxymethylated data all CpG steps are symmetrically modified.
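The grouping behind these spectra can be sketched as follows, assuming each sequence already has a scalar value attached (an energy or a log probability density). The bin boundaries in the usage lines are placeholders, not the ranges used in the figure, and the population standard deviation is an assumed choice; `count_cpg` and `group_stats` are hypothetical helper names.

```python
from statistics import mean, pstdev

def count_cpg(seq):
    """Number of CpG dinucleotide steps (a C immediately followed by a G) on the given strand."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")

def group_stats(records, bins):
    """Group (sequence, value) pairs by CpG count.

    bins is a list of inclusive (lo, hi) ranges. Returns
    {bin: (mean, population standard deviation, group size)} for non-empty bins.
    """
    grouped = {b: [] for b in bins}
    for seq, value in records:
        n = count_cpg(seq)
        for lo, hi in bins:
            if lo <= n <= hi:
                grouped[(lo, hi)].append(value)
                break
    return {b: (mean(v), pstdev(v), len(v)) for b, v in grouped.items() if v}

# Placeholder data and bins, for illustration only.
records = [("ACGT", 1.0), ("ACGA", 3.0), ("CGCGCG", 10.0)]
stats = group_stats(records, bins=[(0, 1), (2, 5)])
```

Each dot and bar in the plot would then correspond to one `(mean, standard deviation)` pair.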

Effects of sequence context and epigenetic base modifications on the ground state shape of CpG steps. Bar plots of the ground state values of (a) the six inter base-pair step coordinates and (b) the six Watson phosphate coordinates for CpG steps i) averaged over sequence context, with standard deviations shown as thin lines, and ii) for the extreme case of poly(CpG) (hatched bars). In each case three versions are shown, corresponding to unmodified, methylated and hydroxymethylated steps. The standard deviations highlight the crucial role of non-local sequence dependence in the equilibrium structure of CpG/MpN/HpK steps. Analogous plots for the remaining intra base-pair coordinates and Crick phosphate coordinates are shown in Supplementary Figure S7.

Average number of instances of the 16 different dinucleotide steps for (a) 1000 random 147 bp sequences and for (b) our 147 bp human genome sequence ensemble, with [5, 14] CpGs (different colours correspond to fragments taken from different chromosomes). Dinucleotide steps are ordered next to their complements, with self-complementary steps listed on the right.
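A minimal sketch of the counting and ordering described in this caption, read along a single strand. The helper names are hypothetical, and the lexicographic order chosen within each complementary pair is an assumption; only the pairing of each step with its reverse complement, and the placement of the four self-complementary steps last, come from the caption.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(step):
    return "".join(COMPLEMENT[b] for b in reversed(step))

def ordered_steps():
    """All 16 steps: each step next to its reverse complement, self-complementary steps last."""
    paired, self_complementary = [], []
    for step in ("".join(p) for p in product("ACGT", repeat=2)):
        rc = reverse_complement(step)
        if rc == step:
            self_complementary.append(step)
        elif step < rc:  # emit each complementary pair once
            paired.extend([step, rc])
    return paired + self_complementary

def mean_step_counts(seqs):
    """Average number of occurrences of each dinucleotide step per sequence."""
    totals = dict.fromkeys(ordered_steps(), 0)
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - 1):
            step = seq[i:i + 2]
            if step in totals:  # skips steps containing ambiguity codes such as N
                totals[step] += 1
    return {step: totals[step] / len(seqs) for step in totals}
```

A 147 bp sequence has 146 steps, so if the random ensemble were drawn with independent, equiprobable bases (an assumption about panel (a)), each step would be expected about 146/16 ≈ 9.1 times per sequence.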

Sequence logos for the tetramer flanking context of CpG dinucleotide steps for (a) all four sequence ensembles from the human genome with varying numbers of CpG junctions, and (b) the same four ensembles after dinucleotide shuffling (which preserves the numbers of dinucleotide steps). Just specifying the numbers of CpG dinucleotide steps is a strong enough constraint to leave the tetramer sequence context logos largely unchanged after shuffling. The sequence logos in panel (a), for the human sequence ensemble before shuffling, suggest a slightly stronger C/G flanking enrichment than after shuffling.
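The caption does not say which shuffling algorithm was used. One standard way to shuffle a sequence while preserving every dinucleotide count exactly is an Euler-path shuffle in the style of Altschul and Erickson; the sketch below is an illustration of that technique under this assumption, and `dinucleotide_shuffle` is a hypothetical name.

```python
import random

def _reaches(v, target, last_edge):
    """Follow the chosen last edges from v; True if the walk arrives at target."""
    seen = set()
    while v != target:
        if v in seen:
            return False
        seen.add(v)
        v = last_edge[v]
    return True

def dinucleotide_shuffle(seq, rng=random):
    """Return a random sequence with the same dinucleotide step counts as seq."""
    if len(seq) < 3:
        return seq
    successors = {}
    for a, b in zip(seq, seq[1:]):
        successors.setdefault(a, []).append(b)
    final = seq[-1]
    # Pick one "last exit" per base until these exits form a tree leading to the final base.
    while True:
        last_edge = {v: rng.choice(nbrs) for v, nbrs in successors.items() if v != final}
        if all(_reaches(v, final, last_edge) for v in last_edge):
            break
    order = {}
    for v, nbrs in successors.items():
        rest = list(nbrs)
        if v in last_edge:
            rest.remove(last_edge[v])
        rng.shuffle(rest)
        if v in last_edge:
            rest.append(last_edge[v])  # the chosen exit is always used last
        order[v] = rest
    # Walk the resulting Eulerian trail from the original first base.
    out, cur = [seq[0]], seq[0]
    used = dict.fromkeys(order, 0)
    for _ in range(len(seq) - 1):
        nxt = order[cur][used[cur]]
        used[cur] += 1
        out.append(nxt)
        cur = nxt
    return "".join(out)
```

By construction the shuffled sequence keeps the first and last base and every dinucleotide count, so the number of CpG steps is unchanged; only the longer-range context, such as the tetramers around each CpG, is randomised.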

Spectra of (a) nucleosome wrapping energies and (b) log probability densities of the optimal nucleosomal configurations for 147 bp sequences drawn from four different regions of the human genome: (A) intersection of CGI and NMI, (B) NMI and not CGI, (C) CGI and not NMI, (D) not CGI and not NMI (Table 1). Dots are averages and error bars standard deviations over sequence; solid lines and circles denote unmethylated CpG dinucleotides, dashed lines and triangles methylated CpGs.
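The four region labels can be written as a small classification rule. The sketch assumes half-open `(start, end)` genomic intervals and counts a fragment as inside a region only when it is fully contained in one annotated interval; the caption does not state the rule actually used, and the helper names are hypothetical.

```python
def inside_any(start, end, intervals):
    """True if [start, end) lies entirely within one of the intervals."""
    return any(lo <= start and end <= hi for lo, hi in intervals)

def region_label(start, end, cgis, nmis):
    """Label a fragment A-D following the caption's definitions."""
    in_cgi = inside_any(start, end, cgis)
    in_nmi = inside_any(start, end, nmis)
    if in_cgi and in_nmi:
        return "A"  # CGI and NMI
    if in_nmi:
        return "B"  # NMI and not CGI
    if in_cgi:
        return "C"  # CGI and not NMI
    return "D"      # neither

# Hypothetical annotation intervals, for illustration only.
cgis = [(1000, 2000)]
nmis = [(1500, 3000)]
labels = [region_label(s, s + 147, cgis, nmis) for s in (1600, 2500, 1100, 5000)]
```

A looser rule, for example any overlap, would move fragments at region boundaries between groups, so the choice should match the one used to build Table 1.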

Spectra of nucleosome occupancy scores for our 86,874 selected sequences, grouped by genomic region (NMIs and non NMIs) and by the indicated ranges of numbers of CpG dinucleotide steps: dots are averages, error bars standard deviations over sequence. The number of sequences in each group is listed in Table 2. See also Figure 2d.

Numbers of human genome sequence fragments of length 147 bp taken from CGIs, non CGIs, NMIs and non NMIs grouped by the number of CpG dinucleotide steps in each of four intervals.

As expected, CGI fragments have relatively more CpG junctions than non CGI fragments. NMIs also have more CpGs than non NMIs.

Normalised frequencies (each of the four histograms in each plot normalised independently) of experimental nucleosome occupancy scores for our 86,874 selected sequences grouped by each of the four types of regions in the genome (cf. Table 1). Average score for each region is indicated by a vertical dashed line of appropriate colour. The black and red (but not blue or green) histograms have significant spikes reflecting many instances of zero occupancy in the experimental data.
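Independent normalisation means each region's histogram is divided by its own total, so the four curves are comparable despite very different group sizes. A minimal sketch, assuming normalisation to unit sum (normalisation to unit area is the other common convention) and hypothetical bin edges:

```python
from statistics import mean

def normalised_histogram(scores, bin_edges):
    """Fraction of scores in each bin; bins are [lo, hi), and the last bin also includes its upper edge."""
    counts = [0] * (len(bin_edges) - 1)
    for s in scores:
        for k in range(len(counts)):
            is_last = k == len(counts) - 1
            if bin_edges[k] <= s < bin_edges[k + 1] or (is_last and s == bin_edges[-1]):
                counts[k] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)

def summarise(groups, bin_edges):
    """Per-region normalised histogram, mean score and fraction of exact zeros."""
    return {
        name: {
            "histogram": normalised_histogram(scores, bin_edges),
            "mean": mean(scores),  # position of the dashed vertical line
            "zero_fraction": sum(1 for s in scores if s == 0) / len(scores),
        }
        for name, scores in groups.items() if scores
    }
```

The `zero_fraction` entry is one way to quantify the zero-occupancy spikes mentioned in the caption.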

(a) Predicted log probability density for an optimal nucleosomal configuration, (b) nucleosome occupancy scores from Schwartz et al. (2019), and (c) nucleosome occupancy scores from Yazdi et al. (2015) for sequence positions 850K-900K of human chromosome I. In the regions corresponding to the intersection of CGIs and NMIs, both the mean log probability density (468.61) and the mean scores (2.62 and 139.53) are smaller than outside the intersection regions (476.10, 5.89 and 212.00, respectively).
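The inside-versus-outside comparison quoted in this caption amounts to intersecting two interval annotations and averaging a per-position track over the two resulting sets of positions. A minimal sketch, assuming sorted, non-overlapping half-open intervals and one track value per base pair; the function names are hypothetical and the code does not reproduce the quoted means.

```python
from statistics import mean

def intersect(a, b):
    """Intersect two sorted lists of non-overlapping half-open (start, end) intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

def mean_inside_outside(values, start, regions):
    """Mean of a per-position track inside and outside the given regions.

    values[k] is the track value at genomic position start + k.
    Returns (mean_inside, mean_outside); an entry is None if that set is empty.
    """
    inside = [False] * len(values)
    for lo, hi in regions:
        for p in range(max(lo, start), min(hi, start + len(values))):
            inside[p - start] = True
    ins = [v for v, flag in zip(values, inside) if flag]
    outs = [v for v, flag in zip(values, inside) if not flag]
    return (mean(ins) if ins else None, mean(outs) if outs else None)
```

For the window in the figure, `start` would be 850,000 and `regions` would be `intersect(cgis, nmis)` for the annotated CGI and NMI intervals on that chromosome.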