Large-scale analysis of the integration of enhancer-enhancer signals by promoters

  1. Miguel Martinez-Ara
  2. Federico Comoglio
  3. Bas van Steensel (corresponding author)
  1. Division of Gene Regulation, Netherlands Cancer Institute, Netherlands
  2. Oncode Institute, Netherlands
  3. Division of Molecular Genetics, Netherlands Cancer Institute, Netherlands
5 figures, 1 table and 5 additional files

Figures

Figure 1 with 1 supplement
Schematic of three-way combinatorial approach.

(A) Three-way combinatorial massively parallel reporter assay (MPRA) design to test enhancer-enhancer-promoter combinations. Eight barcoded reporter assay libraries, one per promoter, were constructed. Pairs of DNA elements (enhancers and scrambled control sequences) were inserted downstream of the barcoded reporter. The enhancers and controls can be placed in either orientation in enhancer position 1 (E1) or enhancer position 2 (E2). (B) The design of the library yields eight matrices that contain control-control combinations (CC), enhancer-control combinations (EC and CE), and enhancer-enhancer combinations (EE). (C and D) Two example loci, the Sox2 LCR (locus control region) (C) and Otx2 (D), from which we selected enhancers to test in the reporter libraries. Enhancers (orange) were defined around DNase I hypersensitivity sites from mouse embryonic stem cells (mESC). Promoters (blue) were chosen according to transcription start site (TSS) annotation and mESC DNase I hypersensitivity sites. DNase I data are from Joshi et al., 2015.
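The combination types in panel B can be illustrated with a small enumeration. This is a minimal sketch, not the authors' pipeline: the element names are hypothetical, and it assumes that every element may occupy either position in either orientation, including pairing with itself.

```python
from itertools import product

def enumerate_pairs(elements):
    """List every (position 1, position 2) pairing in both orientations.

    `elements` maps a hypothetical element name to its kind:
    'E' for an enhancer, 'C' for a scrambled control.
    """
    # Each element can be inserted in the plus or the minus orientation.
    oriented = [(name, strand) for name in elements for strand in "+-"]
    pairs = []
    for (e1, s1), (e2, s2) in product(oriented, repeat=2):
        pairs.append({
            "E1": e1, "E1_orientation": s1,
            "E2": e2, "E2_orientation": s2,
            # Combination type as used in the caption: CC, EC, CE, or EE.
            "type": elements[e1] + elements[e2],
        })
    return pairs

# Hypothetical toy library: two enhancers and one control.
toy = {"enh_A": "E", "enh_B": "E", "ctrl_1": "C"}
pairs = enumerate_pairs(toy)

counts = {}
for pair in pairs:
    counts[pair["type"]] = counts.get(pair["type"], 0) + 1
print(counts)
```

In the real libraries the combinations were cloned at random, so not every cell of such a matrix is necessarily populated.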

Figure 1—figure supplement 1
Schematic of cloning strategy to generate enhancer-enhancer-promoter libraries.

First, the eight different promoters were cloned into a reporter vector. Each of the eight vectors was then barcoded and linearised, and random enhancer-enhancer, enhancer-control, and control-control combinations were cloned downstream of the barcoded reporter.

Figure 2 with 3 supplements
Effects of single enhancers across promoters.

(A) Enhancer-promoter matrix of median boost indices for single enhancers. The boost index is defined as log2(Activity_Combination/Activity_control-control_baseline) (see figure and Methods), where the baseline is the median activity of the promoter across all control-control combinations. For each single enhancer, a median boost index was calculated across all enhancer-control combinations containing that enhancer in any position and orientation. For each control-control combination, a boost index was calculated as the activity of that combination over the median activity of all control-control combinations, and a median control-control boost index was derived from these values. Colour coding of the matrix corresponds to the median boost indices; white spaces are missing data. (B) Distribution of boost indices for all enhancer+control combinations for the Sox2 promoter. The leftmost column shows the boost index distribution for all control-control combinations for the Sox2 promoter. Each dot represents one enhancer+control combination. Horizontal lines correspond to the median of each distribution (the same median as represented in the matrix in A). (C) Distribution of boost indices for all enhancer+control combinations for the enhancer Nanog_E074 across all promoters. Each dot represents one enhancer+control combination. Horizontal lines correspond to the median of each distribution. (D) Distribution of median boost indices for each single enhancer across promoters. Each dot represents the median of an enhancer across all enhancer+control combinations for that enhancer (the same median as represented in the matrix in A). Colouring in B and D represents significance at a 1% false discovery rate (FDR) for a Wilcoxon test comparing boost index distributions between enhancer+control combinations and the controls for each enhancer.
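The boost index defined in panel A can be written out in a few lines. The following is a minimal sketch using invented activity values; the function names are illustrative and are not taken from the authors' code.

```python
import math
from statistics import median

def boost_index(activity, baseline):
    """log2 ratio of a combination's activity over the control-control baseline."""
    return math.log2(activity / baseline)

def median_boost_index(activities, cc_activities):
    """Median boost index of a set of combinations for one promoter.

    The baseline is the median activity of the promoter across all
    control-control (CC) combinations, as described in the caption.
    """
    baseline = median(cc_activities)
    return median([boost_index(a, baseline) for a in activities])

# Invented activities for one promoter (arbitrary units).
cc = [1.0, 2.0, 4.0]    # control-control combinations
ec = [4.0, 8.0, 16.0]   # enhancer-control combinations of one enhancer

print(median_boost_index(ec, cc))  # single-enhancer median boost index
print(median_boost_index(cc, cc))  # median control-control boost index
```

By construction the median control-control boost index is close to zero, so a positive value for an enhancer indicates activation above the promoter's own baseline.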

Figure 2—figure supplement 1
Reproducibility of experimental data.

Matrix of replicate correlations for all eight enhancer-enhancer-promoter (EEP) libraries. Lower left panels show the two-dimensional density plots of replicate-replicate activity correlations. Middle panels show the one-dimensional density plot of each replicate. Upper right panels give the Pearson’s correlation coefficients for the mirrored lower left panels.

Figure 2—figure supplement 2
Position and orientation bias.

(A) Relationship between single enhancer boost indices across all enhancer-control (EC) combinations in position 1 versus position 2 (see Figure 1). (B) Relationship between single enhancer boost indices across all EC combinations in the plus versus the minus orientation, in position 1 (left) and in position 2 (right). In both panels, each dot is one enhancer with one promoter. R is Pearson’s correlation coefficient.

Figure 2—figure supplement 3
Selectivity of single enhancers.

(A) Relationship between the average boost index of each single enhancer across all promoters and the F-statistic of a Welch F-test for that enhancer across all promoters. Colours indicate significance at a 5% false discovery rate (FDR). (B) Distributions of single enhancer boost indices for each enhancer across promoters. Each dot is one enhancer with one promoter, and the boost index is the average boost index across all enhancer-control combinations. The colour of each point represents promoter identity. Vertical bars indicate the median boost index, and their grey or black shading indicates significance according to the Welch F-test at a 5% FDR.
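The captions report significance at a given FDR but do not state which correction procedure was used. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up procedure and applies it to invented p-values.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return one significance flag per p-value at the given FDR.

    Assumption: Benjamini-Hochberg is used here for illustration;
    the captions do not name the procedure.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    # Every test ranked at or below max_rank is called significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant

# Invented p-values, e.g. one per enhancer.
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5], fdr=0.05))
```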

Figure 3 with 1 supplement
Effects of enhancer-enhancer (EE) combinations.

(A) Fragment-fragment combinatorial boost index matrix for the Lefty1 promoter. Each square represents one control-control (CC), enhancer-control (EC or CE), or EE combination. Colour coding corresponds to the average boost index for each combination across all orientations, measured over the median control-control baseline. (B) Boost index distributions across all eight promoters for each combination type: control-control (CC), enhancer-control (EC, regardless of position), or EE combinations. p-Values correspond to the result of a Wilcoxon test. (C) Relationship between the observed boost index for each EE combination and the observed boost index of the strongest single enhancer of the pair, for the Sox2 and Lefty1 promoters. Blue lines represent the LOESS fit of the data. (D) Observed and expected additive activities for the Sox2_E178+Sox2_E182 combination with the Sox2 promoter, and the individual activities of each of the elements. The columns show the observed activities for the control-control combinations, the enhancer-control combinations, and the EE combinations. The horizontal bars represent the median of each distribution. The horizontal black line represents the expected additive activity of the EE combination, calculated in linear space with the formula shown in the panel. The horizontal red lines represent the propagated standard deviations of the expected additive activity of the EE combinations, calculated with the formula shown in the panel. (E and F) Relationship between observed and expected activities (additive in E, multiplicative in F) for all EE combinations for the Sox2 promoter. The blue lines represent the linear fit of the data. The grey diagonal line is the x=y identity line. In all panels R represents Pearson’s correlation coefficient. Expected activities are calculated in linear space and then plotted in log2 space.
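The formulas for the expected activities appear only in the figure panel and are not reproduced in this text. The sketch below therefore uses one conventional formulation as an assumption: under additivity each enhancer contributes its increment over the baseline, under multiplicativity each contributes its fold change over the baseline, and the standard deviation of the additive expectation is propagated assuming independent errors. All numbers are invented.

```python
import math

def expected_additive(e1, e2, baseline):
    """Assumed additive model: baseline + (e1 - baseline) + (e2 - baseline)."""
    return e1 + e2 - baseline

def expected_multiplicative(e1, e2, baseline):
    """Assumed multiplicative model: baseline * (e1 / baseline) * (e2 / baseline)."""
    return e1 * e2 / baseline

def propagated_sd_additive(sd_e1, sd_e2, sd_baseline):
    """Assumed error propagation for a sum or difference of independent terms."""
    return math.sqrt(sd_e1 ** 2 + sd_e2 ** 2 + sd_baseline ** 2)

# Invented linear-space activities (arbitrary units).
baseline, e1, e2 = 2.0, 8.0, 4.0

additive = expected_additive(e1, e2, baseline)
multiplicative = expected_multiplicative(e1, e2, baseline)

# Expectations are computed in linear space and only then moved to log2 space.
print(additive, math.log2(additive))
print(multiplicative, math.log2(multiplicative))
print(propagated_sd_additive(1.0, 2.0, 2.0))
```

Computing in linear space first matters: summing log2 values would correspond to the multiplicative model, not the additive one.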

Figure 3—figure supplement 1
Additivity versus multiplicativity for all promoters.

Relationship between observed and expected activities (additive in A, multiplicative in B) for all enhancer-enhancer combinations for all promoters. The blue lines represent the linear fit of the data. In all panels R and R² are based on Pearson’s correlation.

Figure 4
Supra- and sub-additive behaviours of enhancer combinations.

(A) Distributions of log2 ratios of observed activities over expected additive activities for enhancer-enhancer combinations across promoters. Supra- and sub-additive combinations, for which the observed activity is more than one standard deviation away from the expected activity, are coloured in turquoise. Horizontal bars represent the median of each distribution. Numbers at the top are the percentage of supra-additive combinations for each promoter. Numbers at the bottom are the percentage of sub-additive combinations for each promoter. (B) Relationship between the percentages of supra- and sub-additive enhancer-enhancer combinations and promoter control-control baselines. Blue lines are the linear fit of the data. R is Pearson’s correlation coefficient. (C) Average supra- or sub-additive behaviour of each single enhancer across enhancer-enhancer combinations for each promoter. Each dot represents the median log2 observed over expected ratio for all enhancer-enhancer combinations of a single enhancer and a particular promoter. Grey bars represent the median of each distribution. (D) For two example enhancers, distribution of log2 observed over expected ratios for combinations of that enhancer with any other enhancer and promoter. Horizontal black bars represent the median of the distribution. (E) Distribution of log2 observed over expected ratios for enhancer-enhancer pairs from the same enhancer cluster (within clusters) or from different enhancer clusters (between clusters). The p-value results from comparing both distributions using a Wilcoxon test. Horizontal grey bars represent the median of each distribution. In all panels Obs/exp refers to observed activity over expected additive activity.
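The classification in panel A can be sketched as follows. This is an illustrative reading of the caption, with two assumptions: the one-standard-deviation threshold is applied in linear space, and the standard deviation is that of the expected additive activity. The values are invented.

```python
import math

def classify_combination(observed, expected, expected_sd):
    """Label one enhancer-enhancer combination and return its log2(obs/exp).

    A combination is called supra- or sub-additive when the observed
    activity lies more than one standard deviation above or below the
    expected additive activity (threshold applied in linear space here,
    which is an assumption).
    """
    log2_ratio = math.log2(observed / expected)
    if observed > expected + expected_sd:
        label = "supra-additive"
    elif observed < expected - expected_sd:
        label = "sub-additive"
    else:
        label = "additive"
    return label, log2_ratio

# Invented (observed, expected, sd) triples for one promoter.
combos = [(20.0, 10.0, 3.0), (5.0, 10.0, 3.0), (11.0, 10.0, 3.0), (9.0, 10.0, 3.0)]
labels = [classify_combination(*c)[0] for c in combos]

# Percentages as annotated above and below each distribution in panel A.
pct_supra = 100 * labels.count("supra-additive") / len(labels)
pct_sub = 100 * labels.count("sub-additive") / len(labels)
print(labels, pct_supra, pct_sub)
```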

Figure 5 with 2 supplements
Non-linear responses of promoters to enhancer-enhancer combinations.

(A) Relationship between observed boost indices and average boost index across promoters for all shared enhancer-enhancer combinations. Blue lines represent the linear fit of the data. (B) Relationship between the slopes extracted from the linear fits in A and the baseline promoter activities derived from the control-control combinations. The formulae depict the relationship between the average boost indices and the observed boost indices of each promoter through the extracted slopes. For both panels R is Pearson’s correlation coefficient.
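The slope extraction described in panels A and B can be sketched with an ordinary least-squares fit. This is a minimal illustration on invented boost indices with hypothetical promoter names; it assumes the average boost index across promoters is the x variable and each promoter's observed boost index is the y variable.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Invented boost indices for three shared enhancer-enhancer combinations.
boost = {
    "promoter_1": [2.0, 4.0, 6.0],
    "promoter_2": [1.0, 2.0, 3.0],
}

# Average boost index of each combination across promoters (x axis).
average = [mean(values) for values in zip(*boost.values())]

# One slope per promoter; in panel B these are compared with the
# control-control baseline activity of each promoter.
slopes = {name: fit_line(average, values)[0] for name, values in boost.items()}
print(slopes)
```

A slope above 1 means the promoter responds more strongly than the average promoter to the same enhancer-enhancer combinations, and a slope below 1 means it responds more weakly.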

Figure 5—figure supplement 1
Promoter-promoter boost index correlations for all shared enhancer-enhancer combinations.

Lower left panels show the two-dimensional density plots of promoter-promoter boost index correlations. Middle panels show the one-dimensional density plot of each promoter. Upper right panels give the Pearson’s correlation coefficients for the mirrored lower left panels. Blue lines are the linear fit of the data.

Figure 5—figure supplement 2
Non-linear responses of promoters to enhancer-control (single enhancers) combinations.

(A) Relationship between observed boost indices of single enhancers and average boost index across promoters for all shared single enhancer-control combinations. Blue lines represent the linear fit of the data. (B) Relationship between the slopes extracted from the linear fits in A and the baseline promoter activities derived from the control-control combinations. For both panels R is Pearson’s correlation coefficient.

Tables

Key resources table
| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
| --- | --- | --- | --- | --- |
| Cell line (Mus musculus) | E14tg2a mouse embryonic stem cell (mESC) | ATCC | CRL-1821 | |
| Gene (M. musculus) | Klf2 | GenBank | 16598 | Promoters and enhancers; coordinates in Supplementary file 1 |
| Gene (M. musculus) | Sox2 | GenBank | 20674 | Promoters and enhancers; coordinates in Supplementary file 1 |
| Gene (M. musculus) | Otx2 | GenBank | 18424 | Promoters and enhancers; coordinates in Supplementary file 1 |
| Gene (M. musculus) | Lefty1 | GenBank | 13590 | Promoters and enhancers; coordinates in Supplementary file 1 |
| Gene (M. musculus) | Fgf5 | GenBank | 14176 | Promoters and enhancers; coordinates in Supplementary file 1 |
| Gene (M. musculus) | Tbx3 | GenBank | 21386 | Promoters and enhancers; coordinates in Supplementary file 1 |
| Gene (M. musculus) | Nanog | GenBank | 71950 | Promoters and enhancers |
| Gene (M. musculus) | Ap1m1 | GenBank | 11767 | Promoters and enhancers |
| Recombinant DNA reagent | Downstream Assay vector (JvAp102) | Martinez-Ara et al., 2022 | JvAp102 | Downstream reporter assay plasmid; see Methods |
| Sequence-based reagent | Enhancer and promoter oligos | This paper | Primers | Supplementary file 4 |
| Sequence-based reagent | Barcoding primers and sequencing primers | Martinez-Ara et al., 2022 | Barcoding reagents | Supplementary file 4 |
| Sequence-based reagent | Synthetic DNA negative controls | This paper | Negative controls | Supplementary file 2 |
| Peptide, recombinant protein | Gibson Assembly Master Mix | NEB | Cat# E2611S | |
| Peptide, recombinant protein | I-CeuI | NEB | #R0699S | |
| Peptide, recombinant protein | I-SceI | NEB | #R0694S | |
| Commercial assay or kit | Mouse Embryonic Stem Cell Nucleofector Kit | Lonza | #VPH-1001 | |
| Chemical compound, drug | TRIsure | Bioline | #BIO-38032 | |
| Commercial assay or kit | GeneJET RNA extraction kit | Thermo Fisher | #K0732 | |
| Peptide, recombinant protein | DNase I | Roche | #04716728001 | |
| Peptide, recombinant protein | Maxima Reverse transcriptase | Thermo Fisher | #EP0743 | |
| Peptide, recombinant protein | MyTaq Red mix | Bioline | #BIO-25043 | |
| Strain, strain background (Escherichia coli) | e. cloni 10G supreme | Lucigen | #60081-1 | Electrocompetent cells |
| Peptide, recombinant protein | Takara ligation kit version 2.1 | Takara | #6022 | |
| Peptide, recombinant protein | Klenow HC 3′→5′ exo | NEB | #M0212L | |
| Commercial assay or kit | ISOLATE II PCR and Gel Kit | Bioline | BIO-52059 | |
| Peptide, recombinant protein | Fast-link ligase | Lucigen | LK0750H | |
| Peptide, recombinant protein | End-It DNA End-Repair Kit | Epicentre | #ER0720 | |
| Commercial assay or kit | dsDNA High sensitivity Qubit kit | Invitrogen | #Q33231 | |
| Commercial assay or kit | CleanPCR magnetic beads | CleanNA | #CPCR-0050 | |
| Peptide, recombinant protein | XcmI | NEB | #R0533S | |
| Peptide, recombinant protein | T4 DNA ligase | Roche | #10799009001 | |
| Peptide, recombinant protein | AvrII | Thermo Fisher | #ER1561 | |
| Peptide, recombinant protein | NheI | NEB | #R0131S | |
| Strain, strain background (Escherichia coli) | 5-alpha Competent E. coli | NEB | #C2987 | Competent cells |
| Peptide, recombinant protein | LIF | Sigma-Aldrich | #ESG1107 | 2i+LIF media |
| Chemical compound, drug | Monothioglycerol | Sigma-Aldrich | #M6145-25ML | 2i+LIF media |
| Chemical compound, drug | CHIR-99021 | MedChemExpress | #HY-10182 | 2i+LIF media |
| Chemical compound, drug | PD0325901 | MedChemExpress | #HY-10254 | 2i+LIF media |
| Other | BSA | Gibco | #15260-037 | 2i+LIF media |
| Other | DMEM-F12 medium | Gibco | #11320-033 | 2i+LIF media |
| Other | Neurobasal medium | Gibco | #21103-049 | 2i+LIF media |
| Chemical compound, drug | N27 | Gibco | #17504-044 | 2i+LIF media |
| Chemical compound, drug | B2 | Gibco | #17502-048 | 2i+LIF media |
| Commercial assay or kit | MycoAlert Mycoplasma Detection Kit | Lonza | #LT07-318 | |
| Software, algorithm | BatchPrimer3 version 1.0 | You et al., 2008 | version 1.0 | https://wheat.pw.usda.gov/demos/BatchPrimer3/ |
| Software, algorithm | Starcode | Zorita et al., 2015 | version 1.1 | https://github.com/gui11aume/starcode |
| Software, algorithm | Python | Rossum and Drake, 2009 | version 3.6 | https://www.python.org/downloads/release/python-362/ |
| Software, algorithm | Bowtie2 | Langmead and Salzberg, 2012 | version 2.3.4 | http://bowtie-bio.sourceforge.net/bowtie2/index.shtml |
| Software, algorithm | R | R Development Core Team, 2021 | version 4.0.5 | https://www.r-project.org/ |
| Software, algorithm | ggplot2 | Wickham, 2016 | ggplot2 | https://ggplot2.tidyverse.org/ |
| Software, algorithm | Snakemake | Köster and Rahmann, 2012 | version 4.4.0 | https://anaconda.org/bioconda/snakemake/files?version=4.4.0 |

Additional files

Supplementary file 1

mm10 genomic coordinates of the regulatory elements tested in this study.

https://cdn.elifesciences.org/articles/91994/elife-91994-supp1-v1.zip

Supplementary file 2

Sequences of the synthetic controls used in this study.

https://cdn.elifesciences.org/articles/91994/elife-91994-supp2-v1.zip

Supplementary file 3

Processed and normalised data for all the enhancer-enhancer-promoter (EEP) combinations tested in this study.

https://cdn.elifesciences.org/articles/91994/elife-91994-supp3-v1.zip

Supplementary file 4

Sequences of the primers used in this study.

https://cdn.elifesciences.org/articles/91994/elife-91994-supp4-v1.zip

MDAR checklist

https://cdn.elifesciences.org/articles/91994/elife-91994-mdarchecklist1-v1.docx

Cite this article

  1. Miguel Martinez-Ara
  2. Federico Comoglio
  3. Bas van Steensel
(2024)
Large-scale analysis of the integration of enhancer-enhancer signals by promoters
eLife 12:RP91994.
https://doi.org/10.7554/eLife.91994.3