Workflow of this study. All coenzymes available in the PDB were identified according to the CoFactor database (Fischer et al., 2010). PDB entries of coenzyme-bound structures were downloaded programmatically through the PDBe REST API (pdbe.org/api), together with the interatomic cofactor-protein interactions calculated by Arpeggio (Jubb et al., 2017). The coenzyme-binding amino acids were mapped to UniProt via SIFTS (Velankar et al., 2013; Dana et al., 2019). PDB entries were grouped by UniProt code; redundancy was removed by clustering the UniProt sequences at 90% (and, in parallel, at 30%) sequence identity.
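
The grouping and redundancy-removal steps of the workflow can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names, the input format (pairs of PDB code and UniProt accession), the cluster assignments (assumed to come from an external sequence-clustering run at 90% or 30% identity) and the rule for picking a cluster representative are all assumptions made for illustration.

```python
from collections import defaultdict


def group_by_uniprot(records):
    """Group PDB codes by UniProt accession.

    `records` is an iterable of (pdb_code, uniprot_accession) pairs;
    this input format is hypothetical.
    """
    groups = defaultdict(set)
    for pdb_code, accession in records:
        groups[accession].add(pdb_code.lower())
    # Sorted lists make the result deterministic.
    return {acc: sorted(codes) for acc, codes in groups.items()}


def one_per_cluster(groups, cluster_of):
    """Keep one UniProt accession per sequence-identity cluster.

    `cluster_of` maps accession -> cluster id and is assumed to be
    precomputed by an external clustering tool. The representative is
    the accession with the most PDB entries; ties go to the
    alphabetically first accession (an illustrative choice only).
    """
    representatives = {}
    for acc in sorted(groups):
        cluster = cluster_of[acc]
        best = representatives.get(cluster)
        if best is None or len(groups[acc]) > len(groups[best]):
            representatives[cluster] = acc
    return representatives


# Hypothetical identifiers, not real PDB or UniProt entries.
records = [("1AAA", "ACC_A"), ("2BBB", "ACC_B"), ("3ccc", "ACC_A")]
groups = group_by_uniprot(records)
reps = one_per_cluster(groups, {"ACC_A": "cluster1", "ACC_B": "cluster1"})
```

Running the same selection once with 90% cluster assignments and once with 30% assignments would yield the two parallel non-redundant sets described above.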

Classification of coenzymes and amino acids by their assumed evolutionary temporality. The “Unclassified” coenzymes Thiamine diphosphate, Coenzyme M, Factor F430 and Glutathione are not shown in the scheme.

Early versus late amino acid composition of the coenzyme binding sites, categorized according to the evolutionary temporality of coenzymes. Early amino acids are shown in blue and late amino acids in red. The dashed line corresponds to the proportion of early vs. late amino acids within the UniProt composition of the sequences derived from our database (67% early and 33% late residues). The statistical significance of the early versus late amino acid composition was assessed by a Chi-squared test (P < 0.0001). Detailed statistical data are listed in Supplementary Table 8.
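
As an illustration of how such a comparison could be computed, the sketch below runs a Chi-squared goodness-of-fit test of observed early and late residue counts against the 67%/33% UniProt baseline. The exact form of the test used in the study is not stated here, so the goodness-of-fit formulation, the function name and the example counts are assumptions. With two categories the test has one degree of freedom, for which the P value equals erfc(sqrt(chi2 / 2)).

```python
import math


def chi2_early_late(early_count, late_count, expected_early=0.67):
    """Chi-squared goodness-of-fit test for early vs. late residue counts.

    `expected_early` is the baseline fraction of early amino acids
    (0.67 from the UniProt composition quoted in the legend).
    Returns (chi2 statistic, P value) for one degree of freedom.
    """
    total = early_count + late_count
    exp_early = total * expected_early
    exp_late = total * (1.0 - expected_early)
    chi2 = ((early_count - exp_early) ** 2 / exp_early
            + (late_count - exp_late) ** 2 / exp_late)
    # Survival function of the chi-squared distribution with 1 d.o.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 80 early and 20 late residues in a binding-site set.
chi2, p_value = chi2_early_late(80, 20)
```

Counts that match the baseline exactly give a statistic near zero and a P value near one; the further the binding-site composition departs from 67%/33%, the smaller the P value.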

Binding of coenzymes by early and late amino acids through backbone and side chain atoms. “Backbone” interactions refer to residues in the coenzyme binding sites that interact purely through amino acid backbone atoms. “Side chain” interactions involve residues that interact solely via side chain atoms. “Backbone & Side chain” residues are those that interact with the coenzyme using both their backbone and side chain atoms. (A) Abundance of amino acids in the binding sites of the individual coenzymes studied. “Backbone & Side chain” interactions are not depicted. Unclassified coenzymes are in gray, Post-LUCA in yellow, LUCA in cyan and Ancient in purple. Amino acids are ranked by their order of addition to the genetic code (Higgs and Pudritz, 2009). (B) Proportion of early versus late residues in coenzyme categories by interaction type. In each coenzyme category, the individual proportions add up to 100%. The amino acid composition was normalized by the percentage of late residues from the UniProt sequences retrieved from our database. The statistical significance of early versus late amino acid composition for each interaction type per coenzyme temporality was determined by a Chi-squared test (*, P < 0.05; **, P < 0.01; ***, P < 0.001; ****, P < 0.0001). For detailed statistical analysis, refer to Supplementary Table 9.

Secondary structure content in coenzyme binding sites. Composition of secondary structural elements in amino acids interacting with coenzymes. The PDB category represents secondary structure content across the dataset for comparison with coenzyme binding sites. Additional statistical analyses are shown in Supplementary Table 10.

Fold diversity of coenzyme binding sites. (A) Folds represented by ECOD X-groups, according to numbers of coenzyme binding sites. (B) Comparison of numbers of ECOD X-groups vs. UniProt entries per cofactor class.

Examples of coenzyme binding solely through early or late amino acids. (A) Coenzyme bound exclusively by early residues (AMP bound by ATP-phosphoribosyltransferase; PDB code 6czm, chain B; created by LIGPLOT (Laskowski and Swindells, 2011)). (B) Coenzyme bound entirely by late residues (ascorbic acid bound by hyaluronate lyase; PDB code 1f9g, chain A; created by LIGPLOT).