The geographic distribution, population structure, and gene flow of maize and teosinte populations.

(A and D) Admixture proportions among populations within subspecies. The dominant cluster in each population is colored by sampling location. (B) The unrooted tree of maize and teosinte populations. (C) Geographic sampling locations for the studied maize and teosinte populations. (E) f4 tests to quantify evidence of gene flow between the subspecies for allopatric and sympatric population pairs. Each point in (E) reports the absolute Z-score for an f4 test in which a given focal population was paired with another population of the same subspecies as a sister node, and two populations from the other subspecies formed the sister clade (see Methods for further details). Black points show f4 tests that included maize from Crucero Lagunitas; all other points are colored by focal population. The dotted line corresponds to our chosen significance threshold (p = 0.001).
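
As a rough illustration of the quantity plotted in (E), the sketch below computes an f4 statistic from per-site allele frequencies and converts it to a Z-score with a delete-one block jackknife. The function names, the equal-sized contiguous blocks, and the two-sided normal approximation used to turn p = 0.001 into a Z threshold are assumptions made for illustration; the procedure actually used is described in the Methods.

```python
import math
from statistics import NormalDist

def f4(pa, pb, pc, pd):
    """Mean over sites of (pA - pB) * (pC - pD) for four populations."""
    return sum((a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)) / len(pa)

def _drop_block(values, lo, hi):
    """Return the list with sites lo..hi-1 removed."""
    return values[:lo] + values[hi:]

def f4_zscore(pa, pb, pc, pd, n_blocks=10):
    """f4 divided by its delete-one block jackknife standard error.

    Assumes list inputs and equal-sized contiguous blocks; trailing sites
    that do not fill a block are kept in every replicate.
    """
    size = len(pa) // n_blocks
    if size == 0:
        raise ValueError("need at least one site per block")
    leave_one_out = []
    for i in range(n_blocks):
        lo, hi = i * size, (i + 1) * size
        leave_one_out.append(
            f4(*(_drop_block(p, lo, hi) for p in (pa, pb, pc, pd)))
        )
    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - mean_loo) ** 2 for x in leave_one_out
    )
    if variance == 0:
        return float("nan")
    return f4(pa, pb, pc, pd) / math.sqrt(variance)

# Assumed two-sided conversion of p = 0.001 to a |Z| threshold (about 3.29).
Z_THRESHOLD = NormalDist().inv_cdf(1 - 0.001 / 2)
```

Under this sketch, a test would be counted as significant when the absolute Z-score exceeds `Z_THRESHOLD`.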

Inbreeding, diversity, and demography.

The distribution of π (A) and Tajima’s D (B) calculated in 100 kb windows for maize and teosinte populations. Dashed lines show the median values for the two subspecies. Filled white points show the median values generated from coalescent simulations under the demographic history inferred for each population. Colors for each population are as in Figure 1 and are shown at the bottom of the figure. (C) The inferred demography for each population. (D) The quantile of observed Homozygosity By Descent (HBD) lengths (cM) versus those simulated under each population demography. Dashed lines show the 1:1 correspondence between the axes. (E) The distribution of inbreeding coefficients in each population. Filled white points are the average values for each population.
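
For readers unfamiliar with the two window statistics in (A) and (B), the following is a minimal sketch of how π and Tajima’s D can be computed for one window from derived-allele counts, using the standard Tajima (1989) constants. It assumes hard-called allele counts and a hypothetical function name; it is not the pipeline used to produce the figure, which may rely on genotype likelihoods.

```python
import math

def pi_and_tajimas_d(derived_counts, n):
    """Return (pi, Tajima's D) for one window.

    derived_counts: derived-allele count at each site in the window.
    n: number of sampled chromosomes (must be at least 4 here, because the
       variance term is zero for smaller samples).
    pi is the window total; divide by the number of callable sites to get
    a per-base-pair value.
    """
    if n < 4:
        raise ValueError("n must be at least 4")
    counts = [k for k in derived_counts if 0 < k < n]  # segregating sites only
    s = len(counts)
    # Mean number of pairwise differences, summed over sites.
    pi = sum(2 * k * (n - k) for k in counts) / (n * (n - 1))
    if s == 0:
        return pi, float("nan")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between pi and Watterson's estimator, scaled by its SD.
    d = (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return pi, d
```

A window dominated by rare variants gives a negative D, while an excess of intermediate-frequency variants gives a positive D.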

The proportion of mutations fixed by natural selection.

Estimated values of the proportion of mutations fixed by natural selection (α) by population. Vertical lines show the 95% credible interval.

The distribution of shared and private selective sweeps.

(A) The total number of sweeps inferred in each population. (B) The proportion of sweeps that are unique to each population. (C) Negative log10 p values for hypergeometric tests to identify maize-teosinte population pairs that shared more sweeps than expected by chance (see Methods). P values were adjusted for multiple tests using the Benjamini and Yekutieli method. Populations along the y axis are maize (order matches the legend below, with Amatlán de Cañas at the bottom), while the point color designates the teosinte population each maize population was paired with. Points with a black outline highlight the sympatric population comparisons. Point size is scaled by the number of shared sweeps identified in each pair. The dotted line indicates our chosen significance level (p = 0.05). (D) Counts of shared and unique sweeps broken down by how many maize and teosinte populations they occurred in. Grey boxes show sweeps shared across the two subspecies.
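
The test in (C) can be sketched as an upper-tail hypergeometric probability followed by a Benjamini and Yekutieli adjustment. How the total number of candidate regions and the per-population sweep counts were defined is given in the Methods; the argument names below (`total`, `sweeps_a`, `sweeps_b`, `shared`) are placeholders chosen for illustration.

```python
from math import comb

def hypergeom_upper_tail(shared, total, sweeps_a, sweeps_b):
    """P(X >= shared) when sweeps_b regions are drawn from `total` regions,
    of which sweeps_a carry a sweep in the other population."""
    lowest = max(shared, 0)
    highest = min(sweeps_a, sweeps_b)
    numerator = sum(
        comb(sweeps_a, i) * comb(total - sweeps_a, sweeps_b - i)
        for i in range(lowest, highest + 1)
    )
    return numerator / comb(total, sweeps_b)

def benjamini_yekutieli(pvals):
    """Benjamini-Yekutieli adjusted p values, returned in the input order."""
    m = len(pvals)
    c_m = sum(1 / i for i in range(1, m + 1))  # harmonic correction factor
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted
```

The plotted value for each pair would then be the negative log10 of its adjusted p value, compared against the 0.05 line.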

Modes of convergent adaptation and affiliated parameters for shared selective sweeps.

(A) The difference in composite likelihood scores for the best supported mode of convergent adaptation (colors in top legend) compared to the next best mode (black points), and for the best mode compared to the neutral model (other end of each line segment above or below the black point). (B) Selection coefficients colored by the most likely mode of convergent adaptation. (C) Number of shared sweeps for both subspecies that were inferred to be from each convergent adaptation mode. (D) The most likely source population for shared sweeps that converged via migration. Bars are colored by population (bottom legend) and are outlined in black for teosinte and grey for maize. (E) Observed frequency of the inferred time in generations that each selected allele persisted prior to selection for models of convergent adaptation via standing variation. (F) Observed frequency of each inferred migration rate value for models of convergent adaptation via migration. Panels C, D, E and F are partitioned by which subspecies shared the sweep.

Population sampling location information.

f4 tests including the maize Crucero Lagunitas population are significantly elevated compared to those without.

Significant f4 tests.

Each row of the table reports the number of significant f4 tests that occurred with a given focal and secondary population, where the two other tip positions were filled with each of the remaining populations for each subspecies. Rows that are left blank in the secondary column are used to report the total number of significant trees for a given focal population.

Predicted values of α across mutation types.

Grey bands for each mutation type show the 95% credible intervals averaged over each population.

Proportion and count of fixed differences at 0-fold sites across teosinte populations.

Treemix phylogeny including both subsamples of Palmar Chico.

Performance to detect simulated hard, soft, and incomplete sweeps under varying strengths of selection under the maize Palmar Chico population demography.

TPR, TNR, FNR, and FPR stand for true positive, true negative, false negative, and false positive rates, respectively.
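
For reference, the four rates follow from the confusion counts as sketched below. The function name is hypothetical, and what counts as one positive or negative call (for example a window or a whole simulated region) is an assumption left to the Methods.

```python
def classification_rates(tp, tn, fp, fn):
    """Return TPR, TNR, FNR, and FPR from raw confusion counts."""
    positives = tp + fn  # simulations that truly contained a sweep
    negatives = tn + fp  # simulations that were truly neutral
    return {
        "TPR": tp / positives,
        "FNR": fn / positives,
        "TNR": tn / negatives,
        "FPR": fp / negatives,
    }
```

By construction, TPR and FNR sum to one, as do TNR and FPR.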

Performance to detect simulated hard, soft, and incomplete sweeps under varying strengths of selection under the maize Palmar Chico population demography.

Each panel shows a combination of sweep type (hard, soft, or incomplete) and strength of selection (α = 4Nes = 10, 50, or 100).

Degree of overlap between simulated sweep regions taken from two downsampled replicates under the maize Palmar Chico population demography.

Positive values show the amount of overlap in base pairs between sweep regions, while negative values represent the space between them. Panel structure follows that of S4.
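
One simple way to obtain a signed overlap of this kind is sketched below. It assumes each sweep region is a half-open (start, end) interval in base pairs; the function name and the coordinate convention are illustrative rather than taken from the analysis code.

```python
def signed_overlap(region_a, region_b):
    """Overlap in bp between two (start, end) regions.

    Positive values are the length of the shared segment; negative values
    are the size of the gap separating the two regions.
    """
    (start_a, end_a), (start_b, end_b) = region_a, region_b
    return min(end_a, end_b) - max(start_a, start_b)
```

For example, regions (100, 200) and (150, 300) overlap by 50 bp, whereas (100, 200) and (250, 300) are separated by 50 bp and give a value of -50.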

Frequency of each population as the mutation source for sweeps shared via migration.

The order of populations along the x axis matches that of the source populations labeled for each strip along the top.

Inferred sweeps shared between subspecies via migration.

The x axis is sorted by the number of populations each sweep was found in. Populations are sorted along the y axis first by subspecies then by their number of sweeps.

Variation in composite likelihoods within sweep regions.

(A) Distribution of differences between the highest and next highest composite likelihoods for each sweep region. (B) Distribution of differences between the highest and next highest composite likelihoods among candidate beneficial mutation positions within each sweep region.