Experimental protocol.

Each experiment started with a Random stream (120 s), in which both syllables and voices changed randomly, followed by a long Structured stream (120 s). Then, 10 short familiarisation streams (30 s), each followed by test blocks comprising 18 isolated duplets (SOA 2-2.3 s), were presented. Example streams illustrate how the streams were constructed, with different colours representing different voices. In Experiment 1, the Structured stream had a statistical structure based on phonemes (TPs alternated between 1 and 0.5), while the voices changed randomly (uniform TPs of 0.2). For example, the two syllables of the word “petu” were produced by different voices, which changed randomly at each presentation of the word. In Experiment 2, the statistical structure was based on voices (TPs alternated between 1 and 0.5), while the syllables changed randomly (uniform TPs of 0.2). For example, the “green” voice was always followed by the “red” voice, but the syllables they produced varied randomly (“boda” then “tupe” in our example). The test duplets were either Words (TP = 1) or Partwords (TP = 0.5). Words and Partwords were defined in terms of phonetic content for Experiment 1 and voice content for Experiment 2.
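The transition probabilities quoted above can be illustrated with a small simulation. This is only a sketch under assumptions not stated in the caption: six items per dimension (labelled A to F here as placeholders), three duplets, no immediate repetition of a duplet in the structured dimension, and no immediate repetition of an item in the random dimension. Under those assumptions the estimated TPs come out close to 1 and 0.5 for the structured dimension and close to 0.2 for the random one. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

def structured_stream(words, n_words, rng):
    """Concatenate duplets, never presenting the same duplet twice in a row (assumption)."""
    stream, previous = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in words if w != previous])
        stream.extend(word)
        previous = word
    return stream

def random_stream(items, n_items, rng):
    """Draw items uniformly, never repeating the same item twice in a row (assumption)."""
    stream = [rng.choice(items)]
    while len(stream) < n_items:
        stream.append(rng.choice([x for x in items if x != stream[-1]]))
    return stream

def transition_probabilities(stream):
    """Estimate P(next | current) from bigram counts."""
    counts = defaultdict(Counter)
    for current, nxt in zip(stream, stream[1:]):
        counts[current][nxt] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in counts.items()}

rng = random.Random(0)
words = [("A", "B"), ("C", "D"), ("E", "F")]  # placeholder duplets, not the real stimuli
items = ["A", "B", "C", "D", "E", "F"]

tp_structured = transition_probabilities(structured_stream(words, 6000, rng))
tp_random = transition_probabilities(random_stream(items, 12000, rng))

print(tp_structured["A"])  # within-duplet transition
print(tp_structured["B"])  # between-duplet transitions
print(tp_random["A"])      # random dimension
```

In an Experiment 1 style stream, the structured generator would drive the syllables and the random generator the voices; in Experiment 2 the roles would be swapped.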

Neural entrainment during the random and structured streams.

(A) SNR for the ITC during the Random and Structured streams of Experiment 1 (structure on phonetic content). The topographies represent the entrainment in the electrode space at the syllabic (4 Hz) and duplet (2 Hz) rates. Markers indicate the electrodes showing enhanced neural entrainment (cross: p < 0.05, one-sided paired permutation test, FDR corrected for the number of electrodes; dot: p < 0.05, without FDR correction). Colour scale limits [-1.8, 1.8]. The entrainment for each electrode is shown in light grey. The thick orange line shows the mean over the electrodes with significant entrainment relative to the adjacent frequency bins at the syllabic rate (4 Hz) (p < 0.05, FDR corrected). The thick green line shows the mean over the electrodes with significant entrainment relative to the adjacent frequency bins at the duplet rate (2 Hz) (p < 0.05, FDR corrected). The asterisks indicate frequency bins in which the entrainment, averaged across electrodes, was significantly higher than in the adjacent frequency bins (p < 0.05, one-sided permutation test, FDR corrected for the number of frequency bins). (B) Same as A for Experiment 2 (structure on voice content). (C) The first two rows show the topographies for the difference in entrainment between the Structured and Random streams at 4 Hz and 2 Hz for both experiments. Markers indicate the electrodes showing stronger entrainment during the Structured stream (cross: p < 0.05, one-sided paired permutation test, FDR corrected for the number of electrodes; dot: p < 0.05, without FDR correction). The bottom row shows the interaction effect, obtained by comparing the difference in entrainment between the Structured and Random streams across Experiments 1 and 2. Markers indicate significant differences (cross: p < 0.05, two-sided unpaired permutation test, FDR corrected for the number of electrodes; dot: p < 0.05, without FDR correction). (D) Time course of the neural entrainment at 4 Hz, averaged over the electrodes showing significant entrainment during the Random stream, and at 2 Hz, averaged over the electrodes showing significant entrainment during the Structured stream (Phoneme: green line; Voice: blue line). The shaded areas represent standard errors. The horizontal lines at the bottom indicate when the entrainment was larger than 0 (p < 0.05, one-sided t-test, FDR corrected for the number of time points).
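As an illustration of what an SNR of the ITC relative to adjacent frequency bins can look like, the sketch below computes the ITC across epochs and normalises each bin by its neighbours. The caption does not give the formula, the number of neighbouring bins, or the epoch length, so the normalisation (difference from the neighbours' mean divided by their standard deviation), the parameters `n_side` and `skip`, and the simulated data are all assumptions made for this example.

```python
import numpy as np

def inter_trial_coherence(epochs):
    """ITC per frequency bin: length of the mean unit phase vector across epochs.

    epochs: array of shape (n_epochs, n_samples) for one electrode.
    """
    spectra = np.fft.rfft(epochs, axis=1)
    unit_vectors = spectra / np.maximum(np.abs(spectra), 1e-12)
    return np.abs(unit_vectors.mean(axis=0))

def snr_vs_neighbours(itc, n_side=4, skip=1):
    """(value - mean of neighbours) / std of neighbours, per frequency bin.

    n_side neighbours are taken on each side, skipping `skip` bins next to the
    target bin. Both values are illustrative assumptions.
    """
    snr = np.full(itc.shape, np.nan)
    for k in range(n_side + skip, len(itc) - n_side - skip):
        left = itc[k - skip - n_side : k - skip]
        right = itc[k + skip + 1 : k + skip + 1 + n_side]
        neighbours = np.concatenate([left, right])
        snr[k] = (itc[k] - neighbours.mean()) / neighbours.std(ddof=1)
    return snr

# Simulated data: 30 epochs of 4 s at 100 Hz with a phase-locked 2 Hz component.
rng = np.random.default_rng(0)
fs, duration, n_epochs = 100, 4.0, 30
t = np.arange(int(fs * duration)) / fs
epochs = np.sin(2 * np.pi * 2.0 * t) + rng.normal(0.0, 1.0, size=(n_epochs, t.size))

freqs = np.fft.rfftfreq(t.size, d=1 / fs)
itc = inter_trial_coherence(epochs)
snr = snr_vs_neighbours(itc)
k_duplet = int(np.argmin(np.abs(freqs - 2.0)))  # bin closest to the duplet rate
print(freqs[k_duplet], itc[k_duplet], snr[k_duplet])
```

With a frequency resolution of 0.25 Hz, as in this simulation, both 2 Hz and 4 Hz fall exactly on a bin; a real analysis would need epoch lengths chosen so that the rates of interest do the same.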

Cluster-based permutation analysis of ERPs to isolated duplets during recall.

The topographies show the difference between the two conditions corresponding to each main effect. Results obtained from the cluster-based permutation analyses are shown at the bottom of each panel. Thick lines correspond to the grand averages for the two main tested conditions. Shaded areas around the lines correspond to the standard error across participants. Thin lines show the ERPs separated by duplet type and familiarisation type. The shaded areas between the thick lines show the temporal extent of the cluster. The topographies correspond to the difference between conditions over the temporal extent of the cluster. The electrodes belonging to the cluster are marked with a cross. Significant clusters are indicated with an asterisk. Colour scale limits [-0.07, 0.07] a.u. (A) Main effect of test duplets (Words - Partwords) over a right frontal positive cluster (p = 0.019) and a left temporal negative cluster (p = 0.0056). (B) Main effect of familiarisation (Phonemes - Voices) over a posterior negative cluster (p = 0.018). The frontal positive cluster did not reach significance (p = 0.12). Results are highly comparable to the ROI-based analysis presented in SI (Fig. S3).
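The logic of a cluster-based permutation test can be sketched in a reduced form. The example below is not the analysis reported here: it handles a single channel and clusters over time only, whereas the clusters in the figure span electrodes and time. It assumes a within-participant contrast (such as Words minus Partwords), a cluster-forming threshold of |t| > 2, a cluster-mass statistic, and random sign flips of each participant's difference wave. All names, parameters, and data are hypothetical.

```python
import numpy as np

def paired_t(diff):
    """One-sample t-value across participants at each time point."""
    n = diff.shape[0]
    return diff.mean(axis=0) / (diff.std(axis=0, ddof=1) / np.sqrt(n))

def max_cluster_mass(t_values, threshold):
    """Largest summed |t| over runs of consecutive same-sign supra-threshold points."""
    best = 0.0
    for sign in (1, -1):
        mass = 0.0
        for t in sign * t_values:
            mass = mass + t if t > threshold else 0.0
            best = max(best, mass)
    return best

def cluster_permutation_p(diff, threshold=2.0, n_perm=500, seed=0):
    """p-value of the largest observed cluster against a sign-flip null distribution."""
    rng = np.random.default_rng(seed)
    observed = max_cluster_mass(paired_t(diff), threshold)
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Flip the sign of each participant's whole difference wave at random.
        signs = rng.choice([-1.0, 1.0], size=(diff.shape[0], 1))
        null[i] = max_cluster_mass(paired_t(diff * signs), threshold)
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value

# Simulated difference waves: 20 participants, 100 time points, effect at samples 40-59.
rng = np.random.default_rng(1)
diff = rng.normal(0.0, 1.0, size=(20, 100))
diff[:, 40:60] += 1.0

observed_mass, p_value = cluster_permutation_p(diff)
print(observed_mass, p_value)
```

Because only the largest cluster of each permutation enters the null distribution, the test controls the family-wise error rate over time points without a separate correction per sample.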

Adults’ behavioural experiment.

Each subject’s average score attributed to the Words (blue) and Partwords (orange) is shown. The right panel shows the group familiarised with the Phoneme structure, and the left panel the group familiarised with the Voice structure. The difference between test duplets was significant for the Phoneme group (p = 0.007) and only marginally significant for the Voice group (p = 0.050).