Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.
Read more about eLife’s peer review process.
Editors
- Reviewing Editor: Björn Herrmann, Baycrest Hospital, Toronto, Canada
- Senior Editor: Barbara Shinn-Cunningham, Carnegie Mellon University, Pittsburgh, United States of America
Reviewer #1 (Public review):
Summary:
Parsing speech into meaningful linguistic units is a fundamental yet challenging task that infants face while acquiring their native language. Computing transitional probabilities (TPs) between syllables is a segmentation cue well attested from birth. In this research, the authors examine whether newborns compute TPs over any available speech feature (linguistic and non-linguistic), or whether, by contrast, newborns favor computing TPs over linguistic content rather than over non-linguistic speech features such as the speaker's voice. Using EEG and the artificial language learning paradigm, they record the neural responses of two groups of newborns presented with speech streams in which either the phonetic content or the speaker's voice is structured so that its TPs are informative of word boundaries, while the other dimension varies randomly. They compare newborns' neural responses to these structured streams with their processing of a stream in which both dimensions vary randomly. After the random and structured familiarization streams, the newborns are presented with (pseudo)words as defined by the informative TPs, as well as with part-words (that is, sequences that straddle a word boundary), extracted from the same streams. Analysis of the neural responses shows that while newborns' neural activity entrained to the syllabic rate (4 Hz) when listening to both the random and the structured streams, it additionally entrained at the word rate (2 Hz) only when listening to the structured streams, with no differential response between the streams structured around voice and phonetic information. Newborns also showed different neural activity in response to words and part-words. In sum, the study reveals that newborns compute TPs over linguistic and non-linguistic features of speech, that these are computed independently, and that linguistic features do not lead to a processing advantage.
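For readers less familiar with the frequency-tagging logic summarized above, the sketch below illustrates how entrainment at the syllable and word rates can be quantified from the EEG spectrum. It is a minimal illustration, not the authors' analysis pipeline: the sampling rate, recording length, and channel count are arbitrary assumptions, and only the 4 Hz syllable rate and 2 Hz word (duplet) rate are taken from the paradigm as described.

```python
import numpy as np

# Minimal frequency-tagging sketch (not the authors' pipeline).
# Learning of disyllabic words should add spectral power at the word rate (2 Hz)
# on top of the syllable rate (4 Hz) that is present in any stream.
fs = 250                      # assumed EEG sampling rate (Hz)
syllable_rate, word_rate = 4.0, 2.0

def power_at(eeg, fs, freq):
    """Spectral power at the frequency bin closest to `freq`, averaged over
    channels. `eeg` has shape (channels, samples)."""
    n_samples = eeg.shape[1]
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spectrum = np.abs(np.fft.rfft(eeg, axis=1)) ** 2
    return spectrum[:, np.argmin(np.abs(freqs - freq))].mean()

# Placeholder data: 8 channels, 60 s of noise, just to make the sketch runnable.
eeg = np.random.randn(8, fs * 60)
print(power_at(eeg, fs, syllable_rate), power_at(eeg, fs, word_rate))
```

In practice, power (or inter-trial coherence) at the target bin is usually compared with that of neighbouring frequency bins rather than used in raw form, but the underlying logic is the one sketched here.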
Strengths:
This interesting research furthers our knowledge of the scope of the statistical learning mechanism, which is confirmed to be a powerful general-purpose tool that allows humans to extract patterns of co-occurring events, while revealing no apparent preferential processing of linguistic features. To answer its question, the study combines a well-established and highly replicated paradigm, i.e., an artificial language in which pseudowords are concatenated so that the TPs are informative of word boundaries, with a state-of-the-art EEG analysis, i.e., neural entrainment. The sample size of the groups is sufficient to ensure power, and the design and analysis are solid and have been successfully employed before.
Weaknesses:
There are no significant weaknesses to note in the manuscript. However, in order to fully conclude that there is no obvious advantage for the linguistic dimension in neonates, it would have been most useful to test a third condition in which the two dimensions were pitted against each other, that is, in which they provide conflicting information about the boundaries of the words that make up the artificial language. This condition would have made it possible to determine whether statistical learning weighs linguistic and non-linguistic features equally, or whether phonetic content is preferentially processed.
To sum up, the authors achieved their central aim of determining whether TPs are computed over both linguistic and non-linguistic features, and their conclusions are supported by the results. This research is important for researchers working on language and cognitive development, and language processing, as well as for those working on cross-species comparative approaches.
Reviewer #2 (Public review):
Summary:
The manuscript investigates to what degree neonates show evidence of statistical learning from regularities in streams of syllables, defined either over phonemes or over speaker identity. Using EEG, the authors found evidence for both: stronger entrainment to regularities as well as ERP differences in response to violations of previously introduced regularities. In addition, violations of phoneme regularities elicited an ERP pattern which the authors argue might index a precursor of the N400 response seen in older children and adults.
Strengths:
All in all, this is a very convincing paper, which uses a clever manipulation of syllable streams to target the processing of different features. The combination of neural entrainment and ERP analyses allows for the assessment of different processing stages, and implementing this paradigm in a comparatively large sample of neonates is impressive. I have only a few minor comments.
Weaknesses:
I am skeptical regarding the interpretation of the phoneme-specific ERP effect as a precursor of the N400 and would suggest toning it down. While the authors are correct that infant ERP components are typically slower and more posterior than adult components, and the observed pattern is hence consistent with an adult N400, it could at the same time be a lot of other things. On a functional level, I cannot follow the authors' argument as to why a violation of phoneme regularity should elicit an N400, since there is no evidence that any semantic processing is involved. In sum, I think there is just not enough evidence from the present paradigm to confidently call it an N400.
Why did the authors choose to include male and female voices? While using both female and male stimuli of course leads to higher generalizability, it also introduces a second dimension for one feature that is not present for the other (i.e., phonemes in Experiment 1 vs. voice identity plus gender in Experiment 2). Hence, couldn't it also be that the infants extracted the regularity with which one gender of voice followed the other? For instance, in List B, within the words one gender is always followed by the other (M-F or F-M), while in 2/3 of the part-words the gender is repeated (F-F and M-M). Wouldn't you expect the same pattern of results if infants learned regularities based on gender rather than identity?
Do you have any idea why the duplet entrainment effect occurs over the electrodes it does, in particular over the occipital electrodes? This seems a bit unintuitive given that this is a purely auditory experiment with sleeping neonates.
Reviewer #3 (Public review):
Summary:
This study tests whether statistical learning (a mechanism for parsing the speech signal into smaller chunks) preferentially operates over certain features of speech at birth in humans. The features under investigation are phonetic content and speaker identity. Newborns are tested in an EEG paradigm in which they are exposed to a long stream of syllables. In Experiment 1, newborns are familiarized with a sound stream that contains regularities (transitional probabilities) over syllables (e.g., "pe" followed by "tu" in "petu" with a probability of 1.0) while the voices uttering the syllables vary randomly. In Experiment 2, newborns are familiarized with an analogous sound stream but, this time, the regularities are built over voices (e.g., "green voice" followed by "red voice" with a probability of 1.0) while the concatenation of syllables remains random. At test, all newborns listened to duplets (individual chunks) that either matched or violated the structure of the familiarization. In both experiments, newborns showed neural entrainment to the regularities implemented in the stream, but only the duplets defined by transitional probabilities over syllables (i.e., word forms) elicited an N400 ERP component. These results suggest that statistical learning operates in parallel and independently on different dimensions of speech already at birth, and that there seems to be an advantage for processing statistics defining word forms rather than voice patterns.
Strengths:
This paper presents an original experimental design that combines two types of statistical regularities in the speech input. The design is robust and appropriate for EEG with newborns. I appreciated the clarity of the Methods section. There is also a behavioral experiment with adults that acts as a control study for the newborn experiments. The research question is interesting, and the results add new information about how statistical learning works at the beginning of postnatal life and over which features of speech it operates. The figures are clear and helpful for understanding the methods, especially the stimuli and how the regularities were implemented.
Weaknesses:
(1) I'm having a hard time understanding the link between the results of the study and the universality of statistical learning. The main goal of the study was to test whether statistical learning is a general mechanism for newborns that operates on any speech dimension, or whether it operates over linguistic features only. To test this, statistical regularities (TPs) were built over syllables (e.g., "pe" followed by "tu" in "petu" with a probability of 1.0) or over voices (e.g., green voice followed by red voice with a probability of 1.0). Voices were considered the non-linguistic dimension.
While it is true that voice is not essential for language (e.g., sign languages are implemented over gestures, and voices can produce non-linguistic sounds such as laughter), it is a feature of spoken languages. Thus, I'm not sure we can really consider this study a comparison between linguistic and non-linguistic dimensions. In turn, I'm not sure that these results show that statistical learning at birth operates on non-linguistic features, given that voice is a linguistic dimension, at least in spoken languages. I'd like to hear the authors' opinion on this.
Along the same lines, in the Discussion section, the present results are interpreted within a theoretical framework in which statistical learning is shown in non-linguistic auditory domains (strings of tones, music) and in the visual domain, as well as in other animal species. I'm not sure that this theoretical framework is the right fit for the present results.
(2) I'm not sure whether the fact that we see parallel and independent tracking of statistics along two dimensions of speech at birth indicates that newborns would be able to do so along all the other dimensions of speech. If so, what other dimensions do the authors have in mind?
(3) Lines 341-345: Statistical learning is an evolutionarily ancient learning mechanism, but I do not think that the present results show this. This is a study of human neonates and adults; no other animal species are involved, so I do not see a connection with the evolutionary history of statistical learning. It would be much more interesting to make claims about the ontogeny (rather than the phylogeny) of statistical learning and about which regularities newborns are able to detect right after birth. I believe that this is one of the strengths of this work.
(4) The description of the stimuli in Lines 110-113 is a bit confusing. In Experiment 1, e.g., "pe" and "tu" are both uttered by the same voice, correct? ("random voice each time" is confusing). Whereas in Experiment 2, "pe" and "tu" are uttered by different voices, for example, "pe" by the yellow voice and "tu" by the red voice. If this is correct, then I recommend that the authors rephrase this section to make it clearer.
(5) Line 114: the sentence "they should compute a 36 x 36 TPs matrix relating each acoustic signal, with TPs alternating between 1/6 within words and 1/12 between words" is confusing, as it seems to imply that there are different acoustic signals. Can the authors clarify this point?
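One possible reading of those numbers (my own reconstruction under stated assumptions, not the authors' wording): if listeners track the 36 syllable-by-voice tokens rather than syllables and voices separately, then each token-to-token TP is the product of a syllable TP and the uniform (1/6) probability of the next voice. Assuming within-word syllable TPs of 1 and between-word syllable TPs of 1/2, this yields the quoted 1/6 and 1/12, as in the short sketch below.

```python
# Reconstruction of the quoted values under the assumptions stated above
# (6 syllables x 6 voices = 36 tokens; voices drawn uniformly at random).
within_word_syllable_tp = 1.0     # next syllable fully predictable inside a word
between_word_syllable_tp = 0.5    # assumed: one of two other words can follow
voice_prob = 1.0 / 6              # any of the 6 voices is equally likely

print(within_word_syllable_tp * voice_prob)    # 0.1666... ~ 1/6 within words
print(between_word_syllable_tp * voice_prob)   # 0.0833... ~ 1/12 between words
```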