Experimental design and behavioral performance

A. Skill learning task. Participants engaged in a procedural motor skill learning task that required them to repeatedly type a keypress sequence, 4-1-3-2-4 (1 = little finger, 2 = ring finger, 3 = middle finger, and 4 = index finger), with their non-dominant left hand. The Day 1 Training session included 36 trials, with each trial consisting of alternating 10 s practice and rest intervals. After a 24-hour break, participants were retested on performance of the same sequence (4-1-3-2-4) for 9 trials (Day 2 Retest) as well as single-trial performance on 9 different sequences (Day 2 Control; 2-1-3-4-2, 4-2-4-3-1, 3-4-2-3-1, 1-4-3-4-2, 3-2-4-3-1, 1-4-2-3-1, 3-2-4-2-1, 3-2-1-4-2, and 4-2-3-1-4). Magnetoencephalography (MEG) was recorded during both Day 1 and Day 2 sessions with a 275-channel CTF system (CTF Systems, Inc., Canada). B. Skill Learning. As reported previously [1], participants on average reached 95% of peak performance by trial 11 of the Day 1 Training session (see Figure 1 - figure supplement 1A for results over all Day 1 Training and Day 2 Retest trials). At the group level, total early learning was exclusively accounted for by micro-offline gains during inter-practice rest intervals (Figure 1B inset). C. Keypress transition time (KTT) variability. Distribution of KTTs normalized to the median correct sequence time for each participant and centered on the mid-point for each full sequence iteration during early learning (see Figure 1 - figure supplement 1B for results over all Day 1 Training and Day 2 Retest trials). Note the initial variability of the relative KTT composition of the sequence (i.e., 4-1, 1-3, 3-2, 2-4, 4-4) before it stabilizes by trial 6 in the early learning period.

Spatial and oscillatory contributions to neural decoding of finger identities

A. Contribution of whole-brain oscillatory frequencies to decoding. Decoding accuracy (i.e., test sample performance) was highest when decoders were trained on broadband activity rather than narrow frequency band features, for both whole-brain voxel-space (74.51% ± SD 7.34%, t = 8.08, p < 0.001) and parcel-space (70.11% ± SD 7.11%, t = 13.22, p < 0.001) MEG activity. Thus, decoders trained on whole-brain broadband data consistently outperformed those trained on narrowband activity. Dots depict decoding accuracy for each participant. *p < 0.05, **p < 0.01, ***p < 0.001, ns: not significant. B. Contribution of intra-parcel brain oscillatory frequencies to decoding. Performance of the top-performing decile of intra-parcel decoders (see Methods, Figure 2 - figure supplement 1) over different frequency bands (top panel; decoding accuracy is color-coded and listed for each parcel and frequency band assessed). Note that broadband activity resulted in the best accuracy, closely followed by delta band activity. Broadband intra-parcel decoding accuracy mapped to a standard brain surface (FreeSurfer fsaverage brain) and color-coded by accuracy (bottom panel). Note that bilateral superior frontal cortex (yellow, 69%) contributed the most to decoding accuracy, followed by middle frontal, pre- and post-central regions (> 60%).

Hybrid spatial approach for neural decoding during skill learning

A. Pipeline. Sensor-space MEG data (N = 272 channels) were source-localized (voxel-space features; N = 15684 voxels), and then parcellated (parcel-space features; N = 148) by averaging the activity of all voxels located within an individual region defined in a standard template space (Desikan-Killiany Atlas). Individual voxel-space regional decoders (intra-parcel decoders) were then constructed and ranked. The final hybrid-space keypress state (i.e., 4-class) decoder was constructed using all parcel-space features and top-ranked intra-parcel voxel input features (see Methods). B. Decoding performance across parcel, voxel, and hybrid spaces. Note that decoding performance was highest for the hybrid-space approach compared to performance obtained for whole-brain voxel and parcel spaces. Addition of linear discriminant analysis (LDA)-based dimensionality reduction further improved decoding performance for both parcel- and hybrid-space approaches. Each dot represents accuracy for a single participant and method. “∗∗∗” indicates p < 0.001 and “∗” indicates p < 0.05. C. Confusion matrix of individual finger identity decoding for hybrid-space manifold features. True predictions are located on the main diagonal. Off-diagonal elements in each row depict false-negative predictions for each finger, while off-diagonal elements in each column indicate false-positive predictions. Note that the index finger keypress had the highest number of misclassifications (n = 141, or 47.5% of all prediction errors).
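
The pipeline described in panel A can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`parcellate`, `hybrid_features`), the toy data, and the pre-computed intra-parcel accuracies are all hypothetical, and intra-parcel decoders are assumed to be ranked by their decoding accuracy.

```python
from statistics import fmean

def parcellate(voxel_data, voxel_to_parcel):
    """Average the activity of all voxels that belong to each parcel.

    voxel_data:      dict voxel_id -> list of samples (equal length for all voxels)
    voxel_to_parcel: dict voxel_id -> parcel name
    Returns a dict parcel name -> list of samples (mean across member voxels).
    """
    members = {}
    for voxel, parcel in voxel_to_parcel.items():
        members.setdefault(parcel, []).append(voxel_data[voxel])
    return {
        parcel: [fmean(samples) for samples in zip(*series)]
        for parcel, series in members.items()
    }

def hybrid_features(voxel_data, voxel_to_parcel, parcel_accuracy, top_k):
    """Combine all parcel-space features with the voxels of the top-k parcels.

    parcel_accuracy: dict parcel name -> intra-parcel decoder accuracy
                     (assumed ranking criterion, supplied by the caller).
    """
    parcels = parcellate(voxel_data, voxel_to_parcel)
    ranked = sorted(parcel_accuracy, key=parcel_accuracy.get, reverse=True)
    top = set(ranked[:top_k])
    features = {f"parcel:{name}": series for name, series in parcels.items()}
    for voxel, parcel in voxel_to_parcel.items():
        if parcel in top:
            features[f"voxel:{voxel}"] = voxel_data[voxel]
    return features

# Toy example: four voxels, two parcels, two time samples each (invented values).
voxel_data = {0: [1.0, 2.0], 1: [3.0, 4.0], 2: [5.0, 6.0], 3: [7.0, 8.0]}
voxel_to_parcel = {0: "A", 1: "A", 2: "B", 3: "B"}
parcel_accuracy = {"A": 0.69, "B": 0.55}  # hypothetical intra-parcel accuracies

features = hybrid_features(voxel_data, voxel_to_parcel, parcel_accuracy, top_k=1)
print(sorted(features))
```

In this toy case the hybrid feature set contains both parcel averages plus the two voxels of the top-ranked parcel, which mirrors the idea of keeping coarse whole-brain coverage while adding fine spatial detail only where it is most informative.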

Evolution of Keypress Neural Representations with Skill Learning

A. Keypress neural representations differentiate during early learning. t-SNE distribution of the neural representation of each keypress (top scatter plots) is shown for trial 1 (start of training; top-left), 11 (end of early learning; top-center), and 36 (end of training; top-right) for a single representative participant. Individual keypress manifold representation clustering in trial 11 (top-center; end of early learning) depicts sub-clustering for the index finger keypress performed at the two different ordinal positions in the sequence (IndexOP1 and IndexOP5), which remains present by trial 36 (top-right). Spatial distribution of regional contributions to decoding (bottom brain surface maps). The surface color heatmap indicates feature importance scores across the brain. Note that decoding contributions shifted from right pre-central cortex at trial 1 (bottom-left) to superior and middle frontal cortex at trials 11 (bottom-center) and 36 (bottom-right). B. Confusion matrix for 5-class decoding of individual sequence items. Decoders were trained to classify contextual representations of the keypresses (i.e., 5-class classification of the sequence elements 4-1-3-2-4). Note that the decoding accuracy increased to 94.15% ± SD 4.84% and the number of keypress 4 misclassifications was significantly reduced (from 141 to 82). C. Trial-by-trial classification accuracy for the 2-class decoder (IndexOP1 vs. IndexOP5). A decoder trained to differentiate between the two index finger keypresses embedded at different positions (IndexOP1 at ordinal position 1 vs. IndexOP5 at ordinal position 5) within the trained sequence becomes progressively more accurate over early learning, stabilizing around 96% by trial 12 (end of early learning). Taken together, these findings indicate that the neural feature space evolves over early learning to incorporate sequence location information.

Neural representation distance between index finger keypresses performed at two different ordinal positions within a sequence

A. Contextualization increases over Early Learning during Day 1 Training. Online (Practice; green line) and offline (Rest; magenta line) neural representation distances between the two index finger keypresses performed at ordinal positions 1 and 5 of the trained sequence (4-1-3-2-4) are shown for each trial during Day 1 Training. Both online and offline distances between the two index finger representations increase sharply over Early Learning before stabilizing across later Day 1 Training trials. B. Contextualization primarily occurs offline during short inter-practice rest periods. The neural representation difference was significantly greater when assessed during offline (right distribution; purple) versus online (left distribution; green) periods (t = 4.84, p < 0.001). C. Contextualization was retained after 24 hours and was specific to the trained sequence. The neural representation differences assessed across both rest and practice for the trained sequence (4-1-3-2-4) were retained for Day 2 Retest. Further, contextualization was significantly reduced for several untrained sequences controlling for: 1) index finger keypresses located at the same ordinal positions 1 and 5 but with a different intervening sequence pattern (Pattern Specificity Control: 4-2-3-1-4); 2) both ordinal position 1 and 5 keypresses performed with either the little or ring finger instead of the index finger (Finger Specificity Control: 2-1-3-4-2, 1-4-2-3-1 and 2-3-1-4-2); and 3) multiple index finger keypresses occurring at ordinal positions other than 1 and 5 (Position Specificity Control: 4-2-4-3-1 and 1-4-3-4-2). The mean online neural representation distance, or degree of contextualization, was substantially lower for the untrained control sequences (51.05% lower for the Pattern Specificity Control sequence, 35.80% lower for the Finger Specificity Control sequences, and 22.06% lower for the Position Specificity Control sequences) compared with the trained sequence.
Note that offline contextualization cannot be measured for the Day 2 Control sequences as each sequence was only performed over a single trial.

Behavioral performance during skill learning

A. Total Skill Learning over Day 1 Training (36 trials) and Day 2 Retest (9 trials). As reported previously [1], participants on average reached 95% of peak performance during Day 1 Training by trial 11. Note that after trial 11 performance stabilizes around a plateau through trial 36. Following a 24-hour break, participants displayed an upward shift in performance during the Day 2 Retest, indicative of an overnight skill consolidation effect. B. Keypress transition time (KTT) variability. Distribution of KTTs normalized to the median correct sequence time for each participant and centered on the mid-point for each full sequence iteration during early learning. Note that the initial variability of the five component transitions in the sequence (i.e., 4-1, 1-3, 3-2, 2-4, 4-4) stabilizes by trial 6 in the early learning period and remains stable throughout the rest of Day 1 Training (through trial 36) and Day 2 Retest.

Oscillatory contributions at individual brain regions

Decoding performance of each individual brain region (intra-parcel decoder performance) in each frequency band is shown for both the left and right hemispheres as a heatmap. The best decoding performance was obtained from bilateral superior frontal (Left: 68.77% ± SD 7.6%; Right: 67.52% ± SD 6.78%), middle frontal (Left: 63.41% ± SD 7.58%; Right: 62.78% ± SD 76.94%), pre-central (Left: 62.37% ± SD 6.32%; Right: 62.69% ± SD 5.94%), and post-central (Left: 61.71% ± SD 6.62%; Right: 61.09% ± SD 6.2%) brain regions. Superior parietal, central, paracentral, anterior-cingulate, and precuneus regions also showed high (> 60%) decoding performance. For the delta band, only superior frontal regions showed > 60% decoding performance.

Contribution of whole-brain oscillatory frequencies to decoding

Accuracy was highest when decoders were trained on broadband activity, closely followed by delta band activity, across whole-brain parcel, voxel, and hybrid spaces. Sensor-space decoder accuracy did not differ statistically between broadband and delta-band activity. The hybrid approach resulted in the best decoding accuracy for each frequency range. Interestingly, accuracy with sensor-space decoders was comparable to accuracy with parcel- and voxel-space decoders. Dots depict decoding accuracy for each participant. “***” indicates p < 0.001, “**” indicates p < 0.01, and “n.s.” denotes no statistical significance (i.e., p > 0.05).

Comparison of different dimensionality reduction techniques

Dimensionality reduction was applied to the input features for each approach [parcel (N = 148)/voxel (N = 15684)/hybrid (N = 1295)] [16]. Results with principal component analysis (PCA, in green), multi-dimensional scaling (MDS, in blue), the minimum redundancy maximum relevance algorithm (MRMR, in red), and linear discriminant analysis (LDA, in black) are shown in comparison to performance obtained using all input features (in magenta). For parcel space, all of these approaches increased the mean decoding accuracy, with PCA and LDA showing statistically significant improvement [1-way ANOVA: F = 13.05, p < 0.001; post hoc Tukey tests: p = 0.032 (PCA), p < 0.001 (LDA), p > 0.05 (MDS, MRMR)]. For voxel space, none of the approaches yielded a statistically significant improvement (p > 0.05). MRMR showed the largest improvement, but it was not statistically significant in post hoc Tukey tests (p = 0.14). With LDA, performance dropped significantly. For hybrid space, all dimensionality reduction techniques significantly improved decoding performance [1-way ANOVA: F = 21.32; post hoc Tukey tests: p < 0.05], and the largest improvement was seen with LDA.

Confusion matrices for decoding performance on Day 2 Retest (A) and Day 2 Control (B) data

Note that the hybrid decoding strategy generalized to Day 2 data, with 87.11% keypress decoding accuracy for the trained sequence (Day 2 Retest) and 79.44% accuracy for decoding keypresses embedded within untrained control sequences (Day 2 Control).
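
As a reading aid for these confusion matrices, the sketch below shows how overall accuracy and per-class errors can be derived from a matrix whose rows are true classes and whose columns are predicted classes (the same row/column convention used for the Day 1 confusion matrix). The function name `summarize_confusion` and all counts are invented for illustration; they are not data from the study.

```python
def summarize_confusion(matrix):
    """Return (accuracy, false_negatives, false_positives) for a square matrix.

    matrix[i][j] = number of samples of true class i predicted as class j.
    False negatives for class i are the off-diagonal entries of row i;
    false positives for class j are the off-diagonal entries of column j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    false_neg = [sum(matrix[i]) - matrix[i][i] for i in range(n)]
    false_pos = [
        sum(matrix[r][c] for r in range(n)) - matrix[c][c] for c in range(n)
    ]
    return correct / total, false_neg, false_pos

# Toy 4-class matrix (keypresses 1-4); class 4 occurs twice as often because
# the index finger is pressed twice per sequence iteration.
toy = [
    [45, 2, 1, 2],
    [3, 44, 2, 1],
    [1, 2, 46, 1],
    [6, 3, 4, 87],
]

accuracy, false_neg, false_pos = summarize_confusion(toy)
print(accuracy, false_neg, false_pos)
```

Because every misclassified sample is simultaneously a false negative for its true class and a false positive for its predicted class, the two error lists always sum to the same total.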

Quantification of regional trial-by-trial feature importance score during skill learning

Regional feature importance in decoding is quantified for each trial for the regions that showed the highest decoding accuracy, i.e., superior frontal, middle frontal, pre-central, and post-central cortex. Note that the feature importance score was initially higher for pre-central cortex and shifted to middle frontal cortex during later trials, as can be seen from the divergence of the line plots around trial 11.

Average decoding accuracies across participants with varying temporal scales. The x-axis represents the onset of the window used for the decoding analysis with respect to keypress onset (0). The y-axis represents the window size. The heatmap color denotes the decoding accuracy for all window size/location pairings. Note that the best decoding accuracy across subjects was obtained with a window starting at 0 (i.e., keypress onset) and a window size of 200 ms.
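
A parameter search of this kind can be sketched as a simple grid over window onset and window size. The code below is only an illustration: `best_window` is a hypothetical helper, and `toy_score` is a contrived stand-in for a real decoder's cross-validated accuracy, shaped so that it peaks at onset 0 ms and size 200 ms purely to make the example runnable.

```python
def best_window(onsets_ms, sizes_ms, score_window):
    """Evaluate every (onset, size) pairing and return the best one.

    score_window: callable (onset_ms, size_ms) -> decoding accuracy.
    Returns ((best_onset, best_size), grid), where grid maps each pairing
    to its score (the values one would plot as a heatmap).
    """
    grid = {
        (onset, size): score_window(onset, size)
        for onset in onsets_ms
        for size in sizes_ms
    }
    best = max(grid, key=grid.get)
    return best, grid

def toy_score(onset_ms, size_ms):
    # Placeholder accuracy surface, NOT real decoding results.
    return 0.9 - abs(onset_ms) / 1000 - abs(size_ms - 200) / 1000

onsets = range(-100, 101, 50)   # window onset relative to keypress onset (ms)
sizes = range(50, 301, 50)      # window duration (ms)

best, grid = best_window(onsets, sizes, toy_score)
print(best, len(grid))
```

In a real analysis, `score_window` would extract features from the chosen window around each keypress, train the decoder, and return held-out accuracy averaged across participants.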

Relationship between offline contextualization of neural representations and micro-offline learning

Cumulative micro-offline gains, i.e., the net gain in skill during inter-practice rest periods, increased over time during early learning. The offline representation difference, i.e., the net change over the inter-practice rest intervals in the neural representations of keypress 4 at different ordinal positions in the sequence (context), also increased over time during early learning. A linear regression analysis showed a strong temporal relationship (correlation coefficient r = 0.9034; coefficient of determination R² = 0.82) between the amount of contextualization during rest and cumulative micro-offline gains during rest, suggesting that contextualization contributes to sequential skill learning.
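
For a simple linear regression with a single predictor, the coefficient of determination equals the square of the Pearson correlation coefficient, so the two reported values can be checked against each other. The sketch below does that arithmetic and includes a small, hypothetical `pearson_r` helper; the sample vectors are invented and serve only to exercise the function.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Consistency check on the reported values: r = 0.9034 implies R^2 = r^2.
reported_r = 0.9034
r_squared = reported_r ** 2
print(round(r_squared, 4))  # about 0.8161, which rounds to the reported 0.82

# Invented, perfectly linear toy vectors (not study data).
toy_contextualization = [1.0, 2.0, 3.0]
toy_cumulative_gains = [2.0, 4.0, 6.0]
toy_r = pearson_r(toy_contextualization, toy_cumulative_gains)
```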

Online versus offline changes in keypress transition patterns

A. Trial-by-trial Euclidean distance between keypress transition patterns (i.e., the relative share of each keypress transition across the full sequence duration) for the first and last sequence iteration within a single trial (online; green) and for the last sequence iteration of the current trial versus the first sequence iteration of the subsequent trial (offline; magenta). B. Cumulative online (green; left) and offline (magenta; right) pattern distances recorded over all forty-five trials covering Days 1 and 2. Note that cumulative online and offline distances are not significantly different (t = -0.03, p = 0.976).
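
The pattern distance in panel A can be illustrated with a short sketch. It assumes that a keypress transition pattern is the vector of the five transition times (4-1, 1-3, 3-2, 2-4, 4-4) each divided by their sum; the function names and timing values are hypothetical.

```python
from math import dist

def transition_pattern(ktts):
    """Relative share of each keypress transition in one sequence iteration."""
    total = sum(ktts)
    return [k / total for k in ktts]

def pattern_distance(ktts_a, ktts_b):
    """Euclidean distance between two keypress transition patterns."""
    return dist(transition_pattern(ktts_a), transition_pattern(ktts_b))

# Invented transition times in seconds for transitions 4-1, 1-3, 3-2, 2-4, 4-4.
first_iteration = [0.30, 0.20, 0.25, 0.15, 0.10]
faster_same_rhythm = [0.15, 0.10, 0.125, 0.075, 0.05]   # twice as fast, same shares
different_rhythm = [0.40, 0.20, 0.20, 0.20, 0.00]

same = pattern_distance(first_iteration, faster_same_rhythm)
changed = pattern_distance([0.2, 0.2, 0.2, 0.2, 0.2], different_rhythm)
print(same, changed)
```

Because each pattern is normalized by total sequence duration, a uniform speed-up leaves the distance at essentially zero; the measure only responds to changes in the relative timing structure of the sequence.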

Relationship between contextualization and absolute speed

A. Relationship between maximum speed and corresponding contextualization. Maximum typing speed and the degree of contextualization were related using a linear regression analysis, which showed no significant relationship between the two (R² = 0.028, p = 0.41). Each dot represents the maximum speed attained by a participant and the corresponding degree of contextualization. Thus, participants with higher typing speed did not show a higher degree of contextualization. B. Relationship between typing speed and degree of contextualization at each trial. We performed a regression analysis for each trial to relate the degree of contextualization at that trial to the typing speed at that trial. The violin plot represents the distribution of R² values obtained from the regression analyses; each dot represents a trial. Note that there was no significant relationship at any trial (mean R² = 0.06; p > 0.05).