CMR-replay.
a. Consider the task of encoding a sequence of four items, each denoted by a shade of blue.

b. We propose a model of replay that builds on the context-maintenance and retrieval model (CMR), which we refer to as CMR-replay. The model consists of four components: experience ($f$), context ($c$), item-to-context associations ($M^{fc}$), and context-to-item associations ($M^{cf}$). At each timestep during awake encoding, $f$ represents the current item and $c$ is a recency-weighted average of the associated contexts of past and present items. CMR-replay associates $f$ and $c$ at each timestep, updating $M^{fc}$ and $M^{cf}$ according to a Hebbian learning rule. $M^{fc}$ and $M^{cf}$ support the retrieval of an item's associated context and of a context's associated items, respectively. During replay, $f$ represents the current reactivated item and $c$ is a drifting context representing a recency-weighted average of the associated contexts of past and present reactivated items. Here, too, the model updates $M^{fc}$ and $M^{cf}$ to associate the reactivated $f$ and $c$. The figure illustrates the representations of $f$, $c$, $M^{fc}$, and $M^{cf}$ as the model encodes the third item during learning. The lengths of the color bars in $f$ and $c$ represent the relative magnitudes of different features. Shades of grey illustrate the weights in $M^{fc}$ and $M^{cf}$. Orange features represent task-irrelevant items, which do not appear as inputs during awake encoding but compete with task-relevant items for reactivation during replay.

c. During both awake encoding and replay, context $c$ drifts by incorporating the current item $f$'s associated context $c^{f}$ and downweighting the associated contexts of previous items (the drift equation is sketched below the caption). The figure illustrates how context drifts the first time the model encodes the example sequence.

d. The figure illustrates the $M^{fc}$ and $M^{cf}$ updates as the model encodes the third item during the first presentation of the sequence (the update rule is sketched below the caption).

e. Consider the activation of items at the onset of sleep and awake rest across sessions of learning. At replay onset, an initial probability distribution across experiences, $a_0$, varies according to the behavioral state (i.e., awake rest or sleep). Compared to sleep, $a_0$ during awake rest is strongly biased toward features associated with external inputs. For awake rest, the figure shows an example of $a_0$ when the model receives a context cue related to the fourth item. Through repeated exposure to the same task sequence across sessions of learning, the activities of the four task-related items (i.e., blue items) become suppressed in $a_0$ relative to task-irrelevant items (i.e., orange items).

f. Each replay period begins by sampling an experience $f_{t=0}$ according to $a_0$, where $t$ denotes the current timestep. If $f_{t=0}$ is a task-related item, its associated context $c^{f_{t=0}}$ is reinstated as $c_0$, initiating a recursive process. During this process, at each timestep $t \geq 1$, $c_{t-1}$ evokes a probability distribution $a_t$ that excludes previously reactivated experiences. Given $a_t$, the model samples an experience $f_t$ and reinstates $f_t$'s associated context $c^{f_t}$, which is combined with $c_{t-1}$ to form a new context $c_t$ that guides the ensuing reactivation. The dashed arrow indicates that $c_t$ becomes $c_{t-1}$ for the next timestep. At any $t$, the replay period ends with probability 0.1, or upon the reactivation of a task-irrelevant item (a minimal sketch of this sampling loop follows the caption).
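The drift described in panel c can be written compactly. The form below is a sketch following the standard CMR context-updating equation; the drift rate $\beta$ and the normalization factor $\rho_t$ are assumed symbols, since the caption itself does not name the model's parameters.

```latex
% Context drift (panel c), following the standard CMR form.
% \beta (drift rate) and \rho_t (normalization factor) are assumed symbols.
c_t = \rho_t \, c_{t-1} + \beta \, c^{f_t},
\qquad \rho_t \text{ chosen so that } \lVert c_t \rVert = 1 .
```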
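Likewise, the Hebbian updates illustrated in panels b and d have a natural outer-product form. The version below is a sketch in CMR's conventions; the learning rates $\gamma^{fc}$ and $\gamma^{cf}$, and the exact context state entering the update, are assumptions not fixed by the caption.

```latex
% Hebbian association of the current f and c (panels b and d).
% \gamma^{fc} and \gamma^{cf} (learning rates) are assumed symbols.
\Delta M^{fc} = \gamma^{fc} \, c_t \, f_t^{\top},
\qquad
\Delta M^{cf} = \gamma^{cf} \, f_t \, c_t^{\top} .
```

With these updates, multiplying $M^{fc}$ by an item vector retrieves that item's associated context, and multiplying $M^{cf}$ by a context vector retrieves the items associated with it, matching the roles described in panel b.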
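The recursive sampling process of panel f can be summarized as a short loop. The sketch below is a minimal Python illustration, not the paper's implementation: it assumes one-hot item vectors, combines $c^{f_t}$ with $c_{t-1}$ as a normalized mixture with an assumed drift rate `beta`, converts context-evoked activations into $a_t$ by simple clipping and renormalization, and adds an illustrative `max_steps` safeguard that the caption does not mention.

```python
import numpy as np

rng = np.random.default_rng(0)

def replay_period(a0, M_fc, M_cf, task_related,
                  beta=0.5, p_stop=0.1, max_steps=50):
    """One replay period (panel f) -- a minimal sketch, not the paper's code.

    a0           -- initial probability distribution over experiences (panel e)
    M_fc, M_cf   -- item-to-context / context-to-item association matrices
    task_related -- boolean mask marking the task-related (blue) items
    beta and max_steps are illustrative assumptions.
    """
    n = len(a0)
    # Sample the initial experience f_{t=0} from a0.
    f = rng.choice(n, p=a0)
    sequence = [f]
    if not task_related[f]:
        return sequence                      # task-irrelevant onset: no recursion
    # Reinstate f_{t=0}'s associated context c^{f_{t=0}} as c_0.
    c = M_fc @ np.eye(n)[f]
    c /= max(np.linalg.norm(c), 1e-12)       # guard against an all-zero column
    for _ in range(max_steps):
        if rng.random() < p_stop:            # replay ends with probability 0.1
            break
        # c_{t-1} evokes a distribution a_t, excluding reactivated experiences.
        a = np.clip(M_cf @ c, 0.0, None)
        a[sequence] = 0.0
        if a.sum() == 0:
            break
        a /= a.sum()
        f = rng.choice(n, p=a)               # sample f_t from a_t
        sequence.append(f)
        if not task_related[f]:
            break                            # task-irrelevant item ends replay
        # Reinstate c^{f_t} and blend it with c_{t-1} to form c_t.
        c_in = M_fc @ np.eye(n)[f]
        c_in /= max(np.linalg.norm(c_in), 1e-12)
        c = (1.0 - beta) * c + beta * c_in
        c /= np.linalg.norm(c)
    return sequence
```

Run with any association matrices (e.g., ones learned by the Hebbian rule sketched above), this loop reproduces the caption's control flow: onset sampling from $a_0$, context reinstatement, exclusion of already-reactivated experiences, and termination either stochastically or upon a task-irrelevant reactivation.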