Participants
Thirty-six participants were recruited from the New York University (NYU) Psychology Subject Pool and the local community. Sample size was estimated using a G*Power 3.1 power analysis (alpha = 0.05, power = 0.80, d = 0.80), based on the pooled temporal memory performance from a very similar event boundary experiment36. This power analysis yielded a required sample size of 29 participants. To account for attrition, we recruited 36 participants. Participants were required to have normal or corrected-to-normal vision and hearing, to have no metal in their body, and to not be taking beta-blockers or psychoactive medications. All participants provided written informed consent approved by the NYU Institutional Review Board and received monetary compensation for participation.
Four participants were fully excluded from analyses (reasons included falling asleep in the scanner and audio malfunction). Therefore, our final sample size was 32 participants (20 F; M age = 22 years, SD age = 2.7 years). Of this sample, 15 participants identified as Asian, 2 as Black/African American, 4 as more than one race, and 11 as White.
Of these 32 participants, five requested to leave the scanner early and thus did not complete all 10 blocks of the task (n = 2 completed 9 blocks, n = 1 completed 8 blocks, n = 2 completed 7 blocks). Finally, four participants’ eye-tracking data were excluded due to equipment malfunction or poor quality, leaving 28 participants with usable data for all analyses that included blink data.
Sex differences were not analyzed or included as covariates in any analyses, because they were not central to our predictions and the sample sizes within each group were relatively small.
Materials
Participants viewed 512 grayscale images of everyday objects62,63, which were resized to 300 × 300 pixels, placed on a gray background, and luminance-normalized using the MATLAB SHINE toolbox to control for the effect of brightness on pupil behavior. To embed these images within distinct auditory contexts, participants were presented with 1 s pure tones (created with Audacity; https://www.audacityteam.org/) in either the left or right ear. The six tone frequencies ranged from 500 to 1000 Hz in 100 Hz steps, were perceptually discriminable, and were sufficiently arousing to facilitate engagement in the task.
Code and data accessibility
The code, experiment materials, and data for this study are publicly available on the OSF account of E.M. (osf.io/yt6hm).
Overview of experimental design
This experiment took place over two days. On the first day, participants completed several questionnaires and a practice task block before completing the full event sequence task in the MRI scanner. The MRI session took approximately 3 h, with 2.5 h of scanning. On the second day, approximately 24 h later, participants returned to perform a surprise item recognition memory test (not described in the current paper).
Event sequence encoding task
While in the MRI scanner, participants completed a modified version of an event sequence encoding task that has been shown to elicit reliable event segmentation effects in memory, as indexed by significant time dilation in retrospective distance ratings36 (Fig. 1). Our goal was to use these boundary-induced subjective temporal distortions to operationalize whether memories had become separated at context shifts. That is, if item pairs spanning a boundary were remembered as appearing farther apart than item pairs from the same context—despite being the same objective distance apart during encoding—then they were more likely to have become encoded as distinct episodic memories.
In this encoding-retrieval block design, participants were presented with sequences of 32 object images for 2.5 s each. For each image, participants made a button press to respond to a simple orienting question (Is this object larger or smaller than a standard shoebox?). Additionally, participants were asked to form their own mental narrative of the image sequence to encourage associative encoding64. A jittered ISI (3, 5, or 7 s fixation cross) separated each image presentation to optimize deconvolution of the blood-oxygen level dependent (BOLD) signal during the fMRI analyses.
Each item sequence was divided into four auditory events that contained 8 images each. Events were defined by a stable auditory context, in which 1 s pure tones played in only the participant’s left or right ear, halfway through each jittered ISI. The laterality of the tone also cued participants to use either their left or right hand to make button responses for the object size judgments (left ear = left hand). Thus, the tones not only provided sensory information to elicit perception of stable mental events, but were also task-relevant. After the final image in an 8-item event, the tone abruptly switched sides to the opposite ear and changed pitch. Participants were also instructed to switch the hand used to make button responses. This salient change constituted an event boundary in the sequence, marking the end of one event and the beginning of the next. The new context then remained stable for the subsequent 8-item event.
Each item sequence contained three tone switches total, resulting in four stable auditory events. There were 10 encoding-retrieval blocks total across the experiment. The starting ear/hand was counterbalanced across blocks. Pitch changes were pseudorandomized, such that the same frequency (e.g., 500 Hz) was not heard more than once in any given encoding block. To separate the encoding and retrieval tasks and to reduce any recency effects in memory, participants completed a brief arrow detection task (45 s) after each encoding block. Participants were presented with 0.5 s left-facing (<) or right-facing (>) arrows in the center of the screen, separated by 0.5 s ISIs of fixation. They were instructed to indicate which direction the arrow was facing via button press as quickly as possible.
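The structure of one encoding block can be sketched in code. The following Python snippet is an illustrative reconstruction (the experiment itself was not run in Python, and the function and field names here are our own): it builds a 32-tone schedule with four stable auditory events, a starting ear that alternates at each boundary, and pseudorandomly assigned pitches that never repeat within a block.

```python
import random

def build_block_tones(start_ear, frequencies, n_events=4, items_per_event=8, seed=0):
    """Sketch of one encoding block's tone schedule: four stable auditory
    events, each with a single ear and pitch; the ear switches and the pitch
    changes at every event boundary, with no pitch repeated within a block."""
    rng = random.Random(seed)
    pitches = rng.sample(frequencies, n_events)  # no pitch repeats in a block
    schedule = []
    ear = start_ear
    for event, pitch in enumerate(pitches):
        for item in range(items_per_event):
            schedule.append({"event": event, "ear": ear, "hz": pitch,
                             # the first tone of events 1-3 is a boundary tone
                             "boundary": event > 0 and item == 0})
        ear = "right" if ear == "left" else "left"  # switch sides at boundary
    return schedule

tones = build_block_tones("left", [500, 600, 700, 800, 900, 1000])
assert len(tones) == 32
assert sum(t["boundary"] for t in tones) == 3  # three tone switches per block
```

Each of the 10 blocks would be generated with a counterbalanced `start_ear` and a fresh pseudorandom pitch assignment.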
Temporal distance memory test
At the end of each encoding block, participants completed two temporal memory tests: the first for temporal order and the second for temporal distance. Given that our primary focus was on subjective distortions in distance memory, the results from the temporal order memory test are reported elsewhere37.
In the temporal distance memory test, 14 item pairs from the encoding sequence were presented for a fixed duration of 5 s each. Importantly, each of these item pairs had been separated by exactly three images during encoding. Because the objective distance between to-be-tested pairs was therefore always equivalent (approximately 32.5 s from the onset of the first image to the offset of its pairmate), we were able to measure subjective temporal distortions in memory across conditions.
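The approximately 32.5 s objective distance follows from the stated timings, assuming the window spans five image presentations (the first item, three intervening items, and the pairmate) and the four jittered ISIs between them at their mean duration. This reconstruction of the arithmetic is ours, not the authors':

```python
# Reconstruction of the ~32.5 s objective encoding distance between the
# members of a to-be-tested pair (first-item onset to pairmate offset).
image_dur = 2.5               # s per object image presentation
mean_isi = (3 + 5 + 7) / 3    # = 5.0 s, mean of the jittered ISIs
n_images = 5                  # first item + three intervening items + pairmate
n_isis = 4                    # ISIs separating those five presentations

window = n_images * image_dur + n_isis * mean_isi
print(window)  # 32.5
```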
Accordingly, participants were asked to judge how far apart in time two images appeared. There were four options: very close, close, far, and very far. To replicate previous effects of event boundaries on temporal distance memory, we compared ratings for two types of item pairs: same-context pairs (8 trials per block) and boundary-spanning pairs (6 trials per block). Same-context pairs contained images from the same context during encoding, while boundary-spanning pairs contained images that spanned a change in context (i.e., task-relevant tone switch) during encoding.
We excluded the first same-context item pair from all analyses, because this pair contained the first image in each block. As such, it likely constituted a task-irrelevant event boundary and would therefore produce different behavioral effects than the other same-context pairs. Due to a programming error, one of the boundary pairs from each list also contained an incorrect item. Data for these specific pairs (appearing once per block across 23 participants) were excluded from the analyses. Finally, one block was excluded for 9 participants due to a timing error.
fMRI acquisition and preprocessing
fMRI/MRI data acquisition
All neuroimaging data were acquired with a 3 T Siemens Magnetom PRISMA scanner using a 64-channel matrix head coil. First, participants underwent a high-resolution MPRAGE T1-weighted anatomical scan (slices = 240 sagittal; TR = 2300 ms; TE = 2.32 ms; TI = 900 ms; FOV = 230 mm; voxel in-plane resolution = 0.9 mm2; slice thickness = 0.9 mm; flip angle = 6°; bandwidth = 200 Hz/Px; GRAPPA with acceleration factor = 2; scan duration: 5 m 21 s).
Next, participants underwent a T2-weighted anatomical scan (slices = 240 sagittal; TR = 3200 ms; TE = 564 ms; FOV = 230 mm; voxel in-plane resolution = 0.9 mm2; slice thickness = 0.9 mm; flip angle = 6°; bandwidth = 200 Hz/Px; GRAPPA with acceleration factor = 2; scan duration: 3 m 7 s). The task audio was also calibrated during this scan to ensure the tones were audible above scanner noise and could be discriminated from one another. Participants could make button presses to request changes in tone volume. Additionally, we collected two fieldmap scans to assist with functional imaging unwarping (1 in anterior-posterior (AP) phase encoding direction and 1 in posterior-anterior (PA) phase encoding direction). Afterwards, we collected a short fast spin echo (FSE) sequence MRI scan that enables visualization of the locus coeruleus (LC). Details are reported elsewhere37.
After the structural scans were collected, participants underwent separate functional imaging for each of the 10 encoding blocks and the 10 retrieval blocks. Functional scans were collected using a whole-brain T2*-weighted multiband echo planar imaging (EPI) sequence (128 volumes per encoding block; TR = 2000 ms; TE = 28.6 ms; voxel in-plane resolution = 1.5 × 1.5 mm2; slice thickness = 1 mm with no gap; flip angle = 75°; FOV = 204 mm × 204 mm; 136 × 136 matrix; phase encoding direction: anterior-posterior; GRAPPA factor = 2; multiband acceleration factor = 2). In each volume, 58 slices were tilted −20° from the anterior commissure-posterior commissure line and were collected in an interleaved order. A single-band reference image was collected for each run of the task, but was not used during preprocessing.
fMRI preprocessing
Image preprocessing was performed using FSL Version 6.00 (FMRIB’s Software Library, www.fmrib.ox.ac.uk/fsl). Functional images were preprocessed using the following steps: removal of non-brain tissue using BET; B0 unwarping using fieldmap images; grand-mean intensity normalization of the 4D data set by a single multiplicative factor; and application of a high-pass temporal filter of 100 s. No spatial smoothing was applied, due to the small size of the VTA and to preserve spatial specificity. Motion correction was performed using the MCFLIRT tool, which produced six motion nuisance regressors. Additionally, fsl_motion_outliers was used to identify volumes with extreme head movements, or frame displacements, using the DVARS option. Both this matrix of outlier volumes and the six motion regressors were modeled as covariates in the subsequent GLM analyses. Entire blocks with excessive head motion overall (conservatively defined as mean frame displacement > 1 mm) were excluded from analysis, resulting in the removal of one block each from three participants. Each participant’s denoised mean functional volume was co-registered to their T1-weighted high-resolution anatomical image using boundary-based registration (BBR). Anatomical images were then co-registered to the 2 mm isotropic MNI-152 standard-space brain using an affine registration with 12 degrees of freedom.
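The block-level motion exclusion rule is simple enough to express directly. Below is a minimal Python sketch of the logic (the actual exclusion was done with FSL outputs; the function name and data layout here are hypothetical):

```python
import numpy as np

def blocks_to_exclude(framewise_displacement, threshold_mm=1.0):
    """Flag encoding blocks whose mean framewise displacement exceeds the
    1 mm cutoff. `framewise_displacement` maps block ID -> per-volume FD (mm),
    e.g., as derived from MCFLIRT motion estimates."""
    return [block for block, fd in framewise_displacement.items()
            if np.mean(fd) > threshold_mm]

# toy example: block02 has mean FD > 1 mm and would be dropped
fd = {"block01": np.array([0.10, 0.20, 0.15]),
      "block02": np.array([1.40, 0.90, 1.10])}
print(blocks_to_exclude(fd))  # ['block02']
```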
Physiological denoising
Eight separate physiological nuisance signal regressors were extracted for the subsequent GLM analyses. First, FSL FAST was used to decompose each participant’s high-resolution anatomical images into probabilistic tissue masks for white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). The CSF and WM masks were thresholded at 75% tissue-type probability to increase their spatial specificity and reduce potential overlap. Following a similar approach to Bartoň et al. 65, we defined eight 4 mm spheres in representative regions of WM and CSF (four of each type; for exact coordinates, see Bartoň et al. 65). The eight spheres and the WM and CSF anatomical masks were then transformed into each participant’s native functional space for each block and merged to further increase their spatial specificity. Nuisance timeseries for each of the four WM and four CSF merged masks were then extracted from each block’s preprocessed functional data and modeled as nuisance regressors in the GLMs.
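The extraction step—intersecting each sphere with its tissue mask and averaging the signal within the result, one value per volume—can be sketched as follows. This is an illustrative NumPy reconstruction on toy arrays, not the authors' FSL-based pipeline; all names are hypothetical:

```python
import numpy as np

def sphere_mask(shape, center, radius):
    """Boolean sphere of a given voxel radius around a center coordinate."""
    zz, yy, xx = np.indices(shape)
    dist = np.sqrt((zz - center[0])**2 + (yy - center[1])**2 + (xx - center[2])**2)
    return dist <= radius

def nuisance_timeseries(func_4d, tissue_mask, center, radius):
    """Mean signal over the intersection of a sphere and a thresholded tissue
    mask, yielding one value per volume: a single nuisance regressor."""
    roi = sphere_mask(func_4d.shape[:3], center, radius) & tissue_mask
    return func_4d[roi].mean(axis=0)

# toy data: a 10x10x10 volume with 5 timepoints and an all-true WM mask
func = np.random.default_rng(0).normal(size=(10, 10, 10, 5))
wm = np.ones((10, 10, 10), dtype=bool)
ts = nuisance_timeseries(func, wm, center=(5, 5, 5), radius=2)
assert ts.shape == (5,)  # one nuisance value per volume
```

Repeating this for the four WM and four CSF spheres yields the eight physiological regressors entered into the GLMs.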
VTA region-of-interest (ROI) definition
To create an anatomical mask of the VTA, we used a publicly available probabilistic atlas that was originally created using individual hand-drawn ROIs66. This standard-space VTA mask was thresholded at 75% probability to increase its spatial specificity and then registered into each participant’s functional run (i.e., block) of the encoding task.
Hippocampal subfield segmentation and quality control
We used Freesurfer Version 6.0 (http://surfer.nmr.mgh.harvard.edu/) to segment bilateral hippocampal subfields CA2/3, DG, and CA1 from each participant’s high-resolution anatomical scan. To quality control this segmentation, we used guidelines designed for the Enhancing Neuro Imaging Genetics through Meta-Analysis (ENIGMA) consortium (https://enigma.ini.usc.edu/ongoing/enigma-hippocampal-subfields/). Details are reported elsewhere37.
fMRI analyses
Generalized linear modeling (GLM) analyses and acquisition of single-trial activation estimates of VTA activation
To estimate VTA activation at the trial level, we conducted Least Squares Separate (LSS) GLM analyses on the unsmoothed functional data from the sequence encoding task. This modeling approach generates unique activation estimates for each stimulus (i.e., tone or image) from the task. Importantly, the repetition or switching of tones carried the critical information about the stability or change in the surrounding context. Thus, using LSS GLM enabled us to isolate the distinct effects of context-relevant tones on brain activation.
Each LSS-GLM contained a total of 64 task-related regressors for each block of the encoding task, because there were 32 tones and 32 images in each block. Each tone was modeled as a 1 s stick function and each image was modeled as a 2.5 s stick function. Both regressors were convolved with a dual-gamma hemodynamic response function (HRF). A total of 64 separate LSS-GLM analyses were conducted for each of the ten blocks of the task, where one stimulus (image or tone) served as the regressor of interest and all other trials were modeled as a separate regressor. Each LSS-GLM thereby resulted in a unique activation estimate (i.e., beta map) across the whole brain for each stimulus67,68. To control for noise, a total of 14 nuisance regressors (4 WM, 4 CSF, and 6 motion regressors) were included in each GLM, along with individual nuisance regressors for trials with extreme head movements.
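The core of the LSS procedure—one regression per trial, with that trial's regressor isolated and all other trials pooled—can be sketched with ordinary least squares. This is a simplified NumPy illustration (the actual analyses used FSL on HRF-convolved regressors; names and dimensions here are hypothetical):

```python
import numpy as np

def lss_betas(Y, trial_regressors, nuisance):
    """Least Squares Separate: for each trial, fit a GLM containing that
    trial's (HRF-convolved) regressor of interest, one regressor pooling all
    other trials, the nuisance matrix, and an intercept; keep the beta for
    the trial of interest. Y: (timepoints, voxels);
    trial_regressors: (timepoints, n_trials); nuisance: (timepoints, k)."""
    n_trials = trial_regressors.shape[1]
    betas = np.empty((n_trials, Y.shape[1]))
    for t in range(n_trials):
        other = trial_regressors[:, np.arange(n_trials) != t].sum(axis=1)
        X = np.column_stack([trial_regressors[:, t], other, nuisance,
                             np.ones(len(Y))])
        b, *_ = np.linalg.lstsq(X, Y, rcond=None)
        betas[t] = b[0]  # beta map for the trial of interest
    return betas

# toy dimensions echoing the design: 128 volumes, 64 stimuli, 14 nuisance regressors
rng = np.random.default_rng(0)
Y = rng.normal(size=(128, 50))
trials = rng.normal(size=(128, 64))
nuis = rng.normal(size=(128, 14))
betas = lss_betas(Y, trials, nuis)
assert betas.shape == (64, 50)  # one beta map per stimulus
```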
A VTA ROI analysis was performed on the resulting beta maps. The standard-space VTA anatomical mask was transformed into each participant’s native run-space for each encoding block and used to extract single-trial VTA parameter estimates (a measure of activation) for each of the 32 tone trials (3 boundary tones, or switches, and 29 same-context tones, or repeats). We excluded noisy datapoints using boxplot outlier removal by participant, resulting in the removal of approximately 3% of the tone-related VTA activation trials across the entire dataset.
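The boxplot outlier rule applied to the trial-level VTA estimates is the standard Tukey criterion (values beyond 1.5 × IQR from the quartiles), applied within each participant. A minimal NumPy sketch, assuming that convention:

```python
import numpy as np

def boxplot_keep(values):
    """Tukey boxplot rule: keep points within 1.5 * IQR of the quartiles.
    Applied separately to each participant's trial-level estimates."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return (values >= lo) & (values <= hi)

vals = np.array([0.10, 0.20, 0.15, 0.18, 5.00])  # 5.00 is an outlier
print(vals[boxplot_keep(vals)])  # [0.1  0.2  0.15 0.18]
```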
Eye-tracking methods
Eye-tracking
Pupil diameter was measured continuously at 250 Hz during the event sequence task using an infrared EyeLink 1000 eye-tracker system (SR Research, Ontario, Canada). Raw pupil data, segmented by block, were preprocessed using ET-remove-artifacts, a publicly available MATLAB program (https://github.com/EmotionCognitionLab/ET-remove-artifacts). Although this software is typically used to clean pupil timeseries by interpolating over blink events and other artifacts37, our primary objective was to identify blink event timestamps and quantify their frequency within different periods of interest.
ET-remove-artifacts located blink events by identifying rapid changes in pupil size, or pupil velocity, following the approach described in Mathôt et al. 69. The velocity timeseries was computed by applying MATLAB’s finite impulse response (FIR) differentiator filter on the raw pupil size timecourse. This method provides a robust estimate of instantaneous rate of change while minimizing noise amplification. For our dataset, we selected FIR filter parameters (Filter Order = 4, Passband Frequency = 10, and Stopband Frequency = 12) to produce a smooth velocity timeseries with trough-and-peak profiles that were identifiable and specific to blink events rather than noise-related data loss.
The algorithm then used MATLAB’s “findpeaks” function to locate peaks and troughs in the pupil velocity timecourse. Blink profiles were distinctly identifiable as a contiguous trough followed by a peak in the velocity timeseries. To achieve a high degree of specificity to blink events, we set the Peak Threshold Factor and Trough Threshold Factor at 8 standard deviations of the velocity timeseries. This threshold is higher than those typically used for cleaning artifacts from pupil timeseries37, ensuring that only substantial blink-related troughs and peaks were counted as blink events rather than small and transient velocity changes attributable to other artifacts.
To generate a preprocessed pupil timecourse, the algorithm applied a linear interpolation across identified blink intervals. Artifact intervals exceeding 2 s were automatically imputed with NaN (missing-data indicator).
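The two preceding steps—velocity-thresholded blink detection and linear interpolation with a 2 s NaN cutoff—can be illustrated with a simplified Python sketch. This stand-in uses a first difference in place of the FIR differentiator and a bare threshold crossing in place of "findpeaks"; it is not the ET-remove-artifacts implementation, and all names are ours:

```python
import numpy as np

FS = 250  # Hz, eye-tracker sampling rate

def detect_blinks(pupil, n_sd=8.0):
    """Mark a blink wherever velocity drops below -n_sd * SD (lid closing)
    and subsequently rises above +n_sd * SD (reopening).
    Returns (onset, offset) sample indices."""
    vel = np.diff(pupil)
    sd = np.nanstd(vel)
    blinks, onset = [], None
    for i, v in enumerate(vel):
        if onset is None and v < -n_sd * sd:
            onset = i
        elif onset is not None and v > n_sd * sd:
            blinks.append((onset, i + 1))
            onset = None
    return blinks

def interpolate_blinks(pupil, blinks, max_gap_s=2.0, fs=FS):
    """Linearly interpolate across each blink; gaps over 2 s become NaN."""
    out = pupil.astype(float).copy()
    for on, off in blinks:
        if (off - on) / fs > max_gap_s:
            out[on:off + 1] = np.nan
        else:
            out[on:off + 1] = np.linspace(out[on], out[off], off - on + 1)
    return out

# toy trace: stable diameter with one sharp blink dip
pupil = np.full(1000, 1000.0)
pupil[400:420] = 0.0
blinks = detect_blinks(pupil)
clean = interpolate_blinks(pupil, blinks)
assert len(blinks) == 1 and not np.isnan(clean).any()
```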
Alongside the blink estimates, we also computed pupil dilation responses to both boundary tones and same-context tones. A temporal principal component analysis (PCA) was applied to these tone-evoked pupil responses to dissociate distinct autonomic and functional components of pupil-related arousal. Details about pupil analyses and methods are reported elsewhere37, because the goal of this manuscript was to focus on the specificity of the relationship between brainstem nuclei activation and blinking measures of neuromodulation. As mentioned previously, four participants were excluded from eye tracking analyses due to system malfunction or poor overall eye tracking quality, resulting in a total of 28 participants for all eye tracking-related analyses.
Computing local blink counts
Breakpoints in continuous experience have been shown to elicit a momentary increase in blinking31,34. Further, blink behavior has also been putatively linked to dopaminergic processes28, suggesting that blink behavior may offer a window into neuromodulatory processes that facilitate memory separation at boundaries. To test this idea, we used MATLAB Version R2022b to tabulate local blink count in a 1.5 s interval immediately after the onset of each tone. The size of this interval was chosen because the smallest ISI in the current paradigm was 3 s – that is, 1.5 s on either side of the tone onset. In this way, we were able to capture tone-induced blinks during a brief period that was uncontaminated by the preceding or ensuing images on any trial. To acquire highly conservative estimates of blink behavior that were not confounded by noise or data loss, blink intervals with more than 25% invalid samples were excluded from analyses. As with the VTA data, we also cleaned the trial-level blink responses using boxplot outlier detection within each participant. After these exclusions, approximately 78% of local blink intervals remained per participant, on average.
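The per-tone counting rule described above—blinks whose onsets fall in the 1.5 s after tone onset, with the window discarded if more than 25% of its samples are invalid—can be sketched as follows. This is an illustrative Python version of the MATLAB logic; names and data layout are hypothetical:

```python
import numpy as np

FS = 250          # Hz, eye-tracker sampling rate
WINDOW_S = 1.5    # post-tone-onset counting window

def local_blink_count(tone_onset, blink_onsets, valid, fs=FS, window_s=WINDOW_S):
    """Count blinks whose onset falls within 1.5 s after a tone onset.
    Return None (trial excluded) if >25% of samples in the window are invalid.
    All times are in samples; `valid` is a boolean array over the recording."""
    start, stop = tone_onset, tone_onset + int(window_s * fs)
    if np.mean(~valid[start:stop]) > 0.25:
        return None  # too much data loss to trust the count
    return int(np.sum((blink_onsets >= start) & (blink_onsets < stop)))

# toy example: blinks at samples 100 and 400 fall inside the [50, 425) window
valid = np.ones(10_000, dtype=bool)
blink_onsets = np.array([100, 400, 1300])
print(local_blink_count(50, blink_onsets, valid))  # 2
```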
Computing sustained blink counts between to-be-tested item pairs
We also zoomed out to examine blink patterns across longer, behaviorally relevant windows of encoding to assess more sustained dopaminergic processes. We again used MATLAB Version R2022b to calculate blink count for the approximately 32.5 s window between each to-be-tested pair, from the onset of the first item to the offset of its pairmate (encountered four images later). We conducted the same noise-related and by-participant boxplot outlier removal procedure as before, which left approximately 79% of blink windows per participant, on average.
Testing linear relations between the VTA, temporal memory, and blink patterns
To assess trial-level relationships between our key variables, we used linear mixed-effects and cumulative link modeling. Linear mixed-effects modeling was used when the outcome of interest was continuous (i.e., trial-level VTA parameter estimates, post-tone blink count, and extended blink count). Fixed effect predictors in these models were categorical (i.e., tone type: boundary tone, same-context tone) and/or continuous (i.e., trial-level VTA parameter estimates).
Cumulative link models are the most common type of ordinal regression model, using a similar frequentist approach to linear mixed-effects models70,71. Thus, cumulative link modeling was used when the outcome of interest was ordinal (i.e., trial-level temporal distance memory ratings). Distance memory ratings were ordered from the smallest to largest rating: very close, close, far, very far. Fixed effect predictors in these models were categorical (i.e., pair type: boundary-spanning pair, same-context pair) and/or continuous (i.e., trial-level parameter estimates, post-tone blink count, and extended blink count). In the models with trial-level VTA parameter estimates as a fixed-effect predictor of distance memory, we included an additional categorical variable for Pair Position during encoding to account for potential list position effects. Including this predictor improved model fit (ps < 0.01). This predictor was also included in associated modeling analyses in the Supplementary Material (see Supplementary Figs. 6-7).
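In a cumulative link (proportional odds) model, the cumulative probability of responding at or below category k is logistic(θ_k − η), where the θ_k are ordered thresholds between rating categories and η is the linear predictor. A small self-contained sketch, with hypothetical threshold and coefficient values chosen purely for illustration:

```python
import math

def category_probs(eta, thresholds):
    """Proportional-odds cumulative logit: P(Y <= k) = logistic(theta_k - eta).
    Category probabilities are successive differences of the cumulative curve."""
    logistic = lambda x: 1.0 / (1.0 + math.exp(-x))
    cum = [logistic(t - eta) for t in thresholds] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

# hypothetical thresholds between very close / close / far / very far, and an
# illustrative positive boundary-pair effect shifting ratings toward "far"
theta = [-1.0, 0.5, 2.0]
p_same = category_probs(0.0, theta)      # same-context pair
p_boundary = category_probs(0.8, theta)  # boundary-spanning pair
assert abs(sum(p_same) - 1.0) < 1e-12    # probabilities sum to one
```

With a positive coefficient, probability mass shifts toward the "far" end of the scale for boundary-spanning pairs, which is the signature of boundary-induced time dilation in the distance ratings.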
To meaningfully relate local measures (i.e., VTA parameter estimates and post-tone blink counts) to measures capturing a wider temporal window (i.e., extended blink count and distance memory ratings), we extracted the value associated with one tone between each to-be-tested item pair. For details on the exact positions of the selected tones and their position-matching across boundary and same-context pairs, see Supplementary Figs. 1-2.
Analyses were conducted in RStudio (Version 2024.2.29)72, with the main modeling using the lme473 and ordinal packages74. Each model included random intercepts for participant ID. Continuous predictors were mean-centered by participant to reduce the influence of individual differences75. This mean centering was performed separately for each model. Continuous predictors were also z-scored as necessary to improve model convergence. When the outcome of interest was continuous, standardized parameters were obtained by fitting the model on a standardized version of the dataset, and 95% confidence intervals were computed using a Wald z-distribution approximation. When the outcome of interest was ordinal, effect sizes were computed as odds ratios (with 95% confidence intervals).
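The within-participant centering step can be made concrete with a short sketch. This is an illustrative Python version of a preprocessing step performed in R; the function name is ours:

```python
import numpy as np

def center_by_participant(values, participant_ids, zscore=False):
    """Mean-center (optionally z-score) a continuous predictor within each
    participant, as done before entering it into the mixed-effects models."""
    out = np.asarray(values, dtype=float).copy()
    for pid in np.unique(participant_ids):
        sel = participant_ids == pid
        out[sel] -= out[sel].mean()
        if zscore:
            sd = out[sel].std()
            if sd > 0:
                out[sel] /= sd
    return out

ids = np.array([1, 1, 1, 2, 2, 2])
x = np.array([1.0, 2.0, 3.0, 10.0, 20.0, 30.0])
print(center_by_participant(x, ids))  # [-1. 0. 1. -10. 0. 10.]
```

Centering within participant removes stable between-person differences in overall level (e.g., baseline blink rate), so the fixed effects reflect trial-to-trial variation.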
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.