Effects of exploring a novel environment on memory across the lifespan

Participants

A total of 487 visitors of the NEMO Science Center in Amsterdam aged 8 years or older volunteered to participate in this study. Data was collected during a 2-week Science Live exhibition, during which we tested all visitors interested in volunteering during all opening hours of the NEMO Science Center. While this restricted our control over the age distribution and the total number of participants, it yielded a final sample size that far exceeded that of prior studies (e.g., between 30 and 103 participants in references 25 and 28). Forty-five participants were excluded: 17 because of administrative issues (e.g., accidental reuse of a participant number), seven because of technical issues (e.g., task crash), seven because of language issues (e.g., unable to understand the instructions), six because they worked together or received help from a parent, five because they did not finish the tasks in sequence (e.g., with a long break to visit an exhibition show), and two because they talked on the phone during the word learning task. As such, 439 participants were included in the main analyses (401 performed the task in Dutch and 38 in English). Because the landmark test did not run on all laptops due to technical issues, only 331 of the included participants completed this task. Participants were classified as children (8–11 years; mean = 9.33; SD = 1.15), adolescents (12–17 years; mean = 13.19; SD = 1.43), younger adults (18–44 years; mean = 32.73; SD = 8.27) or older adults (≥ 45 years [range 46–77]; mean = 53.30; SD = 8.23) based on their age (and presumed associated changes in dopaminergic functioning; references 38, 45, 46). Supplementary Information (SI): Appendix 1 shows demographics and the distribution of participants over age groups and conditions.
Participants in the first testing week performed a word learning task with deep encoding, and participants in the second week performed a shallow encoding task. Within each age group, age distributions were similar across the novelty and level of processing conditions (for novelty and level of processing respectively, children: p = 0.598 and p = 0.405; adolescents: p = 0.568 and p = 0.155; younger adults: p = 0.077 and p = 0.815; older adults: p = 0.658 and p = 0.733). Sex distributions were also similar across conditions (Pearson chi-square for novelty and level of processing respectively, children: p = 0.216 and p = 0.821; adolescents: p = 1 and p = 0.128; younger adults: p = 0.214 and p = 0.853; older adults: p = 0.285 and p = 0.241).

All participants, or a parent in the case of minors, gave written informed consent. Participants could choose to perform the tasks in Dutch or English. The study was approved by the Psychology Research Ethics Committee (CEP) of Leiden University, the Netherlands. All procedures were in line with the Declaration of Helsinki (1964, and later amendments), and followed relevant COVID-19 guidelines and regulations.

General procedure

Throughout all procedures, the experimenters wore a mask and gloves as a COVID-19 safety measure. For data collection we used six laptops in two spacious testing rooms that allowed for social distancing (≥ 1.5 m). The experimenter stayed in the testing room throughout the entire procedure to start the tasks and to answer questions. The entire experimental procedure took approximately 15–25 min.

Data was collected at the NEMO Science Center in Amsterdam. Upon arrival, participants were asked to disinfect their hands as part of the COVID-19 protocol. Before participation, participants or their parents read the information letter and were given the opportunity to ask questions. After giving written informed consent, the participants were seated before they performed a series of tasks on a laptop.

Stimuli and apparatus

The virtual environments (VEs) were created using Unity version 2017.2.21f1 (Unity Technologies, 2017), and were matched in size, path length, and number of intersections. Each VE consisted of a fantasy island, comprising land and a body of water, with unusual landmarks (such as a slot machine) at intersections or road endpoints (see Fig. 1). The VEs were presented on laptops running Windows 10 (Microsoft, 2015). Participants could move forward using the W key on the keyboard and used the mouse to determine the heading direction. During exploration the X, Y and Z coordinates of the moving agent were logged for all timepoints with a sampling rate of about 15 Hz. The VAS 1, VAS 2 and word learning task were programmed and presented using OpenSesame 3.3.3 (ref. 47); the landmark task and novelty seeking (NS) questionnaire were created using E-Prime 3.0 software (Psychology Software Tools, Pittsburgh, PA).

Figure 1 Screenshots of the two virtual environments. The environments contained landmarks at intersections and road endpoints, and were matched in size, number of intersections, number of landmarks, and path length.

For the word learning task, fifteen neutral Dutch nouns were chosen from the CELEX lexical database and translated to English for non-Dutch speakers (ref. 48). The same words were used in the novel and familiar conditions and in the shallow and deep encoding conditions. Four words referred to an animal (“alive”) and eleven words referred to a non-living thing (“not alive”). Similarly, four words started with a closed letter (e.g., “Boat”) and eleven with an open letter (e.g., “Wolf”).

Participants were reminded of the response keys and task during the encoding, recall and recognition phases of the word learning and landmark tasks. The response keys were shown below the word, in the location corresponding to the keyboard, and in the semantic task the response keys were further accompanied by the picture of a cow (to indicate a living thing) and a chair (to indicate a non-living thing). These reminders were included to reduce working memory load, which could otherwise have made the task disproportionately difficult for the younger children.

Landmarks were objects from the Unity Asset Store, and included a wide range of easily recognizable objects, such as an airplane and a desk chair. Pictures of the landmarks presented on a grey background were used in the landmark memory test. Lures, which were objects that were not part of either of the two VEs, were also presented during this test.

Exploration phases and affective ratings

Participants received scripted verbal instructions on how to navigate through the VE. The ‘W’ key (for ‘walk’) could be used to move forward, and the mouse could be used to look around and determine heading direction. The space bar could be used to jump, although jumping served no function, as one could not jump on top of things. Participants were instructed that they could navigate freely but should try to stay on the paths. During the first (familiarization) phase, participants explored the VE for 3 min. After exploration, they were asked to indicate their happiness (“How happy are you?”, from 1 = extremely unhappy to 9 = extremely happy) and arousal (“How aroused are you?”, from 1 = very calm to 9 = very excited) on a visual analogue scale (VAS) with Self-Assessment Manikins (ref. 49). They could use the number keys to indicate their answers, and completing the ratings took less than 1 min.

During the second exploration phase, participants explored either the same (i.e., familiar) VE or a new VE for another 3 min; novelty condition and VE were counterbalanced. After this exploration, participants again rated their happiness and arousal levels on the same two VAS used after the first exploration. See Fig. 2 for the experimental task sequence.

Figure 2 Experimental task sequence. Tasks are shown in sequential order from top to bottom. During the first exploration phase participants explored one of the two virtual environments (counterbalanced between participants). Participants filled out Visual Analogue Scales to report current mood and arousal state (ref. 44). In the second exploration phase participants either explored the same (familiar condition) environment again or a new one (novel condition). The depth of encoding during the word task was varied between subjects, with participants either performing a semantic (deep encoding condition) or shallow encoding task. After a short distractor task, memory was tested with free recall and a recognition test. After a visuomotor adaptation task (not reported here) landmark memory was tested with a recognition test with confidence judgments. Finally, adults filled out the full Novelty Seeking scale of the TPQ (refs. 35, 36, 45), while children answered NS-related questions (non-standardized).

Experimental tasks

For the word task, instructions were shown on the screen. During the encoding phase, fifteen nouns were shown in a random sequence. We believe this number of items to be sufficient to identify individual and condition differences: the smaller 10-word learning list from the Consortium to Establish a Registry for Alzheimer’s Disease (CERAD) has been shown to be a sensitive measure for detecting mild cognitive impairment and identifying early symptoms of Alzheimer’s disease, suggesting that relatively short word lists are sufficient to robustly identify individual differences in memory performance (refs. 50, 51). Other neuropsychological test batteries also use relatively short word lists, such as the California Verbal Learning Test (CVLT; ref. 52), which uses 16 words, and the Rey Auditory Verbal Learning Test (R-AVLT), which uses 15 words (ref. 53). In the first week of data collection, word learning involved a deep encoding task in which participants had to judge whether the shown word represented a living (e.g., a cow) or a non-living (e.g., a chair) thing. During the second test week, word learning involved a shallow encoding task in which participants had to indicate whether the first letter of the shown word had an open (such as a “W”) or closed (such as an “O”) shape. Each word was presented for 3000 ms, irrespective of whether a response was given. Between words, a fixation cross was shown for 500 ms. After the encoding phase, participants solved a series of nine simple math problems (e.g., 4 – 3 or 7 + 1) as a distractor task. The solutions to all problems ranged between 1 and 9. Next, participants were prompted to enter as many words as they could remember from the encoding phase. They were instructed to press ENTER to continue entering words, or to press ESC + ENTER to continue if they could not remember any more words.
In the subsequent recognition test, all 15 words from the encoding phase were shown in random order, interspersed with 10 lures (new words that were not presented during encoding). Participants had to indicate for each word whether it was old (“press X”) or new (“press N”). Each word was shown until a response was given. All phases of the word task were finished in 3–4 min. Recall was quantified by the percentage of correctly recalled words, while recognition was quantified by the corrected hit rate (CHR = percentage of hits on old words − percentage of false alarms on new words). Next, participants performed a visuomotor adaptation task, which was completed in 2–3 min (results published in ref. 54).
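
The two word-memory scores defined above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names, the case-insensitive matching of typed words, and the example responses are assumptions made for illustration only.

```python
def recall_percentage(recalled, studied):
    """Percentage of studied words that were correctly recalled.

    Matching is case-insensitive and ignores surrounding whitespace
    (an assumption; the source does not describe how typed words were scored).
    """
    studied_set = {w.strip().lower() for w in studied}
    recalled_set = {w.strip().lower() for w in recalled}
    return 100.0 * len(studied_set & recalled_set) / len(studied_set)


def corrected_hit_rate(responses, old_items, new_items):
    """CHR = percentage of old items called 'old' minus percentage of lures called 'old'.

    responses maps each test item to the response given: 'old' or 'new'.
    """
    hits = sum(responses[item] == "old" for item in old_items)
    false_alarms = sum(responses[item] == "old" for item in new_items)
    return 100.0 * hits / len(old_items) - 100.0 * false_alarms / len(new_items)


# Hypothetical participant: 15 studied words and 10 lures, as in the recognition test.
studied = [f"word{i}" for i in range(15)]
lures = [f"lure{i}" for i in range(10)]

responses = {w: "old" for w in studied[:12]}        # 12 hits
responses.update({w: "new" for w in studied[12:]})  # 3 misses
responses.update({w: "old" for w in lures[:1]})     # 1 false alarm
responses.update({w: "new" for w in lures[1:]})     # 9 correct rejections

recall = recall_percentage(["Word0", "word3", "intrusion"], studied)
chr_score = corrected_hit_rate(responses, studied, lures)
```

In this hypothetical case, 12 of 15 hits (80%) and 1 of 10 false alarms (10%) give a CHR of 70 percentage points, and two correctly recalled words out of 15 give a recall score of about 13.3%.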

The landmark test assessed memory for landmarks that participants could have encountered during the second exploration phase. In total 35 landmarks were shown, of which 20 were present in the second VE (i.e., “old”) and 15 were lures (i.e., “new”). Participants had to indicate for each landmark whether they had seen it before (“press X for old”) or not (“press N for new”). When participants indicated “old” they were further asked to indicate whether they thought the landmark was “sure old” (“press X”), “probably old” (“press N”) or whether they guessed (“press M”). Each landmark was shown until a response was given and the test had a duration of approximately 2–3 min. As an estimate of landmark recollection, the “sure” CHR was calculated (i.e., “sure” old hits − new false alarms).
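
As a rough illustration of the “sure” CHR, the sketch below scores a list of landmark trials in Python. The text does not state whether false alarms were also restricted to “sure” responses, so the sketch exposes this as a hypothetical flag (`sure_false_alarms_only`); the trial format and example data are likewise assumptions.

```python
def sure_chr(trials, sure_false_alarms_only=True):
    """'Sure' corrected hit rate in percentage points.

    trials: list of (is_old, response, confidence) tuples, where response is
    'old' or 'new' and confidence is 'sure', 'probably', 'guess', or None
    (confidence was only asked after an 'old' response).
    sure_false_alarms_only: assumption switch; if False, every 'old' response
    to a lure counts as a false alarm regardless of confidence.
    """
    old_trials = [t for t in trials if t[0]]
    new_trials = [t for t in trials if not t[0]]
    sure_hits = sum(
        1 for _, resp, conf in old_trials if resp == "old" and conf == "sure"
    )
    if sure_false_alarms_only:
        false_alarms = sum(
            1 for _, resp, conf in new_trials if resp == "old" and conf == "sure"
        )
    else:
        false_alarms = sum(1 for _, resp, _ in new_trials if resp == "old")
    return (100.0 * sure_hits / len(old_trials)
            - 100.0 * false_alarms / len(new_trials))


# Hypothetical participant: 20 old landmarks and 15 lures, as in the landmark test.
trials = (
    [(True, "old", "sure")] * 10        # sure hits
    + [(True, "old", "probably")] * 4   # lower-confidence hits
    + [(True, "new", None)] * 6         # misses
    + [(False, "old", "sure")] * 1      # sure false alarm
    + [(False, "old", "guess")] * 2     # guessed false alarms
    + [(False, "new", None)] * 12       # correct rejections
)

strict = sure_chr(trials, sure_false_alarms_only=True)
lenient = sure_chr(trials, sure_false_alarms_only=False)
```

Under these made-up responses the two readings differ (about 43.3 versus 30 percentage points), which shows why the exact false-alarm definition matters when reproducing the measure.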

Novelty seeking questionnaire

Finally, participants reported their sex (male; female; other), age in years, and handedness (right; left; ambidextrous). Adults (> 17 years) subsequently filled out the 34 items of the NS scale of the Tridimensional Personality Questionnaire (refs. 39, 40, 55), whereas children and adolescents filled out a simplified and abbreviated (20-item) version of the questionnaire. Each question remained on the screen until a response (“X” = yes; “M” = no) was given. All questions could be answered in about 2–5 min. Afterwards, feedback was shown on the basis of the total NS score (i.e., with a subdivision into low, medium, and high scorers). These cut-off scores were used only to provide feedback to the participants and were not used in any analyses.

Analyses

Memory performance

Recall and CHR for words were subjected to 2 × 2 × 4 ANOVAs with Novelty (novel; familiar), Encoding type (shallow; deep) and Age group (children; adolescents; younger adults; older adults) as between-subject factors. As we expected the effect of age on memory performance to be quadratic, with performance peaking in adolescents or younger adults, we followed up a main effect of age group with a quadratic contrast (ref. 38). In line with our hypothesis that older adults would show diminished effects of novelty, an interaction between novelty and age was followed up with three 2 × 2 × 2 ANOVAs with Novelty (novel; familiar), Encoding type (shallow; deep), and Age (either older adults vs. children, older adults vs. adolescents, or older vs. younger adults) as factors. As group sizes were unequal across conditions, we also included Encoding type in these analyses, but the main effect of this factor and its interactions are not interpreted. For all analyses, the α-criterion was set at 0.05, and Bonferroni-Holm correction was applied to correct for multiple testing.
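
The Bonferroni-Holm step-down procedure can be sketched in a few lines of Python. This is a generic illustration of the method, not the authors' code; the function name and the three example p-values are hypothetical, and the source does not specify which tests formed a correction family.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm's step-down procedure.

    Returns a list of booleans in the original order of p_values:
    True where the null hypothesis is rejected at family-wise level alpha.
    """
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The smallest p-value is tested at alpha/m, the next at alpha/(m-1), etc.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


# Hypothetical p-values for three follow-up comparisons.
p_values = [0.030, 0.004, 0.040]
decisions = holm_bonferroni(p_values)
```

With these made-up values only the second comparison survives correction: 0.004 passes the first threshold (0.05/3), but 0.030 exceeds the second threshold (0.05/2), so testing stops there even though all three values are below an uncorrected 0.05.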

Roaming entropy, and other measures of exploration

Roaming entropy (RE) during the first and second exploration rounds was determined for each participant. In this analysis the Z-coordinates were omitted, as the VEs contained only one location for each XY coordinate (although participants could jump at a location, they could not climb on anything). Because the number of possible locations was very high (6.31 million), the likelihood that the same coordinates were visited during the 3-min exploration was small. The individual paths were therefore smoothed using a Gaussian filter with a width of 100. Then a likelihood matrix p_j was calculated for each of the two VEs, giving the likelihood that each XY position j was visited: the total number of visits to that location divided by the total number of visited locations across all participants. See Fig. 3A for the map of one of the VEs, and Fig. 3B for a heatmap depicting the number of visits for all XY coordinates for that VE.

Figure 3 Maps of one of the virtual environments. (A) Depicts the map of one of the VEs. (B) Shows the number of visits per XY-coordinate in a heatmap for all participants that explored that island. The spawn point in the top left is visible as a highly visited region. Outlines of landmarks can be recognized at some ends of paths. Individual navigation traces show that some people left the paths and used short-cuts to other paths. This data was used to calculate a probability matrix reflecting the likelihood that each of the locations was visited (see main text). Note that the number of visits per XY-coordinate is relatively low, despite smoothing. The probability that each location was visited was used to calculate the roaming entropy (RE) of individuals.

Roaming entropy (RE_i) was calculated per participant and exploration round by summing, over all locations j, the product of the individual’s path (p_ij) and the log of the probability that each location was visited (p_j), divided by the log of the number of possible locations (k):

$$RE_{i} = - \sum_{j = 1}^{k} \frac{p_{ij} \log (p_{j})}{\log (k)}$$

High RE indicates that the participant explored more of the less-often-walked paths, while a lower value reflects closer adherence to the often-walked paths.
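
The RE formula can be made concrete with a small Python sketch on a toy grid. Several simplifications are assumptions rather than details from the text: positions are taken to be already binned into integer XY cells, the Gaussian smoothing step is left out, p_ij is interpreted as the fraction of participant i's samples that fall in cell j, and all function and variable names are hypothetical.

```python
import math
from collections import Counter


def group_probabilities(paths):
    """p_j: visits to cell j divided by the total number of visits, pooled over participants."""
    counts = Counter(cell for path in paths for cell in path)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def roaming_entropy(path, p_group, k):
    """RE_i = -sum_j p_ij * log(p_j) / log(k).

    Cells the participant never visited have p_ij = 0 and contribute nothing,
    so the sum only runs over the participant's own cells.
    """
    counts = Counter(path)
    n = len(path)
    weighted_log = sum(
        (c / n) * math.log(p_group[cell]) for cell, c in counts.items()
    )
    return -weighted_log / math.log(k)


# Toy example on a 2 x 2 grid (k = 4 possible cells); all paths are made up.
paths = {
    "on_path_1": [(0, 0), (0, 1)],
    "on_path_2": [(0, 0), (0, 1)],
    "explorer": [(0, 0), (1, 1)],  # visits a cell nobody else visits
}
k = 4
p_group = group_probabilities(paths.values())
re_scores = {pid: roaming_entropy(path, p_group, k) for pid, path in paths.items()}
```

In this toy case the hypothetical "explorer" receives a higher RE than the two participants who share the common route, because the rarely visited cell has a low group probability p_j and therefore a large negative log.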

In addition to RE, we calculated the total distance travelled (in Unity meters) as the sum of Euclidean distances between successive datapoints (2D). We also counted the number of landmarks encountered in the second exploration round for each participant by defining regions of interest (ROIs) for each of the landmarks for which memory was tested. These ROIs consisted of rectangular bounding boxes around the landmarks. ROIs could overlap when landmarks were close to each other. For each participant it was determined which ROIs were visited. The total number of ROIs visited provides an additional measure of exploration, as it reflects how many regions the participant visited. To investigate whether exploration behavior as quantified by RE could predict later word recall above and beyond the effects of novelty and encoding type, we ran a GLM with novelty (novel; familiar) and encoding type (deep; shallow) as categorical predictors, and RE for round 2 and age as continuous predictors of word recall. We included only RE in this model, and not distance travelled or landmarks encountered, as these measures were positively correlated (see SI: Appendix 5). RE is the most commonly used measure of exploration and the only one of these three measures for which we found a novelty effect. The GLM was run on centered data to reduce multicollinearity. Multicollinearity was low, with all variance inflation factor (VIF) values < 1.15. Of note, we ran our task on different laptops with varying specifications, which resulted in different sampling rates between participants. However, as participants were randomly distributed over the laptops, and RE showed a pattern of results similar to that of the other exploration measures (e.g., distance travelled and landmarks encountered), we believe that potential effects of these differences were minimal.
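
The two additional exploration measures can be sketched as follows. This is an illustrative Python example under stated assumptions: the logged path is a list of (x, y) samples, ROIs are axis-aligned boxes given as (xmin, ymin, xmax, ymax), and the landmark names and coordinates are invented for the example.

```python
import math


def total_distance(xy):
    """Sum of 2D Euclidean distances between successive logged positions."""
    return sum(math.dist(a, b) for a, b in zip(xy, xy[1:]))


def visited_rois(xy, rois):
    """Names of all ROIs whose bounding box contains at least one logged position.

    rois maps a landmark name to (xmin, ymin, xmax, ymax). Boxes may overlap,
    so a single position can count as a visit to several ROIs.
    """
    visited = set()
    for x, y in xy:
        for name, (xmin, ymin, xmax, ymax) in rois.items():
            if xmin <= x <= xmax and ymin <= y <= ymax:
                visited.add(name)
    return visited


# Hypothetical path and ROI boxes (coordinates are made up).
path = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
rois = {
    "slot_machine": (2.0, 3.0, 4.0, 5.0),
    "airplane": (2.0, 4.0, 6.0, 11.0),       # overlaps with slot_machine
    "desk_chair": (20.0, 20.0, 22.0, 22.0),  # never reached
}

distance = total_distance(path)
encountered = visited_rois(path, rois)
n_landmarks = len(encountered)
```

Here the path covers 5 + 6 = 11 units and enters two of the three boxes. Because sampling rates differed between laptops, a real analysis might also want to check how sensitive the summed distance is to the sampling interval.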

Preprint

A previous version of this manuscript was published as a preprint: https://psyarxiv.com/r2tdn/