Study protocol
We investigated the multivariate associations between behavioral symptoms and task-based FC (MID task, SST and emotional face task (EFT)) using CPM, a widely used method23,51. The task-based connectome prediction analysis was conducted in the population-based IMAGEN sample of adolescents aged 14 years. Additional analyses were then performed to identify the relationships between behavioral symptoms and crossdisorder neural circuits. Next, the predictive and crossdisorder connectomes were investigated at several levels using behavioral, longitudinal, genetic and clinical data. Because psychiatric comorbidity is common in both males and females, we focused mainly on identifying crossdisorder neural circuits across the whole population rather than separately for each sex.
IMAGEN
IMAGEN is a large-scale longitudinal neuroimaging–genetics cohort study (N = 2,000 at age 14, N = 1,300 at age 19) conducted to understand the biological basis of individual variability in psychological and behavioral traits and their relationship to common psychiatric disorders. The study involves a thorough neuropsychological, behavioral, clinical and environmental assessment of each participant. Participants also undergo biological characterization, including T1-weighted structural MRI, task-based fMRI and genetic data collection. In this investigation, we used task and resting-state MRI, genetic and behavioral data. As a population-based cohort, IMAGEN has balanced numbers of male and female participants (based on self-reported sex).
Development and Well-Being Assessment and Strengths and Difficulties Questionnaire
Behavioral symptoms of the IMAGEN participants were assessed using screening questions from the Development and Well-Being Assessment (DAWBA)52 and the Strengths and Difficulties Questionnaire (SDQ)53. DAWBA is a wide-ranging psychiatric screening questionnaire that has previously been used to define subthreshold clinical symptoms in neuroimaging studies of subclinical psychopathology54. The SDQ was also used in this investigation because it contributes to the assignment of diagnostic status in the DAWBA52. At age 14, the parent-rated externalizing symptoms comprised ADHD (23 items), ODD (11 items), CD (10 items) and ASD (7 items). The child-rated internalizing symptoms included GAD (7 items), depression (8 items), SP (13 items) and ED (5 items). The full set of psychiatric questions used in our investigation is listed in Supplementary Table 1. The choice of different questionnaire versions at age 14 (that is, parent-rated externalizing symptoms and child-rated internalizing symptoms) was based on findings that parent-reported externalizing problem scores are more reliable than child-reported scores, and vice versa for internalizing problems55. At age 19, however, because parent-rated questionnaires were unavailable, we used child-rated questionnaires for both externalizing and internalizing symptoms (Supplementary Table 1).
DAWBA also provides a diagnostic output for common psychiatric disorders, that is, a rating of the likelihood that a clinical diagnosis would be made. Of the 1,750 IMAGEN participants at age 14, 134 were at high risk for at least one diagnosis (that is, they scored 4 or 5, corresponding to a more than 50% chance of being diagnosed), and 39 participants met the criteria for two or more diagnoses. More specifically, 93 participants were likely to have one or more externalizing disorders (24 with ADHD, 45 with ODD, 59 with CD and 1 with ASD), and 46 participants were likely to have one or more internalizing disorders (16 with GAD, 21 with depression, 5 with ED and 14 with SP; see Extended Data Table 1 for more detail).
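The band-based counting above can be sketched as follows. This is a minimal illustration, assuming a hypothetical participant-by-disorder array of DAWBA bands (0 to 5) in which bands 4 and 5 are treated as high risk; the function name, argument names and example data are ours and not part of the DAWBA output.

```python
import numpy as np

HIGH_RISK_MIN_BAND = 4  # bands 4 and 5: more than 50% chance of a diagnosis


def summarize_bands(bands, disorders, externalizing):
    """Count high-risk participants from a (participants x disorders) band array.

    disorders: column labels; externalizing: labels on the externalizing spectrum
    (all other columns are treated as internalizing).
    """
    high = np.asarray(bands) >= HIGH_RISK_MIN_BAND      # high-risk flag per disorder
    n_dx = high.sum(axis=1)                             # likely diagnoses per person
    is_ext = np.array([d in externalizing for d in disorders])
    return {
        "any": int((n_dx >= 1).sum()),
        "two_or_more": int((n_dx >= 2).sum()),
        "externalizing": int(high[:, is_ext].any(axis=1).sum()),
        "internalizing": int(high[:, ~is_ext].any(axis=1).sum()),
        "per_disorder": dict(zip(disorders, high.sum(axis=0).tolist())),
    }


# Hypothetical example with five participants.
disorders = ["ADHD", "ODD", "CD", "ASD", "GAD", "depression", "SP", "ED"]
bands = [[4, 5, 0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 5, 0, 0, 0],
         [3, 3, 3, 3, 3, 3, 3, 3],
         [0, 0, 0, 0, 0, 0, 0, 0],
         [0, 0, 4, 0, 0, 4, 0, 0]]
summary = summarize_bands(bands, disorders, {"ADHD", "ODD", "CD", "ASD"})
```

Because one participant can be high risk on several disorders, the per-disorder counts can sum to more than the number of participants in each spectrum, as in the counts reported above.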
Monetary incentive delay task
Participants performed a modified version of the MID task (Supplementary Fig. 1) to examine neural responses to reward anticipation and reward outcome56. The task consisted of 66 trials of 10 s each. In each trial, participants were presented with one of three cue shapes (250 ms) denoting whether a target (white square) would subsequently appear on the left or right side of the screen and whether zero, two or ten points could be won in that trial. After a variable delay (4,000–4,500 ms) of fixation on a white crosshair, participants were instructed to respond with a left or right button press as soon as the target appeared. Feedback on whether any, and how many, points were won during the trial was presented for 1,450 ms after the response (Supplementary Fig. 1). Task difficulty (that is, target duration, which varied between 100 and 300 ms) was individually adjusted with a tracking algorithm so that each participant responded successfully on ~66% of trials. Participants first completed a practice session outside the scanner (~5 minutes), during which they were told that they would receive one food snack, in the form of small chocolate candies, for every five points won. We used the hit anticipation, hit feedback and miss feedback conditions.
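The MID tracking algorithm is described only by its target hit rate. As an illustration only, a weighted up-down staircase (a swapped-in technique, not necessarily the IMAGEN implementation) reaches a ~66% hit rate by shortening the target after each hit and lengthening it after each miss, with step sizes in the ratio 0.66:0.34. The step size, starting duration and simulated participant below are hypothetical.

```python
import numpy as np


def weighted_staircase(p_hit, n_trials=66, target=0.66, start_ms=200.0,
                       step_down_ms=10.0, lo_ms=100.0, hi_ms=300.0, seed=0):
    """Weighted up-down staircase on target duration (hypothetical parameters).

    p_hit: function mapping target duration (ms) to the probability of a hit.
    After a hit the target is shortened by step_down_ms (harder); after a miss it is
    lengthened by step_up_ms. With step_up / step_down = target / (1 - target), the
    expected change is zero exactly when P(hit) equals the target rate.
    """
    rng = np.random.default_rng(seed)
    step_up_ms = step_down_ms * target / (1.0 - target)
    duration = start_ms
    durations, hits = [], []
    for _ in range(n_trials):
        hit = bool(rng.random() < p_hit(duration))
        durations.append(duration)
        hits.append(hit)
        duration += -step_down_ms if hit else step_up_ms
        duration = min(max(duration, lo_ms), hi_ms)  # keep within 100-300 ms
    return np.array(durations), np.array(hits)


def simulated_participant(duration_ms):
    """Hypothetical psychometric function: longer targets are easier to hit."""
    return 1.0 / (1.0 + np.exp(-(duration_ms - 180.0) / 20.0))


# Many simulated trials make the long-run hit rate easy to check.
durations, hits = weighted_staircase(simulated_participant, n_trials=5000)
```

Over a long run the hit rate settles near 0.66 whatever the shape of the psychometric function; with only 66 trials, the realized rate scatters around that value.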
Stop-signal task
Participants performed an event-related SST (Supplementary Fig. 2) designed to study neural responses to successful and unsuccessful inhibitory control57. The task comprised go trials and stop trials. During go trials (83%, 480 trials), participants were presented with arrows pointing either left or right and were instructed to press a button with their left or right index finger, corresponding to the direction of the arrow. In the unpredictable stop trials (17%, 80 trials), the left- or right-pointing arrows were followed (on average 300 ms later) by arrows pointing upward; participants were instructed to inhibit their motor responses on these trials. A tracking algorithm changed the interval between go-signal and stop-signal onsets according to each participant's performance on previous trials (the average percentage of inhibition over previous stop trials, recalculated after each stop trial). This ensured that participants were successful on 50% of stop trials and worked at the edge of their own inhibitory capacity. The intertrial interval was 1,800 ms. We used the stop success, stop failure and go wrong conditions.
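A minimal sketch of the stop-signal tracking rule described above, assuming a fixed 50-ms step, stop-signal delay (SSD) bounds of 0 to 1,000 ms and a simulated participant under the independent race model (normally distributed go reaction times and a fixed stop-signal reaction time, SSRT). All of these values are hypothetical; the IMAGEN task parameters may differ.

```python
import numpy as np


def track_ssd(n_stop=80, ssd_start=250.0, step=50.0, ssrt=220.0,
              go_mean=450.0, go_sd=80.0, seed=0):
    """Adjust the stop-signal delay (SSD) from the running inhibition rate.

    After every stop trial the proportion of successful stops so far is recomputed:
    above 50% -> lengthen the SSD (harder to stop); below 50% -> shorten it.
    Stopping is simulated with an independent race model: the stop succeeds when
    the go response would finish after SSD + SSRT.
    """
    rng = np.random.default_rng(seed)
    ssd, successes, ssds = ssd_start, 0, []
    for trial in range(1, n_stop + 1):
        ssds.append(ssd)
        go_rt = rng.normal(go_mean, go_sd)
        successes += int(go_rt > ssd + ssrt)
        rate = successes / trial  # average inhibition over stop trials so far
        if rate > 0.5:
            ssd += step
        elif rate < 0.5:
            ssd -= step
        ssd = min(max(ssd, 0.0), 1000.0)
    return np.array(ssds), successes / n_stop


ssds, inhibition_rate = track_ssd(n_stop=80)
```

Because the SSD keeps moving whenever the cumulative rate departs from 50%, the rate is pulled back toward 50% as trials accumulate.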
Emotional face task
The emotional face task (EFT) was adapted from Grosbras et al.58. Participants watched 18-second blocks of either a face movie (depicting anger or neutrality) or a control stimulus. Each face movie showed black-and-white video clips (200–500 ms) of male or female faces. Five blocks each of angry and neutral expressions were interleaved with nine blocks of the control stimulus. Each block contained eight trials of six face identities (three female), and the same identities were used in the angry and neutral blocks. The control stimuli were black-and-white concentric circles that expanded and contracted at various speeds, roughly matching the contrast and motion characteristics of the face clips. We used the neutral-face and angry-face conditions of the EFT.
Image acquisition
fMRI data were acquired at eight IMAGEN assessment sites with 3 T MRI scanners from different manufacturers (Siemens, Philips, General Electric and Bruker). Scanning parameters were chosen to be compatible with all scanners, and the same scanning protocol was used at all sites. In brief, high-resolution T1-weighted 3D structural images were acquired for anatomical localization and coregistration with the functional time series. Blood oxygen level-dependent (BOLD) functional images were acquired with a gradient-echo echo-planar imaging sequence. For all tasks, each volume consisted of 40 slices aligned to the anterior commissure–posterior commissure line (2.4-mm slice thickness, 1-mm gap). The echo time was optimized (echo time = 30 ms; repetition time (TR) = 2,200 ms) to provide reliable imaging of subcortical areas.
Task-based functional image preprocessing
Task-based fMRI data were first preprocessed using SPM8 (Statistical Parametric Mapping, http://www.fil.ion.ucl.ac.uk/spm). Spatial preprocessing included slice-timing correction to adjust for time differences due to multislice acquisition, realignment to the first volume, nonlinear warping to MNI space (using a custom echo-planar imaging template (53 × 63 × 46 voxels) created from an average of the mean images of 400 adolescents), resampling to a resolution of 3 × 3 × 3 mm³ and smoothing with an isotropic Gaussian kernel of 5-mm full-width at half-maximum (FWHM).
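For reference, a 5-mm FWHM kernel corresponds to a Gaussian standard deviation of about 2.12 mm, or roughly 0.71 of a 3-mm voxel. The conversion is sketched below; `fwhm_to_sigma` is our own helper for illustration, not an SPM function.

```python
import math


def fwhm_to_sigma(fwhm):
    """Standard deviation of a Gaussian with the given full width at half maximum.

    FWHM = 2 * sqrt(2 * ln 2) * sigma, so sigma is about FWHM / 2.3548.
    """
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


sigma_mm = fwhm_to_sigma(5.0)    # 5-mm smoothing kernel
sigma_vox = sigma_mm / 3.0       # expressed in 3-mm resampled voxels
```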
Network construction
To estimate condition-specific FC, we used the CONN toolbox (version 16.h) with the weighted generalized linear model method. First, the task condition regressors and 21 covariate regressors were regressed out of the raw BOLD signal of each region of interest (ROI). The 21 covariates comprised 12 motion regressors (3 translations, 3 rotations, 3 translations shifted 1 TR earlier and 3 translations shifted 1 TR later) and 9 additional columns corresponding to the long-term effects of movement (3 white-matter and 6 ventricle nuisance variables, commonly referred to as CompCor correction59). The residual signals were then entered into weighted generalized linear models to estimate condition-specific time-series correlations (that is, the conditional FC) between all pairs of ROIs. The temporal weight function for each condition was the corresponding task condition regressor, rectified so that only time points with an expected positive BOLD response contributed. This approach accounts for the expected hemodynamic delay of each task condition, deweights the initial and final scans of each block to avoid spurious jumps in the BOLD signal, and reduces potential crosstalk between adjacent task conditions60. ROI-to-ROI FC was then calculated using the 268-node functional brain atlas22 (Supplementary Fig. 3).
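The two-stage estimate described above (nuisance regression followed by a weighted correlation with rectified condition weights) can be sketched in NumPy. This is a simplified illustration, not CONN's implementation: it assumes the condition regressor has already been convolved with the hemodynamic response function, and it computes a plain weighted Pearson correlation for each ROI pair. The data are random placeholders.

```python
import numpy as np


def regress_out(ts, confounds):
    """Residualize ROI time series (T x R) on confounds (T x K) plus an intercept."""
    X = np.column_stack([np.ones(len(ts)), confounds])
    beta, *_ = np.linalg.lstsq(X, ts, rcond=None)
    return ts - X @ beta


def weighted_fc(residuals, condition_regressor):
    """ROI-to-ROI weighted Pearson correlation for one task condition.

    condition_regressor: HRF-convolved regressor (T,), assumed precomputed. It is
    rectified at zero and normalized, so scans with an expected positive response
    dominate and scans where the regressor is small (block edges) count less.
    """
    w = np.clip(condition_regressor, 0.0, None)
    w = w / w.sum()
    centered = residuals - w @ residuals             # subtract weighted mean per ROI
    cov = centered.T @ (centered * w[:, None])       # weighted covariance (R x R)
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


# Hypothetical data: 200 scans, 4 ROIs, 3 confounds.
rng = np.random.default_rng(0)
T = 200
confounds = rng.normal(size=(T, 3))
residuals = regress_out(rng.normal(size=(T, 4)), confounds)
regressor = np.sin(np.linspace(0, 6 * np.pi, T))    # stand-in for a convolved block design
fc = weighted_fc(residuals, regressor)
```

With uniform weights the function reduces to an ordinary Pearson correlation matrix, which is a convenient sanity check.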
Connectome-based predictive modeling
We used CPM (Supplementary Fig. 4) to predict participants' behavioral symptoms from whole-brain, task-based FC. CPM is a recently developed method that identifies functional brain connections related to a behavioral variable of interest and uses them to predict behavior in novel participants (that is, participants whose data were not used to build the model)23. CPM has been applied to cognitive and psychiatric variables, such as fluid intelligence, attention control and ADHD51,61,62,63. The CPM processing pipeline is available online (https://www.nitrc.org/projects/bioimagesuite/). We modified the original CPM, which used leave-one-out crossvalidation, to use 50-fold crossvalidation, reducing computation time while maintaining robustness. First, we randomly divided the data into 50 folds; one fold was held out as the testing dataset and the other 49 folds were used as the training dataset. Next, in the training dataset, the vector of behavioral scores (for example, ADHD symptoms) was associated with each edge of the connectome (that is, the FC matrix) across participants, with site and handedness included as covariates. A default threshold23 (P < 0.01 in our study) was then applied to retain only edges significantly associated (positively or negatively) with behavioral symptoms in the training dataset. Analyses were repeated with three additional thresholds (0.05, 0.005 and 0.001) and showed similar predictive performance (Supplementary Table 2). Next, for each individual, the FC strengths (weights) of the selected positive and negative edges were summed, with negative edges multiplied by −1 before summation, and this summed edge strength was entered into a linear regression model to estimate its relationship with observed behavior in the training dataset.
In the testing dataset, each individual's summed edge strength was entered into the linear model estimated in the training dataset to generate a predicted behavioral score. This process was repeated 50 times, so that the predicted scores for each testing fold were derived from a model trained on the remaining 49 folds. Finally, model performance was estimated as Spearman's correlation between predicted and observed behavioral scores across all individuals. To select the most robust edges, we repeated CPM 1,000 times and retained for further analyses the edges selected in over 95% of models. For more details on CPM, see Shen et al.23.
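A compact Python sketch of the k-fold CPM loop described above follows. It is not the BioImage Suite pipeline, and its details are assumptions: covariates (for example, dummy-coded site and handedness, passed as a 2D array) are partialled out by linear regression, edge P values use a Fisher z normal approximation, and Spearman's correlation uses average ranks for ties. All function names are hypothetical.

```python
import math
import numpy as np


def _residualize(y, covariates):
    """Remove covariate effects (plus intercept) from y (n,) or (n, m)."""
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def edge_partial_corr(edges, behavior, covariates):
    """Partial correlation of each edge with behavior and a two-sided P value.

    P values use a Fisher z normal approximation (an assumption of this sketch;
    exact t-based P values differ slightly in small samples).
    """
    e = _residualize(edges, covariates)
    b = _residualize(behavior, covariates)
    e = (e - e.mean(0)) / e.std(0)
    b = (b - b.mean()) / b.std()
    r = e.T @ b / len(b)
    dof = len(b) - covariates.shape[1] - 3
    z = np.abs(np.arctanh(np.clip(r, -0.999999, 0.999999))) * math.sqrt(dof)
    p = np.array([math.erfc(v / math.sqrt(2.0)) for v in z])
    return r, p


def cpm_predict(edges, behavior, covariates, n_folds=50, p_thresh=0.01, seed=0):
    """K-fold CPM: out-of-fold predictions and the edge mask selected in each fold."""
    rng = np.random.default_rng(seed)
    n, n_edges = edges.shape
    folds = np.array_split(rng.permutation(n), n_folds)
    predicted = np.empty(n)
    selected = np.zeros((n_folds, n_edges), dtype=bool)
    for k, test in enumerate(folds):
        train = np.setdiff1d(np.arange(n), test)
        r, p = edge_partial_corr(edges[train], behavior[train], covariates[train])
        pos, neg = (p < p_thresh) & (r > 0), (p < p_thresh) & (r < 0)
        selected[k] = pos | neg
        # Summed strength: positive edges minus negative edges (sign-flipped).
        strength = edges[:, pos].sum(1) - edges[:, neg].sum(1)
        slope, intercept = np.polyfit(strength[train], behavior[train], 1)
        predicted[test] = slope * strength[test] + intercept
    return predicted, selected


def _average_ranks(a):
    a = np.asarray(a, dtype=float)
    ranks = np.empty(len(a))
    ranks[np.argsort(a, kind="mergesort")] = np.arange(1, len(a) + 1)
    _, inv = np.unique(a, return_inverse=True)
    return (np.bincount(inv, weights=ranks) / np.bincount(inv))[inv]


def spearman(x, y):
    """Spearman correlation: Pearson correlation of average ranks."""
    return float(np.corrcoef(_average_ranks(x), _average_ranks(y))[0, 1])


def consensus_edges(masks, min_frac=0.95):
    """Edges selected in more than min_frac of all fitted models."""
    return np.asarray(masks).mean(axis=0) > min_frac
```

Calling `cpm_predict` with different seeds gives repeated random fold assignments; stacking the returned masks across repetitions and passing them to `consensus_edges` keeps the edges selected in more than 95% of models.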
Neuropsychopathology factor
The NP factor was constructed to represent longitudinally consistent and generalizable transdiagnostic brain signatures across the externalizing and internalizing spectra. First, by applying CPM to condition-specific functional networks (that is, the functional connectome derived for each task condition), we identified crossdisorder edges that were associated simultaneously with at least one externalizing symptom and at least one internalizing symptom. Then, for each task condition, we used a permutation test to determine whether the number of crossdisorder edges identified was significantly higher than expected by chance (see Reliability assessment using permutation tests for more details). Only the significant, and therefore informative, task conditions and their crossdisorder edges were retained for further analyses. Next, because different combinations of association directions with externalizing and internalizing symptoms have distinct neurobiological implications, we stratified the crossdisorder edges into four groups to improve interpretability: positive–positive (or negative–negative) edges, which were positively (or negatively) associated with both externalizing and internalizing symptoms; positive–negative edges, which were positively associated with externalizing symptoms but negatively associated with internalizing symptoms; and negative–positive edges, which were negatively associated with externalizing symptoms but positively associated with internalizing symptoms. Lastly, each group of crossdisorder edges was evaluated for longitudinal consistency on the basis of its predictive performance for both externalizing and internalizing symptoms at the age-19 follow-up, and the FC strengths of the longitudinally consistent crossdisorder edges were summed to generate the NP factor.
Only positive–positive edges (that is, edges positively associated with both internalizing and externalizing symptoms) were found to be longitudinally consistent, so only these edges were used to compute the NP factor. The NP factor may therefore serve as a transdiagnostic neural indicator of comorbid externalizing and internalizing symptoms.
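The grouping of crossdisorder edges and the construction of the NP factor can be illustrated as follows. The sketch assumes, hypothetically, that the CPM results are summarized as one sign per symptom and edge (+1, −1, or 0 when the edge was not selected), and it leaves edges whose direction is mixed within one spectrum unclassified; the source does not state how such edges were handled. Names and data are illustrative only.

```python
import numpy as np

GROUP_NAMES = {(1, 1): "pos-pos", (-1, -1): "neg-neg",
               (1, -1): "pos-neg", (-1, 1): "neg-pos"}


def spectrum_direction(signs):
    """Collapse (n_symptoms, n_edges) signs to +1, -1 or 0 per edge.

    0 means the edge was not selected for any symptom in the spectrum, or (an
    assumption of this sketch) that its direction was mixed across symptoms.
    """
    signs = np.asarray(signs)
    has_pos, has_neg = (signs > 0).any(axis=0), (signs < 0).any(axis=0)
    return np.where(has_pos & ~has_neg, 1, np.where(has_neg & ~has_pos, -1, 0))


def classify_crossdisorder(ext_signs, int_signs):
    """Label each edge 'pos-pos', 'neg-neg', 'pos-neg', 'neg-pos' or None."""
    ext, intern = spectrum_direction(ext_signs), spectrum_direction(int_signs)
    return [GROUP_NAMES.get((e, i)) for e, i in zip(ext.tolist(), intern.tolist())]


def np_factor(fc, labels, keep=("pos-pos",)):
    """Sum FC over edges in the retained group(s) for each participant."""
    mask = np.array([label in keep for label in labels])
    return np.asarray(fc)[:, mask].sum(axis=1)


# Hypothetical signs for 2 externalizing symptoms, 1 internalizing symptom, 7 edges.
ext_signs = [[1, 0, 1, -1, 1, -1, 1],
             [0, 0, -1, 0, 0, 0, 0]]
int_signs = [[1, 1, 1, 1, 0, -1, -1]]
labels = classify_crossdisorder(ext_signs, int_signs)
scores = np_factor(np.arange(14).reshape(2, 7), labels)
```

Restricting `keep` to the positive–positive group mirrors the finding that only those edges were longitudinally consistent.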
Reliability assessment using permutation tests
To determine which task conditions provided reliable crossdisorder edges, we used permutation tests to evaluate whether the crossdisorder edges identified for each task condition were informative, that is, whether the number of edges identified for that condition was significantly larger than expected by chance (Supplementary Fig. 5). Because the CPM analysis is computationally intensive (1,000 repetitions of 50-fold crossvalidation; see Connectome-based predictive modeling), the number of permutations was set to 1,000, which is sufficient to estimate P values as small as 0.01 accurately. The same permutation procedure was also used to obtain unbiased P values for the association of the crossdisorder network with behavioral symptoms.
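A generic permutation P value is sketched below, assuming that the null distribution is generated by shuffling behavioral scores across participants and recomputing a user-supplied statistic; the scheme actually used is illustrated in Supplementary Fig. 5 and may differ. The +1 correction keeps the estimate above zero, so 1,000 permutations give a smallest attainable P value of 1/1,001. The example statistic and data are hypothetical.

```python
import numpy as np


def permutation_pvalue(observed, null_values):
    """One-sided P value: share of permuted statistics >= observed, with +1 correction."""
    null_values = np.asarray(null_values)
    return (1 + int(np.sum(null_values >= observed))) / (1 + null_values.size)


def null_distribution(statistic, data, behavior, n_perm=1000, seed=0):
    """Recompute statistic(data, shuffled_behavior) for n_perm random shuffles.

    Shuffling behavior across participants breaks any brain-behavior link while
    preserving the structure of the connectivity data (assumed scheme).
    """
    rng = np.random.default_rng(seed)
    return np.array([statistic(data, rng.permutation(behavior)) for _ in range(n_perm)])


def abs_corr(x, y):
    """Example statistic (hypothetical): absolute Pearson correlation."""
    return abs(np.corrcoef(x, y)[0, 1])


rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = x + 0.1 * rng.normal(size=100)
null = null_distribution(abs_corr, x, y, n_perm=1000)
p = permutation_pvalue(abs_corr(x, y), null)
```

For the crossdisorder analysis, the statistic would be the number of crossdisorder edges found for a task condition, which requires rerunning CPM inside each permutation and explains the computational cost.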
Generalization datasets
To investigate whether the NP factor identified in the adolescent IMAGEN dataset using task-based connectomes could be generalized to other developmental periods and fMRI states, we used multiple large-scale, population-based datasets (the ABCD cohort33 and the HCP64) and clinical case–control datasets (Stratify34 and ADHD-20065).
ABCD cohort
The dataset used for this study was selected from the Annual Curated Data Release (https://data-archive.nimh.nih.gov/abcd) of the ABCD cohort, which recruited 11,875 children aged 9–11 years from 21 sites across the USA66. MRI data in the ABCD study were collected on different 3 T scanner platforms (Siemens Prisma, General Electric MR750 and Philips Achieva dStream). To minimize biases introduced by multiple platforms, we included only MRI data from the most frequent manufacturer, Siemens Prisma, comprising 5,968 participants from 13 sites. By examining the similarity of brain activation across these 13 sites, we further selected 2,326 participants from 4 sites with consistent activation patterns. After quality control67, 1,966 participants with MID task data and 1,837 participants with SST data were included in further analyses. ABCD has balanced sample sizes for boys and girls (based on self-reported sex) (Table 1). To construct the NP factor in the ABCD dataset, we used the same positive–positive edges that defined the NP factor in the IMAGEN cohort and extracted the corresponding FC for reward anticipation and reward positive feedback from the MID task and for stop success and stop failure from the SST. The sum of these FCs across the MID task and SST constituted the NP factor for ABCD. To assess dimensional psychopathology, we used the Parent Child Behavior Checklist Scores (abcd_cbcls01)68; the summed externalizing and internalizing scores were used in further analyses. The ABCD Parent Diagnostic Interview for Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) provides diagnostic output for common psychiatric disorders (abcd_ksad01). ASD diagnoses were obtained from a clinical assessment questionnaire (abcd_screen01).
Because the prevalence of SP in the ABCD dataset based on abcd_ksad01 (21.5%) was much higher than that reported in other pediatric epidemiological studies (4.8%)69,70, we excluded this diagnostic information from the clinical relevance analysis. For all analyses of ABCD data, we included site, family, handedness and sex as covariates in a mixed model71.
HCP
The dataset used for this investigation was selected from the March 2017 public data release of the HCP (WU-Minn Consortium). HCP has balanced sample sizes for men and women (based on self-reported sex; Table 1). Our sample included 1,081 participants (aged 22–35 years, mean age 31 years) scanned on a 3 T Siemens Connectome Skyra scanner. More details on participants and on data collection and preprocessing are provided on the HCP website (http://www.humanconnectome.org/). Externalizing symptoms were measured using the Achenbach Adult Self-Report (ASR) Syndrome Scales72 (ASR_Computed_Externalizing_Adjusted_T). For all analyses of HCP data, we included site, handedness and sex as covariates.
Stratify
Stratify recruited participants (aged 19–25 years) with anorexia nervosa, alcohol use disorder, bulimia nervosa or major depression, and controls with no mental disorder diagnosis, at three sites (Berlin, London and Southampton). The proportions of men and women (based on self-reported sex) varied across the mental disorder groups (Table 1). The Stratify protocol was harmonized with the IMAGEN protocol, and task-based neuroimaging data were collected for the SST and MID task. After quality control (using the same procedures as for the ABCD dataset67), 267 cases and 46 controls with MID task data and 380 cases and 64 controls with SST data were included in further analyses. For all analyses of Stratify data, we included site, handedness and sex as covariates.
ADHD-200
ADHD-200 is a grassroots initiative dedicated to accelerating the scientific community's understanding of the neural basis of ADHD; participants were aged 7–21 years. Males predominate in the case group, whereas the sexes (based on self-reported sex) are balanced in the control group (Table 1). Inclusion criteria were no history of neurological disease or other chronic medical conditions and an estimated full-scale IQ above 80; psychostimulant medication was withheld for at least 24–48 hours before scanning. Data were downloaded from the ADHD-200 consortium website (http://fcon_1000.projects.nitrc.org/indi/adhd200). We used data from four sites (Peking University, Kennedy Krieger Institute, New York University Child Study Center and Oregon Health & Science University) that recruited both participants with ADHD and control participants without ADHD, giving 228 cases and 292 controls in total. For all analyses of ADHD-200 data, we included site, handedness and sex as covariates.
Genotyping for the IMAGEN study
DNA purification and genotyping were performed by the Centre National de Génotypage. DNA was extracted from whole-blood samples (∼10 ml) preserved in BD Vacutainer EDTA tubes (Becton, Dickinson and Company) using the Gentra Puregene Blood Kit (QIAGEN) according to the manufacturer's instructions. SNPs with call rates <98%, minor allele frequency <1% or deviation from Hardy–Weinberg equilibrium (P < 1.00 × 10⁻⁴) were excluded from analyses. Individuals with an ambiguous sex code, excessive missing genotypes (failure rate >2%) or outlying heterozygosity (heterozygosity rate more than 3 s.d. from the mean) were also excluded.
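The SNP-level filters above can be sketched with NumPy. The genotype matrix layout (counts of one allele coded 0, 1 or 2, with NaN for missing calls) is an assumption, and Hardy–Weinberg deviation is tested here with a 1-df chi-square approximation, whereas genotyping tools such as PLINK use an exact test by default, so P values near the threshold can differ. The example genotypes are hypothetical.

```python
import math
import numpy as np


def snp_qc_mask(genotypes, min_call_rate=0.98, min_maf=0.01, min_hwe_p=1e-4):
    """Keep-mask for SNP columns of an (individuals x SNPs) genotype matrix.

    genotypes: counts of allele B (0 = AA, 1 = AB, 2 = BB), np.nan for missing calls.
    HWE uses a 1-df chi-square approximation (assumption; PLINK defaults to an exact test).
    """
    g = np.asarray(genotypes, dtype=float)
    call_rate = (~np.isnan(g)).mean(axis=0)
    observed = np.stack([(g == k).sum(axis=0) for k in (0, 1, 2)])  # AA, AB, BB counts
    n = observed.sum(axis=0)
    p = (2 * observed[0] + observed[1]) / (2 * n)                   # frequency of allele A
    maf = np.minimum(p, 1 - p)
    expected = np.stack([n * p ** 2, 2 * n * p * (1 - p), n * (1 - p) ** 2])
    with np.errstate(divide="ignore", invalid="ignore"):
        chi2 = np.nansum((observed - expected) ** 2 / expected, axis=0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    hwe_p = np.array([math.erfc(math.sqrt(c / 2.0)) for c in chi2])
    return (call_rate >= min_call_rate) & (maf >= min_maf) & (hwe_p >= min_hwe_p)


# Hypothetical SNPs: in HWE; strong HWE deviation; rare allele; 5% missing calls.
snp_ok = np.array([0] * 25 + [1] * 50 + [2] * 25, dtype=float)
snp_hwe = np.array([0] * 50 + [2] * 50, dtype=float)
snp_rare = np.array([0] * 99 + [1], dtype=float)
snp_missing = snp_ok.copy()
snp_missing[:5] = np.nan
mask = snp_qc_mask(np.column_stack([snp_ok, snp_hwe, snp_rare, snp_missing]))
```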
Polygenic risk scores
To calculate the PRSs of depression, ADHD and intelligence, we used previously published GWASs of ADHD28, depression29 and intelligence30. The discovery depression GWAS consisted of 135,458 cases and 344,901 controls, the ADHD GWAS of 20,183 cases and 35,191 controls and the intelligence GWAS of 269,867 individuals. We then used PRSice software (http://prsice.info/) to calculate the corresponding PRSs. Clumping was applied to retain only the SNP with the smallest P value in each linkage disequilibrium block: within 250-kb windows, less significant SNPs in linkage disequilibrium (r² > 0.1) with a retained SNP were excluded. PRSs were calculated at P value thresholds between 0 and 0.5 in increments of 0.01, and we used the mean PRSs of depression, ADHD and intelligence for subsequent analyses73.
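The clumping and thresholding steps can be sketched as below. This is an illustration, not PRSice: it assumes a precomputed LD (r²) matrix for SNPs on a single chromosome, applies the clumping settings stated above (r² > 0.1 within 250 kb), and scores each individual as a plain sum of allele dosage times GWAS effect size at each P threshold before averaging across thresholds. PRSice's own scoring options (for example, averaging over the number of SNPs) may differ. Example values are hypothetical.

```python
import numpy as np


def clump(pvals, positions, r2, r2_max=0.1, window_bp=250_000):
    """Greedy LD clumping for SNPs on one chromosome.

    Visit SNPs from smallest to largest P value; each SNP not yet removed becomes
    an index SNP and removes all SNPs within window_bp whose r² with it exceeds r2_max.
    r2: precomputed (n_snps x n_snps) LD matrix (assumed input).
    """
    pvals, positions, r2 = np.asarray(pvals), np.asarray(positions), np.asarray(r2)
    removed = np.zeros(len(pvals), dtype=bool)
    index_snps = []
    for i in np.argsort(pvals):
        if removed[i]:
            continue
        index_snps.append(int(i))
        near = np.abs(positions - positions[i]) <= window_bp
        removed |= near & (r2[i] > r2_max)
        removed[i] = True
    return sorted(index_snps)


def prs_over_thresholds(dosages, betas, pvals, thresholds):
    """Per-threshold scores (sum of dosage x effect size) and their mean per person."""
    dosages, betas, pvals = map(np.asarray, (dosages, betas, pvals))
    scores = np.column_stack([dosages[:, pvals <= t] @ betas[pvals <= t]
                              for t in thresholds])
    return scores, scores.mean(axis=1)


# Thresholds 0.01, 0.02, ..., 0.50 (one reading of the range described above).
THRESHOLDS = np.round(np.arange(1, 51) * 0.01, 2)
```

In practice, clumping is run per chromosome on the SNPs shared by the GWAS summary statistics and the target genotypes, and the scores are then computed from the retained index SNPs only.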
Cognition–behavior phenotypes
Cambridge Cognition Battery
The Cambridge Cognition Battery (http://www.cambridgecognition.com/) comprised the Spatial Working Memory task (number of errors and strategy), the Cambridge Gambling Task (CGT; risk taking, quality of decision-making, delay aversion, deliberation time, overall proportion bet and risk adjustment), the Rapid Visual Information Processing task and the Affective Go/No-Go task (mean correct latency and number of omission errors for positive and negative stimuli). CGT quality of decision-making is the proportion of trials on which the participant chooses the most likely outcome. CGT deliberation time is the reaction time to choose the color of the box. CGT overall proportion bet is the mean proportion of points bet across all trials. CGT risk taking is the mean proportion of available points the participant stakes on each trial. CGT delay aversion is the difference between the risk-taking scores in the descending and ascending conditions. CGT risk adjustment is the degree to which a participant adjusts their risk taking according to the ratio of colored boxes, calculated as [2 × (% staked at 9:1) + (% staked at 8:2) − (% staked at 7:3) − 2 × (% staked at 6:4)] ÷ CGT risk taking, where % staked is the proportion of points staked at the given box ratio. The Rapid Visual Information Processing task is a 10-minute test of sustained attention. Digits 2–9 are presented rapidly (100 digits per minute) in a white box in the center of the screen, and participants must detect target sequences (for example, 2-4-7, 3-5-7 or 4-6-8) and respond as quickly as possible. Outcome measures include a signal detection theory measure of target sensitivity and mean response latency.
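The CGT risk-adjustment formula and a sensitivity index for the Rapid Visual Information Processing task can be written as short functions. We assume the sensitivity measure is A′, a nonparametric signal detection index; the dictionary keys, units and example values are hypothetical.

```python
def cgt_risk_adjustment(staked, risk_taking):
    """CGT risk adjustment from the mean proportion staked at each box ratio.

    staked: dict mapping '9:1', '8:2', '7:3', '6:4' to the mean proportion of points
    staked at that ratio; risk_taking: overall CGT risk-taking score (same units).
    """
    numerator = (2 * staked["9:1"] + staked["8:2"]
                 - staked["7:3"] - 2 * staked["6:4"])
    return numerator / risk_taking


def a_prime(hit_rate, false_alarm_rate):
    """Nonparametric sensitivity A' from hit and false-alarm rates (0.5 = chance)."""
    h, f = hit_rate, false_alarm_rate
    if h == f:
        return 0.5
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))


risk_adj = cgt_risk_adjustment({"9:1": 0.8, "8:2": 0.7, "7:3": 0.5, "6:4": 0.3}, 0.575)
sensitivity = a_prime(0.8, 0.2)
```

A participant who stakes more when the odds are favorable (9:1) than when they are close to even (6:4) obtains a positive risk adjustment, and A′ rises above 0.5 as hits outnumber false alarms.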
IQ
We measured intelligence using the fluency and verbal components of the Wechsler Intelligence Scale for Children, Fourth Edition74.
Delay discounting
We used the Monetary-Choice Questionnaire as described by Kirby75. The Monetary-Choice Questionnaire is an efficient and reliable measure of delay discounting that has been validated in adolescents76. For each participant, we estimated k values, which reflect how steeply the value of a reward is discounted as a function of the delay required to obtain it. The questionnaire contains 27 dichotomous-choice items pitting a smaller immediate reward against a larger delayed reward at three levels of reward magnitude (small, medium and large). Higher k coefficients in a hyperbolic discounting equation for each reward level indicate a greater preference for small immediate rewards and higher impulsivity. The geometric mean of the k values was calculated and log-transformed for use in our analyses.
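The standard hyperbolic model, V = A/(1 + kD), implies an indifference k for each item, k = (delayed/immediate − 1)/D. The sketch below estimates a participant's k for one reward magnitude by choosing the candidate value most consistent with their choices, in the spirit of Kirby's scoring. The candidate grid, the handling of ties and the natural-log transform are assumptions of this sketch, and the example items are hypothetical rather than actual questionnaire items.

```python
import math
from statistics import geometric_mean


def indifference_k(immediate, delayed, delay_days):
    """k at which V = A / (1 + k * D) makes both options equally valuable."""
    return (delayed / immediate - 1.0) / delay_days


def estimate_k(items, chose_delayed):
    """Most choice-consistent k for one reward magnitude (Kirby-style scoring sketch).

    items: (immediate, delayed, delay_days) tuples; chose_delayed: matching booleans.
    A participant with rate k should choose the delayed reward exactly when
    k < the item's indifference k. Candidates are geometric midpoints between
    sorted item ks plus two arbitrary extremes (half the smallest, twice the largest);
    tied best candidates are combined with a geometric mean.
    """
    item_ks = [indifference_k(*item) for item in items]
    ks = sorted(item_ks)
    candidates = ([ks[0] / 2] + [math.sqrt(a * b) for a, b in zip(ks, ks[1:])]
                  + [ks[-1] * 2])

    def consistency(k):
        return sum((k < ik) == cd for ik, cd in zip(item_ks, chose_delayed))

    scores = [consistency(k) for k in candidates]
    best = max(scores)
    return geometric_mean([k for k, s in zip(candidates, scores) if s == best])


def summary_log_k(k_small, k_medium, k_large):
    """Natural log of the geometric mean of the three magnitude-specific k values."""
    return math.log(geometric_mean([k_small, k_medium, k_large]))


# Hypothetical items with indifference ks of 0.001, 0.01 and 0.1 per day.
items = [(100, 110, 100), (100, 200, 100), (100, 1100, 100)]
k_hat = estimate_k(items, [False, True, True])
```

Here the participant rejects the delayed reward only on the item with the smallest indifference k, so the estimate falls between 0.001 and 0.01 (their geometric midpoint, about 0.0032).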
Personality
Substance Use Risk Personality Scale
The Substance Use Risk Personality Scale (23 items, self-report questionnaire) was used to measure sensation seeking, impulsivity, anxiety sensitivity and negative thinking; these subscores have been shown to be related to substance use in adolescents77.
NEO Personality Inventory
The NEO Personality Inventory (60 items, self-report questionnaire) assesses the big-five personality domains: neuroticism, extraversion, openness, agreeableness and conscientiousness78.
Temperament and Character Inventory–Revised
The Temperament and Character Inventory–Revised (36 items)79 was used to measure excitability, impulsiveness, reserve, disorderliness and their combined measure of novelty seeking.
Substance use
Alcohol
Alcohol abuse was assessed using the screening questions from the Alcohol Use Disorders Identification Test (AUDIT, ten items)80. The AUDIT was developed by the World Health Organization as a simple way to screen and identify people who are at risk of developing alcohol problems. AUDIT focuses on identifying the preliminary signs of hazardous drinking and mild dependence. It is used to detect alcohol problems experienced within the last year, and it is one of the most accurate alcohol screening tests available.
Smoking
Smoking behavior was assessed as the frequency (that is, cigarettes per day) of smoking during the last 30 days using the European School Survey Project on Alcohol and Other Drugs81.
Environmental risk
Childhood Trauma Questionnaire
The Childhood Trauma Questionnaire (CTQ)82 was used to assess maltreatment across childhood and adolescence. It covers five domains: emotional abuse, emotional neglect, physical abuse, physical neglect and sexual abuse. The scores from the five domains were summed to give a total CTQ score, with higher scores indicating more severe maltreatment.
School bully
Bullying at school was measured using an adapted questionnaire based on the Health Behaviour in School-aged Children survey. These questions were originally used in the revised Olweus Bully/Victim Questionnaire83.
Family stress
Family stress was measured using the family stress and socioeconomic item from the DAWBA. A larger score for this item indicates greater family stress.
Family drinking
Family drinking was measured using the parent AUDIT.
Other risks
Body mass index
Recorded weight and height were used to calculate the body mass index (weight in kilograms per height in meters squared).
Pregnancy and Birth Questionnaire
The Pregnancy and Birth Questionnaire was used to collect information about the pregnancy, including data on the mother and father, the mother's medical condition ('did the mother take any prescribed medication during pregnancy?'), smoking exposure ('how many cigarettes did the mother smoke per day before pregnancy?') and birth weight ('what was the birth weight of the child?').
Ethical approval
The IMAGEN study was approved by local ethics research committees at each research site: King's College London, University of Nottingham, Trinity College Dublin, University of Heidelberg, Technische Universität Dresden, Commissariat à l'Energie Atomique et aux Energies Alternatives and University Medical Center. Informed consent was sought from all participants and a parent/guardian of each participant. The ABCD study conforms to each site's institutional review board's rules and procedures, and all participants provided informed consent (parents) or informed assent (children). The WU-Minn HCP Consortium obtained full informed consent from all participants, and research procedures and ethical guidelines were followed in accordance with the institutional review boards. ADHD-200 is a multicenter study, and each site was approved by the local research ethics review board. Signed informed consent was obtained from all participants or their legal guardians before participation. Stratify was approved by the London – Westminster Research Ethics Committee, and signed informed consent was obtained from all participants. Compensation for time and travel costs was provided to participants in the above cohorts, as approved by the ethics committees.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.