Study population and design
The UK Biobank is an ongoing, multi-centre prospective cohort study of over half a million participants, which provides a resource for investigating the determinants of disease in middle and older age [23]. The design and methods of this study have been described elsewhere [24]. Briefly, between 2006 and 2010, men and women aged 40–69 years were recruited from across England, Scotland and Wales using National Health Service (NHS) patient registers. Participants attended one of 22 assessment centres, where they completed a touchscreen questionnaire and a verbal interview, and provided physical function measurements and biological samples. Subsequently, participants were invited to complete additional measures, including enhanced dietary assessments, imaging, and assessment of multiple health-related outcomes. UK Biobank also includes linkage to electronic healthcare records (death, cancer, inpatient and primary care records) for disease ascertainment. Ethical approval for the UK Biobank study was provided by the North West–Haydock Research Ethics Committee (REC reference: 16/NW/0274), and all participants provided signed electronic consent. The current study included participants who self-reported a racial/ethnic background of white British, Irish or other white, were aged ≥ 60 years at recruitment, had genetic data and appropriate dietary data (self-reported atypical dietary reports were excluded), and had complete data for all included covariates (Additional file 1, Fig. S1).
Dietary assessment and calculation of Mediterranean diet scores
The Oxford WebQ is a web-based, self-administered 24-h dietary assessment tool, validated for use in large-scale observational studies [25, 26]. This tool collects information about the consumption of 206 types of foods and 32 types of drinks during the previous 24-h period, with participants selecting the number of standard portions of each item that they consumed. Participants recruited between April 2009 and September 2010 completed the Oxford WebQ as part of their baseline assessment centre visit. In addition, between February 2011 and June 2012, participants were invited to complete the Oxford WebQ on their home computer every three to four months, up to a total of five assessments (including the baseline assessment). Consistent with previous investigations [17, 27], we energy-adjusted the dietary data at each time point to a standard intake of 2000 kcal/d using the residual method, to allow diet quality to be evaluated independently of diet quantity [28]. Data were then averaged across all available time points (minimum 1, maximum 5) for each participant prior to calculation of MedDiet scores.
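The residual method and the subsequent averaging can be illustrated with a short sketch. It assumes that each dietary variable is regressed on total energy intake separately at each time point, and that the adjusted value is the residual plus the intake predicted at 2000 kcal/d; the function names and the simulated data are hypothetical and do not come from the study.

```python
import numpy as np

def energy_adjust(intake, energy, reference_kcal=2000.0):
    """Residual method (sketch): residual from regressing intake on energy,
    plus the intake predicted at the reference energy intake."""
    intake = np.asarray(intake, dtype=float)
    energy = np.asarray(energy, dtype=float)
    slope, intercept = np.polyfit(energy, intake, 1)   # OLS fit: intake ~ energy
    residuals = intake - (intercept + slope * energy)
    return residuals + (intercept + slope * reference_kcal)

def mean_adjusted_intake(intake, energy, reference_kcal=2000.0):
    """Adjust each recall round separately, then average per participant.

    intake, energy: arrays of shape (participants, rounds); NaN = recall not completed.
    """
    adjusted = np.full(intake.shape, np.nan)
    for r in range(intake.shape[1]):
        done = ~np.isnan(intake[:, r]) & ~np.isnan(energy[:, r])
        adjusted[done, r] = energy_adjust(intake[done, r], energy[done, r], reference_kcal)
    return np.nanmean(adjusted, axis=1)   # average over the 1 to 5 available recalls

# Hypothetical simulated data: 200 participants, up to 5 recalls; round 0 always completed.
rng = np.random.default_rng(1)
n, rounds = 200, 5
energy = rng.normal(2000, 400, size=(n, rounds))
intake = 0.1 * energy + rng.normal(0, 30, size=(n, rounds))   # e.g. grams of vegetables
missing = rng.random((n, rounds)) < 0.5
missing[:, 0] = False
energy[missing] = np.nan
intake[missing] = np.nan

per_person = mean_adjusted_intake(intake, energy)
print(per_person[:5])
```

By construction, the adjusted values are uncorrelated with energy intake within each round, which is what allows diet quality to be compared independently of diet quantity.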
We quantified MedDiet adherence using two separate scores: the MedDiet Adherence Screener (MEDAS) score, and the MedDiet PYRAMID score. These scores define MedDiet adherence in different ways (e.g., using different dietary targets and food components) and therefore may differ in terms of their association with dementia.
MEDAS score
The MEDAS is a 14-point score developed as part of the Prevención con Dieta Mediterránea (PREDIMED) trial [29] that has been used widely in trials and observational studies [8, 30]. The MEDAS has been validated for use in the UK [31] (the UK-validated version was used to derive the MEDAS scores in this study) and has been endorsed by the American Heart Association as a rapid diet assessment screening tool for clinical practice [32]. The MEDAS is conventionally calculated by evaluating each of the 14 food components in a binary manner: one point is awarded if the participant’s consumption meets a pre-defined cut-off (e.g., intake of a specific amount of vegetables), and zero points if it does not. The total possible score therefore ranges from 0 to 14 points. We have shown previously that using the same dietary targets with an alternative, continuous scoring system increases the sensitivity of this score in detecting differences in diet quality [17]. In this system, between zero and one point is awarded for each component depending on proximity to the dietary target, using the linear equation y = ax + b, in which y is the number of points scored (between 0 and 1), x is the intake (here, in servings), a is the slope and b is the intercept. This score, referred to here as the MEDAS continuous score, was therefore used for the primary analyses in the present study. As a hypothetical example of the difference between the MEDAS and MEDAS continuous scores, consider an individual with a daily vegetable intake of 295 g, or ~1.5 servings of 200 g. This individual would be awarded 0 points for the vegetable component of the MEDAS score, as they have not achieved the dietary target of 2 servings (i.e., 400 g) of vegetables per day. By contrast, under the MEDAS continuous score, they would be awarded ~0.74 points (y = 0.5 × 1.475 + 0 = 0.7375 points, where the slope of 0.5 spreads one point across the 2-serving target), reflecting how close they are to the dietary target (i.e., ~3/4 of the way towards achieving it).
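The vegetable example can be reproduced with a minimal sketch of the two scoring rules. It covers only components with an "at least" target and assumes a = 1/target and b = 0, capped at 1 once the target is reached. Components with an upper limit (e.g., consuming less of a food) would need a descending line, which is not shown here; the exact equations are given in Additional file 1. Function names are hypothetical.

```python
def medas_binary(servings, target):
    """Conventional MEDAS: 1 point if an 'at least' target is met, otherwise 0."""
    return 1.0 if servings >= target else 0.0

def medas_continuous(servings, target):
    """Continuous MEDAS sketch for an 'at least' target: y = a*x + b with
    a = 1/target and b = 0, clipped to the range [0, 1]."""
    a, b = 1.0 / target, 0.0
    return min(max(a * servings + b, 0.0), 1.0)

veg_servings = 295 / 200   # 295 g/day expressed in 200 g servings (1.475)
target = 2                 # 2 servings (400 g) of vegetables per day
print(medas_binary(veg_servings, target))       # 0.0
print(medas_continuous(veg_servings, target))   # 0.7375
```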
We repeated the analysis using the conventionally scored MEDAS as a sensitivity analysis.
Both the MEDAS and MEDAS continuous scores award points for use of olive oil as the main culinary fat and, separately, for consumption of a target amount (4 or more tablespoons per day) of olive oil. Although we were able to determine use of olive oil as a culinary fat and to award points accordingly for use (1 point) or non-use (0 points), it was not possible to determine the amount of olive oil consumed from the available dietary data. The maximum possible score for both the MEDAS and the MEDAS continuous score was therefore 13 points in this study.
PYRAMID score
The PYRAMID score is a 15-point MedDiet score used widely in epidemiological studies [9, 17, 27]. Each of the 15 components is scored on a continuous basis, from zero to one [26]. Further details of the calculation of each MedDiet score are provided in Additional file 1, Tables S1 and S2. For both MedDiet scores, higher values reflect greater adherence to the MedDiet.
Polygenic risk score
To estimate genetic risk of dementia, we used the polygenic risk score developed by Lourida and colleagues, who demonstrated that higher values of this score are associated with a higher risk of all-cause dementia in the UK Biobank cohort [22]. The score, which comprises 249,273 independent genetic variants, was based on a genome-wide association study of individuals of European ancestry [33]. The current analysis was therefore restricted to individuals who self-reported a racial/ethnic background of white British, Irish or other white (who constitute > 90% of the UK Biobank cohort). For the primary analyses, the polygenic risk score was divided into quintiles, and participants were categorised into low (quintile 1), medium (quintiles 2–4) and high (quintile 5) risk groups. Further details of the creation of the polygenic risk score can be found elsewhere [22].
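The quintile-based grouping can be sketched as follows. The sketch assumes that quintile cut-points are taken from the analytic sample and that values exactly on the 20th percentile fall into quintile 1; the tie-handling rule and the function name are illustrative assumptions.

```python
import numpy as np

def prs_risk_group(prs):
    """Label each participant low (Q1), medium (Q2-Q4) or high (Q5) genetic risk."""
    prs = np.asarray(prs, dtype=float)
    q20, q80 = np.quantile(prs, [0.2, 0.8])   # upper bound of Q1, lower bound of Q5
    return np.where(prs <= q20, "low", np.where(prs > q80, "high", "medium"))

# Hypothetical scores for 100 participants.
groups = prs_risk_group(np.arange(100))
labels, counts = np.unique(groups, return_counts=True)
print({str(k): int(v) for k, v in zip(labels, counts)})
```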
Dementia outcome ascertainment
All-cause incident dementia cases were ascertained using data linkage to hospital inpatient records and death registries. Diagnoses were recorded using the International Classification of Diseases (ICD) coding system [34]. Participants with a primary or secondary diagnosis of dementia in hospital records, or with dementia recorded as an underlying or contributory cause of death in death registries, were identified using the relevant ICD-9 and ICD-10 codes (Additional file 1, Table S3). We used the censoring dates recommended by UK Biobank for death and hospital inpatient data; these are the dates up to which the data are estimated to be over 90% complete, determined separately for England, Scotland and Wales. At the time of analysis, the recommended censoring dates were 31st March 2021 for England and Scotland, and 28th February 2018 for Wales. The mean (SD) and median (interquartile range) follow-up were 9.1 (1.7) and 9.3 (8.8–9.7) years, respectively. Follow-up time was calculated from the date of the most recent eligible dietary report used for MedDiet score creation to the date of first dementia diagnosis, death, loss to follow-up, or censoring, whichever occurred first.
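A minimal sketch of the follow-up calculation for a single participant, using the country-specific censoring dates given above, is shown below. The use of 365.25 days per year and the function and argument names are assumptions for illustration. A dementia diagnosis recorded after the applicable censoring date is not counted as an event.

```python
from datetime import date

# Country-specific censoring dates reported in the text.
CENSOR_DATES = {
    "England": date(2021, 3, 31),
    "Scotland": date(2021, 3, 31),
    "Wales": date(2018, 2, 28),
}

def follow_up(start, country, dementia=None, death=None, lost=None):
    """Return (years of follow-up, dementia event flag) for one participant.

    start: date of the most recent eligible dietary report.
    dementia, death, lost: dates of first dementia diagnosis, death and loss to
    follow-up, or None if they did not occur.
    """
    end = min(d for d in (dementia, death, lost, CENSOR_DATES[country]) if d is not None)
    event = dementia is not None and dementia == end   # event only if dementia came first
    return (end - start).days / 365.25, event

print(follow_up(date(2010, 1, 1), "Scotland", dementia=date(2015, 6, 1)))
print(follow_up(date(2010, 1, 1), "Wales", dementia=date(2019, 1, 1)))
```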
Statistical analysis
All analyses were conducted in SPSS version 27. Baseline characteristics of the analytic sample, stratified by dementia status, were summarised as mean ± SD for continuous variables and as percentages for categorical variables. Cox proportional hazards regression models were used to examine the association between MedDiet adherence and time to incident all-cause dementia, with follow-up duration in years as the timescale. We also examined the association between the polygenic risk score and dementia incidence, to confirm the previously reported association in this cohort [22]. A possible interaction between MedDiet adherence and polygenic risk for dementia was investigated by including a multiplicative interaction term (MedDiet score × polygenic risk score) in the model, with both variables expressed continuously.
Analyses were adjusted simultaneously for age, sex, socioeconomic status (Townsend Index, categorised as low [quintile 1], moderate [quintiles 2–4] or high [quintile 5] deprivation), education (higher [college/university/other professional qualification], vocational [NVQ/HND/HNC], upper secondary [A-levels], lower secondary [O-levels/GCSEs/CSEs] or none), smoking status (never, past, current), typical sleep duration (< 7, 7–8, > 8 h), physical activity (International Physical Activity Questionnaire [IPAQ] group, categorised as low, medium or high), energy intake (kcal/d), third-degree relatedness of individuals in the sample, and the first 20 principal components of ancestry. Models that included the polygenic risk score were additionally adjusted for the number of alleles included in the score, to account for SNP-level variation [22].
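The interaction model can be illustrated with a from-scratch Cox proportional hazards fit. This is not the SPSS procedure used in the study. It is a NumPy sketch that fits the model by Newton-Raphson with the Breslow approximation for ties, using simulated data in which the interaction term is the product of a standardised MedDiet score and a standardised polygenic risk score. For brevity, the covariate adjustment described above is omitted. The function name, the simulated effect sizes and the censoring time are hypothetical.

```python
import numpy as np

def fit_cox(time, event, X, n_iter=25, tol=1e-8):
    """Fit a Cox proportional hazards model by Newton-Raphson (Breslow ties).

    Returns the log hazard ratios (one per column of X).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    # Row k flags the subjects still at risk (time_j >= t_k) at the k-th event time.
    at_risk = (time[None, :] >= time[event][:, None]).astype(float)
    X_ev = X[event]
    # Per-subject outer products x_j x_j^T, flattened for a single matrix product.
    outer = (X[:, :, None] * X[:, None, :]).reshape(n, p * p)
    beta = np.zeros(p)
    for _ in range(n_iter):
        w = np.exp(X @ beta)                        # relative hazard of each subject
        s0 = at_risk @ w                            # risk-set sums of w
        s1 = at_risk @ (w[:, None] * X)             # risk-set sums of w * x
        s2 = (at_risk @ (w[:, None] * outer)).reshape(-1, p, p)
        xbar = s1 / s0[:, None]                     # weighted covariate means
        score = (X_ev - xbar).sum(axis=0)           # gradient of log partial likelihood
        info = (s2 / s0[:, None, None]
                - xbar[:, :, None] * xbar[:, None, :]).sum(axis=0)  # observed information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated example: standardised MedDiet score, polygenic risk score and their product.
rng = np.random.default_rng(0)
n = 1000
med_diet = rng.normal(size=n)
prs = rng.normal(size=n)
X = np.column_stack([med_diet, prs, med_diet * prs])   # interaction term = product
true_beta = np.array([-0.4, 0.5, 0.0])
event_time = rng.exponential(1.0 / np.exp(X @ true_beta))  # hazard = exp(x @ beta)
censor_at = 1.5
time = np.minimum(event_time, censor_at)
event = event_time <= censor_at

beta = fit_cox(time, event, X)
print("log HRs:", np.round(beta, 2))
print("HRs:", np.round(np.exp(beta), 2))
```

The third coefficient is the interaction. A hazard ratio for this term that departs from 1 would indicate that the association between MedDiet adherence and dementia varies with polygenic risk.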
Sensitivity analyses
Sensitivity analyses were performed to test the robustness of the associations between MedDiet adherence and dementia incidence. First, we used the conventional binary MEDAS score. Second, we restricted the analysis to participants with at least two 24-h diet recalls, to provide a more stringent measure of habitual dietary intake [26]. Third, we excluded participants with 24-h recalls reporting extreme energy intakes (defined as < 800 or > 4200 kcal/d for males and < 600 or > 3500 kcal/d for females) [35]. Fourth, to assess whether any individual MedDiet component drove the observed associations, we repeated the analyses after sequentially removing each component from the total score. Fifth, to address the potential for reverse causality, we repeated the primary analyses after excluding participants with fewer than 2 and, separately, fewer than 5 years of follow-up. Sixth, we repeated the analyses including potential mediators individually: stroke history (yes/no for any type of stroke diagnosed prior to dementia diagnosis, or prior to the end of follow-up for those who remained dementia-free), self-reported depressive symptoms (yes/no for reporting feeling down, depressed or hopeless on ‘several days’, ‘more than half the days’ or ‘nearly every day’), and body mass index (BMI) category (< 25, 25–29.9, ≥ 30 kg/m²). Seventh, as an alternative method of exploring whether the associations between MedDiet adherence and dementia risk were influenced by the polygenic risk score, we conducted stratified analyses of MedDiet adherence and dementia risk within the low, medium and high genetic risk categories. Eighth, we investigated the interaction between MedDiet adherence and genetic risk defined by Apolipoprotein E (APOE) genotype alone (a more common but less comprehensive measure of genetic risk, which may be easier to apply in clinical practice). APOE ε4 carriers were defined as higher risk, and non-carriers as lower risk.
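The eligibility rules for the second and third sensitivity analyses can be sketched as simple filters. The sex-specific limits are those given in the text. Excluding a participant if any one of their recalls is extreme is an assumption about how the rule was applied, and the function names are hypothetical.

```python
# Sex-specific plausibility limits (kcal/d) from the text.
ENERGY_LIMITS = {"male": (800, 4200), "female": (600, 3500)}

def is_extreme(kcal, sex):
    """True if a single 24-h recall falls outside the sex-specific limits."""
    low, high = ENERGY_LIMITS[sex]
    return kcal < low or kcal > high

def include_participant(recall_kcals, sex, min_recalls=1, drop_extreme=False):
    """Eligibility for a sensitivity analysis.

    min_recalls=2 mirrors the 'at least two recalls' analysis; drop_extreme=True
    excludes anyone with any extreme recall (assumed rule, for illustration).
    """
    if len(recall_kcals) < min_recalls:
        return False
    if drop_extreme and any(is_extreme(k, sex) for k in recall_kcals):
        return False
    return True

print(include_participant([1800, 2100], "female", min_recalls=2))    # True
print(include_participant([1800, 3600], "female", drop_extreme=True))  # False
```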
Ninth, to evaluate the influence of missing data, we repeated the analyses after imputing missing dietary and covariate data using multiple imputation by chained equations (70 imputations, 20 iterations) [36]. All analytic variables (covariates and outcome data) were included as predictors in the imputation model. In addition, we created abbreviated MedDiet scores using dietary data from the UK Biobank touchscreen questionnaire (available for all participants), which were used as auxiliary variables in the imputation model. Tenth, we carried out separate analyses for fatal and non-fatal cases of dementia. Eleventh, we conducted stratified analyses in individuals with higher (college/university/other professional qualification) and lower (vocational, upper secondary, lower secondary, or no qualifications) education levels.
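The text does not state how results from the 70 imputed datasets were combined. Estimates from multiply imputed data are conventionally pooled using Rubin's rules, and the sketch below assumes this approach for log hazard ratios. It uses a normal approximation for the 95% confidence interval rather than Rubin's t-based degrees of freedom. The numbers are hypothetical.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine per-imputation estimates (e.g. log hazard ratios) with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)                 # pooled point estimate
    within = mean(variances)                # average within-imputation variance
    between = variance(estimates)           # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between  # total variance
    return q_bar, total

# Hypothetical log HRs and squared standard errors from 3 imputed datasets.
log_hr, var = pool_rubin([0.10, 0.20, 0.30], [0.01, 0.01, 0.01])
se = math.sqrt(var)
lo_ci, hi_ci = math.exp(log_hr - 1.96 * se), math.exp(log_hr + 1.96 * se)
print(f"HR {math.exp(log_hr):.2f} (95% CI {lo_ci:.2f}-{hi_ci:.2f})")
```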