Associations between types and sources of dietary carbohydrates and cardiovascular disease risk: a prospective cohort study of UK Biobank participants - BMC Medicine

Study design and participants

UK Biobank is a prospective cohort study of 503,317 men and women aged 37 to 73 years recruited between 2006 and 2010 [20]. Approximately 9.2 million eligible adults living within 25 miles of one of 22 assessment centres across England, Wales and Scotland were identified from National Health Service (NHS) registers and invited to participate (response rate 5.5%). At baseline, participants provided detailed information on lifestyle and sociodemographic factors via a self-administered touchscreen questionnaire and interview, and physical measurements and biological samples were collected using standardised procedures (see Additional file 1: Supplemental methods). UK Biobank was approved by the NHS North West Multicentre Research Ethics Committee (approval letter dated 29th of June 2021, reference 21/NW/0157), and all participants provided informed consent to participate and to be followed up through linkage to their health records. Further details regarding the study protocol and data access for researchers have been published elsewhere [21].

Assessment of carbohydrate intakes

Diet was measured using the Oxford WebQ questionnaire, an online 24-h dietary assessment [22]. This questionnaire was recently validated against energy expenditure measured by accelerometry and against biomarkers of total sugar intake, and was found to perform well compared with traditional interviewer-administered 24-h dietary recalls [23]. Participants recruited between April 2009 and September 2010 completed the 24-h dietary assessment at the assessment centre. Participants who provided a valid email address at recruitment were invited to complete identical 24-h dietary assessments on four further occasions between February 2011 and April 2012 (Additional file 1 Fig. S1).

Intakes of 206 food items and 32 beverages were calculated from responses to each 24-h dietary assessment. Carbohydrate intakes were calculated by multiplying the carbohydrate content of food items and beverages, taken from the UK Nutrient Databank food composition tables, by the frequency of intake [24]. Types of carbohydrates calculated included total sugars, which were further separated into free sugars and non-free sugars (total sugars minus free sugars) [5], and fibre (non-starch polysaccharides [NSPs] measured using the Englyst method) [24, 25]. Sources of carbohydrates were also calculated as follows: refined grain starch (starch content of white bread, white pasta and rice, other cereals, pizza, samosas, pakoras, grain dishes with added fat, savoury snacks, savoury crackers, biscuits, cakes, pastries and desserts), and wholegrain starch (starch content of brown seeded and wholemeal bread, wholemeal pasta and brown rice, bran cereal, biscuit cereal, oat cereal and muesli) [26]. The starch content of wholegrain and refined grain foods was calculated to approximate the amounts of wholegrains and refined grains consumed, as starch is the primary component of wheat grains [27]. Carbohydrate intakes were calculated as the average of ≥ two (maximum of five) 24-h dietary assessments to minimise the effects of random error and within-person variability [9, 28]. See Additional file 1 Table S1 for further details on the food items and beverages used to calculate carbohydrate types and sources.
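The intake calculation described above (content × reported amount, summed per recall, then averaged over at least two recalls, with non-free sugars derived as a difference) can be sketched as follows. The item names and nutrient values are hypothetical placeholders, not UK Nutrient Databank entries, and the helper names are illustrative only.

```python
from statistics import mean

# Hypothetical composition table (grams of nutrient per portion); values are
# illustrative only and are NOT taken from the UK Nutrient Databank.
COMPOSITION = {
    "white_bread_slice": {"total_sugars": 1.5, "free_sugars": 0.8},
    "apple": {"total_sugars": 10.0, "free_sugars": 0.0},
    "cola_can": {"total_sugars": 35.0, "free_sugars": 35.0},
}

def recall_intake(portions, nutrient):
    """Intake from one 24-h recall: nutrient content x portions, summed over items."""
    return sum(COMPOSITION[item][nutrient] * n for item, n in portions.items())

def usual_intake(recalls, nutrient, min_recalls=2):
    """Average over valid recalls; None if fewer than `min_recalls` are available."""
    if len(recalls) < min_recalls:
        return None
    return mean(recall_intake(r, nutrient) for r in recalls)

# Two hypothetical recalls for one participant (item -> number of portions).
recalls = [
    {"white_bread_slice": 2, "apple": 1},
    {"cola_can": 1, "apple": 2},
]
total = usual_intake(recalls, "total_sugars")
free = usual_intake(recalls, "free_sugars")
non_free = total - free  # non-free sugars = total sugars minus free sugars
```

Averaging before analysis, rather than using a single recall, is what dampens day-to-day variation in the exposure.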

Ascertainment of cardiovascular disease outcomes

Dates and causes of hospital admission were obtained through linkage to Hospital Episode Statistics for English participants, the Patient Episode Database for Welsh participants, and Scottish Morbidity Records for Scottish participants. Dates and causes of death were obtained from death certificates provided by the NHS Information Centre for English and Welsh participants and by NHS Central Register Scotland for Scottish participants. At the time of our analyses, hospital admission data were available up until 30th of September 2021 for England, 31st of July 2021 for Scotland, and 28th of February 2018 for Wales, and death data were available up until 30th of September 2021 for England and Wales, and 31st of October 2021 for Scotland. Therefore, for each country we censored follow-up for all outcomes at the earlier of these two dates.
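As a small worked illustration of the censoring rule, the sketch below takes, within each country, the earlier of the hospital-admission and death-registry end dates stated in the text. The dictionary and function names are hypothetical.

```python
from datetime import date

# End dates of data availability as reported in the text.
HOSPITAL_END = {
    "England": date(2021, 9, 30),
    "Scotland": date(2021, 7, 31),
    "Wales": date(2018, 2, 28),
}
DEATH_END = {
    "England": date(2021, 9, 30),
    "Scotland": date(2021, 10, 31),
    "Wales": date(2021, 9, 30),
}

def censoring_date(country):
    """Censor all outcomes at the earliest end date available for that country."""
    return min(HOSPITAL_END[country], DEATH_END[country])

CENSOR = {country: censoring_date(country) for country in HOSPITAL_END}
```

This yields 30 September 2021 for England, 31 July 2021 for Scotland and 28 February 2018 for Wales, so Welsh participants contribute markedly shorter follow-up.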

Primary outcomes were IHD, defined as a primary diagnosis of incident (fatal or non-fatal) IHD (ICD-10 [International Classification of Diseases, 10th revision] codes I21–I25) or coronary revascularisation (OPCS-4 [OPCS Classification of Interventions and Procedures, 4th revision] codes K40–K46, K49–K50, K75); total stroke, defined as a primary diagnosis of incident (fatal or non-fatal) ischaemic or haemorrhagic stroke (ICD-10 codes I60–I61, I63–I64); and total CVD, defined as a primary diagnosis of incident (fatal or non-fatal) IHD or total stroke (see Additional file 1 Table S2) [29,30,31,32]. We performed secondary analyses for IHD and stroke subtypes, including acute myocardial infarction (AMI; ICD-10 I21), ischaemic stroke (ICD-10 I63), and haemorrhagic stroke (ICD-10 I60–I61).
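The outcome definitions above can be expressed as a simple code-to-label mapping. This is a minimal sketch assuming codes are matched on their three-character category (so `I21.4` or `I214` counts as `I21`); the function `outcomes` is hypothetical and does not reproduce the full linkage logic in Additional file 1 Table S2.

```python
# Code lists as stated in the text.
IHD_ICD10 = {f"I{n}" for n in range(21, 26)}              # I21-I25
STROKE_ICD10 = {"I60", "I61", "I63", "I64"}                # I60-I61, I63-I64
REVASC_OPCS4 = {f"K{n}" for n in range(40, 47)} | {"K49", "K50", "K75"}

def _category(code):
    """Normalise e.g. 'I21.4' or 'i214' to the 3-character category 'I21'."""
    return code.replace(".", "").upper()[:3]

def outcomes(primary_icd10=None, opcs4_codes=()):
    """Hypothetical helper: outcome labels implied by one admission or death record."""
    cat = _category(primary_icd10) if primary_icd10 else None
    procs = {_category(c) for c in opcs4_codes}
    labels = set()
    if cat in IHD_ICD10 or procs & REVASC_OPCS4:
        labels.add("IHD")
    if cat == "I21":
        labels.add("AMI")
    if cat in STROKE_ICD10:
        labels.add("stroke")
        if cat == "I63":
            labels.add("ischaemic stroke")
        elif cat in {"I60", "I61"}:
            labels.add("haemorrhagic stroke")
    if labels & {"IHD", "stroke"}:
        labels.add("CVD")  # total CVD = IHD or total stroke
    return labels
```

Note that I62 (other non-traumatic intracranial haemorrhage) is not in the stated stroke code list, so it maps to no outcome here.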

Measurement of triglycerides in lipoprotein subclasses

Lipids and other metabolic measures (168 absolute levels and 81 ratios) were quantified in a random subset of ~118,000 non-fasting plasma samples obtained from UK Biobank participants at baseline (2006–2010) using high-throughput NMR spectroscopy (Nightingale Health Ltd., Helsinki, Finland) [33]. In a recent UK Biobank study of macronutrient intakes and serum lipids measured by clinical chemistry [9], carbohydrate intakes were most strongly associated with total triglycerides, although it remains unclear whether these associations differ for triglycerides within different lipoprotein subclasses [10, 34]. The Nightingale NMR platform provides simultaneous quantification of total triglyceride concentrations and triglyceride concentrations within 17 lipoprotein subclasses. One triglyceride measurement with ≥20% of values below the limit of quantification (LOQ) was excluded, and for the remaining measurements, values below the LOQ were set to half the lowest measured value for that triglyceride measurement (Additional file 1 Table S3) [35, 36]. Therefore, total triglyceride concentrations and triglyceride concentrations within 16 lipoprotein subclasses were included in this study. Non-fasting blood collection procedures are described in detail elsewhere [37], and further information on NMR spectroscopy measurements and quality control can be found in Additional file 1 Supplemental methods.
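The LOQ rule can be sketched for a single triglyceride measurement as below, assuming (hypothetically) that below-LOQ results are coded as `None`; the function name and data layout are illustrative, not the Nightingale output format.

```python
from typing import List, Optional

def handle_loq(values: List[Optional[float]],
               max_fraction: float = 0.20) -> Optional[List[float]]:
    """Apply the LOQ rule to one measurement across participants.

    Assumption: below-LOQ results are coded as None. Returns None when the
    measurement is excluded (>= max_fraction of values below LOQ); otherwise
    replaces below-LOQ values with half the lowest measured value.
    """
    n_below = sum(v is None for v in values)
    if n_below / len(values) >= max_fraction:
        return None  # measurement excluded from analysis
    fill = min(v for v in values if v is not None) / 2
    return [fill if v is None else v for v in values]
```

For example, a measurement with 1 of 10 values below the LOQ keeps all participants and imputes half the minimum, whereas one with 2 of 10 (20%) is dropped.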

Exclusion criteria

Participants were excluded if they withdrew consent from the study (n=904), had prevalent CVD prior to their most recent 24-h dietary assessment (n=9132), had diabetes at recruitment (self-reported diabetes diagnosis or use of diabetes medication; n=3759), or did not complete ≥ two 24-h dietary assessments (n=376,074; see Fig. 1). Participants were also excluded if they did not have ≥ two 24-h dietary assessments remaining after excluding assessments with extreme energy intakes (outside 3347–17,573 kJ/d, i.e. 800–4200 kcal/d, for men, and outside 2092–14,644 kJ/d, i.e. 500–3500 kcal/d, for women [38]; n=2140) or assessments on days when participants reported being ill or fasting (n=811). The main prospective analyses included a total of 110,497 participants who completed on average 2.9 (SD 0.9) 24-h dietary assessments. For the observational analyses of carbohydrate intakes and triglycerides, participants were further excluded if they were missing values for one or more triglyceride measurements (n=84,402), leaving a total of 26,095 participants available for these analyses.

Fig. 1 Flow chart of participants included in the sample for the main prospective analyses (n=110,497) and the observational analyses of plasma total triglycerides and triglycerides in lipoprotein subclasses (n=26,095). Abbreviations: CVD cardiovascular disease
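The energy-plausibility rule and the exclusion counts in Fig. 1 can be checked with a short sketch. The kJ limits are derived here from the stated kcal/d limits using 1 kcal = 4.184 kJ; the function names are hypothetical.

```python
KCAL_TO_KJ = 4.184

# Plausible energy intake ranges (kcal/d) as stated in the text.
ENERGY_RANGE_KCAL = {"male": (800, 4200), "female": (500, 3500)}

def plausible(energy_kj, sex):
    """True if a single recall's energy intake lies within the sex-specific range."""
    lo, hi = (k * KCAL_TO_KJ for k in ENERGY_RANGE_KCAL[sex])
    return lo <= energy_kj <= hi

def eligible(recalls_kj, sex, ill_or_fasting):
    """Participant retained only if >= 2 recalls remain after both exclusions."""
    kept = [e for e, flag in zip(recalls_kj, ill_or_fasting)
            if plausible(e, sex) and not flag]
    return len(kept) >= 2

# Flow-chart arithmetic using the counts reported in the text.
recruited = 503_317
exclusions = [904, 9_132, 3_759, 376_074, 2_140, 811]
main_sample = recruited - sum(exclusions)   # main prospective analyses
nmr_sample = main_sample - 84_402           # triglyceride (NMR) analyses
```

The reported counts are internally consistent: subtracting the six exclusions from 503,317 gives 110,497, and removing a further 84,402 leaves 26,095.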

Statistical analysis

Carbohydrate intakes were expressed as a percentage of total energy intake, except for fibre, which was expressed in grams per day (g/d), and each was categorised into quartiles. The baseline characteristics of participants were described across the highest and lowest quartiles of total carbohydrate, free sugar, and fibre intakes.
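Quartile assignment can be illustrated with Python's standard library. This sketch uses `statistics.quantiles` with its default interpolation method, which may differ slightly from the cut-point convention used in Stata; the helper name is hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles

def quartile_labels(values):
    """Label each value 1 (lowest) to 4 (highest) using sample quartile cut points.

    statistics.quantiles(n=4) returns three cut points (default 'exclusive'
    method); values equal to a cut point are placed in the upper group.
    """
    cuts = quantiles(values, n=4)
    return [bisect_right(cuts, v) + 1 for v in values]
```

For the values 1 to 8, the cut points are 2.25, 4.5 and 6.75, giving labels 1, 1, 2, 2, 3, 3, 4, 4.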

Cox proportional hazards regression, with age as the underlying time variable, was used to estimate hazard ratios (HRs) and 95% confidence intervals (CIs) for the associations between quartiles of carbohydrate intakes and CVD incidence. Carbohydrate intakes were also modelled continuously per 5% of energy higher intake, except for fibre intake, which was modelled per 5 g/d higher intake. Potential non-linear associations were assessed using likelihood ratio (LR) tests comparing a model with quartiles of carbohydrate intake treated as an ordered categorical variable with a model in which the quartiles were treated as a continuous variable. Tests for linear trend were performed using the continuous (per increment) values of carbohydrate intake. We tested the proportional hazards assumption using Schoenfeld residuals and found no violations for the exposures and covariates of interest in our multivariable models for any outcome.
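The non-linearity test compares two nested Cox models: quartiles as three indicator terms (3 df) versus quartiles as a single continuous score (1 df), a difference of 2 parameters. A minimal sketch, assuming the two partial log-likelihoods have already been obtained from fitted models (the values in the example are hypothetical):

```python
import math

def lr_test_df2(loglik_categorical, loglik_trend):
    """LR test for departure from linear trend across quartiles.

    Categorical model: 3 indicator terms; trend model: 1 score term, so the
    statistic is referred to chi-square with 2 df. For 2 df the chi-square
    survival function is exactly exp(-x / 2), so no statistics library is needed.
    """
    lr = 2.0 * (loglik_categorical - loglik_trend)
    return lr, math.exp(-lr / 2.0)

# Hypothetical partial log-likelihoods from the two fitted models.
lr, p = lr_test_df2(-1000.0, -1003.0)
```

Here the statistic is 6.0 on 2 df, giving P = exp(-3), just under 0.05.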

We estimated participant survival time from age at last completed 24-h dietary assessment until age at the end of follow-up (censoring date), first diagnosis of the CVD outcome, loss to follow-up or death, whichever occurred first. The minimally adjusted model was stratified by age at recruitment (<45, 45–49, 50–54, 55–59, 60–64, ≥65 years) and sex, and adjusted for recruitment region (London, North West England, North-Eastern England, Yorkshire & the Humber, West Midlands, East Midlands, South-East England, South-West England, Wales & Scotland). Multivariable models were further adjusted for ethnicity (white, mixed, Asian or Asian British, black or black British, other, unknown), Townsend deprivation index (quintiles from least to most deprived, unknown), education (college/university degree or vocational qualification, national examination at 17–18 years of age, national examination at 16 years of age, unknown), alcohol intake (0.1–0.9 g/d, 1–4.9 g/d, 5–14.9 g/d, ≥15 g/d, or none for women, and 0.1–0.9 g/d, 1–4.9 g/d, 5–29.9 g/d, ≥30 g/d, or none for men), smoking status (never, former, light smokers [<15 cigarettes/d], medium to heavy smokers [≥15 cigarettes/d], smoker of unknown number of cigarettes, unknown), physical activity (low, medium or high according to excess metabolic equivalent task [MET] hours/week, unknown), menopausal status at recruitment among women only (pre-menopausal, post-menopausal, unknown), body mass index (BMI; <20, 20–22.49, 22.5–24.9, 25.0–27.49, 27.5–29.9, 30–32.49, 32.5–34.9, ≥35 kg/m2, unknown), saturated fatty acid (SFA) intake (quintiles of % of energy intake), and average daily energy intake (sex-specific quintiles of kJ/d). Multivariable models were also adjusted for fruit and vegetable intake (quintiles of g/d) as a marker of a healthy diet, except in models of total sugars, non-free sugars, and fibre, because whole fruit and vegetables are major sources of these exposures and adjusting for them would therefore introduce collinearity.
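With age as the time scale, each participant enters the risk set at their age at the last dietary assessment (delayed entry, or left truncation) and exits at the earliest of event, death, loss to follow-up or censoring. The sketch below derives these quantities and the recruitment-age strata; the function names are hypothetical and dates are converted to ages with an assumed 365.25 days per year.

```python
from bisect import bisect_right
from datetime import date

DAYS_PER_YEAR = 365.25  # assumed conversion for date differences -> years

def age_at(dob: date, d: date) -> float:
    return (d - dob).days / DAYS_PER_YEAR

def follow_up(dob, last_diet, censor, event=None, death=None, lost=None):
    """Return (entry_age, exit_age, event_indicator) on the age time scale.

    The event is listed first, so a fatal CVD event recorded on the same date
    as death is counted as an event (min() keeps the first of tied dates).
    """
    candidates = [(d, status)
                  for d, status in ((event, 1), (death, 0), (lost, 0), (censor, 0))
                  if d is not None]
    exit_date, status = min(candidates, key=lambda c: c[0])
    return age_at(dob, last_diet), age_at(dob, exit_date), status

# Recruitment-age strata: <45, 45-49, 50-54, 55-59, 60-64, >=65 -> 0..5
AGE_CUTS = [45, 50, 55, 60, 65]

def age_stratum(age_at_recruitment: float) -> int:
    return bisect_right(AGE_CUTS, age_at_recruitment)
```

The entry and exit ages would then be passed to a Cox routine that supports delayed entry, with the model stratified by age stratum and sex.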

We also examined the role of other key cardiometabolic risk factors in supplemental analyses, including waist circumference, systolic blood pressure, serum lipids measured by clinical chemistry (low-density lipoprotein [LDL] cholesterol, high-density lipoprotein [HDL] cholesterol, triglycerides, apolipoprotein B [ApoB]), and glycated haemoglobin (HbA1c); however, because these are potential physiological mediators, they were not included in the final models. We also examined other dietary factors (i.e. polyunsaturated, monounsaturated and trans fatty acids) and women-specific variables (i.e. menopausal hormone therapy, oral contraceptive pill use, and parity) as potential covariates; because adjusting for them did not materially change the estimates, they were not included in the final models. Dietary covariates (i.e. total energy, fruit and vegetable, and SFA intakes) were calculated from responses to ≥ two 24-h dietary assessments (2009–2012), and all other covariates were defined from questionnaire and interview data and physical measurements collected at the UK Biobank assessment centre visit at recruitment (2006–2010). See Additional file 1 Supplemental methods for further details on covariate definitions.

Analyses of dietary substitution

In modelled isoenergetic substitution analyses, we estimated the risks of CVD outcomes when 5% of energy from refined grain starch was replaced with wholegrain starch, or 5% of energy from free sugars was replaced with non-free sugars. Models included energy from all other carbohydrates (i.e. energy from total carbohydrates minus energy from free sugars or refined grain starch), energy from protein, energy from fats, and total energy. Therefore, regression coefficients can be interpreted as the estimated effect of replacing refined grain starch or free sugars with wholegrain starch or non-free sugars, respectively [38].
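To make the interpretation concrete: in a leave-one-out model (the replaced nutrient omitted, total energy included), the coefficient per 1% of energy on the substituting nutrient is the log HR for the swap, so the HR for a 5% substitution is exp(5β). In the alternative partition parameterisation (all energy components entered, total energy omitted), the same quantity is the difference between the two coefficients. The sketch below shows both; the coefficients are hypothetical, and confidence intervals, which in the partition case need the coefficient covariance, are not computed.

```python
import math

def substitution_hr(beta_added, beta_removed=0.0, increment=5.0):
    """HR for replacing `increment` % of energy from one nutrient with another.

    Leave-one-out model: pass only the substituting nutrient's coefficient
    (per 1% energy); beta_removed stays 0 because that nutrient is omitted.
    Partition model: pass both coefficients; their difference is used.
    """
    return math.exp(increment * (beta_added - beta_removed))

# Hypothetical per-1%-energy log-HR coefficients.
hr_leave_one_out = substitution_hr(-0.02)       # exp(5 * -0.02)
hr_partition = substitution_hr(-0.01, 0.01)     # exp(5 * (-0.01 - 0.01))
```

Both calls return exp(-0.1), roughly 0.90, illustrating that the two parameterisations target the same substitution effect.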

Observational analyses of triglycerides in lipoprotein subclasses

For carbohydrate types or sources that were significantly associated with CVD risks in our main analyses, and were also significantly associated with triglycerides measured by clinical chemistry in our prior analyses of UK Biobank [9], we assessed their associations with concentrations of plasma total triglycerides and triglycerides in lipoprotein subclasses in a subsample of participants with baseline NMR spectroscopy measurements (n=26,095; see Fig. 1). We calculated the geometric means (with 95% CIs) of triglyceride measurements. Multivariable linear regression models adjusted for the same covariates as our main Cox regression models were used to examine the associations between the carbohydrate of interest and each log-transformed triglyceride measurement. We exponentiated the regression coefficients, subtracted one, and multiplied by 100 to obtain the estimated percentage difference in triglyceride concentrations per higher increment of carbohydrate intake. Further details on metabolite analyses can be found in Additional file 1 Supplemental methods.
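The back-transformation from a log-scale regression coefficient to a percentage difference, (exp(β) − 1) × 100, and a crude geometric mean with an approximate CI can be sketched as follows. The CI here uses a normal approximation on the log scale and no covariate adjustment, which is an assumption for illustration rather than the paper's exact procedure.

```python
import math
from statistics import mean, stdev

def percent_difference(beta):
    """beta: coefficient from regressing log(triglyceride) on intake (per increment)."""
    return (math.exp(beta) - 1.0) * 100.0

def geometric_mean_ci(values, z=1.96):
    """Crude geometric mean with an approximate 95% CI (normal approx. on log scale)."""
    logs = [math.log(v) for v in values]
    m = mean(logs)
    se = stdev(logs) / math.sqrt(len(logs))
    return math.exp(m), math.exp(m - z * se), math.exp(m + z * se)
```

For example, a coefficient of log(1.1), about 0.0953, corresponds to a 10% higher triglyceride concentration per increment; because the CI is symmetric on the log scale, it is asymmetric around the geometric mean.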

Sensitivity and subgroup analyses

The robustness of our prospective findings was examined in sensitivity analyses restricted to participants with (i) ≥ three 24-h dietary assessments (n=67,218), and (ii) ≥ 2 years of follow-up (n=109,682). We also conducted a sensitivity analysis using absolute intakes of refined grain foods and wholegrain foods in grams per day as exposures. Heterogeneity in associations across subgroups of sex, BMI (<26 vs ≥26 kg/m2, a cut-point at approximately the median), and smoking status (never smoker, ever smoker) was assessed by including an interaction term between the subgroup and the exposure of interest in the Cox model and testing its statistical significance using an LR test.
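For a binary subgroup and, as assumed here, a continuously modelled exposure, the interaction adds one parameter, so the LR statistic is referred to chi-square with 1 df. A minimal sketch using hypothetical log-likelihoods:

```python
import math

def lr_test_df1(loglik_with_interaction, loglik_without):
    """LR test for one interaction term (binary subgroup x continuous exposure).

    Chi-square(1) survival function: P(X > x) = erfc(sqrt(x / 2)).
    Small negative statistics from numerical noise are truncated at zero.
    """
    lr = max(0.0, 2.0 * (loglik_with_interaction - loglik_without))
    return lr, math.erfc(math.sqrt(lr / 2.0))

# Hypothetical log-likelihoods chosen so the statistic is about 3.84 (P ~ 0.05).
lr, p = lr_test_df1(-500.0, -500.0 - 3.8416 / 2.0)
```

A subgroup variable with k categories would instead add k − 1 interaction terms and require a chi-square test with k − 1 df.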

LR χ2 statistics were obtained by comparing the Cox regression models with and without the exposure of interest (i.e. carbohydrate intakes) as a measure of the extent to which each exposure predicted CVD risks in different models [39]. The percentage change in the LR χ2 statistic after adjustment for covariates was calculated using the minimally adjusted model as the reference, with large reductions suggesting that part of any remaining association may be due to residual confounding [39]. All tests of significance were two-sided, and the Benjamini-Hochberg method was used to control the false discovery rate (FDR) at α = 0.05 to determine which P-values survived correction for multiple testing [40]. All analyses were conducted using Stata version 17.0 (StataCorp, TX, United States), and figures were created using R 4.1.2 (R Core Team, Vienna, Austria).
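The two calculations above can be sketched in a few lines: the percentage change in the LR χ2 statistic relative to the minimally adjusted model, and the Benjamini-Hochberg step-up procedure (find the largest rank k with p(k) ≤ (k/m)·α and reject all hypotheses up to that rank). The function names and example values are hypothetical; the paper's analyses were run in Stata.

```python
def pct_change_lr(chi2_adjusted, chi2_minimal):
    """Percentage change in LR chi-square relative to the minimally adjusted model."""
    return (chi2_adjusted - chi2_minimal) / chi2_minimal * 100.0

def benjamini_hochberg(pvalues, alpha=0.05):
    """Return booleans marking which tests survive FDR control (step-up procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k whose sorted p-value satisfies p_(k) <= k / m * alpha.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            k_max = rank
    passed = [False] * m
    for rank, i in enumerate(order, start=1):
        passed[i] = rank <= k_max
    return passed
```

Because the procedure steps up from the largest qualifying rank, a set such as 0.04, 0.045, 0.049 and 0.05 survives in full at α = 0.05 even though no single value would pass a Bonferroni threshold of 0.0125.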