GWAS results for coronary atherosclerosis

We identified a total of 2 302 variants associated at genome-wide significance (GWS, p < 5 × 10−8) with coronary atherosclerosis (a detailed description of the endpoint definition is in Supplementary Note 1). These variants were located in 38 distinct genetic loci (at least 0.5 Mb apart from each other; Fig. 1 and Supplementary Table 1). Of the 38 GWS loci, four (within or near the genes MFGE8, TMEM200A, PRG3, and FHL1) have not previously been reported to associate with any CVD-related endpoint or risk factor for CVD in the GWAS Catalog10 [https://www.ebi.ac.uk/gwas/]. Lead variants in these loci and their characteristics are listed in Table 1, and locus zoom plots for each of the loci are in Supplementary Fig. 1.

Fig. 1: GWAS results for coronary atherosclerosis in FinnGen. The total number of independent genome-wide significant associations (GWS; p < 5 × 10−8) is 38, with the lead variant in each marked with a diamond. The four previously unreported associations for CVD-related phenotypes are highlighted in red within ±750 kb of the lead variant, and the lead variant is marked with a red diamond.

Table 1 Lead variants in previously unreported loci for coronary atherosclerosis.

Among these four previously unreported loci for coronary atherosclerosis, the locus near MFGE8 had the strongest association (p-value = 2.63 × 10−16 for the top variant rs534125149). The lead variant is an inframe insertion located in the sixth exon of the MFGE8 gene (Supplementary Fig. 2), and it is highly enriched in the Finnish population compared to NFSEEs (Non-Finnish, Estonian or Swedish Europeans). Interestingly, MFGE8 is mainly expressed in coronary and tibial arteries according to data from GTEx v8 (Supplementary Fig. 3), and furthermore the expression of MFGE8 is highest in aorta. In addition, previously identified common variants in the MFGE8 locus that have been associated with decreased expression of MFGE8 in tibial artery and aorta have also been associated with decreased risk of CHD11.

In addition to MFGE8, we identified three other previously unreported loci associated with coronary atherosclerosis; the genes nearest to their lead variants are TMEM200A, PRG3 and FHL1. The TMEM200A and PRG3 loci each had one non-coding low-frequency variant reaching the genome-wide significance threshold, and FHL1 had 11. All variants in the credible sets of these associations were either intergenic or intronic and had no reported significant GWAS associations with any trait in the GWAS Catalog or significant eQTL associations in GTEx. The one variant (rs118042209) in the credible set of the TMEM200A locus was associated with multiple disease endpoints representing major coronary heart disease (CHD) in FinnGen, including coronary atherosclerosis, ischemic heart disease and angina pectoris, whereas the lead variant in the PRG3 locus was associated with cardiomyopathy. All variants in the credible set of FHL1 were associated with multiple disease endpoints representing major CHD in FinnGen, including angina pectoris and ischemic heart disease. At the gene level, TMEM200A has been reported to be associated with ten traits (including height and trauma exposure) and PRG3 with two traits (eosinophil count and eosinophil percentage of white cells) in the GWAS Catalog. The FHL1 gene had no reported associations in the GWAS Catalog.

Replication

The association of rs534125149 in the MFGE8 locus with CHD was replicated in Biobank Japan12,13 (BBJ) and the Estonian Biobank (EstBB)14 (35,644 cases and 328,461 controls in total: OR = 0.752 [0.67–0.84], p = 4.37 × 10−7). Association results for rs534125149 with CHD and MI across different cohorts are in Fig. 2. Post hoc power calculations were performed for each cohort (power being the probability that the test rejects the null hypothesis H0 at the GWS threshold), and the results as a function of effect size are in Supplementary Fig. 4. These calculations show that the power to detect the variant at GWS is considerably greater in FinnGen than in EstBB or BBJ, even with similar effect sizes and sample sizes. This gain in power in FinnGen therefore seems to be mainly due to differences in allele frequency, since the variant is highly enriched in Finland.
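To illustrate why allele frequency drives power so strongly, the following is a minimal sketch of a post hoc power calculation. It assumes a normal approximation to the Wald test of an additive logistic model and is not necessarily the calculation behind Supplementary Fig. 4. The function name `gwas_power` and all cohort sizes and allele frequencies in the example are hypothetical and chosen only for illustration.

```python
from math import log, sqrt
from statistics import NormalDist


def gwas_power(n_cases, n_controls, maf, odds_ratio, alpha=5e-8):
    """Approximate power of a 1-df additive test at significance level alpha.

    Assumption: var(beta_hat) ~ 1 / (N * phi * (1 - phi) * 2 * maf * (1 - maf)),
    where phi is the case fraction (small-effect approximation).
    """
    n = n_cases + n_controls
    phi = n_cases / n
    beta = log(odds_ratio)
    # Expected value of the Wald z-statistic under the alternative
    expected_z = abs(beta) * sqrt(n * phi * (1 - phi) * 2 * maf * (1 - maf))
    std = NormalDist()
    z_crit = -std.inv_cdf(alpha / 2)  # two-sided critical value
    return std.cdf(expected_z - z_crit) + std.cdf(-expected_z - z_crit)


# Hypothetical example: same sample size and effect size,
# enriched vs. rare allele frequency
power_enriched = gwas_power(20_000, 200_000, maf=0.03, odds_ratio=0.75)
power_rare = gwas_power(20_000, 200_000, maf=0.001, odds_ratio=0.75)
print(f"enriched: {power_enriched:.3f}, rare: {power_rare:.5f}")
```

Under these assumptions the expected test statistic scales with the square root of 2 × MAF × (1 − MAF), so a variant that is many times more frequent in one population can be detectable there while remaining underpowered elsewhere at the same sample size.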

Fig. 2: Results for rs534125149 against coronary heart disease and myocardial infarction across cohorts where available, and meta-analysis results. Logistic regression was applied, adjusted for age and sex. Meta-analysis was performed using the inverse-variance weighted fixed-effects method. Black dots represent odds ratios and lines the 95% confidence intervals from the single cohorts; red diamonds represent the results from the meta-analysis, with the ends of the diamonds representing the ends of the 95% confidence interval. Source data for the figure are in Supplementary Data 1.
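As an illustration of the inverse-variance weighted fixed-effects method named in the caption, the sketch below pools per-cohort odds ratios on the log scale. The helper names (`se_from_ci`, `ivw_fixed_effects`) and the cohort estimates are hypothetical. The standard errors are recovered from 95% confidence intervals under the assumption that the intervals are symmetric on the log-odds scale.

```python
from math import erfc, exp, log, sqrt

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper):
    """Standard error of log(OR) recovered from a 95% confidence interval."""
    return (log(upper) - log(lower)) / (2 * Z95)


def ivw_fixed_effects(estimates):
    """Pool (odds_ratio, se_log_or) pairs with inverse-variance weights.

    Returns the pooled OR, its 95% confidence interval and a two-sided p-value.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * log(or_) for w, (or_, _) in zip(weights, estimates)) / total
    se = sqrt(1.0 / total)
    p_value = erfc(abs(beta / se) / sqrt(2))  # two-sided normal p-value
    ci = (exp(beta - Z95 * se), exp(beta + Z95 * se))
    return exp(beta), ci, p_value


# Hypothetical cohort estimates: (OR, 95% CI lower, 95% CI upper)
cohorts = [(0.78, 0.66, 0.92), (0.70, 0.52, 0.94)]
pooled_or, pooled_ci, pooled_p = ivw_fixed_effects(
    [(or_, se_from_ci(lo, hi)) for or_, lo, hi in cohorts]
)
print(pooled_or, pooled_ci, pooled_p)
```

Because each cohort is weighted by the inverse of its variance, large cohorts with precise estimates dominate the pooled odds ratio, and the pooled confidence interval is narrower than that of any single cohort.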

In addition to MFGE8, meta-analysis across FinnGen, UKBB, EstBB, and BBJ was performed, where available, for the lead variants in the three other previously unreported loci for CHD (TMEM200A, PRG3, and FHL1). The lead variant in the PRG3 locus is highly enriched in Finland and absent from all other cohorts, and thus replication of that variant was not possible. The two other loci that were meta-analyzed (TMEM200A and FHL1) did not replicate; we considered an association replicated if the p-value in the combined meta-analysis of the replication cohorts (meta-analysis without FinnGen) was smaller than 0.05/4 = 0.0125 and all effect size estimates were in the same direction. Association results with CHD and MI across different cohorts for the TMEM200A and FHL1 lead variants are in Fig. 3. Post hoc power calculations were performed for each cohort, and the results as a function of effect size are in Supplementary Fig. 5. These results indicate that the lack of replication in UKBB, EstBB and BBJ does not appear to be due to lack of power. Therefore, we identified and replicated one novel locus for CHD (MFGE8).

Fig. 3: Results for rs118042209 in TMEM200A and rs5974585 in FHL1 against coronary heart disease and myocardial infarction across cohorts where available. Logistic regression was applied, adjusted for age and sex. Meta-analysis was performed using the inverse-variance weighted fixed-effects method. Black dots represent odds ratios and lines the 95% confidence intervals from the single cohorts; red diamonds represent the results from the meta-analysis, with the ends of the diamonds representing the ends of the 95% confidence interval. Source data for the figure are in Supplementary Data 1.

Phenome-wide association results for rs534125149

We observed a highly protective association between the Finnish-enriched inframe insertion rs534125149 in the MFGE8 gene and multiple disease endpoints, all representing major CHD, including coronary atherosclerosis (OR = 0.75 [0.71–0.81], p = 2.63 × 10−16) and myocardial infarction (MI) (OR = 0.74 [0.68–0.81], p = 1.95 × 10−11). In total, this variant was associated at phenome-wide significance (PWS) with 14 disease endpoints, all representing major CHD (Fig. 4). The majority of these endpoints overlap heavily, and thus similar associations with all of them are expected. We therefore pruned the 14 PWS disease endpoints down to six (coronary atherosclerosis, coronary revascularization, ischemic heart diseases, major coronary heart disease event, myocardial infarction, and statin medication) with distinct characteristics for further analyses. For the inframe insertion rs534125149 in MFGE8, we did not detect other phenome-wide significant associations among the 2 861 endpoints in our data.

Fig. 4: Phenome-wide association study (PheWAS) results for rs534125149. Total number of tested endpoints is 2861 (a complete list of endpoints analyzed and their definitions is available at https://www.finngen.fi/en/researchers/clinical-endpoints). The dashed line represents the phenome-wide significance threshold, multiple testing corrected by the number of endpoints = 0.05/2861 = 1.75 × 10−5. All endpoints reaching that threshold are labeled in the figure.
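The phenome-wide significance threshold in the caption is a Bonferroni correction, which can be reproduced with a short sketch. The function name `bonferroni_threshold` is illustrative, and the two p-values checked are those reported above for coronary atherosclerosis and MI.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


pws = bonferroni_threshold(0.05, 2861)
print(f"{pws:.3g}")  # 1.75e-05

# p-values reported for rs534125149 (coronary atherosclerosis, MI)
reported = {"coronary atherosclerosis": 2.63e-16, "myocardial infarction": 1.95e-11}
significant = {endpoint: p < pws for endpoint, p in reported.items()}
print(significant)
```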

Splice acceptor variant rs201988637 in MFGE8

In addition to the inframe insertion rs534125149, we identified a splice acceptor variant (rs201988637) in MFGE8 that was associated with coronary atherosclerosis (OR = 0.72 [0.63–0.83], p = 7.94 × 10−06) and with multiple disease endpoints representing major CHD. The splice acceptor variant had a PheWAS profile very similar to that of the inframe insertion (Supplementary Fig. 6), and the two variants had very similar protective effect sizes for the endpoints (Fig. 5 and Supplementary Table 2). Like rs534125149, this variant is highly enriched in Finland (37-fold compared to NFE), with an allele frequency in Finland of 0.6%. Moreover, both the splice acceptor and the inframe insertion variants were enriched in Eastern Finland (Supplementary Fig. 7).

Fig. 5: Effect size comparison. Comparison of the effects (OR) of rs534125149 and rs201988637 for the 14 endpoints with p-value < 1.75 × 10−5 (PWS) for rs534125149 in FinnGen R6. The 95% confidence intervals are represented as gray lines.

These two variants (rs534125149 and rs201988637) are in low linkage disequilibrium (LD, r2 = 0.00015), and conditioning on either variant did not affect the association of the other with coronary atherosclerosis or MI (Table 2 and Supplementary Fig. 8). This indicates that the two variants are independently associated with these endpoints.
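For readers less familiar with the LD measure, the following sketch computes r2 as the squared Pearson correlation of allele dosages (0, 1 or 2 copies per individual). This is one common genotype-based estimator and is an assumption here; the source does not state how r2 was computed. The toy dosage vectors are hypothetical.

```python
def genotype_r2(dosages_a, dosages_b):
    """Squared Pearson correlation between two vectors of allele dosages."""
    n = len(dosages_a)
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    return cov * cov / (var_a * var_b)


# Hypothetical dosages for four individuals
variant_1 = [0, 0, 1, 1]
variant_2 = [0, 1, 0, 1]
print(genotype_r2(variant_1, variant_2))  # carriers do not co-occur more than by chance
print(genotype_r2(variant_1, variant_1))  # a variant is in complete LD with itself
```

An r2 close to zero, as reported for the two MFGE8 variants, means that carrying one variant carries almost no information about carrying the other, which is consistent with the two signals being independent in the conditional analysis.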

Table 2 Results of the conditional analysis on MI and coronary atherosclerosis.

Survival analysis

In addition to protection against coronary atherosclerosis and myocardial infarction, both the inframe insertion rs534125149 and the splice acceptor variant rs201988637 were also significantly associated in survival analysis of the time from birth to first diagnosis of coronary atherosclerosis (HR = 0.78 [0.74–0.93], p = 1.67 × 10−17 and HR = 0.77 [0.69–0.88], p = 5.08 × 10−05, respectively) and myocardial infarction (HR = 0.86 [0.80–0.93], p = 2.63 × 10−10 and HR = 0.72 [0.61–0.85], p = 8.16 × 10−05, respectively). In addition, when heterozygous and homozygous carriers of either rs534125149 or rs201988637 were combined, carriers received their first diagnosis significantly later than non-carriers (HR = 0.81 [0.77–0.85], p = 6.4 × 10−16 for coronary atherosclerosis and HR = 0.78 [0.72–0.85], p = 1.16 × 10−11 for MI) (Fig. 6).

Fig. 6: Cumulative incidence plots for first event of myocardial infarction in FinnGen R6. The red line represents carriers (homo- or heterozygous) of either rs534125149 or rs201988637 (n = 17,838), and the blue line represents non-carriers (n = 242,567). Hazard ratio and p-value are from a Cox proportional hazards model. Dashed lines represent 95% confidence intervals.

In addition, as a sensitivity analysis, we fitted a similar Cox model for the first event of MI with different risk factors for CHD added as covariates, to see whether any of these risk factors (BMI, type 2 diabetes, smoking, statin use or sex) affected the observed association. Risk factors were added to the model both individually and together. We saw only a small change in the effect size when adjusting for these risk factors (Supplementary Table 3). The change was more noticeable in the p-values, where missing data in the added covariates led to decreased statistical power.

Associations with risk factors for CVD

We then tested for possible associations between the MFGE8 variants and risk factors for CVD. The splice acceptor variant rs201988637 was associated with pulse pressure in analyses across four cohorts in which both pulse pressure measurements and the variant were available, with the risk-lowering allele associated with lower pulse pressure (p = 1.7 × 10−04, β = −0.13 [−0.2 to −0.06]) (Fig. 7). Association with pulse pressure was also tested for the inframe insertion rs534125149 and the previously reported common variant in the locus, rs8042271, across all cohorts where the variants were available. We saw consistent effect sizes across the cohorts and significant (p < 0.05) meta-analysis p-values for both variants (Supplementary Fig. 9).

Fig. 7: Results for pulse pressure association across all cohorts with splice acceptor variant rs201988637 available (FINRISK, GeneRISK, YFS, EstBB, and UKBB). The size of each box represents the sample size of the cohort, and the lines the 95% confidence intervals. Associations were tested using linear regression, adjusting for age and sex. Pulse pressure phenotypes were inverse-rank normalized prior to analysis. Source data for the figure are in Supplementary Data 1.

In addition, recent studies of blood pressure measurements (systolic and diastolic blood pressure and pulse pressure) have reported genome-wide significant associations in the region15,16. To assess whether these reflect the same signal, we performed colocalization analysis in the region ±200 kb around rs534125149 using the coloc package in R17, with coronary atherosclerosis results from FinnGen and pulse pressure GWAS results from Evangelou et al.16 The probability of a shared signal (PP4) was 97.1%, further supporting that the MFGE8 locus is associated with pulse pressure (Supplementary Fig. 10).

In addition to the pulse pressure associations in the region, rs534125149 was significantly associated with height, but further analysis indicated that this signal reflects the known association of ACAN, located near MFGE8, with height (Supplementary Fig. 11). No associations with other risk factors were observed.

In the Corogene cohort (n = 4896), rs534125149 was significantly (p < 0.05) associated with lower risk of acute coronary syndrome and stable coronary heart disease (RR = 0.87 and 0.83, respectively) compared to healthy controls, but not with myocardial infarction without coronary artery occlusion (MINOCA) (Supplementary Fig. 12). These results are in line with our findings regarding the specificity of the association of variants in MFGE8 with atherosclerotic cardiovascular disorders. The difference in the allele frequencies (AFs) of rs534125149 between patients with acute coronary syndrome or stable coronary heart disease and patients with MINOCA was, however, not significant (p = 0.78), which may be due to lack of power. In addition, the cohort is very heterogeneous.

Previously reported common variants near MFGE8

Previously, a common intergenic variant (rs8042271) near MFGE8 has been reported to associate with coronary heart disease (CHD) risk3,18. We replicate this association (OR = 0.90, p = 3.69 × 10−10 for coronary atherosclerosis) in FinnGen. LD between the common variant rs8042271 and the inframe insertion rs534125149 is 0.154. The LD characteristics for all three variants in MFGE8 (rs534125149, rs201988637 and rs8042271) in FinnGen are in Supplementary Table 4. The common variant rs8042271 was in the 95% credible set for MI with a causal probability of 0.003 but was not included in the 95% credible sets for coronary atherosclerosis (Supplementary Tables 5 and 6). The conditional analyses of all three MFGE8 variants showed that the association of the previously reported common variant rs8042271 can be explained by the inframe insertion variant rs534125149, but not vice versa, and that the association of the splice acceptor variant rs201988637 is independent of both rs534125149 and rs8042271 (Supplementary Table 7). This was also the case for the previously reported common variant rs734780, which shows LD with rs534125149 (0.112) very similar to that of rs8042271 (0.154).

Fine-mapping of the MFGE8 locus

In our fine-mapping analyses, the most probable configuration for MI was a single credible set (set of putatively causal variants) of 32 variants (posterior probability = 0.62), and for coronary atherosclerosis it was two credible sets of 6 and 45 variants, respectively (posterior probability = 0.74). For both MI and coronary atherosclerosis, rs534125149 had the highest probability of being causal (0.250 and 0.318, respectively) and was included in the first credible set (Supplementary Tables 5 and 6; and Supplementary Fig. 13). The splice acceptor variant rs201988637 was not included in the credible sets for either MI or coronary atherosclerosis, whereas the previously reported common variant rs8042271 was included in the credible set for MI with a probability of being causal of 0.003 (Supplementary Table 6).
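To make the notion of a 95% credible set concrete, the sketch below builds one from per-variant posterior probabilities. It assumes a single causal signal in the region and simply ranks variants by probability. This is a simplification and not the fine-mapping method used here, which allows more than one credible set per locus. The variant labels and probabilities are hypothetical.

```python
def credible_set(posteriors, coverage=0.95):
    """Smallest set of top-ranked variants whose probabilities reach `coverage`.

    posteriors: dict mapping variant ID to posterior probability of being causal
    (normalised here so that the probabilities sum to 1).
    """
    total = sum(posteriors.values())
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for variant, probability in ranked:
        selected.append(variant)
        cumulative += probability / total
        if cumulative >= coverage:
            break
    return selected


# Hypothetical posterior probabilities for four variants in one region
toy_posteriors = {"variant_a": 0.50, "variant_b": 0.30, "variant_c": 0.16, "variant_d": 0.04}
print(credible_set(toy_posteriors))  # ['variant_a', 'variant_b', 'variant_c']
```

Under this construction a variant with a very small individual probability can still belong to the credible set when many variants share the signal, which is how a variant with a causal probability of 0.003 can appear in a 32-variant set.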

Protein modeling

We predicted the impact of the insertion variant rs534125149 on the protein structure of MFGE8 using AlphaFold19. The predicted conformational changes were localized to a loop region within the C2 domain, ~20 Å away from the key amino acids involved in membrane binding (Supplementary Fig. 14)20,21. This loop contains Asn238, which is known to be glycosylated22. It is possible that the insertion of an additional asparagine leads to impaired glycosylation, which is important for protein folding, among other cellular processes23. The role of this region in the function of MFGE8 has not previously been described, and it is therefore unclear how else this variant would affect MFGE8 function. Thus, further experimental work is necessary to understand the mechanism by which this variant leads to protection against coronary atherosclerosis.