Multi-level analysis of the gut–brain axis shows autism spectrum disorder-associated molecular and microbial profiles

Search strategy and inclusion criteria

We performed a systematic search for published and/or publicly deposited or not yet published and/or publicly available human microbiome, metabolome, immunome, transcriptome and autism/ASD datasets in several National Center for Biotechnology Information (NCBI) databases (PubMed, Sequence Read Archive (SRA) and BioProject), UCSD’s MassIVE resource, the PsychENCODE consortium and the American Gut Project and from individual research groups worldwide. About half of the 70+ studies that we identified were already deposited on public data repositories or were made directly available to us by the research groups.

Most studies consisted of heterogenous—no genotype or phenotype stratification—ASD and neurotypical age-matched and sex-matched cohorts and had one or two datasets (microbiome (16S, SMS), metabolome (urine/serum/fecal), immunome (cytokines), transcriptome (RNA-seq), dietary survey and behavioral survey) associated with them, with only a few studies having three or more omic datasets associated with them (Table 1). We adopted a multi-cohort and multi-omics meta-analysis framework that allowed us to combine independent and dependent omic datasets in one overall analysis19. In total, we analyzed 528 ASD–control pairs that had either age and sex information or sibling-matching information. To reduce the batch effects and noise associated with primer choice in the 16S datasets, a major confounder in microbiome analyses, we restricted the 16S datasets to include only those targeting the variable region V4 of the bacterial ribosomal RNA, a region exhibiting higher heterogeneity and lower evolution rates than other variable regions64. Previous studies showed how primers targeting adjacent regions in the 16S can yield similar composition estimates up to the genus level65. Our analysis included 16S datasets obtained targeting the V4 region exclusively, the V3–V4 region or the V4–V5 region.

The final metabolomic meta-analysis that we present here consists of the combined analysis of only four independently pre-processed, normalized and analyzed metabolomic datasets. Despite several more ASD-related datasets being available, the disparity in mass spectrometric technologies used to generate them, which results in the detection of different subsets of metabolites, precluded their side-by-side comparison (Table 1). For example, targeted mass spectrometry enables the precise determination of concentrations for a finite number of metabolites, whereas untargeted mass spectrometry detects up to two or three orders of magnitude more metabolites but is compositional in nature and, thus, does not yield absolute abundances. Furthermore, batch effects due to sample processing, such as differences in reagents, sample storage and mass spectrometry instruments, can introduce unwanted variation in both the abundances and the detected molecular features66. One additional obstacle that we encountered was the proprietary nature of many of the metabolomic datasets that made it impossible to access the raw data and run standardized workflows.

Of the 40 transcriptomic datasets that were available in recount3 (ref. 67), the vast majority were obtained from studies with model animals, and only four of them had been obtained from postmortem processing of brain samples from autistic and neurotypical individuals. These four datasets collected different brain tissue types, including from the amygdala, the prefrontal cortex, the anterior cingulate and the dorsolateral prefrontal cortex.

Martin-Brevet et al. cohort

Data from Martin-Brevet et al.38 were acquired from two different cohorts: one from the Simons Variation in Individuals Project, consisting mostly of families from the United States. From this cohort, there are 24 individuals with the 16p11.2 deletion and 24 corresponding siblings from the same family who do not carry the deletion; and a second cohort consisting of individuals from the European 16p11.2 consortium (24 deletion carriers and 24 familial controls). More exact information about this cohort was previously published38. Deletion carriers were ascertained regardless of age or clinical diagnosis. DNA was extracted from stool samples, and 16S sequencing was performed using primers to the V4 region.

Data processing

We constructed matched reference databases for 16S and SMS data analyses. The Web of Life 2 (WoL2) reference genome database contains 15,953 bacterial and archaeal genomes sampled from the NCBI to maximize representation of biodiversity. It is a major upgrade from WoL (10,575 genomes). A reference phylogeny was reconstructed based on 387 universal marker genes using uDance, a novel phylogenomic inference workflow employing a divide-and-conquer method. Taxonomic assignments of the genomes were based on Genome Taxonomy Database (GTDB) r207 and curated according to the phylogeny using tax2tree. The Greengenes2 reference 16S rRNA database was constructed based on the WoL2 whole-genome phylogeny and updated with full-length 16S rRNA sequences from the Living Tree Project and 16S from high-quality bacterial operons, using uDance to revise the topology. Into this backbone, we inserted all 16S V4 amplicon sequencing variants from public and private samples Qiita using DEPP. A taxonomy based on GTDB r207, expanded with lineages from the Living Tree Project not present in GTDB, was decorated onto the phylogeny using tax2tree. Full details behind the construction of WoL and Greengenes2 can be found in Usyk et al.32.

The 16S amplicon and shotgun metagenomics samples were downloaded from the SRA. The 16S amplicon samples were processed using Deblur and subsequently mapped to Greenegenes2 using Vsearch with qiime2 (ref. 68). Shotgun metagenomics samples were mapped to bacterial whole genomes captured in the WoL2 using Bowtie2 followed by Woltka69. Viral abundances were extracted from shotgun metagenomics samples using GPD and BWA. RNA expression data were obtained directly from recount3 (ref. 67); the four metabolomics datasets were provided by the authors.

To enable age and sex matching, a bipartite matching between individuals with ASD and neurotypical individuals was performed using age and sex covariates. This approach has been shown to be optimal for case–control matching70. Individuals who could not be matched were excluded from the meta-analysis. Among the 16S and SMS datasets, there were multiple longitudinal datasets. To integrate these datasets into the cross-sectional analysis, we picked only the first timepoint for each individual.

Differential ranking analysis

One of the most common approaches to evaluating microbiome and other omic studies consists of determining differences in the abundances of microbial taxa, human metabolites or other omic features between cases and controls. Such differential abundance analysis is typically performed by computing the log fold changes between the case and control groups21. However, confounders, such as sex-related, age-related and geography-related batch effects, compositionality, high dimensionality, overdispersion and sparsity, prevented a reliable estimation of differential abundances and, thus, compromised the side-by-side comparison of these differential abundances across studies in the manner of a traditional meta-analysis. Here, we set out to overcome these inherent limitations of traditional meta-analyses by developing a generalizable approach for controlling for select confounders that would help reveal a comprehensive picture of ASD-specific omic signals.

To minimize confounder effects, we developed a Bayesian differential ranking algorithm that uses bipartite matching to optimize the age-based and sex-based pairing of ASD and control individuals within each dataset. This approach helped control for potential age and sex confounders while also minimizing batch effects, such as sample collection method, sample processing protocol, different primers and geographical provenance71. Our approach could do this by leveraging recent insights into the multiplicative nature of protocol biases18. Fold change calculations can be designed to be robust to bias induced by protocols, provided that the fold changes are being computed only on samples processed under the same protocol. Similar observations have been made about biases induced by differences in polymerase chain reaction (PCR) primers, with abundance-based beta diversity metrics being robust to primer biases, as long as comparisons are confined to datasets generated with the same protocol71. We extended this strategy to handle age and sex matching, taking advantage of the fact that most of the cohorts that we analyzed selected their participants to be age and sex matched. Most of the case–control pairings in the 16S and SMS datasets were within 1 year apart, providing an opportunity to remove age-related confounders in downstream analyses. Our Bayesian models were fitted via Markov chain Monte Carlo (MCMC) using Stan72. Conceptually, this allowed us to compute log fold change differences of microbes between age-matched and sex-matched individuals, but, because we did not have absolute abundance information, we could estimate this log fold change only up to a constant73 (Supplementary Information).

To determine how sensitive our proposed differential abundance strategy was to sequencing depth, we conducted a rarefaction benchmark in addition to a simulation benchmark. When comparing unrarefied data (with mean sequencing depth greater than 200,000 reads per sample) and sequencing count data rarefied down to 9,000 reads per sample from the 16S cross-sectional cohort data, we still see strong agreement between the unrarefied log fold changes and the rarefied log fold changes (Extended Data Fig. 2e). This supported the theoretical evidence that our differential abundance method was scale equivariant and that changes in sequencing depth would not markedly affect the mean log fold change estimates.

This was further validated in our simulation benchmarks, where we showed that our model could capture the ground truth log fold changes based on 16S differentials from the age-matched and sex-matched cohort (Extended Data Fig. 2f). We compared our proposed age-matched and sex-matched differential ranking method to ANCOM-BC and our differential abundance method without age and sex matching (which we will refer to as group-averaged differential ranking) (Extended Data Fig. 2i–k) to showcase the differences between these methods. This benchmark was performed using data-driven simulations derived from the 16S cohort analysis. For the side-by-side comparison, we ran three different configurations of ANCOM-BC: (1) case–control differences only [‘formula=disease status’]; (2) case–control differences adjusted by age and sex confounders [‘formula=disease status + age + sex’]; and (3) case–control differences by age and sex matching [‘formula=disease status + (disease status–age sex matching IDs)’]. The first configuration provided a direct comparator to our ‘standard group-averaged differential ranking’, and the third configuration provided the most direct comparator to our ‘sex- and age-matched differential ranking’. None of the three ANCOM-BC models could recover the ground truth log fold changes in our simulations (r = 0.38, 0.37 and 0.39 for implementations 1, 2 and 3, respectively), whereas both the ‘standard group-averaged differential ranking’ and the ‘sex- and age-matched differential ranking’ models were able to recover the ground truth (r = 0.64 and 0.79, respectively). Ultimately, this illustrates how our method could account for age and sex matching and perform as expected if the assumptions were satisfied.

Similar to other simulation-based benchmarks, this is not a rigorous benchmark showcasing the improved performance of our method; rather, it is showcasing how all three methods have different assumptions. To determine in which biological scenarios age and sex matching could be more informative than household matching, we generated another simulation incorporating both a household confounder and an age confounder. The subject ages and age differences were sampled from the age distribution observed in David et al.37. Similarly to our previous simulations, we simulated the ground truth log fold changes using the model from the 16S cohort analysis. Here, we observed that, with a sufficiently large age confounder, the household matching estimated log fold changes with a noticeably large bias (mean squared error = 589.3) (Extended Data Fig. 6e). In contrast, although age and sex matching did not precisely estimate the ground truth log fold changes, we observed a 10×-fold reduction in bias (mean squared error = 57.8) (Extended Data Fig. 6f). This simulation also showcased how cohort randomization may play a role in mitigating the bias introduced by age confounding, at the expense of increased variance of the estimator.

To determine how sensitive our proposed differential abundance strategy is to batch effects, we computed the log fold changes between two samples, ‘sample4’ and ‘sample6’, from the MBQC study31 for each processing laboratory. These samples were replicated and processed by multiple laboratories, providing an experimental setup for validating batch removal methods. Bray–Curtis PCoA shows a weak separation in sample name but a strong separation due to batch effects induced by differences in processing protocols. However, when we compare the log fold changes for each processing laboratory, we see strong agreement (r > 0.5, P < < 0.05) (Extended Data Fig. 6m), which supports the claim of McLaren et al.18 that within-study fold change calculations are insensitive to batch effect, as long as the processing protocol is consistent within the study.

To determine if there was a significant difference between the age-matched and sex-matched pairs, we constructed an effect size metric using our model’s uncertainty estimation (see Supplementary Methods for more details). A global model for each data type—16S, SMS and RNA-seq—was used to determine if there was a significant difference between age-matched and sex-matched case–control pairs across each datatype. When we evaluated our Bayesian model fit on the 16S, SMS and RNA-seq datasets, our model fits achieved Rhat values below 1.1 and ESS values above 300, indicating that the draws from the posterior distribution are reliable.

The age-matched and sex-matched classifiers were constructed to build a classifier that generalizes across cohorts, identifying microbes that consistently differentiate between age-matched and sex-matched case–control pairs. To build age-matched and sex-matched classifiers, within each age-matched and sex-matched 16S cohort, we fitted our Bayesian model and assigned taxa into three groups: ASD-associated, Control-associated and Neutral. A taxon is assigned to the ASD-associated group if 70% of their posterior samples are greater than 0; a taxon is assigned to the Control-associated group if 70% of their posterior samples is less than 0. The remaining taxa are assigned to the Neutral group. After assigning taxa to each group, for each sample, a single log ratio, or balance15, is computed by taking the geometric mean of all of the taxon abundances within each group. To create a single log ratio that generalizes across cohorts, we assigned taxa to the ASD-associated group if it appears to be ASD associated in at least two studies. The same procedure is applied to the Control-associated group. The differences of these log ratios across the case–control pairs are shown for the age-matched and sex-matched cohorts in Fig. 2c. Although we did not apply this approach to the shotgun metagenomics datasets, we showed that the log ratios constructed from the 16S datasets also separated more than 70% of the ASD–control samples, serving as an additional cross-validation.

To determine if a microbe in increased or decreased between two groups of samples, a reference frame that identifies which group of microbes is stable is required. To do this using our Bayesian models, the quantiles estimated from the posterior distribution of the log fold change is used. A microbe is said to be significantly increased if the log fold change is greater than 0 in 95% of the posterior samples (5% log fold change quantile > 0). Finally, a microbe is said to be stable if the 90% quantile of the posterior distribution overlaps with 0 and the standard deviation of the posterior distribution is less than 3. Similarly, a microbe is said to be significantly decreased if the log fold change is less than 0 in 95% of the posterior samples (95% log fold change quantile < 0). The reference frame in the FMT analysis used microbes that were identified to be neutral or control associated by the age-matched and sex-matched classifier, with the assumption that the average abundance of these taxa is stable throughout the entire 2-year follow-up study. The FMT analysis used the same matching strategy, but, instead of matching on age and sex, the matchings were performed on the subjects to compare different timepoints. When identifying microbes that are the core microbiome, we focused on taxa that overlapped with 0 and had a posterior standard deviation of less than 3. Similarly, when computing the overlap between the cross-sectional cohort and the microbes depleted after the FMT, we focused on taxa with low uncertainty with a posterior standard deviation of less than 3.

The heat map shown in Fig. 2b displays the log fold changes for each case–control pair. To do this, a robust center log ratio (CLR) transform was performed, and all zeros were imputed to the mean abundance for visualization purposes. The case–control log fold changes were then computed for each case–control pair.

Bayesian differential ranking

Conceptually, the goal of a differential analysis is to make a statement about change in abundance for a given feature i between conditions A and B by evaluating the following null hypothesis:

$$\frac{{A}_{i}}{{B}_{i}}=1$$

However, most omic datasets do not provide a direct observation of the absolute quantities of A i and B i , or the total microbial loads ${N}_{{A}_{i}}$ and ${N}_{{B}_{i}}$ but, rather, only an observation of their proportions ${p}_{{A}_{i}}$ and ${p}_{{B}_{i}}$, respectively, within each dataset, which are determined by a bias term, $\frac{{N}_{A}}{{N}_{B}}$. This bias term, given by

$$\frac{{A}_{i}}{{B}_{i}}=\frac{{p}_{{A}_{i}}{N}_{A}}{{p}_{{B}_{i}}{N}_{B}}=\frac{{p}_{{A}_{i}}}{{p}_{{B}_{i}}}\frac{{N}_{A}}{{N}_{B}}$$

results in high FDRs that cannot be adjusted for in models analyzing compositional omics datasets because the overall contribution of N A and N B to change cannot be unequivocally quantified74. To avoid the total biomass bias without having to resort to performing traditional FDR corrections, we adopted a ranking approach that allowed us to sort omic features by their log fold change values independently of how large their change was in absolute terms73. Because the biomass bias impacts every species within a dataset equally, the ranking approach ignores this bias, making the approach scale invariant (Equation 1).

$${{{\rm{rank}}}}\left(\frac{A}{B}\right)={{{\rm{rank}}}}\left(\frac{{p}_{A}{N}_{A}}{{p}_{B}{N}_{B}}\right)={{{\rm{rank}}}}\left(\frac{{p}_{A}}{{p}_{B}}\right)$$

The overall model that we designed consisted of a customized differential abundance tool that leveraged the experimental design of each study included in the analysis to determine study-specific feature perturbation profiles that could then be combined with the normalized perturbation profiles of other studies to perform a global differential perturbation analysis. The overall model had the following structure:

$$\begin{array}{rcl}{y}_{i,\,j}& \sim &{{{\rm{NegativeBinomial}}}}({\lambda }_{i,\,j},{\alpha }_{ij})\\ \log {\lambda }_{i,\,j}&=&\log {N}_{i}+{C}_{k(i),\,j}+{D}_{j}{{{\rm{I}}}}[i=ASD]\end{array}$$

where y i, j denotes the microbial counts in sample i of species j across d species; λ i, j ,α ij represents the expected counts for species j; sample i, j represents a microbe-specific overdispersion term; N i represents sequencing depth (self-normalization and preemptive of rarefaction); C k(i),j represents the log proportion of species j in the k(i) control subject (age matched and sex matched); and D j I[i = ASD] represents the log fold change difference between control and ASD subjects with a corrective function that equals 1 when i corresponds to the paired ASD subject and 0 when i corresponds to the control subject. Incorporating N i into the model renders the model self-normalizing and not dependent on rarefaction, and C k(i),j incorporates the age-matching and sex-matching component for a given pair k. The priors for these variables are given below.

$$\begin{array}{rll}{\alpha }_{ij}&=&\frac{{a}_{0,k(i),\,j}}{{\lambda }_{ij}}+{a}_{1,k(i),\,j}+{\beta }_{p}\quad {a}_{0} \sim {{{\rm{LogNormal}}}}(0,1)\\ && {a}_{1} \sim {{{\rm{LogNormal}}}}(\log (10),0.1)\\ {\beta }_{p} &\sim& {{{\rm{Normal}}}}({\beta }_{\mu },{\beta }_{\sigma })\quad{\beta }_{\mu } \sim {{{\rm{Normal}}}}(0,3)\\ && {\beta }_{\sigma } \sim {{{\rm{LogNormal}}}}(\log (0.1),0.1)\\ {C}_{k(i),\,j} &\sim & {{{\rm{Normal}}}}({C}_{{\mu }_{j}},{C}_{{\sigma }_{j}})\quad{C}_{{\mu }_{j}} \sim {{{\rm{Normal}}}}\left(\frac{1}{d},3\right)\\ && {C}_{{\sigma }_{j}} \sim {{{\rm{Normal}}}}\left(\frac{1}{d},1\right)\\ {D}_{j} &\sim& {{{\rm{Normal}}}}(0,3)\end{array}$$

Here, the overdispersion parameters are estimated for each microbe, for each batch and for the ASD and control groups. This approach is adapted from DESeq2, allowing for the overdispersion to be modeled by both linear and quadratic terms with respect to the abundance. Furthermore, this parameterization does allow for a compositional interpretation owing to the following rationale. The Poisson distribution with an offset term is known to approximate the multi-nomial distribution. Furthermore, the negative binomial can be re-parameterized as a gamma–Poisson distribution, allowing for overdispersion modeling by breaking the mean–variance relationship inherent in the Poisson distribution.

The age-matched and sex-matched differential abundance has a similar methodology to paired tests, such as the paired t-test and the Wilcoxon test. To this end, we also used this differential abundance methodology to analyze the FMT dataset. Here, instead of matching pairs of subjects, we matched pairs of timepoints and computed the differential abundance across each pair of timepoints. To make these differentials comparable, a common set of taxa that was detected to be associated with controls was selected. Specifically, taxa that had a log fold change less than 0 in the cross-sectional cohort were assigned to this reference set. The estimated log fold changes were adjusted by centering around the mean log fold in the reference dataset as follows:

$${{{{\boldsymbol{D}}}}}^{* }={{{\boldsymbol{D}}}}-{\bar{{{{\boldsymbol{D}}}}}}_{R}$$

where ${\bar{{{{\boldsymbol{D}}}}}}_{R}$ denotes the mean of the log fold changes of the reference taxa, and D* represents the recentered log fold changes. By doing this, all timepoints will have the same reference and will be more directly comparable.

One of the advantages of the above model is that it will cancel out any multiplicative batch effect, such as PCR amplification bias, with no impact on j. This is because D is computed only within cohorts, and, as a result, cohort-specific batch effects are mitigated. Another advantage of the proposed model is that negative binomial models can be fitted independently for each microbe; as a result, the log fold change estimates for one microbe will not affect the estimates of other microbes. This can be a benefit, because these models will be agnostic to the choice of filtering criteria—filtering certain microbes will not affect the log fold change estimates of the remaining microbes. Furthermore, this differential abundance model can be applied to different types of omics data. Moreover, because we built the differential ranking model in a Bayesian environment, we were able to fit the model using an MCMC approach to estimate uncertainty by sampling the resulting posterior distributions.

For example, to make a statement about the value of an estimated posterior probability distribution p(D∣ y), we could compute an average value using the following approximation:

$${\mathbb{E}}[D]\approx \frac{1}{m}\mathop{\sum }\limits_{i=0}^{m}{\hat{D}}_{i}\quad {\hat{D}}_{i} \sim p(D| \,y)$$

Using this classic application of MCMC sampling in which N samples of i are drawn from the posterior distribution p(D∣y), we were able to approximate the true mean of the posterior differential abundance distributions and the corresponding effect sizes. With this, we can compute an effect size metric that determines if there is any global difference detected. This metric is analogous to PERMANOVA but one that computes this from log fold changes using the age-matched and sex-matched design. The effect size E is measured as follows:

$$E=\frac{| | {\mu }_{D}| {| }_{2}}{{r}_{D}},\quad {\mu }_{D}=\frac{1}{m}\mathop{\sum }\limits_{i=0}^{m}{\hat{D}}_{i}\quad {r}_{D}=\mathop{\max }\limits_{{\hat{D}}_{i} \sim p(D| y)}\parallel | {\hat{D}}_{i}-{\mu }_{D}\parallel {| }_{2}$$

where μ D is the mean of the posterior distribution, and r D represents the radius of a sphere that contains all of the samples from the posterior distribution. If the effect size is greater than 1, that means that 0 is not included in the posterior distribution, and the difference is significant. Bayesian P values are computed from the number of draws of ${\hat{D}}_{i}$ that were simulated from the posterior distribution p(D∣y). For instance, if 100 draws are sampled from the posterior distribution, and 0 is not within the sphere estimated from those 100 draws, then we say that the posterior distribution is significantly not overlapping with 0 with P < 0.01.

Other methods

We fitted gradient boosting classifiers on 10 16S datasets and on three SMS datasets using q2-sample-classifier68. We randomly split the samples into 80/20 training and test splits, performed a fivefold cross-validation on the training datasets to obtain optimal model parameters and computed predictions on the held-out test dataset. PERMANOVA with Bray–Curtis distances was used to determine if confounding variation due to household, age and sex was statistically significant in the sibling cohorts.

We used MMvec73 to perform the diet–microbe co-occurrence analysis. Here, microbes were used to predict dietary intake. This analysis enabled the estimation of conditional probabilities, namely the probability of observing a dietary compound given that the microbe was already observed. To estimate these conditional probabilities, MMvec performs a matrix factorization, identifying the factors that explain the most information in these interactions. We compared the MMvec microbial factors against the cross-sectional log fold changes. We then compared the MMvec dietary factors against t-statistics that measure the differences in dietary compounds between children with ASD and neurotypical children.

To identify candidate viral–microbe interactions, we ran MMvec on each of the SMS datasets. We then pulled out the top co-occurring viral taxa for each microbe that had a conditional log probability greater than 1, amounting to 78,580 microbe–viral interactions. Then, we filtered out the microbe–viral interactions that were not present in the Gut Phage Database (GPD)44, leaving 31,276 microbial–viral interactions. The microbe–viral interactions estimated by Dan et al.28 and Wang et al.33 were weakly generalizable (Q2 =0.0036 > 0 and Q2 =0.0114 > 0). However, the microbe–viral interactions estimated from Averina et al.34 were similar to random chance (Q2 = −0.005).

We used Songbird15 to perform the cytokine−microbe analysis via a multinomial regression that used the cytokines to predict microbial abundances. We reported biased microbial log fold changes with respect to cytokine concentration differences. Pearson correlation was used to determine the agreement between the 16S cross-sectional microbial differentials and the microbe−cytokine differentials. To directly link these microbial abundances to the cytokine concentrations, we computed log ratios, or balances, of microbes for each sample. For example, for IL-6, the numerator consisted of the top 50 microbes that are estimated to increase the most in abundance when IL-6 concentration increased, and the denominator consisted of the bottom 50 microbes that are estimated to be the most decreased when IL-6 concentration increases. Once these partitions are defined, the balances for each sample are computed by taking the log ratio of the average abundance of the numerator group and the denominator group15. Pearson correlation between these balances and the cytokine concentrations is then computed to measure the agreement between the microbial abundances and the cytokine concentrations.

To identify key microbial genes, we performed a comparative genomic analysis in which we binned the microbial genomes into those associated with ASD subjects and those associated with control subjects in the shotgun metagenomics data. We focused on microbes that are strongly associated with ASD, specifically those that are significantly greater than 10% of taxa that are estimated to be enriched in ASD. Using a binomial test, we were able to determine if a particular gene was more commonly observed in ASD-associated microbes than by random chance. Altogether, we identified 2,176 statistically significant microbial genes that differentiated ASD-associated microbial genomes from neurotypical-associated microbial genomes. Similarly, we identified 1,570 human transcripts that were differentially expressed between ASD and neurotypical subjects. Significant microbial genes and RNA transcripts were subsequently mapped to KEGG pathways. To directly compare the two contrasting omics levels and gauge metabolic similarity, we retrieved all the molecules involved in both the microbial and human pathways and calculated their intersection. Because the metabolomics datasets are not discrete values like sequencing count data, we additive log ratio (ALR) transformed the metabolomics datasets using the reference frames highlighted in the original papers. We then performed Wilcoxon tests on age-matched and sex-matched metabolomics samples within each cohort separately. Although our analysis revealed multiple metabolites that were below the 0.05 P value threshold, none of these metabolites passed the FDR-corrected threshold.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.