We start by describing the dataset we have analyzed and briefly explaining the methodology we have used to build the citation network and the pairs of similar papers. Then, we proceed to study gender disparities, first at the aggregate level and then by comparing pairs of similar papers.
Data description
We study an American Physical Society (APS) dataset from 1893 to 2009, which contains the articles’ metadata, the authors’ basic information, and the citations within the papers. The metadata consists of the authors’ full names and a unique digital object identifier (DOI) of the publication in string format. For names that are repeated in the dataset, we used the name disambiguation methods proposed by Sinatra et al.19 to detect unique authors and correctly match authors to publications (see Supplementary Fig. 1). To infer gender from names, we implemented a gender-detection procedure that combines author names with an image-based gender inference technique applied to search results from Google Images20. This combined method achieves high accuracy in the gender identification of scholars of different nationalities (see Supplementary Methods). The final dataset consists of 541,448 scholarly articles published over the course of 116 years in 11 journals. For 375,736 of those papers, we were able to identify the gender of at least one participating author. We have identified 120,776 gendered names: 17,763 women and 103,013 men. The evolution of the number of authors per year is shown in Fig. 1a.
Fig. 1: Rate of growth of women participation, average publications by career age, dropout rate and annual ratio of men/women self-citations. a Number of men (blue) and women (orange) authors per year. b Average number of publications by authors' career age. The shaded area shows the standard deviation. c Proportion of men and women authors who drop out compared to the remaining active authors per career age. d Normalized ratio of men/women self-citations computed from (1) during the time period of interest. The horizontal dashed line is the line of equilibrium; data points above the equilibrium line indicate a higher ratio of men's self-citation, and points below the line imply a higher ratio of women's self-citation.
Here, the notion of “gender” refers neither to the sex of the authors nor to the gender that the author self-identifies as. By the word “woman”, we mean an author whose name has a high probability of belonging to a person assigned female at birth, or who is identified as a woman on the basis of facial characteristics. Given this limitation, we can safely argue that these methodologies are in accordance with social constructs and with what people perceive as gender in society.
Constructing citation networks and assessing similar pairs
We build the citation networks by considering each paper as a node and making a link from paper i to paper j if i includes a citation to j. We measure the similarity between two papers using the bibliographic coupling strength21,22; that is, the number of publications that both papers cite. Two papers that cover similar topics in a comparable way are assumed to include a similar set of outgoing citations. However, within subfields there is usually a handful of classic publications that are cited in most works, so their inclusion in two different papers may not indicate actual similarity, but a citation convention. To avoid such shortcomings of naive bibliographic coupling, and guarantee the significance of the overlapping set of citations, we apply a statistical test based on the hypergeometric distribution. This test controls for the incoming citations of the commonly cited papers and checks whether the size of the common set of citations is so large that it cannot be explained by randomness. The problem of identifying similar papers to assess gender disparities has also been approached recently using machine learning techniques23.
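As an illustration only, the following minimal Python sketch computes the bibliographic coupling strength of two reference lists and the upper-tail probability of observing an overlap at least that large under a plain hypergeometric null. It is a simplification: it treats all citable papers as equally likely to be cited and therefore does not control for the incoming citations of the commonly cited papers, which the test described in Methods does. The function names, the toy reference sets and the pool size `n_total` are hypothetical.

```python
from math import comb

def coupling_strength(refs_i, refs_j):
    """Bibliographic coupling strength: number of references shared by two papers."""
    return len(set(refs_i) & set(refs_j))

def hypergeom_pvalue(overlap, n_i, n_j, n_total):
    """P(X >= overlap) if paper j drew its n_j references at random from a pool
    of n_total citable papers, n_i of which are cited by paper i."""
    upper = min(n_i, n_j)
    tail = sum(comb(n_i, k) * comb(n_total - n_i, n_j - k)
               for k in range(overlap, upper + 1))
    return tail / comb(n_total, n_j)

# Toy example (invented data): two papers sharing three of their four references.
refs_a = {"p1", "p2", "p3", "p4"}
refs_b = {"p2", "p3", "p4", "p9"}
shared = coupling_strength(refs_a, refs_b)
p_value = hypergeom_pvalue(shared, len(refs_a), len(refs_b), n_total=1000)
print(shared, p_value)
```

A pair would be retained as "similar" only if this probability falls below a chosen significance threshold; the threshold and any multiple-testing correction are left open here.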
To explore gender disparities, we select pairs of similar papers written by men and women primary authors, respectively. Then, we compare the future incoming citations to each paper of the pair. This comparison allows us to detect potential inequalities in the citation patterns. We have summarized this methodology in the diagram of Fig. 2 and provided all the technical details in Methods.
Fig. 2: Assessing similar pairs. We use bibliographic coupling and hypergeometric statistical tests to select couples of similar papers based on their outgoing citation activities. Then we compare their respective popularity (incoming citations). Each node and each arrow represent a paper and a citation respectively, whereas each dashed arrow represents a potential citation that is missing. The pair of papers being assessed (i and j) are shown in blue and orange, the papers cited by them in yellow, and the papers that cite them in green. The black arrow at the bottom represents a timeline showing the publication times of the papers.
Aggregate gender disparity trends
To characterize the gender disparities at the aggregate level, we first analyze the aspects of scientific production that depend primarily on individual choices and ability: in particular, productivity, dropout rate, and self-citations. Then, we discuss authorship order, which depends on the internal organization of research groups. Finally, we study the behavior of the scientific community as a whole by comparing the citations received by men and women.
Productivity
We define productivity as the number of publications that scholars produce during their career. In physics, we observe that women have a lower average number of publications than men across all career ages (Fig. 1b). While the publication gap narrows during the first two years of an author’s career, we observe a sudden increase in the gap from the second to the eleventh year. After this point, the publication gap starts decreasing again. These fluctuations in productivity can be associated with, among other things, the disproportionate family responsibilities that women have to take on compared with men24. For the aggregate results, see the productivity distributions by gender in Supplementary Fig. 4.
Although a researcher’s productivity can be considered to be determined mainly by individual skills, the collaborative nature of scientific work makes it dependent on external factors such as other team members or departmental organization. Likewise, these factors, together with other aspects like social perception or family responsibilities, affect women’s motivation to keep working in academia, potentially leading to the leaky pipeline phenomenon. To quantify this phenomenon, in the next section we explore the differences in dropout rates between men and women.
Dropout rate
We compute dropout as a lack of publication activity for at least five years to distinguish the authors who are active in publishing from those who have dropped out. We investigate the ratio of dropout scholars at each career age compared to the number of active scholars by gender. Figure 1c shows that women authors have a higher dropout ratio throughout their whole career. The largest gaps appear in the early career years, with a 2.28% difference between men and women in the first year and a 2.26% difference in the sixth year. The dropout rates of authors who leave academia after their first year (career age 0) are not shown in Fig. 1c. This career age presents the highest dropout rates, with 39.94% for men authors and 47.55% for women authors.
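The dropout computation can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure behind Fig. 1c: an author is dated as dropping out at the career age of their last publication, provided at least five silent years are observed before the end of the dataset, and the denominator at each career age is the number of authors whose publishing career reached that age. The function names and the toy publication years are hypothetical.

```python
from collections import Counter

def dropout_career_age(pub_years, last_year, gap=5):
    """Career age at which an author drops out, or None if still considered active.
    Assumption: dropout is dated at the last publication and requires at least
    `gap` years without publications before the end of the dataset."""
    first, last = min(pub_years), max(pub_years)
    if last_year - last >= gap:
        return last - first
    return None

def dropout_ratio_by_age(authors, last_year, gap=5):
    """For each career age: authors dropping out at that age / authors who reached it."""
    dropped, reached = Counter(), Counter()
    for years in authors:
        span = max(years) - min(years)
        for age in range(span + 1):
            reached[age] += 1
        age_out = dropout_career_age(years, last_year, gap)
        if age_out is not None:
            dropped[age_out] += 1
    return {age: dropped[age] / reached[age] for age in sorted(reached)}

# Toy example (invented data): publication years of three authors.
authors = [[1990], [1990, 1995, 2007], [2000, 2001]]
ratios = dropout_ratio_by_age(authors, last_year=2009)
print(ratios[0], ratios[1])
```

Running the same function separately on the men and women authors would give the two curves being compared.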
Self-citation
Self-citation refers to cases where authors cite their own previous works. Self-citations increase the total citation count and the visibility of scholars25,26,27, potentially enhancing academic promotion and attention. We have measured the relative number of self-citations by all men and women authors with the following metric (r) to study the difference in self-citation ratios between the two genders over time25:
$$r=\frac{\frac{\%\,\text{men's self-citations}}{\%\,\text{men's citations}}}{\frac{\%\,\text{women's self-citations}}{\%\,\text{women's citations}}}$$ (1)
Figure 1d shows the temporal evolution of the ratio r. This result shows that women tend to cite themselves less than men and that this trend is consistent over the years (see Supplementary Table 2 for more details). Consequently, women’s visibility in the citation network is partly penalized by the higher ratio of men citing their own previous works.
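A small numerical sketch of equation (1) may help to read Fig. 1d. It assumes that each percentage is a gender's share of the corresponding total (all self-citations, or all citations made, in a given year); under that reading, r reduces to the ratio of the two genders' self-citation rates. The function name and the counts are invented for illustration.

```python
def self_citation_ratio(men_self, men_total, women_self, women_total):
    """Ratio r of equation (1), assuming each '%' is a gender's share of the total.
    men_self / women_self: self-citations made by men / women authors.
    men_total / women_total: all citations made by men / women authors."""
    pct_men_self = men_self / (men_self + women_self)
    pct_women_self = women_self / (men_self + women_self)
    pct_men_cit = men_total / (men_total + women_total)
    pct_women_cit = women_total / (men_total + women_total)
    return (pct_men_self / pct_men_cit) / (pct_women_self / pct_women_cit)

# Invented counts: men self-cite in 15% of their citations, women in 10%.
r = self_citation_ratio(men_self=300, men_total=2000, women_self=20, women_total=200)
print(r)  # values above 1 indicate a higher self-citation rate for men
```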
Another fundamental factor that affects an author’s visibility is the position in which her name appears in the list of authors. This position depends on how the whole research group is organized and, crucially, in most cases it depends on the perceived level of contribution of each collaborator.
Authorship order analysis
In the majority of the scientific fields, including physics, the authorship order indicates relative contribution and seniority by putting emphasis on the first, the last, and the second positions28,29. In order to compare the positions of authors, we first discard those papers for which authorship order is alphabetical. For this purpose, we perform a string comparison of the last names of the contributing authors and consider them to be in alphabetical order if the paper has at least four authors and all of them follow this order. Around 3.54% of the papers can be considered as alphabetically ordered; in Supplementary Table 3 we detail their fraction by PACS subfield (Physics and Astronomy Classification Scheme). After discarding those papers from the analysis, we study the authorship order in each publication and compare the proportion of women and men in each position of the author list (first, second, middle and last). We perform this comparison using a two-proportion z-test (see Methods). If there is only one author in a paper, we consider her the first author. Middle authors are those between second and last in papers with more than three authors.
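The two steps of this paragraph can be sketched in Python as follows. The alphabetical filter follows the rule stated above (at least four authors, all last names in order). The z-test is the standard pooled two-proportion test; which two proportions are compared in each author position is specified in Methods, so the arguments below are placeholders. All function names and sample values are hypothetical.

```python
from math import erfc, sqrt

def is_alphabetical(last_names, min_authors=4):
    """True if the paper has at least `min_authors` authors listed alphabetically."""
    names = [name.casefold() for name in last_names]
    return len(names) >= min_authors and names == sorted(names)

def two_proportion_ztest(x1, n1, x2, n2):
    """Pooled two-proportion z-test. Returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, erfc(abs(z) / sqrt(2))

# Invented example: a four-author byline and two sample proportions.
print(is_alphabetical(["Adams", "Baker", "Chen", "Diaz"]))
z, p = two_proportion_ztest(x1=60, n1=100, x2=40, n2=100)
print(z, p)
```

A plain string comparison of this kind is sensitive to name particles and transliteration, so the filter should be read as approximate.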
The results show that there are more women than expected by chance in the first, second and middle author positions, and they are heavily under-represented as last authors (see Supplementary Table 4). The last author in physics papers is usually the most senior member of the team, so this trend can be explained by the later and slower rate of arrival of women, combined with their higher dropout rate throughout their career. This is in line with previous findings that women feature only rarely as the last authors in leading journals30.
While the authorship order reflects how a researcher’s coworkers perceive her contribution, the collective perception of the scientific community regarding the importance of a paper is manifested in the citations of papers. In the following sections we will thoroughly compare the relative popularity of publications led by women and men.
Citation centrality analysis
The flow of citations determines the visibility and recognition of papers both locally and globally. To measure the local influence of papers we use the in-degree metric, and to measure the global influence, we use the PageRank centrality. Our aim is to verify if the visibility of papers written by women is proportionate to what we expect from their overall population size. To do that, we focus on the ranking of the nodes according to their respective centrality.
Understanding ranking centrality is important for three reasons. First, the authors of papers in top ranks gain more visibility for themselves and those central papers influence future citation patterns31,32,33. Second, the visibility of papers in top ranks is being exacerbated by algorithmic tools such as Google Scholar. Third, since citation networks follow a heavy-tailed distribution, those in top ranks stabilize their ranking position and give few opportunities for other papers to catch up34. Because of these network effects, it is important to study how minorities are represented in top network centrality ranks.
We assigned a gender to each paper by labeling it according to its first author. Then, we analyzed the top h% of papers by in-degree and PageRank centrality. Figure 3a suggests that papers written by women have significantly lower in-degree and PageRank centrality than expected from their overall proportion. Women-led publications are substantially under-represented in the top 20%, 30%, and 40% of the ranking, and the deviation between the observed and the expected proportions increases further in the highest rank positions. While in-degree and PageRank follow a similar trend, as expected, the proportion of women with high PageRank centrality is even lower than the proportion with high in-degree centrality. This suggests not only that papers written by women receive less attention but also that they are disadvantaged in terms of their position within the entire citation network. Statistical tests confirm these findings (see Supplementary Table 5).
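The ranking analysis can be illustrated with a short sketch for the in-degree case (PageRank would only change the centrality values fed into the ranking). The function names, the toy edge list and the gender labels are hypothetical, and ties in the ranking are broken arbitrarily here.

```python
from collections import Counter

def in_degree(edges):
    """Incoming citations per paper from (citing, cited) edges."""
    return Counter(cited for _, cited in edges)

def women_share_in_top(centrality, gender, h):
    """Share of women-led papers among the top h% of papers ranked by centrality."""
    ranked = sorted(centrality, key=centrality.get, reverse=True)
    k = max(1, round(len(ranked) * h / 100))
    return sum(gender[paper] == "woman" for paper in ranked[:k]) / k

# Toy citation network (invented): edges point from the citing to the cited paper.
edges = [("c", "a"), ("d", "a"), ("e", "a"), ("d", "b"), ("e", "b"), ("e", "c")]
gender = {"a": "man", "b": "man", "c": "woman", "d": "woman", "e": "man"}
degree = in_degree(edges)
centrality = {paper: degree.get(paper, 0) for paper in gender}  # keep uncited papers

overall = women_share_in_top(centrality, gender, 100)  # expected share in every rank
top40 = women_share_in_top(centrality, gender, 40)
print(overall, top40)
```

Comparing the share in each top h% with the overall share is what the dotted line in Fig. 3a expresses.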
Fig. 3: Women author proportions in degree and PageRank centrality, evolution of centrality difference by year and relationship between time of publication and citation. a Proportion of publications with a woman primary author per top h% of degree (black) and PageRank centrality (red). The dotted horizontal line signifies the proportion of women primary authors in the observed samples. b Citation and temporal differences between man–woman pairs of papers with validated similarity. The colors indicate the quadrant each pair belongs to (black: quadrant 1, red: quadrant 2, green: quadrant 3, and purple: quadrant 4). c Heat map showing the probability anomaly of the joint probability distribution of citation and temporal differences computed with equation (2). d Centrality differences of similar man-man pairs and similar man–woman pairs over the years. The two papers within each pair are published no more than 3 years apart, and the publication year of the pair is defined by the year of the latter paper. The lines are the mean values and the shaded areas the standard errors. The evolution of the distribution as a whole is shown in Supplementary Fig. 7 as a percentile plot.
So far, the global gender analysis points towards a notable disparity in productivity and citation of men and women. This could be partly due to historical reasons, to the cumulative advantage that early arrival confers to men, as well as to the high dropout rate of women7. The slower rate of arrival of women (see Fig. 1a) may also play a relevant role. Together, these factors affect women’s global visibility. The question that arises from these global results is, are scholars intentionally ignoring (and therefore, under-citing) research works led by women? To explore this possibility, in the following section we study pairs of papers written by men and women that are statistically validated twins, and measure the citations that each paper receives.
Pair-wise citation analysis
We identified statistically validated pairs of similar papers (one with a man as first author and the other with a woman) using the methodology described in Methods and summarized in Supplementary Fig. 2. Then, we computed the difference in the number of citations each member of the pair receives. The overall expectation is that similar pairs of papers should have a similar number of incoming citations on average. The first sign of gender bias that we have found is that, within similar pairs of man–woman papers, men get more citations in 45% of the pairs, women in 39%, and in 16% they receive the same number of citations. We performed binomial tests against the null hypothesis that men and women should be equally likely to get citations within each similar pair and obtained a strong rejection (p-value ≈ 0).
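For illustration, the test against equal odds can be sketched with a normal approximation to the binomial distribution, which is a substitute for the exact binomial test that is adequate when the number of pairs is large. The sketch assumes that tied pairs are excluded before testing, which may differ from the treatment actually used; the counts are invented and only mimic the 45% versus 39% split.

```python
from math import erfc, sqrt

def sign_test_normal(men_wins, women_wins):
    """Normal approximation to a two-sided binomial test of P(man wins) = 0.5.
    Assumption: pairs with equal citation counts have been dropped beforehand."""
    n = men_wins + women_wins
    z = (men_wins - n / 2) / sqrt(n / 4)
    return z, erfc(abs(z) / sqrt(2))

# Invented counts for 1,000 pairs: 450 man wins, 390 woman wins, 160 ties (dropped).
z, p = sign_test_normal(men_wins=450, women_wins=390)
print(z, p)
```

With the much larger number of pairs in the actual dataset, the same proportions yield a far smaller p-value, since z grows with the square root of the number of pairs.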
To quantify men’s advantage, we computed the average citation difference between the man-led and the woman-led paper of each pair. Then we normalized it using the standard deviation of men’s and women’s citations to obtain Cohen’s d, a measure of effect size for the difference of means. We evaluated the significance of these differences using z-tests (see Methods). As shown in Table 1, men’s average citation count is significantly higher than women’s both in aggregate and when we consider each PACS subfield separately to control for potential differences in the citation biases per subfield. We obtained similar results by controlling for journal instead of subfield (see Supplementary Note 1 and Supplementary Table 10). We performed analogous analyses for last authors, finding consistent results for most subfields and journals (see Table 2 and Supplementary Table 12). The only noteworthy difference appears in PACS 80 (Interdisciplinary Physics & Related Studies), where women get more citations on average as first authors.
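The effect size computation can be sketched as below. The sketch assumes that the normalizing standard deviation is the root mean square of the two groups' sample standard deviations, which is one common convention for Cohen's d; the exact definition used for Tables 1 and 2 is given in Methods. The function name and the citation counts are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(men_citations, women_citations):
    """Mean within-pair citation difference (man minus woman) divided by a pooled SD.
    The two lists are aligned: element k of each list belongs to the same pair."""
    diffs = [m - w for m, w in zip(men_citations, women_citations)]
    pooled_sd = sqrt((variance(men_citations) + variance(women_citations)) / 2)
    return mean(diffs) / pooled_sd

# Invented citation counts for four similar pairs.
men = [10, 12, 8, 14]
women = [9, 10, 8, 11]
print(cohens_d(men, women))
```

Positive values indicate more citations for the man-led papers, and values around 0.2 or below are conventionally read as small effects.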
Table 1 Differences in received citations among similar pairs of publications labeled by their first-author gender.
Table 2 Differences in received citations among similar pairs of publications labeled by their last-author gender.
It is known that the publication time of a paper influences its citation count, and previous studies1,35 have used different strategies to control for it. To check whether the temporal difference between two papers is responsible for the citation disparity for women (an older paper has had more time to accumulate citations), we add a maximum 3-year difference restriction between two similar papers and redo the citation difference analyses. Tables 1 and 2 show that when the time constraint is applied, the citation difference between two similar publications decreases significantly (see Supplementary Tables 11 and 13 for the journal-wise analyses). The effect is stronger for first than for last authors. The subfield Interdisciplinary Physics & Related Studies (PACS 80) presents an anomalous behavior, as women have the citation advantage as first authors while men have it as last authors. In contrast to the rest of the subfields, this advantage increases after applying the time constraint.
However, citations have a very heterogeneous distribution, with a tiny fraction of papers gathering a huge number of citations, so these discrepancies may be caused by a few papers written by women with many citations. To mitigate the influence of such outliers, we have performed analogous tests for the difference of medians. In particular, we have used the Wilcoxon test to quantify the significance of the difference and the rank biserial correlation (rc)36 to estimate its effect size. The rc metric takes values between −1, when women have more citations in every pair, and +1, when men do. The results, presented in Supplementary Tables 14 and 15, show that the apparent advantage of women in PACS 80 (and in PACS 00, General Physics) after applying the time constraint was mostly driven by outliers, as rc is positive in all cases, although, consistent with the previous analyses, it is smaller when the time constraint is applied.
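As a sketch of the effect size only (not of the Wilcoxon significance test itself), the matched-pairs rank biserial correlation can be computed as the difference between the rank sums of positive and negative within-pair differences, divided by the total rank sum. The sketch assumes that pairs with equal citation counts are dropped and that tied absolute differences receive average ranks; the function name and sample data are hypothetical.

```python
def rank_biserial(men_citations, women_citations):
    """Matched-pairs rank biserial correlation of the differences (man minus woman).
    Returns +1 if men have more citations in every pair and -1 if women do."""
    diffs = [m - w for m, w in zip(men_citations, women_citations) if m != w]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    positive = sum(r for r, d in zip(ranks, diffs) if d > 0)
    negative = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return (positive - negative) / (positive + negative)

# Invented citation counts for three pairs (the third pair is a tie and is dropped).
print(rank_biserial([3, 1, 5], [1, 2, 5]))
```

Because it works on ranks, a single pair with an extreme citation difference moves this statistic far less than it moves a difference of means.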
Throughout these analyses, we have seen that the gender disparity within similar man–woman pairs is small (small effect sizes), but significant (p-values close to 0). However, we should be cautious when interpreting those p-values. The statistical tests rely on the assumption of independent samples, but in our methodology one paper can be part of several statistical twins, so those pairs would not be independent. The independence violation results in narrower standard errors and, in turn, lower p-values. Nevertheless, the consistency of the gender asymmetries should not be underestimated.
The temporal dimension is fundamental when comparing citation counts, as the first-mover advantage plays a crucial role in scientific success37. Within similar man–woman pairs, the man’s paper is published first in 47.7% of the pairs, the woman’s paper in 41.3%, and approximately at the same time (the same year) in 11.0% of the pairs. These results point to a clear first-mover advantage by men.
First-mover advantage within similar pairs of papers
Given the above results, we now seek to confirm whether the time of publication is a main driver for the citation disparity and whether the first-mover advantage in publication affects men-led papers and women-led papers similarly. We define \({\Delta }_{t}={Y}_{m}-{Y}_{f}\) as the year difference between the publication dates of man–woman pairs of similar papers and \({\Delta }_{C}={c}_{m}-{c}_{f}\) as their citation difference. We plotted the year difference \({\Delta }_{t}\) against the citation difference \({\Delta }_{C}\) in Fig. 3b. We likewise elaborated ten analogous plots after categorizing the data into subfields by PACS number (shown in Supplementary Fig. 5) to control for variations between subfields. Note that for this analysis we impose no time restriction between the publication times of the two papers of each pair.
To verify that the disparity in citations is caused by the first-mover advantage, we first need to test whether a first-mover advantage in fact exists. If that is the case, when a man publishes first (\({\Delta }_{t} < 0\)) he should get more citations (\({\Delta }_{C} > 0\)) on average, but when a woman publishes first (\({\Delta }_{t} > 0\)) she is the one who should get more citations (\({\Delta }_{C} < 0\)) on average; that is, in Fig. 3b, quadrants Q2 and Q4 should be more populated than expected if we treated \({\Delta }_{t}\) and \({\Delta }_{C}\) as independent random variables. Equivalently, we should observe a negative correlation between \({\Delta }_{t}\) and \({\Delta }_{C}\).
To test this hypothesis, we compared the empirical joint probability distribution of \({\Delta }_{t}\) and \({\Delta }_{C}\), \({P}_{\mathrm{emp}}({\Delta }_{t},{\Delta }_{C})\), with the one that we would obtain if they were independent variables, \({P}_{\mathrm{null}}({\Delta }_{t},{\Delta }_{C})=p({\Delta }_{t})\,p({\Delta }_{C})\), by computing the probability anomaly as:
$${P}_{\mathrm{diff}}({\Delta }_{t},{\Delta }_{C})=\frac{{P}_{\mathrm{emp}}({\Delta }_{t},{\Delta }_{C})-{P}_{\mathrm{null}}({\Delta }_{t},{\Delta }_{C})}{{P}_{\mathrm{null}}({\Delta }_{t},{\Delta }_{C})}$$ (2)
The resulting values of \({P}_{\mathrm{diff}}({\Delta }_{t},{\Delta }_{C})\) are shown in Fig. 3c and, as can be observed, they support the hypothesis of the first-mover advantage, since Q2 and Q4 present positive anomalies while Q1 and Q3 present negative ones. It is worth emphasizing that a positive (resp. negative) anomaly indicates higher (resp. lower) density of points with respect to a situation of no correlation between \({\Delta }_{t}\) and \({\Delta }_{C}\). To quantify this trend we computed the Pearson and Spearman correlations between \({\Delta }_{t}\) and \({\Delta }_{C}\), obtaining −0.13 and −0.34, respectively.
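A minimal sketch of equation (2) is given below. It estimates the joint and marginal distributions by simple counting over discrete cells; here the cells are just the signs of the two differences, whereas the binning behind Fig. 3c is not specified in this section and is therefore an assumption. The function name and the toy pairs are hypothetical.

```python
from collections import Counter

def probability_anomaly(pairs):
    """Probability anomaly of equation (2) for every observed (delta_t, delta_c) cell.
    `pairs` is a list of (delta_t, delta_c) tuples that are already discretized."""
    n = len(pairs)
    joint = Counter(pairs)
    marginal_t = Counter(dt for dt, _ in pairs)
    marginal_c = Counter(dc for _, dc in pairs)
    anomaly = {}
    for (dt, dc), count in joint.items():
        p_emp = count / n
        p_null = (marginal_t[dt] / n) * (marginal_c[dc] / n)
        anomaly[(dt, dc)] = (p_emp - p_null) / p_null
    return anomaly

# Toy data (invented), reduced to signs: -1 means the man's value is smaller.
signs = [(-1, 1), (-1, 1), (1, -1), (1, -1), (-1, -1), (1, 1)]
result = probability_anomaly(signs)
print(result[(-1, 1)], result[(-1, -1)])
```

In this toy sample the cell where the man publishes first and is cited more is over-populated (positive anomaly), while the cell where he publishes first and is cited less is under-populated (negative anomaly). Cells that never occur are omitted from the output; their anomaly would be −1.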
Once the existence of the first-mover advantage has been confirmed, we need to test whether there exists an asymmetry in the relative advantage that men and women obtain when they publish first. If there is no asymmetry, the average number of citations that a woman obtains by publishing a certain number of years ahead of a man should be comparable to the number of citations that a man obtains in the equivalent situation.
To verify this, we compared the citation differences of Q2 with Q4 (pairs where the earlier paper received more citations) and Q1 with Q3 (pairs where the earlier paper received fewer citations) for each temporal difference; in other words, we compared the average absolute value \(|{\Delta }_{C}|\) of points from Q2 with the average \(|{\Delta }_{C}|\) of points from Q4 for each \(|{\Delta }_{t}|=1,2,\ldots\) separately (analogously for Q1 and Q3). To perform these comparisons, we used z-tests for difference of means for each year difference (see Methods). The results of the tests for the whole dataset, shown in Table 3, indicate that men have an asymmetric advantage, gaining comparatively more citations when they publish first. We obtain similar results for each subfield (see Supplementary Table 16). The exceptions are General Physics (PACS 00) and Interdisciplinary Physics & Related Studies (PACS 80), where women get an asymmetric advantage.
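The Q2 versus Q4 comparison can be sketched as follows, assuming an unpooled z statistic for the difference of two means; the exact form of the test is given in Methods. The function name and the toy pairs are hypothetical, and year gaps with fewer than two pairs in either quadrant are skipped.

```python
from collections import defaultdict
from math import erfc, sqrt
from statistics import mean, variance

def first_mover_asymmetry(pairs):
    """Compare |delta_c| in Q2 (man first, man cited more) and Q4 (woman first,
    woman cited more) for each |delta_t|. Returns {gap: (z, two-sided p-value)}."""
    q2, q4 = defaultdict(list), defaultdict(list)
    for delta_t, delta_c in pairs:
        if delta_t < 0 and delta_c > 0:
            q2[-delta_t].append(delta_c)
        elif delta_t > 0 and delta_c < 0:
            q4[delta_t].append(-delta_c)
    results = {}
    for gap in sorted(set(q2) & set(q4)):
        a, b = q2[gap], q4[gap]
        if len(a) < 2 or len(b) < 2:
            continue
        se = sqrt(variance(a) / len(a) + variance(b) / len(b))
        if se == 0:
            continue
        z = (mean(a) - mean(b)) / se
        results[gap] = (z, erfc(abs(z) / sqrt(2)))
    return results

# Toy pairs (invented): all published one year apart.
pairs = [(-1, 5), (-1, 7), (-1, 9), (1, -2), (1, -3), (1, -4)]
print(first_mover_asymmetry(pairs))
```

A positive z for a given year gap means that men gain more citations from publishing first than women do at the same gap.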
Table 3 Statistical tests of gender asymmetry in the first-mover advantage.
Researcher seniority as a temporal advantage
While we have verified that the first-mover advantage plays a relevant role in the citation disparities between genders at a microscopic level, the differences between similar pairs, even if significant, are fairly small. Therefore, the temporal advantage gained by individual papers published earlier than their statistical twins may not be enough to explain the visibility differences manifested in the centrality rankings shown in Fig. 3a. As mentioned above, there are group-level temporal disparities that should also be taken into account: women’s delayed arrival, their slower rate of arrival, and their higher dropout rate, captured in Fig. 1.
These factors can have dramatic effects on the distribution of seniority of researchers (see Fig. 4a), which is another potential source of inequality. As a researcher progresses through her career, she not only gathers citations, but also recognition, which in turn attracts more citations. As we observe in Fig. 4b, the proportion of male to female authors increases with career age, indicating a strong gender bias in the seniority distribution. This bias in the proportion of senior researchers is transferred to the ranking of centrality of papers (see Fig. 4c), which shows, on the one hand, that the higher ranks are occupied on average by older researchers, and on the other hand, that the average age of women authors is consistently lower throughout all ranks.
Fig. 4: Seniority distribution of researchers by gender. a Number of men and women authors by their career age. b Proportion of men to women by career age. c Average age of men and women authors of papers in each top h% of degree centrality (number of citations). The inset shows the same result zooming in on the higher ranks.
This thorough analysis indicates that temporal advantages are critical factors in the emergence of gender inequalities. From the individual’s perspective, researchers that publish a result earlier gain the first-mover advantage. Men publish earlier more frequently and obtain an asymmetrical advantage when they do so. At the population level, historical disadvantages driven by the late arrival and higher dropout rate of women cause a deficit of female senior researchers, which may explain women’s low visibility in the citation network.
Historical trend in citation
Finally, we hypothesize that the physics community might have been less receptive to the contribution of women in the past compared to the present. To test this hypothesis, we measure the temporal evolution of the centrality differences (\({\Delta }_{C}\)) between man–woman pairs by year and limit the publication time difference between the two papers to a trailing window of 3 years. Then, we compute the mean and standard error of \({\Delta }_{C}\) for all the pairs within each window. For comparison, we perform an analogous computation for random samples of similar man-man pairs. In each time window, we matched the number of sampled man-man pairs with the number of similar man–woman pairs. We repeated the man-man computation 100 times independently and computed the average \({\Delta }_{C}\) and the standard error, which we use as a baseline.
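The windowed statistics and the resampled baseline can be sketched as follows. The sketch assumes that a pair is assigned to the year of its later paper (as stated in the caption of Fig. 3), that sampling is without replacement, and that the baseline uncertainty is the standard deviation of the 100 resampled means; the last two choices are assumptions, as are all function names and data.

```python
import random
from math import sqrt
from statistics import mean, stdev

def yearly_mean_se(pairs, max_gap=3):
    """pairs: (year_a, year_b, delta_c). Returns {year: (mean, standard error)}
    for pairs published at most `max_gap` years apart, keyed by the later year."""
    by_year = {}
    for year_a, year_b, delta_c in pairs:
        if abs(year_a - year_b) <= max_gap:
            by_year.setdefault(max(year_a, year_b), []).append(delta_c)
    return {year: (mean(v), stdev(v) / sqrt(len(v)) if len(v) > 1 else float("nan"))
            for year, v in by_year.items()}

def man_man_baseline(mm_by_year, n_by_year, repeats=100, seed=0):
    """Resample man-man differences, matching the man-woman sample size per year.
    Returns {year: (mean of resampled means, spread of resampled means)}."""
    rng = random.Random(seed)
    baseline = {}
    for year, n in n_by_year.items():
        pool = mm_by_year.get(year, [])
        if n == 0 or len(pool) < n:
            continue
        means = [mean(rng.sample(pool, n)) for _ in range(repeats)]
        baseline[year] = (mean(means), stdev(means))
    return baseline

# Toy data (invented). The third pair is 11 years apart and is excluded.
mw_pairs = [(2000, 2001, 2), (2001, 2000, 4), (1990, 2001, 50), (2005, 2005, 1)]
observed = yearly_mean_se(mw_pairs)
baseline = man_man_baseline({2001: [1, -1, 2, -2, 0, 3]}, {2001: 2})
print(observed[2001], baseline[2001])
```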