Unfortunately, as demonstrated in Fig. 3a, the gates had not swung as wide as Blackwell had hoped. Women were effectively shut out of many of the schools that remained open during the reform period. Some institutions, such as Harvard Medical School, did not admit women until 1945 and did so then only because of the pressures on enrollment due to World War II. Other schools were formally coeducational but admitted only a few women a year. More striking, however, are the schools that significantly reduced their female enrollment during the early decades of the twentieth century. Table 5 reports female enrollment between 1900 and 1940 for the schools enrolling women in 1900 that remained in operation through 1940. The numbers in parentheses are women’s shares of total enrollment. Nearly all of these schools experienced sharp declines in the number of female students between 1900 and 1910, and in virtually every case these declines translated into reductions in women’s share of enrollment.
In this section, we further explore how changes in medical education requirements at the surviving schools contributed to this pattern. We focus on schools’ adoption of pre-medical college coursework requirements as our measure of educational reform because it captures a central dimension of the transformation of medical education during this period. Prior work shows that, among the many state licensing rules enacted between 1880 and 1930, only two had measurable effects on the stock of physicians: the requirement of a four-year program and the requirement of college coursework prior to admission to medical school (Law and Kim 2005). We focus on the pre-med college coursework requirement for two reasons. First, the earlier findings suggest it had a substantial impact on the stock of physicians. Second, most surviving medical schools had already adopted the four-year term by 1900, well before the contraction in enrollments and before most states added the four-year term requirement for licensure. On the other hand, adoption of the two-year pre-med college requirement spans our period of declining enrollments, with significant variation in timing up until 1918. Ultimately, all surviving schools adopted such college coursework admission requirements, with the last doing so in 1922. By contrast, only 15 of the 181 schools that would eventually close had adopted these pre-med requirements.
The pre-medical education requirement may have reduced female enrollment in two ways. First, it may have had a direct effect on supply. If women faced greater barriers to completing college-level coursework due to limited access to higher education or social constraints, then raising admission requirements would reduce the supply of qualified female applicants relative to men. Second, higher admissions standards may have facilitated discrimination in admissions. By raising barriers to entry and reducing competition for applicants, the higher requirements may have lowered the cost to surviving schools of acting on gender-based preferences, opening the door to discrimination against women in medical school admissions. The raising of entrance requirements captures one of the mechanisms used by the AMA and AAMC, in conjunction with state licensing boards, to reduce the number of medical schools and the supply of (unqualified) physicians.
Table 2 Women’s Medical Schools in 1895
5.1 Empirical strategy
We quantify the effect of pre-med college requirements on the enrollment of women by using the variation in timing of when medical schools adopted the requirements. The standard method to estimate treatment effects with a staggered policy rollout is a regression that incorporates unit and time fixed effects. This two-way fixed effects (TWFE) regression in our context of medical school-level outcomes is:
$$\begin{aligned} Y_{i,t}=\alpha +\theta R_{i,t} +\lambda _{i} + \lambda _{t}+\varepsilon _{i,t} \quad , \end{aligned}$$ (1)
where \(Y_{i,t}\) is an outcome for medical school i during year t. The indicator variable \(R_{i,t}\) takes the value of 1 if school i has adopted the requirement by year t and 0 otherwise. The \(\lambda _{i}\) are school (“unit”) fixed effects, and the \(\lambda _{t}\) are year fixed effects. The coefficient of interest, \(\theta\), captures the treatment’s effect on the outcome.
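To make the mechanics of Eq. (1) concrete, the following sketch simulates a staggered-adoption panel and estimates the TWFE coefficient by least squares with school and year dummies. All names and the simulated data are illustrative and are not drawn from our sample.

```python
# Minimal sketch of the TWFE regression in Eq. (1) on synthetic data.
# The panel, effect size, and adoption years are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_schools, n_years, true_theta = 40, 10, -5.0

# Staggered adoption: each school draws a treatment year in [3, 8],
# so every unit is eventually treated (no never-treated group).
adopt = rng.integers(3, 9, size=n_schools)
rows = []
for i in range(n_schools):
    alpha_i = rng.normal(0, 3)                  # school fixed effect
    for t in range(n_years):
        r = 1.0 if t >= adopt[i] else 0.0       # R_{i,t}
        y = alpha_i + 0.5 * t + true_theta * r + rng.normal(0, 1)
        rows.append((i, t, r, y))

ids, ts, R, Y = map(np.array, zip(*rows))

# Design matrix: intercept, treatment dummy, school dummies, year dummies
# (one school and one year dropped to avoid collinearity).
X = [np.ones_like(Y), R]
X += [(ids == i).astype(float) for i in range(1, n_schools)]
X += [(ts == t).astype(float) for t in range(1, n_years)]
X = np.column_stack(X)

beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
theta_hat = beta[1]                             # coefficient on R_{i,t}
```

Because the simulated treatment effect here is homogeneous across cohorts and time, TWFE recovers it well; the bias discussed next arises only once effects vary by cohort or event time.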
However, when treatment effects vary across cohorts or evolve over time, interpreting the TWFE coefficient \(\theta\) as a causal effect can be misleading (Goodman-Bacon 2021; de Chaisemartin and D’Haultfœuille 2020). This bias arises from comparisons that rely on “bad” control units. In our setting, these include medical schools that always required two years of pre-medical education (the “always treated”) and schools that had already adopted the requirement (the “already treated”), both of which may be used as controls for newly treated schools. To assess the extent of this potential source of bias, we use the decomposition procedure of Goodman-Bacon (2021) to estimate the weights placed on each type of comparison (see Appendix Table 8). In our sample, TWFE assigns 20 percent of the weight to comparisons using always-treated schools as controls and 49 percent to comparisons using already-treated schools. Comparisons using not-yet-treated schools receive only 31 percent of the weight. Our sample contains no never-treated units.
Table 3 Summary statistics sample of schools operating in 1893–1894
Table 4 School characteristics, school closure, and proportion female
Instead, we use a method that relies only on “clean controls,” the medical schools that have yet to be treated. We use the extended two-way fixed effects (ETWFE) method developed in Wooldridge (2021), which allows for a fully flexible amount of treatment effect heterogeneity by timing group, calendar time, and event time. Moreover, because all surviving medical schools eventually adopt the two-year pre-med college requirements, no never-treated units exist, and we must rely on the not-yet-treated units to serve as controls; accommodating this setting is one of the strengths of the ETWFE method.Footnote 7
We estimate the following ETWFE estimating equation as our baseline:
$$\begin{aligned} Y_{i,t}=\alpha +\sum _{g\in G}\sum _{t=g}^{T}\theta _{g,t}R_{i,g,t} +\lambda _{i} + \lambda _{t}+\varepsilon _{i,t} \quad , \end{aligned}$$ (2)
where \(Y_{i,t}\) is the number of female students, male students, ln(total students), or the proportion of students that are female at school i in year t. The two-year pre-med college requirements are captured by the \(R_{i,g,t}\)’s, which are indicators that take the value of 1 if the observation is in treatment timing group g and in period t. G contains all the treatment timing groups. Standard errors are clustered at the school level.
The coefficients \(\theta _{g,t}\) measure the average dynamic treatment effect for units first exposed in group g at time t, relative to untreated observations at the same point in time. For example, \(\theta _{1912,1914}\) captures the average difference in outcomes in 1914 for schools that first adopted the requirements in 1912, compared to schools that had not yet adopted the requirements by 1914. Importantly, the comparison group does not include schools that adopted the requirements before 1915, since their outcomes may already reflect treatment effects. The procedure gives estimates for the full set of \(\theta _{g,t}\) for each treatment timing cohort g in each post-treatment period t (e.g., for the 1912 treatment timing group: \(\theta _{1912,1912}\), \(\theta _{1912,1913}\), \(\theta _{1912,1914}\),..., \(\theta _{1912,1918}\)). Given the large number of treatment-group-by-year coefficients, interpretation requires aggregation.
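As a simplified illustration of a single group–time comparison, the sketch below computes one \(\theta _{g,t}\) as a stripped-down difference-in-differences between a hypothetical 1912 cohort and the not-yet-treated schools, rather than via the full pooled ETWFE regression of Wooldridge (2021). All cohorts, adoption years, and enrollment figures are simulated.

```python
# Illustrative group-time effect theta_{g,t}: cohort g's change since its
# last pre-adoption year, minus the same change for not-yet-treated schools.
# A toy sketch, not the pooled ETWFE regression itself.
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1906, 1919)
cohorts = [1912, 1915, 1917]      # hypothetical adoption years
true_effect = -10.0               # enrollment drop once treated (made up)

panel = {}                        # (school, year) -> enrollment
school_cohort = {}
for s in range(30):               # 10 schools per cohort
    g = cohorts[s % 3]
    school_cohort[s] = g
    base = 100 + rng.normal(0, 5)
    for t in years:
        panel[(s, t)] = base + true_effect * (t >= g) + rng.normal(0, 2)

def theta_gt(g, t):
    """DiD estimate of theta_{g,t} using only not-yet-treated controls."""
    treated = [s for s, c in school_cohort.items() if c == g]
    clean = [s for s, c in school_cohort.items() if c > t]   # not yet treated
    d_treat = np.mean([panel[(s, t)] - panel[(s, g - 1)] for s in treated])
    d_clean = np.mean([panel[(s, t)] - panel[(s, g - 1)] for s in clean])
    return d_treat - d_clean

est = theta_gt(1912, 1914)        # 1912 adopters, evaluated in 1914
```

Note that the control set for \(\theta _{1912,1914}\) contains only the 1915 and 1917 cohorts, mirroring the exclusion of already-treated schools described above.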
To summarize the dynamic treatment effects into a single measure, we aggregate the group–time estimates \(\theta _{g,t}\) into a single average effect across all treatment cohorts and time periods. Following Wooldridge (2021, 2023), this is given by:
$$\begin{aligned} ATT = \frac{\sum _{g}\sum _{t} \theta _{g t}\,\omega (g,t)\,\textbf{1}(t \ge g)}{\sum _{g}\sum _{t} \omega (g,t)\,\textbf{1}(t \ge g)}, \end{aligned}$$ (3)
where \(\omega (g,t)\) are nonnegative weights (based on the relative frequency of observations in each group and period) and \(\textbf{1}(t \ge g)\) is an indicator that ensures effects are only defined for post-treatment periods.
Intuitively, this procedure takes all the estimated effects (i.e., the 1912 adopters in 1914, the 1915 adopters in 1917, and so on) and averages them using sample size weights. The numerator sums all valid group–time treatment effects, while the denominator normalizes the weights so that the overall estimate is interpretable as a weighted average. The control groups in each comparison remain the not-yet-treated schools at the relevant time, so the aggregation simply stacks these valid group-specific comparisons into one overall effect. The resulting statistic, the ATT in Eq. (3), can therefore be interpreted as the average effect of the policy over the observed post-treatment periods and across all treatment cohorts.
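The aggregation in Eq. (3) amounts to a weighted average over post-treatment cells. A minimal sketch, with made-up group–time estimates and cell counts standing in for the estimated \(\theta _{g,t}\) and the frequency weights:

```python
# Sketch of Eq. (3): frequency-weighted average of group-time effects
# over post-treatment cells (t >= g). All numbers are hypothetical.

theta = {   # (g, t) -> estimated group-time effect theta_{g,t}
    (1912, 1912): -4.0, (1912, 1913): -6.0, (1912, 1914): -5.0,
    (1915, 1915): -2.0, (1915, 1916): -3.0,
}
n_obs = {   # (g, t) -> school-year observations in that cell (omega weights)
    (1912, 1912): 10, (1912, 1913): 10, (1912, 1914): 10,
    (1915, 1915): 20, (1915, 1916): 20,
}

# The indicator 1(t >= g) restricts the sums to post-treatment cells.
num = sum(theta[gt] * n_obs[gt] for gt in theta if gt[1] >= gt[0])
den = sum(n_obs[gt] for gt in theta if gt[1] >= gt[0])
att = num / den
```

With these toy inputs the later, larger 1915 cohort pulls the overall ATT toward its smaller effects, which is exactly the role of the sample-size weights.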
We also report separate aggregated treatment effects for early- and late-adopter schools. Early adopters are defined as those schools that adopted the pre-med requirements prior to their state passing a licensure law and prior to the American Medical Association’s enforcement of the A rating standard, which effectively required compliance beginning with the 1918 entering class. Late adopters are those schools that adopted only after these institutional pressures were in place. For each subgroup, we compute the aggregation in equation 3 using only the relevant set of cohorts, allowing us to compare whether schools that acted independently of external mandates experienced different effects from those that complied once requirements became unavoidable.Footnote 8
Our regressions restrict the sample to medical schools that operated continuously over 1906–1918 for enrollment outcomes and 1906–1928 for graduate outcomes. Panel B of Fig. 4 plots the timing of adoption of the two-year pre-medical college requirement across schools. Most of the variation in treatment timing occurs between 1910 and 1922. Only a small number of early adopters, primarily leaders of the educational reform movement such as Johns Hopkins and Harvard, implemented pre-med requirements before 1910. A modest wave of adoptions followed between 1911 and 1915, with the majority occurring between 1918 and 1922.
This later surge coincided with state licensure laws requiring a medical degree from a school rated A or B by the American Medical Association (AMA). Beginning with the 1922 cohort, the AMA required schools to impose pre-med college requirements in order to receive an A rating. As a result, all schools in the sample had adopted the requirement by 1922. This institutional setting has two implications for identification. First, there are no never-treated units, as all schools are treated by 1922. Second, treatment timing is concentrated within a relatively short window, limiting statistical power to detect treatment effects.
Because our analysis focuses on the shutout effect on women at existing medical schools, we exclude institutions that closed before 1924 or were established during the period. Table 6 summarizes the balanced sample of 57 coeducational schools used in the shutout regressions. Across both enrollment (Panel A) and graduates (Panel B), female representation is low and highly skewed: many school-year observations have zero women, and the distributions of female counts and shares are concentrated near zero. The median number of female students is zero for both enrollment and graduates, while at the 75th percentile schools enroll six women and graduate two. This pattern implies that much of the relevant variation for women may operate on the extensive margin (whether a school enrolls or graduates any women in a given year) rather than through large changes in levels. At the same time, the table shows substantial heterogeneity in school size, with male enrollment averaging 178 students (s.d.=140) and male graduates averaging 43 (s.d.=34).
5.2 School-level results
Table 7 reports aggregated ETWFE estimates of the effect of adoption on enrollment and graduates, by sex and in total. For enrollment (Panel A), the pooled estimates indicate a statistically meaningful decline in male enrollment, while effects on female enrollment and on the proportion female are small and imprecisely estimated. The effect on the log of total enrollment is negative but not precisely estimated in the combined sample, with suggestive evidence of a decline for late adopters. For graduates (Panel B), point estimates are generally close to zero and statistically insignificant for men, women, and total graduates, and there is no systematic evidence of changes in the proportion female among graduates. Overall, the table suggests that adoption of pre-med requirements is associated primarily with a reduction in male enrollment, with little to no change in women’s enrollment or in graduation outcomes.
Figure 5 plots event-study estimates for the first eight years after adoption for female, male, and total enrollment. Event time begins in the first post-treatment year because the sample contains no never-treated units, so we cannot estimate pre-trends (Wooldridge 2021). The right-hand panels split the sample into early and late adopters. Across all panels, we find no evidence of an effect on female enrollment. In contrast, male and total enrollment exhibit a dynamic response. Estimates are negative in roughly years 1–4, followed by a recovery that turns positive by year eight. The pattern is similar for early and late adopters, though estimates for early adopters are more precise and more often statistically distinguishable from zero, while late-adopter estimates are noisier with wider confidence intervals. Because the estimates remain positive beyond year eight, we truncate the figure at eight years for clarity. Appendix Fig. 7 reports similar event studies for graduation outcomes and shows no comparable dynamic effects.
Fig. 5 Event study for school-level enrollment outcomes. Notes: This figure plots event-study estimates of the effect of adopting two-year pre-med college requirements, aggregated by years relative to treatment. Early adopters are schools adopting before a state licensure mandate or the AMA’s A rating requirement, which began with the 1918 entering class. Panels (a) and (b) report results for female enrollment, panels (c) and (d) for male enrollment, and panels (e) and (f) for the natural log of total enrollment. Standard errors are clustered by school. Because our setting does not have any “never-treated” units, the ETWFE procedure cannot estimate pre-trends (Wooldridge 2021)