This section begins by outlining the main data sources used in this study (Sections ‘GLODAP and GO-SHIP data’ and ‘Argo data’), before moving to explain the OMP-based water mass classification (Section ‘OMP water mass classification’). Initially, this method is used with the GLODAP climatology to provide a mean-state estimate of the Southern Ocean water mass configuration (Section ‘Climatology’), with the end member selection process described in section ‘Initial end member selection: Climatology’. The same method is used with varied end members (Section ‘Time-varying end members’) to compute the second water mass classification with GO-SHIP section data, which provides: (a) a direct assessment of water mass changes across individual cruises, and (b) the necessary training data for the RF model ensemble (described in Section ‘Machine learning model and application to Argo data’), which is then applied to the RG Argo data (Section ‘Temporal changes in water mass distribution’) to investigate variability and change in the top 2000 m at higher temporal resolution.
Whilst GLODAP/GO-SHIP data contains the necessary biogeochemical tracer data to properly constrain the OMP inversion, the gridded Argo data contains only θ and salinity. As we show in Section ‘Machine learning model and application to Argo data’, the machine learning approach enables a water mass classification with only θ, salinity, and positional data that exceeds the abilities of a traditional OMP analysis using just θ and salinity without an a priori (Fig. S19). Given that the resulting RF model ensemble does not require the specification of end members, this method has the added benefit of improved ease of application.
GLODAP and GO-SHIP data
GLODAP climatology
GLODAP is a synthesis activity for ocean surface to bottom biogeochemical data collected through chemical analysis of water samples. As such, it also includes all repeated hydrographic observations collected approximately every 10 years through the GO-SHIP program. The GLODAPv2 climatology is constructed by mapping ship-based data collected during the period 1972–2013 onto a 1 × 1 degree grid with 33 standard depth surfaces by using the Data-Interpolating Variational Analysis method. Further information, including estimates of error fields, quality control procedures and removal of anthropogenic trends, can be found in Lauvset et al.60. We use this product to provide a mean-state of Southern Ocean water masses (Section ‘Climatology’) for the first part of the OMP-based water mass classification analysis, as well as in the determination of adjusted end members (section ‘Time-varying end members’). For all GLODAP-related applications the following fields are used: θ, salinity, oxygen, nitrate, phosphate, and total alkalinity. The domain incorporates data from all longitudes within the latitudinal range 90°S–30°S. We acknowledge that the mapped θ and salinity from this product are not optimised for dynamical applications. However, these fields are used only to provide visual context in Section ‘Climatology’ and to maintain consistency with the biogeochemical end member definitions and with our use of GLODAP section data elsewhere. None of the key analyses of temporal change rely on the mapped climatology.
GO-SHIP cruise section data
The second part of the analysis focuses on observable changes to the Southern Ocean water mass configuration. For this we select individual GO-SHIP cruise sections from the merged and adjusted GLODAPv2.2023 synthesis dataset77, as presented in Lauvset et al.78. We select all measurements within the Southern Ocean domain which meet the highest quality control criteria and contain the full set of aforementioned required variables. We separate this data into a 2005–2010 dataset and post-2015 dataset (primarily 2015–2020) to examine changes in water mass distribution. 2005–2010 is selected to align with the start of the Argo dataset. These sub-sets are used in both the direct analysis of change along repeat sections and to train the RF model ensemble, which we later apply to Argo data. Both these analyses are presented in Section ‘Temporal changes in water mass distribution’. The GO-SHIP data provide high accuracy, quality controlled measurements with full biogeochemical coverage. The primary limitation is that it is necessarily temporally sparse and spatially limited to discrete lines. For full details on the data, including quality control procedures, bias adjustments and error estimates, see Lauvset et al.78.
Argo data
We use the mapped Roemmich-Gilson (RG) Argo product, which provides a monthly global objective mapping of all good Argo profile data. All Argo data receive additional quality control prior to mapping. The monthly maps are given at a 1 × 1 degree gridded resolution with 58 pressure levels over the depth range 0–2000 m between January 2004 and January 2024. Full details can be found in Roemmich & Gilson58. We apply the GO-SHIP-trained RF model ensemble to this data to determine a time-varying gridded water mass classification. The advantage of this core Argo data is that it provides high-resolution and near-global coverage with robust regular sampling. The principal disadvantage of this dataset is that the domain only extends as far as 65°S, which makes analysis of the near shelf water mass configuration extremely limited; for example, our water mass classification in the GLODAP climatology dataset shows high concentrations of DSW near the shelf (Fig. S2), but this is almost entirely missing in our Argo-based results. The core Argo floats also do not provide any biogeochemical tracer measurements. However, we contend that this is a necessary trade-off, given that, unlike other similar mapped Argo datasets, the RG product corrects for the well-known spurious salinity trend arising from drift, which has potential to impact the analysis of trends from the water mass classification method79,80,81.
OMP water mass classification
We use an Optimum Multi-Parameter (OMP) framework to provide a GLODAP-based Southern Ocean water mass classification, using the Pyompa implementation from Shrikumar et al.82, which is built on the original OMP package from Tomczak59 using the same fundamental non-negative least-squares machinery. Classical OMP analysis relies on the definition of end members of unmixed Source Water Types (SWTs), which are considered as water masses at the point of formation. SWTs are assigned characteristic variable values and it is assumed that these variables undergo the same mixing processes with identical mixing coefficients. Observed variables can therefore be considered to be a linear combination of each of the SWTs, such that it is possible to determine the spatial distribution of each SWT with a set of linear equations at each point in space. It is also assumed that all variables are conservative, although buoyancy fluxes at the surface and biogeochemical processes involving assimilation or remineralisation can violate this assumption. OMP analysis is a well-established method for identifying water-mass fractions and leveraging multiple tracers, but it is inherently sensitive to end member selection and is generally limited to relatively scarce bottle data. This provides our rationale for using a machine learning framework (Section ‘Machine learning model and application to Argo data’) to extend water mass classification to higher-resolution Argo float data. For the full set of equations, see Section S1.1.
We use θ, salinity, dissolved oxygen, nitrate, phosphate, and total alkalinity as raw input variables in our analysis from both GLODAP climatology and GO-SHIP cruise section data. Given the non-conservative potential for biogeochemical tracers, we elect to model the ‘semi-conservative’ parameters NO and PO via flexible exchange parameters83. This is the extended OMP analysis described in Karstensen & Tomczak84. This allows deviations from strict conservativeness to be absorbed into an additional Δ term while retaining all three tracers in the mixing system. The full set of equations is set out in Section S1.1. For additional details of the Pyompa configuration, linear constraints and normalisation practices, including further details on adaptions made to the original implementation, see Shrikumar et al.82.
Initial end member selection: Climatology
We select end members for 9 Southern Ocean water masses: CDW, AABW, NADW, AAIW, DSW, AASW, STCW and two forms of SAMW. These are the same water masses used in Pardo et al.65, who defined end members based on ship section data. We begin with the θ and salinity end members from Pardo et al.65 and make fine-scale adjustments to these to fit the parameter space of our data (GLODAP climatology and pre-2010 GO-SHIP data). Most notably, these include a cooling/freshening of the NADW end member by 0.78 °C/0.08 psu, and a cooling/freshening of the AABW end member by 0.25 °C/0.02 psu. These adjustments ensure that the end member captures the θ-salinity extreme, whilst ensuring that no end members lie just outside the empirical property space. The chosen θ and salinity end members are shown in Fig. 2. To determine the end member for the remaining tracers, we firstly use a K-D tree to find the nearest 1000 data points to the θ and salinity end-member pair. We then select the median value of this sample as the end member for each tracer. A complete account of all end members used and the sensitivity tests can be found in Sections S1.2 and S1.4. Figs. S20, S21, S22, and S23 show the sensitivity of the OMP solution to a simultaneous perturbation of all end-member values by 1 standard deviation, according to our source water property mask. Fig. S24 shows the impact of perturbing the chosen weighting matrix on the OMP solution. Importantly, the main conclusion of this study—relating to a poleward reorganisation of upper 2000 m CDW—is robust to a variety of end member perturbations (Figs. S15, S16, and S18).
Time-varying end members
OMP analysis is often used to describe the time-mean water mass configuration, as we do initially in this study (Section ‘Climatology’). However, it is more challenging to use end member analysis to infer changes in water masses, considering that end members themselves may also change. Changes in the computed water mass fractions may therefore reflect both changes in water mass distribution and/or changes in the properties of the associated end members. This is particularly relevant given that observational data shows a warming trend and substantial salinity trends in the Southern Ocean in recent decades85,86,87,88. Here we outline the method that is used to address this, which involves the specification of two groups of end members to account for changes across end member properties in time.
We begin by separating the GO-SHIP ship section data (Section ‘GLODAP and GO-SHIP data’) into two distinct 2005–2010 and post-2015 datasets. We run an OMP analysis with the 2005–2010 dataset, using the climatological end members as described in Section ‘Initial end member selection: Climatology’.
Selection of Δ values
For the post-2015 dataset, we define additional θ and salinity change (ΔT, ΔS) values to be summed with the original end members values to form adjusted end members. Firstly, we train a RF model ensemble on the water mass output from the OMP run with the GLODAP climatology. We apply this model to RG Argo θ, salinity, and positional data, to give a 3-dimensional water mass fraction in every grid box for the monthly output between years 2004 and 2024. We use the same method as described in the following Section ‘Machine learning model and application to Argo data’, although this is a preliminary (climatology) analysis used solely for the determination of end member ΔT and ΔS values. We use this output to define the 3-dimensional spatial distribution of each water mass on the Argo grid for each timestep, and then average in time to define a mean gridded distribution for each water mass across the Argo period. Next, we identify the 1000 grid boxes with the highest water mass fraction, which we take to approximate the properties of the unmixed source waters associated with each water mass. We calculate the change in θ and salinity (2021–2024 mean, minus 2004–2007 mean) in this 3-D source water mask over the 20-year Argo period to yield ΔT and ΔS values for each water mass. These are added to the original climatological end members to give the final adjusted θ salinity end members to be used with the post-2015 GO-SHIP dataset.
This method is used for all water masses, except for CDW and NADW. We assume that the end members of these two remaining water masses do not change during this period. This is based on observations of CFCs in the Southern Ocean, which show that the age of upwelling core of CDW/NADW exceeds 50 years89. We therefore assume that any change in the source water properties of these water masses has not propagated to the Southern Ocean during this time period. We also assume that the dominant trends in water mass properties are those of temperature and salinity, and thus assume that biogeochemical tracer end members remain constant. The complete set of adjusted end members can be found in Section S1.2 (Table 3).
The computed ΔT values are largely consistent with observational studies of water mass property change in the last two decades. All end members show at least some degree of warming, consistent with the Southern Ocean-wide warming trend that is apparent in the Argo float record (Fig. S25). Perhaps most importantly for the focus of this study, the chosen AABW ΔT of 0.077 °C is similar to observed rates of AABW warming. For example, Johnson et al.75 estimate a AABW warming trend of 2.8m °C per year in West Antarctica, which corresponds to an approximate 20-year warming of 0.056 °C. Similarly, Purkey et al.90 find warming rates of 3.5 m °C per year in the Ross Sea, corresponding to an approximate 20-year warming of 0.07 °C. There is evidence that this warming is accelerating: Johnson et al.75 show that the 2016/2017–2023/2024 trend is nearly triple the 30-year trend. In order to ensure that we do not underestimate the warming of the AABW end member, we test the sensitivity of our conclusions to a large warming of 0.16 °C (discussed more in Sections ‘Discussion and conclusions’ and S2.1.1). Importantly, the poleward migration of CDW is largely insensitive to this change (Figs. S14 and S15).
There is more variability in the ΔS values. Surface waters end members (AASW, STCW) and the near-surface mode water end members show an increase in salinity, with values ranging from 0.0039 psu (AASW) to 0.0217 psu (SAMW2). The AAIW end member shows a freshening of 0.0059 psu and DSW shows a very small freshening of 0.0001 psu. These trends are consistent with the spatial structure of the large-scale salinity changes that we note in the Argo dataset (Fig. S25). AABW also exhibits a 20-year increase in salinity of 0.0069 psu, which is seemingly inconsistent with recent studies that highlight a sustained freshening trend in AABW. This discrepancy arises due to the 2000 m maximum depth and maximum southern extent of 65°S of the Argo dataset; what is defined as the AABW source water by this method is actually a relatively north-ward portion of the downslope AABW cascade, which experiences minor increases in salinity during this period85 (shown in Fig. S25).
Assessing change along GO-SHIP sections
In order to assess changes in Southern Ocean water mass distribution, we compare the abundance of water masses along repeat GO-SHIP sections of the Southern Ocean (Section ‘GLODAP and GO-SHIP data’). Repeat hydrographic sections are sorted into early (2005–2010) and late (post-2015) transects, so as to align with the start of the Argo dataset and the associated changes analysed in Section ‘Temporal changes in water mass distribution’. The only exception to this is in the Weddell Sea sector, where we are unable to access any complete biogeochemical dataset post-2015. As such, in order to assess change in this region, we extend the dataset to incorporate the 1999 A23 section in the early data, and the 2010 A23 section in the late data. We classify water masses across in both early and late datasets, using the original end members for the early GO-SHIP lines, and adjusted end members for the late GO-SHIP lines. For a sample of the water mass classification in GO-SHIP sections, see Figs. S7, S8, and S9. The outputs generally agree well with the water mass distribution of the climatology. For a full sample of the output of the classification, including comparison with GLODAP climatology output, see Sections S1.5 and S1.6.
We interpolate each set of observations on to a common grid as a means to compare early and late sections. To minimise spatial biases between the early and late datasets during interpolation and subsequent comparison, we split each dataset into 40 × 9° longitude bins and mask any grid cell in which the other time period is missing data. We map the data to the same grid as the GLODAP climatology from Lauvset et al.60.
Machine learning model and application to Argo data
Overview of method
The final part of our analysis involves the training of a machine learning-based water mass classification model and the application of this to the gridded monthly RG Argo product. We selected a RF for this task, which is effectively a multi-output regression on tabular hydrographic data. Tree-based ensemble methods like RF are exceptionally well-suited for this class of problem, as they can capture the complex, non-linear, and potentially disjoint decision boundaries separating water masses in feature space more effectively than models with a strong continuity bias, such as neural networks. Furthermore, the native ability of RFs to handle multiple target variables (i.e. the water mass fractions) simultaneously makes them a more direct and robust choice than other gradient boosting methods like XGBoost, which would require training a separate model for each water mass. It should be noted that the skill of the RF model depends on the quality of the solution of the OMP analysis upon which it is trained.
We use this to analyse trends in the Southern Ocean water mass distribution during the period 2004–2024. The model is trained on the combined OMP output from both the 2005–2010 and post-2015 GO-SHIP datasets, with the original and adjusted end members, respectively. This means that the RF model ensemble sees both the 2005–2010 and post-2015 versions of each water mass, such that changes in water mass properties over the Argo period are broadly constrained by the model ensemble.
We use randomised 5-fold cross-validation to train the model. The method is set out below:
1. The output of both the 2005–2010 and post-2015 GO-SHIP OMP runs are collated and combined into a single training dataset. The sine and cosine of longitude are used as features (instead of longitude) to prevent discontinuity at the 0/360° transition. 2. The training dataset is randomised by row and then divided into 5 equally sized partitions (folds). In each fold, one partition is held out as the test set while the remaining 4 serve as the training set. 3. For each of the 5 folds, a separate RF regressor is trained on its corresponding training partition (the RF models are ensembles of decision trees with a maximum depth of 16 levels; the ensemble size is set according to our hyperparameter tuning.) Each trained model is then used to generate predictions on its own held-out test partition. 4. Combining the predictions from all 5 folds gives full coverage of the dataset. For each fold, we compute the R2 value on the test data, which collectively provides a robust cross-validated measure of model performance. 5. Ultimately, we obtain 5 independently trained RF models. These models are then applied to each time step and location in the RG Argo dataset. At every grid point, the final prediction is taken as the mean of the 5 model outputs, while the variance across these predictions is used as an uncertainty metric.
Model validation
We employ an exclusion study method to provide geographically out-of-distribution validation (the randomised K-fold cross-validation approach above is an out-of-sample testing approach) of the output of the RF model. We group repeat sections by location and collate all unique cruises (expocodes) relating to these groups. For a full list of these groupings by expocode, see Table S4. Next, we loop through each repeat cruise section (region), excluding it sequentially from the RF model training and subsequently using it as a testing dataset. The resulting R2 values quantify the ability of the model to predict geographically out-of-distribution observations. We do this for both RF models trained on the full suite of biogeochemical data, and also models trained on just θ, salinity, latitude, longitude and depth. This is shown in Table 1. It should be noted that, in some regions, we remove certain water mass contributions to the overall mean R2 value. This is typically where the relative fraction of this water mass tends toward zero, and the model shows no real predictive skill. For example, the I05 line is a zonal transect in the Indian Ocean near 30° S. The R2 breakdown by water mass for this section shows that the model displays high skill values (R2 > 0.90) for all water masses except for DSW and AASW, for which they are below 0.35. Given that DSW and AASW are functionally absent at this latitude and in this basin, we choose to exclude these skill scores from the overall R2 values in Table 1. A full account of where we do this and associated justification can be found in Section S1.5.
Table 1 Validation of the RF model via the exclusion of repeat sections Full size table
The RF model is able to predict geographically out-of-distribution water mass fractions with a relatively high degree of skill. R2 scores exceed 0.84 throughout all regions when using all variables in the training. For this case, the mean R2 value across all locations is 0.94. With the exception of the Drake Passage region, scores remain above 0.85 when biogeochemical variables are removed entirely from the training data set. For this case, the mean R2 value across all locations is 0.88 (or 0.90, excluding the Drake Passage). The Drake Passage hosts the lowest model skill in the exclusion studies across both train/test cases, with an anomalously low value of 0.62 when just θ, salinity, latitude, longitude and depth are used in training. Further attention to the R2 contributions from each water mass reveals that this results from particularly low model skill in predicting the fractions of SAMW and DSW (note that we already exclude NADW and STCW from the R2 value calculation here–see section S1.5). This may be partially related to substantial eddy variability and water mass mixing in this region, which likely reduces predictability in the model. Moreover, Fig. 3 reveals that, as an acting boundary between the Pacific and Atlantic sectors, the Drake Passage experiences a relatively strong zonal gradient in water mass fractions. Mode water fractions are high in the Pacific sector to the west of the passage, but decrease rapidly to the east. This pattern is reversed for DSW+AABW. We suggest that it is likely that strong eddy-driven mixing of tracers (T, S), coupled with anomalously strong zonal gradients in water mass fractions (i.e. such that longitude loses predictive power), limits the predictive capabilities of the hyperparameters in the Drake Passage.
However, we retain high skill scores in all other regions when biogeochemical tracers are not used in model training. This has a number of implications that underpin our methodology, which are discussed in the following section. Note that an exhaustive list of expocodes and location groups used in the out-of-distribution validation can be found in Section S1.5. Further exclusion tests reveal that using only θ-salinity inputs reproduces most of the predictive skill, whereas positional predictors alone perform poorly (Table S5). Consistent with this, Shapley Additive Explanations (SHAP) analysis shows that model skill is driven primarily θ and salinity, with a more modest contribution from spatial terms (Table S6).
Application to Argo data and uncertainty estimation
The water mass classification analysis described in section ‘OMP water mass classification’ requires biogeochemical tracers to produce physically meaningful and well constrained solutions. When the OMP is run using only θ and salinity, the problem becomes greatly underdetermined: θ-salinity pairs alone cannot distinguish between source waters with similar density but different origins, and the resulting fields are diffuse and largely unable to separate NADW, CDW, and AABW, which share overlapping θ-salinity characteristics but distinct biogeochemical signatures. The inclusion of biogeochemical tracers therefore provides the essential constraints on the inversion and preserve realistic vertical and lateral gradients among these deep and bottom waters. This is shown in further detail in section S1.3 and in Fig. S19. Consequently, there is a fundamental limit on this type of analysis in the Southern Ocean: most biogeochemical measurements occur along repeat cruise sections, which are sparsely sampled in both space and time. Any analysis of the change in water mass distribution is hence limited to a handful of discrete repeat sections. In contrast, the vast majority of Argo floats do not measure these tracers, but provide some of the best temporal and spatial coverage in the mid-to-upper depths of the Southern Ocean.
However, the results of the machine learning verification demonstrate that our RF model can achieve high levels of skill even with just θ, salinity, latitude, longitude and depth. Importantly, this enables skilful water mass classifications to be undertaken without knowledge of biogeochemical tracers, and thus expands the possible coverage of classifiable data to include Argo floats which measure only θ and salinity. Our approach takes the OMP classification of the individual point measurements taken along GO-SHIP cruises (including all the tracer data) and, leveraging the skill of the RF model in the absence of tracers, translates this into a 3-dimensional Argo-based gridded water mass product at high spatial and temporal resolution. Figure S26 shows that the output of the RF model ensemble applied to Argo data exhibits good qualitative agreement with the output of the initial OMP water mass classification with the GLODAP climatology.
We also quantify uncertainty in the application of the model to the Argo data as the variance in predictions across the model ensemble (Fig. S27). The highest uncertainty values occur in CDW and NADW at intermediate depths toward the northern boundary of the Atlantic sector (i.e. in the space that NADW occupies in high fractions, see Fig. S3 and Fig. S27a, c). In this case, we suggest that this arises due to the difficulty the model has in differentiating between CDW and NADW, which are very similar in density space (Table 1). Elsewhere, variance across the fold ensemble appears in AAIW and SAMWs at mid-depths (Figs. S27f, i). Uncertainty associated with these water masses is particularly high in the Pacific sector of the Southern Ocean. Here, it appears that there is some uncertainty in the model’s differentiation between AAIW and SAMWs. We do not consider this to have an important impact on the main conclusion (of upper 2000 m poleward CDW redistribution) in this study, but it should be considered when assessing the relative change in these two water masses in Fig. 6. Most importantly, we find low uncertainty in the modelled fractions of CDW and AABW in the region near the shelf at all depths. This means that we are confident that the signal of poleward migration of CDW at the expense of AABW/DSW found in this study is unlikely to be substantially influenced by uncertainty in the model.