Satellite datasets

Three MODIS datasets (2002–2021) were used in this study, including the LST (MYD11A2, 8-day composites), EVI (MOD13A2, 16-day composites), and yearly land cover type (MCD12Q1) datasets. All these datasets were downloaded from the EOSDIS (https://earthdata.nasa.gov/). The LST dataset includes two overpasses per day (at ~01:30 am and ~13:30 pm local solar time). The spatial resolution of the LST and EVI data is 1000 m, and that for the land cover type data are 500 m. The land cover type data were resampled to 1 km to match resolution of the LST and EVI data using the nearest neighbor method.

City cluster data

The city cluster boundary dataset was obtained from global urban boundary (GUB) data in 2018 (http://data.ess.tsinghua.edu.cn/). City clusters with a size larger than 50 km2 at the beginning of the study period were considered in this study. In total, 2080 city clusters were used. They were divided into four groups, labeled as small, medium, large, and megacities, according to ascending order in urban size between the 0 to 25th, 25th to 50th, 50th to 75th, and 75th to 100th percentile, respectively. The corresponding mean areas are 63, 95, 194 and 475 km2. Here we choose urban built area as a proxy of city size mainly by referring to previous studies46,47, which have examined the relations between urban warming and city size as measured by urban built area. There are 844, 132, 604, 346, 130, 24 cities in Asia, Africa, Europe, North America, South America, and Oceania, respectively. This city cluster dataset was generated from the global artificial impervious area product, and it corresponds well with the global nighttime light data and population data48. Due to the high quality, the GUB dataset have been used extensively in various similar studies49,50. This dataset was resampled to the resolution of 1 km to match those of the satellite data using the nearest neighbor method. The rural buffer of each city cluster was set to match the size of the city cluster, noting that the urban expansion information for each city cluster was obtained by the MODIS yearly land cover type dataset (i.e., MCD12Q1) rather than by this city cluster boundary dataset. For each city cluster, the pixels classified as ‘urban and built-up areas’ by MODIS land cover product were defined as urban areas. Then the urban expansion was determined by examining the identified urban area from 2002 to 2021. Note that we used the data in 2020 to represent the land cover type of 2021, because the land cover type products are only available until 2020.

Population data

The population dataset (2002 to 2020) was obtained from the Oak Ridge National Laboratory (https://landscan.ornl.gov/). This dataset was generated at a 1 km grid resolution from land cover type, nighttime light, and high-resolution panchromatic imagery51. Note that we used the 2020 data to represent the population of 2021, because the population data are only available until 2020.

Reanalysis data

The monthly reanalysis SAT for the period of 2002 to 2021 was provided by the Goddard Earth Sciences Data and Information Services Center (GES DISC) (https://disc.gsfc.nasa.gov/), produced by the common Global Land Data Assimilation System (GLDAS). The original data has a grid resolution of 9 km. It was also resampled to 1 km to match the grid size of the satellite data using the nearest neighbor method. The SAT trend for each city cluster was calculated as the mean of all the grids in the city cluster using linear regression. Before the trend calculation, all the monthly SATs within an annual cycle were averaged into yearly mean value to eliminate the seasonal effect.

Detection of trends in urban core, rural background, and transitional land

To avoid the confounding effect of transitional pixels on time trend detection, we divided the pixels in each cluster and in its buffer into three groups: urban core, rural background, and transitional. Delineation is made with the help of a breakpoint detection algorithm (Breaks For Additive Season and Trend, BFAST) and the annual data on LST and land cover type. The urban core pixels are those tagged as urban by the land cover product at the beginning and the end of the observation period but exclude those that exhibit a breakpoint behavior. The rural background consists of pixels tagged as land cover types other than urban or water and excludes pixels with breakpoint behaviors. Water bodies are removed because they are not suitable as rural background for surface UHI calculations25,50. The transitional pixels are those that exhibit breakpoint behaviors during the observational period.

We used the BFAST algorithm to determine abrupt changes, or breakpoints, and time trends in both satellite LST and EVI time series52. This algorithm decomposes the time series into the trend, the seasonal, and the remainder components (refer to Supplementary Note 4):

$$S(t)= \; {S}_{{{{{{\rm{tr}}}}}}}(t)+{S}_{{{{{{\rm{sn}}}}}}}(t)+{S}_{{{{{{\rm{re}}}}}}}(t)\\ = \; ({\alpha }_{i}(t)+{\beta }_{i})+(A\cdot \,\sin (2\pi f(t-{t}_{0})+\theta ))+{T}_{{{{{{\rm{re}}}}}}}(t)$$ (1)

where S(t) is the observed LST/EVI at time t (t = 1, …, n, where n is the number of LST/EVI observations throughout the period from 2002 to 2021); S tr (t), S sn (t), and S re (t) are the trend, seasonal, and remainder components, respectively; α i (t) and β i represent the segment-specific trends and intercepts on each sub-period within the time series LST/EVI data; A, t 0 , and θ denote the seasonal amplitude, day of spring equinox and phase shift respectively; and f is the frequency, set as a constant 1/46 for LSTs and 1/23 for EVI, due to the temporal resolution of LST (8-day) and EVI (16-day) data (i.e., 46 LST observations and 23 EVI observations per year). More information on major acronyms and abbreviations can be found in Supplementary Table 5.

The BFAST algorithm was implemented with the ‘bfast’ package in R (http://bfast.r-forge.r-project.org/)52; and the ordinary least squares residuals-based moving sum test was applied to test whether abrupt changes appear within time series data. With the BFAST algorithm, we obtained the abrupt changes (i.e., breakpoints) and trends for each pixel of the LST and EVI data for each city cluster (refer to Supplementary Note 4). For each city cluster, the trends at the urban core and rural background were calculated based on all the available pixels within the urban core and rural background, respectively. Due to the segment-specific trends for transitional pixels, the trends in the transitional land were calculated as the mean of all the grids tagged as transitional pixels using linear regression. The 8-day interval LST were averaged into annual means to suppress the seasonal effect.

Here we chose the BFAST algorithm rather than the simple land use product to determine urban core, rural background, and transitional lands. This is mostly because the latter usually omit the land transitions, such as intra-urban renewal and redevelopment in a city center, that can distort the estimation of warming trend but are not tagged by the simple land use product53. The BFAST algorithm has been used widely by the remote sensing community to determine transitional pixels with a high accuracy52. In the present study, the percent of pixels that have undergone transitional change detected by the BFAST algorithm (20.3%) is greater than the pixels tagged as land cover change with the simple land use product (14.7%). This higher percent confirms that the BFAST algorithm is better at determining land transitions, especially for those that are not revealed in the land use product.

The surface UHI intensity was calculated as the difference in LST between the urban core and the rural background pixels. The trend in the surface UHI intensity was calculated by subtracting the trend at the urban core from that in the rural background.

Attribution of urban surface warming

The individual contributions from URB, BCC, and LSG to urban surface warming trends were quantified with the least square-based statistical attribution approach. This approach and its derivative version have been successfully used to isolate the contributions of urbanization and greenhouse gas emissions to the observed warming in China13. The statistical attribution approach expresses the observed urban core temperature as a linear combination of the component contributions, as:

$${T}_{{{{{{\rm{OBS}}}}}}}(t)={\beta }_{{{{{{\rm{BCC}}}}}}}({T}_{{{{{{\rm{BCC}}}}}}}-{

u }_{{{{{{\rm{BCC}}}}}}})+{\beta }_{{{{{{\rm{URB}}}}}}}({T}_{{{{{{\rm{URB}}}}}}}-{

u }_{{{{{{\rm{URB}}}}}}})+{\beta }_{{{{{{\rm{LSG}}}}}}}({T}_{{{{{{\rm{LSG}}}}}}}-{

u }_{{{{{{\rm{LSG}}}}}}})+\varepsilon$$ (2)

where T OBS is the yearly anomaly (the observed change of annual mean LST as referenced to that in the previous year) during the study period; T BCC , T URB , and T LSG are changes in temperature signals attributed to contributions from BCC, URB, and LSG, respectively; β BCC , β URB , and β LSG are the associated scaling factors; v BCC , v URB , and v LSG are, respectively, the noises from internal variability in BCC, URB and LSG components, which were included to reduce the possible uncertainties induced by the presence of noises in variables; and ε is a residual error term. Note that the annual mean T OBS , T BCC , T URB , and T LSG were calculated to suppress natural variability such as the seasonal effect13. The scaling factors and noise terms were estimated using the total least squares method in order to account for contributions from BCC, URB, and LSG to T OBS more appropriately.

The BCC component (T BCC ) was given by SAT from reanalysis data (Supplementary Fig. 12) at rural areas for each city54, for which the difference between LST and SAT trends could be reconciled by the scaling factor β BCC . We further need to note that the reanalysis SAT data was used mainly by referring to previous studies18, which pointed out that assimilation models do not account for land use changes and therefore should not include urbanization signals2. Nevertheless, there may exist possible uncertainties induced by urbanization signals involved in reanalysis SAT data. These urbanization signals may arise from the data assimilation of different datasets when generating reanalysis data. Therefore, the reanalysis SAT data over urban areas were entirely excluded to reduce the possible uncertainties related to the possible urbanization signals involved in reanalysis data.

The URB component (T URB ) was set as the LST variations induced by urban population change through the statistical relationship between LST and population obtained at the pixel level. We quantified T URB based on the widely used ‘space-for-time substitution’ approach55. For each city, we established the statistical relationship between yearly population density and LST anomalies (the observed change of population density or LST as referenced to those in the previous year) from 2002 to 2021 at the pixel level. Here the logarithmic function between LST and population density was used mainly due to the nonlinearity between these two parameters44. Using the mean yearly anomalies in population density at the city level as the prediction variable, the T URB was estimated based on the established logarithmic function between LST and population density.