A two-stage method was used to examine the national pooled association between ambient temperature and hospitalizations for CKD[3, 30, 31]. In the first stage, a standard time series regression model[3, 32] was applied to estimate city-level relationships. To account for the potential delayed effects of ambient temperature, a cross-basis function was generated through the distributed lag non-linear model[33, 34]. The final model structure has the following formula:
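A form consistent with the terms defined here, given as a sketch rather than the exact published expression, can be written as follows. The log link, the day index t, the intercept α, and the coefficient symbols β1, β2, γ, and δ are assumptions introduced for illustration; the link function and error distribution are not stated in this passage.

```latex
\[
\log E(Y_t) = \alpha + cb(\mathrm{temp}_t) + cb(\mathrm{humidity}_t)
  + \beta_1\,\mathrm{PM}_{10,t} + \beta_2\,\mathrm{O}_{3,t}
  + ns(\text{calendar day}_t) + ns(\mathrm{DOY}_t)
  + \gamma\,\mathrm{DOW}_t + \delta\,\mathrm{Holiday}_t
\]
```

Here DOW is a categorical variable, so γ stands for a vector of day-of-week coefficients.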
where
E(Y) is the expected number of hospitalizations on a given day.
cb(temp) is the cross-basis function of ambient temperature, which describes the temperature-lag-response relationship. For the temperature-response dimension, Generalized Cross Validation (GCV) was used to compare the performance of the non-linear model [natural cubic spline function with 3 degrees of freedom (df) and equally spaced knots] with that of the linear model[35]. The linear model had a lower GCV score (sum over 261 cities, 354.59) than the non-linear model (359.19; Additional file 1: Fig. S1). The final cross-basis function of temperature therefore included a linear function in the temperature-response dimension and a natural cubic spline function with 3 df and equally spaced knots in the lag-response dimension[9]. We set the maximum lag in the cross-basis function to 7 d to characterize the potential delayed effects[9].
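The structure of such a cross-basis can be illustrated with a minimal Python sketch. It assumes a truncated-power parameterization of the natural cubic spline with four equally spaced knots on the lag range 0 to 7 d and no intercept column, which gives 3 df; these choices, and the function names `lag_matrix`, `ns_basis`, and `cross_basis_linear`, are hypothetical and are not taken from the study's software.

```python
from typing import List, Sequence


def lag_matrix(x: Sequence[float], max_lag: int) -> List[List[float]]:
    """Row for day t holds x[t], x[t-1], ..., x[t-max_lag].

    Days without a full lag history are dropped.
    """
    return [[x[t - l] for l in range(max_lag + 1)]
            for t in range(max_lag, len(x))]


def ns_basis(x: float, knots: Sequence[float]) -> List[float]:
    """Natural cubic spline basis at x (truncated-power form, no intercept).

    With K knots this returns K - 1 columns, each linear beyond the
    outermost knots.
    """
    def pos3(u: float) -> float:
        return max(u, 0.0) ** 3

    def d(k: int) -> float:
        return (pos3(x - knots[k]) - pos3(x - knots[-1])) / (knots[-1] - knots[k])

    n = len(knots)
    return [x] + [d(k) - d(n - 2) for k in range(n - 2)]


def cross_basis_linear(temp: Sequence[float], max_lag: int = 7,
                       n_knots: int = 4) -> List[List[float]]:
    """Cross-basis that is linear in temperature and spline-shaped over lag.

    Column j for day t is sum over lags l of temp[t-l] * b_j(l).
    """
    knots = [max_lag * i / (n_knots - 1) for i in range(n_knots)]
    lag_basis = [ns_basis(float(l), knots) for l in range(max_lag + 1)]
    n_cols = n_knots - 1
    return [[sum(v * lag_basis[l][j] for l, v in enumerate(lagged))
             for j in range(n_cols)]
            for lagged in lag_matrix(temp, max_lag)]
```

Each row of the result would enter the regression as three covariates, so that the fitted coefficients describe how a unit change in temperature is distributed over lags 0 to 7 d.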
cb(humidity) represents the cross-basis function of daily mean relative humidity for lag 0–7 d, with natural cubic splines with 3 df in both the humidity and lag dimensions. As illustrated in previous studies[36-39], air pollution may affect the prevalence of kidney diseases. We therefore adjusted for air pollutants in our model, using moving averages over the same lags as for temperature. Two air pollutants were considered: particles with an aerodynamic diameter of 10 μm or less (PM10) and ozone.
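The moving-average adjustment can be sketched as follows, assuming that the average for a given day covers that day and the preceding 7 days (lag 0–7 d). The function name `lagged_moving_average` and the handling of the first days of the series are hypothetical.

```python
from typing import List, Optional, Sequence


def lagged_moving_average(values: Sequence[float],
                          max_lag: int = 7) -> List[Optional[float]]:
    """Mean of the current day and the previous max_lag days.

    Days without a full lag history are returned as None.
    """
    out: List[Optional[float]] = []
    for t in range(len(values)):
        if t < max_lag:
            out.append(None)
        else:
            window = values[t - max_lag:t + 1]
            out.append(sum(window) / len(window))
    return out
```

The same function would be applied separately to the daily PM10 and ozone series before they enter the model as linear terms.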
ns(calendar day) represents the natural cubic spline for non-linear variables. Consistent with a previous study, we introduced the calendar day (with 3 df per year) and the day of year (DOY, with 3 df) to control for long-term and seasonal trends[33]. A dummy variable for the day of the week and a dichotomous variable for public holidays were also introduced[35, 40]. GCV values were used to choose the appropriate df for the spline functions.
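The time-related covariates described above can be derived from the date alone, as in the following sketch. The function names `time_covariates` and `trend_df`, the dictionary layout, and the rounding used to convert 3 df per year into a total df are assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import Dict, List, Set


def time_covariates(start: date, n_days: int,
                    holidays: Set[date]) -> List[Dict[str, int]]:
    """Build calendar day, day of year, day of week and holiday indicator."""
    rows = []
    for i in range(n_days):
        d = start + timedelta(days=i)
        rows.append({
            "calendar_day": i + 1,           # running index for the long-term trend
            "doy": d.timetuple().tm_yday,    # 1..366, for the seasonal trend
            "dow": d.weekday(),              # 0 = Monday, treated as a factor
            "holiday": int(d in holidays),   # 1 on public holidays, else 0
        })
    return rows


def trend_df(n_days: int, df_per_year: int = 3) -> int:
    """Total df for the calendar-day spline at df_per_year per year of data."""
    return round(df_per_year * n_days / 365.25)
```

For example, a two-year series of 730 days would give `trend_df(730) == 6` for the calendar-day spline, while the DOY spline keeps 3 df regardless of the series length.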