A two-stage inflation method in parameter estimation to compensate for constant parameter evolution in Community Earth System Model

A two-stage inflation method in parameter estimation to compensate for constant parameter evolution in Community Earth System Model

PDF

Zheqi Shen¹^,²^,³, Youmin Tang¹^,²^,³^,⁴^,^*

Acta Oceanologica Sinica | 2022, 41(2) : 91 - 102

Less

Acta Oceanologica Sinica | 2022, 41(2): 91-102

• Ocean Data Assimilation •

A two-stage inflation method in parameter estimation to compensate for constant parameter evolution in Community Earth System Model

Full

Zheqi Shen¹^,²^,³, Youmin Tang¹^,²^,³^,⁴^,^*

Affiliations

¹ Key Laboratory of Marine Hazards Forecasting of Ministry of Natural Resources, Hohai University, Nanjing 210098, China

² College of Oceanography, Hohai University, Nanjing 210098, China

³ Southern Marine Science and Engineering Laboratory (Zhuhai), Zhuhai 519080, China

⁴ Environmental Science and Engineering, University of Northern British Columbia, Prince George V2N 4Z9, Canada

Published: 2022-02-25 doi: 10.1007/s13131-021-1856-5

Outline

Abstract

Less

Parameter estimation is defined as the process to adjust or optimize the model parameter using observations. A long-term problem in ensemble-based parameter estimation methods is that the parameters are assumed to be constant during model integration. This assumption will cause underestimation of parameter ensemble spread, such that the parameter ensemble tends to collapse before an optimal solution is found. In this work, a two-stage inflation method is developed for parameter estimation, which can address the collapse of parameter ensemble due to the constant evolution of parameters. In the first stage, adaptive inflation is applied to the augmented states, in which the global scalar parameter is transformed to fields with spatial dependence. In the second stage, extra multiplicative inflation is used to inflate the scalar parameter ensemble to compensate for constant parameter evolution, where the inflation factor is determined according to the spread growth ratio of model states. The observation system simulation experiment with Community Earth System Model (CESM) shows that the second stage of the inflation scheme plays a crucial role in successful parameter estimation. With proper multiplicative inflation factors, the parameter estimation can effectively reduce the parameter biases, providing more accurate analyses.

Key words

parameter estimation / data assimilation / inflation / CESM / EnKF

Cite this Article

Zheqi Shen, Youmin Tang. A two-stage inflation method in parameter estimation to compensate for constant parameter evolution in Community Earth System Model[J]. Acta Oceanologica Sinica, 2022 , 41 (2) : 91 -102 . DOI: 10.1007/s13131-021-1856-5

Full Text

Less

1 Introduction

Less

In oceanic and atmospheric models, physical processes and dynamical motions with spatial scales smaller than the model resolution cannot be fully resolved by the model equations. Contributions from them are usually represented through subgrid-scale schemes or parametrizations in numerical models. The parametrizations formulate the unsolved small-scale effects as a function of the resolved model variables, which can be derived from either the physical laws or the empirical relations. It inevitably involves a certain number of parameters, whose values are intrinsically uncertain. The uncertainty in some parameters may have great impacts on the model performance (Tong and Xue, 2008; Furue et al., 2015).

Parameter estimation (PE) is defined as the process to adjust or optimize the model parameter using observations (Zhang et al., 2020). PE has shown great potential to reduce model biases and improve the model simulation and prediction (Wu et al., 2016; Zhang et al., 2016). Most parameter estimation methods are developed based on the data assimilation methods, which are initially designed for generating estimates of model states (Evensen et al., 1998). Particularly, the ensemble-based methods, such as ensemble Kalman filters (Evensen, 2003) and particle filters (Liu and West, 2001), have been successfully applied for PE (Ruiz et al., 2013). In these ensemble-based methods, the model state can be augmented with a set of poorly known parameters ( Banks, 1992; Anderson, 2001). Updating the augmented state with observations can therefore estimate the model state and parameter simultaneously ( Kivman, 2003; Annan and Hargreaves, 2004; Annan et al., 2005). In practice, PE is more difficult than state estimation (SE) because the connection between parameters and observations is indirect, often nonlinear. It is even more challenging if one estimates the parameters of a general circulation model (GCM).

In GCMs, some parameters arise from the simplification of the underlying physical processes, which may take a globally uniform value. It is found that assimilating numerous data to estimate one single scalar parameter would over-adjust its value, bring unexpected errors. Also, it is impractical to use observational information that has strong geographic dependence to update a global parameter. Aksoy et al. (2006) propose a method that first transforms a single-value parameter into a two-dimensional field and then updates the field spatially in the analysis step. A spatial average (SA) of the entire spatially varying parameter field is then used to recover the globally uniform parameter value for model integration. Besides, the geographic-dependent parameter optimization (GPO) is also developed to allow model parameters to vary geographically after each analysis in particular models (Wu et al., 2013). Liu et al. (2014a) proposed the adaptive spatial average (ASA) method for the recovery of the spatially uniform parameter value. It shows that ASA could be more effective than either SA or GPO for parameter estimation in a coupled GCM (Liu et al., 2014b).

One other issue of PE in GCMs is that parameters are usually assumed to be constant during the model integration so that one parameter only changes its value in the data assimilation step. As a result, the parameter ensemble spread will decrease more rapidly than the state ensemble spread, and the parameter ensemble tends to collapse before an optimal parameter value is found (Santitissadeekorn and Jones, 2015). Gordon et al. (1993) suggest an approach by adding small random disturbances to state members between time steps to deal with the sample degeneracy/attrition for state variables. Later on, this idea has been extrapolated to PE by adding small perturbations to parameters. Liu and West (2001) point out that the artificial parameter evolution by adding artificial perturbations would make the posteriors too diffusive relative to the theoretical posteriors for the fixed parameters. So, they proposed a new artificial model for the parameter by kernel smoothing. These methods are designed for particle filters, and they have been shown effective with moderate dimensional models. However, due to the severe filter degeneracy problem in particle filters, it seems impractical to use these methods to perform joint state/parameter estimation with GCMs.

Adding random perturbations between model steps in some sense can be regarded as additive inflation. For the PE with ensemble Kalman filter, Aksoy et al. (2006) propose an approach called the conditional covariance inflation (CCI) to apply multiplicative inflation to the parameters. Liu et al. (2014a, b) also use the CCI for parameter inflation for PE in coupled GCMs (CGCMs). The CCI is designed to inflate the parameter ensemble spread back to a predefined threshold value when it is smaller than the threshold. However, their experiments do not use covariance inflation for state variables, which might lead to the degeneracy of the state ensemble. Besides, the CCI will not take effect at the first few parameter estimation steps if the parameter spread is larger than the threshold. This will bring incorrect parameter increments in the early stages, and cause the parameter estimation less effective. Besides, Koyama and Watanabe (2010) apply multiplicative inflation to both state and parameter ensembles, but with different inflation factors. Their method is applied to the PE with an ideal Lorenz’ 96 model, and the inflation factors are tuned empirically. However, more details need to be developed to make it available with realistic applications.

In this work, a two-stage inflation method is developed for PE, which can deal with the collapse of parameter ensemble due to the constant evolution of parameters in a CGCM, i.e., the National Center for Atmospheric Research Community Earth System Model (NCAR-CESM). Specifically, the adaptive inflation (Anderson, 2009) is applied to the augmented state, in which the parameters are transformed to fields and attached to the state vectors; and then multiplicative inflation is additionally used to inflate the spatially averaged parameters before they are used for model integration. The inflation factor in the second stage is determined according to the spread growth ratio (SGR) of model states during the model integration, with which the inflation can remedy the underestimation of parameter ensemble spread due to constant parameter evolution.

This paper is organized as follows. The methods are described in Section 2. The model and data are introduced in Section 3. Section 4 presents the results of numerical experiments. Summary and discussion are finally drawn in Section 5.

2 Method

Less

In this work, the ensemble adjustment Kalman filter (EAKF; Anderson, 2001) is used to perform state and parameter estimations. The adaptive spatial average (ASA) method by Liu et al. (2014a) is also used to transform a scalar parameter to a two-dimensional field in the analysis step and turn it back before model integration. The details of EAKF and ASA are introduced as follows.

$ x $

denotes the state vector,

$ {y}^{o} $

denotes a single scalar observation with variance

$ ({{\sigma }^{o})}^{2} $

and

$ h $

is the forward operator; the EAKF algorithm first applies the operator

$ h $

to each ensemble member of the state, producing the ensemble of prior estimates for the observation, namely,

(1)

$ {{y}}\,_{{n}}^{{p}}=h\left({{x}}\,_{{n}}^{{p}}\right),{n}=1,2,\cdots ,{N} , $

here, superscript

$ {p} $

indicates prior and subscript

$ {n} $

is the index of the ensemble member. And the prior mean

$ \overline{{y}\,^{p}} $

and prior variance

$ {{(\sigma }_{y}^{p})}^{2} $

can be computed directly from the ensemble members of

$ {y}\,_{n}^{p} $

EAKF update the prior variance

$ {{(\sigma }_{y}^{p})}^{2} $

by the observation variance

$ ({{\sigma }^{o})}^{2} $

to give the posterior variance and mean value (indicated by superscript

$ u $

) as

(2)

$ {\left({\sigma }_{y}^{u}\right)}^{2}=\left[{\left({\sigma }_{y}^{p}\right)}^{-2}+\left({{\sigma }^{o}}\right)^{-2}\right]^{-1} , $

(3)

$ {\overline{{y}^{u}}}={{(\sigma }_{y}^{u})}^{2}\left[\frac{\overline{{y}^{p}}}{{{(\sigma }_{y}^{p})}^{2}}+\frac{{y}^{o}}{({{\sigma }^{o})}^{2}} \right].$

The posterior value in observational space can be computed for each member by shifting the mean and linearly contracting the members to make the sample variance exactly

$ {{(\sigma }_{y}^{u})}^{2} $

, i.e.,

(4)

$ {y}_{n}^{u}=\left(\frac{{\sigma }_{y}^{u}}{{\sigma }_{y}^{p}}\right)({y}_{n}^{p}-{\overline{{y}^{p}}})+{\overline{{y}^{u}}},\;n=1,2,\cdots ,N . $

The increment of each observation can update different model variables at different locations by regression using the following equation,

(5)

$ {x}_{m,n}^{u}={x}_{m,n}^{p}+\rho \frac{{\Sigma }_{{x}_{m},y}}{{\left({\sigma }_{y}^{p}\right)}^{2}}({y}_{n}^{u}-{y}_{n}^{p})={x}_{m,n}^{p}+\rho {r}_{{x}_{m},y}\frac{{\sigma }_{{x}_{m}}}{{\sigma }_{y}^{p}}({y}_{n}^{u}-{y}_{n}^{p}) , $

where

$ {x}_{m,n}^{p} $

denotes the prior value of a variable at a particular location indicated by the subscript

$ m $

for the

$ n $

th member.

$ {\Sigma }_{{x}_{m},y}={r}_{{x}_{m},y}{\sigma }_{{x}_{m}}^{p}{\sigma }_{y}^{p} $

is the covariance of

${{x}}_{{m}}$

and

$ {y}^{p} $

, and

$ {r}_{{x}_{m},y} $

is the correlation coefficient.

$ \rho $

is the localization factor (Gaspari and Cohn, 1999) which restricts the computation adjacent to the observation while removing the spurious long-distant relations due to insufficient ensemble size.

To estimate a global scalar parameter member

$ {\eta }_{n} $

, one should add a spatial dependence to the parameter. The prior value of the parameter field at each location is assumed to be uniform, i.e.,

$ {\theta }_{m',n}={\eta }_{n} $

, where

$ m' $

is the index of a parameter in the augmented state. And the observations are assimilated with the same covariance localization strategy as

(6)

$ {\theta }_{m',n}^{u}={\theta }_{m',n}^{p}+\rho {r}_{{\theta }_{m'},y}\frac{{\sigma }_{{\theta }_{m'}}^{p}}{{\sigma }_{y}^{p}}({y}_{n}^{u}-{y}_{n}^{p}) . $

Therefore, the posterior values of the parameter are spatial varying. For simplicity, it is assumed that the parameter is extended to a two-dimensional variable in the sea surface in the analysis step.

To show the concept of ASA, a simple scenario in which

$\rho = 1$

is considered. Using Eqs (4) and (6) , it can be derived that the updated variance for the parameter at the location indicated by m is

(7)

$ ({\sigma }_{{\theta }_{m'}}^{u}{)}^{2}=({\sigma }_{{\theta }_{m'}}^{p}{)}^{2}+{r}_{{\theta }_{m'},y}^{2}{\left({\sigma }_{{\theta }_{m'}}^{p}\right)}^{2}\left[{\left(\frac{{\sigma }_{y}^{u}}{{\sigma }_{y}^{p}}\right)}^{2}-1\right] . $

If substitute Eq. (2) into Eq. (7), it can be written as

(8)

$ {r}_{{\theta }_{m'},y}^{2}\frac{{\left({\sigma }_{y}^{p}\right)}^{2}}{{\left({\sigma }_{y}^{p}\right)}^{2}+({{\sigma }^{o})}^{2}}=1-\left(\frac{{\sigma }_{{\theta }_{m'}}^{u}}{{\sigma }_{{\theta }_{m'}}^{p}}\right)^{2}. $

Denote

$ h=\dfrac{{\sigma }_{{\theta }_{m'}}^{u}}{{\sigma }_{{\theta }_{m'}}^{p}} $

, Eq. (8) strongly indicates that

$ h $

can be an index to select the good values from a posterior field. Since

$ {\sigma }_{{\theta }_{m'}}^{u} $

always smaller than

$ {\sigma }_{{\theta }_{m'}}^{p} $

, a smaller

$ h $

makes both sides of Eq. (8) closer to 1, representing that either the correlation of observation and parameter is stronger or the ratio of prior variance to the total variance is larger at location

$ m' $

. It can be sure that those locations correspond to large

$ h $

are good values that have the most significant observation impacts, thus averaging them can accelerate PE. Liu et al. (2014a) use a threshold of

$ h $

for spatial average. The spatial varying posterior parameter field is averaged over the regions in which

$ h $

are larger than the threshold. They also make the threshold adaptive to ensure a considerable amount of good values are selected and averaged.

Covariance inflation is an ad-hoc method to prevent the ensemble spread from narrowing due to model errors. In this work, the time-varying adaptive inflation method developed by Anderson (2009) is applied to the augmented state ensemble. Inflation is applied to the prior ensemble just before forward operators are computed in Eq. (1) as

(9)

$ {x}_{m,n}^{inf}=\sqrt{{\text{λ}}_{m}}({x}_{m,n}-\overline{{x}_{m}})+\overline{{x}_{m}} , $

(10)

$ {\theta }_{m',n}^{inf}=\sqrt{{\text{λ} }_{m'}}({\theta }_{m',n}-\overline{{\theta }_{m'}})+\overline{{\theta }_{m'}} .$

The inflation factors

$\text{λ}$

vary in different locations and with different variables/parameters, they also evolve adaptively with time. For different locations or variables, the correlation between any two elements of the inflation is assumed to be the same as the prior sample correlation between the corresponding elements of the model state vector. The inflation is also updated according to the observation by applying Bayes theorem. The details of the method can be found in Anderson (2009). However, since the two-dimensional parameter fields

${\theta }_{m',n}$

are averaged to a scalar

$ {\eta }_{n} $

before model integration, the inflation factors

${\text{λ} }_{m'}$

for parameters are not updated from the previous step, they essentially do not evolve with time.

As mentioned, the state ensemble spread will grow during the model integration, but the parameter ensemble spread remains unchanged due to the constant parameter evolution model, i.e.,

$\mathrm{\eta }\left({{t}}+1\right)=\mathrm{\eta }\left({{t}}\right)$

. It is not a surprise that

$ {\sigma }_{{\theta }_{m'}} $

will be relatively under-inflated comparing to

$ {\sigma }_{{x}_{m}} $

if only Eqs (9) and (10) are used for covariance inflation. In that situation, the uncertainty of the parameter is under-estimated, such that the observational increments represented by

$ ({y}_{n}^{u}-{y}_{n}^{p}) $

can not effectively correct the parameters. To deal with that, extra inflation is used for the scalar global parameters

$ \eta $

as follows,

(11)

$ {\eta }_{n}=\overline{\eta }+\alpha ({\eta }_{n}-\overline{\eta }),n=1,2,\cdots ,N , $

where

$ \overline{\eta } $

is the ensemble mean of the parameter. The inflation factor

$ \mathrm{\alpha } $

will be computed and compared in the next section.

With Eqs (10) and (11), a two-stage inflation method is applied to the parameter ensemble to increase the ensemble spread, whereas only one stage inflation is used for the state ensemble. Eq. (11) is applied to the global scalar parameters, which are not involved in the PE analysis. This extra inflation is employed to compensate for constant parameter evolution, which is crucially important for GCMs.

3 Model and data

Less

3.1 CESM

The NCAR-CESM (version 1.2.1) is used in this study. This model is a fully coupled earth system model, whose atmospheric and oceanic components are the Community Atmosphere Model (CAM) and the Parallel Ocean Program version 2 (POP2; Smith et al., 2010) respectively. The resolution of the atmosphere component is horizontally

${0.9}^{\circ } \times {1.25}^{\circ }$

, with 26 vertical levels for the CAM model. The ocean component is integrated on a nominal resolution of

$ {1}^{\circ } $

. Enhanced meridional resolution of

$ {0.5}^{\circ } $

is employed in the equatorial region, and the ocean model is run with 60 vertical levels. The Data Assimilation Research Testbed (DART, Karspeck et al., 2018) is employed to assimilate ocean observations into the CESM model, in which the adaptive inflation method (Anderson, 2009) is available for model states and augmented parameters.

It is well acknowledged that one of the largest uncertainties in ocean general circulation models (OGCMs) is related to the parameterization of vertical mixing processes; poorly specified schemes can lead to a large bias in SST simulations (Zhu and Zhang, 2018). The K-Profile Parameterization (KPP), (Large et al., 1994) is used for vertical mixing parameterizations in the POP2 model. The background diffusivity, representing the integrated effects of diapycnal mixing processes in the ocean interior, is assigned with a constant value magnitude of

${O}\left({10}^{-5}{\mathrm{m}}^{2}/{\mathrm{s}}\right)$

in this model. Also, the diapycnal diffusivity is reduced near the equator, with a magnitude of

${O}\left({10}^{-6}{\mathrm{m}}^{2}/{\mathrm{s}}\right)$

(Gregg, 2003). And the Banda Sea diffusivity is modified with a magnitude of

${O}\left({10}^{-4}{\mathrm{m}}^{2}/{\mathrm{s}}\right)$

(Ffield and Gordon, 1992).

In this work, the ensemble data assimilation methods are used to estimate the background vertical diffusivity coefficient (call bckgrnd_vdc hereafter) in CESM (Smith et al., 2010), the temperature and salinity profiles are assimilated to reduce the errors and uncertainties in both model states and parameters.

3.2 Observing system simulation experiment

To verify the performance of state and parameter estimation methods, the observing system simulation experiment (OSSE) is conducted in this study. In OSSE, the reference states (which are regard as the true states) are derived from the “truth run”, in which the reference initial conditions and parameters are used. Observations are selected from the true states with a prescribed observation network. The parameter estimation experiment is conducted with biased bckgrnd_vdc, and the synthetic ocean temperature (T) and salinity (S) observations are assimilated to estimate the parameter. The procedure of the OSSE can also be seen from the flow-diagram in Fig. 1, where the reference bckgrnd_vdc = 0.16

$\mathrm{c}{\mathrm{m}}^{2}/{\mathrm{s}}$

, and the initial guess of it is 0.10

$\mathrm{c}{\mathrm{m}}^{2}/{\mathrm{s}}$

for both control run and state estimation experiment.

To better simulate the reality, the TS-profile locations and error variances are adopted from the EN4 quality controlled ocean data (https://www.metoffice.gov.uk/hadobs/en4/download-en4-2-1.html) over the years 2010 and 2011. Since the assimilation time window is 10 days, all temperature and salinity data over the period are assimilated at the same time. To prevent the over-fitting due to excessive observations, those profile data are typically merged into

$ {1}^{\circ }\times {1}^{\circ } $

grids with at most 42 vertical levels (Good et al., 2013) in assimilation, in which the mean value and variance of the data in each grid block are regarded as its observational value and error variance respectively. In this experiment, the synthetic observations are created using the locations and error variances of the EN4 profiles. As an example, Fig. 2 shows the locations of the period January 1−10, 2010, which vary at each assimilation step.

To perform the SE and PE, an ensemble size of 20 is used in all of the experiments. The initial conditions for the experiment is generated from a 150-years simulation using the built-in initial condition of the coupled model, which could ensure that the initial conditions were adequately spun-up. The pseudo-random fields (Evensen, 2003) with 20 layers and with a magnitude of

$ {{10}^{-2}}$

°C are added to the temperature of each ensemble member. And then the model is integrated for a spinup period of 1 year to ensure the ensemble spread develop sufficiently.

One of the 20 members is selected as the initial condition for the truth run. If these ensemble members are used for data assimilation experiments, the uncertainty of initial states would be relatively small, such that the SE can soon reach the quasi-equilibrium, a situation that the uncertainty of model states is sufficiently constrained by observations. As Zhang et al. (2012) suggests, only when the SE reaches the quasi-equilibrium, the covariances between parameters and model states can be signal dominant, and thereby the parameter estimation could be activated to effectively correct the parameters. So this data assimilation enhancive parameter correction (DAEPC) strategy is adopted in the experiment to start with a pure SE and activate PE when the assimilated states reach quasi-equilibrium.

4 Numerical experiments

Less

4.1 Parameter sensitivity

To show the parameter bckgrnd_vdc do have an impact on the model simulation. The parameter sensitivity experiment is conducted at the first stage. The models are run with the same initial conditions for 1 year, but the parameters are slightly perturbed from the initial guess 0.10 with 20 random values with a standard deviation of 0.08. As an example, the SST and SSS spreads of the 1st, 4th, 7th, and 10th month are shown in Fig. 3. The perturbation of parameters leads to a significant growth of the spread for both temperature and salinity. The most significant growth of SST spread appears in the northern Pacific, whereas the most significant increase of SSS spread appears in the western tropical Pacific. Deeper layers’ temperature and salinity also have similar patterns of spreads (not shown), which implies that assimilating the TS profiles can potentially correct the parameter bias.

4.2 SE stage for DAEPC

The data assimilation is performed every 10 days, and the duration is two years. The temperature and salinity observations are created from the EN4 profile data of the years 2010 and 2011. The horizontal localization strategy is applied with a cut-off radius of about 1 200 km distance. Meanwhile, the vertical localization is also applied with a cut-off distance of approximately 300 m. The DAEPC method (Zhang et al., 2012) is adopted to perform pure state estimation for 6 months and activate the parameter estimation at the beginning of the 7th month. For both state and parameter estimation, the spatial-varying adaptive inflation proposed by Anderson (2009) is used, such that the inflation value for each model variable at each model grid is different. The root-mean-squared-errors (RMSE) of prior and posterior innovations as

$ |{y}^{o}-H(x\left)\right| $

are shown in Fig. 4 for different levels. It is shown that the errors decrease rapidly at the first few data assimilation steps for both temperature and salinity at each level. After about one or two months, the error reduction from data assimilation tends to a saturated value such that each analysis step reduces a similar amount of errors. This is a clear indication that the state estimation reaches the quasi-equilibrium state and parameter estimation could be activated. It is also noteworthy that the prior and posterior RMSEs are very close to each other at a depth of 1 000 m. This is because the model errors are small and the uncertainties of observations are large at depth.

Comparing to the control run without data assimilation, the state estimation significantly reduces the errors of each model state. Figure 5 has shown the RMSE of SST, SSS, and SSH from the control run and state estimation of the first 6 months. It can be seen that the errors of not only the temperature and salinity but also SSH are reduced by the multivariate data assimilation with EAKF.

4.3 PE with one-stage inflation

For parameter estimation, a spatial uniform parameter field is augmented to the state vector, and updated by incorporating the same observations. The posterior parameter field is spatial-varying, such that the averaged parameter of a specified region is used to integrate the model for the next cycle. The so-called adaptive spatial average method is employed to select the location with good values in which the ratio of posterior to prior standard deviation

$ h $

exceeds a threshold value. In this experiment, the threshold value is adaptive to ensure that at least 10 000 model grids out of about 80 000 ocean grids are used to compute the average. In this experiment, the adaptive spatial averaging is performed as follows:

i) Compute

$h=\sigma _{\theta _{{\rm m}'}}^{{u}}/\sigma _{\theta'_{\rm{m}}}^{{p}}$

for each

$\mathrm{m}'$

ii) Set the initial threshold value

$ {h}_{0}=0.68 $

iii) Count the number of grid points which satisfies

$ h > {h}_{0} $

iv) If the number of grid points are greater than 10 000 or

$ {h}_{0}=0.98 $

, exit the procedure; otherwise, add

$ {h}_{0} $

by 0.05.

v) Repeat steps iii) and iv) until the procedure ends.

Figure 6 shows an example of the spatial averaging scheme for the first parameter estimation step. Figure 6a displays the

${h}$

values for each model grid, computed from the ensemble members. Since h measures the proportions of posterior error standard deviations to prior error standard deviations of the parameter, its value is smaller in areas where observational errors are more sensitive to the parameters. The solid lines indicates some threshold values (e.g.,

$ {h}_{0}=0.73 $

, 0.83). It can be seen that if

$ {h}_{0}=0.83 $

, a large area with more than 10 000 model grids are included, such that the adaptive procedure could be stopped. Figures 6b–d show the two-dimensional posterior parameter fields

$ \theta $

for the first 3 ensemble members, respectively. The parameters in those areas enclosed by solid lines are averaged to compute

$ \mathrm{\eta } $

, whose ensemble is then used in the model integration. Although the members have different posterior patterns, they are averaged at the same region in the same parameter estimation step.

We first use one-stage inflation to perform PE, i.e., only Eqs (9) and (10) are used to inflate the state and parameter ensemble, respectively. The inflation factors are spatial-varying. The first three rows of Fig. 7 show the inflation factors for temperature and salinity at different depths after the first analysis step with parameter estimation. The inflation factors at most locations are slightly larger than 1. However, in particular regions, such as the tropical Pacific and the Indian Ocean, the inflation factors have values larger than 2. If we compare that with Fig. 6a, it can be found that those regions with large inflation factors are mostly with large observational impacts. Moreover, the inflation factors are smaller for deeper temperature and salinity variables, because the assimilation is insignificant in-depth, as Fig. 4 reveals.

The inflation factors for the two-dimensional parameter field are also computed during the assimilation. Figure 7 also shows the inflation factors for the parameter field, it is shown that the distribution of them is similar to that of SST or SSS. However, although the inflation for the parameter field is updated by the adaptive inflation algorithm in the analysis step, it is not used in the model integration due to the ASA method.

Figure 8a shows the PE results using one-stage inflation. The black dashed line is the initial guess of the bckgrnd_vdc, which is also the mean value of the initial parameter ensemble. The red solid line represents the mean value of the parameter ensemble, and the red dotted lines represent the mean value plus/minus one standard deviation of the parameter ensemble. The PE experiment uses 20 perturbed parameters (with standard deviation 0.08) to integrate the model, conduct a spinup process for pure state estimation for 6 months, therefore the parameter ensemble remains unchanged for the first 6 months. As soon as the parameter estimation is activated in the 7th month, the variance of the parameter ensemble decreases rapidly. The CCI technique (Aksoy et al., 2006) is applied with a standard deviation limit set at 1/5 of the initial spread 0.08. It can be seen from Fig. 8b that the parameter ensemble spread shrinks to the standard deviation limit 0.016 with only 4 PE steps. And the parameter error stops decreasing after that. In this case, it is shown that the parameter fails to converge to the true value, and the error of the parameter ensemble mean is over 30%. The main reason for this failure is the under-estimation of the parameter ensemble spread due to the lack of spread growth during the model integration. It can be expected that a relatively small spread would weaken the impact of the observational increment, making insufficient correction to the parameter ensembles, as the red solid line in Fig. 8a indicates.

4.4 PE with two-stage inflation

To compensate for the parameter evolution model, a two-stage inflation method by Eqs (10) and (11) is used to inflate parameter ensemble, while the one-stage inflation by Eq. (9) is used to inflate model state ensembles.

The inflation factor

$ \mathrm{\alpha } $

in Eq. (11) should be determined before activating the parameter estimation. Considering the extra parameter inflation is employed to compensate for constant parameter evolution, the value of

$ \mathrm{\alpha } $

should be given according to the state ensemble spread growth ratio.

Figure 9 measures the spread growth ratio (call SGR hereafter) of temperature and salinity variables at each model integration. The duration of model integration is 10 days; therefore the global mean spreads of temperature and salinity are increased by different amounts at different depths. Take temperature at 5 m depth as an example, the global mean spread grows from about 0.15 to 0.25 in the first cycle, and the corresponding SGR is about 1.6 (Fig. 9a). However, the spread itself and the SGR are smaller at deeper layers. Figure 9g shows the SGR with different depths, it becomes very close to 1 when the depth is larger than 500 m. Comparing to temperature, the SGR of salinity is relatively smaller. For instance, Fig. 9h shows that the SGR for salinity barely exceeds the value 1.3.

According to Fig. 9, to inflate the parameter ensemble such that its spread will be comparable to those of state variables, the value of

$ \mathrm{\alpha } $

should be neither too large nor too small. The median value of the SGR of temperature and salinity can be used for the extra parameter inflation factor. To simplify the discussion,

$ \mathrm{\alpha } $

is kept constant until it reaches the standard deviation limit to turn on CCI. Four different

$ \mathrm{\alpha } $

values as 1.1 (median value of salinity SGR), 1.3 (median value of temperature SGR), 1.2 (mean of both median values), and 1.5 (the maximum SGR) are used and compared.

Figure 10 shows the PE results with

$ \mathrm{\alpha }=1.1,\;1.2,\;1.3 $

and

$ 1.5 $

for the extra parameter inflation respectively. For the first three cases, it takes about 6, 8 and 12 data assimilation steps respectively for the parameter spread to shrink to the standard deviation limit. Take

$ \mathrm{\alpha }=1.1 $

for example, even though the parameter spread maintains its value after it reaches the threshold value 0.016, the parameter estimation can still correct the parameter bias effectively, such that the error decreases until it converges to the true value in the 7th month of the 2nd year. This is different from the case

$ \mathrm{\alpha }=1.0 $

in Fig. 8a, in which the parameter ensemble spread can not match the state ensemble spread. The cases

$ \mathrm{\alpha }=1.2 $

and 1.3 show similar results. By checking the ensemble members, it is found that the ensembles in cases with

$ \mathrm{\alpha }=1.1,\;1.2,\;1.3$

are more reliable than the case with

$ \mathrm{\alpha }=1.0 $

when the ensemble spread shrinks to its threshold value, therefore the filter can further correct the parameters.

Figure 10d shows the consequence of over-inflating in PE. If the extra parameter inflation factor is too large, e.g.,

$ \mathrm{\alpha }=1.5 $

, it will inflate the parameter ensemble too much, such that the spread has no chance to approach the std limit of CCI. The uncertainty of the parameter is over-estimated, such that the observational increments bring excessive adjustments to the parameter. As shown by the figure, the parameter reduces its bias at the first stage, but it does not converge to true value even though more data assimilation cycles are involved.

It concludes that either

$ \mathrm{\alpha }=1.1,\mathrm{ }1.2 $

$ \mathrm{\alpha }=1.3 $

provides successful PE that corrects the biased bckgrnd_vdc parameter. The pure SE experiment is also conducted, in which each member uses the same biased parameter bckgrnd_vdc = 0.10 cm²/s and the same initial ensemble with PE experiment is used. Therefore one can compare the quality of analyses from SE and PE with two-stage inflation (

$ \mathrm{\alpha }=1.1 $

). As indicated by Fig. 10a, the parameter approaches the true value in the last 6 months of the second year. So the RMSEs of temperature and salinity over a specified region for those months are compared. As Fig. 3 shown, the most sensitive regions for temperature and salinity to the parameter are different. Therefore, the temperature RMSEs are computed between the equator and 60°N, while the salinity RMSEs are computed between 30°S and 30°N.

Figure 11 shows the temperature and salinity RMSE for each month using the data assimilation results from PE and SE, respectively. The monthly mean temperature and salinity are subtracted from the reference states and averaged in each layer. In each month, the PE result seems to have smaller errors than the SE result for both temperature and salinity, the advantage is more obvious in the mixed layer. With more details, Fig. 12 shows the absolute errors of the SST in the 9th month and SSS in the 7th month respectively. In the Northwest Pacific, in which SST is sensitive to the uncertainty of bckgrnd_vdc, the SST errors are significantly reduced with adjusted background vertical diffusivity coefficient. For salinity, similar error reductions can be observed in the equatorial western Pacific. It can be concluded that PE can reduce the errors introduced by the parameter bias effectively and provide analyses with high quality.

5 Summary and discussion

Less

The PE problem is extremely complicated since it is nonlinear even if the dynamical model itself is linear, and it is in many cases difficult to find a proper solution. It is more challenging when dealing with CGCMs. The state augmentation technique, in which the state vectors are augmented with the poorly known parameters, provides a general framework to handle the parameter estimation problem using data assimilation methods. To apply the coupled PE in CGCMs, several issues are reported in previous studies, and many new techniques are developed in recent years (Zhang et al., 2020). For example, Zhang et al. (2012) introduced the DAEPC to enhance the quality of state-parameter covariance. Liu et al. (2014a, b) developed the ASA to update the single-value parameter with localized observations. Askoy et al. (2006) use the CCI to prevent parameter variance narrowing.

This paper focuses on the constant evolution model for parameters during model integration. It is one of the most widely used assumptions in PE that parameter remains unchanged during the state model integration. This assumption is introduced in the early stage of PE using the data assimilation method ( Evensen et al., 1998; Kivman, 2003; Annan and Hargreaves, 2004; Annan et al., 2005), and works well when the model is simple. As the state model becomes more and more complicated, this assumption brings several issues. In many cases, the parameter ensemble spread decreases with time and goes to zero eventually. To avoid that, Aksoy et al. (2006) propose the CCI in which the parameter ensemble spread is not allowed to be smaller than a prescribed threshold. However, the result in Fig. 8 shows that CCI is not able to avoid the divergence of PE in many circumstances. The lack of spread growth in parameter ensembles will make them under-inflated relative to the model state ensembles, and the underestimation of parameters uncertainty will weaken the impact of observations on correcting parameters.

Koyama and Watanabe (2010) use multiplicative inflation in PE with simple models, with an inflation factor for the parameter ensemble different from that used for the state variables. The current work extends their idea, to make it feasible for PE with CGCMs. Since the global scalar parameters are transformed to fields in the analysis step and turn back into scalar before model integration, a two-stage inflation method for parameters is designed. The first stage used Eq. (10) to inflate the parameter field, while the second stage used Eq. (11) to inflate the spatial-averaged scalar parameter. The second stage is extra inflation to compensate for constant parameter evolution during model integration, so its inflation factor can be determined according to the spread growth ratio of the observed variables, as Fig. 9 shown.

The PE results in Fig. 10 show that the two-stage inflation method with proper inflation factors can help the parameter to converge to its optimal value. Moreover, Figs 11 and 12 have compared the PE and SE results, revealing that the bias due to uncertain parameters is significantly reduced by PE, and the quality of PE analyses is better.

However, this paper use OSSE to verify the two-stage inflation method, and only a single parameter estimation is performed. The situation will be more complicated when more parameters are estimated simultaneously. The KPP parameterization in CESM also involves more parameters than the background vertical diffusion coefficient (Jochum, 2009), more attention should be paid to them. Moreover, it is even more challenging to estimate CGCM parameters using real observations (Zhao et al., 2019). As Zhang et al. (2020) reviewed, coupled PE (CPE) is still in the research stage, and the application of CPE in CGCMs with real observations to improve coupled model reanalysis and prediction still has a long way to go. This goal will be the focus of our future work.

Funding

Less

The National Key Research and Development Program under contract No. 2017YFA0604202; the Fundamental Research Funds for the Central Universities under contract No. B210201022; the National Natural Science Foundation of China under contract Nos 42176003, 41690124, 41806032 and 41806038.

References

Less

Aksoy A, Zhang Fuqing, Nielsen-Gammon J W. 2006. Ensemble-based simultaneous state and parameter estimation with MM5. Geophysical Research Letters, 33(12): L12801, doi: 10.1029/2006GL026186

Anderson J L. 2001. An ensemble adjustment Kalman filter for data assimilation. Monthly Weather Review, 129(12): 2884–2903, doi: 10.1175/1520-0493(2001)129<2884:AEAKFF>2.0.CO;2

Anderson J L. 2009. Spatially and temporally varying adaptive covariance inflation for ensemble filters. Tellus A, 61(1): 72–83, doi: 10.1111/j.1600-0870.2008.00361.x

Annan J D, Hargreaves J C. 2004. Efficient parameter estimation for a highly chaotic system. Tellus A, 56(5): 520–526, doi: 10.3402/tellusa.v56i5.14438

Annan J D, Hargreaves J C, Edwards N R, et al. 2005. Parameter estimation in an intermediate complexity Earth system model using an ensemble Kalman filter. Ocean Modelling, 8(1–2): 135–154

Banks H T. 1992. Computational issues in parameter estimation and feedback control problems for partial differential equation systems. Physica D: Nonlinear Phenomena, 60(1–4): 226–238.

Evensen G. 2003. The ensemble Kalman filter: theoretical formulation and practical implementation. Ocean Dynamics, 53(4): 343–367, doi: 10.1007/s10236-003-0036-9

Evensen G, Dee D P, Schröter J. 1998. Parameter estimation in dynamical models. In: Chassignet E P, Verron J, eds. Ocean Modeling and Parameterization. Dordrech, the Netherlands: Springer, 373–398

Ffield A, Gordon A L. 1992. Vertical mixing in the Indonesian thermocline. Journal of Physical Oceanography, 22(2): 184–195, doi: 10.1175/1520-0485(1992)022<0184:VMITIT>2.0.CO;2

Furue R, Jia Yanli, McCreary J P, et al. 2015. Impacts of regional mixing on the temperature structure of the equatorial Pacific Ocean. Part 1: vertically uniform vertical diffusion. Ocean Modelling, 91: 91–111, doi: 10.1016/j.ocemod.2014.10.002

Gaspari G, Cohn S E. 1999. Construction of correlation functions in two and three dimensions. Quarterly Journal of the Royal Meteorological Society, 125(554): 723–757, doi: 10.1002/qj.49712555417

Good S A, Martin M J, Rayner N A. 2013. EN4: quality controlled ocean temperature and salinity profiles and monthly objective analyses with uncertainty estimates. Journal of Geophysical Research: Oceans, 118(12): 6704–6716, doi: 10.1002/2013JC009067

Gordon N J, Salmond D J, Smith A F M. 1993. Novel approach to non- linear/non-Gaussian Bayesian state estimation. IEE Proceedings F: Radar and Signal Processing, 140(2): 107–113, doi: 10.1049/ip-f-2.1993.0015

Gregg M C, Sanford T B, Winkel D P. 2003. Reduced mixing from the breaking of internal waves in equatorial waters. Nature, 422(6931): 513–515, doi: 10.1038/nature01507

Jochum M. 2009. Impact of latitudinal variations in vertical diffusivity on climate simulations. Journal of Geophysical Research: Oceans, 114(C1): C01010

Karspeck A R, Danabasoglu G, Anderson J, et al. 2018. A global coupled ensemble data assimilation system using the Community Earth System Model and the Data Assimilation Research Testbed. Quarterly Journal of the Royal Meteorological Society, 144(717): 2404–2430, doi: 10.1002/qj.3308

Kivman G A. 2003. Sequential parameter estimation for stochastic systems. Nonlinear Processes in Geophysics, 10(3): 253–259, doi: 10.5194/npg-10-253-2003

Koyama H, Watanabe M. 2010. Reducing Forecast Errors Due to Model Imperfections Using Ensemble Kalman Filtering. Monthly Weather Review, 138(8): 3316–3332, doi: 10.1175/2010MWR3067.1

Large W G, McWilliams J C, Doney S C. 1994. Oceanic vertical mixing: a review and a model with a nonlocal boundary layer parameterization. Reviews of Geophysics, 32(4): 363–403, doi: 10.1029/94RG01872

Liu Yun, Liu Zhengyu, Zhang Shaoqing, et al. 2014a. Ensemble-based parameter estimation in a coupled GCM using the adaptive spatial average method. Journal of Climate, 27(11): 4002–4014, doi: 10.1175/JCLI-D-13-00091.1

Liu Yun, Liu Zhengyu, Zhang Shaoqing, et al. 2014b. Ensemble-based parameter estimation in a coupled general circulation model. Journal of Climate, 27(18): 7151–7162, doi: 10.1175/JCLI-D-13-00406.1

Liu J, West M. 2001. Combined parameter and state estimation in simulation-based filtering. In: Sequential Monte Carlo Methods in Practice. New York, NY, USA: Springer, 197–223

Ruiz J J, Pulido M, Miyoshi T. 2013. Estimating model parameters with ensemble-based data assimilation: a review. Journal of the Meteorological Society of Japan, 91(2): 79–99, doi: 10.2151/jmsj.2013-201

Santitissadeekorn N, Jones C. 2015. Two-stage filtering for joint state-parameter estimation. Monthly Weather Review, 143(6): 2028–2042, doi: 10.1175/MWR-D-14-00176.1

Smith R, Jones P W, Briegleb B, et al. 2010. The Parallel Ocean Program (POP) reference manual, ocean component of the community climate system model (CCSM). Technical Report LAUR-10-01853. Boulder, CO, USA: National Centre for Atmosphere Research

Tong Mingjing, Xue Ming. 2008. Simultaneous estimation of microphysical parameters and atmospheric state with simulated radar data and ensemble square root Kalman filter. Part I: sensitivity analysis and parameter identifiability. Monthly Weather Review, 136(5): 1630–1648, doi: 10.1175/2007MWR2070.1

Wu Xinrong, Han Guijun, Zhang Shaoqing, et al. 2016. A study of the impact of parameter optimization on ENSO predictability with an intermediate coupled model. Climate Dynamics, 46(3–4): 711–727

Wu Xinrong, Zhang Shaoqing, Liu Zhengyu, et al. 2013. A study of impact of the geographic dependence of observing system on parameter estimation with an intermediate coupled model. Climate Dynamics, 40(7): 1789–1798

Zhang Shaoqing, Liu Zhengyu, Rosati A, et al. 2012. A study of enhancive parameter correction with coupled data assimilation for climate estimation and prediction using a simple coupled model. Tellus A, 64(1): 10963, doi: 10.3402/tellusa.v64i0.10963

Zhang Shaoqing, Liu Zhengyu, Zhang Xuefeng, et al. 2020. Coupled data assimilation and parameter estimation in coupled ocean–atmosphere models: a review. Climate Dynamics, 54(11–12): 5127–5144

Zhang Xuefeng, Zhang Shaoqing, Liu Zhengyu, et al. 2016. Correction of biased climate simulated by biased physics through parameter estimation in an intermediate coupled model. Climate Dynamics, 47(5–6): 1899–1912

Zhao Yuchu, Liu Zhengyu, Zheng Fei, et al. 2019. Parameter optimization for real-world ENSO forecast in an intermediate coupled model. Monthly Weather Review, 147(5): 1429–1445, doi: 10.1175/MWR-D-18-0199.1

Zhu Yuchao, Zhang Ronghua. 2018. An Argo-derived background diffusivity parameterization for improved ocean simulations in the Tropical Pacific. Geophysical Research Letters, 45(3): 1509–1517, doi: 10.1002/2017GL076269

Appendix

Less

Year 2022 volume 41 Issue 2

PDF

Cite this Article

BibTeX

Article Info

doi: 10.1007/s13131-021-1856-5

Receive Date：2021-02-10
Online Date：2025-11-20
Published：2022-02-25

Article Data

Affiliations

History

Received：2021-02-10
Accepted：2021-05-20

Funding

The National Key Research and Development Program under contract No. 2017YFA0604202; the Fundamental Research Funds for the Central Universities under contract No. B210201022; the National Natural Science Foundation of China under contract Nos 42176003, 41690124, 41806032 and 41806038.

Affiliations

¹ Key Laboratory of Marine Hazards Forecasting of Ministry of Natural Resources, Hohai University, Nanjing 210098, China

² College of Oceanography, Hohai University, Nanjing 210098, China

³ Southern Marine Science and Engineering Laboratory (Zhuhai), Zhuhai 519080, China

⁴ Environmental Science and Engineering, University of Northern British Columbia, Prince George V2N 4Z9, Canada

Corresponding:

* ytang@sio.org.cn

References

Share

https://castjournals.cast.org.cn/joweb/aos/EN/10.1007/s13131-021-1856-5

Share to

Scan QR to access full text

Cite this article

BibTeX

Citations

表12种不同金属材料的力学参数

科 Family	属数 Number of genus	种数 Number of species	占总种数比例 Percentage of total species (%)	属 Genus	种数 Number of species	占总种数比例 Percentage of total species (%)
鹅膏菌科Amanitaceae	2	11	5.26	鹅膏菌属 Amanita	10	4.78
小菇科 Mycenaceae	2	12	5.74	丝盖伞属 Inocybe	5	2.39
多孔菌科 Polyporaceae	8	14	6.70	蜡蘑属 Laccaria	5	2.39
红菇科 Russulaceae	3	23	11.00	小皮伞属 Marasmius	6	2.87
				小菇属 Mycena	11	5.26
				光柄菇属 Pluteus	5	2.39
				红菇属 Russula	17	8.13
				栓菌属 Trametes	5	2.39

关闭全屏

BibTeX
EndNote
RefWorks
TxT

Fig. 1. Flow diagram of OSSE for state and parameter estimation. The unit of bckgrnd-vdc is cm²/s.

Fig. 2. TS profile locations (a) and depths (b) of the period from January 1, 2010 to January 10, 2010. For a better display, only the depths of the first 500 observations are shown in b.

Fig. 3. SST spread (left) and SSS spread (right) of the 1st (a, b), 4th (c, d), 7th (e, f), 10th (g, h) month from the parameter sensitivity experiment.

Fig. 4. RMSEs of prior and posterior innovations of the temperature (left) and salinity (right) for depths 10 m (a, b), 50 m (c, d), 200 m (e, f), and 1 000 m (g, h).

Fig. 5. RMSE of SST (a,b), SSS (c,d), and SSH (e,f) from control run (left) and state estimation (right) of the first 6 months.

Fig. 6. The ratio of posterior error standard deviation to prior error standard deviation of the parameter ensemble (a); the posterior ensemble member 1–3 (b–d). The quantities are computed for the first parameter estimation step.

Fig. 7. The spatial varying inflation values for temperature (left) and salinity (right) in 5 m (a, b), 100 m (c, d) and 200 m (e, f) depth; the inflation values for the two-dimensional parameter field for the first analysis step (g).

Fig. 8. The evolution of the parameter mean (red solid line) plus/minus one standard deviation (red dotted lines) over the data assimilation period (a); the parameter error (red line) and ensemble spread (green line) over the data assimilation period (b).

Fig. 9. The global mean temperature (left) and salinity (right) spread in each data assimilation step at the depth 5 m (a, b), 50 m (c, d), and 200 m (e, f), each dashed line connects the posterior of the previous step and the prior of the current step. And the temperature (g) and salinity (h) spread growth ratio at the first step.

Fig. 10. The evolution of the parameter mean (red solid line) and spread (red dotted lines) over the data assimilation period using extra parameter inflation with $ \mathrm{\alpha }=1.1 $ (a), 1.2 (b), 1.3 (c), and 1.5 (d), respectively.

Fig. 11. The temperature (upper) and salinity (bottom) RMSE for the last 6 months of the 2nd year using the data assimilation results from parameter estimation and pure state estimation, respectively.

Fig. 12. The absolute errors of the SST in the 9th month from state estimation (a) and parameter estimation (b); the absolute errors of the SSS in the 7th month from state estimation (c) and parameter estimation (d).

Articles: Latest Articles; Most Read; Collections

Updates: Events; News; Multimedia

About: About Us

Contact

No. 86 Xueyuan South Road, Haidian District, Beijing

100081

010-62199257

qkjq@cast.org.cn

Copyright © 2025 China Association for Science and Technology. All rights reserved. For all open access content, the relevant licensing terms apply.
Sponsored by the Office of the Leading Group for Cybersecurity and Informatization of CAST, and supported by Science and Technology Review Publishing House