Project on Prediction of climate variations on seasonal to interannual timescales (PROVOST)
EU contract ENV4-CT95-0109
ISSN 1399-1949 (Online)
The working hypothesis for seasonal forecasting is that relatively slow fluctuations in the atmosphere's lower boundary (primarily in sea surface temperature) can modulate the otherwise chaotic internal variability of the atmosphere. One of the principal objectives of the PROVOST project has been to quantify atmospheric seasonal predictability by running ensembles of simulations where state-of-the-art atmospheric climate and weather prediction models have been forced by observed SST. The simulations cover the 15-year ECMWF re-analysis period, 1979-93.
The level of skill of the simulations relative to the ECMWF re-analyses gives an estimate of the potential level of skill of a coupled ocean-atmosphere seasonal forecast system. Until the recent development of coupled ocean-atmosphere models, seasonal forecasts have almost exclusively been based on empirical methods where forecasts of precipitation, temperature, etc. are derived statistically from a set of predictors which typically include observed sea surface temperature (SST) anomalies.
One contribution from DMI to the PROVOST project is a comparison of the level of skill between one such empirical method and the level of skill of the PROVOST simulations. The comparison is done for fields of precipitation and 500 hPa geopotential height during the 1979-93 period. The fields are (1) specified statistically from observed SST using canonical correlation analysis (CCA) and (2) simulated by the ECMWF model from which data is easily accessible through the "Seasonal Simulations CD-ROM" (Becker, 1998).
It is shown in this report that when skill is averaged over time and space there is little difference between the performance of the two methods. However, there are substantial differences in both temporal evolution and spatial distribution of skill scores between the two methods.
A second contribution from DMI to the PROVOST project is the development of a method which combines the dynamical and statistical approaches. This is done by statistically correcting the output from the dynamical model, i.e. essentially producing a model output statistics forecast.
Seasonal predictability in Europe is of special interest to the PROVOST project, but Europe is also known as a region where seasonal predictability is generally low. This is also reflected in the uncoupled PROVOST simulations. A third contribution from DMI to the project is an analysis of relations between near-global SST anomalies and model simulated temperature in Europe (in 850 hPa) in summer and winter. The aim of this analysis is to identify regions in the ocean where SST has an impact on seasonal temperature in Europe.
DMI's contributions to the final PROVOST report are described in the following sections. Section 2 contains a brief description of the datasets used in the analyses in the subsequent sections. It is explained in Section 3 how fields of precipitation and 500 hPa geopotential height can be specified from observed SST using canonical correlation analysis, and it is shown how the skill of this statistical specification compares with the skill of dynamically simulated fields for the 1979-93 period. In Section 4, statistical methods (based on singular value decomposition analysis of the cross-covariance between model simulated and observed precipitation) are used to linearly correct model simulated precipitation. It is demonstrated that this kind of statistical post-processing improves the model performance in the rainy seasons in many regions, including Europe. Section 5 shows that global SST, particularly strong ENSO-related SST, has a systematic impact on model simulated 850 hPa temperature in Europe in the ECMWF uncoupled PROVOST simulations. Concluding remarks and outlook are presented in Section 6. Finally, Section 7 contains a list of PROVOST-related publications from DMI.
A second set of model simulations is used for comparison. It comprises three 34-year AMIP-type integrations using the ECHAM4 model at T30 horizontal resolution (Moron et al., 1998).
Dynamical climate models are immensely more expensive to run on a computer than linear regression models, but do they give more accurate seasonal forecasts? In the following a comparison between the predictive skill of the two methods is described for fields of precipitation in selected regions (Northeast Brazil, North America and Europe) and 500 hPa geopotential height (Northern Hemisphere) for which reasonably good quality observational data is available for a period longer than the 15-year PROVOST period, 1979-93. The longer period is necessary in order to have a suitable training period for the linear regression model. The training period has been chosen to run from 1961 until 1993 which represents a compromise between a long period and a period during which high quality data is available.
The dynamical simulations are run with observed SST, i.e. they are not forecasts. In order not to make an unfair comparison, the statistical "predictions" of precipitation and geopotential height are calculated for simultaneous SST and are referred to as specifications rather than predictions. Note that the statistical specifications contain no information about cause and effect. They are only based on empirical relationships between SST anomalies and precipitation or geopotential height.
The statistical specification equations are derived using canonical correlation analysis (CCA) of SST anomaly predictor fields and precipitation or geopotential height anomaly predictand fields, and the method follows closely that used by NCEP/Climate Prediction Center for their operational seasonal forecasts (Barnston, 1994; Barnston and Smith, 1996). Very briefly, the technique is to expand the predictor and predictand fields in terms of a few (no more than five or six) canonical patterns and derive a linear regression model which specifies the predictand from projections of the predictor field onto the canonical predictor patterns.
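The CCA-based specification can be illustrated with a minimal NumPy sketch. It is not the code used for this report: the function names, the EOF pre-filtering via SVD, and the default truncations (`n_eof`, `n_modes`) are assumptions made for illustration only.

```python
import numpy as np

def fit_cca_specification(X, Y, n_eof=3, n_modes=2):
    """X: (years, predictor points), Y: (years, predictand points)."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xa, Ya = X - x_mean, Y - y_mean
    # EOF pre-filtering: the columns of U are orthonormal PC time series
    Ux, sx, Vxt = np.linalg.svd(Xa, full_matrices=False)
    Uy, _, _ = np.linalg.svd(Ya, full_matrices=False)
    Px, Py = Ux[:, :n_eof], Uy[:, :n_eof]
    # CCA of the retained PCs: SVD of their cross-correlation matrix
    A, rho, _ = np.linalg.svd(Px.T @ Py)
    A = A[:, :n_modes]
    # Weights projecting a predictor anomaly onto the canonical predictor patterns
    W = (Vxt[:n_eof].T / sx[:n_eof]) @ A
    # Regression of predictand anomalies on the canonical predictor time series
    C = (Px @ A).T @ Ya
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "C": C,
            "rho": rho[:n_modes]}

def specify(model, x_new):
    """Specify the predictand field from one predictor field."""
    return model["y_mean"] + (x_new - model["x_mean"]) @ model["W"] @ model["C"]
```

Here `rho` holds the canonical correlations of the retained modes; keeping only a few modes is what limits the regression to the large-scale, statistically robust part of the SST signal.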
The verification period for the empirical specifications coincides with the 15-year PROVOST period so that the skill of the empirical specifications can be compared directly with the skill of the PROVOST simulations. To verify the empirical specifications on independent data, one year (between 1979 and 1993) is repeatedly excluded from the training dataset and the specification is verified for that year (cross-validation; Michaelsen, 1987).
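The leave-one-out procedure can be sketched as follows. For brevity a one-predictor linear regression stands in for the CCA model; that substitution and the function names are assumptions for illustration.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def cross_validated_specifications(years, x, y, verify_years):
    """Specify each verification year from a model trained without that year."""
    out = {}
    for held_out in verify_years:
        train = [i for i, yr in enumerate(years) if yr != held_out]
        slope, intercept = fit_line([x[i] for i in train],
                                    [y[i] for i in train])
        out[held_out] = intercept + slope * x[years.index(held_out)]
    return out
```

With `years` running from 1961 to 1993 and `verify_years = range(1979, 1994)`, each of the 15 specifications is based on the remaining 32 years.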
The first example is for rainfall in northeast Brazil in the MAM rainy season. Seasonal rainfall in this region has previously been demonstrated to be highly predictable and closely linked to SST anomalies in both the tropical Pacific and the tropical Atlantic Oceans (Hastenrath and Greischar, 1993; Hastenrath et al., 1984; Ward and Folland, 1991). Figure 1 shows the pattern anomaly correlation coefficient (ACC) for both empirically specified and dynamically simulated rainfall (using ECMWF's PROVOST simulations). There appears to be general, although not perfect, agreement between the ACCs for the two different methods, and on average the performance of the two methods is similar when based on ACC skill scores.
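For reference, a pattern ACC for a single season can be computed as in the sketch below. The report does not state which ACC variant is used; the centred form (spatial mean removed) without area weighting is assumed here.

```python
import math

def pattern_acc(forecast_anom, observed_anom):
    """Centred pattern anomaly correlation over a list of grid points."""
    n = len(forecast_anom)
    f_mean = sum(forecast_anom) / n
    o_mean = sum(observed_anom) / n
    f = [v - f_mean for v in forecast_anom]
    o = [v - o_mean for v in observed_anom]
    cov = sum(a * b for a, b in zip(f, o))
    return cov / math.sqrt(sum(a * a for a in f) * sum(b * b for b in o))
```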
If we consider the performance in individual grid points there are, however, obvious differences. Figure 2, which shows skill scores based on the linear error in probability space (LEPS; Ward and Folland, 1991; Potts et al., 1996), indicates that the dynamical model definitely simulates rainfall more accurately than the empirical specification in the eastern part of the region (including the Nordeste region).
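A LEPS score for one forecast-observation pair can be sketched as below, assuming the revised continuous form of Potts et al. (1996). The empirical cumulative probability helper is an illustrative assumption, not necessarily how probabilities were obtained for Fig. 2.

```python
def leps_score(p_forecast, p_observed):
    """Revised LEPS score; arguments are cumulative probabilities in [0, 1]."""
    pf, pv = p_forecast, p_observed
    return 3.0 * (1.0 - abs(pf - pv) + pf ** 2 - pf + pv ** 2 - pv) - 1.0

def cumulative_probability(value, climatology):
    """Empirical position of value within a climatological sample (assumed)."""
    return sum(c <= value for c in climatology) / len(climatology)
```

The score ranges from -1 (opposite extremes) to 2 (correctly specified extreme), so a correct specification of an unusual season is rewarded more than a correct specification of a near-normal one. A skill score is then obtained by normalising the sum of scores over the verification years.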
The second example is for precipitation in Europe in the JFM winter season. Also in this case there are winters (e.g. 1992 and 1993) in which precipitation is well reproduced by empirical specification from SST (Fig. 3). In those winters the empirical specification gives much better results than the dynamical model simulation, but in other winters the dynamical model is more skilful than the empirical specification. On average, the empirical method scores slightly higher ACCs than the dynamical model, but the difference is probably not significant (not tested).
The geographical distributions of skill for the two methods are more or less orthogonal: the empirical method is skilful in southern Europe, while the dynamical model is skilful in northern Europe (Fig. 4).
For a third region, North America, where precipitation in winter is known to be moderately influenced by the state of ENSO, the dynamical model is on average slightly more skilful than the empirical method. The dynamical model outperforms the empirical method especially in the southern part of the continent (Fig. 5).
ACC and LEPS skill scores for 500 hPa geopotential height (Z500) in the Northern Hemisphere (25°N-70°N) in the DJF season are shown in Figs. 6 and 7. As for the precipitation examples (Figs. 1-4), skill scores for the empirical specification are comparable to skill scores for the dynamical model. The highest ACC skill scores (Fig. 6) are obtained for the dynamical model in DJF 1982/83 and 1988/89 which were both winters with a very strong SST forcing in the tropical Pacific, but the average ACC for the empirical method is marginally higher than the average ACC for the dynamical model.
Both approaches give little or no skill in the Atlantic/European region (Fig. 7); the dynamical model is good in the eastern part of the north Pacific, while the empirical method specifies Z500 well around the southeastern part of the U.S.
This section describes the results of specifying (observed) precipitation from model simulated precipitation. That is, the model simulated precipitation, which depends nonlinearly on SST anomalies, is statistically corrected using a technique which is very similar to the empirical specification described in the previous section. This is essentially a model output statistics (MOS) system. In the examples in the following, singular value decomposition analysis (SVDA) is used throughout, but tests have shown that statistical correction based on CCA gives similar results (see Feddersen et al., 1999).
The SVDA pattern analysis shows that the model in many regions, particularly in the tropics, is capable of simulating a correct temporal evolution of the leading precipitation anomaly patterns (see Fig. 8, showing time series of model simulated and observed rainfall anomalies in northeast Brazil projected onto the leading SVDA patterns), i.e. the model atmosphere responds at the correct points in time to variations in SST anomalies, e.g. during ENSO events. But the simulated precipitation anomaly patterns in many regions are geographically shifted, or otherwise corrupted, when compared to observed precipitation anomaly patterns. This leads to a degradation of model skill when skill scores are based on the model performance in individual grid points. The examples in this section show how a correction based on linear regression of the time series of the leading SVDA modes can be applied to model simulated precipitation to reduce model errors and increase skill scores.
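The correction can be sketched in NumPy as follows. This is an illustrative reading of the method, not the report's code: the function names are hypothetical, and the regression is fitted without an intercept on the assumption that the inputs are anomalies with zero time mean.

```python
import numpy as np

def fit_svda_correction(model_anom, obs_anom, n_modes=1):
    """Inputs are (years, grid points) anomaly arrays."""
    n = model_anom.shape[0]
    # Cross-covariance between simulated and observed anomalies
    C = model_anom.T @ obs_anom / (n - 1)
    U, _, Vt = np.linalg.svd(C, full_matrices=False)
    U, V = U[:, :n_modes], Vt[:n_modes].T
    a = model_anom @ U          # model SVDA time series
    b = obs_anom @ V            # observed SVDA time series
    # Regress each observed time series on the matching model time series
    slopes = (a * b).sum(axis=0) / (a * a).sum(axis=0)
    return U, V, slopes

def apply_svda_correction(model_anom, U, V, slopes):
    """Replace the model patterns by the observed patterns, mode by mode."""
    return ((model_anom @ U) * slopes) @ V.T
```

If the model has the right time evolution but a shifted pattern, the corrected field carries the model's time series on the observed pattern, which is the situation the paragraph above describes.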
Figure 9 shows heterogeneous correlation patterns (Wallace et al., 1992) for the first SVDA mode of simulated and observed rainfall in northeast Brazil in the rainy MAM season. Heterogeneous correlation patterns are maps of correlation coefficients between a predictor SVDA time series (e.g. the time series denoted "ensemble mean" in Fig. 8) and a predictand field (e.g. observed rainfall anomaly in northeast Brazil) and vice versa. Thus, the heterogeneous correlations indicate how well one field can be specified from the SVDA time series corresponding to the other field.
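A heterogeneous correlation pattern is simply a map of correlations between one time series and every grid point of the other field. A minimal sketch (the function name is hypothetical):

```python
import numpy as np

def correlation_map(time_series, field):
    """Correlate a (years,) series with each column of a (years, points) field."""
    ts = time_series - time_series.mean()
    fa = field - field.mean(axis=0)
    cov = ts @ fa
    return cov / np.sqrt((ts @ ts) * (fa * fa).sum(axis=0))
```

Passing the model SVDA time series together with the observed field gives one heterogeneous pattern; swapping the roles gives the other.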
The two correlation patterns in Fig. 9 are similar with positive correlations in the bulk part of northeast Brazil and negative correlations only in the northwesternmost and southwesternmost parts of the shown region (the observational dataset does not contain data in ocean regions, so that part of the left heterogeneous correlation pattern which is over the ocean is not considered here). The gradient between positive and negative correlations along the north coast is, however, shifted southeastwards in the simulation pattern (Fig. 9a) compared to the position in the observational pattern (Fig. 9b).
The effect of correcting the model simulated rainfall using only the first SVDA mode is evident in Fig. 10, which shows LEPS skill scores for simulated MAM rainfall in northeast Brazil before and after correction. Skill scores are dramatically increased near the north coast where the SVDA correlation patterns (Fig. 9) showed significant differences. The LEPS skill scores are calculated in cross-validation mode, i.e. the linear regression model which is used for the correction is based on a 14-year training period which specifically excludes the year for which rainfall is corrected. For the direct model output, cross-validation means calculating anomalies as the difference between simulated rainfall for the year being verified and model climatology based on simulated rainfall in the remaining 14 years. The difference between anomalies calculated this way and anomalies which are all based on the same 15-year model climatology is, however, marginal.
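The two ways of forming anomalies can be compared directly. The sketch below (function names are illustrative) shows that a cross-validated anomaly equals the full-period anomaly scaled by n/(n-1), i.e. 15/14 for the PROVOST period, which is consistent with the difference being marginal.

```python
def cross_validated_anomalies(values):
    """Anomaly of each year relative to the mean of all other years."""
    n = len(values)
    total = sum(values)
    return [v - (total - v) / (n - 1) for v in values]

def full_period_anomalies(values):
    """Anomaly of each year relative to the mean of all years."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]
```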
For precipitation in Europe in the JFM season, Fig. 11 shows the heterogeneous correlation patterns for the first SVDA mode. Both patterns are dominated by a north-south dipole, but there are many differences on smaller scales. Also, the zero contour is further south in the simulation pattern. The corresponding time series (Fig. 12) show good agreement between observations and ensemble mean, but there is a very large spread between the individual members of the ensemble. Nevertheless, the corrected model simulation (based on the first two SVDA modes) is overall more skilful than the raw model output (Fig. 13). The skill is increased particularly in southern Europe.
For precipitation in North America in the JFM season, Figure 14 shows good agreement between the two heterogeneous correlation patterns for the first SVDA mode, and there is also very good agreement between the corresponding time series (Fig. 15). ENSO clearly affects the first SVDA mode. There is not much room for improvement using the first two SVDA modes. Figure 16 shows LEPS skill scores after statistical correction of the PROVOST model simulations.
In order to check whether 15-year training periods are sufficiently long to get reliable statistical results, statistical corrections are considered for precipitation simulated in a different set of experiments by the ECHAM4 model. The ECHAM4 simulations (Moron et al., 1998) comprise an ensemble of three AMIP-type integrations covering the 34-year period 1961-94.
Statistical corrections of the ECHAM4 simulations have been compared for combinations of long (1961-94) and short (1979-93) training periods and for long and short cross-validation periods. For all three examples considered so far in this report, i.e. precipitation in northeast Brazil (MAM), Europe (JFM) and North America (JFM), it was found that varying training and verification periods did not affect the main results. In fact the highest skill scores were in all three cases obtained for the short training period and the short cross-validation period, i.e. corresponding exactly to the training and cross-validation periods for the PROVOST simulations. It was found (but not shown here) that precipitation in the PROVOST simulations in general is more accurate than in the ECHAM4 simulations. This is probably due to a combination of the higher horizontal resolution of the ECMWF model used for the PROVOST integrations (T63 vs. T30), short lead-time and three times as many ensemble members.
Motivated by the strong temperature signal seen in Europe in recent seasonal forecasts (ECMWF forecasts for, e.g., JJA 1997 and DJF 1997/98) using a coupled ocean-atmosphere model, possible links between SST anomalies and simulated 850 hPa temperature (T850) in Europe have been studied. As in the previous section the model simulations are taken from ECMWF's uncoupled PROVOST simulations.
The consistency between individual members of the ensemble of simulated T850 in Europe varies considerably from year to year. In some years the individual ensemble members are highly consistent. Fig. 17 shows histograms of anomaly correlation coefficients between every pair of ensemble members (there are 36 such pairs for an ensemble of 9 members) for the July-August (JA) season 1982 and for the January-February (JF) season 1983. A nonparametric statistical test [Wilcoxon-Mann-Whitney test; Wilks (1995)] confirms that the distributions shown in Fig. 17 are both significantly shifted relative to the distribution one gets if all other JA seasons or JF seasons in the 1979/80-93 period are pooled together. Similarly, ACC distributions of T850 ensemble members are positively shifted in JA 1979, 88 and 93 and in JF 1985 and 89.
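The set of pairwise ACCs behind each histogram can be generated as in the following sketch. A centred, unweighted ACC is assumed, and the function names are illustrative.

```python
import math
from itertools import combinations

def pattern_acc(a, b):
    """Centred pattern correlation between two anomaly fields (lists)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    cov = sum(x * y for x, y in zip(da, db))
    return cov / math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))

def pairwise_accs(members):
    """ACC for every unordered pair of ensemble members."""
    return [pattern_acc(a, b) for a, b in combinations(members, 2)]
```

For 9 members `combinations` yields 9 x 8 / 2 = 36 pairs, matching the count quoted above.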
The relatively high internal consistencies in those cases are most likely the result of SST forcing of the model atmosphere. So in order to check whether SST anomalies in any particular area are dominant when T850 internal ensemble consistency in Europe is high, absolute values of SST anomalies are composited for the above-mentioned years. Absolute values are chosen so that, e.g., warm and cold ENSO events which may both lead to increased internal ensemble consistency of T850 in Europe, will not cancel when composited. The JA season composite is not dominated by any strong SST anomalies, although there are indications of a connection between JA T850 internal ensemble consistency in Europe and onsetting ENSO events (Fig. 18).
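The reason for compositing absolute values can be seen in a small sketch (the function name and the toy fields are illustrative): a warm and a cold anomaly of equal size at the same grid point cancel in an ordinary composite but not in a composite of absolute values.

```python
def composite(fields, absolute=True):
    """Average a list of anomaly fields grid point by grid point."""
    n = len(fields)
    return [sum(abs(v) if absolute else v for v in point) / n
            for point in zip(*fields)]

warm_event = [1.0, 0.2]    # toy SST anomalies at two grid points
cold_event = [-1.0, 0.2]
abs_composite = composite([warm_event, cold_event])
plain_composite = composite([warm_event, cold_event], absolute=False)
```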
For the JF season the SST anomaly composites are clearly dominated by a strong ENSO signal (Fig. 19). This is hardly surprising as all three years in the composite are ENSO years. But note that internal ensemble consistency was not increased during other ENSO events such as those of 1986/87 and 1991/92. Thus, the occurrence of an ENSO event is not a sufficient condition for increased internal ensemble consistency.
In a separate approach, global SST anomaly patterns are related to the model simulated ensemble mean of T850 in Europe using CCA. The SST anomaly field is divided into three two-monthly segments, one simultaneous with the T850 period (JA or JF) and two prior to the T850 period in order to capture possible lagged connections between SST and T850. The resulting leading mode CCA SST correlation patterns and time series are shown in Figs. 20-22.
The patterns in Fig. 20 show a temporal development in most of the Pacific Ocean which agrees well with SST anomaly patterns in the onsetting stages of a generic warm ENSO event (Harrison and Larkin, 1998). The associated SST time series in Fig. 22a shows that the leading CCA mode is most pronounced in 1982, 88, 92 and 93. Except for 1992 these are also years in which the ensemble of simulated T850 in Europe shows a high internal consistency. In 1992 SST anomalies near Indonesia were much weaker than in the other three years, so if SST anomalies in this region are critical to the model's atmospheric response in Europe, this could explain why the model fails to produce an internally consistent T850 response in Europe in 1992.
In the extratropics, correlations between SST anomalies and the leading CCA mode of T850 in Europe in JA are generally weaker than in the Tropics. In particular, SST anomalies in the oceans immediately adjacent to Europe, the northeastern Atlantic and the Mediterranean, do not correlate with T850 in Europe.
The patterns in Fig. 21 agree very well with the SST patterns in both the Pacific and the Indian Oceans of a generic warm ENSO event (Harrison and Larkin, 1998). The fact that the CCA picks out the ENSO signal so clearly suggests that ENSO almost certainly has an impact on T850 in Europe in the JF season in the ensemble simulations. The leading mode time series (Fig. 22b) shows that the patterns in Fig. 21 are most prominent in 1982/83 and in 1988/89. In agreement with this, the T850 internal ensemble consistency in Europe is high in JF in both 1983 (Fig. 17b) and in 1989. The occurrence of a moderately strong ENSO event (e.g. in 1986/87), however, does not guarantee high T850 internal ensemble consistency.
Figure 23 shows correlations between the leading mode time series for the SST anomaly field and global model simulated T850 although CCA is only applied to T850 in Europe (inside the box on the maps). Areas of high correlations (positive or negative) in the global patterns in Fig. 23 are areas where simulated seasonal T850 variations are linearly related to those SST variations which by construction are linearly related to simulated seasonal T850 variations in Europe. In the JA season correlations are negative in all of Europe (Fig. 23a). The correlations are most negative in the southeastern and westernmost parts. Thus simulated T850 will tend to be below (above) normal during an onsetting warm (cold) ENSO event (when the SST anomalies correspond to the correlation patterns in Fig. 21) in those parts of Europe. The leading CCA mode for SST is also inversely related to simulated T850 in large parts of North America, the northern Atlantic and the western part of Asia. The general agreement over most of the Pacific Ocean between the T850 and SST correlation maps shows that the simulated T850 tends to follow the prescribed SST in this region.
In the JF season the T850 correlation pattern (Fig. 23b) shows high positive values in southern Europe and contrasting high negative values in northern Scandinavia and the Barents Sea. That is, during warm (cold) ENSO events the model atmosphere responds with mild (cold) winters in southern Europe and cold (mild) winters in northern Scandinavia and the Barents Sea. The high correlations in southern Europe are part of a band of high correlations which stretches from the central and eastern equatorial Pacific northeastwards across the northern tropical Atlantic, northwest Africa and the Mediterranean to parts of central Asia. The well established connection between ENSO and a north-south temperature anomaly dipole in North America (Ropelewski and Halpert, 1986; Livezey et al., 1997) is also evident in the global T850 correlation pattern in Fig. 23b.
The results so far in this section have been based entirely on model simulations. It is, of course, also of interest how well the model simulated T850 in a particular season compares with T850 in the real atmosphere in the same season. Here this is done by comparing two ACC distributions. One distribution is for ACCs between pairs of ensemble members of model simulated T850 in Europe; the other distribution is for ACCs between T850 ensemble members and the T850 ECMWF reanalysis. A Wilcoxon-Mann-Whitney test is applied to the two sets of obtained ACCs to test the null hypothesis that they are realisations of the same distribution. The resulting position of the test statistic in probability space is shown in Fig. 24 for each year in the JA and JF seasons. Values near zero occur when the ACCs between reanalysed T850 and the ensemble of simulated T850 are significantly smaller than the ACCs between pairs of simulated T850 within the ensemble. Values near 1 occur when the opposite is the case, while values near 0.5 indicate no significant difference between ACCs. The null hypothesis is rejected at the 5% level if the test statistic exceeds one of the dashed lines in Fig. 24.
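One way to place the Wilcoxon-Mann-Whitney statistic in probability space is sketched below. The report does not say how the probability was obtained; the normal approximation without tie correction used here, and the function name, are assumptions.

```python
import math

def wmw_probability(sample_a, sample_b):
    """Position in [0, 1] of the Mann-Whitney U statistic of sample_a
    relative to sample_b, using a normal approximation (assumed)."""
    n1, n2 = len(sample_a), len(sample_b)
    # U counts pairs in which the value from sample_a is the larger one
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0
            for a in sample_a for b in sample_b)
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

With `sample_a` holding the ACCs between ensemble members and the reanalysis and `sample_b` the ACCs between pairs of members, values below 0.025 or above 0.975 would correspond to rejection at the two-sided 5% level under this approximation.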
Figure 24 shows that the internal ensemble ACC distribution is significantly different from the observation-simulation ACC distribution in JA 1979 (Fig. 24a) and in JF 1983 and 1985 (Fig. 24b). In all three cases ACCs between simulations within an ensemble are higher than ACCs between the same simulations and observations, i.e. the model is not reliable for Europe in the cases where the simulations show the most consistent T850 anomaly field in Europe. Similar tests (not shown) for the entire Northern Hemisphere between 20°N and 70°N show much better agreement between internal T850 ensemble consistency and model skill, i.e. Europe is one of the more problematic areas in the Northern Hemisphere for the model.
Comparison of dynamically simulated precipitation with empirically specified precipitation has shown that ECMWF's uncoupled atmospheric PROVOST simulations for prescribed SST are slightly more skilful than CCA-based specification of seasonal precipitation from simultaneous SST anomalies. For a different set of simulations (three 34-year AMIP-type integrations of the ECHAM4 model at T30 horizontal resolution) precipitation is less skilful than for the PROVOST simulations and for some regions also less skilful than the empirical specifications.
Skill scores for empirically specified Z500 in DJF in the Northern Hemisphere have been shown to be comparable to skill scores for dynamically simulated Z500 (ECMWF's uncoupled PROVOST simulations). There are regional differences in skill, but neither method is capable of reproducing observed Atlantic/European Z500. The dynamical simulations appear to be more accurate when there is a strong ENSO forcing of the atmosphere, but on average the empirical method appears to be as good as the dynamical method in the Northern Hemisphere.
A statistical correction procedure based on SVDA has been applied to model simulated precipitation. The correction can improve the model skill when the simulated precipitation anomaly patterns are geographically shifted relative to observed anomaly patterns while at the same time the model responds more correctly in time. For precipitation this is a situation which frequently happens, and it has been demonstrated for several regions, including Europe, that model skill is increased after statistical correction. The statistically corrected model precipitation is more accurate relative to observations than empirically specified precipitation, also for the low-resolution ECHAM4 simulations. For large-scale fields such as geopotential height or mean sea level pressure there is apparently not much to gain from statistical correction.
Diagnostic analyses of model simulated T850 in Europe (ECMWF's uncoupled PROVOST simulations) have shown that strong ENSO forcing almost certainly has an impact on model simulated T850 in Europe in winter (the January-February "mid winter" season). In the model, warm (cold) ENSO events are associated with mild (cold) conditions in southern Europe and cold (mild) conditions in northern Scandinavia and around the Barents Sea.
A high internal ensemble consistency for model simulated T850 in Europe has been seen also in non-ENSO years, particularly in the July-August "high summer" season. This suggests that SST patterns not related to ENSO exist which have a systematic impact on model simulated T850 in Europe. Those SST patterns have not been identified, probably because they are less frequent than ENSO-related SST patterns and therefore are not captured by compositing or CCA applied to only 14 or 15 years of data.
Comparison of the internal ensemble consistency with the agreement between ensemble members and ECMWF reanalysis data shows that there are several years, including a number of the strong ENSO years, in which the two are significantly different in Europe, i.e. the model ensemble does not encompass the reanalysis. It is likely that a multi-model ensemble would not suffer to the same extent from this problem, as this single-model ensemble is clearly biased under certain SST forcings.
The results of the statistical corrections of model simulated seasonal precipitation are encouraging, and hopefully it will be possible to test the corrections on a coupled model seasonal forecast system in the years to come.
One aspect which is not taken into account in the statistical correction method is the spread between individual ensemble members. Seasonal forecasts are conveniently presented as probability forecasts where the probabilities are based on the different behaviour of the members of an ensemble of simulations. Preferably, a statistically corrected forecast should also be presented in terms of probabilities. The method which has been developed during the PROVOST project has been applied to ensemble means (in which the ensemble spread is only taken indirectly into account in the sense that high spread generally results in near-climatological ensemble means). Further work is needed in order to make proper use in the statistical corrections of the information which is contained in the spread between ensemble members.
Feddersen, H., SVD analysis and statistical post-processing of seasonal climate simulations, in Proceedings of the twenty-second annual climate diagnostics and prediction workshop, pp. 61-64, 1997.
Feddersen, H., SVD pattern analysis and statistical post-processing of simulated seasonal precipitation, in Proceedings of WMO International Workshop on Dynamical Extended Range Forecasting, WMO/TD 881, pp. 92-95, 1998.
Feddersen, H., Navarra, A. and Ward, M. N., Reduction of model systematic error by statistical correction for dynamical and seasonal predictions, J. Climate,
Feddersen, H., Impact of global SST on summer and winter temperature in Europe in a set of seasonal ensemble simulations, Q. J. R. Meteorol. Soc., Submitted, 1999.
Statistical Analysis and Post-Processing of Uncoupled PROVOST Simulations