## Lecture 2 on forecast downscaling


Probabilistic climate prediction. Basics and different approaches to probabilistic multimodel prediction

Fig. 1. (a) Historical performance of the ECHAM3 model for the August–October season compared to observations averaged over the Indonesia region (10°S–20°N, 95°–140°E).

The open blue circles show the model anomaly for individual ensemble members (expressed as a percentage of the long-term mean), solid blue circles show the ensemble mean, and red crosses indicate the observed anomaly. The green circles are for the current forecast, and the solid green circle represents the ensemble mean. The gray-shaded area indicates the range of the near-normal tercile based on the climatological period 1961–90. The numbers at the top of the graph indicate the correlation between the ensemble mean simulation and the observed anomalies (R) and the tercile hit score (P). (b) Distribution of forecast members for August–October 1997 (open green bars) relative to the climatological distribution (solid blue bars). From Mason et al. (1999).
Error of a single ensemble member prediction

$$err = err_m + err_r$$

where err_m is the difference between the ensemble mean (deep blue point) and the observation (red plus), and err_r is the difference between the ensemble member forecast (light blue point) and the ensemble mean (deep blue point).
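The two error components defined above can be checked numerically. The following is a minimal sketch, assuming made-up anomaly values; the function name `decompose_errors` and all numbers are hypothetical and do not come from Mason et al. (1999).

```python
from statistics import mean

def decompose_errors(members, observation):
    """Split each member's error into a shared part (err_m) and a member-specific part (err_r)."""
    ens_mean = mean(members)
    err_m = ens_mean - observation              # ensemble mean minus observation
    err_r = [x - ens_mean for x in members]     # each member minus ensemble mean
    total = [x - observation for x in members]  # each member minus observation
    return ens_mean, err_m, err_r, total

# Hypothetical anomalies, in percent of the long-term mean
members = [12.0, -5.0, 3.0, 8.0, -1.0]
observation = 6.0

ens_mean, err_m, err_r, total = decompose_errors(members, observation)
for e_total, e_r in zip(total, err_r):
    # The total error of each member equals err_m + err_r
    print(f"total={e_total:+.1f}  err_m={err_m:+.1f}  err_r={e_r:+.1f}")
```

By construction the err_r values sum to zero over the ensemble, so averaging the members removes err_r and leaves only err_m.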
Ensemble mean forecast

Probabilistic versus deterministic

Fig. 2. Ensemble mean Xm and spread of ensemble member forecasts. Y is observation.

##### CATEGORICAL PROBABILISTIC FORECAST FOR TERCILE CATEGORIES: ABOVE NORMAL, NEAR NORMAL, BELOW NORMAL.

Estimation of climatological terciles.

Fig. 3. Climatological terciles. Empirical PDF.
If $x^r$ is the hindcast sample $x$ ranked in ascending order, then $X_b = x^r_{i=N/3}$ and $X_a = x^r_{i=2N/3}$, where $N$ is the hindcast sample size.
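The ranking rule above can be sketched in a few lines. This is only an illustration: the function name `empirical_terciles` is hypothetical, ranks are taken as 1-based, and rounding N/3 and 2N/3 to the nearest integer when N is not divisible by 3 is an assumption, not something the lecture specifies.

```python
def empirical_terciles(hindcast):
    """Return (X_b, X_a): the values at ranks N/3 and 2N/3 of the sorted hindcast sample."""
    ranked = sorted(hindcast)          # x^r: hindcast values in ascending order
    n = len(ranked)
    # 1-based ranks; rounding is an assumed convention for N not divisible by 3
    i_b = max(1, round(n / 3))
    i_a = max(1, round(2 * n / 3))
    return ranked[i_b - 1], ranked[i_a - 1]

# Hypothetical hindcast sample of N = 30 values (30.0, 29.0, ..., 1.0)
hindcast = [float(v) for v in range(30, 0, -1)]
x_b, x_a = empirical_terciles(hindcast)
print(x_b, x_a)
```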

Fig. 4. Climatological terciles. Gaussian PDF.

Fig. 5. Probabilistic forecast of tercile categories. Empirical PDF.

Fig. 6. Probabilistic forecast of tercile categories. Gaussian PDF.

##### A single model ensemble forecast.

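Boundaries of the categories are defined using hindcast data, and the forecast probability of each category is estimated as the portion of the cumulative probability of the forecast sample associated with this category. In particular, given Gaussian fitting and tercile categories, the lower ($X_b$) and upper ($X_a$) terciles are estimated as $\mu - 0.43\sigma$ and $\mu + 0.43\sigma$, respectively, with $\mu$ and $\sigma$ being the mean and standard deviation estimated using hindcast data. Meanwhile, forecast probabilities for the tercile categories BN, NN, and AN are estimated as

$$P(BN) = F(X_b), \quad P(NN) = F(X_a) - F(X_b), \quad P(AN) = 1 - F(X_a),$$

where $F$ is the Gaussian probability distribution function fitted to the forecast sample, with $\mu_f$ and $\sigma_f$ being the forecast ensemble mean and standard deviation:

$$F(x) = \frac{1}{\sigma_f \sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\frac{(t - \mu_f)^2}{2\sigma_f^2}\right) dt \tag{1}$$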
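The Gaussian procedure described above can be illustrated with the Python standard library. This is a sketch under stated assumptions: the function name `gaussian_tercile_probs` and the sample values are hypothetical, and using the sample standard deviation (`statistics.stdev`, divisor n - 1) is an assumed choice.

```python
from statistics import NormalDist, mean, stdev

def gaussian_tercile_probs(hindcast, forecast):
    """Tercile boundaries from a Gaussian fit to the hindcast; probabilities from a Gaussian fit to the forecast."""
    clim = NormalDist(mean(hindcast), stdev(hindcast))
    x_b = clim.inv_cdf(1 / 3)   # lower tercile, about mu - 0.43 * sigma
    x_a = clim.inv_cdf(2 / 3)   # upper tercile, about mu + 0.43 * sigma

    fcst = NormalDist(mean(forecast), stdev(forecast))
    p_bn = fcst.cdf(x_b)        # P(BN) = F(X_b)
    p_an = 1.0 - fcst.cdf(x_a)  # P(AN) = 1 - F(X_a)
    p_nn = 1.0 - p_bn - p_an    # P(NN) = F(X_a) - F(X_b)
    return {"BN": p_bn, "NN": p_nn, "AN": p_an}

# Hypothetical standardized anomalies
hindcast = [-1.2, 0.4, 0.9, -0.3, 1.5, -0.8, 0.1, 0.6, -1.6, 0.4]
forecast = [0.9, 1.4, 0.7, 1.8, 1.1]

probs = gaussian_tercile_probs(hindcast, forecast)
print({k: round(v, 3) for k, v in probs.items()})
```

Because the forecast members here sit well above the hindcast mean, most of the probability falls in the AN category.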

##### Pooling

Among the methods of multimodel combination that do not weight models by their past forecast skill, the one most widely used in both operational and research practice is pooling. That is, all the (bias-corrected) ensemble members from all participating models are pooled into a single sample with equal weights. This approach is run operationally by the Meteorological Service of Canada (MSC). MSC uses four models, each producing 10 ensemble members in both the forecast and (yearly) hindcast datasets (WMO CBS Report 2007).
Differences in individual model ensemble sizes do not prevent the use of pooling. In this case the model weights become proportional to their ensemble sizes (Robertson et al. 2004).
The main restriction on pooling arises from the way forecast probabilities are estimated: the climatological PDF is constructed from hindcast data, while the forecast PDF is constructed from forecast data. Using both datasets requires the model weights to be consistent between the hindcast and the forecast. Otherwise, the climatological PDF is dominated by the subset of models with the larger hindcast ensembles, while the forecast PDF is dominated by a different subset of models with the larger forecast ensembles.
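The weight-consistency issue can be made concrete with a small sketch. Everything here is hypothetical: the model names, the member values, and the helper names `pool` and `implied_weights`. The sketch only shows how ensemble sizes translate into implicit model weights when members are pooled.

```python
def pool(ensembles):
    """Concatenate (bias-corrected) members of all models into one equally weighted sample."""
    return [x for members in ensembles.values() for x in members]

def implied_weights(ensembles):
    """Weight each model effectively receives in the pooled sample: its share of the members."""
    total = sum(len(members) for members in ensembles.values())
    return {name: len(members) / total for name, members in ensembles.items()}

# Hypothetical case: the two models swap ensemble sizes between hindcast and forecast
hindcast = {"model_A": [0.1, -0.4], "model_B": [0.3, -0.2, 0.8, -0.9]}
forecast = {"model_A": [0.5, 0.9, -0.1, 0.7], "model_B": [-0.6, 0.2]}

w_hind = implied_weights(hindcast)
w_fcst = implied_weights(forecast)
print("hindcast weights:", w_hind)
print("forecast weights:", w_fcst)
print("pooled forecast sample:", pool(forecast))
```

In this made-up case model_B dominates the pooled hindcast (and hence the climatological PDF), while model_A dominates the pooled forecast, which is the inconsistency described above.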

##### Averaging of individual model forecasts

In some situations the model weights in the hindcast and forecast datasets are inconsistent.
In such a situation, it is reasonable to estimate a probabilistic forecast for each model separately and then combine the resulting forecasts. In particular, the European multi-model Seasonal to Inter-annual Prediction (EUROSIP) system, which consists of three models, each with 41 ensemble members in the forecast and 11-15 ensemble members in the (yearly) hindcast datasets, produces its multimodel forecast as an average of the individual model probabilistic forecasts. Note that the forecast ensemble sizes of all three models are equal (41 members), so a simple average with equal weights is the most reasonable method of multimodel combination.
However, there are situations in which inconsistency between the model weights in the hindcast and forecast datasets is combined with a substantial difference in model ensemble sizes. In such situations it is reasonable to use the Total Probability Formula (TPF).
Total probability formula
Example 1.1.
We have M dice of different sizes, so the probability of picking a big die is larger than the probability of picking a small one. All the dice are loaded, so the probability of rolling a 5 is, in general, not equal to 1/6 for them. The total probability of rolling a 5 after picking one die at random and tossing it is
$$P(5) = \sum_{j=1}^{M} P(die_j)\, P(5 \mid die_j) \tag{1.1}$$
That is, the total probability is equal to the sum over all the dice (j = 1…M) of the products of the unconditional probability of $die_j$ and the probability of a 5 conditioned on $die_j$. In other words, we sum over all the dice the probability of picking $die_j$, which depends on its size, multiplied by the probability of a 5 for this particular $die_j$.
Note that the resulting $P(5)$ may differ from all M individual probabilities of a 5 for the dice, i.e. from every $P(5 \mid die_j)$. It is instead a weighted average over all the dice, with weights $P(die_j)$. If all the dice are of the same size and all are fair, i.e. $P(die_j) = 1/M$ and $P(5 \mid die_j) = 1/6$ for all M dice, the total probability of a 5 is 1/6.
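As a numerical illustration of Example 1.1, assume (hypothetically) M = 2 dice: a big one picked with probability 0.7 that shows a 5 with probability 0.1, and a small one picked with probability 0.3 that shows a 5 with probability 0.3. These numbers are invented for the illustration only.

$$P(5) = 0.7 \times 0.1 + 0.3 \times 0.3 = 0.07 + 0.09 = 0.16$$

The result, 0.16, equals neither of the two conditional probabilities (0.1 and 0.3) and lies closer to that of the big die, which is picked more often.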

Example 1.2.
We have M models, each of which predicts event A with its own probability. The total probability of event A predicted by the M models is
$$P(A) = \sum_{j=1}^{M} P(mdl_j)\, P(A \mid mdl_j) \tag{1.2}$$
where $mdl_j$ denotes the j-th model.
Clearly, averaging is a special case of the TPF in which all the $P(mdl_j)$ are equal.
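The relation between the TPF and simple averaging can be sketched as follows. This is an illustration, not the lecture's implementation: the function name `total_probability`, the model names, and the probabilities are hypothetical, and taking $P(mdl_j)$ proportional to the forecast ensemble size is an assumption made for the example.

```python
CATEGORIES = ("BN", "NN", "AN")

def total_probability(model_probs, weights):
    """Combine per-model category probabilities: sum_j P(mdl_j) * P(A | mdl_j)."""
    total_w = sum(weights.values())  # normalize so that the P(mdl_j) sum to 1
    return {
        cat: sum(weights[m] / total_w * model_probs[m][cat] for m in model_probs)
        for cat in CATEGORIES
    }

# Hypothetical tercile forecasts from two models
model_probs = {
    "mdl1": {"BN": 0.2, "NN": 0.3, "AN": 0.5},
    "mdl2": {"BN": 0.5, "NN": 0.3, "AN": 0.2},
}

# Assumed weights: proportional to forecast ensemble sizes (10 and 30 members)
tpf = total_probability(model_probs, {"mdl1": 10, "mdl2": 30})

# Equal weights reduce the TPF to a simple average of the model forecasts
avg = total_probability(model_probs, {"mdl1": 1, "mdl2": 1})

print("TPF:    ", tpf)
print("average:", avg)
```

With unequal weights the combined forecast moves toward the model with the larger ensemble, whereas equal weights reproduce the plain average.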
Methods of multimodel combination weighted by the skill of past forecasts will be discussed in lectures 5 and 6.