Training Lecture 5 on forecast downscaling




Probabilistic downscaling: Bayes theorem: basics, likelihood function, multivariable case

Vladimir Kryjov



1. Uncalibrated probabilistic forecast.

A probabilistic forecast may be uncalibrated (uncorrected): whatever probabilities of a, n, b the distribution of the ensemble members of the model(s) shows, those are the probabilities we issue as our forecast of A, N, B.

1.1. Single model. If we have a single model, it is straightforward: say 10 ensemble members are distributed as follows: 5 in a, 3 in n, 2 in b. We then say that the probabilities are 0.5 for A, 0.3 for N, 0.2 for B.
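As a minimal sketch, the single-model case above (the counts are the ones from Point 1.1) amounts to turning ensemble-member counts into relative frequencies:

```python
# Uncalibrated single-model probabilities: count ensemble members
# per category and divide by the ensemble size (counts from Point 1.1).
counts = {"a": 5, "n": 3, "b": 2}           # members falling in each category
total = sum(counts.values())                # 10 ensemble members
probs = {k: v / total for k, v in counts.items()}
print(probs)                                # {'a': 0.5, 'n': 0.3, 'b': 0.2}
```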

1.2. Multi-Model

1.2.1. “Pooling”. If we have several models, we may “pool” them and then process them as a single model (see Point 1.1). Say, we have 3 models with 9, 20, and 10 ensemble members, respectively. We “pool” all the ensemble members and get a “single model” with 39 ensemble members. Then we proceed exactly as in Point 1.1.

1.2.2. Total probability formula. This is the approach we have implemented: each model is analysed separately, and then the probabilities are combined.

2. Calibrated probabilistic forecast. We may calibrate both single-model and multi-model forecasts. There are many methods, e.g. Zwiers and Kharin, the FSU method, and many others. The Bayes approach is one of them!

We apply the Bayes theorem to correct (calibrate) the predicted PROBABILITIES according to the “skill” estimated over the training period.

2.1. Single model. Estimate the probabilities of a, n, b, then apply the Bayes theorem and get the probabilities of A, N, B.

2.2. Multi-Model.

2.2.1. “Pooling”. Pool several models into a “single” one, then proceed as in Point 2.1.

2.2.2. Our approach: Bayes-based calibration of each model separately, then combination as in Point 1.2.2.

So, Bayes may work in both single- and multi-model cases.

There are many realizations of the Bayes approach. They differ from each other in the “likelihood” term, in the simplifications made, etc. That is why many papers describe different methods but call all of them “Bayes”. We will discuss these matters in our lectures.


To avoid any confusion, let's denote some abstract events by the letters U and V. The Bayes formula is derived from the definition of conditional probability.

The conditional probability of U given V is (assuming P(V) > 0)

P(U/V) = P(UV) / P(V)

The conditional probability of V given U is (assuming P(U) > 0)

P(V/U) = P(VU) / P(U)

Also, please note that P(UV) = P(VU).

So that (assuming both P(V) > 0 and P(U) > 0):

P(U/V) P(V) = P(V/U) P(U)

Dividing both sides by P(V), we get the Bayes formula:

P(U/V) = P(V/U) P(U) / P(V)

If we assume that the sample space may be divided into M mutually exclusive parts U1, U2, …, UM, we get the Bayes theorem:

P(Ui/V) = P(V/Ui) P(Ui) / P(V)

where P(V) is the total probability of V (you know the total probability formula),

P(V) = P(V/U1) P(U1) + P(V/U2) P(U2) + … + P(V/UM) P(UM)

and the Bayes theorem may be written as

P(Ui/V) = P(V/Ui) P(Ui) / [P(V/U1) P(U1) + … + P(V/UM) P(UM)]      (2.6)
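The Bayes theorem in the form (2.6) can be sketched as a small function (the function name and list-based interface are illustrative, not from the text):

```python
def posterior(priors, likelihoods):
    """Bayes theorem (2.6): a posteriori probabilities P(Ui/V)
    from a priori probabilities P(Ui) and likelihoods P(V/Ui).

    The denominator is the total probability of V:
    P(V) = sum_k P(V/Uk) P(Uk).
    """
    p_v = sum(l * p for l, p in zip(likelihoods, priors))
    return [l * p / p_v for l, p in zip(likelihoods, priors)]


# With equal priors and likelihoods 1/2 and 1/3 (as in Example 2.1 below),
# the posteriors come out close to 0.6 and 0.4.
post = posterior([0.5, 0.5], [1 / 2, 1 / 3])
print([round(x, 3) for x in post])
```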

Usually V is called the “event” and Ui a “hypothesis”.

In this formula (2.6), the term P(Ui) is called the a priori probability of hypothesis Ui; P(V/Ui) is called the “likelihood”; and P(Ui/V) is called the a posteriori probability of hypothesis Ui.

What do these things mean? Let's consider an example (this example appears in almost all textbooks).

Example 2.1.

We have two identical-looking coins, C1 and C2. We take one of them at random, toss it, and get a “head”. What is the probability that we have tossed C1? C2?

Let C1 be hypothesis U1, C2 be hypothesis U2, and “head” be event V. Since the coins look identical, we assume that the probability of picking C1 is 1/2 and the probability of picking C2 is 1/2, i.e., P(U1) = 1/2; P(U2) = 1/2.

This is the a priori probability – our knowledge before the tests: we have not tested the coins and believe them to be equal.

Then we toss each coin separately many times and find that C1 shows “heads” with probability 1/2 (it is a fair coin), while C2 shows “heads” with probability only 1/3 (it is a biased coin). This is the likelihood – our knowledge obtained in the tests: P(V/U1) = 1/2; P(V/U2) = 1/3.

In fact, the likelihood may be treated as the answer to the question: how frequent was the event V (“head”) when hypothesis Ui (C1 or C2) held?

Now, intuitively we may expect that if we get a “head”, the probability that we tossed C1 is larger than the probability that we tossed C2.

However, intuition is only intuition. Let's make a quantitative estimate.

Let's exploit our new knowledge obtained in the tests (P(V/U1) = 1/2; P(V/U2) = 1/3) to correct our a priori knowledge (P(U1) = 1/2; P(U2) = 1/2), which we had before the tests. In doing so we obtain a posteriori knowledge (i.e., knowledge after the tests). The a posteriori probability of C1 is

P(U1/V) = P(V/U1) P(U1) / [P(V/U1) P(U1) + P(V/U2) P(U2)] = (1/2 · 1/2) / (1/2 · 1/2 + 1/3 · 1/2) = (1/4) / (5/12) = 3/5 = 0.6

The a posteriori probability of C2 is

P(U2/V) = P(V/U2) P(U2) / [P(V/U1) P(U1) + P(V/U2) P(U2)] = (1/3 · 1/2) / (5/12) = (1/6) / (5/12) = 2/5 = 0.4

Please pay attention that the a posteriori probabilities sum to one: 3/5 + 2/5 = 1.

Also please pay attention that the denominators are the same – the total probability of the event V.

Thus, the estimated a posteriori probabilities support our intuitive conclusion: P(C1/“head”) > P(C2/“head”).
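Example 2.1 can be checked numerically with the values given in the text:

```python
# Example 2.1: two coins, a "head" is observed.
p_prior = [0.5, 0.5]         # P(U1), P(U2): a coin chosen at random
likelihood = [1 / 2, 1 / 3]  # P(V/U1), P(V/U2): probability of "head" per coin

# Total probability of "head": 1/2*1/2 + 1/3*1/2 = 5/12
p_v = sum(l * p for l, p in zip(likelihood, p_prior))

# A posteriori probabilities: 3/5 and 2/5
post = [l * p / p_v for l, p in zip(likelihood, p_prior)]
print([round(x, 3) for x in post])   # [0.6, 0.4] -> P(C1/head) > P(C2/head)
```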

Let’s return to our probabilistic forecast.

Let’s assume that we have one model. It predicts probabilities of a, n, b.

We want to know the future probabilities of A, N, B – the a posteriori probabilities that account for the tests.

Thus, the hypotheses are Ei – A, N, B in the observations; the events are ei – a, n, b in the model forecasts (hindcasts).

Our “tests” are the hindcasts over the training period, from which we obtain P(a/A) – an estimate of the likelihood – i.e., how successfully the model had predicted above (a) when above (A) was actually observed.
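As a sketch of how such a likelihood could be estimated, consider a hindcast contingency table over the training period (all the counts below are hypothetical; the table layout and function name are illustrative, not from the text):

```python
# Hypothetical hindcast contingency table over a training period.
# Rows: observed category (A, N, B); columns: forecast category (a, n, b).
table = {
    "A": {"a": 12, "n": 5, "b": 3},
    "N": {"a": 6, "n": 10, "b": 4},
    "B": {"a": 2, "n": 5, "b": 13},
}

def likelihood(pred, obs):
    """Estimate P(pred/obs): the frequency of forecast category `pred`
    in the years when category `obs` was observed."""
    row = table[obs]
    return row[pred] / sum(row.values())

# P(a/A): how often the model forecast "above" when "above" was observed.
print(round(likelihood("a", "A"), 3))   # 12/20 = 0.6
```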

Task 2.1.

Please write the Bayes theorem formula for P(A/a). Please write the numerical values of P(A) and P(a) for the hindcast period – the a priori probabilities of A and a.

Task 2.2.

Please write the Bayes theorem formula for P(Ei/ek).