LECTURE 1: Subtlety of statistical applications in climate research (What we must but often forget to do)
In general, downscaling from model outputs to a target station (region) climate variable consists of three steps:
1. Assessment of the “relationships” between the observed target variable at the target point (region) and model outputs in order to select statistically significant “links”.
2. Development of the “forecasting” equation, which links the target variable with the selected model output items, to be used for prediction.
3. Forecast of the target variable at the target point (region) using the developed regression equation and the selected model output items as predictors.
Let us discuss the pitfalls we may encounter at each step.
Assessment of the “relationships” between the observed target variable at the target point (region) and model outputs in order to select statistically significant “links”.
We may use correlation analysis (the most common choice), composite analysis, etc. In all these methods we analyse statistical relationships between a series of the target variable and a huge number of model output series (the number of points in a global map at 2.5° × 2.5° resolution exceeds 10000).
Let us use a correlation analysis as an example. As a result of correlation analysis we obtain a set of correlation maps between the series of the target variable at the target station (point) and all the series of the model outputs at all the points. These maps look like the maps shown in Fig. 1.
Fig.1. Examples of correlation maps
Now, we may decide that there is a statistically significant link between our target variable at the target station (point) and model outputs in the selected areas, find the regions of the largest correlation, and go to Step 2 (development of the forecasting equation).
However, we MUST be sure that the selected links were not obtained by chance.
Local hypotheses (correlation at each grid point) – we test ~ 10000 local hypotheses.
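Under H0L (the true correlation at the grid point is zero) the statistic
$$t = r \sqrt{\frac{N-2}{1-r^2}}$$
where r is the sample correlation coefficient and N is the length of the series, has Student’s t distribution with N-2 degrees of freedom (DOF). So, we may set a significance level at, say, 5% and assess whether our correlation is significant or not.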
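As a small illustration of the local test, the sketch below computes the t statistic from a sample correlation, assuming the standard form t = r·sqrt((N-2)/(1-r²)). The function names are hypothetical, and the critical value 2.048 (two-sided 5%, 28 DOF) is taken from standard Student's t tables rather than from the lecture.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for a sample correlation r estimated from n pairs (|r| < 1)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def critical_correlation(t_crit, n):
    """Smallest |r| that is significant, given a tabulated critical t value."""
    return t_crit / math.sqrt(t_crit * t_crit + (n - 2))

# Example: r = 0.5 estimated from N = 30 values.
# 2.048 is the tabulated two-sided 5% critical t for 28 DOF (an input, not computed here).
t = correlation_t_statistic(0.5, 30)
r_crit = critical_correlation(2.048, 30)
print(round(t, 3), round(r_crit, 3))  # t is above 2.048, so H0L would be rejected
```

Note that N here is the raw series length; with serially correlated series it should be replaced by the effective number of DOF.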
IMPORTANT: we must account for serial correlation in the series and use the effective number of DOF, which depends on r_x and r_y, the autocorrelation functions of the correlated series x and y.
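The lecture's own formula for the effective number of DOF is not reproduced here. As a hedged sketch, the code below uses one commonly quoted form, N_eff = N / (1 + 2 Σ r_x(τ) r_y(τ)), summed over lags τ = 1..max_lag. Both this choice of formula and the truncation lag `max_lag` are assumptions, and the function names are hypothetical.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of the series x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var

def effective_sample_size(x, y, max_lag):
    """Assumed form: N_eff = N / (1 + 2 * sum_k r_x(k) * r_y(k)), k = 1..max_lag."""
    n = len(x)
    s = sum(autocorrelation(x, k) * autocorrelation(y, k)
            for k in range(1, max_lag + 1))
    denom = 1.0 + 2.0 * s
    if denom <= 1.0:
        return float(n)  # no net positive persistence: do not inflate DOF beyond N
    return n / denom

# A strongly trending series is highly persistent, so N_eff is well below N = 8.
x = [1, 2, 3, 4, 5, 6, 7, 8]
n_eff = effective_sample_size(x, x, max_lag=1)
```

The resulting N_eff would then replace N in the t statistic and in the DOF of Student's t distribution.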
Field significance test
Necessary explanation: Let us analyse what “correlation is significant at the 5% level” means. By definition, 5% is the probability of an error of the first kind (Type I error): rejection of H0 when it is true. This can be illustrated with correlations between “white noise” series. Let’s estimate 10000 correlation coefficients between series of white noise, which are independent (and uncorrelated) by definition. We will get a PDF of correlation coefficients similar to that shown in Fig. 2.
Fig 2. PDF of correlation coefficient
Most of the correlations will be close to 0. However, about 5% (500) of them will appear significant, just by chance.
We have a map of approximately 10000 correlation coefficients. Even if all the grid point series were independent, about 500 of the correlation coefficients would be expected to appear significant by chance.
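The white-noise argument can be checked numerically. The sketch below is only an illustration: it uses 2000 pairs of series of length 30 (both numbers are arbitrary choices, smaller than the 10000 in the text to keep it fast) and the tabulated 5% critical correlation for N = 30, about 0.361.

```python
import math
import random

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def false_alarm_fraction(n_series=2000, n_time=30, r_crit=0.361, seed=0):
    """Fraction of independent white-noise pairs whose |r| exceeds r_crit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_series):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_time)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n_time)]
        if abs(pearson_r(x, y)) > r_crit:
            hits += 1
    return hits / n_series

fraction = false_alarm_fraction()
print(fraction)  # expected to be close to 0.05
```

Although every pair is independent by construction, roughly one correlation in twenty still passes the local 5% test.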
So, let’s formulate global hypotheses.
H0G : all H0L are true
H1G : at least one H0L is false
If all the grid point series were independent, it is a Bernoulli scheme (coins, dice, etc.).
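The probability of obtaining at least m rejections in n independent tests follows from the binomial distribution:
$$P(M \ge m) = \sum_{k=m}^{n} \binom{n}{k} p^{k} (1-p)^{n-k}$$
p – theoretical probability of rejecting H0L at a single grid point when it is true (the local significance level, e.g. 0.05), i.e. the expected value of m/n under H0G
n – total number of grid points
m – number of grid points for which H0L was rejected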
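For independent grid points the binomial tail probability can be evaluated directly. The sketch below assumes the upper-tail form P(M ≥ m) and sums the terms in log space, since the binomial coefficients for n around 10000 overflow ordinary floating point. The function name and the example count of 550 rejections are hypothetical.

```python
import math

def binomial_tail(m, n, p):
    """P(M >= m) for M ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    total = 0.0
    for k in range(m, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log(1.0 - p))
        total += math.exp(log_pmf)
    return total

# Hypothetical example: 550 local rejections out of 10000 independent grid
# points, each tested at the 5% level (about 500 are expected by chance).
p_field = binomial_tail(550, 10000, 0.05)
print(p_field)
```

This shortcut holds only under the independence assumption; for real, spatially correlated fields it understates how easily a large m arises by chance.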
However, grid point series are not independent, so we have to apply a Monte Carlo method.
We scramble the model output fields several hundred times, say, 1000 times. Each time we estimate the correlation at each grid point and count the number of points where H0L is rejected (M).
The significance level (one-tailed) at which H0G is rejected is estimated as the number of M’s that exceed the M obtained from the original series, divided by the number of Monte Carlo trials.
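The Monte Carlo procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: "scrambling" is implemented as one random permutation of the time order applied to all grid points at once (which preserves spatial correlation but destroys serial correlation), ties are counted as exceedances (a slightly conservative choice), and the local test uses a fixed critical correlation `r_crit`. All names are hypothetical, and constant series are not handled.

```python
import math
import random

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def count_rejections(target, field, r_crit):
    """Number of grid points where the local hypothesis H0L is rejected."""
    return sum(1 for series in field if abs(pearson_r(target, series)) > r_crit)

def field_significance(target, field, r_crit, n_trials=1000, seed=0):
    """Return (M for the original data, Monte Carlo significance level of H0G)."""
    rng = random.Random(seed)
    m_obs = count_rejections(target, field, r_crit)
    order = list(range(len(target)))
    exceed = 0
    for _ in range(n_trials):
        rng.shuffle(order)  # same time permutation for every grid point
        scrambled = [[series[t] for t in order] for series in field]
        if count_rejections(target, scrambled, r_crit) >= m_obs:
            exceed += 1
    return m_obs, exceed / n_trials

# Synthetic check: 20 grid points that all contain the target signal plus noise.
rng = random.Random(1)
target = [rng.gauss(0.0, 1.0) for _ in range(30)]
field = [[v + rng.gauss(0.0, 1.0) for v in target] for _ in range(20)]
m_obs, p_global = field_significance(target, field, r_crit=0.361, n_trials=200)
```

Here `field` is a list of grid-point time series; H0G would be rejected when `p_global` falls below the appointed significance level.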
We have the right to go to Step 2 only if H0G is rejected at the appointed significance level. Otherwise, we may obtain “significant” results for the training period but fail in real time forecasts.
Development of the “forecasting” equation, which links the target variable with selected model output items, to be used as predictors. Those could be time series of model predicted variables from certain grid points, time series of area averaged model predicted variable values, time series of various modes from PC, MCA, CCA, etc.
When constructing the forecasting equation we carefully check the significance of the contribution of each predictor (we may use ANOVA); this is usual practice.
What we tend to forget is to assess serial correlation in the residuals. It is important for both multiple regressions and ordinary single-predictor regressions.
Figure: scatter plot with a hook
Serial correlation in the regression residuals $e_t$ ($t = 1, \dots, T$) can be checked with the Durbin–Watson statistic
$$d = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2}$$
which lies between 0 and 4. A value of 2 indicates that there appears to be no autocorrelation. If the Durbin–Watson statistic is substantially less than 2, there is evidence of positive serial correlation. As a rough rule of thumb, if Durbin–Watson is less than 1.0, there may be cause for alarm. Small values of d indicate that successive error terms are, on average, close in value to one another, or positively correlated. Large values of d indicate that successive error terms are, on average, much different in value from one another, or negatively correlated.
To test for positive autocorrelation at significance $\alpha$, the test statistic d is compared to lower and upper critical values ($d_{L,\alpha}$ and $d_{U,\alpha}$):
• If $d < d_{L,\alpha}$, there is statistical evidence that the error terms are positively autocorrelated.
• If $d > d_{U,\alpha}$, there is statistical evidence that the error terms are not positively autocorrelated.
• If $d_{L,\alpha} < d < d_{U,\alpha}$, the test is inconclusive.
To test for negative autocorrelation at significance $\alpha$, the test statistic $(4 - d)$ is compared to lower and upper critical values ($d_{L,\alpha}$ and $d_{U,\alpha}$):
• If $(4 - d) < d_{L,\alpha}$, there is statistical evidence that the error terms are negatively autocorrelated.
• If $(4 - d) > d_{U,\alpha}$, there is statistical evidence that the error terms are not negatively autocorrelated.
• If $d_{L,\alpha} < (4 - d) < d_{U,\alpha}$, the test is inconclusive.
The critical values, $d_{L,\alpha}$ and $d_{U,\alpha}$, vary by level of significance ($\alpha$), the number of observations, and the number of predictors in the regression equation (Table 1).
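The Durbin–Watson check can be sketched in a few lines, assuming the usual definition of d as the sum of squared successive differences of the residuals divided by the sum of squared residuals. The function names are hypothetical, and the bounds 1.1 and 1.4 in the example are illustrative placeholders, not values from Table 1.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic d of a sequence of regression residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

def positive_autocorrelation_test(d, d_lower, d_upper):
    """Three-way decision of the test for positive autocorrelation."""
    if d < d_lower:
        return "positively autocorrelated"
    if d > d_upper:
        return "not positively autocorrelated"
    return "inconclusive"

# Residuals that stay on one side of zero for long runs give a small d.
residuals = [1.0, 1.2, 0.9, 0.8, -0.7, -1.1, -0.9, -1.0]
d = durbin_watson(residuals)
# 1.1 and 1.4 are placeholder bounds; real ones come from a Durbin-Watson table.
verdict = positive_autocorrelation_test(d, d_lower=1.1, d_upper=1.4)
```

For the negative-autocorrelation test the same decision function can be applied to 4 - d, with the verdicts read as "negatively" instead of "positively".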
5. Comparison of correlation coefficients (Fisher’s z-transform)
Correlation coefficient PDF is not Gaussian. So, assessment of significance of the difference between correlation coefficients is rather questionable.
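However, if the correlation coefficients are transformed into a form which has a Gaussian PDF, they can be compared by means of conventional statistics.
Such a transform was suggested by Fisher:
$$z = \frac{1}{2} \ln\frac{1+r}{1-r}$$
When the true correlation is zero, this transformed variable has a Gaussian PDF with $\mu = 0$ and $\sigma = (N-3)^{-1/2}$.
When the true correlations are equal, the difference $z_1 - z_2$ of two transformed correlation coefficients, each estimated from N values, has a Gaussian PDF with $\mu = 0$ and $\sigma = (2/(N-3))^{1/2}$. If N is small, it’s reasonable to use tables for Student’s t distribution rather than for the Gaussian one.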
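A comparison of two correlation coefficients can be sketched as below, assuming the usual form of Fisher's transform, z = ½ ln((1+r)/(1-r)), and two independent samples of equal length N, so that the difference of the transformed values has σ = (2/(N-3))^½. The function names are hypothetical, and the Gaussian p-value is appropriate only when N is not small.

```python
import math
from statistics import NormalDist

def fisher_z(r):
    """Fisher's z-transform of a correlation coefficient (|r| < 1)."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

def compare_correlations(r1, r2, n):
    """Standardised difference of two z-transformed correlations and its
    two-sided Gaussian p-value, assuming independent samples of size n."""
    sigma = math.sqrt(2.0 / (n - 3))
    stat = (fisher_z(r1) - fisher_z(r2)) / sigma
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(stat)))
    return stat, p_value

# Example: r1 = 0.8 and r2 = 0.2, each estimated from N = 53 values.
stat, p_value = compare_correlations(0.8, 0.2, 53)
print(round(stat, 2), p_value)
```

With serially correlated series, N would again be replaced by the effective number of DOF before computing σ.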