Multivariate GARCH Models - A Comparative
Study of the Impact of Alternative
Methodologies on Correlation
Karl Shutes and Jacek Niklewski
February 16, 2010
Economics, Finance & Accounting Department, Coventry University, Coventry
[email protected]
[email protected]
Keywords: GARCH, multivariate GARCH, BEKK, VECH, CCC, DCC, O-GARCH, GO-GARCH
Abstract
With the growth in the requirements of the risk management industry and the complexity of instruments used in finance, there has been significant growth in the forms of multivariate GARCH models. These models now allow a significant number of dimensions to be considered, rather than the relatively small number that used to be the case. This paper examines three multivariate GARCH models for modelling conditional correlation: the Dynamic Conditional Correlation GARCH model of Engle [2002], the Generalized Orthogonal GARCH model of Broda and Paolella [2008] and the Generalized Orthogonal GARCH model of Boswijk and van der Weide [2009]. Data from the Polish Stock Exchange are considered for ten companies. The results show high volatility in conditional correlation for both GO-GARCH models, whereas the DCC estimates seem to be more stable.
1 Introduction
A number of models have arisen in light of the relative computational difficulty of estimating a multivariate GARCH model. This paper will consider the approaches taken by a number of these and compare the estimates and forecasts for each. With the current crisis of confidence in risk management and the requirements of regulators for the implementation of Basel II, there is a requirement for GARCH modelling to take explicit account of multivariate issues.
2 Literature Review
GARCH models have a distinguished history that can be traced back to Bollerslev [1986]. The emphasis in the early work was primarily on the univariate time-series properties of the data. Indeed, much of the early multivariate work also looked to reduce multivariate GARCH models to a univariate model wherever possible.
One of the most important factors in finance is risk. Risk is not directly observable, which induces problems with measuring, modelling and forecasting it. The ability to measure, model and forecast risk precisely is very important, as it is used in portfolio management, asset and option valuation, hedging and choosing an appropriate investment strategy (Brooks [2008], Piontek [2003]). There are different methods for measuring risk. GARCH and its derivative models are perhaps the most widely used and considered in the literature and in industry. This section considers a number of the main frameworks and extensions to the model, as well as a number of the known characteristics of financial data.
2.1 Features of financial data
Financial data exhibit a number of features (Brooks 2008: 380, Piontek 2004a,b):
• Volatility clustering - there are periods of high and low volatility. Large absolute returns tend to follow large absolute returns, and small absolute returns tend to follow small absolute returns.
• Leptokurtosis effect - the distribution of returns shows much fatter tails than the normal distribution assumes (i.e. the probability of rare events is much larger).
• Leverage effect - volatility tends to be larger after price falls than after price rises of identical magnitude. That is, negative and positive information have an asymmetric influence on the future level of volatility.
• Skewness - the returns distribution presents some degree of skewness.
• Autocorrelation of rates of return, especially in periods of low variability.
• Long-run memory effect - high-order autocorrelation coefficients of squared returns (errors) are significant; more precisely, the autocorrelation coefficients of squared errors sum to infinity.
2.2 Autoregressive conditionally heteroscedastic (ARCH)
This is a special class of models, very popular in financial modelling and forecasting. The model was proposed by Engle [1982], and ARCH(q) can be represented as (Yu 2002):
r_t = \mu + u_t
\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \cdots + \alpha_q u_{t-q}^2, \quad u_t \sim iid(0, \sigma_t^2)

or

r_t = \mu + \sigma_t \varepsilon_t
\sigma_t^2 = \alpha_0 + \alpha_1 (r_{t-1} - \mu)^2 + \cdots + \alpha_q (r_{t-q} - \mu)^2, \quad \varepsilon_t \sim iid(0, 1)

The conditional variance of the error depends on q lags of squared errors. The h-step-ahead forecast of volatility can be shown as (Yu 2002):

\hat\sigma_{t+h}^2 = \alpha_0 + \alpha_1 (\hat r_{t+h-1} - \mu)^2 + \cdots + \alpha_q (\hat r_{t+h-q} - \mu)^2

where
\hat r_{t+h-j} = r_{t+h-j} \quad for\ 1 \le h \le j,
(\hat r_{t+h-j} - \mu)^2 = \hat\sigma_{t+h-j}^2 \quad for\ h > j.
This model allows time-varying variances to be modelled; however, there are some limitations. Firstly, when modelling financial time series the number of lags q tends to be large. Secondly, the non-negativity constraint on the alphas (\alpha_i \ge 0 for i = 0, \ldots, q) can be violated as the number of alphas increases (Brooks 2008: 391-392, Piontek 2000). Therefore a generalized version of the ARCH model was developed by Bollerslev [1986].
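To make the recursion concrete, here is a minimal Python sketch (numpy only, hypothetical parameter values, not from the paper) that simulates the second representation above, r_t = \mu + \sigma_t \varepsilon_t, with an ARCH(1) conditional variance:

```python
import numpy as np

def simulate_arch1(T, mu=0.0, alpha0=0.1, alpha1=0.3, seed=0):
    """Simulate r_t = mu + sigma_t * eps_t with
    sigma_t^2 = alpha0 + alpha1 * (r_{t-1} - mu)^2 (ARCH(1))."""
    rng = np.random.default_rng(seed)
    r = np.empty(T)
    r_prev = mu                      # so the first variance is just alpha0
    for t in range(T):
        sigma2 = alpha0 + alpha1 * (r_prev - mu) ** 2
        r[t] = mu + np.sqrt(sigma2) * rng.standard_normal()
        r_prev = r[t]
    return r

returns = simulate_arch1(2000)
# sample s.d. should be close to the unconditional s.d. sqrt(alpha0/(1-alpha1))
print(returns.std(), np.sqrt(0.1 / (1 - 0.3)))
```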
2.3 Generalised autoregressive conditionally heteroscedastic (GARCH)
The conditional variance in the GARCH model depends not only on lagged squared errors but also on lags of the conditional variance itself. The GARCH(p,q) model can be presented as follows (Yu 2002):
r_t = \mu + u_t
\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \cdots + \alpha_q u_{t-q}^2 + \beta_1 \sigma_{t-1}^2 + \cdots + \beta_p \sigma_{t-p}^2, \quad u_t \sim iidN(0, \sigma_t^2)

or

r_t = \mu + \sigma_t \varepsilon_t
\sigma_t^2 = \alpha_0 + \alpha_1 (r_{t-1} - \mu)^2 + \cdots + \alpha_q (r_{t-q} - \mu)^2 + \beta_1 \sigma_{t-1}^2 + \cdots + \beta_p \sigma_{t-p}^2 \quad (1)

where \varepsilon_t \sim iidN(0, 1).
The h-step-ahead forecast of volatility can be shown as (Yu 2002):

\hat\sigma_{t+h}^2 = \alpha_0 + \sum_{i=1}^{m} (\alpha_i + \beta_i)\hat\sigma_{t+h-i}^2 - \beta_h \hat{w}_t - \cdots - \beta_m \hat{w}_{t+h-m}, \quad h = 1, \ldots, p

\hat\sigma_{t+h}^2 = \alpha_0 + \sum_{i=1}^{m} (\alpha_i + \beta_i)\hat\sigma_{t+h-i}^2, \quad h = p + 1, \ldots

where:
s_t = r_t - \mu,
m = \max\{p, q\},
\alpha_i = 0 \quad for\ i > q,
\beta_i = 0 \quad for\ i > p,
\hat{w}_\tau = s_\tau^2 - E(s_\tau^2 \mid I_{\tau-1}) \quad for\ 0 < \tau \le t,
\hat{w}_\tau = 0 \quad for\ \tau \le 0,
\hat\sigma_\tau^2 = s_\tau^2 \quad for\ 0 < \tau \le t,
\hat\sigma_\tau^2 = \bar{s^2} = \frac{1}{T}\sum_{i=1}^{T} s_i^2 \quad for\ \tau \le 0.
The GARCH(p,q) model can be presented as ARCH(\infty). GARCH(1,1) is sufficient to capture all volatility clustering in the data. GARCH is more parsimonious and avoids overfitting (Brooks 2008: 393, Piontek 2004a,b).
The unconditional variance of the error is (Hamilton 1994: 666):

var(u_t) = \frac{\alpha_0}{1 - \sum_{i=1}^{q} \alpha_i - \sum_{i=1}^{p} \beta_i} \quad for\ \sum_{i=1}^{q} \alpha_i + \sum_{i=1}^{p} \beta_i < 1

If \sum_{i=1}^{q} \alpha_i + \sum_{i=1}^{p} \beta_i \ge 1, then the unconditional variance is not defined.
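As a worked sketch of the forecast recursion, the following Python function (hypothetical inputs, numpy only) specializes the formulas above to GARCH(1,1), where m = 1 and the correction term vanishes after the first step:

```python
import numpy as np

def garch11_forecast(s_t2, sigma_t2, alpha0, alpha1, beta1, H):
    """h-step-ahead variance forecasts for a GARCH(1,1).
    s_t2     -- last squared demeaned return s_t^2
    sigma_t2 -- last fitted conditional variance sigma_t^2
    Returns [sigma^2_{t+1}, ..., sigma^2_{t+H}]."""
    f = np.empty(H)
    # one step ahead uses the observed shock and the fitted variance
    f[0] = alpha0 + alpha1 * s_t2 + beta1 * sigma_t2
    # further ahead E(s^2) = sigma^2, so the recursion collapses to
    # sigma^2_{t+h} = alpha0 + (alpha1 + beta1) * sigma^2_{t+h-1}
    for h in range(1, H):
        f[h] = alpha0 + (alpha1 + beta1) * f[h - 1]
    return f

print(garch11_forecast(s_t2=1.2, sigma_t2=0.9,
                       alpha0=0.05, alpha1=0.07, beta1=0.90, H=5))
```

The forecasts converge monotonically to the unconditional variance \alpha_0 / (1 - \alpha_1 - \beta_1) as the horizon grows.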
2.4 ARCH and GARCH extensions
`Simple` ARCH and GARCH models cannot account for all of the features presented above. That is why many extensions were developed to model financial data accurately. These are only some examples from the extensive collection (Bollerslev 2008, Brooks 2008: 404-410): APARCH (Engle 1990), EGARCH (Nelson 1991), FIGARCH (Baillie et al. 1996), GARCH-M (Engle et al. 1987), GJR-GARCH (Glosten et al. 1993), GARCH-t (Bollerslev 1987), IGARCH (Engle and Bollerslev 1986), NGARCH (Higgins and Bera 1992) and TGARCH (Zakoian 1994).
3 Multivariate models
So far I have focused mainly on modelling and forecasting the volatility of a single time series. In practice, however, there is a need to model and predict the covariances (correlations) between time series. Thus we have to move from univariate to multivariate models. Covariances are used in finance for the calculation of hedge ratios, portfolio VaR (Value at Risk) estimates, CAPM (Capital Asset Pricing Model) betas, asset weights in portfolios and much more. Multivariate models model not only variances but also covariances (Bauwens et al. 2006, Brooks [2008], Silvennoinen and Teräsvirta 2008).
Consider a vector stochastic process {r_t} of dimension N × 1. Let I_{t-1} denote the information set generated by the observed series {r_t} up to time t - 1. I assume that (Bauwens et al. 2006):

r_t = \mu_t(\theta) + \varepsilon_t
\varepsilon_t = H_t^{1/2}(\theta) z_t

where:
\theta is a vector of parameters,
\mu_t(\theta) is the conditional mean N × 1 vector,
H_t(\theta) is the conditional variance N × N matrix,
z_t is an iid N × 1 vector such that E(z_t) = 0 and Var(z_t) = I_N.

It is worth noting that the conditional variance of r_t is equal to the conditional variance of \varepsilon_t (Bauwens et al. 2006):

Var(r_t \mid I_{t-1}) = Var(\varepsilon_t \mid I_{t-1}) = H_t^{1/2} Var(z_t \mid I_{t-1}) (H_t^{1/2})' = H_t

H_t^{1/2} is a positive definite N × N matrix which may be obtained by e.g. the Cholesky decomposition (Piontek 2006).
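A minimal numpy sketch of this construction (the covariance matrix H_t below is hypothetical, not from the paper): draw z_t with identity covariance and premultiply by the Cholesky factor of H_t.

```python
import numpy as np

rng = np.random.default_rng(1)
H_t = np.array([[1.0, 0.3, 0.1],
                [0.3, 2.0, 0.4],
                [0.1, 0.4, 1.5]])     # a positive definite N x N matrix
L = np.linalg.cholesky(H_t)           # H_t = L L'
z_t = rng.standard_normal(3)          # E(z_t) = 0, Var(z_t) = I_N
eps_t = L @ z_t                       # Var(eps_t) = L I L' = H_t
```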
The next few sections will focus on the specification of H_t.
3.1 VEC model
This model was proposed by Bollerslev et al. [1988]. The VEC model can be presented as follows (Silvennoinen and Teräsvirta 2008):

vech(H_t) = c + \sum_{j=1}^{q} A_j \, vech(\varepsilon_{t-j} \varepsilon_{t-j}') + \sum_{j=1}^{p} B_j \, vech(H_{t-j})

where the vech(·) operator stacks the columns of the lower triangular part of an N × N matrix into an N(N+1)/2 × 1 vector, and A_j and B_j are N(N+1)/2 × N(N+1)/2 matrices of parameters (Silvennoinen and Teräsvirta 2008). Each conditional variance and covariance depends on lagged squared errors and cross-products of errors, and on lagged conditional variances and covariances. The VEC model is therefore very general; however, this high flexibility brings some disadvantages. Firstly, the number of parameters equals (p + q)(N(N+1)/2)^2 + N(N+1)/2, which is large: even for p = q = 1 and N = 3 the number of parameters equals 78. This makes estimation demanding. Secondly, restrictive conditions must be imposed to make the covariance matrix H_t positive definite for all t (Bauwens et al. 2006, Brooks 2008: 434, Piontek 2006, Silvennoinen and Teräsvirta 2008). Therefore a diagonal version of the VEC model was proposed.
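The vech(·) operator and the parameter count are easy to verify numerically; a small sketch (numpy assumed; the helper `vech` is ours, not a library function):

```python
import numpy as np

def vech(M):
    """Stack the columns of the lower triangular part of M, diagonal included.
    For M = [[a, b], [c, d]] this returns (a, c, d)."""
    # the row-wise upper-triangle indices of M.T enumerate the lower
    # triangle of M column by column
    return M.T[np.triu_indices_from(M)]

M = np.array([[1.0, 2.0], [3.0, 4.0]])
print(vech(M))                        # [1. 3. 4.]

N, p, q = 3, 1, 1
n = N * (N + 1) // 2                  # length of vech(H_t): 6
print((p + q) * n**2 + n)             # VEC parameter count: 78
```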
3.2 DVEC model
The DVEC model is a restricted version of VEC (Bollerslev et al. [1988]) in which A_j and B_j are assumed to be diagonal matrices. This assumption implies fewer parameters to be estimated, (p + q + 1)N(N+1)/2 (e.g. for p = q = 1 and N = 3 the number of parameters equals 18). Estimation is therefore less demanding, at the cost of flexibility. Each element h_{ijt} depends only on lagged values of the errors \varepsilon_{it}\varepsilon_{jt} and on its own lagged values, which rules out transmission effects (Piontek 2006). Even though it is easier to obtain positive definiteness of the conditional variance matrices for DVEC than for VEC, the restrictions are still strong (Bauwens et al. 2006, Brooks 2008: 434-435, Engle et al. 1995, Piontek 2006, Silvennoinen and Teräsvirta 2008).
3.3 BEKK model
A solution to the problem of ensuring positive definiteness is a new parameterisation of the conditional variance matrix H_t (Engle et al. 1995):

H_t = CC' + \sum_{j=1}^{q}\sum_{k=1}^{K} A_{kj}' \varepsilon_{t-j}\varepsilon_{t-j}' A_{kj} + \sum_{j=1}^{p}\sum_{k=1}^{K} B_{kj}' H_{t-j} B_{kj}
where A_{kj}, B_{kj} and C are parameter matrices of dimension N × N, with C lower triangular. This model was proposed by Baba, Engle, Kraft and Kroner and is called the BEKK model (Engle et al. 1995). The parameter K ensures the generality of the model; however, when K > 1 identification problems arise (Silvennoinen and Teräsvirta 2008). Under very weak conditions the conditional covariance matrix H_t is positive definite at all times (Engle et al. 1995). The constant term is decomposed into the product CC' to ensure positive definiteness of H_t. BEKK is almost as general as VEC, as it includes all diagonal representations of VEC and almost all positive definite VEC representations (Engle et al. 1995). The number of parameters to be estimated, (p + q)KN^2 + N(N+1)/2, is still large: assuming p = q = 1, N = 3 and K = 1, (p + q)KN^2 + N(N+1)/2 = 24.
The model can be simplified by assuming that the A_{kj} and B_{kj} matrices are diagonal. The number of parameters then decreases to (p + q)KN + N(N+1)/2 (e.g. for p = q = 1, N = 3 and K = 1 the number of parameters equals 12) but is still large (Silvennoinen and Teräsvirta 2008).
Although the BEKK parametrization of H_t easily ensures positive definiteness, convergence can be an issue, as H_t is not linear in the parameters, and the interpretation of the parameters is not easy (Silvennoinen and Teräsvirta 2008).
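The positive definiteness argument is easy to see in code: CC' is positive definite and the remaining terms are positive semi-definite. A minimal sketch of the BEKK(1,1,1) recursion (hypothetical parameter matrices, numpy only):

```python
import numpy as np

def bekk_step(C, A, B, eps_prev, H_prev):
    """One step of the BEKK(1,1,1) recursion:
    H_t = C C' + A' eps_{t-1} eps_{t-1}' A + B' H_{t-1} B."""
    return C @ C.T + A.T @ np.outer(eps_prev, eps_prev) @ A + B.T @ H_prev @ B

N = 3
C = 0.2 * np.eye(N) + np.tril(0.1 * np.ones((N, N)), -1)  # lower triangular
A, B = 0.3 * np.eye(N), 0.9 * np.eye(N)
H, eps = np.eye(N), np.zeros(N)
rng = np.random.default_rng(2)
for _ in range(100):
    H = bekk_step(C, A, B, eps, H)
    eps = np.linalg.cholesky(H) @ rng.standard_normal(N)
print(np.all(np.linalg.eigvalsh(H) > 0))   # True: H_t stays positive definite
```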
3.4 O-GARCH model
To overcome the problem of estimating a large number of parameters, the O-GARCH model was presented by Alexander [2000]. This model tries to express multivariate GARCH by means of univariate GARCH models, i.e. the N × N conditional variance matrix H_t is modelled using m ≤ N univariate GARCH models (Bauwens, Laurent and Rombouts 2006). The error vector process {\varepsilon_t} can be represented as linear combinations of m uncorrelated factors f_t with unconditional variances of one, where m is usually much smaller than N (Alexander 2000, Bauwens et al. 2006, Silvennoinen and Teräsvirta 2008):
V^{-1/2} \varepsilon_t = u_t = W_m f_t

where:
f_t = (f_{1t}, \ldots, f_{mt})' such that E(f_t \mid I_{t-1}) = 0 and Var(f_t \mid I_{t-1}) = \Sigma_t = diag(\sigma_{f_1 t}^2, \ldots, \sigma_{f_m t}^2).

Each factor is assumed to follow a GARCH(1,1) process, so

\sigma_{f_i t}^2 = (1 - \alpha_i - \beta_i) + \alpha_i f_{i,t-1}^2 + \beta_i \sigma_{f_i,t-1}^2 \quad for\ i = 1, \ldots, m

V = diag(v_1, \ldots, v_N), with v_i the population variance of \varepsilon_{it};
W_m is the orthogonal N × m matrix W_m = P_m \Lambda_m^{1/2};
\Lambda_m = diag(\lambda_1, \ldots, \lambda_m), where \lambda_1 \ge \ldots \ge \lambda_m > 0 are the eigenvalues of the population correlation matrix of u_t;
P_m is the N × m matrix of eigenvectors corresponding to the eigenvalues of the population correlation matrix of u_t.

The conditional variance matrix of u_t equals

V_t = Var(u_t \mid I_{t-1}) = W_m \Sigma_t W_m'

Therefore the conditional variance matrix of \varepsilon_t equals

H_t = Var(\varepsilon_t \mid I_{t-1}) = V^{1/2} V_t V^{1/2} = V^{1/2} W_m \Sigma_t W_m' V^{1/2}

The parameters of the O-GARCH(1,1,m) model are V, W_m, and all \alpha_i and \beta_i. The number of parameters equals (N(m+1) + 4m)/2 or, in the extreme case (i.e. m = N), N(N+5)/2. V and W_m are obtained by their sample counterparts. The number of factors used is established by principal component analysis.
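A sketch of the first (unconditional) step of O-GARCH in numpy: standardize the series, take the m leading principal components of their correlation matrix, and recover the factors, each of which would then be fitted with a univariate GARCH(1,1). All names here are ours; this is an illustration, not the paper's code.

```python
import numpy as np

def ogarch_first_step(eps, m):
    """eps: T x N matrix of errors. Returns factors f (T x m), W_m and V."""
    v = eps.var(axis=0)                   # v_i: sample variance of eps_it
    u = eps / np.sqrt(v)                  # u_t = V^{-1/2} eps_t
    lam, P = np.linalg.eigh(np.corrcoef(u, rowvar=False))
    order = np.argsort(lam)[::-1][:m]     # m largest eigenvalues
    W_m = P[:, order] * np.sqrt(lam[order])   # W_m = P_m Lambda_m^{1/2}
    f = u @ np.linalg.pinv(W_m).T         # factor estimates f_t
    return f, W_m, np.diag(v)

rng = np.random.default_rng(3)
mix = np.array([[1.0, 0.6, 0.3], [0.0, 1.0, 0.5], [0.0, 0.0, 1.0]])
eps = rng.standard_normal((1000, 3)) @ mix
f, W_m, V = ogarch_first_step(eps, m=2)
print(f.shape, W_m.shape)                 # (1000, 2) (3, 2)
```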
The advantage of the model is that in practice only a few principal components are enough to explain most of the variability in the system, which makes the estimation process much easier. However, if the data are weakly correlated, identification problems arise. Another problem for the O-GARCH model arises when the components have similar scaling (unconditional variance). Thirdly, if the number of components m is less than N, the rank of the conditional variance matrix is reduced, which can be a problem for some diagnostic tests and for applications which use the H_t^{-1} matrix (van der Weide [2002]). Finally, the transformation matrix W_m is restricted to be orthogonal. Therefore van der Weide [2002] introduced a generalized version of the O-GARCH model.
3.5 GO-GARCH
The model can be defined as the O-GARCH model above with two main differences. Firstly, the number of factors equals the number of series (i.e. m = N). Secondly, the transformation matrix W is only restricted to be invertible, not orthogonal as in the O-GARCH model. W is obtained by using the singular value decomposition (Bauwens et al. 2006, Silvennoinen and Teräsvirta 2008, van der Weide 2002):
W = P \Lambda^{1/2} U

where \Lambda = diag(\lambda_1, \ldots, \lambda_N) with \lambda_1 \ge \ldots \ge \lambda_N > 0, the \lambda_i being the eigenvalues of the population correlation matrix of u_t; P is the N × N matrix of eigenvectors corresponding to the eigenvalues of the population correlation matrix of u_t; and U is an N × N orthogonal matrix with det(U) = 1.

The matrix U can be obtained as a product of N(N-1)/2 rotation matrices (Bauwens et al. 2006, van der Weide 2002):

U = \prod_{i<j} R_{ij}(\delta_{ij}), \quad -\pi \le \delta_{ij} \le \pi, \quad i, j = 1, \ldots, N

where R_{ij}(\delta_{ij}) performs a rotation in the plane spanned by e_i and e_j over an angle \delta_{ij}. The \delta_{ij} are called the Euler angles and may be obtained by maximum likelihood estimation.
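The parametrisation of U by Euler angles is straightforward to reproduce; a short numpy sketch with hypothetical angles (the helper functions are ours):

```python
import numpy as np
from itertools import combinations

def rotation(N, i, j, delta):
    """N x N rotation R_ij(delta) in the plane spanned by e_i and e_j."""
    R = np.eye(N)
    c, s = np.cos(delta), np.sin(delta)
    R[i, i] = R[j, j] = c
    R[i, j], R[j, i] = -s, s
    return R

def U_from_angles(N, deltas):
    """U as the product of the N(N-1)/2 plane rotations."""
    U = np.eye(N)
    for (i, j), d in zip(combinations(range(N), 2), deltas):
        U = U @ rotation(N, i, j, d)
    return U

U = U_from_angles(3, [0.3, -1.1, 0.7])    # N(N-1)/2 = 3 hypothetical angles
print(np.allclose(U @ U.T, np.eye(3)),    # orthogonal
      np.isclose(np.linalg.det(U), 1.0))  # determinant one
```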
The implied conditional correlation matrix of \varepsilon_t can be calculated as follows (Bauwens et al. 2006, van der Weide [2002]):

R_t = D_t^{-1} V_t D_t^{-1}

where D_t = (V_t \circ I_N)^{1/2}, V_t = W \Sigma_t W', and \circ is the Hadamard (i.e. elementwise) product.
The model can be estimated using a two-step procedure (van der Weide [2002]). In the first step, P and \Lambda are estimated by exploiting the unconditional variance of u_t (i.e. by their sample counterparts). In the second step, the conditional information is used to estimate the rotation coefficients of U and all \alpha_i and \beta_i of the N factors. This means that N(N+3)/2 (i.e. N(N-1)/2 + 2N) parameters are to be estimated by maximising the log-likelihood function (Bauwens et al. [2006], Silvennoinen and Teräsvirta [2008], van der Weide 2002). The number of parameters is quite large.
It is worth mentioning that MGARCH-in-mean models cannot be estimated with O-GARCH and GO-GARCH because of the two-step estimation procedure. Secondly, O-GARCH and GO-GARCH belong to the class of factor GARCH models and are therefore nested in the BEKK model (Bauwens et al. [2006]). Allowing the transformation matrix W to be time-varying is one possible extension. Using different GARCH models for the components (i.e. not only GARCH(1,1)) would be another extension, left for further study (van der Weide 2002).
3.5.1 NLS
The problem of maximizing the multivariate likelihood function in high dimensions led to the development of a three-step procedure, proposed by Boswijk and van der Weide [2006]. The second step of the two-step procedure is divided into two steps. This allows the estimation of part of the link matrix W (i.e. the U matrix) to be separated from that of the univariate GARCH parameters (i.e. \{\alpha_i, \beta_i\}_{i=1}^{m}).
The three-step procedure tries to identify U from the autocorrelation structure of s_t^* s_t^{*\prime}, where s_t^* = \Lambda^{-1/2} P' \varepsilon_t. They obtain the estimate for B = U A U' by regressing the following model:

s_t^* s_t^{*\prime} - I_m = B(s_{t-1}^* s_{t-1}^{*\prime} - I_m)B' + \Gamma_t, \quad E(\Gamma_t) = 0,

using the non-linear least-squares method. The estimate for U is obtained from B, as A is a diagonal matrix.
The three-step procedure is not only more practical in terms of implementation but also less prone to convergence problems. However, its main disadvantage is a loss of efficiency.
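A rough sketch of the NLS step in Python (scipy assumed; names and starting values are ours, and a practical implementation would impose the structure B = U A U' directly rather than fit an unrestricted B):

```python
import numpy as np
from scipy.optimize import least_squares

def nls_estimate_B(s):
    """Fit S_t = B S_{t-1} B' + error, with S_t = s_t s_t' - I,
    over the entries of B by non-linear least squares."""
    T, m = s.shape
    S = np.einsum('ti,tj->tij', s, s) - np.eye(m)         # S_t for all t

    def residuals(b):
        B = b.reshape(m, m)
        pred = np.einsum('ik,tkl,jl->tij', B, S[:-1], B)  # B S_{t-1} B'
        return (S[1:] - pred).ravel()

    fit = least_squares(residuals, (0.5 * np.eye(m)).ravel())
    return fit.x.reshape(m, m)

rng = np.random.default_rng(4)
B_hat = nls_estimate_B(rng.standard_normal((500, 2)))
# since B = U A U' with A diagonal, U could then be read off from the
# eigenvectors of (a symmetrized) B_hat
```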
They apply the O-GARCH, DCC and GO-GARCH models to 10 years of daily returns on the Dow Jones Industrial index and the NASDAQ Composite index. They find that the patterns are quite similar for volatilities and covariances, with some differences in the heights of the peaks; more discrepancy, however, is observed in the estimated correlations between GO-GARCH and the two other models. The GO-GARCH correlations resemble a smoothed version of the DCC and O-GARCH ones. The GO-GARCH estimates display lower and upper bands, which confirms the previous results (van der Weide 2002).
They also perform a test on two five-variate examples of US and European indices. What they find is that the NLS (i.e. three-step) estimator performs as well as the ML (i.e. two-step) estimator, or even better. The US data exhibit noticeable skewness and kurtosis, which makes the model misspecified. This has a bad influence on the ML estimator, whereas the NLS estimator seems to be much more robust.
3.5.2 Chicago
The two-step as well as the three-step procedure seem to be too slow when the dimension of the model is high. For that reason Broda and Paolella [2008] introduce a fast two-step procedure for the estimation of the GO-GARCH model. They use independent component analysis (ICA) as the main tool for the decomposition of a high-dimensional problem into a set of univariate models. The ICA algorithm maximizes the conditional heteroscedasticity of the estimated components. Their method is called CHICAGO (i.e. Conditionally Heteroscedastic Independent Component Analysis of Generalized Orthogonal GARCH models). Their procedure allows them to apply non-Gaussian innovations.
Independent component analysis (ICA) is a more powerful tool than principal component analysis (PCA) in the sense of preserving interesting features of the data, such as clusters. This is because PCA tries to find the direction in which the variance of the data is maximized, whereas ICA tries to find the directions in which the interesting features of the data are kept. This objective leads to different components for ICA and PCA. For details see Hyvarinen [1999a].
Broda and Paolella estimate U by independent component analysis. There are many approaches to solving the ICA problem; it is a matter of choosing an appropriate objective function and optimization algorithm. This might be expressed in the following `equation` (Hyvarinen 1999b):

ICA method = objective function + optimization algorithm

The matrix M_m defines the transformation

f_t = M_m \varepsilon_t

The aim of ICA is to find M_m \equiv W_m^{-1} such that the elements of y_t = M_m \varepsilon_t are independent. One of the most important restrictions of ICA is that the independent components must be non-gaussian. If more than one of the components is gaussian, the matrix W_m is not identifiable.
One method of solving this problem is by maximizing negentropy. The central limit theorem says that the distribution of a sum of independent random variables with finite second moments converges to a gaussian distribution. Let us define z = W_m' m. Then we have y = m'\varepsilon = m' W_m f = z' f, which means that y is a linear combination of f with weights given by z. According to the central limit theorem, z'f is more gaussian than any single f_i, and least gaussian when it equals one of the f_i (i.e. when only one element z_i of z is nonzero). We therefore take the m that maximizes the nongaussianity of m'\varepsilon. This vector m corresponds to a z which has only one nonzero component, which in turn means that m'\varepsilon = z'f equals one of the independent components.
The differential entropy H of a random vector y with density f(y) is defined as (Hyvarinen and Oja 2000):

H(y) = -\int f(y) \log f(y) \, dy

This measure is well known as Shannon's entropy, a measure of uncertainty (Shannon 1948). A gaussian variable has the largest entropy among all random variables of equal variance. Now we can define the negentropy J (i.e. a measure of nongaussianity):

J(y) = H(y_{gaussian}) - H(y)
In practice, however, the density is unknown and an estimate of the negentropy is needed. One possible estimator of the negentropy, suggested by Hyvarinen [1999a], is:

J_G(m) = [E\{G(m'\varepsilon)\} - E\{G(v)\}]^2

where m is an m-dimensional (weight) vector constrained so that E\{(m'\varepsilon)^2\} = 1, v is a standard gaussian variable, and G is a non-quadratic function. Hyvarinen proposed the following choices of G:

G_1(u) = \log \cosh(a_1 u)
G_2(u) = \exp(-a_2 u^2 / 2)

with 1 \le a_1 \le 2 and a_2 \approx 1.
To summarize, the aim is to find the m that maximizes the negentropy of m'\varepsilon. An example of a FastICA fixed-point algorithm for one and several units was proposed by Hyvarinen [1999a]. The algorithm is based on the Newton-Raphson method, transformed into a fixed-point iteration. It is worth noting that the convergence is cubic (or at least quadratic).
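For intuition, here is a one-unit version of the fixed-point iteration in numpy, using G_1 from above so that g(u) = tanh(a_1 u); the data are assumed to be already whitened, and all names are ours:

```python
import numpy as np

def fastica_one_unit(x, a1=1.0, n_iter=200, tol=1e-8, seed=0):
    """One-unit FastICA for whitened data x (T x N):
    w <- E{x g(w'x)} - E{g'(w'x)} w, then normalize (Hyvarinen 1999a)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(x.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        wx = x @ w                            # projections w'x
        g = np.tanh(a1 * wx)                  # g = G', the score function
        g_prime = a1 * (1.0 - g ** 2)         # derivative of g
        w_new = (x * g[:, None]).mean(axis=0) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)
        if abs(abs(w @ w_new) - 1.0) < tol:   # converged up to sign
            return w_new
        w = w_new
    return w
```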
The second method of solving ICA exploits the time structure of the data set. This approach seems more natural for time-series data such as financial returns, as financial data exhibit GARCH effects. That is why, by maximizing the autocorrelation of the squared returns, one can separate the independent components (Broda and Paolella 2008). A fixed-point algorithm based on cross-cumulants was proposed by Hyvarinen et al. [2001]; its convergence is also cubic. For details see Hyvarinen et al. [2001]. Broda and Paolella [2008] use the second algorithm; however, they suggest that one may use the first if the second fails to converge, which is rare.
They also compare three estimators of the matrix U: the ML estimator of van der Weide [2002], the NLS estimator of Boswijk and van der Weide [2006] and the ICA estimator of Broda and Paolella [2008]. The ML and NLS estimators are virtually unbiased, whereas ICA shows a small bias. NLS and ICA are much more robust than ML, as they are separated from the factor specifications. Unlike ML, ICA does not exhibit convergence problems. The estimation times for their data set show a big discrepancy between the estimators: the ICA algorithm is 56 and 297 times faster than NLS and ML respectively. Taking into account all features (i.e. robustness, accuracy, reliability and speed), the ICA estimator looks very promising.
They also apply non-gaussian distributions to the components. They use two special cases of the generalized hyperbolic distribution (i.e. the normal inverse gaussian and the hyperbolic). They also propose to use the Asymmetric Power ARCH model for the components instead of GARCH(1,1). However, the problem with using the generalized hyperbolic distribution of a weighted sum of independent random variables lies in evaluating the cumulative distribution function, which is needed for calculating portfolio risk measures like VaR or Expected Shortfall. This problem can be solved by saddlepoint approximation, a method that is not only extremely accurate but also computationally cheap. Their application example considers VaR forecasts for 3 equally weighted portfolios of 10 companies taken from the Dow Jones. The data span the period from 23/09/1992 to 23/03/2007. The VaR forecasts obtained are 1.13% (4.48%) for the normal inverse gaussian distribution and 1.04% (3.98%) for the hyperbolic distribution at the nominal 1% (5%) level. The null hypothesis of correct coverage of the Kupiec test is accepted with a p-value of 0.54 (0.26) for the normal inverse gaussian distribution and 0.85 (0.02) for the hyperbolic distribution.
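The Kupiec test quoted here is a simple likelihood-ratio test of the violation frequency; a minimal sketch (scipy assumed, hypothetical numbers, not the paper's data):

```python
import numpy as np
from scipy.stats import chi2

def kupiec_pof(n_obs, n_violations, p):
    """Kupiec proportion-of-failures test of correct VaR coverage.
    Returns the LR statistic and its p-value under chi^2(1)."""
    T, x = n_obs, n_violations
    pi = x / T                                   # observed violation rate
    ll0 = (T - x) * np.log(1 - p) + x * np.log(p)        # under H0: rate p
    ll1 = (T - x) * np.log(1 - pi) + x * np.log(pi)      # under observed rate
    lr = -2.0 * (ll0 - ll1)
    return lr, chi2.sf(lr, df=1)

# e.g. 40 violations of a 1% VaR over 3500 out-of-sample days (hypothetical)
print(kupiec_pof(3500, 40, 0.01))
```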
3.5.3 Method of Moments
Boswijk and van der Weide [2009] propose another three-step method for estimation of the GO-GARCH model, based on the method of moments. The method rests on the fact that the latent factors exhibit heteroscedasticity. All they assume about the factors is that they have persistence in variance and finite fourth moments. The method is very convenient as it does not require optimization of an objective function. In the third step, univariate GARCH models are estimated for the latent factors.
The starting point for the derivation of their estimator is the matrix-valued processes S_t = s_t s_t' - I_m and F_t = f_t f_t' - I_m, where s_t = V^{-1/2} \varepsilon_t, and in particular their autocorrelation properties. It is worth noting that the O-GARCH model of Alexander [2000] assumes that the standardized principal components s_t^* = \Lambda^{-1/2} P' \varepsilon_t are independent, whereas here the components are only conditionally uncorrelated, which is a weaker assumption. Let us define the autocorrelations \rho_{ik} = corr(f_{it}^2, f_{i,t-k}^2) and the cross-covariances \tau_{ijk} = cov(f_{it}^2, f_{i,t-k} f_{j,t-k}). Another assumption states that for some integer p,

\min_{1 \le i \le m} \max_{1 \le k \le p} |\rho_{ik}| > 0, \qquad \max_{1 \le k \le p,\ 1 \le i \ne j \le m} |\tau_{ijk}| = 0.

They define the autocovariance matrices as

\Gamma_k(f) = E(F_t F_{t-k}), \quad k = 1, 2, \ldots

Taking into account all the assumptions, they end up with

\Gamma_k(f) = diag((\kappa_1 - 1)\rho_{1k}, \ldots, (\kappa_m - 1)\rho_{mk})

where \kappa_i is the kurtosis of the i-th factor. The autocorrelation matrix can then be defined as

\Phi_k(f) = \Gamma_0(f)^{-1/2} \Gamma_k(f) \Gamma_0(f)^{-1/2} = diag(\rho_{1k}, \ldots, \rho_{mk})
The autocovariance and autocorrelation matrices for s_t = U f_t are:

\Gamma_k(s) = E(S_t S_{t-k}) = E(U F_t U' U F_{t-k} U') = U \Gamma_k(f) U'
\Phi_k(s) = \Gamma_0(s)^{-1/2} \Gamma_k(s) \Gamma_0(s)^{-1/2} = U \Phi_k(f) U'

The matrix U can be identified from the eigenvectors of \Gamma_k(s) or \Phi_k(s), as \Gamma_k(f) and \Phi_k(f) are diagonal and U is an orthogonal matrix. The sample estimators for \Gamma_k(s) and \Phi_k(s) are given as follows:

\hat\Gamma_k(s) = \frac{1}{T} \sum_{t=k+1}^{T} S_t S_{t-k} = \frac{1}{T} \sum_{t=k+1}^{T} (s_t s_t' - I_m)(s_{t-k} s_{t-k}' - I_m)
\hat\Phi_k(s) = \hat\Gamma_0(s)^{-1/2} \hat\Gamma_k(s) \hat\Gamma_0(s)^{-1/2}

However, their experiments suggest that the most efficient estimator of U_k uses a symmetric version of \hat\Phi_k(s), i.e. \frac{1}{2}(\hat\Phi_k(s) + \hat\Phi_k(s)').
An even more efficient estimator \hat{U} may be obtained by combining information from different lags. That is why they use the Cayley transform to derive the pooled estimator:

\hat{U} = \left(I_m - \sum_{k=1}^{p} w_k (I_m - \hat{U}_k)(I_m + \hat{U}_k)^{-1}\right) \left(I_m + \sum_{k=1}^{p} w_k (I_m - \hat{U}_k)(I_m + \hat{U}_k)^{-1}\right)^{-1}

where the weights w_k can be chosen as equal or depending on the eigenvalues of \frac{1}{2}(\hat\Phi_k(s) + \hat\Phi_k(s)'); for details see Boswijk and van der Weide [2009].
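A compact numpy sketch of the single-lag moment estimator (the helper is ours, and the Cayley pooling across lags is omitted): build \hat\Gamma_k(s), form the symmetrized \hat\Phi_k(s) and take its eigenvectors as \hat{U}_k.

```python
import numpy as np

def mm_estimate_U(s, k):
    """Estimate U from the eigenvectors of the symmetrized lag-k
    autocorrelation matrix of S_t = s_t s_t' - I (single lag k >= 1)."""
    T, m = s.shape
    S = np.einsum('ti,tj->tij', s, s) - np.eye(m)        # S_t for all t
    Gamma0 = np.einsum('tij,tjk->ik', S, S) / T          # Gamma_0(s)
    Gammak = np.einsum('tij,tjk->ik', S[k:], S[:-k]) / T # Gamma_k(s)
    vals, vecs = np.linalg.eigh(Gamma0)
    G0_isqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T     # Gamma_0^{-1/2}
    Phik = G0_isqrt @ Gammak @ G0_isqrt
    _, U_hat = np.linalg.eigh(0.5 * (Phik + Phik.T))     # symmetrized version
    return U_hat
```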
They examine the finite-sample performance of their estimator of the U matrix. To do this they follow the approach of Fan et al. [2008] by defining the square root d(U, \hat{U}) of a symmetric version of the distance measure D(U, \hat{U}) for orthogonal matrices; for details see Boswijk and van der Weide [2009]. They calculate the root mean square distance of d(U, \hat{U}) (i.e. RMSD) over 5000 Monte Carlo replications for different numbers of observations T ∈ {800, 1600, 3200, 6400} and different values of p ∈ {1, 5, 10, 25, 50, 100, 200}. The eigenvalue-weighted estimator is always better than the equally-weighted estimator. The optimal lag length is p = 50 (when all the components have finite kurtosis) or p = 100 (when some of the components do not have finite kurtosis), depending on the properties of the components. The larger the sample size, the higher the lag order needed.
The Maximum Likelihood (ML) estimator has a much smaller RMSD than the Method of Moments (MM) estimator. However, a very important fact is that the MM estimator behaves the same for a process where some of the components do not have finite kurtosis (which violates one of the assumptions) as for a process where all the components have finite kurtosis. The efficiency gap between the ML and MM estimators is reduced when different GARCH specifications or non-Gaussian innovations are proposed for the components. When the dimension of the system increases, convergence problems become possible for the ML estimator, and the gap between the times needed for ML and MM estimation grows significantly.
They also perform two empirical applications comparing the ML and MM estimates. They first consider the Dow Jones STOXX 600 European stock market sector indices. The data span the period from January 1987 to December 2007. They focus on a trivariate model of three sectors. They find that the estimates obtained for the U matrix, as well as the GARCH parameters, are different. Estimated variances and covariances are quite similar, but the correlations seem to differ more. Generally speaking, more variation can be noticed in the series estimated by the ML method than by the MM method. They then add another twelve sectors to the system and perform the above-mentioned estimation once again. The variances and covariances are similar. The conditional correlations display larger differences; however, the variation in the 15-variate model is small around the unconditional means. All variances, covariances and correlations in the 15-variate model are much smoother than in the 3-variate model.
The second application examines the conditional correlations between American Airlines, South-West Airlines, Boeing, FedEx, crude oil and kerosene daily returns. They focus on the data from July 19, 2003 to August 12, 2008. They find that all correlations display the same pattern, with the MM correlations showing more variation than the ML correlations.
3.5.4 DCC of Engle
The Dynamic Conditional Correlation (DCC) model was proposed by Engle [2002]. This model belongs to a group of multivariate models that can be seen as nonlinear combinations of univariate GARCH models. The DCC is a generalization of the Constant Conditional Correlation (CCC) model of Bollerslev [1990]. Other DCC-type models are those of Tse and Tsui [2002] and Christodoulakis and Satchell [2002]. However, I will focus here on Engle's DCC model, which is defined as follows:
H_t = D_t R_t D_t

where

D_t = diag(h_{11t}^{1/2}, \ldots, h_{NNt}^{1/2}),

h_{iit} can be any univariate GARCH model,

R_t = diag(q_{11t}^{-1/2}, \ldots, q_{NNt}^{-1/2}) \, Q_t \, diag(q_{11t}^{-1/2}, \ldots, q_{NNt}^{-1/2}),

and Q_t = (q_{ijt}) is the N × N symmetric positive definite matrix defined as

Q_t = (1 - \alpha - \beta)\bar{Q} + \alpha u_{t-1} u_{t-1}' + \beta Q_{t-1}

where u_{it} = \varepsilon_{it} / \sqrt{h_{iit}}, \alpha and \beta are non-negative scalars such that \alpha + \beta < 1, and \bar{Q} is the N × N unconditional variance matrix of u_t.
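A minimal sketch of the correlation recursion in numpy (the function is ours; u would come from first-stage univariate GARCH fits, and the \alpha, \beta values are hypothetical):

```python
import numpy as np

def dcc_correlations(u, alpha=0.05, beta=0.90):
    """Q_t = (1-a-b) Qbar + a u_{t-1} u_{t-1}' + b Q_{t-1};
    R_t = diag(Q_t)^{-1/2} Q_t diag(Q_t)^{-1/2}.
    u: T x N matrix of standardized residuals eps_it / sqrt(h_iit)."""
    T, N = u.shape
    Qbar = (u.T @ u) / T                  # unconditional covariance of u_t
    Q, R = Qbar.copy(), np.empty((T, N, N))
    for t in range(T):
        if t > 0:
            Q = ((1 - alpha - beta) * Qbar
                 + alpha * np.outer(u[t - 1], u[t - 1]) + beta * Q)
        d = 1.0 / np.sqrt(np.diag(Q))
        R[t] = Q * np.outer(d, d)         # rescale Q_t to a correlation matrix
    return R
```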
The main drawback of the model is that all conditional correlations follow the same dynamic structure. The number of parameters to be estimated, (N + 1)(N + 4)/2, is large when N is large (Bauwens et al. 2006). Therefore Engle proposes estimating the DCC model by a two-step procedure. This is possible because the conditional variance H_t = D_t R_t D_t can be split into a volatility part and a correlation part. Instead of using the likelihood function for all the coefficients, he suggested replacing R_t by the identity matrix. This leads to a quasi-loglikelihood function that is the sum of the loglikelihood functions of N univariate models. In the second step, Engle estimates the parameters of R_t. This method produces consistent but not efficient estimators. It is possible to compare the loglikelihood function of the two-step procedure with that of the one-step procedure and of the other models. For details see Bauwens et al. 2006, Engle 2002.
Engle performs a comparison of several correlation estimators. The data generating process is described by two GARCH models and six different correlation functions. The simulation is performed 200 times for 1000 observations. Eight different models are used for estimating correlations: the moving average, exponential smoothing, the scalar BEKK, the diagonal BEKK, Orthogonal GARCH, the DCC with integrated moving average estimation, the DCC by log-likelihood for the integrated model and the DCC by log-likelihood for the mean-reverting model. Three different measures are used for the comparison. The first is the mean absolute error. The second is the autocorrelation test of the squared standardized residuals. The third is based on an estimator of VaR (i.e. Value at Risk) for a two-asset portfolio. For details see Engle [2002]. Overall the experiment shows that the DCC models are very good.