Nonlinear system theory: Another look at dependence
Communicated by Murray Rosenblatt, University of California at San Diego, La Jolla, CA, August 4, 2005 (received for review April 29, 2005)
Abstract
Based on the nonlinear system theory, we introduce previously undescribed dependence measures for stationary causal processes. Our physical and predictive dependence measures quantify the degree of dependence of outputs on inputs in physical systems. The proposed dependence measures provide a natural framework for a limit theory for stationary processes. In particular, under conditions with quite simple forms, we present limit theorems for partial sums, empirical processes, and kernel density estimates. The conditions are mild and easily verifiable because they are directly related to the data-generating mechanisms.
Let $\varepsilon_i$, $i \in \mathbb{Z}$, be independent and identically distributed (iid) random variables and $g$ be a measurable function such that
$$X_i = g(\ldots, \varepsilon_{i-1}, \varepsilon_i) \tag{1}$$
is a properly defined random variable. Then $(X_i)$ is a stationary process, and it is causal or nonanticipative in the sense that $X_i$ does not depend on the future innovations $\varepsilon_j$, $j > i$. The causality assumption is quite reasonable in the study of time series. Wiener (1) considered the fundamental coding and decoding problem of representing stationary and ergodic processes in the form of Eq. 1. In particular, Wiener studied the construction of $\varepsilon_i$ based on $X_k$, $k \le i$. The class of processes that Eq. 1 represents is huge and it includes linear processes, Volterra processes, and many time series models. In certain situations, Eq. 1 is also called the nonlinear Wold representation. See refs. 2-4 for other deep contributions on representing stationary and ergodic processes by Eq. 1. To conduct statistical inference for such processes, it is necessary to consider the asymptotic properties of the partial sum $S_n = \sum_{i=1}^n X_i$ and the empirical distribution function $F_n(x) = n^{-1} \sum_{i=1}^n \mathbf{1}_{X_i \le x}$.
In probability theory, many limit theorems have been established for independent random variables. Those limit theorems play
an important role in the related statistical inference. In the study of stochastic processes, however, independence usually
does not hold, and the dependence is an intrinsic feature. In an influential paper, Rosenblatt (5) introduced the strong mixing condition. For a stationary process $(X_i)$, let the sigma algebra $\mathcal{A}_m^n = \sigma(X_m, \ldots, X_n)$, $m \le n$, and define the strong mixing coefficients
$$\alpha_n = \sup\{|P(A \cap B) - P(A)P(B)| : A \in \mathcal{A}_{-\infty}^0,\ B \in \mathcal{A}_n^\infty\}. \tag{2}$$
If $\alpha_n \to 0$, then we say that $(X_i)$ is strong mixing. Variants of the strong mixing condition include ρ, Ψ, and β-mixing conditions among others (6). A central limit theorem (CLT) based on the strong mixing condition is proved in ref. 5. Since then, as basic assumptions on the dependence structures, the strong mixing condition and its variants have been widely used and various limit theorems have been obtained; see the extensive treatment in ref. 6.
Since the quantity $|P(A \cap B) - P(A)P(B)|$ in Eq. 2 measures the dependence between events $A$ and $B$ and is zero if $A$ and $B$ are independent, it is sensible to call $\alpha_n$ and its variants “probabilistic dependence measures.” For stationary causal processes, the calculation of probabilistic dependence measures is generally not easy because it involves the complicated manipulation of taking the supremum over two sigma algebras (7-9). Additionally, many well-known processes are not strong mixing. A prominent example is the Bernoulli shift process. Consider the simple AR(1) process $X_n = (X_{n-1} + \varepsilon_n)/2$, where $\varepsilon_i$ are iid Bernoulli random variables with success probability 1/2 (see refs. 10 and 11). Then $X_n$ is a causal process with the representation
$$X_n = \sum_{i=0}^\infty 2^{-(i+1)} \varepsilon_{n-i},$$
and the innovations $\varepsilon_n, \varepsilon_{n-1}, \ldots$ correspond to the dyadic expansion of $X_n$. The process $X_n$ is not strong mixing since $\alpha_n \equiv 1/4$ for all $n$ (12). Some alternative ways have been proposed to overcome the disadvantages of strong mixing conditions (8, 9).
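The dyadic-expansion property of this AR(1) example can be checked numerically. The following is a minimal sketch rather than anything taken from the paper: it iterates the recursion in exact rational arithmetic, starting from the hypothetical initial value 0 instead of the stationary distribution, and then reads the inputs back off the final state. The names `simulate` and `recover_bits` are illustrative.

```python
import random
from fractions import Fraction


def simulate(bits, x0=Fraction(0)):
    """Iterate X_n = (X_{n-1} + eps_n) / 2 over the given 0/1 inputs."""
    x = x0
    for e in bits:
        x = (x + e) / 2
    return x


def recover_bits(x, k):
    """Recover the last k inputs from X_n via its binary digits."""
    out = []
    for _ in range(k):
        e = int(2 * x >= 1)   # leading binary digit equals the latest input
        out.append(e)
        x = 2 * x - e         # step back to X_{n-1}
    return out[::-1]          # oldest input first


rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(40)]
x_n = simulate(bits)
print(x_n, recover_bits(x_n, 40) == bits)
```

In this sketch a single value of the process encodes every input that produced it, which is in line with the statement that the strong mixing coefficients do not decay.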
Dependence Measures
In this work, we shall provide another look at the fundamental issue of dependence. Our primary goal is to introduce “physical or functional” and “predictive” dependence measures, a previously undescribed type of dependence measure that is quite different from strong mixing conditions. In particular, following refs. 1 and 13, we shall interpret Eq. 1 as an input/output system and then introduce dependence coefficients by measuring the degree of dependence of outputs on inputs. Specifically, we view Eq. 1 as a physical system
$$x_i = g(\ldots, e_{i-1}, e_i), \tag{3}$$
where $e_i, e_{i-1}, \ldots$ are inputs, $g$ is a filter or a transform, and $x_i$ is the output. Then the process $X_i$ is the output of the physical system 3 with random inputs. Assessing the dependence by the partial derivatives $\partial g/\partial e_j$ is clearly unsatisfactory, since they may not exist if $g$ is not well-behaved. Nonetheless, because the inputs are random and iid, the dependence of the output on the inputs can be
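simply measured by applying the idea of coupling. Let $(\varepsilon_i')$ be an iid copy of $(\varepsilon_i)$; let the shift process $\xi_i = (\ldots, \varepsilon_{i-1}, \varepsilon_i)$, $i \in \mathbb{Z}$. For a set $I \subset \mathbb{Z}$, let $\varepsilon_{j,I} = \varepsilon_j'$ if $j \in I$ and $\varepsilon_{j,I} = \varepsilon_j$ if $j \notin I$; let $\xi_{i,I} = (\ldots, \varepsilon_{i-1,I}, \varepsilon_{i,I})$ and $\xi_i^* = \xi_{i,\{0\}}$. Then $\xi_{i,I}$ is a coupled version of $\xi_i$ with $\varepsilon_j$ replaced by $\varepsilon_j'$ if $j \in I$. For $p > 0$ write $X \in \mathcal{L}^p$ if $\|X\|_p := [E(|X|^p)]^{1/p} < \infty$, and $\|X\| = \|X\|_2$.
Definition 1 (Functional or physical dependence measure): For $p > 0$ and $I \subset \mathbb{Z}$ let $\delta_p(I, n) = \|g(\xi_n) - g(\xi_{n,I})\|_p$ and $\delta_p(n) = \|g(\xi_n) - g(\xi_n^*)\|_p = \delta_p(\{0\}, n)$. Write $\delta(n) = \delta_2(n)$.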
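Definition 2 (Predictive dependence measure): Let $p \ge 1$ and $g_n$ be a Borel function on $\mathbb{R} \times \mathbb{R} \times \cdots$ such that $g_n(\xi_0) = E(X_n \mid \xi_0)$, $n \ge 0$. Let $\omega_p(I, n) = \|g_n(\xi_0) - g_n(\xi_{0,I})\|_p$ and $\omega_p(n) = \omega_p(\{0\}, n)$. Write $\omega(n) = \omega_2(n)$.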
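Definition 3 (p-stability): Let $p \ge 1$. The process $(X_n)$ is said to be p-stable if $\Omega_p := \sum_{n=0}^\infty \omega_p(n) < \infty$, and p-strong stable if $\Delta_p := \sum_{n=0}^\infty \delta_p(n) < \infty$. If $\Omega = \Omega_2 < \infty$, we say that $(X_n)$ is stable.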
By the causal representation in Eq. 1, if $\min\{i : i \in I\} > n$, then $\delta_p(I, n) = 0$. Clearly, $\delta_p(I, n)$ quantifies the dependence of $X_n = g(\xi_n)$ on $\{\varepsilon_i, i \in I\}$ by measuring the distance between $g(\xi_n)$ and its coupled version $g(\xi_{n,I})$. In Definition 2, $E(X_n \mid \xi_0)$ is the n-step ahead predicted mean, and $\omega_p(n)$ measures the contribution of $\varepsilon_0$ in predicting future expected values. In the classical prediction theory (14), the conditional expectation of the form $E(X_n \mid X_0, X_{-1}, \ldots)$ is studied. The one $E(X_n \mid \xi_0)$ used in Definition 2 has a different form. It turns out that, in studying asymptotic properties and moment inequalities of $S_n$, it is convenient to use $E(X_n \mid \xi_0)$ and the predictive dependence measure (cf. Theorems 2 and 3), while the other version $E(X_n \mid X_0, X_{-1}, \ldots)$ is generally difficult to work with. In the special case in which $X_n$ are martingale differences with respect to the filtration $\sigma(\xi_n)$, $g_n = 0$ almost surely and consequently $\omega(n) = 0$, $n \ge 1$.
Roughly speaking, since $\Omega_p = \sum_{n=0}^\infty \omega_p(n)$, the p-stability in Definition 3 indicates that the cumulative contribution of $\varepsilon_0$ in predicting future expected values is finite. Interestingly, the stability condition $\Omega_2 < \infty$ implies invariance principles with $\sqrt{n}$-norming in a natural way (Theorem 3). By (i) of Theorem 1, p-strong stability implies p-stability since $\delta_p(n) \ge \omega_p(n)$.
Our dependence measures provide a convenient and simple route to a large-sample theory for stationary causal processes (see Theorems 2-5 below). In many applications, functional and predictive dependence measures are easy to use because they are directly related to data-generating mechanisms and because the construction of the coupled process $g(\xi_{n,I})$ is simple and explicit. Additionally, limit theorems with those dependence measures have easily verifiable conditions and are often optimal or nearly optimal. On the other hand, our dependence measures rely on the representation in Eq. 1, whereas the strong mixing coefficients can be defined in more general situations (6).
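Because the coupled process is explicit, the physical dependence measure can be approximated by simulation. The sketch below is illustrative only and rests on several assumptions that are not in the paper: the process is generated by a recursion `R` with standard normal inputs, the infinite past is truncated to `burn` inputs starting from 0, and the names `delta_p` and `ar1` are hypothetical. For the AR(1) map with coefficient $a$, the difference between the two copies is $a^n(\varepsilon_0 - \varepsilon_0')$, so the estimate can be compared with $|a|^n\sqrt{2}$.

```python
import random


def delta_p(R, n, p=2.0, reps=5000, burn=30, seed=0):
    """Monte Carlo estimate of delta_p(n) = ||g(xi_n) - g(xi_n^*)||_p.

    The two copies share every input except eps_0, which is replaced by an
    independent copy in the coupled version.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x = 0.0
        for _ in range(burn):                      # shared inputs before time 0
            x = R(x, rng.gauss(0.0, 1.0))
        x_star = x
        x = R(x, rng.gauss(0.0, 1.0))              # original input eps_0
        x_star = R(x_star, rng.gauss(0.0, 1.0))    # independent copy eps_0'
        for _ in range(n):                         # shared inputs eps_1..eps_n
            e = rng.gauss(0.0, 1.0)
            x, x_star = R(x, e), R(x_star, e)
        total += abs(x - x_star) ** p
    return (total / reps) ** (1.0 / p)


def ar1(x, e, a=0.5):
    """Assumed example map: X_k = a * X_{k-1} + eps_k."""
    return a * x + e


if __name__ == "__main__":
    for n in range(4):
        print(n, delta_p(ar1, n), 0.5 ** n * 2 ** 0.5)
```

Any other recursion can be passed as `R`; only the AR(1) case has the closed form used for comparison here.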
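Theorem 1. (i) Let $p \ge 1$ and $n \ge 0$. Then $\delta_p(n) \ge \omega_p(n)$. (ii) Let $p \ge 1$ and define the projection operator $\mathcal{P}_k Z = E(Z \mid \xi_k) - E(Z \mid \xi_{k-1})$, $Z \in \mathcal{L}^1$. Then for $n \ge 0$,
$$\|\mathcal{P}_0 X_n\|_p \le \omega_p(n) \le 2\|\mathcal{P}_0 X_n\|_p. \tag{4}$$
(iii) Let $p > 1$, $C_p = 18p^{3/2}(p - 1)^{-1/2}$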
if 1 < p < 2, Cp =
if p ≥ 2; let
. Then
Proof: (i) Since
,
which by Jensen's inequality implies δp(n) ≥ ωp(n). (ii) Since
and
and (εi) are independent, we have
and inequality 4 follows from
(iii) For presentational clarity, let I = {..., -1, 0}. For i ≤ 0 let
Then $D_0, D_{-1}, \ldots$ are martingale differences with respect to the sigma algebras $\sigma(\varepsilon_i, \ldots, \varepsilon_n)$, $i = 0, -1, \ldots$. By Jensen's inequality, $\|D_i\|_p \le \delta_p(n - i)$. Let
,
and
. Then
and
To show Eq. 5, we shall deal with the two cases 1 < p < 2 and p ≥ 2 separately. If 1 < p < 2, then
. By Burkholder's inequality (15)
If p ≥ 2, by proposition 4 in ref. 16,
. So Eq. 5 follows.
Inequality 5 suggests an interesting reduction property: the degree of dependence of $X_n$ on $\{\varepsilon_i, i \in I\}$ can be bounded in an element-wise manner, and it suffices to consider the dependence of $X_n$ on individual $\varepsilon_i$. Indeed, our limit theorems and moment inequalities in Theorems 2-5 involve conditions only on $\delta_p(n)$ and $\omega_p(n)$.
Linear Processes. Let $\varepsilon_i$ be iid random variables with $\varepsilon_i \in \mathcal{L}^p$, $p \ge 1$; let $(a_i)$ be real coefficients such that
$$X_t = \sum_{i=0}^\infty a_i \varepsilon_{t-i} \tag{6}$$
is a proper random variable. The existence of $X_t$ can be checked by Kolmogorov's three series theorem. The linear process $(X_t)$ can be viewed as the output from a linear filter and the input $(\ldots, \varepsilon_{t-1}, \varepsilon_t)$ is a series of shocks that drive the system (ref. 17, pp. 8-9). Clearly, $\omega_p(n) = \delta_p(n) = |a_n| c_0$, where $c_0 = \|\varepsilon_0 - \varepsilon_0'\|_p$. Let $p = 2$. If
$$\sum_{i=0}^\infty |a_i| < \infty, \tag{7}$$
then the filter is said to be stable (17) and the preceding inequality implies short-range dependence since the covariances are absolutely summable. Definition 3 extends the notion of stability to nonlinear processes.
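For a linear process the dependence measures are proportional to the absolute coefficients, so stability can be explored by looking at partial sums of $|a_n|$. The following sketch is illustrative: the two coefficient sequences, the unit-variance inputs (for which $\|\varepsilon_0 - \varepsilon_0'\|_2 = \sqrt{2}$), and the function names are assumptions made for the example.

```python
import math


def omega_linear(a_n, c0):
    """Dependence measure of a linear process at lag n: |a_n| * c0."""
    return abs(a_n) * c0


def partial_stability_sum(coeff, n_terms, c0):
    """Partial sum of omega(n) over n = 0, ..., n_terms - 1."""
    return sum(omega_linear(coeff(n), c0) for n in range(n_terms))


c0 = math.sqrt(2.0)  # assumes inputs with variance 1


def geometric(n):
    """Summable coefficients, a_n = 0.5^n."""
    return 0.5 ** n


def hyperbolic(n):
    """Non-summable coefficients, a_n = (n + 1)^(-0.75)."""
    return (n + 1) ** -0.75


for n_terms in (10 ** 2, 10 ** 3, 10 ** 4):
    print(n_terms,
          partial_stability_sum(geometric, n_terms, c0),
          partial_stability_sum(hyperbolic, n_terms, c0))
```

The partial sums for the geometric coefficients settle near $2\sqrt{2}$, while those for the hyperbolic coefficients keep growing, which is the numerical signature of an unstable filter.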
Volterra Series. Analysis of nonlinear systems is a notoriously difficult problem, and the available tools are very limited (18). Oftentimes it would be unsatisfactory to linearize or approximate nonlinear systems by linear ones. The Volterra representation
provides a reasonably simple and general way. The idea is to represent Eq. 3 as a power series of inputs. In particular, suppose that g in Eq. 3 is sufficiently well-behaved so that it has the stationary and causal representation
where the functions $g_k$ are called the Volterra kernels. The right-hand side of Eq. 8 is generically called the Volterra expansion, and it plays an important role in the nonlinear system theory (13, 18-22). There is a continuous-time version of Eq. 8 with summations replaced by integrals. Because the series involved has infinitely many terms, to guarantee the meaningfulness of the representation, there is a convergence issue that is often difficult to deal with, and the imposed conditions can be quite restrictive (18). Fortunately, in our setting, the difficulty can be circumvented because we are dealing with iid random inputs. Indeed, assume that $e_t$ are iid with mean 0, variance 1 and $g_k(u_1, \ldots, u_k)$ is symmetric in $u_1, \ldots, u_k$ and it equals zero if $u_i = u_j$ for some $1 \le i < j \le k$, and
Then Xn exists and is in
. Simple calculations show that
and
The Volterra process is stable if
.
Nonlinear Transforms of Linear Processes. Let (Xt) be the linear process defined in Eq. 6 and consider the transformed process Yt = K(Xt), where K is a possibly nonlinear filter. Let ω(n, Y) be the predictive dependence measure of (Yt). Assume that εi have mean 0 and finite variance. Under mild conditions on K, we have
(cf. theorem 2 in ref. 23). By Theorem 1,
. In this case, if (Xt) is stable, namely Eq. 7 holds, then (Yt) is also stable.
Quite interesting phenomena happen if (Xn) is unstable. Under appropriate conditions on K, (Yn) could possibly be stable. With a nonlinear transform, the dependence structure of (Yt) can be quite different from that of (Xn) (24-27). The asymptotic problem of
has a long history (see refs. 23 and 27 and references therein). Let
and assume
for some
. Consider the remainder of the τ-th order Volterra expansion of Yn
where
r = 0,..., τ, and
Let
and
. Under mild regularity conditions on K and εn, by theorem 5 in ref. 23,
. By Theorem 1, the predictive dependence measure $\omega^{(\tau)}(n)$ of the remainder $L^{(\tau)}(\xi_n)$ satisfies
It is possible that
while
. Consider the special case $a_n = n^{-\beta} l(n)$, where $1/2 < \beta < 1$ and $l$ is a slowly varying function, namely, for any $c > 0$, $l(cn)/l(n) \to 1$ as $n \to \infty$. By Karamata's theorem (28), for $j \ge 2$,
. If $\tau > (2\beta - 1)^{-1} - 1$, then
is summable. Therefore, if the function K satisfies κr = 0 for r = 0,..., τ and (τ + 1)(2β - 1) > 1, then Yt = K(Xt) is stable even though Xt is not. Appell polynomials (29) satisfy such conditions. For example, let
, then $K_\infty(w) = w^2$ and $\kappa_1 = 0$, $\kappa_2 = 2$. If $\beta \in (3/4, 1)$, then the process
is stable. If 1/2 < β < 3/4, then Sn(K)/∥Sn(K)∥ converges to the Rosenblatt distribution.
Uniform Volterra expansions for Fn(x) over
are established in refs. 30 and 31. Wu (32) considered nonlinear transforms of linear processes with infinite variance innovations.
Nonlinear Time Series. Let $\varepsilon_t$ be iid random variables and consider the recursion
$$X_n = R(X_{n-1}, \varepsilon_n), \tag{11}$$
where $R$ is a measurable function. The framework in Eq. 11 is quite general, and it includes many popular nonlinear time series models, such as threshold autoregressive models (33), exponential autoregressive models (34), bilinear autoregressive models, and autoregressive models with conditional heteroscedasticity (35), among others. If there exist $\alpha > 0$ and $x_0$ such that
where
then Eq. 11 admits a unique stationary distribution (36), and iterations of Eq. 11 give rise to Eq. 1. By theorem 2 in ref. 37, Eq. 12 implies that there exists p > 0 and r ∈ (0, 1) such that
where I = {..., -1, 0}. Recall
. By stationarity,
. So Eq. 13 implies
. On the other hand, by Theorem 1 (iii), if
holds for some p > 1 and for some r ∈ (0,1), then Eq. 13 also holds. So they are equivalent if p > 1. In refs. 37 and 38, the property 13 is called geometric-moment contraction, and it is very useful in studying asymptotic properties of nonlinear time series.
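The contraction idea behind this property can be illustrated with a small experiment. In the sketch below, which is not taken from the paper, replacing all inputs up to time 0 is mimicked by starting two copies of the recursion from different states and driving them with the same later inputs. The threshold map `tar_step`, the starting values, and the Gaussian inputs are hypothetical choices; the map is Lipschitz in its first argument with constant 0.6, which is a stronger assumption than a moment condition.

```python
import random


def tar_step(x, e):
    """Hypothetical threshold AR(1) map R(x, e) with slopes 0.6 and -0.4."""
    return (0.6 * x if x >= 0.0 else -0.4 * x) + e


def coupled_gaps(R, x0, y0, n, seed=0):
    """Distances |X_k - X_k'| for two copies driven by the same inputs."""
    rng = random.Random(seed)
    x, y = x0, y0
    gaps = [abs(x - y)]
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)   # input shared by both copies
        x, y = R(x, e), R(y, e)
        gaps.append(abs(x - y))
    return gaps


gaps = coupled_gaps(tar_step, x0=5.0, y0=-3.0, n=20)
for k in (0, 5, 10, 20):
    print(k, gaps[k], 0.6 ** k * gaps[0])
```

For this map the gap after $k$ steps is at most $0.6^k$ times the initial gap along every path, so any moment of the gap decays geometrically as well.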
Inequalities and Limit Theorems
For $(X_i)$ defined in Eq. 1, let $S_u = S_n + (u - n)X_{n+1}$, $n \le u \le n + 1$, $n = 0, 1, \ldots$, be the partial sum process. Let $R_n(s) = \sqrt{n}[F_n(s) - F(s)]$, where $F(s) = P(X_0 \le s)$ is the distribution function of $X_0$. Primary goals in the limit theory of stationary processes include obtaining asymptotic properties of $\{S_u, 0 \le u \le n\}$ and $\{R_n(s), s \in \mathbb{R}\}$. Such results are needed in the related statistical inference. The physical and predictive dependence measures provide a natural vehicle for an asymptotic theory for $S_n$ and $R_n$.
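The two objects studied in this section are easy to compute from data. The sketch below is a plain illustration with hypothetical function names: it evaluates the linearly interpolated partial sum process and the normalized empirical process, assuming the marginal distribution function is supplied by the caller as `F`.

```python
import math


def partial_sum_process(x, u):
    """S_u = S_n + (u - n) * X_{n+1} for n <= u <= n + 1; x[0] holds X_1."""
    if not 0.0 <= u <= len(x):
        raise ValueError("u must lie in [0, len(x)]")
    n = int(math.floor(u))
    s = sum(x[:n])
    if u > n:                      # interpolate towards the next partial sum
        s += (u - n) * x[n]
    return s


def empirical_process(x, s, F):
    """sqrt(n) * (F_n(s) - F(s)) with F_n the empirical distribution function."""
    n = len(x)
    f_n = sum(1 for xi in x if xi <= s) / n
    return math.sqrt(n) * (f_n - F(s))


sample = [1.0, 2.0, 4.0]
print(partial_sum_process(sample, 1.5))
print(empirical_process([0.1, 0.4, 0.6, 0.9], 0.5, lambda s: s))
```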
Partial Sums. Let
,
and Bp = p
/(p - 1), p > 1. Recall
and let
By Theorem 1, $\Theta_p \le \Omega_p \le 2\Theta_p$. Moment inequalities and limit theorems for $S_n$ are given in Theorems 2 and 3, respectively. Denote by $\mathbb{B}$ the standard Brownian motion. An interesting feature of the large deviation result in Theorem 2(ii) is that $\Omega_p$ and $X_k$ do not need to be bounded.
Theorem 2.
Let p ≥ 2. (i) We have ∥Zn∥p ≤ BpΘp ≤ BpΩp. (ii) Let 0 < α ≤ 2 and assume
Then
for $0 \le t < t_0$, where $t_0 = (e\alpha\gamma^\alpha)^{-1} 2^{-\alpha/2}$. Consequently, for $u > 0$,
.
Proof: (i) It follows from W.B.W. (unpublished results) and theorem 2.5 in ref. 39. For completeness we present the proof here. Let
and
. Then
. By Doob's maximal inequality and theorem 2.5 in ref. 39 (or proposition 4 in ref. 16),
Since
, (i) follows. (ii) Let $Z = Z_n$ and $p_0 = [2/\alpha] + 1$. By Stirling's formula and Eq. 14
By (i), since
, (ii) follows from
Example 1: For the linear process 6, assume that
and
. We now apply (ii) of Theorem 2 to the sum $\sum_{i=1}^n \tilde g(\xi_i)$, where $\tilde g(\xi_i) = \mathbf{1}_{X_i \le u} - F(u)$. To this end, we need to calculate the predictive dependence measure $\omega_p(n, \tilde g)$ (say) of the process $\tilde g(\xi_n)$. Without loss of generality let $a_0 = 1$. Let $F_\varepsilon$ and $f_\varepsilon$ be the distribution and density functions of $\varepsilon_0$ and assume $c := \sup_u f_\varepsilon(u) < \infty$. Then Eq. 14 holds with $\alpha = 1$. To see this, let $Y_{n-1} = X_n - \varepsilon_n$, $Z_{n-1} = Y_{n-1} - a_n\varepsilon_0$ and
. Let n ≥ 1. Then
and
. By the triangle inequality,
Hence,
. Since
, we have
. Clearly, 0 ≤ Qn ≤ 1. So
, where C = 2cA. For η > 0 let the set
. By Eq. 15
Condition 15 holds if
.
Theorem 3. (i) Assume that Ω2 < ∞. Then
where
. (ii) Let 2 < p ≤ 4 and assume that
. Then on a possibly richer probability space, there exists a Brownian motion $\mathbb{B}$ such that
where $l(n) = (\log n)^{1/2 + 1/p} (\log\log n)^{2/p}$.
The proof of the strong invariance principle (ii) is given by W.B.W. (unpublished results). Theorem 3(i) follows from corollary 3 in ref. 40, and the expression
is a consequence of the martingale approximation: let
and $M_n = D_1 + \cdots + D_n$, then $\|S_n - M_n\| = o(\sqrt{n})$ and $\|S_n\|/\sqrt{n} = \sigma + o(1)$ (see theorem 6 in ref. 41). Theorem 3(i) also can be proved by using the argument in ref. 42. The invariance principle in the latter paper has a slightly different form. We omit the details. See refs. 43 and 44 for some related works.
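The $\sqrt{n}$ scaling of the partial sums can be examined by simulation. The sketch below is only an illustration under assumed settings: an AR(1) process with coefficient `a` and standard normal inputs, a short burn-in in place of exact stationarity, and a hypothetical function name. For this process the limiting variance of $S_n/\sqrt{n}$ is the standard long-run variance $1/(1 - a)^2$, which the sample variance across replications should approach.

```python
import random


def scaled_sum_variance(a=0.5, n=200, reps=1500, burn=50, seed=0):
    """Sample variance of S_n / sqrt(n) over independent AR(1) paths."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        x = 0.0
        for _ in range(burn):          # discard the start-up transient
            x = a * x + rng.gauss(0.0, 1.0)
        s = 0.0
        for _ in range(n):
            x = a * x + rng.gauss(0.0, 1.0)
            s += x
        values.append(s / n ** 0.5)
    mean = sum(values) / reps
    return sum((v - mean) ** 2 for v in values) / (reps - 1)


if __name__ == "__main__":
    print(scaled_sum_variance(), 1.0 / (1.0 - 0.5) ** 2)
```

The agreement is only approximate for finite `n` and a finite number of replications.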
Empirical Distribution Functions. Let
,
, be the conditional distribution function of $X_i$ given $\xi_0$. By Definition 2, the predictive dependence measure for $\tilde g(\xi_i) = \mathbf{1}_{X_i \le u} - F(u)$, at a fixed $u$, is
. To study the asymptotic properties of Rn, it is certainly necessary to consider the whole range u ∈ (-∞, ∞). To this end, we introduce the integrated predictive dependence measure
and the uniform predictive dependence measure
where
, j = 0, 1,..., i ≥ 1. Let
. Theorem 4 below concerns the weak convergence of Rn based on
. It follows from corollary 1 by W.B.W. (unpublished results).
Theorem 4.
Assume that
and
for some positive constants $\tau, c_0 < \infty$. Further assume that
Then
, where W is a centered Gaussian process.
Kernel Density Estimation. An important problem in nonparametric inference of stochastic processes is to estimate the marginal density function $f$ (say) given the data $X_1, \ldots, X_n$. A popular method is kernel density estimation (45, 46). Let $K$ be a bounded kernel function for which
and $b_n > 0$ be a sequence of bandwidths satisfying
$$b_n \to 0 \quad \text{and} \quad n b_n \to \infty. \tag{21}$$
Let $K_b(x) = K(x/b)$. Then $f$ can be estimated by
$$f_n(x) = \frac{1}{n b_n} \sum_{i=1}^n K_{b_n}(x - X_i).$$
If $X_i$ are iid, Parzen (46) proved a central limit theorem for
under the natural condition 21. There has been a substantial literature on generalizing Parzen's result to time series (47, 48). Wu and Mielniczuk (49) solved the open problem that, for short-range dependent linear processes, Parzen's central limit theorem holds under Eq. 21. See references therein for historical developments. Here, we shall generalize the result in ref. 49 to nonlinear processes. To this end, we shall adopt the uniform predictive dependence measure 19. The asymptotic normality of fn requires a summability condition of
.
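A kernel density estimate of this kind takes only a few lines to compute. The following sketch is illustrative and makes choices that are not in the paper: a Gaussian kernel, the bandwidth $b_n = n^{-1/5}$ (which tends to 0 while $n b_n$ tends to infinity), and AR(1) data with coefficient 0.5 and standard normal inputs, whose marginal law is normal with variance 4/3.

```python
import math
import random


def gaussian_kernel(u):
    """Bounded kernel that integrates to one (an assumed choice of K)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_density(x, data, b, kernel=gaussian_kernel):
    """f_n(x) = (n b)^{-1} * sum_i K((x - X_i) / b)."""
    n = len(data)
    return sum(kernel((x - xi) / b) for xi in data) / (n * b)


# Dependent data from an AR(1) recursion, after a short burn-in.
rng = random.Random(0)
a, x, data = 0.5, 0.0, []
for i in range(2050):
    x = a * x + rng.gauss(0.0, 1.0)
    if i >= 50:
        data.append(x)

b = len(data) ** -0.2                          # bandwidth n^(-1/5)
target = 1.0 / math.sqrt(2.0 * math.pi * (4.0 / 3.0))
print(kernel_density(0.0, data, b), target)
```

The printed pair compares the estimate at 0 with the value of the true marginal density there; the two should be close but not equal.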
Theorem 5.
Assume that
for some constant $c_0 < \infty$ and that $f = F'$ is continuous. Let
. Then under Eq. 21
and
we have
for every
.
Proof: Let m be a nonnegative integer. By the identity
and the Lebesgue dominated convergence theorem, we have
and $h_{m+1}$ is also bounded by $c_0$. By Theorem 1(ii),
. Let
. By Theorem 2(i) and Eq. 23
Let
and
. Observe that
Then
. Following the argument of lemma 2 in ref. 49, Mn/
⇒ N[0, f(x)κ], which finishes the proof since
and bn → 0.
Acknowledgments
I thank J. Mielniczuk, M. Pourahmadi, and X. Shao for useful comments. I am very grateful for the extremely helpful suggestions of two reviewers. This work was supported by National Science Foundation Grant DMS-0448704.
Footnotes
- † E-mail: wbwu{at}galton.uchicago.edu.
- Author contributions: W. B. W. designed research, performed research, and wrote the paper.
- Abbreviation: iid, independent and identically distributed.
- Copyright © 2005, The National Academy of Sciences





