```{r, echo=FALSE, include=FALSE}
library(astsa)
library(mgcv)
library(simts)
library(imudata)
library(timeDate)
library(tsdl)
```
```{r introduction_code, echo = FALSE, warning=FALSE, message=FALSE}
library(astsa)
library(imudata)
library(simts)
```
# Basic Elements of Time Series {#introtimeseries}
>
> "*Prévoir consiste à projeter dans l’avenir ce qu’on a perçu dans le passé.*" – Henri Bergson
>
```{block2, type='rmdimportant'}
To make use of the R code within this chapter you will need to install (if not already done) and load the following libraries:
- [simts](http://simts.smac-group.com/);
- [astsa](https://cran.r-project.org/web/packages/astsa/index.html);
- [mgcv](https://cran.r-project.org/web/packages/mgcv/index.html).
These libraries can be installed as follows:
```
```{r, eval=FALSE}
install.packages(c("devtools", "astsa", "mgcv"))
devtools::install_github("SMAC-Group/simts")
```
and simply load them using:
```{r, eval=FALSE}
library(astsa)
library(mgcv)
library(simts)
```
We can start the discussion on the basic elements of time series by using a practical example based on real data made available through the `R` software. The data represent the global mean land–ocean temperature deviations from 1880 to 2015 (with respect to the 1951–1980 average used as the base index) and this time series is represented in the plot below.
```{r glotempExample, fig.height = 4.5, fig.width = 7, cache = TRUE, fig.align='center'}
# Load data
data(globtemp, package = "astsa")
# Construct gts object
globtemp = gts(globtemp, start = 1880, freq = 1, unit_ts = "C", name_ts = "Global Temperature Deviations", data_name = "Evolution of Global Temperatures")
# Plot time series
plot(globtemp)
```
These data have been used to support the argument that global temperatures are increasing and that global warming has occurred over the last half of the twentieth century. A first approach one could take is to try to measure the average increase by fitting a model having the form:
$$
X_t = f(t) + \varepsilon_t,
$$
where $X_t$ denotes the global temperature deviation and $f(\cdot)$ is a "smooth" function such that $\mathbb{E}[X_t] - f(t) = 0$ for all $t$. In general, $\varepsilon_t$ is assumed to follow a normal distribution for simplicity. The goal in this context would therefore be to evaluate whether $f(t)$ (or a suitable estimator of this function) is an increasing function (especially over the last decades). In order to do so, we would require the residuals from the fitted model to be independently and identically distributed (iid). Let us fit a (nonparametric) model with the years (time) as the explanatory variable using the code below:
```{r fitGam}
time = gts_time(globtemp)
fit = gam(globtemp ~ s(time))
```
and check the residuals from this model using:
```{r gamresid, fig.height = 5.5, fig.width = 6.5, fig.align='center'}
check(fit, simple = TRUE)
```
It can be seen from the upper left plot that the trend appears to have been removed and, looking at the residuals as one usually would in a regression framework, the residual plots suggest that the model has done a relatively good job since no particular pattern seems to emerge and the distribution of the residuals is quite close to Gaussian.
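As a complementary check (a sketch that is not part of the original analysis), one could also apply a portmanteau test, such as the Ljung-Box test available in base `R`, to the residuals in order to assess whether they display serial correlation; the choice of `lag = 10` below is purely illustrative.
```{r, eval=FALSE}
# Illustrative sketch: Ljung-Box test on the residuals of the fitted model;
# a small p-value would suggest that the residuals are serially correlated.
Box.test(residuals(fit), lag = 10, type = "Ljung-Box")
```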
However, is it possible to conclude from the plots that the data are *iid* (i.e. independent and identically distributed)? More specifically, can we assume that the residuals are independent? This is a fundamental question in order for inference procedures to be carried out in an appropriate manner and to limit false conclusions. Let us provide an example through a simulated data set where we know that there is an upward trend through time and our goal would be to show that this trend exists. In order to do so we consider a simple model where $f(t)$ has a simple parametric form, i.e. $f(t) = \beta \cdot t$ and we employ the following data generating process:
$$X_t = \beta \cdot t + Y_t,$$
where
$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \varepsilon_t,$$
and where $\varepsilon_t \sim \mathcal{N}(0, \sigma^2)$. Intuitively, $Y_t$ is not an *iid* sequence of random variables except in the case where $\phi_1 = \phi_2 = 0$. In the following chapters we shall see that this intuition is correct and that this model is known as an AR(2) model. Considering this, we simulate two cases where, in the first, the residuals are actually *iid* Gaussian while, in the second, the residuals are Gaussian but dependent over time. In the first case, the only parameters that explain $X_t$ are $\beta = 5 \cdot 10^{-3}$ and $\sigma^2 = 1$ since the residuals $Y_t$ are *iid* (i.e. $\phi_1 = \phi_2 = 0$). In the second case however, aside from the mentioned parameters, we also have $\phi_1 = 0.95$ and $\phi_2 = -0.5$ (the values used in the simulation code below). In both cases, we perform the hypothesis test:
$$
\begin{aligned}
\text{H}_0:& \;\;\; \beta = 0\\
\text{H}_1:& \;\;\; \beta > 0
\end{aligned}
$$
as our hope is to prove, similarly to the global temperature deviation example, that $f(t)$ is an increasing function. Our synthetic data are simulated as follows:
```{r}
# Set seed for reproducibility
set.seed(9)
# Define sample size
n = 100
# Define beta
beta = 0.005
# Define sigma2
sigma2 = 1
# Simulation of Yt
Yt_case1 = gen_gts(WN(sigma2 = sigma2), n = n)
Yt_case2 = gen_gts(AR(phi = c(0.95, -0.5), sigma2 = sigma2), n = n)
# Define explanatory variable (time)
time = 1:n
# Simulation of Xt
Xt_case1 = beta*time + Yt_case1
Xt_case2 = beta*time + Yt_case2
# Fit linear models (without intercept)
model1 <- lm(Xt_case1 ~ time + 0)
model2 <- lm(Xt_case2 ~ time + 0)
```
The "summary" of our model on the first dataset is given by
```{r}
summary(model1)
```
As can be seen, in the first case the estimated slope ($\approx$ `r round(model1$coefficients[1], 3)`) is close to the true slope (0.005) and is significant (i.e. the p-value is smaller than the common rejection level 0.05) since the p-value of the above mentioned test is given by `r round(summary(model1)$coefficients[4]/2, 4)`. Hence, from this inference procedure we can conclude at the 5% significance level that the slope is significantly larger than zero and is roughly equal to `r round(model1$coefficients[1], 3)` (which is relatively close to the truth). However, let us perform the same analysis when the residuals are not independent (the second case) by examining its "summary":
```{r}
summary(model2)
```
In this case we can observe that the p-value of the above-mentioned test is given by `r round(summary(model2)$coefficients[4]/2, 4)` and is therefore greater than the usual significance level of 0.05. Consequently, we do not have evidence to conclude that the slope coefficient is larger than zero (i.e. we fail to reject H$_0$) even though it actually is larger than zero in reality. Inference procedures can therefore be misleading when they do not take into account other relevant variables or, as in this case, forms of dependence that can hide true underlying effects. This is only one example and there are of course cases where, despite dependence in the residuals, the estimated slope would still be deemed significant even when this dependence is ignored. However, if we repeated this experiment over a large number of simulated samples, we would probably see that we fail to reject the null hypothesis much more frequently when dependence in the residuals is present but not accounted for.
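To make this last point concrete, the following sketch (not part of the original text, and relying on `arima.sim()` from base `R` rather than the `simts` generators used above) repeats the experiment over many simulated samples and records the proportion of rejections of H$_0$ in each case; the object names (`reject_iid`, `reject_dep`) are illustrative.
```{r, eval=FALSE}
# Monte Carlo sketch: proportion of rejections of H0: beta = 0 at the 5% level
set.seed(10)
B = 1000                              # number of simulated samples
n = 100; beta = 0.005; sigma2 = 1     # same parameters as above
time = 1:n
reject_iid = reject_dep = logical(B)
for (b in 1:B) {
  # Case 1: iid Gaussian residuals
  x1 = beta*time + rnorm(n, sd = sqrt(sigma2))
  # Case 2: dependent (AR(2)) residuals
  x2 = beta*time + as.numeric(arima.sim(list(ar = c(0.95, -0.5)), n = n, sd = sqrt(sigma2)))
  # One-sided p-values for H0: beta = 0 vs H1: beta > 0 (as in the text above)
  reject_iid[b] = summary(lm(x1 ~ time + 0))$coefficients[4]/2 < 0.05
  reject_dep[b] = summary(lm(x2 ~ time + 0))$coefficients[4]/2 < 0.05
}
mean(reject_iid)   # empirical rejection rate with iid residuals
mean(reject_dep)   # empirical rejection rate with dependent residuals
```
Comparing the two empirical rejection rates gives a sense of how the dependence in the residuals affects the conclusions of the test.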
These examples therefore highlight how the approach to analysing time series does not only rely on finding an appropriate model that describes the evolution of a variable as a function of time (which is deterministic). Indeed, one of the main focuses of time series analysis consists in modelling the dependence structure that describes how random variables impact each other as a function of time. In other words, a time series is a collection of random variables whose interaction and dependence structure is indexed by time. Based on this structure, one of the main goals of time series analysis is to correctly estimate the dependence mechanism and consequently deliver forecasts that are as accurate as possible considering the deterministic functions of time (and other variables) as well as the random dependence structure.
## The Wold Decomposition
The previous discussion highlighted how a time series can be decomposed into a deterministic component and a random component. Leaving aside technical rigour, this characteristic of time series was formalized in Wold's decomposition theorem, which postulates that a time series $(Y_t)$ (where $t = 1,...,n$ represents the time index) can be represented very generally as follows:
$$Y_t = D_t + W_t,$$
where $D_t$ represents the deterministic part (or *signal*), which can be modelled through standard techniques (e.g. linear regression), while $W_t$ represents the random part (*noise*) which, restricting ourselves to a general class of processes, requires the analytical and modelling approaches that will be tackled in this book.
Typically, we have $\mathbb{E}[Y_t] \neq 0$ while $\mathbb{E}[W_t] = 0$ (although we may have
$\mathbb{E}[W_t | W_{t-1}, ..., W_1] \neq 0$). Such models impose some parametric
structure which represents a convenient and flexible way of studying time series
as well as a means to evaluate *future* values of the series through forecasting.
As we will see, predicting future values is one of the main aspects of time
series analysis. However, making predictions is often a daunting task, as
famously stated by Niels Bohr:
>
> "*Prediction is very difficult, especially about the future.*"
>
There are plenty of examples of predictions that turned out to be completely
erroneous. For example, three days before the 1929 crash, Irving Fisher,
Professor of Economics at Yale University, famously predicted:
>
> "*Stock prices have reached what looks like a permanently high plateau*".
>
Another example is given by Thomas Watson, president of IBM, who said in 1943:
>
> "*I think there is a world market for maybe five computers.*"
>
Let us now briefly discuss the two components of a time series.
### The Deterministic Component (Signal)
Before shifting our focus to the random component of time series, we first underline the main features that should be taken into account for the deterministic component. The first feature that should be analysed is the *trend* that characterises the time series, that is, the behaviour of the variable of interest as a specific function of time (as in the global temperature series seen earlier). Let us consider another real-data example, borrowed from @shumway2010time, namely the quarterly earnings of Johnson & Johnson between 1960 and 1980 represented below.
```{r jjexample, fig.height = 4.5, fig.width = 7, cache = TRUE, fig.align='center'}
# Load data
data(jj, package = "astsa")
# Construct gts object
jj = gts(jj, start = 1960, freq = 4, unit_ts = "$", name_ts = "Quarterly Earnings per Share", data_name = "Johnson & Johnson Quarterly Earnings")
# Plot time series
plot(jj)
```
As can be seen from the plot, the earnings appear to grow over time, therefore we can imagine fitting a straight line to this data to describe its behaviour by considering the following model:
\begin{equation}
X_t = \alpha + \beta t + \varepsilon_t,
(\#eq:modeljjexample)
\end{equation}
where $\varepsilon_t$ is iid Gaussian. The results are presented in the graph below:
```{r jjexample2, fig.height = 4.5, fig.width = 7, cache = TRUE, fig.align='center'}
# Fit linear regression
time_jj = gts_time(jj)
fit_jj1 = lm(as.vector(jj) ~ time_jj)
# Plot results and add regression line
plot(jj)
lines(time_jj, predict(fit_jj1), col = "red")
legend("bottomright", c("Time series", "Regression line"),
col = c("blue4", "red"), bty = "n", lwd = 1)
```
Although the line captures a part of the behaviour, it is quite clear that the trend of the time series is not linear as can be observed from the diagnostic plot below:
```{r lmresid, fig.height = 5.5, fig.width = 6.5, fig.align='center'}
check(fit_jj1, simple = TRUE)
```
It could therefore be more appropriate to use another function of time to describe the trend and, consequently, we add a quadratic term so that the model considered in \@ref(eq:modeljjexample) becomes:
\begin{equation}
X_t = \alpha + \beta_1 t + \beta_2 t^2 + \varepsilon_t,
(\#eq:modeljjexample2)
\end{equation}
The results of this regression are presented in the graphs below:
```{r, echo=FALSE, fig.height = 4.5, fig.width = 7, cache = TRUE, fig.align='center'}
fit_jj2 <- lm(as.vector(jj) ~ time_jj + I(time_jj^2))
plot(jj)
lines(time_jj, predict(fit_jj2), col = "red")
legend("bottomright", c("Time series", "Regression line"),
col = c("blue4", "red"), bty = "n", lwd = 1)
```
```{r lmresid2, fig.height = 5.5, fig.width = 6.5, fig.align='center'}
check(fit_jj2, simple = TRUE)
```
We can now see that the quadratic function of time allows us to better fit the observed time series and to follow the observations more closely. However, there still appears to be a pattern in the data that is not captured by this quadratic model: peaks and valleys that seem to occur at regular intervals along the time series. This behaviour is known as *seasonality*, which in this case can be explained by the effect of a specific quarter on the earnings. Indeed, it is reasonable to assume that the seasons impact many variables measured over time (e.g. temperatures, earnings linked to sales that vary with the seasons, etc.). Let us therefore take the quarter as an explanatory variable and add it to the model considered in \@ref(eq:modeljjexample2), which becomes:
\begin{equation}
X_t = \alpha + \beta_1 t + \beta_2 t^2 + \sum_{i = 1}^4 \gamma_i I_{t \in \mathcal{A}_i} + \varepsilon_t,
(\#eq:modeljjexample3)
\end{equation}
where
\begin{equation*}
I_{t \in \mathcal{A}} \equiv \left\{
\begin{array}{ll}
1 & \mbox{if } t \in \mathcal{A} \\
0 & \mbox{if } t \not\in \mathcal{A}
\end{array}
\right. ,
\end{equation*}
and where
$$
\mathcal{A}_i \equiv \left\{x \in \mathbb{N} | x = i \; \text{mod} \; 4\right\}.
$$
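As an illustrative sketch (this is not the code used to produce the figures below), the quarterly indicators appearing in \@ref(eq:modeljjexample3) could be constructed explicitly as a factor and added to the quadratic fit, for example as follows; the names `quarter` and `fit_jj_season` are purely illustrative.
```{r, eval=FALSE}
# Sketch: explicit seasonal (quarterly) dummy variables for the J&J series
quarter = factor(rep(1:4, length.out = length(time_jj)))  # quarter of each observation
fit_jj_season = lm(as.vector(jj) ~ time_jj + I(time_jj^2) + quarter)
summary(fit_jj_season)
```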
The results are presented below:
```{r, echo=FALSE, fig.align='center'}
# Quarter indicators coded as a factor (seasonal dummies)
quarters = as.factor(rep(1:4, 21))
fit_jj3 <- gam(as.vector(jj) ~ quarters + s(time_jj))
plot(jj)
lines(time_jj, predict(fit_jj3), col = "red")
legend("bottomright", c("Time series", "Regression line"),
col = c("blue4", "red"), bty = "n", lwd = 1)
```
```{r lmresid3, fig.height = 5.5, fig.width = 6.5, fig.align='center'}
check(fit_jj3, simple = TRUE)
```
This final fit appears to describe the behaviour of the earnings well, although there still appears to be a problem of heteroskedasticity (i.e. changing variance) and random seasonality (both of which will be treated further on in this text). Hence, *trend* and *seasonality* are the main features characterizing the deterministic component of a time series. However, as discussed earlier, these deterministic components often do not explain all of the observed variation since there is usually also a random component in data measured over time. Not considering the latter can have considerable impacts on inference procedures (as seen earlier) and it is therefore important to analyse it adequately (see the next section).
### The Random Component (Noise)
From this section onwards we will refer to *time series as being solely the random noise component*. Keeping this in mind, a *time series* is a particular kind of *stochastic process* which, generally speaking, is a collection of random variables indexed by a set of numbers. Not surprisingly, the index of reference for a time series is given by *time* and, consequently, a time series is a collection of random variables indexed (or "measured") over time such as, for example, the daily price of a financial asset or the monthly average temperature in a given location. In terms of notation, a time series is often represented as
\[\left(X_1, X_2, ..., X_T \right) \;\;\; \text{ or } \;\;\; \left(X_t\right)_{t = 1,...,T}.\]
The time index $t$ is contained within either the set of reals, $\mathbb{R}$, or
integers, $\mathbb{Z}$. When $t \in \mathbb{R}$, the time series becomes a
*continuous-time* stochastic process such as a Brownian motion, a model used to
represent the random movement of particles within a suspended liquid or gas. However, within this book, we will limit ourselves to the cases where $t \in \mathbb{Z}$, better known as *discrete-time* processes. Discrete-time processes are measured sequentially at fixed
and equally spaced intervals in time. This implies that we will uphold two general assumptions for the time series considered in this book:
1. $t$ is not random, i.e. the time at which each observation is measured is known, and
2. the time between two consecutive observations is constant.
This book will also focus on certain representations of time series based on parametric probabilistic models. For example, one of the fundamental probability models used in time series analysis is called the *white noise* model and is defined as
\[X_t \mathop \sim \limits^{iid} N(0, \sigma^2).\]
This statement simply means that $(X_t)$ is normally distributed and independent over time. Ideally, this is the type of process that we would want to observe once we have performed a statistical modelling procedure. However, despite it appearing to be an excessively simple model to be considered for time series, it is actually a crucial component to construct a wide range of more complex time series models (see Chapter \@ref(fundtimeseries)). Indeed, unlike the white noise process, time series are typically *not* independent over time. For example, if we suppose that the temperature in State College is unusually low on a given day, then it is reasonable to assume that the temperature the day after will also be low.
With this in mind, let us now give a quick overview of the information that can be retrieved on a time series from a simple descriptive representation.
## Exploratory Data Analysis for Time Series {#eda}
When dealing with relatively small time series (e.g. a few thousand observations or fewer), it is
often useful to look at a graph of the original data. A graph can be
an informative tool for "detecting" some features of a time series such as trends and
the presence of outliers. This is indeed what was done in the previous paragraphs when analysing the global temperature data or the Johnson & Johnson data.
To go into more depth with respect to the previous paragraphs, a trend is typically assumed to be present in a time series when the data exhibit some form of long-term increase or decrease, or a combination of increases and decreases. Such trends could be linear or non-linear and represent
an important part of the "signal" of a model (as seen for the Johnson & Johnson time series). Here are a few examples of
non-linear trends:
1. **Seasonal trends** (periodic): These are the cyclical patterns which repeat
after a fixed/regular time period. This could be due to business cycles
(e.g. bust/recession, recovery).
2. **Non-seasonal trends** (periodic): These patterns cannot be associated to
seasonal variation and can for example be due to an external variable such as,
for example, the impact of economic indicators on stock returns. Note that such
trends are often hard to detect based on a graphical analysis of the data.
3. **"Other" trends**: These trends have typically no regular patterns and are
over a segment of time, known as a "window", that change the statistical
properties of a time series. A common example of such trends is given by the
vibrations observed before, during and after an earthquake.
Moreover, when observing "raw" time series data it is also interesting to
evaluate if some of the following phenomena occur:
1. **Change in Mean:** Does the mean of the process shift over time?
2. **Change in Variance:** Does the variance of the process evolve with time?
3. **Change in State:** Does the time series appear to change between "states"
having distinct statistical properties?
4. **Outliers:** Does the time series contain some "extreme" observations?
(Note that this is typically difficult to assess visually.)
```{example, label="earthquake"}
In the figure below, we present an example of displacement recorded
during an earthquake as well as an explosion.
```
```{r example_EQ, fig.height = 4, fig.width = 7, cache = TRUE, fig.align='center'}
data(EQ5, package = "astsa")
data(EXP6, package = "astsa")
# Construct gts object
eq5 <- gts(EQ5, start = 0, freq = 1, unit_ts = "p/s", name_ts = "Earthquake Arrival Phases", data_name = "Earthquake Arrival Phases")
exp6 <- gts(EXP6, start = 0, freq = 1, unit_ts = "p/s", name_ts = "Explosion Arrival Phases", data_name = "Explosion Arrival Phases")
# Plot time series
plot(eq5)
plot(exp6)
```
From the graph, it can be observed that the statistical properties of the time
series appear to change over time. For instance, the variance of the time series
shifts at around $t = 1150$ for both series.
The shift in variance also opens "windows" where there appear to be distinct
states. In the case of the explosion data, this is particularly relevant around
$t = 50, \cdots, 250$ and then again from $t = 1200, \cdots, 1500$. Even within
these windows, there are "spikes" that could be considered as outliers most
notably around $t = 1200$ in the explosion series.
Extreme observations or outliers are commonly observed in real time series data, as illustrated in the following example.
```{example, label="precipitation"}
We consider here a data set coming from the domain of hydrology. The data
concerns monthly precipitation (in mm) over a certain period of time (1907 to
1972) and is interesting for scientists in order to study water cycles. The
data are presented in the graph below:
```
```{r example_hydro, fig.height = 4, fig.width = 7, cache = TRUE, message = FALSE, warning = FALSE, fig.align='center'}
# Load hydro dataset
data("hydro")
# Construct gts object
hydro = gts(as.vector(hydro), start = 1907, freq = 12, unit_ts = "in.",
name_ts = "Precipitation", data_name = "Hydrology data")
# Plot hydro
plot(hydro)
```
We can see that most observations lie below 2 mm but that a few observations go well beyond this level and appear considerably larger than the others. These are possible outliers that can greatly affect the estimation procedure if not adequately taken into account.
Next, we consider an example coming from high-frequency finance. The figure below presents the returns or price innovations
(i.e. the changes in price from one observation to the
next) for the Starbucks stock on July 1, 2011 for about 150 seconds (left
panel) and about 400 minutes (right panel).
```{r example_Starbucks, fig.height = 4, fig.width = 9, cache = TRUE, message = FALSE, fig.align='center'}
# Load "high-frequency" Starbucks returns for July 01 2011
data(sbux.xts, package = "highfrequency")
# Plot returns
par(mfrow = c(1,2))
plot(gts(sbux.xts[1:89]),
main = "Starbucks: 150 Seconds",
ylab = "Returns")
plot(gts(sbux.xts),
main = "Starbucks: 400 Minutes",
ylab = "Returns")
```
It can be observed on the left panel that observations are not equally spaced.
Indeed, in high-frequency data the intervals between two points are typically not
constant and are, even worse, random variables. This implies that the time when
a new observation will be available is in general unknown. On the right panel,
one can observe that the variability of the data seems to change during the
course of the trading day. Such a phenomenon is well known in the finance
community since a lot of variation occurs at the start (and the end) of the
day while the middle of the day is associated with small changes.
Moreover, clear extreme observations can also be noted in this graph at
around 11:00.
```{example, label="imu"}
Finally, let us consider the limitations of a direct graphical representation of
a time series when the sample size is large. Indeed, due to visual limitations,
a direct plot of the data will probably result in an uninformative
aggregation of points from which it is impossible to distinguish anything. This is
illustrated in the following example.
We consider here the data coming from the calibration procedure of
an Inertial Measurement Unit (IMU) which, in general terms, is used to enhance
navigation precision or reconstruct three dimensional movements:
```
```{r, echo = FALSE, fig.align='center'}
knitr::include_url("https://www.youtube.com/embed/htoBvSq8jLA")
```
These sensors are used in a very wide range of applications such as robotics, virtual reality `r emo::ji("bear")`,
vehicle stability control, human and animal motion capture and so forth:
```{r, echo = FALSE, fig.align='center'}
knitr::include_url("https://www.youtube.com/embed/g4tgtPA54_Y")
```
The signals coming from these instruments are measured at
high frequencies over a long time and are often characterized by linear trends
and numerous underlying stochastic processes.
The code below retrieves some data from an IMU and plots it directly:
```{block2, type='rmdimportant'}
To access the IMU time series represented below you must install the imudata package, which can be found at this [link](https://github.com/SMAC-Group/imudata).
```
```{r example_IMU, fig.height = 4, fig.width = 7, cache = TRUE, message = FALSE, dev='png', fig.align='center'}
# Load IMU data
data(imu6, package = "imudata")
# Construct gts object
Xt = gts(imu6[,1], data_name = "Gyroscope data", unit_time = "hour",
freq = 100*60*60, name_ts = "Angular rate",
unit_ts = bquote(rad^2/s^2))
# Plot time series
plot(Xt)
```
Although a linear trend and other processes are present in this signal
(time series), it is practically impossible to understand or guess anything
from the plot. For this reason, other types of representations are available to understand the behaviour of a time series and these will be discussed in the next chapter. Having discussed direct graphical representations (and their limitations), let us now present the basic parametric models that are used to build more complex models to describe and predict the behaviour of a time series.
## Dependence in Time Series
In this section we briefly discuss the concept of dependence (within time series). As mentioned earlier, it is natural to assume that observations measured through time are dependent on each other (in that observations at time $t$ have some form of impact on observations at time $t+1$ or beyond). Due to this characteristic, one of the main interests in time series is prediction where, if $(X_t)_{t=1,\ldots,T}$ is an identically distributed but not independent sequence, we often want to predict the value of ${X}_{T+h}$ for $h > 0$ (i.e. via an estimator of $\mathbb{E}[X_{T+h}| X_T,...]$). In order to tackle this issue, we first need to understand the dependence between $X_{1},\ldots,X_{T}$ and, even before this, we have to formally define what **independence** is.
```{definition, IndepEvents, name = "Independence of Events"}
Two events $A$ and $B$ are independent if
\begin{align*}
\mathbb{P}(A \cap B) = \mathbb{P}(A)\mathbb{P}(B),
\end{align*}
with $\mathbb{P}(A)$ denoting the probability of event $A$ occurring and $\mathbb{P}(A \cap B)$ denoting the joint probability (i.e. the probability that events $A$ and $B$ occur jointly). In general, $A_{1},\ldots,A_{n}$ are independent if
\begin{align*}
\mathbb{P}(A_1 \cap \ldots \cap A_n) = \mathbb{P}(A_1) \ldots \mathbb{P}(A_n), \;\; A_i \subseteq S, \;\; i=1,\ldots,n,
\end{align*}
where $S$ is the sample space.
```
<br>
```{definition, IndepRV, name = "Independence of Random Variables"}
Two random variables $X$ and $Y$ with Cumulative Distribution Functions (CDF) $F_X(x)$ and $F_Y(y)$, respectively, are independent if and only if their joint CDF $F_{X,Y}(x,y)$ is such that
\begin{align*}
F_{X,Y}(x,y) = F_{X}(x) F_{Y}(y).
\end{align*}
In general, random variables $X_1, \ldots, X_n$ with CDF $F_{X_1}(x_1), \ldots, F_{X_n}(x_n)$ are respectively independent if and only if their joint CDF $F_{X_1, \ldots, X_n}(x_1, \ldots, x_n)$ is such that
\begin{align*}
F_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = F_{X_1}(x_1) \ldots F_{X_n}(x_n).
\end{align*}
```
<br>
```{definition, iid, name = "iid sequence"}
The sequence $X_{1},X_{2},\ldots,X_{T}$ is said to be independent and identically distributed (i.e. iid) if and only if
\begin{align*}
\mathbb{P}(X_{i}<x) = \mathbb{P}(X_{j}<x) \;\; \forall x \in \mathbb{R}, \forall i,j \in \{1,\ldots,T\},
\end{align*}
and
\begin{align*}
\mathbb{P}(X_{1}<x_{1},X_{2}<x_{2},\ldots,X_{T}<x_{T})=\mathbb{P}(X_{1}<x_1) \ldots \mathbb{P}(X_{T}<x_T) \;\; \forall T\geq2, x_1, \ldots, x_T \in \mathbb{R}.
\end{align*}
```
<br>
The basic idea behind the above definitions of independence is the fact that the probability of an event regarding variable $X_i$ remains unaltered no matter what occurs for variable $X_j$ (for $i \neq j$). However, for time series, this is often not the case and $X_t$ often has some impact on $X_{t+h}$ for some $h$ (not too large). In order to explain (and predict) the impact of an observation on future observations, a series of models have been adopted through the years thereby providing a comprehensive framework to explain dependence through time. The following paragraphs introduce some of these basic models.
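To give a simple numerical illustration of this idea (a sketch that is not part of the original text), we can compare the lag-one sample correlation of an independent sequence with that of a dependent one generated, for example, with `arima.sim()` from base `R`; the names `iid_seq` and `dep_seq` are illustrative.
```{r, eval=FALSE}
# Sketch: lag-one sample correlation of an iid sequence vs. a dependent one
set.seed(1)
n = 1000
iid_seq = rnorm(n)                                       # independent sequence
dep_seq = as.numeric(arima.sim(list(ar = 0.8), n = n))   # dependent (AR) sequence
cor(iid_seq[-n], iid_seq[-1])  # close to 0: X_t carries little information on X_{t+1}
cor(dep_seq[-n], dep_seq[-1])  # clearly positive: X_t "impacts" X_{t+1}
```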
## Basic Time Series Models {#basicmodels}
In this section, we introduce some simple time series models that constitute the building blocks for the more complex and flexible classes of time series models commonly used in practice. Before doing so, it is useful to define $\Omega_t$ as all the information available up to time
$t-1$, i.e.
\[\Omega_t \equiv \left(X_{t-1}, X_{t-2}, ..., X_0 \right).\]
As we will see further on, this compact notation is quite useful.
### White Noise {#wn}
As we saw earlier, the white noise model is the building block for most time series models and, to better specify the notation used throughout this book, this model is defined as
\[{W_t}\mathop \sim \limits^{iid} N\left( {0,\sigma _w^2} \right).\]
This definition implies that:
1. $\mathbb{E}[W_t | \Omega_t] = 0$ for all $t$,
2. $\text{cov}\left(W_t, W_{t-h} \right) = \boldsymbol{1}_{h = 0} \; \sigma^2$ for
all $t, h$.
More specifically, $h \in \mathbb{N}^+$ is the time difference between lagged variables. Therefore, in this process there is an absence of temporal (or serial) correlation and it is homoskedastic (i.e. it has a constant variance). Going into further detail, white noise can be categorized into two sorts of processes: *weak* and *strong*. The process $(W_t)$ is
a *weak* white noise if
1. $\mathbb{E}[W_t] = 0$ for all $t$,
2. $\text{var}\left(W_t\right) = \sigma_w^2$ for all $t$,
3. $\text{cov} \left(W_t, W_{t-h}\right) = 0$ for all $t$ and for all $h \neq 0$.
Note that this definition does not imply that $W_t$ and $W_{t-h}$ are independent (for $h \neq 0$) but simply uncorrelated. However, the notion of independence is used to define a *strong* white noise as
1. $\mathbb{E}[W_t] = 0$ and $\text{var}(W_t) = \sigma^2 < \infty$, for all $t$,
2. $F(W_t) = F(W_{t-h})$ for all $t,h$ (where $F(W_t)$ denotes the marginal distribution of $W_t$),
3. $W_t$ and $W_{t-h}$ are independent for all $t$ and for all $h \neq 0$.
It is clear from these definitions that if a process is a strong white noise it is also a weak white noise. However, the converse is not true as shown in the following example:
```{example, label="weaknotstrong"}
Let $Y_t \mathop \sim F_{t+2}$, where $F_{t+2}$ denotes
a Student distribution with $t+2$ degrees of freedom. Assuming the
sequence $(Y_1, \ldots, Y_n)$ to be independent, we
let $X_t = \sqrt{\frac{t}{t+2}} Y_t$. Then, the process $(X_t)$ is obviously
not a strong white noise as the distribution of $X_t$ changes with $t$. However
this process is a weak white noise since we have:
- $\mathbb{E}[X_t] = \sqrt{\frac{t}{t+2}} \mathbb{E}[Y_t] = 0$ for all $t$.
- $\text{var}(X_t) = \frac{t}{t+2} \text{var}(Y_t) = \frac{t}{t+2} \frac{t+2}{t} = 1$ for all $t$.
- $\text{cov}(X_t, X_{t+h}) = 0$ (by independence), for all $t$, and for all $h \neq 0$.
```
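A short simulation sketch (not part of the original example) can help visualize this process: it simply generates the $X_t$ defined above using independent Student-t draws whose degrees of freedom change with $t$; the names `t_index`, `Yt` and `Xt` are illustrative.
```{r, eval=FALSE}
# Sketch: simulate the weak (but not strong) white noise of the example
set.seed(1)
n = 500
t_index = 1:n
Yt = rt(n, df = t_index + 2)              # independent Y_t ~ Student-t with t + 2 df
Xt = sqrt(t_index/(t_index + 2)) * Yt     # rescaled so that var(X_t) = 1 for all t
plot(t_index, Xt, type = "l", xlab = "t", ylab = expression(X[t]))
```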
This distinction is therefore important and will be extremely relevant when discussing the concept of "stationarity" further on in this book. In general, the white noise model is assumed to be Gaussian in many practical cases and the code below presents an example of how to simulate a Gaussian white noise process.
```{r example_WN, fig.height = 4, fig.width = 7, cache = TRUE}
n = 1000 # process length
sigma2 = 1 # process variance
Xt = gen_gts(n, WN(sigma2 = sigma2))
plot(Xt)
```
This model can be found in different applied settings and is often accompanied by some of the models presented in the following paragraphs.
### Random Walk {#rw}
The term *random walk* was first introduced by Karl Pearson in the early
nineteen-hundreds and a wide range of random walk models have been defined over the years. For example, one of the simplest forms of a random walk process can be
explained as follows: suppose that you are walking on campus and your
next step can either be to your left, your right, forward or backward
(each with equal probability). Two realizations of such processes are
represented below:
```{r RW2d, fig.height = 5, fig.width = 5.5, cache = TRUE, fig.align='center'}
set.seed(5)
RW2dimension(steps = 10^2)
RW2dimension(steps = 10^4)
```
Such processes inspired Karl Pearson's famous quote that
>
> "*the most likely place to find a drunken walker is somewhere near his starting point.*"
>
Empirical evidence of this phenomenon is not too hard to find on a Friday or Saturday night. This two-dimensional process may easily be extended to three dimensions and a simulated example of such a process is presented in the animation below:
<center>

</center>
In this text, we only consider one very specific form of random walk, namely the Gaussian random walk which can be defined as:
$$X_t = X_{t-1} + W_t,$$
where $W_t$ is a Gaussian white noise process with initial condition $X_0 = c$ (typically $c = 0$). This process can be expressed differently by *backsubstitution* as follows:
\[\begin{aligned}
{X_t} &= {X_{t - 1}} + {W_t} \\
&= \left( {{X_{t - 2}} + {W_{t - 1}}} \right) + {W_t} \\
&= \vdots \\
&= \sum\limits_{i = 1}^t {{W_i}} + X_0 = \sum\limits_{i = 1}^t {{W_i}} + c \\
\end{aligned} \]
A random variable following a random walk can therefore be expressed as the cumulative sum of all the random variables that precede it. The code below presents an example of how to simulate such a process.
```{r example_RW, fig.height = 4, fig.width = 7, cache=TRUE}
n = 1000 # process length
gamma2 = 1 # innovation variance
Xt = gen_gts(n, RW(gamma2 = gamma2))
plot(Xt)
```
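Equivalently, based on the backsubstitution above, a Gaussian random walk can also be generated directly in base `R` as the cumulative sum of a Gaussian white noise sequence (an illustrative sketch, not the `simts`-based code used above).
```{r, eval=FALSE}
# Sketch: Gaussian random walk as a cumulative sum of white noise
set.seed(1)
n = 1000
c0 = 0                          # initial condition X_0 = c
Wt = rnorm(n, sd = 1)           # Gaussian white noise W_t
Xt = c0 + cumsum(Wt)            # X_t = c + sum_{i=1}^t W_i
plot(Xt, type = "l", xlab = "Time", ylab = expression(X[t]))
```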
The random walk model is often used to explain phenomena in many different areas, one of which is finance, where stock prices are often assumed to follow this kind of process.
### First-Order Autoregressive Model {#ar1}
A first-order autoregressive model or AR(1) is a generalization of both
the white noise and the random walk processes which are both special
cases of an AR(1). A (Gaussian) AR(1) process can be defined as
\[{X_t} = {\phi}{X_{t - 1}} + {W_t},\]
where $W_t$ is a Gaussian white noise. Clearly, an AR(1) with $\phi = 0$ is
a Gaussian white noise and when $\phi = 1$ the process becomes a random walk.
```{exercise, ar1realizations}
An AR(1) is in fact a linear combination of past realisations of
a white noise $W_t$ process. Indeed, we have
\[\begin{aligned}
{X_t} &= {\phi}{X_{t - 1}} + {W_t}
= {\phi}\left( {{\phi}{X_{t - 2}} + {W_{t - 1}}} \right) + {W_t} \\
&= \phi^2{X_{t - 2}} + {\phi}{W_{t - 1}} + {W_t}
= {\phi^t}{X_0} + \sum\limits_{i = 0}^{t - 1} {\phi^i{W_{t - i}}}.
\end{aligned}\]
Under the assumption of infinite past (i.e. $t \in \mathbb{Z}$) and $|\phi| < 1$,
we obtain
\[X_t = \sum\limits_{i = 0}^{\infty} {\phi^i {W_{t - i}}},\]
since $\operatorname{lim}_{i \to \infty} \; {\phi^i}{X_{t-i}} = 0$.
```
From the conclusion of the above remark, you may have noticed that we assume the considered time series to have zero expectation. The following remark justifies this assumption.
```{exercise, ar1mean}
We generally assume that an AR(1), as well as other time series
models, have zero mean. The reason for this assumption is only to simplify the
notation but it is easy to consider, for example, an AR(1) process around an
arbitrary mean $\mu$, i.e.
\[\left(X_t - \mu\right) = \phi \left(X_{t-1} - \mu \right) + W_t,\]
which is of course equivalent to
\[X_t = \left(1 - \phi \right) \mu + \phi X_{t-1} + W_t.\]
Thus, we will generally only work with zero mean processes since adding means
is simple.
```
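As a small illustration of this remark (a sketch that is not part of the original text), an AR(1) process around an arbitrary mean $\mu$ can be simulated directly from the second parametrization above; the chosen values of `phi`, `mu` and `sigma2` are arbitrary.
```{r, eval=FALSE}
# Sketch: AR(1) around a mean mu, X_t = (1 - phi)*mu + phi*X_{t-1} + W_t
set.seed(1)
n = 1000; phi = 0.5; mu = 10; sigma2 = 1
Xt = numeric(n)
Xt[1] = mu                                   # start the recursion at the mean (a simplification)
for (t in 2:n) {
  Xt[t] = (1 - phi)*mu + phi*Xt[t - 1] + rnorm(1, sd = sqrt(sigma2))
}
mean(Xt)   # should be close to mu
```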
As for the previously presented models, we provide the code that gives an example of how an AR(1) can be simulated.
```{r example_AR1, fig.height = 4, fig.width = 7, cache=TRUE}
n = 1000 # process length
phi = 0.5 # phi parameter
sigma2 = 1 # innovation variance
Xt = gen_gts(n, AR1(phi = phi, sigma2 = sigma2))
plot(Xt)
```
The AR(1) model is one of the most popular and commonly used models in many practical settings going from biology where it is used to explain the evolution of gene expressions to economics where it is used to model macroeconomic trends.
### Moving Average Process of Order 1 {#ma1}
As seen in the previous example, an AR(1) can be expressed as a linear
combination of all past observations of the white noise process $(W_t)$. In a similar manner we can (in some sense) describe the moving average process of order 1 or MA(1) as a "truncated"
version of an AR(1). This model is defined as
\begin{equation}
X_t = \theta W_{t-1} + W_t,
\end{equation}
where (again) $W_t$ denotes a Gaussian white noise process. As we will see further on, as for the AR(1) model, this model can also be represented as a linear combination of past observations but it has different characteristics which can capture different types of dynamics in various practical cases.
An example on how to generate an MA(1) is given below:
```{r example_MA1, fig.height = 4, fig.width = 7, cache=TRUE}
n = 1000 # process length
sigma2 = 1 # innovation variance
theta = 0.5 # theta parameter
Xt = gen_gts(n, MA1(theta = theta, sigma2 = sigma2))
plot(Xt)
```
The use of this model is widespread, especially combined with the AR(1) model, and can be found in fields such as engineering where it is often used for signal processing.
### Linear Drift {#drift}
A linear drift is a very simple deterministic time series model which can be
expressed as
\[X_t = X_{t-1} + \omega, \]
where $\omega$ is a constant and the initial condition is $X_0 = c$, with $c$ an
arbitrary constant (typically $c = 0$). This process can be expressed in a more
familiar form as follows:
\[
{X_t} = {X_{t - 1}} + \omega
= \left( {{X_{t - 2}} + \omega} \right) + \omega = \ldots
= t{\omega} + c . \]
Therefore, a (linear) drift corresponds to a simple linear model with slope $\omega$ and intercept $c$.
```{exercise, remdrift}
You may argue that the definition of this model is not particularly useful since it constitutes a simple linear model. However, this model is often accompanied by other time series models (such as the ones presented earlier) and its estimation can be greatly improved when it is considered in conjunction with these other models.
```
Given its simple form, a linear drift can simply be generated using the code below:
```{r example_Drift, fig.height = 4, fig.width = 7, cache=TRUE}
n = 100 # process length
omega = 0.5 # slope parameter
Xt = gen_gts(n, DR(omega = omega))
plot(Xt)
```
This time series model is widely used in different areas of signal analysis where mechanical systems and measuring devices can be characterized by this type of behaviour.
## Composite Stochastic Processes {#lts}
In the previous paragraphs we defined and briefly discussed the basic time series models that can individually be used to describe and predict a wide range of phenomena in a variety of fields of application. However, their capability of capturing and explaining the different behaviours of phenomena through time increases considerably when they are combined to form so-called *composite models* (or composite processes). A composite (stochastic) process can be defined as the sum of underlying (or latent) time series models and in the rest of this book we will use the term *latent time series models* to refer to these kinds of models. A simple example of such a model is given by
\[\begin{aligned}
Y_t &= Y_{t-1} + W_t + \delta\\
X_t &= Y_t + Z_t,
\end{aligned}\]
where $W_t$ and $Z_t$ are two independent Gaussian white noise processes.
This model is often used as a first basis to approximate the number of
individuals in the context of ecological population dynamics.
For example, suppose we want to study the population of Chamois in the Swiss Alps.
Let $Y_t$ denote the "true" number of individuals in this population at time $t$.
It is reasonable to assume that the number of individuals at time $t$ ($Y_t$) is (approximately) the population at the previous time $t-1$ (e.g. the previous year) plus a random variation and a drift. This random variation is due to the natural randomness in ecological population
dynamics and reflects changes such as the number of predators, the abundance
of food, or weather conditions.
On the other hand, ecological *drift* is often of particular interest for ecologists as
it can be used to determine the "long" term trends of the population
(e.g. if the population is increasing, decreasing, or stable).
Of course, $Y_t$ (the number of individuals) is typically unknown and we observe
a noisy version of it, denoted as $X_t$.
This process corresponds to the true population plus a measurement error since
some individuals may not be observed while others may have been counted several
times.
Interestingly, this process can clearly be expressed as a
*latent time series model* (or composite stochastic process) as follows:
\[\begin{aligned}
R_t &= R_{t-1} + W_t \\
S_t &= \delta t \\
X_t &= R_t + S_t + Z_t,
\end{aligned}\]
where $R_t$, $S_t$ and $Z_t$ denote, respectively, a random walk,
a drift, and a white noise. The code below can be used to simulate such data:
```{r fixbug, echo=FALSE, warning=FALSE, message=FALSE, include=FALSE}
lapply(paste('package:',names(sessionInfo()$otherPkgs),sep=""),detach,character.only=TRUE,unload=TRUE)
library(simts)
```
```{r examplecomposite, fig.align='center', fig.width=6, fig.height=8}
n = 1000 # process length
delta = 0.005 # delta parameter (drift)
sigma2 = 10 # variance parameter (white noise)
gamma2 = 0.1 # innovation variance (random walk)
model = WN(sigma2 = sigma2) + RW(gamma2 = gamma2) + DR(omega = delta)
Xt = gen_lts(n = n, model = model)
plot(Xt)
```
In the above graph, the first three plots represent the latent (unobserved)
processes (i.e. white noise, random walk, and drift) and the last one represents
the sum of the three (i.e. $(X_t)$).
Let us consider a real example where these latent processes are useful to
describe (and predict) the behavior of economic variables such as Personal
Saving Rates (PSR). A process that is used for these settings is the
"random-walk-plus-noise" model, meaning that the data can be explained by a
random walk process in addition to which we observe some other process (e.g.
a white noise model, an autoregressive model such as an AR(1), etc.). The PSR
taken from the Federal Reserve of St. Louis from January 1, 1959, to May 1,
2015, is presented in the following plot:
```{r example_PSR, fig.height = 4, fig.width = 7, cache=TRUE}
# Load savingrt dataset
data("savingrt")
# Construct gts object
savingrt = gts(as.vector(savingrt), start = 1959, freq = 12, unit_ts = "%",
name_ts = "Saving Rates", data_name = "US Personal Saving Rates")
# Plot saving rates time series
plot(savingrt)
```
It can be observed that the mean of this process seems to vary over time,
suggesting that a random walk can indeed be considered as a possible model
to explain this data. In addition, aside from some "spikes" and occasional
sudden changes, the observations appear to gradually change from one time point
to the other, suggesting that some other form of dependence between them could
exist.