Introduction

Consider a vector of $n$ real values1, with entries corresponding to observations at discrete times $t=1,\dots,n$,

$$ \boldsymbol x = \begin{pmatrix} x_1 & x_2 & \dots & x_{n} \end{pmatrix}'\in\mathbb R^n. $$ The discrete Fourier transform (DFT) $\{a_j \in \mathbb C\}$ of $\boldsymbol x$ at the Fourier frequencies $\omega_j = j/n\in[0,1)$, $j=0,1,\dots,n-1$, is defined by

$$ \begin{align} a_j = \sum_{t=1}^{n} x_t e^{-i2\pi (t-1)\omega_j}. \tag{1} \end{align} $$

The DFT decomposes the time-domain data $\boldsymbol{x}$ into its constituent frequencies, represented by the complex coefficients $a_j$. Each coefficient describes the amplitude and phase of a sinusoidal wave at frequency $\omega_j = j/n$, capturing how much that frequency contributes to the overall data. This transformation reveals the data’s periodic patterns in the frequency domain, laying the groundwork for exploring the periodogram and its connection to the (sample) autocovariance function.
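As a small illustration (a sketch with a made-up toy vector, using only base R), the modulus and argument of each coefficient give the amplitude and phase of the corresponding sinusoid, and averaging the complex sinusoids recovers the data:

```r
# made-up toy vector
y <- c(2, 4, 6, 3)
n <- length(y)

# DFT coefficients a_j at omega_j = j/n (time indexed from 0, as in R's fft())
a <- sapply(0:(n - 1), function(j) sum(y * exp(-1i * 2 * pi * (0:(n - 1)) * j / n)))

amplitude <- Mod(a)  # amplitude of the sinusoid at frequency omega_j
phase <- Arg(a)      # phase of the sinusoid at frequency omega_j

# averaging the complex sinusoids recovers the data exactly
recovered <- sapply(0:(n - 1), function(t) {
  Re(sum(a * exp(1i * 2 * pi * t * (0:(n - 1)) / n)) / n)
})
```

Here `recovered` equals `y` up to floating-point error, and `amplitude[1]` (the coefficient at $\omega_0 = 0$) is simply the sum of the data.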

Periodogram and Autocovariance

Define $$ \begin{align} I_x(\omega_j) := \frac{1}{n}\lvert a_j \rvert^2, \tag{2}\label{eq:periodogram} \end{align} $$ the periodogram of $\boldsymbol{x}$. The periodogram is a raw estimate of the power spectral density (PSD) of the data-generating process (DGP). Useful identities are

$$ \begin{align} I_x(\omega_j) =& \, \sum_{\lvert h\rvert < n} \widehat\gamma_{x,(h)} e^{-i2\pi h\omega_j},\tag{3}\\ \notag\\ \widehat\gamma_{x,(h)} =& \, \frac{1}{n} \sum_{j=0}^{n-1} I_x(\omega_j) e^{i2\pi h\omega_j}.\tag{4} \end{align} $$

Equations (3) and (4) state that $I_x(\cdot)$ and $\widehat\gamma_{x,(h)}$, the periodogram and the sample autocovariance function of the data, are discrete Fourier transform pairs: $\widehat\gamma_{x,(h)}$ is the (inverse) discrete Fourier transform of $I_x(\cdot)$.
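Identity (3) is easy to verify numerically. A sketch with a made-up vector; here I use the uncentered sample autocovariance $\widehat\gamma_{z,(h)} = \frac{1}{n}\sum_{t=1}^{n-h} z_t z_{t+h}$ with $\widehat\gamma_{z,(-h)} = \widehat\gamma_{z,(h)}$, under which the identity holds exactly at every Fourier frequency:

```r
z <- c(1.2, -0.5, 0.3, 2.1, -1.0)  # made-up data
n <- length(z)

# uncentered sample autocovariances for h = 0, ..., n - 1
gamma_hat <- sapply(0:(n - 1), function(h) sum(z[1:(n - h)] * z[(1 + h):n]) / n)

# right-hand side of (3): sum over |h| < n, exploiting gamma_hat(-h) = gamma_hat(h),
# so the terms for h and -h combine into 2 * cos(2 * pi * h * w)
omega <- (0:(n - 1)) / n
rhs <- sapply(omega, function(w) {
  h <- 1:(n - 1)
  gamma_hat[1] + sum(gamma_hat[h + 1] * 2 * cos(2 * pi * h * w))
})

# left-hand side of (3): the periodogram, computed from the defining sum
lhs <- sapply(0:(n - 1), function(j) {
  Mod(sum(z * exp(-1i * 2 * pi * (0:(n - 1)) * j / n)))^2 / n
})

all.equal(rhs, lhs)
```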

Fast Fourier Transform in R

Why is that interesting? Well, it allows us to compute the sample autocovariance function of $\boldsymbol x$ using a neat algorithm called the fast Fourier transform (FFT), which is implemented in R: stats::fft().

Let’s first check that stats::fft() computes Fourier coefficients as claimed in (1)2:

y <- 1:4
fft(y) # 'forward' Fourier transform
[1] 10+0i -2+2i -2+0i -2-2i
(cf <- sapply(0:3, function(k) sum(y * exp(-1i * 0:3 * 2 * pi * k/4))))
[1] 10+0i -2+2i -2-0i -2-2i
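A related sanity check (my own addition) is Parseval's identity, $\sum_{j} \lvert a_j\rvert^2 = n\sum_{t} x_t^2$, which links the Fourier coefficients to the total "energy" of the data:

```r
y <- 1:4
a <- fft(y)
c(sum(Mod(a)^2), length(y) * sum(y^2))  # both equal 120
```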

Inverse Fourier Transform

The inverse Fourier transform is obtained as follows:

Re(fft(cf, inverse = T)/4)
[1] 1 2 3 4
Re(sapply(0:3, function(k) sum(cf * exp(1i * 0:3 * 2 * pi * k/4)))/4)
[1] 1 2 3 4

Note the change of sign in the complex exponential, and that fft(..., inverse = T) returns a complex vector, which is why we use Re() to extract the real part of the transform (the imaginary parts are numerically zero).

Implementing ACF using FFT

We may easily implement an R function that computes $\widehat\gamma_{x,(h)}$ using stats::fft().

# acf using fft/inverse fft
acf_fft_R <- function(x) {
  n <- length(x)
  x <- x - mean(x)  # demean, as in the usual definition of the sample acvf
  a_j <- fft(x)
  I_x <- Mod(a_j)^2 / n
  Re(fft(I_x, inverse = T) / n)
}

Example: AR(1) process

Consider the following simulated data from a stationary AR(1) DGP $$ x_t = \phi x_{t-1} + \varepsilon_t, \quad \varepsilon_t\sim N(0,1), $$

with autoregressive coefficient $\phi = .75$, for $t=1,\dots,300$.

# simulate from zero-mean AR(1) process
set.seed(0)

n <- 300
x <- as.numeric(
  arima.sim(model = list("ar" = 0.75), n = n)
)
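As a reference point (a standard closed-form result, not computed in the post), the theoretical autocovariance of a stationary AR(1) process with unit innovation variance is $\gamma(h) = \phi^{\lvert h\rvert}/(1-\phi^2)$:

```r
# theoretical autocovariance of a stationary AR(1) with unit innovation variance:
# gamma(h) = phi^|h| / (1 - phi^2)
phi <- 0.75
gamma_theory <- phi^(0:10) / (1 - phi^2)
round(gamma_theory, 3)
```

The sample autocovariances estimated below should scatter around these values.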

To obtain the periodogram, we apply the forward Fourier transform to x and use formula \eqref{eq:periodogram}.

library(tibble)

# Fourier coefficients
a_j <- fft(x)

# periodogram
I_x <- Mod(a_j)^2 / n

# frequencies
frequencies <- seq(0, n - 1) / n  # normalized frequencies

# gather for plotting ggplot
periodogram_data <- tibble(
  Frequency = frequencies[1:(n / 2)],  # first half of frequencies
  Power = I_x[1:(n / 2)]               # corresponding power values
)

We can now plot the periodogram using ggplot2.

library(ggplot2)
library(cowplot)

# plot the periodogram
ggplot(periodogram_data, aes(x = Frequency, y = Power)) +
  geom_segment(
    aes(xend = Frequency, yend = 0), 
    color = "steelblue", 
    size = 1.2
  ) +
  geom_point(color = "blue", size = 1.5) +
  theme_cowplot() +
  labs(
    title = "Periodogram of AR(1) Process (phi = 0.75)",
    x = "Frequency",
    y = "Power"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16),
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 12)
  )

Note that the spectrum of a stationary AR(1) process with positive $\phi$ is highest at low frequencies and decays monotonically as frequency increases. This reflects the fact that the process has significant autocorrelation (or, equivalently, significant covariances) at short lags only, resulting in a slowly varying time series.3
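This shape can be checked against the theoretical spectral density given in footnote 3 (a sketch; scaling constants, which depend on convention, are omitted):

```r
# theoretical AR(1) spectral density, up to a scaling constant
S <- function(f, phi = 0.75) 1 / (1 - 2 * phi * cos(2 * pi * f) + phi^2)

f <- seq(0, 0.5, length.out = 100)

# power peaks at f = 0 and decreases monotonically towards f = 1/2
c(S(0), S(0.5))  # 16 and roughly 0.327
```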

Next, we compute $\widehat\gamma_{x,(h)}, \lvert h\rvert\leq10$ for the simulated data x and compare with results from stats::acf() in a stacked barplot.

library(dplyr)
library(tidyr)

# gather data in tibble and transform
acf_tbl <- tibble(
  lag = factor(0:10),
  gamma_dfft = acf_fft_R(x)[1:11],
  # as.numeric() drops the array dimensions of the acf() output
  gamma_stats = as.numeric(acf(x, type = "covariance", lag.max = 10, plot = F)$acf)
) %>% 
  pivot_longer(2:3, names_to = "func")

# plot
acf_tbl %>% 
    ggplot(aes(x = lag, xend = lag, y = 0, yend = value, color = func)) +
    geom_segment(size=1.3, alpha = .9) +
    geom_hline(yintercept = 0, lty = "dashed") +
    theme_cowplot() +
    theme(legend.position = c(.6, .8))

The results are pretty close! The small discrepancies at nonzero lags come from the circular wrap-around of the length-$n$ inverse FFT: it returns $\widehat\gamma_{x,(h)} + \widehat\gamma_{x,(n-h)}$ rather than $\widehat\gamma_{x,(h)}$ alone, and the extra term is negligible for $h$ small relative to $n$.
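If an exact match is needed, the wrap-around can be avoided by demeaning and zero-padding the series to length $2n$ before transforming, so that the circular convolution implied by the inverse FFT becomes a linear one. A sketch (the function name acf_fft_pad is mine):

```r
# exact sample acvf via FFT: zero-padding avoids circular wrap-around
acf_fft_pad <- function(x) {
  n <- length(x)
  z <- c(x - mean(x), rep(0, n))  # demean and pad to length 2n
  a <- fft(z)
  # positions 1..n hold sum_t z_t * z_(t+h) for h = 0, ..., n - 1; divide by n
  Re(fft(Mod(a)^2, inverse = T))[1:n] / (2 * n) / n
}
```

Applied to the simulated series, acf_fft_pad(x)[1:11] agrees with acf(x, type = "covariance", lag.max = 10, plot = F)$acf to machine precision.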

Performance Benchmarking

Here comes another motivation for using acf_fft_R(): FFT-based computation can be considerably faster than stats::acf().

bench::mark(
    "acf()" = acf(x, type = "covariance", plot = F),
    "acf_fft_R()" = acf_fft_R(x), 
    check = F
)
# A tibble: 2 × 6
  expression       min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 acf()           58µs   64.5µs    13078.    13.9KB     63.0
2 acf_fft_R()   10.3µs   11.9µs    82321.    56.4KB     16.5

The differences are on the order of $\mu\textup{s}$, but the speed of acf_fft_R() can be a real advantage when the ACF must be computed many times for many long series, e.g., in a simulation study.

Another post discusses an even faster Rcpp-based implementation and an application to long-run variance estimation.

That’s it for now 😊.


  1. The DFT is defined for complex $\boldsymbol x$ but we consider an application to real valued time series below. ↩︎

  2. stats::fft() uses an FFT algorithm, which is faster than, but numerically equivalent to, direct evaluation of (1). ↩︎

  3. The theoretical spectral density of this AR(1) DGP is $$S(f) = \frac{1}{1 - 2\cdot.75 \cos(2\pi f) + .75^2}.$$ While $S(f)$ is a continuous function of the normalized frequency $f\in[0,1)$, the periodogram is a discrete approximation: the number of discretely spaced frequencies at which $S(f)$ can be estimated is limited by the sample size, as these frequencies are determined by the Fourier transform’s resolution. ↩︎