A utility function for LaTeX in R plots

I found myself repeatedly writing similar code to generate publication-ready plots that include LaTeX annotations for my papers and teaching materials. The tikzDevice R package provides the foundation for combining R plots with LaTeX. I use the magick library to convert the compiled PDF file to the desired output format. To streamline this workflow, I wrote a utility function that handles the entire pipeline.

```r
create_latex_plot <- function(
    plot_expr,           # plot object / function call
    out_name,            # output file name
    out_format = "png",  # output format
    out_dir = ".",       # output directory
    width = 9,           # plot width
    height = 6,          # plot height
    cleanup = TRUE       # remove intermediate files
) {
  # libraries
  if (!requireNamespace("tikzDevice", quietly = TRUE))
    stop("Please install the 'tikzDevice' package.")
  if (!requireNamespace("magick", quietly = TRUE))
    stop("Please install the 'magick' package.")
  library(tikzDevice)
  library(magick)

  # tikzDevice options
  options(tikzLatexPackages = c(
    "\\usepackage{tikz}",
    "\\usepackage[active,tightpage]{preview}",
    "\\PreviewEnvironment{pgfpicture}",
    "\\setlength\\PreviewBorder{0pt}",
    "\\usepackage{amsmath, amssymb, amsthm, amstext}",
    "\\usepackage{bm}"
  ))

  # file paths
  tex_file <- file.path(out_dir, paste0(out_name, ".tex"))
  pdf_file <- file.path(out_dir, paste0(out_name, ".pdf"))
  out_file <- file.path(out_dir, paste0(out_name, ".", out_format))

  # write the tikz file
  tikz(tex_file, standAlone = TRUE, width = width, height = height)
  eval(plot_expr)
  dev.off()

  # compile to PDF
  system(
    paste(
      "cd", shQuote(out_dir), "&&",
      "lualatex -output-directory .", shQuote(basename(tex_file))
    )
  )

  # convert the PDF to the requested format
  image_write(
    image_convert(image_read_pdf(pdf_file), format = out_format),
    path = out_file,
    format = out_format
  )
  message("Output file created at: ", out_file)

  # remove only this plot's intermediate files, not everything in the directory
  if (cleanup) {
    file.remove(file.path(out_dir, paste0(out_name, c(".tex", ".pdf", ".aux", ".log"))))
    message("Removed intermediate files.")
  }
}
```

The magick package performs the PDF-to-X conversion internally using ImageMagick, which provides a cross-platform solution that doesn't depend on Ghostscript being installed. ...
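A call to the utility might look like the following sketch (the plot expression and file name are hypothetical; the plotting code is wrapped in `quote()` so it is evaluated inside the tikz device, and running it requires a working lualatex installation):

```r
# Hypothetical usage of create_latex_plot(); requires lualatex on the PATH
create_latex_plot(
  plot_expr = quote(
    plot(rnorm(100), type = "l",
         xlab = "$t$", ylab = "$x_t$",
         main = "A white-noise series $\\{x_t\\}$")
  ),
  out_name = "example_plot",  # hypothetical output file name
  out_format = "png"
)
```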

August 15, 2023 · 6 min · 1152 words · Martin Christopher Arnold

Distance to degenerate gamma distribution

When working with distance measures between distributions, singularities can pose a significant challenge. This happens when one of the distributions is degenerate, concentrating all its probability mass on a single point. In a previous post, I discussed the comparison of a Gamma distribution (representing a complex model) with a singular Gamma distribution (representing a base model) in the context of constructing a penalized complexity (PC) prior for the overdispersion parameter $\phi$ in a Bayesian negative binomial regression. In said post, I stated that the distance measure of interest for constructing the PC prior is ...

September 1, 2021 · 5 min · 862 words · Martin C. Arnold

Neg. Binomial Regression and PC Priors in R-INLA

Negative Binomial as Poisson Mixture I have recently found it useful to represent the negative binomial (NB) distribution as a continuous mixture distribution. This makes it straightforward to understand how the NB distribution relates to the Poisson distribution, i.e., how the Poisson assumptions can be relaxed to allow for overdispersion in count data regression. The Bayesian Poisson-Gamma mixture model is also a nice illustration of the concept of penalized complexity priors. ...
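As a sketch of the mixture representation mentioned above (using a mean-overdispersion parameterization with $\mu$ for the mean and $\phi$ for the overdispersion parameter; the post itself may use a different one), integrating a Gamma-distributed Poisson rate out of the Poisson likelihood yields the NB probability mass function:

$$ \begin{align} Y \mid \lambda &\sim \operatorname{Poisson}(\lambda), \quad \lambda \sim \operatorname{Gamma}(\phi, \phi/\mu), \\ P(Y = y) &= \int_0^\infty \frac{\lambda^y e^{-\lambda}}{y!} \, \frac{(\phi/\mu)^{\phi}}{\Gamma(\phi)} \lambda^{\phi - 1} e^{-(\phi/\mu)\lambda} \, d\lambda = \frac{\Gamma(y + \phi)}{\Gamma(\phi)\, y!} \left(\frac{\phi}{\mu + \phi}\right)^{\phi} \left(\frac{\mu}{\mu + \phi}\right)^{y}, \end{align} $$

so that $E(Y) = \mu$ and $\operatorname{Var}(Y) = \mu + \mu^2/\phi$, with the Poisson model recovered as $\phi \to \infty$.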

August 30, 2021 · 12 min · 2354 words · Martin C. Arnold

Reduced-Rank Linear Discriminant Analysis: R Example

Introduction Linear Discriminant Analysis (LDA) is a widely used technique for both classification and dimensionality reduction. Its goal is to project data into a lower-dimensional subspace where class separability is maximized. While it is routinely applied in many fields, many practitioners use it without fully grasping what the algorithm actually does. Recently, during one of my applied statistical learning classes, students raised a question about the R implementation in MASS::lda(). They were curious about how the associated predict() method transforms the feature data into what is reported as the “LD” entries in the output object. It turns out that the method projects the feature data into a lower-dimensional space chosen for optimal class separability. More mathematically: MASS::lda() implements reduced-rank LDA, where the optimal decision boundaries are determined in a lower-dimensional feature space obtained by projecting the original features onto the discriminant directions. ...
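A minimal sketch of the behavior described above (not taken from the post itself): the "LD" entries returned by predict() are the projections of the feature data onto the discriminant directions.

```r
library(MASS)  # ships with standard R distributions

# Reduced-rank LDA on iris: 3 classes give at most 2 discriminant directions
fit <- lda(Species ~ ., data = iris)

# predict() projects the 4-dimensional feature data onto the LD subspace
ld_scores <- predict(fit)$x

dim(ld_scores)       # 150 observations in a 2-dimensional subspace
colnames(ld_scores)  # "LD1" "LD2"
```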

May 29, 2021 · 6 min · 1132 words · Martin C. Arnold

FFT based covariance estimation in R — Pt. II

In the previous post, I discussed an approach to obtain autocovariances of time series data through discrete Fourier transforms, which I implemented in an R function acf_fft_R().

```r
# ACF using FFT in R
acf_fft_R <- function(x) {
  n <- length(x)
  a_j <- fft(x)
  I_x <- Mod(a_j)^2 / n
  return(Re(fft(I_x, inverse = TRUE) / n))
}
```

An RcppArmadillo version Recently, I wrote an Armadillo version for an Rcpp project. Here’s its definition and how to source it using Rcpp: ...
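A quick sanity check on acf_fft_R() (restated here so the snippet is self-contained): the function returns circular autocovariances of the undemeaned series, so by Parseval's theorem the lag-0 entry equals the average of the squared observations.

```r
# Circular autocovariances via the FFT, as in the post
acf_fft_R <- function(x) {
  n <- length(x)
  a_j <- fft(x)
  I_x <- Mod(a_j)^2 / n
  Re(fft(I_x, inverse = TRUE) / n)
}

set.seed(1)
x <- rnorm(256)
gamma_hat <- acf_fft_R(x)

# Lag 0 equals (1/n) * sum(x^2) by Parseval's theorem
all.equal(gamma_hat[1], mean(x^2))  # TRUE
```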

April 16, 2020 · 9 min · 1796 words · Martin C. Arnold

FFT based covariance estimation in R — Pt. I

Introduction Consider a vector of $n$ real values, with entries corresponding to observations at discrete times $t=1,\dots,n$, $$ \boldsymbol x = \begin{pmatrix} x_1 & x_2 & \dots & x_{n} \end{pmatrix}'\in\mathbb R^n. $$ The discrete Fourier transform (DFT) $\{a_j \in \mathbb C\}$ of $\boldsymbol x$ at frequencies $\omega_j = j/n \in [0,1)$, $j=0,1,\dots,n-1$, is defined by $$ \begin{align} a_j = \sum_{t=1}^{n} x_t e^{-i2\pi t\omega_j}. \end{align} $$ The DFT decomposes the time-domain data $\boldsymbol{x}$ into its constituent frequencies, represented by the complex coefficients $a_j$. Each coefficient describes the amplitude and phase of a sinusoidal wave at frequency $\omega_j$, capturing how much that frequency contributes to the overall data. This transformation reveals the data’s periodic patterns in the frequency domain, laying the groundwork for exploring the periodogram and its connection to the (sample) autocovariance function. ...
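In R, fft() computes essentially this transform (its sum runs over $t = 0, \dots, n-1$, so the coefficients differ from the definition above only by a unit-modulus phase factor that leaves the moduli $|a_j|$ unchanged). A small sketch:

```r
set.seed(42)
n <- 128
x <- rnorm(n)

# DFT coefficients a_j, j = 0, ..., n - 1 (stored at R index j + 1)
a <- fft(x)

# The zero-frequency coefficient is just the sum of the data
all.equal(Re(a[1]), sum(x))  # TRUE

# Squared moduli yield the (unnormalized) periodogram ordinates
I_x <- Mod(a)^2 / n
```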

April 10, 2020 · 5 min · 993 words · Martin C. Arnold

Note on efficient programming: column-major order

Recently, I’ve been revisiting some of my older code and have noticed how much my understanding of efficient programming has grown. While I’m still exploring whether practice truly leads to perfection, it’s clear that my skills have improved 😊. In examining my past projects, I discovered a seemingly minor oversight that had a significant impact on performance: the choice between iterating over matrix rows versus columns in matrix operations. ...
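A minimal sketch of the effect (timings are machine-dependent; only the relative difference matters): R stores matrices in column-major order, so traversing columns reads contiguous memory while traversing rows strides across it.

```r
n <- 1000
m <- matrix(rnorm(n * n), nrow = n)

# Column-wise: each m[, j] is a contiguous block in memory
t_col <- system.time(for (j in seq_len(n)) sum(m[, j]))["elapsed"]

# Row-wise: each m[i, ] gathers elements that are n doubles apart
t_row <- system.time(for (i in seq_len(n)) sum(m[i, ]))["elapsed"]

c(column = t_col, row = t_row)  # column-wise is typically faster
```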

March 1, 2019 · 4 min · 717 words · Martin C. Arnold