When simulating random variables in statistics or machine learning, we often need samples from a standard normal distribution. However, programming languages that are not focused on statistics often lack a rich suite of random number generators for various distributions. Therefore, it is essential to know how to implement such generators from scratch, typically using uniformly distributed pseudo-random numbers. This is where the Box-Muller Transform comes into play—a clever method to transform two uniform random variables into two independent standard normal random variables.

In this post, I’ll explain the theory behind the Box-Muller algorithm and demonstrate how to implement it in Python.

Some Trigonometry

For the key idea of the algorithm, we need to understand how a point in the bivariate Cartesian coordinate system can be translated to polar coordinates. I’ve illustrated this in Figure 1:

In the bivariate Cartesian system (left subplot), we need the x and y coordinates to define the point $A$ $(3,4)$. Alternatively, we can describe this point using polar coordinates (right subplot): $A$ can be uniquely determined by its Euclidean distance $r=5$ from the origin (radius), and the angle $\theta \approx 53.13°$ between the radius vector and the positive x-axis. The relationship between these coordinate systems is given by:

$$x = r \cos(\theta), \quad y = r \sin(\theta).$$

Code
import numpy as np
import matplotlib.pyplot as plt

def cartesian_to_polar(x, y):

    r = np.sqrt(x**2 + y**2)
    theta = np.arctan2(y, x)
    return r, theta

def plot_vector_cartesian_polar(x, y):

    r, theta = cartesian_to_polar(x, y)
    
    # create a figure
    fig = plt.figure(figsize=(14, 6.5))
    
    #### left Subplot ####
    ax_cart = fig.add_subplot(1, 2, 1)
    limit = max(abs(x), abs(y), r) + 1
    ax_cart.set_xlim(-limit, limit)
    ax_cart.set_ylim(-limit, limit)
    
    # blue vector
    ax_cart.arrow(0, 0, x, y, head_width=0.3, head_length=0.5, fc='blue', ec='blue', length_includes_head=True)
    
    # projection lines
    ax_cart.plot([x, x], [0, y], 'k--', linewidth=1)
    ax_cart.plot([0, x], [y, y], 'k--', linewidth=1)
    
    # green arc for theta
    theta_range = np.linspace(0, theta, 100)
    arc_x = 1 * np.cos(theta_range)
    arc_y = 1 * np.sin(theta_range)
    ax_cart.plot(arc_x, arc_y, 'g-', alpha=0.5)
    
    # annotations
    ax_cart.annotate('x = r cos θ = 3', xy=(.25, 4.25), textcoords="offset points", xytext=(10,10), fontsize=12)
    ax_cart.annotate('y = r sin θ = 4', xy=(3.1, 2), textcoords="offset points", xytext=(10,10), fontsize=12)
    ax_cart.annotate(f'r = {r:.2f}', xy=(0, 2), textcoords="offset points", xytext=(10,10), fontsize=12, color='blue')
    ax_cart.annotate(f'θ = {np.degrees(theta):.2f}°', xy=(-.1, -.1), textcoords="offset points", xytext=(10, -15), fontsize=12, color='green')
    
    # labels and title
    ax_cart.set_title('Cartesian Coordinates', fontsize=14)
    ax_cart.set_xlabel('X')
    ax_cart.set_ylabel('Y')
    ax_cart.grid(True)
    ax_cart.set_aspect('equal')
    
    ##### right Subplot ####
    ax_polar = fig.add_subplot(1, 2, 2, projection='polar')
    
    # blue vector in polar coordinates
    ax_polar.arrow(theta, 0, 0, r, 
                   width=0.01, head_width=0.05, head_length=0.5, 
                   fc='blue', ec='blue', length_includes_head=True)
    
    # annotations
    ax_polar.annotate(f'r = {r:.2f}', xy=(np.pi * .45, 3.5), 
                      textcoords="offset points", xytext=(10,10), fontsize=12, color='blue')
    ax_polar.annotate(f'θ = {np.degrees(theta):.2f}°', xy=(theta, r+1), 
                      textcoords="offset points", xytext=(0, 10), fontsize=12, color='green')
    
    # title
    ax_polar.set_title('Polar Coordinates', fontsize=14)
    
    # r lim
    ax_polar.set_rlim(0, limit)
    
    # enable grid and labels
    ax_polar.grid(True)
    
    # layout adjustments
    plt.subplots_adjust(left=0.01, right=0.99, top=0.99, bottom=0.01, wspace=0.01)
    plt.show()

    
plot_vector_cartesian_polar(x = 3, y = 4)
Figure 1: Cartesian coordinates vs polar coordinates

What about Gaussians?

How does Figure 1 relate to bivariate Gaussian data? Remember that the desired probability density of $(X, Y)$ is $$ f_{X,Y}(x, y) = \frac{1}{2\pi} e^{-\frac{1}{2}(x^2 + y^2)}, $$ which is a radially symmetric distribution with respect to the origin. Using the trigonometry of Figure 1, we can define the bivariate Gaussian distribution using polar coordinates. This is a joint distribution of the r.v.s radius $R$ (the distance to the origin) and the angle $\Theta$ (between the point vector and the positive X axis). To derive the density for this joint distribution, we note that $R$ and $\Theta$ come from applying a transform $g(\cdot)$ to $X$ and $Y$:

$$ \begin{bmatrix}r\\ \theta\end{bmatrix} = g\begin{pmatrix}\begin{bmatrix}x\\ y\end{bmatrix} \end{pmatrix} := \begin{bmatrix}\sqrt{x^2 + y^2}\\ \tan^{-1}\left(\frac{y}{x}\right) \end{bmatrix}, \qquad r\geq0, \quad \theta\in[0,2\pi]. $$

The inverse transformation is

$$ \begin{align} \begin{bmatrix}x \\ y \end{bmatrix} = g^{-1}\begin{pmatrix}\begin{bmatrix}r\\ \theta\end{bmatrix}\end{pmatrix} := \begin{bmatrix} r \cdot \cos \theta \\ r \cdot \sin \theta \end{bmatrix}.\label{eq:transxy} \end{align} $$

The change of variables (CoV) formula gives the joint density of $[R \ \Theta]’$ as

$$ \begin{align} f_{R,\Theta}(r,\theta) =&\, f_{X,Y}\left(g^{-1}\left([r\ \theta]’\right)\right) \cdot \left\lvert \det\left[\frac{\mathrm{d}g^{-1}(\boldsymbol{z})}{\mathrm{d}\boldsymbol{z}}\Big\vert_{\boldsymbol{z} = [r\ \theta]’}\right] \right\rvert\notag\\ =&\, f_{X,Y}\left(g^{-1}\left([r\ \theta]’\right)\right) \cdot \lvert J\rvert,\label{eq:vtvcov} \end{align} $$

The Jacobian determinant $J$ for this CoV is computed as

$$ J = \det\begin{pmatrix} \frac{\partial x}{\partial r} & \frac{\partial x}{\partial \theta} \\ \frac{\partial y}{\partial r} & \frac{\partial y}{\partial \theta} \end{pmatrix}. $$

From the components in \eqref{eq:transxy} we get

$$ \begin{align*} \frac{\partial x}{\partial r} =&\, \cos\theta, \quad \frac{\partial x}{\partial \theta} = -r \sin\theta,\\ \frac{\partial y}{\partial r} =&\, \sin\theta, \quad \frac{\partial y}{\partial \theta} = r \cos\theta. \end{align*} $$

The Jacobian determinant hence becomes $$ \begin{align*} J =&\, \det\begin{pmatrix} \cos\theta & -r \sin\theta \\ \sin\theta & r \cos\theta \end{pmatrix}\\ =&\, \cos\theta \cdot r \cos\theta - \sin\theta \cdot (-r \sin\theta)\\ =&\, r \cdot (\cos^2\theta + \sin^2\theta)\\ =&\, r, \end{align*} $$ and so $|J| = r$. Substituting the expressions in \eqref{eq:vtvcov} obtains $$ \begin{align*} f_{R,\Theta}(r, \theta) =&\, \frac{1}{2\pi} r\exp\left(-\frac{1}{2}r^2\right)\\ =&\, f_\Theta(\theta) \cdot f_R(r) \end{align*} $$ where $$ \begin{align} f_\Theta(\theta) = \frac{1}{2\pi} \quad \textup{and} \quad f_R(r) = r \cdot \exp\left(-\frac{1}{2}r^2\right).\label{eq:rthetadens} \end{align} $$ Note that the independence of $\Theta$ and $R$ follows from the independence of $X$ and $Y$. Therefore, $f_{R, \Theta}(r, \theta)$ is simply the product of the marginal densities $f_R(r)$ and $f_\Theta(\theta)$.

Let’s look closer at the two densities in \eqref{eq:rthetadens}:

  • $f_\Theta(\theta)$ is the density of a uniformly distributed r.v. $\Theta$ on $[0,2\pi]$. This seems intuitive: $\theta$ is a randomly drawn angle, measured in radians.

  • $f_R(r)$ is the density of a Rayleigh distributed r.v. $R$ on $[0,\infty)$ with parameter $\sigma = 1$. This also makes intuitive sense: The probability of observing larger radii $r$ decreases exponentially, which matches our expectations for a bivariate normal distribution.

The Box-Muller Algorithm

The Box-Muller Transform elegantly converts uniform random variables to Gaussian ones: It uses samples from two independent standard uniform random variables, $U_1,\,U_2\sim U[0,1]$ to generate samples of $R$ and $\Theta$, cf. the distributions in \eqref{eq:rthetadens}. These samples are then transformed to independent Gaussian samples using of $g^{-1}(\cdot)$. Here’s the algorithm.

Algorithm: Box-Muller

  1. Sample $U_1, U_2 \sim U(0, 1)$.
  2. Obtain polar coordinates: $$ R = \sqrt{-2 \log U_1}, \quad \Theta = 2\pi U_2. $$
  3. Transform back to Cartesian coordinates: $$ X = R \cos \Theta, \quad Y = R \sin \Theta. $$

For step 2, it is straightforward to show that $$ \Theta = 2\pi U_2 \sim U[0,2\pi]. $$ Here’s why $R = \sqrt{-2 \log U_1}\sim \textup{Rayleigh}(\sigma=1)$:

The CDF of a $\textup{Rayleigh}(\sigma=1)$ r.v. $R$ is $F_R(z) = 1 - \exp(-z^2/2)$. We solve $F_{R}(z) = U_1$ for $z$, where $U_1 \sim \text{U}[0,1]$: $$ z = \sqrt{-2 \log(U)}. $$

We thus use inverse transform sampling to sample from $R$.

Python Implementation

Here’s how to implement the Box-Muller Transform in Python function:

import numpy as np

def box_muller_transform(n_samples):
    # step 1:
    U1 = np.random.uniform(0, 1, n_samples)
    U2 = np.random.uniform(0, 1, n_samples)
    
    # step 2:
    R = np.sqrt(-2 * np.log(U1))
    Theta = 2 * np.pi * U2
    
    # step 3: 
    X = R * np.cos(Theta)
    Y = R * np.sin(Theta)
    
    return U1, U2, R, Theta, X, Y

Let’s generate some samples using box_muller_transform(). Note that the function not only outputs draws of $X$ and $Y$ but also draws of the uniform random variables $U_1$, $U_2$ and as well as draws of the polar coordinates $R$ and $\Theta$.

# generate samples
U1, U2, R, Theta, X, Y = box_muller_transform(n_samples = 2500)

Visualization

To inspect the outcomes, we create visualizations using matplotlib and seaborn. We arrange these plots in a grid layout.

import matplotlib.gridspec as gridspec

# subplots
fig = plt.figure(figsize=(12, 15))
gs = gridspec.GridSpec(3, 1, height_ratios=[1, 1, 1])

# plot uniform samples U1 vs U2
ax1 = fig.add_subplot(gs[0, 0])
ax1.scatter(U1, U2, alpha=0.3, edgecolor='none', s=10)
ax1.set_title('Uniform Samples (U₁ vs U₂)')
ax1.set_xlabel('U₁')
ax1.set_ylabel('U₂')
ax1.set_xlim(0, 1)
ax1.set_ylim(0, 1)
ax1.set_aspect('equal')

# R vs Theta in polar plot
ax2 = fig.add_subplot(gs[1, 0], projection='polar')
ax2.scatter(Theta, R, alpha=0.5, edgecolor='none', s=10)
ax2.set_title('Polar Coordinates (R vs Θ)', va='bottom')

# Gaussian samples X vs Y
ax3 = fig.add_subplot(gs[2, 0])
ax3.scatter(X, Y, alpha=0.3, edgecolor='none', s=10)
ax3.set_title('Gaussian Samples (X vs Y)')
ax3.set_xlabel('X')
ax3.set_ylabel('Y')
ax3.set_xlim(-5, 5)
ax3.set_ylim(-5, 5)
ax3.set_aspect('equal')
ax3.grid(True, which='both', linestyle='--', linewidth=0.5)

# adjust layout
plt.tight_layout()
plt.show()

Figure 2: Steps of Box-Muller Transform

Kernel density estimates of the marginal distributions of $X$ and $Y$ closely resemble the charactersitic bell shape.

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from scipy.stats import norm

# gaussian densities
x_vals = np.linspace(-4, 4, 1000)
gaussian_dens = norm.pdf(x_vals)

plt.figure(figsize=(10, 7))

# KDE X
plt.subplot(1, 2, 1)
plt.plot(
    x_vals, gaussian_dens, 
    color='red', linestyle='--', label='Standard Gaussian'
)
sns.kdeplot(X, color='blue', label='KDE')
plt.title("KDE of X")

# KDE Y
plt.subplot(1, 2, 2)
plt.plot(
    x_vals, gaussian_dens, 
    color='red', linestyle='--', label='Standard Gaussian'
)
sns.kdeplot(Y, color='blue', label='KDE')
plt.title("KDE of Y")

# legend
handles, labels = plt.gca().get_legend_handles_labels()
fig = plt.gcf()
fig.legend(handles, labels, loc='lower center', ncol=2, frameon=False)

plt.tight_layout(rect=[0, 0.1, 1, 1])

plt.show()

Figure 3: KDEs of Marginal Distributions

… and that’s it for now!