Math
Randomized numerical linear algebra replaces one large deterministic linear algebra problem with a smaller random one. The approximation is useful when the random compression preserves the geometry relevant for the downstream problem.
There are two distinct sketches in this page:
- subspace sketches, which find a low-dimensional basis for the range of a matrix;
- row sketches, which compress observations before least squares or instrumental variables.
Subspace sketches
Let \(A\) be an \(m \times n\) matrix. If \(A\) is approximately rank \(k\), its columns mostly live in a \(k\)-dimensional subspace. A randomized range finder draws a random test matrix \(\Omega \in \mathbb{R}^{n \times (k+q)}\) and forms
\[
Y = A\Omega,
\]
where \(q\) is a small oversampling parameter. The columns of \(Y\) are random linear combinations of columns of \(A\). With high probability, they span nearly the same dominant column space as \(A\). Orthogonalizing gives
\[
Y \approx QR,
\qquad
Q'Q = I.
\]
Then \(Q\) is an approximate basis for the range of \(A\). This is the primitive behind randomized SVD and randomized QR.
For a randomized SVD, project the matrix into this basis:
\[
B = Q'A.
\]
Compute the ordinary SVD of the small matrix,
\[
B = \widetilde U \Sigma V',
\]
and lift the left singular vectors back:
\[
A \approx Q \widetilde U \Sigma V'.
\]
For a randomized QR approximation, factor the same small projected matrix:
\[
B = \widetilde Q R,
\]
then lift back:
\[
A \approx (Q \widetilde Q) R.
\]
When singular values decay slowly, the leading subspace is harder to identify. Power iterations sharpen the spectrum by repeatedly applying \(A'A\) or \(AA'\) before orthogonalizing. Heuristically, singular values are raised to higher powers, so large singular directions become easier to separate from small ones.
Row sketches for OLS
OLS solves
\[
\hat\beta = \arg\min_b \lVert y - Xb \rVert_2^2,
\]
where \(X\) is \(n \times p\). If \(n\) is large and \(p\) is moderate, the expensive part is the number of rows. A row sketch draws an \(s \times n\) matrix \(S\) with \(s \ll n\) and solves
\[
\hat\beta_S
= \arg\min_b \lVert Sy - SXb \rVert_2^2.
\]
The sketch should preserve residual norms in the space spanned by \(X\) and \(y\):
\[
\lVert S(y-Xb) \rVert_2^2 \approx \lVert y-Xb \rVert_2^2
\quad \text{for relevant } b.
\]
If that approximation is good, the minimizer of the sketched objective is close to the full OLS minimizer.
The implementation uses a CountSketch embedding. Each row \(i\) is assigned to one random bucket \(h(i) \in \{1,\ldots,s\}\) and one random sign \(\sigma_i \in \{-1,+1\}\). The sketched design and outcome are accumulated as
\[
(SX)_{h(i), \cdot} \mathrel{+}= \sigma_i X_{i,\cdot},
\qquad
(Sy)_{h(i)} \mathrel{+}= \sigma_i y_i.
\]
This touches each original row once, so sketch construction is \(O(np)\) rather than \(O(snp)\).
Row sketches for IV / 2SLS
Two-stage least squares can be written with an endogenous/exogenous design \(X\) and instrument design \(Z\). With an intercept included where needed, the population of matrices is
\[
X = [1, X_{endog}, X_{exog}],
\qquad
Z = [1, X_{exog}, Z_{excluded}].
\]
The usual finite-sample 2SLS estimator is
\[
\hat\beta_{2SLS}
= (X'P_ZX)^{-1}X'P_Zy,
\qquad
P_Z = Z(Z'Z)^{-1}Z'.
\]
A row-sketched IV fit applies the same sketch \(S\) to \(X\), \(Z\), and \(y\), then runs 2SLS on the compressed problem:
\[
\hat\beta_{S,2SLS}
= ((SX)'P_{SZ}(SX))^{-1}(SX)'P_{SZ}(Sy),
\qquad
P_{SZ} = (SZ)((SZ)'(SZ))^{-1}(SZ)'.
\]
Using the same sketch for the regressors, instruments, and outcome is important: the random compression should preserve the joint geometry of the moment condition
\[
Z'(y-X\beta) = 0.
\]
The result is an approximate IV fit that can be much cheaper for tall designs, while still using the same first-stage projection logic as ordinary 2SLS.