This cached ablation checks the new randomized sketching path for tall least-squares designs.
The library path under test is OLS.fit_sketch(...), which uses a CountSketch row embedding. Each original observation is assigned to one signed sketch bucket, so the sketch construction is \(O(np)\) for an \(n \times p\) design. The sketched estimator then solves the smaller least-squares problem
\[
\hat\beta_S
= \arg\min_b \lVert S y - S X b \rVert_2^2,
\]
where \(S\) is the sparse signed embedding. This is deliberately different from a dense Gaussian/Rademacher sketch, whose direct construction would cost \(O(snp)\) for sketch size \(s\).
The goal here is not to show that sketching beats full OLS for every moderate problem. It is to document the accuracy/speed tradeoff on a reproducible synthetic design and catch obvious regressions in the implementation.
1 Setup
Show code
from html import escapeimport timeimport matplotlib.pyplot as pltimport numpy as npfrom IPython.display import HTML, displayimport crabbymetrics as cmnp.set_printoptions(precision=4, suppress=True)def html_table(headers, rows): parts = ["<table>", "<thead>", "<tr>"] parts.extend(f"<th>{escape(str(header))}</th>"for header in headers) parts.extend(["</tr>", "</thead>", "<tbody>"])for row in rows: parts.append("<tr>") parts.extend(f"<td>{cell}</td>"for cell in row) parts.append("</tr>") parts.extend(["</tbody>", "</table>"])return"".join(parts)
2 Design
The DGP is intentionally friendly to OLS: standardized Gaussian regressors, a dense linear signal, and homoskedastic additive noise. That makes the full OLS solution a useful numerical target. We vary the sketch size as a multiple of the full design dimension, including the intercept.
summary = []for multiple in multiples: block = [row for row in rows if row["multiple"] == multiple] summary.append( {"multiple": multiple,"sketch_size": block[0]["sketch_size"],"median_coef_rel_error": float(np.median([r["coef_rel_error"] for r in block])),"median_prediction_rmse_vs_full": float( np.median([r["prediction_rmse_vs_full"] for r in block]) ),"median_speedup": float(np.median([r["speedup"] for r in block])), } )summary_rows = [ [f"{entry['multiple']}x", entry["sketch_size"],f"{entry['median_coef_rel_error']:.4f}",f"{entry['median_prediction_rmse_vs_full']:.4f}",f"{entry['median_speedup']:.2f}x", ]for entry in summary]display( HTML( html_table( ["Sketch multiple","Sketch size","Median coefficient relative error","Median prediction RMSE vs full OLS","Median speedup", ], summary_rows, ) ))
Sketch multiple
Sketch size
Median coefficient relative error
Median prediction RMSE vs full OLS
Median speedup
2x
82
0.0312
0.2242
3.70x
4x
164
0.0188
0.1140
4.49x
8x
328
0.0106
0.0745
4.34x
12x
492
0.0086
0.0629
3.29x
16x
656
0.0083
0.0562
3.53x
The expected pattern is monotone accuracy improvement as the sketch dimension grows. Speedups are meaningful here because CountSketch touches each row once and then solves a much smaller least-squares problem.
4 Plots
Show code
fig, axes = plt.subplots(1, 3, figsize=(14, 4), constrained_layout=True)x_axis = np.array([entry["multiple"] for entry in summary])coef_error = np.array([entry["median_coef_rel_error"] for entry in summary])pred_error = np.array([entry["median_prediction_rmse_vs_full"] for entry in summary])speedup = np.array([entry["median_speedup"] for entry in summary])axes[0].plot(x_axis, coef_error, marker="o")axes[0].set_xlabel("Sketch size / design dimension")axes[0].set_ylabel("Median relative coefficient error")axes[0].set_title("Coefficient accuracy")axes[1].plot(x_axis, pred_error, marker="o", color="tab:orange")axes[1].set_xlabel("Sketch size / design dimension")axes[1].set_ylabel("Median RMSE vs full OLS prediction")axes[1].set_title("Prediction agreement")axes[2].plot(x_axis, speedup, marker="o", color="tab:green")axes[2].axhline(1.0, color="black", linestyle="--", linewidth=1)axes[2].set_xlabel("Sketch size / design dimension")axes[2].set_ylabel("Median full/sketch fit time")axes[2].set_title("Runtime")plt.show()
5 Takeaway
For this tall, well-conditioned design, a sketch size between \(8(p+1)\) and \(16(p+1)\) gives roughly one-percent coefficient error relative to full OLS while still reducing fit time by several times. Smaller sketches are visibly faster but can introduce enough coefficient and prediction error that they should be treated as approximate exploratory fits rather than a drop-in replacement for final inference.