Compressed Ridge Regression
DuckRidge adds ridge regression to the compressed linear-regression workflow. It supports a single penalty value, a full lambda path, and cross-validation over compressed folds.
Constructor
DuckRidge(
db_name: str,
table_name: str,
formula: str,
lambda_grid=None,
cv_folds: int = 5,
seed: int = 42,
n_bootstraps: int = 0,
rowid_col: str = "rowid",
fitter: str = "ridge",
)
DuckRidge currently supports one outcome variable and does not support fixed effects.
Ridge Objective
After compression, the estimator solves
\[ \hat{\beta}_\lambda = \arg\min_\beta \sum_g n_g(\bar{y}_g - x_g'\beta)^2 + \lambda \|\beta\|_2^2. \]
With \(W = \mathrm{diag}(n_g)\) holding the group sizes, the closed form is
\[ \hat{\beta}_\lambda = (X'WX + \lambda I)^{-1}X'W\bar{y}. \]
The implementation uses an augmented least-squares representation:
\[ \tilde{X} = \begin{bmatrix} W^{1/2}X \\ \sqrt{\lambda} I \end{bmatrix}, \qquad \tilde{y} = \begin{bmatrix} W^{1/2}\bar{y} \\ 0 \end{bmatrix}. \]
It then solves the augmented system \(\tilde{X}\beta \approx \tilde{y}\) with np.linalg.lstsq.
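The equivalence follows from the normal equations of the augmented system: \(\tilde{X}'\tilde{X} = X'WX + \lambda I\) and \(\tilde{X}'\tilde{y} = X'W\bar{y}\). The sketch below checks this numerically on synthetic compressed data. It is not DuckRidge's source code; the function names ridge_augmented and ridge_closed_form and the toy data are assumptions made for illustration.

```python
import numpy as np

def ridge_augmented(X, y_bar, n, lam):
    """Weighted ridge via an augmented least-squares system (illustrative sketch)."""
    sqrt_w = np.sqrt(n)                      # diagonal of W^{1/2}
    p = X.shape[1]
    # Stack W^{1/2} X on top of sqrt(lambda) * I, and W^{1/2} y_bar on top of zeros.
    X_aug = np.vstack([X * sqrt_w[:, None], np.sqrt(lam) * np.eye(p)])
    y_aug = np.concatenate([y_bar * sqrt_w, np.zeros(p)])
    beta, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return beta

def ridge_closed_form(X, y_bar, n, lam):
    """Direct solve of (X'WX + lambda I) beta = X'W y_bar."""
    XtW = X.T * n                            # X'W, shape (p, G)
    return np.linalg.solve(XtW @ X + lam * np.eye(X.shape[1]), XtW @ y_bar)

# Synthetic compressed data: G groups with sizes n_g and group means y_bar_g.
rng = np.random.default_rng(0)
G, p = 40, 4
X = rng.normal(size=(G, p))
n = rng.integers(1, 50, size=G).astype(float)
y_bar = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=G)

beta_aug = ridge_augmented(X, y_bar, n, lam=0.1)
beta_cf = ridge_closed_form(X, y_bar, n, lam=0.1)
print(np.allclose(beta_aug, beta_cf))
```

One reason to prefer the augmented form is that it avoids forming \(X'WX\) explicitly, which can be better conditioned when \(\lambda\) is small.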
Single Lambda
from duckreg.regularized import DuckRidge
duck_ridge = DuckRidge(
db_name="ridge_test.db",
table_name="data",
formula="Y ~ D + f1 + f2 + f3",
lambda_grid=[0.1],
cv_folds=1,
seed=42,
)
duck_ridge.fit(lambda_selection="single")
duck_ridge.summary()
Lambda Path
The batch path precomputes the weighted \(X\) and \(y\) once, then solves over the full lambda grid.
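As a rough illustration of the precompute-then-sweep idea, the sketch below forms the weighted quantities once and reuses them for every penalty value. The function name ridge_path and the use of a direct solve per lambda are assumptions for this example; DuckRidge's internal solver may differ.

```python
import numpy as np

def ridge_path(X, y_bar, n, lambdas):
    """Ridge coefficients for each lambda, reusing the weighted design (sketch)."""
    sqrt_w = np.sqrt(n)
    Xw = X * sqrt_w[:, None]       # W^{1/2} X, computed once
    yw = y_bar * sqrt_w            # W^{1/2} y_bar, computed once
    gram = Xw.T @ Xw               # X'WX
    xty = Xw.T @ yw                # X'W y_bar
    eye = np.eye(X.shape[1])
    # Only the lambda * I term changes across the grid.
    return np.array([np.linalg.solve(gram + lam * eye, xty) for lam in lambdas])

# Synthetic compressed data (hypothetical values).
rng = np.random.default_rng(1)
G, p = 30, 3
X = rng.normal(size=(G, p))
n = rng.integers(1, 20, size=G).astype(float)
y_bar = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=G)

lambdas = np.logspace(-4, 2, 50)
coefs = ridge_path(X, y_bar, n, lambdas)    # one row of coefficients per lambda
norms = np.linalg.norm(coefs, axis=1)       # shrinks as lambda grows
```

In DuckRidge itself, the path is requested with lambda_selection="path":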
import numpy as np
duck_ridge = DuckRidge(
db_name="ridge_test.db",
table_name="data",
formula="Y ~ D + f1 + f2 + f3",
lambda_grid=np.logspace(-4, 2, 50),
cv_folds=1,
seed=42,
)
duck_ridge.fit(lambda_selection="path")
duck_ridge.summary()["lambda_path_coefs"]
Cross-Validation
When cv_folds > 1, prepare_data() creates a temporary table with fold assignments (shown here for cv_folds = 5):
CREATE TEMP TABLE cv_folds AS
SELECT *,
(ROW_NUMBER() OVER (ORDER BY rowid)) % 5 AS fold_id
FROM data
Compression then includes fold_id in the group-by columns. Cross-validation loops over folds, fits on compressed training rows, predicts on compressed test rows, and computes weighted MSE.
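The loop described above can be sketched in plain NumPy. This is an illustrative reconstruction under stated assumptions, not the library's code: the synthetic data, the helper names fit_ridge and cv_weighted_mse, and the choice to average fold errors equally are all hypothetical. Following the objective given earlier, the penalty is applied to every coefficient, including the intercept.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 2000, 5
D = rng.integers(0, 2, size=N)
f1 = rng.integers(0, 3, size=N)
y = 1.0 + 2.0 * D - 0.5 * f1 + rng.normal(size=N)
fold_id = (np.arange(N) + 1) % K             # mirrors ROW_NUMBER() % 5

# Compress: one row per distinct (D, f1, fold_id), with count and mean outcome.
keys, inverse = np.unique(np.column_stack([D, f1, fold_id]), axis=0, return_inverse=True)
inverse = inverse.ravel()
n = np.bincount(inverse).astype(float)       # group sizes n_g
y_bar = np.bincount(inverse, weights=y) / n  # group means
X = np.column_stack([np.ones(len(keys)), keys[:, 0], keys[:, 1]]).astype(float)
fold = keys[:, 2]

def fit_ridge(X, y_bar, n, lam):
    XtW = X.T * n
    return np.linalg.solve(XtW @ X + lam * np.eye(X.shape[1]), XtW @ y_bar)

def cv_weighted_mse(lam):
    errs = []
    for k in range(K):
        train, test = fold != k, fold == k
        beta = fit_ridge(X[train], y_bar[train], n[train], lam)
        resid = y_bar[test] - X[test] @ beta
        # Weight each compressed test row by the number of raw rows it represents.
        errs.append(np.sum(n[test] * resid**2) / np.sum(n[test]))
    return float(np.mean(errs))

lambda_grid = np.logspace(-3, 1, 20)
cv_mse = np.array([cv_weighted_mse(lam) for lam in lambda_grid])
best_lambda = lambda_grid[np.argmin(cv_mse)]
```

The weighted MSE on group means differs from the row-level MSE only by the within-group variance of the outcome, which does not depend on the coefficients, so both criteria rank the lambda values identically.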
Example from notebooks/regularized.ipynb:
duck_ridge_cv = DuckRidge(
db_name="ridge_cv_test.db",
table_name="data",
formula="Y ~ D + f1 + f2 + f3",
lambda_grid=np.logspace(-3, 1, 20),
cv_folds=5,
seed=42,
)
duck_ridge_cv.fit(lambda_selection="cv")
duck_ridge_cv.best_lambda
duck_ridge_cv.summary()
Plot Helpers
DuckRidge includes:
duck_ridge_cv.plot_cv_curve()
duck_ridge_cv.plot_coefficient_path()
These require matplotlib.
Inference
Bootstrap standard errors are intentionally not implemented for ridge. Regularization changes the sampling behavior of the estimator, and a naive OLS-style bootstrap would be easy to misinterpret. The API raises NotImplementedError for ridge bootstrap.