Compressed Ridge Regression

DuckRidge adds ridge regression to the compressed linear-regression workflow. It supports a single penalty value, a full lambda path, and cross-validation over compressed folds.

Constructor

DuckRidge(
    db_name: str,
    table_name: str,
    formula: str,
    lambda_grid=None,
    cv_folds: int = 5,
    seed: int = 42,
    n_bootstraps: int = 0,
    rowid_col: str = "rowid",
    fitter: str = "ridge",
)

DuckRidge currently supports one outcome variable and does not support fixed effects.

Ridge Objective

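After compression, each group \(g\) carries its row count \(n_g\), mean outcome \(\bar{y}_g\), and covariate vector \(x_g\). The estimator solves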

\[ \hat{\beta}_\lambda = \arg\min_\beta \sum_g n_g(\bar{y}_g - x_g'\beta)^2 + \lambda \|\beta\|_2^2. \]

With \(W = \operatorname{diag}(n_g)\) and the \(x_g'\) stacked as the rows of \(X\), the closed form is

\[ \hat{\beta}_\lambda = (X'WX + \lambda I)^{-1}X'W\bar{y}. \]
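The closed form can be checked numerically. The following is a minimal NumPy sketch, not duckreg's code: it simulates raw rows, compresses them to one row per distinct covariate vector, and compares the compressed closed form with ridge on the raw rows. All variable names and the simulated data are illustrative assumptions, and the intercept is penalized here because the formula above uses \(\lambda I\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Raw data: an intercept plus two discrete regressors, so many rows share the same x.
n = 1_000
X_raw = np.column_stack([
    np.ones(n),
    rng.integers(0, 2, size=n),
    rng.integers(0, 3, size=n),
]).astype(float)
y_raw = X_raw @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# Compress: one row per distinct x, with group size n_g and group mean of y.
X, inverse, n_g = np.unique(X_raw, axis=0, return_inverse=True, return_counts=True)
inverse = inverse.ravel()  # guard against NumPy versions that return a 2-D inverse
y_bar = np.bincount(inverse, weights=y_raw) / n_g

lam = 0.1
p = X.shape[1]

# Closed form on compressed data: (X'WX + lam*I)^{-1} X'W y_bar with W = diag(n_g).
XtWX = X.T @ (n_g[:, None] * X)
XtWy = X.T @ (n_g * y_bar)
beta_compressed = np.linalg.solve(XtWX + lam * np.eye(p), XtWy)

# The same ridge estimator computed on the raw rows, for comparison.
beta_raw = np.linalg.solve(X_raw.T @ X_raw + lam * np.eye(p), X_raw.T @ y_raw)

print(np.allclose(beta_compressed, beta_raw))
```

The two estimates agree because \(X'WX\) and \(X'W\bar{y}\) computed from the groups equal \(X'X\) and \(X'y\) computed from the raw rows.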

The implementation uses an augmented least-squares representation:

\[ \tilde{X} = \begin{bmatrix} W^{1/2}X \\ \sqrt{\lambda} I \end{bmatrix}, \qquad \tilde{y} = \begin{bmatrix} W^{1/2}\bar{y} \\ 0 \end{bmatrix}. \]

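Ordinary least squares on \((\tilde{X}, \tilde{y})\) minimizes exactly the ridge objective above, so the implementation solves the augmented system with np.linalg.lstsq.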
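The augmented representation can be sketched in a few lines. This is an illustrative stand-in under stated assumptions, not the library's internal function: `ridge_augmented` is a hypothetical name, and the small compressed data set is made up for the comparison with the closed form.

```python
import numpy as np

def ridge_augmented(X, y_bar, n_g, lam):
    """Ridge on compressed data via an augmented least-squares system (hypothetical helper)."""
    p = X.shape[1]
    sqrt_w = np.sqrt(n_g)
    # Stack W^{1/2} X on top of sqrt(lam) * I, and W^{1/2} y_bar on top of zeros.
    X_aug = np.vstack([sqrt_w[:, None] * X, np.sqrt(lam) * np.eye(p)])
    y_aug = np.concatenate([sqrt_w * y_bar, np.zeros(p)])
    beta, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return beta

# Tiny compressed data set: 4 groups, an intercept plus one regressor.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y_bar = np.array([0.9, 2.1, 2.9, 4.2])
n_g = np.array([10.0, 20.0, 5.0, 15.0])
lam = 0.5

beta_lstsq = ridge_augmented(X, y_bar, n_g, lam)

# Closed form for comparison: (X'WX + lam*I)^{-1} X'W y_bar.
beta_closed = np.linalg.solve(
    X.T @ (n_g[:, None] * X) + lam * np.eye(2),
    X.T @ (n_g * y_bar),
)
print(np.allclose(beta_lstsq, beta_closed))
```

Solving the stacked system avoids forming \(X'WX\) explicitly, which tends to be better conditioned than inverting the penalized Gram matrix directly.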

Single Lambda

from duckreg.regularized import DuckRidge

duck_ridge = DuckRidge(
    db_name="ridge_test.db",
    table_name="data",
    formula="Y ~ D + f1 + f2 + f3",
    lambda_grid=[0.1],
    cv_folds=1,
    seed=42,
)
duck_ridge.fit(lambda_selection="single")
duck_ridge.summary()

Lambda Path

The batch path precomputes the weighted \(X\) and \(\bar{y}\) once, then solves for every value in the lambda grid:

import numpy as np  # needed for np.logspace

duck_ridge = DuckRidge(
    db_name="ridge_test.db",
    table_name="data",
    formula="Y ~ D + f1 + f2 + f3",
    lambda_grid=np.logspace(-4, 2, 50),
    cv_folds=1,
    seed=42,
)
duck_ridge.fit(lambda_selection="path")
duck_ridge.summary()["lambda_path_coefs"]
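One way to see why precomputing pays off is the sketch below. It is an assumption about how a path solver could be written, not a description of duckreg's internals: it takes a single SVD of the weighted design and reuses it for every lambda. The function name `ridge_path` and the example data are hypothetical.

```python
import numpy as np

def ridge_path(X, y_bar, n_g, lambdas):
    """Return one coefficient row per lambda, reusing a single SVD (hypothetical helper)."""
    sqrt_w = np.sqrt(n_g)
    Xw = sqrt_w[:, None] * X          # W^{1/2} X, computed once
    yw = sqrt_w * y_bar               # W^{1/2} y_bar, computed once
    U, s, Vt = np.linalg.svd(Xw, full_matrices=False)
    Uty = U.T @ yw
    # beta(lam) = V diag(s / (s^2 + lam)) U' yw, evaluated for all lambdas at once.
    shrink = s / (s**2 + np.asarray(lambdas, dtype=float)[:, None])
    return (shrink * Uty) @ Vt

# Small compressed data set: 5 groups, an intercept plus two regressors.
X = np.array([
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 2.0],
    [1.0, 0.0, 3.0],
    [1.0, 1.0, 1.0],
])
y_bar = np.array([1.2, 2.9, 4.1, 2.4, 3.6])
n_g = np.array([12.0, 7.0, 9.0, 4.0, 18.0])
lambdas = np.logspace(-4, 2, 50)

path = ridge_path(X, y_bar, n_g, lambdas)   # shape (50, 3)
norms = np.linalg.norm(path, axis=1)        # shrinks as lambda grows
```

Each row of `path` matches the closed-form solution for the corresponding lambda, and the coefficient norm is non-increasing along the grid.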

Cross-Validation

When cv_folds > 1, prepare_data() creates a temporary table with fold assignments. For cv_folds = 5, a source table named data, and the default rowid_col, the query is:

CREATE TEMP TABLE cv_folds AS
SELECT *,
       (ROW_NUMBER() OVER (ORDER BY rowid)) % 5 AS fold_id
FROM data

Compression then includes fold_id in the group-by columns, so every compressed row belongs to exactly one fold. Cross-validation loops over folds, fits on compressed training rows, predicts on compressed test rows, and computes weighted MSE.
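The loop described above can be sketched with NumPy alone. This is a simplified illustration under assumptions, not the library's code: the helper names, the simulated data, and the choice to score group means with weights \(n_g\) are all hypothetical, and duckreg's handling of within-group variation is not covered here.

```python
import numpy as np

def fit_ridge(X, y_bar, n_g, lam):
    """Closed-form weighted ridge on compressed rows (hypothetical helper)."""
    p = X.shape[1]
    A = X.T @ (n_g[:, None] * X) + lam * np.eye(p)
    return np.linalg.solve(A, X.T @ (n_g * y_bar))

def cv_weighted_mse(X, y_bar, n_g, fold_id, lambdas):
    """Mean over folds of the n_g-weighted MSE on held-out compressed rows."""
    folds = np.unique(fold_id)
    mse = np.zeros((len(lambdas), len(folds)))
    for j, f in enumerate(folds):
        test = fold_id == f
        train = ~test
        for i, lam in enumerate(lambdas):
            beta = fit_ridge(X[train], y_bar[train], n_g[train], lam)
            resid = y_bar[test] - X[test] @ beta
            mse[i, j] = np.sum(n_g[test] * resid**2) / np.sum(n_g[test])
    return mse.mean(axis=1)

# Simulated raw rows with fold ids assigned like ROW_NUMBER() % 5.
rng = np.random.default_rng(42)
n, k = 2_000, 5
d = rng.integers(0, 2, size=n).astype(float)
f1 = rng.integers(0, 4, size=n).astype(float)
y = 1.0 + 2.0 * d + 0.5 * f1 + rng.normal(size=n)
fold_raw = np.arange(1, n + 1) % k

# Compress with fold_id among the group-by keys.
keys = np.column_stack([fold_raw, d, f1])
groups, inverse, counts = np.unique(keys, axis=0, return_inverse=True, return_counts=True)
inverse = inverse.ravel()
n_g = counts.astype(float)
y_bar = np.bincount(inverse, weights=y) / n_g
fold_id = groups[:, 0].astype(int)
X = np.column_stack([np.ones(len(groups)), groups[:, 1:]])

lambdas = np.logspace(-3, 1, 20)
cv_mse = cv_weighted_mse(X, y_bar, n_g, fold_id, lambdas)
best_lambda = lambdas[np.argmin(cv_mse)]
```

Because fold_id is part of the compression key, the train and test masks operate directly on compressed rows and no raw data needs to be revisited inside the loop.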

Example from notebooks/regularized.ipynb:

import numpy as np  # needed for np.logspace

duck_ridge_cv = DuckRidge(
    db_name="ridge_cv_test.db",
    table_name="data",
    formula="Y ~ D + f1 + f2 + f3",
    lambda_grid=np.logspace(-3, 1, 20),
    cv_folds=5,
    seed=42,
)
duck_ridge_cv.fit(lambda_selection="cv")
duck_ridge_cv.best_lambda
duck_ridge_cv.summary()

Plot Helpers

DuckRidge includes two plotting helpers:

duck_ridge_cv.plot_cv_curve()
duck_ridge_cv.plot_coefficient_path()

These require matplotlib.

Inference

Bootstrap standard errors are intentionally not implemented for ridge. Regularization changes the sampling behavior of the estimator, and a naive OLS-style bootstrap would be easy to misinterpret. The API raises NotImplementedError for ridge bootstrap.