duckreg
  • Home
  • Compression
  • Linear Models
  • Panel
  • DML
  • GLMs
  • Ridge
  • Inference
  • Examples
  1. Panel Estimators
  • duckreg Documentation
  • Compression and Estimator Lifecycle
  • Linear Regression API
  • Panel Estimators
  • Compressed Double Machine Learning
  • Generalized Linear Models
  • Compressed Ridge Regression
  • Inference and Variance Estimation
  • Executed Examples

On this page

  • DuckMundlak
    • Data Preparation
  • DuckDoubleDemeaning
  • DuckMundlakEventStudy
  • Bootstrap Behavior

Panel Estimators

The panel estimators encode common fixed-effect designs as generated regressors before compression. The central theme is the same as the base estimator: create the relevant design matrix in DuckDB, group identical design rows, and solve weighted least squares in memory.

DuckMundlak

DuckMundlak augments each row with unit-level and, optionally, time-level averages of the covariates. For covariate \(W_{it}\), the two-way Mundlak design contains

\[ W_{it},\qquad \bar{W}_{i \cdot},\qquad \bar{W}_{\cdot t}. \]

The corresponding model is

\[ Y_{it} = \alpha + W_{it}'\beta + \bar{W}_{i \cdot}'\gamma + \bar{W}_{\cdot t}'\delta + u_{it}. \]

Constructor:

Show code
DuckMundlak(
    db_name: str,
    table_name: str,
    outcome_var: str,
    covariates: list,
    seed: int,
    unit_col: str,
    time_col: str | None = None,
    n_bootstraps: int = 100,
    cluster_col: str | None = None,
)

Example from notebooks/introduction.ipynb:

Show code
from duckreg.estimators import DuckMundlak

mundlak = DuckMundlak(
    db_name="panel_data.db",
    table_name="panel_data",
    outcome_var="Y",
    covariates=["W"],
    unit_col="unit",
    time_col="time",
    cluster_col="unit",
    n_bootstraps=50,
    seed=929,
)
mundlak.fit()
mundlak.summary()

Data Preparation

prepare_data() creates temporary tables of unit and time averages, then joins them back to the raw table:

Show code
CREATE TEMP TABLE unit_avgs AS
SELECT unit, AVG(W) AS avg_W_unit
FROM panel_data
GROUP BY unit

If time_col is provided, it also creates time_avgs. The compressed design groups by original covariates and generated averages.

DuckDoubleDemeaning

DuckDoubleDemeaning constructs the two-way residualized treatment

\[ \ddot{W}_{it} = W_{it} - \bar{W}_{i \cdot} - \bar{W}_{\cdot t} + \bar{W}_{\cdot \cdot}. \]

Then it estimates

\[ Y_{it} = \alpha + \tau \ddot{W}_{it} + u_{it} \]

on compressed values of \(\ddot{W}_{it}\).

Constructor:

Show code
DuckDoubleDemeaning(
    db_name: str,
    table_name: str,
    outcome_var: str,
    treatment_var: str,
    unit_col: str,
    time_col: str,
    seed: int,
    n_bootstraps: int = 100,
    cluster_col: str | None = None,
)

Notebook example:

Show code
from duckreg.estimators import DuckDoubleDemeaning

double_demean = DuckDoubleDemeaning(
    db_name="panel_data.db",
    table_name="panel_data",
    outcome_var="Y",
    treatment_var="W",
    unit_col="unit",
    time_col="time",
    cluster_col="unit",
    n_bootstraps=100,
    seed=828,
)
double_demean.fit()
double_demean.summary()

DuckMundlakEventStudy

DuckMundlakEventStudy builds a cohort-by-time event-study design in DuckDB. It first computes the treatment cohort

\[ G_i = \min\{t : D_{it}=1\}, \]

then creates:

  1. cohort intercepts,
  2. calendar-time dummies,
  3. cohort-by-time treatment dummies.

Constructor:

Show code
DuckMundlakEventStudy(
    db_name: str,
    table_name: str,
    outcome_var: str,
    treatment_col: str,
    unit_col: str,
    time_col: str,
    cluster_col: str,
    pre_treat_interactions: bool = True,
    n_bootstraps: int = 100,
)

The event-study regression is solved as compressed WLS on this expanded design. The point estimate is returned as a dictionary keyed by cohort. Each value is a table of the cohort-specific event-study path.

Single-cohort example from notebooks/event_study.ipynb:

Show code
from duckreg.estimators import DuckMundlakEventStudy

mundlak = DuckMundlakEventStudy(
    db_name="event_study_data.db",
    table_name="panel_data",
    outcome_var="Y_it",
    treatment_col="W_it",
    unit_col="unit_id",
    time_col="time_id",
    cluster_col="unit_id",
    n_bootstraps=0,
    seed=42,
    pre_treat_interactions=True,
)
mundlak.fit()
evsum = mundlak.summary()

Staggered-adoption example:

Show code
mundlak = DuckMundlakEventStudy(
    db_name="stagg_event_study_data.db",
    table_name="panel_data",
    outcome_var="Y_it",
    treatment_col="W_it",
    unit_col="unit_id",
    time_col="time_id",
    cluster_col="unit_id",
    n_bootstraps=0,
    seed=42,
    pre_treat_interactions=False,
)
mundlak.fit()
evsum = mundlak.summary()

Bootstrap Behavior

All three panel estimators support bootstrap covariance paths. With a cluster column, the bootstrap resamples clusters and recompresses the generated design. For event studies, covariance matrices are returned separately for each cohort-specific coefficient path.