Panel Estimators

The panel estimators encode common fixed-effect designs as generated regressors before compression. The workflow is the same as for the base estimator: build the relevant design matrix in DuckDB, group identical design rows, and solve weighted least squares in memory.
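As a minimal sketch of why grouping identical design rows leaves the point estimates unchanged, the plain-Python example below (not duckreg code; the helper names `compress`, `wls_line`, and `ols_line` are hypothetical) reduces a toy dataset with one regressor to counts and outcome sums, then compares count-weighted least squares on the group means with OLS on the raw rows.

```python
from collections import defaultdict


def compress(rows):
    """Group identical design rows; keep the row count and the outcome sum."""
    groups = defaultdict(lambda: [0, 0.0])  # x -> [n, sum_y]
    for x, y in rows:
        groups[x][0] += 1
        groups[x][1] += y
    return [(x, n, sum_y) for x, (n, sum_y) in groups.items()]


def wls_line(compressed):
    """WLS of group-mean y on x, using the group counts as weights."""
    n_tot = sum(n for _, n, _ in compressed)
    x_bar = sum(n * x for x, n, _ in compressed) / n_tot
    y_bar = sum(s for _, _, s in compressed) / n_tot
    sxy = sum(n * (x - x_bar) * (s / n - y_bar) for x, n, s in compressed)
    sxx = sum(n * (x - x_bar) ** 2 for x, n, _ in compressed)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def ols_line(rows):
    """OLS of y on x using every raw row."""
    n = len(rows)
    x_bar = sum(x for x, _ in rows) / n
    y_bar = sum(y for _, y in rows) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    sxx = sum((x - x_bar) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Toy data: six raw rows, three distinct design rows.
rows = [(0, 1.0), (0, 2.0), (1, 2.5), (1, 3.5), (1, 3.0), (2, 5.0)]
compressed = compress(rows)
intercept_c, slope_c = wls_line(compressed)
intercept_o, slope_o = ols_line(rows)
```

The two fits agree up to floating-point error because the normal equations depend on the outcomes only through their within-group sums.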

`DuckMundlak`

DuckMundlak augments each row with unit-level and, optionally, time-level averages of the covariates. For a covariate \(W_{it}\), the two-way Mundlak design contains

\[ W_{it},\qquad \bar{W}_{i \cdot},\qquad \bar{W}_{\cdot t}. \]

The corresponding model is

\[ Y_{it} = \alpha + W_{it}'\beta + \bar{W}_{i \cdot}'\gamma + \bar{W}_{\cdot t}'\delta + u_{it}. \]

Constructor:

DuckMundlak(
    db_name: str,
    table_name: str,
    outcome_var: str,
    covariates: list,
    seed: int,
    unit_col: str,
    time_col: str | None = None,
    n_bootstraps: int = 100,
    cluster_col: str | None = None,
)

Example from notebooks/introduction.ipynb:

from duckreg.estimators import DuckMundlak

mundlak = DuckMundlak(
    db_name="panel_data.db",
    table_name="panel_data",
    outcome_var="Y",
    covariates=["W"],
    unit_col="unit",
    time_col="time",
    cluster_col="unit",
    n_bootstraps=50,
    seed=929,
)
mundlak.fit()
mundlak.summary()

Data Preparation

prepare_data() creates temporary tables of unit and time averages, then joins them back to the raw table. For the unit averages of a covariate W, the query looks like this:

CREATE TEMP TABLE unit_avgs AS
SELECT unit, AVG(W) AS avg_W_unit
FROM panel_data
GROUP BY unit

If time_col is provided, it also creates time_avgs in the same way. The compression step then groups by the original covariates and the generated averages.
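The preparation and compression steps can be sketched in plain Python on a toy panel. This is an illustration under stated assumptions, not the library's implementation: the helper `group_means`, the toy data, and the tuple layout of the design rows are all hypothetical, and the real work happens in DuckDB SQL.

```python
from collections import defaultdict


def group_means(rows, key, value):
    """Average `value` within each level of `key` (the AVG ... GROUP BY step)."""
    totals = defaultdict(lambda: [0.0, 0])
    for r in rows:
        totals[r[key]][0] += r[value]
        totals[r[key]][1] += 1
    return {k: s / n for k, (s, n) in totals.items()}


# Toy panel: units 1 and 3 are treated in period 2, unit 2 never is.
panel = [
    {"unit": 1, "time": 1, "W": 0, "Y": 1.0},
    {"unit": 1, "time": 2, "W": 1, "Y": 2.0},
    {"unit": 2, "time": 1, "W": 0, "Y": 0.5},
    {"unit": 2, "time": 2, "W": 0, "Y": 0.7},
    {"unit": 3, "time": 1, "W": 0, "Y": 1.1},
    {"unit": 3, "time": 2, "W": 1, "Y": 2.3},
]

unit_avg = group_means(panel, "unit", "W")  # plays the role of unit_avgs
time_avg = group_means(panel, "time", "W")  # plays the role of time_avgs

# Join the averages back to each raw row: (W, unit average, time average, Y).
design = [
    (r["W"], unit_avg[r["unit"]], time_avg[r["time"]], r["Y"]) for r in panel
]

# Compress: one entry per distinct design row, with count and outcome sum.
groups = defaultdict(lambda: [0, 0.0])
for w, w_unit, w_time, y in design:
    groups[(w, w_unit, w_time)][0] += 1
    groups[(w, w_unit, w_time)][1] += y
compressed = dict(groups)
```

In this toy panel the six raw rows collapse to four distinct design rows, because units 1 and 3 share the same treatment history and therefore the same generated averages.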

`DuckDoubleDemeaning`

DuckDoubleDemeaning constructs the two-way residualized treatment

\[ \ddot{W}_{it} = W_{it} - \bar{W}_{i \cdot} - \bar{W}_{\cdot t} + \bar{W}_{\cdot \cdot}. \]

Then it estimates

\[ Y_{it} = \alpha + \tau \ddot{W}_{it} + u_{it} \]

on compressed values of \(\ddot{W}_{it}\).
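To make the formula concrete, here is a small plain-Python sketch on a balanced 2x2 panel. It is not duckreg code: the helper `mean_by` and the toy data are hypothetical, and the slope is computed with a closed form that is valid for a single regressor.

```python
from collections import defaultdict


def mean_by(rows, index):
    """Average W within each level of the tuple field at position `index`."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        totals[row[index]][0] += row[2]
        totals[row[index]][1] += 1
    return {k: s / n for k, (s, n) in totals.items()}


# Balanced toy panel of (unit, time, W, Y): unit 1 is treated in period 2.
panel = [
    (1, 1, 0, 1.0),
    (1, 2, 1, 3.0),
    (2, 1, 0, 0.5),
    (2, 2, 0, 1.5),
]

unit_mean = mean_by(panel, 0)
time_mean = mean_by(panel, 1)
grand_mean = sum(row[2] for row in panel) / len(panel)

# Double-demeaned treatment paired with the outcome for every row.
rows = [
    (w - unit_mean[u] - time_mean[t] + grand_mean, y) for u, t, w, y in panel
]

# Compress on the distinct values of the double-demeaned treatment.
groups = defaultdict(lambda: [0, 0.0])  # w_dd -> [n, sum_y]
for w_dd, y in rows:
    groups[w_dd][0] += 1
    groups[w_dd][1] += y

# Count-weighted least squares of the group-mean outcome on w_dd.
n_tot = sum(n for n, _ in groups.values())
x_bar = sum(n * x for x, (n, _) in groups.items()) / n_tot
y_bar = sum(s for _, s in groups.values()) / n_tot
sxy = sum(n * (x - x_bar) * (s / n - y_bar) for x, (n, s) in groups.items())
sxx = sum(n * (x - x_bar) ** 2 for x, (n, _) in groups.items())
tau = sxy / sxx
```

In this balanced 2x2 case the double-demeaned treatment takes only two distinct values, and `tau` coincides with the difference-in-differences of the four outcomes, (3.0 - 1.0) - (1.5 - 0.5).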

Constructor:

DuckDoubleDemeaning(
    db_name: str,
    table_name: str,
    outcome_var: str,
    treatment_var: str,
    unit_col: str,
    time_col: str,
    seed: int,
    n_bootstraps: int = 100,
    cluster_col: str | None = None,
)

Notebook example:

from duckreg.estimators import DuckDoubleDemeaning

double_demean = DuckDoubleDemeaning(
    db_name="panel_data.db",
    table_name="panel_data",
    outcome_var="Y",
    treatment_var="W",
    unit_col="unit",
    time_col="time",
    cluster_col="unit",
    n_bootstraps=100,
    seed=828,
)
double_demean.fit()
double_demean.summary()

`DuckMundlakEventStudy`

DuckMundlakEventStudy builds a cohort-by-time event-study design in DuckDB. It first computes the treatment cohort

\[ G_i = \min\{t : D_{it}=1\}, \]

then creates cohort intercepts, calendar-time dummies, and cohort-by-time treatment dummies.
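The cohort definition and the three groups of dummies can be sketched in plain Python as follows. This is an illustration only: the function names, the column names such as `cohort_2` or `treat_2_3`, and the choice to create treatment dummies only for periods at or after the cohort's first treated period are assumptions, and the effect of `pre_treat_interactions` is not reproduced here.

```python
def first_treatment_time(panel):
    """G_i = min{t : D_it = 1}; None marks a never-treated unit."""
    cohort = {}
    for unit, time, d in panel:
        cohort.setdefault(unit, None)
        if d == 1 and (cohort[unit] is None or time < cohort[unit]):
            cohort[unit] = time
    return cohort


def design_row(time, g, cohorts, times):
    """Indicator columns for one observation (hypothetical column names)."""
    row = {}
    for c in cohorts:  # cohort intercepts
        row[f"cohort_{c}"] = int(g == c)
    for s in times:  # calendar-time dummies
        row[f"time_{s}"] = int(time == s)
    for c in cohorts:  # cohort-by-time treatment dummies, post-treatment only
        for s in times:
            if s >= c:
                row[f"treat_{c}_{s}"] = int(g == c and time == s)
    return row


# Toy panel of (unit, time, D): unit 1 adopts at t=2, unit 2 at t=3,
# and unit 3 is never treated.
panel = [
    (1, 1, 0), (1, 2, 1), (1, 3, 1),
    (2, 1, 0), (2, 2, 0), (2, 3, 1),
    (3, 1, 0), (3, 2, 0), (3, 3, 0),
]

cohort = first_treatment_time(panel)
cohorts = sorted({g for g in cohort.values() if g is not None})
times = sorted({t for _, t, _ in panel})
design = {
    (u, t): design_row(t, cohort[u], cohorts, times) for u, t, _ in panel
}
```

Because every column is an indicator, many observations share the same design row, which is what makes this expanded design a natural fit for compression.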

Constructor:

DuckMundlakEventStudy(
    db_name: str,
    table_name: str,
    outcome_var: str,
    treatment_col: str,
    unit_col: str,
    time_col: str,
    cluster_col: str,
    pre_treat_interactions: bool = True,
    n_bootstraps: int = 100,
)

The event-study regression is solved as compressed WLS on this expanded design. The point estimate is returned as a dictionary keyed by cohort. Each value is a table of the cohort-specific event-study path.

Single-cohort example from notebooks/event_study.ipynb:

from duckreg.estimators import DuckMundlakEventStudy

mundlak = DuckMundlakEventStudy(
    db_name="event_study_data.db",
    table_name="panel_data",
    outcome_var="Y_it",
    treatment_col="W_it",
    unit_col="unit_id",
    time_col="time_id",
    cluster_col="unit_id",
    n_bootstraps=0,
    seed=42,
    pre_treat_interactions=True,
)
mundlak.fit()
evsum = mundlak.summary()

Staggered-adoption example:

mundlak = DuckMundlakEventStudy(
    db_name="stagg_event_study_data.db",
    table_name="panel_data",
    outcome_var="Y_it",
    treatment_col="W_it",
    unit_col="unit_id",
    time_col="time_id",
    cluster_col="unit_id",
    n_bootstraps=0,
    seed=42,
    pre_treat_interactions=False,
)
mundlak.fit()
evsum = mundlak.summary()

Bootstrap Behavior

All three panel estimators can estimate the coefficient covariance by bootstrap. With a cluster column, the bootstrap resamples whole clusters and recompresses the generated design on each draw. For event studies, covariance matrices are returned separately for each cohort-specific coefficient path.
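A cluster bootstrap of this kind can be sketched in plain Python for a single regressor. The sketch is not the library's implementation: `compressed_slope` and `cluster_bootstrap` are hypothetical helpers, the resampling scheme (drawing cluster ids with replacement from a seeded generator) is an assumption, and the toy clusters are chosen so that every resample contains variation in the regressor.

```python
import random
import statistics
from collections import defaultdict


def compressed_slope(rows):
    """Compress (x, y) rows by x, then run count-weighted least squares."""
    groups = defaultdict(lambda: [0, 0.0])  # x -> [n, sum_y]
    for x, y in rows:
        groups[x][0] += 1
        groups[x][1] += y
    n_tot = sum(n for n, _ in groups.values())
    x_bar = sum(n * x for x, (n, _) in groups.items()) / n_tot
    y_bar = sum(s for _, s in groups.values()) / n_tot
    sxy = sum(n * (x - x_bar) * (s / n - y_bar) for x, (n, s) in groups.items())
    sxx = sum(n * (x - x_bar) ** 2 for x, (n, _) in groups.items())
    return sxy / sxx


def cluster_bootstrap(rows_by_cluster, n_bootstraps, seed):
    """Resample whole clusters with replacement and re-estimate each time."""
    rng = random.Random(seed)
    ids = sorted(rows_by_cluster)
    draws = []
    for _ in range(n_bootstraps):
        sampled = rng.choices(ids, k=len(ids))
        rows = [row for c in sampled for row in rows_by_cluster[c]]
        draws.append(compressed_slope(rows))  # recompress on every draw
    return draws


# Toy data: each cluster has one row with x=0 and one row with x=1.
rows_by_cluster = {
    "a": [(0, 1.0), (1, 2.0)],
    "b": [(0, 0.5), (1, 2.5)],
    "c": [(0, 1.5), (1, 2.0)],
    "d": [(0, 0.0), (1, 1.5)],
}

draws = cluster_bootstrap(rows_by_cluster, n_bootstraps=50, seed=929)
boot_variance = statistics.variance(draws)
```

Fixing the seed makes the draws reproducible, which is the role the `seed` argument plays in the examples above.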