Panel Estimators
The panel estimators encode common fixed-effect designs as generated regressors before compression. The central theme is the same as the base estimator: create the relevant design matrix in DuckDB, group identical design rows, and solve weighted least squares in memory.
DuckMundlak
DuckMundlak augments each row with unit-level and, optionally, time-level averages of the covariates. For covariate \(W_{it}\), the two-way Mundlak design contains
\[ W_{it},\qquad \bar{W}_{i \cdot},\qquad \bar{W}_{\cdot t}. \]
The corresponding model is
\[ Y_{it} = \alpha + W_{it}'\beta + \bar{W}_{i \cdot}'\gamma + \bar{W}_{\cdot t}'\delta + u_{it}. \]
Constructor:
Show code
DuckMundlak(
db_name: str,
table_name: str,
outcome_var: str,
covariates: list,
seed: int,
unit_col: str,
time_col: str | None = None,
n_bootstraps: int = 100,
cluster_col: str | None = None,
)Example from notebooks/introduction.ipynb:
Show code
from duckreg.estimators import DuckMundlak
mundlak = DuckMundlak(
db_name="panel_data.db",
table_name="panel_data",
outcome_var="Y",
covariates=["W"],
unit_col="unit",
time_col="time",
cluster_col="unit",
n_bootstraps=50,
seed=929,
)
mundlak.fit()
mundlak.summary()Data Preparation
prepare_data() creates temporary tables of unit and time averages, then joins them back to the raw table:
Show code
CREATE TEMP TABLE unit_avgs AS
SELECT unit, AVG(W) AS avg_W_unit
FROM panel_data
GROUP BY unitIf time_col is provided, it also creates time_avgs. The compressed design groups by original covariates and generated averages.
DuckDoubleDemeaning
DuckDoubleDemeaning constructs the two-way residualized treatment
\[ \ddot{W}_{it} = W_{it} - \bar{W}_{i \cdot} - \bar{W}_{\cdot t} + \bar{W}_{\cdot \cdot}. \]
Then it estimates
\[ Y_{it} = \alpha + \tau \ddot{W}_{it} + u_{it} \]
on compressed values of \(\ddot{W}_{it}\).
Constructor:
Show code
DuckDoubleDemeaning(
db_name: str,
table_name: str,
outcome_var: str,
treatment_var: str,
unit_col: str,
time_col: str,
seed: int,
n_bootstraps: int = 100,
cluster_col: str | None = None,
)Notebook example:
Show code
from duckreg.estimators import DuckDoubleDemeaning
double_demean = DuckDoubleDemeaning(
db_name="panel_data.db",
table_name="panel_data",
outcome_var="Y",
treatment_var="W",
unit_col="unit",
time_col="time",
cluster_col="unit",
n_bootstraps=100,
seed=828,
)
double_demean.fit()
double_demean.summary()DuckMundlakEventStudy
DuckMundlakEventStudy builds a cohort-by-time event-study design in DuckDB. It first computes the treatment cohort
\[ G_i = \min\{t : D_{it}=1\}, \]
then creates:
- cohort intercepts,
- calendar-time dummies,
- cohort-by-time treatment dummies.
Constructor:
Show code
DuckMundlakEventStudy(
db_name: str,
table_name: str,
outcome_var: str,
treatment_col: str,
unit_col: str,
time_col: str,
cluster_col: str,
pre_treat_interactions: bool = True,
n_bootstraps: int = 100,
)The event-study regression is solved as compressed WLS on this expanded design. The point estimate is returned as a dictionary keyed by cohort. Each value is a table of the cohort-specific event-study path.
Single-cohort example from notebooks/event_study.ipynb:
Show code
from duckreg.estimators import DuckMundlakEventStudy
mundlak = DuckMundlakEventStudy(
db_name="event_study_data.db",
table_name="panel_data",
outcome_var="Y_it",
treatment_col="W_it",
unit_col="unit_id",
time_col="time_id",
cluster_col="unit_id",
n_bootstraps=0,
seed=42,
pre_treat_interactions=True,
)
mundlak.fit()
evsum = mundlak.summary()Staggered-adoption example:
Show code
mundlak = DuckMundlakEventStudy(
db_name="stagg_event_study_data.db",
table_name="panel_data",
outcome_var="Y_it",
treatment_col="W_it",
unit_col="unit_id",
time_col="time_id",
cluster_col="unit_id",
n_bootstraps=0,
seed=42,
pre_treat_interactions=False,
)
mundlak.fit()
evsum = mundlak.summary()Bootstrap Behavior
All three panel estimators support bootstrap covariance paths. With a cluster column, the bootstrap resamples clusters and recompresses the generated design. For event studies, covariance matrices are returned separately for each cohort-specific coefficient path.