crabbymetrics
  • Home
  • API
    • API Overview
    • Regression And GLMs
    • OLS
    • Ridge
    • FixedEffectsOLS
    • ElasticNet
    • Logit
    • MultinomialLogit
    • Poisson
    • FTRL
    • Causal Inference And Panels
    • TwoSLS
    • BalancingWeights
    • EPLM
    • AverageDerivative
    • PartiallyLinearDML
    • AIPW
    • SyntheticControl
    • HorizontalPanelRidge
    • SyntheticDID
    • MatrixCompletion
    • InteractiveFixedEffects
    • Transforms
    • PCA
    • KernelBasis
    • Estimation Interfaces
    • GMM
    • MEstimator
    • Optimizers
  • Binding Crash Course
  • Regression And GLMs
    • OLS
    • Ridge
    • Fixed Effects OLS
    • ElasticNet
    • Logit
    • Multinomial Logit
    • Poisson
    • GMM
    • FTRL
    • MEstimator Poisson
  • Causal Inference
    • Balancing Weights
    • EPLM
    • Average Derivative
    • Double ML And AIPW
    • Richer Regression
    • TwoSLS
    • Synthetic Control
    • Synthetic DID
    • Horizontal Panel Ridge
    • Matrix Completion
    • Interactive Fixed Effects
    • Staggered Panel Event Study
  • Transforms
    • PCA And Kernel Basis
  • Ablations
    • Variance Estimators
    • Semiparametric Estimator Comparisons
    • Two-Period Semiparametric DID
    • Bridging Finite And Superpopulation
    • Panel Estimator DGP Comparisons
    • Same Root Panel Case Studies
    • Randomized Sketching And Least Squares
  • Optimization
    • Optimizers
    • GMM With Optimizers
  • Ding: First Course
    • Overview And TOC
    • Ch 1 Correlation And Simpson
    • Ch 2 Potential Outcomes
    • Ch 3 CRE And Fisher RT
    • Ch 4 CRE And Neyman
    • Ch 9 Bridging Finite And Superpopulation
    • Ch 11 Propensity Score
    • Ch 12 Double Robust ATE
    • Ch 13 Double Robust ATT
    • Ch 21 Experimental IV
    • Ch 23 Econometric IV
    • Ch 27 Mediation

Two-Period Semiparametric DID

Semiparametric two-period DID estimators target the period-1 ATT. The design uses high-dimensional sparse covariates, a sparse propensity score, and two periods of potential outcomes. The target is the period-1 ATT,

\[ \tau_{ATT} = \mathbb{E}[Y^1(1) - Y^0(1) \mid D=1]. \]

The comparison covers the usual two-period DID estimator together with outcome-modeling, IPW, and AIPW variants under two regimes:

  • \(\rho = 0\): unconditional parallel trends holds;
  • \(\rho = 2\): unconditional parallel trends fails, but conditional parallel trends holds after adjusting for \(X\) through the propensity-index component in the untreated trend.

The nuisance fits use lightweight ridge/logistic-ridge learners so the page is self-contained in the Python docs environment. The estimator menu includes unadjusted DID, outcome-model DID, IPW DID, and AIPW DID, with plug-in and cross-fit versions of the nuisance-based estimators.

1 Imports And Helpers

Show code
from html import escape

import matplotlib.pyplot as plt
import numpy as np
from IPython.display import HTML, display

np.set_printoptions(precision=4, suppress=True)


def html_table(headers, rows):
    parts = ["<table>", "<thead>", "<tr>"]
    parts.extend(f"<th>{escape(str(h))}</th>" for h in headers)
    parts.extend(["</tr>", "</thead>", "<tbody>"])
    for row in rows:
        parts.append("<tr>")
        parts.extend(f"<td>{cell}</td>" for cell in row)
        parts.append("</tr>")
    parts.extend(["</tbody>", "</table>"])
    return "".join(parts)


def expit(z):
    z = np.clip(z, -35.0, 35.0)
    return 1.0 / (1.0 + np.exp(-z))


def add_intercept(x):
    return np.column_stack([np.ones(x.shape[0]), x])


def make_folds(n, k, rng):
    idx = rng.permutation(n)
    return [fold for fold in np.array_split(idx, k) if fold.size]

2 DGP

Let \(N=200\), \(p=100\), and \(s=5\). The covariates are

\[ X_i \sim \mathcal{N}(0, I_p), \]

and the sparse propensity index uses

\[ \gamma_0 = (1, 1/2, 1/3, 1/4, 1/5, 0, \ldots, 0)^\top. \]

Treatment is assigned by

\[ D_i \sim \mathrm{Bernoulli}(e_i), \qquad e_i = \Lambda(X_i^\top \gamma_0). \]

The potential outcomes are

\[ \begin{aligned} Y_i^0(0) &= X_i^\top \beta_0 + \varepsilon_i, \\ Y_i^0(1) &= Y_i^0(0) + 1 + \rho e_i + \varepsilon_i, \\ Y_i^1(1) &= \theta_0 + Y_i^0(1) + \varepsilon_i, \end{aligned} \]

where \(\beta_0 = \gamma_0 + 0.5\), \(\theta_0 = 3\), and \(\varepsilon_i \sim \mathcal{N}(0, 0.1^2)\). A common shock is used across the baseline, untreated-trend, and treated-outcome equations. The observed outcomes are \(Y_i(0)=Y_i^0(0)\) and

\[ Y_i(1) = (1-D_i)Y_i^0(1) + D_iY_i^1(1). \]

When \(\rho = 0\), the untreated trend does not depend on the propensity score and the simple DID contrast is valid. When \(\rho \neq 0\), units with high treatment probability also have different untreated trends, so unconditional DID is biased unless the covariates are accounted for.

Show code
def semipar_did_dgp(n=200, p=100, s=5, theta=3.0, rho=0.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    gamma = np.r_[np.arange(s, 0, -1) / s, np.zeros(p - s)]
    x = rng.normal(size=(n, p))
    e = expit(x @ gamma)
    d = rng.binomial(1, e, size=n).astype(float)
    beta = gamma + 0.5
    eps = rng.normal(0.0, 0.1, size=n)

    y00 = x @ beta + eps
    y01 = y00 + 1.0 + rho * e + eps
    y11 = theta + y01 + eps

    y0 = y00
    y1 = y01 * (1.0 - d) + y11 * d
    return {"y1": y1, "y0": y0, "d": d, "x": x, "e": e, "theta": theta, "rho": rho}


example = semipar_did_dgp(rho=2.0, rng=np.random.default_rng(42))
display(
    HTML(
        html_table(
            ["Quantity", "Value"],
            [
                ["N", example["x"].shape[0]],
                ["p", example["x"].shape[1]],
                ["Treated share", f"{example['d'].mean():.3f}"],
                ["Mean propensity", f"{example['e'].mean():.3f}"],
                ["True ATT", f"{example['theta']:.3f}"],
            ],
        )
    )
)
Quantity Value
N 200
p 100
Treated share 0.475
Mean propensity 0.533
True ATT 3.000

3 Estimators

The comparison covers eight estimators:

  1. naive: post-period treated-control difference;
  2. did: unadjusted two-period DID;
  3. om: outcome-model DID using fitted \(\mathbb{E}[Y_t \mid X,D=d]\) surfaces;
  4. omXF: cross-fit version of om;
  5. ipw: Abadie-style IPW DID using a fitted propensity score;
  6. ipwXF: cross-fit version of ipw;
  7. aipw: double-robust DID using a propensity score and untreated-trend model;
  8. aipwXF: cross-fit version of aipw.

The estimating equations are easiest to see after defining \(\Delta Y_i = Y_i(1)-Y_i(0)\) and \(\hat p = n^{-1}\sum_i D_i\). The IPW score is

\[ \hat\tau^{IPW} = \frac{1}{n}\sum_i \frac{D_i - \hat e(X_i)}{\hat p(1-\hat e(X_i))}\Delta Y_i. \]

The AIPW score subtracts the untreated-trend model \(\hat m_0(X_i) \approx \mathbb{E}[\Delta Y_i \mid X_i, D_i=0]\) before applying the same balancing score:

\[ \hat\tau^{AIPW} = \frac{1}{n}\sum_i \frac{D_i - \hat e(X_i)}{\hat p(1-\hat e(X_i))} \{\Delta Y_i - \hat m_0(X_i)\}. \]

Show code
def ridge_fit(x, y, penalty=1.0):
    z = add_intercept(x)
    gram = z.T @ z
    ridge = np.eye(gram.shape[0]) * penalty
    ridge[0, 0] = 0.0
    return np.linalg.solve(gram + ridge, z.T @ y)


def ridge_predict(x, beta):
    return add_intercept(x) @ beta


def logistic_ridge_fit(x, y, penalty=1.0, max_iter=40):
    z = add_intercept(x)
    beta = np.zeros(z.shape[1])
    ridge = np.eye(z.shape[1]) * penalty
    ridge[0, 0] = 0.0
    for _ in range(max_iter):
        eta = z @ beta
        p = expit(eta)
        w = np.clip(p * (1.0 - p), 1e-5, None)
        grad = z.T @ (p - y) + ridge @ beta
        hess = z.T @ (z * w[:, None]) + ridge
        step = np.linalg.solve(hess, grad)
        beta -= step
        if np.max(np.abs(step)) < 1e-6:
            break
    return beta


def logistic_predict(x, beta, clip=0.02):
    return np.clip(expit(add_intercept(x) @ beta), clip, 1.0 - clip)


def crossfit_predict(x, y, fit_fn, pred_fn, penalty, k, rng, subset=None):
    n = x.shape[0]
    pred = np.zeros(n)
    if subset is None:
        subset = np.arange(n)
    subset = np.asarray(subset)
    folds = make_folds(subset.size, k, rng)
    for fold in folds:
        test = subset[fold]
        train = np.setdiff1d(subset, test, assume_unique=False)
        beta = fit_fn(x[train], y[train], penalty)
        pred[test] = pred_fn(x[test], beta)
    return pred


def two_period_did(d, y1, y0):
    return (y1[d == 1].mean() - y0[d == 1].mean()) - (y1[d == 0].mean() - y0[d == 0].mean())


def om_did(x, d, y1, y0, xfit=False, penalty=1.0, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    treated = np.flatnonzero(d == 1.0)
    control = np.flatnonzero(d == 0.0)
    preds = []
    for y, group in [(y1, treated), (y1, control), (y0, treated), (y0, control)]:
        beta = ridge_fit(x[group], y[group], penalty)
        pred = ridge_predict(x, beta)
        if xfit:
            pred[group] = crossfit_predict(x, y, ridge_fit, ridge_predict, penalty, 3, rng, subset=group)[group]
        preds.append(pred)
    m1, m2, m3, m4 = preds
    return float(np.mean((m1 - m2) - (m3 - m4)))


def propensity_hat(x, d, xfit=False, penalty=1.0, rng=None):
    if not xfit:
        beta = logistic_ridge_fit(x, d, penalty)
        return logistic_predict(x, beta)
    rng = np.random.default_rng(0) if rng is None else rng
    return crossfit_predict(x, d, logistic_ridge_fit, logistic_predict, penalty, 3, rng)


def ipw_did(x, d, y1, y0, xfit=False, penalty=1.0, rng=None):
    ehat = propensity_hat(x, d, xfit=xfit, penalty=penalty, rng=rng)
    delta = y1 - y0
    score = (d - ehat) / (d.mean() * (1.0 - ehat))
    return float(np.mean(score * delta))


def aipw_did(x, d, y1, y0, xfit=False, penalty=1.0, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    ehat = propensity_hat(x, d, xfit=xfit, penalty=penalty, rng=rng)
    delta = y1 - y0
    control = np.flatnonzero(d == 0.0)
    beta = ridge_fit(x[control], delta[control], penalty)
    mhat = ridge_predict(x, beta)
    if xfit:
        mhat[control] = crossfit_predict(x, delta, ridge_fit, ridge_predict, penalty, 3, rng, subset=control)[control]
    score = (d - ehat) / (d.mean() * (1.0 - ehat))
    return float(np.mean(score * (delta - mhat)))


def estimate_all(draw, penalty=25.0, rng=None):
    y1, y0, d, x = draw["y1"], draw["y0"], draw["d"], draw["x"]
    rng = np.random.default_rng(0) if rng is None else rng
    return {
        "naive": float(y1[d == 1].mean() - y1[d == 0].mean()),
        "did": two_period_did(d, y1, y0),
        "om": om_did(x, d, y1, y0, xfit=False, penalty=penalty, rng=rng),
        "omXF": om_did(x, d, y1, y0, xfit=True, penalty=penalty, rng=rng),
        "ipw": ipw_did(x, d, y1, y0, xfit=False, penalty=penalty, rng=rng),
        "ipwXF": ipw_did(x, d, y1, y0, xfit=True, penalty=penalty, rng=rng),
        "aipw": aipw_did(x, d, y1, y0, xfit=False, penalty=penalty, rng=rng),
        "aipwXF": aipw_did(x, d, y1, y0, xfit=True, penalty=penalty, rng=rng),
    }


trial = estimate_all(semipar_did_dgp(rho=0.0, rng=np.random.default_rng(7)), rng=np.random.default_rng(8))
display(HTML(html_table(["Estimator", "Estimate"], [[k, f"{v:.3f}"] for k, v in trial.items()])))
Estimator Estimate
naive 5.252
did 3.015
om 3.003
omXF 3.201
ipw 3.445
ipwXF 2.784
aipw 3.028
aipwXF 3.033

4 Monte Carlo

The simulation compares \(\rho=0\) and \(\rho=2\). The true effect is \(\theta_0=3\) in both cases.

Show code
def summarize(vals, truth):
    vals = np.asarray(vals)
    err = vals - truth
    return {
        "mean": float(vals.mean()),
        "bias": float(err.mean()),
        "mab": float(abs(err.mean())),
        "sd": float(vals.std(ddof=1)),
        "mse": float(np.mean(err**2)),
    }


def run_mc(rhos=(0.0, 2.0), reps=150, seed=20260516):
    rng = np.random.default_rng(seed)
    rows = []
    draws = {rho: {name: [] for name in ["naive", "did", "om", "omXF", "ipw", "ipwXF", "aipw", "aipwXF"]} for rho in rhos}
    for rho in rhos:
        for _ in range(reps):
            draw = semipar_did_dgp(rho=rho, rng=rng)
            est = estimate_all(draw, penalty=25.0, rng=rng)
            for name, value in est.items():
                draws[rho][name].append(value)
        for name, vals in draws[rho].items():
            rows.append({"rho": rho, "estimator": name, **summarize(vals, 3.0)})
    return rows, draws


mc_rows, mc_draws = run_mc()

for rho in [0.0, 2.0]:
    display(HTML(f"<h4>rho = {rho:g}</h4>"))
    display(
        HTML(
            html_table(
                ["Estimator", "Mean", "Bias", "Mean Absolute Bias", "SD", "MSE"],
                [
                    [
                        r["estimator"],
                        f"{r['mean']:.3f}",
                        f"{r['bias']:+.3f}",
                        f"{r['mab']:.3f}",
                        f"{r['sd']:.3f}",
                        f"{r['mse']:.3f}",
                    ]
                    for r in mc_rows
                    if r["rho"] == rho
                ],
            )
        )
    )

rho = 0

Estimator Mean Bias Mean Absolute Bias SD MSE
naive 5.601 +2.601 2.601 0.777 7.366
did 3.002 +0.002 0.002 0.021 0.000
om 3.002 +0.002 0.002 0.026 0.001
omXF 3.014 +0.014 0.014 0.210 0.044
ipw 3.476 +0.476 0.476 0.053 0.229
ipwXF 2.768 -0.232 0.232 0.186 0.088
aipw 3.004 +0.004 0.004 0.026 0.001
aipwXF 3.006 +0.006 0.006 0.033 0.001

rho = 2

Estimator Mean Bias Mean Absolute Bias SD MSE
naive 6.124 +3.124 3.124 0.812 10.412
did 3.590 +0.590 0.590 0.070 0.352
om 3.301 +0.301 0.301 0.055 0.094
omXF 3.292 +0.292 0.292 0.188 0.120
ipw 4.304 +1.304 1.304 0.141 1.721
ipwXF 2.929 -0.071 0.071 0.372 0.142
aipw 3.273 +0.273 0.273 0.064 0.079
aipwXF 3.106 +0.106 0.106 0.099 0.021
Show code
fig, axes = plt.subplots(2, 1, figsize=(11, 8), constrained_layout=True, sharex=True)
order = ["naive", "did", "om", "omXF", "ipw", "ipwXF", "aipw", "aipwXF"]
for ax, rho in zip(axes, [0.0, 2.0]):
    data = [mc_draws[rho][name] for name in order]
    ax.boxplot(data, labels=order, showfliers=False)
    ax.axhline(3.0, color="tab:red", linestyle="--", linewidth=1.2, label="truth")
    ax.set_title(f"Sampling distributions, rho = {rho:g}")
    ax.set_ylabel("ATT estimate")
    ax.legend(loc="upper right")
axes[-1].tick_params(axis="x", rotation=30)
plt.show()

5 Reading The Results

When \(\rho=0\), unconditional parallel trends holds. The unadjusted DID estimator is therefore centered near the true effect, while the post-period naive contrast remains badly confounded because treated and control units differ in the sparse propensity-index covariates.

When \(\rho=2\), unconditional DID breaks: the untreated trend itself contains the propensity score, so high-propensity units would have grown differently even without treatment. The adjusted estimators target the conditional parallel-trends structure. Outcome modeling works when the nuisance regressions recover the relevant high-dimensional linear signal. IPW works by reweighting the untreated trend changes toward the treated covariate distribution, but it is more variance-sensitive because the propensity score appears in the denominator. AIPW combines both pieces: it removes the fitted untreated trend and then balances the residual trend by the propensity score.

The cross-fit columns avoid using each observation’s own outcome to construct its nuisance prediction. In this small-\(N\), high-\(p\) design, cross-fitting is not guaranteed to reduce variance in every estimator, but it is the safer pattern for the doubly robust score.