Abundance-Based Constraints For OLS

Weighted effect coding for categorical modifiers

1 Motivation

Reference-category coding answers questions relative to an arbitrary omitted level. That is often fine for prediction, but it is awkward when the goal is to read lower-order coefficients as population or sample-average effects in the presence of categorical modifiers.

The upshot is simple. Standard one-hot encoding sets the baseline to an arbitrary omitted category, so coefficients are deviations from that category. ABC sets the baseline to the full-sample weighted mean, so coefficients are deviations from common parameters: the grand mean, the average slope, or the average treatment effect, depending on the model. Lin (2013) already used this idea in regression-adjusted experiments by centering the covariate vector before interacting it with treatment: the coefficient on treatment is then the sample-average treatment effect, while the treatment-by-covariate interactions are deviations from that shared effect rather than deviations from an arbitrary covariate origin.

The abundance-based constraints (ABC) construction of Kowal, Facilitating heterogeneous effect estimation via statistically efficient categorical modifiers, JASA, DOI 10.1080/01621459.2026.2635078, generalizes that centering logic to categorical modifiers and their interactions. It replaces reference-level interpretations with empirical-abundance-weighted interpretations. In OLS, this is a pure reparameterization: fitted values are the same as a full-rank ordinary dummy-coded regression, but the displayed coefficients answer different questions.

This page documents the crabbymetrics OLS implementation, cm.ABCOLS().

2 Setup

Let \(y_i \in \mathbb R\), continuous covariates \(x_i \in \mathbb R^p\), and categorical covariates \(c_i = (c_{i1}, \ldots, c_{iK})\). For a categorical variable \(C_k\) with levels \(\ell = 0, \ldots, L_k - 1\), define empirical level abundances

\[ \hat\pi_{k\ell} = \frac{1}{n}\sum_{i=1}^n 1\{c_{ik}=\ell\}. \]

For two categorical variables \(A\) and \(B\), define empirical cell abundances

\[ \hat\pi_{ab} = \frac{1}{n}\sum_{i=1}^n 1\{A_i=a, B_i=b\}. \]

ABCOLS expects categorical variables to be supplied as zero-based integer codes in a dense matrix. This keeps the runtime dependency footprint NumPy-only; callers with pandas categoricals should convert levels to stable codes before fitting.

3 Main-effect constraints

For categorical main effects, the overcomplete model includes one coefficient for every level:

\[ \mu_i = \alpha_0 + x_i'\alpha + \sum_{\ell=0}^{L_k-1} 1\{C_{ik}=\ell\}\beta_{k\ell}. \]

The abundance-based identifying constraint is

\[ \sum_{\ell=0}^{L_k-1} \hat\pi_{k\ell}\beta_{k\ell}=0. \]

So \(\beta_{k\ell}\) is the deviation of level \(\ell\) from the sample-abundance-weighted average, not the deviation from a reference level. With centered continuous covariates, the intercept is interpretable as the sample-weighted grand mean at the sample mean of \(x\).

4 Continuous-by-categorical interactions

For a continuous variable \(x_j\) and categorical modifier \(C_k\), the overcomplete model adds

\[ \sum_{\ell=0}^{L_k-1} x_{ij}1\{C_{ik}=\ell\}\gamma_{jk\ell}. \]

ABCOLS imposes

\[ \sum_{\ell=0}^{L_k-1} \hat\pi_{k\ell}\gamma_{jk\ell}=0. \]

The group-specific slope in level \(\ell\) is

\[ s_{j\ell} = \alpha_j + \gamma_{jk\ell}. \]

The constraint implies

\[ \alpha_j = \sum_{\ell=0}^{L_k-1}\hat\pi_{k\ell}s_{j\ell}. \]

Thus the continuous main effect remains the empirical-abundance-weighted average of group-specific slopes.

5 Categorical-by-categorical interactions

For two categorical variables \(A\) and \(B\), the overcomplete model includes a cell coefficient \(\delta_{ab}\) for every observed cell:

\[ \sum_a\sum_b 1\{A_i=a,B_i=b\}\delta_{ab}. \]

The ABC constraints impose weighted zero margins:

\[ \sum_b \hat\pi_{ab}\delta_{ab}=0 \quad \text{for each } a, \]

and

\[ \sum_a \hat\pi_{ab}\delta_{ab}=0 \quad \text{for each } b. \]

One of these constraints is redundant. ABCOLS uses a deterministic rule: it includes all \(A\)-margin constraints and all but the first \(B\)-margin constraint.

6 Null-space implementation

Rather than hand-building weighted contrast columns, ABCOLS uses the constraint/null-space route.

First build an overcomplete design matrix \(X_{\rm full}\) containing:

an intercept;
centered continuous variables;
one indicator for every categorical level;
one column \(x_j1\{C_k=\ell\}\) for every requested continuous-categorical interaction level;
one column \(1\{A=a,B=b\}\) for every requested categorical-categorical interaction cell.

Let \(\theta\) collect all overcomplete coefficients. The ABC restrictions are linear:

\[ A\theta = 0. \]

Let \(Q\) be a basis for the null space of \(A\), so \(AQ=0\). Write

\[ \theta = Q\phi. \]

Then fit the unconstrained OLS problem

\[ \hat\phi = \arg\min_\phi \|y - X_{\rm full}Q\phi\|_2^2. \]

With \(Z = X_{\rm full}Q\),

\[ \hat\phi = (Z'Z)^{-1}Z'y, \qquad \hat\theta = Q\hat\phi. \]

The fitted values are

\[ \hat y = X_{\rm full}\hat\theta = Z\hat\phi. \]

For homoskedastic OLS inference in the reduced coordinates,

\[ \widehat{\operatorname{Var}}(\hat\phi) = \hat\sigma^2 (Z'Z)^{-1}, \qquad \hat\sigma^2 = \frac{\|y - Z\hat\phi\|_2^2}{n - \operatorname{rank}(Z)}. \]

Mapping back gives

\[ \widehat{\operatorname{Var}}(\hat\theta) = Q\widehat{\operatorname{Var}}(\hat\phi)Q'. \]

The implementation reports coefficient standard errors from the diagonal of this matrix and records the maximum constraint violation \(\|A\hat\theta\|_\infty\) as a numerical check.

7 Example: ABC versus base-category coding

The example below cooks an unbalanced sample with a three-level categorical modifier group and a two-level categorical variable sex. The true data-generating process has group-specific intercept and slope deviations.

Show code

import numpy as np
import crabbymetrics as cm

rng = np.random.default_rng(2026)

group = np.repeat(np.array([0, 1, 2], dtype=np.uint32), [36, 54, 30])
sex = np.tile(np.array([0, 1], dtype=np.uint32), len(group) // 2)
cats = np.column_stack([group, sex]).astype(np.uint32)

x_raw = rng.normal(size=len(group))
x = x_raw[:, None]
x_centered = x_raw - x_raw.mean()

level_effect = np.array([-0.75, 0.15, 0.95])
slope_deviation = np.array([0.45, -0.20, 0.10])
sex_effect = 0.35 * sex
noise = rng.normal(scale=0.08, size=len(group))

y = (
    1.25
    + 1.10 * x_centered
    + level_effect[group]
    + slope_deviation[group] * x_centered
    + sex_effect
    + noise
)

The raw simulation parameters are therefore:

\[ \alpha_0 = 1.25, \qquad \beta_0 = 1.10, \qquad \gamma_g = (-0.75,\ 0.15,\ 0.95), \qquad \eta_g = (0.45,\ -0.20,\ 0.10), \qquad \delta_s = (0,\ 0.35), \qquad \sigma = 0.08. \]

Here \(\gamma_g\) are the raw group intercept deviations, \(\eta_g\) are the raw x0:group slope deviations, and \(\delta_s\) is the raw sex main effect. Since ABC reports abundance-centered effects, the population coefficients in the ABC parameterization shift these raw effects by their sample-abundance averages. With \(\hat\pi_g=(0.30,0.45,0.25)\) and \(\hat\pi_s=(0.50,0.50)\),

\[ \bar\gamma = \hat\pi_g'\gamma = 0.08, \qquad \bar\eta = \hat\pi_g'\eta = 0.07, \qquad \bar\delta = \hat\pi_s'\delta = 0.175. \]

So the true ABC-centered coefficients implied by the DGP are

\[ \alpha = 1.25 + 0.08 + 0.175 = 1.505, \qquad \beta = 1.10 + 0.07 = 1.17, \]

\[ g = (-0.83,\ 0.07,\ 0.87), \qquad s = (-0.175,\ 0.175), \qquad h = (0.38,\ -0.27,\ 0.03), \qquad d_{ab}=0 \;\; \text{for all cells}. \]

These are the targets to keep in mind before looking at the overcomplete design matrix and its constraints.

7.1 The concrete overparameterized design

For this simulation, the overparameterized coefficient vector is ordered as

\[ \theta = (\alpha,\ \beta,\ g_0,g_1,g_2,\ s_0,s_1,\ h_0,h_1,h_2,\ d_{00},d_{01},d_{10},d_{11},d_{20},d_{21})'. \]

Here \(\alpha\) is the intercept, \(\beta\) is the main slope on centered x0, \(g_a\) is the group=a main effect, \(s_b\) is the sex=b main effect, \(h_a\) is the x0:group=a slope deviation, and \(d_{ab}\) is the group=a:sex=b interaction cell coefficient.

The sample abundances are fixed by construction:

\[ \hat\pi_g = (0.30,0.45,0.25), \qquad \hat\pi_s = (0.50,0.50), \]

and, because sex alternates within each group,

\[ \hat\pi_{00}=0.150,\quad \hat\pi_{01}=0.150,\quad \hat\pi_{10}=0.225,\quad \hat\pi_{11}=0.225,\quad \hat\pi_{20}=0.125,\quad \hat\pi_{21}=0.125. \]

Thus the concrete constraint system \(A\theta=0\) is

\[ 0.30g_0 + 0.45g_1 + 0.25g_2 = 0, \]

\[ 0.50s_0 + 0.50s_1 = 0, \]

\[ 0.30h_0 + 0.45h_1 + 0.25h_2 = 0, \]

and, for the group:sex interaction,

\[ 0.150d_{00}+0.150d_{01}=0, \]

\[ 0.225d_{10}+0.225d_{11}=0, \]

\[ 0.125d_{20}+0.125d_{21}=0, \]

\[ 0.150d_{01}+0.225d_{11}+0.125d_{21}=0. \]

There are 16 overparameterized coefficients and 7 independent constraints, so the constrained coefficient space is 9-dimensional. One especially readable null-space parameterization chooses

\[ \phi = (\alpha,\beta,g_1,g_2,s_1,h_1,h_2,d_{11},d_{21})'. \]

Then \(\theta = Q_{\rm hand}\phi\) where the constrained coefficients are recovered as

\[ g_0 = -\frac{3}{2}g_1 - \frac{5}{6}g_2, \qquad s_0 = -s_1, \qquad h_0 = -\frac{3}{2}h_1 - \frac{5}{6}h_2, \]

and

\[ d_{01} = -\frac{3}{2}d_{11} - \frac{5}{6}d_{21}, \qquad d_{00} = \frac{3}{2}d_{11} + \frac{5}{6}d_{21}, \qquad d_{10} = -d_{11}, \qquad d_{20} = -d_{21}. \]

Equivalently, the columns of \(Q_{\rm hand}\) are: the intercept direction, the main-slope direction, two weighted group-effect contrast directions, one sex-effect contrast direction, two weighted group-slope contrast directions, and two weighted interaction contrast directions. The package uses a numerical orthonormal basis for the same span; the coordinate system for \(\phi\) can rotate, but the implied constrained \(\hat\theta\) and fitted values are invariant.

7.2 Raw NumPy constrained solve

Before using crabbymetrics, we can solve the overparameterized constrained least-squares problem directly in NumPy. First construct \(X_{\rm full}\), \(A\), and the hand-written null-space basis \(Q_{\rm hand}\).

Show code

def overparameterized_design(x_centered, group, sex):
    cols = [np.ones_like(x_centered), x_centered]
    cols += [(group == a).astype(float) for a in range(3)]
    cols += [(sex == b).astype(float) for b in range(2)]
    cols += [x_centered * (group == a) for a in range(3)]
    cols += [((group == a) & (sex == b)).astype(float) for a in range(3) for b in range(2)]
    return np.column_stack(cols)

full_names = [
    "Intercept", "x0",
    "g0", "g1", "g2",
    "s0", "s1",
    "x0:g0", "x0:g1", "x0:g2",
    "g0:s0", "g0:s1", "g1:s0", "g1:s1", "g2:s0", "g2:s1",
]

X_full = overparameterized_design(x_centered, group, sex)

A = np.zeros((7, 16))
A[0, [2, 3, 4]] = [0.30, 0.45, 0.25]
A[1, [5, 6]] = [0.50, 0.50]
A[2, [7, 8, 9]] = [0.30, 0.45, 0.25]
A[3, [10, 11]] = [0.150, 0.150]
A[4, [12, 13]] = [0.225, 0.225]
A[5, [14, 15]] = [0.125, 0.125]
A[6, [11, 13, 15]] = [0.150, 0.225, 0.125]

Q_hand = np.zeros((16, 9))
# Free coordinates: alpha, beta, g1, g2, s1, h1, h2, d11, d21.
Q_hand[0, 0] = 1.0
Q_hand[1, 1] = 1.0
Q_hand[2, 2] = -1.5
Q_hand[3, 2] = 1.0
Q_hand[2, 3] = -5.0 / 6.0
Q_hand[4, 3] = 1.0
Q_hand[5, 4] = -1.0
Q_hand[6, 4] = 1.0
Q_hand[7, 5] = -1.5
Q_hand[8, 5] = 1.0
Q_hand[7, 6] = -5.0 / 6.0
Q_hand[9, 6] = 1.0
Q_hand[10, 7] = 1.5
Q_hand[11, 7] = -1.5
Q_hand[12, 7] = -1.0
Q_hand[13, 7] = 1.0
Q_hand[10, 8] = 5.0 / 6.0
Q_hand[11, 8] = -5.0 / 6.0
Q_hand[14, 8] = -1.0
Q_hand[15, 8] = 1.0

print("rank(A):", np.linalg.matrix_rank(A))
print("Q_hand shape:", Q_hand.shape)
print("max |A Q_hand|:", np.max(np.abs(A @ Q_hand)))

rank(A): 7
Q_hand shape: (16, 9)
max |A Q_hand|: 5.551115123125783e-17

Now solve the reduced problem \(\min_\phi \|y - X_{\rm full}Q_{\rm hand}\phi\|^2\) and map back to the overparameterized coefficients.

Show code

Z_hand = X_full @ Q_hand
phi_np, *_ = np.linalg.lstsq(Z_hand, y, rcond=None)
theta_np = Q_hand @ phi_np
yhat_np = X_full @ theta_np

print("rank(X_full Q_hand):", np.linalg.matrix_rank(Z_hand))
print("max |A theta_np|:", np.max(np.abs(A @ theta_np)))
for name, value in zip(full_names, theta_np):
    print(f"{name:10s} {value: .4f}")

rank(X_full Q_hand): 9
max |A theta_np|: 1.3877787807814457e-17
Intercept   1.5120
x0          1.1752
g0         -0.8346
g1          0.0782
g2          0.8608
s0         -0.1750
s1          0.1750
x0:g0       0.3830
x0:g1      -0.2675
x0:g2       0.0219
g0:s0       0.0055
g0:s1      -0.0055
g1:s0      -0.0021
g1:s1       0.0021
g2:s0      -0.0027
g2:s1       0.0027

This is the ABC solution in raw NumPy. Notice that \(Q_{\rm hand}\) is not orthonormal. It does not need to be: any full-rank basis for \(\operatorname{null}(A)\) gives the same constrained fitted values.

7.3 The `crabbymetrics` solution

Now fit ABCOLS with the same continuous-by-categorical and categorical-by-categorical interactions:

Show code

abc = cm.ABCOLS()
abc.fit(
    y,
    x,
    cats,
    cont_cat_interactions=[(0, 0)],
    cat_cat_interactions=[(0, 1)],
)
abc_summary = abc.summary()
abc_coef = dict(zip(abc_summary["column_names"], abc_summary["coef"]))

name_map = {
    "Intercept": "Intercept",
    "x0": "x0",
    "g0": "c0[0]", "g1": "c0[1]", "g2": "c0[2]",
    "s0": "c1[0]", "s1": "c1[1]",
    "x0:g0": "x0:c0[0]", "x0:g1": "x0:c0[1]", "x0:g2": "x0:c0[2]",
    "g0:s0": "c0[0]:c1[0]", "g0:s1": "c0[0]:c1[1]",
    "g1:s0": "c0[1]:c1[0]", "g1:s1": "c0[1]:c1[1]",
    "g2:s0": "c0[2]:c1[0]", "g2:s1": "c0[2]:c1[1]",
}

theta_crabby = np.array([abc_coef[name_map[name]] for name in full_names])
print("max |theta_np - theta_crabby|:", np.max(np.abs(theta_np - theta_crabby)))
print("max |yhat_np - yhat_crabby|:", np.max(np.abs(yhat_np - np.asarray(abc.fitted_values()))))
print("max constraint violation:", abc_summary["max_constraint_violation"])

max |theta_np - theta_crabby|: 9.2148511043888e-15
max |yhat_np - yhat_crabby|: 1.2156942119645464e-14
max constraint violation: 6.245004513516506e-17

The package is doing the same constrained least-squares operation. The only implementation difference is that ABCOLS computes a numerical orthonormal null-space basis internally rather than using the hand-written \(Q_{\rm hand}\) above.

Now fit the same linear span using ordinary base-category coding. This drops group=0, sex=0, and the corresponding interaction baseline columns.

Show code

def base_category_design(x_centered, group, sex):
    cols = [np.ones_like(x_centered), x_centered]
    cols += [(group == 1).astype(float), (group == 2).astype(float)]
    cols += [(sex == 1).astype(float)]
    cols += [x_centered * (group == 1), x_centered * (group == 2)]
    cols += [((group == 1) & (sex == 1)).astype(float), ((group == 2) & (sex == 1)).astype(float)]
    return np.column_stack(cols)

X_base = base_category_design(x_centered, group, sex)
beta_base, *_ = np.linalg.lstsq(X_base, y, rcond=None)
yhat_base = X_base @ beta_base

base_names = [
    "Intercept",
    "x0",
    "group[1]",
    "group[2]",
    "sex[1]",
    "x0:group[1]",
    "x0:group[2]",
    "group[1]:sex[1]",
    "group[2]:sex[1]",
]
theta = dict(zip(full_names, theta_crabby))
abc_as_base = np.array([
    theta["Intercept"] + theta["g0"] + theta["s0"] + theta["g0:s0"],
    theta["x0"] + theta["x0:g0"],
    theta["g1"] - theta["g0"] + theta["g1:s0"] - theta["g0:s0"],
    theta["g2"] - theta["g0"] + theta["g2:s0"] - theta["g0:s0"],
    theta["s1"] - theta["s0"] + theta["g0:s1"] - theta["g0:s0"],
    theta["x0:g1"] - theta["x0:g0"],
    theta["x0:g2"] - theta["x0:g0"],
    theta["g1:s1"] - theta["g1:s0"] - theta["g0:s1"] + theta["g0:s0"],
    theta["g2:s1"] - theta["g2:s0"] - theta["g0:s1"] + theta["g0:s0"],
])

print("max fitted-value difference:", np.max(np.abs(np.asarray(abc.fitted_values()) - yhat_base)))

from IPython.display import Markdown, display

def markdown_table(headers, rows):
    out = ["| " + " | ".join(headers) + " |", "| " + " | ".join(["---"] * len(headers)) + " |"]
    out += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(out)

abc_rows = [
    [name, f"{value:.4f}"]
    for name, value in zip(full_names, theta_crabby)
]
display(Markdown(markdown_table(["ABC overcomplete coefficient", "ABCOLS estimate"], abc_rows)))

comparison_rows = [
    [name, f"{base:.4f}", f"{abc_equiv:.4f}", f"{base - abc_equiv:.2e}"]
    for name, base, abc_equiv in zip(base_names, beta_base, abc_as_base)
]
display(Markdown(markdown_table(
    ["Vanilla one-hot / reference-coded coefficient", "Vanilla OLS", "Same coefficient implied by ABC", "Difference"],
    comparison_rows,
)))

max fitted-value difference: 3.9968028886505635e-15

ABC overcomplete coefficient	ABCOLS estimate
Intercept	1.5120
x0	1.1752
g0	-0.8346
g1	0.0782
g2	0.8608
s0	-0.1750
s1	0.1750
x0:g0	0.3830
x0:g1	-0.2675
x0:g2	0.0219
g0:s0	0.0055
g0:s1	-0.0055
g1:s0	-0.0021
g1:s1	0.0021
g2:s0	-0.0027
g2:s1	0.0027

Vanilla one-hot / reference-coded coefficient	Vanilla OLS	Same coefficient implied by ABC	Difference
Intercept	0.5078	0.5078	-1.11e-15
x0	1.5582	1.5582	-4.44e-16
group[1]	0.9052	0.9052	2.22e-16
group[2]	1.6873	1.6873	8.88e-16
sex[1]	0.3391	0.3391	-1.22e-15
x0:group[1]	-0.6505	-0.6505	8.88e-16
x0:group[2]	-0.3611	-0.3611	-1.11e-16
group[1]:sex[1]	0.0152	0.0152	-8.69e-16
group[2]:sex[1]	0.0163	0.0163	-1.22e-15

The first table is the ABCOLS coefficient table in the overcomplete ABC coordinates. The second table translates those ABC coefficients back into the ordinary one-hot/reference-coded coordinates and compares them to a vanilla OLS fit. The fitted values agree because the two designs span the same column space. The coefficients differ because the parameterizations answer different questions, but the translated coefficients agree up to numerical precision.

8 Interpreting the ABC slope

Under ABC coding, x0 is the abundance-weighted average of group-specific slopes. We can verify this directly.

Show code

coef = dict(zip(abc_summary["column_names"], abc_summary["coef"]))
weights = np.bincount(group) / len(group)
group_slopes = np.array([coef["x0"] + coef[f"x0:c0[{level}]"] for level in range(3)])

print("weights:", weights.round(3))
print("group slopes:", group_slopes.round(4))
print("weighted average slope:", float(weights @ group_slopes))
print("ABC main slope:", coef["x0"])

weights: [0.3  0.45 0.25]
group slopes: [1.5582 0.9077 1.1971]
weighted average slope: 1.1751733997826892
ABC main slope: 1.175173399782689

In the base-category fit, the coefficient on x0 is the slope for the omitted group. In the ABC fit, the coefficient on x0 is the empirical-abundance-weighted average slope.

9 Interpreting the ABC categorical effects

The same logic applies to categorical main effects. The ABC group effects have sample-weighted mean zero:

Show code

group_effects = np.array([coef[f"c0[{level}]"] for level in range(3)])
print("group effects:", group_effects.round(4))
print("weighted mean group effect:", float(weights @ group_effects))

group effects: [-0.8346  0.0782  0.8608]
weighted mean group effect: -2.7755575615628914e-17

By contrast, base-category coefficients are deviations from group=0. They change if a different base group is selected; the ABC coefficients do not have that arbitrary reference-level dependence.

10 Current implementation contract

cm.ABCOLS() is intentionally low-level and NumPy-first:

model = cm.ABCOLS()
model.fit(
    y,                         # 1D float array
    x,                         # 2D float array of continuous covariates
    categories,                # 2D uint32 array of zero-based categorical codes
    cont_cat_interactions=[],  # list of (continuous_col, categorical_col)
    cat_cat_interactions=[],   # list of (categorical_col_a, categorical_col_b)
    center_continuous=True,
)

Important restrictions in the first implementation:

categorical levels must be contiguous observed integer codes starting at zero;
empty categorical levels raise an error;
requested categorical-categorical interactions must have all cells observed;
inference is homoskedastic OLS in the constrained parameterization;
robust covariance, formula parsing, pandas categorical handling, and sparse empty-cell handling are natural follow-ups.

11 Takeaway

ABC OLS is not a new fitted-value model for unpenalized least squares. It is a disciplined coefficient coordinate system. The constraints make lower-order coefficients stable and interpretable as sample-abundance-weighted averages when categorical modifiers and interactions enter the model.

1 Motivation

2 Setup

3 Main-effect constraints

4 Continuous-by-categorical interactions

5 Categorical-by-categorical interactions

6 Null-space implementation

7 Example: ABC versus base-category coding

7.1 The concrete overparameterized design

7.2 Raw NumPy constrained solve

7.3 The crabbymetrics solution

8 Interpreting the ABC slope

9 Interpreting the ABC categorical effects

10 Current implementation contract

11 Takeaway

7.3 The `crabbymetrics` solution