Methodology II (DR Inference With and Without Positivity)

Identification and Estimation Under Positivity

Let \(Y(t)\) be the potential outcome that would have been observed under treatment level \(T=t\). Consider a random sample \(\{(Y_i,T_i,\textbf{S}_i)\}_{i=1}^n \subset \mathbb{R}\times \mathbb{R} \times \mathbb{R}^d\). We assume the following basic identification conditions.

  • A1(a) (Consistency) \(T_i=t\) implies that \(Y_i=Y_i(t)\).

  • A1(b) (Unconfoundedness) \(Y_i(t)\) is conditionally independent of \(T_i\) given \(\textbf{S}_i\) for all \(t\).

  • A1(c) (Treatment Variation) The conditional variance of \(T\) given any \(\textbf{S}=\textbf{s}\) is strictly positive, i.e., \(\text{Var}(T|\textbf{S}=\textbf{s})>0\).

  • A2 (Positivity) The conditional density \(p(t|\textbf{s})\) is bounded above and away from zero almost surely for all \(t\) and \(\textbf{s}\).

Given the observed data \(\left\{(Y_i,T_i,\textbf{S}_i)\right\}_{i=1}^n\), there are three main strategies for estimating the dose-response curve \(t\mapsto m(t)=\mathbb{E}\left[Y(t)\right]\) and its derivative \(t\mapsto \theta(t)=\frac{d}{dt}\mathbb{E}\left[Y(t)\right]\):

  • Regression Adjustment (RA) Estimators:

\[\hat{m}_{RA}(t) = \frac{1}{n}\sum_{i=1}^n \hat{\mu}(t,\textbf{S}_i),\]

where \(\hat{\mu}(t,\textbf{s})\) is any consistent estimator of the conditional mean outcome function \(\mu(t,\textbf{s})=\mathbb{E}(Y|T=t,\textbf{S}=\textbf{s})\). Similarly,

\[\hat{\theta}_{RA}(t) = \frac{1}{n}\sum_{i=1}^n \hat{\beta}(t,\textbf{S}_i),\]

where \(\hat{\beta}(t,\textbf{s})\) is any consistent estimator of \(\beta(t,\textbf{s})=\frac{\partial}{\partial t}\mu(t,\textbf{s})\).
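A minimal Python sketch of the two RA estimators follows. It assumes NumPy, a hypothetical toy design (\(S\sim U(0,1)\), \(T=S+U(-1,1)\), \(Y=T^2+S+\text{noise}\)), and a least-squares outcome model whose quadratic-in-\(t\) form happens to be correctly specified for that design; any consistent estimator of \(\mu\) could be swapped in. All function names are illustrative.

```python
import numpy as np


def simulate(n, rng):
    """Hypothetical toy design: S ~ U(0,1), T = S + U(-1,1), Y = T^2 + S + noise."""
    S = rng.uniform(0.0, 1.0, n)
    T = S + rng.uniform(-1.0, 1.0, n)
    Y = T**2 + S + rng.normal(0.0, 0.5, n)
    return Y, T, S


def fit_outcome_model(Y, T, S):
    """Least-squares fit of mu(t, s) = b0 + b1*t + b2*t^2 + b3*s.

    The parametric form is an assumption (correct for the toy design only).
    """
    X = np.column_stack([np.ones_like(T), T, T**2, S])
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)

    def mu_hat(t, s):
        return b[0] + b[1] * t + b[2] * t**2 + b[3] * s

    def beta_hat(t, s):
        # Partial derivative of the fitted mu with respect to t, broadcast over s.
        return np.full_like(s, b[1] + 2.0 * b[2] * t, dtype=float)

    return mu_hat, beta_hat


def m_ra(t, S, mu_hat):
    """RA estimate of m(t): average the fitted surface over the sample of S."""
    return float(np.mean(mu_hat(t, S)))


def theta_ra(t, S, beta_hat):
    """RA estimate of theta(t): average the fitted derivative over the sample of S."""
    return float(np.mean(beta_hat(t, S)))


rng = np.random.default_rng(0)
Y, T, S = simulate(5_000, rng)
mu_hat, beta_hat = fit_outcome_model(Y, T, S)
print(m_ra(0.5, S, mu_hat))        # target m(0.5) = 0.25 + E[S] = 0.75
print(theta_ra(0.5, S, beta_hat))  # target theta(0.5) = 2 * 0.5 = 1.0
```

Because \(\hat{\beta}\) is the analytic derivative of the fitted \(\hat{\mu}\), the two RA estimators stay mutually consistent; with a nonparametric \(\hat{\mu}\) one would differentiate that fit instead.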

  • Inverse Probability Weighting (IPW) Estimator:

\[\hat{m}_{\mathrm{IPW}}(t) = \frac{1}{nh}\sum_{i=1}^n \frac{K\left(\frac{T_i-t}{h}\right)}{\hat{p}_{T|\textbf{S}}(T_i|\textbf{S}_i)}\cdot Y_i,\]

where \(h>0\) is a smoothing bandwidth, \(K:\mathbb{R}\to [0,\infty)\) is a kernel function, and \(\hat{p}_{T|\textbf{S}}(t|\textbf{s})\) is a (consistent) estimator of the conditional density \(p_{T|\textbf{S}}(t|\textbf{s})\). Additionally,

\[\hat{\theta}_{\mathrm{IPW}}(t) = \frac{1}{nh^2}\sum_{i=1}^n \frac{Y_i\left(\frac{T_i-t}{h}\right)K\left(\frac{T_i-t}{h}\right)}{\kappa_2\cdot \hat{p}_{T|\textbf{S}}(T_i|\textbf{S}_i)},\]

where \(\kappa_2=\int u^2K(u)\,du>0\).
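The two IPW formulas can be sketched in Python as follows. This assumes NumPy, the Epanechnikov kernel (for which \(\kappa_2=1/5\)), and a hypothetical toy design in which \(p(t|s)=1/2\) on \([s-1,s+1]\); the true density stands in for \(\hat{p}_{T|\textbf{S}}\), and the function names are illustrative.

```python
import numpy as np

KAPPA2 = 0.2  # kappa_2 = int u^2 K(u) du for the Epanechnikov kernel


def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on |u| <= 1, zero elsewhere."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def m_ipw(t, Y, T, p_hat, h):
    """IPW estimate of m(t); p_hat[i] holds the estimated p(T_i | S_i)."""
    K = epanechnikov((T - t) / h)
    return float(np.sum(K * Y / p_hat) / (len(Y) * h))


def theta_ipw(t, Y, T, p_hat, h):
    """IPW estimate of theta(t) = dm/dt."""
    u = (T - t) / h
    return float(np.sum(Y * u * epanechnikov(u) / (KAPPA2 * p_hat)) / (len(Y) * h**2))


# Hypothetical toy design in which p(t | s) = 1/2 on [s - 1, s + 1].
rng = np.random.default_rng(1)
n = 50_000
S = rng.uniform(0.0, 1.0, n)
T = S + rng.uniform(-1.0, 1.0, n)
Y = T**2 + S + rng.normal(0.0, 0.5, n)
p_oracle = np.full(n, 0.5)  # true density, standing in for an estimate

print(m_ipw(0.5, Y, T, p_oracle, h=0.3))      # target m(0.5) = 0.75
print(theta_ipw(0.5, Y, T, p_oracle, h=0.3))  # target theta(0.5) = 1.0
```

Swapping in a Gaussian kernel only changes \(K\) and sets \(\kappa_2=1\); in practice the oracle density would be replaced by a fitted conditional density estimate.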

  • Doubly Robust (DR) Estimator:

\[\hat{m}_{\mathrm{DR}}(t) =\frac{1}{nh}\sum_{i=1}^n \left\{\frac{K\left(\frac{T_i-t}{h}\right)}{\hat{p}_{T|\textbf{S}}(T_i|\textbf{S}_i)}\cdot \left[Y_i - \hat \mu(t,\textbf{S}_i)\right]+ h\cdot \hat{\mu}(t,\textbf{S}_i) \right\},\]

where \(\hat{\mu}(t,\textbf{s})\) and \(\hat{p}_{T|\textbf{S}}(t|\textbf{s})\) are (consistent) estimators of \(\mu(t,\textbf{s})\) and \(p_{T|\textbf{S}}(t|\textbf{s})\), respectively. The doubly robust estimator of \(\theta(t)\) requires one additional ingredient: on the outcome side, we need to specify and estimate both the conditional mean outcome function \(\mu(t,\textbf{s})\) and its partial derivative \(\beta(t,\textbf{s})\) with respect to \(t\), which leads to the following doubly robust estimator

\[\hat{\theta}_{\mathrm{DR}}(t) = \frac{1}{nh}\sum_{i=1}^n \left\{ \frac{\left(\frac{T_i-t}{h}\right)K\left(\frac{T_i-t}{h}\right) }{h\cdot \kappa_2\cdot \hat{p}_{T|\textbf{S}}(T_i|\textbf{S}_i)} \left[Y_i - \hat{\mu}(t,\textbf{S}_i) - (T_i-t)\cdot \hat{\beta}(t,\textbf{S}_i)\right]+ h\cdot \hat{\beta}(t,\textbf{S}_i) \right\}.\]
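The two DR formulas can be sketched in Python as below, again assuming NumPy, the Epanechnikov kernel, and the same hypothetical toy design. The demo illustrates double robustness: a correct outcome model paired with a wrong density (\(\hat{p}\equiv 1\) instead of \(1/2\)), and a wrong outcome model (\(\hat{\mu}=\hat{\beta}=0\)) paired with the correct density, should both land near \(\theta(0.5)=1\). Oracle functions stand in for fitted models, and all names are illustrative.

```python
import numpy as np

KAPPA2 = 0.2  # int u^2 K(u) du for the Epanechnikov kernel


def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def m_dr(t, Y, T, S, p_hat, mu_hat, h):
    """DR estimate of m(t): IPW-weighted residuals plus the RA term."""
    K = epanechnikov((T - t) / h)
    mu = mu_hat(t, S)
    return float(np.mean(K / p_hat * (Y - mu) + h * mu) / h)


def theta_dr(t, Y, T, S, p_hat, mu_hat, beta_hat, h):
    """DR estimate of theta(t) using residuals from a local linear outcome fit."""
    u = (T - t) / h
    mu, beta = mu_hat(t, S), beta_hat(t, S)
    resid = Y - mu - (T - t) * beta
    terms = u * epanechnikov(u) / (h * KAPPA2 * p_hat) * resid + h * beta
    return float(np.mean(terms) / h)


def mu_true(t, s):
    """Oracle mu(t, s) for the hypothetical toy design."""
    return t**2 + s


def beta_true(t, s):
    """Oracle beta(t, s) = d mu / dt."""
    return np.full_like(s, 2.0 * t)


def zero_model(t, s):
    """Deliberately misspecified outcome model."""
    return np.zeros_like(s)


rng = np.random.default_rng(2)
n, h = 50_000, 0.3
S = rng.uniform(0.0, 1.0, n)
T = S + rng.uniform(-1.0, 1.0, n)  # true p(t | s) = 1/2 on [s - 1, s + 1]
Y = T**2 + S + rng.normal(0.0, 0.5, n)
p_wrong, p_right = np.ones(n), np.full(n, 0.5)

print(theta_dr(0.5, Y, T, S, p_wrong, mu_true, beta_true, h))      # good outcome model
print(theta_dr(0.5, Y, T, S, p_right, zero_model, zero_model, h))  # good density
print(m_dr(0.5, Y, T, S, p_wrong, mu_true, h))                     # target m(0.5) = 0.75
```

With \(\hat{\mu}=\hat{\beta}=0\), \(\hat{\theta}_{\mathrm{DR}}\) reduces exactly to \(\hat{\theta}_{\mathrm{IPW}}\), which is one quick way to sanity-check an implementation.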

Furthermore, \(\hat{\theta}_{\mathrm{DR}}(t)\) is asymptotically normal:

\[\sqrt{nh^3}\left[\hat{\theta}_{\mathrm{DR}}(t) - \theta(t) - h^2 B_{\theta}(t)\right] \stackrel{d}{\to} \mathcal{N}\left(0,V_{\theta}(t)\right),\]

where \(B_{\theta}(t)\) is a bias term. With a bandwidth of the standard order \(h=O\left(n^{-1/5}\right)\), we have \(\sqrt{nh^3}\cdot h^2=\sqrt{nh^7}=O\left(n^{-1/5}\right)\to 0\), so the bias term is asymptotically negligible and we can construct a \((1-\alpha)\)-level confidence interval for \(\theta(t)\) as:

\[\left[\hat{\theta}_{\mathrm{DR}}(t)- \Phi^{-1}\left(1-\frac{\alpha}{2}\right)\sqrt{\frac{\hat{V}_{\theta}(t)}{nh^3}},\; \hat{\theta}_{\mathrm{DR}}(t)+ \Phi^{-1}\left(1-\frac{\alpha}{2}\right)\sqrt{\frac{\hat{V}_{\theta}(t)}{nh^3}}\right],\]

where \(\Phi(\cdot)\) is the cumulative distribution function of \(\mathcal{N}(0,1)\) and \(\hat{V}_{\theta}(t)\) is computed as:

\[\hat{V}_{\theta}(t) = \frac{1}{n} \sum_{i=1}^n \left\{\phi_{h,t}\left(Y_i,T_i,\textbf{S}_i;\hat{\mu}, \hat{\beta}, \hat{p}_{T|\textbf{S}}\right) + \sqrt{h^3}\left[\hat{\beta}(t,\textbf{S}_i) - \hat{\theta}_{\mathrm{DR}}(t) \right]\right\}^2\]

with \(\phi_{h,t}\left(Y,T,\textbf{S}; \bar{\mu},\bar{\beta}, \bar{p}_{T|\textbf{S}}\right) = \frac{\left(\frac{T-t}{h}\right) K\left(\frac{T-t}{h}\right)}{\sqrt{h}\cdot \kappa_2\cdot \bar{p}_{T|\textbf{S}}(T|\textbf{S})}\cdot \left[Y - \bar{\mu}(t,\textbf{S}) - (T-t)\cdot \bar{\beta}(t,\textbf{S})\right]\).
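The variance estimator and the confidence interval can be sketched in Python as follows, assuming NumPy, the Epanechnikov kernel, the hypothetical toy design used above, and oracle nuisance functions in place of fitted ones. The standard normal quantile \(\Phi^{-1}(1-\alpha/2)\) comes from the standard library's `statistics.NormalDist`; the function name `theta_dr_ci` is illustrative.

```python
import numpy as np
from statistics import NormalDist

KAPPA2 = 0.2  # int u^2 K(u) du for the Epanechnikov kernel


def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def theta_dr_ci(t, Y, T, S, p_hat, mu_hat, beta_hat, h, alpha=0.05):
    """Return (theta_DR, V_hat, confidence interval) for theta(t)."""
    n = len(Y)
    u = (T - t) / h
    beta = beta_hat(t, S)
    resid = Y - mu_hat(t, S) - (T - t) * beta
    # phi_{h,t}(Y_i, T_i, S_i; mu_hat, beta_hat, p_hat) from the variance formula.
    phi = u * epanechnikov(u) * resid / (np.sqrt(h) * KAPPA2 * p_hat)
    # theta_DR = mean(phi) / h^{3/2} + mean(beta_hat(t, S_i)).
    theta = np.mean(phi) / h**1.5 + np.mean(beta)
    v_hat = np.mean((phi + h**1.5 * (beta - theta)) ** 2)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # Phi^{-1}(1 - alpha/2)
    half = z * np.sqrt(v_hat / (n * h**3))
    return float(theta), float(v_hat), (float(theta - half), float(theta + half))


# Hypothetical toy design with p(t | s) = 1/2 on [s - 1, s + 1]; oracle nuisances.
rng = np.random.default_rng(3)
n = 50_000
S = rng.uniform(0.0, 1.0, n)
T = S + rng.uniform(-1.0, 1.0, n)
Y = T**2 + S + rng.normal(0.0, 0.5, n)
theta, v_hat, (lo, hi) = theta_dr_ci(
    0.5, Y, T, S, np.full(n, 0.5),
    lambda t, s: t**2 + s, lambda t, s: np.full_like(s, 2.0 * t), h=0.3,
)
print(theta, v_hat, (lo, hi))  # target theta(0.5) = 1.0
```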

Identification and Estimation Without Positivity

To study the IPW and DR estimators without relying on the positivity condition, we impose an additive structural assumption on the potential outcome: \(Y(t) = \bar{m}(t) + \eta(\textbf{S}) +\epsilon\). The identification theory in Section 2 of [1] implies that both the dose-response curve \(m(t)\) and its derivative \(\theta(t)\) remain identifiable under violations of positivity. However, the IPW and DR estimators above are biased without positivity, even when the additive structural assumption holds; see Section 4.2 in [2].
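To see why a reweighted average of \(\beta\) can recover \(\theta\) without positivity, here is a short derivation under the additive model. It is a sketch, not necessarily the argument used in [1], and it assumes that \(\epsilon\) does not depend on \(t\) and that \((t,\textbf{s})\) is an interior point of the support of \((T,\textbf{S})\), so that \(\mu\) is identified and differentiable there:

```latex
\begin{aligned}
\mu(t,\textbf{s}) &\overset{\text{A1(a)}}{=} \mathbb{E}\left[Y(t)\mid T=t,\textbf{S}=\textbf{s}\right]
  \overset{\text{A1(b)}}{=} \mathbb{E}\left[Y(t)\mid \textbf{S}=\textbf{s}\right]
  = \bar{m}(t) + \eta(\textbf{s}) + \mathbb{E}\left(\epsilon\mid \textbf{S}=\textbf{s}\right),\\
\beta(t,\textbf{s}) &= \frac{\partial}{\partial t}\mu(t,\textbf{s}) = \bar{m}'(t)
  = \frac{d}{dt}\mathbb{E}\left[Y(t)\right] = \theta(t),\\
\theta(t) &= \int \beta(t,\textbf{s})\, q(\textbf{s})\, d\textbf{s}
  \quad \text{for any density } q \text{ supported on such interior points } \textbf{s}.
\end{aligned}
```

The bias-corrected estimators below take \(q\) to be the \(\zeta\)-interior conditional density \(p_{\zeta}(\cdot|t)\).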

We propose the following bias-corrected IPW and DR estimators of \(\theta(t)\):

\[\hat{\theta}_{\mathrm{C,IPW}}(t) = \frac{1}{nh^2} \sum_{i=1}^n \frac{Y_i\left(\frac{T_i-t}{h}\right) K\left(\frac{T_i-t}{h}\right) \hat{p}_{\zeta}(\textbf{S}_i|t)}{\kappa_2 \cdot \hat{p}(T_i,\textbf{S}_i)},\]
\[\hat{\theta}_{\mathrm{C,DR}}(t) = \frac{1}{nh^2} \sum_{i=1}^n \frac{\left(\frac{T_i-t}{h}\right) K\left(\frac{T_i-t}{h}\right) \hat{p}_{\zeta}(\textbf{S}_i|t)}{\kappa_2\cdot \hat{p}(T_i,\textbf{S}_i)} \left[Y_i - \hat{\mu}(t,\textbf{S}_i) - (T_i-t)\cdot \hat{\beta}(t,\textbf{S}_i)\right] + \int \hat{\beta}(t,\textbf{s})\cdot \hat{p}_{\zeta}(\textbf{s}|t)\, d\textbf{s},\]

where \(\hat{p}(t,\textbf{s})\) is a consistent estimator of the joint density \(p(t,\textbf{s})\) and \(\hat{p}_{\zeta}(\textbf{s}|t)\) is an estimated \(\zeta\)-interior conditional density defined as:

\[p_{\zeta}(\textbf{s}|t) = \frac{p_{\textbf{S}|T}(\textbf{s}|t) \cdot \mathbb{1}_{\left\{\textbf{s}\in \mathcal{L}_{\zeta}(t)\right\}}}{\int_{\mathcal{L}_{\zeta}(t)} p_{\textbf{S}|T}(\textbf{s}_1|t) \,d\textbf{s}_1},\]

with \(\mathcal{L}_{\zeta}(t) = \left\{\textbf{s}\in \mathcal{S}(t): p_{\textbf{S}|T}(\textbf{s}|t) \geq \zeta\right\}\) being the \(\zeta\)-upper level set of the conditional density \(p_{\textbf{S}|T}(\textbf{s}|t)\), where \(\mathcal{S}(t)\) denotes the support of \(p_{\textbf{S}|T}(\cdot|t)\).
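For a one-dimensional \(\textbf{S}\), the \(\zeta\)-interior density can be sketched on a grid: keep the grid points in \(\mathcal{L}_{\zeta}(t)\), renormalize with a Riemann sum, and reuse the same grid for the integral \(\int \hat{\beta}(t,\textbf{s})\,\hat{p}_{\zeta}(\textbf{s}|t)\,d\textbf{s}\). This assumes NumPy, an evenly spaced grid, and that values of \(\hat{p}_{\textbf{S}|T}(\cdot|t)\) are already available (for example from a kernel density estimator, not shown). The triangular density and the function names are hypothetical.

```python
import numpy as np


def zeta_interior_density(p_cond, ds, zeta):
    """Grid version of p_zeta(s | t).

    p_cond: values of the conditional density p_{S|T}(s | t) on an even grid
    ds:     grid spacing used in the Riemann sums
    zeta:   level defining L_zeta(t) = {s : p_{S|T}(s | t) >= zeta}
    """
    in_level_set = p_cond >= zeta
    mass = np.sum(p_cond[in_level_set]) * ds  # approximates the normalizing integral
    if mass <= 0.0:
        raise ValueError("zeta exceeds the maximum of the conditional density")
    return np.where(in_level_set, p_cond, 0.0) / mass


def integrate_beta(beta_vals, p_zeta, ds):
    """Riemann-sum approximation of int beta_hat(t, s) p_zeta(s | t) ds."""
    return float(np.sum(beta_vals * p_zeta) * ds)


# Hypothetical p_{S|T}(s | t): triangular density on [0, 1] with peak 2 at s = 0.5.
s_grid = np.linspace(0.0, 1.0, 1001)
ds = s_grid[1] - s_grid[0]
p_cond = np.maximum(0.0, 2.0 - 4.0 * np.abs(s_grid - 0.5))

# With zeta = 1 the level set is [0.25, 0.75], which carries probability 0.75.
p_zeta = zeta_interior_density(p_cond, ds, zeta=1.0)
print(np.sum(p_zeta) * ds)                               # ~1 after renormalization
print(integrate_beta(np.ones_like(s_grid), p_zeta, ds))  # constant beta gives ~1
```

In higher dimensions the same idea applies, but a grid quickly becomes expensive, so Monte Carlo integration over draws from the truncated density is a natural alternative.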

It can be proved that

\[\sqrt{nh^3}\left[\hat{\theta}_{\mathrm{C,DR}}(t) - \theta(t) - h^2 B_{C,\theta}(t)\right] \stackrel{d}{\to} \mathcal{N}\left(0,V_{C,\theta}(t)\right),\]

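where \(B_{C,\theta}(t)\) is a bias term. With a bandwidth of the standard order \(h=O\left(n^{-1/5}\right)\), the bias term is asymptotically negligible because \(\sqrt{nh^3}\cdot h^2=O\left(n^{-1/5}\right)\), so a \((1-\alpha)\)-level confidence interval for \(\theta(t)\) can be constructed as before, with \(\hat{\theta}_{\mathrm{C,DR}}(t)\) in place of \(\hat{\theta}_{\mathrm{DR}}(t)\) and the asymptotic variance estimated by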

\[\hat{V}_{C,\theta}(t) = \frac{1}{n} \sum_{i=1}^n \left\{\phi_{C,h,t}\left(Y_i,T_i,\textbf{S}_i;\hat{\mu}, \hat{\beta}, \hat{p}, \hat{p}_{\zeta}\right) + \sqrt{h^3}\left[\int \hat{\beta}(t,\textbf{s}) \cdot \hat{p}_{\zeta}(\textbf{s}|t)\, d\textbf{s} - \hat{\theta}_{\mathrm{C,DR}}(t) \right]\right\}^2,\]

where \(\phi_{C,h,t}\left(Y,T,\textbf{S}; \bar{\mu},\bar{\beta}, \bar{p},\bar{p}_{\zeta}\right) = \frac{\left(\frac{T-t}{h}\right) K\left(\frac{T-t}{h}\right) \cdot \bar{p}_{\zeta}(\textbf{S}|t)}{\sqrt{h}\cdot \kappa_2\cdot \bar{p}(T,\textbf{S})}\cdot \left[Y - \bar{\mu}(t,\textbf{S}) - (T-t)\cdot \bar{\beta}(t,\textbf{S})\right]\).
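The bias-corrected estimators and \(\hat{V}_{C,\theta}(t)\) can be sketched in Python as below. The sketch assumes NumPy, the Epanechnikov kernel, and that the nuisance quantities have already been evaluated: \(\hat{p}(T_i,\textbf{S}_i)\), \(\hat{p}_{\zeta}(\textbf{S}_i|t)\), \(\hat{\mu}(t,\textbf{S}_i)\), \(\hat{\beta}(t,\textbf{S}_i)\), and the integral \(\int\hat{\beta}(t,\textbf{s})\hat{p}_{\zeta}(\textbf{s}|t)\,d\textbf{s}\) (for instance by a grid sum). The demo uses a hypothetical design that violates positivity, \(T=S+U(-0.3,0.3)\), so \(p(t|s)=0\) whenever \(|t-s|>0.3\), and it plugs in oracle nuisance values; the function names are illustrative.

```python
import numpy as np
from statistics import NormalDist

KAPPA2 = 0.2  # int u^2 K(u) du for the Epanechnikov kernel


def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def theta_c_ipw(t, Y, T, p_joint, p_zeta_S, h):
    """Bias-corrected IPW; p_joint[i] = p_hat(T_i, S_i), p_zeta_S[i] = p_hat_zeta(S_i | t)."""
    u = (T - t) / h
    w = u * epanechnikov(u) * p_zeta_S / (KAPPA2 * p_joint)
    return float(np.sum(w * Y) / (len(Y) * h**2))


def theta_c_dr(t, Y, T, mu_S, beta_S, p_joint, p_zeta_S, beta_integral, h, alpha=0.05):
    """Bias-corrected DR with V_hat_C and a (1 - alpha) confidence interval.

    mu_S[i] = mu_hat(t, S_i), beta_S[i] = beta_hat(t, S_i),
    beta_integral = int beta_hat(t, s) p_hat_zeta(s | t) ds.
    """
    n = len(Y)
    u = (T - t) / h
    resid = Y - mu_S - (T - t) * beta_S
    # phi_{C,h,t}(Y_i, T_i, S_i; mu_hat, beta_hat, p_hat, p_hat_zeta)
    phi = u * epanechnikov(u) * p_zeta_S * resid / (np.sqrt(h) * KAPPA2 * p_joint)
    theta = np.mean(phi) / h**1.5 + beta_integral
    v_hat = np.mean((phi + h**1.5 * (beta_integral - theta)) ** 2)
    half = NormalDist().inv_cdf(1.0 - alpha / 2.0) * np.sqrt(v_hat / (n * h**3))
    return float(theta), float(v_hat), (float(theta - half), float(theta + half))


# Hypothetical design without positivity: T = S + U(-0.3, 0.3), Y = T^2 + S + noise.
rng = np.random.default_rng(4)
n, t, h = 20_000, 0.5, 0.3
S = rng.uniform(0.0, 1.0, n)
T = S + rng.uniform(-0.3, 0.3, n)
Y = T**2 + S + rng.normal(0.0, 0.5, n)
p_joint = np.full(n, 1.0 / 0.6)  # oracle p(T_i, S_i) on its support
# Oracle p(s | t = 0.5) is flat at 1/0.6 on [0.2, 0.8], so for any zeta <= 1/0.6
# the level set L_zeta(t) is that whole interval and p_zeta equals p(s | t).
p_zeta_S = np.where(np.abs(S - t) <= 0.3, 1.0 / 0.6, 0.0)
mu_S, beta_S = t**2 + S, np.full(n, 2.0 * t)  # oracle outcome model

theta, v_hat, (lo, hi) = theta_c_dr(t, Y, T, mu_S, beta_S, p_joint, p_zeta_S,
                                    beta_integral=2.0 * t, h=h)
print(theta, (lo, hi))  # theta(0.5) = 1 under the additive model
```

Setting \(\hat{\mu}=\hat{\beta}=0\) and the integral term to zero makes \(\hat{\theta}_{\mathrm{C,DR}}\) coincide with \(\hat{\theta}_{\mathrm{C,IPW}}\), mirroring the relation between the uncorrected estimators.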

The integral-based bias-corrected IPW and DR estimators of \(m(t)\) are described in Appendix I of [2].

References