Methodology

Model Setup

Consider a random sample \(\{(Y_i,T_i,\textbf{S}_i)\}_{i=1}^n \subset \mathbb{R}\times \mathbb{R} \times \mathbb{R}^d\) generated from the following model:

\[Y=\mu(T,\textbf{S})+\epsilon \quad \text{ and } \quad T=f(\textbf{S})+E,\]

where

\(Y\) is the outcome variable,
\(T\) is the continuous treatment variable,
\(\textbf{S}\) is a vector of covariates that incorporate confounding variables,
\(E\in\mathbb{R}\) is the treatment variation with \(\mathbb{E}(E)=0\) and \(E\) being independent of \(\textbf{S}\),
\(\epsilon\in\mathbb{R}\) is an exogenous noise variable with \(\mathbb{E}(\epsilon)=0, \mathrm{Var}(\epsilon)=\sigma^2>0,\mathrm{E}(\epsilon^4)<\infty\).

Let \(Y(t)\) be the potential outcome that would have been observed under treatment level \(T=t\). Under some identification conditions (see Section 2.2 in our paper [1]), the (causal) dose-response curve \(t\mapsto m(t)=\mathbb{E}\left[Y(t)\right]\) coincides with the covariate-adjusted regression function \(t\mapsto \mathbb{E}\left[\mu(t,\textbf{S})\right]\) and can thus be identified from the observed data \(\{(Y_i,T_i,\textbf{S}_i)\}_{i=1}^n\). In addition, we also consider estimating the derivative effect \(\theta(t)=m'(t)=\frac{d}{dt}\mathbb{E}\left[\mu(t,\textbf{S})\right]\).

Given the above identification formula, one traditional method for estimating the dose-response curve \(m(t)\) is through the following regression adjustment (RA) or G-computation estimator

\[\hat{m}_{RA}(t) = \frac{1}{n}\sum_{i=1}^n \hat{\mu}(t,\textbf{S}_i),\]

where \(\hat{\mu}(t,\textbf{s})\) is any consistent estimator of the conditional mean outcome function \(\mu(t,\textbf{s})\). However, when the positivity condition does not hold, the regression adjustment estimator will be unstable and even inconsistent. This is because without the positivity condition, the joint density \(p(t,\textbf{S}_i)=p(t|\textbf{S}_i)\cdot p_S(\textbf{S}_i)\) could be closer to 0 for some \(i=1,...,n\).

Key Insights and Proposed Estimators

To bypass the strong positivity condition, we consider imposing the following key assumption:

\[\mathbb{E}\left[\mu(T,\textbf{S})\right]=\mathbb{E}\left[m(T)\right] \quad \text{ and } \quad \theta(t)=\mathbb{E}\left[\frac{\partial}{\partial t} \mu(t,\textbf{S})\right]\]

= mathbb{E}left[frac{partial}{partial t} mu(t,textbf{S}) Big|T=tright].

It can be verified that the additive confounding model with \(\mu(T,\textbf{S})=m(T)+\eta(\textbf{S})\) satisfies the above two equalities.

Under the above assumption, we construct our estimator of \(m(t)\) from three critical insights:

Insight 1: \(\mu(t,\textbf{s})\) and \(\frac{\partial}{\partial t}\mu(t,\textbf{s})\) can be consistently estimated at each observation \((T_i,\textbf{S}_i)\) for \(i=1,...,n\).
Insight 2: \(\theta(t)\) can be consistently estimated through the localized form \(\theta_C(t)=\mathbb{E}\left[\frac{\partial}{\partial t} \mu(t,\textbf{S}) \big|T=t\right]\).
Insight 3: By the fundamental theorem of calculus, we know that

\[m(t) = m(T) + \int_{\tilde{t}=T}^{\tilde{t}=t} m'(\tilde{t})\, d\tilde{t} = m(T)+ \int_{\tilde{t}=T}^{\tilde{t}=t} \theta(\tilde{t})\, d\tilde{t}.\]

Under our key assumption, we can take the expectation on both sides of the above equality to obtain that

\[m(t) = \mathbb{E}\left[m(T) + \int_{\tilde{t}=T}^{\tilde{t}=t} \theta(\tilde{t})\, d\tilde{t}\right] =\mathbb{E}\left[\mu(T,\textbf{S})\right] + \mathbb{E}\left[\int_{\tilde{t}=T}^{\tilde{t}=t} \theta_C(\tilde{t})\, d\tilde{t}\right] = \mathbb{E}(Y) + \mathbb{E}\left\{\int_{\tilde{t}=T}^{\tilde{t}=t} \mathbb{E}\left[\frac{\partial}{\partial t}\mu(\tilde{t},\textbf{S})\Big|T=\tilde{t}\right] \, d\tilde{t}\right\}\]

Based on the above three insights, we thus propose an integral estimator of the dose-response curve \(m(t)\) as:

\[\hat{m}_\theta(t) = \frac{1}{n}\sum_{i=1}^n \left[Y_i + \int_{\tilde{t}=T_i}^{\tilde{t}=t} \hat{\theta}_C(\tilde{t})\, d\tilde{t} \right],\]

where \(\hat{\theta}_C(t)\) is a consistent estimator of \(\theta_C(t) = \mathbb{E}\left[\frac{\partial}{\partial t}\mu(t,\textbf{S})\big|T=t\right] = \int \frac{\partial}{\partial t} \mu(t,\textbf{s})\, d\mathrm{P}(\textbf{s}|t)\). The estimator \(\hat{\theta}_C(t)\) of the derivative effect \(\theta(t)\) includes two nuisance functions:

We fit the partial derivative \(\beta_2(t,\mathbf{s})=\frac{\partial}{\partial t} \mu(t,\mathbf{s})\) of the conditional mean outcome function by (partial) local polynomial regression. In particular, \(\hat{\beta}_2(t,\mathbf{s})\) is the second coordinate of the solution to the following weighted least square problem:

\[\left(\hat{\mathbf{\beta}}(t,\textbf{s}), \hat{\mathbf{\alpha}}(t,\textbf{s}) \right)^T = \arg\min_{(\mathbf{\beta},\mathbf{\alpha})^T \in \mathbb{R}^{q+1}\times \mathbb{R}^d} \sum_{i=1}^n \left[Y_i-\sum_{j=0}^q\beta_j (T_i-t)^q - \sum_{\ell=1}^d\alpha_{\ell}(S_{i,\ell}-s_{\ell})\right]^2 K_T\left(\frac{T_i-t}{h}\right)K_S\left(\frac{\mathbf{S}_i-\mathbf{s}}{\mathbf{b}}\right),\]

where \(K_T:\mathbb{R}\to [0,\infty), K_S:\mathbb{R}^d \to [0,\infty)\) are two kernel functions and \(h>0,\mathbf{b}\in \mathbb{R}_+^d\) be their corresponding smoothing bandwidth parameters.

We estimate the conditional cumulative distribution function (CDF) \(\mathrm{P}(\textbf{s}|t)\) via Nadaraya-Watson conditional CDF estimator.

\[\hat P(\mathbf{s}|t) = \frac{\sum_{i=1}^n \mathbb{1}_{\{\mathbf{S}_i\leq \mathbf{s}, |T_i-t|\leq \hslash\}} }{\sum_{j=1}^n \mathbb{1}_{\{|T_j-t|\leq \hslash\}}},\]

where \(\bar{K}_T:\mathbb{R}\to[0,\infty)\) is a kernel function and \(\hslash>0\) is the associated smoothing bandwidth parameter that needs not be the same as the bandwidth parameter \(h>0\).

This leads to our proposed localized derivative estimator of \(\theta(t)\) as:

\[\hat{\theta}_C(t)= \frac{\sum_{i=1}^n \hat{\beta}_2(t,\textbf{S}_i) \cdot \bar{K}_T\left(\frac{T_i-t}{\hslash}\right)}{\sum_{j=1}^n \bar{K}_T\left(\frac{T_j-t}{\hslash}\right)}.\]

Fast Computing Algorithm

Let \(T_{(1)}\leq \cdots\leq T_{(n)}\) be the order statistics of \(T_1,..., T_n\) and let \(\Delta_j = T_{(j+1)} - T_{(j)}\) for \(j=1,..., n-1\) be the consecutive difference.

We can approximate \(\hat{m}_{\theta}(T_{(j)})\) for \(j=1,...,n\) as:

\[\hat{m}_{\theta}(T_{(j)}) \approx \frac{1}{n}\sum_{i=1}^n Y_i + \frac{1}{n}\sum_{i=1}^{n-1} \Delta_i \left[ i \cdot \hat{\theta}_C(T_{(i)}) \mathbb{1}_{ \{ i < j \} } - (n-i)\cdot \hat{\theta}_C(T_{(i+1)}) \mathbb{1}_{\{ i\geq j \} } \right].\]

To evaluate \(\hat{m}_{\theta}(t)\) for any arbitrary \(t\), we conduct a linear interpolation between \(\hat{m}_{\theta}(T_{(j)})\) and \(\hat{m}_{\theta}(T_{(j+1)})\) on the interval \(t\in\left[T_{(j)}, T_{(j+1)}\right]\).

Bootstrap Inference

We consider conducting inference on the dose-response curve \(m(t)\) and its derivative effect \(\theta(t)=m'(t)\) via nonparametric bootstrap. Other bootstrap methods, including residual bootstrap and wild bootstrap, also work under some modified conditions.

Compute the integral estimator \(\hat{m}_{\theta}(t)\) and localized derivative estimator \(\hat{\theta}_C(t)\) on the original data \(\{(Y_i,T_i,\mathbf{S}_i)\}_{i=1}^n\).
Generate \(B\) bootstrap samples \(\left\{\left(Y_i^{*(b)},T_i^{*(b)},\mathbf{S}_i^{*(b)}\right)\right\}_{i=1}^n, b=1,...,B\) by sampling with replacement from the original data and compute the integral estimator \(\hat{m}_{\theta}^{*(b)}(t)\) and localized derivative estimator \(\hat{\theta}_C^{*(b)}(t)\) on each bootstrapped sample for \(b=1,...,B\).
Let \(\alpha \in (0,1)\) be a pre-specified significance level.
- For a pointwise inference at \(t_0\in \mathcal{T}\), we calculate the \(1-\alpha\) quantiles \(\zeta_{1-\alpha}^*(t_0)\) and \(\bar{\zeta}_{1-\alpha}^*(t_0)\) of \(\{D_1(t_0),...,D_B(t_0)\}\) and \(\{\bar{D}_1(t_0),...,\bar{D}_B(t_0)\}\) respectively, where \(D_b(t_0) = \left|\hat{m}_{\theta}^{*(b)}(t_0) - \hat{m}_{\theta}(t_0)\right|\) and \(\bar{D}_b(t_0) = \left|\hat{\theta}_C^{*(b)}(t_0) - \hat{\theta}_C(t_0)\right|\) for \(b=1,...,B\).
- For an uniform inference on the entire dose-response curve \(m(t)\) and its derivative \(\theta(t)\), we compute the \(1-\alpha\) quantiles \(\xi_{1-\alpha}^*\) and \(\bar{\xi}_{1-\alpha}^*\) of \(\{D_{\sup,1},...,D_{\sup,B}\}\) and \(\{\bar{D}_{\sup,1},...,\bar{D}_{\sup,B}\}\) respectively, where \(D_{\sup,b} = \sup_{t\in \mathcal{T}}\left|\hat{m}_{\theta}^{*(b)}(t) - \hat{m}_{\theta}(t)\right|\) and \(\bar{D}_{\sup,b} = \sup_{t\in \mathcal{T}}\left|\hat{\theta}_C^{*(b)}(t) - \hat{\theta}_C(t)\right|\) for \(b=1,...,B\).
Define the \(1-\alpha\) confidence intervals for \(m(t_0)\) and \(\theta(t_0)\) as:

\[\left[\hat{m}_{\theta}(t_0) - \zeta_{1-\alpha}^*(t_0),\, \hat{m}_{\theta}(t_0) + \zeta_{1-\alpha}^*(t_0)\right] \quad \text{ and } \quad \left[\hat{\theta}_C(t_0) - \bar{\zeta}_{1-\alpha}^*(t_0),\, \hat{\theta}_C(t_0) + \bar{\zeta}_{1-\alpha}^*(t_0)\right]\]

respectively, as well as the simultaneous \(1-\alpha\) confidence bands as:

\[\left[\hat{m}_{\theta}(t) - \xi_{1-\alpha}^*,\, \hat{m}_{\theta}(t) + \xi_{1-\alpha}^*\right] \quad \text{ and } \quad \left[\hat{\theta}_C(t) - \bar{\xi}_{1-\alpha}^*,\, \hat{\theta}_C(t) + \bar{\xi}_{1-\alpha}^*\right]\]

for every \(t\in \mathcal{T}\), where \(\mathcal{T}\) is the support of the marginal density of \(T\).