Methodology
Model Setup
Consider a random sample \(\{(Y_i,T_i,\textbf{S}_i)\}_{i=1}^n \subset \mathbb{R}\times \mathbb{R} \times \mathbb{R}^d\) generated from the following model:
where
\(Y\) is the outcome variable,
\(T\) is the continuous treatment variable,
\(\textbf{S}\) is a vector of covariates that incorporate confounding variables,
\(E\in\mathbb{R}\) is the treatment variation with \(\mathbb{E}(E)=0\) and \(E\) being independent of \(\textbf{S}\),
\(\epsilon\in\mathbb{R}\) is an exogenous noise variable with \(\mathbb{E}(\epsilon)=0, \mathrm{Var}(\epsilon)=\sigma^2>0,\mathrm{E}(\epsilon^4)<\infty\).
Let \(Y(t)\) be the potential outcome that would have been observed under treatment level \(T=t\). Under some identification conditions (see Section 2.2 in our paper [1]), the (causal) dose-response curve \(t\mapsto m(t)=\mathbb{E}\left[Y(t)\right]\) coincides with the covariate-adjusted regression function \(t\mapsto \mathbb{E}\left[\mu(t,\textbf{S})\right]\) and can thus be identified from the observed data \(\{(Y_i,T_i,\textbf{S}_i)\}_{i=1}^n\). In addition, we also consider estimating the derivative effect \(\theta(t)=m'(t)=\frac{d}{dt}\mathbb{E}\left[\mu(t,\textbf{S})\right]\).
Given the above identification formula, one traditional method for estimating the dose-response curve \(m(t)\) is through the following regression adjustment (RA) or G-computation estimator
where \(\hat{\mu}(t,\textbf{s})\) is any consistent estimator of the conditional mean outcome function \(\mu(t,\textbf{s})\). However, when the positivity condition does not hold, the regression adjustment estimator will be unstable and even inconsistent. This is because without the positivity condition, the joint density \(p(t,\textbf{S}_i)=p(t|\textbf{S}_i)\cdot p_S(\textbf{S}_i)\) could be closer to 0 for some \(i=1,...,n\).
Key Insights and Proposed Estimators
To bypass the strong positivity condition, we consider imposing the following key assumption:
= mathbb{E}left[frac{partial}{partial t} mu(t,textbf{S}) Big|T=tright].
It can be verified that the additive confounding model with \(\mu(T,\textbf{S})=m(T)+\eta(\textbf{S})\) satisfies the above two equalities.
Under the above assumption, we construct our estimator of \(m(t)\) from three critical insights:
Insight 1: \(\mu(t,\textbf{s})\) and \(\frac{\partial}{\partial t}\mu(t,\textbf{s})\) can be consistently estimated at each observation \((T_i,\textbf{S}_i)\) for \(i=1,...,n\).
Insight 2: \(\theta(t)\) can be consistently estimated through the localized form \(\theta_C(t)=\mathbb{E}\left[\frac{\partial}{\partial t} \mu(t,\textbf{S}) \big|T=t\right]\).
Insight 3: By the fundamental theorem of calculus, we know that
Under our key assumption, we can take the expectation on both sides of the above equality to obtain that
Based on the above three insights, we thus propose an integral estimator of the dose-response curve \(m(t)\) as:
where \(\hat{\theta}_C(t)\) is a consistent estimator of \(\theta_C(t) = \mathbb{E}\left[\frac{\partial}{\partial t}\mu(t,\textbf{S})\big|T=t\right] = \int \frac{\partial}{\partial t} \mu(t,\textbf{s})\, d\mathrm{P}(\textbf{s}|t)\). The estimator \(\hat{\theta}_C(t)\) of the derivative effect \(\theta(t)\) includes two nuisance functions:
We fit the partial derivative \(\beta_2(t,\mathbf{s})=\frac{\partial}{\partial t} \mu(t,\mathbf{s})\) of the conditional mean outcome function by (partial) local polynomial regression. In particular, \(\hat{\beta}_2(t,\mathbf{s})\) is the second coordinate of the solution to the following weighted least square problem:
where \(K_T:\mathbb{R}\to [0,\infty), K_S:\mathbb{R}^d \to [0,\infty)\) are two kernel functions and \(h>0,\mathbf{b}\in \mathbb{R}_+^d\) be their corresponding smoothing bandwidth parameters.
We estimate the conditional cumulative distribution function (CDF) \(\mathrm{P}(\textbf{s}|t)\) via Nadaraya-Watson conditional CDF estimator.
where \(\bar{K}_T:\mathbb{R}\to[0,\infty)\) is a kernel function and \(\hslash>0\) is the associated smoothing bandwidth parameter that needs not be the same as the bandwidth parameter \(h>0\).
This leads to our proposed localized derivative estimator of \(\theta(t)\) as:
Fast Computing Algorithm
Let \(T_{(1)}\leq \cdots\leq T_{(n)}\) be the order statistics of \(T_1,..., T_n\) and let \(\Delta_j = T_{(j+1)} - T_{(j)}\) for \(j=1,..., n-1\) be the consecutive difference.
We can approximate \(\hat{m}_{\theta}(T_{(j)})\) for \(j=1,...,n\) as:
To evaluate \(\hat{m}_{\theta}(t)\) for any arbitrary \(t\), we conduct a linear interpolation between \(\hat{m}_{\theta}(T_{(j)})\) and \(\hat{m}_{\theta}(T_{(j+1)})\) on the interval \(t\in\left[T_{(j)}, T_{(j+1)}\right]\).
Bootstrap Inference
We consider conducting inference on the dose-response curve \(m(t)\) and its derivative effect \(\theta(t)=m'(t)\) via nonparametric bootstrap. Other bootstrap methods, including residual bootstrap and wild bootstrap, also work under some modified conditions.
Compute the integral estimator \(\hat{m}_{\theta}(t)\) and localized derivative estimator \(\hat{\theta}_C(t)\) on the original data \(\{(Y_i,T_i,\mathbf{S}_i)\}_{i=1}^n\).
Generate \(B\) bootstrap samples \(\left\{\left(Y_i^{*(b)},T_i^{*(b)},\mathbf{S}_i^{*(b)}\right)\right\}_{i=1}^n, b=1,...,B\) by sampling with replacement from the original data and compute the integral estimator \(\hat{m}_{\theta}^{*(b)}(t)\) and localized derivative estimator \(\hat{\theta}_C^{*(b)}(t)\) on each bootstrapped sample for \(b=1,...,B\).
Let \(\alpha \in (0,1)\) be a pre-specified significance level.
For a pointwise inference at \(t_0\in \mathcal{T}\), we calculate the \(1-\alpha\) quantiles \(\zeta_{1-\alpha}^*(t_0)\) and \(\bar{\zeta}_{1-\alpha}^*(t_0)\) of \(\{D_1(t_0),...,D_B(t_0)\}\) and \(\{\bar{D}_1(t_0),...,\bar{D}_B(t_0)\}\) respectively, where \(D_b(t_0) = \left|\hat{m}_{\theta}^{*(b)}(t_0) - \hat{m}_{\theta}(t_0)\right|\) and \(\bar{D}_b(t_0) = \left|\hat{\theta}_C^{*(b)}(t_0) - \hat{\theta}_C(t_0)\right|\) for \(b=1,...,B\).
For an uniform inference on the entire dose-response curve \(m(t)\) and its derivative \(\theta(t)\), we compute the \(1-\alpha\) quantiles \(\xi_{1-\alpha}^*\) and \(\bar{\xi}_{1-\alpha}^*\) of \(\{D_{\sup,1},...,D_{\sup,B}\}\) and \(\{\bar{D}_{\sup,1},...,\bar{D}_{\sup,B}\}\) respectively, where \(D_{\sup,b} = \sup_{t\in \mathcal{T}}\left|\hat{m}_{\theta}^{*(b)}(t) - \hat{m}_{\theta}(t)\right|\) and \(\bar{D}_{\sup,b} = \sup_{t\in \mathcal{T}}\left|\hat{\theta}_C^{*(b)}(t) - \hat{\theta}_C(t)\right|\) for \(b=1,...,B\).
Define the \(1-\alpha\) confidence intervals for \(m(t_0)\) and \(\theta(t_0)\) as:
respectively, as well as the simultaneous \(1-\alpha\) confidence bands as:
for every \(t\in \mathcal{T}\), where \(\mathcal{T}\) is the support of the marginal density of \(T\).