Methodology I (Basic Framework and RA Estimator Without Positivity)
===========

Identification
------------

Let :math:`Y(t)` be the potential outcome that would have been observed under treatment level :math:`T=t`. Consider a random sample :math:`\{(Y_i,T_i,\textbf{S}_i)\}_{i=1}^n \subset \mathbb{R}\times \mathbb{R} \times \mathbb{R}^d`. We assume the following basic identification conditions.

* **A1(a) (Consistency)** :math:`T_i=t` implies that :math:`Y_i=Y_i(t)`.
* **A1(b) (Unconfoundedness)** :math:`Y_i(t)` is conditionally independent of :math:`T` given :math:`\textbf{S}`.
* **A1(c) (Treatment Variation)** The conditional variance of :math:`T` given any :math:`\textbf{S}=\textbf{s}` is strictly positive, i.e., :math:`\text{Var}(T|\textbf{S}=\textbf{s})>0`.

The above assumptions are generally insufficient to identify the dose-response curve :math:`m(t)=\mathbb{E}\left[Y(t)\right]` with observable data. The existing methods also assume the following positivity condition.

* **(A0) (Positivity)** The conditional density :math:`p(t|\textbf{s})` is bounded above and away from zero almost surely for all :math:`t` and :math:`\textbf{s}`.

Then, the (causal) dose-response curve :math:`t\mapsto m(t)=\mathbb{E}\left[Y(t)\right]` coincides with the covariate-adjusted regression function :math:`t\mapsto \mathbb{E}\left[\mu(t,\textbf{S})\right]` and can thus be identified from the observed data :math:`\{(Y_i,T_i,\textbf{S}_i)\}_{i=1}^n`. In addition, we can also identify the derivative effect :math:`\theta(t)=m'(t)=\frac{d}{dt}\mathbb{E}\left[\mu(t,\textbf{S})\right]`.

Given the above identification formula, a traditional method for estimating the dose-response curve :math:`m(t)` is through the following regression adjustment (RA) or G-computation estimator

.. math::

    \hat{m}_{RA}(t)  = \frac{1}{n}\sum_{i=1}^n \hat{\mu}(t,\textbf{S}_i),

where :math:`\hat{\mu}(t,\textbf{s})` is any consistent estimator of the conditional mean outcome function :math:`\mu(t,\textbf{s})=\mathbb{E}(Y|T=t,\textbf{S}=\textbf{s})`. 


However, when the positivity condition is violated, the above RA estimator does not work and will be inconsistent for estimation as well. To address the identification issue without positivity, we assume that the function :math:`\mathbb{E}\left[Y(t)|\textbf{S}=\textbf{s}\right]` is continuously differentiable with respect to :math:`t` for any :math:`(t,\textbf{s})` such that :math:`p(\textbf{s}|t)>0`, and the following equalities hold true:

.. math::

    \theta(t) = \mathbb{E}\left[\frac{\partial}{\partial t} \mathbb{E}\left[Y(t)|\textbf{S}\right]\right] = \mathbb{E}\left[\frac{\partial}{\partial t} \mathbb{E}\left[Y(t)|\textbf{S}\right]\Big|T=t\right].

Together with Assumptions A1(a-c), we can identify the derivative :math:`\theta(t)` of the dose-response curve as:

.. math::

    \theta(t) = \mathbb{E}\left[\frac{\partial}{\partial t} \mathbb{E}\left[Y(t)|\textbf{S}\right]\Big|T=t\right] = \mathbb{E}\left[\frac{\partial}{\partial t} \mathbb{E}\left(Y|T=t,\textbf{S}\right)\Big|T=t\right] :=\theta_C(t).

Once :math:`\theta(t)` is identifiable from observable data, we can refer the identification and estimation back to the dose-response curve :math:`m(t)` as follows. By the fundamental theorem of calculus, we know that

.. math::

    m(t) = m(T) + \int_{\tilde{t}=T}^{\tilde{t}=t} m'(\tilde{t})\, d\tilde{t} = m(T)+ \int_{\tilde{t}=T}^{\tilde{t}=t} \theta(\tilde{t})\, d\tilde{t}.

Under our key assumption, we can take the expectation on both sides of the above equality to obtain that

.. math::

    m(t) = \mathbb{E}\left[m(T) + \int_{\tilde{t}=T}^{\tilde{t}=t} \theta(\tilde{t})\, d\tilde{t}\right] =\mathbb{E}\left[\mu(T,\textbf{S})\right] + \mathbb{E}\left[\int_{\tilde{t}=T}^{\tilde{t}=t} \theta_C(\tilde{t})\, d\tilde{t}\right] = \mathbb{E}(Y) + \mathbb{E}\left\{\int_{\tilde{t}=T}^{\tilde{t}=t} \mathbb{E}\left[\frac{\partial}{\partial t}\mu(\tilde{t},\textbf{S})\Big|T=\tilde{t}\right] \, d\tilde{t}\right\}

All the quantities on the right-hand of the above equation are identifiable from the observed data :math:`\{(Y_i,T_i,\textbf{S}_i)\}_{i=1}^n` even without the positivity condition.


Estimation and Inference
------------------------------------

Based on our identification strategies without positivity, we thus propose an *integral estimator* of the dose-response curve :math:`m(t)` as:

.. math::

    \hat{m}_\theta(t) = \frac{1}{n}\sum_{i=1}^n \left[Y_i + \int_{\tilde{t}=T_i}^{\tilde{t}=t} \hat{\theta}_C(\tilde{t})\, d\tilde{t} \right],

where :math:`\hat{\theta}_C(t)` is a consistent estimator of :math:`\theta_C(t) = \mathbb{E}\left[\frac{\partial}{\partial t}\mu(t,\textbf{S})\big|T=t\right] = \int \frac{\partial}{\partial t} \mu(t,\textbf{s})\, d\mathrm{P}(\textbf{s}|t)`. The estimator :math:`\hat{\theta}_C(t)` of the derivative effect :math:`\theta(t)` includes two nuisance functions:

* We fit the partial derivative :math:`\beta_2(t,\mathbf{s})=\frac{\partial}{\partial t} \mu(t,\mathbf{s})` of the conditional mean outcome function by (partial) local polynomial regression. In particular, :math:`\hat{\beta}_2(t,\mathbf{s})` is the second coordinate of the solution to the following weighted least square problem:

.. math::

    \left(\hat{\mathbf{\beta}}(t,\textbf{s}), \hat{\mathbf{\alpha}}(t,\textbf{s}) \right)^T = \arg\min_{(\mathbf{\beta},\mathbf{\alpha})^T \in \mathbb{R}^{q+1}\times \mathbb{R}^d} \sum_{i=1}^n \left[Y_i-\sum_{j=0}^q\beta_j (T_i-t)^q - \sum_{\ell=1}^d\alpha_{\ell}(S_{i,\ell}-s_{\ell})\right]^2 K_T\left(\frac{T_i-t}{h}\right)K_S\left(\frac{\mathbf{S}_i-\mathbf{s}}{\mathbf{b}}\right),

where :math:`K_T:\mathbb{R}\to [0,\infty), K_S:\mathbb{R}^d \to [0,\infty)` are two kernel functions and :math:`h>0,\mathbf{b}\in \mathbb{R}_+^d` be their corresponding smoothing bandwidth parameters.

* We estimate the conditional cumulative distribution function (CDF) :math:`\mathrm{P}(\textbf{s}|t)` via Nadaraya-Watson conditional CDF estimator.

.. math::

    \hat P(\mathbf{s}|t) = \frac{\sum_{i=1}^n  \mathbb{1}_{\{\mathbf{S}_i\leq \mathbf{s}, |T_i-t|\leq \hslash\}} }{\sum_{j=1}^n \mathbb{1}_{\{|T_j-t|\leq \hslash\}}},

where :math:`\bar{K}_T:\mathbb{R}\to[0,\infty)` is a kernel function and :math:`\hslash>0` is the associated smoothing bandwidth parameter that needs not be the same as the bandwidth parameter :math:`h>0`.

This leads to our proposed *localized derivative estimator* of :math:`\theta(t)` as:

.. math::

    \hat{\theta}_C(t)= \frac{\sum_{i=1}^n \hat{\beta}_2(t,\textbf{S}_i) \cdot \bar{K}_T\left(\frac{T_i-t}{\hslash}\right)}{\sum_{j=1}^n \bar{K}_T\left(\frac{T_j-t}{\hslash}\right)}.


Fast Computing Algorithm
----------------------------

Let :math:`T_{(1)}\leq \cdots\leq T_{(n)}` be the order statistics of :math:`T_1,..., T_n` and let :math:`\Delta_j = T_{(j+1)} - T_{(j)}` for :math:`j=1,..., n-1` be the consecutive difference. 

* We can approximate :math:`\hat{m}_{\theta}(T_{(j)})` for :math:`j=1,...,n` as:

.. math::

    \hat{m}_{\theta}(T_{(j)}) \approx \frac{1}{n}\sum_{i=1}^n Y_i + \frac{1}{n}\sum_{i=1}^{n-1} \Delta_i \left[ i \cdot \hat{\theta}_C(T_{(i)}) \mathbb{1}_{ \{ i < j \} } - (n-i)\cdot \hat{\theta}_C(T_{(i+1)}) \mathbb{1}_{\{ i\geq j \} } \right].

* To evaluate :math:`\hat{m}_{\theta}(t)` for any arbitrary :math:`t`, we conduct a linear interpolation between :math:`\hat{m}_{\theta}(T_{(j)})` and :math:`\hat{m}_{\theta}(T_{(j+1)})` on the interval :math:`t\in\left[T_{(j)}, T_{(j+1)}\right]`.

Bootstrap Inference
----------------------------

We consider conducting inference on the dose-response curve :math:`m(t)` and its derivative effect :math:`\theta(t)=m'(t)` via nonparametric bootstrap. Other bootstrap methods, including residual bootstrap and wild bootstrap, also work under some modified conditions.

1. Compute the integral estimator :math:`\hat{m}_{\theta}(t)` and localized derivative estimator :math:`\hat{\theta}_C(t)` on the original data :math:`\{(Y_i,T_i,\mathbf{S}_i)\}_{i=1}^n`.

2. Generate :math:`B` bootstrap samples :math:`\left\{\left(Y_i^{*(b)},T_i^{*(b)},\mathbf{S}_i^{*(b)}\right)\right\}_{i=1}^n, b=1,...,B` by sampling with replacement from the original data and compute the integral estimator :math:`\hat{m}_{\theta}^{*(b)}(t)` and localized derivative estimator :math:`\hat{\theta}_C^{*(b)}(t)` on each bootstrapped sample for :math:`b=1,...,B`.

3. Let :math:`\alpha \in (0,1)` be a pre-specified significance level.

    * For a pointwise inference at :math:`t_0\in \mathcal{T}`, we calculate the :math:`1-\alpha` quantiles :math:`\zeta_{1-\alpha}^*(t_0)` and :math:`\bar{\zeta}_{1-\alpha}^*(t_0)` of :math:`\{D_1(t_0),...,D_B(t_0)\}` and :math:`\{\bar{D}_1(t_0),...,\bar{D}_B(t_0)\}` respectively, where :math:`D_b(t_0) = \left|\hat{m}_{\theta}^{*(b)}(t_0) - \hat{m}_{\theta}(t_0)\right|` and :math:`\bar{D}_b(t_0) = \left|\hat{\theta}_C^{*(b)}(t_0) - \hat{\theta}_C(t_0)\right|` for :math:`b=1,...,B`.

    * For an uniform inference on the entire dose-response curve :math:`m(t)` and its derivative :math:`\theta(t)`, we compute the :math:`1-\alpha` quantiles :math:`\xi_{1-\alpha}^*` and :math:`\bar{\xi}_{1-\alpha}^*` of :math:`\{D_{\sup,1},...,D_{\sup,B}\}` and :math:`\{\bar{D}_{\sup,1},...,\bar{D}_{\sup,B}\}` respectively, where :math:`D_{\sup,b} = \sup_{t\in \mathcal{T}}\left|\hat{m}_{\theta}^{*(b)}(t) - \hat{m}_{\theta}(t)\right|` and :math:`\bar{D}_{\sup,b} = \sup_{t\in \mathcal{T}}\left|\hat{\theta}_C^{*(b)}(t) - \hat{\theta}_C(t)\right|` for :math:`b=1,...,B`.

4. Define the :math:`1-\alpha` confidence intervals for :math:`m(t_0)` and :math:`\theta(t_0)` as:

.. math::

    \left[\hat{m}_{\theta}(t_0) - \zeta_{1-\alpha}^*(t_0),\, \hat{m}_{\theta}(t_0) + \zeta_{1-\alpha}^*(t_0)\right] \quad \text{ and } \quad \left[\hat{\theta}_C(t_0) - \bar{\zeta}_{1-\alpha}^*(t_0),\, \hat{\theta}_C(t_0) + \bar{\zeta}_{1-\alpha}^*(t_0)\right]

respectively, as well as the simultaneous :math:`1-\alpha` confidence bands as:

.. math::

	\left[\hat{m}_{\theta}(t) - \xi_{1-\alpha}^*,\, \hat{m}_{\theta}(t) + \xi_{1-\alpha}^*\right] \quad \text{ and } \quad \left[\hat{\theta}_C(t) - \bar{\xi}_{1-\alpha}^*,\, \hat{\theta}_C(t) + \bar{\xi}_{1-\alpha}^*\right]

for every :math:`t\in \mathcal{T}`, where :math:`\mathcal{T}` is the support of the marginal density of :math:`T`.


References
----------

.. [1] Yikun Zhang, Yen-Chi Chen, Alexander Giessing (2024+). Nonparametric Inference on Dose-Response Curves Without the Positivity Condition. *arXiv:2405.09003*