API References

Main Functions in the npDoseResponse Package

npDoseResponse.npDoseResponse.DerivEffect(Y, X, t_eval=None, h_bar=None, kernT_bar='gaussian', h=None, b=None, C_h=7, C_b=3, print_bw=True, degree=2, deriv_ord=1, kernT='epanechnikov', kernS='epanechnikov', parallel=False, processes=20)[source]

Estimate the derivative of the dose-response curve via the Nadaraya-Watson conditional CDF estimator.

Parameters:
  • Y ((n,)-array) – The outcomes of n observations.

  • X ((n,d+1)-array) – The first column of X is the treatment/exposure variable, while the other d columns are confounding variables of n observations.

  • t_eval ((m,)-array) – The coordinates of the m evaluation points. (Default: t_eval=None. Then, t_eval=X[:,0], which consists of the observed treatment variables.)

  • h_bar (float) – The bandwidth parameter for the Nadaraya-Watson conditional CDF estimator. (Default: h_bar=None. Then, Silverman’s rule of thumb is applied. See Chen et al. (2016) for details.)

  • kernT_bar (str) – The name of the kernel function for the Nadaraya-Watson conditional CDF estimator. (Default: “gaussian”.)

  • h (float) – The bandwidth parameter for the treatment/exposure variable. (Default: h=None. Then, the rule-of-thumb bandwidth selector in Eq.(A1) of Yang and Tschernig (1999) is used with the additional scaling factor C_h, which defaults to 7.)

  • b (float) – The bandwidth parameter for the confounding variables. (Default: b=None. Then, the rule-of-thumb bandwidth selector in Eq.(A1) of Yang and Tschernig (1999) is used with the additional scaling factor C_b, which defaults to 3.)

  • print_bw (boolean) – The indicator of whether the current bandwidth parameters should be printed to the console. (Default: print_bw=True.)

  • degree (int) – Degree of local polynomials. (Default: degree=2.)

  • deriv_ord (int) – The order of the estimated derivative of the conditional mean outcome function. (Default: deriv_ord=1. Then, it estimates the first-order partial derivative of the conditional mean outcome function with respect to the treatment variable.)

  • kernT (str) – The name of the kernel function for the treatment/exposure variable. (Default: “epanechnikov”.)

  • kernS (str) – The name of the kernel function for the confounding variables. (Default: “epanechnikov”.)

  • parallel (boolean) – The indicator of whether the function should be executed in parallel via multiprocessing. (Default: parallel=False.)

  • processes (int) – The number of processes for parallel execution. (Default: processes=20.)

Returns:

theta_C – The estimated derivative of the dose-response curve evaluated at points “t_eval”.

Return type:

(m,)-array
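
The Nadaraya-Watson conditional CDF estimator named above can be illustrated with a minimal NumPy sketch. This is not the package's implementation: the helper names `silverman_bandwidth` and `nw_conditional_cdf`, the generic conditioned variable `V`, and the choice of the one-dimensional normal-reference constant 1.06 for the rule of thumb are all assumptions made for illustration. The sketch uses a Gaussian kernel, matching the default kernT_bar='gaussian'.

```python
import numpy as np


def silverman_bandwidth(t):
    """Assumed rule-of-thumb variant for 1-D data: 1.06 * sd * n^(-1/5)."""
    t = np.asarray(t, dtype=float)
    return 1.06 * np.std(t, ddof=1) * t.size ** (-1 / 5)


def nw_conditional_cdf(v_grid, t0, V, T, h_bar=None):
    """Kernel-weighted estimate of P(V <= v | T = t0) for every v in v_grid."""
    V = np.asarray(V, dtype=float)
    T = np.asarray(T, dtype=float)
    v_grid = np.asarray(v_grid, dtype=float)
    if h_bar is None:
        h_bar = silverman_bandwidth(T)
    # Gaussian kernel weights centered at t0, normalized to sum to one.
    w = np.exp(-0.5 * ((T - t0) / h_bar) ** 2)
    w = w / w.sum()
    # indicators[j, i] is 1.0 when V_i <= v_grid[j].
    indicators = (V[None, :] <= v_grid[:, None]).astype(float)
    return indicators @ w


# Synthetic data, used only to exercise the sketch.
rng = np.random.default_rng(0)
T = rng.uniform(0.0, 1.0, size=200)
V = T + 0.1 * rng.normal(size=200)
v_grid = np.linspace(V.min(), V.max(), 25)
cdf_at_half = nw_conditional_cdf(v_grid, 0.5, V, T)
```

Because the weights are nonnegative and sum to one, the result is a valid CDF in v: nondecreasing, and equal to one at the largest observed value.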

npDoseResponse.npDoseResponse.DerivEffectBoot(Y, X, t_eval=None, boot_num=500, alpha=0.95, h_bar=None, kernT_bar='gaussian', h=None, b=None, C_h=7, C_b=3, print_bw=True, degree=2, deriv_ord=1, kernT='epanechnikov', kernS='epanechnikov', parallel=False, processes=20)[source]

Conduct inference on the derivative of the dose-response curve via the Nadaraya-Watson conditional CDF estimator and the nonparametric bootstrap.

Parameters:
  • Y ((n,)-array) – The outcomes of n observations.

  • X ((n,d+1)-array) – The first column of X is the treatment/exposure variable, while the other d columns are confounding variables of n observations.

  • t_eval ((m,)-array) – The coordinates of the m evaluation points. (Default: t_eval=None. Then, t_eval=X[:,0], which consists of the observed treatment variables.)

  • boot_num (int) – The number of bootstrap replications. (Default: boot_num=500.)

  • alpha (float) – The confidence level of both the uniform confidence band and the pointwise confidence intervals. (Default: alpha=0.95.)

  • h_bar (float) – The bandwidth parameter for the Nadaraya-Watson conditional CDF estimator. (Default: h_bar=None. Then, Silverman’s rule of thumb is applied. See Chen et al. (2016) for details.)

  • kernT_bar (str) – The name of the kernel function for the Nadaraya-Watson conditional CDF estimator. (Default: “gaussian”.)

  • h (float) – The bandwidth parameter for the treatment/exposure variable. (Default: h=None. Then, the rule-of-thumb bandwidth selector in Eq.(A1) of Yang and Tschernig (1999) is used with the additional scaling factor C_h, which defaults to 7.)

  • b (float) – The bandwidth parameter for the confounding variables. (Default: b=None. Then, the rule-of-thumb bandwidth selector in Eq.(A1) of Yang and Tschernig (1999) is used with the additional scaling factor C_b, which defaults to 3.)

  • print_bw (boolean) – The indicator of whether the current bandwidth parameters should be printed to the console. (Default: print_bw=True.)

  • degree (int) – Degree of local polynomials. (Default: degree=2.)

  • deriv_ord (int) – The order of the estimated derivative of the conditional mean outcome function. (Default: deriv_ord=1. Then, it estimates the first-order partial derivative of the conditional mean outcome function with respect to the treatment variable.)

  • kernT (str) – The name of the kernel function for the treatment/exposure variable. (Default: “epanechnikov”.)

  • kernS (str) – The name of the kernel function for the confounding variables. (Default: “epanechnikov”.)

  • parallel (boolean) – The indicator of whether the function should be executed in parallel via multiprocessing. (Default: parallel=False.)

  • processes (int) – The number of processes for parallel execution. (Default: processes=20.)

Returns:

  • theta_C ((m,)-array) – The estimated derivative of the dose-response curve evaluated at points “t_eval”.

  • theta_C_boot ((boot_num, m)-array) – The estimated derivatives of the dose-response curve on the bootstrap samples evaluated at points “t_eval”.

  • theta_alpha (float) – The width of the uniform confidence band.

  • theta_alpha_var ((m,)-array) – The widths of the pointwise confidence bands at evaluation points “t_eval”.
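
The four return values can be related to each other with a short sketch of a nonparametric bootstrap. It assumes that the uniform width is the alpha-quantile of the maximum absolute deviation over evaluation points and that the pointwise widths are the per-point alpha-quantiles of the absolute deviations. The reference does not state these formulas, so treat them as one plausible reading. `bootstrap_bands` and the toy estimator `quad_fit_slope` are hypothetical names, not part of the package.

```python
import numpy as np


def quad_fit_slope(Y, X, t_eval):
    """Toy derivative estimator: slope of a global quadratic fit of Y on X[:, 0]."""
    coef = np.polyfit(X[:, 0], Y, deg=2)
    return np.polyval(np.polyder(coef), t_eval)


def bootstrap_bands(estimator, Y, X, t_eval, boot_num=500, alpha=0.95, seed=0):
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    est = estimator(Y, X, t_eval)
    est_boot = np.empty((boot_num, t_eval.shape[0]))
    for b in range(boot_num):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        est_boot[b] = estimator(Y[idx], X[idx], t_eval)
    abs_dev = np.abs(est_boot - est)
    # Assumed definitions of the two kinds of widths.
    width_unif = np.quantile(abs_dev.max(axis=1), alpha)  # one number for all points
    width_pt = np.quantile(abs_dev, alpha, axis=0)        # one number per point
    return est, est_boot, width_unif, width_pt


# Synthetic data, used only to exercise the sketch.
rng = np.random.default_rng(1)
T = rng.uniform(0.0, 2.0, size=200)
S = rng.normal(size=200)
X = np.column_stack([T, S])
Y = T ** 2 + S + 0.2 * rng.normal(size=200)
t_eval = np.linspace(0.2, 1.8, 5)
theta, theta_boot, width_unif, width_pt = bootstrap_bands(
    quad_fit_slope, Y, X, t_eval, boot_num=200
)
```

Under these definitions the uniform band is theta ± width_unif at every evaluation point, and it is never narrower than any of the pointwise intervals theta ± width_pt.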

npDoseResponse.npDoseResponse.IntegEst(Y, X, t_eval=None, h_bar=None, kernT_bar='gaussian', h=None, b=None, C_h=7, C_b=3, print_bw=True, degree=2, deriv_ord=1, kernT='epanechnikov', kernS='epanechnikov', parallel=False, processes=20)[source]

Estimating the dose-response curve via our integral estimator with linear interpolation approximation.

Parameters:
  • Y ((n,)-array) – The outcomes of n observations.

  • X ((n,d+1)-array) – The first column of X is the treatment/exposure variable, while the other d columns are confounding variables of n observations.

  • t_eval ((m,)-array) – The coordinates of the m evaluation points. (Default: t_eval=None. Then, t_eval=X[:,0].)

  • h_bar (float) – The bandwidth parameter for the Nadaraya-Watson conditional CDF estimator. (Default: h_bar=None. Then, Silverman’s rule of thumb is applied. See Chen et al. (2016) for details.)

  • kernT_bar (str) – The name of the kernel function for the Nadaraya-Watson conditional CDF estimator. (Default: “gaussian”.)

  • h (float) – The bandwidth parameter for the treatment/exposure variable. (Default: h=None. Then, the rule-of-thumb bandwidth selector in Eq.(A1) of Yang and Tschernig (1999) is used with the additional scaling factor C_h, which defaults to 7.)

  • b (float) – The bandwidth parameter for the confounding variables. (Default: b=None. Then, the rule-of-thumb bandwidth selector in Eq.(A1) of Yang and Tschernig (1999) is used with the additional scaling factor C_b, which defaults to 3.)

  • print_bw (boolean) – The indicator of whether the current bandwidth parameters should be printed to the console. (Default: print_bw=True.)

  • degree (int) – Degree of local polynomials. (Default: degree=2.)

  • deriv_ord (int) – The order of the estimated derivative of the conditional mean outcome function. (Default: deriv_ord=1. Then, it estimates the first-order partial derivative of the conditional mean outcome function with respect to the treatment variable.)

  • kernT (str) – The name of the kernel function for the treatment/exposure variable. (Default: “epanechnikov”.)

  • kernS (str) – The name of the kernel function for the confounding variables. (Default: “epanechnikov”.)

  • parallel (boolean) – The indicator of whether the function should be executed in parallel via multiprocessing. (Default: parallel=False.)

  • processes (int) – The number of processes for parallel execution. (Default: processes=20.)

Returns:

m_est – The estimated dose-response curve evaluated at points “t_eval”.

Return type:

(m,)-array
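
As a rough illustration of what an "integral estimator with linear interpolation approximation" could look like, the sketch below averages Y_i plus the integral of a derivative estimate from T_i to t over all observations. This formula, the trapezoidal rule on the sorted observed treatments, and the helper name `integral_estimator` are assumptions made for this example; the reference does not spell out the estimator, so this is not a description of the package internals.

```python
import numpy as np


def integral_estimator(Y, T, theta_at_T, t_eval):
    """Assumed form: m(t) = mean_i [ Y_i + integral of theta from T_i to t ]."""
    order = np.argsort(T)
    T_sorted = T[order]
    theta_sorted = theta_at_T[order]
    # Antiderivative A of theta on the sorted grid (trapezoidal rule, A(T_min) = 0).
    steps = 0.5 * (theta_sorted[1:] + theta_sorted[:-1]) * np.diff(T_sorted)
    anti = np.concatenate(([0.0], np.cumsum(steps)))
    # Linear interpolation of A between grid points; np.interp holds the end
    # values constant outside [T_min, T_max].
    anti_eval = np.interp(t_eval, T_sorted, anti)
    # mean_i [Y_i + A(t) - A(T_i)] = mean(Y) + A(t) - mean_i A(T_i)
    return Y.mean() + anti_eval - anti.mean()


# Noise-free check: with Y = T^2 the derivative is 2T and the curve is t^2.
rng = np.random.default_rng(2)
T = rng.uniform(0.0, 2.0, size=100)
Y = T ** 2
theta_at_T = 2.0 * T
t_eval = np.sort(T)[::20]  # evaluate at a few observed treatment values
m_est = integral_estimator(Y, T, theta_at_T, t_eval)
```

The trapezoidal rule is exact for a linear derivative, so in this toy setting the estimate matches t_eval squared at the observed treatment values. Between grid points the linear interpolation introduces a small approximation error.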

npDoseResponse.npDoseResponse.IntegEstBoot(Y, X, t_eval=None, boot_num=500, alpha=0.95, h_bar=None, kernT_bar='gaussian', h=None, b=None, C_h=7, C_b=3, print_bw=True, degree=2, deriv_ord=1, kernT='epanechnikov', kernS='epanechnikov', parallel=False, processes=20)[source]

Conduct inference on the dose-response curve via our integral estimator and nonparametric bootstrap.

Parameters:
  • Y ((n,)-array) – The outcomes of n observations.

  • X ((n,d+1)-array) – The first column of X is the treatment/exposure variable, while the other d columns are confounding variables of n observations.

  • t_eval ((m,)-array) – The coordinates of the m evaluation points. (Default: t_eval=None. Then, t_eval=X[:,0].)

  • boot_num (int) – The number of bootstrap replications. (Default: boot_num=500.)

  • alpha (float) – The confidence level of both the uniform confidence band and the pointwise confidence intervals. (Default: alpha=0.95.)

  • h_bar (float) – The bandwidth parameter for the Nadaraya-Watson conditional CDF estimator. (Default: h_bar=None. Then, Silverman’s rule of thumb is applied. See Chen et al. (2016) for details.)

  • kernT_bar (str) – The name of the kernel function for the Nadaraya-Watson conditional CDF estimator. (Default: “gaussian”.)

  • h (float) – The bandwidth parameter for the treatment/exposure variable. (Default: h=None. Then, the rule-of-thumb bandwidth selector in Eq.(A1) of Yang and Tschernig (1999) is used with the additional scaling factor C_h, which defaults to 7.)

  • b (float) – The bandwidth parameter for the confounding variables. (Default: b=None. Then, the rule-of-thumb bandwidth selector in Eq.(A1) of Yang and Tschernig (1999) is used with the additional scaling factor C_b, which defaults to 3.)

  • print_bw (boolean) – The indicator of whether the current bandwidth parameters should be printed to the console. (Default: print_bw=True.)

  • degree (int) – Degree of local polynomials. (Default: degree=2.)

  • deriv_ord (int) – The order of the estimated derivative of the conditional mean outcome function. (Default: deriv_ord=1. Then, it estimates the first-order partial derivative of the conditional mean outcome function with respect to the treatment variable.)

  • kernT (str) – The name of the kernel function for the treatment/exposure variable. (Default: “epanechnikov”.)

  • kernS (str) – The name of the kernel function for the confounding variables. (Default: “epanechnikov”.)

  • parallel (boolean) – The indicator of whether the function should be executed in parallel via multiprocessing. (Default: parallel=False.)

  • processes (int) – The number of processes for parallel execution. (Default: processes=20.)

Returns:

  • m_est ((m,)-array) – The estimated dose-response curve evaluated at points “t_eval”.

  • m_est_boot ((boot_num, m)-array) – The estimated dose-response curves (or their derivatives) on the bootstrap samples evaluated at points “t_eval”.

  • m_alpha (float) – The width of the uniform confidence band.

  • m_alpha_var ((m,)-array) – The widths of the pointwise confidence bands at evaluation points “t_eval”.

npDoseResponse.npDoseResponse.LocalPolyReg(Y, X, x_eval=None, degree=2, deriv_ord=1, h=None, b=None, C_h=7, C_b=3, print_bw=True, kernT='epanechnikov', kernS='epanechnikov', h_lst=numpy.linspace, b_lst=numpy.linspace)[source]

(Partial) Local polynomial regression for estimating the conditional mean outcome function and its partial derivatives. We use higher order local monomials for the treatment variable and first-order local monomials for the confounding variables.

Parameters:
  • Y ((n,)-array) – The outcomes of n observations.

  • X ((n,d+1)-array) – The first column of X is the treatment/exposure variable, while the other d columns are confounding variables of n observations.

  • x_eval ((m,d+1)-array) – The coordinates of the m evaluation points. (Default: x_eval=None. Then, x_eval=X.)

  • degree (int) – Degree of local polynomials. (Default: degree=2.)

  • deriv_ord (int) – The order of the estimated derivative of the conditional mean outcome function. (Default: deriv_ord=1. Then, it estimates the first-order partial derivative of the conditional mean outcome function with respect to the treatment variable.)

  • h (float) – The bandwidth parameter for the treatment/exposure variable. (Default: h=None. Then, the rule-of-thumb bandwidth selector in Eq.(A1) of Yang and Tschernig (1999) is used with the additional scaling factor C_h, which defaults to 7.)

  • b (float) – The bandwidth parameter for the confounding variables. (Default: b=None. Then, the rule-of-thumb bandwidth selector in Eq.(A1) of Yang and Tschernig (1999) is used with the additional scaling factor C_b, which defaults to 3.)

  • print_bw (boolean) – The indicator of whether the current bandwidth parameters should be printed to the console. (Default: print_bw=True.)

  • kernT (str) – The name of the kernel function for the treatment/exposure variable. (Default: “epanechnikov”.)

  • kernS (str) – The name of the kernel function for the confounding variables. (Default: “epanechnikov”.)

  • h_lst ((k1,)-array) – Candidate values of h searched over by leave-one-out cross-validation (LOOCV).

  • b_lst ((k2,)-array) – Candidate values of b searched over by leave-one-out cross-validation (LOOCV).

Returns:

Y_est – The estimated conditional mean outcome function or its partial derivatives evaluated at points “x_eval”.

Return type:

(m,)-array

npDoseResponse.npDoseResponse.LocalPolyReg1D(Y, X, h=None, x_eval=None, degree=2, deriv_ord=0, kernel='epanechnikov')[source]

Local polynomial regression in one dimension.

Parameters:
  • Y ((m,)-array) – The y coordinates of m data points.

  • X ((m,)-array) – The x coordinates of m data points.

  • h (float) – The bandwidth parameter. (Default: h=None. Then, the rule-of-thumb bandwidth selector in Eq.(A1) of Yang and Tschernig (1999) is used.)

  • x_eval ((k,)-array) – Vector of evaluation points. (Default: x_eval=None. Then, x_eval=X.)

  • degree (int) – Degree of local polynomials. (Default: degree=2.)

  • deriv_ord (int) – The order of derivatives of the regression function that are estimated. (Default: deriv_ord=0. Then, it is the usual local polynomial regression.)

Returns:

Y_est – The estimated function or its derivatives by local polynomial regression evaluated at points “x_eval”.

Return type:

(k,)-array
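
For readers unfamiliar with the technique, one-dimensional local polynomial regression can be sketched in a few lines of NumPy. The code below is a generic textbook version, not the package source: the helper names `epanechnikov` and `local_poly_1d` are hypothetical, and the bandwidth must be supplied explicitly because the Yang and Tschernig (1999) selector is not reproduced here.

```python
import numpy as np
from math import factorial


def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)


def local_poly_1d(Y, X, h, x_eval, degree=2, deriv_ord=0):
    """Kernel-weighted polynomial fit around each evaluation point."""
    Y, X, x_eval = (np.asarray(a, dtype=float) for a in (Y, X, x_eval))
    est = np.empty(x_eval.shape[0])
    powers = np.arange(degree + 1)
    for k, x0 in enumerate(x_eval):
        # The square root of the kernel weights turns weighted least squares
        # into an ordinary least-squares problem.
        sw = np.sqrt(epanechnikov((X - x0) / h))
        design = (X - x0)[:, None] ** powers  # columns: 1, (X-x0), (X-x0)^2, ...
        beta, *_ = np.linalg.lstsq(design * sw[:, None], Y * sw, rcond=None)
        # beta[j] estimates f^(j)(x0) / j!, so rescale by j! for the derivative.
        est[k] = factorial(deriv_ord) * beta[deriv_ord]
    return est


# Noise-free quadratic: a degree-2 local fit should recover it exactly.
X = np.linspace(0.0, 1.0, 50)
Y = 1.0 + 2.0 * X + 3.0 * X ** 2
x_eval = np.array([0.25, 0.5, 0.75])
fit = local_poly_1d(Y, X, h=0.3, x_eval=x_eval, degree=2, deriv_ord=0)
slope = local_poly_1d(Y, X, h=0.3, x_eval=x_eval, degree=2, deriv_ord=1)
```

The output has one entry per evaluation point. Requesting deriv_ord larger than degree is not meaningful in this sketch, since the corresponding coefficient does not exist.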

npDoseResponse.npDoseResponse.LocalPolyRegMain(Y, X, x_eval=None, degree=2, deriv_ord=1, h=None, b=None, kernT='epanechnikov', kernS='epanechnikov')[source]

Main function for computing the local polynomial regression.

Parameters:
  • Y ((n,)-array) – The outcomes of n observations.

  • X ((n,d+1)-array) – The first column of X is the treatment/exposure variable, while the other d columns are confounding variables of n observations.

  • x_eval ((m,d+1)-array) – The coordinates of the m evaluation points. (Default: x_eval=None. Then, x_eval=X.)

  • degree (int) – Degree of local polynomials. (Default: degree=2.)

  • deriv_ord (int) – The order of the estimated derivative of the conditional mean outcome function. (Default: deriv_ord=1. Then, it estimates the first-order partial derivative of the conditional mean outcome function with respect to the treatment variable.)

  • h (float) – The bandwidth parameter for the treatment/exposure variable.

  • b (float) – The bandwidth parameter for the confounding variables.

  • kernT (str) – The name of the kernel function for the treatment/exposure variable. (Default: “epanechnikov”.)

  • kernS (str) – The name of the kernel function for the confounding variables. (Default: “epanechnikov”.)

Returns:

Y_est – The estimated conditional mean outcome function or its partial derivatives evaluated at points “x_eval”.

Return type:

(m,)-array

npDoseResponse.npDoseResponse.LocalPolyReg_Fs(x_eval, Y, X, degree=2, deriv_ord=1, h=None, b=None, C_h=7, C_b=3, print_bw=True, kernT='epanechnikov', kernS='epanechnikov', h_lst=numpy.linspace, b_lst=numpy.linspace)[source]

(Partial) Local polynomial regression for estimating the conditional mean outcome function and its partial derivatives. We use higher order local monomials for the treatment variable and first-order local monomials for the confounding variables. (This function is for multi-process execution only.)

Parameters:
  • Y ((n,)-array) – The outcomes of n observations.

  • X ((n,d+1)-array) – The first column of X is the treatment/exposure variable, while the other d columns are confounding variables of n observations.

  • x_eval ((m,d+1)-array) – The coordinates of the m evaluation points. (Required: in this function, x_eval is the first positional argument and has no default.)

  • degree (int) – Degree of local polynomials. (Default: degree=2.)

  • deriv_ord (int) – The order of the estimated derivative of the conditional mean outcome function. (Default: deriv_ord=1. Then, it estimates the first-order partial derivative of the conditional mean outcome function with respect to the treatment variable.)

  • h (float) – The bandwidth parameter for the treatment/exposure variable. (Default: h=None. Then, the rule-of-thumb bandwidth selector in Eq.(A1) of Yang and Tschernig (1999) is used with the additional scaling factor C_h, which defaults to 7.)

  • b (float) – The bandwidth parameter for the confounding variables. (Default: b=None. Then, the rule-of-thumb bandwidth selector in Eq.(A1) of Yang and Tschernig (1999) is used with the additional scaling factor C_b, which defaults to 3.)

  • print_bw (boolean) – The indicator of whether the current bandwidth parameters should be printed to the console. (Default: print_bw=True.)

  • kernT (str) – The name of the kernel function for the treatment/exposure variable. (Default: “epanechnikov”.)

  • kernS (str) – The name of the kernel function for the confounding variables. (Default: “epanechnikov”.)

  • h_lst ((k1,)-array) – Candidate values of h searched over by leave-one-out cross-validation (LOOCV).

  • b_lst ((k2,)-array) – Candidate values of b searched over by leave-one-out cross-validation (LOOCV).

Returns:

Y_est – The estimated conditional mean outcome function or its partial derivatives evaluated at points “x_eval”.

Return type:

(m,)-array

npDoseResponse.npDoseResponse.RegAdjust(Y, X, t_eval=None, h=None, b=None, C_h=7, C_b=3, print_bw=True, degree=2, deriv_ord=0, kernT='epanechnikov', kernS='epanechnikov', parallel=False, processes=20)[source]

Estimating the dose-response curve via simple integral estimator with linear interpolation approximation.

Parameters:
  • Y ((n,)-array) – The outcomes of n observations.

  • X ((n,d+1)-array) – The first column of X is the treatment/exposure variable, while the other d columns are confounding variables of n observations.

  • t_eval ((m,)-array) – The coordinates of the m evaluation points. (Default: t_eval=None. Then, t_eval=X[:,0].)

  • h (float) – The bandwidth parameter for the treatment/exposure variable. (Default: h=None. Then, the rule-of-thumb bandwidth selector in Eq.(A1) of Yang and Tschernig (1999) is used with the additional scaling factor C_h, which defaults to 7.)

  • b (float) – The bandwidth parameter for the confounding variables. (Default: b=None. Then, the rule-of-thumb bandwidth selector in Eq.(A1) of Yang and Tschernig (1999) is used with the additional scaling factor C_b, which defaults to 3.)

  • print_bw (boolean) – The indicator of whether the current bandwidth parameters should be printed to the console. (Default: print_bw=True.)

  • degree (int) – Degree of local polynomials. (Default: degree=2.)

  • deriv_ord (int) – The order of the estimated derivative of the conditional mean outcome function. (Default: deriv_ord=0. Then, it estimates the conditional mean outcome function itself.)

  • kernT (str) – The name of the kernel function for the treatment/exposure variable. (Default: “epanechnikov”.)

  • kernS (str) – The name of the kernel function for the confounding variables. (Default: “epanechnikov”.)

  • parallel (boolean) – The indicator of whether the function should be executed in parallel via multiprocessing. (Default: parallel=False.)

  • processes (int) – The number of processes for parallel execution. (Default: processes=20.)

Returns:

m_est – The estimated dose-response curve (or its derivative) evaluated at points “t_eval”.

Return type:

(m,)-array

npDoseResponse.npDoseResponseDR.DRCurve(Y, X, t_eval=None, est='RA', mu=None, condTS_type=None, condTS_mod=None, L=1, h=None, kern='epanechnikov', tau=0.001, h_cond=None, self_norm=True, print_bw=True)[source]

Dose-response curve estimation under the positivity condition.

Parameters:
  • Y ((n,)-array) – The outcome variables of n observations.

  • X ((n,d+1)-array) – The first column of X is the treatment/exposure variable, while the other d columns are confounding variables of n observations.

  • t_eval ((m,)-array) – The coordinates of the m evaluation points. (Default: t_eval=None. Then, t_eval=X[:,0], which consists of the observed treatment variables.)

  • est (str) – The type of the dose-response curve estimator. (Default: est=”RA”. Other choices include “IPW” and “DR”.)

  • mu (scikit-learn model or any python model that can use ".fit()" and ".predict()") – The conditional mean outcome (or regression) model of Y given X.

  • condTS_type (str) – Specifying the model type for estimating the conditional density of the treatment variable T given the covariate vector S.

  • condTS_mod (scikit-learn model or any python model that can use ".fit()" and ".predict()") – The regression model for estimating the conditional density of T given S.

  • L (int) – The number of data folds for cross-fitting. When L<=1, no cross-fitting is applied and the regression model is fitted on the entire dataset. (Default: L=1.)

  • h (float) – The bandwidth parameter for the IPW/DR estimator. (Default: h=None. Then, Silverman’s rule of thumb is applied; see Chen et al. (2016) for details.)

  • kern (str) – The name of the kernel function. (Default: kern=”epanechnikov”.)

  • tau (float) – The threshold value that lower bounds the estimated conditional density values. (Default: tau=0.001.)

  • h_cond (float) – The bandwidth parameter for the kernel-smoothed conditional density estimation methods. (Default: h_cond=None.)

  • self_norm (boolean) – An indicator of whether the self-normalized version is implemented. (Default: self_norm=True.)

  • print_bw (boolean) – The indicator of whether the current bandwidth parameters should be printed to the console. (Default: print_bw=True.)

Returns:

  • m_est ((m,)-array) – The estimated dose-response curve evaluated at points “t_eval”.

  • sd_est ((m,)-array (if est=”DR”)) – The estimated asymptotic standard deviation of the DR estimator evaluated at points “t_eval”.
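
The role of the fold count L can be made concrete with a small cross-fitting sketch. It assumes that folds are formed by a random permutation of the sample and that each fold is predicted by a model fitted on the remaining folds; the reference does not say how the package builds its folds. `LeastSquares` is a hypothetical stand-in for any model exposing ".fit()" and ".predict()", and `crossfit_predict` is likewise a name invented for this example.

```python
import numpy as np


class LeastSquares:
    """Hypothetical minimal model with the .fit()/.predict() interface."""

    def fit(self, X, y):
        A = np.column_stack([np.ones(len(X)), X])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([np.ones(len(X)), X]) @ self.coef_


def crossfit_predict(make_model, X, Y, L=1, seed=0):
    """Out-of-fold predictions of E[Y | X]; L <= 1 fits once on all data."""
    n = X.shape[0]
    if L <= 1:
        return make_model().fit(X, Y).predict(X)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), L)  # assumed fold construction
    pred = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Fit on the other folds, predict on the held-out fold.
        model = make_model().fit(X[train], Y[train])
        pred[test_idx] = model.predict(X[test_idx])
    return pred


# Noise-free linear data, so every fold is predicted exactly.
rng = np.random.default_rng(3)
X = rng.normal(size=(120, 3))
Y = 1.0 + X @ np.array([2.0, -1.0, 0.5])
pred_full = crossfit_predict(LeastSquares, X, Y, L=1)
pred_cf = crossfit_predict(LeastSquares, X, Y, L=5)
```

Passing a factory such as the class itself, rather than a fitted instance, keeps the folds from sharing fitted state.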

npDoseResponse.npDoseResponseDR.DRDR(Y, X, t_eval, mu, condTS_type, condTS_mod, L, h, kern='epanechnikov', tau=0.001, b=None, self_norm=True)[source]

Estimating the dose-response curve through the doubly robust (DR) form.

Parameters:
  • Y ((n,)-array) – The outcome variables of n observations.

  • X ((n,d+1)-array) – The first column of X is the treatment/exposure variable, while the other d columns are the confounding variables of n observations.

  • t_eval ((m,)-array) – The coordinates of the m evaluation points.

  • mu (scikit-learn model or any python model that can use ".fit()" and ".predict()") – The conditional mean outcome (or regression) model of Y given X.

  • condTS_type (str) – Specifying the model type for estimating the conditional density of the treatment variable T given the covariate vector S.

  • condTS_mod (scikit-learn model or any python model that can use ".fit()" and ".predict()") – The regression model for estimating the conditional density of T given S.

  • L (int) – The number of data folds for cross-fitting. When L<=1, no cross-fitting is applied and the regression model is fitted on the entire dataset.

  • h (float) – The bandwidth parameter.

  • kern (str) – The name of the kernel function. (Default: kern=”epanechnikov”.)

  • tau (float) – The threshold value that lower bounds the estimated conditional density values. (Default: tau=0.001.)

  • b (float) – The bandwidth parameter for the kernel-smoothed conditional density estimation methods. (Default: b=None.)

  • self_norm (boolean) – An indicator of whether the self-normalized version is implemented. (Default: self_norm=True.)

Returns:

  • m_est ((m,)-array) – The estimated dose-response curve evaluated at points “t_eval”.

  • sd_est ((m,)-array) – The estimated asymptotic standard deviation of the DR estimator evaluated at points “t_eval”.
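
A common way to write a kernel-smoothed doubly robust estimator is the regression adjustment term plus an inverse-probability-weighted average of the outcome residuals. The sketch below follows that generic form under several assumptions: the nuisance estimates are passed in as precomputed arrays (`mu_obs`, `mu_at_t`, `cond_dens`), the self-normalized version divides by the sum of the weights, and the asymptotic standard deviation is not computed. None of these names or choices are taken from the package source.

```python
import numpy as np


def epanechnikov(u):
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)


def dr_curve(Y, T, mu_obs, mu_at_t, cond_dens, t_eval, h, tau=0.001, self_norm=True):
    """
    mu_obs    : (n,)   fitted mu(T_i, S_i)
    mu_at_t   : (m, n) fitted mu(t_j, S_i)
    cond_dens : (n,)   estimated conditional density p(T_i | S_i)
    """
    dens = np.maximum(cond_dens, tau)  # lower-bound the densities by tau
    K = epanechnikov((T[None, :] - t_eval[:, None]) / h) / h  # shape (m, n)
    w = K / dens[None, :]
    # Self-normalization divides by the weight sum, which must be positive:
    # every t_eval needs observed treatments within distance h.
    norm = w.sum(axis=1) if self_norm else float(T.shape[0])
    ipw_part = (w * (Y - mu_obs)[None, :]).sum(axis=1) / norm
    ra_part = mu_at_t.mean(axis=1)
    return ra_part + ipw_part


# Toy setting with a correct outcome model: Y = 2T + S, T ~ Uniform(0, 1).
rng = np.random.default_rng(4)
n = 300
T = rng.uniform(0.0, 1.0, size=n)
S = rng.normal(size=n)
Y = 2.0 * T + S
t_eval = np.array([0.25, 0.5, 0.75])
mu_obs = 2.0 * T + S
mu_at_t = 2.0 * t_eval[:, None] + S[None, :]
cond_dens = np.ones(n)  # T is uniform on [0, 1] and independent of S
m_dr = dr_curve(Y, T, mu_obs, mu_at_t, cond_dens, t_eval, h=0.2)
```

When the outcome model is exactly right, as here, the residual term vanishes and the estimate reduces to the regression adjustment part.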

npDoseResponse.npDoseResponseDR.IPWDR(Y, X, t_eval, condTS_type, condTS_mod, L, h, kern='epanechnikov', tau=0.001, b=None, self_norm=True)[source]

Estimating the dose-response curve through the inverse probability weighting (IPW) form.

Parameters:
  • Y ((n,)-array) – The outcome variables of n observations.

  • X ((n,d+1)-array) – The first column of X is the treatment/exposure variable, while the other d columns are the confounding variables of n observations.

  • t_eval ((m,)-array) – The coordinates of the m evaluation points.

  • condTS_type (str) – Specifying the model type for estimating the conditional density of the treatment variable T given the covariate vector S.

  • condTS_mod (scikit-learn model or any python model that can use ".fit()" and ".predict()") – The regression model for estimating the conditional density of T given S.

  • L (int) – The number of data folds for cross-fitting. When L<=1, no cross-fitting is applied and the regression model is fitted on the entire dataset.

  • h (float) – The bandwidth parameter.

  • kern (str) – The name of the kernel function. (Default: kern=”epanechnikov”.)

  • tau (float) – The threshold value that lower bounds the estimated conditional density values. (Default: tau=0.001.)

  • b (float) – The bandwidth parameter for the kernel-smoothed conditional density estimation methods. (Default: b=None.)

  • self_norm (boolean) – An indicator of whether the self-normalized version is implemented. (Default: self_norm=True.)

Returns:

  • m_est ((m,)-array) – The estimated dose-response curve evaluated at points “t_eval”.

  • cond_est_full ((n,)-array) – The estimated conditional density function of T given S evaluated at the n observed data points.
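
The interplay of h, tau, and self_norm can be seen in a generic kernel IPW sketch. It assumes the conditional density estimates are already available as an array, and that self-normalization means dividing by the sum of the weights rather than by n. Both are assumptions for illustration, and `ipw_curve` is a hypothetical helper rather than the package function.

```python
import numpy as np


def epanechnikov(u):
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)


def ipw_curve(Y, T, cond_dens, t_eval, h, tau=0.001, self_norm=True):
    """Kernel-smoothed inverse probability weighting at each point of t_eval."""
    dens = np.maximum(cond_dens, tau)  # tau keeps the weights from exploding
    K = epanechnikov((T[None, :] - t_eval[:, None]) / h) / h  # shape (m, n)
    w = K / dens[None, :]
    norm = w.sum(axis=1) if self_norm else float(T.shape[0])
    return (w * Y[None, :]).sum(axis=1) / norm


# Sanity check: a constant outcome is reproduced by the self-normalized form,
# whatever the density estimates are.
rng = np.random.default_rng(5)
n = 300
T = rng.uniform(0.0, 1.0, size=n)
Y = np.full(n, 3.0)
cond_dens = rng.uniform(0.5, 1.5, size=n)
cond_dens[0] = 1e-9  # an extreme estimate, clipped to tau inside ipw_curve
t_eval = np.array([0.25, 0.5, 0.75])
m_ipw = ipw_curve(Y, T, cond_dens, t_eval, h=0.2)
m_ipw_raw = ipw_curve(Y, T, cond_dens, t_eval, h=0.2, self_norm=False)
```

Without self-normalization the estimate is only approximately 3 in this toy example, because the weights no longer sum to one.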

npDoseResponse.npDoseResponseDR.RegAdjustDR(Y, X, t_eval, mu, L=1, multi_boot=False, B=1000)[source]

Estimating the dose-response curve through the regression adjustment (or G-computation) form.

Parameters:
  • Y ((n,)-array) – The outcome variables of n observations.

  • X ((n,d+1)-array) – The first column of X is the treatment/exposure variable, while the other d columns are the confounding variables of n observations.

  • t_eval ((m,)-array) – The coordinates of the m evaluation points.

  • mu (scikit-learn model or any python model that can use ".fit()" and ".predict()") – The conditional mean outcome (or regression) model of Y given X.

  • L (int) – The number of data folds for cross-fitting. When L<=1, no cross-fitting is applied and the regression model is fitted on the entire dataset. (Default: L=1.)

  • multi_boot (boolean) – An indicator of whether the multiplier bootstrap will be run. (Default: multi_boot=False.)

  • B (int) – The number of bootstrap replications. (Default: B=1000.)

Returns:

  • m_est ((m,)-array) – The estimated dose-response curve evaluated at points “t_eval”.

  • mu_boot ((B,m)-array) – The estimated dose-response curves on the bootstrap data evaluated at points “t_eval”. (Returned only when “multi_boot=True”.)
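
Regression adjustment (G-computation) averages the fitted outcome model over the observed confounders while holding the treatment fixed at each evaluation point. A minimal sketch, assuming no cross-fitting and no multiplier bootstrap, with `LeastSquares` as a hypothetical stand-in for any ".fit()"/".predict()" model and `reg_adjust` as an invented helper name:

```python
import numpy as np


class LeastSquares:
    """Hypothetical minimal model with the .fit()/.predict() interface."""

    def fit(self, X, y):
        A = np.column_stack([np.ones(len(X)), X])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([np.ones(len(X)), X]) @ self.coef_


def reg_adjust(Y, X, t_eval, mu):
    """m(t) = (1/n) * sum_i mu(t, S_i), with mu fitted on all of (X, Y)."""
    mu.fit(X, Y)
    m_est = np.empty(t_eval.shape[0])
    for j, t in enumerate(t_eval):
        X_t = X.copy()
        X_t[:, 0] = t  # set every unit's treatment to t, keep its confounders
        m_est[j] = mu.predict(X_t).mean()
    return m_est


# Noise-free linear outcome: Y = 2T + S1 - 0.5 * S2.
rng = np.random.default_rng(6)
n = 200
T = rng.uniform(0.0, 1.0, size=n)
S = rng.normal(size=(n, 2))
X = np.column_stack([T, S])
Y = 2.0 * T + S[:, 0] - 0.5 * S[:, 1]
t_eval = np.linspace(0.1, 0.9, 5)
m_ra = reg_adjust(Y, X, t_eval, LeastSquares())
```

In this toy setting the result equals 2t plus the sample mean of the confounder contribution, which is what the averaging step is meant to deliver.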

npDoseResponse.npDoseResponseDerivDR.DRDRDeriv(Y, X, t_eval, mu, condTS_type, condTS_mod, L, h, kern='epanechnikov', n_iter=1000, lr=0.01, tau=0.001, b=None, self_norm=True)[source]

Estimating the derivative of a dose-response curve through the doubly robust (DR) form by a PyTorch neural network model under the positivity condition.

Parameters:
  • Y ((n,)-array) – The outcome variables of n observations.

  • X ((n,d+1)-array) – The first column of X is the treatment/exposure variable, while the other d columns are the confounding variables of n observations.

  • t_eval ((m,)-array) – The coordinates of the m evaluation points.

  • mu (a neural network class defined by PyTorch) – The conditional mean outcome (or regression) model of Y given X.

  • condTS_type (str) – Specifying the model type for estimating the conditional density of the treatment variable T given the covariate vector S.

  • condTS_mod (scikit-learn model or any python model that can use ".fit()" and ".predict()") – The regression model for estimating the conditional density of T given S.

  • L (int) – The number of data folds for cross-fitting. When L<=1, no cross-fitting is applied and the regression model is fitted on the entire dataset.

  • h (float) – The bandwidth parameter.

  • kern (str) – The name of the kernel function. (Default: kern=”epanechnikov”.)

  • n_iter (int) – The number of iterations or training epochs of the neural network model. (Default: n_iter=1000.)

  • lr (float) – The learning rate. (Default: lr=0.01.)

  • tau (float) – The threshold value that lower bounds the estimated conditional density values. (Default: tau=0.001.)

  • b (float) – The bandwidth parameter for the kernel-smoothed conditional density estimation methods. (Default: b=None.)

  • self_norm (boolean) – An indicator of whether the self-normalized version is implemented. (Default: self_norm=True.)

Returns:

  • theta_est ((m,)-array) – The estimated derivative of the dose-response curve evaluated at points “t_eval”.

  • sd_est ((m,)-array) – The estimated asymptotic standard deviation of the DR derivative estimator evaluated at points “t_eval”.

npDoseResponse.npDoseResponseDerivDR.DRDRDerivBC(Y, X, t_eval, mu, L=1, h=None, kern='epanechnikov', n_iter=1000, lr=0.01, b=None, thres_val=0.75, self_norm=True)[source]

Estimating the derivative of a dose-response curve through the doubly robust (DR) form by a PyTorch neural network model without assuming the positivity condition.

Parameters:
  • Y ((n,)-array) – The outcome variables of n observations.

  • X ((n,d+1)-array) – The first column of X is the treatment/exposure variable, while the other d columns are the confounding variables of n observations.

  • t_eval ((m,)-array) – The coordinates of the m evaluation points.

  • mu (a neural network class defined by PyTorch) – The conditional mean outcome (or regression) model of Y given X.

  • L (int) – The number of data folds for cross-fitting. When L<=1, no cross-fitting is applied and the regression model is fitted on the entire dataset. (Default: L=1.)

  • h (float) – The bandwidth parameter. (Default: h=None. Then, Silverman’s rule of thumb is applied; see Chen et al. (2016) for details.)

  • kern (str) – The name of the kernel function. (Default: kern=”epanechnikov”.)

  • n_iter (int) – The number of iterations or training epochs of the neural network model. (Default: n_iter=1000.)

  • lr (float) – The learning rate. (Default: lr=0.01.)

  • b (float) – The bandwidth parameter for the kernel-smoothed conditional density estimation methods. (Default: b=None.)

  • thres_val (float) – The threshold factor that is multiplied by the maximum conditional density value of S given T evaluated at the sample points. (Default: thres_val=0.75.)

  • self_norm (boolean) – An indicator of whether the self-normalized version is implemented. (Default: self_norm=True.)

Returns:

  • theta_est ((m,)-array) – The estimated derivative of the dose-response curve evaluated at points “t_eval”.

  • sd_est ((m,)-array) – The estimated asymptotic standard deviation of the DR derivative estimator evaluated at points “t_eval”.

npDoseResponse.npDoseResponseDerivDR.DRDRDerivSKLearn(Y, X, t_eval, mu, condTS_type, condTS_mod, L, h, kern='epanechnikov', tau=0.001, b=None, delta=0.01, self_norm=True)[source]

Estimating the derivative of a dose-response curve through the doubly robust (DR) form under the positivity condition.

Parameters:
  • Y ((n,)-array) – The outcome variables of n observations.

  • X ((n,d+1)-array) – The first column of X is the treatment/exposure variable, while the other d columns are the confounding variables of n observations.

  • t_eval ((m,)-array) – The coordinates of the m evaluation points.

  • mu (scikit-learn model or any python model that can use ".fit()" and ".predict()") – The conditional mean outcome (or regression) model of Y given X.

  • condTS_type (str) – Specifying the model type for estimating the conditional density of the treatment variable T given the covariate vector S.

  • condTS_mod (scikit-learn model or any python model that can use ".fit()" and ".predict()") – The regression model for estimating the conditional density of T given S.

  • L (int) – The number of data folds for cross-fitting. When L <= 1, no cross-fitting is applied and the regression model is fitted on the entire dataset.

  • h (float) – The bandwidth parameter.

  • kern (str) – The name of the kernel function. (Default: kern=”epanechnikov”.)

  • n_iter (int) – The number of iterations or training epochs of the neural network model. (Default: n_iter=1000.)

  • lr (float) – The learning rate. (Default: lr=0.01.)

  • tau (float) – The threshold value that lower bounds the estimated conditional density values. (Default: tau=0.001.)

  • b (float) – The bandwidth parameter for the kernel-smoothed conditional density estimation methods. (Default: b=None.)

  • delta (float) – The step value for computing the finite differences (or numerical partial differentiation) of the fitted regression model. (Default: delta=0.01.)

  • self_norm (boolean) – An indicator of whether the self-normalized version is implemented. (Default: self_norm=True.)

Returns:

  • theta_est ((m,)-array) – The estimated derivative of the dose-response curve evaluated at points “t_eval”.

  • sd_est ((m,)-array) – The estimated asymptotic standard deviation of the DR derivative estimator evaluated at points “t_eval”.

npDoseResponse.npDoseResponseDerivDR.DRDerivCurve(Y, X, t_eval=None, est='RA', beta_mod=None, n_iter=1000, lr=0.01, condTS_type=None, condTS_mod=None, L=1, h=None, kern='epanechnikov', tau=0.001, h_cond=None, delta=0.01, self_norm=True, print_bw=True)[source]

Dose-response curve derivative estimation under the positivity condition.

Parameters:
  • Y ((n,)-array) – The outcome variables of n observations.

  • X ((n,d+1)-array) – The first column of X is the treatment/exposure variable, while the other d columns are confounding variables of n observations.

  • t_eval ((m,)-array) – The coordinates of the m evaluation points. (Default: t_eval=None. Then, t_eval=X[:,0], which consists of the observed treatment variables.)

  • est (str) – The type of the dose-response curve estimator. (Default: est=”RA”. Other choices include “IPW” and “DR”.)

  • beta_mod (PyTorch neural network class, scikit-learn model, or any python model that can use ".fit()" and ".predict()") – The conditional mean outcome (or regression) model of Y given X.

  • n_iter (int) – The number of iterations or training epochs of the neural network model. (Default: n_iter=1000.)

  • lr (float) – The learning rate. (Default: lr=0.01.)

  • condTS_type (str) – Specifying the model type for estimating the conditional density of the treatment variable T given the covariate vector S.

  • condTS_mod (scikit-learn model or any python model that can use ".fit()" and ".predict()") – The regression model for estimating the conditional density of T given S.

  • L (int) – The number of data folds for cross-fitting. When L <= 1, no cross-fitting is applied and the regression model is fitted on the entire dataset. (Default: L=1.)

  • h (float) – The bandwidth parameter for the IPW/DR estimator. (Default: h=None. Then, Silverman’s rule of thumb is applied; see Chen et al. (2016) for details.)

  • tau (float) – The threshold value that lower bounds the estimated conditional density values. (Default: tau=0.001.)

  • h_cond (float) – The bandwidth parameter for the kernel-smoothed conditional density estimation methods. (Default: h_cond=None.)

  • self_norm (boolean) – An indicator of whether the self-normalized version is implemented. (Default: self_norm=True.)

  • print_bw (boolean) – The indicator of whether the current bandwidth parameters should be printed to the console. (Default: print_bw=True.)

Returns:

  • theta_est ((m,)-array) – The estimated derivative of the dose-response curve evaluated at points “t_eval”.

  • sd_est ((m,)-array (if est=”DR”)) – The estimated asymptotic standard deviation of the DR derivative estimator evaluated at points “t_eval”.

npDoseResponse.npDoseResponseDerivDR.IPWDRDeriv(Y, X, t_eval, condTS_type, condTS_mod, L, h, kern='epanechnikov', tau=0.001, b=None, self_norm=True)[source]

Estimating the derivative of a dose-response curve through the inverse probability weighting (IPW) form under the positivity condition.

Parameters:
  • Y ((n,)-array) – The outcome variables of n observations.

  • X ((n,d+1)-array) – The first column of X is the treatment/exposure variable, while the other d columns are the confounding variables of n observations.

  • t_eval ((m,)-array) – The coordinates of the m evaluation points.

  • condTS_type (str) – Specifying the model type for estimating the conditional density of the treatment variable T given the covariate vector S.

  • condTS_mod (scikit-learn model or any python model that can use ".fit()" and ".predict()") – The regression model for estimating the conditional density of T given S.

  • L (int) – The number of data folds for cross-fitting. When L <= 1, no cross-fitting is applied and the regression model is fitted on the entire dataset.

  • h (float) – The bandwidth parameter.

  • kern (str) – The name of the kernel function. (Default: kern=”epanechnikov”.)

  • tau (float) – The threshold value that lower bounds the estimated conditional density values. (Default: tau=0.001.)

  • b (float) – The bandwidth parameter for the kernel-smoothed conditional density estimation methods. (Default: b=None.)

  • self_norm (boolean) – An indicator of whether the self-normalized version is implemented. (Default: self_norm=True.)

Returns:

theta_est – The estimated derivative of the dose-response curve evaluated at points “t_eval”.

Return type:

(m,)-array
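
The roles of tau and self_norm can be illustrated with a small sketch. It is not the package's implementation: it computes a kernel-weighted inverse-density mean rather than the derivative estimator, and the helper names (epanechnikov, ipw_weighted_mean) and the toy data are assumptions made for illustration only.

```python
def epanechnikov(u):
    """Epanechnikov kernel on [-1, 1] (textbook form)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def ipw_weighted_mean(Y, T, dens, t, h, tau=0.001, self_norm=True):
    """Kernel-weighted inverse-density mean of Y around treatment level t.

    Hypothetical helper: `dens` holds estimated conditional densities of the
    treatment given the covariates, one value per observation.
    """
    # Lower-bound the estimated densities by tau to avoid exploding weights.
    clipped = [max(p, tau) for p in dens]
    w = [epanechnikov((Ti - t) / h) / (h * p) for Ti, p in zip(T, clipped)]
    num = sum(wi * yi for wi, yi in zip(w, Y))
    if self_norm:
        # Self-normalized version: divide by the sum of the weights.
        return num / sum(w)
    # Unnormalized version: divide by the sample size.
    return num / len(Y)


T = [0.1, 0.4, 0.5, 0.6, 0.9]
Y = [2.0, 2.0, 2.0, 2.0, 2.0]
dens = [0.0, 1.0, 1.2, 0.8, 0.5]  # the zero is clipped to tau
est_sn = ipw_weighted_mean(Y, T, dens, t=0.5, h=0.3, self_norm=True)
est_raw = ipw_weighted_mean(Y, T, dens, t=0.5, h=0.3, self_norm=False)
```

With a constant outcome, the self-normalized version returns that constant regardless of the density estimates, while the unnormalized version depends on how accurate they are.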

npDoseResponse.npDoseResponseDerivDR.IPWDRDerivBC(Y, X, t_eval, L=1, h=None, kern='epanechnikov', b=None, thres_val=0.75, self_norm=True)[source]

Estimating the derivative of a dose-response curve through the inverse probability weighting (IPW) form without assuming the positivity condition.

Parameters:
  • Y ((n,)-array) – The outcome variables of n observations.

  • X ((n,d+1)-array) – The first column of X is the treatment/exposure variable, while the other d columns are the confounding variables of n observations.

  • t_eval ((m,)-array) – The coordinates of the m evaluation points.

  • L (int) – The number of data folds for cross-fitting. When L <= 1, no cross-fitting is applied and the regression model is fitted on the entire dataset. (Default: L=1.)

  • h (float) – The bandwidth parameter. (Default: h=None. Then, Silverman’s rule of thumb is applied; see Chen et al. (2016) for details.)

  • kern (str) – The name of the kernel function. (Default: kern=”epanechnikov”.)

  • b (float) – The bandwidth parameter for the kernel-smoothed conditional density estimation methods. (Default: b=None.)

  • thres_val (float) – The threshold factor that is multiplied by the maximum conditional density value of S given T evaluated at the sample points. (Default: thres_val=0.75.)

  • self_norm (boolean) – An indicator of whether the self-normalized version is implemented. (Default: self_norm=True.)

Returns:

theta_est – The estimated derivative of the dose-response curve evaluated at points “t_eval”.

Return type:

(m,)-array
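
The behavior of the L argument (cross-fitting with L folds, or none when L <= 1) can be sketched as an index-splitting routine. How the package actually partitions the data is not documented here, so the helper cross_fit_splits, its shuffling, and its seed argument are assumptions for illustration.

```python
import random


def cross_fit_splits(n, L=1, seed=0):
    """Return a list of (train_idx, eval_idx) pairs for L-fold cross-fitting.

    Hypothetical helper, not part of npDoseResponse.
    """
    idx = list(range(n))
    if L <= 1:
        # No cross-fitting: fit and evaluate on the entire dataset.
        return [(idx, idx)]
    shuffled = idx[:]
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled indices into L folds of nearly equal size.
    folds = [shuffled[k::L] for k in range(L)]
    splits = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        splits.append((train, fold))
    return splits


splits = cross_fit_splits(10, L=5)
```

Each nuisance model would be fitted on train_idx and evaluated only on the matching eval_idx, so that no observation is scored by a model that saw it during fitting.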

class npDoseResponse.npDoseResponseDerivDR.NeurNet(*args: Any, **kwargs: Any)[source]

Bases: Module

npDoseResponse.npDoseResponseDerivDR.RADRDeriv(Y, X, t_eval, mu, L=1, n_iter=1000, lr=0.1, multi_boot=False, B=1000)[source]

Estimating the derivative of a dose-response curve through the regression adjustment (or G-computation) form by a PyTorch neural network model under the positivity condition.

Parameters:
  • Y ((n,)-array) – The outcome variables of n observations.

  • X ((n,d+1)-array) – The first column of X is the treatment/exposure variable, while the other d columns are the confounding variables of n observations.

  • t_eval ((m,)-array) – The coordinates of the m evaluation points.

  • mu (a neural network class defined by PyTorch) – The conditional mean outcome (or regression) model of Y given X.

  • L (int) – The number of data folds for cross-fitting. When L <= 1, no cross-fitting is applied and the regression model is fitted on the entire dataset. (Default: L=1.)

  • n_iter (int) – The number of iterations or training epochs of the neural network model. (Default: n_iter=1000.)

  • lr (float) – The learning rate. (Default: lr=0.1.)

  • multi_boot (boolean) – An indicator of whether the multiplier bootstrap will be run. (Default: multi_boot=False.)

  • B (int) – The number of bootstrap replications. (Default: B=1000.)

Returns:

  • theta_est ((m,)-array) – The estimated derivative of the dose-response curve evaluated at points “t_eval”.

  • mu_boot ((B,m)-array) – The estimated derivatives of the dose-response curves on the bootstrapped data evaluated at points “t_eval”. (Returned only when “multi_boot=True”.)

npDoseResponse.npDoseResponseDerivDR.RADRDerivBC(Y, X, t_eval, mu, L=1, n_iter=1000, lr=0.01, h_bar=None, kernT_bar='gaussian', print_bw=False)[source]

Estimating the derivative of a dose-response curve through the regression adjustment (or G-computation) form by a PyTorch neural network model without assuming the positivity condition.

Parameters:
  • Y ((n,)-array) – The outcome variables of n observations.

  • X ((n,d+1)-array) – The first column of X is the treatment/exposure variable, while the other d columns are the confounding variables of n observations.

  • t_eval ((m,)-array) – The coordinates of the m evaluation points.

  • mu (a neural network class defined by PyTorch) – The conditional mean outcome (or regression) model of Y given X.

  • L (int) – The number of data folds for cross-fitting. When L <= 1, no cross-fitting is applied and the regression model is fitted on the entire dataset. (Default: L=1.)

  • n_iter (int) – The number of iterations or training epochs of the neural network model. (Default: n_iter=1000.)

  • lr (float) – The learning rate. (Default: lr=0.01.)

  • h_bar (float) – The bandwidth parameter for the Nadaraya-Watson conditional CDF estimator. (Default: h_bar=None. Then, Silverman’s rule of thumb is applied; see Chen et al. (2016) for details.)

  • kernT_bar (str) – The name of the kernel function for Nadaraya-Watson conditional CDF estimator. (Default: “gaussian”.)

  • print_bw (boolean) – The indicator of whether the current bandwidth parameters should be printed to the console. (Default: print_bw=False.)

Returns:

theta_C – The estimated derivative of the dose-response curve evaluated at points “t_eval”.

Return type:

(m,)-array

npDoseResponse.npDoseResponseDerivDR.RADRDerivSKLearn(Y, X, t_eval, mu, L=1, delta=0.01)[source]

Estimating the derivative of a dose-response curve through the regression adjustment (or G-computation) form under the positivity condition.

Parameters:
  • Y ((n,)-array) – The outcome variables of n observations.

  • X ((n,d+1)-array) – The first column of X is the treatment/exposure variable, while the other d columns are the confounding variables of n observations.

  • t_eval ((m,)-array) – The coordinates of the m evaluation points.

  • mu (scikit-learn model or any python model that can use ".fit()" and ".predict()") – The conditional mean outcome (or regression) model of Y given X.

  • L (int) – The number of data folds for cross-fitting. When L <= 1, no cross-fitting is applied and the regression model is fitted on the entire dataset. (Default: L=1.)

  • delta (float) – The step value for computing the finite differences (or numerical partial differentiation) of the fitted regression model. (Default: delta=0.01.)

Returns:

theta_est – The estimated derivative of the dose-response curve evaluated at points “t_eval”.

Return type:

(m,)-array
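
The regression adjustment form with a finite-difference step delta can be sketched as follows. This is an illustration under stated assumptions, not the package's code: it uses a central difference (the package may use a different scheme), and ToyModel and ra_deriv_fd are hypothetical names standing in for a fitted model with ".predict()" and for the estimator.

```python
class ToyModel:
    """Stand-in for a fitted regressor exposing .predict() (hypothetical)."""

    def predict(self, X):
        # mu(t, s) = t**2 + 3*s, so the partial derivative in t is 2*t.
        return [row[0] ** 2 + 3.0 * row[1] for row in X]


def ra_deriv_fd(model, S, t_eval, delta=0.01):
    """Average central finite differences of mu(t, S_i) over the covariates."""
    theta = []
    for t in t_eval:
        # Plug the evaluation point into the treatment column for every S_i.
        upper = model.predict([[t + delta] + list(s) for s in S])
        lower = model.predict([[t - delta] + list(s) for s in S])
        diffs = [(u - l) / (2.0 * delta) for u, l in zip(upper, lower)]
        theta.append(sum(diffs) / len(diffs))
    return theta


S = [[0.2], [0.5], [0.9]]  # observed confounders, one row per observation
theta_est = ra_deriv_fd(ToyModel(), S, t_eval=[0.0, 1.0, 2.0])
```

For this quadratic toy model the central difference is exact up to rounding, so theta_est is close to 2*t at each evaluation point.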

npDoseResponse.npDoseResponseDerivDR.train(mod, X_train, Y_train, lr=0.01, n_epochs=10, momentum=0.7, weight_decay=0, print_loss=True)[source]

Utility function for training the PyTorch neural network model via stochastic gradient descent.

Parameters:
  • mod (python class) – The neural network class defined by PyTorch.

  • X_train ((n,d+1)-torch.Tensor) – The first column of “X_train” is the treatment/exposure variable, while the other d columns are the confounding variables of n observations.

  • Y_train ((n,)-torch.Tensor) – The outcome variables of n observations.

  • lr (float) – The learning rate. (Default: lr=0.01.)

  • n_epochs (int) – The number of training epochs. (Default: n_epochs=10.)

  • momentum (float) – The momentum factor. (Default: momentum=0.7.)

  • weight_decay (float) – The weight decay (L2 penalty). (Default: weight_decay=0.)

  • print_loss (boolean) – An indicator of whether the training loss will be printed to the console. (Default: print_loss=True.)

Returns:

model – The fitted model instance of a neural network class defined by PyTorch.

Return type:

python object
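
How lr, momentum, and weight_decay interact in a gradient step can be sketched without PyTorch. The update below follows the classical momentum rule used by torch.optim.SGD with zero dampening; whether train() configures its optimizer exactly this way is an assumption, and the one-parameter model, full-batch gradient, and function name sgd_momentum_fit are illustrative only.

```python
def sgd_momentum_fit(xs, ys, lr=0.01, n_epochs=10, momentum=0.7, weight_decay=0.0):
    """Fit y = w * x by gradient descent with momentum and an L2 penalty."""
    w, velocity = 0.0, 0.0
    losses = []
    n = len(xs)
    for _ in range(n_epochs):
        # Mean squared error and its gradient with respect to w.
        loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # Weight decay adds the L2-penalty gradient.
        grad += weight_decay * w
        # Momentum accumulates past gradients before the step is taken.
        velocity = momentum * velocity + grad
        w -= lr * velocity
        losses.append(loss)
    return w, losses


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 4.0, 6.0]
w_hat, losses = sgd_momentum_fit(xs, ys, lr=0.01, n_epochs=200)
```

On this noiseless example w_hat converges to the true slope 2; a nonzero weight_decay would shrink it slightly toward zero.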

Implementations of Common Kernel Functions

npDoseResponse.rbf.KernelRetrieval(name)[source]

Retrieving the kernel function, its second moment, and its variance based on the name.

Parameters:

name (str) – The name of the kernel function.

Returns:

  • kern_func (python function) – The kernel function.

  • sigmaK_sq (float) – The second moment of the kernel function.

  • K_sq (float) – The variance of the kernel function.
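
A name-based kernel lookup of this kind can be sketched with a dictionary. The sketch is not the package's implementation: it covers only two kernels in their textbook forms, computes the second moment numerically, and the names kernel_lookup, second_moment, and _KERNELS are hypothetical.

```python
import math


def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def second_moment(kern, lo, hi, n_grid=20000):
    """Midpoint-rule approximation of the integral of u^2 * K(u) on [lo, hi]."""
    step = (hi - lo) / n_grid
    total = 0.0
    for i in range(n_grid):
        u = lo + (i + 0.5) * step
        total += u * u * kern(u) * step
    return total


# Each entry stores the kernel and an integration range covering its mass.
_KERNELS = {
    "epanechnikov": (epanechnikov, -1.0, 1.0),
    "gaussian": (gaussian, -10.0, 10.0),
}


def kernel_lookup(name):
    """Return (kernel function, second moment) for a kernel name."""
    try:
        kern, lo, hi = _KERNELS[name.lower()]
    except KeyError:
        raise ValueError(f"unknown kernel name: {name!r}") from None
    return kern, second_moment(kern, lo, hi)


kern_func, sigma_sq = kernel_lookup("epanechnikov")
```

The known closed-form second moments are 1/5 for the Epanechnikov kernel and 1 for the Gaussian kernel, which the numerical values reproduce.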

npDoseResponse.rbf.bigaussian(t)[source]

Bigaussian kernel function.

Parameters:

t (float or (n,)-array) – The query points.

Returns:

res – The kernel values evaluated at the query points.

Return type:

float or (n,)-array

npDoseResponse.rbf.biweight(t)[source]

Biweight/quartic kernel function.

Parameters:

t (float or (n,)-array) – The query points.

Returns:

res – The kernel values evaluated at the query points.

Return type:

float or (n,)-array

npDoseResponse.rbf.cosine(t)[source]

Cosine kernel function.

Parameters:

t (float or (n,)-array) – The query points.

Returns:

res – The kernel values evaluated at the query points.

Return type:

float or (n,)-array

npDoseResponse.rbf.epanechnikov(t)[source]

Epanechnikov kernel function.

Parameters:

t (float or (n,)-array) – The query points.

Returns:

res – The kernel values evaluated at the query points.

Return type:

float or (n,)-array

npDoseResponse.rbf.gaussian(t)[source]

Gaussian kernel function.

Parameters:

t (float or (n,)-array) – The query points.

Returns:

res – The kernel values evaluated at the query points.

Return type:

float or (n,)-array

npDoseResponse.rbf.logistic(t)[source]

Logistic kernel function.

Parameters:

t (float or (n,)-array) – The query points.

Returns:

res – The kernel values evaluated at the query points.

Return type:

float or (n,)-array

npDoseResponse.rbf.rectangular(t)[source]

Rectangular/uniform kernel function.

Parameters:

t (float or (n,)-array) – The query points.

Returns:

res – The kernel values evaluated at the query points.

Return type:

float or (n,)-array

npDoseResponse.rbf.sigmoid(t)[source]

Sigmoid kernel function.

Parameters:

t (float or (n,)-array) – The query points.

Returns:

res – The kernel values evaluated at the query points.

Return type:

float or (n,)-array

npDoseResponse.rbf.silverman(t)[source]

Silverman kernel function.

Parameters:

t (float or (n,)-array) – The query points.

Returns:

res – The kernel values evaluated at the query points.

Return type:

float or (n,)-array

npDoseResponse.rbf.triangular(t)[source]

Triangular kernel function.

Parameters:

t (float or (n,)-array) – The query points.

Returns:

res – The kernel values evaluated at the query points.

Return type:

float or (n,)-array

npDoseResponse.rbf.tricube(t)[source]

Tricube kernel function.

Parameters:

t (float or (n,)-array) – The query points.

Returns:

res – The kernel values evaluated at the query points.

Return type:

float or (n,)-array

npDoseResponse.rbf.triweight(t)[source]

Triweight kernel function.

Parameters:

t (float or (n,)-array) – The query points.

Returns:

res – The kernel values evaluated at the query points.

Return type:

float or (n,)-array
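
For orientation, the textbook forms of most kernels listed above can be written in a few lines of plain Python. These are the standard formulas, assumed rather than taken from the package source, so the package's normalizations may differ; the bigaussian kernel is omitted because its definition is not stated here. The names KERNELS and integrate are illustrative.

```python
import math


def _on_unit_interval(f):
    """Restrict f to the compact support [-1, 1]."""
    return lambda u: f(u) if abs(u) <= 1.0 else 0.0


_A = 1.0 / math.sqrt(2.0)

KERNELS = {
    "rectangular": _on_unit_interval(lambda u: 0.5),
    "triangular": _on_unit_interval(lambda u: 1.0 - abs(u)),
    "epanechnikov": _on_unit_interval(lambda u: 0.75 * (1.0 - u * u)),
    "biweight": _on_unit_interval(lambda u: 15.0 / 16.0 * (1.0 - u * u) ** 2),
    "triweight": _on_unit_interval(lambda u: 35.0 / 32.0 * (1.0 - u * u) ** 3),
    "tricube": _on_unit_interval(lambda u: 70.0 / 81.0 * (1.0 - abs(u) ** 3) ** 3),
    "cosine": _on_unit_interval(lambda u: math.pi / 4.0 * math.cos(math.pi * u / 2.0)),
    "gaussian": lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi),
    "logistic": lambda u: 1.0 / (math.exp(u) + 2.0 + math.exp(-u)),
    "sigmoid": lambda u: 2.0 / (math.pi * (math.exp(u) + math.exp(-u))),
    "silverman": lambda u: 0.5 * math.exp(-abs(u) * _A)
    * math.sin(abs(u) * _A + math.pi / 4.0),
}


def integrate(f, lo, hi, n_grid=40000):
    """Midpoint-rule approximation of the integral of f on [lo, hi]."""
    step = (hi - lo) / n_grid
    return sum(f(lo + (i + 0.5) * step) for i in range(n_grid)) * step


# Each kernel should carry total mass one.
masses = {name: integrate(k, -40.0, 40.0) for name, k in KERNELS.items()}
```

The functions here take scalars; the documented versions also accept (n,)-arrays.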

Utility Functions

npDoseResponse.utils.BndKern(x_qry, kern, deriv_ord=0, alpha=1, bnd='left')[source]

Generalized jackknife boundary kernel.

Parameters:
  • x_qry ((m,)-array) – The coordinates of m query points in the 1-dimensional Euclidean space.

  • kern (python function) – The kernel function.

  • deriv_ord (int) – The order of the derivative estimator. (Default: deriv_ord=0, which is for nonparametric density or curve estimation.)

  • alpha (float) – The truncated proportion of the kernel support (0 <= alpha <= 1). (Default: alpha=1, which recovers the original kernel function for the interior points.)

  • bnd (str) – Indicator of whether the input point is within the left or right boundary of the support. (Default: bnd=’left’.)

Returns:

res – The boundary kernel function evaluated at m query points.

Return type:

(m,)-array

npDoseResponse.utils.CondDenEst(Y, X, reg_mod, y_eval=None, x_eval=None, kern='gaussian', b=None, poly_ext=False)[source]

Conditional density estimation via nonparametric regression on the kernel-smoothed outcome variables.

Parameters:
  • Y ((n,)-array) – The outcome variables of n observations.

  • X ((n,d)-array) – The d-dimensional covariates of n observations.

  • reg_mod (scikit-learn model or any python model that can use ".fit()" and ".predict()") – The conditional mean outcome (or regression) model of Y given X.

  • y_eval ((m,)-array) – The outcome variables on which we evaluate the estimated conditional densities.

  • x_eval ((m,d)-array) – The covariates on which we evaluate the estimated conditional densities.

  • kern (str) – The name of the kernel function. (Default: kern=”gaussian”.)

  • b (float) – The bandwidth parameter for KDE. (Default: b=None.)

  • poly_ext (boolean) – The indicator of whether polynomial features are generated from the current covariates. (Default: poly_ext=False.)

Returns:

cond_est – The estimated conditional densities at the m query points.

Return type:

(m,)-array

npDoseResponse.utils.CondDenEstKDE(Y, X, reg_mod, y_eval=None, x_eval=None, kern='epanechnikov', b=None)[source]

Conditional density estimation by applying the kernel density estimator (KDE) on the regression residuals.

Parameters:
  • Y ((n,)-array) – The outcome variables of n observations.

  • X ((n,d)-array) – The d-dimensional covariates of n observations.

  • reg_mod (scikit-learn model or any python model that can use ".fit()" and ".predict()") – The conditional mean outcome (or regression) model of Y given X.

  • y_eval ((m,)-array) – The outcome variables at which we evaluate the estimated conditional densities.

  • x_eval ((m,d)-array) – The covariates at which we evaluate the estimated conditional densities.

  • kern (str) – The name of the kernel function. (Default: kern=”epanechnikov”.)

  • b (float) – The bandwidth parameter for KDE. (Default: b=None.)

Returns:

cond_est – The estimated conditional densities at the m query points.

Return type:

(m,)-array
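
The idea of applying a KDE to regression residuals can be sketched in plain Python: fit the regression, form residuals, and estimate the density of y given x by smoothing the residuals around the fitted mean at x. This is an illustration, not the package's code; SimpleLinReg is a toy stand-in for any model with ".fit()" and ".predict()", and cond_den_kde is a hypothetical name.

```python
def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


class SimpleLinReg:
    """Toy one-covariate least-squares model with .fit() and .predict()."""

    def fit(self, X, Y):
        xs = [row[0] for row in X]
        n = len(xs)
        xbar, ybar = sum(xs) / n, sum(Y) / n
        sxx = sum((x - xbar) ** 2 for x in xs)
        self.slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, Y)) / sxx
        self.intercept = ybar - self.slope * xbar
        return self

    def predict(self, X):
        return [self.intercept + self.slope * row[0] for row in X]


def cond_den_kde(Y, X, reg_mod, y_eval, x_eval, b):
    """Estimate p(y | x) at paired query points (y_eval[j], x_eval[j])."""
    reg_mod.fit(X, Y)
    resid = [y - m for y, m in zip(Y, reg_mod.predict(X))]
    centers = reg_mod.predict(x_eval)
    n = len(resid)
    # KDE of the residuals, evaluated at y minus the fitted mean at x.
    return [
        sum(epanechnikov((y - c - e) / b) for e in resid) / (n * b)
        for y, c in zip(y_eval, centers)
    ]


X = [[i / 10.0] for i in range(20)]
Y = [1.0 + 2.0 * row[0] + (0.1 if i % 2 else -0.1) for i, row in enumerate(X)]
cond_est = cond_den_kde(Y, X, SimpleLinReg(), [2.0, 5.0], [[0.5], [0.5]], b=0.3)
```

The first query point sits at the fitted mean for x=0.5 and receives positive density; the second lies far outside the residual range and receives zero.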

npDoseResponse.utils.HatMatrix(X, degree=2, deriv_ord=1, h=None, b=None, print_bw=True, kernT='epanechnikov', kernS='epanechnikov')[source]

Compute the hat matrix of the local polynomial regression when it is viewed as a linear smoother.

Parameters:
  • X ((n,d+1)-array) – The first column of X is the treatment/exposure variable, while the other d columns are confounding variables of n observations.

  • degree (int) – Degree of local polynomials. (Default: degree=2.)

  • deriv_ord (int) – The order of the estimated derivative of the conditional mean outcome function. (Default: deriv_ord=1. Then, it estimates the partial derivative of the conditional mean outcome function with respect to the treatment variable.)

  • h (float) – The bandwidth parameters for the treatment/exposure variable and confounding variables.

  • b (float) – The bandwidth parameters for the treatment/exposure variable and confounding variables.

  • print_bw (boolean) – The indicator of whether the current bandwidth parameters should be printed to the console. (Default: print_bw=True.)

  • kernT (str) – The names of kernel functions for the treatment/exposure variable and confounding variables. (Default: “epanechnikov”.)

  • kernS (str) – The names of kernel functions for the treatment/exposure variable and confounding variables. (Default: “epanechnikov”.)

Returns:

hat_mat – The hat matrix.

Return type:

(n,n)-array

npDoseResponse.utils.KDE(x, data, kern='gaussian', h=None)[source]

The d-dimensional Euclidean kernel density estimator.

Parameters:
  • x ((m,d)-array) – The coordinates of m query points in the d-dim Euclidean space.

  • data ((n,d)-array) – The coordinates of n random sample points in the d-dimensional Euclidean space.

  • kern (str) – The name of the kernel function. (Default: “gaussian”.)

  • h (float) – The bandwidth parameter. (Default: h=None. Then, Silverman’s rule of thumb is applied; see Chen et al. (2016) for details.)

Returns:

f_hat – The corresponding kernel density estimates at m query points.

Return type:

(m,)-array
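
A d-dimensional kernel density estimate with a Gaussian kernel and a single bandwidth h can be sketched as below. The sketch requires h to be supplied (the rule-of-thumb default is not reproduced), uses nested lists in place of arrays, and the name kde_gaussian is hypothetical.

```python
import math


def kde_gaussian(x, data, h):
    """Gaussian KDE at m query points x, given n sample points data.

    x and data are lists of d-dimensional points; h is a scalar bandwidth.
    """
    n = len(data)
    d = len(data[0])
    # Normalizing constant of the d-dimensional Gaussian kernel with bandwidth h.
    norm = n * (h * math.sqrt(2.0 * math.pi)) ** d
    f_hat = []
    for q in x:
        total = 0.0
        for row in data:
            sq_dist = sum(((qj - rj) / h) ** 2 for qj, rj in zip(q, row))
            total += math.exp(-0.5 * sq_dist)
        f_hat.append(total / norm)
    return f_hat


data = [[0.0, 0.0], [1.0, 1.0], [0.5, -0.5]]
f_hat = kde_gaussian([[0.0, 0.0], [3.0, 3.0]], data, h=0.5)
```

With a single sample point at the origin and h=1, the estimate at the origin equals the peak of the standard d-dimensional Gaussian density, 1/(2*pi)^(d/2).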

npDoseResponse.utils.KDE1D(x, data, kern='epanechnikov', h=None)[source]

One-dimensional kernel density estimation with generalized jackknife boundary corrections (Jones 1993).

Parameters:
  • x ((m,)-array) – The coordinates of m query points in the 1-dim Euclidean space.

  • data ((n,)-array) – The coordinates of n random sample points in the 1-dimensional Euclidean space.

  • kern (str) – The name of the kernel function. (Default: “epanechnikov”.)

  • h (float) – The bandwidth parameter. (Default: h=None. Then, Silverman’s rule of thumb is applied; see Chen et al. (2016) for details.)

Returns:

f_hat – The corresponding kernel density estimates at m query points.

Return type:

(m,)-array

npDoseResponse.utils.RoTBWLocalPoly(Y, X, kernT='epanechnikov', kernS='epanechnikov', C_h=10, C_b=15)[source]

Compute the rule-of-thumb bandwidth selector in Eq.(A1) of Yang and Tschernig (1999).

Parameters:
  • Y ((n,)-array) – The outcomes of n observations.

  • X ((n,d+1)-array) – The first column of X is the treatment/exposure variable, while the other d columns are confounding variables of n observations.

  • kernT (str) – The names of kernel functions for the treatment/exposure variable and confounding variables. (Default: “epanechnikov”.)

  • kernS (str) – The names of kernel functions for the treatment/exposure variable and confounding variables. (Default: “epanechnikov”.)

  • C_h (float) – The scaling factors for the rule-of-thumb bandwidth parameters. (Default: C_h=10, C_b=15.)

  • C_b (float) – The scaling factors for the rule-of-thumb bandwidth parameters. (Default: C_h=10, C_b=15.)

Returns:

  • h (float) – The rule-of-thumb bandwidth parameter for the treatment/exposure variable.

  • b ((d,)-array) – The rule-of-thumb bandwidth vector for the confounding variables.