| Title: | Sparse High-Dimensional Linear Regression with a PaRtitiOned Empirical Bayes Ecm (PROBE) Algorithm |
|---|---|
| Description: | Implements an efficient and powerful Bayesian approach for sparse high-dimensional linear regression. It uses minimal prior assumptions on the parameters through plug-in empirical Bayes estimates of hyperparameters. An efficient Parameter-Expanded Expectation-Conditional-Maximization (PX-ECM) algorithm estimates maximum a posteriori (MAP) values of regression parameters and variable selection probabilities. The PX-ECM results in a robust computationally efficient coordinate-wise optimization, which adjusts for the impact of other predictor variables. The E-step is motivated by the popular two-group approach to multiple testing. The result is a PaRtitiOned empirical Bayes Ecm (PROBE) algorithm applied to sparse high-dimensional linear regression, implemented using one-at-a-time or all-at-once type optimization. Simulation studies found the all-at-once variant to be superior. |
| Authors: | Alexander McLain [aut, cre] (ORCID: <https://orcid.org/0000-0002-5475-0670>), Anja Zgodic [aut, ctb] |
| Maintainer: | Alexander McLain <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 1.3 |
| Built: | 2026-06-02 10:53:00 UTC |
| Source: | https://github.com/alexmclain/probe |
Implements an efficient and powerful Bayesian approach for sparse high-dimensional linear regression. It uses minimal prior assumptions on the parameters through plug-in empirical Bayes estimates of hyperparameters. An efficient Parameter-Expanded Expectation-Conditional-Maximization (PX-ECM) algorithm estimates maximum a posteriori (MAP) values of regression parameters and variable selection probabilities. The PX-ECM results in a robust computationally efficient coordinate-wise optimization, which adjusts for the impact of other predictor variables. The E-step is motivated by the popular two-group approach to multiple testing. The result is a PaRtitiOned empirical Bayes Ecm (PROBE) algorithm applied to sparse high-dimensional linear regression, implemented using one-at-a-time or all-at-once type optimization. Simulation studies found the all-at-once variant to be superior.
Examples for applying PROBE to sparse high-dimensional linear regression are given
for one-at-a-time probe_one or all-at-once probe type optimization.
Maintainer: Alexander McLain [email protected] (ORCID)
Authors:
Anja Zgodic [contributor]
McLain, AC, A Zgodic, H Bondell (2025). Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm. Computational Statistics and Data Analysis 207, 108146.
Zgodic, A., Bai, R., Zhang, J., Wang, Y., Rorden, C., & McLain, AC. (2026). Quantifying predictive uncertainty of aphasia severity in stroke patients with sparse heteroscedastic Bayesian high-dimensional regression. Computational Statistics 41:7, 1 – 23.
Useful links:
Report bugs at https://github.com/alexmclain/PROBE/issues
A wrapper function estimating posterior expectations of the variables using an empirical Bayesian technqiue.
e_step_func(beta_t, beta_var, df, adj = 5, lambda = 0.1, monotone = TRUE)e_step_func(beta_t, beta_var, df, adj = 5, lambda = 0.1, monotone = TRUE)
beta_t |
Expectation of the posterior mean (assuming |
beta_var |
Current posterior variance (assuming |
df |
Degrees of freedom for the t-distribution (use to calculate p-values). |
adj |
Bandwidth multiplier to Silverman's ‘rule of thumb’ for calculating the marginal density of the test-statistics (default = 5). |
lambda |
Value of the |
monotone |
Logical - Should the estimated marginal density of the test-statistics be monotone non-increasing from zero (default = TRUE). |
A list including
delta estimated posterior expectations of the .
pi0 estimated proportion of null hypothesis
Storey, J. D., Taylor, J. E., and Siegmund, D. (2004), “Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: A unified approach,” J. R. Stat. Soc. Ser. B. Stat. Methodol., 66, 187–205. McLain, A. C., Zgodic, A., & Bondell, H. (2022). Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm. arXiv preprint arXiv:2209.08139.
#not run #mod <- e_step_func(beta_t, beta_var, df, adj = 5, lambda = 0.1, monotone = TRUE)#not run #mod <- e_step_func(beta_t, beta_var, df, adj = 5, lambda = 0.1, monotone = TRUE)
This dataset was simulated using a 2-dimensional setting described in the reference. The data contains 200 subjects with one outcome, 400 predictor variables, and 6 variables that are related to the variance of the error term. The data contains training and test data, along with the true values of the parameters used to generate the data..
data("h_sim_data")data("h_sim_data")
A data frame with 200 observations and the following objects:
YOutcome variable of length .
XA matrix of binary predictor variables.
VA design matrix to model variance of the residuals (includes an intercept).
beta_trThe true values of all regression coefficients.
omega_trThe true values of all variance regression coefficients.
Simulated data.
data(h_sim_data) str(h_sim_data)data(h_sim_data) str(h_sim_data)
A wrapper function for the H-PROBE algorithm.
hprobe(Y, X, V, Z = NULL, ep = 0.1, maxit = 10000, Y_test = NULL, X_test = NULL, Z_test = NULL, V_test = NULL, verbose = FALSE, signal = NULL, eta_i = NULL, alpha = 0.05, plot_ind = FALSE, adj = 5)hprobe(Y, X, V, Z = NULL, ep = 0.1, maxit = 10000, Y_test = NULL, X_test = NULL, Z_test = NULL, V_test = NULL, verbose = FALSE, signal = NULL, eta_i = NULL, alpha = 0.05, plot_ind = FALSE, adj = 5)
Y |
The outcome variable. |
X |
An |
V |
A design matrix of predictors for the variance model (including an intercept). |
Z |
(optional) An |
ep |
Value against which to compare convergence criterion (default = 0.1). |
maxit |
Maximum number of iterations the algorithm will run for (default = 10000). |
Y_test |
(optional) Test Y data used plotting purposes only (doesn't impact results). |
X_test |
(optional) Test X data used plotting purposes only (doesn't impact results). |
Z_test |
(optional) Test Z data used plotting purposes only (doesn't impact results). |
V_test |
(optional) Test V data used plotting purposes only (doesn't impact results). |
verbose |
A logical (true/false) value whether to print algorithm iteration progress and summary quantities (default = FALSE). |
signal |
(optional) A vector of indicies of the true non-null coefficients. This is used to calculate the true and false discovery rates by iteration for simulated data. Used plotting purposes only (doesn't impact results). |
eta_i |
(optional) A vector of the true signal. This is used to calculate the MSE by iteration for simulated data. Used plotting purposes only (doesn't impact results). |
alpha |
(optional) significance level |
plot_ind |
A logical values (True/False) for whether to output plots on algorithm results and progress (default = FALSE) |
adj |
Bandwidth parameter for empirical Bayes E-step. The bandwidth will be equal to |
A list including
beta_ast_hat MAP estimates of the regression coefficients (),
beta_hat, beta_hat_var MAP estimates of the posterior expectation (beta_hat) and variance (beta_hat_var) of the prior mean () of the regression coefficients assuming ,
gamma_hat the posterior expectation of the latent variables,
sigma2_est MAP estimate of the residual variance,
E_step full results of the final E_step,
Calb_mod results of first () part of the M-step,
count the total number of iterations before convergence.
McLain, AC, A Zgodic, H Bondell (2025). Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm. Computational Statistics and Data Analysis 207, 108146.
Zgodic, A., Bai, R., Zhang, J., Wang, Y., Rorden, C., McLain, AC. (2026). Quantifying predictive uncertainty of aphasia severity in stroke patients with sparse heteroscedastic Bayesian high-dimensional regression. Computational Statistics 41:7, 1 – 23.
predict_probe_func to obtain predictions, credible intervals and prediction intervals from PROBE.
### Example data(h_sim_data) Y = h_sim_data$Y X = h_sim_data$X V = h_sim_data$V beta_tr = h_sim_data$beta_tr omega_tr = h_sim_data$omega_tr Y_test = h_sim_data$Y_test X_test = h_sim_data$X_test V_test = h_sim_data$V_test # Run Analysis res <- hprobe(Y = Y, X = X, V = V) # Predicting for test data pred_res <- predict_hprobe_func(res, X_test, V = V_test) sqrt(mean((Y_test - pred_res$Pred)^2)) head(cbind(Y_test, pred_res)) plot(Y_test, pred_res$Pred, ylab = "Prediction", xlab = "Test Outcome") abline(coef = c(0,1)) # Proportion of explained variance 1 - var(Y_test - pred_res$Pred)/var(Y_test) ## Omega coefficients (versus true values) cbind(omega_tr, res$omega) ## True versus estimated beta coeffiecients plot(beta_tr, res$beta_ast_hat, xlab = "True Beta", ylab = "Estimated Beta") abline(coef = c(0,1)) ## Confusion matrix of true versus estimated signals using 0.5 cutoff. table(beta_tr==0, res$gamma_hat<0.5)### Example data(h_sim_data) Y = h_sim_data$Y X = h_sim_data$X V = h_sim_data$V beta_tr = h_sim_data$beta_tr omega_tr = h_sim_data$omega_tr Y_test = h_sim_data$Y_test X_test = h_sim_data$X_test V_test = h_sim_data$V_test # Run Analysis res <- hprobe(Y = Y, X = X, V = V) # Predicting for test data pred_res <- predict_hprobe_func(res, X_test, V = V_test) sqrt(mean((Y_test - pred_res$Pred)^2)) head(cbind(Y_test, pred_res)) plot(Y_test, pred_res$Pred, ylab = "Prediction", xlab = "Test Outcome") abline(coef = c(0,1)) # Proportion of explained variance 1 - var(Y_test - pred_res$Pred)/var(Y_test) ## Omega coefficients (versus true values) cbind(omega_tr, res$omega) ## True versus estimated beta coeffiecients plot(beta_tr, res$beta_ast_hat, xlab = "True Beta", ylab = "Estimated Beta") abline(coef = c(0,1)) ## Confusion matrix of true versus estimated signals using 0.5 cutoff. table(beta_tr==0, res$gamma_hat<0.5)
A wrapper function providing the quantities related to the M-step for and .
m_step_regression(Y, W, W2, Z = NULL, a = -3/2, Int = TRUE)m_step_regression(Y, W, W2, Z = NULL, a = -3/2, Int = TRUE)
Y |
A matrix containing the outcome |
W |
Quantity |
W2 |
Quantity |
Z |
A matrix or dataframe of other predictors to account for |
a |
(optional) parameter for changing the hyperparameter |
Int |
(optional) Logical - should an intercept be used? |
A list including
coef the MAP estimates of the parameters
sigma2_est the MAP estimate of
VCV posterior variance covariance matrix of ,
res_data dataframe containing MAP estimates, posterior variances, t-test statistics and associated p-values for
McLain, A. C., Zgodic, A., & Bondell, H. (2022). Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm. arXiv preprint arXiv:2209.08139.
#not run #mod <- m_step_regression(Y, W_ast, W_ast_var + W_ast^2, Z)#not run #mod <- m_step_regression(Y, W_ast, W_ast_var + W_ast^2, Z)
A function providing predictions, along with credible, and prediction intervals for new observations.
predict_hprobe_func(res, X, V, Z = NULL, alpha = 0.05, X_2 = NULL)predict_hprobe_func(res, X, V, Z = NULL, alpha = 0.05, X_2 = NULL)
res |
The results from the probe function. |
X |
A matrix containing the predictors on which to apply the probe algorithm |
V |
A design matrix of predictors for the variance model. |
Z |
(optional) A matrix or dataframe of predictors not subjected to the sparsity assumption to account for. |
alpha |
significance level for ( |
X_2 |
(optional) Square of X matrix. |
A dataframe with predictions, credible intervals, and prediction intervals for each new observation.
McLain, AC, A Zgodic, H Bondell (2025). Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm. Computational Statistics and Data Analysis 207, 108146.
Zgodic, A., Bai, R., Zhang, J., Wang, Y., Rorden, C., & McLain, A. (2023). Quantifying predictive uncertainty of aphasia severity in stroke patients with sparse heteroscedastic Bayesian high-dimensional regression. arXiv preprint arXiv:2309.08783.
### Example #not run # pred_res <- predict_probe_func(full_res, X = X_test, Z = NULL, alpha = alpha) # head(pred_res)### Example #not run # pred_res <- predict_probe_func(full_res, X = X_test, Z = NULL, alpha = alpha) # head(pred_res)
A function providing predictions, along with credible, and prediction intervals for new observations.
predict_probe_func(res, X, Z = NULL, alpha = 0.05, X_2 = NULL)predict_probe_func(res, X, Z = NULL, alpha = 0.05, X_2 = NULL)
res |
The results from the probe function. |
X |
A matrix containing the predictors on which to apply the probe algorithm |
Z |
(optional) A matrix or dataframe of predictors not subjected to the sparsity assumption to account for. |
alpha |
significance level for ( |
X_2 |
(optional) Square of X matrix. |
A dataframe with predictions, credible intervals, and prediction intervals for each new observation.
McLain, AC, A Zgodic, H Bondell (2025). Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm. Computational Statistics and Data Analysis 207, 108146.
Zgodic, A., Bai, R., Zhang, J., Wang, Y., Rorden, C., & McLain, A. (2023). Quantifying predictive uncertainty of aphasia severity in stroke patients with sparse heteroscedastic Bayesian high-dimensional regression. arXiv preprint arXiv:2309.08783.
### Example #not run # pred_res <- predict_probe_func(full_res, X = X_test, Z = NULL, alpha = alpha) # head(pred_res)### Example #not run # pred_res <- predict_probe_func(full_res, X = X_test, Z = NULL, alpha = alpha) # head(pred_res)
A wrapper function for the all-at-once variant of the PROBE algorithm.
probe(Y, X, Z = NULL, ep = 0.1, maxit = 10000, Y_test = NULL, X_test = NULL, Z_test = NULL, verbose = FALSE, signal = NULL, eta_i = NULL, alpha = 0.05, plot_ind = FALSE, adj = 5)probe(Y, X, Z = NULL, ep = 0.1, maxit = 10000, Y_test = NULL, X_test = NULL, Z_test = NULL, verbose = FALSE, signal = NULL, eta_i = NULL, alpha = 0.05, plot_ind = FALSE, adj = 5)
Y |
The outcome variable. |
X |
An |
Z |
(optional) An |
ep |
Value against which to compare convergence criterion (default = 0.1). |
maxit |
Maximum number of iterations the algorithm will run for (default = 10000). |
Y_test |
(optional) Test Y data used plotting purposes only (doesn't impact results). |
X_test |
(optional) Test X data used plotting purposes only (doesn't impact results). |
Z_test |
(optional) Test Z data used plotting purposes only (doesn't impact results). |
verbose |
A logical (true/false) value whether to print algorithm iteration progress and summary quantities (default = FALSE). |
signal |
(optional) A vector of indicies of the true non-null coefficients. This is used to calculate the true and false discovery rates by iteration for simulated data. Used plotting purposes only (doesn't impact results). |
eta_i |
(optional) A vector of the true signal. This is used to calculate the MSE by iteration for simulated data. Used plotting purposes only (doesn't impact results). |
alpha |
(optional) significance level |
plot_ind |
A logical values (True/False) for whether to output plots on algorithm results and progress (default = FALSE) |
adj |
Bandwidth parameter for empirical Bayes E-step. The bandwidth will be equal to |
A list including
beta_ast_hat MAP estimates of the regression coefficients (),
beta_hat, beta_hat_var MAP estimates of the posterior expectation (beta_hat) and variance (beta_hat_var) of the prior mean () of the regression coefficients assuming ,
gamma_hat the posterior expectation of the latent variables,
sigma2_est MAP estimate of the residual variance,
E_step full results of the final E_step,
Calb_mod results of first () part of the M-step,
count the total number of iterations before convergence.
McLain, AC, A Zgodic, H Bondell (2025). Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm. Computational Statistics and Data Analysis 207, 108146.
predict_probe_func to obtain predictions, credible intervals and prediction intervals from PROBE.
### Example data(Sim_data) data(Sim_data_test) ## Get training and test data Y <- Sim_data$Y X <- Sim_data$X Y_test <- Sim_data_test$Y_test X_test <- Sim_data_test$X_test ## Get true values signal <- Sim_data$signal beta_tr <- Sim_data$beta_tr sigma2_tr <- Sim_data$sigma2_tr # Run the analysis. Y_test and X_test are included for plotting purposes only full_res <- probe( Y = Y, X = X, Y_test = Y_test, X_test = X_test, alpha = 0.05, plot_ind = TRUE, adj = 10) # Predicting for test data pred_res <- predict_probe_func(full_res, X = X_test) sqrt(mean((Y_test - pred_res$Pred)^2)) # Estimate of the residual variance and true value full_res$sigma2_est sigma2_tr # RMSE of estimated beta coefficients beta_ast_est <- full_res$beta_ast_hat sqrt(mean((beta_ast_est - beta_tr)^2)) # Posterior expectation of gamma by true gamma_est <- full_res$E_step$gamma sum(gamma_est) sum(gamma_est[beta_tr>0]) ### Example with additional covariate data Z (not subjected to the sparsity assumption) data(Sim_data_cov) # Calculating the true signal (the impact of X only) eta_i <- apply(t(Sim_data_cov$X)*Sim_data_cov$beta_tr,2,sum) full_res <- probe( Y = Sim_data_cov$Y, X = Sim_data_cov$X, Z = Sim_data_cov$Z, alpha = 0.05, plot_ind = TRUE, signal = Sim_data_cov$signal, eta_i = eta_i) # Final estimates of the impact of X versus the true values: data.frame(true_values = Sim_data_cov$beta_Z_tr, full_res$Calb_mod$res_data[-2,]) # Compare to a standard linear model of X on Y: summary(lm(Y~Sim_data_cov$Z$Cont_cov + Sim_data_cov$Z$Binary_cov))$coefficients### Example data(Sim_data) data(Sim_data_test) ## Get training and test data Y <- Sim_data$Y X <- Sim_data$X Y_test <- Sim_data_test$Y_test X_test <- Sim_data_test$X_test ## Get true values signal <- Sim_data$signal beta_tr <- Sim_data$beta_tr sigma2_tr <- Sim_data$sigma2_tr # Run the analysis. Y_test and X_test are included for plotting purposes only full_res <- probe( Y = Y, X = X, Y_test = Y_test, X_test = X_test, alpha = 0.05, plot_ind = TRUE, adj = 10) # Predicting for test data pred_res <- predict_probe_func(full_res, X = X_test) sqrt(mean((Y_test - pred_res$Pred)^2)) # Estimate of the residual variance and true value full_res$sigma2_est sigma2_tr # RMSE of estimated beta coefficients beta_ast_est <- full_res$beta_ast_hat sqrt(mean((beta_ast_est - beta_tr)^2)) # Posterior expectation of gamma by true gamma_est <- full_res$E_step$gamma sum(gamma_est) sum(gamma_est[beta_tr>0]) ### Example with additional covariate data Z (not subjected to the sparsity assumption) data(Sim_data_cov) # Calculating the true signal (the impact of X only) eta_i <- apply(t(Sim_data_cov$X)*Sim_data_cov$beta_tr,2,sum) full_res <- probe( Y = Sim_data_cov$Y, X = Sim_data_cov$X, Z = Sim_data_cov$Z, alpha = 0.05, plot_ind = TRUE, signal = Sim_data_cov$signal, eta_i = eta_i) # Final estimates of the impact of X versus the true values: data.frame(true_values = Sim_data_cov$beta_Z_tr, full_res$Calb_mod$res_data[-2,]) # Compare to a standard linear model of X on Y: summary(lm(Y~Sim_data_cov$Z$Cont_cov + Sim_data_cov$Z$Binary_cov))$coefficients
A wrapper function for the one-at-a-time variant of the PROBE algorithm.
probe_one(Y, X, ep = 0.001, maxit = 10000, Y_test = NULL, X_test = NULL, verbose = FALSE, signal = NULL, eta_i = NULL, alpha = 0.05, plot_ind = FALSE, order.method = "lasso", adj = 10, delta = 0.4, update_order= NULL, beta_start= NULL, seed = 8675309)probe_one(Y, X, ep = 0.001, maxit = 10000, Y_test = NULL, X_test = NULL, verbose = FALSE, signal = NULL, eta_i = NULL, alpha = 0.05, plot_ind = FALSE, order.method = "lasso", adj = 10, delta = 0.4, update_order= NULL, beta_start= NULL, seed = 8675309)
Y |
The outcome variable. |
X |
An |
ep |
Value against which to compare convergence criterion (default = 0.001). |
maxit |
Maximum number of iterations the algorithm will run for (default = 10000). |
Y_test |
(optional) Test Y data used plotting purposes only (doesn't impact results). |
X_test |
(optional) Test X data used plotting purposes only (doesn't impact results). |
verbose |
A logical (true/false) value whether to print algorithm iteration progress and summary quantities (default = FALSE). |
signal |
(optional) A vector of indicies of the true non-null coefficients. This is used to calculate the true and false discovery rates by iteration for simulated data. Used plotting purposes only (doesn't impact results). |
eta_i |
(optional) A vector of the true signal. This is used to calculate the MSE by iteration for simulated data. Used plotting purposes only (doesn't impact results). |
alpha |
(optional) significance level |
plot_ind |
A logical values (True/False) for whether to output plots on algorithm results and progress (default = FALSE) |
order.method |
Updating order and initial values of the algorithm. For |
adj |
Bandwidth parameter for empirical Bayes E-step. The bandwidth will be equal to |
delta |
Learning rate for iteration t is (1 + t)^(-1 + delta) (default delta = 0.4). |
update_order |
Manual value for the updating order for when |
beta_start |
Manual value for the starting beta coefficients for when |
seed |
Seed value to ensure reproducibility when |
A list including
beta_ast_hat MAP estimates of the regression coefficients (),
beta_hat, beta_hat_var MAP estimates of the posterior expectation (beta_hat) and variance (beta_hat_var) of the prior mean () of the regression coefficients assuming ,
gamma_hat the posterior expectation of the latent variables,
sigma2_est MAP estimate of the residual variance,
E_step full results of the final E_step,
count the total number of iterations before convergence.
McLain, AC, A Zgodic, H Bondell (2025). Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm. Computational Statistics and Data Analysis 207, 108146.
predict_probe_func to obtain predictions.
### Example data(Sim_data) data(Sim_data_test) attach(Sim_data) attach(Sim_data_test) plot_ind <- TRUE adj <- 10 # Run the analysis. Y_test and X_test are included for plotting purposes only full_res <- probe_one( Y = Y, X = X, Y_test = Y_test, order.method = "lasso", X_test = X_test, plot_ind = plot_ind, adj = adj) # Predicting for test data pred_res <- predict_probe_func(full_res, X = X_test) sqrt(mean((Y_test - pred_res$Pred)^2)) # Estimate of the residual variance and true value full_res$sigma2_est sigma2_tr # RMSE of estimated beta coefficients beta_ast_est <- c(full_res$beta_ast_hat) sqrt(mean((beta_ast_est - beta_tr)^2)) # Posterior expectation of gamma by true gamma_est <- full_res$E_step$gamma table(gamma_est > 0.5, beta_tr > 0) sum(gamma_est) sum(gamma_est[beta_tr>0])### Example data(Sim_data) data(Sim_data_test) attach(Sim_data) attach(Sim_data_test) plot_ind <- TRUE adj <- 10 # Run the analysis. Y_test and X_test are included for plotting purposes only full_res <- probe_one( Y = Y, X = X, Y_test = Y_test, order.method = "lasso", X_test = X_test, plot_ind = plot_ind, adj = adj) # Predicting for test data pred_res <- predict_probe_func(full_res, X = X_test) sqrt(mean((Y_test - pred_res$Pred)^2)) # Estimate of the residual variance and true value full_res$sigma2_est sigma2_tr # RMSE of estimated beta coefficients beta_ast_est <- c(full_res$beta_ast_hat) sqrt(mean((beta_ast_est - beta_tr)^2)) # Posterior expectation of gamma by true gamma_est <- full_res$E_step$gamma table(gamma_est > 0.5, beta_tr > 0) sum(gamma_est) sum(gamma_est[beta_tr>0])
This dataset was simulated using a 2-dimensional setting described in the reference. The data contains 400 subjects with one outcome and 10,000 predictor variables. The test outcomes and predictor variables are contained in Sim_data_test.
data("Sim_data")data("Sim_data")
A data frame with 400 observations and the following objects:
YOutcome variable of length .
XA matrix of binary predictor variables.
signalThe locations of the non-zero regression coefficients.
beta_trThe true values of all regression coefficients.
sigma2_trThe true value of the residual variance.
Simulated data.
data(Sim_data) attach(Sim_data) length(Y) dim(X)data(Sim_data) attach(Sim_data) length(Y) dim(X)
This dataset was simulated using a 2-dimensional setting described in the reference only two covariates are added. The data contains 400 subjects with one outcome, 10000 predictor variables which are to be subjected to the sparsity assumption, and 2 covariates which are not to be subjected to the sparsity assumption.
data("Sim_data_cov")data("Sim_data_cov")
A data frame with 400 observations and the following objects:
YOutcome variable of length .
ZA dataframe of a continuous (Cont_cov) and binary (Binary_cov) covariate.
XA matrix of binary predictor variables.
beta_trThe true values of all regression coefficients.
beta_Z_trThe true values of the intercept, Cont_cov, and Binary_cov.
signalThe locations of the non-zero regression coefficients.
data(Sim_data_cov) str(Sim_data_cov)data(Sim_data_cov) str(Sim_data_cov)
A test set of outcomes and predictor variables to be used with Sim_data.
data("Sim_data_test")data("Sim_data_test")
A data frame with 400 observations and the following objects:
Y_testOutcome variable of length for test set.
Z_testA matrix of binary predictor variables for test set.
Simulated data.
data(Sim_data_test) attach(Sim_data_test) length(Y_test) dim(X_test)data(Sim_data_test) attach(Sim_data_test) length(Y_test) dim(X_test)