Title: | Linear p-Wasserstein Projections |
---|---|
Description: | Performs Wasserstein projections from the predictive distributions of any model into the space of predictive distributions of linear models. We utilize L1 penalties to also reduce the complexity of the model space. This package employs the methods as described in Dunipace, Eric and Lorenzo Trippa (2020) <doi:10.48550/arXiv.2012.09999>. |
Authors: | Eric Dunipace [aut, cre] |
Maintainer: | Eric Dunipace <[email protected]> |
License: | GPL (==3.0) |
Version: | 0.2.3 |
Built: | 2025-02-06 05:30:28 UTC |
Source: | https://github.com/ericdunipace/wpproj |
Options For Use With the Binary Program Method
binary_program_method_options( maxit = 500L, infimum.maxit = 100L, transport.method = transport_options(), epsilon = 0.05, OTmaxit = 100L, model.size = NULL, nvars = NULL, tol = 1e-07, display.progress = FALSE, parallel = NULL, solver.options = NULL )
binary_program_method_options( maxit = 500L, infimum.maxit = 100L, transport.method = transport_options(), epsilon = 0.05, OTmaxit = 100L, model.size = NULL, nvars = NULL, tol = 1e-07, display.progress = FALSE, parallel = NULL, solver.options = NULL )
maxit |
The maximum iterations for the optimizer. Default is 500. |
infimum.maxit |
Maximum iterations to alternate binary program and Wasserstein distance calculations |
transport.method |
Method for Wasserstein distance calculation. Should be one the outputs of |
epsilon |
A value > 0 for the penalty parameter of if using the Sinkhorn method |
OTmaxit |
The number of iterations to run the Wasserstein distance solvers. |
model.size |
What is the maximum number of coefficients to have in the final model. Default is NULL. If NULL, will find models from the minimum size, 0, to the number of columns in |
nvars |
The number of variables to explore. Should be an integer vector of model sizes. Default is NULL which will explore all models from 1 to |
tol |
The tolerance for convergence |
display.progress |
Logical. Should intermediate progress be displayed? TRUE or FALSE. Default is FALSE. |
parallel |
A cluster backend to be used by |
solver.options |
Options to be passed on to the solver. See details |
This function will setup the default arguments used by the binary program method. Of note, for the argument solver.options
, If using the "lasso" solver, you should provide arguments such as "penalty", "nlambda", "lambda.min.ratio", "gamma", and "lambda" in a list. A simple way to do this is to feed the output of the L1_method_options()
function to the argument solver.options.
This will tell the approximate solver, which uses a lasso method that then will project the parameters back to the space. For the other solvers, you can see the options in the ECOS solver package,
ECOSolveR::ecos.control()
, and the options for the mosek solver, Rmosek::mosek()
.
A list with names corresponding to each argument above.
binary_program_method_options() # is using the lasso solver for the binary program method to give an approximate solution binary_program_method_options(solver.options = L1_method_options(nlambda = 50L))
binary_program_method_options() # is using the lasso solver for the binary program method to give an approximate solution binary_program_method_options(solver.options = L1_method_options(nlambda = 50L))
Objects
Will combine
objects into a single object.
combine.WPR2(...)
combine.WPR2(...)
... |
List of |
A vector of objects
if (rlang::is_installed("stats")) { n <- 128 p <- 10 s <- 99 x <- matrix( stats::rnorm( p * n ), nrow = n, ncol = p ) beta <- (1:10)/10 y <- x %*% beta + stats::rnorm(n) post_beta <- matrix(beta, nrow=p, ncol=s) + stats::rnorm(p*s, 0, 0.1) post_mu <- x %*% post_beta fit1 <- WpProj(X=x, eta=post_mu, theta = post_beta, power = 2.0, method = "binary program") fit2 <- WpProj(X=x, eta=post_mu, power = 2.0, options = list(penalty = "lasso") ) out1 <- WPR2(predictions = post_mu, projected_model = fit1) out2 <- WPR2(predictions = post_mu, projected_model = fit2) combine <- combine.WPR2(out1, out2) }
if (rlang::is_installed("stats")) { n <- 128 p <- 10 s <- 99 x <- matrix( stats::rnorm( p * n ), nrow = n, ncol = p ) beta <- (1:10)/10 y <- x %*% beta + stats::rnorm(n) post_beta <- matrix(beta, nrow=p, ncol=s) + stats::rnorm(p*s, 0, 0.1) post_mu <- x %*% post_beta fit1 <- WpProj(X=x, eta=post_mu, theta = post_beta, power = 2.0, method = "binary program") fit2 <- WpProj(X=x, eta=post_mu, power = 2.0, options = list(penalty = "lasso") ) out1 <- WPR2(predictions = post_mu, projected_model = fit1) out2 <- WPR2(predictions = post_mu, projected_model = fit2) combine <- combine.WPR2(out1, out2) }
Will compare the Wasserstein distance between the original model and the
WpProj
model.
distCompare( models, target = list(parameters = NULL, predictions = NULL), power = 2, method = "exact", quantity = c("parameters", "predictions"), parallel = NULL, transform = function(x) { return(x) }, ... )
distCompare( models, target = list(parameters = NULL, predictions = NULL), power = 2, method = "exact", quantity = c("parameters", "predictions"), parallel = NULL, transform = function(x) { return(x) }, ... )
models |
A list of models from WpProj methods |
target |
The target to compare the methods to. Should be a list with slots "parameters" to compare the parameters and "predictions" to compare predictions |
power |
The power parameter of the Wasserstein distance. |
method |
Which approximation to the Wasserstein distance to use. Should be one of the outputs of |
quantity |
Should the function target the "parameters" or the "predictions". Can choose both. |
parallel |
Parallel backend to use for the |
transform |
Transformation function for the predictions. |
... |
other options passed to the |
For the data frames, dist
is the Wasserstein distance, nactive
is the number of active variables in the model, groups
is the name distinguishing the model, and method
is the method used to calculate the distance (i.e., exact, sinkhorn, etc.). If the list in models
is named, these will be used as the group names otherwise the group names will be created based on the call from the WpProj
method.
an object of class distcompare
with slots parameters
, predictions
, and p
. The slots parameters
and predictions
are data frames. See the details for more info. The slot p
is the power parameter of the Wasserstein distance used in the distance calculation.
if(rlang::is_installed("stats")) { n <- 32 p <- 10 s <- 21 x <- matrix( stats::rnorm( p * n ), nrow = n, ncol = p ) beta <- (1:10)/10 y <- x %*% beta + stats::rnorm(n) post_beta <- matrix(beta, nrow=p, ncol=s) + stats::rnorm(p*s, 0, 0.1) post_mu <- x %*% post_beta fit1 <- WpProj(X=x, eta=post_mu, power = 2.0, options = list(penalty = "lasso") ) fit2 <- WpProj(X=x, eta=post_mu, theta = post_beta, power = 2.0, method = "binary program", solver = "lasso", options = list(solver.options = list(penalty = "mcp")) ) dc <- distCompare(models = list("L1" = fit1, "BP" = fit2), target = list(parameters = post_beta, predictions = post_mu)) if(rlang::is_installed(c("ggplot2","ggsci"))) { plot(dc) } }
if(rlang::is_installed("stats")) { n <- 32 p <- 10 s <- 21 x <- matrix( stats::rnorm( p * n ), nrow = n, ncol = p ) beta <- (1:10)/10 y <- x %*% beta + stats::rnorm(n) post_beta <- matrix(beta, nrow=p, ncol=s) + stats::rnorm(p*s, 0, 0.1) post_mu <- x %*% post_beta fit1 <- WpProj(X=x, eta=post_mu, power = 2.0, options = list(penalty = "lasso") ) fit2 <- WpProj(X=x, eta=post_mu, theta = post_beta, power = 2.0, method = "binary program", solver = "lasso", options = list(solver.options = list(penalty = "mcp")) ) dc <- distCompare(models = list("L1" = fit1, "BP" = fit2), target = list(parameters = post_beta, predictions = post_mu)) if(rlang::is_installed(c("ggplot2","ggsci"))) { plot(dc) } }
Runs the Hahn-Carvalho method but adapted to return full distributions.
HC( X, Y = NULL, theta, family = "gaussian", penalty = c("elastic.net", "selection.lasso", "lasso", "ols", "mcp", "scad", "mcp.net", "scad.net", "grp.lasso", "grp.lasso.net", "grp.mcp", "grp.scad", "grp.mcp.net", "grp.scad.net", "sparse.grp.lasso"), method = c("selection.variable", "projection"), lambda = numeric(0), nlambda = 100L, lambda.min.ratio = NULL, alpha = 1, gamma = 1, tau = 0.5, groups = numeric(0), penalty.factor = NULL, group.weights = NULL, maxit = 500L, tol = 1e-07, irls.maxit = 100L, irls.tol = 0.001 )
HC( X, Y = NULL, theta, family = "gaussian", penalty = c("elastic.net", "selection.lasso", "lasso", "ols", "mcp", "scad", "mcp.net", "scad.net", "grp.lasso", "grp.lasso.net", "grp.mcp", "grp.scad", "grp.mcp.net", "grp.scad.net", "sparse.grp.lasso"), method = c("selection.variable", "projection"), lambda = numeric(0), nlambda = 100L, lambda.min.ratio = NULL, alpha = 1, gamma = 1, tau = 0.5, groups = numeric(0), penalty.factor = NULL, group.weights = NULL, maxit = 500L, tol = 1e-07, irls.maxit = 100L, irls.tol = 0.001 )
X |
Covariates |
Y |
Predictions |
theta |
Parameters |
family |
Family for method. See oem. |
penalty |
Penalty function. See oem. |
method |
Should we run a selection variable methodology or projection? |
lambda |
lambda for lasso. See oem for this and all options below |
nlambda |
Number of lambda values. |
lambda.min.ratio |
Minimum lambda ratio for self selected lambda |
alpha |
elastic net mixing. |
gamma |
tuning parameters for SCAD and MCP |
tau |
mixing parameter for sparse group lasso |
groups |
A vector of grouping values |
penalty.factor |
Penalty factor for OEM. |
group.weights |
Weights for groupped lasso |
maxit |
Max iteration for OEM |
tol |
Tolerance for OEM |
irls.maxit |
IRLS max iterations for OEM |
irls.tol |
IRLS tolerance for OEM |
a WpProj
object with selected covariates and their values
Hahn, P. Richard and Carlos M. Carvalho. (2014) "Decoupling Shrinkage and Selection in Bayesian Linear Models: A Posterior Summary Perspective." https://arxiv.org/pdf/1408.0464
n <- 32 p <- 10 s <- 99 x <- matrix( 1, nrow = n, ncol = p ) beta <- (1:10)/10 y <- x %*% beta post_beta <- matrix(beta, nrow=p, ncol=s) post_mu <- x %*% post_beta fit <- HC(X=x, Y=post_mu, theta = post_beta, penalty = "lasso", method = "projection" )
n <- 32 p <- 10 s <- 99 x <- matrix( 1, nrow = n, ncol = p ) beta <- (1:10)/10 y <- x %*% beta post_beta <- matrix(beta, nrow=p, ncol=s) post_mu <- x %*% post_beta fit <- HC(X=x, Y=post_mu, theta = post_beta, penalty = "lasso", method = "projection" )
Options For Use With the L0 Method
L0_method_options( method = c("binary program", "projection"), transport.method = transport_options(), epsilon = 0.05, OTmaxit = 0, parallel = NULL, ... )
L0_method_options( method = c("binary program", "projection"), transport.method = transport_options(), epsilon = 0.05, OTmaxit = 0, parallel = NULL, ... )
method |
Should covariates be selected as an approximate "binary program" or should a projection method be used. Default is the approximate binary program. |
transport.method |
Method for Wasserstein distance calculation. Should be one the outputs of |
epsilon |
A value > 0 for the penalty parameter if using the Sinkhorn method for optimal transport |
OTmaxit |
The number of iterations to run the Wasserstein distance solvers. |
parallel |
A cluster backend to be used by |
... |
Not used |
a named list corresponding to the above arguments
L0_method_options()
L0_method_options()
Options For Use With the L1 Method
L1_method_options( penalty = L1_penalty_options(), lambda = numeric(0), nlambda = 500L, lambda.min.ratio = 1e-04, gamma = 1, alpha = 1, maxit = 500L, model.size = NULL, tol = 1e-07, display.progress = FALSE, solver.options = NULL )
L1_method_options( penalty = L1_penalty_options(), lambda = numeric(0), nlambda = 500L, lambda.min.ratio = 1e-04, gamma = 1, alpha = 1, maxit = 500L, model.size = NULL, tol = 1e-07, display.progress = FALSE, solver.options = NULL )
penalty |
The penalty to use. See |
lambda |
The penalty parameter to use if method is "L1". |
nlambda |
The number of lambdas to explore for the "L1" method if |
lambda.min.ratio |
The minimum ratio of max to min lambda for "L1" method. Default 1e-4. |
gamma |
Tuning parameter for SCAD and MCP penalties if method = "L1". |
alpha |
Tuning parameter for elastic net penalties |
maxit |
The maximum iterations for optimization. Default is 500. |
model.size |
What is the maximum number of coefficients to have in the final model. Default is NULL. If NULL, will find models from the minimum size, 0, to the number of columns in |
tol |
The tolerance for convergence |
display.progress |
Logical. Should intermediate progress be displayed? TRUE or FALSE. Default is FALSE. |
solver.options |
Options to be passed on to the solver. Only used for "ecos" and "mosek" solvers. |
A list with names corresponding to each argument above.
L1_method_options()
L1_method_options()
Recognized L1 Penalties
L1_penalty_options()
L1_penalty_options()
A character vector with the possible penalties for L1 methods
L1_penalty_options() # [1] "lasso" "ols" "mcp" "elastic.net" "scad" # [6] "mcp.net" "scad.net" "grp.lasso" "grp.lasso.net" "grp.mcp" # [11] "grp.scad" "grp.mcp.net" "grp.scad.net" "sparse.grp.lasso"
L1_penalty_options() # [1] "lasso" "ols" "mcp" "elastic.net" "scad" # [6] "mcp.net" "scad.net" "grp.lasso" "grp.lasso.net" "grp.mcp" # [11] "grp.scad" "grp.mcp.net" "grp.scad.net" "sparse.grp.lasso"
ObjectsPlot Function for Objects
## S4 method for signature 'WPR2' plot( x, xlim = NULL, ylim = NULL, linesize = 0.5, pointsize = 1.5, facet.group = NULL, ... )
## S4 method for signature 'WPR2' plot( x, xlim = NULL, ylim = NULL, linesize = 0.5, pointsize = 1.5, facet.group = NULL, ... )
x |
A |
xlim |
x-axis limits |
ylim |
y-axis limits |
linesize |
Linesize for geom_line |
pointsize |
Point size for geom_point |
facet.group |
Group to do facet_grid by |
... |
Currently not used |
a ggplot2::ggplot()
object
n <- 128 p <- 10 s <- 99 x <- matrix( stats::rnorm( p * n ), nrow = n, ncol = p ) beta <- (1:10)/10 y <- x %*% beta + stats::rnorm(n) post_beta <- matrix(beta, nrow=p, ncol=s) + stats::rnorm(p*s, 0, 0.1) post_mu <- x %*% post_beta fit <- WpProj(X=x, eta=post_mu, power = 2.0, options = list(penalty = "lasso") ) obj <- WPR2(predictions = post_mu, projected_model = fit) if (rlang::is_installed("ggplot2")) { p <- plot(obj) }
n <- 128 p <- 10 s <- 99 x <- matrix( stats::rnorm( p * n ), nrow = n, ncol = p ) beta <- (1:10)/10 y <- x %*% beta + stats::rnorm(n) post_beta <- matrix(beta, nrow=p, ncol=s) + stats::rnorm(p*s, 0, 0.1) post_mu <- x %*% post_beta fit <- WpProj(X=x, eta=post_mu, power = 2.0, options = list(penalty = "lasso") ) obj <- WPR2(predictions = post_mu, projected_model = fit) if (rlang::is_installed("ggplot2")) { p <- plot(obj) }
This function will plot the distribution of predictions for a range of active coefficients
ridgePlot( fit, index = 1, minCoef = 1, maxCoef = 10, scale = 1, alpha = 0.5, full = NULL, transform = function(x) { x }, xlab = "Predictions", bandwidth = NULL )
ridgePlot( fit, index = 1, minCoef = 1, maxCoef = 10, scale = 1, alpha = 0.5, full = NULL, transform = function(x) { x }, xlab = "Predictions", bandwidth = NULL )
fit |
A |
index |
The observation number to select. Can be a vector |
minCoef |
The minimum number of coefficients to use |
maxCoef |
The maximum number of coefficients to use |
scale |
How the densities should be scale |
alpha |
Alpha term from ggplot2 object |
full |
"True" prediction to compare to |
transform |
transform for predictions |
xlab |
x-axis label |
bandwidth |
Bandwidth for kernel |
a ggplot2::ggplot()
plot
if(rlang::is_installed("stats")) { n <- 128 p <- 10 s <- 99 x <- matrix(stats::rnorm(n*p), nrow = n, ncol = p ) beta <- (1:10)/10 y <- x %*% beta + stats::rnorm(n) post_beta <- matrix(beta, nrow=p, ncol=s) + matrix(stats::rnorm(p*s, 0, 0.1), p, s) post_mu <- x %*% post_beta fit <- WpProj(X=x, eta=post_mu, power = 2 ) if(rlang::is_installed(c("ggplot2","ggsci","ggridges"))) { ridgePlot(fit) } }
if(rlang::is_installed("stats")) { n <- 128 p <- 10 s <- 99 x <- matrix(stats::rnorm(n*p), nrow = n, ncol = p ) beta <- (1:10)/10 y <- x %*% beta + stats::rnorm(n) post_beta <- matrix(beta, nrow=p, ncol=s) + matrix(stats::rnorm(p*s, 0, 0.1), p, s) post_mu <- x %*% post_beta fit <- WpProj(X=x, eta=post_mu, power = 2 ) if(rlang::is_installed(c("ggplot2","ggsci","ggridges"))) { ridgePlot(fit) } }
Options For Use With the Simulated Annealing Selection Method
simulated_annealing_method_options( force = NULL, method = c("binary program", "projection"), transport.method = transport_options(), OTmaxit = 0L, epsilon = 0.05, maxit = 1L, temps = 1000L, max.time = 3600, proposal.method = c("covariance", "uniform"), energy.distribution = c("boltzman", "bose-einstein"), cooling.schedule = c("Geman-Geman", "exponential"), model.size = NULL, nvars = NULL, display.progress = FALSE, parallel = NULL, calc.theta = TRUE, ... )
simulated_annealing_method_options( force = NULL, method = c("binary program", "projection"), transport.method = transport_options(), OTmaxit = 0L, epsilon = 0.05, maxit = 1L, temps = 1000L, max.time = 3600, proposal.method = c("covariance", "uniform"), energy.distribution = c("boltzman", "bose-einstein"), cooling.schedule = c("Geman-Geman", "exponential"), model.size = NULL, nvars = NULL, display.progress = FALSE, parallel = NULL, calc.theta = TRUE, ... )
force |
Any covariates to force into the model? Should be by column number or NULL if no variables to force into the model. |
method |
Should covariates be selected as an approximate "binary program" or should a projection method be used. Default is the approximate binary program. |
transport.method |
Method for Wasserstein distance calculation. Should be one the outputs of |
OTmaxit |
The number of iterations to run the Wasserstein distance solvers. |
epsilon |
A value > 0 for the penalty parameter of if using the Sinkhorn method for optimal transport |
maxit |
Maximum number of iterations per temperature |
temps |
Number of temperatures to try |
max.time |
Maximum time in seconds to run the algorithm |
proposal.method |
The method to propose the next covariate to add. One of "covariance" or "random". "covariance" will randomly select from covariates with probability proportional to the absolute value of the covariance. "uniform" will select covariates uniformly at random. |
energy.distribution |
The energy distribution to use for evaluating proposals. One of "boltzman" or "bose-einstein". Default is "boltzman". |
cooling.schedule |
The schedule to use for cooling temperatures. One of "Geman-Geman" or "exponential". Default is "Geman-Geman". |
model.size |
How many coefficients should the maximum final model have? Ignored if |
nvars |
What model sizes should one check? Should be a numeric vector with maximum less than number of variables or |
display.progress |
Logical. Should intermediate progress be displayed? TRUE or FALSE. Default is |
parallel |
A cluster backend to be used by |
calc.theta |
Return the linear coefficients? Default is TRUE. |
... |
Not used. |
A named list with the above arguments
simulated_annealing_method_options()
simulated_annealing_method_options()
Options For Use With the Stepwise Selection Method
stepwise_method_options( force = NULL, direction = c("backward", "forward"), method = c("binary program", "projection"), transport.method = transport_options(), OTmaxit = 0, epsilon = 0.05, model.size = NULL, display.progress = FALSE, parallel = NULL, calc.theta = TRUE, ... )
stepwise_method_options( force = NULL, direction = c("backward", "forward"), method = c("binary program", "projection"), transport.method = transport_options(), OTmaxit = 0, epsilon = 0.05, model.size = NULL, display.progress = FALSE, parallel = NULL, calc.theta = TRUE, ... )
force |
Any covariates to force into the model? Should be by column number or NULL if no variables to force into the model. |
direction |
"forward" or "backward" selection? Default is "backward" |
method |
Should covariates be selected as an approximate "binary program" or should a projection method be used. Default is the approximate binary program. |
transport.method |
Method for Wasserstein distance calculation. Should be one the outputs of |
OTmaxit |
The number of iterations to run the Wasserstein distance solvers. |
epsilon |
A value > 0 for the penalty parameter of if using the Sinkhorn method for optimal transport |
model.size |
How many coefficients should the maximum final model have? |
display.progress |
Logical. Should intermediate progress be displayed? TRUE or FALSE. Default is FALSE. |
parallel |
A cluster backend to be used by |
calc.theta |
Return the linear coefficients? Default is TRUE. |
... |
Not used |
A named list with the above arguments
stepwise_method_options()
stepwise_method_options()
This function will calculate linear projections from a set of predictions into the space of the covariates in terms of the p-Wasserstein distance.
WpProj( X, eta = NULL, theta = NULL, power = 2, method = c("L1", "binary program", "stepwise", "simulated annealing", "L0"), solver = c("lasso", "ecos", "lpsolve", "mosek"), options = NULL )
WpProj( X, eta = NULL, theta = NULL, power = 2, method = c("L1", "binary program", "stepwise", "simulated annealing", "L0"), solver = c("lasso", "ecos", "lpsolve", "mosek"), options = NULL )
X |
An |
eta |
An |
theta |
An optional An |
power |
The power of the Wasserstein distance to use. Must be |
method |
The algorithm to calculate the Wasserstein projections. One of "L1", "binary program", "IP", "stepwise","simulated annealing", or "L0". Will default to "L1" if not provided. See details for more information. |
solver |
Which solver to use? One of "lasso", "ecos", "lpsolve", or "mosek". See details for more information |
options |
Options passed to the particular method and desired solver. See details for more information. |
The WpProj
function is a wrapper for the various Wasserstein projection methods. It is designed to be a one-stop shop for all Wasserstein projection methods. It will automatically choose the correct method and solver based on the arguments provided. It will also return a standardized output for all methods. Each method has its own set of options that can be passed to it. See the documentation for each method for more information.
For the L1 methods, see L1_method_options()
for more information. For the binary program methods, see binary_program_method_options()
for more information. For the stepwise methods, see stepwise_method_options()
for more information. For the simulated annealing methods, see simulated_annealing_method_options()
for more information.
In most cases, we recommend using the L1 methods or binary program methods. The L1 methods are the fastest and applicable to Wasserstein powers of any value greater than 1 and function as direct linear projections into the space of the covariates. The binary program methods instead preserve the coefficients of the original model if this is of interest, such as when the original model was already a linear model. The binary program will instead function as a way of turning on and off certain coefficients in a way that minimizes the Wasserstein distance between reduced and original models. Of note, we also have available an approximate binary program method using a lasso solver. This method is faster than the exact binary program method but is not guaranteed to find the optimal solution. It is recommended to use the exact binary program method if possible. See binary_program_method_options()
for more information on how to set up the approximate method as some arguments for the lasso solver should be specified. For more information on how this works, please also see the referenced paper.
The stepwise, simulated annealing, and L0 methods also select covariates like the binary program methods but they can be slower. They are presented merely for comparison purposes given they were used in the original paper.
The Wasserstein distance is a measure of distance between two probability distributions. It is defined as:
where is the set of all joint distributions with marginals
and
. The Wasserstein distance is a generalization of the Euclidean distance, which is the case when
. In our function we have argument
power
that corresponds to the of the equation above. The default
power
is 2.0
but any value greater than or equal to 1.0
is allowed. For more information, see the references.
The particular implementation of the Wasserstein distance is as follows. If is the original prediction from the original model, then we seek to find a new prediction
that minimizes the Wasserstein distance between the two:
.
object of class WpProj
, which is a list with the following slots:
call
: The call to the function
theta
: A list of the final parameter matrices for each returned model
fitted.values
: A list of the fitted values for each returned model
power
: The power of the Wasserstein distance used
method
: The method used to calculate the Wasserstein projections
solver
: The solver used to calculate the Wasserstein projections
niter
: The number of iterations used to calculate the Wasserstein projections. Not all methods return a number of iterations so this may be NULL
nzero
: The number of non zero coefficients in the final models
Dunipace, Eric and Lorenzo Trippa (2020) https://arxiv.org/abs/2012.09999.
if(rlang::is_installed("stats")) { # note we don't generate believable data with real posteriors # these examples are just to show how to use the function n <- 32 p <- 10 s <- 21 # covariates and coefficients x <- matrix( stats::rnorm( p * n ), nrow = n, ncol = p ) beta <- (1:10)/10 #outcome y <- x %*% beta + stats::rnorm(n) # fake posterior post_beta <- matrix(beta, nrow=p, ncol=s) + stats::rnorm(p*s, 0, 0.1) post_mu <- x %*% post_beta #posterior predictive distributions # fit models ## L1 model fit.p2 <- WpProj(X=x, eta=post_mu, power = 2.0, method = "L1", #default solver = "lasso" #default ) ## approximate binary program fit.p2.bp <- WpProj(X=x, eta=post_mu, theta = post_beta, power = 2.0, method = "binary program", solver = "lasso" #default because approximate algorithm is faster ) ## compare performance by measuring distance from full model dc <- distCompare(models = list("L1" = fit.p2, "BP" = fit.p2.bp)) if(rlang::is_installed(c("ggplot2","ggsci"))) { plot(dc) } ## compare performance by measuring the relative distance between a null model ## and the predictions of interest as a pseudo R^2 r2.expect <- WPR2(predictions = post_mu, projected_model = dc) # can have negative values r2.null <- WPR2(projected_model = dc) # should be between 0 and 1 if(rlang::is_installed(c("ggplot2","ggsci"))) { plot(r2.null) } ## we can also examine how predictions change in the models for individual observations if(rlang::is_installed(c("ggplot2","ggsci","ggridges"))) { ridgePlot(fit.p2, index = 21, minCoef = 0, maxCoef = 10) } }
if(rlang::is_installed("stats")) { # note we don't generate believable data with real posteriors # these examples are just to show how to use the function n <- 32 p <- 10 s <- 21 # covariates and coefficients x <- matrix( stats::rnorm( p * n ), nrow = n, ncol = p ) beta <- (1:10)/10 #outcome y <- x %*% beta + stats::rnorm(n) # fake posterior post_beta <- matrix(beta, nrow=p, ncol=s) + stats::rnorm(p*s, 0, 0.1) post_mu <- x %*% post_beta #posterior predictive distributions # fit models ## L1 model fit.p2 <- WpProj(X=x, eta=post_mu, power = 2.0, method = "L1", #default solver = "lasso" #default ) ## approximate binary program fit.p2.bp <- WpProj(X=x, eta=post_mu, theta = post_beta, power = 2.0, method = "binary program", solver = "lasso" #default because approximate algorithm is faster ) ## compare performance by measuring distance from full model dc <- distCompare(models = list("L1" = fit.p2, "BP" = fit.p2.bp)) if(rlang::is_installed(c("ggplot2","ggsci"))) { plot(dc) } ## compare performance by measuring the relative distance between a null model ## and the predictions of interest as a pseudo R^2 r2.expect <- WPR2(predictions = post_mu, projected_model = dc) # can have negative values r2.null <- WPR2(projected_model = dc) # should be between 0 and 1 if(rlang::is_installed(c("ggplot2","ggsci"))) { plot(r2.null) } ## we can also examine how predictions change in the models for individual observations if(rlang::is_installed(c("ggplot2","ggsci","ggridges"))) { ridgePlot(fit.p2, index = 21, minCoef = 0, maxCoef = 10) } }
Function to Evaluate Performance
This function will calculate p-Wasserstein distances between the predictions of interest and the projected model.
WPR2( predictions = NULL, projected_model, p = 2, method = "exact", base = NULL, ... ) ## S4 method for signature 'ANY,matrix' WPR2( predictions = NULL, projected_model, p = 2, method = "exact", base = NULL, ... ) ## S4 method for signature 'ANY,distcompare' WPR2( predictions = NULL, projected_model, p = 2, method = "exact", base = NULL, ... ) ## S4 method for signature 'ANY,list' WPR2( predictions = NULL, projected_model, p = 2, method = "exact", base = NULL, ... ) ## S4 method for signature 'ANY,WpProj' WPR2( predictions = NULL, projected_model, p = 2, method = "exact", base = NULL, ... )
WPR2( predictions = NULL, projected_model, p = 2, method = "exact", base = NULL, ... ) ## S4 method for signature 'ANY,matrix' WPR2( predictions = NULL, projected_model, p = 2, method = "exact", base = NULL, ... ) ## S4 method for signature 'ANY,distcompare' WPR2( predictions = NULL, projected_model, p = 2, method = "exact", base = NULL, ... ) ## S4 method for signature 'ANY,list' WPR2( predictions = NULL, projected_model, p = 2, method = "exact", base = NULL, ... ) ## S4 method for signature 'ANY,WpProj' WPR2( predictions = NULL, projected_model, p = 2, method = "exact", base = NULL, ... )
predictions |
Predictions of interest, likely from the original model |
projected_model |
A matrix of competing predictions, possibly from a WpProj fit, a WpProj fit itself, or a list of WpProj objects |
p |
Power of the Wasserstein distance to use in distance calculations |
method |
Method for calculating Wasserstein distance |
base |
The baseline result to compare to. If not provided, defaults to the model with no covariates and only an intercept. |
... |
Arguments passed to Wasserstein distance calculation. See |
values
if (rlang::is_installed("stats")) { # this example is not a true posterior estimation, but is used for illustration n <- 32 p <- 10 s <- 21 x <- matrix( stats::rnorm(n*p), nrow = n, ncol = p ) beta <- (1:10)/10 y <- x %*% beta + stats::rnorm(n) post_beta <- matrix(beta, nrow=p, ncol=s) + matrix(rnorm(p*s), p, s) # not a true posterior post_mu <- x %*% post_beta fit <- WpProj(X=x, eta=post_mu, power = 2.0) out <- WPR2(predictions = post_mu, projected_model = fit, base = rowMeans(post_mu) # same as intercept only projection ) }
if (rlang::is_installed("stats")) { # this example is not a true posterior estimation, but is used for illustration n <- 32 p <- 10 s <- 21 x <- matrix( stats::rnorm(n*p), nrow = n, ncol = p ) beta <- (1:10)/10 y <- x %*% beta + stats::rnorm(n) post_beta <- matrix(beta, nrow=p, ncol=s) + matrix(rnorm(p*s), p, s) # not a true posterior post_mu <- x %*% post_beta fit <- WpProj(X=x, eta=post_mu, power = 2.0) out <- WPR2(predictions = post_mu, projected_model = fit, base = rowMeans(post_mu) # same as intercept only projection ) }
This function will measure how much removing each covariate harms prediction accuracy.
WPVI( X, eta, theta, pred.fun = NULL, p = 2, ground_p = 2, transport.method = transport_options(), epsilon = 0.05, OTmaxit = 0, display.progress = FALSE, parallel = NULL )
WPVI( X, eta, theta, pred.fun = NULL, p = 2, ground_p = 2, transport.method = transport_options(), epsilon = 0.05, OTmaxit = 0, display.progress = FALSE, parallel = NULL )
X |
Covariates |
eta |
Predictions from the estimated model |
theta |
Parameters from the estimated model. |
pred.fun |
A prediction function. must take variables x, theta as arguments: |
p |
Power of Wasserstein distance |
ground_p |
Power of distance metric |
transport.method |
Transport methods. See |
epsilon |
Hyperparameter for Sinkhorn iterations |
OTmaxit |
Maximum number of iterations for the Wasserstein method |
display.progress |
Display intermediate progress |
parallel |
a foreach backend if already created |
Returns an integer vector ranking covariate importance from most to least important.
n <- 128 p <- 10 s <- 99 x <- matrix(1, nrow = n, ncol = p ) beta <- (1:10)/10 y <- x %*% beta post_beta <- matrix(beta, nrow=p, ncol=s) post_mu <- x %*% post_beta fit <- WpProj(X=x, eta=post_mu, power = 2.0) WPVI(X = x, eta = post_mu, theta = post_beta, transport.method = "hilbert")
n <- 128 p <- 10 s <- 99 x <- matrix(1, nrow = n, ncol = p ) beta <- (1:10)/10 y <- x %*% beta post_beta <- matrix(beta, nrow=p, ncol=s) post_mu <- x %*% post_beta fit <- WpProj(X=x, eta=post_mu, power = 2.0) WPVI(X = x, eta = post_mu, theta = post_beta, transport.method = "hilbert")