Title: | Approximate and Exact Optimal Transport Methods |
---|---|
Description: | R and C++ functions to perform exact and approximate optimal transport. All C++ methods can be linked to other R packages via their header files. |
Authors: | Eric Dunipace [aut, cre] |
Maintainer: | Eric Dunipace <[email protected]> |
License: | GPL (== 3.0) |
Version: | 1.2 |
Built: | 2025-01-08 19:29:11 UTC |
Source: | https://github.com/ericdunipace/approxot |
R and C++ functions to perform exact and approximate optimal transport. All C++ methods are linkable to other R packages via their header files.
Eric Dunipace
Useful links:
Report bugs at https://github.com/ericdunipace/approxOT/issues
Transform transportation plan to transportation matrix
## S3 method for class 'transport.plan' as.matrix(x, ...)
## S3 method for class 'transport.plan' as.matrix(x, ...)
x |
An object of class 'transport.plan'. See output of (transport_plan)[transport_plan()] |
... |
Unused arguments |
A matrix specifying the minimal joint distribution between samples. Margins will be equal to the marginal distributions of the samples
set.seed(203987) n <- 5 d <- 2 x <- matrix(rnorm(d*n), nrow=d, ncol=n) y <- matrix(rnorm(d*n), nrow=d, ncol=n) #get hilbert sort orders for x in backwards way trans_plan <- transport_plan(X=x, Y=x, ground_p = 2, p = 2, observation.orientation = "colwise", method = "hilbert") trans_matrix <- as.matrix(trans_plan) print(trans_matrix)
set.seed(203987) n <- 5 d <- 2 x <- matrix(rnorm(d*n), nrow=d, ncol=n) y <- matrix(rnorm(d*n), nrow=d, ncol=n) #get hilbert sort orders for x in backwards way trans_plan <- transport_plan(X=x, Y=x, ground_p = 2, p = 2, observation.orientation = "colwise", method = "hilbert") trans_matrix <- as.matrix(trans_plan) print(trans_matrix)
Transform transportation matrix to transportation plan
as.transport.plan(transport_matrix, ...)
as.transport.plan(transport_matrix, ...)
transport_matrix |
A matrix that is a transportation matrix, i.e. the minimal joint distribution for two samples. |
... |
Unused arguments |
An object of class 'transport.plan'. See output of (transport_plan)[transport_plan]
set.seed(203987) n <- 5 d <- 2 x <- matrix(stats::rnorm(d*n), nrow=d, ncol=n) y <- matrix(stats::rnorm(d*n), nrow=d, ncol=n) #get hilbert sort orders for x in backwards way trans_plan <- transport_plan(X=x, Y=x, ground_p = 2, p = 2, observation.orientation = "colwise", method = "hilbert") trans_matrix <- as.matrix(trans_plan$tplan) tplan2 <- as.transport.plan(trans_matrix) all.equal(tplan2, trans_plan$tplan)
set.seed(203987) n <- 5 d <- 2 x <- matrix(stats::rnorm(d*n), nrow=d, ncol=n) y <- matrix(stats::rnorm(d*n), nrow=d, ncol=n) #get hilbert sort orders for x in backwards way trans_plan <- transport_plan(X=x, Y=x, ground_p = 2, p = 2, observation.orientation = "colwise", method = "hilbert") trans_matrix <- as.matrix(trans_plan$tplan) tplan2 <- as.transport.plan(trans_matrix) all.equal(tplan2, trans_plan$tplan)
Calculate cost matrix
cost_calc(X, Y, ground_p)
cost_calc(X, Y, ground_p)
X |
matrix of values in first sample. Observations should be by column, not rows. |
Y |
matrix of Values in second sample. Observations should be by column, not rows. |
ground_p |
power of the Lp norm to use in cost calculation. |
matrix of costs
X <- matrix(rnorm(10*100), 10, 100) Y <- matrix(rnorm(10*100), 10, 100) # the Euclidean distance cost <- cost_calc(X, Y, ground_p = 2)
X <- matrix(rnorm(10*100), 10, 100) Y <- matrix(rnorm(10*100), 10, 100) # the Euclidean distance cost <- cost_calc(X, Y, ground_p = 2)
Get order along the Hilbert curve
hilbert.projection(X, Sigma = NULL)
hilbert.projection(X, Sigma = NULL)
X |
matrix of values. Observations are unique by rows. |
Sigma |
Covariance of the data. If provided, uses a Mahalanobis distance. |
Index of orders
X <- matrix(rnorm(10*3), 3, 10) idx <- hilbert.projection(X) print(idx)
X <- matrix(rnorm(10*3), 3, 10) idx <- hilbert.projection(X) print(idx)
Check if function is a transport.plan
is.transport.plan(tplan)
is.transport.plan(tplan)
tplan |
An object of class 'transport.plan'. See output of (transport_plan)[transport_plan] |
Logical
set.seed(203987) n <- 5 d <- 2 x <- matrix(rnorm(d*n), nrow=d, ncol=n) y <- matrix(rnorm(d*n), nrow=d, ncol=n) #get hilbert sort orders for x in backwards way trans_plan <- transport_plan(X=x, Y=x, ground_p = 2, p = 2, observation.orientation = "colwise", method = "hilbert") print(is.transport.plan(trans_plan))
set.seed(203987) n <- 5 d <- 2 x <- matrix(rnorm(d*n), nrow=d, ncol=n) y <- matrix(rnorm(d*n), nrow=d, ncol=n) #get hilbert sort orders for x in backwards way trans_plan <- transport_plan(X=x, Y=x, ground_p = 2, p = 2, observation.orientation = "colwise", method = "hilbert") print(is.transport.plan(trans_plan))
Function returning supported optimal transportation methods.
transport_options()
transport_options()
The currently supported methods are
exact, networkflow: Utilize the networkflow algorithm to solve the exact optimal transport problem
shortsimplex: Use the shortsimplex algorithm to solve the exact optimal transport problem
sinkhorn: Use Sinkhorn's algorithm to solve the approximate optimal transport problem
sinkhorn_log: Use Sinkhorn's algorithm on a log-scale for added stability to solve the approximate optimal transport problem
greenkhorn: Use the Greenkhorn algorithm to solve the approximate optimal transport problem
hilbert: Use hilbert sorting to perform approximate optimal transport
rank: use the average covariate ranks to perform approximate optimal transport
univariate: Use appropriate optimal transport methods for univariate data
swapping: Utilize the swapping algorithm to perform approximate optimal transport
sliced: Use the sliced optimal transport distance
Returns a vector of supported transport methods
Optimal transport plans
transport_plan( X, Y, a = NULL, b = NULL, p = 2, ground_p = 2, observation.orientation = c("rowwise", "colwise"), method = transport_options(), ... )
transport_plan( X, Y, a = NULL, b = NULL, p = 2, ground_p = 2, observation.orientation = c("rowwise", "colwise"), method = transport_options(), ... )
X |
The covariate data of the first sample. |
Y |
The covariate data of the second sample. |
a |
Optional. Empirical measure of the first sample |
b |
Optional. Empirical measure of the second sample |
p |
The power of the Wasserstein distance |
ground_p |
The power of the Lp norm |
observation.orientation |
Are observations by row ("rowwise") or column ("colwise"). |
method |
Which transportation method to use. See [transport_options][transport_options] |
... |
Additional arguments for various methods
|
a list with slots "tplan" and "cost". "tplan" is the optimal transport plan and "cost" is the optimal transport distance.
set.seed(203987) n <- 100 d <- 10 x <- matrix(stats::rnorm(d*n), nrow=d, ncol=n) y <- matrix(stats::rnorm(d*n), nrow=d, ncol=n) #get hilbert sort orders for x in backwards way transx <- transport_plan(X=x, Y=x, ground_p = 2, p = 2, observation.orientation = "colwise", method = "hilbert")
set.seed(203987) n <- 100 d <- 10 x <- matrix(stats::rnorm(d*n), nrow=d, ncol=n) y <- matrix(stats::rnorm(d*n), nrow=d, ncol=n) #get hilbert sort orders for x in backwards way transx <- transport_plan(X=x, Y=x, ground_p = 2, p = 2, observation.orientation = "colwise", method = "hilbert")
Optimal transport plans given a pre-specified cost
transport_plan_given_C( mass_x, mass_y, p = 2, cost = NULL, method = "exact", cost_a = NULL, cost_b = NULL, ... )
transport_plan_given_C( mass_x, mass_y, p = 2, cost = NULL, method = "exact", cost_a = NULL, cost_b = NULL, ... )
mass_x |
The empirical measure of the first sample |
mass_y |
The empirical measure of the second sample. |
p |
The power of the Wasserstein distance |
cost |
Specify the cost matrix in advance. |
method |
The transportation method to use, one of "exact", "networkflow","shortsimplex", "sinkhorn", "greenkhorn" |
cost_a |
The cost matrix for the first sample with itself. Only used for unbiased Sinkhorn |
cost_b |
The cost matrix for the second sample with itself. Only used for unbiased Sinkhorn |
... |
Additional arguments for various methods
|
A transportation plan as an object of class "transport.plan", which is a list with slots "from","to", and "mass".
n <- 32 d <- 5 set.seed(293897) A <- matrix(stats::rnorm(n*d),nrow=d,ncol=n) B <- matrix(stats::rnorm(n*d),nrow=d,ncol=n) transp.meth <- "sinkhorn" niter <- 1e2 test <- transport_plan_given_C(rep(1/n,n), rep(1/n,n), 2, cost = cost_calc(A,B,2), "sinkhorn", niter = niter)
n <- 32 d <- 5 set.seed(293897) A <- matrix(stats::rnorm(n*d),nrow=d,ncol=n) B <- matrix(stats::rnorm(n*d),nrow=d,ncol=n) transp.meth <- "sinkhorn" niter <- 1e2 test <- transport_plan_given_C(rep(1/n,n), rep(1/n,n), 2, cost = cost_calc(A,B,2), "sinkhorn", niter = niter)
Multimarginal optimal transport plans
transport_plan_multimarg( ..., p = 2, ground_p = 2, observation.orientation = c("rowwise", "colwise"), method = c("hilbert", "univariate", "sliced"), nsim = 1000 )
transport_plan_multimarg( ..., p = 2, ground_p = 2, observation.orientation = c("rowwise", "colwise"), method = c("hilbert", "univariate", "sliced"), nsim = 1000 )
... |
Either data matrices as separate arguments or a list of data matrices. Arguments after the data must be specified by name. |
p |
The power of the Wasserstein distance to use |
ground_p |
The power of the Euclidean distance to use |
observation.orientation |
Are observations by rows or columns |
method |
One of "hilbert", "univariate", or "sliced" |
nsim |
Number of simulations to use for the sliced method |
transport plan
set.seed(23423) n <- 100 d <- 10 p <- ground_p <- 2 #euclidean cost, p = 2 x <- matrix(stats::rnorm((n + 11)*d), n + 11 , d) y <- matrix(stats::rnorm(n*d), n, d) z <- matrix(stats::rnorm((n +455)*d), n +455, d) # make data a list data <- list(x,y,z) tplan <- transport_plan_multimarg(data, p = p, ground_p = ground_p, observation.orientation = "rowwise", method = "hilbert") #' #transpose data works too datat <- lapply(data, t) tplan2 <- transport_plan_multimarg(datat, p = p, ground_p = ground_p, observation.orientation = "colwise",method = "hilbert")
set.seed(23423) n <- 100 d <- 10 p <- ground_p <- 2 #euclidean cost, p = 2 x <- matrix(stats::rnorm((n + 11)*d), n + 11 , d) y <- matrix(stats::rnorm(n*d), n, d) z <- matrix(stats::rnorm((n +455)*d), n +455, d) # make data a list data <- list(x,y,z) tplan <- transport_plan_multimarg(data, p = p, ground_p = ground_p, observation.orientation = "rowwise", method = "hilbert") #' #transpose data works too datat <- lapply(data, t) tplan2 <- transport_plan_multimarg(datat, p = p, ground_p = ground_p, observation.orientation = "colwise",method = "hilbert")
Calculate the Wasserstein distance
wasserstein( X = NULL, Y = NULL, a = NULL, b = NULL, cost = NULL, tplan = NULL, p = 2, ground_p = 2, method = transport_options(), cost_a = NULL, cost_b = NULL, ... )
wasserstein( X = NULL, Y = NULL, a = NULL, b = NULL, cost = NULL, tplan = NULL, p = 2, ground_p = 2, method = transport_options(), cost_a = NULL, cost_b = NULL, ... )
X |
The covariate data of the first sample. |
Y |
The covariate data of the second sample. |
a |
Optional. Empirical measure of the first sample |
b |
Optional. Empirical measure of the second sample |
cost |
Specify the cost matrix in advance. |
tplan |
Give a transportation plan with slots "from", "to", and "mass", like that returned by the [tranportation_plan()] function. |
p |
The power of the Wasserstein distance |
ground_p |
The power of the Lp norm |
method |
Which transportation method to use. See [transport_options()] |
cost_a |
The cost matrix for the first sample with itself. Only used for unbiased Sinkhorn |
cost_b |
The cost matrix for the second sample with itself. Only used for unbiased Sinkhorn |
... |
Additional arguments for various methods:
|
The p-Wasserstein distance, a numeric value
set.seed(11289374) n <- 100 z <- stats::rnorm(n) w <- stats::rnorm(n) uni <- approxOT::wasserstein(X = z, Y = w, p = 2, ground_p = 2, observation.orientation = "colwise", method = "univariate")
set.seed(11289374) n <- 100 z <- stats::rnorm(n) w <- stats::rnorm(n) uni <- approxOT::wasserstein(X = z, Y = w, p = 2, ground_p = 2, observation.orientation = "colwise", method = "univariate")