CVXR package on R Views

CVXR: A Direct Standardization Example

Fri, 20 Jul 2018 00:00:00 +0000

In our first blog post, we introduced CVXR, an R package for disciplined convex optimization, and showed how to model and solve a non-negative least squares problem using its interface. This time, we will tackle a non-parametric estimation example, which features new atoms as well as more complex constraints.

Direct Standardization

Consider a set of observations \((x_i,y_i)\), \(i = 1,\ldots,m\), drawn non-uniformly from an unknown distribution, and let \(X \in {\mathbf R}^{m \times n}\) be the matrix whose rows are the \(x_i\). We know the expected value of the columns of \(X\), denoted by \(b \in {\mathbf R}^n\), and want to estimate the true distribution of \(y\). This situation may arise, for instance, if we wish to analyze the health of a population based on a sample skewed toward young males, when the population-level averages of sex, age, etc. are known.

A naive approach would be to simply take the empirical distribution that places equal probability \(1/m\) on each \(y_i\). However, this is not a good estimation strategy when our sample is unbalanced. Instead, we will use the method of direct standardization (Fleiss, Levin, and Paik 2003, 19.5): we solve for weights \(w \in {\mathbf R}^m\) of a weighted empirical distribution, \(y = y_i\) with probability \(w_i\), which rectifies the skewness of the sample. This can be posed as the convex optimization problem

\[ \begin{array}{ll} \underset{w}{\mbox{maximize}} & \sum_{i=1}^m -w_i\log w_i \\ \mbox{subject to} & w \geq 0, \quad \sum_{i=1}^m w_i = 1,\quad X^Tw = b. \end{array} \]

Our objective is the total entropy, which is concave on \({\mathbf R}_+^m\). The constraints ensure \(w\) is a probability vector that induces our known expectations over the columns of \(X\), i.e., \(\sum_{i=1}^m w_iX_{ij} = b_j\) for \(j = 1,\ldots,n\).

An Example with Simulated Data

As an example, we generate a population of 1000 data points \(x_{i,1} \sim \mbox{Bernoulli}(0.5)\), \(x_{i,2} \sim \mbox{Uniform}(10,60)\), and \(y_i \sim N(5x_{i,1} + 0.1x_{i,2},1)\). We calculate \(b_j\) to be the mean over \(x_{.,j}\) for \(j = 1,2\). Then we construct a skewed sample of \(m = 100\) points that over-represents small values of \(y_i\), thus biasing its distribution downwards.
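
The post does not show the code behind this simulation. The sketch below is one way to produce data matching the description. The seed, the name `n_pop`, and in particular the skewing rule (sampling each point with probability proportional to \(\exp(-0.3y_i)\)) are assumptions made for illustration, not the procedure used for the figures in this post.

```r
# Simulate the population described above (seed chosen arbitrarily)
set.seed(123)
n_pop <- 1000                                  # population size (hypothetical name)
x1 <- rbinom(n_pop, size = 1, prob = 0.5)      # x_{i,1} ~ Bernoulli(0.5)
x2 <- runif(n_pop, min = 10, max = 60)         # x_{i,2} ~ Uniform(10, 60)
ypop <- rnorm(n_pop, mean = 5 * x1 + 0.1 * x2, sd = 1)
Xpop <- cbind(x1, x2)

# Known population-level expectations of the columns of X
b <- colMeans(Xpop)

# ASSUMED skewing rule: favor small y by sampling with weights exp(-0.3 * y)
m <- 100
idx <- sample(n_pop, size = m, replace = FALSE, prob = exp(-0.3 * ypop))
X <- Xpop[idx, ]
y <- ypop[idx]

# Compare the population mean of y with the mean of the skewed sample
c(population = mean(ypop), sample = mean(y))
```

Any rule that makes small \(y_i\) more likely to be drawn would serve the same purpose; the exponential weights are just a convenient choice.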

Using CVXR, we construct the direct standardization problem. We first define the variable \(w\).

library(CVXR)
w <- Variable(m)

Then, we form the objective function by combining CVXR’s library of operators and atoms.

objective <- Maximize(sum(entr(w)))

Here, entr is the element-wise entropy atom; the S4 object entr(w) represents an \(m\)-dimensional vector with entries \(-w_i\log(w_i)\) for \(i=1,\ldots,m\). The sum operator acts exactly as expected, forming an expression that is the sum of the entries in this vector. (For a full list of atoms, see the function reference page.)
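
As a plain-R illustration of what the entropy terms look like numerically, the sketch below evaluates \(-w_i\log w_i\) with the usual convention that the term is 0 at \(w_i = 0\). This is not CVXR's implementation, and the helper name `entr_numeric` is hypothetical. It shows that uniform weights attain the largest total entropy, \(\log m\), while a point mass has entropy 0.

```r
# Element-wise entropy terms -w * log(w), with the convention 0 * log(0) = 0
entr_numeric <- function(w) {
  out <- numeric(length(w))
  pos <- w > 0
  out[pos] <- -w[pos] * log(w[pos])
  out
}

w_uniform <- rep(1 / 4, 4)          # equal weight on four points
w_skewed <- c(0.7, 0.1, 0.1, 0.1)   # most of the mass on one point
w_point <- c(1, 0, 0, 0)            # all of the mass on one point

sum(entr_numeric(w_uniform))        # equals log(4)
sum(entr_numeric(w_skewed))         # smaller than log(4)
sum(entr_numeric(w_point))          # 0
```

This is why maximizing entropy subject to the constraints picks the feasible weights that stay as close to uniform as the constraints allow.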

Our next step is to generate the list of constraints. Note that, by default, the relational operators apply over all entries in a vector or matrix.

constraints <- list(w >= 0, sum(w) == 1, t(X) %*% w == b)

Finally, we are ready to formulate and solve the problem.

prob <- Problem(objective, constraints)
result <- solve(prob)
weights <- result$getValue(w)
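
One way to sanity-check weights like these without CVXR is to solve the same problem through its Lagrangian dual in base R. Setting the derivative of the Lagrangian to zero gives weights of the form \(w_i \propto \exp(\lambda^Tx_i)\), where \(\lambda\) minimizes \(\log\sum_{i=1}^m \exp(\lambda^T(x_i - b))\). The sketch below uses this swapped-in technique with optim(); it is a cross-check, not what CVXR does internally. The toy sample, the target vector `b`, and the helper names are all hypothetical.

```r
set.seed(42)
m <- 100
X <- cbind(rbinom(m, 1, 0.5), runif(m, 10, 60))  # toy sample (hypothetical)
b <- c(0.6, 38)                                  # hypothetical target means

Xc <- sweep(X, 2, b)          # rows are x_i - b

# Weights proportional to exp(lambda' x_i), normalized to sum to 1
tilt_weights <- function(lambda) {
  z <- drop(Xc %*% lambda)
  w <- exp(z - max(z))        # subtract the max for numerical stability
  w / sum(w)
}

# Dual objective log(sum(exp(lambda' (x_i - b)))) and its gradient
dual <- function(lambda) {
  z <- drop(Xc %*% lambda)
  max(z) + log(sum(exp(z - max(z))))
}
dual_grad <- function(lambda) drop(crossprod(Xc, tilt_weights(lambda)))

fit <- optim(c(0, 0), dual, dual_grad, method = "BFGS",
             control = list(reltol = 1e-12, maxit = 500))
w_dual <- tilt_weights(fit$par)

sum(w_dual)                   # the weights form a probability vector
drop(crossprod(X, w_dual))    # weighted column means, to compare with b
```

The gradient of the dual is exactly the constraint residual \(X^Tw - b\), so driving it to zero enforces the moment constraints. This only works when \(b\) lies strictly inside the convex hull of the sampled \(x_i\); otherwise the original problem has no strictly positive solution.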

Using our optimal weights, we can then re-weight our skewed sample and compare it to the population distribution. Below, we plot the density functions using linear approximations for the range of \(y\).

library(ggplot2)
library(tidyr)    # provides gather()

## Approximate density functions
dens1 <- density(ypop)
dens2 <- density(y)
dens3 <- density(y, weights = weights)
yrange <- seq(-3, 15, 0.01)
d <- data.frame(x = yrange,
                True = approx(x = dens1$x, y = dens1$y, xout = yrange)$y,
                Sample = approx(x = dens2$x, y = dens2$y, xout = yrange)$y,
                Weighted = approx(x = dens3$x, y = dens3$y, xout = yrange)$y)

## Plot probability distribution functions
plot.data <- gather(data = d, key = "Type", value = "Estimate", True, Sample, Weighted,
                    factor_key = TRUE)
ggplot(plot.data) +
    geom_line(mapping = aes(x = x, y = Estimate, color = Type)) +
    theme(legend.position = "top")

## Warning: Removed 300 rows containing missing values (geom_path).

Figure 1: Probability distribution functions: population, skewed sample and reweighted sample

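## Return the (weighted) empirical cumulative distribution function
## as a two-column matrix: sorted data values and cumulative probabilities
get_cdf <- function(data, probs) {
    ## Default to equal probability on each observation
    if(missing(probs))
        probs <- rep(1.0/length(data), length(data))
    distro <- cbind(data, probs)
    dsort <- distro[order(distro[,1]),]    # sort rows by data value
    cumprobs <- base::cumsum(dsort[,2])    # running total of the probabilities
    cbind(dsort[,1], cumprobs)
}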

## Plot cumulative distribution functions
d1 <- data.frame("True", get_cdf(ypop))
d2 <- data.frame("Sample", get_cdf(y))
d3 <- data.frame("Weighted", get_cdf(y, weights))

names(d1) <- names(d2) <- names(d3) <- c("Type", "x", "Estimate")
plot.data <- rbind(d1, d2, d3)

ggplot(plot.data) +
    geom_line(mapping = aes(x = x, y = Estimate, color = Type)) +
    theme(legend.position = "top")

Figure 2: Cumulative distribution functions: population, skewed sample and reweighted sample
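
To make the reweighting step concrete, here is a toy weighted empirical CDF in base R. The helper name `weighted_cdf` and the three-point data set are hypothetical; the logic mirrors the plotting helper above, which sorts the observations and accumulates their probabilities.

```r
# Weighted empirical CDF: sort the data, then accumulate the matching weights
weighted_cdf <- function(data, probs = rep(1 / length(data), length(data))) {
  ord <- order(data)
  cbind(x = data[ord], cdf = cumsum(probs[ord]))
}

y_toy <- c(3, 1, 2)           # toy observations (hypothetical)
w_toy <- c(0.5, 0.2, 0.3)     # toy weights summing to 1 (hypothetical)

res <- weighted_cdf(y_toy, w_toy)
res                           # x = 1, 2, 3 with cdf = 0.2, 0.5, 1.0
```

With equal weights this reduces to the ordinary empirical CDF; with the direct standardization weights, observations that are over-represented in the sample contribute smaller jumps.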

As is clear from the plots, the sample probability distribution peaks around \(y = 2.0\), and its cumulative distribution is shifted left from the population’s curve, a result of the downward bias in our sampled \(y_i\). However, with the direct standardization weights, the new empirical distribution cleaves much closer to the true distribution shown in red.

We hope you’ve enjoyed this demonstration of CVXR. For more examples, check out our official site and recent presentation “Disciplined Convex Optimization with CVXR” at useR! 2018.

Solver Interfaces in CVXR

Mon, 09 Jul 2018 00:00:00 +0000

Introduction

In our previous blog post, we introduced CVXR, an R package for disciplined convex optimization. The package allows one to describe an optimization problem with Disciplined Convex Programming rules using high level mathematical syntax. Passing this problem definition along (with a list of constraints, if any) to the solve function transforms it into a form that can be handed off to a solver. The default installation of CVXR comes with two (imported) open source solvers:

ECOS and its mixed integer cousin ECOS_BB via the CRAN package ECOSolveR
SCS via the CRAN package scs.

CVXR (version 0.99) can also make use of several other open source solvers implemented in R packages:

The linear and mixed integer programming package lpSolve via the lpSolveAPI package
The linear and mixed integer programming package GLPK via the Rglpk package.

About Solvers

The real work of finding a solution is done by solvers, and writing good solvers is hard work. Furthermore, some solvers work particularly well for certain types of problems (linear programs, quadratic programs, etc.). Not surprisingly, there are commercial vendors who have solvers that are designed for performance and scale. Two well-known solvers are MOSEK and GUROBI. R packages for these solvers are also provided, but they require the problem data to be constructed in a specific form. Producing that form would take some additional work in the current version of CVXR, and it is something we plan to include in future versions. However, it is also true that these commercial solvers expose a much richer API to Python programmers than to R programmers. How, then, do we interface such solvers with R as quickly as possible, at least in the short term?

Reticulate to the Rescue

The current version of CVXR exploits the reticulate package for commercial solvers such as MOSEK and GUROBI. We took the Python solver interfaces in CVXPY version 0.4.11, edited them suitably to make them self-contained, and hooked them up to reticulate.

This means that one needs two prerequisites to use these commercial solvers in the current version of CVXR:

A Python installation
The reticulate R package.
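
Before requesting a commercial solver, it can help to confirm that both prerequisites are in place. The sketch below is one way to do so, assuming the Python modules are named `mosek` and `gurobipy` (the vendors' usual module names); the variable names are hypothetical. It uses reticulate's py_available() and py_module_available() and falls back to FALSE when reticulate itself is missing.

```r
# Check the two prerequisites, then look for the solvers' Python modules
has_reticulate <- requireNamespace("reticulate", quietly = TRUE)
has_python <- has_reticulate && reticulate::py_available(initialize = TRUE)

# Module names are assumptions: "mosek" for MOSEK, "gurobipy" for GUROBI
has_mosek <- has_python && reticulate::py_module_available("mosek")
has_gurobi <- has_python && reticulate::py_module_available("gurobipy")

c(reticulate = has_reticulate, python = has_python,
  mosek = has_mosek, gurobi = has_gurobi)
```

A FALSE in the last two entries usually means the module is installed into a different Python environment than the one reticulate picked up.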

Installing MOSEK/GUROBI

Both MOSEK and GUROBI provide academic versions (registration required) free of charge. For example, Anaconda users can install MOSEK with the command:

conda install -c mosek mosek

Others can use the pip command:

pip install -f https://download.mosek.com/stable/wheel/index.html Mosek

GUROBI is handled in a similar fashion. The solvers must be activated using a license provided by the vendor.

Once activated, one can check that CVXR recognizes the solvers; installed_solvers() should list them.

> installed_solvers()
[1] "ECOS"    "ECOS_BB" "SCS"     "MOSEK"   "LPSOLVE" "GLPK"    "GUROBI"
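
Once a solver appears in this list, it can be requested by name through the solver argument of solve(). The sketch below uses a toy one-variable problem (hypothetical, not from this post) and ECOS, which ships with the default installation; swapping in "MOSEK" or "GUROBI" should work the same way once they are listed and licensed.

```r
library(CVXR)

# Toy problem: minimize (x - 3)^2 subject to x <= 2; the optimum is x = 2
x <- Variable(1)
prob <- Problem(Minimize((x - 3)^2), list(x <= 2))

# Only request a solver that CVXR reports as installed
solver_name <- "ECOS"
stopifnot(solver_name %in% installed_solvers())

result <- solve(prob, solver = solver_name)
result$status
x_opt <- as.numeric(result$getValue(x))
x_opt
```

Checking installed_solvers() first gives a clearer failure than letting solve() report that the requested solver is unavailable.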

Further information

More information on these solvers, along with a number of tutorial examples, is available on the CVXR site. If you are attending useR! 2018, you can catch Anqi’s CVXR talk on Friday, July 13.

CVXR: An R Package for Disciplined Convex Optimization

Mon, 27 Nov 2017 00:00:00 +0000

At long last, we are pleased to announce the release of CVXR!

First introduced at useR! 2016, CVXR is an R package that provides an object-oriented language for convex optimization, similar to CVX, CVXPY, YALMIP, and Convex.jl. It allows the user to formulate convex optimization problems in a natural mathematical syntax, then automatically verifies the problem’s convexity with disciplined convex programming (DCP) and converts it into the appropriate form for a specific solver. This makes CVXR ideal for rapidly prototyping new statistical models. More information is available at the official site.

This is the first of a series of blog posts about CVXR. In this post, we will introduce the semantics of our package and dive into a simple example, which gives users an idea of CVXR’s power.

Convex Optimization

A convex optimization problem has the form \[ \begin{array}{ll} \underset{v}{\mbox{minimize}} & f_0(v)\\ \mbox{subject to} & f_i(v) \leq 0, \quad i=1,\ldots,M\\ & g_i(v) = 0, \quad i=1,\ldots,P \end{array} \] where \(v\) is the variable, \(f_0\) and \(f_1,\ldots,f_M\) are convex, and \(g_1,\ldots,g_P\) are affine. In CVXR, variables, expressions, objectives, and constraints are all represented by S4 objects. Users define a problem by combining constants and variables with a library of basic functions (atoms) provided by the package. When solve() is called, CVXR converts the S4 object into a standard form, sends it to the user-specified solver, and retrieves the results. Let’s see an example of this in action.
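
As a small illustration of the S4 representation and the DCP check, the sketch below builds two toy problems (both hypothetical) and asks CVXR whether each follows the DCP rules. It assumes the is_dcp() function listed in the package's function reference.

```r
library(CVXR)

v <- Variable(2)
isS4(v)                       # variables are S4 objects

# Convex objective with an affine equality constraint: follows the DCP rules
ok <- Problem(Minimize(sum(v^2)), list(sum(v) == 1))
is_dcp(ok)

# Minimizing a concave function (sum of square roots) violates the DCP rules
bad <- Problem(Minimize(sum(sqrt(v))))
is_dcp(bad)
```

A problem that fails this check is rejected before any solver is called, which is how CVXR verifies convexity without inspecting the numerical data.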

Ordinary Least Squares (OLS)

We begin by generating data for an ordinary least squares problem.

library(CVXR)
library(MASS)    # provides mvrnorm()
set.seed(1)
s <- 1
n <- 10
m <- 300
mu <- rep(0, 9)
Sigma <- cbind(c(1.6484, -0.2096, -0.0771, -0.4088, 0.0678, -0.6337, 0.9720, -1.2158, -1.3219),
               c(-0.2096, 1.9274, 0.7059, 1.3051, 0.4479, 0.7384, -0.6342, 1.4291, -0.4723),
               c(-0.0771, 0.7059, 2.5503, 0.9047, 0.9280, 0.0566, -2.5292, 0.4776, -0.4552),
               c(-0.4088, 1.3051, 0.9047, 2.7638, 0.7607, 1.2465, -1.8116, 2.0076, -0.3377),
               c(0.0678, 0.4479, 0.9280, 0.7607, 3.8453, -0.2098, -2.0078, -0.1715, -0.3952),
               c(-0.6337, 0.7384, 0.0566, 1.2465, -0.2098, 2.0432, -1.0666,  1.7536, -0.1845),
               c(0.9720, -0.6342, -2.5292, -1.8116, -2.0078, -1.0666, 4.0882,  -1.3587, 0.7287),
               c(-1.2158, 1.4291, 0.4776, 2.0076, -0.1715, 1.7536, -1.3587, 2.8789, 0.4094),
               c(-1.3219, -0.4723, -0.4552, -0.3377, -0.3952, -0.1845, 0.7287, 0.4094, 4.8406))
X <- mvrnorm(m, mu, Sigma)
X <- cbind(rep(1, m), X)
trueBeta <- c(0, 0.8, 0, 1, 0.2, 0, 0.4, 1, 0, 0.7)
y <- X %*% trueBeta + rnorm(m, 0, s)

Here, m is the number of observations, n is the number of predictors (including the intercept column), y is the response, and X is the matrix of predictors. In CVXR, we first instantiate the optimization variable.

beta <- Variable(n)

beta is a Variable S4 object, which does not contain a value yet.

In the next line, we define the objective function.

objective <- Minimize(sum((y - X %*% beta)^2))

The expression sum((y - X %*% beta)^2) is another S4 object, created by combining y, X, and beta using the basic summation, subtraction, matrix multiplication, and power atoms. Thus, the call to Minimize() does not return its minimum value (after all, beta doesn’t have a value yet, so we cannot evaluate the expression), but simply defines the goal of our optimization problem.

Finally, we construct the Problem and call solve(), which invokes the default ECOS solver.

prob <- Problem(objective)
CVXR_result <- solve(prob)

This returns a list containing, among other things, the solver status, the objective value, and a function getValue() that takes a Variable as input and retrieves its optimal value from the solution.

CVXR_result$status           # solution status by solver

## [1] "optimal"

CVXR_result$value            # optimal objective value

## [1] 340.3187

cvxrBeta <- CVXR_result$getValue(beta)   # optimal value of beta

Below, we plot the CVXR coefficients beside the coefficients found by lm().

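library(ggplot2)
p <- length(trueBeta)
lm_result <- lm(y ~ 0 + X)
lmBeta <- coef(lm_result)
## stringsAsFactors = TRUE makes coeff a factor (no longer the default since R 4.0.0),
## which levels(df$coeff) below relies on
df <- data.frame(coeff = rep(paste0("beta[", seq_along(lmBeta) - 1L, "]"), 2),
                beta = c(lmBeta, cvxrBeta),
                type = c(rep("OLS", p), rep("CVXR", p)),
                stringsAsFactors = TRUE)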
ggplot(data = df, mapping = aes(x = coeff, y = beta)) +
    geom_bar(mapping = aes(fill = type), stat = "identity", position = "dodge") +
    scale_x_discrete(labels = parse(text = levels(df$coeff)))

As expected, the two sets of estimates agree, up to the solver's numerical tolerance. Obviously, if all you want to fit is OLS, you should use lm(). The chief advantage of CVXR is its ability to quickly modify and adapt a problem, as we illustrate in the next section.
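
The agreement can also be checked numerically rather than read off a bar chart. The sketch below does this on a small simulated data set (hypothetical, and much smaller than the one above), requesting ECOS explicitly so that the comparison does not depend on which solver is the default in the installed version of CVXR.

```r
library(CVXR)

set.seed(2)
m <- 50
X <- cbind(1, rnorm(m), rnorm(m))             # intercept plus two predictors
y <- X %*% c(1, 2, -0.5) + rnorm(m)           # toy response (hypothetical)

beta <- Variable(3)
fit_cvxr <- solve(Problem(Minimize(sum((y - X %*% beta)^2))), solver = "ECOS")
fit_lm <- lm(y ~ 0 + X)

# Largest absolute difference between the two coefficient vectors
max_diff <- max(abs(as.numeric(fit_cvxr$getValue(beta)) -
                    as.numeric(coef(fit_lm))))
max_diff
```

The difference is not exactly zero because the conic solver stops at a numerical tolerance, whereas lm() uses a direct QR decomposition.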

Non-Negative Least Squares (NNLS)

In many situations, we can greatly improve our model by constraining the solution to reflect our prior knowledge. For example, we may know that beta must be non-negative. We can easily incorporate this knowledge into our problem by passing an additional argument to the Problem() constructor, which specifies a list of constraints.

prob2 <- Problem(objective, list(beta >= 0))

Again, counter to what you might expect, the expression beta >= 0 does not return TRUE or FALSE the way 1.3 >= 0 would. Instead, the ==, <=, and >= operators have been overloaded to return Constraint S4 objects, which will be used by the solver to enforce the problem’s constraints. We then re-solve the new problem with the non-negativity constraint.

CVXR_result2 <- solve(prob2)
cvxrBetaNNLS <- CVXR_result2$getValue(beta)
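
To see the effect of the constraint in isolation, here is a self-contained toy comparison (simulated data, hypothetical, with ECOS requested explicitly). The same objective is solved with and without beta >= 0; the constrained fit cannot go negative, and its objective value can only be equal to or larger than the unconstrained one.

```r
library(CVXR)

set.seed(3)
m <- 60
X <- cbind(rnorm(m), rnorm(m))
y <- X %*% c(1.5, 0) + rnorm(m, sd = 0.5)     # second true coefficient is 0

beta <- Variable(2)
objective <- Minimize(sum((y - X %*% beta)^2))

ols <- solve(Problem(objective), solver = "ECOS")
nnls <- solve(Problem(objective, list(beta >= 0)), solver = "ECOS")

beta_ols <- as.numeric(ols$getValue(beta))
beta_nnls <- as.numeric(nnls$getValue(beta))
cbind(OLS = beta_ols, NNLS = beta_nnls)
c(OLS = ols$value, NNLS = nnls$value)         # residual sums of squares
```

Because only the constraint list changes between the two calls, the objective is defined once and reused, which is the modify-and-re-solve workflow this section describes.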

As you can see in the figure below, adding that one constraint produced a massive improvement in the accuracy of the estimates. Not only are the NNLS estimates much closer to the true coefficients than the OLS estimates, but they have even managed to recover the correct sparsity structure in this case.

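## stringsAsFactors = TRUE keeps coeff a factor so that levels(df$coeff) works in R >= 4.0.0
df <- data.frame(coeff = rep(paste0("beta[", seq_along(trueBeta) - 1L, "]"), 3),
                beta = c(trueBeta, lmBeta, cvxrBetaNNLS),
                type = c(rep("Actual", p), rep("OLS", p), rep("NNLS", p)),
                stringsAsFactors = TRUE)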
ggplot(data = df, mapping = aes(x = coeff, y = beta)) +
    geom_bar(mapping = aes(fill = type), stat = "identity", position = "dodge") +
    scale_x_discrete(labels = parse(text = levels(df$coeff)))

As with OLS, there are already R packages available that implement NNLS. But that is actually an excellent demonstration of the power of CVXR: a single line of code here, namely prob2 <- Problem(objective, list(beta >= 0)), is doing the work of an entire package.

We hope this gives you an idea of the power of CVXR. In our next post, we will explore a non-parametric estimation problem that introduces more atoms and constraints.