{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Proximal gradient algorithm\n", "This first tutorial details how to use a proximal gradient algorithm\n", "to solve the LASSO problem:\n", "$$\n", "\\min_u \\;f(u) = \\| A u - b \\|_2^2 + \\lambda \\|u \\|_1\n", "$$\n", "with $A$ given matrix, and $b$ vector of observation.\n", "As the function $f$ is non-smooth and additive, the resolution of the LASSO\n", "problem is a good use case for the proximal gradient algorithm.\n", "\n", "## Settings\n", "We import the usual packages:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "using Printf, Random\n", "using LinearAlgebra\n", "using JuMP" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fix seed" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Random.seed!(2713);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We first generate artificial data to study the algorithm:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "include(\"data.jl\")\n", "\n", "nVariables = 50;\n", "nCassures = 50;\n", "xMin, xMax = build_bounds(nVariables)\n", "A = build_X(nVariables, nCassures, xMin, xMax);\n", "b = randn(nCassures)\n", "λ = 50.0;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Build oracle for objective:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "f(u) = 0.5 * norm(A*u - b, 2)^2 + λ * norm(u, 1);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and we add a function to evaluate the prox operator of the $\\ell_1$-norm:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "function prox_g(v, u, ε, λ)\n", " step = ε * λ\n", " for i in 1:length(u)\n", " # TODO: code prox of component i here\n", " end\n", " return v\n", "end;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get optimal solution with a QP solver" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "using OSQP\n", "# using CPLEX\n", "\n", "function get_solution(A, b, λ)\n", " n, m = size(A)\n", " model = Model(OSQP.Optimizer)\n", " JuMP.set_silent(model)\n", " # Variable u\n", " @variable(model, u[1:n])\n", " # Variable t, linearizing the \\|.\\|_1 term\n", " @variable(model, t[1:n])\n", " @constraint(model, t .>= u)\n", " @constraint(model, t .>= -u)\n", " @objective(model, Min, 0.5 * dot(A*u - b, A*u - b) + λ * sum(t))\n", " JuMP.optimize!(model)\n", " return JuMP.objective_value(model)\n", "end\n", "optsol = get_solution(A, b, λ)\n", "println(\"Optimal solution is equal to \", optsol)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Proximal gradient algorithm\n", "At iteration $k$ and iterate $u_k$, the proximal gradient algorithm computes\n", "$$\n", "\\begin{aligned}\n", "& v_k = u_k - \\varepsilon_k \\nabla F(u_k) \\\\\n", "& u_{k+1} = \\text{prox}_{\\varepsilon_k g}(v_k)\n", "\\end{aligned}\n", "$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "function proximal_gradient(A, b, u0, ε; maxit=1000)\n", " # iterate\n", " u = copy(u0)\n", " # intermediate vector\n", " v = similar(u0)\n", " # instantiate array to store trace\n", " trace_f = Float64[]\n", " push!(trace_f, f(u))\n", "\n", " for nit in 1:maxit\n", " # TODO: code proximal gradient algorithm here\n", " # evaluate objective at current point\n", " f_k = #TODO\n", " push!(trace_f, f_k)\n", " end\n", " return trace_f\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We know that the optimal step is given by the inverse of the Lipschitz\n", "coefficient $L$ of the gradient of $F$, the smooth part of the objective.\n", "As $F(u) = 0.5 \\|A u - b \\|_2^2$, we get" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "L = eigmax(A' * A)\n", "step = 1 / L" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We choose as initial point the vector $u_0 = 0$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "u0 = zeros(nVariables)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Accelerated proximal gradient\n", "The accelerated proximal gradient algorithm is a variant of the proximal\n", "gradient algorithm, which is known for its better performance.\n", "\n", "At iteration $k$ and iterate $u_k$, the accelerated proximal gradient\n", "algorithm computes\n", "$$\n", "\\begin{aligned}\n", "& w_k = u_k - \\varepsilon_k \\nabla F(u_k) \\\\\n", "& v_{k+1} = \\text{prox}_{\\varepsilon_k g}(w_k) \\\\\n", "& u_{k+1} = v_{k+1} + \\dfrac{k - 1}{k+2} (v_{k+1} - v_k)\n", "\\end{aligned}\n", "$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "function accelerated_proximal_gradient(A, b, u0, ε; maxit=1000)\n", " # intermediate vector\n", " u = copy(u0)\n", " v_next = copy(u0)\n", " v_prev = copy(u0)\n", " w = similar(u0)\n", " trace_f = Float64[]\n", " push!(trace_f, f(u))\n", "\n", " for nit in 1:maxit\n", " # TODO: code accelerated proximal gradient here\n", " f_k = # TODO\n", " push!(trace_f, f_k)\n", " copy!(v_prev, v_next)\n", " end\n", " return trace_f\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Final results" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "using PyPlot\n", "trace_1 = proximal_gradient(A, b, u0, step)\n", "trace_2 = accelerated_proximal_gradient(A, b, u0, step)\n", "\n", "plot(log10.(trace_1 .- optsol), lw=2., label=\"Prox. gradient\")\n", "plot(log10.(abs.(trace_2 .- optsol)), lw=2., label=\"Accelerated gradient\")" ] } ], "metadata": { "kernelspec": { "display_name": "Julia 1.2.0", "language": "julia", "name": "julia-1.2" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.2.0" } }, "nbformat": 4, "nbformat_minor": 2 }