{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Proximal gradient algorithm\n",
    "This first tutorial details how to use a proximal gradient algorithm\n",
    "to solve the LASSO problem:\n",
    "$$\n",
    "\\min_u \\;f(u) =  \\| A u - b \\|_2^2 + \\lambda \\|u \\|_1\n",
    "$$\n",
    "with $A$ given matrix, and $b$ vector of observation.\n",
    "As the function $f$ is non-smooth and additive, the resolution of the LASSO\n",
    "problem is a good use case for the proximal gradient algorithm.\n",
    "\n",
    "## Settings\n",
    "We import the usual packages:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "using Printf, Random\n",
    "using LinearAlgebra\n",
    "using JuMP"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Fix seed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "Random.seed!(2713);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We first generate artificial data to study the algorithm:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "include(\"data.jl\")\n",
    "\n",
    "nVariables = 50;\n",
    "nCassures = 50;\n",
    "xMin, xMax = build_bounds(nVariables)\n",
    "A = build_X(nVariables, nCassures, xMin, xMax);\n",
    "b = randn(nCassures)\n",
    "λ = 50.0;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Build oracle for objective:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "f(u) = 0.5 * norm(A*u - b, 2)^2 + λ * norm(u, 1);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "and we add a function to evaluate the prox operator of the $\\ell_1$-norm:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "function prox_g(v, u, ε, λ)\n",
    "    step = ε * λ\n",
    "    for i in 1:length(u)\n",
    "        # TODO: code prox of component i here\n",
    "    end\n",
    "    return v\n",
    "end;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Get optimal solution with a QP solver"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "using OSQP\n",
    "# using CPLEX\n",
    "\n",
    "function get_solution(A, b, λ)\n",
    "    n, m = size(A)\n",
    "    model = Model(OSQP.Optimizer)\n",
    "    JuMP.set_silent(model)\n",
    "    # Variable u\n",
    "    @variable(model, u[1:n])\n",
    "    # Variable t, linearizing the \\|.\\|_1 term\n",
    "    @variable(model, t[1:n])\n",
    "    @constraint(model, t .>= u)\n",
    "    @constraint(model, t .>= -u)\n",
    "    @objective(model, Min, 0.5 * dot(A*u - b, A*u - b) + λ * sum(t))\n",
    "    JuMP.optimize!(model)\n",
    "    return JuMP.objective_value(model)\n",
    "end\n",
    "optsol = get_solution(A, b, λ)\n",
    "println(\"Optimal solution is equal to \", optsol)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Proximal gradient algorithm\n",
    "At iteration $k$ and iterate $u_k$, the proximal gradient algorithm computes\n",
    "$$\n",
    "\\begin{aligned}\n",
    "& v_k = u_k - \\varepsilon_k \\nabla F(u_k) \\\\\n",
    "& u_{k+1} = \\text{prox}_{\\varepsilon_k g}(v_k)\n",
    "\\end{aligned}\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "function proximal_gradient(A, b, u0, ε; maxit=1000)\n",
    "    # iterate\n",
    "    u = copy(u0)\n",
    "    # intermediate vector\n",
    "    v = similar(u0)\n",
    "    # instantiate array to store trace\n",
    "    trace_f = Float64[]\n",
    "    push!(trace_f, f(u))\n",
    "\n",
    "    for nit in 1:maxit\n",
    "        # TODO: code proximal gradient algorithm here\n",
    "        # evaluate objective at current point\n",
    "        f_k = #TODO\n",
    "        push!(trace_f, f_k)\n",
    "    end\n",
    "    return trace_f\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We know that the optimal step is given by the inverse of the Lipschitz\n",
    "coefficient $L$ of the gradient of $F$, the smooth part of the objective.\n",
    "As $F(u) = 0.5 \\|A u - b \\|_2^2$, we get"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "L = eigmax(A' * A)\n",
    "step = 1 / L"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We choose as initial point the vector $u_0 = 0$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "u0 = zeros(nVariables)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Accelerated proximal gradient\n",
    "The accelerated proximal gradient algorithm is a variant of the proximal\n",
    "gradient algorithm, which is known for its better performance.\n",
    "\n",
    "At iteration $k$ and iterate $u_k$, the accelerated proximal gradient\n",
    "algorithm computes\n",
    "$$\n",
    "\\begin{aligned}\n",
    "& w_k = u_k - \\varepsilon_k \\nabla F(u_k) \\\\\n",
    "& v_{k+1} = \\text{prox}_{\\varepsilon_k g}(w_k) \\\\\n",
    "& u_{k+1} = v_{k+1} + \\dfrac{k - 1}{k+2} (v_{k+1} - v_k)\n",
    "\\end{aligned}\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "function accelerated_proximal_gradient(A, b, u0, ε; maxit=1000)\n",
    "    # intermediate vector\n",
    "    u = copy(u0)\n",
    "    v_next = copy(u0)\n",
    "    v_prev = copy(u0)\n",
    "    w = similar(u0)\n",
    "    trace_f = Float64[]\n",
    "    push!(trace_f, f(u))\n",
    "\n",
    "    for nit in 1:maxit\n",
    "        # TODO: code accelerated proximal gradient here\n",
    "        f_k = # TODO\n",
    "        push!(trace_f, f_k)\n",
    "        copy!(v_prev, v_next)\n",
    "    end\n",
    "    return trace_f\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Final results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "using PyPlot\n",
    "trace_1 = proximal_gradient(A, b, u0, step)\n",
    "trace_2 = accelerated_proximal_gradient(A, b, u0, step)\n",
    "\n",
    "plot(log10.(trace_1 .- optsol), lw=2., label=\"Prox. gradient\")\n",
    "plot(log10.(abs.(trace_2 .- optsol)), lw=2., label=\"Accelerated gradient\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Julia 1.2.0",
   "language": "julia",
   "name": "julia-1.2"
  },
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "1.2.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}