2019 CUHK-Shenzhen Workshop on Optimization Theory and Applications

Tutorial Session


  • Title: Introduction to Mathematics of Deep Learning

    Speaker: Ruoyu Sun, University of Illinois Urbana-Champaign

    Abstract: Deep learning has achieved great success in many artificial intelligence applications recently. It is often regarded as an empirical field, since many questions about the behavior of neural networks remain open. Nevertheless, recent theoretical research has greatly enhanced our understanding of how neural networks work. In this tutorial, we give a brief introduction to recent advances in the mathematical theory of deep learning. We will discuss some representative results in approximation theory and optimization theory, and also point out some outstanding open questions.


  • Title: Building (Stochastic, Distributed, Learned) First-Order Methods from Monotone Operators

    Speaker: Wotao Yin, University of California Los Angeles

    Abstract: We will provide a unified perspective of how a variety of first-order methods are developed to solve the problems at hand and why they converge. The tutorial will have three parts:

    1. We overview the foundations of monotone operators. Instead of using abstract algebra, we use elementary Euclidean geometry to illustrate many useful results regarding the convergence of operator iterations and their parameter selections. The geometric tool we use not only captures the main ideas but also serves as a rigorous proof device. (A minimal code sketch of such a fixed-point iteration follows this abstract.)


    2. We present a couple of basic yet powerful techniques in operator splitting. We illustrate how to use them to quickly develop or recover a variety of parallel, distributed, and decentralized algorithms.

    3. We present approaches to integrate classic operator-splitting methods with deep learning for better performance. They include iteration unfolding with trained parameters, safeguarding techniques, as well as the plug-and-play method (which uses learned operators in a classic fixed-point iteration).

    This tutorial incorporates material from multiple contributors (see the tutorial slides).
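
    As a minimal illustration of part 1 (our own toy example, not taken from the tutorial slides), the sketch below builds a fixed-point iteration from an averaged operator: forward-backward splitting applied to a small lasso problem, where the iteration map $T(x) = \mathrm{prox}_{t\lambda\|\cdot\|_1}(x - t\nabla f(x))$ is averaged whenever the step size $t$ is below $2/L$.

    import numpy as np

    # Toy lasso: minimize 0.5*||Ax - b||^2 + lam*||x||_1 via forward-backward splitting.
    # The update x <- prox_{t*lam*||.||_1}(x - t*grad f(x)) is a fixed-point iteration
    # with an averaged operator when t < 2/L (L = Lipschitz constant of grad f).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 20))
    b = rng.standard_normal(50)
    lam = 0.1

    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of grad f
    t = 1.0 / L                                # step size, safely below 2/L

    def soft_threshold(v, tau):
        """Proximal operator of tau*||.||_1."""
        return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

    x = np.zeros(20)
    for k in range(500):
        grad = A.T @ (A @ x - b)                          # forward (gradient) step
        x_new = soft_threshold(x - t * grad, t * lam)     # backward (proximal) step
        if np.linalg.norm(x_new - x) < 1e-8:              # fixed-point residual small: stop
            break
        x = x_new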


Presentation Titles and Abstracts (in program order)
  • Title: On Alternating Direction Methods of Multipliers: A Historical Perspective

    Speaker: Roland Glowinski, University of Houston

    Abstract: The Alternating Direction Method of Multipliers (ADMM) was introduced in 1974 and has been used (and still is) under the name of ALG2 for the numerical solution of various problems from Mechanics, Physics and Differential Geometry, among others. During the last decade and a half, ADMM has seen a surge of popularity, owing to its applicability to problems in Image Processing, Statistical Learning, Data Mining, etc. The main goals of this lecture are: (1) Provide historical facts concerning the origins of ADMM. (2) Give a general presentation of ADMM and related algorithms in the framework of Hilbert spaces. (3) Show the relationships between ADMM and some classical operator-splitting methods such as Douglas-Rachford and Peaceman-Rachford. (4) Present the results of numerical experiments concerning the application of ADMM to the solution of the Weber problem and of a non-convex problem from nonlinear Elasto-Dynamics.
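
    For readers unfamiliar with the method, the classical two-block ADMM in scaled form (a standard textbook statement, not specific to this lecture) applies to $\min_{x,z}\ f(x) + g(z)$ subject to $Ax + Bz = c$ and iterates

    $$\begin{aligned} x^{k+1} &= \arg\min_x\; f(x) + \tfrac{\beta}{2}\|Ax + Bz^k - c + u^k\|^2,\\ z^{k+1} &= \arg\min_z\; g(z) + \tfrac{\beta}{2}\|Ax^{k+1} + Bz - c + u^k\|^2,\\ u^{k+1} &= u^k + Ax^{k+1} + Bz^{k+1} - c, \end{aligned}$$

    where $u$ is the scaled multiplier and $\beta > 0$ is the penalty parameter.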


  • Title: On the Convergence of Multi-Block ADMM

    Speaker: Caihua Chen, Nanjing University

    Abstract: Recently, the multi-block alternating direction method of multipliers (ADMM) has found many efficient applications in various areas. This talk surveys recent developments in the convergence analysis of multi-block ADMM for convex optimization problems. Specifically, we will consider the direct extension of ADMM, deterministic variants of ADMM, and randomized variants of ADMM. Some interesting applications of multi-block ADMM to machine learning and management science are also discussed.
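
    To make "direct extension" concrete (standard notation, not taken from the talk): for $\min\, f_1(x_1)+f_2(x_2)+f_3(x_3)$ subject to $A_1x_1 + A_2x_2 + A_3x_3 = b$, the directly extended ADMM updates the blocks sequentially by minimizing the augmented Lagrangian $\mathcal{L}_\beta$ and then updates the multiplier:

    $$\begin{aligned} x_1^{k+1} &= \arg\min_{x_1}\ \mathcal{L}_\beta(x_1, x_2^k, x_3^k, \lambda^k),\\ x_2^{k+1} &= \arg\min_{x_2}\ \mathcal{L}_\beta(x_1^{k+1}, x_2, x_3^k, \lambda^k),\\ x_3^{k+1} &= \arg\min_{x_3}\ \mathcal{L}_\beta(x_1^{k+1}, x_2^{k+1}, x_3, \lambda^k),\\ \lambda^{k+1} &= \lambda^k - \beta\,(A_1x_1^{k+1} + A_2x_2^{k+1} + A_3x_3^{k+1} - b). \end{aligned}$$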


  • Title: Introduction to Meta-Learning and Generative Models with Applications

    Speaker: Hongyuan Zha, The Chinese University of Hong Kong, Shenzhen

    Abstract: This tutorial talk consists of two parts: 1) we will discuss the ideas and notions of meta-learning and how they can be related to alternative ways for solving problems in scientific computing and optimization. 2) we will give a brief presentation of learning generative models both likelihood-based and likelihood-free, and especially the role played by Stein's method and related ideas in learning those models. This is a non-technical talk which is meant to introduce the audience to some high-level ideas in the above two areas.


  • Title: Globalized Semismooth Newton Methods for Nonsmooth Nonconvex Optimization and Generalized Variational Inequality Problems

    Speaker: Andre Milzarek, The Chinese University of Hong Kong, Shenzhen

    Abstract:  In this talk, we present globalized semismooth Newton methods for solving nonsmooth nonconvex composite-type optimization problems and generalized variational inequality problems (GVIP). The considered class of problems comprises a large variety of applications such as l1-regularized or group sparse problems, minimization problems arising in machine learning or image processing, mixed variational inequalities, and nonlinear saddle point problems. Our approach is based on a reformulation of the associated stationarity conditions and of the variational inequality as a proximal-type fixed point equation. In many important situations, the proximity operator can be shown to be semismooth and the semismooth Newton method can be utilized to solve the resulting nonsmooth equation. The algorithmic framework we investigate combines the semismooth Newton method with a globally convergent descent method that is based on proximal gradient steps and on a generalized D-gap function for the GVIP. We present both global and local convergence results and provide numerical experiments illustrating the efficiency of the proposed method.
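
    For the composite special case $\min_x\, f(x) + \varphi(x)$ with $f$ smooth and $\varphi$ nonsmooth (a standard instance of the setting above, stated here in generic form), the stationarity condition can be written as the proximal fixed-point equation

    $$F_\lambda(x) := x - \mathrm{prox}_{\lambda\varphi}\big(x - \lambda\nabla f(x)\big) = 0, \qquad \lambda > 0,$$

    and a semismooth Newton step solves $M_k d^k = -F_\lambda(x^k)$ with $M_k$ taken from a suitable generalized Jacobian of $F_\lambda$ at $x^k$.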


  • Title: On the Complexity of Sequentially Lifting Cover Inequalities for the Knapsack Polytope

    Speaker: Yu-Hong Dai, Chinese Academy of Sciences

    Abstract: The well-known lifted cover inequality is widely employed in solving mixed integer programs. However, it is still an open question whether an arbitrary sequentially lifted cover inequality can be computed in polynomial time for a given minimal cover (Gu, Nemhauser, and Savelsbergh, INFORMS J. Comput., 11: 117-123, 1999). We show that this problem is NP-hard, thus giving a negative answer to the question. This is joint work with Wei-Kun Chen.
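
    For background (standard definitions, not taken from the talk): for the knapsack set $X = \{x \in \{0,1\}^n : \sum_{j=1}^n a_j x_j \le b\}$ with $a_j > 0$, a cover is a subset $C$ with $\sum_{j \in C} a_j > b$, which yields the valid cover inequality

    $$\sum_{j \in C} x_j \le |C| - 1.$$

    Lifting strengthens this to $\sum_{j\in C} x_j + \sum_{j\notin C} \alpha_j x_j \le |C| - 1$ by computing coefficients $\alpha_j \ge 0$ for the variables outside the cover; sequential lifting computes them one variable at a time, each via a small knapsack-type lifting problem. The complexity question above concerns computing such a fully lifted inequality for a given minimal cover.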


  • Title: Economics of Mobile Crowd Sensing

    Speaker: Jianwei Huang, The Chinese University of Hong Kong, Shenzhen

    Abstract: Mobile crowd sensing achieves a flexible and scalable sensing coverage with a low deploying cost, by encouraging mobile users to participate and contribute their smartphones as sensors. In this talk, we will introduce the design of an effective reward system to induce high-quality contributions by users in mobile crowd sensing, considering the diversity of data contributions and social relationship among users. We will also outline some of the core research issues in this area.


  • Title: A Primal-dual Dynamical Approach to Structured Convex Minimization Problems

    Speaker: Radu Ioan Boț, University of Vienna

    Abstract: In this talk we propose a primal-dual dynamical approach to the minimization of a structured convex function consisting of a smooth term, a nonsmooth term, and the composition of another nonsmooth term with a linear continuous operator. To this end we introduce a dynamical system for which we prove that its trajectories asymptotically converge to a saddle point of the Lagrangian of the underlying convex minimization problem as time tends to infinity. In addition, we provide rates for both the violation of the feasibility condition by the ergodic trajectories and the convergence of the objective function along these ergodic trajectories to its minimal value. Explicit time discretization of the dynamical system results in a numerical algorithm which is a combination of the linearized proximal method of multipliers and the proximal ADMM algorithm.
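
    In standard notation (our paraphrase of the setting, not taken from the talk), the problem is $\min_x\, f(x) + g(x) + h(Ax)$ with $f$ smooth, $g$ and $h$ nonsmooth convex, and $A$ a linear continuous operator; introducing $z = Ax$ gives the Lagrangian whose saddle points the trajectories approach:

    $$L(x, z, \lambda) = f(x) + g(x) + h(z) + \langle \lambda,\, Ax - z \rangle.$$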


  • Title: An Optimal High-Order Tensor Method for Convex Optimization

    Speaker: Bo Jiang, Shanghai University of Finance and Economics

    Abstract: Optimization algorithms that use up to the d-th order derivative information of the iterates are called high-order tensor methods. In this talk, we propose a new high-order tensor algorithm for the general composite case, with iteration complexity $O(1/k^{(3d+1)/2})$, which matches the lower bound for d-th order tensor methods and is hence optimal. Our approach is based on the so-called Accelerated Hybrid Proximal Extragradient (A-HPE) framework proposed by Monteiro and Svaiter. Moreover, on the implementation side, we propose a simple procedure that uses bisection on the steps.
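
    To put the rate in context: for $d = 1$ the bound $O(1/k^{(3d+1)/2})$ reduces to the familiar accelerated first-order rate $O(1/k^2)$, for $d = 2$ it gives $O(1/k^{3.5})$, and for $d = 3$ it gives $O(1/k^{5})$.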



  • Title: Some recent advances on completely positive optimization 

    Speaker: Jinyan Fan, Shanghai Jiao Tong University

    Abstract: Completely positive (CP) matrices have wide applications in combinatorics and statistics. Many NP-hard problems can be formulated as linear optimization problems over the cone of CP matrices. In this talk, we overview the semidefinite relaxation methods for the CP decomposition problem, the CP completion problem and the CP approximation problem. Some other related topics will also be discussed.
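
    For reference, the standard definition: a symmetric matrix $A \in \mathbb{R}^{n\times n}$ is completely positive if it admits a factorization

    $$A = BB^{\mathsf T}, \qquad B \in \mathbb{R}^{n\times r},\ B \ge 0 \ (\text{entrywise});$$

    the CP decomposition problem mentioned above asks for such a factor $B$, and the cone of CP matrices is the dual of the copositive cone.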


  • Title: Recent Advances in Non-Convex Distributed Optimization and Learning

    Speaker: Mingyi Hong, University of Minnesota

    Abstract: We consider a class of distributed non-convex optimization problems, in which a number of agents are connected by a communication network and collectively optimize a sum of (possibly non-convex and non-smooth) local objective functions. This type of problem has recently gained popularity, especially in the application of distributed training of deep neural networks.

    In the first part of this talk, we address the following general question: what is the fastest rate that any distributed algorithm can achieve, and how can those rates be achieved? In particular, we consider a class of unconstrained non-convex problems and allow the agents to access local (first-order) gradient information. We develop a lower bound analysis that identifies difficult problem instances for any first-order method. Further, we develop a rate-optimal distributed method whose rate matches the lower bound (up to a polylog factor). The algorithm combines ideas from distributed consensus, nonlinear optimization, and classical fast solvers for linear systems.

    In the second part of this talk, we discuss how to improve the communication efficiency of distributed schemes. We consider a popular algorithm called signSGD, which exchanges only one bit of information (the sign of each gradient entry) at each iteration. We show that this scheme suffers from non-convergence when the local nodes have heterogeneous data distributions. To close this gap, we provide a novel gradient correction mechanism that perturbs the local gradients with noise of appropriate magnitude. The proposed methods preserve the low per-iteration communication complexity of signSGD and further enjoy global convergence to stationary solutions. (A schematic code sketch of the sign-based update appears at the end of this abstract.)

    Finally, we discuss extensions of the above works to general non-smooth problems and to computing higher-order stationary solutions. Applications in distributed training of neural networks, as well as distributed control of wind farms, will also be discussed.
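
    The following is a schematic sketch (our own illustration under simplifying assumptions, not the authors' exact algorithm) of the sign-compressed update discussed in the second part: each worker transmits only the sign of its, optionally noise-perturbed, local gradient, and the server applies a majority-vote step.

    import numpy as np

    # Schematic distributed signSGD round (illustration only):
    # each worker sends sign(g_i + noise); the server aggregates by majority vote.
    # Perturbing local gradients with noise of suitable magnitude is the idea used
    # to restore convergence under heterogeneous data, per the abstract above.
    def signsgd_round(x, local_grads, lr=0.01, noise_std=0.0, rng=None):
        rng = rng or np.random.default_rng()
        signs = []
        for g in local_grads:                          # one entry per worker
            perturbed = g + noise_std * rng.standard_normal(g.shape)
            signs.append(np.sign(perturbed))           # 1-bit-per-coordinate message
        vote = np.sign(np.sum(signs, axis=0))          # server-side majority vote
        return x - lr * vote                           # descent step along the voted sign

    # Example: 4 workers, 10-dimensional parameter vector
    x = np.zeros(10)
    grads = [np.random.default_rng(i).standard_normal(10) for i in range(4)]
    x = signsgd_round(x, grads, noise_std=0.1)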


  • Title: Newton-Type Stochastic Optimization Algorithms For Machine Learning

    Speaker: Zaiwen Wen, Peking University

     Abstract: In this talk, we introduce 1) stochastic semismooth quasi-Newton methods for large-scale problems involving smooth nonconvex and nonsmooth convex terms in the objective function for deep learning; 2) a stochastic trust region method for reinforcement learning.


  • Title: A Class of Smooth Exact Penalty Function Methods for Optimization Problems with Orthogonality Constraints

    Speaker: Xin Liu, Chinese Academy of Sciences

    Abstract: Updating the augmented Lagrangian multiplier by a closed-form expression yields an efficient first-order infeasible approach for optimization problems with orthogonality constraints, which makes parallelization tractable for this class of problems. Inspired by this closed-form updating scheme, we propose an exact penalty function model with compact convex constraints (PenC). We show its equivalence to optimization problems with orthogonality constraints under mild conditions. Based on PenC, we first propose a first-order algorithm called PenCF and establish its global convergence and local linear convergence rate under mild assumptions. If computing and storing the Hessian is affordable and a high-precision solution with a fast local convergence rate is desired, a second-order approach called PenCS can be built on the same penalty function. To avoid expensive calculations or solving a hard subproblem when computing the Newton step, we propose a new strategy to compute it approximately, which still leads to quadratic convergence theoretically. Moreover, the main iterations of both PenCF and PenCS are orthonormalization-free and hence parallelizable. Numerical experiments illustrate that PenCF is comparable with existing first-order methods, including existing infeasible approaches. Furthermore, PenCS shows its stability and high efficiency in obtaining high-precision solutions compared with existing second-order methods.
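
    The setting, in standard notation (the PenC model itself is not reproduced here), is optimization over the Stiefel manifold:

    $$\min_{X \in \mathbb{R}^{n\times p}}\ f(X) \quad \text{s.t.} \quad X^{\mathsf T}X = I_p.$$

    For comparison only, a plain quadratic penalty would read $\min_X\, f(X) + \tfrac{\beta}{2}\|X^{\mathsf T}X - I_p\|_F^2$, which is not exact in general; PenC is instead an exact penalty with compact convex constraints, so that its first- and second-order solvers can avoid orthonormalization.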


  • Title: First-Order Methods for Convex and Nonconvex Functional Constrained Problems

    Speaker: Yangyang Xu, Rensselaer Polytechnic Institute

    Abstract: First-order methods have been extensively studied and used for problems without constraints or with simple constraints. Recently, the study has been extended to problems with complicated nonlinear functional constraints. In this talk, I will review a few such works and also present our recent work on both convex and nonconvex problems with nonlinear functional constraints. For convex problems, our first-order method is based on the inexact augmented Lagrangian method. We show that for strongly convex problems, our method can achieve an $O(1/k^2)$ convergence rate in terms of KKT system violation, where $k$ is the number of function and gradient evaluations, and for convex problems the rate is $O(1/k)$. For nonconvex problems, our method is designed within the framework of the proximal point method, and each subproblem is strongly convex and thus can be solved inexactly by the method from the first part. A convergence rate result will also be shown in terms of KKT system violation.
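
    For orientation, the classical augmented Lagrangian in its standard form (not necessarily the exact variant used in the talk): for $\min_x f(x)$ subject to $g_i(x) \le 0$, $i = 1,\dots,m$, one takes

    $$\mathcal{L}_\beta(x, z) = f(x) + \frac{1}{2\beta}\sum_{i=1}^m\Big([\,z_i + \beta g_i(x)\,]_+^2 - z_i^2\Big),$$

    minimizes it (inexactly) over $x$, and then updates the multipliers by $z_i^{+} = [\,z_i + \beta g_i(x)\,]_+$.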

  • Title: Optimal Nonergodic Sublinear Convergence Rate of Proximal Point Algorithm for Maximal Monotone Inclusion Problems

    Speaker: Junfeng Yang, Nanjing University

    Abstract: We establish the optimal nonergodic sublinear convergence rate of the proximal point algorithm for maximal monotone inclusion problems. First, the optimal bound is formulated by the performance estimation framework, resulting in an infinite dimensional nonconvex optimization problem, which is then equivalently reformulated as a finite dimensional semidefinite programming (SDP) problem. By constructing a feasible solution to the dual SDP, we obtain an upper bound on the optimal nonergodic sublinear rate. Finally, an example in two-dimensional space is constructed to provide a lower bound on the optimal nonergodic sublinear rate. Since the lower bound provided by the example matches exactly the upper bound obtained from the dual SDP, we have thus established the worst-case nonergodic sublinear convergence rate, which is optimal in terms of both the order and the constants involved. Our result sharpens the understanding of the fundamental proximal point algorithm.

    "This is a joint work with Prof. Guoyong Gu from Nanjing University."



  • Title: Refinements of Nash Equilibrium and Their Computation

    Speaker: Chuangyin Dang, City University of Hong Kong 

    Abstract: As a powerful mechanism for conflict modeling and analysis, game theory has been successfully applied in a variety of fields. The concept of Nash equilibrium is one of the most important and elegant ideas in game theory. Nevertheless, a game can have many Nash equilibria and some of these equilibria may be inconsistent with our intuitive notions about what should be the outcome of a game. To reduce this ambiguity and eliminate some of these counterintuitive equilibria, the concepts of perfect equilibrium, proper equilibrium, sequential equilibrium, perfect d-proper equilibrium, and extended proper equilibrium were introduced in game theory. The introduction of these refinements has significantly advanced the development of game theory and its applications. The computation of these refinements plays an important role in the applications of game theory. This talk will briefly introduce these refinements of Nash equilibrium and present our recent developments of smooth path-following approaches to computing these refinements. The basic idea of the approaches is to formulate an artificial game by incorporating a quadratic term into each player's payoff function with a smooth nonlinear function of an extra variable. The emphasis is on fully exploiting the differentiability of the problem. The equilibrium systems of the artificial games establish the existence of smooth paths that start from a totally mixed strategy profile and lead to these refinements, respectively. A predictor-corrector method is adapted for numerically following these paths. Numerical results show that the approaches are effective and efficient and significantly outperform simplicial methods.


  • Title: Higher-order Moment Portfolio Optimization via Difference-of-Convex Programming and Sums-of-Squares

    Speaker: Yishuai Niu, Shanghai Jiao Tong University

    Abstract: We are interested in developing a DC (Difference-of-Convex) programming approach for solving the higher-order moment (Mean-Variance-Skewness-Kurtosis) portfolio selection problem. Portfolio selection with higher moments can be formulated as a nonconvex quartic multivariate polynomial optimization problem. Based on recent developments in Difference-of-Convex-Sums-of-Squares (DCSOS) decomposition techniques, we can reformulate this problem as a DC program, which can be solved by a well-known DC algorithm, DCA. We will also propose an improved DC algorithm called Boosted-DCA (BDCA), based on an Armijo-type line search, to accelerate the convergence of DCA. This acceleration technique is applied both to the DC algorithm based on the DCSOS decomposition proposed in this work and to the DC algorithm based on the universal DC decomposition proposed in our previous work. Numerical simulations show good performance of our algorithms.
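
    For readers unfamiliar with DCA (stated here in its generic textbook form, not the paper's specific decomposition): writing the objective as a difference of convex functions, $f = g - h$, DCA alternates a subgradient step on $h$ with a convex subproblem,

    $$y^k \in \partial h(x^k), \qquad x^{k+1} \in \arg\min_x\, \big\{\, g(x) - \langle y^k, x \rangle \,\big\},$$

    and the Boosted variant mentioned above further performs an Armijo-type line search along the direction $x^{k+1} - x^k$ to accelerate convergence.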