Title: Distributed Online Learning in the Wild
In this talk, we will describe an online learning system which runs in the cloud, and learns in a continuous, distributed fashion. This system, called the Decision Service, has been operational for over a year, and addresses a class of problems called contextual bandits. The talk will describe the challenges in designing such a system, as well as practical and simple solutions to overcome these challenges. No prior familiarity with contextual bandits will be assumed.
Alekh Agarwal is a researcher in the New York lab of Microsoft Research, where his research focuses on topics in interactive machine learning such as contextual bandits, reinforcement learning and active learning. He has received several awards for his research, including the best paper award at NIPS 2015.
Title: Federated Learning: Privacy-Preserving Collaborative Machine Learning without Centralized Training Data
Federated Learning enables mobile devices to collaboratively learn a shared inference model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud. In this talk, I will discuss (1) the challenges that distinguish Federated Learning from other distributed learning scenarios, (2) the algorithms and systems we've created to address those challenges, (3) extensions to those algorithms to minimize communication costs, and (4) the novel cryptographic and randomized privacy-preserving technologies we have developed for Federated Learning.
Keith Bonawitz is a software engineer and researcher at Google, focusing on Federated Learning and other privacy-preserving technologies in support of on-device machine intelligence. Previously, he led Project Loon's planning, simulation, and control team in Google[x] and co-founded a successful startup around probabilistic programming and natively probabilistic computing.
Title: When Cyclic Coordinate Descent Beats Randomized Coordinate Descent
The coordinate descent (CD) methods have seen a resurgence of recent interest because of their applicability in machine learning as well as large scale data analysis and superior empirical performance. CD methods have two variants, cyclic coordinate descent (CCD) and randomized coordinate descent (RCD) which are deterministic and randomized versions of the CD methods. In light of the recent results in the literature, there is a common perception that RCD always dominates CCD in terms of performance. In this talk, we question this perception and provide examples and more generally problem classes for which CCD (or CD with any deterministic order) is faster than RCD in terms of asymptotic worst-case convergence. Furthermore, we provide lower and upper bounds on the amount of improvement on the rate of deterministic CD relative to RCD. The amount of improvement depend on the deterministic order used. We also provide a characterization of the best deterministic order (that leads to the maximum improvement in convergence rate) in terms of the combinatorial properties of the Hessian matrix of the objective function.
Mert Gurbuzbalaban is an assistant professor at the Rutgers Business School. He is broadly interested in optimization and computational science driven by applications in large-scale information and decision systems and networks. Previously, he received his BSc degrees in Electrical Engineering and Mathematics as a valedictorian from Bogazici University, Istanbul, Turkey, the Diplome d Ingenieur degree from Ecole Polytechnique, France, and the MS and PhD degrees in Applied Mathematics from the Courant Institute of Mathematical Sciences, New York University. He was also a postdoctoral associate at the Laboratory for Information and Decision Systems (LIDS) at MIT.
Title: A Proximal Primal-Dual Algorithm for Decomposing Non-convex Nonsmooth Problems
In this talk, we discuss a new method for decomposing non-convex nonsmooth optimization problems with linearly coupling constraints. The proposed method consists of one step of approximate primal gradient-type iterations followed by an approximate dual ascent step. Due to the special way that the primal and dual steps are designed, the proposed method can effectively decompose a number of challenging non-convex problems into simple subproblems(possibly with closed-form solutions). We analyze various properties of the proposed method, including convergence and convergence rate. Further, we discuss application of the proposed method in distributed eigenvalue decomposition problem, as well as solving certain generalized sparse principal subspace estimation problem.
Mingyi Hong received his Ph.D. degree from University of Virginia in 2011. He is an Assistant Professor with the Department of Electrical and Computer Engineering, University of Minnesota. Since Jan. 2017, he has been serving on the IEEE Signal Processing Society Signal Processing for Communications and Networking (SPCOM) Technical Committee. His research interests are primarily in the fields of optimization theory and applications in signal processing and machine learning.
Title: Distributed Optimization over Directed Graphs
In my lab, the problems of optimization, control, and signal processing have been a predominant interest particularly when applicable to what can be summed up as Internet of Mobile Devices (IoMDs). IoMDs, in addition to including stationary devices like sensors, further include mobile sensing agents, e.g., aerial and/or ground robots. The mobile nature of the agents, however, leads to dynamic and non-deterministic information exchange among the neighboring agents. The diversity within such agents, typically carrying a limited battery, further leads to the possibility of one-directional communication mathematically abstracted by a directed graph. Optimization, control, and signal processing algorithms thus have to be implemented within the constraints set up with this dynamic, non-deterministic, and possibly directed information exchange.
A particular subset of the aforementioned problems is what we refer to as directed optimization, i.e., distributed optimization over directed graphs. Borrowing ideas from optimization algorithms that are applicable to undirected graphs and consensus strategies over directed graphs, we have developed fast algorithms for directed optimization. In this talk, I will briefly describe the algorithms that we have developed and how we have managed to remove some of the underlying assumptions on the step-sizes and the graph parameters. During this process, I will discuss the intuition behind the related algorithms and will further cast the limitations of the existing work in order to highlight some open problems.
Usman A. Khan received his B.S. degree (with honors) in Electrical Engineering from University of Engineering and Technology, Lahore-Pakistan, in 2002, M.S. degree in Electrical and Computer Engineering (ECE) from University of Wisconsin-Madison (UW-Madison) in 2004, and Ph.D. degree in ECE from Carnegie Mellon University in 2009. From Sep. 2009 to Dec. 2010, he was a post-doctoral researcher in the Electrical and Systems Engineering department at the University of Pennsylvania. Currently, he is with the ECE Department at Tufts University where he joined as an Assistant Professor in Jan. 2011. He spent Spring 2015 as a Visiting Professor at KTH, Sweden, while on sabbatical from Tufts. His research interests lie in efficient operation and planning of complex infrastructures and include statistical signal processing, networked control and estimation, and distributed optimization algorithms.
Dr. Khan received the NSF Career award in Jan. 2014 and is an IEEE Senior Member since Feb. 2014. He is on the editorial board of IEEE Transactions on Smart Grid and an associate member of Sensor Array and Multichannel Technical Committee with the IEEE Signal Processing Society. He has served on the Technical Program Committees of several IEEE conferences and has organized and chaired several IEEE workshops and sessions. His graduate students have won multiple Best Student Paper awards and his work was presented as Keynote speech at BiOS SPIE Photonics West--Nanoscale Imaging, Sensing, and Actuation for Biomedical Applications IX.
Title: Distributed Resource Allocation with Limited Communication
A major issue in future cyber-physical networks is how individual components can autonomously change their resource allocation to achieve near maximum efficiency for the network. One example is how intelligent devices and producers can change their power consumption/production in power grids. Limited communications between individual devices necessitates an approach where the elements of the network can act in an autonomous manner with limited information/communications yet achieve near optimal performance.
In this talk, I will present our recent work on distributed resource allocation with limited communication. In particular, we first studies a general class of quantized gradient methods where the gradient direction is approximated by a finite quantization set. Sufficient and necessary conditions are provided on such a quantization set to guarantee that the methods optimally solve the resource allocation problem. Convergence rate results are established that connect the fineness of the quantization and the number of iterations needed to reach a predefined solution accuracy. Inspired by the communication complexity in computer science, we further introduce two types of communication complexities for the problems and investigate the minimal number of communicated bits needed to solve some classes of resource allocation problems regardless of the used algorithms.
Joint work with Sindri Magnusson, Chinwendu Enyioha, Carlo Fischione, and Vahid Tarokh.
Na Li is an assistant professor in Electrical Engineering and Applied Mathematics of the School of Engineering and Applied Sciences in Harvard University since 2014. She received her Bachelor degree in Mathematics in Zhejiang University in 2007 and PhD degree in Control and Dynamical systems from California Institute of Technology in 2013. She was a postdoctoral associate of the Laboratory for Information and Decision Systems at Massachusetts Institute of Technology 2013-2014. Her research lies in distributed optimization and control of cyber-physical networked systems. She received NSF career award (2016) and AFSOR Young Investigator Award (2017). She entered the Best Student Paper Award finalist in the 2011 IEEE Conference on Decision and Control.
Title: How to Analyze Nonconvex Optimization Algorithms in High Dimensions? Exact Asymptotics via Exchangeability and Scaling Limits
This talk presents our recent work on analyzing, in the high-dimensional limit, the exact dynamics of iterative algorithms for solving nonconvex optimization problems that arise in signal estimation. For concreteness, we will focus on the prototypical problem of sparse principal component analysis in the talk. We show that the time-varying empirical measures of the estimates given by the algorithms will converge weakly to a deterministic limiting process in the high-dimensional limit. Moreover, this limiting process can be characterized as the unique solution of a nonlinear PDE, and it provides exact information regarding the asymptotic performance of the algorithms. For example, performance metrics such as the MSE, the cosine similarity and the misclassification rate in sparse support recovery can all be obtained by examining the deterministic limiting process. A steady-state analysis of the nonlinear PDE also reveals interesting phase transition phenomena related to the performance of the algorithm. Although our analysis is asymptotic in nature, numerical simulations show that the theoretical predictions are accurate for moderate signal dimensions. We will conclude the talk by briefly presenting the application of similar analysis techniques to other nonconvex optimization problems such as phase retrieval, subspace tracking, low-rank matrix estimation, and dictionary learning.
Joint work with Chuang Wang (Harvard University) and Jonathan Mattingly (Duke University).
Yue M. Lu was born in Shanghai. After finishing undergraduate studies at Shanghai Jiao Tong University, he attended the University of Illinois at Urbana-Champaign, where he received the M.Sc. degree in mathematics and the Ph.D. degree in electrical engineering, both in 2007. Following his work as a postdoctoral researcher at the Audiovisual Communications Laboratory at Ecole Polytechnique federale de Lausanne (EPFL), Switzerland, he joined Harvard University in 2010, where he is currently an Associate Professor of Electrical Engineering at the John A. Paulson School of Engineering and Applied Sciences.
He received the Most Innovative Paper Award of IEEE International Conference on Image Processing (ICIP) in 2006, and the Best Student Paper Award of IEEE ICIP in 2007. Student papers supervised and coauthored by him won the Best Student Paper Award (with Ivan Dokmanic and Martin Vetterli)Â of IEEE International Conference on Acoustics, Speech and Signal Processing in 2011Â and the Best Student Paper Award (with Ameya Agaskar and Chuang Wang)Â of IEEE Global Conference on Signal and Information Processing (GlobalSIP) in 2014. He received the ECE Illinois Young Alumni Achievement Award in 2015.
Title: Fast Distributed Algorithms for Optimization in Time-Varying Graphs
Work collaborated with Alexander Olshevsky and Wilbur (Wei) Shi
We will discuss the problems of distributed optimization over time-varying graphs. For the case of undirected graphs, we introduce a distributed algorithm, referred to as DIGing, which is a combination of a distributed inexact gradient method and a gradient-tracking mechanism. The DIGing algorithm uses doubly stochastic mixing matrices and employs fixed step-sizes and, yet, drives all the agents' iterates to a common global minimizer. When the graphs are directed, in which case the implementation of doubly stochastic mixing matrices is unrealistic, we construct an algorithm that incorporates the push-sum protocol into the DIGing structure, thus obtaining Push-DIGing algorithm. The Push-DIGing uses column stochastic matrices and fixed step-sizes, but it still converges to a global minimizer (common to all agents). Under the strong convexity assumption for the objective function, we prove that both algorithms converge at R-linear (geometric) rates, as long as the step-sizes do not exceed some upper bounds. We establish explicit convergence rate estimates for the convergence rates. When the graph is undirected, we show that the convergence rate of DIGing scales polynomially in the number of agents. We also provide some numerical experiments to demonstrate the efficacy of the proposed algorithms and to validate our theoretical findings. We then discuss the variants of these algorithms for resource allocation problem over time-varying graphs.
Angelia Nedich holds a Ph.D. from Moscow State University, Moscow, Russia, in Computational Mathematics and Mathematical Physics (1994), and a Ph.D. from Massachusetts Institute of Technology, Cambridge, USA in Electrical and Computer Science Engineering (2002). She has worked as a senior engineer in BAE Systems North America, Advanced Information Technology Division at Burlington, MA. She is the recipient of an NSF CAREER Award 2007 in Operations Research for her work in distributed multi-agent optimization. She is a recipient (jointly with her co-authors) of the Best Paper Award at the Winter Simulation Conference 2013 and the Best Paper Award at the International Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt) 2015 (with co-authors). Her current interest is in large-scale optimization, games, control and information processing in networks.
Title: Distributed Approaches to Mirror Descent for Stochastic Learning over Rate-limited Networks
We present and analyze two algorithms---termed distributed stochastic approximation mirror descent (D-SAMD) and accelerated distributed stochastic approximation mirror descent (AD-SAMD)---for distributed, stochastic optimization from high-rate data streams over rate-limited networks. Devices collaborate by approximate averaging of the local, noisy subgradients using distributed consensus, which perturbs subgradient averages and may slow down or prevent convergence. Hence, we present two theoretical contributions: ($i$) bounds on the convergence rates of D-SAMD and AD-SAMD, and ($ii$) sufficient conditions for order-optimum convergence of D-SAMD and AD-SAMD. We further identify communication rate regimes under which D-SAMD does not achieve order-optimum convergence, but AD-SAMD does. The key finding here is that accelerated mirror descent, whose improved convergence rate in the centralized setting offers universally optimum convergence but not an order-wise speedup in the noisy regime, indeed offers and order-wise speedup in the distributed (but still noisy) setting.
Matthew Nokleby received the B.S. and M.S. degrees from Brigham Young University, Provo, UT, in 2006 and 2008, respectively, and the Ph.D. degree from Rice University, Houston, TX, in 2012, all in electrical engineering. From 2013-2015 he was a post-doctoral research associate at the Information Initiative @ Duke, a multidisciplinary data science initiative. In 2015 he joined Wayne State University, where he is now an assistant professor of Electrical and Computer Engineering. He has authored or coauthored numerous journal and conference papers on the topics of distributed signal processing, wireless communications, network information theory, and machine learning. Prof. Nokleby received the Texas Instruments Distinguished Fellowship (2008-2012) and the Best Dissertation Award (2012) from the Department of Electrical and Computer Engineering at Rice University. He is a member of the IEEE Signal Processing, Information Theory, and Communications Societies.
Title: Convergence Rates in Decentralized Optimization
The widespread availability of copious amounts of data has created a pressing need to develop optimization algorithms which can work in parallel when input data is unavailable at a single place but rather spread throughout multiple locations. In this talk, we consider the problem of optimizing a sum of convex (not necessarily differentiable) functions in a network where each node knows only one of the functions; this is a common model which includes as particular cases a number of distributed regression and classification problems. Our main result is a distributed subgradient method which simultaneously achieves the optimal scalings both with time and the network size for this problem.
Alex Olshevsky is an assistant professor at the ECE at Boston University. He received a B.S. in applied mathematics and electrical engineering from Georgia Tech in 2004, and an M.S. and PhD from MIT in 2006 and 2010, both in electrical engineering and computer science. Previously, he was a postdoctoral scholar at Princeton University and an assistant professor at the University of Illinois, Urbana-Champaign. He is a recipient of the NSF CAREER Award, the Young Investigator award from the Air Force, two best paper awards from SIAM and INFORMS, and three awards for teaching/advising during his time at the University of Illinois. His research interests focus on control and optimization, especially for large-scale, multi-agent, and networked systems.
Title: High Order Methods in Empirical Risk Minimization
Empirical risk minimization (ERM) problems express optimal classifiers as solutions of optimization problems in which the objective is the sum of a very large number of sample costs. Established approaches to solve ERM rely on computing stochastic gradient directions by accessing a single summand at each iteration. Despite the efficiency of individual iterations, these methods can be slow to converge and have convergence rates that are linear at best. In this talk we discuss approaches to adapt Newton and quasi-Newton methods for ERM problems. In the incremental quasi-Newton method we exploit memory to store curvature approximation matrices. We show that these curvature approximations succeed in approximating the Hessian and thereby lead to superlinear convergence. In the Adaptive Newton method we consider subsets of training samples that are augmented geometrically by a factor of two. Each time the training set is augmented we perform a single Newton step. We show that it is possible to achieve statistical accuracy with just two passes over the dataset.
Alejandro Ribeiro received the B.Sc. degree in electrical engineering from the Universidad de la Republica Oriental del Uruguay, Montevideo, in 1998 and the M.Sc. and Ph.D. degree in electrical engineering from the Department of Electrical and Computer Engineering, the University of Minnesota, Minneapolis in 2005 and 2007. From 1998 to 2003, he was a member of the technical staff at Bellsouth Montevideo. After his M.Sc. and Ph.D studies, in 2008 he joined the University of Pennsylvania (Penn), Philadelphia, where he is currently the Rosenbluth Associate Professor at the Department of Electrical and Systems Engineering. His research interests are in the applications of statistical signal processing to the study of networks and networked phenomena. His focus is on structured representations of networked data structures, graph signal processing, network optimization, robot teams, and networked control. Dr. Ribeiro received the 2014 O. Hugo Schuck best paper award, and paper awards at the 2016 SSP Workshop, 2016 SAM Workshop, 2015 Asilomar SSC Conference, ACC 2013, ICASSP 2006, and ICASSP 2005. His teaching has been recognized with the 2017 Lindback award for distinguished teaching and the 2012 S. Reid Warren, Jr. Award presented by Penn's undergraduate student body for outstanding teaching. Dr. Ribeiro is a Fulbright scholar class of 2003 and a Penn Fellow class of 2015.
Title: Consensus and Distributed Inference Rates using Network Divergence
We study a problem of distributed detection in which a global parameter may not be identifiable by local agents due to their observation structure. However, by using local Bayesian updates and a consensus-based protocol on log posterior beliefs, they can rapidly achieve consensus on the true underlying hypothesis. The rate of convergence for this procedure is given by a new quantity which we call network divergence, which is the weighted sum of local KL-divergences.
Anand D. Sarwate is an Assistant Professor in the Department of Electrical and Computer Engineering at Rutgers, the State University of New Jersey after a postdoctoral position at UCSD (2008-2011) and a research faculty position at TTI-Chicago (2011-2013). He received B.S. degrees in Electrical Engineering and Mathematics from MIT in 2002, an M.S. in Electrical Engineering from UC Berkeley in 2005 and a PhD in Electrical Engineering from UC Berkeley in 2008. His interests are in information theory, machine learning, and signal processing, with applications to distributed systems, privacy and security, and biomedical research.
Title: Distributed Large-scale Optimization via Batch Gradient Tracking
We study distributed nonconvex large-scale optimization in multiagent networks. Our interest is on big-data problems wherein there is a large number of variables to optimize. If treated by means of standard distributed optimization algorithms, these large-scale problems may be intractable due to the prohibitive local computation and communication burden at each node. We propose a novel distributed solution method whereby at each iteration agents update in an uncoordinated fashion only a (few) block(s) of the entire decision variables. To deal with the nonconvexity of the cost function, the novel scheme hinges on Successive Convex Approximation (SCA) techniques coupled with i) a tracking mechanism instrumental to estimate locally gradient averages; and ii) a novel block-wise consensus-based protocol to perform local block-averaging operations and gradient tacking. Asymptotic convergence to stationary solutions of the nonconvex problem is established.
Gesualdo Scutari received the Electrical Engineering and Ph.D. degrees (both with honors) from the University of Rome "La Sapienza," Rome, Italy, in 2001 and 2005, respectively. He is an Associate Professor with the School of Industrial Engineering, Purdue University, West Lafayette, IN, USA; and he is the scientific director for the area of Big-Data Analytic at the Cyber Center (Discovery Park) at Purdue University. He had previously held several research appointments, namely, at the University of California at Berkeley, Berkeley, CA, USA; Hong Kong University of Science and Technology, Hong Kong; University of Rome, "La Sapienza," Rome, Italy; and University of Illinois at Urbana-Champaign, Urbana, IL, USA. His research interests include theoretical and algorithmic issues related to big data optimization, equilibrium programming, and their applications to signal processing, machine learning, and networking. Dr. Scutari is an Associate Editor of the IEEE Transactions on Signal Processing and IEEE Transactions on Signal and Information Processing over Networks. He served on the IEEE Signal Processing Society Technical Committee on Signal Processing for Communications (SPCOM). He was the recipient of the 2006 Best Student Paper Award at the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2006, the 2013 NSF Faculty Early Career Development (CAREER) Award, the 2013 UB Young Investigator Award, the 2015 AnnaMaria Molteni Award for Mathematics and Physics (from ISSNAF), and the 2015 IEEE Signal Processing Society Young Author Best Paper Award.
Title: SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient
In this talk, we propose a StochAstic Recursive grAdient algoritHm (SARAH), as well as its practical variant SARAH+, as a novel approach to the finite-sum minimization problems. Different from the vanilla SGD and other modern stochastic methods such as SVRG, S2GD, SAG and SAGA, SARAH admits a simple recursive framework for updating stochastic gradient estimates; when comparing to SAG/SAGA, SARAH does not require a storage of past gradients. The linear convergence rate of SARAH is proven under strong convexity assumption. We also prove a linear convergence rate (in the strongly convex case) for an inner loop of SARAH, the property that SVRG does not possess. Numerical experiments demonstrate the efficiency of our algorithm.
Professor Takac received his B.S. (2008) and M.S. (2010) degrees in Mathematics from Comenius University, Slovakia, and Ph.D. (2014) degree in Mathematics from The University of Edinburgh, United Kingdom. He received several awards during this period, including the Best Ph.D. Dissertation Award by the OR Society (2014), Leslie Fox Prize (2nd Prize; 2013) by the Institute for Mathematics and its Applications, and INFORMS Computing Society Best Student Paper Award (runner up; 2012). Since 2014, he is a Tenure Track Assistant Professor in the Department of Industrial and Systems Engineering at Lehigh University, USA. His current research interests include the design, analysis and application of algorithms for machine learning, optimization, high-performance computing, operations research and energy systems.
Title: Privacy and Fault-Tolerance for Distributed Optimization
Consider a network of agents 1 to N, wherein each agent i has a local convex cost function Fi(x). The objective here is to identify argument x that minimizes the total cost over all the agents. Similar problems arise in many contexts. Distributed algorithms have been developed for this problem to determine the minimum without requiring any single agent to be aware of all the local cost functions. In this talk, we consider distributed optimization that preserves the privacy of local cost functions, and distributed optimization in presence of an adversary that compromises some of the participating agents.
Nitin Vaidya received the Ph.D. from the University of Massachusetts at Amherst. He is a Professor of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. Nitin Vaidya has co-authored papers that received awards at several conferences, including 2015 SSS, 2007 ACM MobiHoc and 1998 ACM MobiCom. He is a fellow of the IEEE. He presently serves the Chair of the Steering Committee for the ACM PODC conference, and has previously served as Editor-in-Chief for the IEEE Transactions on Mobile Computing, and Editor-in-Chief for ACM SIGMOBILE publication MC2R. More information at http://disc.ece.illinois.edu/.
Title: Balancing Computation and Communication in Distributed Optimization
Distributed optimization methods consist of two key aspects: communication and computation. More specifically, at every iteration (or every several iterations) of a distributed algorithm, each node in the network requires some form of information exchange with its neighboring nodes (communication) and the computation of a (sub)-gradient (computation). The standard way of counting iteration numbers overlooks the complexity inside each iteration. Moreover, various applications deploying distributed methods may prefer different composition of computation and communication.
In this talk, we present a flexible algorithmic framework, where communication and computation steps are explicitly decomposed to enable algorithm customization for various applications. We apply this framework to first-order methods, such as the well-known distributed gradient descent, and show that the customized algorithms compare favorably relative to their base algorithms. Through this framework, we also obtained an exact first order method, where we propose adaptively increasing the number of communication steps, to obtain exact convergence to the optimal solution.
Ermin Wei is currently an Assistant Professor at the EECS Dept of Northwestern University. She completed her PhD studies in Electrical Engineering and Computer Science at MIT in 2014, advised by Professor Asu Ozdaglar, where she also obtained her M.S.. She received her undergraduate triple degree in Computer Engineering, Finance and Mathematics with a minor in German, from University of Maryland, College Park. Wei has received many awards, including the Graduate Women of Excellence Award, second place prize in Ernst A. Guillemen Thesis Award and Alpha Lambda Delta National Academic Honor Society Betty Jo Budson Fellowship. Wei's research interests include distributed optimization methods, convex optimization and analysis, smart grid, communication systems and energy networks and market economic analysis.
Title: Asynchronous Algorithms for Conic Programs, including Optimal, Infeasible, and Unbounded Ones
Many problems reduce to the fixed-point problem of solving x=T(x). This talk discusses the coordinate-friendly structure in the operator T that prevails in many optimization problems such as linear programs, second-order cone programs, and some semi-definite programs. We propose asynchronous algorithms for these problems, which update multiple blocks of the decision variables in separate threads in parallel. Each update can use outdated information of other blocks, so the threads are asynchronous. Convergence and certain rates are guaranteed. Numerical results will be presented. The talk also includes a theory of using the divergent information of a first-order algorithm to recognize, as well as fix, infeasible and unbounded instances.
Wotao Yin is a professor in the Department of Mathematics of UCLA. His research focuses on computational optimization and its applications in image processing, machine learning, and other data science problems. He received his B.S. in Mathematics from Nanjing University in 2001, and then M.S. and Ph.D. in Operations Research from Columbia University in 2003 and 2006, respectively. He won an NSF CAREER award in 2008, Alfred P. Sloan Research Fellowship in 2009, the Morningside Gold Medal in 2016.
Title: Distributed Optimization Algorithms for Networked Systems
Distributed optimization methods allow to decompose an optimization problem into smaller, more manageable subproblems that are solved in parallel. For this reason, they are widely used to solve large-scale problems arising in areas as diverse as wireless communications, optimal control, machine learning, artificial intelligence, computational biology, finance, and statistics, or problems with a separable structure that are amenable to distributed implementations. In this talk we present the Accelerated Distributed Augmented Lagrangians (ADAL) algorithm, a novel decomposition method for optimization problems that involve a separable convex objective function subject to convex local constraints and linear coupling constraints. Optimization using Augmented Lagrangians (ALs) combines the low computational complexity of first order optimization methods with fast convergence speeds due to the regularization terms included in the AL. In its centralized version, optimization using ALs is an excellent general purpose method for constrained optimization problems and enjoys a large amount of literature. However, decentralized methods that employ ALs are few, as decomposition of ALs is a particularly challenging task. We establish convergence of ADAL and show that it has a worst-case O(1/k) convergence rate. Moreover, we show that ADAL converges to a local minimum of the problem when the objective function is non-convex, and that it can handle uncertainty and noise in which case it generates sequences of primal and dual variables that converge to their respective optimal sets almost surely. We provide numerical simulations for wireless network optimization problems that suggest that the proposed method outperforms the state-of-the-art distributed Augmented Lagrangian methods that are known in the literature. Moreover, we present a Random Approximate Projections (RAP) method for decentralized optimization problems with SDP constraints. Unlike other methods in the literature that employ Euclidean projections onto the feasible set, our method is computationally inexpensive as it relies only on subgradient steps in the direction that minimizes the local constraint violation. We show that the algorithm converges almost surely and can also handle inexact problem data. We demonstrate our approach on a distributed estimation problem involving networks of mobile sensors estimating a set of hidden states that are governed by linear dynamics up to a user-specified accuracy.
Michael M. Zavlanos received the Diploma in mechanical engineering from the National Technical University of Athens (NTUA), Athens, Greece, in 2002, and the M.S.E. and Ph.D. degrees in electrical and systems engineering from the University of Pennsylvania, Philadelphia, PA, in 2005 and 2008, respectively. He is currently an Assistant Professor in the Department of Mechanical Engineering and Materials Science at Duke University, Durham, NC. He also holds a secondary appointment in the Department of Electrical and Computer Engineering and the Department of Computer Science. Prior to joining Duke University, Dr. Zavlanos was an Assistant Professor in the Department of Mechanical Engineering at Stevens Institute of Technology, Hoboken, NJ, and a Postdoctoral Researcher in the GRASP Lab, University of Pennsylvania, Philadelphia, PA. His research interests include a wide range of topics in the emerging discipline of networked systems, with applications in robotic, sensor, and communication networks. He is particularly interested in hybrid solution techniques, on the interface of control theory, distributed optimization, estimation, and networking. Dr. Zavlanos is a recipient of various awards including the 2014 Office of Naval Research Young Investigator Program (YIP) Award and the 2011 National Science Foundation Faculty Early Career Development (CAREER) Award.