Keskar, Nitish Shirish.
Second-Order Methods for Stochastic and Nonsmooth Optimization.
Record type: Bibliographic - language material, manuscript : Monograph/item
Title/Author: Second-Order Methods for Stochastic and Nonsmooth Optimization. / Keskar, Nitish Shirish.
Author: Keskar, Nitish Shirish.
Description: 1 online resource (120 pages)
Notes: Source: Dissertation Abstracts International, Volume: 78-10(E), Section: B.
Contained by: Dissertation Abstracts International, 78-10B(E).
Subject: Operations research.
Electronic resource: click for full text (PQDT)
ISBN: 9781369817867
LDR  04638ntm a2200373Ki 4500
001  910731
005  20180517112609.5
006  m o u
007  cr mn||||a|a||
008  190606s2017 xx obm 000 0 eng d
020    $a 9781369817867
035    $a (MiAaPQ)AAI10265605
035    $a (MiAaPQ)northwestern:13629
035    $a AAI10265605
040    $a MiAaPQ $b eng $c MiAaPQ
099    $a TUL $f hyy $c available through World Wide Web
100 1  $a Keskar, Nitish Shirish. $3 1182172
245 10 $a Second-Order Methods for Stochastic and Nonsmooth Optimization.
264  0 $c 2017
300    $a 1 online resource (120 pages)
336    $a text $b txt $2 rdacontent
337    $a computer $b c $2 rdamedia
338    $a online resource $b cr $2 rdacarrier
500    $a Source: Dissertation Abstracts International, Volume: 78-10(E), Section: B.
500    $a Advisers: Andreas Waechter; Jorge Nocedal.
502    $a Thesis (Ph.D.) $c Northwestern University $d 2017.
504    $a Includes bibliographical references
520    $a The goal of this thesis is to design practical algorithms for nonlinear optimization in the case when the objective function is stochastic or nonsmooth. The thesis is divided into three chapters. Chapter 1 describes an active-set method for the minimization of an objective function that is structurally nonsmooth, viz., it is the sum of a smooth convex function and an $\ell_1$-regularization term. Problems of this nature arise, for example, in machine learning when sparse solutions are desired. A distinctive feature of the method is the way in which active-set identification and second-order subspace minimization steps are integrated to combine the predictive power of the two approaches. At every iteration, the algorithm selects a candidate set of free and fixed variables, performs an (inexact) subspace phase, and then assesses the quality of the new active set. If it is not judged to be acceptable, the set of free variables is restricted and a new active-set prediction is made. We establish global convergence for our approach and compare an implementation of the new method against state-of-the-art numerical codes to demonstrate its competitiveness.
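The smooth-plus-$\ell_1$ problem class described in this abstract can be illustrated with a minimal proximal-gradient sketch. The least-squares choice of the smooth term, the step size, and all names below are illustrative assumptions for this note, not the thesis's algorithm:

```python
# Sketch of proximal-gradient iterations for min_x f(x) + lam * ||x||_1,
# with the hypothetical choice f(x) = 0.5 * ||A x - b||^2 (least squares).
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (componentwise shrinkage)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_gradient_l1(A, b, lam, steps=500, lr=None):
    m, n = A.shape
    if lr is None:
        lr = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L for the smooth part
    x = np.zeros(n)
    for _ in range(steps):
        grad = A.T @ (A @ x - b)               # gradient of the smooth part
        x = soft_threshold(x - lr * grad, lr * lam)
    free = np.flatnonzero(x != 0.0)            # crude active-set prediction
    return x, free
```

The nonzero components after shrinkage play the role of an active-set prediction; the method in the abstract refines such predictions with second-order subspace minimization steps rather than first-order shrinkage alone.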
520    $a Chapter 2 outlines an algorithm for minimizing a continuous function that may be nonsmooth and nonconvex, subject to bound constraints. We propose an algorithm that uses the L-BFGS quasi-Newton approximation of the problem's curvature together with a variant of a weak Wolfe line search. The key ingredient of the method is an active-set selection strategy that defines the subspace in which search directions are computed. To overcome the inherent shortsightedness of the gradient for a nonsmooth function, we propose two strategies. The first relies on an approximation of the $\epsilon$-minimum norm subgradient, and the second uses an iterative corrective loop that augments the active set based on the resulting search directions. We describe a Python implementation of the proposed algorithm and present numerical results on a set of standard test problems to illustrate the efficacy of our approach.
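A weak Wolfe line search of the kind mentioned in this abstract can be sketched with a simple bisection scheme. The function names, tolerances, and the bisection strategy here are illustrative assumptions; the thesis's actual line search may differ:

```python
# Bisection sketch of a weak Wolfe line search: accept a step t along
# direction d if it gives sufficient decrease (Armijo) and the directional
# derivative has increased enough (weak curvature condition).
import numpy as np

def weak_wolfe(f, g, x, d, c1=1e-4, c2=0.9, t=1.0, max_iter=50):
    lo, hi = 0.0, float("inf")
    f0, g0d = f(x), float(np.dot(g(x), d))   # value and slope at t = 0
    for _ in range(max_iter):
        if f(x + t * d) > f0 + c1 * t * g0d:             # Armijo fails: shrink
            hi = t
        elif float(np.dot(g(x + t * d), d)) < c2 * g0d:  # curvature fails: grow
            lo = t
        else:
            return t
        t = 0.5 * (lo + hi) if np.isfinite(hi) else 2.0 * lo
    return t
```

The weak (rather than strong) curvature condition only bounds the new slope from below, which is what makes this search usable when the objective is nonsmooth and the gradient is a subgradient oracle.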
520    $a Chapter 3 investigates the gap in statistical generalization performance between large- and small-batch methods in the task of training state-of-the-art deep neural network models. The stochastic gradient descent (SGD) method and its variants are the algorithms of choice for many deep learning tasks. These methods operate in a small-batch regime wherein a fraction of the training data, say 32--512 data points, is sampled to compute an approximation to the gradient. It has been observed in practice that when using a larger batch there is a degradation in the quality of the model, as measured by its ability to generalize. We investigate the cause of this generalization drop in the large-batch regime and present numerical evidence supporting the view that large-batch methods tend to converge to sharp minimizers of the training and testing functions; as is well known, sharp minima lead to poorer generalization. In contrast, small-batch methods consistently converge to flat minimizers, and our experiments support a commonly held view that this is due to the inherent noise in the gradient estimation. We also discuss several strategies that attempt to help large-batch methods eliminate this generalization gap.
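The small-batch gradient estimation discussed in this abstract can be sketched as follows; the least-squares loss, learning rate, and function names are illustrative assumptions, not the experimental setup of the thesis:

```python
# Minibatch SGD sketch: a batch of size B yields a noisy estimate of the
# full gradient, and the noise shrinks as B grows toward the full data set.
import numpy as np

def minibatch_grad(X, y, w, batch_size, rng):
    """Gradient of 0.5 * ||Xb w - yb||^2 / B over a random batch of size B."""
    idx = rng.choice(len(y), size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / batch_size

def sgd(X, y, batch_size, lr=0.1, steps=200, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * minibatch_grad(X, y, w, batch_size, rng)
    return w
```

Setting `batch_size = len(y)` recovers deterministic full-batch gradient descent; small batches follow a noisy trajectory, and the abstract's thesis is that this noise biases the iterates toward flat minimizers that generalize better.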
533    $a Electronic reproduction. $b Ann Arbor, Mich. : $c ProQuest, $d 2018
538    $a Mode of access: World Wide Web
650  4 $a Operations research. $3 573517
650  4 $a Applied mathematics. $3 1069907
655  7 $a Electronic books. $2 local $3 554714
690    $a 0796
690    $a 0364
710 2  $a ProQuest Information and Learning Co. $3 1178819
710 2  $a Northwestern University. $b Industrial Engineering and Management Sciences. $3 1180832
773 0  $t Dissertation Abstracts International $g 78-10B(E).
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10265605 $z click for full text (PQDT)