August 5, 2014

Optimization

StrongBundle

To appear in Optimization Vol. , No. , December 2013, 1–23

RESEARCH ARTICLE

A strongly convergent proximal bundle method for convex minimization in Hilbert spaces

W. van Ackooij^a, J.Y. Bello Cruz^b and W. de Oliveira^c

^a EDF R&D, OSIRIS, 1, avenue du Général de Gaulle, 92141 Clamart Cedex, France
^b Instituto de Matemática e Estatística, Universidade Federal de Goiás, Goiânia, Brazil
^c Instituto Nacional de Matemática Pura e Aplicada - IMPA, Rio de Janeiro, Brazil

(August 5, 2014)

A key procedure in proximal bundle methods for convex minimization problems is the definition of stability centers, which are points generated by the iterative process that successfully decrease the objective function. In this paper we study a different stability-center classification rule for proximal bundle methods. We show that the proposed bundle variant has at least two particularly interesting features: (i) the sequence of stability centers generated by the method converges strongly to the solution that lies closest to the initial point; (ii) if the sequence of stability centers is finite, x̂ being its last element, then the sequence of non-stability centers (null steps) converges strongly to x̂. Property (i) is useful in some practical applications in which a minimal norm solution is required. We show the interest of this property on several instances of a full-sized unit-commitment problem.

Keywords: Convex optimization; Nonsmooth optimization; Proximal bundle method; Strong convergence

AMS Subject Classification: 49M37; 65K05; 90C25

1. Introduction

In this paper we deal with problems of the form

f_* := min { f(x) : x ∈ X },    (1)

where X is a nonempty, convex, and closed subset of a real Hilbert space H. We assume that f : H → ℝ is a lower semicontinuous and convex function, possibly nondifferentiable, and that the solution set S_* of problem (1) is nonempty.
Bundle methods are designed to solve nonsmooth convex optimization problems like (1) (mainly when H = ℝⁿ) by only making use of first-order information on f. Such methods are well known for their robustness and for having an efficient stopping test. Moreover, in contrast to cutting-plane methods [1], most bundle methods, e.g., [2–7], have limited memory: the bundle of information built along the iterations can be kept bounded. In this way, we can save computer memory

^c Corresponding author. Email: [email protected]



without impairing convergence. This feature is particularly interesting for large-scale optimization problems.
At any given iteration k, a general bundle algorithm makes use of a convex model f_k^M (in general a cutting-plane approximation) of f satisfying f_k^M ≤ f, and a stability center x̂_ℓ ∈ X. In particular, the new iterate x_{k+1} of a proximal bundle method depends on x̂_ℓ and f_k^M in the following manner:

x_{k+1} := arg min { f_k^M(x) + (1/(2t_k)) ‖x − x̂_ℓ‖² : x ∈ X },    (2)

where t_k > 0 is the so-called proximal parameter (see for instance [6]). Generally, in proximal bundle methods the stability center x̂_ℓ is some previous iterate, usually the "best" point generated by the iterative process so far. A classification rule decides when to update the stability center x̂_ℓ. Invariably, such a rule depends on the decrease of the function in the following manner, for given v_k ≥ 0 and γ ∈ (0, 1):

If f(x_{k+1}) ≤ f(x̂_ℓ) − γ v_k, then x̂_{ℓ+1} ← x_{k+1} and ℓ ← ℓ + 1.    (3)

In general, the predicted decrease v_k is defined as v_k := f(x̂_ℓ) − f_k^M(x_{k+1}), but other alternatives are also possible (see for instance [7]). As mentioned in [7], the inequality in (3) is a sort of Armijo rule, as used in line searches or trust-region algorithms for smooth optimization. When the inequality in (3) is not satisfied, x_{k+1} is said to be a null iterate: the algorithm performs an unsuccessful step towards improving (significantly) the threshold f(x̂_ℓ). However, null steps are useful to update the model f_k^M. A descent step is said to be performed when inequality (3) is satisfied: in this case, the iterate x_{k+1} is a point that improves the threshold f(x̂_ℓ). As we are interested in minimizing f, it makes sense to take the best known value f(x_{k+1}) as the new threshold after a descent step. Therefore, we may take x_{k+1} as the new stability center: x̂_{ℓ+1} ← x_{k+1}. By construction, rule (3) ensures that the sequence of stability centers (or descent iterates) (x̂_ℓ)_{ℓ∈N} is a subsequence of (x_k)_{k∈N}, the sequence of iterates generated by solving subproblem (2).
Instead of employing rule (3), the present paper proposes a more general classification rule aiming to provide strong results on the sequence of stability centers:

If f(x_{k+1}) ≤ f(x̂_ℓ) − γ v_k, then x̂_{ℓ+1} ← P_{X_k}(x_0) and ℓ ← ℓ + 1.    (4)
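Rules (3) and (4) share the same Armijo-like descent test and differ only in how the new stability center is obtained. A minimal sketch of the test (function name and numbers are illustrative, not from the paper):

```python
def descent_test(f_next, f_center, v_k, gamma=0.1):
    """Armijo-like test shared by rules (3) and (4):
    serious step iff f(x_{k+1}) <= f(x_hat_l) - gamma * v_k."""
    assert v_k >= 0 and 0.0 < gamma < 1.0
    return f_next <= f_center - gamma * v_k

# Toy values: predicted decrease v_k = 2, so the serious-step threshold is 9.8.
assert descent_test(9.0, 10.0, 2.0)       # sufficient decrease: serious step
assert not descent_test(9.9, 10.0, 2.0)   # insufficient decrease: null step
```

On a serious step, rule (3) would take the new center to be x_{k+1} itself, while rule (4) projects x_0 onto a set X_k.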

In rule (4), P_{X_k}(x_0) stands for the projection of a given initial point x_0 ∈ X onto a well-chosen subset X_k ⊆ X (see (17) below). Notice that by taking X_k = {x_{k+1}}, rule (4) coincides with (3) for any given x_0 in H. However, other choices of X_k are also possible and may imply that (x̂_ℓ)_{ℓ∈N} is not a subsequence of (x_k)_{k∈N}. We are particularly interested in projection sets X_k that provide the following properties:
(i) the sequence of stability centers (x̂_ℓ)_{ℓ∈N} generated by the method converges strongly to P_{S_*}(x_0);
(ii) if the sequence of stability centers is finite, x̂ being its last element, then x̂ is the solution of problem (1) closest to x_0, and the sequence of null steps (x_k)_{k∈N} converges strongly to x̂.
The following is a consequence of the assumptions imposed on X_k to obtain (i)-(ii):



(iii) the entire sequence of stability centers is contained in a ball with diameter equal to the distance between the initial point and the solution set.
The above three properties improve the results given in [6, Proposition 4.3 and Theorem 4.4] for proximal bundle methods employing (3), which we recall here:
(i') if the sequence of stability centers is infinite, then the sequence (f(x̂_ℓ))_{ℓ∈N} converges to f_*. Moreover, if H is a finite-dimensional space, then (x̂_ℓ)_{ℓ∈N} converges to a solution of problem (1);
(ii') if the sequence of stability centers is finite, x̂ being its last element, then x̂ is a solution of problem (1), and the sequence of null steps (x_k)_{k∈N} converges strongly to x̂.
In (i') nothing is said about the behavior of (x̂_ℓ)_{ℓ∈N} when H is an infinite-dimensional space. Since the proposed rule (4) impacts only the sequence of stability centers (x̂_ℓ)_{ℓ∈N}, it is natural to expect that when (x̂_ℓ)_{ℓ∈N} has finitely many elements, strong convergence of (x_k)_{k∈N} to the last stability center x̂ follows from [6, Theorem 4.4] (see item (ii')). However, (ii') from [6] does not ensure that x̂ is the solution of problem (1) closest to x_0. Finally, item (iii) above is not present in [6].
We emphasize that property (i) establishes a connection between convex nondifferentiable minimization problems and minimal norm solution problems, which have mostly been studied in the differentiable setting (see [8, 9] and references therein). Property (i) states that the limit of the sequence (x̂_ℓ)_{ℓ∈N} is a point in the solution set S_* of problem (1) that also solves the following second-stage problem

min ‖x − x_0‖  s.t.  x ∈ S_*,    (5)

known as the minimal norm solution problem of (1) (mainly when x_0 = 0 belongs to X).
The main contributions of the present paper are the following:
– improvement of the proximal bundle method given in [6], showing strong convergence of the whole sequence of stability centers (compare properties (i) and (i') above);
– we show that as long as problem (1) has a solution, the sequence of stability centers is contained in a certain ball with fixed radius;
– strong convergence is shown without knowing the optimal value f_* of problem (1). This differs from the analysis of [10];
– provided that x_0 belongs to X, the proposed bundle algorithm solves the minimal norm solution problem associated with (1) without requiring either differentiability or Lipschitz continuity of f, in contrast to the methods given in [8].
This paper is organized as follows. Section 2 presents the connection between minimal norm solution problems and property (i). In Section 3 we build on the proximal bundle method given in [6] to derive our algorithm. Moreover, we propose an easy manner of defining the projection set X_k in (4) so as to ensure the stated properties (i)-(iii). Section 4 gives the convergence analysis of the proposed method. The interest of property (i) is highlighted in Section 5 on several unit-commitment problem instances.



Notation

Our notation is standard. For points x, y ∈ H, ⟨x, y⟩ denotes the usual inner product in the real Hilbert space H (for H = ℝⁿ, ⟨x, y⟩ := Σ_j x_j y_j), and ‖·‖ denotes its associated norm, i.e., ‖x‖ = √⟨x, x⟩. The closed ball centered at x with radius R is denoted by B[x, R]. For a set X ⊂ H, we denote by i_X its indicator function, i.e., i_X(x) = 0 if x ∈ X and i_X(x) = +∞ otherwise. The relative interior of X, denoted by ri(X), is the interior of X relative to its affine hull; the affine hull of X is the intersection of all affine sets that contain X. Given a convex function f, we recall the definition of the subdifferential of f at the point x:

∂f(x) = { g ∈ H | f(y) ≥ f(x) + ⟨g, y − x⟩ for all y ∈ H }.

Throughout this paper we assume that f satisfies the following property:

∂f is bounded on bounded sets,    (6)

which holds automatically when H is a finite-dimensional space, in view of Theorem 4.6.1(ii) in [11]. For some equivalences with condition (6), see for instance [12, Proposition 16.17]. Assumption (6) has been considered in the convergence analysis of many optimization methods in infinite-dimensional spaces (see for instance [4, 10, 13] and references therein). In [6], assumption (6) is also used, although this hypothesis is not explicitly stated in [6, Proposition 4.3].

2. Minimal norm solution problem

Let S_* be the solution set of problem (1), which is convex and closed (recall that X is a convex and closed set, and f is a convex and lower semicontinuous function). Suppose that problem (1) has multiple optimal solutions. In this case, one may be interested in finding a point in S_* satisfying some specific criterion: for instance, the minimal norm solution with respect to the initial point x_0, that is, the unique solution of the second-stage problem (5). The recent work [8] proposes a gradient-based method for solving minimal norm solution problems, assuming in (1) that H = ℝⁿ, x_0 is an arbitrary point, and f is differentiable and Lipschitz continuous on ℝⁿ. As in this paper, [8] finds the optimal solution of problem (5) while dealing explicitly only with the core problem (1). In what follows we present an example illustrating the usefulness of property (i).

2.1. Unit-commitment problem

Consider a hydro-thermal system composed of m power plants (hydro valleys, thermal or nuclear plants), and the task of determining which power plants are to be used in order to generate enough power to satisfy an electrical load demand d ∈ ℝ^T with minimum operating cost and maximum reliability. Depending on the planning time-horizon T, this problem is known as a unit-commitment problem (we refer to [14]



for a recent review), and it can be written schematically as:

min_{p∈ℝⁿ} Σ_{i=1}^m c^i(p^i)
s.t. Σ_{i=1}^m h^i(p^i) ≥ d,    (7)
     p^i ∈ P^i ⊆ ℝ^{n_i},

where a unit is represented by i, with c^i : ℝ^{n_i} → ℝ and h^i : ℝ^{n_i} → ℝ^T its cost and production functions, respectively. The decision variable p^i stands for the production schedule, which lies in the feasible set of technical constraints P^i and has operating cost c^i(p^i). The constraint Σ_{i=1}^m h^i(p^i) ≥ d links the decision variables in order to satisfy the offer-demand equilibrium constraints; these can also cover spinning reserve requirements (e.g., [15, 16]). The main difficulties of the problem reside in its large scale (m, n), its heterogeneity (hydro valleys have fundamentally different constraints (e.g., [17, 18]) from thermal units (e.g., [19, 20])), and its potentially stochastic nature (see for instance [21–23] and references therein). Due to the presence of multiple coupling constraints, Lagrangian relaxation appears as a natural approach for dealing with this kind of problem (see [24] and references therein). The dual problem of (7) can be represented as (1) by taking

f(x) = −⟨x, d⟩ − Σ_{i=1}^m min_{p^i ∈ P^i} [ c^i(p^i) − ⟨x, h^i(p^i)⟩ ],    (8)

and X = ℝ^T_+. The key advantage of moving to the dual is that the above inner problem is separable and can be solved by dedicated approaches that exploit specific structures.
Let x̄ be an optimal dual solution. Due to its economic interpretation, x̄ represents the marginal costs of power production. Since the price of electricity is decided by taking into account the marginal costs, power companies and/or governmental entities may wish to keep x̄ as small as possible (high marginal costs imply a high price for electricity). Therefore, for the unit-commitment problem the aim is to find a solution to the dual problem (1) of minimal norm, i.e., to find the solution x_* of the second-stage problem (5) with reference point x_0 = 0 ∈ ℝ^T. In other words, one wishes to find x_* = P_{S_*}(x_0). Alternatively, x_0 could be a specific smooth signal, and we may wish to single out a dual solution closest to such a signal. The suggested method can thus be interpreted as a natural way of obtaining stabilized prices, without penalizing the oscillating behavior of x in the dual objective f, as proposed in [25]. We emphasize that the proposed algorithm, which solves problem (1), finds the optimal solution of (5) without explicitly dealing with the latter problem. This is a consequence of property (i).
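As a small illustration (all data below are invented for the sketch, not taken from the paper's instances), the dual function (8) can be evaluated by enumeration when each unit has finitely many candidate schedules and h^i(p) = p:

```python
import numpy as np

# Toy instance (illustrative): demand over T = 2 time steps; each unit i has a
# finite schedule set P_i, a cost function c_i, and production h_i(p) = p itself.
d = np.array([1.0, 2.0])
units = [
    {"schedules": [np.array([0.0, 0.0]), np.array([1.0, 1.0])],
     "cost": lambda p: 1.5 * p.sum()},
    {"schedules": [np.array([0.0, 0.0]), np.array([0.5, 1.5])],
     "cost": lambda p: 2.0 * p.sum()},
]

def dual_f(x):
    """Dual function (8): f(x) = -<x, d> - sum_i min_{p in P_i} [c_i(p) - <x, p>]."""
    value = 0.0 - float(x @ d)
    for unit in units:
        # Separable inner problem for unit i, solved here by enumeration.
        value -= min(unit["cost"](p) - float(x @ p) for p in unit["schedules"])
    return value

# f is convex in x: it is a pointwise maximum of affine functions of x.
print(dual_f(np.array([10.0, 10.0])))  # -> 3.0
```

In realistic unit-commitment instances, the inner problems are solved by dedicated (e.g., dynamic-programming or MILP) subsolvers instead of enumeration.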



3. Strongly convergent proximal bundle method

The proposed method for solving problem (1) generates two sequences of feasible iterates, (x_k)_{k∈N} and (x̂_ℓ)_{ℓ∈N}, both contained in X. For each point x_k (respectively, x̂_ℓ) an oracle is called to compute f(x_k) and a subgradient g_k ∈ ∂f(x_k) (respectively, f(x̂_ℓ) and ĝ_ℓ ∈ ∂f(x̂_ℓ)). With such information, the method creates the linearization

f_k^L(x) := f(x_k) + ⟨g_k, x − x_k⟩ ≤ f(x),    (9)

where the inequality follows from the convexity of f. At iteration k a polyhedral cutting-plane model of f is available:

f_k^M(x) := max_{j∈B_k} f_j^L(x), with B_k an index set, e.g., B_k ⊂ {1, . . . , k}.    (10)
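As an aside (not part of the paper's development), the linearizations (9) and the model (10) translate directly into code; the function f, its subgradient, and the bundle points below are assumptions chosen for the sketch:

```python
import numpy as np

def f(x):
    """Example convex nonsmooth function (an assumption): f(x) = ||x||_1."""
    return float(np.abs(x).sum())

def subgrad(x):
    """A subgradient of the l1-norm at x: its sign pattern."""
    return np.sign(x)

class CuttingPlaneModel:
    """Polyhedral model f_k^M(x) = max_{j in B_k} [f(x_j) + <g_j, x - x_j>], cf. (10)."""
    def __init__(self):
        self.bundle = []                     # entries (x_j, f(x_j), g_j)

    def add(self, x):
        self.bundle.append((x.copy(), f(x), subgrad(x)))

    def value(self, x):
        return max(fj + float(gj @ (x - xj)) for xj, fj, gj in self.bundle)

model = CuttingPlaneModel()
for point in [np.array([1.0, -2.0]), np.array([-0.5, 0.5])]:
    model.add(point)

x = np.array([0.3, 0.4])
assert model.value(x) <= f(x) + 1e-12  # each linearization underestimates f, cf. (9)
```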

The index set B_k gathers the bundle of information: {x_j, f(x_j), g_j}_{j∈B_k}. As already mentioned, given a stability center x̂_ℓ and a prox-parameter t_k > 0, the next iterate x_{k+1} is obtained by solving subproblem (2). Some properties of the minimizer x_{k+1} are given in the following lemma.

Lemma 3.1 Suppose that one of the following constraint qualifications holds: X is a polyhedral set, or ri(X) ≠ ∅. Denoting by i_X the indicator function of the set X, the optimality conditions for (2) imply

x_{k+1} = x̂_ℓ − t_k p_k with p_k = p_k^f + p_k^X,    (11)

where p_k^f ∈ ∂f_k^M(x_{k+1}) and p_k^X ∈ ∂i_X(x_{k+1}). Moreover, the affine function

f_k^La(x) := f_k^M(x_{k+1}) + ⟨p_k, x − x_{k+1}⟩    (12)

is an underestimate of the model: f_k^La(x) ≤ f_k^M(x) for all x ∈ X.

Proof. Under the stated constraint qualification, relation (11) comes from the optimality conditions of the problem

min_{x∈H} f_k^M(x) + (1/(2t_k)) ‖x − x̂_ℓ‖² + i_X(x),

which is equivalent to problem (2) in terms of optimal value and solution x_{k+1}. Since p_k^f ∈ ∂f_k^M(x_{k+1}) and p_k^X ∈ ∂i_X(x_{k+1}), the subgradient inequality gives

f_k^M(x_{k+1}) + ⟨p_k^f, x − x_{k+1}⟩ ≤ f_k^M(x)
⟨p_k^X, x − x_{k+1}⟩ ≤ i_X(x)

for all x ∈ X. By summing the above inequalities and using (11), we get (12).

We next provide useful connections between the predicted decrease vk and the



aggregate linearization error defined as

ê_k := f(x̂_ℓ) − f_k^La(x̂_ℓ).    (13)

We also establish a key relation that will be the basis for the subsequent convergence analysis.

Proposition 3.2 Let ℓ ≥ 0, let v_k = f(x̂_ℓ) − f_k^M(x_{k+1}) in (4), and let K^ℓ be the index set gathering the iterates issued from the ℓ-th stability center x̂_ℓ, i.e., K^ℓ := {k ∈ N | x_{k+1} = x̂_ℓ − t_k p_k}. It holds that

ê_k ≥ 0 and ê_k + t_k ‖p_k‖² = v_k ≥ 0 for all k ∈ K^ℓ.    (14)

Moreover, the following inequality holds:

f(x̂_ℓ) ≤ f(x) + ê_k + ‖p_k‖ ‖x̂_ℓ − x‖ for all x ∈ X and k ∈ K^ℓ.    (15)

Proof. Let k ∈ K^ℓ. The fact that ê_k ≥ 0 follows directly from (9) and (12). To show (14), note that

ê_k = f(x̂_ℓ) − f_k^La(x̂_ℓ)
    = f(x̂_ℓ) − (f_k^M(x_{k+1}) + ⟨p_k, x̂_ℓ − x_{k+1}⟩)
    = v_k − ⟨p_k, x̂_ℓ − x_{k+1}⟩
    = v_k − t_k ‖p_k‖²,

where the last equality follows from (11). This completes the proof of (14). We next verify (15). Using (9) again, for all x ∈ X it holds that

f(x) ≥ f_k^La(x) = f_k^M(x_{k+1}) + ⟨p_k, x − x_{k+1}⟩
    = f(x̂_ℓ) − (f(x̂_ℓ) − f_k^M(x_{k+1})) + ⟨p_k, x̂_ℓ − x_{k+1}⟩ + ⟨p_k, x − x̂_ℓ⟩
    = f(x̂_ℓ) − v_k + ⟨p_k, x̂_ℓ − x_{k+1}⟩ + ⟨p_k, x − x̂_ℓ⟩
    = f(x̂_ℓ) − ê_k + ⟨p_k, x − x̂_ℓ⟩,

and (15) follows by taking into account (14) and the Cauchy-Schwarz inequality.
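The identities in (13)-(14) can be checked on a one-dimensional toy step in which X = H = ℝ (so p_k = p_k^f) and the model consists of a single linearization; all numbers below are illustrative:

```python
# One linearization, X = H = R, so p_k has no constraint part and the
# solution of (2) is x_{k+1} = x_hat - t_k * g_j. Illustrative values only.
f = abs                                     # f(x) = |x|
x_hat, t_k = 2.0, 0.5                       # stability center and prox-parameter
x_j, g_j = 1.0, 1.0                         # bundle point and its subgradient
model = lambda x: f(x_j) + g_j * (x - x_j)  # f_k^M with a single plane

x_next = x_hat - t_k * g_j                  # = 1.5
p_k = (x_hat - x_next) / t_k                # aggregate subgradient, cf. (11)
v_k = f(x_hat) - model(x_next)              # predicted decrease
e_hat_k = f(x_hat) - (model(x_next) + p_k * (x_hat - x_next))  # (13) via (12)

assert e_hat_k >= 0.0 and v_k >= 0.0
assert abs(e_hat_k + t_k * p_k ** 2 - v_k) < 1e-12  # identity (14)
```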

Provided that the sequence of stability centers (x̂_ℓ)_{ℓ∈N} is bounded (a feature that comes from property (iii), to be shown in Proposition 4.1 below), it follows from inequality (15) that convergence amounts to obtaining the following property:

there is a subsequence ((ê_k, p_k))_{k∈K} such that lim_{k∈K} ê_k = 0 and lim_{k∈K} ‖p_k‖ = 0.    (16)

Hence, a proximal bundle algorithm can stop with a satisfactory solution x̂_ℓ when both ê_k and ‖p_k‖ are small.



3.1. Definition of the projection set

In order to obtain the properties (i)-(iii) stated in the introduction, a crucial requirement for the projection set X_k is the inclusion S_* ⊆ X_k ⊆ X for all k, where S_* is the solution set of problem (1). With the aim of providing property (i), the projection set X_k must contain the points x ∈ X that satisfy the angular constraint ⟨x_0 − x̂_ℓ, x − x̂_ℓ⟩ ≤ 0, for a given initial point x_0 ∈ X. We now provide a certain minimal requirement for the projection set, denoted by X_k^min, which is crucial to obtain (i)-(iii). The convergence analysis will then be carried out for all sets X_k satisfying S_* ⊆ X_k ⊆ X_k^min. The minimal projection set is defined as:

X_k^min := { x ∈ X | ⟨ĝ_ℓ, x − x̂_ℓ⟩ ≤ f_k^best − f(x̂_ℓ) and ⟨x_0 − x̂_ℓ, x − x̂_ℓ⟩ ≤ 0 },    (17)

with

f_k^best := min_{1≤j≤k+1} f(x_j).    (18)
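When X = H, the set (17) is the intersection of two halfspaces, so P_{X_k^min}(x_0) can be computed, for example, with Dykstra's alternating-projection scheme. The sketch below uses invented data for ĝ_ℓ, x̂_ℓ, and the right-hand sides:

```python
import numpy as np

def project_halfspace(x, a, b):
    """Projection of x onto the halfspace {y : <a, y> <= b}."""
    violation = float(a @ x) - b
    return x if violation <= 0 else x - (violation / float(a @ a)) * a

def dykstra(x0, halfspaces, cycles=200):
    """Dykstra's algorithm: projection of x0 onto an intersection of halfspaces."""
    x = x0.astype(float).copy()
    corrections = [np.zeros_like(x) for _ in halfspaces]
    for _ in range(cycles):
        for i, (a, b) in enumerate(halfspaces):
            y = project_halfspace(x + corrections[i], a, b)
            corrections[i] = x + corrections[i] - y
            x = y
    return x

# Invented instance of (17) with X = H = R^2 and x_hat = 0:
x0 = np.array([2.0, 0.0])
x_hat = np.zeros(2)
g_hat = np.array([1.0, 1.0])                       # stands in for g_l at x_hat
cut_subgrad = (g_hat, float(g_hat @ x_hat) - 1.0)  # <g, x - x_hat> <= -1
cut_angular = (x0 - x_hat, float((x0 - x_hat) @ x_hat))  # <x0 - x_hat, x - x_hat> <= 0

x_proj = dykstra(x0, [cut_subgrad, cut_angular])
print(x_proj)  # the exact projection here is (0, -1)
```

For two halfspaces the projection could also be computed in closed form via the active-set cases; Dykstra is used only to keep the sketch generic.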

A smaller projection set can be obtained by inserting the cutting-plane model into definition (17), that is,

X_k := { x ∈ X | f_k^M(x) ≤ f_k^best, ⟨ĝ_ℓ, x − x̂_ℓ⟩ ≤ f_k^best − f(x̂_ℓ) and ⟨x_0 − x̂_ℓ, x − x̂_ℓ⟩ ≤ 0 } ⊆ X_k^min,    (19)

with f_k^best defined in (18). An even tighter set is available when the optimal value f_* of (1) is known: just replace f_k^best in (19) by f_*. Naturally, the tighter the set X_k ⊇ S_*, the better the stability center x̂_{ℓ+1} = P_{X_k}(x_0). Ideally, one would wish to have X_k = S_*, so that x̂_{ℓ+1} would be the solution of problem (1) nearest to the initial point, thereby solving a sort of minimal norm solution problem. However, having X_k = S_* is, of course, impossible in practice.
Since S_* ≠ ∅ by assumption, we now proceed to show that the sequence of points generated by rule (4) is well defined and that the sets defined in (17) and (19) indeed contain S_*. In order to do so, we follow the lead of [4] and define the set

V := { x ∈ H | ⟨x_0 − x, z − x⟩ ≤ 0 for all z ∈ S_* },

which is nonempty because x_0 belongs to V.

Proposition 3.3 Let k be an iteration index such that the point x_{k+1} satisfies the inequality in (4). Moreover, let X_k = X_k^min be defined in (17), and x̂_{ℓ+1} = P_{X_k}(x_0). If x̂_ℓ ∈ V, then
(a) S_* ⊆ X_k ⊆ X and x̂_{ℓ+1} ∈ V;
(b) as a result of (a), the whole sequence (x̂_ℓ)_{ℓ∈N} is contained in V, and the inclusion S_* ⊆ X_k holds for all k.

Proof. Let z ∈ S_* be a solution of problem (1). It follows from the convexity of f that

⟨ĝ_ℓ, z − x̂_ℓ⟩ ≤ f(z) − f(x̂_ℓ) ≤ f_k^best − f(x̂_ℓ),



where the last inequality follows from the definition of f_k^best. This shows that z satisfies the first inequality in (17). Since x̂_ℓ belongs to V by assumption, we have ⟨x_0 − x̂_ℓ, z − x̂_ℓ⟩ ≤ 0, and z satisfies the second inequality in (17) as well. Hence, we have shown that S_* ⊆ X_k. Since S_* ≠ ∅, the set X_k is nonempty, and thus x̂_{ℓ+1} = P_{X_k}(x_0) is well defined. The inclusion X_k ⊆ X is trivial. Since x̂_{ℓ+1} is the orthogonal projection of x_0 onto the projection set X_k, the inequality ⟨x − x̂_{ℓ+1}, x_0 − x̂_{ℓ+1}⟩ ≤ 0 holds true for all x ∈ X_k. In particular, ⟨z − x̂_{ℓ+1}, x_0 − x̂_{ℓ+1}⟩ ≤ 0 for all z ∈ S_*, i.e., x̂_{ℓ+1} belongs to V, showing item (a). In order to show item (b), notice that x_0 ∈ V. To prove that the whole sequence (x̂_ℓ)_{ℓ∈N} is contained in V, apply item (a) inductively. Since x_{k+1} satisfying (4) was arbitrary, we have S_* ⊆ X_k ⊆ X for all k, as was to be shown.

Corollary 3.4 The statements of Proposition 3.3 also hold true when X_k is defined as in (19).

Proof. In order to show that S_* ⊆ X_k, we need only establish f_k^M(z) ≤ f_k^best for all z ∈ S_*. The latter inequality readily follows from f_k^M(z) ≤ f(z) ≤ f_k^best. The remainder of the proof is a verbatim copy of that of Proposition 3.3.

Remark 1 Convexity of X_k, orthogonal projection onto X_k, and the relation S_* ⊆ X_k are the main ingredients in Proposition 3.3. The choice X_k = {x_{k+1}}, which results in rule (3), cannot be shown to have properties (a) and (b) above.

3.2. Strong proximal bundle algorithm

The strong proximal bundle algorithm can be written as follows:

Strongly convergent proximal bundle algorithm
Step 0 (Initialization). Select γ ∈ (0, 1) and t_1, t_min such that t_1 ≥ t_min > 0. Choose x_0 ∈ X and stopping tolerances Tol_ê, Tol_p > 0. Call an oracle to compute (f(x_0), g_0). Set x̂_0 ← x_0, k ← 0, ℓ ← 0 and B_0 ← {0}.
Step 1 (Next iterate). Obtain x_{k+1} by solving (2). Set p_k ← (x̂_ℓ − x_{k+1})/t_k, v_k ← f(x̂_ℓ) − f_k^M(x_{k+1}), and ê_k ← v_k − t_k ‖p_k‖².
Step 2 (Stopping test). If ê_k ≤ Tol_ê and ‖p_k‖ ≤ Tol_p, stop.
Step 3 (Oracle call). Call the oracle to compute (f(x_{k+1}), g_{k+1}).
Step 3.1 (Descent test). If f(x_{k+1}) ≤ f(x̂_ℓ) − γ v_k, then set X_k as in (17) (or as in (19)) and obtain x̂_{ℓ+1} ← P_{X_k}(x_0); call the oracle again to compute (f(x̂_{ℓ+1}), ĝ_{ℓ+1}); set ℓ ← ℓ + 1 and choose t_{k+1} ≥ t_k.
Step 3.2 (Null step). Otherwise, choose t_{k+1} ∈ [t_min, t_k].
Step 4 (Bundle management). Choose B_{k+1} ⊃ {k + 1, k^a}. Set k ← k + 1 and go back to Step 1.

The aim of the algorithm is to estimate a solution to problem (1) as accurately as possible; the stability centers x̂_ℓ provide such an estimate. We emphasize that the aggregate linearization f_k^La must enter the bundle (i.e., k^a ∈ B_{k+1}) only when some active linearization f_j^L is excluded (we say that f_j^L is



an active linearization at iteration k when f_j^L(x_{k+1}) = f_k^M(x_{k+1})). This is the so-called bundle compression mechanism, which allows for a limited-memory algorithm.
Essentially, Algorithm 3.2 is a classical proximal bundle method, except for Step 3.1. Notice that if we choose X_k = {x_{k+1}} for all k, then the next stability center x̂_{ℓ+1} coincides with x_{k+1}, and we recover the classical rule (3) for defining the stability center. Convergence can then be established along classical lines of proof (see for instance [6] and [7]). However, when X_k is defined as in (17) (or (19)), then (x̂_ℓ)_{ℓ∈N} converges strongly to P_{S_*}(x_0), as we will show in the next section.
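For intuition only, here is a one-dimensional sketch of the algorithm's loop with the classical choice X_k = {x_{k+1}} (i.e., rule (3)), X = H = ℝ, a fixed prox-parameter, and no bundle compression; the subproblem (2) is solved by ternary search, which is valid because its objective is convex:

```python
# 1-D sketch of the bundle loop with X_k = {x_{k+1}} (classical rule (3)),
# X = H = R, fixed prox-parameter, no compression. Illustrative only.
f = abs                                          # convex nonsmooth test function
subgrad = lambda x: 1.0 if x >= 0 else -1.0      # a subgradient of |x|

def prox_step(bundle, x_hat, t, lo=-10.0, hi=10.0):
    """Solve subproblem (2): minimize f_k^M(x) + (1/(2t)) (x - x_hat)^2."""
    phi = lambda x: max(fj + gj * (x - xj) for xj, fj, gj in bundle) \
        + (x - x_hat) ** 2 / (2.0 * t)
    for _ in range(200):                         # ternary search: phi is convex
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if phi(m1) <= phi(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

x_hat, t, gamma = 5.0, 1.0, 0.5
bundle = [(x_hat, f(x_hat), subgrad(x_hat))]     # entries (x_j, f(x_j), g_j)
for _ in range(30):
    x_next = prox_step(bundle, x_hat, t)
    v = f(x_hat) - max(fj + gj * (x_next - xj) for xj, fj, gj in bundle)
    bundle.append((x_next, f(x_next), subgrad(x_next)))
    if f(x_next) <= f(x_hat) - gamma * v:        # descent test: serious step
        x_hat = x_next
print(x_hat)  # the stability centers approach the minimizer 0 of |x|
```

The strong variant of the paper would replace the serious-step update `x_hat = x_next` by a projection of x_0 onto a set as in (17) or (19).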

4. Convergence analysis

We start this section by showing that property (iii) stated in Section 1 holds.

Proposition 4.1 Let x_* ∈ S_* be the projection of the initial iterate x_0 onto S_*, i.e., x_* = P_{S_*}(x_0). Moreover, let X_k be such that S_* ⊆ X_k (for instance, X_k as defined by (19)). Then the sequence (x̂_ℓ)_{ℓ∈N} is contained in the closed ball centered at (x_0 + x_*)/2 with radius ‖x_* − x_0‖/2, i.e.,

(x̂_ℓ)_{ℓ∈N} ⊂ B[ (x_0 + x_*)/2 , ‖x_* − x_0‖/2 ].    (20)

Proof. Define the points y_ℓ = x̂_ℓ − ½(x_0 + x_*) and y_* = x_* − ½(x_0 + x_*). With this definition we can write x_0 as x_0 = −y_* + ½(x_0 + x_*). Since x_* ∈ S_* ⊆ X_k, it follows from the angular inequality defining X_k that

0 ≥ ⟨x_* − x̂_ℓ, x_0 − x̂_ℓ⟩
  = ⟨[y_* + ½(x_0 + x_*)] − [y_ℓ + ½(x_0 + x_*)], [−y_* + ½(x_0 + x_*)] − [y_ℓ + ½(x_0 + x_*)]⟩
  = ⟨y_* − y_ℓ, −y_* − y_ℓ⟩
  = ‖y_ℓ‖² − ‖y_*‖².

Hence, we have shown that

‖x̂_ℓ − (x_0 + x_*)/2‖ ≤ ‖x_* − (x_0 + x_*)/2‖ = ‖x_* − x_0‖/2,

establishing (20).

In what follows we assume that Tol_ê = Tol_p = 0 and that Algorithm 3.2 does not stop. If the algorithm stops, then by (15) the last stability center x̂_ℓ is a solution to problem (1), and by Proposition 4.1 all the stated properties (i)-(iii) are automatically satisfied (since the sequences (x̂_ℓ)_{ℓ∈N} and (x_k)_{k∈N} are finite). In order to establish the stated properties (i) and (ii), assuming that the algorithm does not stop, we split our analysis into the two exclusive cases below:
– either the stability center x̂_ℓ = x̂ remains unchanged after a finite number of iterations;
– or there are infinitely many stability centers.
The first case is addressed in [6, Proposition 4.3], since Algorithm 3.2 coincides with the one given in [6] when finitely many descent steps are performed. However,



since assumption (6) is required in [6, Proposition 4.3] but not explicitly stated, we provide the complete proof here.

4.1. Finitely many stability centers

Notice that for iterates at which the stability center x̂_ℓ is fixed, i.e., null steps, Algorithm 3.2 works like the classical proximal bundle method. We assume in this subsection that there exists a last stability center x̂_ℓ = x̂, i.e., the descent-step counter ℓ is no longer updated for all iteration indices k greater than a certain k̄. Let k ≥ k̄; we start by defining the following two useful functions:

f_k^S(x) := f_k^M(x) + (1/(2t_k)) ‖x − x̂‖²,    (21)
f_k^Sa(x) := f_k^La(x) + (1/(2t_k)) ‖x − x̂‖² = f_k^M(x_{k+1}) + ⟨p_k, x − x_{k+1}⟩ + (1/(2t_k)) ‖x − x̂‖².    (22)

Notice that f_k^Sa is twice differentiable:

∇f_k^Sa(x) = p_k + (1/t_k)(x − x̂) and ∇²f_k^Sa(x) = (1/t_k) I,

where I is the identity operator in H, i.e., I(x) = x for all x ∈ H. It follows from (11) that ∇f_k^Sa(x_{k+1}) = 0, i.e., the point x_{k+1} is the unique solution of the problem min_{x∈H} f_k^Sa(x). Moreover, since f_k^Sa is quadratic, its second-order Taylor expansion is exact:

f_k^Sa(x) = f_k^Sa(x_{k+1}) + ⟨∇f_k^Sa(x_{k+1}), x − x_{k+1}⟩ + ½ ⟨∇²f_k^Sa(x_{k+1})(x − x_{k+1}), x − x_{k+1}⟩
         = f_k^Sa(x_{k+1}) + ⟨0, x − x_{k+1}⟩ + (1/(2t_k)) ⟨I(x − x_{k+1}), x − x_{k+1}⟩
         = f_k^Sa(x_{k+1}) + (1/(2t_k)) ‖x − x_{k+1}‖²
         = f_k^S(x_{k+1}) + (1/(2t_k)) ‖x − x_{k+1}‖²,    (23)

where the last equality follows from (21) and (22). The above development is crucial to show the following lemma, which is essentially a reformulation of [7, Lemma 6.3] to our setting.

Lemma 4.2 Let f_k^S be the model given in (21) and let x_{k+1} be a null iterate, obtained from the stability center x̂. If t_{k+1} ≤ t_k ≤ t_{k−1}, then
(a) the sequence (f_k^S(x_{k+1}))_{k∈N} is nondecreasing and satisfies

f_k^S(x_{k+1}) + (1/(2t_k)) ‖x_{k+2} − x_{k+1}‖² ≤ f_{k+1}^S(x_{k+2});



(b) the sequence (f_k^S(x_{k+1}))_{k∈N} is bounded from above:

f_k^S(x_{k+1}) + (1/(2t_k)) ‖x̂ − x_{k+1}‖² ≤ f(x̂);

(c) the inequality

f_k^M(x_{k+1}) − f_{k−1}^M(x_k) ≤ f_k^S(x_{k+1}) − f_{k−1}^S(x_k) + (1/t_min) ‖x_{k+1} − x_k‖ ‖x̂ − x_k‖

holds true.

holds true. Proof. As the aggregate index k a enters the bundle in Step 4 of Algorithm 3.2, it M (x) for all x ∈ H. Thus, using (22), we have follows that fkLa (x) ≤ fk+1 1 kx − x ˆ k2 2tk 1 M ≤ fk+1 (x) + kx − x ˆk2 2tk 1 S M kx − x ˆk2 = fk+1 (x) , ≤ fk+1 (x) + 2tk+1

fkSa (x) = fkLa (x) +

where the last inequality follows from the assumption tk+1 ≤ tk . Set x = xk+2 in (23) to obtain (a), and x = x ˆ to obtain (b) (it follows from definition (21), with k S M replaced by k + 1, that fk+1 (ˆ x) = fk+1 (ˆ x) ≤ f (ˆ x)). In order to show (c), note that 1 kxk+1 − xk + xk − x ˆk2 2tk 1 1 ≥ kxk − x ˆk2 + hxk+1 − xk , xk − x ˆi 2tk tk 1 1 ≥ kxk − x ˆk2 + hxk+1 − xk , xk − x ˆi 2tk−1 tk

fkS (xk+1 ) − fkM (xk+1 ) =

S M ≥ fk−1 (xk ) − fk−1 (xk ) +

1 hxk+1 − xk , xk − x ˆi , tk

where the second inequality is due to tk ≤ tk−1 . The result follows by applying the Cauchy-Schwartz inequality and assumption tk ≥ tmin > 0. Items (a)-(c) in Lemma 4.2 are useful in the next lemma, which shows that the cutting-plane model evaluated at the iterate xk asymptotically approximates the value of the function. Lemma 4.3 In addition to the setting of Lemma 4.2, assume (6). Then the following holds M lim [f (xk ) − fk−1 (xk )] = 0 .

k→∞

Proof. By Lemma 4.2(a) the sequence (f_k^S(x_{k+1}))_{k∈N} is nondecreasing. Thus, there exists a constant C > 0 such that f_k^S(x_{k+1}) ≥ −C for all k. Using Lemma 4.2(b),



we conclude that

(1/(2t_k)) ‖x̂ − x_{k+1}‖² ≤ f(x̂) − f_k^S(x_{k+1}) ≤ f(x̂) + C,

showing that the sequences (‖x̂ − x_k‖)_{k∈N} and (x_k)_{k∈N} are bounded, since (t_k)_{k∈N} is nonincreasing. By assumption (6), the subdifferential of f is bounded on bounded sets; it thus results that (g_k)_{k∈N} is a bounded sequence. By Lemma 4.2(b) the sequence (f_k^S(x_{k+1}))_{k∈N} is bounded from above by f(x̂). Lemma 4.2(a) shows that (f_k^S(x_{k+1}))_{k∈N} is nondecreasing, and hence

lim_{k→∞} [f_{k+1}^S(x_{k+2}) − f_k^S(x_{k+1})] = 0 and lim_{k→∞} ‖x_{k+2} − x_{k+1}‖² = 0,    (24)

where the second limit follows from Lemma 4.2(a) and the assumption that t_k is nonincreasing. Since k ∈ B_k, we have f(x_k) + ⟨g_k, x − x_k⟩ = f_k^L(x) ≤ f_k^M(x) for all x ∈ H. Setting x = x_{k+1} in the equality, we get f(x_k) = f_k^L(x_{k+1}) + ⟨g_k, x_k − x_{k+1}⟩. Therefore,

f(x_k) − f_{k−1}^M(x_k) = f_k^L(x_{k+1}) + ⟨g_k, x_k − x_{k+1}⟩ − f_{k−1}^M(x_k)
  ≤ f_k^M(x_{k+1}) + ⟨g_k, x_k − x_{k+1}⟩ − f_{k−1}^M(x_k)
  ≤ f_k^S(x_{k+1}) − f_{k−1}^S(x_k) + (1/t_min) ‖x_{k+1} − x_k‖ ‖x̂ − x_k‖ + ⟨g_k, x_k − x_{k+1}⟩,

Property (ii) is shown in the following result. Proposition 4.4 (Finitely many stability center) Assume that (6) holds and suppose that there exists a last stability center x ˆ such that the descent test (4) does not ˆ = PS∗ (x0 ), and the sequence (xk )k∈N hold for k large enough. Then (16) holds, x converges strongly to x ˆ.

Proof. Lemma 4.3 ensures that lim_{k→∞} [f(x_k) − f_{k−1}^M(x_k)] = 0. Since the descent test (4) does not hold for k large enough, it follows from (4) that

0 = lim_{k→∞} [f(x_k) − f_{k−1}^M(x_k)] ≥ lim_{k→∞} (f(x̂) − γ v_{k−1} − f_{k−1}^M(x_k)) = lim_{k→∞} (1 − γ) v_{k−1},

i.e., v_k → 0 because v_k ≥ 0 (cf. Proposition 3.2). Since t_k ≥ t_min > 0, it follows from (14) that

0 = lim_{k→∞} v_k = lim_{k→∞} (ê_k + t_k ‖p_k‖²) ≥ lim_{k→∞} (ê_k + t_min ‖p_k‖²) ≥ 0,    (25)

i.e., property (16) holds because ê_k ≥ 0. Passing to the limit as k → ∞ in (15)



for a fixed x̂_ℓ = x̂, we conclude that

f(x̂) ≤ f(x) + lim_{k→∞} [ê_k + ‖p_k‖ ‖x − x̂‖] = f(x).

Since x ∈ X is arbitrary, we conclude that x̂ is a solution to problem (1). Moreover, it follows from Proposition 4.1 that x̂ belongs to the ball B defined in (20), whose diameter is ‖P_{S_*}(x_0) − x_0‖. It is readily seen that x_0 belongs to B as well. Hence, by defining x_* = P_{S_*}(x_0) we have the inequality

‖x̂ − x_0‖ ≤ ‖x_* − x_0‖.

Since the solution set S_* is convex and the projection is orthogonal, x_* satisfies the strict inequality

‖x_* − x_0‖ < ‖z − x_0‖ for all z ∈ S_* \ {x_*}.

It thus follows from the two inequalities above that x̂ = x_* = P_{S_*}(x_0), because x̂ ∈ S_*. In order to show the last assertion, note that by (25), lim_{k→∞} t_k ‖p_k‖ = 0. By using (11) we conclude that lim_{k→∞} ‖x_{k+1} − x̂‖ = 0, i.e., (x_k)_{k∈N} converges strongly to x̂.

In what follows we deal with the case of infinitely many stability centers. The choice of X_k then becomes crucial.

4.2. Infinitely many stability centers

We start by providing the following useful result, which adapts Lemmas 3.4 and 3.5 of [26] to our setting.

Lemma 4.5 Let X_k be such that X_k ⊆ X_k^min, and let k(ℓ) be the iteration at which the (ℓ+1)-th stability center is determined by Algorithm 3.2, i.e., x̂_{ℓ+1} = P_{X_{k(ℓ)}}(x_0). Then, for all ℓ ≥ 0 and with γ ∈ (0, 1),

‖x_0 − x̂_{ℓ+1}‖² ≥ ‖x_0 − x̂_ℓ‖² + (γ² / ‖ĝ_ℓ‖²) v_{k(ℓ)}².   (26)

Proof. Since X_k ⊆ X_k^min, any x ∈ X_k satisfies the two inequalities (17) defining X_k^min. Hence, since x̂_{ℓ+1} = P_{X_{k(ℓ)}}(x_0), it follows from the first inequality in (17) that

⟨ĝ_ℓ, x̂_{ℓ+1} − x̂_ℓ⟩ ≤ f_{k(ℓ)}^best − f(x̂_ℓ).

Since by definition f_{k(ℓ)}^best ≤ f(x_{k(ℓ)+1}), we conclude that

⟨ĝ_ℓ, x̂_{ℓ+1} − x̂_ℓ⟩ ≤ f(x_{k(ℓ)+1}) − f(x̂_ℓ) ≤ −γ v_{k(ℓ)},

where the last inequality is due to (4). Changing the sign and applying the Cauchy–Schwarz inequality, we get

‖ĝ_ℓ‖ ‖x̂_ℓ − x̂_{ℓ+1}‖ ≥ γ v_{k(ℓ)}.   (27)

It also follows from definition (17) of X_{k(ℓ)} that ⟨x_0 − x̂_ℓ, x̂_{ℓ+1} − x̂_ℓ⟩ ≤ 0. Therefore, expanding the square in ‖x_0 − x̂_ℓ + x̂_ℓ − x̂_{ℓ+1}‖², we have

‖x_0 − x̂_{ℓ+1}‖² ≥ ‖x_0 − x̂_ℓ‖² + ‖x̂_ℓ − x̂_{ℓ+1}‖².

Inequality (26) then follows from (27) and the above inequality.

Lemma 4.6 Let X_k be such that S∗ ⊆ X_k ⊆ X_k^min. Assume moreover that ∂f is bounded on bounded sets, i.e., (6) holds. Then condition (16) holds, and every weak cluster point of the sequence (x̂_ℓ)_{ℓ∈N} belongs to the solution set S∗.

Proof. We first recall that by Proposition 4.1 the sequence (x̂_ℓ)_{ℓ∈N} is bounded. Let k(ℓ) be the iteration at which the (ℓ+1)-th stability center was determined, i.e., x̂_{ℓ+1} = P_{X_{k(ℓ)}}(x_0). Recalling that the expected decrease v_k is nonnegative by (14), it follows from Lemma 4.5 that the sequence (‖x_0 − x̂_ℓ‖)_{ℓ∈N} is nondecreasing. By Proposition 4.1 this sequence is also bounded from above, and hence it is convergent. Therefore, using (26),

0 ≤ (γ² / ‖ĝ_ℓ‖²) v_{k(ℓ)}² ≤ ‖x̂_{ℓ+1} − x_0‖² − ‖x̂_ℓ − x_0‖²,

and we conclude that lim_{ℓ→∞} (γ² / ‖ĝ_ℓ‖²) v_{k(ℓ)}² = 0. Since (x̂_ℓ)_{ℓ∈N} is bounded, assumption (6) ensures that (ĝ_ℓ)_{ℓ∈N} is also bounded. Thus,

0 = lim_{ℓ→∞} v_{k(ℓ)} = lim_{ℓ→∞} (ê_{k(ℓ)} + t_{k(ℓ)} ‖p_{k(ℓ)}‖) ≥ lim_{ℓ→∞} (ê_{k(ℓ)} + t_min ‖p_{k(ℓ)}‖) ≥ 0,

and (16) holds for K := {k(1), k(2), . . .}, because ê_k ≥ 0 for all k. Let (x̂_{ℓ′})_{ℓ′∈N} be any weakly convergent subsequence of (x̂_ℓ)_{ℓ∈N}, with weak cluster point x̄ (since (x̂_ℓ)_{ℓ∈N} is a bounded sequence in a Hilbert space H, the existence of such a subsequence is ensured). Let x ∈ X be an arbitrary point and let K^{ℓ′} be the index set gathering the iterates issued from the ℓ′-th stability center, as defined in Proposition 3.2. It follows from the definition that k(ℓ′) belongs to K^{ℓ′}. Writing (15) with ℓ replaced by ℓ′ gives

f(x̂_{ℓ′}) ≤ f(x) + ê_{k(ℓ′)} + ‖p_{k(ℓ′)}‖ ‖x̂_{k(ℓ′)} − x‖.

We recall that f is lower semicontinuous by assumption, and hence also weakly lower semicontinuous. Therefore,

f(x̄) ≤ lim inf_{ℓ′→∞} f(x̂_{ℓ′}) ≤ f(x) + lim inf_{ℓ′→∞} [ê_{k(ℓ′)} + ‖p_{k(ℓ′)}‖ ‖x̂_{k(ℓ′)} − x‖] = f(x),

because (x̂_{ℓ′})_{ℓ′∈N} is a bounded subsequence of (x̂_ℓ)_{ℓ∈N}. Therefore, we have shown that any weak cluster point x̄ of (x̂_ℓ)_{ℓ∈N} belongs to the solution set S∗ of problem (1).
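The chain of inequalities above rests on the obtuse-angle characterization of projections: for ŷ = P_C(x_0) and any z ∈ C, one has ⟨x_0 − ŷ, z − ŷ⟩ ≤ 0, whence ‖x_0 − z‖² ≥ ‖x_0 − ŷ‖² + ‖ŷ − z‖². A quick numerical sanity check of both facts on a halfspace (hypothetical data, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
g, b = rng.normal(size=5), 0.3                       # halfspace C = {y : <g, y> <= b}
x0 = rng.normal(size=5) + 5 * g / np.linalg.norm(g)  # a point pushed outside C

# Closed-form projection of x0 onto the halfspace.
viol = max(g @ x0 - b, 0.0)
yhat = x0 - (viol / (g @ g)) * g

for _ in range(100):
    z = rng.normal(size=5)
    z -= (max(g @ z - b, 0.0) / (g @ g)) * g         # force z into C
    assert (x0 - yhat) @ (z - yhat) <= 1e-9          # obtuse-angle property
    lhs = np.linalg.norm(x0 - z) ** 2
    rhs = np.linalg.norm(x0 - yhat) ** 2 + np.linalg.norm(yhat - z) ** 2
    assert lhs >= rhs - 1e-9                         # strengthened Pythagoras
```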


The following result shows that the sequence of stability centers converges strongly to the solution that lies closest to the initial point.

Proposition 4.7 (Infinitely many stability centers) Let X_k be such that S∗ ⊆ X_k ⊆ X_k^min. Assume moreover that ∂f is bounded on bounded sets, i.e., (6) holds. Then (x̂_ℓ)_{ℓ∈N} converges strongly to x∗ := P_{S∗}(x_0).

Proof. The point x∗ is well defined, since S∗ ≠ ∅ is closed and convex. The definition of x̂_ℓ gives the inequality

‖x̂_ℓ − x_0‖ ≤ ‖x − x_0‖ for all x ∈ X_{k(ℓ−1)},   (28)

with k(ℓ−1) the iteration at which the ℓ-th stability center was determined, i.e., x̂_ℓ = P_{X_{k(ℓ−1)}}(x_0). In particular, it follows from Proposition 4.1 that S∗ ⊆ X_{k(ℓ−1)}, and thus

‖x̂_ℓ − x_0‖ ≤ ‖x∗ − x_0‖ for all ℓ ≥ 0.   (29)

It follows from Lemma 4.6 that every weak cluster point of the sequence (x̂_ℓ)_{ℓ∈N} belongs to S∗. Let (x̂_{ℓ′})_{ℓ′∈N} be a subsequence of (x̂_ℓ)_{ℓ∈N} that converges weakly to x̄ ∈ S∗. Adding and subtracting x_0 in the term ‖x̂_{ℓ′} − x∗‖², we get ‖x̂_{ℓ′} − x∗‖² = ‖x̂_{ℓ′} − x_0‖² + ‖x∗ − x_0‖² − 2⟨x̂_{ℓ′} − x_0, x∗ − x_0⟩. Therefore, inequality (29) provides

‖x̂_{ℓ′} − x∗‖² ≤ 2‖x∗ − x_0‖² − 2⟨x̂_{ℓ′} − x_0, x∗ − x_0⟩.

Taking the limit as ℓ′ → ∞ and recalling that (x̂_{ℓ′})_{ℓ′∈N} converges weakly to x̄ ∈ S∗, we obtain

lim sup_{ℓ′→∞} ‖x̂_{ℓ′} − x∗‖² ≤ 2(‖x∗ − x_0‖² − ⟨x̄ − x_0, x∗ − x_0⟩).   (30)

We now proceed to show that the right-hand side of the above inequality is nonpositive. Since x∗ = P_{S∗}(x_0), we have ⟨x_0 − x∗, x̄ − x∗⟩ ≤ 0. This inequality gives

0 ≥ ⟨x_0 − x∗, x̄ − x∗⟩ = ⟨x_0 − x∗, x_0 − x∗⟩ + ⟨x_0 − x∗, x̄ − x_0⟩ = ‖x∗ − x_0‖² − ⟨x∗ − x_0, x̄ − x_0⟩.

Thus, it follows from (30) that lim sup_{ℓ′→∞} ‖x̂_{ℓ′} − x∗‖² ≤ 0, showing that (x̂_{ℓ′})_{ℓ′∈N} converges strongly to x∗. As (x̂_{ℓ′})_{ℓ′∈N} is an arbitrary weakly convergent subsequence of (x̂_ℓ)_{ℓ∈N}, every weakly convergent subsequence of (x̂_ℓ)_{ℓ∈N} converges strongly to x∗. Hence, the whole sequence (x̂_ℓ)_{ℓ∈N} converges strongly to x∗ = P_{S∗}(x_0).

4.3. Convergence

We summarize the results of this paper in the following theorem.

Theorem 4.8 Consider problem (1) with f a convex and lower semicontinuous function whose subdifferential ∂f is bounded on bounded sets. Suppose that X is a convex and closed subset of a real Hilbert space H. Let Tol_ê = Tol_p = 0 in Algorithm 3.2, and let X_k be defined such that S∗ ⊆ X_k ⊆ X_k^min, where X_k^min is defined in (17) (for instance, X_k can be defined by (19)). In addition, suppose that the solution set S∗ of problem (1) is nonempty. Then (16) is satisfied, and properties (i)-(iii) stated in the Introduction hold true.

Proof. Proposition 4.7 gives item (i), and Proposition 4.4 gives item (ii) (see also Proposition 4.3 in [6], assuming (6)). Finally, item (iii) is ensured by Proposition 4.1.

5. Some numerical experiments in unit-commitment problems

In this section we present some numerical results obtained by solving specific unit-commitment problems (7). We mention that an infinite-dimensional variant of problem (7) can be obtained if the demand d is assumed to be a continuous random variable and the objective function is, for instance, the expectation (with respect to a given probability measure) of the production costs. However, only deterministic variants of the problem are considered in this work: d is a given vector estimating the future demand of electricity, a commonly made assumption for this class of problems [15, 17, 27]. Therefore, rather than strong convergence for its own sake, the motivation for testing our algorithm on this kind of problem was presented in § 2.1: the goal is to find a minimal norm solution, stabilizing the prices of power generation.

The considered instances come from EDF (Électricité de France); modeling details can be found in [16]. The main difficulty in solving the problem lies in the fact that P^i is a very complex (essentially discrete) set for several power units. The inner problem in (8) requires specific methods for its solution: dynamic and mixed-integer programming methods are used for the subproblems related to thermal units, and special interior-point methods are used for the hydro-valley subproblems [27].

Beyond the derived production schedule p̄, which is obviously of great importance, the optimal dual signal x̄ is important in its own right. We begin by recalling that, in order to derive p̄, additional Lagrangian heuristics (e.g., [16, 28, 29]) or augmented-Lagrangian-based heuristics [30] are required. We will not investigate p̄ further, but focus on x̄ instead. This optimal dual solution x̄ can be interpreted as a price signal explaining the optimal solution. Its information can be exploited to make minor changes to p̄ in an economic way.
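To make the structure of the dual oracle in (8) concrete, the following toy sketch evaluates f and one subgradient by solving each unit's subproblem independently. All data and cost functions are hypothetical (two units with finitely many schedules stand in for the complex sets P^i); this is an illustration of the decomposition structure, not EDF's solver.

```python
import itertools
import numpy as np

# Toy data: 3 time steps, demand d, and two units, each with a finite set of
# feasible production schedules (stand-ins for the complex sets P^i).
d = np.array([3.0, 4.0, 2.0])                   # demand per time step
levels = [0.0, 1.0, 2.0, 3.0]                   # feasible production levels
schedules = [np.array(s) for s in itertools.product(levels, repeat=3)]
costs = [lambda p: 1.5 * p.sum(),               # cheap unit, no start-up cost
         lambda p: 1.0 * p.sum() + 2.0 * (p > 0).sum()]  # start-up-like cost

def oracle(x):
    """Return f(x) = -theta(x) and a subgradient of f at the price vector x,
    where theta(x) = sum_i min_{p in P^i} [c_i(p) - <x, p>] + <x, d>."""
    val, prod = 0.0, np.zeros_like(d)
    for c in costs:
        sub = [c(p) - x @ p for p in schedules]  # unit subproblem by enumeration
        best = int(np.argmin(sub))
        val += sub[best]
        prod += schedules[best]
    return -(val + x @ d), prod - d              # f(x), subgradient (supply - demand)

# Convexity check: the subgradient inequality f(y) >= f(x) + <s, y - x>.
x, y = np.array([1.0, 2.0, 0.5]), np.array([0.0, 1.0, 1.0])
fx, sx = oracle(x)
fy, _ = oracle(y)
assert fy >= fx + sx @ (y - x) - 1e-9
```

Since f is a pointwise maximum of affine functions of the price x, any bundle method driven by this oracle applies directly.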
Price stabilization approaches such as those investigated in [25] are one way to obtain such features. In the numerical tests we have used a standard proximal bundle algorithm with an elementary update of the proximal parameter t_k. This is basically Algorithm 3.2, except that the projection set is X_k = {x_{k+1}}; consequently, the need for a second oracle in Step 3.1 is eliminated. We refer to this algorithm as the Reference algorithm. The second version of Algorithm 3.2, hereafter named Strong Bundle, uses definition (19) for X_k, which we believe is more promising. The minimal requirement set X_k^min was experimented with on several academic cases and was found to lead to significantly slower convergence. The reason for this behavior lies in the amount of information employed to approximate the objective function: X_k^min employs only one linearization of f (a poor approximation), whereas X_k contains many linearizations, approximating the function better as the iterative process progresses.

The interest of the specific projection step in Algorithm 3.2 is well demonstrated for two different EDF unit-commitment problems in Figures 1 and 2, where it is readily seen that the obtained price signal is indeed closer to the initial price


signal (as it should be): x_ref and x_strong stand for the solutions obtained by the Reference and Strong Bundle algorithms, respectively. In particular, in Figure 1, the price spike showing at time step 86 is reduced significantly (by 10%). This is of interest for the operator, who can provide initial price signals with low variability in order to have control over the variance of the optimal dual signal provided by the algorithm.

[Figure 1 here: plot of x0, x_ref, and x_strong; Prices (/MW) against Time Steps (1/2 hour).]

Figure 1. A comparison of prices. Initial point x0 carefully chosen for the first problem. The differences in norms are: ‖x_ref − x0‖ = 41.73, ‖x_strong − x0‖ = 29.67, ‖x_ref − x_strong‖ = 13.37, providing a 30% norm reduction.

In Figure 2 the prices are reduced over time steps 65 to 82, by roughly 3%. The norm of x_ref is reduced by 1.5%.

[Figure 2 here: plot of x_ref and x_strong; Prices (/MW) against Time Steps (1/2 hour).]

Figure 2. A comparison of prices. Initial point x0 = 0 for the second problem.


In Table 1 we present some numerical results obtained with both the Reference and Strong Bundle algorithms for the first instance. The table shows that the Strong Bundle algorithm requires significantly more iterations and oracle calls than the Reference algorithm in order to satisfy the stopping tests given at Step 2 of Algorithm 3.2.

Table 1. A comparison of the Strong Bundle algorithm with the Reference algorithm (nn e x stands for nn × 10^x)

Method          Nb. Iter   Nb. Oracle Calls   f∗
Reference       57         57                 −3.4842e7
Strong Bundle   161        267                −3.4837e7

Similarly, in Table 2 we show the results for the second instance of the problem. As in Figure 2, the initial point was set to zero: x0 = 0.

Table 2. A comparison of the Strong Bundle algorithm with the Reference algorithm (nn e x stands for nn × 10^x)

Method          Nb. Iter   Nb. Oracle Calls   f∗
Reference       40         40                 −5.68001e7
Strong Bundle   128        243                −5.68024e7
We have verified in our numerical experiments that the Strong Bundle algorithm provides a similar function value even when stopped after a maximum number of oracle calls. For this reason we have looked at 28 reputedly very hard instances and set the maximum number of oracle calls to 500. This means that the Reference algorithm performs 500 iterations, whereas the Strong Bundle algorithm can perform only roughly 250 iterations. CPU times for both solvers are comparable, since the number of oracle calls is the same (500) and the computational effort for performing the projection step in (4) is negligible compared to an oracle call. Results are provided in Figure 3.

[Figure 3 here: function value at the obtained solution (×10^8) for the Reference Method and Strong Bundle across the instances.]

Figure 3. A comparison of approximate optimal values. The strong bundle method provides a dual value approximately 10% lower.


We can observe here that the Strong Bundle algorithm provides a similar dual value (even slightly better). Altogether, we have sketched the interest of using the Strong Bundle algorithm despite the increased number of iterations, which can be effectively reduced by considering a hybrid approach, as described below.

5.1. Hybrid algorithm

A hybrid approach combining both the Reference and Strong Bundle algorithms could be of interest to reduce the number of iterations and oracle calls, bringing these quantities down to a more reasonable level. Such a hybrid approach could simply consist of the following two phases:

First phase: run a classical proximal bundle method for several iterations or until satisfying a loose stopping test;

Second phase: warm-start and run the strong bundle method.

For warm-starting purposes, we can use the previously obtained cutting-plane model and the (better) estimate f_best of f∗ constructed in the first phase. In both phases, it is probably desirable to employ a more efficient rule to update the prox-parameter t_k (e.g., the poor man's Newton formula as in [31]).

In Table 3 we provide more detailed statistics on the number of oracle calls needed to obtain convergence. We have also included the above-described hybrid approach for comparison. These statistics were obtained on 50 easier instances; consequently, we have set the maximum number of oracle calls to 1000.

Table 3. Statistics on the number of oracle calls on 50 easy instances

Method          Average   Standard deviation   Minimum   Maximum
Reference       157.5     70.8                 57        392
Strong Bundle   388.9     138.0                189       762
Hybrid          65.9      66.9                 19        327

The results of Table 3 clearly show that, by combining both the Reference and Strong Bundle algorithms with an efficient rule for updating the prox-parameter, one need not be concerned about the increase in the number of oracle calls of the basic Strong Bundle algorithm relative to the Reference one.

On a further set of 46 intermediately hard instances coming from unit commitment, stopped at convergence, we have computed the computational effort per oracle call and the total number of oracle calls needed. The results are shown in Figures 4(a) and 4(b). We have also evaluated the total amount of time spent solving the quadratic programs compared to the total time. For this set of instances, the quadratic programs account for no more than 0.08% of total CPU time. This is typical of unit-commitment problems, for which the time spent in the (bundle) quadratic programs is often negligible with respect to the oracle time. The spikes for the strong bundle method shown in Figure 4(a) are solely due to the instability in CPU time of the mixed-integer programming subproblems (required to be solved by the oracle). Figure 4(c) shows that the strong bundle method provides an effective reduction of average prices.

Proximal bundle algorithms are known to find, in general, a good-quality solution in few iterations, after which many iterations performing null and "small" serious steps are required until a stopping test is met. It turns out that the good performance of the Hybrid method above is due partially to the projection onto

[Figure 4 here: (a) a comparison of computational effort (CPU time per oracle call, in seconds); (b) a comparison of the number of oracle calls; (c) average (over time) price reduction (%), for the Reference Method, Strong Bundle, and Hybrid Bundle across the instances.]

X_k from the First phase, which approximates the solution set well enough. We have noticed in our numerical experiments that, after the First phase, the provided value f_best is a good approximation of f∗. Therefore, the projection onto X_k (given in (19)) provides a stability center close to the solution set, accelerating the convergence. When f_best is a poor estimate of f∗, the convergence is slower; cf. Tables 2 and 3.
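Structurally, the two phases compose as a thin driver around the two solvers. In the sketch below the routine names, tolerance values, and the warm-start payload are illustrative assumptions, not the authors' implementation:

```python
from dataclasses import dataclass, field

@dataclass
class WarmStart:
    """State handed over from the first phase to the second (illustrative)."""
    cuts: list = field(default_factory=list)  # cutting-plane triples (x_i, f_i, g_i)
    f_best: float = float("inf")              # best function value found so far
    center: object = None                     # last stability center

def hybrid_solve(oracle, x0, classical_bundle, strong_bundle,
                 loose_tol=1e-3, tight_tol=1e-6):
    """Two-phase hybrid: loose classical phase, then warm-started strong phase."""
    ws = WarmStart()
    # First phase: classical proximal bundle stopped by a loose test; it fills
    # ws with the cutting-plane model and the estimate f_best.
    x1 = classical_bundle(oracle, x0, tol=loose_tol, warm_start=ws)
    # Second phase: strong bundle warm-started, so the projection set X_k
    # (cf. (19)) is informative from the first iterations.
    return strong_bundle(oracle, x1, tol=tight_tol, warm_start=ws)
```

The point of the design is that the expensive oracle is shared, while the warm-start object is the only coupling between the two phases.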

6. Concluding Remarks

We have proposed a strongly convergent proximal bundle method in a real Hilbert space. Other strongly convergent proximal variants may be obtained by changing the definitions of the predicted decrease v_k and the projection set X_k in (4). In order to obtain the stated properties (i)-(iii), it is crucial to choose the projection set so that the results of Proposition 3.3 remain valid. Essentially, the projection set must be selected at each iteration so as to contain the solution set of the optimization problem, while being contained in the minimal requirement set defined in (17). Among the three properties of the proposed method, we emphasize that property (i) is of great interest in some practical applications, such as unit commitment, in which a minimal norm solution is required. In particular, this feature was highlighted on unit-commitment problems coming from EDF. Moreover, we mention that a hybrid approach combining a classical proximal bundle method with the proposed strong bundle method was shown to be competitive


with the former method, while still providing the interesting properties of the latter. We finish by mentioning that replacing the Euclidean norm in the projection step by a more general norm or a Bregman distance can be of interest in some applications. Therefore, extending the presented analysis to more general projections in rule (4) is an interesting subject for future research.

Acknowledgment

This work was completed while the second author was visiting the University of British Columbia. He is very grateful for the warm hospitality. JYBC was partially supported by CNPq grants 303492/2013-9, 474160/2013-0 and 202677/2013-3, and by project CAPES-MES-CUBA 226/2012.

References

[1] Kelley J. The cutting-plane method for solving convex programs. Journal of the Society for Industrial and Applied Mathematics. 1960;8(4):703–712.
[2] Kiwiel K. Proximity control in bundle methods for convex nondifferentiable minimization. Mathematical Programming. 1990;46(1-3):105–122.
[3] Lemaréchal C. Constructing bundle methods for convex optimization. In: Fermat days 85: Mathematics for optimization. Vol. 129 of North-Holland Mathematics Studies; North-Holland; 1986. p. 201–240.
[4] Bello-Cruz JY, de Oliveira W. Level bundle-like algorithms for convex optimization. Journal of Global Optimization. 2014;59(4):787–809.
[5] Solodov M. On approximations with finite precision in bundle methods for nonsmooth optimization. Journal of Optimization Theory and Applications. 2003 October;119(1). Springer Netherlands. Available from: http://www.springerlink.com/content/m0100406p1rn1611/.
[6] Correa R, Lemaréchal C. Convergence of some algorithms for convex minimization. Mathematical Programming. 1993;62(2):261–275.
[7] de Oliveira W, Sagastizábal C, Lemaréchal C. Bundle methods in depth: a unified analysis for inexact oracles. 2013. Available at http://www.optimization-online.org/DB_HTML/2013/02/3792.html.
[8] Beck A, Sabach S. A first order method for finding minimal norm-like solutions of convex optimization problems. Mathematical Programming. 2013;7(3):1–22.
[9] Kanzow C, Qi H, Qi L. On the minimum norm solution of linear programs. Journal of Optimization Theory and Applications. 2003;116(2):333–345.
[10] Bello-Cruz JY, Iusem AN. A strongly convergent method for nonsmooth convex minimization in Hilbert spaces. Numerical Functional Analysis and Optimization. 2011;32(10):1009–1018.
[11] Burachik RS, Iusem AN. Set-valued mappings and enlargements of monotone operators. Vol. 8 of Springer Optimization and Its Applications. Springer; 2008.
[12] Bauschke H, Combettes P. Convex analysis and monotone operator theory in Hilbert spaces. CMS Books in Mathematics; Springer; 2011.
[13] Alber YI, Iusem AN, Solodov MV. On the projected subgradient method for nonsmooth convex optimization in a Hilbert space. Mathematical Programming. 1998;81:23–35.
[14] Tahanan M, van Ackooij W, Frangioni A, Lacalandra F. Large-scale unit commitment under uncertainty: a literature survey. 2014; submitted. Preprint available at http://compass2.di.unipi.it/TR/default.aspx.
[15] Finardi E, Silva ED, Sagastizábal C. Solving the unit commitment problem of hydropower plants via Lagrangian relaxation and sequential quadratic programming. Computational & Applied Mathematics. 2005;24(3):317–341.
[16] Dubost L, Gonzalez R, Lemaréchal C. A primal-proximal heuristic applied to the French unit-commitment problem. Mathematical Programming. 2005;104(1):129–151.
[17] Finardi E, Silva ED. Solving the hydro unit commitment problem via dual decomposition and sequential quadratic programming. IEEE Transactions on Power Systems. 2006;21(2):835–844.
[18] van Ackooij W, Henrion R, Möller A, Zorgati R. Joint chance constrained programming for hydro reservoir management. Optimization and Engineering. 2014;15:509–531.
[19] Langrene N, van Ackooij W, Bréant F. Dynamic constraints for aggregated units: formulation and application. IEEE Transactions on Power Systems. 2011;26(3):1349–1356.
[20] Le K, Jackups R, Feinstein J, Griffith J. Operational aspects of generation cycling. IEEE Transactions on Power Systems. 1990 May;5(4):1194–1203.
[21] van Ackooij W. Decomposition approaches for block-structured chance-constrained programs with application to hydro-thermal unit-commitment. Preprint CR-2012-08. 2012. Available from: http://www.lgi.ecp.fr/Biblio/PDF/CR-LGI-2012-08.pdf.
[22] Bertsimas D, Litvinov E, Sun XA, Zhao J, Zheng T. Adaptive robust optimization for the security constrained unit commitment problem. IEEE Transactions on Power Systems. 2013 March;28(1):52–63.
[23] Nürnberg R, Römisch W. A two-stage planning model for power scheduling in a hydro-thermal system under uncertainty. Optimization and Engineering. 2003 December;3:355–378.
[24] Sagastizábal C. Divide to conquer: decomposition methods for energy optimization. Mathematical Programming. 2012;134(1):187–222.
[25] Zaourar S, Malick J. Prices stabilization for inexact unit-commitment problems. Mathematical Methods of Operations Research. 2013; to appear:1–19.
[26] Brannlund U, Kiwiel KC, Lindberg PO. A descent proximal level bundle method for convex nondifferentiable optimization. Operations Research Letters. 1995;17(3):121–126.
[27] Gonzalez R, Bongrain M, Renaud A. Unit commitment handling transmission constraints with an interior point method. In: 13th PSCC Conference, Trondheim; Vol. 2; 1999. p. 715–723.
[28] Takriti S, Birge J. Using integer programming to refine Lagrangian-based unit commitment solutions. IEEE Transactions on Power Systems. 2000;15(1):151–156.
[29] Feltenmark S, Kiwiel K. Dual applications of proximal bundle methods, including Lagrangian relaxation of nonconvex problems. SIAM Journal on Optimization. 2000;10(3):697–721.
[30] Batut J, Renaud A. Daily scheduling with transmission constraints: a new class of algorithms. IEEE Transactions on Power Systems. 1992;7(3):982–989.
[31] Lemaréchal C, Sagastizábal C. An approach to variable metric bundle methods. Lecture Notes in Control and Information Science. 1994;197:144–162.
