
Signal Recovery With Certain Involved Convex Data-fidelity Constraints

Shunsuke Ono, Member, IEEE, and Isao Yamada, Fellow, IEEE

Abstract—This paper proposes an optimization framework that can efficiently deal with convex data-fidelity constraints onto which the metric projections are difficult to compute. Although such an involved data-fidelity constraint is expected to play an important role in signal recovery under non-Gaussian noise contamination, the said difficulty precludes existing algorithms from solving convex optimization problems with the constraint. To resolve this dilemma, we introduce a fixed point set characterization of involved data-fidelity constraints based on a certain computable quasi-nonexpansive mapping. This characterization enables us to mobilize the hybrid steepest descent method to solve convex optimization problems with such a constraint. The proposed framework can handle a variety of involved data-fidelity constraints in a unified manner, without geometric approximation to them. In addition, it requires no computationally expensive procedure such as operator inversion or an inner loop. As applications of the proposed framework, we provide image restoration under several types of non-Gaussian noise contamination with illustrative examples.

Index Terms—Constrained convex optimization, data-fidelity constraint, fixed point set characterization, hybrid steepest descent method, signal recovery.

I. INTRODUCTION

Signal recovery, such as denoising, deconvolution, interpolation, and compressed sensing reconstruction, can be cast as inverse problems of the form

    v = D(Φū),    (1)

where v ∈ R^M is an observation vector, ū ∈ R^N is an unknown signal of interest, Φ : R^N → R^M is a matrix representing an observation process (e.g., convolution and/or downsampling), and D : R^M → R^M is a noise contamination process that is not necessarily additive.

One of the most popular approaches to achieving satisfactory signal recovery under such an ill-posed/ill-conditioned nature is to translate it into convex optimization problems [1]–[6], where some a priori knowledge on the unknown signal ū is incorporated as prior/regularization function(s). Once this is done, we can benefit from convex optimization techniques to solve

these problems efficiently. In particular, proximal splitting algorithms (see, e.g., [3], [7], [8]) have been successfully applied to large-scale signal recovery (typically N > 10^4).

A. Data-fidelity Treatment

To obtain a reasonable estimate of ū as a solution of a convex optimization problem, the notion of data-fidelity to the observation must be properly treated in the problem. This is usually realized using a data-fidelity function ψv : R^M → R ∪ {∞}, which is proper lower semicontinuous convex and is designed based on the type of noise contamination process. There are roughly two ways of treating data-fidelity. One is to minimize the sum of τψv(Φu) and some regularization function(s), where τ > 0 controls the trade-off between data-fidelity and a priori knowledge. The other is to minimize some regularization function(s) subject to a data-fidelity constraint ψv(Φu) ≤ α, or equivalently, Φu ∈ lev≤α ψv, where lev≤α ψv denotes the lower level set of ψv (see Sec. II for the definition), and α ∈ R is a given level of ψv that determines the "degree" of fidelity to an observation vector v.

The former case, which we call the additive-fidelity formulation, is in general easier to optimize than the latter case, which we refer to as the fidelity-constrained formulation. Hence, most existing signal recovery methods adopt the former. On the other hand, the fidelity-constrained formulation enjoys the following advantages over the additive-fidelity one.

Universality of a once determined parameter: We can determine α almost independently of what regularization functions are employed. This means that once a suitable α is found for one regularization function, the same α would also be suitable for different regularization functions under the same noise intensity, which is convenient for comparing various regularization functions. By contrast, a suitable τ in the additive-fidelity formulation depends strongly on the choice of regularization functions (see Tab. I for experimental validation).

Facilitation of parameter setting: In many cases, α itself is closely related to some statistical parameter of the noise contamination (e.g., variance). Hence, one can select α based on such information obtained with the use of existing noise estimation methods, as also addressed in [9]–[12]. Also, if one has an oracle or representative signal, then one can use it to find α, but not to find τ.

For these advantages, this paper focuses on the fidelity-constrained formulation. The most popular data-fidelity constraint would be the ℓ2-norm ball, namely, ψv(Φu) := ∥Φu − v∥₂ ≤ α, which is usually employed under additive Gaussian noise contamination, e.g., in [9], [13]–[16]. The ℓ1-norm ball ψv(Φu) := ∥Φu − v∥₁ ≤ α is adopted in [17], which is

recognized as a suitable measure under impulsive noise contamination. The tractability of these data-fidelity constraints in optimization stems from the fact that the metric projections onto the level sets of these norms are computable.¹ However, if the metric projection onto lev≤α ψv is difficult to compute, convex optimization problems with the data-fidelity constraint cannot be solved even by proximal splitting algorithms. Such data-fidelity functions are essential to achieve satisfactory recovery under various types of noise contamination, as has been demonstrated in the context of the additive-fidelity formulation [21]–[27]. Thus, together with the advantages of the fidelity-constrained formulation, optimization techniques that can handle involved data-fidelity constraints are in high demand.

Several approaches to resolve this dilemma have been considered. One is to rewrite the fidelity-constrained formulation with a given α into the equivalent additive-fidelity formulation with the exactly corresponding τ. However, such a translation is usually impossible due to the difficulty of finding such a τ from a given α.² We leave discussions on how our work relates to the other approaches [10]–[12], [29]–[32] to Sec. III-E.

B. Contribution

The goal of this paper is to establish a unified optimization framework that can handle data-fidelity constraints onto which the metric projections are difficult to compute in large scale signal recovery. To this end, we establish a generic characterization of such involved data-fidelity constraints as the fixed point set of a certain computable quasi-nonexpansive mapping. Once this is done, the fixed point set characterization enables us to mobilize a fixed point algorithm called the hybrid steepest descent method [33]–[35] to solve convex optimization problems with such a constraint efficiently. Different from existing approaches, the proposed method covers an extensive class of involved data-fidelity constraints in a unified manner, without geometric approximation to them.

Specifically, we first embody the fidelity-constrained formulation as a convex optimization problem, in which the sum of convex functions is minimized over the intersection of lev≤α ψv and additional constraints. Second, the problem is rewritten via a fixed point set characterization with the help of an operator called the subgradient projection relative to ψv, which is easy to compute in many cases, even when the metric projection onto lev≤α ψv is difficult to compute. Indeed, the subgradient projection has been utilized as a key operator in various applications, e.g., adaptive filtering and online learning [4], [36]–[38]. We further reformulate the problem in a certain product space to circumvent another computational difficulty caused by the composition involving Φ. Third, we derive an algorithmic solution to the reformulated problem based on the hybrid steepest descent method. The algorithm not only requires no computationally expensive procedure, such as operator inversion or an inner loop, but also has a structure suitable for parallel implementation.

¹Efficient computations of the metric projection onto the level set of the (weighted) ℓ1 norm were proposed in [18]–[20].
²The case where the objective function is the sum of a homogeneous ψv and a strictly-convex smooth function was studied in [28].

The remainder of the paper is organized as follows. Sec. II provides selected mathematical preliminaries on quasi-nonexpansive mappings, proximal tools, and the hybrid steepest descent method, which are necessary for the proposed method. Sec. III is the main section, devoted to establishing the proposed method with guaranteed convergence and discussing related work. In Sec. IV, image restoration under several types of non-Gaussian noise contamination is presented as an illustrative application of the proposed method, where we also show comprehensive experimental results. Finally, we conclude the paper in Sec. V. A preliminary version of this study appeared in conference proceedings [39].

II. MATHEMATICAL PRELIMINARIES

Throughout the paper, let R₊ and R₊₊ denote the sets of nonnegative real and positive real numbers, respectively. Let H be a real Hilbert space equipped with an inner product ⟨·,·⟩ and its induced norm ∥·∥.³ We denote the set of all proper lower semicontinuous convex functions on H by Γ₀(H),⁴ and the lower level set of f ∈ Γ₀(H) with α ∈ R by lev≤α f := {x ∈ H | f(x) ≤ α}.⁵ The fixed point set of a mapping T : H → H is denoted by Fix(T) := {x ∈ H | T(x) = x}.

A. Quasi-nonexpansive Mappings

Definition 1 (Quasi-nonexpansive mapping). A mapping T : H → H is said to be quasi-nonexpansive with Fix(T) ≠ ∅ if T satisfies, for all x ∈ H and all y ∈ Fix(T), ∥T(x) − y∥ ≤ ∥x − y∥.

The nonempty fixed point set of a quasi-nonexpansive mapping becomes a closed convex set (see, e.g., [40]). In what follows, we introduce special subclasses of quasi-nonexpansive mappings and their typical examples, which will play important roles in the proposed method.

Definition 2 (Attracting quasi-nonexpansive mapping [34]). A quasi-nonexpansive mapping T is said to be attracting if T satisfies, for all x ∈ H \ Fix(T) and all y ∈ Fix(T), ∥T(x) − y∥ < ∥x − y∥. In particular, an attracting quasi-nonexpansive mapping T is called δ-strongly attracting if there exists some δ > 0 satisfying, for all x ∈ H and all y ∈ Fix(T), δ∥x − T(x)∥² ≤ ∥x − y∥² − ∥T(x) − y∥².

A typical example of attracting quasi-nonexpansive mappings is the subgradient projection, defined as follows.

Example 1 (Subgradient projection). Suppose that a continuous convex function f : H → R satisfies lev≤α f ≠ ∅. Let f′(x) ∈ ∂f(x) (∀x ∈ H) be a selection of the subdifferential

³Note that the Euclidean space is a typical example of H.
⁴A function f : H → (−∞, ∞] is called proper lower semicontinuous convex if dom(f) := {x ∈ H | f(x) < ∞} ≠ ∅, lev≤α f is closed for every α ∈ R, and f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) for every x, y ∈ H and λ ∈ (0, 1), respectively.
⁵We also use the notations lev≥α f := {x ∈ H | f(x) ≥ α}, lev<α f := {x ∈ H | f(x) < α}, and lev>α f := {x ∈ H | f(x) > α}.

∂f : H → 2^H : x ↦ {g ∈ H | (∀y ∈ H) ⟨y − x, g⟩ + f(x) ≤ f(y)}. Then the operator Tsp(f,α) : H → H defined by

    Tsp(f,α)(x) := x − ((f(x) − α)/∥f′(x)∥²) f′(x), if f(x) > α; x, otherwise,    (2)

is called the subgradient projection relative to f (of level α), and is attracting quasi-nonexpansive with Fix(Tsp(f,α)) = lev≤α f (see, e.g., [40]). The subgradient projection is also a quasi-shrinking mapping, introduced in Def. 5, if the selection f′ is bounded on any bounded set and dim(H) < ∞ (see [34, Theorem 3(d)]).

Definition 3 (Nonexpansive mapping). A quasi-nonexpansive mapping T : H → H is called nonexpansive if T satisfies, for all x, y ∈ H, ∥T(x) − T(y)∥ ≤ ∥x − y∥. If T is nonexpansive and attracting quasi-nonexpansive, we simply call T an attracting nonexpansive mapping. We note that if Fix(T) ≠ ∅, nonexpansivity automatically implies quasi-nonexpansivity.

Definition 4 (Averaged nonexpansive mapping). A nonexpansive mapping T : H → H is said to be α-averaged if there exist some α ∈ [0, 1) and some nonexpansive mapping T̂ : H → H such that T = (1 − α)I + αT̂ (I denotes the identity operator), and Fix(T) = Fix(T̂) for α ∈ (0, 1). Under Fix(T) ≠ ∅, T is α(≠ 0)-averaged if and only if T is ((1 − α)/α)-strongly attracting [34], [35].
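Before turning to further examples, here is a minimal numerical sketch (our illustration, not taken from the paper) of one application of the subgradient projection (2) in Example 1, taking f = ∥·∥₁ with the sign vector as the selected subgradient:

```python
import numpy as np

def subgradient_projection(x, f, f_sub, alpha):
    """One application of T_sp(f,alpha) in (2): if f(x) > alpha, move onto
    the hyperplane where the linearization of f around x attains level alpha;
    points already in lev_{<=alpha} f are left untouched (fixed points)."""
    fx = f(x)
    if fx <= alpha:
        return x
    g = f_sub(x)  # a selection f'(x) from the subdifferential
    return x - (fx - alpha) / np.dot(g, g) * g

x = np.array([3.0, -2.0, 0.5])
y = subgradient_projection(x, lambda z: np.abs(z).sum(), np.sign, alpha=2.0)
print(y, np.abs(y).sum())  # one step need not reach lev_{<=2}, but gets closer
```

Note that a single application generally lands outside lev≤α f; it is the fixed point set of Tsp(f,α), not its one-step image, that equals lev≤α f.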

Several examples of averaged nonexpansive mappings used in our framework are listed as follows.

Example 2 (Gradient descent operator). Let f : H → R be a differentiable convex function whose gradient ∇f : H → H is κ-Lipschitzian for some κ > 0.⁶ Assume that argmin_{x∈H} f(x) ≠ ∅. Then, for every µ ∈ (0, 2/κ], the operator

    G_f := I − µ∇f    (3)

is nonexpansive, and Fix(G_f) = argmin_{x∈H} f(x). Moreover, the operator G_f is ε-averaged nonexpansive if µ ∈ (0, 2ε/κ] ⊂ (0, 2/κ] for 0 < ε < 1 (see, e.g., [35]).

Example 3 (Metric projection onto closed convex sets). Given a nonempty closed convex set C ⊂ H and any point x ∈ H, there exists a unique point P_C(x) ∈ C satisfying

    d_C(x) := min_{y∈C} ∥x − y∥ = ∥x − P_C(x)∥.

The mapping P_C : H → H is called the metric projection onto C, which is 1/2-averaged nonexpansive with Fix(P_C) = C. When an efficient computation of P_C is available, we call C projectable. The metric projection onto C can be seen as a subgradient projection relative to d_C, i.e., Tsp(d_C,0) = P_C.

Example 4 (Parallel projection [41]–[43]). Suppose that C_j ⊂ H (j = 1, …, J) are nonempty closed convex sets satisfying ∩_{j=1}^J C_j ≠ ∅. Then, the operator

    T_pp := Σ_{j=1}^J w_j P_{C_j}    (4)

becomes 1/2-averaged nonexpansive, and Fix(T_pp) = ∩_{j=1}^J C_j, where w_j > 0 (j = 1, …, J) are such that Σ_{j=1}^J w_j = 1.

Now we introduce the notion of quasi-shrinkingness, a special subclass of quasi-nonexpansive mappings. It plays a central role in the convergence analysis of the hybrid steepest descent method for subgradient projection-type operators, which will be explained in detail in Sec. II-C.

Definition 5 (Quasi-shrinking mapping [34]). Suppose that a mapping T : H → H is quasi-nonexpansive with Fix(T) ∩ C ≠ ∅ for some closed convex set C ⊂ H. Then, by letting ▷(Fix(T), r) := {x ∈ H | d_Fix(T)(x) ≥ r} for r ≥ 0, T is called quasi-shrinking on C if the function D : R₊ → [0, ∞],

    D(r) := inf_{x ∈ ▷(Fix(T),r) ∩ C} [d_Fix(T)(x) − d_Fix(T)(T(x))], if ▷(Fix(T), r) ∩ C ≠ ∅; ∞, otherwise,    (5)

satisfies D(r) = 0 ⇔ r = 0.

Finally, we show a useful algebraic property of quasi-nonexpansive mappings.

Fact 1 (Composition of quasi-nonexpansive mappings). Let T₁ : H → H be quasi-nonexpansive, and T₂ : H → H attracting quasi-nonexpansive. Assume Fix(T₁) ∩ Fix(T₂) ≠ ∅. Then T := T₂T₁ is quasi-nonexpansive and Fix(T) = Fix(T₁) ∩ Fix(T₂). In addition, if T₁ is also attracting quasi-nonexpansive, then so is T (e.g., [34], [35]).

B. Proximity Operator and Moreau Envelope

We will exploit the proximity operator and Moreau envelope in the proposed method.

Definition 6 (Proximity operator [44]). For any γ > 0, the proximity operator of index γ of f ∈ Γ₀(H) is defined by

    prox_{γf} : H → H : x ↦ argmin_{y∈H} f(y) + (1/(2γ))∥x − y∥²,    (6)

where the existence and the uniqueness of the minimizer are guaranteed respectively by the coercivity⁷ and the strict convexity of f(·) + (1/(2γ))∥x − ·∥². When an efficient computation of prox_{γf} is available, we call f proximable.

Definition 7 (Moreau envelope [44]). For f ∈ Γ₀(H), the differentiable convex function

    ᵞf : H → R : x ↦ min_{y∈H} f(y) + (1/(2γ))∥x − y∥²    (7)

is called the Moreau envelope of index γ > 0 of f. The gradient of ᵞf is given via prox_{γf}, i.e.,

    ∇ᵞf : H → H : x ↦ (1/γ)(x − prox_{γf}(x)),    (8)

which is (1/γ)-Lipschitzian. The Moreau envelope of f is a lower bound of f, i.e., for all γ > 0 and all x ∈ H, f(x) ≥ ᵞf(x), and it converges pointwise to f on dom(f) as γ ↓ 0 (see, e.g., [45, Theorem 12.32(i)]), i.e., lim_{γ↓0} ᵞf(x) = f(x) (∀x ∈ dom(f)). This property implies that the Moreau envelope is a smooth approximation of f.

⁶A mapping T : H → H is called κ-Lipschitzian if ∥T(x) − T(y)∥ ≤ κ∥x − y∥ for some κ > 0 and every x, y ∈ H.
⁷A function f ∈ Γ₀(H) is called coercive if ∥x∥ → ∞ ⇒ f(x) → ∞.
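As a concrete instance of Definitions 6 and 7 (a sketch of ours, not taken from the paper), the proximity operator of the componentwise absolute value is soft-thresholding, and its Moreau envelope is a Huber-type function; the snippet below also checks the gradient identity (8):

```python
import numpy as np

gamma = 0.5

def prox_l1(x, gamma):
    # Proximity operator (6) of f = ||.||_1: componentwise soft-thresholding.
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def moreau_envelope_l1(x, gamma):
    # (7) evaluated at its unique minimizer y = prox_l1(x, gamma).
    y = prox_l1(x, gamma)
    return np.abs(y).sum() + np.sum((x - y) ** 2) / (2.0 * gamma)

x = np.array([2.0, -0.3, 0.1])
grad = (x - prox_l1(x, gamma)) / gamma                    # gradient identity (8)
print(moreau_envelope_l1(x, gamma) <= np.abs(x).sum())    # envelope lower-bounds f: True
print(grad)  # [1.0, -0.6, 0.2]: the clipped (Huber) gradient sign(x)*min(|x|/gamma, 1)
```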


C. Hybrid Steepest Descent Method

Let T : H → H be a quasi-nonexpansive mapping whose fixed point set is nonempty, and F : H → R a differentiable convex function with a Lipschitzian gradient over T(H) := {T(x) ∈ H | x ∈ H}. Consider the problem:

    find x⋆ ∈ Ω := argmin_{x ∈ Fix(T)} F(x).    (9)

The hybrid steepest descent method [33]–[35],

    x^(n+1) := T(x^(n)) − λ^(n+1) ∇F(T(x^(n))),    (10)

is a simple algorithmic solution to (9), where (λ^(n))_{n≥1} ⊂ R₊ is a slowly decreasing sequence. The hybrid steepest descent method has been successfully applied to various signal processing problems [35], [46]–[48]. The next fact, a special case of [34, Theorem 5], is a convergence analysis of the hybrid steepest descent method for quasi-shrinking mappings. This will be used as the foundation for Theorem 1.

Fact 2 (Convergence of the hybrid steepest descent method). Assume dim(H) < ∞. Let T : H → H be quasi-nonexpansive with bounded Fix(T) ≠ ∅, and F : H → R a differentiable convex function whose gradient ∇F is Lipschitzian over T(H). Then, by using any sequence (λ^(n))_{n≥1} ⊂ R₊ satisfying lim_{n→∞} λ^(n) = 0 and Σ_{n≥1} λ^(n) = ∞, the sequence (x^(n))_{n≥0} generated, for arbitrary x^(0) ∈ H, by (10) satisfies

    lim_{n→∞} d_Ω(x^(n)) = 0,    (11)

if there exists a nonempty bounded closed convex set C (⊃ (x^(n))_{n≥0}) on which T is quasi-shrinking.
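The following toy sketch (ours; the paper's actual use of (10) appears in Sec. III-D) runs iteration (10) with T the metric projection onto a box and F a quadratic, using λ^(n) = 1/n, which satisfies the stepsize conditions in Fact 2:

```python
import numpy as np

# Toy instance of (10): minimize F(x) = 0.5*||x - b||^2 over Fix(T) = [0,1]^3,
# where T = P_C is the (nonexpansive) projection onto the box.
b = np.array([2.0, -1.0, 0.4])
T = lambda x: np.clip(x, 0.0, 1.0)
grad_F = lambda x: x - b

x = np.zeros(3)
for n in range(1, 20001):
    lam = 1.0 / n                 # diminishing: lam -> 0 and sum(lam) = infinity
    Tx = T(x)
    x = Tx - lam * grad_F(Tx)     # hybrid steepest descent update (10)

print(np.round(T(x), 3))          # ~[1.0, 0.0, 0.4], the minimizer of F over the box
```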

III. PROPOSED METHOD

We first introduce a generic convex optimization problem as our target fidelity-constrained formulation. Then, we establish a fixed point set characterization of the data-fidelity constraint (and additional constraints), and reformulate the problem in a certain product space to further alleviate the computational difficulty. Finally, we present the proposed algorithm, based on the hybrid steepest descent method, in detail.

To simplify the mathematical discussion, we assume in the following that the data-fidelity function ψv is smooth on its domain, which will not restrict the practical applicability of the proposed formulation in many signal recovery scenarios (see Sec. IV). Although the ℓ1 norm, used as a suitable measure of the sparsity of outliers (e.g., impulsive noise), does not meet this assumption, one can adopt other smooth functions that approximate the ℓ1 norm as ψv, such as the Huber loss function [49], which is also the Moreau envelope (see Def. 7) of the absolute value function. Indeed, the Huber loss function has nice properties, as reported in [50], [51], and it will be used as a data-fidelity function in Sec. IV-C.

In what follows, several finite-dimensional product spaces are introduced. Each space serves as a Hilbert space as defined in Sec. II, in which various operators are defined. Although each space is equipped with a different Euclidean norm, we denote them all by ∥·∥ for simplicity.

A. Problem Formulation

The fidelity-constrained formulation considered is given as the following convex optimization problem.

Problem 1 (Fidelity-constrained signal recovery). Let ψv ∈ Γ₀(R^M) be a data-fidelity function.

Assumptions: For a given α ∈ R,
(a1) ψv is continuously differentiable over int(dom(ψv)), i.e., we have a continuous gradient ∇ψv : int(dom(ψv)) → R^M (note: ψv can take the value ∞ on R^M).⁸
(a2) lev<α ψv ∩ int(dom(ψv)) is nonempty and open.
(a3) lev>α ψv ∩ int(dom(ψv)) is nonempty and open.
(a4) lev=α ψv (:= lev≤α ψv ∩ lev≥α ψv) ∩ int(dom(ψv)) ≠ ∅.
(a5) Mα := inf_{x ∈ lev>α ψv ∩ int(dom(ψv))} ∥∇ψv(x)∥ > 0.

The problem is to estimate ū in (1) by solving

    min_{(u,w) ∈ R^N × R^Q}  f(L₀u + M₀w) + Σ_{i∈I} ᵞgᵢ(Lᵢu + Mᵢw)
    s.t.  (u, w) ∈ ∩_{j∈J} Cj,  Φu ∈ lev≤α ψv ∩ K,

where
• u ∈ R^N is a variable corresponding to the recovered signal;
• w ∈ R^Q is a variable corresponding to some separated component, or is an auxiliary variable (if necessary);
• f : R^N → R is a differentiable convex function whose gradient is Lipschitzian;
• gᵢ ∈ Γ₀(R^{Nᵢ}) (i ∈ I) are proximable;
• Lᵢ : R^N → R^{Nᵢ} and Mᵢ : R^Q → R^{Nᵢ} (i ∈ I ∪ {0}) are linear operators;
• Cj ⊂ R^N × R^Q (j ∈ J) are projectable closed convex sets such that ∩_{j∈J} Cj ≠ ∅ and C₁ is bounded;
• lev≤α ψv is a data-fidelity constraint;
• K is a projectable bounded closed convex subset of int(dom(ψv)) such that lev≤α ψv ∩ K ≠ ∅;
• I and J are index sets with 0 ≤ |I| < ∞ and 1 ≤ |J| < ∞, and γ > 0.
We assume that the problem has a solution.

Remark 1 (On Prob. 1).
On the variable w: The objective function may look complicated because of the variable w (and the linear operators). The reason we introduce this additional variable is that it helps avoid notational confusion when we adopt certain regularization functions effective for signal recovery, such as the total generalized variation used in Sec. IV-D.
Use of the Moreau envelope: The proposed algorithm is based on the hybrid steepest descent method, which requires the objective function to be differentiable with a Lipschitzian gradient. Hence, we use Moreau envelope relaxation to benefit from nonsmooth convex functions effective for signal recovery, such as the ℓ1 norm. If we choose a sufficiently small γ (typically γ ≤ 0.01), the Moreau envelope of g is expected to approximate the original function g well. Indeed, Moreau envelope relaxation has been shown to be useful in many signal processing applications, as found in [35], [51], [52]. This will also be demonstrated in Sec. IV-D.

⁸The interior of a set C is its largest open subset, denoted by int(C).

B. Fixed Point Set Characterization

To make Prob. 1 tractable, we first characterize lev≤α ψv ∩ K and ∩_{j∈J} Cj as the fixed point sets of certain attracting quasi-nonexpansive mappings, respectively. By noting ∂ψv = {∇ψv} over int(dom(ψv)), we can define a domain-restricted subgradient projection relative to ψv of level α by

    T̃sp(ψv,α) : int(dom(ψv)) → R^M : x ↦ x − ((ψv(x) − α)/∥∇ψv(x)∥²) ∇ψv(x), if ψv(x) > α; x, otherwise.    (12)

Lemma 1 (Properties of T̃sp(ψv,α)).
I. For any x ∈ lev≤α ψv ∩ int(dom(ψv)), T̃sp(ψv,α)(x) = x, and for any x ∈ int(dom(ψv)) \ lev≤α ψv and any y ∈ lev≤α ψv ∩ int(dom(ψv)), ∥T̃sp(ψv,α)(x) − y∥ < ∥x − y∥.
II. T̃sp(ψv,α) is continuous over int(dom(ψv)).

Proof: See Appendix A.

Then we introduce the following operator:

    T^K_sp(ψv,α) : R^M → R^M : x ↦ T̃sp(ψv,α) ∘ P_K(x).    (13)

Proposition 1 (Properties of T^K_sp(ψv,α)). On R^M, T^K_sp(ψv,α) is continuous and is attracting quasi-nonexpansive with

    Fix(T^K_sp(ψv,α)) = lev≤α ψv ∩ K.    (14)

Proof: See Appendix B.

Meanwhile, by letting X := R^N × R^Q (= R^{N+Q}), equipped with the canonical inner product and the Euclidean norm, ∩_{j∈J} Cj can also be characterized as the fixed point set of the parallel projection T^J_pp : X → X : (x₁, x₂) ↦ Σ_{j∈J} wⱼ P_{Cj}(x₁, x₂) (see Example 4), i.e.,

    Fix(T^J_pp) = ∩_{j∈J} Cj.    (15)

Thus, all the constraints in Prob. 1 can be rewritten as

    (u, w) ∈ Fix(T^J_pp),  Φu ∈ Fix(T^K_sp(ψv,α)).    (16)

C. Reformulation on Product Space

The characterization Φu ∈ Fix(T^K_sp(ψv,α)) in (16) is still not manageable in the hybrid steepest descent method, because it is not on the variable u, due to the composition involving Φ. To circumvent this, an augmented product space Y := X × R^M (= R^{N+Q+M}) (equipped with the canonical inner product and the Euclidean norm) is introduced, where we decompose Φu ∈ Fix(T^K_sp(ψv,α)) into the two constraints z ∈ Fix(T^K_sp(ψv,α)) and Φu = z with an auxiliary variable z, and characterize them by computable operators on Y.

First, we define an operator on Y that characterizes (u, w) ∈ Fix(T^J_pp) and z ∈ Fix(T^K_sp(ψv,α)) as follows:

    Tpp×sp : Y → Y : (x₁, x₂, x₃) ↦ (T^J_pp(x₁, x₂), T^K_sp(ψv,α)(x₃)).

This operator satisfies

    Fix(Tpp×sp) = Fix(T^J_pp) × Fix(T^K_sp(ψv,α)) = (∩_{j∈J} Cj) × (lev≤α ψv ∩ K) ⊂ Y,    (17)

and is also attracting quasi-nonexpansive on Y, which is easily verified, for all (x₁, x₂, x₃) ∈ Y \ Fix(Tpp×sp) and all (x′₁, x′₂, x′₃) ∈ Fix(Tpp×sp), as

    ∥Tpp×sp(x₁, x₂, x₃) − (x′₁, x′₂, x′₃)∥²
    = ∥T^J_pp(x₁, x₂) − (x′₁, x′₂)∥² + ∥T^K_sp(ψv,α)(x₃) − x′₃∥²
    < ∥(x₁, x₂) − (x′₁, x′₂)∥² + ∥x₃ − x′₃∥² = ∥(x₁, x₂, x₃) − (x′₁, x′₂, x′₃)∥²,

where the inequality follows from the attracting quasi-nonexpansivity of T^J_pp and T^K_sp(ψv,α). One can see that Fix(Tpp×sp) characterizes (16) without considering Φu = z.

Next, to maintain Φu = z, we consider the following set:

    SΦ := {(x₁, x₂, x₃) ∈ Y | Φx₁ = x₃}.    (18)

Since SΦ is a closed subspace of Y, the orthogonal projection onto SΦ exists, i.e., SΦ = Fix(P_SΦ). However, the computation of P_SΦ requires matrix inversion involving Φ, which is often expensive. Instead, we provide a fixed point set characterization of SΦ via a computationally efficient operator.

Lemma 2 (Alternative characterization of SΦ). Let Θ : Y → R : (x₁, x₂, x₃) ↦ (1/2)∥Φx₁ − x₃∥², and let GΘ : Y → Y be the gradient descent operator w.r.t. Θ given, with µ > 0, by

    GΘ(x₁, x₂, x₃) = (x₁, x₂, x₃) − µ(Φ⊤(Φx₁ − x₃), 0, x₃ − Φx₁).

Then, for µ ∈ (0, 2ε/∥A∥²op] (0 < ε < 1) and A := (Φ ⋮ O ⋮ −I) ∈ R^{M×(N+Q+M)}, GΘ is attracting nonexpansive on Y, and

    Fix(GΘ) = SΦ,    (19)

where O and I are the zero and identity matrices, respectively.

Proof: See Appendix C.

Note that the computation of GΘ requires only the applications of Φ and Φ⊤, i.e., it is much more efficient than P_SΦ.
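A small numerical sketch (ours, not from the paper) of the operator GΘ in Lemma 2, emphasizing that it touches Φ only through matrix-vector products and fixes every point of SΦ; the sizes and step size are toy choices:

```python
import numpy as np

rng = np.random.default_rng(0)
N, Q, M = 5, 2, 3
Phi = rng.standard_normal((M, N))

# mu in (0, 2*eps/||A||_op^2] with A = (Phi | O | -I); here eps = 0.5.
A = np.hstack([Phi, np.zeros((M, Q)), -np.eye(M)])
mu = 1.0 / np.linalg.norm(A, 2) ** 2

def G_Theta(x1, x2, x3):
    r = Phi @ x1 - x3                       # residual; only Phi and Phi.T appear
    return x1 - mu * (Phi.T @ r), x2, x3 + mu * r

u, w = rng.standard_normal(N), rng.standard_normal(Q)
x1, x2, x3 = G_Theta(u, w, Phi @ u)         # (u, w, Phi u) lies in S_Phi ...
print(np.allclose(x1, u) and np.allclose(x3, Phi @ u))   # ... and stays fixed: True
```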

Then, we compose Tpp×sp and GΘ into the operator

    Ξ := P_E ∘ GΘ ∘ Tpp×sp,    (20)

where E ⊂ Y is a projectable bounded convex set such that (∩_{j∈J} Cj) × K ⊆ E (e.g., E := C₁ × K). The fixed point set of Ξ exactly represents all the constraints in Prob. 1, as follows.

Proposition 2 (Properties of Ξ). On Y, Ξ is continuous and is attracting quasi-nonexpansive with

    Fix(Ξ) = Fix(GΘ) ∩ Fix(Tpp×sp) = {(x₁, x₂, Φx₁) ∈ Y | (x₁, x₂) ∈ ∩_{j∈J} Cj and Φx₁ ∈ lev≤α ψv ∩ K}.    (21)

Proof: See Appendix D.

Consequently, Prob. 1 is reformulated as follows.

Problem 2. Find (u⋆, w⋆, z⋆) in

    ΩJ := argmin_{(u,w,z) ∈ Fix(Ξ)} J(u, w, z),

where J(u, w, z) := f(L₀u + M₀w) + Σ_{i∈I} ᵞgᵢ(Lᵢu + Mᵢw). Note that Φu⋆ = z⋆ from (21).

D. Algorithm

Since Prob. 2 is of the same form as (9), we can apply the hybrid steepest descent method to the problem, resulting in

    (u^(n+1), w^(n+1), z^(n+1)) := Ξ(u^(n), w^(n), z^(n)) − λ^(n+1) ∇J(Ξ(u^(n), w^(n), z^(n))),    (22)

where ∇J : Y → Y is given by

    ∇J(u, w, z) = (L₀⊤∇f(L₀u + M₀w) + Σ_{i∈I} Lᵢ⊤∇ᵞgᵢ(Lᵢu + Mᵢw),
                   M₀⊤∇f(L₀u + M₀w) + Σ_{i∈I} Mᵢ⊤∇ᵞgᵢ(Lᵢu + Mᵢw), 0),

which is computable from the assumptions on f and gᵢ. The next theorem provides a convergence analysis of (22), which is a new realization of Fact 2.

Theorem 1 (Convergence of (22)). Under the assumptions in Prob. 1, by using any sequence (λ^(n))_{n≥1} ⊂ R₊ satisfying lim_{n→∞} λ^(n) = 0 and Σ_{n≥1} λ^(n) = ∞, the sequence (u^(n), w^(n), z^(n))_{n≥0} generated, for arbitrary (u^(0), w^(0), z^(0)) ∈ Y, by (22) satisfies

    lim_{n→∞} min_{(u⋆,w⋆,z⋆) ∈ ΩJ} ∥(u^(n), w^(n), z^(n)) − (u⋆, w⋆, z⋆)∥ = 0.

Proof: See Appendix E.

We finally summarize the detailed steps of (22) in Alg. 1. Note that steps 2 and 12–14 can be parallelized, allowing efficient implementation.

Algorithm 1: HSDM for Prob. 1
input: u^(0), w^(0), z^(0); output: u^(n), w^(n)
1:  while a stopping criterion is not satisfied do
2:    (u^(n), w^(n)) ← Σ_{j∈J} wⱼ P_{Cj}(u^(n), w^(n));
3:    z^(n) ← P_K(z^(n));
4:    gap ← ψv(z^(n)) − α;
5:    if gap > 0 then
6:      s ← ∇ψv(z^(n));
7:      z^(n) ← z^(n) − (gap/∥s∥²) s;
8:    u^(n) ← u^(n) − µΦ⊤(Φu^(n) − z^(n));
9:    z^(n) ← z^(n) − µ(z^(n) − Φu^(n));
10:   (u^(n), w^(n), z^(n)) ← P_E(u^(n), w^(n), z^(n));
11:   ϑ₀ ← ∇f(L₀u^(n) + M₀w^(n));
12:   for i ∈ I do
13:     rᵢ ← Lᵢu^(n) + Mᵢw^(n);
14:     ϑᵢ ← (1/γ)(rᵢ − prox_{γgᵢ}(rᵢ));
15:   u^(n+1) ← u^(n) − λ^(n+1) Σ_{i∈{0}∪I} Lᵢ⊤ϑᵢ;
16:   w^(n+1) ← w^(n) − λ^(n+1) Σ_{i∈{0}∪I} Mᵢ⊤ϑᵢ;
17:   z^(n+1) ← z^(n);
18:   n ← n + 1;
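For reference, here is an illustrative Python transcription of Alg. 1 (a sketch of ours under simplifying assumptions: a single regularizer g(L·), f ≡ 0, and no splitting variable w; the function names, boxes, step sizes, and iteration count are our toy choices, not the paper's settings):

```python
import numpy as np

def hsdm_recover(v, Phi, psi, grad_psi, alpha, prox_g, L,
                 gamma=1e-3, box=(0.0, 255.0), K=(0.5, 255.0),
                 mu=1.0, n_iter=5000):
    """Sketch of Alg. 1 with one regularizer g(L u), f = 0 and no variable w."""
    u = np.zeros(Phi.shape[1])
    z = np.clip(v, *K)
    for n in range(1, n_iter + 1):
        u = np.clip(u, *box)                     # step 2: projection onto C1
        z = np.clip(z, *K)                       # step 3: P_K
        gap = psi(z) - alpha                     # step 4
        if gap > 0:                              # steps 5-7: subgradient projection
            s = grad_psi(z)
            z = z - gap / np.dot(s, s) * s
        r = Phi @ u - z                          # steps 8-9: gradient step for Theta
        u = u - mu * (Phi.T @ r)
        z = z + mu * r
        u, z = np.clip(u, *box), np.clip(z, *K)  # step 10: P_E with E = C1 x K
        t = L @ u                                # steps 12-14: Moreau-envelope gradient
        theta = (t - prox_g(t, gamma)) / gamma
        u = u - (100.0 / n) * (L.T @ theta)      # step 15 with lambda^(n+1) = 100/n
    return u                                     # step 17 (z untouched by the gradient of J)
```

With prox_g the group soft-thresholding of Sec. IV-D and L the discrete gradient, this sketch corresponds to the TV case of the experiments.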

E. Comparison with Prior Work

1) Generalized Haugazeau's Algorithm: In [29], [53], the generalized Haugazeau's algorithm was proposed for minimizing a strictly convex function over the fixed point set of a quasi-nonexpansive mapping in the case where the minimization of the strictly convex function over the intersection of two half-spaces is relatively easy. This algorithm can also be applied to the fidelity-constrained formulation if the objective function is such a simple strictly convex function. In contrast, the proposed method admits non-strictly-convex functions, including the Moreau envelopes of nonsmooth functions, which are useful for signal recovery.

2) Use of Morozov's Discrepancy Principle: Carlavan et al. [10] and Teuber et al. [11] studied the fidelity-constrained formulation in the case where ψv is the negative log-likelihood function of the Poisson distribution, also known as the generalized Kullback-Leibler divergence or I-divergence. Their strategies are based on proximal splitting algorithms, where the projection onto lev≤α ψv is computed via the proximity operator of ψv with an appropriate γ. Such a γ is characterized by a certain nonlinear equation, which varies at each iteration. Their algorithms thus use Newton's method for solving the equation at each iteration, implying that the theoretical convergence of the outer loop (the proximal splitting algorithm) is not guaranteed in general, due to the error of the inner-loop solution.⁹ Recently, this kind of characterization of γ was considered in a more general form [32]. By contrast, the proposed method can handle a wide class of lev≤α ψv in a unified manner without using an inner loop, since it just computes the subgradient projection relative to ψv once at each iteration. We should note, however, that this does not necessarily imply that the proposed method is faster than proximal splitting algorithms using an inner loop. This is because the convergence of the hybrid steepest descent method is relatively slow due to its diminishing stepsize.

⁹Proximal splitting algorithms typically converge if the computational error of the proximity operators is absolutely summable (see, e.g., [54, Theorem 3.4]).

3) Use of Epigraphical Projection: Chierchia et al. [12], [31], [55] proposed to utilize epigraphical projection techniques with proximal splitting algorithms to cope with involved convex constraints. The metric projections onto several mixed-norm balls can be exactly computed by their method. Otherwise, they approximate the target constraint, such as lev≤α ψv, via a number of hyperplanes and compute the metric projection onto the outer approximation set, i.e., a better approximation requires more hyperplanes. On the other hand, the proposed method can deal with involved data-fidelity constraints without geometric approximation, thanks to the fixed point set characterization.

IV. APPLICATIONS TO IMAGE RESTORATION

As applications of the proposed method, we present image restoration under several types of non-Gaussian noise contamination. In such cases, a suitable data-fidelity function is not the standard ℓ2 norm but a convex function whose level set is not projectable. In what follows, we consider three non-Gaussian noise contamination cases, where each

noise contamination model, the corresponding data-fidelity constraint, and its specific fixed point set characterization are presented. We also show experimental results for each case.

A. Poisson Noise Contamination Case

1) Model: Image restoration from observations contaminated by Poisson noise is a longstanding problem in a wide array of applications, from astronomical imaging to medical imaging. Consider the following Poisson observation model:

    v := Pσ(Φū),    (23)

where ū ∈ R^N₊ (N: the number of pixels) is an unknown original image of interest, Φ : R^N → R^M (M ≤ N) is a linear degradation operator (e.g., blur) such that θ = (θ₁, …, θM)⊤ := Φū ∈ R^M₊, Pσ : R^M₊ → N^M (N is the set of all natural numbers) is the Poisson noise contamination process, σ > 0 is the scaling parameter determining the noise intensity (a smaller σ corresponds to a higher noise intensity), and v = (v₁, …, vM)⊤ ∈ N^M is an observation vector. In this model, v is assumed to be a sample of an M-dimensional i.i.d. random vector V with the density function

    pdf[V = v | θ] := Π_{m=1}^M [((σθm)^{vm} / vm!) exp(−σθm)].    (24)

2) Data-Fidelity: Many image restoration methods for dealing with Poisson noise, e.g., [21], [23], [24], [26], [27], employ the additive-fidelity formulation, where the data-fidelity function ψv is derived from the negative log-likelihood of (24), i.e.,

    −ln pdf[V = v | θ] = Σ_{m=1}^M [−vm ln σθm + ln vm! + σθm].    (25)

Taking into account the case vm = 0 and removing the irrelevant constant, the data-fidelity function ψv : R^M → (−∞, ∞] for Poisson noise contamination is given by

    ψv(x) := Σ_{m=1}^M { σxm − vm ln σxm, if vm > 0 and xm > 0; σxm, if vm = 0 and xm ≥ 0; ∞, otherwise. }    (26)

Since the proximity operator of ψv is computable, an efficient solver for the additive-fidelity formulation with ψv can be realized via proximal splitting algorithms. By contrast, the data-fidelity constraint Φu ∈ lev≤α ψv is difficult to handle, because the computation of the metric projection onto lev≤α ψv is not tractable.

3) Fixed Point Set Characterization: Since ψv is continuously differentiable over int(dom(ψv)) = R^M₊₊, i.e., for m = 1, …, M,

    [∇ψv(x)]m = σ − vm/xm, if vm > 0 and xm > 0; σ, if vm = 0 and xm > 0,    (27)

we can deal with lev≤α ψv with

    K := {x ∈ R^M | xm ∈ [a, b] (0 < a < b, m = 1, …, M)}.    (28)

In this case, P_K is simply calculated by pushing input values outside [a, b] to a or b (whichever is nearer). Although the set K is too strict to deal with xm = 0, by choosing a sufficiently small a (e.g., 0 < a < 0.5 when Φū is an eight-bit image), this does not affect the restoration performance much, because restored images are finally quantized.
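As a sketch (ours, not the paper's code) of how this characterization is computed in practice, the following implements (26)–(27) together with one application of T^K_sp(ψv,α) = T̃sp(ψv,α) ∘ P_K from (13) and (28); the bounds a and b are toy values:

```python
import numpy as np

def psi_poisson(x, v, sigma):
    # (26) on K (all x_m >= a > 0), so the infinite branch never triggers.
    return np.sum(np.where(v > 0, sigma * x - v * np.log(sigma * x), sigma * x))

def grad_psi_poisson(x, v, sigma):
    # (27): gradient over int(dom(psi_v)) = R^M_{++}.
    return np.where(v > 0, sigma - v / x, sigma)

def T_sp_K(x, v, sigma, alpha, a=0.25, b=255.0):
    """Subgradient projection relative to psi_v composed with P_K, cf. (12), (13)."""
    x = np.clip(x, a, b)                       # P_K
    gap = psi_poisson(x, v, sigma) - alpha
    if gap <= 0:
        return x                               # already in lev_{<=alpha} psi_v
    g = grad_psi_poisson(x, v, sigma)
    return x - gap / np.dot(g, g) * g
```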

B. Multiplicative Noise Contamination Case

1) Model: In coherent imaging systems, including synthetic aperture radar (SAR), ultrasound imaging, and laser imaging, multiplicative/speckle noise is inherent, and therefore its removal is considered a prerequisite procedure for these applications. The degradation model of multiplicative noise is given as follows:

    v := (Φū) .∗ q,    (29)

where ū ∈ R^N₊₊ is an unknown original image of interest, Φ : R^N → R^M (M ≤ N) is a linear degradation operator such that θ = [θ₁, …, θM]⊤ := Φū ∈ R^M₊₊, q = [q₁, …, qM]⊤ ∈ R^M₊₊ is a multiplicative noise, v = [v₁, …, vM]⊤ ∈ R^M₊₊ is an observation vector, and '.∗' stands for the component-wise multiplication. In general, the multiplicative noise q is assumed to follow the Gamma distribution with density function

    Π_{m=1}^M (σ^σ / Γ(σ)) qm^{σ−1} exp(−σqm),    (30)

where σ is the shape parameter. Here the scale parameter of the Gamma distribution is assumed to be 1/σ, which implies that the mean and variance are 1 and 1/σ, respectively, so that a smaller σ corresponds to a higher noise intensity. Thereby, as found in [56], v is a sample of an M-dimensional i.i.d. random vector V with density function

    pdf[V = v | θ] := Π_{m=1}^M [(σ^σ / (Γ(σ) θm^σ)) vm^{σ−1} exp(−σvm/θm)].    (31)

2) Data-Fidelity: The negative log-likelihood of (31) is

    −ln pdf[V = v | θ] = Σ_{m=1}^M [−σ(ln σ − ln θm) + ln Γ(σ) − (σ − 1) ln vm + σvm/θm],

and removing the irrelevant constants yields the function φv : R^M → (−∞, ∞]:

    φv(x) := Σ_{m=1}^M [ln xm + vm/xm], if xm > 0 (m = 1, …, M); ∞, otherwise.    (32)

Although φv is the best data-fidelity function for multiplicative noise contamination in the maximum likelihood sense [56], the non-convexity of φv often leads to unstable numerical solutions. To circumvent this, several approaches have been proposed. The first one considers the restoration problem in the log domain, i.e., estimates log(u) instead of u. This approach, adopted in [22], [25], [57], is natural but cannot deal with the case Φ ≠ I. The second one uses other convex data-fidelity functions, e.g., the ℓ1 norm, instead of φv, as found in [58]–[60]. This approach admits Φ ≠ I but no longer has an explicit relation to φv. The final one is to convexify φv by adding a certain term to it [61], [62], which is reasonable and accepts Φ ≠ I, so we adopt this approach.

In particular, the convexified function ψv : R^M → (−∞, ∞] recently proposed in [62],

    ψv(x) := φv(x) + β Σ_{m=1}^M (√(vm/xm) − 1)², if xm > 0 (m = 1, …, M); ∞, otherwise,    (33)

is a tight convex relaxation of φv with the approximation parameter β ≥ 2√6/9. Hence we use the level set of this function as the data-fidelity constraint, which, to the best of our knowledge, has never been studied so far. Note that even the computation of the proximity operator of ψv requires an iterative approximation [62].

3) Fixed Point Set Characterization: Using K in (28), we can derive a fixed point set characterization of lev≤α ψv ∩ K, since ψv is continuously differentiable on int(dom(ψv)) = R^M₊₊ with ∇ψv : R^M₊₊ → R^M given, for m = 1, …, M, by

    [∇ψv(x)]m = 1/xm − vm/xm² + β(√(vm xm) − vm)/xm².    (34)
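The snippet below (ours) evaluates (33) and the gradient expression (34) and verifies them against central finite differences; v, x, and the dimension are arbitrary toy choices:

```python
import numpy as np

def psi_mult(x, v, beta):
    # (33): convexified multiplicative-noise fidelity (finite branch, x_m > 0).
    return np.sum(np.log(x) + v / x + beta * (np.sqrt(v / x) - 1.0) ** 2)

def grad_psi_mult(x, v, beta):
    # (34): componentwise gradient on R^M_{++}.
    return 1.0 / x - v / x**2 + beta * (np.sqrt(v * x) - v) / x**2

rng = np.random.default_rng(1)
v, x = rng.uniform(1.0, 5.0, 4), rng.uniform(1.0, 5.0, 4)
beta = 2.0 * np.sqrt(6.0) / 9.0
h, e = 1e-6, np.eye(4)
fd = np.array([(psi_mult(x + h*e[m], v, beta) - psi_mult(x - h*e[m], v, beta)) / (2*h)
               for m in range(4)])
print(np.allclose(fd, grad_psi_mult(x, v, beta), atol=1e-5))  # True
```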

C. Mixed Gaussian-Impulse Noise Contamination Case

1) Model: Gaussian noise is one of the most standard noise models in image processing, and at the same time, impulse noise contamination also appears in many situations, such as defects in the CCD or in the transmission of images. For these reasons, mixed Gaussian-impulse noise contamination has been considered a realistic noise model, which is modeled as follows:

    v := Np,d(Φū + nσ),    (35)

where ū ∈ R^N is an unknown original image of interest, Φ : R^N → R^M (M ≤ N) is a linear degradation operator, nσ ∈ R^M is an additive white Gaussian noise with standard deviation σ ∈ R₊, Np,d : R^M → R^M is the impulse noise contamination process with the noise ratio p ∈ [0, 1) and the noise range d := [dmin, dmax] ⊂ R₊, and v ∈ R^M is an observation vector. Specifically, two types of impulse noise are usually considered [63]–[68]: the salt-and-pepper noise

    Np,d : xm ↦ dmin with probability p/2; dmax with probability p/2; xm with probability 1 − p,    (36)

and the random-valued noise

    Np,d : xm ↦ dm with probability p; xm with probability 1 − p,    (37)

for the mth entry xm of x ∈ R^M (m = 1, …, M), where dm (m = 1, …, M) are identically and uniformly distributed random values in [dmin, dmax].

A major approach to image restoration under mixed Gaussian-impulse noise contamination, e.g., [63]–[68], shares the following two steps: (i) pre-detect the locations of outliers (i.e., pixels contaminated by impulse noise) using median-type filters (e.g., [69]); and (ii) restore images based on the additive-fidelity formulation. This two-step approach has succeeded in producing excellent results when the pre-detection is performed accurately. However, if not, the performance degrades significantly, as in the case of random-valued impulse noise.

2) Data-Fidelity: Unlike the two-step approach, we introduce the so-called Huber loss function [49] as the data-fidelity function, in order to deal with outliers automatically. The Huber loss function ψv : R^M → R for mixed Gaussian-impulse noise contamination is given by

    ψv(x) := Σ_{m=1}^M { (1/2)(xm − vm)², if |xm − vm| ≤ δ; δ|xm − vm| − (1/2)δ², otherwise, }    (38)

with the cut-off value δ > 0. The Huber loss function is known as a theoretically robust fidelity measure for Gaussian noise contamination with outliers (strictly speaking, in the sense of the so-called ϵ-contamination model [49]). Indeed, it has been used for adaptive filtering in the presence of mixed Gaussian-impulse noise, where it yields robust estimation [70]–[72]. On the other hand, using the level set of the Huber loss function as the data-fidelity constraint has not yet been considered, due to the difficulty of computing the metric projection.

3) Fixed Point Set Characterization: The function ψv is continuously differentiable everywhere, with ∇ψv : R^M → R^M given, for m = 1, …, M, by

    [∇ψv(x)]m = xm − vm, if |xm − vm| ≤ δ; δ, if xm − vm > δ; −δ, if xm − vm < −δ.    (39)

Hence lev≤α ψv fits the proposed method with any projectable K containing it, e.g., K in (28) with a = 0.

D. Experiments

The main objective of the experiments is to examine that: (i) the proposed algorithm properly works for Prob. 1 — specifically, we check the convergence of the objective function and the consistency with the data-fidelity constraint; (ii) a suitable value of α, once determined for one regularization function, is also suitable for different regularization functions; (iii) the effectiveness of Moreau envelope relaxation is almost the same as that of the original nonsmooth functions; and (iv) the proposed algorithm is computationally efficient.

1) Settings: The designs of the regularization functions considered here are listed as follows (γ = 0.001 in both cases).

Total Variation (TV): The so-called (isotropic) total variation (TV) [73] is widely known as a convex regularization function effective for restoring piecewise smooth images. In this case, the objective function of Prob. 1 is reduced to

    ᵞg₁(L₁u),    (40)

where

    L₁ := (Dv⊤, Dh⊤)⊤ ∈ R^{2N×N},
    g₁ : R^{2N} → R₊ : x ↦ ∥x∥^{(2)}_{1,2},

Dv and Dh are the vertical and horizontal difference operators, and ∥·∥^{(l)}_{1,2} : R^{lN} → R₊ is the mixed ℓ1,2 norm defined by

    ∥x∥^{(l)}_{1,2} := Σ_{n=1}^N √(Σ_{k=0}^{l−1} x²_{n+kN}).    (41)

The proximity operator of ∥·∥^{(l)}_{1,2} can be calculated by a simple soft-thresholding-type operation: for γ > 0 and for i = 1, …, lN,

    [prox_{γ∥·∥^{(l)}_{1,2}}(x)]ᵢ = max{1 − γ(Σ_{k=0}^{l−1} x²_{ĩ+kN})^{−1/2}, 0} xᵢ,

where ĩ := ((i − 1) mod N) + 1.
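A minimal implementation sketch (ours) of this group soft-thresholding, using the stacking convention of (41), where group n collects (x_n, x_{n+N}, …, x_{n+(l−1)N}):

```python
import numpy as np

def prox_l12(x, gamma, l):
    """Group soft-thresholding for the mixed l_{1,2} norm (41): x is stacked
    as l blocks of length N, and group n is (x_n, x_{n+N}, ..., x_{n+(l-1)N})."""
    N = x.size // l
    g = x.reshape(l, N)                      # row k holds the entries x_{n+kN}
    norms = np.sqrt((g ** 2).sum(axis=0))    # per-group Euclidean norms
    scale = np.maximum(1.0 - gamma / np.maximum(norms, 1e-15), 0.0)
    return (g * scale).reshape(-1)

x = np.array([3.0, 0.1, 4.0, 0.2])           # l = 2, N = 2: groups (3,4) and (0.1,0.2)
print(prox_l12(x, gamma=1.0, l=2))           # [2.4, 0.0, 3.2, 0.0]
```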

Regarding Cj in Prob. 1, we imposed a range constraint on u defined by C₁ := [0, 255]^N, onto which the projection can be computed by pushing each pixel value into [0, 255].

Total Generalized Variation (TGV): The total generalized variation (TGV) [74] was introduced as a well-established higher-order generalization of the TV, in order to overcome the undesirable appearance of edges, namely, the staircasing effect [75]. The idea of TGV has recently been employed in a number of applications for its utility, e.g., [60], [76]–[80]. Especially in [60], it is shown that a TGV-based multiplicative noise removal method outperforms state-of-the-art methods. In this case, the objective function of Prob. 1 is reduced to

    ᵞg₁(L₁u + M₁w) + ᵞg₂(M₂w),

where

    L₁ := (Dv⊤, Dh⊤)⊤ ∈ R^{2N×N},
    M₁ := −I ∈ R^{2N×2N},
    M₂ := ( −Dv, −Dh, O; O, −Dv, −Dh )⊤ ∈ R^{3N×2N},
    g₁ : R^{2N} → R₊ : x ↦ β∥x∥^{(2)}_{1,2},
    g₂ : R^{3N} → R₊ : x ↦ (1 − β)∥x∥^{(3)}_{1,2}.

Here I denotes the identity matrix, and β ∈ (0, 1) controls the balance between the first- and second-order terms of TGV. Specifically, we examined β = 0.3, 0.5, 0.7. We imposed a range constraint on u and w as Cj, defined by C₁ := [0, 255]^N × [−255, 255]^{2N}.

For comparison, we adopted a primal-dual splitting algorithm [81], [82] for solving the corresponding additive-fidelity formulations without Moreau envelope relaxation, i.e., for the TV case,

    min_u g₁(L₁u) + τψv(u),

and for the TGV case,

    min_{u,w} g₁(L₁u + M₁w) + g₂(M₂w) + τψv(u).

We set α in the fidelity-constrained formulation to ψv(ū) − 0.005|ψv(ū)| (ū is the original image) in all the experiments. Note that this value does not vary depending on the choice of regularization functions. Meanwhile, τ in the additive-fidelity formulation, which controls the trade-off between ψv and TV or TGV (β = 0.3, 0.5, 0.7), is manually adjusted to the oracle value that achieves the highest improvement in the sense of SSIM [83] for each regularization function. We remark that even if we know ū, finding a suitable τ usually needs careful hand-tuning, due to the fact that τ depends both on the noise intensity and on the choice of regularization functions.

Test images of size 256 × 256 were blurred by the 3 × 3 uniform kernel and were contaminated by Poisson noise (the

scaling parameter is σ = 0.1), multiplicative noise (the shape parameter is σ = 20), or mixed Gaussian-(random-valued) impulse noise (the standard deviation of the Gaussian noise is σ = 15, and the noise ratio of the impulse noise is p = 0.2), respectively. We set E := C₁ × K, λ^(n) = 100/n, and µ = 1 in Alg. 1.

2) Results: Table I shows the PSNR [dB] and SSIM of the restored images.¹⁰ One can observe that, for all the images and regularization functions, both the additive-fidelity and fidelity-constrained formulations achieve similar restoration quality, which indicates that Moreau envelope relaxation is as effective as the original nonsmooth functions (TV and TGV). One can also see that, in each row of the table, τ in the additive-fidelity formulation, which is adjusted to achieve the best performance in terms of SSIM, differs among TV and the TGVs. This means that even if the noise intensity is unchanged, a suitable τ for the same image can change significantly depending on the choice of regularization functions. In contrast, α in the fidelity-constrained formulation was fixed for each row, implying that an α once determined for one regularization function (TV) is also suitable for different regularization functions (TGV with β = 0.3, 0.5, 0.7), i.e., the parameter setting in the fidelity-constrained formulation is much easier than that in the additive-fidelity one.

¹⁰PSNR is defined by 10 log₁₀(N/∥u − ū∥₂²) and SSIM by SSIM(u, ū) in [83, (13)], where u is a restored image and ū is the original image (recall that they are normalized).

Restoration results on the Saturn image in the case of Poisson noise contamination are shown in Fig. 2, where each pair of resulting images using the same regularization function is visually very similar. This also agrees with our expectation that both formulations, with the same regularization function and a "suitable" parameter (τ or α), restore almost the same image.

Fig. 1(a) plots the evolution of the (normalized) gap between the TV of the current estimate by the proposed algorithm and that of the optimal solution, for each noise contamination case (test image: Lena). Since the optimal solution u⋆ is analytically unavailable, it was pre-computed by the primal-dual splitting algorithm with 100,000 iterations (we set α for the proposed algorithm to ψv(u⋆)). We can see that the proposed algorithm properly minimizes the objective function in all the noise contamination cases. Meanwhile, Fig. 1(b) shows the evolution of the (normalized) gap between a given α and the value of ψv at the nth iteration, defined by

    |α − ψv(Φu^(n))| / |α|,    (42)

where u^(n) is the variable at the nth iteration of Alg. 1. One can observe that, for all the noise contamination cases, this gap converges to zero, i.e., the restored images obtained by the proposed algorithm satisfy each fidelity constraint. In addition, we also conducted the following experiment: restore a Lena image by solving the additive-fidelity formulation with a given τ; check the value of ψv of the restored image; and restore a Lena image by solving the fidelity-constrained formulation with α set to that value. We then observe that both restored images are

the same (PSNRs between them are over 50 [dB] for all the noise settings). These observations suggest that the proposed algorithm works properly.

[Fig. 1. Convergence profiles of Alg. 1 versus iterations for the Poisson, multiplicative, and mixed Gaussian-impulse noise contamination cases: (a) evolution (log-scale) of the TV gap |TV(u^(n)) − TV(u⋆)|/|TV(u⋆)| (TV is defined in (40)); (b) evolution (log-scale) of the level gap in (42).]

Finally, we compare the primal-dual splitting method and the proposed algorithm in terms of computational cost. Specifically, we measured the number of iterations and the CPU time of solving the additive-fidelity formulation by the primal-dual splitting method and those of solving the fidelity-constrained formulation by the proposed algorithm, where the same stopping criterion for both methods was defined by |PSNR(u^(n)) − PSNR(u^(n−1))| < 2.0 × 10⁻³ (PSNR(u) is the PSNR value of u w.r.t. the original image). This stopping criterion was also used for obtaining all the results in Tab. I. We summarize the results in Tab. II (function: TV; test image: Lena).

TABLE II
COMPARISON OF THE NUMBER OF ITERATIONS AND CPU TIME (TV, LENA)

method:    primal-dual splitting       proposed
noise      iterations  time [sec]      iterations  time [sec]
Poisson    3323        56.04           4583        74.52
Multipl    2903        92.92           4317        63.76
GauImp     2704        38.18           5644        82.23

One can see that, in terms of the number of iterations, solving the fidelity-constrained formulation by the proposed algorithm is somewhat more computationally expensive than solving the additive-fidelity formulation by the primal-dual splitting method. However, as mentioned in Sec. I, it is very difficult to analytically characterize the τ that exactly corresponds to a given α (we denote such a τ by τ⋆). This fact implies that, in order to solve the fidelity-constrained formulation with a given α by the primal-dual splitting method via relaxation to the additive-fidelity formulation, one has to apply the primal-dual splitting method to the additive-fidelity formulation many times with different τ, to search for τ⋆ by, e.g., the bisection method. Thus, overall, the proposed algorithm has a computational advantage over the primal-dual splitting method in solving the fidelity-constrained formulation. In addition, one also sees, in the multiplicative noise case, that solving the fidelity-constrained formulation by the proposed algorithm is faster than solving the additive-fidelity formulation by the primal-dual splitting method in terms of CPU time. This is because the computation of the subgradient projection in this case is much more efficient than that of the proximity operator, which requires an iterative approximation.

[Fig. 2. Restoration results on the Saturn image: original; observation (with Poisson noise); TV (additive); TV (constrained); TGV (β = 0.3, 0.5, 0.7; additive); TGV (β = 0.3, 0.5, 0.7; constrained).]

TABLE I
PSNR [dB] AND SSIM OF RESTORED IMAGES (∅: τ NOT APPLICABLE)

                                   TV                    TGV β = 0.3           TGV β = 0.5           TGV β = 0.7
noise    image      form          PSNR   SSIM   τ       PSNR   SSIM   τ       PSNR   SSIM   τ       PSNR   SSIM   τ
Poisson  Lena       additive      26.18  .7610  55      26.39  .7745  18      26.21  .7678  32      25.94  .7616  20
                    constrained   26.19  .7612  ∅       26.41  .7736  ∅       26.22  .7670  ∅       25.92  .7606  ∅
         Saturn     additive      30.99  .8820  32      32.54  .9205  16      32.78  .9231  25      32.79  .9230  16
                    constrained   30.69  .8825  ∅       32.33  .9202  ∅       32.68  .9224  ∅       32.63  .9216  ∅
Multipl  Lena       additive      26.49  .7938  600     26.67  .8053  180     26.68  .8012  330     26.53  .7960  240
                    constrained   26.48  .7938  ∅       26.66  .8045  ∅       26.71  .8006  ∅       26.61  .7928  ∅
         Aerial     additive      25.99  .6701  810     26.02  .6704  260     26.03  .6686  470     26.11  .6735  370
                    constrained   25.99  .6702  ∅       26.01  .6704  ∅       26.03  .6685  ∅       26.11  .6733  ∅
GauImp   Lena       additive      25.81  .7635  .095    25.93  .7734  .030    26.19  .7822  .052    26.43  .7873  .048
                    constrained   26.01  .7696  ∅       26.21  .7775  ∅       26.47  .7872  ∅       26.30  .7847  ∅
         Cameraman  additive      23.73  .7359  .097    23.68  .7369  .028    23.81  .7391  .050    23.95  .7406  .044
                    constrained   24.07  .7433  ∅       24.01  .7412  ∅       24.13  .7487  ∅       23.95  .7417  ∅

V. CONCLUDING REMARKS

We have proposed an optimization framework for signal recovery constrained by the level sets of data-fidelity functions that are not projectable. The proposed method can handle such involved data-fidelity constraints without geometric approximation to them and, at the same time, does not require any computationally expensive procedure such as operator inversion or an inner loop. Moreover, the fidelity-constrained formulation considered in the proposed method covers the use of a broad class of regularization functions and constraint sets useful for signal recovery. We presented image restoration under non-Gaussian noise contamination as illustrative applications.

APPENDIX A
PROOF OF LEMMA 1

(I) For any x ∈ int(dom(ψv)) ∩ lev≤α ψv, T̃sp(ψv,α)(x) = x from the definition of T̃sp(ψv,α). For any x ∈ int(dom(ψv)) \ lev≤α ψv and any y ∈ lev≤α ψv ∩ int(dom(ψv)),

    ∥T̃sp(ψv,α)(x) − y∥² − ∥x − y∥²
    = ∥x − ((ψv(x) − α)/∥∇ψv(x)∥²) ∇ψv(x) − y∥² − ∥x − y∥²
    = −2 ((ψv(x) − α)/∥∇ψv(x)∥²) ⟨x − y, ∇ψv(x)⟩ + ((ψv(x) − α)²/∥∇ψv(x)∥⁴) ∥∇ψv(x)∥²
    = ((ψv(x) − α)/∥∇ψv(x)∥²) (2⟨y − x, ∇ψv(x)⟩ + ψv(x) − α).    (43)

From the convexity of ψv, we see

    (43) ≤ ((ψv(x) − α)/∥∇ψv(x)∥²) (2(ψv(y) − ψv(x)) + ψv(x) − α)
         = ((ψv(x) − α)/∥∇ψv(x)∥²) (ψv(y) − ψv(x) + ψv(y) − α) < 0,

which completes the proof of Lemma 1-I.

(II) For every x ∈ lev<α ψv ∩ int(dom(ψv)), we see T̃sp(ψv,α)(x) = x over the open set lev<α ψv ∩ int(dom(ψv))

from (12) and (a2), which implies the continuity of T̃sp(ψv,α) over lev<α ψv ∩ int(dom(ψv)).

For every x ∈ lev>α ψv ∩ int(dom(ψv)), we see T̃sp(ψv,α)(x) = x − ((ψv(x) − α)/∥∇ψv(x)∥²) ∇ψv(x) over the open set lev>α ψv ∩ int(dom(ψv)) from (12) and (a3). Meanwhile, (a5) means ∥∇ψv(x)∥ ≠ 0 for every x ∈ lev>α ψv ∩ int(dom(ψv)). Moreover, (a1) implies the continuity of ψv and ∇ψv over lev>α ψv ∩ int(dom(ψv)). Thus, from the above, we have that T̃sp(ψv,α) is continuous over lev>α ψv ∩ int(dom(ψv)).

We finally show the continuity of T̃sp(ψv,α) over lev=α ψv ∩ int(dom(ψv)). Let x∗ ∈ lev=α ψv ∩ int(dom(ψv)) and ε > 0. Since ψv is continuous over lev=α ψv ∩ int(dom(ψv)) from (a1) and (a4), we can take µ > 0 such that

    x ∈ B(x∗, µ) ⊂ int(dom(ψv)) ⇒ |ψv(x) − ψv(x∗)| < (1/2)εMα,    (44)

where B(x∗, µ) is the x∗-centered open ball with radius µ, and Mα > 0 is defined in (a5). Then, for every x ∈ B(x∗, min{ε/2, µ}) (⊂ int(dom(ψv))):
• If x ∈ lev≤α ψv, by noting x∗ ∈ lev≤α ψv, we deduce from (12),

    ∥T̃sp(ψv,α)(x) − T̃sp(ψv,α)(x∗)∥ = ∥x − x∗∥ < ε/2 < ε.

• If x ∈ lev>α ψv, by noting x∗ ∈ lev≤α ψv, we deduce from (12), (44), and (a5),

    ∥T̃sp(ψv,α)(x) − T̃sp(ψv,α)(x∗)∥
    = ∥x − ((ψv(x) − α)/∥∇ψv(x)∥²) ∇ψv(x) − x∗∥
    = ∥x − x∗ − ((ψv(x) − ψv(x∗))/∥∇ψv(x)∥) (∇ψv(x)/∥∇ψv(x)∥)∥
    ≤ ∥x − x∗∥ + |ψv(x) − ψv(x∗)|/∥∇ψv(x)∥ < ε/2 + εMα/(2Mα) = ε.

Hence, T̃sp(ψv,α) is continuous over lev=α ψv ∩ int(dom(ψv)).

APPENDIX B
PROOF OF PROPOSITION 1

From Lemma 1-II, the nonexpansivity of P_K, and K ⊂ int(dom(ψv)), the operator T^K_sp(ψv,α) is continuous over R^M. Next we show (14).

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. XX, NO. XX, XXXX 201X

12

(⊃) Let x ∈ lev≤α ψv ∩ K. Since PK (x) = x, we have from Lemma 1-I,

A PPENDIX E P ROOF OF T HEOREM 1

K e e Tsp (ψv ,α) (x) = Tsp(ψv ,α) ◦ PK (x) = Tsp(ψv ,α) (x) = x,

Since ∇f and ∇γ gi = γ1 (· − proxγgi (·)) (i ∈ I) are Lipschitz continuous, so is ∇J. Also, since K and one of Ci (i ∈ I) are bounded, so is Fix(Ξ) from (17) and (21). Meanwhile, by limn→∞ λ(n) = 0, (Ξ(u(n) , w(n) , z(n) ))n≥0 ⊂ E, and the Lipschitz continuity of ∇J, the sequence (u(n) , w(n) , z(n) )n≥0 is bounded. Then, from Fact 2, it is sufficient to show that Ξ is quasi-shrinking. Let S be a bounded closed convex subset of Y containing Fix(Ξ) and (u(n) , w(n) , z(n) )n≥0 , where the existence of such S is guaranteed by the boundedness of Fix(Ξ) and (u(n) , w(n) , z(n) )n≥0 . Then, we see that ▷(Fix(Ξ), r) ∩ S is bounded and closed, i.e., compact (Note: dim(Y) < ∞), if ▷(Fix(Ξ), r) ∩ S ̸= ∅. Meanwhile, Proposition 2 and the continuity of dFix(Ξ) imply that dFix(Ξ) ◦ Ξ is continuous over Y. Thus, the function D in (5) in this case is reduced to   min d (x) − dFix(Ξ) (Ξ(x)),  x∈▷(Fix Ξ),r)∩S Fix(Ξ) D(r) = if ▷ (Fix(Ξ), r) ∩ S ̸= ∅,   ∞, otherwise.

which implies x ∈ Fix(T^K_sp(ψv,α)).
(⊂) We prove the contraposition. Let x ∉ lev≤α ψv ∩ K.
• Suppose PK(x) ∉ lev≤α ψv. Then, by PK(x) ∈ K ⊂ int(dom(ψv)), we see from Lemma 1-I and the nonexpansivity of PK that, for any y ∈ lev≤α ψv ∩ K,

    ∥T^K_sp(ψv,α)(x) − y∥ = ∥T̃_sp(ψv,α)(PK(x)) − y∥ < ∥PK(x) − y∥ ≤ ∥x − y∥,   (45)

which implies x ∉ Fix(T^K_sp(ψv,α)).
• Suppose PK(x) ∈ lev≤α ψv. Then we have T^K_sp(ψv,α)(x) = PK(x) from K ⊂ int(dom(ψv)) and Lemma 1-I. In the case of x ∈ K, we see PK(x) = x and thus x ∈ lev≤α ψv ∩ K, which contradicts the assumption. In the case of x ∉ K, the averaged nonexpansivity of PK yields, for any y ∈ lev≤α ψv ∩ K,

    ∥T^K_sp(ψv,α)(x) − y∥ = ∥PK(x) − y∥ < ∥x − y∥,   (46)

which implies x ∉ Fix(T^K_sp(ψv,α)).
Overall, we have x ∉ lev≤α ψv ∩ K ⇒ x ∉ Fix(T^K_sp(ψv,α)). Finally, the attracting quasi-nonexpansivity of T^K_sp(ψv,α) with (14) immediately follows from (45) and (46).
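The set equality (14) can also be sanity-checked numerically. The sketch below is again illustrative and not from the paper: it takes K to be a box, whose metric projection is a coordinate-wise clip, reuses the stand-in quadratic ψv from the previous sketch, and verifies that the fixed points of T^K_sp(ψv,α) = T̃_sp(ψv,α) ∘ PK are exactly the points of lev≤α ψv ∩ K.

import numpy as np

v = np.array([1.0, 2.0])
alpha = 2.0
psi = lambda x: float(np.sum((x - v) ** 2))     # stand-in smooth fidelity
grad_psi = lambda x: 2.0 * (x - v)
lo, hi = np.zeros(2), 3.0 * np.ones(2)
P_K = lambda x: np.clip(x, lo, hi)              # metric projection onto the box K

def t_sp(x):
    val = psi(x)
    if val <= alpha:
        return x
    g = grad_psi(x)
    return x - ((val - alpha) / np.dot(g, g)) * g

T_K = lambda x: t_sp(P_K(x))                    # T^K = (level-set step) composed with P_K

rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.uniform(-2.0, 5.0, size=2)
    # Skip points numerically on the boundary of K or of the level set.
    if abs(psi(x) - alpha) < 1e-6 or np.any(np.abs(x - lo) < 1e-6) or np.any(np.abs(x - hi) < 1e-6):
        continue
    fixed = np.array_equal(T_K(x), x)
    in_set = psi(x) <= alpha and bool(np.all((x >= lo) & (x <= hi)))
    assert fixed == in_set                      # Fix(T^K) = lev_{<=alpha} psi_v ∩ K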

APPENDIX C
PROOF OF LEMMA 2

Since Θ(x1, x2, x3) = 0 if and only if Φx1 = x3, the set of minimizers of Θ equals SΦ. Hence, from Example 2, it suffices to show that ∥A∥²op is a Lipschitz constant of ∇Θ(x1, x2, x3) = (Φ⊤Φx1 − Φ⊤x3, 0, x3 − Φx1). For any (x1, x2, x3), (x′1, x′2, x′3) ∈ Y, let ξ := (x1⊤ x2⊤ x3⊤)⊤ and ξ′ := (x′1⊤ x′2⊤ x′3⊤)⊤ (ξ, ξ′ ∈ R^(N+Q+M)). Then

    ∥∇Θ(x1, x2, x3) − ∇Θ(x′1, x′2, x′3)∥
      = ∥(Φ⊤(Φ(x1 − x′1) − (x3 − x′3)), 0, (x3 − x′3) − Φ(x1 − x′1))∥
      = ∥[Φ O −I]⊤ [Φ O −I] (ξ − ξ′)∥
      ≤ ∥[Φ O −I]∥²op ∥ξ − ξ′∥
      = ∥A∥²op ∥(x1, x2, x3) − (x′1, x′2, x′3)∥,

which completes the proof.

APPENDIX D
PROOF OF PROPOSITION 2

From Proposition 1, T^K_sp(ψv,α) is continuous over R^M. Since T^J_pp is nonexpansive on X, T^J_pp is continuous over X. Moreover, the nonexpansivity of GΘ and PE on Y implies their continuity over Y. Thus, Ξ is continuous over X × R^M = Y. The attracting quasi-nonexpansivity of Ξ with (21) follows from (14), (15), (17), Fix(Tpp×sp) ⊆ E, the attracting quasi-nonexpansivity of Tpp×sp, the averaged nonexpansivity of GΘ and PE, and Fact 1.
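As a quick numerical illustration of the Lipschitz constant derived in Appendix C, the sketch below (illustrative sizes and a random Φ; not from the paper) builds A = [Φ O −I], takes its squared spectral norm, and checks the Lipschitz inequality for ∇Θ on random pairs.

import numpy as np

rng = np.random.default_rng(1)
N, Q, M = 8, 3, 5                        # stand-in dimensions
Phi = rng.standard_normal((M, N))

A = np.hstack([Phi, np.zeros((M, Q)), -np.eye(M)])   # A = [Phi | O | -I]
L = np.linalg.norm(A, 2) ** 2                        # ||A||_op^2 (squared largest singular value)

def grad_theta(x1, x2, x3):
    # grad Theta = (Phi^T (Phi x1 - x3), 0, x3 - Phi x1)
    r = Phi @ x1 - x3
    return np.concatenate([Phi.T @ r, np.zeros(Q), -r])

for _ in range(1000):
    a = rng.standard_normal(N + Q + M)
    b = rng.standard_normal(N + Q + M)
    ga = grad_theta(a[:N], a[N:N + Q], a[N + Q:])
    gb = grad_theta(b[:N], b[N:N + Q], b[N + Q:])
    assert np.linalg.norm(ga - gb) <= L * np.linalg.norm(a - b) + 1e-9

Since AA⊤ = ΦΦ⊤ + I here, ∥A∥²op = ∥Φ∥²op + 1, so the constant is cheap to obtain even at large scale (e.g., by power iteration on ΦΦ⊤).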

APPENDIX E
PROOF OF THEOREM 1

Since ∇f and ∇(γgi) = (1/γ)(Id − prox_{γgi}) (i ∈ I) are Lipschitz continuous, so is ∇J. Also, since K and one of Ci (i ∈ I) are bounded, so is Fix(Ξ) from (17) and (21). Meanwhile, by limn→∞ λ(n) = 0, (Ξ(u(n), w(n), z(n)))n≥0 ⊂ E, and the Lipschitz continuity of ∇J, the sequence (u(n), w(n), z(n))n≥0 is bounded. Then, from Fact 2, it is sufficient to show that Ξ is quasi-shrinking. Let S be a bounded closed convex subset of Y containing Fix(Ξ) and (u(n), w(n), z(n))n≥0; the existence of such an S is guaranteed by the boundedness of Fix(Ξ) and (u(n), w(n), z(n))n≥0. Then ▷(Fix(Ξ), r) ∩ S is bounded and closed, i.e., compact (note that dim(Y) < ∞), whenever ▷(Fix(Ξ), r) ∩ S ≠ ∅. Meanwhile, Proposition 2 and the continuity of dFix(Ξ) imply that dFix(Ξ) ∘ Ξ is continuous over Y. Thus, the function D in (5) reduces in this case to

    D(r) = min{ dFix(Ξ)(x) − dFix(Ξ)(Ξ(x)) : x ∈ ▷(Fix(Ξ), r) ∩ S }   if ▷(Fix(Ξ), r) ∩ S ≠ ∅,
    D(r) = ∞   otherwise.

Now it is enough to show that D(r) = 0 ⇔ r = 0.
(⇐) Since Fix(Ξ) ⊂ ▷(Fix(Ξ), 0) ∩ S = Y ∩ S = S and D(r) ≥ 0 for all r ≥ 0 (see [34, Lemma 2]), we have D(0) = min_{x∈S} [dFix(Ξ)(x) − dFix(Ξ)(Ξ(x))] = 0 − 0 = 0.
(⇒) We show the contraposition, i.e., r > 0 ⇒ D(r) > 0.
• If ▷(Fix(Ξ), r) ∩ S = ∅, then D(r) = ∞.
• If ▷(Fix(Ξ), r) ∩ S ≠ ∅, then for any x ∈ ▷(Fix(Ξ), r) ∩ S we have x ∉ Fix(Ξ) and PFix(Ξ)(x) ∈ Fix(Ξ), so the attracting quasi-nonexpansivity of Ξ (Proposition 2) yields dFix(Ξ)(x) > ∥Ξ(x) − PFix(Ξ)(x)∥ ≥ dFix(Ξ)(Ξ(x)). Together with the compactness of ▷(Fix(Ξ), r) ∩ S and the continuity of dFix(Ξ) − dFix(Ξ) ∘ Ξ, this implies D(r) > 0.
Hence, Ξ is quasi-shrinking on any bounded closed convex set containing Fix(Ξ) and (u(n), w(n), z(n))n≥0, which completes the proof.
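The quasi-shrinking property established above is precisely what lets the hybrid steepest descent method of [33]-[35] converge when its nonexpansive building block is replaced by Ξ. The following toy Python sketch is a generic schematic of that method, not the paper's algorithm: T, J, and the diminishing step rule are illustrative stand-ins, shown only to make the interplay of the fixed-point map and the vanishing gradient step concrete.

import numpy as np

def hsd(T, grad_J, x0, n_iter=5000):
    # Generic hybrid steepest descent sketch: minimize a smooth convex J
    # over Fix(T) by interleaving T with a vanishing gradient step.
    x = x0
    for n in range(n_iter):
        y = T(x)
        x = y - (1.0 / (n + 1)) * grad_J(y)   # diminishing step lambda_n = 1/(n+1)
    return x

# Toy instance: Fix(T) is the unit ball (T is its projection) and
# J(x) = ||x - c||^2 / 2 with c outside the ball, so the minimizer
# of J over Fix(T) is c/||c||.
c = np.array([3.0, 4.0])
T = lambda x: x / max(1.0, float(np.linalg.norm(x)))
grad_J = lambda x: x - c
print(hsd(T, grad_J, np.zeros(2)))            # approx c/||c|| = [0.6, 0.8]

Because λ(n) → 0 while the iteration keeps applying T, the iterates are drawn into Fix(T) while still decreasing J; quasi-shrinking is what rules out stagnation at a positive distance from the fixed point set.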

REFERENCES

[1] D. P. Palomar and Y. C. Eldar, Eds., Convex Optimization in Signal Processing and Communications. Cambridge University Press, 2009.
[2] J.-L. Starck, F. Murtagh, and J. M. Fadili, Sparse Image and Signal Processing: Wavelets, Curvelets, Morphological Diversity. Cambridge University Press, 2010.
[3] P. L. Combettes and J.-C. Pesquet, "Proximal splitting methods in signal processing," in Fixed-Point Algorithms for Inverse Problems in Science and Engineering, H. H. Bauschke, R. Burachik, P. L. Combettes, V. Elser, D. R. Luke, and H. Wolkowicz, Eds. New York: Springer-Verlag, 2011, pp. 185–212.
[4] S. Theodoridis, K. Slavakis, and I. Yamada, "Adaptive learning in a world of projections," IEEE Signal Process. Mag., vol. 28, no. 1, pp. 97–123, 2011.
[5] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky, "The convex geometry of linear inverse problems," Found. Comput. Math., vol. 12, no. 6, pp. 805–849, 2012.
[6] V. Cevher, S. Becker, and M. Schmidt, "Convex optimization for big data: Scalable, randomized, and parallel algorithms for big data analytics," IEEE Signal Process. Mag., vol. 31, no. 5, pp. 32–43, 2014.
[7] N. Parikh and S. Boyd, "Proximal algorithms," Found. Trends Optim., vol. 1, no. 3, pp. 127–239, 2014.

[8] N. Komodakis and J.-C. Pesquet, "Playing with duality: An overview of recent primal-dual approaches for solving large-scale optimization problems," CoRR, vol. abs/1406.5429, 2014.
[9] M. Afonso, J. Bioucas-Dias, and M. Figueiredo, "An augmented Lagrangian approach to the constrained optimization formulation of imaging inverse problems," IEEE Trans. Image Process., vol. 20, no. 3, pp. 681–695, 2011.
[10] M. Carlavan and L. Blanc-Féraud, "Sparse Poisson noisy image deblurring," IEEE Trans. Image Process., vol. 21, no. 4, pp. 1834–1846, 2012.
[11] T. Teuber, G. Steidl, and R. H. Chan, "Minimization and parameter estimation for seminorm regularization models with I-divergence constraints," Inverse Problems, vol. 29, no. 3, 2013.
[12] G. Chierchia, N. Pustelnik, J.-C. Pesquet, and B. Pesquet-Popescu, "Epigraphical projection and proximal tools for solving constrained convex optimization problems," Signal Image Video Process., pp. 1–13, 2014.
[13] E. van den Berg and M. P. Friedlander, "Probing the Pareto frontier for basis pursuit solutions," SIAM J. Sci. Comput., vol. 31, no. 2, pp. 890–912, 2008.
[14] S. Becker, J. Bobin, and E. Candès, "NESTA: A fast and accurate first-order method for sparse recovery," SIAM J. Imaging Sci., vol. 4, pp. 1–39, 2011.
[15] S. Ono, T. Miyata, I. Yamada, and K. Yamaoka, "Image recovery by decomposition with component-wise regularization," IEICE Trans. Fundamentals, vol. E95-A, no. 12, pp. 2470–2478, 2012.
[16] S. Ono, T. Miyata, and I. Yamada, "Cartoon-texture image decomposition using blockwise low-rank texture characterization," IEEE Trans. Image Process., vol. 23, no. 3, pp. 1128–1142, 2014.
[17] S. Ono and I. Yamada, "A convex regularizer for reducing color artifact in color image recovery," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2013, pp. 1775–1781.
[18] P. M. Pardalos and N. Kovoor, "An algorithm for a singly constrained class of quadratic programs subject to upper and lower bounds," Math. Program., vol. 46, pp. 321–328, 1990.
[19] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra, "Efficient projections onto the ℓ1-ball for learning in high dimensions," in Proc. Int. Conf. Mach. Learn. (ICML), 2008, pp. 272–279.
[20] Y. Kopsinis, K. Slavakis, and S. Theodoridis, "Online sparse system identification and signal reconstruction using projections onto weighted ℓ1 balls," IEEE Trans. Signal Process., vol. 59, no. 3, pp. 936–952, 2011.
[21] P. L. Combettes and J.-C. Pesquet, "A Douglas-Rachford splitting approach to nonsmooth convex variational signal recovery," IEEE J. Sel. Topics Signal Process., vol. 1, pp. 564–574, 2007.
[22] Y.-M. Huang, M. Ng, and Y.-W. Wen, "A new total variation method for multiplicative noise removal," SIAM J. Imag. Sci., vol. 2, no. 1, pp. 20–40, 2009.
[23] F.-X. Dupé, J. M. Fadili, and J.-L. Starck, "A proximal iteration for deconvolving Poisson noisy images using sparse representations," IEEE Trans. Image Process., vol. 18, no. 2, pp. 310–321, 2009.
[24] S. Setzer, G. Steidl, and T. Teuber, "Deblurring Poissonian images by split Bregman techniques," J. Vis. Commun. Image Repres., vol. 21, pp. 193–199, 2010.
[25] J. M. Bioucas-Dias and M. A. T. Figueiredo, "Multiplicative noise removal using variable splitting and constrained optimization," IEEE Trans. Image Process., vol. 19, no. 7, pp. 1720–1730, 2010.
[26] M. A. T. Figueiredo and J. M. Bioucas-Dias, "Restoration of Poissonian images using alternating direction optimization," IEEE Trans. Image Process., vol. 19, no. 12, pp. 3133–3145, 2010.
[27] N. Pustelnik, C. Chaux, and J.-C. Pesquet, "Parallel proximal algorithm for image restoration using hybrid regularization," IEEE Trans. Image Process., vol. 20, no. 9, pp. 2450–2462, 2011.
[28] R. Ciak, B. Shafei, and G. Steidl, "Homogeneous penalizers and constraints in convex image restoration," J. Math. Imag. Vis., 2012.
[29] P. Combettes, "Strong convergence of block-iterative outer approximation methods for convex optimization," SIAM J. Control Optim., vol. 38, no. 2, pp. 538–564, 2000.
[30] P. Combettes and J.-C. Pesquet, "Image restoration subject to a total variation constraint," IEEE Trans. Image Process., vol. 13, no. 9, pp. 1213–1222, 2004.
[31] G. Chierchia, N. Pustelnik, J.-C. Pesquet, and B. Pesquet-Popescu, "A proximal approach for constrained cosparse modelling," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2012, pp. 3433–3436.
[32] A. Y. Aravkin, J. V. Burke, and M. P. Friedlander, "Variational properties of value functions," SIAM J. Optim., vol. 23, no. 3, pp. 1689–1717, 2013.


[33] I. Yamada, "The hybrid steepest descent method for the variational inequality problem over the intersection of fixed point sets of nonexpansive mappings," in Inherently Parallel Algorithms in Feasibility and Optimization and their Applications, ser. Studies in Computational Mathematics, D. Butnariu, Y. Censor, and S. Reich, Eds. Elsevier, 2001, vol. 8, pp. 473–504.
[34] I. Yamada and N. Ogura, "Hybrid steepest descent method for variational inequality problem over the fixed point set of certain quasi-nonexpansive mappings," Numer. Funct. Anal. Optim., vol. 25, pp. 619–655, 2004.
[35] I. Yamada, M. Yukawa, and M. Yamagishi, "Minimizing the Moreau envelope of nonsmooth convex functions over the fixed point set of certain quasi-nonexpansive mappings," in Fixed-Point Algorithms for Inverse Problems in Science and Engineering, H. H. Bauschke, R. Burachik, P. L. Combettes, V. Elser, D. R. Luke, and H. Wolkowicz, Eds. Springer-Verlag, 2011, pp. 345–390.
[36] I. Yamada, K. Slavakis, and K. Yamada, "An efficient robust adaptive filtering algorithm based on parallel subgradient projection techniques," IEEE Trans. Signal Process., vol. 50, no. 5, pp. 1091–1101, 2002.
[37] I. Yamada and N. Ogura, "Adaptive projected subgradient method for asymptotic minimization of sequence of nonnegative convex functions," Numer. Funct. Anal. Optim., vol. 25, no. 7-8, pp. 593–617, 2005.
[38] K. Slavakis and I. Yamada, "The adaptive projected subgradient method constrained by families of quasi-nonexpansive mappings and its application to online learning," SIAM J. Optim., vol. 23, no. 1, pp. 126–152, 2013.
[39] S. Ono and I. Yamada, "Poisson image restoration with likelihood constraint via hybrid steepest descent method," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2013, pp. 5929–5933.
[40] H. H. Bauschke and P. L. Combettes, "A weak-to-strong convergence principle for Fejér-monotone methods in Hilbert space," Math. Oper. Res., vol. 26, pp. 248–264, 2001.
[41] P. Combettes, "The foundations of set theoretic estimation," Proc. IEEE, vol. 81, no. 2, pp. 182–208, 1993.
[42] H. H. Bauschke and J. Borwein, "On projection algorithms for solving convex feasibility problems," SIAM Review, vol. 38, no. 3, pp. 367–426, 1996.
[43] Y. Censor and S. A. Zenios, Parallel Optimization: Theory, Algorithms, and Applications. Oxford University Press, 1997.
[44] J. J. Moreau, "Fonctions convexes duales et points proximaux dans un espace hilbertien," C. R. Acad. Sci. Paris Ser. A Math., vol. 255, pp. 2897–2899, 1962.
[45] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces. New York: Springer, 2011.
[46] K. Slavakis and I. Yamada, "Robust wideband beamforming by the hybrid steepest descent method," IEEE Trans. Signal Process., vol. 55, no. 9, pp. 4511–4522, 2007.
[47] B. Zhang, J. M. Fadili, and J.-L. Starck, "Wavelets, ridgelets, and curvelets for Poisson noise removal," IEEE Trans. Image Process., vol. 17, no. 7, pp. 1093–1108, 2008.
[48] S. Ono and I. Yamada, "Hierarchical convex optimization with primal-dual splitting," IEEE Trans. Signal Process., vol. 63, no. 2, pp. 373–388, 2015.
[49] P. J. Huber, "Robust estimation of a location parameter," Ann. Math. Statist., vol. 35, pp. 73–101, 1964.
[50] M. Nikolova, "Minimizers of cost-functions involving nonsmooth data-fidelity terms: Application to the processing of outliers," SIAM J. Numer. Anal., vol. 40, no. 3, pp. 965–994, 2002.
[51] C. A. Micchelli, L. Shen, Y. Xu, and X. Zeng, "Proximity algorithms for the L1/TV image denoising model," Adv. Comput. Math., vol. 38, no. 2, pp. 401–426, 2013.
[52] M. Yukawa and R. Ishii, "Online model selection and learning by multikernel adaptive filtering," in Proc. Eur. Signal Process. Conf. (EUSIPCO), 2013.
[53] P. Combettes, "A block-iterative surrogate constraint splitting method for quadratic signal recovery," IEEE Trans. Signal Process., vol. 51, no. 7, pp. 1771–1782, 2003.
[54] P. L. Combettes and V. Wajs, "Signal recovery by proximal forward-backward splitting," Multiscale Model. Simul., vol. 4, pp. 1168–1200, 2005.
[55] G. Chierchia, N. Pustelnik, B. Pesquet-Popescu, and J.-C. Pesquet, "A nonlocal structure tensor-based approach for multicomponent image recovery problems," IEEE Trans. Image Process., vol. 23, no. 12, pp. 5531–5544, 2014.
[56] G. Aubert and J.-F. Aujol, "A variational approach to removing multiplicative noise," SIAM J. Appl. Math., vol. 68, no. 4, pp. 925–946, 2008.


[57] L. Rudin, P. Lions, and S. Osher, "Multiplicative denoising and deblurring: Theory and algorithms," in Geometric Level Set Methods in Imaging, Vision, and Graphics, S. Osher and N. Paragios, Eds. New York: Springer, 2003, pp. 103–120.
[58] G. Steidl and T. Teuber, "Removing multiplicative noise by Douglas-Rachford splitting methods," J. Math. Imag. Vis., vol. 36, pp. 168–184, 2010.
[59] S. Durand, J. M. Fadili, and M. Nikolova, "Multiplicative noise removal using L1 fidelity on frame coefficients," J. Math. Imag. Vis., vol. 36, pp. 201–226, 2010.
[60] W. Feng, H. Lei, and Y. Gao, "Speckle reduction via higher order total variation approach," IEEE Trans. Image Process., vol. 23, no. 4, pp. 1831–1843, 2014.
[61] J. Shi and S. Osher, "A nonlinear inverse scale space method for a convex multiplicative noise model," SIAM J. Imag. Sci., vol. 1, no. 3, pp. 294–321, 2008.
[62] Y. Dong and T. Zeng, "A convex variational model for restoring blurred images with multiplicative noise," SIAM J. Imag. Sci., vol. 6, no. 3, pp. 1598–1625, 2013.
[63] J.-F. Cai, R. Chan, and M. Nikolova, "Two-phase approach for deblurring images corrupted by impulse plus Gaussian noise," Inverse Problems and Imaging, vol. 2, no. 2, pp. 187–204, 2008.
[64] Y.-M. Huang, M. Ng, and Y.-W. Wen, "Fast image restoration methods for impulse and Gaussian noises removal," IEEE Signal Process. Lett., vol. 16, no. 6, pp. 457–460, 2009.
[65] E. López-Rubio, "Restoration of images corrupted by Gaussian and uniform impulsive noise," Pattern Recognit., vol. 43, pp. 1835–1846, 2010.
[66] Y. Xiao, T. Zeng, J. Yu, and M. K. Ng, "Restoration of images corrupted by mixed Gaussian-impulse noise via l1-l0 minimization," Pattern Recognit., vol. 44, pp. 1708–1720, 2011.
[67] Y.-R. Li, L. Shen, D.-Q. Dai, and B. W. Suter, "Framelet algorithms for de-blurring images corrupted by impulse plus Gaussian noise," IEEE Trans. Image Process., vol. 20, no. 7, pp. 1822–1837, 2011.
[68] P. Rodriguez, R. Rojas, and B. Wohlberg, "Mixed Gaussian-impulse noise image restoration via total variation," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2012, pp. 1077–1080.
[69] T. Chen and H. R. Wu, "Adaptive impulse detection using center-weighted median filters," IEEE Signal Process. Lett., vol. 8, no. 1, pp. 1–3, 2001.
[70] P. Petrus, "Robust Huber adaptive filter," IEEE Trans. Signal Process., vol. 47, no. 1, pp. 1129–1133, 1999.
[71] T. Gänsler, S. L. Gay, M. M. Sondhi, and J. Benesty, "Double-talk robust fast converging algorithms for network echo cancellation," IEEE Trans. Speech Audio Process., vol. 8, no. 6, pp. 656–663, 2000.
[72] T. Yamamoto, M. Yamagishi, and I. Yamada, "Adaptive proximal forward-backward splitting for sparse system identification under impulsive noise," in Proc. Eur. Signal Process. Conf. (EUSIPCO), 2012, pp. 2620–2624.
[73] L. I. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Phys. D, vol. 60, no. 1-4, pp. 259–268, 1992.
[74] K. Bredies, K. Kunisch, and T. Pock, "Total generalized variation," SIAM J. Imaging Sci., vol. 3, no. 3, pp. 492–526, 2010.
[75] G. Dal Maso, I. Fonseca, G. Leoni, and M. Morini, "A higher order model for image restoration: The one-dimensional case," SIAM J. Math. Anal., vol. 40, no. 6, pp. 2351–2391, 2009.
[76] T. Valkonen, K. Bredies, and F. Knoll, "Total generalized variation in diffusion tensor imaging," SIAM J. Imag. Sci., vol. 6, no. 1, pp. 487–525, 2013.
[77] D. Ferstl, C. Reinbacher, R. Ranftl, M. Rüther, and H. Bischof, "Image guided depth upsampling using anisotropic total generalized variation," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), 2013.
[78] S. Ono and I. Yamada, "Second-order total generalized variation constraint," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2014, pp. 4938–4942.
[79] ——, "Decorrelated vectorial total variation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2014, pp. 4090–4097.
[80] ——, "Total generalized variation for graph signals," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2015, pp. 5456–5460.
[81] L. Condat, "A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms," J. Optim. Theory Appl., 2013.
[82] B. C. Vũ, "A splitting algorithm for dual monotone inclusions involving cocoercive operators," Adv. Comput. Math., vol. 38, pp. 667–681, 2013.
[83] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004.


Shunsuke Ono (S’11-M’15) received the B.E. degree in computer science in 2010, and the M.E. and Ph.D. degrees in communications and computer engineering in 2012 and 2014 from the Tokyo Institute of Technology, respectively. He is currently an Assistant Professor with Imaging Science and Engineering Laboratory, Tokyo Institute of Technology. His research interests are in signal and image processing, and optimization theory. From April 2012 to September 2014, he was a recipient of the Research Fellowship of the Japan Society for the Promotion of Science (JSPS). He received the Excellent Paper Award in 2014 and the Young Researchers’ Award in 2013 from the IEICE; the Outstanding Student Journal Paper Award from IEEE Signal Processing Society (SPS) Japan Chapter in 2014; and the Yasujiro Niwa Outstanding Paper Award from Tokyo Denki University in 2015. He is a member of IEICE.

Isao Yamada (M'96-SM'06-F'15) received the B.E. degree in computer science from the University of Tsukuba in 1985 and the M.E. and the Ph.D. degrees in electrical and electronic engineering from the Tokyo Institute of Technology, Tokyo, Japan, in 1987 and 1990, respectively. He is a Professor with the Department of Communications and Computer Engineering, and the Director of the GSIC (Global Scientific Information Center), Tokyo Institute of Technology. His current research interests are in mathematical signal processing, nonlinear inverse problems, and optimization theory. Dr. Yamada has been an Associate Editor for several journals, including the IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences (2001–2005), the IEEE Transactions on Circuits and Systems—Part I: Fundamental Theory and Applications (2006–2007), and the IEEE Transactions on Signal Processing (2008–2011). He also served as the Editor-in-Chief of the IEICE Transactions on Fundamentals (2013–2015). Currently, he is serving as an editorial board member of the IEEE Transactions on Signal and Information Processing over Networks (2014 to present), Numerical Functional Analysis and Optimization (Taylor & Francis), and Multidimensional Systems and Signal Processing (Springer). He has been serving as a Steering Committee member of the IEICE Engineering Sciences Society, including as the Chairman of the IEICE Technical Group on Signal Processing in 2011 and the Vice President (System and Signal Processing) in 2012. He has also served as a member of the IEEE Signal Processing Theory and Methods Technical Committee (2009–2014). He received the 2014 IEEE Signal Processing Magazine Best Paper Award from the IEEE Signal Processing Society in 2015, the Excellent Paper Awards in 1991, 1995, 2006, 2009 and 2014, the Young Researcher Award in 1992, and the Achievement Award in 2009 from the IEICE; the ICF Research Award from the International Communications Foundation in 2004; the Docomo Mobile Science Award (Fundamental Science Division) from the Mobile Communication Fund in 2005; and the Fujino Prize from the Tejima Foundation in 2008. In 2015, he became an IEICE Fellow.
