Abstract We study off-line and on-line algorithms for parallel settings which simultaneously yield constant approximation factors for both the total weighted completion time and the makespan objectives. Introducing preemptions, the list scheduling algorithm introduced by Garey and Graham is extended in such a way that a parallel job's completion time is dictated solely by its predecessors in the input instance, as is the case with Graham's classical list algorithm for the identical parallel machine model. This suggests the applicability of many ordering techniques which have previously been adapted to the single machine and identical parallel machine settings.

Introduction Recently, scheduling so as to approximate more than one criterion has received considerable attention [ARSY99, CPSSSW96, Sch96, SW97]. We present approximation techniques and new results for several scheduling models of parallel jobs, minimizing both the total (weighted) completion time and the makespan objectives. In these problems each job may demand more than one processor for execution, and hence is called a parallel job. We consider m identical processors and independent parallel jobs; the jobs are independent in the sense that no precedence constraints are imposed. Each job j has a positive processing time p_j and a number of processors needed for execution, size_j, 1 ≤ size_j ≤ m. A machine can process at most one job at a time. Exactly size_j processors are required at the same starting time to run job j; any set of size_j processors is proper for the execution of j. A job may be preempted from time to time and later continued on a possibly different set of processors, without affecting its total processing time p_j. A job may have a release date r_j before which it cannot be processed. Let C_j denote the completion time of job j in a feasible schedule. Our goal is to design bicriteria approximation algorithms that simultaneously minimize the total (weighted) completion time Σ_j C_j (or Σ_j w_j C_j) together with the makespan, the largest C_j, denoted by C_max. We consider three models. The weighted model with release dates is denoted P|r_j, pmtn, size_j|Σ_j w_j C_j.

In this notation, suggested by Graham et al. [GLLR79], each scheduling problem is denoted by α|β|γ, where α is either 1 or P, denoting that the scheduling environment contains either one processor or m identical processors; γ indicates the objectives Σ_j w_j C_j or Σ_j C_j and/or C_max; and β optionally contains r_j, pmtn, size_j, indicating whether there are non-trivial release date constraints, whether preemption is allowed, and whether more than one processor may be required by a job. Here size_j differs from its original definition by [GLLR79], where it was used to denote the number of operations in a job shop model (they did not refer to


Table 1: Summary of Schwiegelshohn's deterministic off-line results [Sch96]

    model                       approx. factor for Σ(w_j)C_j    approx. factor for C_max
    P|pmtn, size_j|ΣC_j         2.37                            3.20
    P|pmtn, size_j|ΣC_j         2.42                            3.00
    P|pmtn, size_j|Σw_jC_j      7.11                            9.60
    P|pmtn, size_j|Σw_jC_j      7.26                            9.00

P|r_j, pmtn, size_j|ΣC_j is the unweighted general model, and P|r_j, pmtn, size_j|Σw_jC_j denotes the general model. Similarly, the restricted model has no release dates and is denoted P|pmtn, size_j|Σw_jC_j.

Review of Techniques Our algorithms and their analysis are inspired by the work of Phillips et al. [PSW98]. They introduced algorithms for converting preemptive schedules into nonpreemptive ones by scheduling the jobs in the order of their completion times in a preemptive schedule. This technique was generalized in many deterministic and randomized off-line and on-line results (e.g., [HSSW97, SS02, CMNS97]). Our main contribution is the extension of this ordering technique to the case of parallel job settings. Our on-line technique gives constant competitive factors for both criteria and can be considered a simple yet non-trivial generalization of the one-machine preemptive approximation algorithm introduced by Chekuri et al. [CMNS97]. Plugging in our preemptive parallel list algorithm, we get a straightforward generalization of the work of Hall et al. [HSSW97], and achieve constant approximation factors for both criteria in the off-line case. This demonstrates the applicability of our technique to the framework of the ordering techniques, as most existing algorithms for these realistic parallel job settings are rather complicated.

Related work and Results Minimizing total (weighted) completion time preemptively in the presence of release dates is NP-hard already on a single machine [LLLR84]; the preemptive parallel-job problem is NP-hard as well [Droz94]. A detailed overview of parallel job scheduling problems is given in [Droz96]. For parallel jobs with no preemptions and trivial release dates, Schwiegelshohn et al. [SLWTY98] introduced the SMART off-line algorithms. They achieved an 8-approximation for the unweighted model and a 10.45-approximation for the weighted model.
Schwiegelshohn [Sch96] presented several versions of the off-line preemptive deterministic algorithm PSRS, which can be fine-tuned to minimize either total completion time or makespan (see Table 1; for each model there are two possible scheduling algorithms, of which the first better approximates total completion time and the second better approximates the makespan). Chakrabarti et al. [CPSSSW96] presented a general technique for designing on-line bicriteria algorithms with constant approximation ratios. Their on-line framework is based on partitioning the time horizon into intervals, where at the beginning of each interval they run a dual FPAS for a packing problem to choose a promising subset of already released jobs to list schedule in the current interval. For the model of weighted malleable jobs² with no preemption allowed, they construct a deterministic algorithm that gives constant factors for both criteria. Allowing randomization, they construct an algorithm that produces a schedule whose makespan is at most 8.67 times the optimal and whose total weighted completion time is at most a constant factor times the


parallel jobs). Drozdowski [Droz96] instead uses the notation size_j to denote the number of processors needed for the execution of a parallel job. ² A malleable job is one whose execution time is a function of the number of (identical) processors allotted to it.


Table 2: Summary of results

    model                          type                      approx./competitive ratio for Σ(w_j)C_j   for C_max
    P|pmtn, size_j|Σw_jC_j         off-line, deterministic   4                                         2
    P|r_j, pmtn, size_j|Σw_jC_j    off-line, deterministic   5                                         3
    P|r_j, pmtn, size_j|ΣC_j       on-line, deterministic    6                                         3
    P|r_j, pmtn, size_j|Σw_jC_j    on-line, deterministic    12                                        3
optimal. The result also holds with the constants reversed. As our general model is a special case of this weighted malleable model, we show, by suggesting a simple modification with preemptions allowed, how to reduce the approximation ratio for the makespan to 3. The rest of our results are summarized in Table 2.

Organization of this paper In Section 1 we review a nonpreemptive list scheduling algorithm introduced by Garey and Graham and its preemptive modification. In Section 2 we consider on-line algorithms for the general and the unweighted general models. In Section 3 we consider off-line algorithms for the general and restricted models.

1 Preemptive List Scheduling Algorithm First we review a nonpreemptive parallel list scheduling algorithm introduced by Garey and Graham [GG75] for the model P|size_j|C_max. The virtue of adding preemptions to it is that the completion time of each job in the modified algorithm is dictated solely by its predecessors in the input permutation and not by successive jobs, consistently with the classical list algorithm developed by Graham for the identical parallel machine model [Graham66], and in contrast with the original parallel list algorithm [GG75]. This enables us to apply various ordering techniques to the general case of parallel jobs. The main idea of the modified algorithm is that jobs are packed onto shelves in the order defined by the input list. If the next job in the list requires too many processors, it is skipped, and jobs further down the list are considered. Thus the shelf is considered full only when no additional job from the list can fit in. However, each shelf lasts only until the first job in the shelf terminates, or until a new job that precedes some scheduled job is released, whichever comes first. When this happens, all jobs are preempted and a new shelf is packed in the same way.


Notations: Let a_j = size_j·p_j denote the total “area” which is occupied by job j. Let r_max and p_max denote the largest release date and the largest processing time among the jobs, respectively. We shall denote the makespan of algorithm A by C_max(A), and the makespan of the optimal algorithm by C*_max.


The optimum makespan is bounded from below by

    C*_max ≥ max{ (1/m)·Σ_j size_j·p_j , p_max , r_max },

that is, the maximum among the total area divided by the total number of machines, the maximal running time, and the maximal release date. This trivial lower bound holds for the preemptive and nonpreemptive settings.
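As a sanity check, the trivial lower bound is directly computable; the sketch below is our own illustration (the `Job` fields and the function name are ours, not from the paper) and evaluates all three terms.

```python
from dataclasses import dataclass

@dataclass
class Job:
    p: float        # processing time p_j
    size: int       # processors required, size_j
    r: float = 0.0  # release date r_j

def trivial_lower_bound(jobs, m):
    """max{ total area / m , largest processing time , largest release date }."""
    area = sum(j.p * j.size for j in jobs)
    return max(area / m, max(j.p for j in jobs), max(j.r for j in jobs))

jobs = [Job(p=1, size=3), Job(p=1, size=4), Job(p=1, size=4), Job(p=2, size=2)]
print(trivial_lower_bound(jobs, m=5))  # area = 15, 15/5 = 3 -> bound is 3.0
```

Each of the three terms is individually a valid lower bound, which is what makes bounds of the form "sum or max of these terms" convenient in the analyses below.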


Algorithm 1 (List-GG) [GG75] Whenever there are sufficiently many idle processors, schedule some job on as many processors as it requires.


Lemma 1 [GG75] The makespan produced by the greedy algorithm List-GG is (tightly) bounded from above by

    C_max(List-GG) ≤ 2·max{ (1/m)·Σ_j size_j·p_j , p_max },

regardless of the rule by which we choose the job to be scheduled. In particular, List-GG is a 2-approximation for the model P|size_j|C_max.


Now we describe our modified parallel list algorithm.

Algorithm 2 (List-GG-PR) A job list is given, ordered by some permutation σ. Base step: Set i := 1 and t := 0. Iterative step: Construct shelf i: greedily select the jobs to be scheduled by scanning through the list in order, selecting each next job whose processor demand is not greater than the number of currently free processors at time t, and whose release date is not greater than t. Now, run the jobs in shelf i. Preempt shelf i from running at the first point in time at which the shortest job in shelf i terminates, or at which some job that is placed in the permutation before a certain job in the current shelf is released, whichever is earlier. Update the remaining processing times of the jobs in the current shelf. If some job has completed its execution, remove it from the list. Set i := i + 1. Set t to the current time, or to the earliest release date of an unfinished job, whichever is larger. Repeat until the list is empty.
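The iterative step can be simulated compactly. The following sketch is our own simplified rendering of List-GG-PR (jobs are given as (p, size, r) tuples in list order; ties and numerical details are handled naively):

```python
def list_gg_pr(jobs, m):
    """Shelf-based preemptive list scheduling (List-GG-PR), simplified sketch.

    `jobs` is a list of (p, size, r) tuples in permutation order; returns
    the completion time of each job.
    """
    n = len(jobs)
    rem = [p for p, _, _ in jobs]          # remaining processing times
    completion = [None] * n
    t = 0.0
    while any(c is None for c in completion):
        # pack a shelf: scan the list in order, adding released jobs that fit
        free, shelf = m, []
        for j in range(n):
            p, size, r = jobs[j]
            if completion[j] is None and r <= t and size <= free:
                free -= size
                shelf.append(j)
        if not shelf:                      # nothing available: jump to a release
            t = min(jobs[j][2] for j in range(n) if completion[j] is None)
            continue
        # the shelf lasts until its shortest job ends, or until a job that
        # precedes some scheduled job is released, whichever comes first
        dt = min(rem[j] for j in shelf)
        for j in range(n):
            r = jobs[j][2]
            if completion[j] is None and j not in shelf and r > t and j < max(shelf):
                dt = min(dt, r - t)
        t += dt
        for j in shelf:
            rem[j] -= dt
            if rem[j] <= 1e-9:
                completion[j] = t
    return completion
```

Note how the sketch makes the key structural property visible: removing a suffix of the input list never changes the completion times of the remaining prefix, since a job is only ever delayed by jobs that precede it in the list.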


Consider, for illustration, the following example of 5 processors and 4 jobs, all released at time zero, with sizes (3, 4, 4, 2) and processing times (1, 1, 1, 2), listed in this order. In List-GG the completion times would be C₁ = 1, C₂ = 3, C₃ = 4, C₄ = 2, where in List-GG-PR: C₁ = 1, C₂ = 2, C₃ = 3, C₄ = 4 (job 4 is preempted right after job 1 finishes).


Observe that the greedy style of List-GG-PR causes a preemption to affect only jobs which run on the tail of the current shelf. The completion times produced by List-GG-PR on an induced ordered list consisting of any prefix of the input list are identical to the respective completion times produced by List-GG-PR on the entire list. In the above example, if we drop job 4, then both algorithms produce the same schedule. Note that these completion times may differ from the completion times of List-GG for the original instance; however, they are identical to the completion times of List-GG-PR for the original instance.


For a given job list J ordered by π, consider the job-fragment list J′ created by List-GG-PR on J, obtained by splitting each job j into sub-jobs j′_1, j′_2, … in the natural way dictated by the shelves and the preemptions that occurred. Note that the total area is preserved, since the areas of the fragments of each job sum to the area of that job. In the fragment system the maximum processing time of the sub-jobs may only be reduced, and hence p′_max ≤ p_max.

Lemma 2 List-GG-PR is a 3-approximation algorithm with respect to the makespan.

Proof: Consider the natural permutation π′ on J′ constructed by running List-GG-PR on the system (J, π): let the first jobs in π′ be the sub-jobs in the first shelf, in the same order in which they were scheduled in the shelf. Let the next jobs in π′ be the sub-jobs in the second shelf in the same scheduling order, and so on. Run the jobs


of the fragment system J′ in that order, but not before their release dates. We argue that the resulting feasible schedule is identical to the schedule constructed by List-GG-PR on J. By Lemma 1 we get:

    C_max(List-GG-PR) ≤ r_max + 2·max{ (1/m)·Σ_j size_j·p_j , p′_max }
                      ≤ r_max + 2·max{ (1/m)·Σ_j size_j·p_j , p_max } ≤ 3·C*_max.

Corollary 1 Given any permutation σ on the input, the completion time of job σ(j) in List-GG-PR in the general model is bounded from above by:

    C_{σ(j)} ≤ r_max(j) + 2·max{ (1/m)·Σ_{k≤j} size_{σ(k)}·p_{σ(k)} , p_max(j) },

where r_max(j) = max_{k≤j} r_{σ(k)} and p_max(j) = max_{k≤j} p_{σ(k)}.

Denote by List-GG-P the algorithm List-GG-PR under the assumption that all jobs in the input have trivial release dates.

Lemma 3 The algorithm List-GG-P is a 2-approximation algorithm with respect to the makespan.

2 On-Line Our technique here can be considered as a generalization of the deterministic simulation-based on-line algorithm of Chekuri et al. [CMNS97], denoted Algorithm CMNS. We first review their algorithm, then describe a constant bicriteria approximation for the unweighted general model, and then turn to the general model.

2.1 Algorithm CMNS Algorithm CMNS addresses the problem of nonpreemptively scheduling on m identical parallel machines with release dates to minimize total completion time. The simulation relies on the fact that the preemptive algorithm SRPT [Baker74], which always schedules the job with the shortest remaining processing time, is an efficient optimal on-line algorithm for the single machine problem 1|r_j, pmtn|ΣC_j. Given an instance I of nonpreemptive scheduling on m identical machines, CMNS simulates SRPT on the pseudo-instance I′, where p′_j = p_j/m (note that only the processing times are modified).

Algorithm 3 (CMNS) Set L to be an empty list. Then simulate SRPT on I′ (assuming a single machine model). As soon as a job is finished in the simulation, add it to the end of the list L. List schedule the jobs nonpreemptively in the order of the list L. It was shown in [CMNS97] that this algorithm is a 3-approximation w.r.t. total completion time for the model P|r_j|ΣC_j.
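The single-machine SRPT simulation that produces the list L can be sketched as follows (the function name and the (release, processing) instance encoding are ours); the same routine is reused, on different pseudo-instances, by the algorithms in the next subsections.

```python
import heapq

def srpt_order(jobs):
    """Simulate SRPT on one machine; return job indices in order of
    simulated completion, i.e. the list L used by Algorithm CMNS (a sketch).

    `jobs` is a list of (release, processing) pairs.
    """
    events = sorted(range(len(jobs)), key=lambda j: jobs[j][0])  # by release
    heap, order, t, i = [], [], 0.0, 0
    while heap or i < len(events):
        if not heap:                        # machine idle: jump to next release
            t = max(t, jobs[events[i]][0])
        while i < len(events) and jobs[events[i]][0] <= t:
            j = events[i]
            heapq.heappush(heap, (jobs[j][1], j))
            i += 1
        rem, j = heapq.heappop(heap)        # shortest remaining processing time
        nxt = jobs[events[i]][0] if i < len(events) else float("inf")
        if t + rem <= nxt:
            t += rem
            order.append(j)                 # j completes: append it to L
        else:                               # preempted by the next release
            heapq.heappush(heap, (rem - (nxt - t), j))
            t = nxt
    return order
```

Decisions are only made at completions and releases, which suffices because the SRPT priority of a job changes continuously but its rank relative to the other available jobs does not change between events.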


Table 3: Related instances of the job system

    instance   processors   processing time         release date    processors demand   weight
    I          m            p_j                     r_j             size_j              w_j
    I′         m            p′_j = size_j·p_j/m     r_j             m                   w_j
    I″         1            p″_j = size_j·p_j/m     r″_j (≥ r_j)    1                   w_j
    I‴         1            p‴_j = size_j·p_j/m     r‴_j (≥ r_j)    1                   w_j

2.2 The unweighted general model In this subsection we introduce and analyze the algorithm Simulate-SRA (simulate smallest remaining area). Given an instance I we construct the three additional related instances defined in Table 3. In the unweighted model we ignore the rightmost column of the weights. Actually we simulate instance I‴, but for ease of analysis we also use I′ and I″. Note that in I″ and I‴ a job may be delayed until it is released (i.e., a job that would otherwise be available earlier is released only at the later date). This enables the construction of bounds for the simulated completion times.

Algorithm 4 (Simulate-SRA) Given the instance I, construct the instance I‴. Set L to be an empty list. Simulate SRPT on I‴. As soon as a job finishes running in the simulation, add it to the end of the list L. Run List-GG-PR on the list L.


Let Σ*(I), Σ*(I′) and Σ*(I″) denote the preemptive optimal total completion times for the instances I, I′ and I″, respectively. Let C̄_j denote the completion time of job j in the simulation, and assume that the jobs are indexed so that C̄_1 ≤ ⋯ ≤ C̄_n. We shall show that Simulate-SRA is a 6-approximation w.r.t. the sum of completion times, and a 3-approximation w.r.t. the makespan. We first show the bound Σ_j C̄_j ≤ 2·Σ*(I).

Lemma 4 Σ_j C̄_j ≤ 2·Σ*(I), and the bound is tight.

Proof: Tightness is witnessed by a simple instance. The rest of the proof is easily constructed from the following two lemmas.

Lemma 5 Σ_j C̄_j ≤ Σ*(I′).

Proof: Take an optimal schedule S for I′, and construct a feasible schedule for I‴ by letting each job j run in exactly the last p‴_j units of time that S allocates to j; since SRPT is optimal for I‴, Σ_j C̄_j is at most the total completion time of that schedule.

Lemma 6 Σ*(I′) ≤ 2·Σ*(I″).

Proof: Let S″ denote an optimal schedule for I″ with makespan T. For each integer t ∈ [0, ⌈T⌉) consider the one-unit time interval [t, t+1). Produce a feasible schedule S′ for I′ by duplicating the intervals in the following way: during the interval [2t, 2t+2), schedule the jobs that run in [t, t+1) in S″. Notice that the completion time of each job is at most doubled.

Combining the two lemmas above gives Σ_j C̄_j ≤ 2·Σ*(I″); together with Lemma 7 below, this proves Lemma 4.



Lemma 7 Σ*(I″) ≤ Σ*(I).

Proof: Let S denote a feasible schedule for I with makespan T. Produce a feasible schedule S″ for I″ as follows. For each integer t ∈ [0, ⌈T⌉), consider the one-unit time interval [t, t+1), where w.l.o.g. the running jobs are j_1, …, j_k. From the capacity constraint we have size_{j_1} + ⋯ + size_{j_k} ≤ m; equivalently, Σ_i size_{j_i}/m ≤ 1. Thus, in S″, we run each job j_i during size_{j_i}/m units of time inside the interval [t, t+1). If job j runs x units of time and completes at time t₀ in S, it runs x·size_j/m units of time and completes by time t₀ in S″ (its total processing requirement in I″ is p″_j = size_j·p_j/m). In particular, if S is an optimal schedule for I, the total completion time of S″ is bounded from above by Σ*(I) and from below by Σ*(I″). We conclude that Σ*(I″) ≤ Σ*(I).

Lemma 8 Simulate-SRA achieves competitive ratio of 6 for the total unweighted completion time.

Proof: Let C_j denote the completion time in the schedule found by Simulate-SRA for job j. Clearly, job j is released into the list at time C̄_j. By using List-GG-PR in Simulate-SRA we have, by Corollary 1:

    C_j ≤ C̄_j + 2·max{ (1/m)·Σ_{k≤j} size_k·p_k , max_{k≤j} p_k } ≤ 3·C̄_j,

since all jobs k ≤ j complete in the single-machine simulation by time C̄_j (so (1/m)·Σ_{k≤j} size_k·p_k ≤ C̄_j), and the delayed release dates of I‴ guarantee p_k ≤ C̄_k ≤ C̄_j. Combining the above with Lemma 4 gives: Σ_j C_j ≤ 3·Σ_j C̄_j ≤ 6·Σ*(I).

2.3 Ensuring Competitive Ratio of 3 for Makespan Withdrawing the restriction that a job in Simulate-SRA can only run after it has finished in the simulation, one can guarantee a ratio of 3 for the makespan. A similar method can be applied to the work of Chakrabarti et al. [CPSSSW96], improving their results. The following modified algorithm demonstrates the method on Simulate-SRA. The idea is that the input list is partitioned into two lists L₁ and L₂, where we first greedily select jobs to schedule from L₁, and then from L₂.

Algorithm 5 (Simulate-SRA′) Given the instance I, construct the instance I‴. Set L₁ and L₂ to be empty lists. Upon the release of job j, add it to the end of the list L₂. Simulate SRPT on I‴. As soon as a job finishes running in the simulation, remove it from L₂ and move it to the end of the list L₁. Run List-GG-PR on the list L₁ ∘ L₂, i.e., L₂ concatenated to the end of L₁.

It may be the case that a job finishes running before it finishes in the simulation, e.g., when the job is postponed in the simulation for a sufficiently long time. Clearly, a job finishes running no later than in the original version.

Lemma 9 Simulate-SRA′ achieves competitive ratio of 3 for makespan.

Proof: Simulate-SRA′ belongs to the List-GG-PR family: the dynamic ordering of jobs in the input list (L₁ ∘ L₂) is dictated by the simulation, and any job is present in the input list of Simulate-SRA′ from its release (as opposed to Simulate-SRA, where a job might run only after finishing in the simulation). The makespan bound of Lemma 2 therefore applies.


2.4 Applying the Simulation to the Weighted General Model Considering weighted jobs, a similar simulation technique may be applied to the general model. Note that the analysis in the proofs of Lemmas 4 and 7 is per job (job-by-job); thus adding weights does not affect the respective results. The deterministic algorithm Simulate-wSRA simulates the on-line algorithm “preemptive list scheduling in order of nonincreasing ratios w_j/p_j”, which Goemans et al. [GWW97]³ proved to be 2-competitive for the model 1|r_j, pmtn|Σw_jC_j.


Algorithm 6 (Simulate-wSRA) Given the instance I, construct the instance I‴. Simulate preemptive list scheduling in order of nonincreasing ratios w_j/p‴_j on I‴. As soon as a job finishes running in the simulation, add it to the end of the list L. Run List-GG-PR on the list L.

Lemma 10 Simulate-wSRA achieves competitive ratio of 12 for total weighted completion time.

Proof: Let C̃_j denote the completion time of job j in the simulation, and assume that the jobs are indexed so that C̃_1 ≤ ⋯ ≤ C̃_n. Let C_j denote the completion time of job j in the schedule found by the algorithm. By using List-GG-PR in Simulate-wSRA we have, as in the proof of Lemma 8, C_j ≤ 3·C̃_j. Since the simulated algorithm is 2-competitive, Σ_j w_j C̃_j ≤ 2·Σ_j w_j C*_j(I‴) ≤ 4·Σ_j w_j C*_j(I). Hence: Σ_j w_j C_j ≤ 12·Σ_j w_j C*_j(I).
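The weighted simulation differs from Simulate-SRA only in the rule used inside the single-machine simulation. A sketch of preemptive list scheduling by nonincreasing w_j/p_j (our own event-driven rendering, with hypothetical names) is:

```python
def wspt_preemptive_order(jobs):
    """Preemptive list scheduling by nonincreasing w_j/p_j on one machine
    (sketch). `jobs` is a list of (release, processing, weight) triples;
    returns indices in order of completion, i.e. the list fed to List-GG-PR.
    """
    n = len(jobs)
    prio = sorted(range(n), key=lambda j: jobs[j][2] / jobs[j][1], reverse=True)
    rank = {j: k for k, j in enumerate(prio)}   # static priority by ratio
    rem = [p for _, p, _ in jobs]
    t, order, done = 0.0, [], [False] * n
    while len(order) < n:
        avail = [j for j in range(n) if not done[j] and jobs[j][0] <= t]
        if not avail:                            # idle: jump to next release
            t = min(jobs[j][0] for j in range(n) if not done[j])
            continue
        j = min(avail, key=rank.get)             # highest-ratio released job runs
        nxt = min((jobs[k][0] for k in range(n) if not done[k] and jobs[k][0] > t),
                  default=float("inf"))
        if t + rem[j] <= nxt:                    # j finishes before next release
            t += rem[j]; rem[j] = 0; done[j] = True
            order.append(j)
        else:                                    # j is preempted at the release
            rem[j] -= nxt - t
            t = nxt
    return order
```

As in the SRPT simulation, events are restricted to releases and completions; between events the ratio ordering of the available jobs is static, so no decision point is missed.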

$ "

3 Off-Line Following Hall et al. [HSSW97] we achieve constant approximation factors for both criteria. The next two lemmas are simple generalizations of Lemmas 3.1 and 3.2 of [HSSW97], and so we omit the proofs.

Lemma 11 Let C_1, …, C_n denote the job completion times in a feasible schedule. Then, for every subset S of jobs:

    Σ_{j∈S} size_j·p_j·C_j ≥ (1/(2m))·( Σ_{j∈S} size_j·p_j )²    (1)

Lemma 12 Let C̄_1, …, C̄_n be arbitrary numbers (not necessarily representing a feasible schedule) that satisfy (1), and assume that C̄_1 ≤ ⋯ ≤ C̄_n. Then (1/m)·Σ_{k≤j} size_k·p_k ≤ 2·C̄_j for every j.


Consider the following LP relaxation, denoted by LP.

(LP)    minimize    Σ_j w_j C_j

        subject to  C_j ≥ r_j + p_j                                              for all jobs j,
                    Σ_{j∈S} size_j·p_j·C_j ≥ (1/(2m))·( Σ_{j∈S} size_j·p_j )²    for all job subsets S.

The first constraint ensures that the completion time of a job will not be less than its release date plus its length. The second constraint is the family of inequalities (1). To solve the LP in polynomial time we use the ellipsoid algorithm; the separability of the constraints follows from the fact that (1) is merely a rescaled version of the inequalities for the single machine scheduling model which Queyranne [Queyranne93] proved are separable.

³ Its simple and short proof is quoted in [SS02].
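In fact, only prefix sets enter the analysis through Lemma 12, and after sorting by C̄_j those n prefix inequalities can be checked directly. A sketch with hypothetical names:

```python
def find_violated_prefix(C, areas, m):
    """Check the prefix instances of the inequalities (1), i.e.
        sum_{j in S} a_j * C_j >= (sum_{j in S} a_j)^2 / (2m)
    for S a prefix in nondecreasing order of C_j, where a_j = size_j * p_j.
    Returns a violated prefix (as a list of job indices), or None.
    """
    idx = sorted(range(len(C)), key=lambda j: C[j])
    lhs = total = 0.0
    for k, j in enumerate(idx):
        lhs += areas[j] * C[j]
        total += areas[j]
        if lhs < total * total / (2 * m) - 1e-9:   # small numerical tolerance
            return idx[:k + 1]
    return None
```

For example, on one machine two unit-area jobs completing at times 1 and 2 satisfy all prefix inequalities, whereas completion times of 0.1 for both jobs immediately violate the first prefix.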


Algorithm 7 (Start-Parallel-Jobs-by-C̄) Compute an optimal solution C̄ to LP. Sort the jobs by increasing order of C̄_j. Run List-GG-PR on this ordered input list.


We assume that C̄_1 ≤ ⋯ ≤ C̄_n (by re-indexing). As we use LP only to set the input permutation to List-GG-PR, we conclude from Lemma 2 that Start-Parallel-Jobs-by-C̄ is a 3-approximation for the makespan.

Lemma 13 Let C̄ be an optimal solution to LP, and let C_j denote the completion times in the schedule found by Start-Parallel-Jobs-by-C̄. Then C_j ≤ 5·C̄_j, for every j.

Proof: Since we use List-GG-PR, the completion time of any job depends solely on its predecessors in the permutation. Hence, from Corollary 1 we get:

    C_j ≤ max_{k≤j} r_k + 2·max{ (1/m)·Σ_{k≤j} size_k·p_k , max_{k≤j} p_k }.

From the first constraint in LP we have r_k + p_k ≤ C̄_k ≤ C̄_j for every k ≤ j, so max_{k≤j} r_k ≤ C̄_j and max_{k≤j} p_k ≤ C̄_j. By Lemma 12, (1/m)·Σ_{k≤j} size_k·p_k ≤ 2·C̄_j. Therefore, C_j ≤ C̄_j + 2·max{2·C̄_j, C̄_j} = 5·C̄_j.

Hence we conclude that Σ_j w_j C_j ≤ 5·Σ_j w_j C̄_j ≤ 5·Z*, where Z* is the value of the optimal schedule for minimizing total weighted completion time.

Corollary 2 For the general model P|r_j, pmtn, size_j|Σw_jC_j, algorithm Start-Parallel-Jobs-by-C̄ is a 5-approximation for total weighted completion time, and a 3-approximation for makespan.

Corollary 3 By setting the release dates in LP to zero and by using List-GG-P instead of List-GG-PR in algorithm Start-Parallel-Jobs-by-C̄, we get for the restricted model P|pmtn, size_j|Σw_jC_j a 4-approximation for total weighted completion time, and a 2-approximation for the makespan.

Acknowledgement Many thanks to Uwe Schwiegelshohn for his reading of the manuscript and constructive comments.

References

[ARSY99] J. Aslam, A. Rasala, C. Stein, and N. Young. Improved Bicriteria Existence Theorems for Scheduling Problems. In “Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms”, 1999.

[Baker74] K. R. Baker. “Introduction to Sequencing and Scheduling”. Wiley, New York, 1974.

[CPSSSW96] S. Chakrabarti, C. A. Phillips, A. S. Schulz, D. B. Shmoys, C. Stein, and J. Wein. Improved scheduling algorithms for minsum criteria. In “Proceedings of the 23rd International Colloquium on Automata, Languages and Programming” (F. Meyer auf der Heide and B. Monien, Eds.), ICALP ’96, LNCS 1099, Springer, pp. 646–657.

[CMNS97] C. Chekuri, R. Motwani, B. Natarajan, and C. Stein. Approximation Techniques for Average Completion Time Scheduling. In “Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms”, 1997.

[Droz94] M. Drozdowski. On complexity of multiprocessor tasks scheduling. Bulletin of the Polish Academy of Sciences, Technical Sciences 42/3, pp. 437–445, 1994.


[Droz96] M. Drozdowski. Scheduling multiprocessor tasks – An overview. European Journal of Operational Research 94 (1996) 215–230.

[GG75] M. Garey and R. Graham. Bounds for Multiprocessor Scheduling with Resource Constraints. SIAM Journal on Computing 4(2) (1975) 187–200.

[Graham66] R. L. Graham. Bounds for Certain Multi-Processing Anomalies. Bell System Technical Journal 45 (1966) 1563–1581.

[GLLR79] R. L. Graham, E. L. Lawler, J. K. Lenstra, and A. H. G. Rinnooy Kan. Optimization and approximation in deterministic sequencing and scheduling: a survey. Annals of Discrete Mathematics 5 (1979) 287–326.

[GWW97] M. X. Goemans, J. Wein, and D. P. Williamson. Personal Communication, August 1997. (Quoted in [SS02].)

[HSSW97] L. A. Hall, A. S. Schulz, D. B. Shmoys, and J. Wein. Scheduling to Minimize Average Completion Time: Off-line and On-line Approximation Algorithms. Mathematics of Operations Research, August 1997.

[LLLR84] J. Labetoulle, E. L. Lawler, J. K. Lenstra, and A. H. G. Rinnooy Kan. Preemptive scheduling of uniform machines subject to release dates. In “Progress in Combinatorial Optimization” (W. R. Pulleyblank, Ed.), pp. 245–261, 1984.

[PSW98] C. Phillips, C. Stein, and J. Wein. Minimizing Average Completion Time in the Presence of Release Dates. Mathematical Programming 82 (1998), Ser. B.

[Queyranne93] M. Queyranne. Structure of a simple scheduling polyhedron. Mathematical Programming 58 (1993) 263–285.

[SS02] A. S. Schulz and M. Skutella. The Power of α-Points in Preemptive Single Machine Scheduling. Journal of Scheduling 5(2) (2002) 121–133.

[Sch96] U. Schwiegelshohn. Preemptive Weighted Completion Time Scheduling of Parallel Jobs. In “Proceedings of the 4th Annual European Symposium on Algorithms”, pp. 39–51, LNCS 1136, Springer, Berlin, 1996.

[SLWTY98] U. Schwiegelshohn, W. Ludwig, J. L. Wolf, J. J. Turek, and P. S. Yu. Smart SMART Bounds for Weighted Response Time Scheduling. SIAM Journal on Computing 28(1) (1998) 237–253.

[SW97] C. Stein and J. Wein. On the Existence of Schedules that are Near-Optimal for both Makespan and Total Weighted Completion Time. Operations Research Letters, 1997.
