Improving resource sharing in computer networks ...


Université de Nice - Sophia Antipolis, UFR Sciences
École Doctorale STIC

THESIS
presented to obtain the title of Doctor of Science of the Université de Nice - Sophia Antipolis
Specialty: Computer Science

by Natalia Osipova
Host team: MAESTRO, INRIA Sophia Antipolis

Improving resource sharing in computer networks with stochastic scheduling

Defended at INRIA on 27 March 2009 before a jury composed of:

President: Philippe Nain, INRIA Sophia Antipolis
Advisor: Konstantin Avrachenkov, INRIA Sophia Antipolis
Reviewers: Samuli Aalto, Helsinki University of Technology; Ivo Adan, Eindhoven University of Technology; Vishal Misra, Columbia University
Examiners: Patrick Brown, France Telecom Orange Labs; Guillaume Urvoy-Keller, Institut EURECOM

Thèse L'amélioration du partage des ressources dans les réseaux de communication par l'ordonnancement stochastique


Abstract

In this thesis we propose several new contributions to improve the performance of computer networks. The results concern resource sharing problems in Internet routers, Web servers and operating systems. We study several stochastic scheduling algorithms which decrease the mean waiting time in the system through efficient resource sharing and make it possible to introduce Quality of Service and flow differentiation into networks. We show the effectiveness of the proposed algorithms and study the possibility of implementing them in router queues.

The most important results are the following. For the Two Level Processor Sharing (TLPS) scheduling discipline with a hyper-exponential job size distribution with two phases, we find an approximation of the optimal value of the threshold that minimizes the expected sojourn time. With simulation results (NS-2) we show that TLPS significantly improves system performance when this approximation of the optimal threshold is used. We study the Discriminatory Processor Sharing policy and show the monotonicity of the expected sojourn time in the system with respect to the weight vector under certain conditions on the system. We apply the Gittins optimality result to characterize the optimal scheduling discipline in a multi-class single server queue. The resulting policy minimizes the mean sojourn time in the system among all non-anticipating scheduling policies. In several cases of practical interest we describe the policy structure and provide simulation results (NS-2). For the congestion control problem in networks we propose a new flow-aware algorithm to improve the fair sharing of the bottleneck capacity.

Acknowledgments

I will always remember the three years I spent in the MAESTRO project at INRIA with love and warm feelings. For me it was a great scientific experience as well as an opportunity to meet so many interesting, clever and nice people. First of all I would like to thank my advisor, Konstantin Avrachenkov, who guided my research during these years, for his help and support, for his patience and encouragement. I am also grateful to him for the time he spent discussing with me, reading and correcting my work, and for the freedom in research I had. I would like to thank Philippe Nain, the head of the MAESTRO team, for welcoming me into the project for these three years, for all the time he spent on me and for the good advice he gave me. I am very thankful to Patrick Brown, Alberto Blanc and Urtzi Ayesta, with whom I worked, for their help, patience and advice. Thanks to Sindo Nunez for the discussions we had during my internship at CWI. I would like to thank all the members of my PhD committee. I especially wish to thank Samuli Aalto, Ivo Adan and Vishal Misra for their attentive reading of my thesis and for their corrections. Thanks to Guillaume Urvoy-Keller for agreeing to be an examiner on my committee. I want to thank all the MAESTRO project members who were here during these years for the friendly atmosphere they created. Thanks to Mouhamad Ibrahim, Abdulhalim Dandoush, Giovanni Neglia, Sara Alouf, Dinesh Kumar, Alonso Silva, Ahmad Al Hanbali. Thanks to Nicolas Bonneau for being here and for his advice concerning my French. Thanks to Ephie Deriche, who spent so much time on me, for all the explanations she gave. I would like to thank the members of my family for the moral support and love they gave me.


Contents

Abstract
Acknowledgements
Figures
Tables

1 Introduction
  1.1 The state of the art
    1.1.1 Computer networks
    1.1.2 Computer network architecture
    1.1.3 The Internet traffic structure
    1.1.4 Traffic control in computer networks
    1.1.5 TCP/IP protocols
  1.2 Computer networks problems and proposed solutions
    1.2.1 Traffic control schemes advantages and disadvantages
    1.2.2 Application of stochastic scheduling to resource allocation in computer networks
  1.3 Thesis contribution and organization

2 Batch Processor Sharing with Hyper-Exponential Service Time
  2.1 Summary
  2.2 Introduction
  2.3 The analysis of the Batch Arrival Processor Sharing model
  2.4 The analysis of the Two Level Processor Sharing model
  2.5 Conclusion
  2.6 Appendix

3 Optimal choice of threshold in Two Level Processor Sharing
  3.1 Summary
  3.2 Introduction
  3.3 Model description
    3.3.1 Main definitions
    3.3.2 The expected sojourn time in the TLPS system
  3.4 Hyper-exponential job size distribution with two phases
    3.4.1 Notation and motivation
    3.4.2 Explicit form for the expected sojourn time
    3.4.3 Optimal threshold approximation
    3.4.4 Numerical results
    3.4.5 Simulation results
  3.5 Hyper-exponential job size distribution with more than two phases
    3.5.1 Notation and motivation
    3.5.2 Linear system based solution
    3.5.3 Operator series form for the expected sojourn time
    3.5.4 Upper bound for the expected sojourn time
    3.5.5 Numerical results
  3.6 Conclusion
  3.7 Appendix: Proof of Lemma 3.2

4 Comparison of the Discriminatory Processor Sharing Policies
  4.1 Summary
  4.2 Introduction
  4.3 Previous results and problem formulation
  4.4 Expected sojourn time monotonicity
  4.5 Numerical results
  4.6 Conclusion
  4.7 Appendix

5 Optimal policy for multi-class scheduling in a single server queue
  5.1 Summary
  5.2 Introduction
  5.3 Gittins policy in multi-class M/G/1 queue
  5.4 Two Pareto classes
    5.4.1 Model description
    5.4.2 Optimal policy
    5.4.3 Mean conditional sojourn times
    5.4.4 Properties of the optimal policy
    5.4.5 Two Pareto classes with intersecting hazard rate functions
    5.4.6 Numerical results
    5.4.7 Simulation results
    5.4.8 Multiple Pareto classes
  5.5 Hyper-exponential and exponential classes
    5.5.1 Optimal policy
    5.5.2 Expected sojourn times
    5.5.3 Numerical results
    5.5.4 Pareto and exponential classes
  5.6 Conclusions
  5.7 Appendix: Proof of Theorem 2
    5.7.1 Generating function calculation

6 Improving TCP Fairness with the MarkMax Policy
  6.1 Summary
  6.2 Introduction
  6.3 The MarkMax algorithm
  6.4 Fluid model
    6.4.1 Guideline bounds
  6.5 Simulation results
    6.5.1 Fluid model
    6.5.2 Scenario 1
    6.5.3 Scenario 2
    6.5.4 Scenario 3
  6.6 Conclusion and future work

7 Conclusions and perspectives

8 Presentation of the Thesis Work
  8.1 Introduction
    8.1.1 State of the art
    8.1.2 Problems in computer networks and proposed solutions
    8.1.3 Thesis contribution and organization
  8.2 Processor sharing queue with batch arrivals and a hyper-exponential service time distribution
    8.2.1 Analysis of a processor sharing queue with batch arrivals
    8.2.2 Analysis of a two level processor sharing policy
  8.3 Choice of the optimal threshold for a queue with a two level processor sharing policy
    8.3.1 Model description
    8.3.2 Hyper-exponential service time distribution with two phases
    8.3.3 Hyper-exponential service time distribution with several phases
  8.4 Comparison of the discriminatory processor sharing policies
  8.5 Optimal scheduling policy in a multi-class single server queue
    8.5.1 Gittins policy in a multi-class M/G/1 queue
    8.5.2 Two Pareto classes
    8.5.3 Other cases
    8.5.4 Conclusions
  8.6 Improving TCP fairness with the MarkMax policy
    8.6.1 The MarkMax algorithm
    8.6.2 Fluid model
    8.6.3 Simulation results
  8.7 Conclusions and perspectives

Bibliography

Résumé

Figures

3.1 T(θ) - solid line, T^PS - dash dot line, T(θ̃_opt) - dash line.
3.2 g(ρ) - solid line, g1(ρ) - dash line, g2(ρ) - dash dot line.
3.3 Mean response time in the system (s): TLPS - solid line with stars, DropTail - dash line, LAS - dash dot line.
3.4 The expected sojourn time T(θ) and its upper bound Υ(θ) for N = 10, 100, 500, 1000.
3.5 The relative error ∆(θ) for N = 10, 100, 500, 1000.
4.1 T^DPS(g(x)), T^PS, T^opt functions, condition satisfied.
4.2 T^DPS(g(x)), T^PS, T^opt functions, condition not satisfied.
5.1 Two Pareto classes, hazard rates
5.2 Two Pareto classes, policy scheme
5.3 Two Pareto extension classes, hazard rates
5.4 Two Pareto extension classes, g(x) function behavior
5.5 Two Pareto classes, mean sojourn times with respect to the load ρ, V1
5.6 Two Pareto classes, mean sojourn times with respect to the load ρ, V2
5.7 NS-2 simulation scheme.
5.8 N Pareto classes, hazard rates
5.9 N Pareto classes, policy scheme
5.10 Exponential and HE classes, hazard rates.
5.11 Exponential and HE classes, policy description.
5.12 Exponential and HE classes, mean sojourn times with respect to the load ρ
6.1 Some of the possible trajectories in the state space
6.2 Scenarios 1 and 2
6.3 Scenario 3

Tables

3.1 Simulation parameters
3.2 Increasing the number of phases
5.1 Two Pareto classes, parameters
5.2 Two Pareto classes, simulation parameters
5.3 Mean sojourn times
5.4 Exponential and HE classes, simulation parameters
6.1 Fluid Model: Jain's index, utilization.
6.2 Scenario 1: Jain's index, utilization.
6.3 Scenario 1: average queue size and delay.
6.4 Scenario 2: Jain's index, utilization and average queue size.
6.5 Scenario 3: Jain's index, utilization and average queue size and delay

Chapter 1 Introduction

In the early 1970s, networks that interconnected computers and terminals began to appear. These networks were developed to share computer resources and to exchange data between computers. Since then, minimizing transmission costs and times and maximizing the amount of transmitted data have been among the most important tasks in computer networks. As the capacities of computers grow with technical progress, so does the need for quick, efficient and safe data transmission.

The Internet is the largest computer network and connects more than one billion users all over the world. The size of the Internet grows very fast, so the network resources have to be shared among a very large number of users. An incorrect resource allocation may cause server inaccessibility, long delays and other problems in the network, which lead to the users' dissatisfaction with the provided service. Even though a lot of work has been done to achieve optimal resource sharing, high performance and low delays, Internet behavior is far from ideal and many problems remain to be solved.

In the present thesis we deal with the problem of resource sharing in computer networks. We consider several scheduling algorithms and their application to flow scheduling in Internet routers. As the response time in the network is one of the most important characteristics for common users, we concentrate on the problem of mean sojourn time minimization. Taking into account the Internet traffic structure, we study several size-based differentiation algorithms which give priority to short flows and can significantly decrease the mean response time in the network. We introduce a new flow-aware packet dropping scheme for Internet routers which improves performance in the network and fairness between the flows.


1.1 The state of the art

1.1.1 Computer networks

A computer network is a set of computers or terminals which are interconnected by a communication network. Although computer networks are widely covered in the literature, see [Tan96, Sta03], in this introduction we describe some computer network basics to explain the motivation for the analytical results that follow. Before discussing computer networks in detail, let us first answer the question: why are people interested in computer networks, and what can they be used for?

Broadly, we can classify computer network users into two groups, companies and common users. Companies use computer networks mainly to achieve resource sharing (availability of all programs, equipment and data to the workers of the company), high reliability (the possibility to continue working in case of hardware failure), cost savings (the high cost of big computers in comparison with several small ones), scalability (the possibility to add new workstations to the network and to increase system performance by adding new processors without a global change of the system structure), and communication between the workers of the company (reports, work discussions). For common computer and Internet users the most important aims are access to remote information (checking bank accounts, shopping, newspapers, magazines, journals, on-line digital libraries), personal communication (mail, virtual meetings, videoconferences), and entertainment (video, movies, television, radio, music, games). So computer networks are a big part of people's everyday life and can help us in many different areas.

Now that we have pointed out why we need computer networks, let us return to our subject, how computer networks work. The main goal of computer networks is to make it possible to exchange data between computers. In its simplest form, data communication takes place between two devices that are connected directly. Often, however, it is not practical for two devices to be connected point-to-point. This is the case when the devices are situated very far from each other. Examples are the telephone network of the world or all the computers of a single organization. The solution is then to connect each device to a communication network. In the rest of this work we refer to the communicating devices either as stations or as nodes. The stations may be computers, telephones or other communication devices.

Communication networks may be categorized by the architecture and techniques used to transfer data. Broadly, there are broadcast networks and switched (point-to-point) networks. In broadcast networks a transmission from one station is broadcast to and received by all other stations. In switched networks data is transferred from source to destination through intermediate nodes. The purpose of each node is to move the data from node to node until it reaches its destination. Switched networks are divided into circuit-switched networks and packet-switched networks. In circuit-switched networks, the path between a sender and a destination is set in advance and then the data is transmitted over this channel. In packet-switched networks, data is sent in a sequence of small chunks, called packets. Each packet passes through the network from one node to another along some path leading from source to destination. At each intermediate node, called a router, the packet is received, stored briefly, and then transmitted to the next node. The router decides where to transmit the packet. Packet-switched networks are commonly used for computer-to-computer communications.

Computer networks are also classified according to their size as Local Area Networks (LAN), which cover a campus up to a few kilometers in size, Metropolitan Area Networks (MAN), which cover a group of offices or a city, and Wide Area Networks (WAN), which cover a large geographical area. LANs and MANs usually do not use packet switching, because of their limited size. Examples of LANs are Ethernet and IBM Token Ring, and the best known MAN is the Distributed Queue Dual Bus (DQDB). The most famous and largest WAN is the Internet, which connects more than one billion users and allows them to exchange data. WANs usually use packet switching technology; in particular, the Internet is based on it. In the present work, we study problems related to packet-switched networks and in particular to the Internet.

1.1.2 Computer network architecture

Nowadays communication between computers in the Internet is mostly based on the Open System Interconnection (OSI) model, which consists of seven layers, see [Sta94]. The layer structure is used to decompose the complex problem of communication between computers into several smaller problems; the layers are autonomous and do not depend on each other. Every layer is responsible for certain functions: it uses the functions of the layer below and provides functions to the layer above. The layers are based on the concept of a protocol, a set of rules which serves to organize data transfer. The OSI layers are: Physical, Data Link, Network, Transport, Session, Presentation and Application. We do not give a full description of all layers and their functions here, but restrict ourselves to the Transport and Network layers of the OSI model, as they are responsible for data transfer and provide error recovery and flow control.

The Network layer accepts packets from the Transport layer and delivers them from source to destination in the network. The Network layer is based on the Internet Protocol (IP). IP does not check whether the packets were delivered, so error recovery has to be done by the Transport layer or other higher level protocols. The Network layer thus provides an unreliable delivery service.

The Transport layer accepts data from the Session layer, splits it into packets and passes the packets to the Network layer, see [Sta94]. The two main Internet protocols of the Transport layer are the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP). Both of them use the IP protocol of the Network layer, which is why we usually talk about TCP/IP protocols. UDP is rarely used in the Internet because of its unreliability. UDP does not provide reliable delivery of packets and is used by applications which provide their own flow control and check packet arrivals themselves. UDP can also be used when some loss of transferred data can be tolerated, as in Internet telephony.

The most used protocol in computer networks and in the Internet is TCP. TCP is a reliable connection-oriented protocol which delivers information from one machine in the network to another without errors. It is designed to provide maximum throughput and reliable transfer over an unreliable and unknown network. Different parts of a WAN can have different topologies, bandwidths, delays, packet sizes and other parameters, and all these parameters can change. TCP dynamically adapts to the properties of the network and is robust in the face of many kinds of failures. TCP provides flow control to make sure that a fast sender does not overflow a slow receiver or intermediate nodes with more information than they can handle. Using the congestion control mechanism, TCP reduces its sending rate when a loss occurs in the network, and so adapts its sending rate to the parameters of the receiver and the network. We give a fuller description of how TCP works in Subsection 1.1.5.

The applications that are nowadays [Sta03] most used in the Internet and computer networks and which use the TCP/IP protocols are: Telnet (virtual terminal); the File Transfer Protocol (FTP), used for file transfers between systems with different properties and structures; the Simple Mail Transfer Protocol (SMTP), used to transfer electronic mail messages; Multipurpose Internet Mail Extensions (MIME), which makes it possible to include pictures and other multimedia in a message; the Domain Name System (DNS), used to find the relation between host names and their network addresses; the Hypertext Transfer Protocol (HTTP), used to transfer web pages in the Internet; and the Session Initiation Protocol (SIP), the application level protocol for session control in networks.

1.1.3 The Internet traffic structure

Let us point out the most important characteristics of the traffic structure in computer networks and in the Internet which we need in the following analysis. In the flow-level modelling framework, the flow is the basic unit of data traffic. A flow is defined as an interruptive stream of packets sent from a source to a destination. We can model as a flow one TCP connection which opens, sends one or more files and then closes, or we can define every file sent by the application as a separate flow. In the current work we consider that a flow corresponds to the sending of one file. We use the terms flow, connection, file and session interchangeably; in stochastic scheduling we use the term job.

A flow is basically characterized by its duration, size and sending rate. The Internet traffic structure has been widely studied in the literature. In [CB97, TMW97, NMM98, SAM99], the authors analyze real data traffic on selected web servers over a sufficiently long period of time and describe the traffic characteristics. In [FML+03], the authors propose a monitoring system designed for traffic measurement analysis and also report the results obtained with it. In [Wil01], the author describes traffic measurements and characteristics. In [BGG03], the authors collect and analyze traffic between several important servers in France. In [BFOBR02], the authors study admission control in the Internet as applied to elastic and streaming flows. In more recent work [CC08], the authors propose a new characteristic, the Quality of Experience, to measure how users perceive network performance, and provide real traffic measurement results.

Internet traffic is divided into elastic and streaming flows. Elastic flows are file transfers, HTTP pages, etc. Streaming flows are created by video and audio applications. Elastic flows are still dominant in the Internet even though audio and video applications are used more and more, see [BFOBR02]. In the current work we study elastic flows, considering that streaming flows take some limited share of the bandwidth. The traffic transferred with the TCP/IP protocols represents 90% of all Internet traffic, see [CC08, FML+03, BFOBR02].

The interarrival times between files in the Internet are exponentially distributed, and file arrivals to the network can be modelled with a Poisson process. An important characteristic of the Poisson process is the Poisson Arrivals See Time Averages (PASTA) property [Wol89], which plays an important role in the mathematical analysis of network models.
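To illustrate the Poisson model of file arrivals, the following minimal Python sketch generates arrival times by summing independent exponentially distributed interarrival times and counts the arrivals per unit of time. The function names, the arrival rate and the time horizon are illustrative assumptions and do not come from the measurements cited above.

```python
import math
import random


def poisson_arrival_times(rate, horizon, seed=0):
    """Arrival times on [0, horizon) of a Poisson process with the given rate.

    The times are built by summing exponential interarrival times,
    which is the defining property used in the text.
    """
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def counts_per_unit_interval(times, horizon):
    """Number of arrivals falling in each unit-length interval [k, k + 1)."""
    counts = [0] * math.ceil(horizon)
    for t in times:
        counts[int(t)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical parameters: 5 file arrivals per time unit on average.
    arrivals = poisson_arrival_times(rate=5.0, horizon=2000, seed=1)
    counts = counts_per_unit_interval(arrivals, 2000)
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / len(counts)
    print("empirical rate:", mean, "variance of counts:", var)
```

For a Poisson process the number of arrivals per unit interval has mean and variance both equal to the rate, so the two printed values should be close to each other.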
Most flows (90-95%) transferred in the Internet are very small, but most of the traffic is created by the long flows, which are not numerous (the remaining 5-10%), see [Wil01, All00, FML+03] and others. According to [CC08], 80% of the traffic is created by flows larger than 1 MB and 50% by flows larger than 10 MB. This is because the most frequent flows are created by e-mail and web page transfers, which have small sizes, while the long flows are generated by file transfers, peer-to-peer applications, etc., and are much rarer. The short flows are called mice and the long flows elephants, and this phenomenon is called the mice-elephant effect.

It was found that file size distributions in the Internet are well modelled by long-tailed and heavy-tailed distributions and also have a Decreasing Hazard Rate (DHR). In [NMM98], using real data analysis, the authors confirm that the file size distribution in the Internet can be modelled with heavy-tailed Pareto distributions. In [CB97], the authors analyze network traffic from a web server and find that the file size distribution is heavy-tailed. In [Rob01], the author shows that the durations of streaming flows are also heavy-tailed.

In contrast to flow arrivals, packet arrivals are generally not Poisson. Because of the DropTail router policy, which creates global synchronization in the network, and also because of the TCP algorithm (we discuss this later in Subsection 1.1.5), packets tend to arrive in groups, which are called batches. Such arrivals are also called bursty arrivals. Packet sizes in the Internet vary from the Maximum Transmission Unit (MTU) down to the size of an acknowledgement (ACK), 40 bytes. According to [Wil01, FML+03], large packets represent 50% of all packets in the network, ACKs represent 40%, and the remaining packets have sizes randomly distributed between these values. The traffic on a link is usually bidirectional, but not symmetric.
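The mice-elephant effect described in this subsection can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that flow sizes follow a Pareto distribution with shape parameter alpha > 1; the function name and the value alpha = 1.2 are assumptions and are not taken from the cited measurements. Under this assumption the largest fraction q of the flows carries a fraction q^((alpha-1)/alpha) of the total traffic.

```python
def traffic_share_of_largest_flows(q: float, alpha: float) -> float:
    """Fraction of the total traffic carried by the largest fraction q of flows.

    Assumes Pareto-distributed flow sizes with shape alpha > 1 (finite mean).
    If P(X > x) = (x_min / x) ** alpha = q, then the share of the total
    volume carried by flows larger than x is (x_min / x) ** (alpha - 1),
    which equals q ** ((alpha - 1) / alpha).
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    if alpha <= 1.0:
        raise ValueError("alpha must exceed 1 for the mean to be finite")
    return q ** ((alpha - 1.0) / alpha)


if __name__ == "__main__":
    alpha = 1.2  # hypothetical shape parameter, chosen only for illustration
    for q in (0.05, 0.10):
        share = traffic_share_of_largest_flows(q, alpha)
        print(f"largest {q:.0%} of flows carry {share:.0%} of the traffic")
```

With this hypothetical shape parameter, the largest 5% of the flows carry roughly 60% of the traffic, which is the qualitative picture reported above.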

1.1.4 Traffic control in computer networks

One of the main problems associated with computer networks is traffic control, see [Sta94], which consists in regulating the amount of traffic that enters the network so that network performance stays high. Traffic control comprises flow control and congestion control.

Flow control is needed to prevent the sender from transmitting at a rate higher than the rate at which the destination can receive. Flow control regulates the data transmission rate between two nodes.

Congestion is a situation in which the data arrival rate is higher than the network transmission capacity. In this case the router cannot serve all incoming packets, which are then stored in the router buffer and wait in the queue to be served. If the arrival rate does not decrease, the queue grows until there is no room for more packets, and newly arriving packets are dropped and later retransmitted. Congestion is responsible for the largest part of the delays in networks. Congestion control techniques try to prevent congestion before it happens or, at least, to react to it properly, i.e., to decrease the data arrival rates. An efficient congestion control algorithm has to avoid buffer overflow and at the same time try to keep the queue non-empty, to achieve higher throughput.

To provide efficient traffic control the sender needs to know the situation at the router and in the network, which is not always easy and is usually impossible in practice. On the other hand, the router does not have direct access to the data senders to control their sending rates either.

In the Internet, traffic control is realized by the combination of the DropTail policy at the router and the TCP/IP protocols. The DropTail policy is the simplest and most commonly used algorithm for buffer management in TCP/IP networks. With the DropTail algorithm the router drops arriving packets at the tail of the queue if the buffer is full. The enqueued packets are served according to the First Come First Served (FCFS) policy. The buffer size of the router is limited. Even though it is technically possible to make the buffer very large, this is not done, because a large queue creates a large queueing delay. More on this topic can be found in [AANB02, AKM04, BGG+08], etc. The current TCP implementation provides flow and congestion control. We give a more detailed description of the TCP algorithms in the following Subsection 1.1.5.
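The DropTail behaviour described above (finite buffer, FCFS service, arrivals discarded when the buffer is full) can be sketched in a few lines. This is a minimal illustration, not router code; the class and attribute names are hypothetical.

```python
from collections import deque


class DropTailQueue:
    """Finite FCFS buffer that discards arriving packets when it is full."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.buffer = deque()
        self.dropped = 0  # number of arrivals discarded so far

    def enqueue(self, packet) -> bool:
        """Accept the packet if there is room, otherwise drop it."""
        if len(self.buffer) >= self.capacity:
            self.dropped += 1
            return False
        self.buffer.append(packet)
        return True

    def dequeue(self):
        """Serve the packet at the head of the queue, or return None if empty."""
        return self.buffer.popleft() if self.buffer else None
```

Note that the queue keeps no per-flow state and has a single parameter, the buffer size, which is the simplicity argument made in the text.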

1.1.5 TCP/IP protocols

TCP/IP protocols are now widely used in the Internet and play an important role in determining network performance. The formal description of TCP is given in [Pos81]. The idea of a dynamic congestion window size is proposed in [Jac88]. Later changes and extensions are given in [Bra89, JBB92, Ste97, APS99]. Descriptions of TCP can also be found in books such as [Sta03, Tan96, Wil98] or in reviews, see [Kri00].

TCP is based on the end-to-end argument, the idea that the sending rate of a data flow is controlled by the receiver. To transmit data between two nodes, TCP has to be installed on both of them, the sender and the receiver. When TCP sends a data file, it breaks it into packets (or segments) of a given size and sends each of them separately in the data stream. When packets arrive at the destination, they are passed to the TCP entity, which reconstructs the original file. As the IP protocol does not guarantee packet delivery, it is up to TCP to find which packets were lost and retransmit them. For that purpose, every time the destination TCP receives a packet, it sends back to the sender a small packet, called an acknowledgement (ACK), which contains information about the received packet. The receiver acknowledges the last packet of the continuous stream of packets received so far. If a packet arrives out of order, TCP sends the ACK for the last packet that was received in order. In this case the sender receives the same ACK several times, which is called a duplicate ACK; it then knows that a packet of the stream was lost and can retransmit it. The time between sending a packet and receiving an ACK for it is the round-trip time (RTT), an important notion related to TCP.

The congestion control scheme in TCP is realized with the congestion window (cwnd), which controls the amount of data that can be sent without being acknowledged and thus in fact controls the transmission rate. The algorithms used in TCP congestion control are: Slow Start, Congestion Avoidance, Fast Retransmit and Fast Recovery. The Slow Start algorithm is used at the beginning of the file transfer to determine the capacity of the network. During Slow Start TCP increments its congestion window by one packet for each received ACK. Slow Start ends when the congestion window reaches some given threshold. After Slow Start, the Congestion Avoidance algorithm is used. During Congestion Avoidance, the congestion window is incremented by one packet per RTT, that is, by one packet once the data corresponding to the current congestion window size has been acknowledged. Congestion Avoidance continues until congestion is detected. The Slow Start algorithm makes it possible for the TCP connection to increase its sending rate quickly at the beginning of the file transfer, while during Congestion Avoidance the rate increases slowly to avoid overloading the network.

To detect congestion and packet loss TCP uses a timer. To retransmit a lost packet before the timer expires, the Fast Retransmit algorithm is used. With Fast Retransmit a packet is considered lost and is retransmitted when TCP receives three duplicate ACKs. With the Fast Recovery mechanism the congestion window is halved when a packet loss is detected. This helps TCP to restore the congestion window more quickly than if it were reduced to one packet and so helps to achieve higher throughput.

The Tahoe implementation of TCP includes Slow Start, Congestion Avoidance and Fast Retransmit. Reno includes the Tahoe mechanisms plus Fast Recovery. NewReno is a slight modification of Reno that improves performance during Fast Recovery and Fast Retransmit. In our studies and simulations we consider the NewReno version of TCP.
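The window dynamics described in this subsection can be sketched as a per-RTT update rule. The model below is a deliberate simplification and an assumption of this illustration: it works in whole packets, treats every loss as detected by duplicate ACKs, and ignores timeouts and the details of NewReno. The function names are hypothetical.

```python
from typing import List, Set, Tuple


def next_cwnd(cwnd: int, ssthresh: int, loss: bool) -> Tuple[int, int]:
    """Return (cwnd, ssthresh) after one round-trip time, in packets.

    Simplified Reno-like model:
      * loss detected by duplicate ACKs: the window is halved (Fast Recovery);
      * Slow Start (cwnd < ssthresh): one extra packet per ACK, so the
        window doubles every RTT, capped at ssthresh;
      * Congestion Avoidance: one extra packet per RTT.
    """
    if loss:
        ssthresh = max(cwnd // 2, 2)
        return ssthresh, ssthresh
    if cwnd < ssthresh:
        return min(2 * cwnd, ssthresh), ssthresh
    return cwnd + 1, ssthresh


def trace(rounds: int, ssthresh: int, loss_rounds: Set[int]) -> List[int]:
    """Congestion window at the start of each round, starting from 1 packet."""
    cwnd = 1
    history = []
    for r in range(rounds):
        history.append(cwnd)
        cwnd, ssthresh = next_cwnd(cwnd, ssthresh, r in loss_rounds)
    return history


if __name__ == "__main__":
    # Exponential growth up to the threshold, linear growth afterwards,
    # and a halving when a loss occurs in round 5.
    print(trace(rounds=8, ssthresh=8, loss_rounds={5}))
```

The trace shows the three regimes discussed above: fast growth during Slow Start, slow linear growth during Congestion Avoidance, and a halving of the window rather than a restart from one packet after a loss.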

1.2 Computer networks problems and proposed solutions

1.2.1 Traffic control schemes advantages and disadvantages

TCP is the most used data transmission protocol in the Internet, as it is flexible and provides reliable data transfer and traffic control. At the flow level TCP tries to provide a fair share of the bottleneck capacity to all flows currently present in the queue. As routers generally do not use discriminating or priority policies, the share of the bottleneck capacity depends only on the sending rate of each flow. Thus, if the sending rates of all flows are kept equal, the bandwidth shares are equal as well.

The advantage of the DropTail policy is its simplicity. There is no need to set various parameters or to keep additional information about the flows and the state of the queue. However, the currently used combination of the DropTail policy and the TCP/IP protocols has many disadvantages: packet retransmissions, global synchronization, unfair bandwidth sharing and the absence of Quality of Service.

With the DropTail policy packets are dropped when the buffer is full, and TCP reduces its sending rate only after a packet loss is detected, which creates multiple packet retransmissions in the network. The DropTail policy does not differentiate between flows, so there is no Quality of Service. When several TCP connections share the same bottleneck link, the bottleneck bandwidth is shared unfairly and flows with small RTTs have an advantage over flows with large RTTs. This happens because at moments of congestion all connections sharing the bottleneck link decrease their sending rates, but a connection with a large RTT takes longer to restore its sending rate than a connection with a small RTT. As a result, the total amount of data transferred by the slow connection is much smaller than that transferred by the fast connection. Also, the fact that all connections currently using the bottleneck link reduce their rates at nearly the same time creates global synchronization in the network, which in turn leads to network underutilization.

There have been many proposals to increase the performance of the Network and Transport layers of the Internet. Among them are Network Pricing, Explicit Congestion Notification (ECN), Active Queue Management (AQM) algorithms and scheduling algorithms.

Network Pricing is a category of congestion control in which the cost of transmission is used. Charging for TCP transmissions may avoid congestion, as the senders will be forced to minimize the amount of traffic they generate. More on this topic can be found in [Bre96, SCEH96, FR01, FR04, FORR98].

ECN is a flag used to warn the TCP sender about congestion in the network, see [Flo95, RF99, RFB01]. When congestion occurs, the router marks packets with the ECN flag instead of dropping them. A packet marked with the ECN flag reaches the destination and the receiver sends back an ACK with the ECN flag. When the sender receives the ACK with the ECN flag, it halves its congestion window as if a packet loss had been detected. So, if the router marks packets with ECN flags instead of dropping them, the TCP congestion window is still reduced, but there is no need to retransmit the packets.

To avoid unfair resource sharing in the Internet several AQM schemes were proposed. AQM is a family of packet dropping algorithms for FCFS queues which manage the length of packet queues by dropping packets when necessary. AQM algorithms inform the sender about possible congestion before a buffer overflow happens. Among AQM algorithms are RED [FJ93], GREEN [WZ02], BLUE [FSKS02], MLC(l) [SS07], CHOKe [PPP00], etc. None of them has been widely deployed in networks because of their complexity and nontrivial parameter selection.
From the user's point of view one of the important characteristics of computer networks is the response time, which is determined by the various delays in the network. The delay consists of the transfer delay, propagation delay, processing delay and queueing delay. The queueing delay and the delay caused by packet drops and retransmissions make up the largest part of the response time. The queueing delays can be reduced with efficient scheduling algorithms. While an AQM scheme selects the packet to be dropped to avoid congestion in the network, a scheduling algorithm selects the packet to be served next, and is used to reduce the queueing delay and to manage the bandwidth share between flows.

To develop an efficient scheduling algorithm one has to take into account the specific problems of its application domain. In the case of computer networks, these are: the large number of connections sharing the bottleneck link, the traffic characteristics, the changes in sending rates, the possible changes in network topology and properties, and so on. Even though many different scheduling algorithms exist, it is not easy to find one that is efficient, scalable, easy to implement and does not need knowledge of specific system parameters. In the following subsection we give a short review of the scheduling algorithms that were proposed for computer networks and the Internet.

1.2.2 Application of stochastic scheduling to resource allocation in computer networks

From stochastic scheduling theory it is known that applying different scheduling policies to a queue can strongly influence the system characteristics. The goal of stochastic scheduling is to find an algorithm which improves system performance and at the same time is simple to implement.

It is quite difficult to model the network at the packet level, as packet arrivals are bursty and, unlike flow arrivals, do not follow a Poisson process, see Subsection 1.1.3. Thus networks are often modelled at the flow level. Every file sent by a TCP connection is represented as a job and every router as a queue. By the size of a job we mean the time the job takes to be served when there are no other jobs in the system. Later in this work we use the terms job size and service time interchangeably.

As we discussed in Subsection 1.2.1, the bandwidth share of TCP flows on the bottleneck link, when their RTTs are of the same order, is well modelled by the Processor Sharing (PS) discipline, see [HLN97, NMM98, MR00, FBP+01, CJ07]. Under the PS policy every job present in the system receives an equal share of the processor capacity. The PS discipline is easy to analyze: Kleinrock [Kle76a, Sec. 4.4] obtained the expressions for the mean conditional sojourn time and the mean sojourn time in the M/G/1 system scheduled with the PS discipline. However, the PS discipline does not minimize the mean sojourn time in the system.

It is known that the Shortest Remaining Processing Time (SRPT) policy, see [Kle76a, Ch. 3], minimizes the mean sojourn time in the system, see also [Sch68]. The SRPT discipline requires knowledge of the job sizes, which is not always available, as the router does not have information about the size of the file which was sent. Kleinrock [Kle76b, Kle76a] gives an overview of policies which do not use information about the job sizes and are called non-anticipating.
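To make the remark on PS concrete, the sketch below evaluates two standard M/G/1 results: under PS the mean sojourn time is E[X]/(1 - rho), and under FCFS the Pollaczek-Khinchine formula gives E[X] + lambda E[X^2] / (2 (1 - rho)). These are textbook formulas rather than results of this thesis, and the hyper-exponential parameters used in the example are hypothetical.

```python
from typing import Sequence, Tuple


def hyperexp_moments(p: Sequence[float], mu: Sequence[float]) -> Tuple[float, float]:
    """First and second moments of a hyper-exponential job size distribution."""
    m1 = sum(pi / mi for pi, mi in zip(p, mu))
    m2 = sum(2.0 * pi / mi ** 2 for pi, mi in zip(p, mu))
    return m1, m2


def mean_sojourn_ps(lam: float, p: Sequence[float], mu: Sequence[float]) -> float:
    """M/G/1-PS: E[T] = E[X] / (1 - rho); only the mean job size matters."""
    m1, _ = hyperexp_moments(p, mu)
    rho = lam * m1
    if rho >= 1.0:
        raise ValueError("unstable system: rho >= 1")
    return m1 / (1.0 - rho)


def mean_sojourn_fcfs(lam: float, p: Sequence[float], mu: Sequence[float]) -> float:
    """M/G/1-FCFS: E[T] = E[X] + lam * E[X^2] / (2 * (1 - rho))."""
    m1, m2 = hyperexp_moments(p, mu)
    rho = lam * m1
    if rho >= 1.0:
        raise ValueError("unstable system: rho >= 1")
    return m1 + lam * m2 / (2.0 * (1.0 - rho))


if __name__ == "__main__":
    # Hypothetical two-phase mix: many short jobs and a few long ones.
    p, mu, lam = [0.9, 0.1], [10.0, 0.5], 2.0
    print("PS  :", mean_sojourn_ps(lam, p, mu))
    print("FCFS:", mean_sojourn_fcfs(lam, p, mu))
```

For exponential job sizes the two values coincide, while for a high-variance mix such as the one above FCFS is markedly worse, which is one reason PS is a natural baseline for the policies reviewed in this subsection.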
In recent years these policies have received significant attention because of their possible application to resource sharing in computer networks. It is shown in [Yas87] that the Least Attained Service (LAS) or Foreground-Background (FB) policy, see [Kle76a, Sec. 4.6], minimizes the mean sojourn time in the system among all non-anticipating scheduling policies when the job size distribution function has a decreasing hazard rate (DHR). As this is the case for the job size distribution in the Internet, LAS has received a lot of attention; it was studied in [RS89, FM03b, RUKB03, RUKVB04, RBUK05]. A survey on LAS is presented in [NW08]. However, LAS has some disadvantages: for example, it can be very unfair to long flows in some cases and it greatly increases their service time, see [FM03b]. Also, the mean sojourn time in the system under LAS depends highly on the job size distribution [RUKB03]. If a long flow in the system has almost finished service when another long flow arrives, then the first flow has to wait for the whole service time of the second flow before leaving the system. The problem of LAS unfairness to large jobs was studied in [RUKB03, WHB03]. Regarding this problem, it was shown in [Bro06] that when the second moment of the job size distribution is infinite, LAS can have a smaller expected conditional sojourn time than PS. In the particular case of the Pareto job size distribution with parameter α < 1.5, LAS always has a smaller expected conditional sojourn time than PS.

Both the SRPT and LAS policies give priority to short flows and so minimize the mean sojourn time in the system. The file size distribution in the Internet is heavy-tailed and most flows are short, see Subsection 1.1.3. It therefore seems logical to give priority to short flows in the network. The differentiation between short and long flows in the Internet was widely studied, see [GM01, NT02, GM02a, GM02b, RUKB02, RUKB03, FM03b, WBHB04, AANO04, AABN04]. Among flow differentiating policies is the Multi Level Processor Sharing (MLPS) discipline, which was introduced and described by Kleinrock, see [Kle76a, Sec. 4.7]. He shows that the mean sojourn time in the MLPS system can be significantly reduced in comparison with the PS system. When the MLPS discipline is applied, jobs are served according to their attained service, which is compared against a given set of thresholds. In [AANO04, AANO05], the authors show that when the job size distribution has a DHR, MLPS decreases the mean sojourn time in the system with respect to the PS discipline.
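The DHR property invoked above can be checked directly for the hyper-exponential distribution used later in the thesis. The sketch below evaluates the hazard rate h(x) = f(x) / (1 - F(x)) for F(x) = 1 - sum_i p_i exp(-mu_i x); the parameter values are hypothetical and the function name is an assumption of this illustration.

```python
import math
from typing import Sequence


def hyperexp_hazard_rate(x: float, p: Sequence[float], mu: Sequence[float]) -> float:
    """Hazard rate of a hyper-exponential distribution at attained service x."""
    density = sum(pi * mi * math.exp(-mi * x) for pi, mi in zip(p, mu))
    tail = sum(pi * math.exp(-mi * x) for pi, mi in zip(p, mu))
    return density / tail


if __name__ == "__main__":
    # Hypothetical mice-elephant mix: 90% short jobs, 10% long jobs.
    p, mu = [0.9, 0.1], [10.0, 0.5]
    for x in (0.0, 0.5, 1.0, 2.0):
        print(x, hyperexp_hazard_rate(x, p, mu))
```

The printed values decrease with x: the more service a job has already received, the less likely it is to complete soon, which is why favouring jobs with little attained service (as LAS and MLPS do) pays off.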
In [AA06], the authors show that with MLPS the mean delay in the system can be very close to optimal when the job size distribution has a DHR. A particular case of MLPS, Two Level Processor Sharing (TLPS), and its application to resource sharing in computer networks was studied in [AANO04, AABN04]. In [AABN04], based on the TLPS model, the authors develop the RuN2C algorithm and show that it significantly reduces the mean sojourn time in the system in comparison with the standard DropTail policy. The mean sojourn time in the TLPS model depends significantly on the threshold selection, which has not yet been studied analytically.

The main idea behind the LAS and TLPS policies is to give priority to short jobs, but they do not make it possible to give preference to selected flows. In contrast, the Discriminatory Processor Sharing (DPS) policy makes it possible to introduce Quality of Service in the network. DPS provides a natural approach to modelling the resource sharing of TCP flows with different RTTs, or the weighted round-robin algorithm which is used in operating systems. The DPS discipline can also be used to model pricing policies on a server, where different services are provided according to the paid rates. DPS was first introduced by Kleinrock [Kle67]. Under DPS jobs are organized in classes and are served according to a vector of weights, so each class has its own priority in the system. The DPS policy was studied in [FMI80, RS94, RS96, GM02b, AJK04, KNQB04, KNQB05, AABNQ05]. Most of the results obtained for the DPS queue were collected in the survey paper [AAA06]. However, weight vector selection in DPS is not a trivial task because of the complexity of the system.

The problem of finding an optimal policy among all non-anticipating scheduling policies in the M/G/1 queue was solved by Gittins in [Git89]. He showed that in the M/G/1 queue the policy which serves the job with the highest Gittins index, a function of the attained service, minimizes the mean sojourn time among all non-anticipating scheduling policies. The well-known result of LAS optimality for DHR job size distributions can be derived as a corollary of the general optimality of the Gittins policy. However, this optimality result did not receive much attention and so was not fully exploited.
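The DPS sharing rule mentioned in this subsection can be written down in a few lines. In the usual formulation, when there are N_j jobs of class j with weight g_j, each class-k job is served at rate g_k / sum_j g_j N_j. The function below is a minimal sketch of that rule for a unit-capacity server; its name and the example weights are hypothetical.

```python
from typing import List, Sequence


def dps_per_job_rates(weights: Sequence[float], jobs_per_class: Sequence[int]) -> List[float]:
    """Service rate received by one job of each class under DPS.

    weights[k] is the weight g_k of class k and jobs_per_class[k] the number
    N_k of class-k jobs present. For a class with no jobs present the value
    is the rate a newly arriving job of that class would be offered, ignoring
    its own contribution to the denominator.
    """
    total = sum(g * n for g, n in zip(weights, jobs_per_class))
    if total == 0:
        return [0.0 for _ in weights]
    return [g / total for g in weights]


if __name__ == "__main__":
    # Hypothetical example: one job of a class with weight 2, two jobs of a class with weight 1.
    print(dps_per_job_rates([2.0, 1.0], [1, 2]))
```

With equal weights the rule reduces to PS, and the per-job rates multiplied by the class populations always sum to the full server capacity.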

1.3 Thesis contribution and organization

In the current thesis we study the problem of resource sharing in computer networks. We study several scheduling algorithms from stochastic scheduling theory and their application to computer networks. In Chapters 2-5 we study the problem of minimizing the mean sojourn time in the system with various scheduling algorithms. In Chapter 6 we study the congestion control problem in networks and propose a new flow-aware algorithm to improve the fair sharing of the bottleneck capacity.

In Chapter 2 we study the Batch Processor Sharing (BPS) model with hyper-exponential service time distribution. For this distribution we solve Kleinrock's integral equation for the expected conditional response time function and prove the concavity of the solution with respect to the job size. We apply this result to find the analytical expressions for the mean conditional and unconditional sojourn times for the TLPS model in Chapter 3. We also use the batch queue analysis in the derivation of the mean conditional sojourn time in Chapter 5. The results of this chapter are published in [Osi08a].

In Chapter 3 we analyze the TLPS scheduling discipline with hyper-exponential job size distribution and Poisson arrival process. In the first part of the chapter we study the case when the job size distribution has two phases. The choice of a two-phase job size distribution is motivated by the mice-elephant effect of file sizes in the Internet, see Subsection 1.1.3. In the case of the hyper-exponential job size distribution with two phases, we find a closed-form analytic expression for the expected sojourn time and an approximation for the optimal value of the threshold that minimizes the expected sojourn time. With numerical results we show that the mean sojourn time in the TLPS system is very close to optimal with the approximate threshold value. With simulation results obtained with the NS-2 simulator we show that the analytically found threshold approximation yields a smaller mean sojourn time in the TLPS system than other threshold values and gives a significant relative gain in comparison with the DropTail policy. In the second part of Chapter 3 we study the TLPS system when the job size distribution is hyper-exponential with many phases. For this case we derive a tight upper bound for the expected sojourn time conditioned on the job size. We show that when the variance of the job size distribution increases, the gain in system performance increases and the sensitivity to the choice of the threshold near its optimal value decreases. This work is published in [ABO07].

In Chapter 4 we compare two DPS policies with different weight vectors. We show the monotonicity of the expected sojourn time of the system with respect to the weight vector under a certain condition on the system. The condition is such that the result holds for systems in which the means of the job size distributions are very different from each other. The restriction can be overcome by setting the same weights for classes which have similar means. The condition on the means is sufficient, but not necessary. It becomes less strict when the system is less loaded. The results of this chapter can be found in [Osi08b].

In Chapter 5 we obtain the optimal policy for multi-class scheduling in a single server queue. We apply the results of Gittins [Git89], who found the policy which minimizes the mean sojourn time in a single-class M/G/1 queue among all non-anticipating policies. In this chapter we show that a straightforward extension of Gittins' results allows us to characterize the optimal scheduling discipline in a multi-class M/G/1 queue.
We apply the general result to several cases of practical interest where the service time distributions have a DHR, such as Pareto or hyper-exponential. We show that in the multi-class case the optimal policy is a priority discipline, where jobs of the various classes are classified into several priority levels depending on their attained service. Using a tagged-job approach we obtain, for every class, the mean conditional sojourn time. This allows us to compare numerically the mean sojourn time in the system between the Gittins optimal policy and popular policies like PS, FCFS and LAS. As in the Internet the file size is heavy-tailed and has a DHR, see Subsection 1.1.3, the obtained optimal Gittins policy can be applied in Internet routers, where packets generated by different applications must be served. Typically a router does not have access to the exact required service time (in packets) of the TCP connections, but it may have access to the attained service of each connection. Thus we implement the Gittins optimal algorithm in NS-2 and perform numerical experiments to evaluate the achievable performance gain.

In Chapter 6 we introduce MarkMax, a new flow-aware AQM algorithm for Additive Increase Multiplicative Decrease protocols (like TCP). The main idea behind MarkMax is to identify which connection should reduce its sending rate instead of which packets should be dropped. In contrast with several previously proposed AQM schemes, MarkMax differentiates between the flows currently present in the system and cuts the sending rate of the flows with the largest sending rate. MarkMax sends a congestion signal to a selected connection whenever the total backlog reaches a given threshold. The selection mechanism is based on the state of large flows. Using a fluid model we derive some bounds that can be used to analyze the behavior of MarkMax and we compute the per-flow backlog. We provide simulation results obtained with NS-2, compare MarkMax with DropTail and show how MarkMax improves both fairness and link utilization when connections have significantly different RTTs. We specify the algorithm, perform its theoretical analysis and provide simulation results which illustrate the performance of MarkMax. The work is published in [OBA08].

We give the conclusion and future work in Chapter 7.

Chapter 2 Batch Processor Sharing with Hyper-Exponential Service Time

2.1 Summary

One of the main reasons to study BPS is its possible application to age-based scheduling and the possibility of taking into account the burstiness of the arrival process. Bursty arrivals often occur in modern systems such as web servers. Age-based scheduling is used to differentiate between short and long flows in the Internet. We study the BPS model with the hyper-exponential service time distribution. For this distribution we solve Kleinrock's integral equation for the expected conditional response time function and prove the concavity of the solution with respect to the job size. We note that the concavity of the expected conditional sojourn time for BPS with the hyper-exponential job size distribution was proven using another method in [KK08]. We apply the obtained results to find the mean conditional sojourn time in the Two Level Processor Sharing (TLPS) system when the job size distribution is hyper-exponential. We prove that in the TLPS system the mean conditional sojourn time is not a concave function. The results of this chapter are published in [Osi08a].


2.2 Introduction

Processor Sharing (PS) queueing systems are now often used to model communication and computer systems. PS systems were first introduced by Kleinrock (see [Kle76a] and references therein). Under the PS policy each job receives an equal share of the processor. PS with batch arrivals (BPS) is not yet fully characterized. Kleinrock et al. [KMR71] were the first to study BPS. They found that the derivative of the expected response time satisfies an integral equation and found the analytical solution in the case when the job size (service time) distribution function has the form $F(x) = 1 - p(x)e^{-\mu x}$, where $p(x)$ is a polynomial. Bansal [Ban03], using Kleinrock's integral equation, obtained the Laplace transform of the expected conditional sojourn time as the solution of a system of linear equations when the job size distribution is hyper-exponential. He also considers distributions with a rational Laplace transform. Rege and Sengupta [RS93] obtained the expression for the response time conditioned on the number of customers in the system. Feng and Misra [FM03a] provided bounds for the expected conditional response time; the bounds depend on the second moment of the service time distribution. Avrachenkov et al. [AAB05] proved existence and uniqueness of the solution of Kleinrock's integral equation and provided asymptotic analysis and bounds on the expected conditional response time.

We study the BPS model with the hyper-exponential service time distribution. For this distribution we solve Kleinrock's integral equation for the expected conditional response time function and prove the concavity of the solution with respect to the job size. We note that the concavity of the expected conditional sojourn time for BPS with the hyper-exponential job size distribution was proven using another method in [KK08].
One of the main reasons to study BPS is its possible application to age-based scheduling and the possibility of taking into account the burstiness of the arrival process. Bursty arrivals often occur in modern systems such as web servers. Age-based scheduling is used to differentiate between short and long flows in the Internet. A quite general set of age-based scheduling mechanisms was introduced by Kleinrock and termed Multi Level PS (MLPS). In MLPS jobs are classified into different classes depending on their attained amount of service. Jobs within the same class are served according to the FCFS, PS or FB policy. The classes themselves are served according to the FB policy, so that priority is given to jobs with small sizes. We study the Two Level PS (TLPS) scheduling mechanism, a particular case of age-based scheduling. It is based on the differentiation of jobs according to some threshold and gives priority to jobs with small sizes. The TLPS scheduling mechanism can be used to model size-based differentiation in TCP/IP networks and Web server request differentiation, see [AA06, AABN04].

It is known that many probability distributions associated with network traffic and, in particular, the file size distribution in the Internet are often modelled with heavy-tailed distributions. In [BM06, FW98] it is shown that a heavy-tailed distribution can be approximated with a hyper-exponential distribution with a significant number of phases. We study the TLPS model with the hyper-exponential service time distribution. We apply the results of the BPS queueing model to the TLPS model with the hyper-exponential service time distribution, find an expression for the expected conditional sojourn time function and prove that it is not a concave function of the job size.

The chapter is organized as follows. In Section 2.3 the BPS scheduling mechanism with the hyper-exponential service time distribution is considered. In Section 2.4 the results obtained for the BPS model are applied to the TLPS model, where the job size distribution is also hyper-exponential. We put some technical proofs in the Appendix. The results of this chapter were published in [Osi08a]. More detailed proofs can be found in Research Report [Osi07]. The analysis of the queue with batch arrivals is also used in Chapter 3 and Chapter 5 of the thesis.

2.3 The analysis of the Batch Arrival Processor Sharing model

Let us consider an M/G/1 system with batch arrivals and PS queueing discipline. The batches arrive according to a Poisson process with arrival rate $\lambda$. Let $n > 0$ be the average size of a batch. Let $b > 0$ be the average number of jobs that arrive with (and in addition to) an arbitrary job which is tagged upon arrival. Let $B(x)$ be the required job size (service time) distribution and $\overline{B}(x) = 1 - B(x)$ be its complementary distribution function. The load is given by $\rho = \lambda n m$, with $m = \int_0^\infty x\,dB(x)$. We assume that the system is stable, $\rho < 1$.

It is known that many important probability distributions associated with network traffic are heavy-tailed. In particular, file size distributions observed in the Internet are often heavy-tailed. Heavy-tailed distributions are not only important and prevalent, but also difficult to analyze. In [BM06, FW98] it was shown that it is possible to approximate a heavy-tailed distribution by a hyper-exponential distribution with a significant number of phases. Thus, in our work we use the hyper-exponential function to represent the job size distribution function

\[ B(x) = 1 - \sum_i p_i e^{-\mu_i x}, \quad 1 < N \le \infty, \tag{2.1} \]

with $p_i > 0$, $\mu_i > 0$, $i = 1, \ldots, N$, and $\sum_i p_i = 1$. Without loss of generality, we can assume that

\[ 0 < \mu_N < \mu_{N-1} < \ldots < \mu_2 < \mu_1 < \infty. \tag{2.2} \]

By $\sum_i$ and $\prod_i$ we mean $\sum_{i=1}^N$ and $\prod_{i=1}^N$. By $\sum_{i \neq j}$ or $\prod_{i \neq j}$ we mean $\sum_{i=1,\ldots,N,\, i \neq j}$ and $\prod_{i=1,\ldots,N,\, i \neq j}$.
Let $\alpha(x)$ be the expected conditional response time in the BPS system for a job with service time $x$ and $\alpha'(x)$ be its derivative. Kleinrock showed in [Kle76a, Sec. 4.7] that $\alpha'(x)$ satisfies the following integro-differential equation

$$\alpha'(x) = \lambda n \int_0^\infty \alpha'(y)\,\overline{B}(x+y)\,dy + \lambda n \int_0^x \alpha'(y)\,\overline{B}(x-y)\,dy + b\,\overline{B}(x) + 1. \tag{2.3}$$

Before presenting our main result let us prove auxiliary lemmas. Let us define

$$\Psi(s) = 1 - \lambda n \sum_i \frac{p_i}{s + \mu_i}. \tag{2.4}$$

Lemma 2.1 The rational function (2.4) has $N$ zeros $-b_i$, $i = 1, \dots, N$, where the $b_i$ are all real, distinct, positive and satisfy the following inequalities:

$$0 < b_N < \mu_N, \qquad \mu_{i+1} < b_i < \mu_i, \quad i = 1, \dots, N-1. \tag{2.5}$$

Proof. Following the approach of [FMI80], the equation $\Psi(s) = 0$ has $N_1$ roots $-b_i$, $i = 1, \dots, N_1$, where $N_1$ is the number of distinct elements among the $\mu_i$. We have $N_1 = N$ because of (2.2). All $-b_i$, $i = 1, \dots, N$, are real, distinct, negative and satisfy the following inequalities: $0 > -b_N > -\mu_N$, $-\mu_{i+1} > -b_i > -\mu_i$, $i = 1, \dots, N-1$. This proves the statement of Lemma 2.1.
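The interlacing in (2.5) also gives a simple way to compute the roots numerically: each $b_k$ is bracketed by two consecutive rates, and $\Psi(-b)$ is strictly decreasing in $b$ on every bracket. The following is a minimal sketch under these assumptions, not the procedure used in the thesis; the function names `psi` and `psi_roots` and the numerical values are illustrative only.

```python
def psi(s, lam_n, p, mu):
    """Psi(s) = 1 - lam_n * sum_i p_i / (s + mu_i), as in (2.4)."""
    return 1.0 - lam_n * sum(pi / (s + mi) for pi, mi in zip(p, mu))


def psi_roots(lam_n, p, mu, iters=200):
    """Return [b_1, ..., b_N] with Psi(-b_k) = 0.

    Assumes mu is strictly decreasing (2.2) and rho = lam_n * sum(p_i / mu_i) < 1.
    Each b_k is bracketed by the interlacing (2.5) and found by bisection.
    """
    edges = list(mu) + [0.0]              # mu_1 > ... > mu_N > 0
    roots = []
    for k in range(len(mu)):
        hi, lo = edges[k], edges[k + 1]   # b_k lies in (lo, hi)
        pad = 1e-12 * (hi - lo)           # stay off the poles of Psi
        lo, hi = (lo + pad if lo > 0.0 else lo), hi - pad
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            # Psi(-b) decreases in b on the bracket: positive means b is too small.
            if psi(-mid, lam_n, p, mu) > 0.0:
                lo = mid
            else:
                hi = mid
        roots.append(0.5 * (lo + hi))
    return roots


# Illustrative two-phase example (values chosen arbitrarily, rho = 0.55).
lam_n, p, mu = 0.5, [0.6, 0.4], [2.0, 0.5]
b = psi_roots(lam_n, p, mu)
print(b)
```

Since $\Psi(s) = \prod_i (s + b_i) / \prod_i (s + \mu_i)$, a quick sanity check is that $\prod_i b_i / \prod_i \mu_i$ equals $1 - \rho$.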

Lemma 2.2 The solution of the following system of linear equations:

$$\sum_j \frac{x_j}{\mu_q^2 - b_j^2} = 1, \qquad q = 1, \dots, N, \tag{2.6}$$

is unique and is given by

$$x_k = \frac{\prod_{q=1,\dots,N} (\mu_q^2 - b_k^2)}{\prod_{q \ne k} (b_q^2 - b_k^2)}, \qquad k = 1, \dots, N. \tag{2.7}$$

Proof. The proof is given in the Appendix.

Corollary 2.1 The solution of equation (2.6) is positive. Namely, $x_k > 0$ for $k = 1, \dots, N$.

Proof. It follows from (2.2) and (2.5).

Now we can prove our main result.
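The closed form (2.7) can be checked against the linear system (2.6) in exact rational arithmetic. The sketch below is only an illustration: the values of $\mu_q$ and $b_k$ are arbitrary numbers chosen to satisfy the interlacing (2.5), they are not roots of any particular $\Psi$, and the helper names are hypothetical.

```python
from fractions import Fraction
from math import prod


def x_closed_form(mu, b):
    """x_k from (2.7) for given sequences mu_q and b_k."""
    n = len(mu)
    xs = []
    for k in range(n):
        num = prod(mu[q] ** 2 - b[k] ** 2 for q in range(n))
        den = prod(b[q] ** 2 - b[k] ** 2 for q in range(n) if q != k)
        xs.append(num / den)
    return xs


def residuals(mu, b, xs):
    """Left-hand side of (2.6) minus 1, for every q."""
    return [sum(x / (m ** 2 - bj ** 2) for x, bj in zip(xs, b)) - 1 for m in mu]


# Arbitrary interlaced values: mu_1 > b_1 > mu_2 > b_2 > mu_3 > b_3 > 0.
mu = [Fraction(5), Fraction(3), Fraction(1)]
b = [Fraction(4), Fraction(2), Fraction(1, 2)]
xs = x_closed_form(mu, b)
print(xs, residuals(mu, b, xs))
```

Using `Fraction` avoids any rounding, so the residuals vanish exactly, and the positivity claimed in Corollary 2.1 can be read off the printed values.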

Theorem 2.1 The expected conditional response time in the BPS queue with the hyper-exponential job size distribution function as in (2.1) is given by:

$$\alpha(x) = c_0 x - \sum_k \frac{c_k}{b_k} e^{-b_k x} + \sum_k \frac{c_k}{b_k}, \qquad \alpha(0) = 0, \tag{2.8}$$

$$c_0 = \frac{1}{1 - \rho}, \tag{2.9}$$

$$c_k = \frac{b}{2 \lambda n} \left( \frac{\prod_q (\mu_q^2 - b_k^2)}{b_k \prod_{q \ne k} (b_q^2 - b_k^2)} \right), \qquad k = 1, \dots, N, \tag{2.10}$$

where $-b_k$, $k = 1, \dots, N$, are the solutions of the equation $\Psi(s) = 0$, and the $b_k$ are all positive, distinct, real and satisfy inequalities (2.5).

Proof. Let us denote by $L_{\alpha'}(s)$ the Laplace transform of $\alpha'(x)$ and $L_i = L_{\alpha'}(\mu_i)$, $i = 1, \dots, N$. From (2.1) and (2.3):

$$\alpha'(x) = \lambda n \sum_i p_i L_i e^{-\mu_i x} + \lambda n \int_0^x \alpha'(y)\,\overline{B}(x-y)\,dy + b\,\overline{B}(x) + 1.$$

Taking the Laplace transform of the above equation and using the convolution property, we have

$$L_{\alpha'}(s)\,\Psi(s) = \lambda n \sum_i \frac{p_i L_i}{s + \mu_i} + b \sum_i \frac{p_i}{s + \mu_i} + \frac{1}{s}.$$

Using the results of Lemma 2.1 we get:

$$L_{\alpha'}(s) = \lambda n \sum_i p_i L_i \frac{\prod_{k \ne i} (s + \mu_k)}{\prod_k (s + b_k)} + b \sum_i p_i \frac{\prod_{k \ne i} (s + \mu_k)}{\prod_k (s + b_k)} + \frac{1}{s} \frac{\prod_k (s + \mu_k)}{\prod_k (s + b_k)}. \tag{2.11}$$

Hence there exist $c_0$ and $c_k$, $k = 1, \dots, N$, such that:

$$L_{\alpha'}(s) = \frac{c_0}{s} + \sum_k \frac{c_k}{s + b_k}. \tag{2.12}$$

Then, taking the inversion of the Laplace transform and using $\alpha(0) = 0$, we get (2.8). From (2.11) and (2.12)

$$c_0 = L_{\alpha'}(s)\,s\,\big|_{s=0} = \frac{\prod_i \mu_i}{\prod_i b_i}. \tag{2.13}$$

From (2.4) we have

$$\frac{\prod_i b_i}{\prod_i \mu_i} = \Psi(s)\big|_{s=0} = 1 - \lambda n \sum_i \frac{p_i}{\mu_i} = 1 - \rho.$$

So for $c_0$ we have (2.9). Let us find $c_k$, $k = 1, \dots, N$. We denote:

$$L^*_{\alpha'}(s) = \sum_i \frac{c_i}{s + b_i}, \tag{2.14}$$

and $L^*_j = L^*_{\alpha'}(\mu_j)$, $j = 1, \dots, N$. Using (2.11), (2.12) and (2.14), we can write

$$L^*_{\alpha'}(s) \frac{\prod_i (s + b_i)}{\prod_i (s + \mu_i)} = \lambda n \sum_i \frac{p_i L^*_i}{s + \mu_i} + b \sum_i \frac{p_i}{s + \mu_i}.$$

Multiplying the above equation by $(s + \mu_q)$, setting $s = -\mu_q$, $q = 1, \dots, N$, and using (2.14) we get

$$\sum_j \frac{c_j}{b_j - \mu_q} \cdot \frac{\prod_i (b_i - \mu_q)}{\prod_{i \ne q} (\mu_i - \mu_q)} = \lambda n\, p_q \sum_j \frac{c_j}{b_j + \mu_q} + b\, p_q, \qquad q = 1, \dots, N. \tag{2.15}$$

Let us notice that from (2.4) we have the following:

$$\frac{\prod_i (b_i - \mu_q)}{\prod_{i \ne q} (\mu_i - \mu_q)} = \Psi(s)(s + \mu_q)\big|_{s=-\mu_q} = -\lambda n\, p_q, \qquad q = 1, \dots, N.$$

Then, using (2.15), we get

$$\sum_j \frac{c_j b_j}{\mu_q^2 - b_j^2} = \frac{b}{2 \lambda n}, \qquad q = 1, \dots, N. \tag{2.16}$$

So $c_k$, $k = 1, \dots, N$, are solutions of the linear system (2.16). If we denote

$$x_k = \frac{c_k b_k}{b/(2 \lambda n)}, \qquad k = 1, \dots, N,$$

then the system (2.16) takes the form (2.6) and by Lemma 2.2 we have statement (2.10) for $c_k$. This completes the proof of Theorem 2.1.
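The formulas of Theorem 2.1 can be evaluated directly once the roots $b_k$ are known. The sketch below is a hedged illustration rather than the author's code: the roots are found by bisection on the brackets (2.5), the names `bps_coefficients`, `alpha` and `alpha_prime_transform` are hypothetical, and `b_extra` stands for the parameter $b$ of the model. One consistency check follows from letting $s \to \infty$ in the transformed equation: $\alpha'(0) = c_0 + \sum_k c_k$ must equal $\lambda n \sum_i p_i L_i + b + 1$.

```python
from math import exp, prod


def psi_roots(lam_n, p, mu, iters=200):
    """Bisection for b_k on the brackets given by (2.5); mu strictly decreasing."""
    edges = list(mu) + [0.0]
    roots = []
    for k in range(len(mu)):
        hi, lo = edges[k], edges[k + 1]
        pad = 1e-12 * (hi - lo)
        lo, hi = (lo + pad if lo > 0.0 else lo), hi - pad
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            # Value of Psi(-mid); it is decreasing in mid on the bracket.
            psi = 1.0 - lam_n * sum(pi / (mi - mid) for pi, mi in zip(p, mu))
            lo, hi = (mid, hi) if psi > 0.0 else (lo, mid)
        roots.append(0.5 * (lo + hi))
    return roots


def bps_coefficients(lam_n, b_extra, p, mu):
    """Return (c0, [c_k], [b_k]) from (2.9) and (2.10)."""
    rho = lam_n * sum(pi / mi for pi, mi in zip(p, mu))
    roots = psi_roots(lam_n, p, mu)
    cs = []
    for k, bk in enumerate(roots):
        num = prod(mi * mi - bk * bk for mi in mu)
        den = bk * prod(bq * bq - bk * bk for q, bq in enumerate(roots) if q != k)
        cs.append(b_extra / (2.0 * lam_n) * num / den)
    return 1.0 / (1.0 - rho), cs, roots


def alpha(x, c0, cs, roots):
    """Expected conditional response time (2.8)."""
    return c0 * x + sum(ck / bk * (1.0 - exp(-bk * x)) for ck, bk in zip(cs, roots))


def alpha_prime_transform(s, c0, cs, roots):
    """Laplace transform (2.12) of alpha'(x)."""
    return c0 / s + sum(ck / (s + bk) for ck, bk in zip(cs, roots))


# Illustrative values only.
c0, cs, roots = bps_coefficients(0.5, 1.5, [0.6, 0.4], [2.0, 0.5])
print(c0, cs, alpha(1.0, c0, cs, roots))
```

For a single exponential phase the same code reduces to $b_1 = \mu(1 - \rho)$ and $c_1 = b(2 - \rho) / (2(1 - \rho))$, which gives a second, independent check.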

Corollary 2.2 The expected conditional sojourn time function in the BPS system with the hyper-exponential job size distribution as in (2.1) is a strictly concave function.

Proof. The function (2.8) is strictly concave if $\alpha''(x) = -\sum_k c_k b_k e^{-b_k x} < 0$. This is true, as $c_k > 0$, $b_k > 0$, $k = 1, \dots, N$, which follows from $b > 0$, $n > 0$, Corollary 2.1 and Lemma 2.1. The result of Corollary 2.2 was also proven using another method in [KK08].

Remark 2.1 Let us denote by $n(x)$ the average density of jobs still in the system which have received an amount of service equal to $x$. Then $n(x)\,dx$ is the average number of jobs still in the system which have received an amount of service between $x$ and $x + dx$. From [Kle76a, Ch. 4], we have

$$n(x)\,dx = \lambda n\, \overline{B}(x)\, \alpha'(x)\,dx.$$

As $\alpha'(x)$ and $\overline{B}(x)$ are positive decreasing functions, $n(x)\,dx$ is also a positive decreasing function. Hence the average number of jobs which are still in the system and have received an amount of service around $x$ is decreasing with respect to the received amount of service. This property is not true for all queueing systems. In particular, as we will see later, it is not true for the TLPS system with the hyper-exponential job size distribution.

Let us denote the expected sojourn time in the BPS system by $\overline{T}^{BPS} = \int_0^\infty \alpha'(x)\,\overline{B}(x)\,dx$. Let us prove the following theorem.

Theorem 2.2 The expected sojourn time $\overline{T}^{BPS}$ in the BPS system with the hyper-exponential job size distribution as in (2.1) is given by

$$\overline{T}^{BPS} = \frac{m}{1 - \rho} + \sum_{i,j} \frac{p_i c_j}{\mu_i + b_j},$$

where the coefficients $b_j$, $c_j$, $j = 1, \dots, N$, are given as in Theorem 2.1.

Proof. As the expected sojourn time $\overline{T}^{BPS}$ is given by

$$\overline{T}^{BPS} = \int_0^\infty \alpha'(x)\,\overline{B}(x)\,dx,$$

then using (2.8) we get the statement of the theorem.

2.4 The analysis of the Two Level Processor Sharing model

Let us study the TLPS scheduling discipline with the hyper-exponential job size distribution $F(x)$. Let $\overline{F}(x) = 1 - F(x)$. The jobs arrive to the system according to a Poisson process with rate $\lambda$. A detailed description of the TLPS model is given in Section 3.3 of Chapter 3.

Let $\theta > 0$ be a given threshold. There are two queues in the system, a low priority queue and a high priority queue. Jobs in both queues are served with the PS discipline. In the high priority queue jobs are served until they receive $\theta$ of service. If a job is still in the system after it has received an amount of service $\theta$, it waits in the low priority queue to be served. The low priority queue is served only when the high priority queue is empty; thus, we can consider the low priority queue as a queue with batch arrivals, see also [Kle76a, Sec. 4.7].

Let us denote by $\overline{T}^{TLPS}(x)$ the expected conditional sojourn time in the TLPS system for a job of size $x$ and by $\overline{T}(\theta)$ the expected sojourn time of the system. According to [Kle76a, Sec. 4.7] the expected conditional sojourn time of the system is given by:

$$\overline{T}^{TLPS}(x) = \begin{cases} \dfrac{x}{1 - \rho_\theta}, & x \le \theta, \\[2mm] \dfrac{W_\theta + \theta + \alpha(x - \theta)}{1 - \rho_\theta}, & x > \theta. \end{cases}$$

Here $X_\theta^n = \int_0^\theta n y^{n-1}\, \overline{F}(y)\,dy$ is the $n$-th moment of the distribution truncated at $\theta$ and $\rho_\theta = \lambda X_\theta^1$ is the utilization factor. According to [Kle76a, Sec. 4.7] the average batch size is $n = \overline{F}(\theta)/(1 - \rho_\theta)$, the average number of jobs that arrive to the low priority queue in addition to the tagged job is $b = 2\lambda \overline{F}(\theta)(W_\theta + \theta)/(1 - \rho_\theta)$, and the mean workload in the system for the distribution truncated at $\theta$ is $W_\theta = \lambda X_\theta^2 / (2(1 - \rho_\theta))$. The only unknown term is $\alpha(x - \theta)$, the mean conditional sojourn time in the low priority queue for the part $(x - \theta)$ of the job. To find an expression for $\alpha(x - \theta)$ we apply the analysis of the BPS system to the low priority queue. Using the result of Theorem 2.1 in Section 2.3 we obtain the following result, which is used in [ABO07] and in Chapter 3.

Theorem 2.3 We consider the TLPS system with the hyper-exponential job size distribution. The mean conditional sojourn time for the low priority queue, which is a PS queue with batch arrivals, has the following expression:

$$\alpha(x) = c_0(\theta)\,x - \sum_k \frac{c_k(\theta)}{b_k(\theta)} e^{-b_k(\theta) x} + \sum_k \frac{c_k(\theta)}{b_k(\theta)}, \qquad \alpha(0) = 0,$$

$$c_0(\theta) = \frac{1 - \rho_\theta}{1 - \rho},$$

$$c_k(\theta) = \frac{b}{2 \lambda n} \left( \frac{\prod_{q=1,\dots,N} (\mu_q^2 - b_k^2(\theta))}{b_k(\theta) \prod_{q \ne k} (b_q^2(\theta) - b_k^2(\theta))} \right), \qquad k = 1, \dots, N,$$

where $-b_i(\theta)$, $i = 1, \dots, N$, are the roots of the equation

$$1 - \frac{\lambda}{1 - \rho_\theta} \sum_i \frac{F_\theta^i}{s + \mu_i} = 0,$$

and the $b_i(\theta)$ satisfy the following inequalities: $0 < b_N(\theta) < \mu_N$, $\mu_{i+1} < b_i(\theta) < \mu_i$, $i = 1, \dots, N-1$. Here $F_\theta^i = p_i e^{-\mu_i \theta}$, $i = 1, \dots, N$. The coefficients $c_k(\theta)$, $k = 1, \dots, N$, are strictly positive for positive $\theta$. The function $\alpha(x)$ is a strictly concave function with respect to the job size for positive $\theta$.

Corollary 2.3 The expected conditional sojourn time $\overline{T}^{TLPS}(x)$ in the TLPS queue with the hyper-exponential job size distribution is a strictly concave function for $x > \theta$, linear for $x < \theta$, and is not a concave function on the interval $(0, \infty)$ with respect to the job size for positive values of $\theta$.

Proof. The concavity of $\overline{T}^{TLPS}(x)$ for $x > \theta$ follows from Theorem 2.3. The function $\overline{T}^{TLPS}(x)$ is linear for $x < \theta$; this follows from the standard PS model. As $\overline{T}^{TLPS}(x)$ is not continuous at the point $x = \theta$, it is also not concave on the interval $(0, \infty)$. From Theorem 2.3 it follows that $1 < \alpha'(0)$ and then

$$\left(\overline{T}^{TLPS}\right)'(x)\Big|_{x=\theta-0} < \left(\overline{T}^{TLPS}\right)'(x)\Big|_{x=\theta+0}.$$

Then for the TLPS system the average number of jobs which are still in the system and have received an amount of service around $x$ is not a decreasing and not even a monotone function with respect to the received amount of service.

Now let us give the expression of the mean sojourn time in the TLPS system, which we use in Chapter 3 of the thesis.

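Theorem 2.4 The expected sojourn time $\overline{T}(\theta)$ in the TLPS system with the hyper-exponential distribution function is given by the following equation:

$$\overline{T}(\theta) = \frac{X_\theta^1 + W_\theta \overline{F}(\theta)}{1 - \rho_\theta} + \frac{m - X_\theta^1}{1 - \rho} + \frac{W_\theta + \theta}{1 - \rho_\theta} \sum_{i,j} \frac{F_\theta^i \prod_q (\mu_q^2 - b_j^2(\theta))}{b_j(\theta)\,(\mu_i + b_j(\theta)) \prod_{q \ne j} (b_q^2(\theta) - b_j^2(\theta))}, \tag{2.17}$$

where $b_i(\theta)$, $i = 1, \dots, N$, are defined as in Theorem 2.3.

Proof. According to [Kle76a, Sec. 4.7]

$$\overline{T}(\theta) = \frac{X_\theta^1 + W_\theta \overline{F}(\theta)}{1 - \rho_\theta} + \frac{1}{1 - \rho_\theta}\, \overline{T}^{BPS}(\theta).$$

Then using the result of Theorem 2.3 we get the statement of the theorem.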

2.5 Conclusion

We study the BPS queueing model with the hyper-exponential job size distribution and find analytical expressions for the expected conditional response time and for the expected sojourn time. We show that the expected conditional sojourn time in the BPS system with the hyper-exponential job size distribution is a concave function of the job size. We apply the results obtained for the BPS model to the TLPS scheduling mechanism with the hyper-exponential job size distribution and find expressions for the expected conditional response time and the expected response time in the TLPS model.

2.6 Appendix

Lemma 2.2: The solution of the system of linear equations (2.6) is unique and is given by (2.7).

Proof. Let $x$ and $\mathbf{1}$ be two vectors of size $N$ and $D$ be a matrix of size $N \times N$:

$$x = [x_1, x_2, \dots, x_N]^T, \qquad \mathbf{1} = [1, 1, \dots, 1]^T, \qquad D = \left[ \frac{1}{\mu_i^2 - b_j^2} \right]_{i,j=1,\dots,N}.$$

Then system (2.6) can be rewritten as

$$D x = \mathbf{1}.$$

Applying Cramer's rule [Kur72] we obtain:

$$x_k = \frac{\det D_k}{\det D}, \quad k = 1, \dots, N, \qquad D_k = [D^{[1]}, \dots, D^{[k-1]}, \mathbf{1}, D^{[k+1]}, \dots, D^{[N]}], \tag{2.18}$$

where $D^{[j]}$ denotes the $j$-th column of $D$. Let us recall that a Cauchy matrix, see [Kur72], is a matrix of the form $\left[ \frac{1}{a_j + c_k} \right]_{j,k=1}^N$ and its determinant is

$$\det \left[ \frac{1}{a_j + c_k} \right]_{j,k=1}^N = \frac{\prod_{1 \le j < k \le N} \left[ (a_j - a_k)(c_j - c_k) \right]}{\prod_{j,k=1}^N (a_j + c_k)}.$$

As $D$ is a Cauchy matrix with $a_j = \mu_j^2$ and $c_k = -b_k^2$, its determinant is known and equals

$$\det D = \frac{\prod_{1 \le j < k \le N} \left[ (\mu_j^2 - \mu_k^2)(b_k^2 - b_j^2) \right]}{\prod_{j,k=1,\dots,N} (\mu_j^2 - b_k^2)}. \tag{2.19}$$

Under the product sign, by $1 \le j < k \le N$ we mean that we take all the combinations $(j, k)$ such that $1 \le j < N$, $1 < k \le N$ and $j < k$. By $j, k = 1, \dots, N$ we mean that we take all the pairs $(j, k)$ such that $1 \le j \le N$, $1 \le k \le N$. Due to (2.5), $\det D > 0$ and we can use Cramer's rule to calculate $x_k$. Let us find $\det D_k$.

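$$\det D_k = \det [D^{[1]}, \dots, D^{[k-1]}, \mathbf{1}, D^{[k+1]}, \dots, D^{[N]}] = (-1)^{k-1} \det [\mathbf{1}, D^{[1]}, \dots, D^{[k-1]}, D^{[k+1]}, \dots, D^{[N]}].$$

To simplify the ensuing computations let us introduce the following notation:

$$\beta_i = -b_{i-1}^2, \quad i = 2, \dots, k, \qquad \beta_i = -b_i^2, \quad i = k+1, \dots, N.$$

Let us notice that here $\beta_i$, $i = 2, \dots, N$, depend on the index $k$. As at this point the index $k$ is fixed, we do not show this dependency in the notation of $\beta_i$. Then, we have

$$\det D_k = (-1)^{k-1} \begin{vmatrix} 1 & \frac{1}{\mu_1^2 + \beta_2} & \cdots & \frac{1}{\mu_1^2 + \beta_N} \\ \vdots & \vdots & & \vdots \\ 1 & \frac{1}{\mu_N^2 + \beta_2} & \cdots & \frac{1}{\mu_N^2 + \beta_N} \end{vmatrix}_{N \times N}.$$

Under the sign of the determinant we subtract the first row from all the other rows:

$$\det D_k = (-1)^{k-1} \begin{vmatrix} 1 & \frac{1}{\mu_1^2 + \beta_2} & \cdots & \frac{1}{\mu_1^2 + \beta_N} \\ 0 & \frac{\mu_1^2 - \mu_2^2}{(\mu_2^2 + \beta_2)(\mu_1^2 + \beta_2)} & \cdots & \frac{\mu_1^2 - \mu_2^2}{(\mu_2^2 + \beta_N)(\mu_1^2 + \beta_N)} \\ \vdots & \vdots & & \vdots \\ 0 & \frac{\mu_1^2 - \mu_N^2}{(\mu_N^2 + \beta_2)(\mu_1^2 + \beta_2)} & \cdots & \frac{\mu_1^2 - \mu_N^2}{(\mu_N^2 + \beta_N)(\mu_1^2 + \beta_N)} \end{vmatrix}_{N \times N},$$

$$\det D_k = \frac{(-1)^{k-1} \prod_{q=2,\dots,N} (\mu_1^2 - \mu_q^2)}{\prod_{q=2,\dots,N} (\mu_1^2 + \beta_q)} \begin{vmatrix} \frac{1}{\mu_2^2 + \beta_2} & \cdots & \frac{1}{\mu_2^2 + \beta_N} \\ \vdots & & \vdots \\ \frac{1}{\mu_N^2 + \beta_2} & \cdots & \frac{1}{\mu_N^2 + \beta_N} \end{vmatrix}_{(N-1) \times (N-1)}.$$

So, as the above matrix under the sign of the determinant is a Cauchy matrix of size $N - 1$, the following equation holds:

$$(-1)^{k-1} \det D_k = \frac{\prod_{q=2,\dots,N} (\mu_1^2 - \mu_q^2) \prod_{2 \le j < q \le N} \left[ (\mu_j^2 - \mu_q^2)(\beta_j - \beta_q) \right]}{\prod_{q=2,\dots,N} (\mu_1^2 + \beta_q) \prod_{j,q=2,\dots,N} (\mu_j^2 + \beta_q)}.$$

Let us recall that $\beta_i = -b_{i-1}^2$, $i = 2, \dots, k$, and $\beta_i = -b_i^2$, $i = k+1, \dots, N$. Then

$$(-1)^{k-1} \det D_k = \frac{\prod_{1 \le j < q \le N} (\mu_j^2 - \mu_q^2) \prod_{1 \le j < q \le N,\ j,q \ne k} (b_q^2 - b_j^2)}{\prod_{j,q=1,\dots,N,\ q \ne k} (\mu_j^2 - b_q^2)}.$$

The factors of $\prod_{1 \le j < q \le N} (b_q^2 - b_j^2)$ that involve the index $k$ multiply to $(-1)^{k-1} \prod_{q \ne k} (b_q^2 - b_k^2)$, so the sign factors on both sides cancel and

$$\det D_k = \frac{\prod_{1 \le j < q \le N} \left[ (\mu_j^2 - \mu_q^2)(b_q^2 - b_j^2) \right] \prod_{j=1,\dots,N} (\mu_j^2 - b_k^2)}{\prod_{j,q=1,\dots,N} (\mu_j^2 - b_q^2) \prod_{q=1,\dots,N,\ q \ne k} (b_q^2 - b_k^2)}.$$

Finally, from (2.18) and (2.19), we have expression (2.7) for $x_k$, which proves Lemma 2.2.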


Chapter 3 Optimal choice of threshold in Two Level Processor Sharing

3.1 Summary

We analyze the TLPS scheduling discipline with the hyper-exponential job size distribution and the Poisson arrival process. TLPS is a convenient model to study the benefit of file size based differentiation in TCP/IP networks. In the case of the hyper-exponential job size distribution with two phases, we find a closed form analytic expression for the expected sojourn time and an approximation for the optimal value of the threshold that minimizes the expected sojourn time. Using the NS-2 simulator we implement the TLPS algorithm in the router queue and provide simulation results for the case of the two phase hyper-exponential job size distribution. We show that the approximation of the optimal threshold yields the smallest mean sojourn time in the TLPS system among the tested threshold values. We show that with the TLPS policy the relative gain in mean sojourn time in comparison with the DropTail policy is very close to the relative gain which can be reached using the optimal LAS policy and goes up to 36%.

In the case of the hyper-exponential job size distribution with more than two phases, we derive a tight upper bound for the expected sojourn time conditioned on the job size. We show that when the mean of the job size distribution is fixed and the variance increases, the gain in system performance increases and the sensitivity to the choice of the threshold near its optimal value decreases. The results of this chapter are published in [ABO07].


3.2 Introduction

The Two Level Processor Sharing (TLPS) scheduling discipline was first introduced by Kleinrock, see [Kle76a, Sec. 4.7]. It differentiates jobs according to a threshold on the attained service and gives priority to the jobs with small sizes. The TLPS scheduling mechanism can be applied to file size based differentiation in TCP/IP networks [AANO04, AABN04, FM03b] and to Web server request differentiation [GM02a, HBSBA03]. A detailed description of the TLPS discipline is presented in Section 3.3.

Of course, TLPS provides a sub-optimal mechanism in comparison with SRPT, which minimizes the expected sojourn time, see [Sch68]. However, SRPT requires knowledge of the job sizes. Nevertheless, as was shown in [AA06], when the job size distribution has a decreasing hazard rate, the performance of TLPS with an appropriate choice of threshold is very close to optimal. In the present chapter we characterize the optimal value of the threshold when the service time is hyper-exponential.

The motivation to study TLPS with the hyper-exponential service time is as follows. The distribution of file sizes in the Internet can often be modelled with a heavy-tailed distribution. It is known that heavy-tailed distributions can be approximated with hyper-exponential distributions with a significant number of phases [BM06, FW98]. Also in [KSH03] it was shown that a hyper-exponential distribution models well the file size distribution in the Internet. In [KSH03] the authors propose an efficient algorithm to approximate heavy-tailed distributions with hyper-exponential distributions with many phases.

The chapter organization and main results are as follows. In Section 3.3 we provide the model formulation, main definitions and equations. In Section 3.4 we study the TLPS discipline in the case of the hyper-exponential job size distribution with two phases. It is known that Internet connections belong to two distinct classes with very different sizes of transfer. The first class is composed of short HTTP connections and P2P signaling connections. The second class corresponds to downloads (PDF files, MP3 files, MPEG files, etc.), see Subsection 1.1.3. This fact provides motivation to consider first the hyper-exponential job size distribution with two phases. We find an analytic expression for the expected sojourn time in the TLPS system. Then, we present the approximation of the optimal threshold which minimizes the expected sojourn time. We show that the approximated value of the threshold tends to the optimal threshold when the second moment of the job size distribution function goes to infinity. We show that the ratio between the expected sojourn time of the TLPS system and the expected sojourn time of the standard PS system can be arbitrarily small for very high loads. For realistic loads this ratio can reach 1/2. We also show that the system performance is not too sensitive to the choice of the threshold around its optimal value.

In [AABN04] the authors provide the scheduling algorithm RuN2C, which is based on the TLPS policy and uses packet sequence numbers to schedule packets. Using the NS-2 simulator we implement the TLPS algorithm in the router queue, which schedules packets according to the attained service of every connection present in the system. For that we keep track of the connection's attained service until there are no more packets from the connection in the queue. We provide simulation results for different values of the threshold and show that the analytically found threshold approximation minimizes the mean sojourn time in the TLPS system. We compare the mean sojourn time in the system when the bottleneck queue is scheduled with the TLPS, LAS and DropTail policies. We find that the relative gain of the TLPS policy with the approximated value of the optimal threshold can reach 36% in comparison with the DropTail policy and is very close to the relative gain achieved with the optimal LAS policy in comparison with the DropTail policy.

In Section 3.5 we analyze the TLPS discipline when the job size distribution is hyper-exponential with many phases. We provide an expression of the expected conditional sojourn time as the solution of a system of linear equations. We also apply an iteration method to find the expression of the expected conditional sojourn time and, using the resulting expression, obtain an explicit and tight upper bound for the expected sojourn time function. In the experimental results we show that the relative error of this upper bound with respect to the expected sojourn time function is 6-7%. We study the properties of the expected sojourn time function when the parameters of the job size distribution function are selected in such a way that the variance increases with the number of phases. We show that when the variance of the job size distribution increases, the gain in system performance increases and the sensitivity of the system to the selection of the approximate optimal threshold value decreases.

Some technical proofs are given in the Appendix.

3.3 Model description

3.3.1 Main definitions

We study the TLPS scheduling discipline with the hyper-exponential job size distribution. The jobs arrive to the system according to a Poisson process with rate $\lambda$. We measure the job size in time units. Specifically, we define the job size as the time which the server would spend on the job if there were no other jobs in the system.

Let $\theta > 0$ be a given threshold. When a new job arrives to the system, it goes to the high priority queue, where it is served until it receives the amount of service $\theta$. If the job is still in the system and needs more service than $\theta$, the rest of the job, which is not yet served, goes to the low priority queue. So, the jobs which have attained an amount of service of more than $\theta$ are accumulated in the low priority queue. The low priority queue is served when the high priority queue is empty.

Both queues are served according to the PS discipline, namely, the server divides its capacity equally among all jobs present in the queue. When the high priority queue is empty, the jobs which are accumulated in the low priority queue arrive to the server in a batch. Thus, we can consider the low priority queue as a queue with batch arrivals, see also [Kle76a, Sec. 4.7].

Let us denote the job size distribution by $F(x)$. By $\overline{F}(x) = 1 - F(x)$ we denote the complementary distribution function. The mean job size is given by $m = \int_0^\infty x\,dF(x)$ and the system load is $\rho = \lambda m$. We assume that the system is stable ($\rho < 1$) and is in steady state.

It is known that many important probability distributions associated with network traffic are heavy-tailed. In particular, the file size distribution in the Internet is heavy-tailed. A distribution function has a heavy tail if $e^{\varepsilon x}(1 - F(x)) \to \infty$ as $x \to \infty$, $\forall \varepsilon > 0$. The heavy-tailed distributions are not only important and prevalent, but also difficult to analyze. Often it is helpful to have the Laplace transform of the job size distribution. However, there is evidently no convenient analytic expression for the Laplace transforms of the Pareto and Weibull distributions, the most common examples of heavy-tailed distributions. In [BM06, FW98] it was shown that it is possible to approximate heavy-tailed distributions by hyper-exponential distributions with a significant number of phases.

A hyper-exponential distribution $F_N(x)$ is a convex combination of $N$ exponential distributions, $1 \le N \le \infty$, namely,

$$F_N(x) = 1 - \sum_{i=1}^N p_i e^{-\mu_i x}, \qquad \mu_i > 0, \quad p_i \ge 0, \quad i = 1, \dots, N, \quad \text{and} \quad \sum_{i=1}^N p_i = 1. \tag{3.1}$$

In particular, we can construct a sequence of hyper-exponential distributions such that it converges to a heavy-tailed distribution [BM06]. For instance, if we select

$$p_i = \frac{\nu}{i^{\gamma_1}}, \qquad \mu_i = \frac{\eta}{i^{\gamma_2}}, \qquad i = 1, \dots, N, \qquad \gamma_1 > 1, \qquad \frac{\gamma_1 - 1}{2} < \gamma_2 < \gamma_1 - 1,$$

where $\nu = 1 / \sum_{i=1,\dots,N} i^{-\gamma_1}$ and $\eta = (\nu / m) \sum_{i=1,\dots,N} i^{\gamma_2 - \gamma_1}$, then the first moment of the job size distribution is finite, but the second moment goes to infinity when $N \to \infty$. The first and the second moments $m$ and $d$ of the hyper-exponential distribution are given by:

$$m = \int_0^\infty x\,dF(x) = \sum_{i=1}^N \frac{p_i}{\mu_i}, \qquad d = \int_0^\infty x^2\,dF(x) = 2 \sum_{i=1}^N \frac{p_i}{\mu_i^2}. \tag{3.2}$$

Let us denote

$$F_\theta^i = p_i e^{-\mu_i \theta}, \qquad i = 1, \dots, N. \tag{3.3}$$

We note that $\sum_{i=1}^N F_\theta^i = \overline{F}(\theta)$. The hyper-exponential distribution has a simple Laplace transform:

$$L_{F(x)}(s) = \sum_{i=1}^N \frac{p_i \mu_i}{s + \mu_i}.$$
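The parameter selection above is easy to reproduce numerically. The following sketch is only an illustration under the stated conditions on $\gamma_1$ and $\gamma_2$; the function names `hyperexp_params` and `moments` and the sample values $\gamma_1 = 2.5$, $\gamma_2 = 1$ are assumptions made for the example, not choices taken from the thesis.

```python
def hyperexp_params(n_phases, gamma1, gamma2, mean):
    """Weights p_i and rates mu_i of the construction following (3.1).

    Requires gamma1 > 1 and (gamma1 - 1) / 2 < gamma2 < gamma1 - 1.
    """
    idx = range(1, n_phases + 1)
    nu = 1.0 / sum(i ** -gamma1 for i in idx)
    eta = nu / mean * sum(i ** (gamma2 - gamma1) for i in idx)
    p = [nu / i ** gamma1 for i in idx]
    mu = [eta / i ** gamma2 for i in idx]
    return p, mu


def moments(p, mu):
    """First and second moments m and d from (3.2)."""
    m = sum(pi / mi for pi, mi in zip(p, mu))
    d = 2.0 * sum(pi / mi ** 2 for pi, mi in zip(p, mu))
    return m, d


# The mean stays fixed while the second moment grows with the number of phases.
for n_phases in (10, 100, 1000):
    p, mu = hyperexp_params(n_phases, 2.5, 1.0, mean=1.0)
    print(n_phases, moments(p, mu))
```

The printed first moment stays at the prescribed mean for every $N$, while the second moment keeps growing, which is the behaviour described in the text.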


We would like to note that the hyper-exponential distribution has a decreasing hazard rate. In [AA06] it was shown that when a job size distribution has a decreasing hazard rate, then with an appropriate selection of the threshold the expected sojourn time of the TLPS system can be made close to optimal. Thus, in our work we use hyper-exponential distributions to represent job size distribution functions. In the first part of the current chapter we look at the case of the hyper-exponential job size distribution with two phases and in the second part of the chapter we study the case of more than two phases.

3.3.2 The expected sojourn time in the TLPS system

Let us denote by $\overline{T}(x)$ the expected conditional sojourn time in the TLPS system for a job of size $x$. Of course, $\overline{T}(x)$ also depends on $\theta$, but for the expected conditional sojourn time we only emphasize the dependence on the job size. On the other hand, we denote by $\overline{T}(\theta)$ the overall expected sojourn time in the TLPS system. Here we emphasize the dependence on $\theta$, as later we shall optimize the overall expected sojourn time with respect to the threshold value.

To calculate the expected sojourn time in the TLPS system we need to calculate the time spent by a job of size $x$ in the high priority queue and in the low priority queue. For the jobs with size $x \le \theta$ the system behaves as the standard PS system where the service time distribution is truncated at $\theta$. Let us denote by

$$X_\theta^n = \int_0^\theta y^n\,dF(y) + \theta^n \overline{F}(\theta) = \int_0^\theta n y^{n-1}\, \overline{F}(y)\,dy$$

the $n$-th moment of the distribution truncated at $\theta$. The distribution truncated at $\theta$ equals $F(x)$ for $x \le \theta$ and equals 1 when $x > \theta$. In the following sections we will need

$$X_\theta^1 = m - \sum_{i=1}^N \frac{F_\theta^i}{\mu_i}, \qquad X_\theta^2 = 2 \sum_{i=1}^N \frac{p_i}{\mu_i^2} - 2 \sum_{i=1}^N \frac{F_\theta^i}{\mu_i^2} - 2\theta \sum_{i=1}^N \frac{F_\theta^i}{\mu_i}. \tag{3.4}$$

The utilization factor for the truncated distribution is

$$\rho_\theta = \lambda X_\theta^1 = \rho - \lambda \sum_{i=1}^N \frac{F_\theta^i}{\mu_i}. \tag{3.5}$$

Then, the expected conditional response time is given by

$$\overline{T}(x) = \begin{cases} \dfrac{x}{1 - \rho_\theta}, & x \le \theta, \\[2mm] \dfrac{W_\theta + \theta + \alpha(x - \theta)}{1 - \rho_\theta}, & x > \theta. \end{cases}$$

Here $W_\theta$ is the mean workload in the system for the jobs with service times truncated at $\theta$, which according to the Pollaczek-Khinchin formula equals

$$W_\theta = \frac{\lambda X_\theta^2}{2(1 - \rho_\theta)}.$$
According to [Kle76a, Sec. 4.7], $\theta/(1 - \rho_\theta)$ expresses the time spent in the high priority queue, where the flow is served up to the threshold $\theta$, and $W_\theta/(1 - \rho_\theta)$ is the time spent waiting for the high priority queue to empty. The remaining term $\alpha(x - \theta)$ is the mean conditional sojourn time of the part $x - \theta$ of the job spent in the low priority queue.

According to Kleinrock [Kle76a, Sec. 4.7] the low priority queue can be interpreted as an interrupted PS queue with batch arrivals. Then, $\alpha'(x) = d\alpha/dx$ is the solution of the following integral equation

$$\alpha'(x) = \lambda n \int_0^\infty \alpha'(y)\,\overline{B}(x+y)\,dy + \lambda n \int_0^x \alpha'(y)\,\overline{B}(x-y)\,dy + b\,\overline{B}(x) + 1. \tag{3.6}$$

Here $n$ is the average batch size, $\overline{B}(x)$ is the complementary truncated distribution and $b = b(\theta)$ is the average number of jobs that arrive to the low priority queue in addition to the tagged job. The expressions for the parameters $n$ and $b(\theta)$ are explained in [Kle76a, Sec. 4.7] and are equal to

$$\overline{B}(x) = \frac{\overline{F}(\theta + x)}{\overline{F}(\theta)}, \qquad n = \frac{\overline{F}(\theta)}{1 - \rho_\theta}, \qquad b(\theta) = \frac{2 \lambda \overline{F}(\theta)(W_\theta + \theta)}{1 - \rho_\theta}.$$

The expected sojourn time in the system is given by the following equations:

$$\overline{T}(\theta) = \int_0^\infty \overline{T}(x)\,dF(x),$$

$$\overline{T}(\theta) = \frac{X_\theta^1 + W_\theta \overline{F}(\theta)}{1 - \rho_\theta} + \frac{1}{1 - \rho_\theta}\, \overline{T}^{BPS}(\theta), \tag{3.7}$$

$$\overline{T}^{BPS}(\theta) = \int_\theta^\infty \alpha(x - \theta)\,dF(x) = \int_0^\infty \alpha'(x)\,\overline{F}(x + \theta)\,dx. \tag{3.8}$$

Using the analysis of the PS queue with batch arrivals in Chapter 2, Section 2.4, we obtained an expression of the mean sojourn time in the TLPS system when the job size distribution is hyper-exponential, see (2.17).
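The quantities introduced in this subsection are all explicit for a hyper-exponential distribution. The sketch below gathers them in one place and checks the closed forms (3.4) against a direct numerical integration of the truncated moments; it is an illustration under assumed parameter values, and the names `tlps_parameters` and `truncated_moment_numeric` are hypothetical.

```python
from math import exp


def tlps_parameters(lam, p, mu, theta):
    """Quantities of Section 3.3.2 for a hyper-exponential job size distribution."""
    f_i = [pi * exp(-mi * theta) for pi, mi in zip(p, mu)]          # (3.3)
    f_bar = sum(f_i)                                                 # 1 - F(theta)
    m = sum(pi / mi for pi, mi in zip(p, mu))                        # (3.2)
    x1 = m - sum(fi / mi for fi, mi in zip(f_i, mu))                 # (3.4)
    x2 = (2.0 * sum(pi / mi ** 2 for pi, mi in zip(p, mu))
          - 2.0 * sum(fi / mi ** 2 for fi, mi in zip(f_i, mu))
          - 2.0 * theta * sum(fi / mi for fi, mi in zip(f_i, mu)))   # (3.4)
    rho_theta = lam * x1                                             # (3.5)
    w = lam * x2 / (2.0 * (1.0 - rho_theta))                         # Pollaczek-Khinchin
    n = f_bar / (1.0 - rho_theta)                                    # mean batch size
    b = 2.0 * lam * f_bar * (w + theta) / (1.0 - rho_theta)          # b(theta)
    return {"x1": x1, "x2": x2, "rho_theta": rho_theta, "w": w, "n": n, "b": b}


def truncated_moment_numeric(k, p, mu, theta, steps=20000):
    """Midpoint rule for the integral of k * y**(k-1) * (1 - F(y)) over [0, theta]."""
    h = theta / steps
    total = 0.0
    for j in range(steps):
        y = (j + 0.5) * h
        total += k * y ** (k - 1) * sum(pi * exp(-mi * y) for pi, mi in zip(p, mu))
    return total * h


# Illustrative two-phase example with load rho = 0.76.
q = tlps_parameters(2.0, [0.9, 0.1], [5.0, 0.5], theta=0.7)
print(q)
```

A job of size $x \le \theta$ then has expected conditional sojourn time `x / (1 - q["rho_theta"])`, as in the first branch of the piecewise formula.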

3.4 Hyper-exponential job size distribution with two phases

3.4.1 Notation and motivation

In the first part of our work we consider the hyper-exponential job size distribution with two phases. In particular, the application of the hyper-exponential job size distribution with two phases is motivated by the fact that in the Internet TCP connections belong to two distinct classes with very different sizes of transfer. The first class is composed of short HTTP connections and P2P signaling connections. The second class corresponds to downloads (PDF files, MP3 files, MPEG files, etc.). We discuss this problem in more detail in the Introduction of the present thesis, see Section 1.1.3.

According to (3.1) the cumulative distribution function $F(x)$ for $N = 2$ is given by

$$F(x) = 1 - p_1 e^{-\mu_1 x} - p_2 e^{-\mu_2 x},$$

where $p_1 + p_2 = 1$ and $p_1, p_2 > 0$. The mean job size $m$, the second moment $d$, and the parameters $F_\theta^i$, $X_\theta^1$, $X_\theta^2$ and $\rho_\theta$ are defined as in Section 3.3.1 and Section 3.3.2 by formulas (3.2), (3.3), (3.4), (3.5) with $N = 2$.

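3.4.2 Explicit form for the expected sojourn time

To find the expression of $\overline{T}(\theta)$ we use the result obtained in Chapter 2, Section 2.4, Theorem 2.4, and prove the following theorem.

Theorem 3.1 The expected sojourn time in the TLPS system with the hyper-exponential job size distribution with two phases is given by

$$\overline{T}(\theta) = \frac{X_\theta^1 + W_\theta \overline{F}(\theta)}{1 - \rho_\theta} + \frac{m - X_\theta^1}{1 - \rho} + \frac{b(\theta) \left( \mu_1 \mu_2 (m - X_\theta^1)^2 + \delta_\rho(\theta)\, \overline{F}^2(\theta) \right)}{2 (1 - \rho)\, \overline{F}(\theta) \left( \mu_1 + \mu_2 - \gamma(\theta) \overline{F}(\theta) \right)}, \tag{3.9}$$

where $\delta_\rho(\theta) = 1 - \gamma(\theta)(m - X_\theta^1) = (1 - \rho)/(1 - \rho_\theta)$ and $\gamma(\theta) = \lambda/(1 - \rho_\theta)$.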

Proof. As we found in Chapter 2, Section 2.4, Theorem 2.4,

$$\overline{T}(\theta) = \frac{X_\theta^1 + W_\theta \overline{F}(\theta)}{1 - \rho_\theta} + \frac{m - X_\theta^1}{1 - \rho} + \frac{W_\theta + \theta}{1 - \rho_\theta} \sum_{i,j} \frac{F_\theta^i \prod_q (\mu_q^2 - b_j^2(\theta))}{b_j(\theta)\,(\mu_i + b_j(\theta)) \prod_{q \ne j} (b_q^2(\theta) - b_j^2(\theta))}, \tag{3.10}$$

where $-b_i(\theta)$ are the roots of the equation $\Psi(s) = 1 - \frac{\lambda}{1 - \rho_\theta} \sum_i \frac{F_\theta^i}{s + \mu_i} = 0$. Let us define $\delta_\rho(\theta) = (1 - \rho)/(1 - \rho_\theta)$ and $\gamma(\theta) = \lambda/(1 - \rho_\theta)$. Then for the two phase job size distribution the function $\Psi(s)$ equals

$$\Psi(s) = \frac{s^2 + s\left(\mu_1 + \mu_2 - \gamma(\theta) \overline{F}(\theta)\right) + \mu_1 \mu_2\, \delta_\rho(\theta)}{(s + \mu_1)(s + \mu_2)}$$

and has two roots, $-b_1(\theta)$ and $-b_2(\theta)$, which are the solutions of the quadratic equation $s^2 + s(\mu_1 + \mu_2 - \gamma(\theta) \overline{F}(\theta)) + \mu_1 \mu_2\, \delta_\rho(\theta) = 0$. Then we know that

$$b_1(\theta) + b_2(\theta) = \mu_1 + \mu_2 - \gamma(\theta) \overline{F}(\theta), \tag{3.11}$$

$$b_1(\theta)\, b_2(\theta) = \mu_1 \mu_2\, \delta_\rho(\theta). \tag{3.12}$$

Simplifying expression (3.10) and using (3.11) and (3.12) we get expression (3.9), which proves the statement of the theorem. The same result can be obtained using the Laplace transform based method described in [Ban03].
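The simplification from (3.10) to (3.9) can be cross-checked numerically by evaluating both expressions for the same parameters. The sketch below does this for $N = 2$, taking $b_1(\theta)$ and $b_2(\theta)$ from the quadratic equation behind (3.11) and (3.12); the function name `tlps_two_phase` and the parameter values are assumptions made for the example.

```python
from math import exp, sqrt


def tlps_two_phase(lam, p, mu, theta):
    """Return (closed, general): formulas (3.9) and (3.10) evaluated for N = 2."""
    f_i = [pi * exp(-mi * theta) for pi, mi in zip(p, mu)]
    f_bar = sum(f_i)
    m = sum(pi / mi for pi, mi in zip(p, mu))
    rho = lam * m
    x1 = m - sum(fi / mi for fi, mi in zip(f_i, mu))
    x2 = (2.0 * sum(pi / mi ** 2 for pi, mi in zip(p, mu))
          - 2.0 * sum(fi / mi ** 2 for fi, mi in zip(f_i, mu))
          - 2.0 * theta * sum(fi / mi for fi, mi in zip(f_i, mu)))
    rho_t = lam * x1
    w = lam * x2 / (2.0 * (1.0 - rho_t))
    gamma = lam / (1.0 - rho_t)
    delta = (1.0 - rho) / (1.0 - rho_t)
    b_theta = 2.0 * lam * f_bar * (w + theta) / (1.0 - rho_t)
    head = (x1 + w * f_bar) / (1.0 - rho_t) + (m - x1) / (1.0 - rho)

    # Closed form (3.9).
    s = mu[0] + mu[1] - gamma * f_bar
    closed = head + b_theta * (mu[0] * mu[1] * (m - x1) ** 2 + delta * f_bar ** 2) / (
        2.0 * (1.0 - rho) * f_bar * s)

    # General form (3.10), with b_1 > b_2 obtained from (3.11) and (3.12).
    disc = sqrt(s * s - 4.0 * mu[0] * mu[1] * delta)
    b = [0.5 * (s + disc), 0.5 * (s - disc)]
    total = 0.0
    for j in range(2):
        num = (mu[0] ** 2 - b[j] ** 2) * (mu[1] ** 2 - b[j] ** 2)
        den = b[j] * (b[1 - j] ** 2 - b[j] ** 2)
        for i in range(2):
            total += f_i[i] * num / (den * (mu[i] + b[j]))
    general = head + (w + theta) / (1.0 - rho_t) * total
    return closed, general


# Illustrative parameters with load rho = 0.76.
for theta in (0.1, 0.5, 2.0):
    print(theta, tlps_two_phase(2.0, [0.9, 0.1], [5.0, 0.5], theta))
```

At $\theta = 0$ both expressions reduce to $m/(1 - \rho)$, the expected sojourn time of the standard PS queue, which gives a further sanity check.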

3.4.3 Optimal threshold approximation

We are interested in the minimization of the expected sojourn time function $\overline{T}(\theta)$ with respect to $\theta$. Of course, one can differentiate the exact analytic expression provided in Theorem 3.1 and set the result of the differentiation to zero. However, this gives a transcendental equation for the optimal value of the threshold. In order to find an approximate solution of $\overline{T}'(\theta) = d\overline{T}(\theta)/d\theta = 0$, we approximate the derivative $\overline{T}'(\theta)$ by some function $\widetilde{T}'(\theta)$ and obtain a solution of $\widetilde{T}'(\tilde{\theta}_{opt}) = 0$.

Since in the Internet connections belong to two distinct classes with very different sizes of transfer (see Section 3.3.1), to find the approximation of $\overline{T}'(\theta)$ we consider the particular case when $\frac{1}{\mu_1} \ll \frac{1}{\mu_2}$. Let us introduce a small parameter $\varepsilon$ such that

$$\mu_2 = \varepsilon \mu_1, \qquad p_1 = 1 - \frac{\varepsilon}{1 - \varepsilon}(m \mu_1 - 1), \qquad p_2 = \frac{\varepsilon}{1 - \varepsilon}(m \mu_1 - 1).$$

We note that when $\varepsilon \to 0$ the second moment of the job size distribution goes to infinity. We also note that the system has four free parameters. In particular, if we fix $\mu_1$, $\varepsilon$, $m$ and $\rho$, the other parameters $\mu_2$, $p_1$, $p_2$ and $\lambda$ are functions of the former parameters.

Lemma 3.1 The following inequality holds: $\mu_1\rho > \lambda$.

Proof. Since $p_1 > 0$ and $p_2 > 0$, we have $m\mu_1 > 1$ and $m > 1/\mu_1$. Taking into account that $\lambda m = \rho$, we get $\rho/\lambda > 1/\mu_1$. Consequently, we have that $\mu_1\rho > \lambda$.

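Proposition 3.1 The derivative of $\overline{T}(\theta)$ can be approximated by the following function:

$$\widetilde{T}'(\theta) = -e^{-\mu_1\theta}\mu_1 c_1 + e^{-\mu_2\theta}\mu_2 c_2,$$

where

$$c_1 = \frac{\rho\,(m\mu_1 - 1)}{\mu_1 (m\mu_1 - \rho)(1-\rho)}, \qquad c_2 = \frac{\rho\, m\,(m\mu_1 - 1)}{(m\mu_1 - \rho)^2}. \tag{3.13}$$

Namely, $\overline{T}'(\theta) - \widetilde{T}'(\theta) = O(\mu_2/\mu_1)$.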


Proof. Using the analytic expressions for both $\overline{T}'(\theta)$ and $\widetilde{T}'(\theta)$, we get the Taylor series for $\overline{T}'(\theta) - \widetilde{T}'(\theta)$ with respect to $\epsilon$, which shows that indeed

$$\overline{T}'(\theta) - \widetilde{T}'(\theta) = O(\epsilon).$$

Thus we have found an approximation of the derivative of $\overline{T}(\theta)$. Now we can find an approximation of the optimal threshold by solving the equation $\widetilde{T}'(\theta) = 0$.

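Theorem 3.2 Let $\theta_{opt}$ denote the optimal value of the threshold, namely $\theta_{opt} = \arg\min \overline{T}(\theta)$. The value $\tilde\theta_{opt}$ given by

$$\tilde\theta_{opt} = \frac{1}{\mu_1 - \mu_2} \ln\left( \frac{\mu_1 - \lambda}{\mu_2 (1-\rho)} \right) \tag{3.14}$$

approximates $\theta_{opt}$ so that $\overline{T}'(\tilde\theta_{opt}) = o(\mu_2/\mu_1)$.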

Proof. Solving the equation $\widetilde{T}'(\theta) = 0$, we get an analytic expression for the approximation of the optimal threshold:

$$\tilde\theta_{opt} = -\frac{1}{\mu_1(1-\epsilon)} \ln\left( \epsilon\, \frac{\mu_1 (1-\rho)}{\mu_1 - \lambda} \right) = \frac{1}{\mu_1 - \mu_2} \ln\left( \frac{\mu_1 - \lambda}{\mu_2 (1-\rho)} \right).$$

Let us show that the above threshold approximation is greater than zero. We have to show that $\frac{\mu_1 - \lambda}{\mu_2(1-\rho)} > 1$. Since $\mu_1 > \mu_2$ and $\mu_1\rho > \lambda$ (see Lemma 3.1), we have

$$\mu_1 > \mu_2 \;\Longrightarrow\; \mu_1(1-\rho) > \mu_2(1-\rho) \;\Longrightarrow\; \lambda < \mu_1\rho < \mu_1 - \mu_2(1-\rho) \;\Longrightarrow\; \mu_1 - \lambda > \mu_2(1-\rho).$$

Expanding $\overline{T}'(\tilde\theta_{opt})$ as a power series with respect to $\epsilon$ gives

$$\overline{T}'(\tilde\theta_{opt}) = \epsilon^2 \left( const_0 + const_1 \ln\epsilon + const_2 \ln^2\epsilon \right),$$

where $const_i$, $i = 0, 1, 2$, are constant with respect to $\epsilon$. (The expressions of $const_i$ are cumbersome and can be found using the Maple command series.)

Thus,

$$\overline{T}'(\tilde\theta_{opt}) = o(\epsilon) = o(\mu_2/\mu_1),$$

which completes the proof.

From formula (3.14) we can see that $\tilde\theta_{opt}$ is of the same order as $\frac{1}{\mu_1}\ln(1/\epsilon)$. As a consequence, $\tilde\theta_{opt}$ goes to infinity when $\epsilon \to 0$. Formula (3.14) also indicates that the value of the threshold should be chosen between $1/\mu_1$ and $1/\mu_2$. In the next proposition we characterize the limiting behavior of $\overline{T}(\theta_{opt})$ and $\overline{T}(\tilde\theta_{opt})$ as $\epsilon \to 0$. In particular, we show that $\overline{T}(\tilde\theta_{opt})$ tends to the exact minimum of $\overline{T}(\theta)$ when $\epsilon \to 0$.
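The step from Proposition 3.1 to formula (3.14) is only stated in the proof above. As a reading aid, the algebra can be spelled out as follows; this is a sketch that uses only the constants $c_1$, $c_2$ from (3.13) and the relation $\lambda m = \rho$.

```latex
\begin{align*}
\widetilde{T}'(\theta) = 0
  &\iff e^{(\mu_1-\mu_2)\theta} = \frac{\mu_1 c_1}{\mu_2 c_2}, \\
\frac{\mu_1 c_1}{\mu_2 c_2}
  &= \frac{\rho\,(m\mu_1-1)}{(m\mu_1-\rho)(1-\rho)}
     \cdot \frac{(m\mu_1-\rho)^2}{\mu_2\,\rho\,m\,(m\mu_1-1)}
   = \frac{m\mu_1-\rho}{\mu_2\,m\,(1-\rho)}
   = \frac{\mu_1-\lambda}{\mu_2\,(1-\rho)}, \\
\tilde\theta_{opt}
  &= \frac{1}{\mu_1-\mu_2}\,
     \ln\!\left(\frac{\mu_1-\lambda}{\mu_2\,(1-\rho)}\right).
\end{align*}
```

The last equality in the second line divides numerator and denominator by $m$ and uses $\rho/m = \lambda$.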

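Proposition 3.2

$$\lim_{\epsilon\to 0} \overline{T}(\theta_{opt}) = \lim_{\epsilon\to 0} \overline{T}(\tilde\theta_{opt}) = \frac{m}{1-\rho} - c_1,$$

where $c_1$ is given by (3.13).

Proof. We find the following limit as $\epsilon \to 0$:

$$\lim_{\epsilon\to 0} \overline{T}(\theta) = \frac{m}{1-\rho} - c_1 + c_1 e^{-\mu_1\theta},$$

where $c_1$ is given by (3.13). Since $\lim_{\epsilon\to 0} \overline{T}(\theta)$ is a decreasing function of $\theta$, the optimal threshold for it is $\theta_{opt} = \infty$. Thus,

$$\lim_{\epsilon\to 0} \overline{T}(\theta_{opt}) = \lim_{\theta\to\infty} \lim_{\epsilon\to 0} \overline{T}(\theta) = \frac{m}{1-\rho} - c_1.$$

On the other hand, we obtain

$$\lim_{\epsilon\to 0} \overline{T}(\tilde\theta_{opt}) = \frac{m}{1-\rho} - c_1,$$

which proves the proposition.

Let us denote by

$$g(\rho) = \frac{\overline{T}^{PS} - \overline{T}(\tilde\theta_{opt})}{\overline{T}^{PS}}$$

the relative gain of the TLPS system with the optimal threshold approximation (3.14) with respect to the PS system. In the next proposition we study the limiting behavior of $g(\rho)$ when $\epsilon \to 0$ and when the load of the system $\rho \to 1$.

Proposition 3.3 The gain of the TLPS system with the value of the threshold as in (3.14), relative to the standard PS system, has the following properties:

$$\lim_{\epsilon\to 0} g(\rho) = \frac{\overline{T}^{PS} - \lim_{\epsilon\to 0} \overline{T}(\tilde\theta_{opt})}{\overline{T}^{PS}} = \frac{\rho\,(m\mu_1 - 1)}{m\mu_1 (m\mu_1 - \rho)},$$

$$\lim_{\rho\to 1} \lim_{\epsilon\to 0} g(\rho) = \frac{1}{m\mu_1}.$$

The limit $\lim_{\epsilon\to 0} g(\rho)$ is an increasing function of $\rho$.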


Proof. Follows from the previous derivations.

One can see that the limit $\lim_{\epsilon\to 0} g(\rho)$ can be made arbitrarily close to one by choosing $\mu_1$ sufficiently close to $1/m$ and the load sufficiently close to one. This in turn implies that the ratio $\lim_{\epsilon\to 0} \overline{T}(\tilde\theta_{opt})/\overline{T}^{PS}$ can be made as close to zero as one wants. This is a striking result, as it shows that the performance of the TLPS system can be arbitrarily better than the performance of the PS system for some selection of the parameters. However, in the next section we show with numerical results that this set of parameters is very small, and that for realistic parameters the gain is of the order of 50%.

3.4.4 Numerical results

For the plots in Figures 3.1-3.2 we use the following parameters: $\rho = 0.909$, $m = 1.818$, $\mu_1 = 1$, $\mu_2 = 0.1$, so $\lambda = 0.5$ and $\epsilon = \mu_2/\mu_1 = 0.1$. Then $p_1 = 0.909$ and $p_2 = 0.0909$.

In Figure 3.1 we plot $\overline{T}(\theta)$, $\overline{T}^{PS}$ and $\overline{T}(\tilde\theta_{opt})$. We note that the expected sojourn time in the standard PS system, $\overline{T}^{PS}$, is equal to $\overline{T}(0)$. We observe that $\overline{T}(\tilde\theta_{opt})$ corresponds well to the optimum even though $\epsilon = 0.1$ is not too small.

Let us now study the gain that we obtain using TLPS with $\theta = \tilde\theta_{opt}$ in comparison with the standard PS. To this end, we plot the ratio $g(\rho) = \frac{\overline{T}^{PS} - \overline{T}(\tilde\theta_{opt})}{\overline{T}^{PS}}$ in Figure 3.2. The gain in system performance with TLPS in comparison with PS strongly depends on $\rho$, the load of the system. One can see that the gain of the TLPS system with respect to the standard PS system goes up to 45% as the load of the system increases.

To study the sensitivity of the TLPS system with respect to $\theta$, we also plot in Figure 3.2 the ratios

$$g_1(\rho) = \frac{\overline{T}^{PS} - \overline{T}(\tfrac{3}{2}\tilde\theta_{opt})}{\overline{T}^{PS}} \quad\text{and}\quad g_2(\rho) = \frac{\overline{T}^{PS} - \overline{T}(\tfrac{1}{2}\tilde\theta_{opt})}{\overline{T}^{PS}}.$$

Thus, even with a 50% error in the value of $\tilde\theta_{opt}$, the system performance is close to optimal. One can see that it is beneficial to use TLPS instead of PS in case of heavy and moderately heavy loads. We also observe that the TLPS system is not too sensitive to the choice of the threshold near its optimal value when the job size distribution is hyper-exponential with two phases. Nevertheless, it is better to choose larger rather than smaller values of the threshold.
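As an illustration of how the four free parameters determine the rest of the model and the threshold (3.14), the following minimal Python sketch recomputes the numbers quoted above. The function names are hypothetical and chosen here for illustration; only the formulas for $\mu_2$, $p_1$, $p_2$, $\lambda m = \rho$ and (3.14) come from the text.

```python
import math


def hyperexp_params(mu1, eps, m, rho):
    """Derive mu2, p1, p2 and lambda from the four free parameters
    (mu1, eps, m, rho) of Section 3.4.3."""
    mu2 = eps * mu1
    p2 = eps * (m * mu1 - 1.0) / (1.0 - eps)
    p1 = 1.0 - p2
    lam = rho / m  # follows from lambda * m = rho
    return mu2, p1, p2, lam


def approx_optimal_threshold(mu1, mu2, lam, rho):
    """Approximate optimal TLPS threshold, formula (3.14)."""
    ratio = (mu1 - lam) / (mu2 * (1.0 - rho))
    return math.log(ratio) / (mu1 - mu2)


# Parameters used for Figures 3.1-3.2.
mu1, eps, m, rho = 1.0, 0.1, 1.818, 0.909
mu2, p1, p2, lam = hyperexp_params(mu1, eps, m, rho)
theta_opt = approx_optimal_threshold(mu1, mu2, lam, rho)
print(f"mu2={mu2:.3f} p1={p1:.4f} p2={p2:.4f} lambda={lam:.3f}")
print(f"approximate optimal threshold: {theta_opt:.3f}")
```

With these inputs the threshold lies between $1/\mu_1 = 1$ and $1/\mu_2 = 10$, in line with the remark following Theorem 3.2.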

3.4.5 Simulation results

Using the NS-2 simulator we implemented an algorithm based on the TLPS scheduling scheme and provide experimental results for the case of the two-phase hyper-exponential job size distribution. The algorithm is implemented in the router queue. In the router we keep track of the attained service of all flows in the system. The record of a flow is kept for some time after the router stops receiving packets from this flow. Every time a packet has to be dequeued from the bottleneck router, the router checks whether the first packet in the queue belongs to a flow which has already received $\theta$ amount of service. If the flow has not received $\theta$ amount of service, the packet is served; otherwise, the next packet is considered. If the queue contains no packets belonging to flows which have not yet received $\theta$ amount of service, the first packet in the queue is served.

[Figure 3.1: $\overline{T}(\theta)$ - solid line, $\overline{T}^{PS}$ - dash dot line, $\overline{T}(\tilde\theta_{opt})$ - dash line; horizontal axis: $\theta$.]

[Figure 3.2: $g(\rho)$ - solid line, $g_1(\rho)$ - dash line, $g_2(\rho)$ - dash dot line; horizontal axis: $\rho$.]

In [AABN04] the authors provide RuN2C, a scheduling algorithm based on the TLPS scheme. RuN2C takes the packet service decision according to the packet sequence number. In the current work we do not use the packet sequence numbers to take the scheduling decision, but we keep track of the attained service of every flow in the system.

The simulation topology is the following. The files are generated by FTP sources which are connected to TCP senders. All TCP senders send files to the TCP destination nodes over the same bottleneck link. Every FTP source belongs to one of two sending classes in the system. Each class $i$, $i = 1, 2$, sends files according to a Poisson process with rate $\lambda_i$ and has an exponential file size distribution with mean $m_i$. We assume that all connections have the same propagation delay. The bottleneck capacity is $\mu$. We apply the TLPS scheduling algorithm to schedule the packets in the queue of the bottleneck link.

The proposed scheme is equivalent to the case of the hyper-exponential job size distribution with two phases, where $\lambda = \lambda_1 + \lambda_2$, $p_1 = \lambda_1/\lambda$, $p_2 = \lambda_2/\lambda$, $1/\mu_1 = m_1/\mu$, $1/\mu_2 = m_2/\mu$, $\rho = \lambda\,(p_1/\mu_1 + p_2/\mu_2)$. Then we can use the approximated value of the optimal threshold given by (3.14),

$$\tilde\theta_{opt} = \frac{1}{\mu_1 - \mu_2} \ln\left( \frac{\mu_1 - \lambda}{\mu_2 (1-\rho)} \right).$$

After we have found the approximated value of the optimal threshold for the analytical model, we have to multiply $\tilde\theta_{opt}$ by the bottleneck capacity to get the real threshold value used in the simulations, $\tilde\theta^{simul}_{opt} = \tilde\theta_{opt}\,\mu$.
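The dequeue rule described at the beginning of this subsection can be sketched in a few lines of Python. This is an illustrative model only, not the NS-2 implementation: the class name `TLPSQueue`, the packet representation as `(flow_id, size)` pairs, and the choice to update the attained service at dequeue time are assumptions made here, and the expiry of per-flow records after a period of inactivity is omitted.

```python
from collections import defaultdict, deque


class TLPSQueue:
    """Toy model of the threshold-based dequeue rule (hypothetical API)."""

    def __init__(self, theta):
        self.theta = theta                # service threshold per flow
        self.queue = deque()              # (flow_id, size) in arrival order
        self.attained = defaultdict(int)  # attained service per flow

    def enqueue(self, flow_id, size=1):
        self.queue.append((flow_id, size))

    def dequeue(self):
        if not self.queue:
            return None
        # Look for the first packet whose flow is still below the threshold.
        for idx, (flow_id, _size) in enumerate(self.queue):
            if self.attained[flow_id] < self.theta:
                break
        else:
            # Every queued flow has reached theta: serve the head of the queue.
            idx = 0
        flow_id, size = self.queue[idx]
        del self.queue[idx]
        self.attained[flow_id] += size
        return flow_id, size


# Flow "A" exceeds theta = 2 after two packets, so "B" overtakes its third one.
q = TLPSQueue(theta=2)
for flow in ["A", "A", "A", "B"]:
    q.enqueue(flow)
served = [q.dequeue()[0] for _ in range(4)]
print(served)
```

The linear scan keeps the sketch close to the textual description; a real router implementation would more likely keep two queues, one per priority level, to avoid scanning.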

For the simulations we select the following system parameters. The bottleneck capacity is $\mu = 100$ Mbit/s. All the connections have a Maximum Segment Size (MSS) of 540 B. The propagation delay of every link equals 2 ms. The duration of every simulation is 2000 s. The other system parameters and the approximated value of the optimal threshold are given in Table 3.1. In the current simulation model the short flows take $\rho_1 = 0.25$ and the long flows $\rho_2 = 0.61$ of the total bottleneck capacity. The total load in the system is $\rho = 0.86$.

| $m_1$ | $m_2$ | $\lambda_1$ | $\lambda_2$ | $\rho_1$ | $\rho_2$ | $\rho$ | $\tilde\theta^{simul}_{opt}$ |
|---|---|---|---|---|---|---|---|
| 1157 MSS | 11574 MSS | 5.0 | 1.22 | 0.25 | 0.61 | 0.86 | 7638 MSS |

Table 3.1 Simulation parameters

We compare the mean sojourn time in the system under the TLPS, LAS and DropTail policies. For the TLPS policy we provide the simulation results for different values of $\theta$, which is varied from 1157 MSS to 80000 MSS. The results are presented in Figure 3.3. As one can see, the approximated value of the optimal threshold, $\tilde\theta^{simul}_{opt} = 7.6 \times 10^3$ MSS, minimizes the mean sojourn time in the TLPS system among the tested threshold values. The mean sojourn time in the TLPS system is very close to the optimal value of the mean sojourn time achieved with the LAS policy. The maximal relative gain achieved with the TLPS policy when $\theta = \tilde\theta^{simul}_{opt}$ in comparison with the DropTail policy equals 35.7%, while the relative gain of the optimal LAS policy in comparison with the DropTail policy is 36.7%.

[Figure 3.3: Mean response time in the system (s) as a function of $\theta$ (in $10^3$ MSS): TLPS - solid line with stars, DropTail - dash line, LAS - dash dot line.]


3.5 Hyper-exponential job size distribution with more than two phases

3.5.1 Notation and motivation

In the second part of the present work we analyze the TLPS discipline with the hyper-exponential job size distribution with more than two phases. Using a hyper-exponential distribution with more than two phases we obtain a more realistic representation of the file size distribution in the Internet. In particular, it was shown in [BM06, KSH03, FW98] that the hyper-exponential distribution with a significant number of phases models the file size distribution in the Internet well. Thus, we will use

$$F(x) = 1 - \sum_{i=1}^{N} p_i e^{-\mu_i x}, \qquad \sum_{i=1}^{N} p_i = 1, \qquad \mu_i > 0,\ p_i \ge 0,\ i = 1, \ldots, N, \qquad 1 < N \le \infty.$$

It appears that in the case of many phases finding an explicit expression for the optimal threshold value is quite a challenging problem. In order to deal with a general hyper-exponential distribution we proceed with the derivation of a tight upper bound on the expected sojourn time function. The upper bound has a simple expression in terms of the system parameters and lends itself to efficient numerical optimization.

In the following we write simply $\sum_i$ instead of $\sum_{i=1}^{N}$. The mean job size $m$, the second moment $d$, and the parameters $F_\theta^i$, $X_\theta^1$, $X_\theta^2$ and $\rho_\theta$ are defined as in Section 3.3.1 and Section 3.3.2 by formulas (3.2), (3.3), (3.4), (3.5) for any $1 \le N \le \infty$. The formulas presented in Section 3.3.2 can still be used to calculate $b(\theta)$, $B(x)$, $W_\theta$, $\gamma(\theta)$, $\delta_\rho(\theta)$, $\overline{T}^{TLPS}(x)$, $\overline{T}(\theta)$. We also need the following operator notations:

$$\Phi_1(\beta(x)) = \gamma(\theta) \int_0^\infty \beta(y)\,\overline{F}(x+y+\theta)\,dy + \gamma(\theta) \int_0^x \beta(y)\,\overline{F}(x-y+\theta)\,dy,$$

$$\Phi_2(\beta(x)) = \int_0^\infty \beta(y)\,\overline{F}(y+\theta)\,dy,$$

for any function $\beta(x)$. In particular, for some given constant $c$, we have

$$\Phi_1(c) = c\,\gamma(\theta)(m - X_\theta^1) = c\,q, \tag{3.15}$$

$$\Phi_2(c) = c\,(m - X_\theta^1), \tag{3.16}$$

where

$$q = \gamma(\theta)(m - X_\theta^1) = \frac{\lambda (m - X_\theta^1)}{1-\rho_\theta} = \frac{\rho - \rho_\theta}{1-\rho_\theta} < 1. \tag{3.17}$$

The integral equation (3.6) can now be rewritten in the form

$$\alpha'(x) = \Phi_1(\alpha'(y)) + \frac{b(\theta)}{\overline{F}(\theta)}\,\overline{F}(x+\theta) + 1, \tag{3.18}$$

and equation (3.8) for $\overline{T}^{BPS}(\theta)$ takes the form

$$\overline{T}^{BPS}(\theta) = \Phi_2(\alpha'(x)). \tag{3.19}$$

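3.5.2 Linear system based solution

Using the Laplace transform based method described in [Ban03] we prove the following proposition.

Proposition 3.4 The following formula holds:

$$\overline{T}^{BPS}(\theta) = \sum_i F_\theta^i L_i, \tag{3.20}$$

with

$$L_i = L_i^* + \frac{1}{\delta_\rho(\theta)\,\mu_i}, \qquad i = 1, \ldots, N,$$

where the $L_i^*$, $i = 1, \ldots, N$, are the solution of the linear system

$$L_p^* \left( 1 - \gamma(\theta) \sum_i \frac{F_\theta^i}{\mu_p + \mu_i} \right) = \gamma(\theta) \sum_i \frac{F_\theta^i L_i^*}{\mu_p + \mu_i} + \frac{b(\theta)}{\overline{F}(\theta)} \sum_i \frac{F_\theta^i}{\mu_p + \mu_i}, \qquad p = 1, \ldots, N. \tag{3.21}$$

Proof. To find $\overline{T}^{BPS}(\theta)$ we need to solve the integral equation (3.6). Let us recall that $\gamma(\theta) = \lambda/(1-\rho_\theta)$; then we can rewrite (3.6) in the following way:

$$\alpha'(x) = \gamma(\theta) \int_0^\infty \alpha'(y)\,\overline{F}(x+y+\theta)\,dy + \gamma(\theta) \int_0^x \alpha'(y)\,\overline{F}(x-y+\theta)\,dy + b(\theta)B(x) + 1,$$

$$\alpha'(x) = \gamma(\theta) \sum_i F_\theta^i e^{-\mu_i x} \int_0^\infty \alpha'(y) e^{-\mu_i y}\,dy + \gamma(\theta) \int_0^x \alpha'(y)\,\overline{F}(x-y+\theta)\,dy + b(\theta)B(x) + 1.$$

We note that in the latter equation $\int_0^\infty \alpha'(y) e^{-\mu_i y}\,dy$, $i = 1, \ldots, N$, are the Laplace transforms of $\alpha'(y)$ evaluated at $\mu_i$, $i = 1, \ldots, N$. Denote by $L_{\alpha'}(s) = \int_0^\infty \alpha'(x) e^{-sx}\,dx$ the Laplace transform of $\alpha'(x)$ and let $L_i = L_{\alpha'}(\mu_i)$, $i = 1, \ldots, N$. Then we have

$$\alpha'(x) = \gamma(\theta) \sum_i F_\theta^i L_i e^{-\mu_i x} + \gamma(\theta) \int_0^x \alpha'(y)\,\overline{F}(x-y+\theta)\,dy + b(\theta)B(x) + 1.$$

Now taking the Laplace transform of the above equation and using the convolution property, we get

$$L_{\alpha'}(s) = \gamma(\theta) \sum_i \frac{F_\theta^i L_i}{s+\mu_i} + \gamma(\theta) \sum_i \frac{F_\theta^i L_{\alpha'}(s)}{s+\mu_i} + \frac{b(\theta)}{\overline{F}(\theta)} \sum_i \frac{F_\theta^i}{s+\mu_i} + \frac{1}{s}$$

$$\Longrightarrow\quad L_{\alpha'}(s) \left( 1 - \gamma(\theta) \sum_i \frac{F_\theta^i}{s+\mu_i} \right) = \gamma(\theta) \sum_i \frac{F_\theta^i L_i}{s+\mu_i} + \frac{b(\theta)}{\overline{F}(\theta)} \sum_i \frac{F_\theta^i}{s+\mu_i} + \frac{1}{s}.$$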


Then we substitute $s = \mu_p$, $p = 1, \ldots, N$, into the above equation and get $L_i$, $i = 1, \ldots, N$, as a solution of the linear system

$$L_p \left( 1 - \gamma(\theta) \sum_i \frac{F_\theta^i}{\mu_p + \mu_i} \right) = \gamma(\theta) \sum_i \frac{F_\theta^i L_i}{\mu_p + \mu_i} + \frac{b(\theta)}{\overline{F}(\theta)} \sum_i \frac{F_\theta^i}{\mu_p + \mu_i} + \frac{1}{\mu_p}, \qquad p = 1, \ldots, N.$$

If now we set $L_p = L_p^* + \frac{1}{\delta_\rho(\theta)\,\mu_p}$, $p = 1, \ldots, N$, then the $L_p^*$ are the solutions of the linear system (3.21). Next we need to calculate $\overline{T}^{BPS}(\theta)$:

$$\overline{T}^{BPS}(\theta) = \int_0^\infty \alpha'(x)\,\overline{F}(x+\theta)\,dx = \int_0^\infty \alpha'(x) \sum_i F_\theta^i e^{-\mu_i x}\,dx = \sum_i F_\theta^i L_i.$$

Finally, we have (3.20).

Unfortunately, the system (3.21) does not seem to have a tractable finite form analytic solution. Therefore, in the ensuing subsections we propose an alternative solution based on an operator series and construct a tight upper bound.
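The proof above states without detail that the shift $L_p = L_p^* + 1/(\delta_\rho(\theta)\mu_p)$ turns the system for $L_p$ into (3.21). A short check of this step is sketched below; it relies on the identities $q = \gamma(\theta)\sum_i F_\theta^i/\mu_i$ (used in the Appendix) and $1 - q = \delta_\rho(\theta)$, which follows from (3.17) and the definition of $\delta_\rho(\theta)$.

```latex
\begin{align*}
&\text{Extra terms produced by the shift, after multiplying by }
  \delta_\rho(\theta)\,\mu_p: \\
&\qquad \text{left side: } 1 - \gamma(\theta)\sum_i \frac{F_\theta^i}{\mu_p+\mu_i},
 \qquad
 \text{right side: } \gamma(\theta)\sum_i \frac{F_\theta^i\,\mu_p}{\mu_i(\mu_p+\mu_i)}
   + \delta_\rho(\theta). \\
&\text{Since }
 \frac{1}{\mu_p+\mu_i} + \frac{\mu_p}{\mu_i(\mu_p+\mu_i)} = \frac{1}{\mu_i},
 \text{ the two sides agree if and only if} \\
&\qquad 1 - \gamma(\theta)\sum_i \frac{F_\theta^i}{\mu_i} = \delta_\rho(\theta)
 \iff 1 - q = \frac{1-\rho}{1-\rho_\theta},
\end{align*}
```

which holds by (3.17). The term $1/\mu_p$ therefore cancels and the remaining terms are exactly (3.21).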

3.5.3 Operator series form for the expected sojourn time

Since the operator $\Phi_1$ is a contraction [AABN04, AAB05], we can iterate equation (3.18) starting from some initial point $\alpha_0'$. The initial point can be simply a constant. As shown in [AABN04, AAB05], the iterations converge to the unique solution of (3.18). Specifically, we make iterations in the following way:

$$\alpha_{n+1}'(x) = \Phi_1(\alpha_n'(x)) + \frac{b(\theta)}{\overline{F}(\theta)}\,\overline{F}(x+\theta) + 1, \qquad n = 0, 1, 2, \ldots. \tag{3.22}$$

At every iteration step we construct the following approximation of $\overline{T}^{BPS}(\theta)$ according to (3.19):

$$\overline{T}^{BPS}_{n+1}(\theta) = \Phi_2(\alpha_{n+1}'(x)). \tag{3.23}$$

Using (3.22) and (3.23), we construct the operator series expression for the expected sojourn time in the TLPS system.

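Theorem 3.3 The expected sojourn time $\overline{T}(\theta)$ in the TLPS system with the hyper-exponential job size distribution is given by

$$\overline{T}(\theta) = \frac{X_\theta^1 + W_\theta \overline{F}(\theta)}{1-\rho_\theta} + \frac{m - X_\theta^1}{1-\rho} + \frac{b(\theta)}{\overline{F}(\theta)(1-\rho_\theta)}\, \Phi_2\!\left( \sum_{i=0}^{\infty} \Phi_1^i\big(\overline{F}(x+\theta)\big) \right). \tag{3.24}$$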

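Proof. From (3.22) we have

$$\alpha_n'(x) = q^n \alpha_0' + \frac{b(\theta)}{\overline{F}(\theta)} \sum_{i=1}^{n-1} \Phi_1^i\big(\overline{F}(x+\theta)\big) + \sum_{i=1}^{n-1} q^i + \frac{b(\theta)}{\overline{F}(\theta)}\,\overline{F}(x+\theta) + 1,$$

and then from (3.23) and (3.15) it follows that

$$\overline{T}^{BPS}_n(\theta) = (m - X_\theta^1) \left( q^n \alpha_0' + \sum_{i=0}^{n-1} q^i \right) + \frac{b(\theta)}{\overline{F}(\theta)}\, \Phi_2\!\left( \sum_{i=0}^{n-1} \Phi_1^i\big(\overline{F}(x+\theta)\big) \right).$$

Using the facts (see (3.17)):

1. $q < \rho < 1 \Longrightarrow q^n \to 0$ as $n \to \infty$,
2. $\sum_{i=0}^{\infty} q^i = \frac{1}{1-q} = \frac{1-\rho_\theta}{1-\rho}$,

we conclude that

$$\overline{T}^{BPS}(\theta) = \lim_{n\to\infty} \overline{T}^{BPS}_n(\theta) = (m - X_\theta^1)\, \frac{1-\rho_\theta}{1-\rho} + \frac{b(\theta)}{\overline{F}(\theta)} \sum_{i=0}^{\infty} \Phi_2\big( \Phi_1^i(\overline{F}(x+\theta)) \big).$$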

Finally, using (3.7) we obtain (3.24).

The resulting formula (3.24) is still difficult to analyze. Therefore, in the next subsection we use (3.24) to find an approximation, which is also an upper bound, of the expected sojourn time function in a more explicit form.

3.5.4 Upper bound for the expected sojourn time

Let us start with auxiliary results.

Lemma 3.2 For any function $\beta(x) \ge 0$ with $\beta_j = \int_0^\infty \beta(x) e^{-\mu_j x}\,dx$, if

$$\frac{d(\beta_j \mu_j)}{d\mu_j} \ge 0, \qquad j = 1, \ldots, N,$$

it follows that

$$\Phi_2(\Phi_1(\beta(x))) \le q\,\Phi_2(\beta(x)).$$

Proof. See Appendix.

Lemma 3.3 For the TLPS system with the hyper-exponential job size distribution the following statement holds:

$$\Phi_2\big(\Phi_1(\alpha'(x))\big) \le q\,\Phi_2\big(\alpha'(x)\big). \tag{3.25}$$

Proof. We define $\alpha_j' = \int_0^\infty \alpha'(x) e^{-\mu_j x}\,dx$, $j = 1, \ldots, N$. As was shown in [Osi08a], $\alpha'(x)$ has the following structure:

$$\alpha'(x) = a_0 + \sum_k a_k e^{-b_k x}, \qquad a_0 \ge 0,\ a_k \ge 0,\ b_k > 0, \qquad k = 1, \ldots, N.$$

Then we have that $\alpha'(x) \ge 0$ and

$$\alpha_j' = \frac{a_0}{\mu_j} + \sum_k \frac{a_k}{b_k + \mu_j}, \qquad j = 1, \ldots, N,$$

$$\Longrightarrow\quad \frac{d(\alpha_j' \mu_j)}{d\mu_j} = \sum_k \frac{a_k}{b_k + \mu_j} - \sum_k \frac{a_k \mu_j}{(b_k + \mu_j)^2} = \sum_k \frac{a_k b_k}{(b_k + \mu_j)^2} \ge 0, \qquad j = 1, \ldots, N,$$

as $a_k \ge 0$, $b_k > 0$, $k = 1, \ldots, N$. So, according to Lemma 3.2, we have (3.25).

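Let us state the following theorem.

Theorem 3.4 An upper bound for the expected sojourn time function $\overline{T}(\theta)$ in the TLPS system with the hyper-exponential job size distribution function with many phases is given by $\Upsilon(\theta)$:

$$\overline{T}(\theta) \le \Upsilon(\theta) = \frac{X_\theta^1 + W_\theta \overline{F}(\theta)}{1-\rho_\theta} + \frac{m - X_\theta^1}{1-\rho} + \frac{b(\theta)}{\overline{F}(\theta)(1-\rho)} \sum_{i,j} \frac{F_\theta^i F_\theta^j}{\mu_i + \mu_j}. \tag{3.26}$$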

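Proof. According to the recursion (3.22), we consider $\tilde\alpha'(x)$ as a candidate for the approximation of $\alpha'(x)$. Namely, $\tilde\alpha'(x)$ satisfies the following equation:

$$\tilde\alpha'(x) = \tilde\alpha'(x)\,\Phi_1(1) + \frac{b(\theta)}{\overline{F}(\theta)}\,\overline{F}(x+\theta) + 1.$$

Then, using (3.15), we can find the analytic expression for $\tilde\alpha'(x)$:

$$\tilde\alpha'(x) = q\,\tilde\alpha'(x) + \frac{b(\theta)}{\overline{F}(\theta)}\,\overline{F}(x+\theta) + 1 \quad\Longrightarrow\quad \tilde\alpha'(x) = \frac{1}{1-q} \left( \frac{b(\theta)}{\overline{F}(\theta)}\,\overline{F}(x+\theta) + 1 \right).$$

We take $\Upsilon^{BPS}(\theta) = \Phi_2(\tilde\alpha'(x))$ as an approximation for $\overline{T}^{BPS}(\theta) = \Phi_2(\alpha'(x))$. Then

$$\Upsilon^{BPS}(\theta) = \Phi_2(\tilde\alpha'(x)) = \frac{m - X_\theta^1}{1-q} + \frac{b(\theta)}{\overline{F}(\theta)(1-q)}\, \Phi_2\big(\overline{F}(x+\theta)\big) = \frac{m - X_\theta^1}{1-q} + \frac{b(\theta)}{\overline{F}(\theta)(1-q)} \sum_{i,j} \frac{F_\theta^i F_\theta^j}{\mu_i + \mu_j}.$$

Let us prove that $\overline{T}^{BPS}(\theta) \le \Upsilon^{BPS}(\theta)$, or equivalently

$$\overline{T}^{BPS}(\theta) - \Upsilon^{BPS}(\theta) = \Phi_2(\alpha'(x)) - \Phi_2(\tilde\alpha'(x)) \le 0.$$

Let us look at

$$\Phi_2(\alpha'(x)) - \Phi_2(\tilde\alpha'(x)) = \Phi_2(\Phi_1(\alpha'(x))) + \Phi_2\!\left( \frac{b(\theta)}{\overline{F}(\theta)}\,\overline{F}(x+\theta) + 1 \right) - \left( q\,\Phi_2(\tilde\alpha'(x)) + \Phi_2\!\left( \frac{b(\theta)}{\overline{F}(\theta)}\,\overline{F}(x+\theta) + 1 \right) \right)$$

$$= \Phi_2(\Phi_1(\alpha'(x))) - q\,\Phi_2(\alpha'(x)) + q \big( \Phi_2(\alpha'(x)) - \Phi_2(\tilde\alpha'(x)) \big)$$

$$\Longrightarrow\quad \Phi_2(\alpha'(x)) - \Phi_2(\tilde\alpha'(x)) = \frac{1}{1-q} \big( \Phi_2(\Phi_1(\alpha'(x))) - q\,\Phi_2(\alpha'(x)) \big).$$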

Now from Lemma 3.3 and formula (3.7) we conclude that (3.26) is true.

In this subsection we found the analytic expression of the upper bound of the expected sojourn time in the case when the job size distribution is a hyper-exponential function with many phases. In the experimental results of the following subsection we show that the obtained upper bound is also a close approximation. The analytic expression of the upper bound is clearer and easier to analyze than the expression (3.20) for the expected sojourn time. It can be used for efficient numerical optimization of the TLPS performance.

3.5.5 Numerical results

We calculate $\overline{T}(\theta)$ and $\Upsilon(\theta)$ for different numbers of phases $N$ of the job size distribution function. We take $N = 10, 100, 500, 1000$. To calculate $\overline{T}(\theta)$ we find the numerical solution of the system of linear equations (3.21) using the Gauss method. Then, using the result of Proposition 3.4, we find $\overline{T}(\theta)$. For $\Upsilon(\theta)$ we use equation (3.26).

As was mentioned in Subsection 3.3.1, by using the hyper-exponential distribution with many phases one can approximate a heavy-tailed distribution. In our numerical experiments we fix $\rho$, $m$, and select $p_i$ and $\mu_i$ in such a way that by increasing the number of phases we let the second moment $d$ (see (3.2)) increase as well. Here we take

$$\rho = 0.909, \qquad m = 1.818, \qquad p_i = \frac{\nu}{i^{2.5}}, \qquad \mu_i = \frac{\eta}{i^{1.2}}, \qquad i = 1, \ldots, N.$$

In particular, we have

$$\sum_i p_i = 1 \;\Longrightarrow\; \nu = \frac{1}{\sum_i i^{-2.5}}, \qquad \sum_i \frac{p_i}{\mu_i} = m \;\Longrightarrow\; \eta = \frac{\nu}{m} \sum_i i^{-1.3}.$$
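The normalization constants $\nu$ and $\eta$ above are easy to evaluate numerically. The following Python sketch does so for a given number of phases; the function name `phase_parameters` is a hypothetical helper introduced here, and the code only implements the two normalization conditions stated in the text.

```python
def phase_parameters(n_phases, m=1.818):
    """Return (nu, eta, p, mu) for p_i = nu / i**2.5 and mu_i = eta / i**1.2,
    normalized so that sum(p_i) = 1 and sum(p_i / mu_i) = m."""
    idx = range(1, n_phases + 1)
    nu = 1.0 / sum(i ** -2.5 for i in idx)
    eta = (nu / m) * sum(i ** -1.3 for i in idx)
    p = [nu / i ** 2.5 for i in idx]
    mu = [eta / i ** 1.2 for i in idx]
    return nu, eta, p, mu


nu, eta, p, mu = phase_parameters(10)
print(f"N=10: nu={nu:.4f}, eta={eta:.4f}")
print(f"sum p_i = {sum(p):.6f}, mean job size = {sum(pi / mi for pi, mi in zip(p, mu)):.6f}")
```

For $N = 10$ the computed $\eta$ can be compared with the corresponding entry of Table 3.2.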

In Figure 3.4 one can see the plots of the expected sojourn time and its upper bound as functions of $\theta$ when $N$ equals 10, 100, 500 and 1000. In Figure 3.5 we plot the relative error of the upper bound,

$$\Delta(\theta) = \frac{\Upsilon(\theta) - \overline{T}(\theta)}{\overline{T}(\theta)},$$

when $N$ equals 10, 100, 500 and 1000. As one can see, the upper bound (3.26) is very tight.

We find the maximum gain of the expected sojourn time of the TLPS system with respect to the standard PS system. As previously, we denote the gain by $g(\theta) = \frac{\overline{T}^{PS} - \overline{T}(\theta)}{\overline{T}^{PS}}$, where $\overline{T}^{PS}$ is the expected sojourn time in the standard PS system. The data and results are summarized in Table 3.2. As one can see, the maximal relative error $\Delta(\theta)$ of the upper bound with respect to the expected sojourn time is 6-8%, so the upper bound is very tight.

| $N$ | $\eta$ | $d$ | $\theta_{opt}$ | $\max_\theta g(\theta)$ | $\max_\theta \Delta(\theta)$ |
|---|---|---|---|---|---|
| 10 | 0.95 | 7.20 | 5 | 32.98% | 0.0640 |
| 100 | 1.26 | 32.28 | 12 | 45.75% | 0.0807 |
| 500 | 1.40 | 113.31 | 21 | 49.26% | 0.0766 |
| 1000 | 1.44 | 200.04 | 26 | 50.12% | 0.0743 |

Table 3.2 Increasing the number of phases

We can draw the following conclusions when increasing the number of phases:

1. the maximum relative gain $\max_\theta g(\theta)$ in expected sojourn time in comparison with PS increases;
2. the sensitivity of the system performance with respect to the selection of a sub-optimal threshold value decreases.

Thus the TLPS system produces better and more robust performance as the variance of the job size distribution increases.

3.6 Conclusion

We analyze the TLPS scheduling mechanism with the hyper-exponential job size distribution function. In Section 3.4 we analyze the system when the job size distribution function has two phases and find the analytic expressions of the expected conditional sojourn time and the expected sojourn time of the TLPS system. Connections in the Internet belong to two distinct classes: short HTTP and P2P signaling connections, and long downloads such as PDF, MP3, and so on. Thus, according to this observation, we consider a special selection of the parameters of the job size distribution function with two phases and find the approximation of the optimal threshold when the variance of the job size distribution goes to infinity. We show that the approximated value of the threshold tends to the optimal threshold when the second moment of the distribution function goes to infinity. We found that the ratio between the expected sojourn time of the TLPS system and the expected sojourn time of the standard PS system can be arbitrarily small for very high loads.

[Figure 3.4: The expected sojourn time $\overline{T}(\theta)$ and its upper bound $\Upsilon(\theta)$ for $N$ = 10, 100, 500, 1000, together with $\overline{T}^{PS}$; horizontal axis: $\theta$.]

[Figure 3.5: The relative error $\Delta(\theta)$ for $N$ = 10, 100, 500, 1000; horizontal axis: $\theta$.]

For realistic loads this ratio can reach 1/2. We also show that the system is not too sensitive to the selection of the optimal value of the threshold. With the NS-2 simulator we implement the TLPS scheduling scheme in the router of the bottleneck link. We show that the analytically found approximation of the optimal threshold value minimizes the mean sojourn time in the TLPS system among the tested threshold values. With the simulation results we show that TLPS with the approximated value of the optimal threshold can give up to 35% gain in comparison with the DropTail policy and almost the same gain as the optimal LAS policy.

In Section 3.5 we study the TLPS model when the job size distribution is a hyper-exponential function with many phases. We provide an expression of the expected conditional sojourn time as a solution of a system of linear equations. We also apply the iteration method to find the expression of the expected conditional sojourn time in the form of an operator series, and using the obtained expression we provide an upper bound for the expected sojourn time function. With the experimental results we show that the upper bound is very tight and can be used as an approximation of the expected sojourn time function. The obtained upper bound can be used to identify an approximation of the optimal threshold value for the TLPS system when the job size distribution is heavy-tailed.

We study the properties of the expected sojourn time function when the parameters of the job size distribution function are selected in such a way that it approximates a heavy-tailed distribution as the number of phases of the job size distribution increases. As the number of phases increases, the gain of the TLPS system compared with the standard PS system increases and the sensitivity of the system with respect to the selection of the optimal threshold decreases.

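3.7 Appendix: Proof of Lemma 3.2

Let us consider any function $\beta(x) \ge 0$ and define $\beta_j = \int_0^\infty \beta(x) e^{-\mu_j x}\,dx$, $j = 1, \ldots, N$. Let us show for $\beta(x) \ge 0$ that if

$$\frac{d(\beta_j \mu_j)}{d\mu_j} \ge 0, \qquad j = 1, \ldots, N,$$

then it follows that

$$\Phi_2(\Phi_1(\beta(x))) \le q\,\Phi_2(\beta(x)).$$

As

$$\int_0^\infty\!\!\int_0^x \beta(y)\,\overline{F}(x-y+\theta)\,\overline{F}(x+\theta)\,dy\,dx = \int_0^\infty\!\!\int_0^\infty \beta(y)\,\overline{F}(x_1+\theta)\,\overline{F}(x_1+y+\theta)\,dx_1\,dy$$

and

$$\Phi_2(\Phi_1(\beta(x))) = \gamma(\theta) \int_0^\infty\!\!\int_0^\infty \beta(y)\,\overline{F}(x+y+\theta)\,\overline{F}(x+\theta)\,dy\,dx + \gamma(\theta) \int_0^\infty\!\!\int_0^x \beta(y)\,\overline{F}(x-y+\theta)\,\overline{F}(x+\theta)\,dy\,dx,$$

then

$$\Phi_2(\Phi_1(\beta(x))) = 2\gamma(\theta) \int_0^\infty\!\!\int_0^\infty \beta(x)\,\overline{F}(y+\theta)\,\overline{F}(x+y+\theta)\,dy\,dx = 2\gamma(\theta) \int_0^\infty \beta(x) \sum_{i,j} \frac{F_\theta^i F_\theta^j}{\mu_i + \mu_j}\, e^{-\mu_j x}\,dx = 2\gamma(\theta) \sum_{i,j} \frac{F_\theta^i F_\theta^j}{\mu_i + \mu_j}\, \beta_j.$$

Also for $\Phi_2(\beta(x))$, taking into account that $q = \gamma(\theta) \sum_i \frac{F_\theta^i}{\mu_i}$, we obtain

$$q\,\Phi_2(\beta(x)) = \gamma(\theta) \sum_i \frac{F_\theta^i}{\mu_i} \sum_j F_\theta^j \int_0^\infty \beta(x) e^{-\mu_j x}\,dx = \gamma(\theta) \sum_{i,j} \frac{F_\theta^i F_\theta^j}{\mu_i}\, \beta_j.$$

Thus, a sufficient condition for the inequality $\Phi_2(\Phi_1(\beta(x))) \le q\,\Phi_2(\beta(x))$ to be satisfied is that for every pair $i, j$:

$$\frac{2}{\mu_i + \mu_j}\,\beta_j + \frac{2}{\mu_j + \mu_i}\,\beta_i \le \frac{1}{\mu_i}\,\beta_j + \frac{1}{\mu_j}\,\beta_i \quad\Longleftrightarrow\quad -(\beta_j\mu_j - \beta_i\mu_i)(\mu_j - \mu_i) \le 0.$$

The inequality is indeed satisfied when $\beta_j\mu_j$ is an increasing function of $\mu_j$. We conclude that $\Phi_2(\Phi_1(\beta(x))) \le q\,\Phi_2(\beta(x))$, which proves Lemma 3.2.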
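The equivalence used in the last step is stated without intermediate algebra. For completeness, it can be checked as follows; this is a routine verification added as a reading aid.

```latex
\begin{align*}
&\frac{1}{\mu_i}\,\beta_j + \frac{1}{\mu_j}\,\beta_i
  - \frac{2(\beta_i+\beta_j)}{\mu_i+\mu_j} \;\ge\; 0
  \qquad \big|\ \times\ \mu_i\mu_j(\mu_i+\mu_j) > 0 \\
\iff\;& \mu_j(\mu_i+\mu_j)\,\beta_j + \mu_i(\mu_i+\mu_j)\,\beta_i
  - 2\mu_i\mu_j(\beta_i+\beta_j) \;\ge\; 0 \\
\iff\;& \beta_j\mu_j(\mu_j-\mu_i) - \beta_i\mu_i(\mu_j-\mu_i) \;\ge\; 0 \\
\iff\;& (\beta_j\mu_j-\beta_i\mu_i)(\mu_j-\mu_i) \;\ge\; 0 .
\end{align*}
```

If $\beta_j\mu_j$ is non-decreasing in $\mu_j$, the two factors in the last line always have the same sign, so the product is non-negative.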


Chapter 4 Comparison of the Discriminatory Processor Sharing Policies

4.1 Summary

The DPS policy was introduced by Kleinrock. Under the DPS policy jobs are organized in classes, which share a single server. The capacity that each class obtains depends on the number of jobs currently present in all classes and is controlled by the vector of weights. By varying the DPS weights it is possible to give priority to different classes at the expense of others, to control their instantaneous service rates, and to optimize different system characteristics such as the mean sojourn time. So, the proper weight selection is an important task, which is not easy to solve because of the model's complexity.

We study the comparison of two DPS policies with different weight vectors. We show the monotonicity of the expected sojourn time of the system depending on the weight vector under a certain condition on the system. The restriction on the system is such that the result is true for systems for which the values of the job size distribution means are very different from each other. The restriction can be overcome by setting the same weights for the classes which have similar means. The condition on the means is a sufficient, but not a necessary condition. It becomes less strict when the system is less loaded.

The results of the current work can be found in [Osi08b].


4.2 Introduction

The Discriminatory Processor Sharing (DPS) policy was introduced by Kleinrock [Kle67]. Under the DPS policy jobs are organized in classes, which share a single server. The capacity that each class obtains depends on the number of jobs currently present in all classes. All jobs present in the system are served simultaneously at rates controlled by the vector of weights $\{g_k > 0,\ k = 1, \ldots, N\}$, where $N$ is the number of classes. If there are $N_j$ jobs in class $j$, then each job of this class is served with the rate $g_j / \sum_{k=1}^{N} g_k N_k$. When all weights are equal, the DPS system is equivalent to the standard PS policy.

The DPS policy model has recently received a lot of attention due to its wide range of applications. For example, DPS can be applied to model flow level sharing of TCP flows with different flow characteristics such as different RTTs and packet loss probabilities. DPS also provides a natural approach to model the weighted round-robin discipline, which is used in operating systems for task scheduling. In the Internet one can imagine a situation in which servers provide different service according to the payment rates. For more applications of DPS in communication networks see [BT01], [GM02b], [AJK04], [CvdBB+05], [HT05].

By varying the DPS weights it is possible to give priority to different classes at the expense of others, to control their instantaneous service rates, and to optimize different system characteristics such as the mean sojourn time. So, the proper weight selection is an important task, which is not easy to solve because of the model's complexity.

The previously obtained results on the DPS model are the following. Kleinrock in [Kle67] was the first to study DPS. Then the paper of Fayolle et al. [FMI80] provided results for the DPS model. For exponentially distributed required service times the authors obtained the expression of the expected sojourn time as a solution of a system of linear equations. The authors show that, independently of the weights, the slowdown for the expected conditional response time under the DPS policy tends to the constant slowdown of the PS policy as the service requirements increase to infinity.

Rege and Sengupta in [RS94] proved a decomposition theorem for the conditional sojourn time. For exponential service time distributions in [RS96] they obtained higher moments of the queue length distribution as the solutions of a system of linear equations and also provided a theorem for the heavy-traffic regime. Van Kessel et al. in [KNQB04], [KNQB05] study the performance of DPS in an asymptotic regime using time scaling. For general distributions of the required service times the approximation analysis was carried out by Guo and Matta in [GM02b]. Altman et al. [AJK04] study the behavior of the DPS policy in overload. Most of the results obtained for the DPS queue were collected in the survey paper of Altman et al. [AAA06].

Avrachenkov et al. in [AABNQ05] proved that the mean unconditional response time of each class is finite under the usual stability condition. They determine the asymptote of the conditional sojourn time for each class assuming a finite service time distribution with finite variance.

The problem of weight selection in the DPS policy when the job size distributions are exponential was studied by Avrachenkov et al. in [AABNQ05] and by Kim and Kim in [KK06]. In [KK06] it was shown that the DPS policy reduces the expected sojourn time in comparison with the PS policy when the weights increase in the opposite order to the means of the job classes. Also in [KK06] the authors formulate a conjecture about the monotonicity of the expected sojourn time of the DPS policy. The idea of the conjecture is that, comparing two DPS policies, the one whose weight vector is closer to the optimal policy provided by the cµ-rule, see [Rig94], has the smaller expected sojourn time. Using the method described in [KK06], in the present chapter we prove this conjecture with a restriction on the system parameters. The restriction on the system is such that the result is true for systems for which the values of the job size distribution means are very different from each other. The restriction can be overcome by setting the same weights for the classes which have similar means. The condition on the means is a sufficient, but not a necessary condition. It becomes less strict when the system is less loaded.

The chapter is organized as follows. In Section 4.3 we give general definitions of the DPS policy and formulate the problem of expected sojourn time minimization. In Section 4.4 we formulate the main theorem and prove it. In Section 4.5 we give the numerical results. Some technical proofs can be found in the Appendix.

4.3 Previous results and problem formulation

We consider the DPS model. All jobs are organized in $N$ classes and share a single server. Jobs of class $k = 1, \ldots, N$ arrive according to a Poisson process with rate $\lambda_k$ and have required service time distribution $F_k(x) = 1 - e^{-\mu_k x}$ with mean $1/\mu_k$. The load of the system is $\rho = \sum_{k=1}^{N} \rho_k$, where $\rho_k = \lambda_k/\mu_k$, $k = 1, \ldots, N$. We assume that the system is stable, $\rho < 1$. Let us denote $\lambda = \sum_{k=1}^{N} \lambda_k$.

The state of the system is controlled by a vector of weights $g = (g_1, \ldots, g_N)$, which sets the priority of the job classes. If there are currently $N_j$ jobs in class $j$, $j = 1, \ldots, N$, then each job of class $k$ is served at rate $g_k / \sum_{j=1}^{N} g_j N_j$, which depends on the current system state, that is, on the number of jobs in each class.

Let $\overline{T}^{DPS}(g)$ be the expected sojourn time of the DPS system with the weight vector $g$. We have

$$\overline{T}^{DPS}(g) = \sum_{k=1}^{N} \frac{\lambda_k}{\lambda}\, \overline{T}_k(g),$$

where $\overline{T}_k(g)$ is the expected sojourn time for class $k$. The expected sojourn times $\overline{T}_k(g)$, $k = 1, \ldots, N$, can be found as the solution of the following system of linear equations, see [FMI80]:

$$\overline{T}_k(g)\left(1 - \sum_{j=1}^{N} \frac{\lambda_j g_j}{\mu_j g_j + \mu_k g_k}\right) - \sum_{j=1}^{N} \frac{\lambda_j g_j \overline{T}_j(g)}{\mu_j g_j + \mu_k g_k} = \frac{1}{\mu_k}, \quad k = 1, \ldots, N. \tag{4.1}$$

Let us notice that for the standard Processor Sharing system

$$\overline{T}^{PS} = \frac{\rho/\lambda}{1 - \rho}.$$

One of the problems when studying DPS is to minimize the expected sojourn time $\overline{T}^{DPS}(g)$ by a suitable weight selection. Namely, find $g^*$ such that

$$\overline{T}^{DPS}(g^*) = \min_{g} \overline{T}^{DPS}(g).$$

This is a general problem and, to simplify it, the following subcase is considered: find a set $G$ such that

$$\overline{T}^{DPS}(g^*) \le \overline{T}^{PS}, \quad \forall g^* \in G. \tag{4.2}$$

For the case when job size distributions are exponential, the solution of (4.2) is given by Kim and Kim in [KK06] and is as follows. If the classes are ordered so that $\mu_1 \ge \mu_2 \ge \ldots \ge \mu_N$, then $G$ consists of all vectors which satisfy

$$G = \{g \mid g_1 \ge g_2 \ge \ldots \ge g_N\}.$$

Using the approach of [KK06] we solve a more general problem about the monotonicity of the expected sojourn time in the DPS system, which we formulate in the following section as Theorem 4.1.

4.4 Expected sojourn time monotonicity

Let us formulate and prove the following theorem.

Theorem 4.1 Let the job size distribution of every class be exponential with mean $1/\mu_i$, $i = 1, \ldots, N$, and let the classes be enumerated so that

$$\mu_1 \ge \mu_2 \ge \ldots \ge \mu_N.$$

Let us consider two different weight policies for the DPS system, which we denote by $\alpha$ and $\beta$. Let $\alpha, \beta \in G$, that is,

$$\alpha_1 \ge \alpha_2 \ge \ldots \ge \alpha_N, \qquad \beta_1 \ge \beta_2 \ge \ldots \ge \beta_N.$$

The expected sojourn times of the DPS policies with weight vectors $\alpha$ and $\beta$ satisfy

$$\overline{T}^{DPS}(\alpha) \le \overline{T}^{DPS}(\beta), \tag{4.3}$$

if the weights $\alpha$ and $\beta$ are such that

$$\frac{\alpha_{i+1}}{\alpha_i} \le \frac{\beta_{i+1}}{\beta_i}, \quad i = 1, \ldots, N-1, \tag{4.4}$$

and the following restriction is satisfied:

$$\frac{\mu_{j+1}}{\mu_j} \le 1 - \rho, \tag{4.5}$$

for every $j = 1, \ldots, N-1$.

Remark 4.1 If for some classes $j$ and $j+1$ condition (4.5) is not satisfied, then by choosing the weights of these classes to be equal we can still use Theorem 4.1. Namely, for classes $j$ and $j+1$ such that $\mu_{j+1}/\mu_j > 1 - \rho$, if we set $\alpha_{j+1} = \alpha_j$ and $\beta_{j+1} = \beta_j$, then statement (4.3) of Theorem 4.1 still holds.

Remark 4.2 Theorem 4.1 shows that the expected sojourn time $\overline{T}^{DPS}(g)$ is monotone with respect to the selection of the weight vector $g$. The closer the weight vector is to the optimal policy given by the cµ-rule, the smaller the expected sojourn time. This is expressed by condition (4.4), which states that vector $\alpha$ is closer to the optimal cµ-rule policy than vector $\beta$. Theorem 4.1 is proved under restriction (4.5). This restriction is a sufficient, not a necessary, condition on the system parameters. It requires the means of the job classes to be quite different from each other. The restriction can be overcome by giving the same weights to the job classes with similar mean values. Condition (4.5) becomes less strict as the system becomes less loaded.

To prove Theorem 4.1 let us first introduce some notation and prove additional lemmas. Let us rewrite linear system (4.1) in matrix form. Let $\overline{T}^{(g)} = [\overline{T}_1^{(g)}, \ldots, \overline{T}_N^{(g)}]^T$ be the column vector of $\overline{T}_k^{(g)}$, $k = 1, \ldots, N$. Here $[\,\cdot\,]^T$ denotes transposition, so $[\,\cdot\,]^T$ is a column vector. We denote by $\mathbf{1}$ the row vector of ones, so $\mathbf{1}^T$ is a column vector of ones. The superscript $(g)$ indicates that an element depends on the selected weight vector $g \in G$. In the rest of the chapter we assume that $g, \alpha, \beta \in G$, unless stated otherwise. Let us introduce the notation

$$\sigma_{ij}^{(g)} = \frac{g_j}{\mu_i g_i + \mu_j g_j}.$$

Using $\sigma_{ij}^{(g)}$ let us define matrices $A^{(g)}$ and $D^{(g)}$ in the following way:

$$A_{ij}^{(g)} = \sigma_{ij}^{(g)}, \quad i, j = 1, \ldots, N, \tag{4.6}$$

$$D_{ij}^{(g)} = \begin{cases} \sum_{k=1}^{N} \sigma_{ik}^{(g)}, & i = j, \\ 0, & i \ne j, \end{cases} \quad i, j = 1, \ldots, N. \tag{4.7}$$


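Then linear system (4.1) becomes

$$(E - D^{(g)} - A^{(g)})\,\overline{T}^{(g)} = \left[\frac{1}{\mu_1}, \ldots, \frac{1}{\mu_N}\right]^T. \tag{4.8}$$

Let us denote

$$B^{(g)} = A^{(g)} + D^{(g)}, \quad \forall g.$$

We need to find the expected sojourn time of the DPS system $\overline{T}^{DPS}(g)$. According to the definition of $\overline{T}^{DPS}(g)$ and equation (4.8) we have

$$\overline{T}^{DPS}(g) = \lambda^{-1}[\lambda_1, \ldots, \lambda_N]\,\overline{T}^{(g)} = \lambda^{-1}[\lambda_1, \ldots, \lambda_N]\,(E - B^{(g)})^{-1}\left[\frac{1}{\mu_1}, \ldots, \frac{1}{\mu_N}\right]^T.$$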

Let us consider the case when $\lambda_i = 1$ for $i = 1, \ldots, N$. The results can be extended to the case when the $\lambda_i$ are different; we prove this following the approach of [KK06] in Proposition 4.1 at the end of the current section. Then the previous equation becomes

$$\overline{T}^{DPS}(g) = \lambda^{-1}\,\mathbf{1}\,(E - B^{(g)})^{-1}\,[\rho_1, \ldots, \rho_N]^T. \tag{4.9}$$

Let us show some properties of $\sigma_{ij}^{(g)}$. From the definition of $\sigma_{ij}^{(g)}$ it follows that

$$\sigma_{ij}^{(g)} g_i = \sigma_{ji}^{(g)} g_j, \qquad \frac{\sigma_{ij}^{(g)}}{\mu_i} + \frac{\sigma_{ji}^{(g)}}{\mu_j} = \frac{1}{\mu_i \mu_j}, \tag{4.10}$$

$i, j = 1, \ldots, N$. Also we prove Lemma 4.1.

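Lemma 4.1 If $\alpha$ and $\beta$ satisfy (4.4), then

$$\sigma_{ij}^{(\alpha)} \le \sigma_{ij}^{(\beta)}, \quad i \le j, \tag{4.11}$$

$$\sigma_{ij}^{(\alpha)} \ge \sigma_{ij}^{(\beta)}, \quad i \ge j. \tag{4.12}$$

Proof. If $\alpha$ and $\beta$ satisfy (4.4), then multiplying the consecutive ratios gives

$$\frac{\alpha_j}{\alpha_i} \le \frac{\beta_j}{\beta_i}, \quad i \le j, \quad i, j = 1, \ldots, N.$$

From here $\alpha_j \mu_i \beta_i \le \beta_j \mu_i \alpha_i$, $i \le j$. Adding $\alpha_j \mu_j \beta_j$ to both sides and dividing both sides by $(\mu_i \alpha_i + \mu_j \alpha_j)(\mu_i \beta_i + \mu_j \beta_j)$ we get (4.11). We prove (4.12) in a similar way.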

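Lemma 4.2 If $\alpha$, $\beta$ satisfy (4.4), then

$$\overline{T}^{DPS}(\alpha) \le \overline{T}^{DPS}(\beta),$$

when the elements of the vector $y = \mathbf{1}(E - B^{(\alpha)})^{-1} M$ are such that $y_1 \ge y_2 \ge \ldots \ge y_N$, where $M$ is the diagonal matrix $M = \mathrm{diag}(\mu_1, \ldots, \mu_N)$.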


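Proof. Using expression (4.9) for $g = \alpha, \beta$ we get

$$\overline{T}^{DPS}(\alpha) - \overline{T}^{DPS}(\beta) = \lambda^{-1}\mathbf{1}\left((E - B^{(\alpha)})^{-1} - (E - B^{(\beta)})^{-1}\right)[\rho_1, \ldots, \rho_N]^T$$
$$= \lambda^{-1}\mathbf{1}\,(E - B^{(\alpha)})^{-1}(B^{(\alpha)} - B^{(\beta)})(E - B^{(\beta)})^{-1}[\rho_1, \ldots, \rho_N]^T.$$

Let $M$ be the diagonal matrix $M = \mathrm{diag}(\mu_1, \ldots, \mu_N)$ and

$$y = \mathbf{1}(E - B^{(\alpha)})^{-1} M. \tag{4.13}$$

Since $(E - B^{(\beta)})^{-1}[\rho_1, \ldots, \rho_N]^T = \overline{T}^{(\beta)}$ by (4.8) with $\lambda_i = 1$, we have

$$\overline{T}^{DPS}(\alpha) - \overline{T}^{DPS}(\beta) = \lambda^{-1}\mathbf{1}(E - B^{(\alpha)})^{-1} M M^{-1}(B^{(\alpha)} - B^{(\beta)})\,\overline{T}^{(\beta)} = \lambda^{-1}\, y M^{-1}(B^{(\alpha)} - B^{(\beta)})\,\overline{T}^{(\beta)}$$
$$= \lambda^{-1}\sum_{i,j}\left(\frac{y_j}{\mu_j}\sigma_{ji}^{(\alpha)} + \frac{y_i}{\mu_i}\sigma_{ij}^{(\alpha)} - \frac{y_j}{\mu_j}\sigma_{ji}^{(\beta)} - \frac{y_i}{\mu_i}\sigma_{ij}^{(\beta)}\right)\overline{T}_j^{(\beta)}$$
$$= \lambda^{-1}\sum_{i,j}\left(y_j\left(\frac{\sigma_{ji}^{(\alpha)}}{\mu_j} - \frac{\sigma_{ji}^{(\beta)}}{\mu_j}\right) + \frac{y_i}{\mu_i}\left(\sigma_{ij}^{(\alpha)} - \sigma_{ij}^{(\beta)}\right)\right)\overline{T}_j^{(\beta)}.$$

By (4.10), $\dfrac{\sigma_{ji}^{(g)}}{\mu_j} = \dfrac{1}{\mu_i \mu_j} - \dfrac{\sigma_{ij}^{(g)}}{\mu_i}$, $g = \alpha, \beta$, and then

$$\overline{T}^{DPS}(\alpha) - \overline{T}^{DPS}(\beta) = \lambda^{-1}\sum_{i,j}\left(-\frac{y_j}{\mu_i}\left(\sigma_{ij}^{(\alpha)} - \sigma_{ij}^{(\beta)}\right) + \frac{y_i}{\mu_i}\left(\sigma_{ij}^{(\alpha)} - \sigma_{ij}^{(\beta)}\right)\right)\overline{T}_j^{(\beta)}$$
$$= \lambda^{-1}\sum_{i,j}\left(\sigma_{ij}^{(\alpha)} - \sigma_{ij}^{(\beta)}\right)(y_i - y_j)\,\frac{1}{\mu_i}\,\overline{T}_j^{(\beta)}.$$

Using Lemma 4.1 we get that

$$\left(\sigma_{ij}^{(\alpha)} - \sigma_{ij}^{(\beta)}\right)(y_i - y_j) \le 0, \quad i, j = 1, \ldots, N,$$

when $y_1 \ge y_2 \ge \ldots \ge y_N$. Since $\overline{T}_j^{(\beta)} > 0$, this proves the statement of Lemma 4.2.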

Lemma 4.3 Vector $y$ given by (4.13) satisfies

$$y_1 \ge y_2 \ge \ldots \ge y_N,$$

if the following is true:

$$\frac{\mu_{i+1}}{\mu_i} \le 1 - \rho,$$

for every $i = 1, \ldots, N-1$.

Proof. The proof can be found in the Appendix.


Remark 4.3 For job classes such that $\mu_{i+1}/\mu_i > 1 - \rho$ we prove that it is sufficient to give these classes the same weights, $\alpha_{i+1} = \alpha_i$, to keep $y_1 \ge y_2 \ge \ldots \ge y_N$. The proof can be found in the Appendix.

Combining the results of Lemmas 4.1, 4.2 and 4.3 we prove the statement of Theorem 4.1. Remark 4.3 gives Remark 4.1 after Theorem 4.1. Now in Proposition 4.1 we prove the extension of Theorem 4.1 to the case when $\lambda_i \ne 1$.

Proposition 4.1 The result of Theorem 4.1 extends to the case when $\lambda_i \ne 1$.

Proof. Let us first consider the case when all $\lambda_i = q$, $i = 1, \ldots, N$. It can be shown that in this case the proof of Theorem 4.1 is equivalent to the proof of the same theorem for a new system with $\lambda_i^* = 1$, $\mu_i^* = \mu_i/q$, $i = 1, \ldots, N$, which is the initial system with time rescaled by the factor $q$, so that the loads $\rho_i$ and the ratios $\mu_{i+1}/\mu_i$ are unchanged. For this new system the result of Theorem 4.1 holds and restriction (4.5) is not changed. Then Theorem 4.1 is true for the initial system as well.

If the $\lambda_i$ are rational, they can be written as $\lambda_i = p_i/q$, where $p_i$ and $q$ are positive integers. Then each class can be represented as $p_i$ classes with equal means $1/\mu_i$ and intensity $1/q$. So the DPS system can be considered as a DPS system with $p_1 + \ldots + p_N$ classes with the same arrival rates $1/q$. The result of Theorem 4.1 extends to this case.

If the $\lambda_i$, $i = 1, \ldots, N$, are positive and real, we apply the previous case of rational $\lambda_i$ and use continuity.

In the following section we give numerical results on Theorem 4.1. We consider two cases: when condition (4.5) is satisfied and when it is not satisfied. We show that condition (4.5) is a sufficient and not a necessary condition on the system parameters.

4.5 Numerical results

Let us consider a DPS system with 3 classes. Let us consider the set of normalized weight vectors $g(x) = (g_1(x), g_2(x), g_3(x))$, $\sum_{i=1}^{3} g_i(x) = 1$, $g_i(x) = x^{-i} / \sum_{i=1}^{3} x^{-i}$, $x > 1$. Every point $x > 1$ denotes a weight vector. Vectors $g(x)$, $g(y)$ satisfy property (4.4) when $1 < y \le x$, namely $g_{i+1}(x)/g_i(x) \le g_{i+1}(y)/g_i(y)$, $i = 1, 2$, $1 < y \le x$. In Figures 4.1 and 4.2 we plot $\overline{T}^{DPS}(g(x))$ with weight vectors $g(x)$ as a function of $x$, together with the expected sojourn times $\overline{T}^{PS}$ for the PS policy and $\overline{T}^{opt}$ for the optimal cµ-rule policy.

In Figure 4.1 we plot the expected sojourn time for the case when condition (4.5) is satisfied for three classes. The parameters are: $\lambda_i = 1$, $i = 1, 2, 3$, $\mu_1 = 160$, $\mu_2 = 14$, $\mu_3 = 1.2$, then $\rho = 0.911$. In Figure 4.2 we plot the expected sojourn time for the case when condition (4.5) is not satisfied for three classes. The parameters are: $\lambda_i = 1$, $i = 1, 2, 3$, $\mu_1 = 3.5$, $\mu_2 = 3.2$, $\mu_3 = 3.1$, then $\rho = 0.92$. One can see that $\overline{T}^{DPS}(g(x)) \le \overline{T}^{DPS}(g(y))$, $1 < y \le x$, even when restriction (4.5) is not satisfied.

[Figure 4.1: $\overline{T}^{DPS}(g(x))$, $\overline{T}^{PS}$, $\overline{T}^{opt}$ as functions of $x \in [1, 10]$, condition satisfied.]

[Figure 4.2: $\overline{T}^{DPS}(g(x))$, $\overline{T}^{PS}$, $\overline{T}^{opt}$ as functions of $x \in [1, 10]$, condition not satisfied.]
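The numerical experiment above can be reproduced approximately by solving linear system (4.1) directly. The following is a minimal sketch, not the code used for the figures: the function names `dps_mean_sojourn` and `weights` and the grid of $x$ values are assumptions made for illustration, while the parameters are those of the first example.

```python
import numpy as np

def dps_mean_sojourn(lam, mu, g):
    """Mean sojourn time of the DPS queue, obtained by solving system (4.1)."""
    lam, mu, g = (np.asarray(v, dtype=float) for v in (lam, mu, g))
    w = mu * g
    # S[k, j] = lam_j g_j / (mu_j g_j + mu_k g_k)
    S = (lam * g)[None, :] / (w[None, :] + w[:, None])
    lhs = np.diag(1.0 - S.sum(axis=1)) - S
    T = np.linalg.solve(lhs, 1.0 / mu)      # per-class mean sojourn times
    return float(lam @ T / lam.sum())

def weights(x, n=3):
    """Normalized weight vector g_i(x) = x^{-i} / sum_i x^{-i}."""
    raw = float(x) ** -np.arange(1, n + 1, dtype=float)
    return raw / raw.sum()

lam = np.ones(3)
mu = np.array([160.0, 14.0, 1.2])           # first example, condition (4.5) holds
rho = float(np.sum(lam / mu))
t_ps = (rho / lam.sum()) / (1.0 - rho)      # PS reference value

xs = np.linspace(1.0, 10.0, 19)             # illustrative grid; x = 1 gives equal weights
t_dps = [dps_mean_sojourn(lam, mu, weights(x)) for x in xs]

for x, t in zip(xs, t_dps):
    print(f"x = {x:5.2f}  T_DPS = {t:.4f}  (T_PS = {t_ps:.4f})")
```

At $x = 1$ all weights are equal, so the DPS value should coincide with $\overline{T}^{PS}$, and along the grid the values should not increase, in line with Theorem 4.1. Replacing `mu` by the parameters of the second example allows the same check when condition (4.5) fails.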

4.6 Conclusion

We study the DPS policy with exponential job size distributions. One of the main problems in studying DPS is the minimization of the expected sojourn time through the weight selection. In the present work we compare two DPS policies with different weights. We show that the expected sojourn time is smaller for the policy whose weight vector is closer to the optimal policy vector given by the cµ-rule. So, we prove the monotonicity of the expected sojourn time for the DPS policy with respect to the weight vector selection.

The result is proved with a restriction on the system parameters. The restriction is such that the result is true for systems in which the mean values of the job class size distributions are very different from each other. We found that, to prove the main result, it is sufficient to give the same weights to the classes with similar means. The restriction is a sufficient and not a necessary condition on the system parameters. When the load of the system decreases, the condition becomes less strict.

4.7 Appendix

In the following proofs we omit the dependency of the parameters on $g$ to simplify the notation. We assume that vector $g \in G$, that is, $g_1 \ge g_2 \ge \ldots \ge g_N$. To simplify the notation we write $\sum_k$ instead of $\sum_{k=1}^{N}$.

Lemma 4.3. Vector $y = \mathbf{1}(E - B)^{-1} M$ satisfies $y_1 \ge y_2 \ge \ldots \ge y_N$, if $\mu_{i+1}/\mu_i \le 1 - \rho$ for every $i = 1, \ldots, N-1$.

Proof. Using the results of Lemmas 4.4, 4.5, 4.6, 4.7 and 4.8 below we prove the statement of Lemma 4.3 and give the proof of Remark 4.3. Let $\mu = [\mu_1, \ldots, \mu_N]$ be the vector of $\mu_i$, $i = 1, \ldots, N$, and let us introduce the following notation:

$$\tilde{\mu} = \mu(E - D)^{-1}, \tag{4.14}$$

$$\tilde{A} = M^{-1} A M (E - D)^{-1}. \tag{4.15}$$

We define $f(x) = \sum_k \dfrac{x}{\mu_k(x + \mu_k g_k)}$ and notice that $1 - \sum_k \sigma_{jk} = 1 - \rho + f(\mu_j g_j)$. Then

$$(E - D)^{-1}_{jj} = \frac{1}{1 - \sum_k \sigma_{jk}} = \frac{1}{1 - \rho + f(\mu_j g_j)} > 0,$$

$$(E - D)^{-1}_{ij} = 0, \quad i \ne j,$$

$$\tilde{A}_{ij} = \frac{\mu_j g_j}{\mu_i(\mu_i g_i + \mu_j g_j)}\,(E - D)^{-1}_{jj} = \frac{\mu_j g_j}{\mu_i(\mu_i g_i + \mu_j g_j)(1 - \rho + f(\mu_j g_j))} > 0,$$

$i, j = 1, \ldots, N$. Let us prove an additional lemma.

Lemma 4.4 Matrix $\tilde{A} = M^{-1} A M (E - D)^{-1}$ is a positive contraction.

Proof. Matrix $\tilde{A}$ is a positive operator as its elements $\tilde{A}_{ij}$ are positive. Let $\Omega = \{X \mid x_i \ge 0,\ i = 1, \ldots, N\}$. If $X \in \Omega$, then $\tilde{A}X \in \Omega$. Then to prove that matrix $\tilde{A}$ is a contraction it is enough to show that

$$\exists\, q,\ 0 < q < 1: \quad \|\tilde{A}X\| \le q\|X\|, \quad \forall X \in \Omega. \tag{4.16}$$

As $X \in \Omega$, we can take $\|X\| = \mathbf{1}X = \sum_i x_i$. Then

$$\mathbf{1}\tilde{A}X = \sum_j x_j \sum_i \tilde{A}_{ij} = \sum_j x_j\, \frac{\sum_i \frac{\mu_j g_j}{\mu_i(\mu_j g_j + \mu_i g_i)}}{1 - \rho + f(\mu_j g_j)} = \sum_j x_j\, \frac{f(\mu_j g_j)}{1 - \rho + f(\mu_j g_j)}$$
$$= \sum_j x_j\left(1 - \frac{1 - \rho}{1 - \rho + f(\mu_j g_j)}\right) = \sum_j x_j\left(1 - (1 - \rho)\,\frac{\sum_j \frac{x_j}{1 - \rho + f(\mu_j g_j)}}{\sum_j x_j}\right).$$

Let us denote

$$\Delta_0 = \frac{\sum_j \frac{x_j}{1 - \rho + f(\mu_j g_j)}}{\sum_j x_j}.$$

Then $\mathbf{1}\tilde{A}X = \sum_j x_j\,(1 - (1 - \rho)\Delta_0)$. We need to find a value of $q$ which satisfies

$$\mathbf{1}\tilde{A}X \le q\,\mathbf{1}X, \qquad \sum_j x_j\,(1 - (1 - \rho)\Delta_0) \le q \sum_j x_j, \qquad 1 - (1 - \rho)\Delta_0 \le q.$$

As $f(\mu_j g_j) > 0$, then $0 < 1 - (1 - \rho)\Delta_0 < 1$. We define $\delta = \dfrac{1}{1 - \rho + \max_j f(\mu_j g_j)}$. Let us notice that $\max_j f(\mu_j g_j)$ always exists as the values of $\mu_j g_j$, $j = 1, \ldots, N$, are finite. As $\delta \le \Delta_0$, if we select $q = 1 - (1 - \rho)\delta$, then $0 < q < 1$ and $q$ satisfies condition (4.16). This completes the proof.

Lemma 4.5 If

$$y^{(0)} = [0, \ldots, 0], \tag{4.17}$$

$$y^{(n)} = \tilde{\mu} + y^{(n-1)}\tilde{A}, \quad n = 1, 2, \ldots, \tag{4.18}$$

then $y^{(n)} \to y$ when $n \to \infty$.

Proof. Let us recall that $y = \mathbf{1}(E - B)^{-1} M$ and $B = A + D$. Then

$$yM^{-1}(E - D - A) = \mathbf{1},$$
$$yM^{-1}(E - D) = yM^{-1}A + \mathbf{1},$$
$$y = yM^{-1}A(E - D)^{-1}M + \mathbf{1}(E - D)^{-1}M.$$

As matrices $D$ and $M$ are diagonal, $MD = DM$ and then

$$y = \mu(E - D)^{-1} + yM^{-1}AM(E - D)^{-1},$$

where $\mu = [\mu_1, \ldots, \mu_N]$. According to notations (4.14) and (4.15) we have

$$y = \tilde{\mu} + y\tilde{A}.$$

Let us denote $y^{(n)} = [y_1^{(n)}, \ldots, y_N^{(n)}]$, $n = 0, 1, 2, \ldots$, and let us define $y^{(0)}$ and $y^{(n)}$ by (4.17) and (4.18). According to Lemma 4.4, operator $\tilde{A}$ is a positive operator and a contraction. Also the $\tilde{\mu}_i$ are positive. Then $y^{(n)} \to y$ when $n \to \infty$, which proves the statement of Lemma 4.5.
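The fixed-point iteration of Lemma 4.5 can be checked numerically. The sketch below is an illustration only: it assumes $\lambda_i = 1$, uses the parameters $\mu = (160, 14, 1.2)$ of the first numerical example, and picks a hypothetical decreasing weight vector $g$; the iteration count is likewise an arbitrary choice.

```python
import numpy as np

mu = np.array([160.0, 14.0, 1.2])       # lambda_i = 1, so rho = sum(1/mu) < 1
g = np.array([4.0, 2.0, 1.0]) / 7.0     # hypothetical weights with g_1 >= g_2 >= g_3
N = len(mu)

w = mu * g
A = g[None, :] / (w[:, None] + w[None, :])   # A[i, j] = sigma_ij, see (4.6)
d = A.sum(axis=1)                            # diagonal of D, see (4.7)
B = A + np.diag(d)

mu_tilde = mu / (1.0 - d)                                        # (4.14)
A_tilde = (A * mu[None, :] / mu[:, None]) / (1.0 - d)[None, :]   # (4.15)

# Direct evaluation of y = 1 (E - B)^{-1} M via a transposed linear solve.
z = np.linalg.solve((np.eye(N) - B).T, np.ones(N))
y_direct = z * mu

# Iteration (4.17)-(4.18): y^(0) = 0, y^(n) = mu_tilde + y^(n-1) A_tilde.
y = np.zeros(N)
for _ in range(5000):
    y = mu_tilde + y @ A_tilde

print("largest column sum of A_tilde:", A_tilde.sum(axis=0).max())
print("y (iteration):", y)
print("y (direct):   ", y_direct)
```

The largest column sum of $\tilde{A}$ plays the role of the contraction factor $q$ of Lemma 4.4, and the iterate should agree with the directly computed $y$ up to rounding.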

Lemma 4.6 Let $y^{(n)}$, $n = 0, 1, 2, \ldots$, be defined as in Lemma 4.5. Then

$$y_1^{(n)} \ge y_2^{(n)} \ge \ldots \ge y_N^{(n)}, \quad n = 1, 2, \ldots, \tag{4.19}$$

if $\mu_{i+1}/\mu_i \le 1 - \rho$ for every $i = 1, \ldots, N-1$.


Proof. We prove statement (4.19) by induction. For $y^{(0)}$ statement (4.19) is true. Let us assume that (4.19) is true for step $n - 1$, that is, $y_1^{(n-1)} \ge y_2^{(n-1)} \ge \ldots \ge y_N^{(n-1)}$. To prove the induction step we have to show that $y_1^{(n)} \ge y_2^{(n)} \ge \ldots \ge y_N^{(n)}$ when $\mu_{i+1}/\mu_i \le 1 - \rho$ for every $i = 1, \ldots, N-1$. Let us consider $y_j^{(n)} - y_p^{(n)}$, $j \le p$. As

$$y_j^{(n)} = \tilde{\mu}_j + \sum_{i=1}^{N} y_i^{(n-1)}\tilde{A}_{ij},$$

then

$$y_j^{(n)} - y_p^{(n)} = \tilde{\mu}_j - \tilde{\mu}_p + \sum_{i=1}^{N} y_i^{(n-1)}(\tilde{A}_{ij} - \tilde{A}_{ip}).$$

In Lemma 4.7 we show that $\tilde{\mu}_j \ge \tilde{\mu}_p$, $j \le p$, when $\mu_{i+1}/\mu_i \le 1 - \rho$ for every $i = 1, \ldots, N-1$. Let us regroup the sum $\sum_{i=1}^{N} y_i^{(n-1)}(\tilde{A}_{ij} - \tilde{A}_{ip})$ in the following way:

$$\sum_{i=1}^{N} y_i^{(n-1)}(\tilde{A}_{ij} - \tilde{A}_{ip}) = y_N^{(n-1)}\sum_{k=1}^{N}(\tilde{A}_{kj} - \tilde{A}_{kp}) + \sum_{r=1}^{N-1}\left(y_r^{(n-1)} - y_{r+1}^{(n-1)}\right)\sum_{k=1}^{r}(\tilde{A}_{kj} - \tilde{A}_{kp}).$$

As $y_i^{(n-1)} \ge y_{i+1}^{(n-1)}$, $i = 1, \ldots, N-1$, according to the induction assumption, to show that $\sum_{i=1}^{N} y_i^{(n-1)}(\tilde{A}_{ij} - \tilde{A}_{ip}) \ge 0$, $j \le p$, it is enough to show that $\sum_{i=1}^{r}(\tilde{A}_{ij} - \tilde{A}_{ip}) \ge 0$, $j \le p$, $r = 1, \ldots, N$. We show this in Lemma 4.8. This proves the induction step and so proves Lemma 4.6.

Now let us prove Lemmas 4.7 and 4.8.

Lemma 4.7 $\tilde{\mu}_1 \ge \tilde{\mu}_2 \ge \ldots \ge \tilde{\mu}_N$, if $\mu_{i+1}/\mu_i \le 1 - \rho$ for every $i = 1, \ldots, N-1$.

Proof. We consider $\tilde{\mu}_j - \tilde{\mu}_p$, $j < p$. Let us recall that $g_1 \ge g_2 \ge \ldots \ge g_N$ and $\mu_1 \ge \mu_2 \ge \ldots \ge \mu_N$. Let us denote $f_2(x) = \sum_k \dfrac{g_k}{x + \mu_k g_k}$. Then $\tilde{\mu}_i = \dfrac{\mu_i}{1 - f_2(\mu_i g_i)}$ and

$$\tilde{\mu}_j - \tilde{\mu}_p = \frac{\mu_j - \mu_p - (\mu_j f_2(\mu_p g_p) - \mu_p f_2(\mu_j g_j))}{(1 - f_2(\mu_j g_j))(1 - f_2(\mu_p g_p))}.$$

Let us denote $\Delta_1 = \mu_j - \mu_p - (\mu_j f_2(\mu_p g_p) - \mu_p f_2(\mu_j g_j))$. As $0 < f_2(x) < \rho$, then

$$\Delta_1 > (\mu_j - \mu_p)\left(1 - \rho\,\frac{1}{1 - \frac{\mu_p}{\mu_j}}\right) \ge 0,$$

when

$$\frac{\mu_p}{\mu_j} \le 1 - \rho.$$

So, $\tilde{\mu}_j - \tilde{\mu}_p \ge 0$, $j < p$, if $\mu_p/\mu_j \le 1 - \rho$.

Let us consider $\Delta_1$ when $\mu_j > \mu_p$ and $g_j = g_p$. In this case

$$\Delta_1 = (\mu_j - \mu_p)\left(1 - \sum_{k=1}^{N}\frac{g_k(g_j(\mu_j + \mu_p) + \mu_k g_k)}{(\mu_j g_j + \mu_k g_k)(\mu_p g_j + \mu_k g_k)}\right).$$

We can show that

$$\frac{g_k(g_j(\mu_j + \mu_p) + \mu_k g_k)}{(\mu_j g_j + \mu_k g_k)(\mu_p g_j + \mu_k g_k)} < \frac{1}{\mu_k}, \quad k = 1, \ldots, N.$$

Then

$$\Delta_1 > (\mu_j - \mu_p)\left(1 - \sum_{k=1}^{N}\frac{1}{\mu_k}\right) = (\mu_j - \mu_p)(1 - \rho) > 0.$$

In the case when $\mu_j = \mu_p$ and $g_j = g_p$ we have $\tilde{\mu}_j - \tilde{\mu}_p = 0$. Thus we have proved the following:

If $g_j = g_p$ and $\mu_j = \mu_p$, then $\tilde{\mu}_j = \tilde{\mu}_p$.
If $g_j = g_p$ and $\mu_j > \mu_p$, then $\tilde{\mu}_j > \tilde{\mu}_p$.
If $g_j \ge g_p$, $\mu_j \ge \mu_p$ and $\mu_p/\mu_j \le 1 - \rho$, then $\tilde{\mu}_j \ge \tilde{\mu}_p$.

Setting $p = j + 1$ and recalling that $\mu_1 \ge \ldots \ge \mu_N$, we get that $\tilde{\mu}_1 \ge \tilde{\mu}_2 \ge \ldots \ge \tilde{\mu}_N$ is true when $\mu_{i+1}/\mu_i \le 1 - \rho$ for every $i = 1, \ldots, N-1$. That proves the statement of Lemma 4.7.

Returning to the main Theorem 4.1, Lemma 4.7 gives condition (4.5) as a restriction on the system parameters. Let us notice that if for job classes $i$ and $i+1$ we have $\mu_{i+1}/\mu_i > 1 - \rho$, then setting the weights of these classes equal still gives $\tilde{\mu}_i \ge \tilde{\mu}_{i+1}$. This gives us Remark 4.3 and Remark 4.1.

Lemma 4.8

$$\sum_{i=1}^{r}\tilde{A}_{i1} \ge \sum_{i=1}^{r}\tilde{A}_{i2} \ge \ldots \ge \sum_{i=1}^{r}\tilde{A}_{iN}, \quad r = 1, \ldots, N.$$

Proof. Let us recall that $\tilde{A} = M^{-1}AM(E - D)^{-1}$ and let us fix $r$ in the following proof. Let us define

$$f_3(x) = \frac{\sum_{i=1}^{r}\frac{x}{\mu_i(x + \mu_i g_i)}}{1 - \rho + \sum_{k=1}^{N}\frac{x}{\mu_k(x + \mu_k g_k)}} = \frac{h_1(x)}{1 - \rho + h_1(x) + h_2(x)},$$

where $h_1(x) = \sum_{i=1}^{r}\dfrac{x}{\mu_i(x + \mu_i g_i)} > 0$ and $h_2(x) = \sum_{j=r+1}^{N}\dfrac{x}{\mu_j(x + \mu_j g_j)} > 0$. Then $\sum_{i=1}^{r}\tilde{A}_{ij} = f_3(\mu_j g_j)$. To prove the statement of the lemma it is enough to show that the function $f_3(x)$ is increasing in $x$. For that it is enough to show that $\dfrac{df_3(x)}{dx} \ge 0$. Let us consider

$$\frac{df_3(x)}{dx} = \frac{h_1'(x)(1 - \rho) + h_1'(x)h_2(x) - h_1(x)h_2'(x)}{(1 - \rho + h_1(x) + h_2(x))^2}.$$

We can show that

$$h_1'(x)h_2(x) - h_1(x)h_2'(x) = \sum_{i=1}^{r}\sum_{k=r+1}^{N}\frac{x^2(\mu_i g_i - \mu_k g_k)}{(x + \mu_i g_i)^2(x + \mu_k g_k)^2\mu_k\mu_i} \ge 0,$$

as $\mu_j g_j \ge \mu_p g_p$, $j \le p$. Since $h_1'(x) > 0$ and $1 - \rho > 0$, then $\dfrac{df_3(x)}{dx} \ge 0$, so $f_3(x)$ is an increasing function of $x$, and we prove the statement of Lemma 4.8.
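The ordering of the partial column sums stated in Lemma 4.8 can be verified numerically for a concrete system. The sketch below is illustrative only: it assumes $\lambda_i = 1$, takes $\mu = (3.5, 3.2, 3.1)$ from the second numerical example, and uses a hypothetical decreasing weight vector; the helper name `a_tilde` is likewise an assumption.

```python
import numpy as np

def a_tilde(mu, g):
    """Matrix A~ = M^{-1} A M (E - D)^{-1} for lambda_i = 1, see (4.15)."""
    mu, g = np.asarray(mu, dtype=float), np.asarray(g, dtype=float)
    w = mu * g
    A = g[None, :] / (w[:, None] + w[None, :])   # A[i, j] = sigma_ij
    d = A.sum(axis=1)                            # diagonal of D
    return (A * mu[None, :] / mu[:, None]) / (1.0 - d)[None, :]

mu = np.array([3.5, 3.2, 3.1])      # mu_1 >= mu_2 >= mu_3, rho < 1
g = np.array([0.5, 0.3, 0.2])       # hypothetical weights, g_1 >= g_2 >= g_3

# partial[r - 1, j] = sum over i <= r of A~_ij
partial = np.cumsum(a_tilde(mu, g), axis=0)

# Lemma 4.8: for every r the partial sums are non-increasing in the column index j.
ordered = bool(np.all(np.diff(partial, axis=1) <= 1e-12))
print(partial)
print("partial column sums ordered:", ordered)
```

Note that the check does not rely on condition (4.5): the proof of Lemma 4.8 only uses the ordering of $\mu_j g_j$ and $\rho < 1$.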


Chapter 5 Optimal policy for multi-class scheduling in a single server queue

5.1 Summary

In this chapter we apply the Gittins optimality result to characterize the optimal scheduling discipline in a multi-class M/G/1 queue. We apply the general result to several cases of practical interest where the service time distributions belong to the set of DHR distributions, like Pareto or hyper-exponential. When there is only one class, it is known that in this case the LAS policy is optimal. We show that in the multi-class case the optimal policy is a priority discipline, where jobs of the various classes are classified into several priority levels depending on their attained service. Using a tagged-job approach we obtain, for every class, the mean conditional sojourn time. This allows us to compare numerically the mean sojourn time in the system between the Gittins optimal policy and popular policies like PS, FCFS and LAS.

Our results may be applicable for instance in an Internet router, where packets generated by different applications must be served or service is non-preemptive. Typically a router does not have access to the exact required service time (in packets) of the TCP connections, but it may have access to the attained service of each connection. Thus we implement the Gittins optimal algorithm in NS-2 and perform experiments to evaluate the achievable performance gain. We find that in the particular example with two classes and Pareto-type service time distribution the Gittins policy outperforms LAS by nearly 10% under moderate load.


5.2 Introduction

We are interested in scheduling jobs in the M/G/1 queue with the aim of minimizing the mean sojourn time in the system as well as the mean number of jobs in the system. In our study we restrict ourselves to non-anticipating scheduling policies. Let us recall that a policy is non-anticipating if it does not use information about the size of the arriving jobs. In [Git89], Gittins considered an M/G/1 queue and proved that the so-called Gittins index rule minimizes the mean delay. At every moment of time the Gittins rule calculates, depending on the attained service times of jobs, which job should be served. Gittins derived this result as a byproduct of his groundbreaking results on the multi-armed bandit problem. The literature on multi-armed bandit related papers that build on Gittins' result is huge (see for example [VWB85, Whi88, Web92, Tsi93, DGNM96, FW99, BNM00]). However, the optimality result of the Gittins index in the context of an M/G/1 queue has not been fully exploited, and it has not received the attention it deserves.

In the present work we generalize the Gittins index approach to scheduling in the multi-class M/G/1 queue. We emphasize that Gittins' optimality in a multi-class queue holds under much more general conditions than the condition required for the optimality of the well-known cµ-rule. We recall that the cµ-rule is the discipline that gives strict priority in descending order of $c_k\mu_k$, where $c_k$ and $\mu_k$ refer to a cost and the inverse of the mean service requirement, respectively, of class $k$. Indeed it is known (see for example [BVW85, SY92, NT94]) that the cµ-rule minimizes the weighted mean number of customers in the queue in two main settings: (i) generally distributed service requirements among all non-preemptive disciplines and (ii) exponentially distributed service requirements among all preemptive non-anticipating disciplines. In the preemptive case the cµ-rule is only optimal if the service times are exponentially distributed. On the other hand, by applying Gittins' framework to the multi-class queue one can characterize the optimal policy for arbitrary service time distributions. We believe that our results open an interesting avenue for further research. For instance, well-known optimality results in a single-class queue, like the optimality of the LAS discipline when the service times are of decreasing hazard rate type or the optimality of FCFS when the service time distribution is of type New-Better-than-Used-in-Expectation, can all be derived as corollaries of Gittins' result. The optimality of the cµ-rule can also easily be derived from Gittins' result.

In order to get insights into the structure of the optimal policy in the multi-class case we consider several relevant cases where the service time distributions are Pareto or hyper-exponential. We have used these distributions due to the evidence that the file size distributions in the Internet are well represented by heavy-tailed distributions such as Pareto distributions with infinite second moment. Also it was shown that job sizes in the Internet are well modelled by distributions with decreasing hazard rate. We refer to [NMM98, CB97, Wil01] for more details on this area, see also Subsection 1.1.3. In particular, we study the optimal multi-class scheduling in the following cases of the service time distributions: two Pareto distributions, several Pareto distributions, and one hyper-exponential and one exponential distribution. Using a tagged-job approach and the collective marks method we obtain, for every class, the mean conditional sojourn time. This allows us to compare numerically the mean sojourn time in the system between the Gittins optimal policy and popular policies like PS, FCFS and LAS. We find that in a particular example with two classes and Pareto-type service time distribution the Gittins policy outperforms LAS by nearly 25% under moderate load.

From an application point of view, our findings could be applied in Internet routers. Imagine that incoming packets are classified based on the application or the source that generated them. Then it is reasonable to expect that the service time distributions of the various classes may differ from each other. A router in the Internet does not typically have access to the exact required service time (in packets) of the TCP connections, but it may have access to the attained service of each connection. Thus we can apply our theoretical findings in order to obtain the optimal (from the connection-level performance point of view) scheduler at the packet level. We implement the Gittins scheduling policy in the NS-2 simulator and perform experiments to evaluate the achievable performance gain.

The structure of the chapter is as follows. In Section 5.3 we review the Gittins index policy for the single-class M/G/1 queue and then provide a general framework of the Gittins index policy for the multi-class M/G/1 queue. In Section 5.4 we study the Gittins index policy for the case of two Pareto distributed classes. In particular, we derive analytic expressions for the mean conditional sojourn times, study various properties of the optimal policy, and provide numerical examples and NS-2 simulations. At the end of Section 5.4 we generalize the results to multiple Pareto classes. In Section 5.5 we study the case of two distributions: one distribution being exponential and the other being hyper-exponential with two phases. For the case of exponential and hyper-exponential distributions we also obtain analytical results and provide numerical examples. Section 5.6 concludes the chapter. Some additional proofs are given in the Appendix.

5.3 Gittins policy in multi-class M/G/1 queue

Let us first recall the basic results related to the Gittins index policy in the context of a single-class M/G/1 queue. Let $\Pi$ denote the set of non-anticipating scheduling policies. Popular disciplines such as PS, FCFS and LAS, also called FB, belong to $\Pi$. Important disciplines that do not belong to $\Pi$ are SRPT and Shortest Processing Time (SPT).

We consider a single-class M/G/1 queue. Let $X$ denote the service time with distribution $P(X \le x) = F(x)$. The density is denoted by $f(x)$, the complementary distribution by $\overline{F}(x) = 1 - F(x)$ and the hazard rate function by $h(x) = f(x)/\overline{F}(x)$. Let $\overline{T}^{\pi}(x)$, $\pi \in \Pi$, denote the mean conditional sojourn time for a job of size $x$ in the system under the scheduling policy $\pi$, and $\overline{T}^{\pi}$, $\pi \in \Pi$, denote the mean sojourn time in the system under the scheduling policy $\pi$. Let us give some definitions.

Definition 5.1 For any $a, \Delta \ge 0$, let

$$J(a, \Delta) = \frac{\int_0^{\Delta} f(a + t)\,dt}{\int_0^{\Delta} \overline{F}(a + t)\,dt} = \frac{\overline{F}(a) - \overline{F}(a + \Delta)}{\int_0^{\Delta} \overline{F}(a + t)\,dt}. \tag{5.1}$$

For a job that has attained service $a$ and is assigned $\Delta$ units of service, equation (5.1) can be interpreted as the ratio between (i) the probability that the job will complete with a quota of $\Delta$ (interpreted as payoff) and (ii) the expected processor time that a job with attained service $a$ and service quota $\Delta$ will require from the server (interpreted as investment). Note that for every $a > 0$

$$J(a, 0) = \frac{f(a)}{\overline{F}(a)} = h(a),$$

$$J(a, \infty) = \frac{\overline{F}(a)}{\int_0^{\infty} \overline{F}(a + t)\,dt} = 1/E[X - a \mid X > a].$$

Note further that $J(a, \Delta)$ is continuous with respect to $\Delta$.

Definition 5.2 The Gittins index function is defined by

$$G(a) = \sup_{\Delta \ge 0} J(a, \Delta), \tag{5.2}$$

for any $a \ge 0$.

We call $G(a)$ the Gittins index after the author of book [Git89], which handles various static and dynamic scheduling problems. Independently, Sevcik defined a corresponding index when considering scheduling problems without arrivals in [Sev74]. In addition, this index has been dealt with by Yashkov, see [Yas92] and references therein, in particular the works by Klimov [Kli74, Kli78].

Definition 5.3 For any $a \ge 0$, let

$$\Delta^*(a) = \sup\{\Delta \ge 0 \mid J(a, \Delta) = G(a)\}. \tag{5.3}$$

By definition, $G(a) = J(a, \Delta^*(a))$ for all $a$.
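To make Definitions 5.1 and 5.2 concrete, the following sketch evaluates $J(a, \Delta)$ in closed form for a Pareto-type complementary distribution $\overline{F}(x) = (b/(x + b))^c$ with $c > 1$. The distribution family, the parameter values and the function names are assumptions chosen for illustration; the grid of $\Delta$ values only approximates the supremum in (5.2).

```python
def pareto_sf(x, b, c):
    """Complementary distribution of the assumed Pareto-type law: (b / (x + b))^c."""
    return (b / (x + b)) ** c

def J(a, delta, b, c):
    """Ratio (5.1) for the assumed Pareto-type law (requires c != 1)."""
    payoff = pareto_sf(a, b, c) - pareto_sf(a + delta, b, c)
    # Closed form of the integral of the complementary distribution over [a, a + delta].
    investment = b ** c * ((a + b) ** (1 - c) - (a + delta + b) ** (1 - c)) / (c - 1)
    return payoff / investment

b, c, a = 2.0, 3.0, 1.0                 # hypothetical parameters and attained service
hazard = c / (a + b)                    # h(a) for this distribution

deltas = [10.0 ** k for k in range(-4, 5)]
values = [J(a, d, b, c) for d in deltas]

for d, v in zip(deltas, values):
    print(f"delta = {d:10.4f}  J = {v:.6f}")
print("h(a) =", hazard, " 1/E[X - a | X > a] =", (c - 1) / (a + b))
```

For this decreasing hazard rate example the values of $J(a, \Delta)$ decrease from about $h(a)$ for small $\Delta$ towards $1/E[X - a \mid X > a]$ for large $\Delta$, so the supremum in (5.2) is approached as $\Delta \to 0$.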


Definition 5.4 The Gittins index policy $\pi_g$ is the scheduling discipline that at every instant of time gives service to the job in the system with the highest $G(a)$, where $a$ is the job's attained service.

Theorem 5.1 The Gittins index policy minimizes the mean sojourn time in the system among all non-anticipating scheduling policies. In other words, in the M/G/1 queue for any $\pi \in \Pi$,

$$\overline{T}^{\pi_g} \le \overline{T}^{\pi}.$$

Proof. See [Git89].

Note that by Little's law the Gittins index policy also minimizes the mean number of jobs in the system. We generalize the result of Theorem 5.1 to the case of the multi-class single server queue. Let us consider a multi-class M/G/1 queue. Let $X_i$ denote the service time with distribution $P(X_i \le x) = F_i(x)$ for every class $i = 1, \ldots, N$. The density is denoted by $f_i(x)$ and the complementary distribution by $\overline{F}_i(x) = 1 - F_i(x)$. The jobs of class $i$ arrive according to a Poisson process with rate $\lambda_i$; the total arrival rate is $\lambda = \sum_{i=1}^{N} \lambda_i$. For every class $i = 1, \ldots, N$ we define

$$J_i(a, \Delta) = \frac{\int_0^{\Delta} f_i(a + t)\,dt}{\int_0^{\Delta} \overline{F}_i(a + t)\,dt},$$

and then the Gittins index of a class-$i$ job is defined as

$$G_i(a) = \sup_{\Delta \ge 0} J_i(a, \Delta).$$

We denote by $\overline{T}_i^{\pi}(x)$ the mean conditional sojourn time for a class-$i$ job of size $x$, $i = 1, \ldots, N$, and by $\overline{T}^{\pi}$ the mean sojourn time in the system under the scheduling policy $\pi \in \Pi$.

Proposition 5.1 In a multi-class M/G/1 queue the policy that schedules the job with highest Gittins index Gi (a), i = 1, . . . , N in the system, where a is the job's attained service, is the optimal policy that minimizes the mean sojourn time.

Proof. The result follows directly from the application of Definition 5.2 and Theorem 5.1 to a multi-class M/G/1 queue.

Let $h_i(x) = f_i(x)/\overline{F}_i(x)$ denote the hazard rate function of class $i = 1, \ldots, N$. Let the service time distribution of class $i$ have a decreasing hazard rate. It is possible to show, see [AA07], that if $h_i(x)$ is non-increasing, the function $J_i(a, \Delta)$ is non-increasing in $\Delta$. Thus

$$G_i(a) = J_i(a, 0) = h_i(a). \tag{5.4}$$

As a consequence we obtain the following proposition.

Proposition 5.2 In a multi-class M/G/1 queue with non-increasing hazard rate functions hi (x) for every class i = 1, . . . , N , the policy that schedules the job with highest hi (a), i = 1, . . . , N in the system, where a is the job's attained service, is the optimal policy that minimizes the mean sojourn time.


Proof. Follows immediately from the Gittins policy Definition 5.4, Proposition 5.1 and equation (5.4).

The policy presented in Proposition 5.2 is an optimal policy for the multi-class single-server queue. Let us notice that for the single-class single-server queue the Gittins policy becomes the LAS policy: the hazard rate function is the same for all jobs, so the job with the maximal value of the hazard rate function of its attained service is the job with the least attained service.

When we serve jobs with the Gittins policy in the multi-class queue, to find the job which has to be served next we need to calculate the hazard rate of every job in the system. The job which has the maximal value of the hazard rate function is served next. In what follows, by the value of the hazard rate we mean the value of the hazard rate function at the job's attained service. Now let us consider several subcases of the described general approach. Depending on the behavior of the hazard rate functions of the job classes the policy is different. We consider the case with two job classes in the system and two subcases: (a) both job classes are Pareto distributed and the hazard rate functions do not cross, and (b) the job size distributions are hyper-exponential with one and two phases and the hazard rate functions cross at one point. Then we extend the case of two Pareto job classes to the case of $N$ Pareto job classes. We provide the analytical expressions for the mean conditional sojourn times in the system and numerical results. We implemented the algorithm for the case of two Pareto classes with the NS-2 simulator on the packet level.

5.4 Two Pareto classes

Let us first present the case when job sizes are distributed according to Pareto distributions.

5.4.1 Model description

We consider a two-class single-server M/G/1 queue. Jobs of each class arrive according to Poisson processes with rates $\lambda_1$ and $\lambda_2$. The job sizes follow Pareto distributions, namely

$$F_i(x) = 1 - \frac{b_i^{c_i}}{(x + b_i)^{c_i}}, \quad i = 1, 2. \tag{5.5}$$

Here $b_i = m_i(c_i - 1)$, where $m_i$ is the mean job size of class-$i$, $i = 1, 2$. Then the densities are

$$f_i(x) = \frac{b_i^{c_i} c_i}{(x + b_i)^{c_i + 1}}, \quad i = 1, 2,$$

and the hazard rate functions are

$$h_i(x) = \frac{c_i}{x + b_i}, \quad i = 1, 2.$$

These functions are equal at the point

$$a^{**} = \frac{c_2 b_1 - c_1 b_2}{c_1 - c_2}.$$

Figure 5.1 Two Pareto classes, hazard rates

Figure 5.2 Two Pareto classes, policy scheme

Without loss of generality suppose that $c_1 > c_2$. Then the behavior of the hazard rate functions depends on the values of $b_1$ and $b_2$. Let us first consider the case when the hazard rate functions do not cross on $x \ge 0$, that is, $a^{**} < 0$. This happens when $b_1/b_2 < c_1/c_2$. Then the hazard rate functions are decreasing, never cross, and

$$h_1(x) > h_2(x), \quad \text{for all } x \ge 0.$$

Let us define $\theta$ and the function $g(x)$ by

$$h_1(x) = h_2(g(x)), \qquad h_1(\theta) = h_2(0).$$

It follows that $g(\theta) = 0$. For the given expressions of $h_i(x)$, $i = 1, 2$, we get

$$g(x) = \frac{c_2}{c_1}(x + b_1) - b_2, \qquad \theta = \frac{c_1 b_2 - c_2 b_1}{c_2}.$$

According to the definition of $g(x)$, a class-1 job of size $x$ and a class-2 job of size $g(x)$ have the same value of the hazard rate when they are fully served, see Figure 5.1. The structure of the optimal policy is shown in Figure 5.2.
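As an illustration, the quantities $\theta$, $g(x)$ and $a^{**}$ can be evaluated directly from the Pareto parameters. The following is a minimal sketch, not the code used in the thesis: the function names are chosen here for illustration, and the parameter values are those of row Vs2 of Table 5.2.

```python
def pareto_hazard(x, c, b):
    """Hazard rate h(x) = c / (x + b) of the Pareto distribution (5.5)."""
    return c / (x + b)


def g(x, c1, b1, c2, b2):
    """Class-2 attained service whose hazard rate equals h1(x)."""
    return c2 * (x + b1) / c1 - b2


def theta(c1, b1, c2, b2):
    """Class-1 attained service at which h1 has dropped to h2(0)."""
    return (c1 * b2 - c2 * b1) / c2


def crossing_point(c1, b1, c2, b2):
    """Point a** where h1 = h2; negative when the rates never cross on x >= 0."""
    return (c2 * b1 - c1 * b2) / (c1 - c2)


# Parameters of row Vs2 in Table 5.2; b_i = m_i (c_i - 1).
c1, m1 = 10.0, 0.5
c2, m2 = 2.25, 4.5
b1, b2 = m1 * (c1 - 1), m2 * (c2 - 1)

th = theta(c1, b1, c2, b2)
a_cross = crossing_point(c1, b1, c2, b2)
print(th, a_cross, g(th, c1, b1, c2, b2))
```

For these values $b_1/b_2 < c_1/c_2$, so the sketch falls in the non-crossing case: `a_cross` is negative and `g(th, ...)` evaluates to zero up to rounding.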

5.4.2 Optimal policy

Jobs in the system are served in two queues, a high priority queue and a low priority queue. Class-1 jobs with attained service $a < \theta$ are served in the high priority queue with the LAS policy. When a class-1 job attains $\theta$ amount of service it is moved to the low priority queue. Class-2 jobs are placed in the low priority queue immediately upon arrival. The low priority queue is served only when the high priority queue is empty. In the low priority queue the service is given to the job with the highest $h_i(a)$, where $a$ is the job's attained service: $h_1(a)$ is computed for every class-1 job, $h_2(a)$ for every class-2 job, and after all values are compared the job with the highest value is served.

Now let us derive the expressions of the mean conditional sojourn times for class-1 and class-2 jobs.
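The selection rule can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the `Job` record and the `params` layout (class index mapped to a pair $(c_i, b_i)$) are hypothetical names introduced here. Because $h_1(a) > h_2(0)$ for $a < \theta$, picking the job with the largest hazard rate over all jobs reproduces the two-queue structure without storing the queues explicitly.

```python
from dataclasses import dataclass


@dataclass
class Job:
    cls: int          # class index, 1 or 2
    attained: float   # service received so far


def hazard(job, params):
    """Pareto hazard rate of the job's class at its attained service."""
    c, b = params[job.cls]
    return c / (job.attained + b)


def next_job(jobs, params):
    """Return the job with the highest hazard rate, or None if there are no jobs."""
    if not jobs:
        return None
    return max(jobs, key=lambda j: hazard(j, params))


# Hypothetical example: (c_i, b_i) taken from row Vs2 of Table 5.2.
params = {1: (10.0, 4.5), 2: (2.25, 5.625)}
jobs = [Job(1, 5.0), Job(2, 0.0), Job(1, 25.0)]
chosen = next_job(jobs, params)
```

In this example the class-1 job with attained service 5.0 is below $\theta = 20.5$ and is chosen first; once it leaves, the fresh class-2 job is preferred over the class-1 job that has already attained 25.0.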

5.4.3 Mean conditional sojourn times

We mark quantities related to class-1 and class-2 by the superscripts $(1)$ and $(2)$ respectively. Let $\overline{X_y^n}^{(i)}$ be the $n$-th moment and $\rho_y^{(i)}$ the utilization factor for the distribution $F_i(x)$ truncated at $y$, $i = 1, 2$. The distribution truncated at $y$ equals $F(x)$ for $x \le y$ and equals 1 for $x > y$. Let $W_{x,y}$ denote the mean workload in a system which consists only of class-1 jobs with service times truncated at $x$ and of class-2 jobs with service times truncated at $y$. According to the Pollaczek-Khinchin formula

$$W_{x,y} = \frac{\lambda_1 \overline{X_x^2}^{(1)} + \lambda_2 \overline{X_y^2}^{(2)}}{2\left(1 - \rho_x^{(1)} - \rho_y^{(2)}\right)}. \tag{5.6}$$

Now let us formulate the following theorem, which we prove in the Appendix.

Theorem 5.2 In the two-class M/G/1 queue where the job size distributions are Pareto, given by (5.5), and which is scheduled with the Gittins policy described in Subsection 5.4.2, the mean conditional sojourn times for class-1 and class-2 jobs are

$$T_1(x) = \frac{x + W_{x,0}}{1 - \rho_x^{(1)}}, \quad x \le \theta, \tag{5.7}$$

$$T_1(x) = \frac{x + W_{x,g(x)}}{1 - \rho_x^{(1)} - \rho_{g(x)}^{(2)}}, \quad x > \theta, \tag{5.8}$$

$$T_2(g(x)) = \frac{g(x) + W_{x,g(x)}}{1 - \rho_x^{(1)} - \rho_{g(x)}^{(2)}}, \quad x > \theta. \tag{5.9}$$

Proof. The proof is very technical and is given in the Appendix. Here we give only its general idea. To obtain expressions (5.8) and (5.9) we use the fact that the low priority queue is a queue with batch arrivals. To obtain the mean batch size with and without the tagged job we apply generating function analysis using the method of collective marks.

The expressions (5.7), (5.8) and (5.9) can be interpreted using the tagged-job and mean-value approach. Let us consider class-1 jobs. A job of size $x \le \theta$ is served in the high priority queue with the LAS policy, so its mean conditional sojourn time is known [Kle76a, Sec. 4.6]:

$$T_1(x) = \frac{x + W_{x,0}}{1 - \rho_x^{(1)}}, \quad x \le \theta,$$

where $W_{x,0}$ is the mean workload and $\rho_x^{(1)}$ is the mean load in the system for class-1 jobs with the service time distribution truncated at $x$. Both quantities account only for class-1 jobs in the high priority queue.

For jobs of size $x > \theta$ expression (5.8) can be rewritten as

$$T_1(x) = x + W_{x,g(x)} + T_1(x)\left(\rho_x^{(1)} + \rho_{g(x)}^{(2)}\right),$$

where $x$ is the time actually spent serving the job; $W_{x,g(x)}$ is the mean workload which the tagged job finds in the system and which has to be processed before it; and $T_1(x)(\rho_x^{(1)} + \rho_{g(x)}^{(2)})$ is the mean time to serve the jobs which arrive during the sojourn time of the tagged job and which have to be served before it.

Let us explain these terms in more detail. First, consider the mean workload which the tagged class-1 job of size $x$ finds in the system and which has to be done before it. According to the description of the optimal policy, the jobs which have to be served before the tagged job are class-1 jobs of size less than $x$ and class-2 jobs of size less than $g(x)$. The mean workload therefore has to be calculated for the class-1 job size distribution truncated at $x$ and the class-2 job size distribution truncated at $g(x)$, and we use (5.6) to calculate $W_{x,g(x)}$. Note that for a class-2 job of size $g(x)$ the mean workload which has to be done before it is the same.

Next, consider the mean workload which arrives during the sojourn time $T_1(x)$ of the tagged job. The mean load of arriving class-1 jobs of size less than $x$ is $\lambda_1 \overline{X_x^1}^{(1)} = \rho_x^{(1)}$ and the mean load of arriving class-2 jobs of size less than $g(x)$ is $\lambda_2 \overline{X_{g(x)}^1}^{(2)} = \rho_{g(x)}^{(2)}$. Then $T_1(x)(\rho_x^{(1)} + \rho_{g(x)}^{(2)})$ is the mean workload which arrives during the sojourn time of the tagged class-1 job of size $x$.

A similar analysis gives an interpretation of the expression of $T_2(g(x))$ for a class-2 job of size $g(x)$. We can rewrite expression (5.9) as

$$T_2(g(x)) = g(x) + W_{x,g(x)} + T_2(g(x))\left(\rho_x^{(1)} + \rho_{g(x)}^{(2)}\right).$$

For a tagged class-2 job of size $g(x)$ the jobs which have to be served before it are class-1 jobs of size less than $x$ and class-2 jobs of size less than $g(x)$. In the previous expression $g(x)$ is the time to serve the class-2 job of size $g(x)$; $W_{x,g(x)}$ is the mean workload in the system which has to be served before it; and $T_2(g(x))(\rho_x^{(1)} + \rho_{g(x)}^{(2)})$ is the mean work which arrives during the sojourn time $T_2(g(x))$ and which has to be served before the class-2 job of size $g(x)$.
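The expressions (5.6) to (5.9) are straightforward to evaluate numerically. The sketch below is an illustration rather than the code used for the thesis results. It assumes that truncation at $y$ means the job size is replaced by $\min(X, y)$, consistent with the definition of the truncated distribution above, and uses closed-form truncated moments that require $c \ne 1, 2$. All function names are hypothetical; the parameters are those of row Vs2 of Table 5.2 with $\lambda_i = \rho_i / m_i$.

```python
def trunc_moments(y, c, b):
    """First and second moments of min(X, y), X Pareto with F(x) = 1 - (b/(x+b))**c.

    Uses E[min(X, y)^n] = int_0^y n t^(n-1) (1 - F(t)) dt; valid for c != 1, 2.
    """
    if y <= 0:
        return 0.0, 0.0
    u = y + b
    m1 = b**c * (b**(1 - c) - u**(1 - c)) / (c - 1)
    m2 = 2 * b**c * ((u**(2 - c) - b**(2 - c)) / (2 - c)
                     - b * (u**(1 - c) - b**(1 - c)) / (1 - c))
    return m1, m2


def workload(x, y, lam1, lam2, p1, p2):
    """Return (W_{x,y}, rho_x^(1) + rho_y^(2)) following (5.6); p1, p2 are (c, b)."""
    m1x, s1x = trunc_moments(x, *p1)
    m2y, s2y = trunc_moments(y, *p2)
    rho = lam1 * m1x + lam2 * m2y
    return (lam1 * s1x + lam2 * s2y) / (2 * (1 - rho)), rho


def g_of(x, p1, p2):
    (c1, b1), (c2, b2) = p1, p2
    return c2 * (x + b1) / c1 - b2


def sojourn_class1(x, lam1, lam2, p1, p2):
    """T1(x): (5.7) when g(x) <= 0 (x <= theta), (5.8) otherwise."""
    gx = max(g_of(x, p1, p2), 0.0)
    w, rho = workload(x, gx, lam1, lam2, p1, p2)
    return (x + w) / (1 - rho)


def sojourn_class2(x, lam1, lam2, p1, p2):
    """T2(g(x)) from (5.9), parameterised by the class-1 size x > theta."""
    gx = g_of(x, p1, p2)
    w, rho = workload(x, gx, lam1, lam2, p1, p2)
    return (gx + w) / (1 - rho)


# Row Vs2 of Table 5.2: (c_i, b_i) with b_i = m_i (c_i - 1), lambda_i = rho_i / m_i.
p1, p2 = (10.0, 4.5), (2.25, 5.625)
lam1, lam2 = 0.50 / 0.5, 0.37 / 4.5
print(sojourn_class1(30.0, lam1, lam2, p1, p2),
      sojourn_class2(30.0, lam1, lam2, p1, p2))
```

Both functions share the same workload and load terms, so for a given $x > \theta$ they differ only through the job's own service requirement, $x$ versus $g(x)$.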

5.4.4 Properties of the optimal policy

Property 5.1 When class-2 jobs arrive to the server they are not served immediately, but wait until the high priority queue is empty. The mean waiting time is the limit $\lim_{g(x) \to 0} T_2(g(x))$. As $\lim_{x \to \theta} g(x) = 0$, we have

$$\lim_{g(x) \to 0} T_2(g(x)) = \frac{W_{\theta,0}}{1 - \rho_\theta^{(1)}} = \frac{\lambda_1 \overline{X_\theta^2}^{(1)}}{2\left(1 - \rho_\theta^{(1)}\right)^2}.$$

Let us notice that

$$\lim_{g(x) \to 0} T_2(g(x)) \ne T_1(\theta) = \frac{\theta + W_{\theta,0}}{1 - \rho_\theta^{(1)}}.$$

Class-2 jobs wait in the system before being served in the low priority queue, and their mean waiting time is $\lim_{g(x) \to 0} T_2(g(x))$. Class-1 jobs of size greater than $\theta$ also wait in the system before being served in the low priority queue, and their mean waiting time is $T_1(\theta)$. Property 5.1 shows that these two mean waiting times are not equal, so class-1 jobs and class-2 jobs wait for different times before they start to be served in the low priority queue.

Property 5.2 Let us consider the system under the condition of no new arrivals. According to the structure of the optimal policy, jobs in the low priority queue are served according to the LAS policy with different rates, which depend on the number of jobs in each class and on the hazard rate functions. When there are no new arrivals to the low priority queue we can calculate the rates at which class-1 jobs and class-2 jobs are served at every moment of time. Suppose that all class-1 jobs have already received the same amount of service, and likewise all class-2 jobs. Let $n_1$ and $n_2$ be the numbers of jobs of class-1 and class-2 and let $x_1$ and $x_2$ be the attained service of every job in these classes. Then at any moment

$$h_1(x_1) = h_2(x_2).$$

If the total capacity of the server is $\Delta$, let $\Delta_1$ and $\Delta_2$ be the capacities which each job of class-1 and class-2 receives. Then

$$n_1 \Delta_1 + n_2 \Delta_2 = \Delta. \tag{5.10}$$

Also

$$h_1(x_1 + \Delta_1) = h_2(x_2 + \Delta_2).$$

As under the LAS policy $\Delta$ is very small (and hence so are $\Delta_1$ and $\Delta_2$), we can approximate

$$h_i(x + \Delta_i) = h_i(x) + \Delta_i h_i'(x), \quad i = 1, 2.$$

Then from the previous equations we have

$$\Delta_1 h_1'(x_1) = \Delta_2 h_2'(x_2).$$

Using (5.10) we get

$$\frac{\Delta_1}{\Delta} = \frac{h_2'(x_2)}{n_1 h_2'(x_2) + n_2 h_1'(x_1)}, \qquad \frac{\Delta_2}{\Delta} = \frac{h_1'(x_1)}{n_1 h_2'(x_2) + n_2 h_1'(x_1)}.$$

This result holds for any two distributions whose hazard rates are decreasing and never cross. For the two Pareto distributions given by (5.5) we obtain

$$\frac{\Delta_1}{\Delta} = \frac{c_1}{n_1 c_1 + n_2 c_2}, \qquad \frac{\Delta_2}{\Delta} = \frac{c_2}{n_1 c_1 + n_2 c_2}.$$

So, for two Pareto distributions the service rates of class-1 and class-2 jobs do not depend on the jobs' current attained service; they depend only on the parameters $c_1$ and $c_2$ of the Pareto job size distributions. This means that jobs of the class with the higher $c_i$ are always served at a higher rate than jobs of the other class.
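The step from the general expression to the Pareto case can be spelled out. The following worked derivation uses only (5.5) and the condition $h_1(x_1) = h_2(x_2)$; the symbol $h$ for this common value is introduced here for convenience and does not appear elsewhere in the chapter.

```latex
\begin{align*}
h_i'(x) &= -\frac{c_i}{(x+b_i)^2} = -\frac{h_i(x)^2}{c_i}, \qquad i = 1, 2, \\
h_1'(x_1) &= -\frac{h^2}{c_1}, \qquad h_2'(x_2) = -\frac{h^2}{c_2}, \\
\frac{\Delta_1}{\Delta}
  &= \frac{h_2'(x_2)}{n_1 h_2'(x_2) + n_2 h_1'(x_1)}
   = \frac{1/c_2}{n_1/c_2 + n_2/c_1}
   = \frac{c_1}{n_1 c_1 + n_2 c_2}, \\
\frac{\Delta_2}{\Delta}
  &= \frac{1/c_1}{n_1/c_2 + n_2/c_1}
   = \frac{c_2}{n_1 c_1 + n_2 c_2}.
\end{align*}
```

The factor $h^2$ cancels, which is why the attained services $x_1$ and $x_2$ drop out of the rates, and $\Delta_2 / \Delta_1 = c_2 / c_1$ agrees with the slope of $g(x)$.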

Property 5.3 As one can see from the description of the optimal policy, a class-1 job and a class-2 job leave the system together if their hazard rate functions take the same value at their sizes and if the two jobs are present in the system at the same time. According to the definition of the function $g(x)$, a class-1 job of size $x$ and a class-2 job of size $g(x)$, if they meet in the system, leave the system together. But these jobs do not have the same mean conditional sojourn time, $T_1(x) \ne T_2(g(x))$. This follows from expressions (5.8) and (5.9).

5.4.5 Two Pareto classes with intersecting hazard rate functions

Now let us consider the case when the hazard rate functions cross, so that $a^{**} = (c_2 b_1 - c_1 b_2)/(c_1 - c_2) \ge 0$, see Figure 5.3. As we assumed $c_1 > c_2$, we have $h_1(0) < h_2(0)$, and class-2 jobs are served in the high priority queue until they receive $\theta^* = (c_2 b_1 - c_1 b_2)/c_1$ amount of service. Here $\theta^*$ is such that $h_2(\theta^*) = h_1(0)$, that is, $g(0) = \theta^*$. The function $g(x)$ crosses the line $y = x$ at the point $a^{**}$, see Figure 5.4. Before class-1 and class-2 jobs attain service $a^{**}$, class-2 jobs have to receive more service than class-1 jobs to have the same priority. After they reach $a^{**}$ the situation changes: now class-1 jobs have to receive more service to have the same priority as class-2 jobs. But as shown in Property 5.2, the rates at which class-1 and class-2 jobs are served depend only on the parameters $c_1$ and $c_2$, and so class-1 jobs always have priority over class-2 jobs in terms of service rates. The expressions of the mean conditional sojourn times of Theorem 5.2 in Section 5.4 can be rewritten in the following way.

Figure 5.3 Two Pareto extension classes, hazard rates

Figure 5.4 Two Pareto extension classes, g(x) function behavior

Corollary 5.1 In the two-class M/G/1 queue where the job size distributions are Pareto, given by (5.5), such that the hazard rate functions cross, and which is scheduled with the Gittins optimal policy, the mean conditional sojourn times for class-1 and class-2 jobs are

$$T_1(x) = \frac{x + W_{x,g(x)}}{1 - \rho_x^{(1)} - \rho_{g(x)}^{(2)}}, \quad x \ge 0,$$

$$T_2(x) = \frac{x + W_{0,x}}{1 - \rho_x^{(2)}}, \quad x \le \theta^*,$$

$$T_2(g(x)) = \frac{g(x) + W_{x,g(x)}}{1 - \rho_x^{(1)} - \rho_{g(x)}^{(2)}}, \quad x > \theta^*.$$

Proof. The proof follows from the previous discussion.

5.4.6 Numerical results

We consider two classes with the parameters presented in Table 5.1 and calculate the mean sojourn time in the system numerically, using the expressions of the mean conditional sojourn times (5.7), (5.8) and (5.9). We provide the results for two different parameter sets, which we call V1 and V2.

It is known that in the Internet most of the traffic is generated by large files (80%), while most of the files are very small (90%). This phenomenon is referred to as the mice-elephant effect. It is also known that file sizes are well represented by heavy-tailed distributions such as Pareto. Here class-1 jobs represent the "mice" class and class-2 jobs the "elephants". We consider that the load of the small files is fixed and find the mean sojourn time in the system for different values of the "elephant" class arrival rate.

Table 5.1 Two Pareto classes, parameters

V     c1     c2     m1     m2     ρ1     ρ2           ρ
V1    25.0   2.12   0.04   0.89   0.1    0.4..0.85    0.5..0.95
V2    10.0   1.25   0.05   1.35   0.25   0.25..0.74   0.5..0.99

Figure 5.5 Two Pareto classes, mean sojourn times with respect to the load ρ, V1

Figure 5.6 Two Pareto classes, mean sojourn times with respect to the load ρ, V2

We compare the mean sojourn time for the Gittins policy with the PS, FCFS and LAS policies. These policies can be applied either in Internet routers or in the Web service. The expected sojourn times for these policies are, see [Kle76a],

$$\overline{T}^{PS} = \frac{\rho/\lambda}{1 - \rho}, \qquad \overline{T}^{FCFS} = \rho/\lambda + W_{\infty,\infty},$$

where $W_{\infty,\infty}$ is the total mean unfinished work in the system, and

$$\overline{T}^{LAS} = \frac{1}{\lambda_1 + \lambda_2} \int_0^\infty \overline{T}^{LAS}(x)\left(\lambda_1 f_1(x) + \lambda_2 f_2(x)\right) dx,$$

where

$$\overline{T}^{LAS}(x) = \frac{x + W_{x,x}}{1 - \rho_x^{(1)} - \rho_x^{(2)}}.$$
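The PS and FCFS expressions can be evaluated in closed form for Pareto classes. The sketch below is illustrative and makes two assumptions that should be checked against the thesis: $\lambda$ is taken to be the total arrival rate $\lambda_1 + \lambda_2$, so that $\rho/\lambda$ is the mean job size, and $W_{\infty,\infty}$ is computed from (5.6) with untruncated moments, using the standard second moment $2 b^2 / ((c-1)(c-2))$ of the distribution (5.5), which is infinite for $c \le 2$. The function names are hypothetical.

```python
import math


def pareto_second_moment(c, m):
    """E[X^2] for the Pareto law (5.5) with mean m; infinite when c <= 2."""
    if c <= 2:
        return math.inf
    b = m * (c - 1)
    return 2 * b * b / ((c - 1) * (c - 2))


def mean_sojourn_ps_fcfs(lams, cs, ms):
    """Return (T_PS, T_FCFS) for positive arrival rates lams and Pareto classes."""
    lam = sum(lams)                                   # assumed: total arrival rate
    rho = sum(l * m for l, m in zip(lams, ms))
    mean_size = rho / lam
    second = sum(l * pareto_second_moment(c, m) for l, c, m in zip(lams, cs, ms))
    w_total = second / (2 * (1 - rho))                # W_{inf,inf} from (5.6)
    return mean_size / (1 - rho), mean_size + w_total


# Parameter set V1 of Table 5.1 at rho_2 = 0.4 (total load 0.5).
t_ps_v1, t_fcfs_v1 = mean_sojourn_ps_fcfs(
    [0.1 / 0.04, 0.4 / 0.89], [25.0, 2.12], [0.04, 0.89])

# Parameter set V2 at rho_2 = 0.25: c_2 = 1.25 gives an infinite second moment.
t_ps_v2, t_fcfs_v2 = mean_sojourn_ps_fcfs(
    [0.25 / 0.05, 0.25 / 1.35], [10.0, 1.25], [0.05, 1.35])
```

For V2 the FCFS value is infinite, which is the reason no FCFS curve can be drawn for that parameter set.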

The mean sojourn times for the parameter sets V1 and V2 are presented in Figures 5.5 and 5.6. For V2 we do not plot the mean sojourn time for the FCFS policy, as class-2 has an infinite second moment. The maximal relative gains in mean sojourn time of the Gittins policy over the other policies are the following. For the set of parameters V1:

$$\max \frac{\overline{T}^{FCFS} - \overline{T}^{Gitt}}{\overline{T}^{FCFS}} = 0.99, \qquad \max \frac{\overline{T}^{PS} - \overline{T}^{Gitt}}{\overline{T}^{PS}} = 0.78 \quad \text{and} \quad \max \frac{\overline{T}^{LAS} - \overline{T}^{Gitt}}{\overline{T}^{LAS}} = 0.45.$$

For the set of parameters V2:

$$\max \frac{\overline{T}^{PS} - \overline{T}^{Gitt}}{\overline{T}^{PS}} = 0.98 \quad \text{and} \quad \max \frac{\overline{T}^{LAS} - \overline{T}^{Gitt}}{\overline{T}^{LAS}} = 0.39.$$

The maximal gain is achieved when the system is loaded at around 90%. We note that the PS policy produces much worse results than the LAS and Gittins policies.

5.4.7 Simulation results

Figure 5.7 NS-2 simulation scheme

We implemented the Gittins policy algorithm for the case of two Pareto distributed classes in the NS-2 simulator. The algorithm is implemented in the router queue. In the router we keep track of the attained service (the number of transmitted packets) of every connection in the system. We use a timer to detect the moment when there are no more packets from a connection in the queue; at that point we stop tracking the attained service of this connection. For the connection which has to be served it is possible to select the packet with the minimal sequence number instead of the first packet in the queue. In the current simulation this option does not play a big role given the selected model scheme and parameters: there are no drops in the system, so there are no retransmitted packets, and all packets arrive in the same order as they were sent. The algorithm used for the simulations is as follows.

Algorithm
on packet dequeue
    select the connection $f$ with the maximal $h_i(a_f)$, where $a_f$ is the flow's attained service
    select the first packet $p_f$ of the connection $f$ in the queue
    dequeue the selected packet $p_f$
    set $a_f = a_f + 1$

To compare the Gittins policy with the LAS policy we also implemented the LAS algorithm in the router queue. According to the LAS discipline the packet to dequeue is the packet from the connection with the least attained service.

The simulation topology is given in Figure 5.7. Jobs arrive to the bottleneck router in two classes, which represent mice and elephants in the network. Jobs are generated by FTP sources which are connected to TCP senders. File size distributions are Pareto, $F_i(x) = 1 - b_i^{c_i}/(x + b_i)^{c_i}$, $i = 1, 2$. Jobs arrive according to Poisson processes with rates $\lambda_1$ and $\lambda_2$. We consider that all connections have the same propagation delays. The bottleneck link capacity is $\mu = 100$ Mbit/s. All the connections have a Maximum Segment Size (MSS) of 540 B. The simulation run time is 2000 seconds.

We provide two different versions of the parameter selection, which we call Vs1 and Vs2. In Vs1 the first class takes 25% of the total bottleneck capacity and in Vs2 it takes 50%. Both scenarios correspond to the case when the hazard rate functions do not cross, see Section 5.4. The parameters we used are given in Table 5.2.
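The dequeue algorithm can be mirrored outside NS-2 with a small per-connection structure. This is only a sketch under assumptions, not the NS-2 implementation: the class name `GittinsQueue`, its methods and the `params` layout are hypothetical, attained service is counted in packets as in the text, and the timer that stops tracking idle connections is not modelled.

```python
from collections import defaultdict, deque


class GittinsQueue:
    """Router queue that dequeues from the connection with the highest hazard rate."""

    def __init__(self, params):
        self.params = params               # class id -> (c, b), assumed layout
        self.packets = defaultdict(deque)  # connection id -> queued packets (FIFO)
        self.attained = defaultdict(int)   # connection id -> packets already sent
        self.cls = {}                      # connection id -> class id

    def enqueue(self, conn, cls, packet):
        self.cls[conn] = cls
        self.packets[conn].append(packet)

    def _hazard(self, conn):
        c, b = self.params[self.cls[conn]]
        return c / (self.attained[conn] + b)

    def dequeue(self):
        backlog = [f for f, q in self.packets.items() if q]
        if not backlog:
            return None
        f = max(backlog, key=self._hazard)   # connection with max h_i(a_f)
        packet = self.packets[f].popleft()   # first packet of that connection
        self.attained[f] += 1                # a_f = a_f + 1
        return packet


# Hypothetical usage with the (c_i, b_i) values of row Vs2 in Table 5.2.
q = GittinsQueue({1: (10.0, 4.5), 2: (2.25, 5.625)})
for k in range(3):
    q.enqueue("mouse", 1, f"m{k}")
    q.enqueue("elephant", 2, f"e{k}")
order = [q.dequeue() for _ in range(6)]
```

Replacing `_hazard` by the negative of the attained service gives the LAS variant used for comparison.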

Table 5.2 Two Pareto classes, simulation parameters

Ver.   c1     c2     m1    m2    ρ1     ρ2     ρ
Vs1    10.0   1.25   0.5   6.8   0.25   0.50   0.75
Vs2    10.0   2.25   0.5   4.5   0.50   0.37   0.87

The results are given in Table 5.3. We provide the results of the NS-2 simulations together with the mean sojourn times given by the analytical model with the same parameters. We calculate the relative gain of the Gittins policy in comparison with the DropTail and LAS policies,

$$g_1 = \frac{\overline{T}^{DT} - \overline{T}^{Gitt}}{\overline{T}^{DT}} \quad \text{and} \quad g_2 = \frac{\overline{T}^{LAS} - \overline{T}^{Gitt}}{\overline{T}^{LAS}}.$$

Table 5.3 Mean sojourn times

Ver.          T^DT        T^LAS   T^Gitt   g1       g2
Vs1 NS-2      18.72       2.10    2.08     88.89%   0.95%
Vs1 theory    PS: 4.71    1.58    1.01     78.56%   36.08%
Vs2 NS-2      6.23        2.03    1.83     70.63%   9.85%
Vs2 theory    PS: 6.46    3.25    2.19     66.10%   32.62%

We found that in the NS-2 simulations the gain of the Gittins policy over the LAS policy is not very significant when the small jobs do not make up a large part of the system load. As one can see, in Vs2, where the class-1 load is 50%, the relative gain of the Gittins policy over the LAS policy is about 10%. In both versions the relative gain in the corresponding analytical system is much higher and reaches 36%. We attribute this difference to phenomena related to the TCP working scheme. We also explain the low gain in Vs1 by the fact that the load in the system is not high.
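The gains $g_1$ and $g_2$ follow from the mean sojourn times of Table 5.3 by simple arithmetic. The short check below recomputes them; the helper name `relative_gain` and the dictionary layout are illustrative only, and in the theory rows the reference value for $g_1$ is the PS value listed in the table.

```python
def relative_gain(t_ref, t_gittins):
    """Relative gain (T_ref - T_Gitt) / T_ref of the Gittins policy."""
    return (t_ref - t_gittins) / t_ref


# (reference for g1, T_LAS, T_Gitt) as listed in Table 5.3.
table_5_3 = {
    "Vs1 NS-2": (18.72, 2.10, 2.08),
    "Vs1 theory": (4.71, 1.58, 1.01),
    "Vs2 NS-2": (6.23, 2.03, 1.83),
    "Vs2 theory": (6.46, 3.25, 2.19),
}

gains = {
    name: (100 * relative_gain(t_ref, t_g), 100 * relative_gain(t_las, t_g))
    for name, (t_ref, t_las, t_g) in table_5_3.items()
}
for name, (g1, g2) in gains.items():
    print(f"{name}: g1 = {g1:.2f}%, g2 = {g2:.2f}%")
```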

5.4.8 Multiple Pareto classes

Figure 5.8 N Pareto classes, hazard rates

Figure 5.9 N Pareto classes, policy scheme

We consider a multi-class single-server M/G/1 queue. Jobs arrive to the system in $N$ classes. Jobs of the $i$-th class, $i = 1, \ldots, N$, arrive according to Poisson processes with rates $\lambda_i$. The job size distributions are Pareto, namely

$$F_i(x) = 1 - \frac{1}{(x + 1)^{c_i}}, \quad i = 1, \ldots, N.$$

Then the hazard rates

$$h_i(x) = \frac{c_i}{x + 1}, \quad i = 1, \ldots, N,$$

never cross. Without loss of generality, let $c_1 > c_2 > \ldots > c_N$. Let us define the values $\theta_{i,j}$ and the functions $g_{i,j}(x)$, $i, j = 1, \ldots, N$, by

$$h_i(\theta_{i,j}) = h_j(0), \qquad h_i(x) = h_j(g_{i,j}(x)).$$

Then we get

$$g_{i,j}(x) = \frac{c_j}{c_i}(x + 1) - 1, \qquad \theta_{i,j} = \frac{c_i}{c_j} - 1.$$

Let us notice that $\theta_{k,i} < \theta_{k,i+1}$ and $\theta_{i,k} > \theta_{i+1,k}$ for $k = 1, \ldots, N$, $i = 1, \ldots, N-1$, $i \ne k$, $i \ne k+1$, see Figure 5.8. We also set $\theta_{i,i} = 0$ for $i = 1, \ldots, N$. The scheme of the optimal policy is given in Figure 5.9.

Optimal policy. There are $N$ queues in the system. Class-1 jobs arrive to the system and go to the first-priority queue, queue-1. There they are served with the LAS policy until they attain $\theta_{1,2}$ of service. Then they are moved to queue-2, which is served only when queue-1 is empty. In queue-2 jobs of class-1 are served together with jobs of class-2. At every moment the service is given to the job with the highest $h_i(a)$, $i = 1, 2$, where $a$ is the job's attained service. When jobs of class-1 attain service $\theta_{1,3}$ they are moved to queue-3. When jobs of class-2 attain service $\theta_{2,3}$ they are also moved to queue-3. In queue-3 the jobs of class-1, class-2 and class-3 are served together. At every moment the service is given to the job with the highest $h_i(a)$, $i = 1, 2, 3$, where $a$ is the job's attained service. And so on.

To find the expressions of the mean conditional sojourn times in the system we use the same analysis as in the interpretation of the mean conditional sojourn times for the two-class system, see Section 5.4. The mean conditional sojourn time of a tagged job of class-$k$ consists of the time to serve the tagged job when the system is empty, the mean workload in the system which has to be served before the tagged job, and the mean workload which arrives during the sojourn time of the tagged job and has to be served before it.

Let the tagged job be a class-1 job of size $x$. The jobs which have the same priority in the system and which have to be served before the tagged job are class-1 jobs of size less than $x$ and class-$i$ jobs of size less than $g_{1,i}(x)$. We denote by $\overline{X_y^n}^{(i)}$ the $n$-th moment and by $\rho_y^{(i)}$ the utilization factor for the distribution $F_i(x)$ of class-$i$, $i = 1, \ldots, N$, truncated at $y$. The mean workload in the system which has to be served before the tagged job is then found with the Pollaczek-Khinchin formula and equals

$$W_{x, g_{1,2}(x), \ldots, g_{1,N}(x)} = \frac{\sum_{i=1}^{N} \lambda_i \overline{X_{g_{1,i}(x)}^2}^{(i)}}{2\left(1 - \sum_{i=1}^{N} \rho_{g_{1,i}(x)}^{(i)}\right)},$$

where $g_{1,1}(x) = x$. Then we formulate the theorem.


Theorem 5.3 For class-1 jobs of size $x$ such that $\theta_{1,p} < x < \theta_{1,p+1}$, $p = 1, \ldots, N$, and the corresponding class-$k$ jobs with sizes $g_{1,k}(x)$, $k = 2, \ldots, p$, the mean conditional sojourn times are given by

$$T_1(x) = \frac{x + W(x, g_{1,2}(x), \ldots, g_{1,p}(x))}{1 - \rho_1(x) - \rho_2(g_{1,2}(x)) - \ldots - \rho_p(g_{1,p}(x))},$$

$$T_k(g_{1,k}(x)) = \frac{g_{1,k}(x) + W(x, g_{1,2}(x), \ldots, g_{1,p}(x))}{1 - \rho_1(x) - \rho_2(g_{1,2}(x)) - \ldots - \rho_p(g_{1,p}(x))}.$$

Here we set $\theta_{i,N+1} = \infty$, $i = 1, \ldots, N$.

Proof. Similar to the proof of Theorem 5.2.
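The threshold structure of this subsection can be sketched as follows. The code is an illustration with hypothetical function names and hypothetical values of $c_i$; it assumes, as in the policy description, that a class-$i$ job starts in queue-$i$ and moves to queue-$j$ once its attained service reaches $\theta_{i,j}$.

```python
def theta(i, j, c):
    """theta_{i,j} = c_i / c_j - 1; classes are 1-indexed, c is a list."""
    return c[i - 1] / c[j - 1] - 1


def g(i, j, x, c):
    """g_{i,j}(x) = (c_j / c_i)(x + 1) - 1."""
    return c[j - 1] * (x + 1) / c[i - 1] - 1


def queue_index(i, attained, c):
    """Queue in which a class-i job with the given attained service is served."""
    q = i
    while q < len(c) and attained >= theta(i, q + 1, c):
        q += 1
    return q


# Hypothetical shape parameters with c1 > c2 > c3.
c = [8.0, 4.0, 2.0]
print(theta(1, 2, c), theta(1, 3, c), theta(2, 3, c))
print([queue_index(1, a, c) for a in (0.5, 1.0, 3.5)])
```

With these values $\theta_{1,2} = 1$, $\theta_{1,3} = 3$ and $\theta_{2,3} = 1$, so a class-1 job passes through all three queues while a class-3 job is always served in queue-3.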

5.5 Hyper-exponential and exponential classes

Figure 5.10 Exponential and HE classes, hazard rates.

Figure 5.11 Exponential and HE classes, policy description.

We consider a two-class M/G/1 queue. Jobs of each class arrive according to Poisson processes with rates $\lambda_1$ and $\lambda_2$. The job size distribution of class-1 is exponential with mean $1/\mu_1$, and that of class-2 is hyper-exponential with two phases and mean $(\mu_3 p + (1 - p)\mu_2)/(\mu_2 \mu_3)$. Namely,

$$F_1(x) = 1 - e^{-\mu_1 x}, \qquad F_2(x) = 1 - p e^{-\mu_2 x} - (1 - p) e^{-\mu_3 x}.$$

Note that the hazard rates are

$$h_1(x) = \mu_1, \qquad h_2(x) = \frac{p \mu_2 e^{-\mu_2 x} + (1 - p) \mu_3 e^{-\mu_3 x}}{p e^{-\mu_2 x} + (1 - p) e^{-\mu_3 x}}, \quad x \ge 0. \tag{5.11}$$

The hazard rate function of class-1 is constant, $h_1 = \mu_1$. The hazard rate function $h_2(x)$ of class-2 is decreasing in $x$. As both hazard rate functions are non-increasing, the optimal policy which minimizes the mean sojourn time is the Gittins policy based on the value of the hazard rate function, which gives service to the jobs with the maximal hazard rate. For the selected job size distributions the hazard rate functions behave in different ways depending on the parameters $\mu_1$, $\mu_2$, $\mu_3$ and $p$, and their possible behaviors determine the optimal policy in the system.

If the hazard rate functions never cross and the hazard rate of class-1 is higher than that of class-2, then class-1 jobs are served with priority over class-2 jobs. This happens when $h_1 > h_2(x)$ for all $x \ge 0$. As $h_2(x)$ is decreasing, this holds when $\mu_1 > h_2(0)$. Let $\mu_2 > \mu_3$. Since $h_2(0) = p\mu_2 + (1 - p)\mu_3$, we have $\mu_1 > h_2(0)$ in particular when $\mu_1 > \mu_2 > \mu_3$. For this case it is known that the optimal policy is a strict priority policy, which serves class-1 jobs with strict priority over class-2 jobs. From our discussion it follows that this policy remains optimal even if $\mu_2 > \mu_1 > \mu_3$, provided that $\mu_1 > p\mu_2 + (1 - p)\mu_3$.

Let us consider the case when $\mu_2 > \mu_1 > \mu_3$ and $\mu_1 < p\mu_2 + (1 - p)\mu_3$. Then there exists a unique point of intersection of $h_2(x)$ and $h_1$, which we denote by $a^*$. The value of $a^*$ is the solution of

$$\frac{p \mu_2 e^{-\mu_2 x} + (1 - p) \mu_3 e^{-\mu_3 x}}{p e^{-\mu_2 x} + (1 - p) e^{-\mu_3 x}} = \mu_1.$$

Solving this equation, we get

$$a^* = \frac{1}{\mu_2 - \mu_3} \ln\left(\frac{p}{1 - p} \, \frac{\mu_2 - \mu_1}{\mu_1 - \mu_3}\right).$$

The hazard rate functions are sketched in Figure 5.10. The corresponding optimal policy is described in Subsection 5.5.1.
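The closed form of the crossing point $a^*$ can be checked numerically against the hazard rate (5.11). The sketch below uses hypothetical parameter values chosen only to satisfy $\mu_2 > \mu_1 > \mu_3$ and $\mu_1 < p\mu_2 + (1-p)\mu_3$; they are not the values used in the thesis, and the function names are illustrative.

```python
import math


def h2(x, p, mu2, mu3):
    """Hazard rate (5.11) of the two-phase hyper-exponential distribution."""
    num = p * mu2 * math.exp(-mu2 * x) + (1 - p) * mu3 * math.exp(-mu3 * x)
    den = p * math.exp(-mu2 * x) + (1 - p) * math.exp(-mu3 * x)
    return num / den


def crossing_point(p, mu1, mu2, mu3):
    """Return a* with h2(a*) = mu1, or None when the hazard rates do not cross.

    Assumes mu2 > mu3, so h2 decreases from p*mu2 + (1-p)*mu3 towards mu3.
    """
    if not (mu3 < mu1 < p * mu2 + (1 - p) * mu3):
        return None
    ratio = p * (mu2 - mu1) / ((1 - p) * (mu1 - mu3))
    return math.log(ratio) / (mu2 - mu3)


# Hypothetical parameters: h2(0) = 1.25 > mu1 = 1.0 > mu3 = 0.5.
p, mu1, mu2, mu3 = 0.5, 1.0, 2.0, 0.5
a_star = crossing_point(p, mu1, mu2, mu3)
print(a_star, h2(a_star, p, mu2, mu3))
```

When the function returns `None` with $\mu_1 > h_2(0)$, the strict priority policy for class-1 discussed above applies.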

5.5.1 Optimal policy

There are three queues in the system, which are served with strict priority between them. The second priority queue is served only when the first priority queue is empty, and the third priority queue is served only when the first and second priority queues are empty. Class-2 jobs that arrive to the system are served in the first priority queue with the LAS policy until they attain $a^*$ amount of service. After that they are moved to the third priority queue, where they are served according to the LAS policy. Class-1 jobs arrive to the system and go to the second priority queue, where they are served with the LAS policy. Since $h_1(x) = \mu_1$ is constant, class-1 jobs can in fact be served with any non-anticipating scheduling policy. The scheme of the optimal policy is given in Figure 5.11. According to this optimal policy we find the expressions of the expected sojourn times for class-1 and class-2 jobs.


5.5.2 Expected sojourn times

Let us recall that the mean workload in the system for class-1 jobs of size less than $x$ and class-2 jobs of size less than $y$ is $W_{x,y}$, given by (5.6). We prove the following theorem.

Theorem 5.4 The mean conditional sojourn times in the M/G/1 queue with job size distributions given by (5.11) under the Gittins optimal policy described in Subsection 5.5.1 are given by

$$T_1(x) = \frac{x + W_{x,a^*}}{1 - \rho_x^{(1)} - \rho_{a^*}^{(2)}}, \quad x \ge 0, \tag{5.12}$$

$$T_2(x) = \frac{x + W_{0,x}}{1 - \rho_x^{(2)}}, \quad x \le a^*, \tag{5.13}$$

$$T_2(x) = \frac{x + W_{\infty,x}}{1 - \rho_\infty^{(1)} - \rho_x^{(2)}}, \quad x > a^*. \tag{5.14}$$

Proof. To find the expressions of the mean conditional sojourn times we use mean-value analysis and the tagged-job approach. The mean conditional sojourn time of a class-1 job of size $x$ consists of the following elements: $x$, the time needed to serve the job itself; the mean workload in the system which has to be served before the tagged job; and the mean time to serve the jobs which arrive to the system during the sojourn time of the tagged job and which have to be served before it.

When the tagged job is a class-1 job of size $x$, the jobs which have to be served before it are all class-1 jobs of size less than $x$ and all class-2 jobs of size less than $a^*$. Then the mean workload which the tagged job finds in the system and which has to be served before it is $W_{x,a^*}$. The mean work which arrives to the system during the sojourn time $T_1(x)$ of the tagged job and has to be done before it takes into account only class-1 jobs of size less than $x$ and class-2 jobs of size less than $a^*$. So, it equals $T_1(x)(\rho_x^{(1)} + \rho_{a^*}^{(2)})$.

For a tagged class-2 job of size $x \le a^*$ the jobs which have to be served before it are class-2 jobs of size less than $x$. Then the mean workload which the tagged job finds in the system and which has to be served before it is $W_{0,x}$, and the mean time to serve the jobs which arrive to the system during $T_2(x)$ is $T_2(x)\rho_x^{(2)}$.

For a class-2 job of size $x > a^*$ the jobs which have to be served before it are all class-1 jobs and class-2 jobs of size less than $x$. Then the mean workload which has to be served before the tagged job is $W_{\infty,x}$, and the mean time spent serving the jobs which arrive during the sojourn time of the tagged job is $T_2(x)(\rho_\infty^{(1)} + \rho_x^{(2)})$.

Summarizing the results of the previous discussion we get

$$T_1(x) = x + W_{x,a^*} + T_1(x)\left(\rho_x^{(1)} + \rho_{a^*}^{(2)}\right), \quad x \ge 0,$$

$$T_2(x) = x + W_{0,x} + T_2(x)\rho_x^{(2)}, \quad x \le a^*,$$

$$T_2(x) = x + W_{\infty,x} + T_2(x)\left(\rho_\infty^{(1)} + \rho_x^{(2)}\right), \quad x > a^*,$$

from which the statement of the Theorem follows.
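The final step is a one-line rearrangement, shown here for completeness. Each of the three relations has the form $T = s + W + T\rho$, where $s$, $W$ and $\rho$ are shorthand introduced only for this illustration: the job's own service requirement, the relevant mean workload and the relevant total load, with $\rho < 1$.

```latex
\begin{align*}
T = s + W + T\rho
\;\Longrightarrow\;
T(1 - \rho) = s + W
\;\Longrightarrow\;
T = \frac{s + W}{1 - \rho}.
\end{align*}
```

Taking $s = x$, $W = W_{x,a^*}$ and $\rho = \rho_x^{(1)} + \rho_{a^*}^{(2)}$ gives (5.12); the choices $W = W_{0,x}$, $\rho = \rho_x^{(2)}$ and $W = W_{\infty,x}$, $\rho = \rho_\infty^{(1)} + \rho_x^{(2)}$ give (5.13) and (5.14) respectively.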

5.5.3 Numerical results

Let us calculate numerically, for an example, the mean sojourn time in the system when the Gittins policy is used. We consider two classes with the parameters given in Table 5.4. Here $p = 0.1$ and the threshold value is $a^* = 7.16$. We compare the obtained results with the mean sojourn times when the system is scheduled with the FCFS, PS and LAS policies; the results are given in Figure 5.12.

Figure 5.12 Exponential and HE classes, mean sojourn times with respect to the load ρ

Table 5.4 Exponential and HE classes, simulation parameters

µ1    µ2    µ3    m1    m2    ρ1    ρ2          ρ
0.6   1.0   0.5   1.6   1.1   0.1   0.4..0.85   0.5..0.95


5.5.4 Pareto and exponential classes

We can apply the same analysis to the case when the class-1 job size distribution is exponential and the class-2 job size distribution is Pareto. Let $F_1(x) = 1 - e^{-\mu_1 x}$ and $F_2(x) = 1 - b_2^{c_2}/(x + b_2)^{c_2}$. Then $h_1 = \mu_1$ and $h_2(x) = c_2/(x + b_2)$, and the two hazard rates are equal at $a^* = c_2/\mu_1 - b_2$. When $a^* \le 0$ the hazard rate functions do not cross and the optimal policy is to give strict priority to class-1 jobs. If $a^* > 0$ the hazard rate functions cross at one point and the optimal policy is the same as in the previous section. Then the expressions of the mean conditional sojourn times of class-1 and class-2 are also given by (5.12), (5.13) and (5.14).

5.6 Conclusions

In [Git89], Gittins considered an M/G/1 queue and proved that the so-called Gittins index rule minimizes the mean delay. The Gittins rule determines, depending on the jobs' attained service, which job should be served next. Gittins derived this result as a by-product of his groundbreaking results on the multi-armed bandit problem. Gittins' results on the multi-armed bandit problem have had a profound impact and are very widely cited. However, Gittins' work in the M/G/1 context has not received much attention. In [AA07], the authors showed that the Gittins policy can be used to characterize the optimal scheduling policy when the hazard rate of the service time distribution is not monotone.

In the current work we use the Gittins policy to characterize the optimal scheduling discipline in a multi-class queue. Our results show that, even though all service times have a decreasing hazard rate, the optimal policy can differ significantly from LAS, which is known to be optimal in the single-class case. We demonstrate that in particular cases PS performs much worse than the Gittins policy.

Using the NS-2 simulator we implemented the Gittins optimal policy in the router queue and provided simulations for several particular schemes. In the simulations the Gittins policy achieves a gain of up to 10% in comparison with the LAS policy and provides much better performance than the DropTail policy.

In future research we may consider other types of service time distributions. The applicability of our results to real systems like the Internet should also be evaluated more carefully. We would also like to investigate the conditions under which the Gittins policy gives significantly better performance than the LAS policy.


5.7 Appendix : Proof of Theorem 2

We prove that the mean conditional sojourn times in the system described in Section 5.4, scheduled with the optimal Gittins policy given in Subsection 5.4.2, are given by (5.7), (5.8) and (5.9). The class-1 jobs of size $x \le \theta$ are served in the high priority queue with the LAS policy, so the expression for the mean conditional sojourn time in this case is known, see [Kle76a, sec. 4.6], and is given by (5.7). Let us consider class-1 jobs with sizes $x > \theta$ and class-2 jobs, which are served in the low priority queue. There is a strict priority between the queues and the low priority queue is served only when the high priority queue is empty. The low priority queue is then a queue with batch arrivals. To find the expressions for the mean conditional sojourn times in the system we use an analysis similar to that of Kleinrock for the Multi Level Processor Sharing queue in [Kle76a, sec. 4.7]. In the following analysis we consider only the class-1 jobs of size less than $x$ and the class-2 jobs of size less than $g(x)$. So, we consider that the class-1 job size distribution is truncated at $x$ and the class-2 job size distribution is truncated at $g(x)$. We formulate the following Lemma.

Lemma 5.1 The mean conditional sojourn times for a class-1 job of size $x > \theta$ and for a class-2 job of size $g(x) > 0$ are equal to

$$T_1(x) = \frac{\theta + W_{\theta,0}}{1 - \rho_\theta^{(1)}} + \frac{\alpha_1(x - \theta, g(x))}{1 - \rho_\theta^{(1)}}, \tag{5.15}$$

$$T_2(g(x)) = \frac{W_{\theta,0}}{1 - \rho_\theta^{(1)}} + \frac{\alpha_2(x - \theta, g(x))}{1 - \rho_\theta^{(1)}}, \tag{5.16}$$

where $\alpha_1(x - \theta, g(x))$ and $\alpha_2(x - \theta, g(x))$ are the times spent in the low priority queue by class-1 and class-2 jobs respectively and are equal to

$$\alpha_1(x - \theta, g(x)) = \frac{x - \theta + A_1(x) + W_b}{1 - \rho_b}, \qquad \alpha_2(x - \theta, g(x)) = \frac{g(x) + A_2(g(x)) + W_b}{1 - \rho_b},$$

where $W_b$ is the mean workload in the low priority queue which the tagged batch sees when it arrives to the low priority queue, $\rho_b$ is the mean load in the low priority queue and $A_i(x)$, $i = 1, 2$, are the mean works which arrive to the low priority queue with the tagged job in the batch.

Proof. Let us consider that the tagged job is from class-1 and has a size $x > \theta$. The time it spends in the system includes the mean time it spends in the high priority queue. This time is $(\theta + W_{\theta,0})/(1 - \rho_\theta^{(1)})$, as the job has to be served only with class-1 jobs until it gets $\theta$ amount of service. After that the tagged job is moved to the low priority queue, after waiting until the high priority queue becomes empty. The time $\alpha_1(x - \theta, g(x))$ is the time spent by the tagged job in the low priority queue. This time consists of the time spent to serve the job itself, $x - \theta$, of the mean workload in the low priority queue which the tagged job finds, $W_b$, of the mean work which arrives in the batch with the tagged job, $A_1(x)$, and of the mean work which arrives during the sojourn time of the tagged job, $\alpha_1(x - \theta, g(x))\rho_b$. We use the same analysis for the mean conditional sojourn time of the class-2 job of size $g(x)$.

Now let us find the expressions for $W_b$, $\rho_b$, $A_1(x)$ and $A_2(x)$. Let us define the truncated distribution $F_{1,\theta,x}(y) = F_1(y)$ for $\theta < y < x$ and $F_{1,\theta,x}(y) = 0$ for $y < \theta$, $y > x$. Let $\overline{X^n_{\theta,x}}^{(i)}$ be the $n$-th moment and $\rho_{\theta,x}^{(i)}$, $i = 1, 2$, be the utilization factor for this truncated distribution. We use this notation because jobs of class-1 which find themselves in a batch are already served until $\theta$. Let $N_i$ be the random variable which denotes the number of jobs of class-$i$ in a batch, $i = 1, 2$. We define $X^{(1)}_{\theta,x}$ as the random variable which denotes the size of a class-1 job in a batch. Let $X^{(2)}_{g(x)}$ be the random variable which corresponds to the size of a class-2 job in a batch. Then

$$Y_b = \sum_{i=1}^{N_1} X^{(1)}_{i,\theta,x} + \sum_{i=1}^{N_2} X^{(2)}_{i,g(x)}$$

is the random variable which denotes the size of the batch. Let us denote by $\lambda_b$ the batch arrival rate. We know that $\lambda_b = \lambda_1 + \lambda_2$. According to the previous notation we can write $\rho_b = \lambda_b E[Y_b]$, where $E[Y_b]$ is the mean work that a batch brings, and by the Pollaczek-Khinchin formula

$$W_b = \frac{\lambda_b E[Y_b^2]}{2(1 - \rho_b)}.$$
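As a small numerical illustration of the two relations above, the following Python sketch evaluates $\rho_b$ and $W_b$ from the batch arrival rate and the first two moments of the batch size. The function name and the sample values are assumptions made for the example; how the moments of $Y_b$ are obtained is the subject of the rest of the appendix.

```python
def batch_load_and_workload(lam_b, mean_batch, second_moment_batch):
    """Return (rho_b, W_b) for a queue fed by batches.

    lam_b               -- batch arrival rate
    mean_batch          -- E[Y_b], mean work brought by a batch
    second_moment_batch -- E[Y_b^2]
    """
    rho_b = lam_b * mean_batch
    if rho_b >= 1.0:
        raise ValueError("the low priority queue is unstable (rho_b >= 1)")
    # Pollaczek-Khinchin mean workload
    w_b = lam_b * second_moment_batch / (2.0 * (1.0 - rho_b))
    return rho_b, w_b


# Illustrative values (assumed): lam_b = 0.5, E[Y_b] = 1, E[Y_b^2] = 2.
print(batch_load_and_workload(0.5, 1.0, 2.0))
```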

Let us note that $W_b$ does not depend on the class from which the tagged job comes. As we know the first and the second moments of $X^{(1)}_{\theta,x}$ and $X^{(2)}_{g(x)}$, to find $\rho_b$ and $W_b$ we need to know the first and the second moments of $N_i$, $i = 1, 2$. To find these values we use the method of generating functions, which is described in the following section.

5.7.1 Generating function calculation We propose a two dimension generating function G(z1 , z2 ), which we obtain using collective marks method. The method of the collective marks is described in [Kle76b, Ch. 7].


Definition 5.5 Let us mark jobs in a batch in the following way. We mark a job of class-1 with probability $1 - z_1$, so $z_1$ is the probability that a job of class-1 is not marked. The same is defined for jobs of class-2 with $z_2$. Let $p_{n_1,n_2}$ be the probability that $n_1$ class-1 and $n_2$ class-2 jobs arrive in the batch. Then

$$G(z_1, z_2) = \sum_{n_1} \sum_{n_2} z_1^{n_1} z_2^{n_2} p_{n_1,n_2}$$

is a generating function and it gives the probability that there are no marked jobs in the batch.

Let us define as a "starter", or $S$, a tagged job. Let us distinguish the cases when the starter $S$ belongs to class-1 or class-2 and denote by $G_1(z_1, z_2)$ and $G_2(z_1, z_2)$ the probabilities that there are no marked jobs in the batch if the starter is from class-1 and class-2, respectively. When $S \in$ class-1, we consider two cases depending on the size of the starter ($S \le \theta$ and $S > \theta$). Then

$$G(z_1, z_2) = \frac{\lambda_1}{\lambda_b}\left([G_1(z_1, z_2), S \le \theta] + [G_1(z_1, z_2), S > \theta]\right) + \frac{\lambda_2}{\lambda_b} G_2(z_1, z_2).$$

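Lemma 5.2 The generating function is equal to

$$G(z_1, z_2) = \frac{\lambda_1}{\lambda_b}\left(\int_0^\theta e^{-\lambda_1 x(1 - G_1(z_1, z_2)) - \lambda_2 x(1 - z_2)}\, dF_1(x) + z_1 e^{-\lambda_1 \theta(1 - G_1(z_1, z_2)) - \lambda_2 \theta(1 - z_2)}\, \overline{F}_1(\theta)\right) + \frac{\lambda_2}{\lambda_b} z_2. \tag{5.17}$$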

Proof. Let us calculate $G_1(z_1, z_2)$. When a class-1 job arrives to the system it creates a busy period. Until this job has received $\theta$ amount of service the low priority queue is not served. So, jobs which arrive to the low priority queue and jobs which are already in the low priority queue are waiting, and so they create a batch. The probability that there are no marked jobs in this batch is $G_1(z_1, z_2)$.

Let a class-1 job of size $x$ arrive to the system. Let $x \le \theta$. The probability that $k_1$ class-1 jobs arrive in the period $(0, x)$ is $P_1(x) = e^{-\lambda_1 x}(\lambda_1 x)^{k_1}/k_1!$. The probability that none of the batches generated by these $k_1$ class-1 jobs contains a marked job is $G_1(z_1, z_2)^{k_1}$, because each of them generates a batch which does not have marked jobs with probability $G_1(z_1, z_2)$. The probability that $k_2$ class-2 jobs arrive to the system during the time $(0, x)$ is $P_2(x) = e^{-\lambda_2 x}(\lambda_2 x)^{k_2}/k_2!$. The probability that these jobs are not marked is not included in $G_1(z_1, z_2)$ and is equal to $z_2^{k_2}$. Then we sum over $k_1$ and $k_2$ and integrate over $x$ in $(0, \theta)$ with $dF_1(x)$, as only the class-1 jobs generate busy periods. We get that the probability that there are no marked jobs in the batch is

$$[G_1(z_1, z_2), S \le \theta] = \int_0^\theta \left(\sum_{k_1=0}^{\infty} \sum_{k_2=0}^{\infty} P_1(x) G_1(z_1, z_2)^{k_1} P_2(x) z_2^{k_2}\right) dF_1(x) = \int_0^\theta e^{-\lambda_1 x(1 - G_1(z_1, z_2)) - \lambda_2 x(1 - z_2)}\, dF_1(x).$$


Let a class-1 job of size $x > \theta$ arrive to the system. The class-1 job is first served in the high priority queue until it gets $\theta$ of service. Then it is moved to the low priority queue. The probability that $k_1$ class-1 jobs arrive in the period $(0, \theta)$ is $P_1(\theta) = e^{-\lambda_1 \theta}(\lambda_1 \theta)^{k_1}/k_1!$. The probability that there are no marked jobs in all the batches generated by these $k_1$ class-1 jobs is $G_1(z_1, z_2)^{k_1}$. The probability that $k_2$ class-2 jobs arrive to the system in the period $(0, \theta)$ is $P_2(\theta) = e^{-\lambda_2 \theta}(\lambda_2 \theta)^{k_2}/k_2!$. The probability that all these jobs are not marked is $z_2^{k_2}$. We have to take into account the "starter" itself, as it has a size larger than $\theta$ and it comes in the batch. The probability that the starter is not marked is $z_1$. Then we sum over $k_1$ and $k_2$ and integrate over $x$ in $(\theta, \infty)$ with $dF_1(x)$, as only the class-1 jobs generate busy periods. We get

$$[G_1(z_1, z_2), S > \theta] = \int_\theta^\infty \left(\sum_{k_1=0}^{\infty} \sum_{k_2=0}^{\infty} P_1(\theta) G_1(z_1, z_2)^{k_1} z_1 P_2(\theta) z_2^{k_2}\right) dF_1(x) = z_1 e^{-\lambda_1 \theta(1 - G_1(z_1, z_2)) - \lambda_2 \theta(1 - z_2)}\, \overline{F}_1(\theta).$$

Let us find $G_2(z_1, z_2)$. When a job of the second class arrives to the system it generates a batch of size one, so the probability that the jobs of this batch are not marked is $z_2$. Then

$$G_2(z_1, z_2) = \int_0^\infty z_2\, dF_2(x) = z_2.$$

Finally

$$G(z_1, z_2) = \frac{\lambda_1}{\lambda_b} G_1(z_1, z_2) + \frac{\lambda_2}{\lambda_b} G_2(z_1, z_2),$$

and we get (5.17). Let us notice that $G(1, 1) = 1$. Now we can calculate $E[N_1]$, $E[N_2]$ and so $\rho_b$ and $W_b$. After some mathematical calculations we get the following result.
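The relation (5.17) defines $G_1$ only implicitly, since $G_1$ appears inside the exponents. The following Python sketch shows one possible way to evaluate it numerically. It rests on assumptions that are not part of the proof: the class-1 distribution is taken to be exponential, $F_1(x) = 1 - e^{-\mu_1 x}$, so that the integral has a closed form, and the implicit equation is solved by plain fixed-point iteration started from zero. The function names and parameter values are hypothetical.

```python
import math


def g1_exponential(z1, z2, lam1, lam2, mu1, theta, tol=1e-12, max_iter=10_000):
    """Solve G1 = [G1, S <= theta] + [G1, S > theta] by fixed-point iteration.

    Assumes F1(x) = 1 - exp(-mu1 * x) (an illustrative choice).
    """
    g = 0.0
    for _ in range(max_iter):
        c = lam1 * (1.0 - g) + lam2 * (1.0 - z2)
        decay = math.exp(-(c + mu1) * theta)
        # integral over (0, theta) of exp(-c x) dF1(x): starter of size <= theta
        short_starter = mu1 / (c + mu1) * (1.0 - decay)
        # z1 * exp(-c theta) * (1 - F1(theta)): starter of size > theta
        long_starter = z1 * decay
        g_new = short_starter + long_starter
        if abs(g_new - g) < tol:
            return g_new
        g = g_new
    return g


def batch_pgf(z1, z2, lam1, lam2, mu1, theta):
    """G(z1, z2) = (lam1/lam_b) G1(z1, z2) + (lam2/lam_b) z2."""
    lam_b = lam1 + lam2
    g1 = g1_exponential(z1, z2, lam1, lam2, mu1, theta)
    return lam1 / lam_b * g1 + lam2 / lam_b * z2


# Sanity check from the text: G(1, 1) should be equal to 1.
print(batch_pgf(1.0, 1.0, lam1=0.5, lam2=0.5, mu1=1.0, theta=1.0))
```

The moments of $N_1$ and $N_2$ could then be approximated by finite differences of `batch_pgf` around $(1, 1)$, although the lemmas below rely on analytical differentiation instead.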

Lemma 5.3

$$\rho_b = 1 - \frac{1 - \rho_x^{(1)} - \rho_{g(x)}^{(2)}}{1 - \rho_\theta^{(1)}},$$

$$W_b = W_{x,g(x)} - W_\theta (1 + \rho_b) - \theta\, \frac{\rho_x^{(1)} - \rho_\theta^{(1)}}{1 - \rho_\theta^{(1)}}.$$


Proof. We use the following equations. For $i = 1, 2$

$$E[N_i] = \left.\frac{\partial G(z_1, z_2)}{\partial z_i}\right|_{1,1},$$

$$E[N_i(N_i - 1)] = E[N_i^2] - E[N_i] = \left.\frac{\partial^2 G(z_1, z_2)}{\partial z_i^2}\right|_{1,1},$$

$$E[N_1 N_2] = \left.\frac{\partial^2 G(z_1, z_2)}{\partial z_1 \partial z_2}\right|_{1,1}.$$

Using $b_i = \frac{E[N_i^2]}{E[N_i]} - 1$, after some mathematical calculations we obtain the result of the current Lemma.

Now let us find expressions for $A_1(x)$ and $A_2(x)$.

Lemma 5.4 The mean workload which comes with the tagged job of class-1 of size $x$ in the batch and has to be served before it is equal to

$$A_1(x) = 2(W_\theta + \theta)\rho_b - \theta\, \frac{\rho_{g(x)}^{(2)}}{1 - \rho_\theta^{(1)}}.$$

Proof. The term $A_1(x)$ is the work that arrives with the tagged job of class-1 of size $x$ and that gets served before its departure. Since the tagged job arrives from class-1 only when the batch is started by a class-1 job, the calculations now depend on $G_1(z_1, z_2)$. We denote by $b_{1|1}$ and $b_{2|1}$ the mean number of jobs of class-1 and class-2 which arrive in the batch with the tagged job of class-1 when the batch is initiated by a class-1 job. Then

$$A_1(x) = b_{1|1} E[X^{(1)}_{\theta,x}] + b_{2|1} E[X^{(2)}_{g(x)}] - E[X^{(1)}_{\theta,x}].$$

Here

$$b_{1|1} = \sum_{n_1} n_1\, \frac{n_1 P(n_1)}{E[N_{1|1}]} = \frac{E[N_{1|1}^2]}{E[N_{1|1}]},$$

where $N_{1|1}$ is the random variable which corresponds to the number of jobs of class-1 in the batch when the batch is initiated by a class-1 job. So the number of class-1 jobs that arrive in addition to the tagged job is $\left(\frac{E[N_{1|1}^2]}{E[N_{1|1}]} - 1\right)$. Note that since we condition on the fact that the starter is a class-1 job, $N_{1|1}$ is now calculated from $G_1(z_1, z_2)$, so:

$$E[N_{1|1}] = \left.\frac{\partial G_1(z_1, z_2)}{\partial z_1}\right|_{1,1}, \qquad E[N_{1|1}(N_{1|1} - 1)] = \left.\frac{\partial^2 G_1(z_1, z_2)}{\partial z_1 \partial z_1}\right|_{1,1}.$$


Then we can find $(b_{1|1} - 1)$. Now we need to calculate $b_{2|1}$, that is, the mean number of class-2 jobs that the tagged job of class-1 sees. We obtain it from the generating function $G_1(z_1, z_2)$ by conditioning on the number of class-1 jobs:

$$G_1(z_1, z_2) = \sum_{n_1} \sum_{n_2} z_1^{n_1} z_2^{n_2} p_{n_1,n_2} = \sum_{n_1} \sum_{n_2} z_1^{n_1} z_2^{n_2} p_{n_2|n_1} p_{n_1},$$

$$\left.\frac{\partial^2 G_1(z_1, z_2)}{\partial z_1 \partial z_2}\right|_{1,1} = E[N_1] \sum_{n_1} \sum_{n_2} n_2\, p_{n_2|n_1} \frac{n_1 p_{n_1}}{E[N_1]} = E[N_1]\, b_{2|1}.$$

Then we can calculate $b_{2|1}$:

$$b_{2|1} = \frac{1}{E[N_{1|1}]} \left.\frac{\partial^2 G_1(z_1, z_2)}{\partial z_1 \partial z_2}\right|_{(1,1)}.$$

Finally we find the expression for $A_1(x)$.

Lemma 5.5 The mean workload which comes with the tagged job of class-2 of size $g(x)$ in the batch and has to be served before it is equal to

$$A_2(g(x)) = 2(W_\theta + \theta)\rho_b - \theta\, \frac{\rho_{g(x)}^{(2)}}{1 - \rho_\theta^{(1)}} - \theta \rho_b.$$

Proof. The term $A_2(g(x))$ is the work that arrives with the tagged job of size $g(x)$ of class-2 and that gets served before its departure. When the tagged job arrives from class-2 the batch can be started by a class-1 or by a class-2 job, so the calculations depend on $G(z_1, z_2)$. We denote by $b_{1|2}$ and $b_{2|2}$ the mean number of jobs of class-1 and class-2 which arrive in the batch with the tagged job of class-2. Then

$$A_2(g(x)) = b_{1|2} E[X^{(1)}_{\theta,x}] + b_{2|2} E[X^{(2)}_{g(x)}] - E[X^{(2)}_{g(x)}] = b_{1|2} E[X^{(1)}_{\theta,x}] + (b_{2|2} - 1) E[X^{(2)}_{g(x)}].$$

As the tagged job is from class-2, $b_{2|2} = b_2$. We need to find the value of $b_{1|2}$. We use the fact that jobs of class-1 and class-2 arrive independently of each other:

$$G(z_1, z_2) = \sum_{n_1} \sum_{n_2} z_1^{n_1} z_2^{n_2} p_{n_1,n_2} = \sum_{n_1} \sum_{n_2} z_1^{n_1} z_2^{n_2} p_{n_1|n_2} p_{n_2},$$

$$\left.\frac{\partial^2 G(z_1, z_2)}{\partial z_1 \partial z_2}\right|_{1,1} = E[N_2] \sum_{n_1} \sum_{n_2} n_1\, p_{n_1|n_2} \frac{n_2 p_{n_2}}{E[N_2]} = E[N_2]\, b_{1|2}.$$

Then

$$b_{1|2} = \frac{1}{E[N_2]} \left.\frac{\partial^2 G(z_1, z_2)}{\partial z_1 \partial z_2}\right|_{1,1}.$$

From here we get the expression for $A_2(g(x))$. Now we can prove the result of Theorem 5.2.


Lemma 5.6 Expressions (5.15), (5.16) and (5.8), (5.9) are equal.

Proof. After simplification of the expressions (5.15), (5.16) we get equations (5.8), (5.9).


Chapter 6 Improving TCP Fairness with the MarkMax Policy

6.1 Summary

We introduce MarkMax, a new flow-aware AQM algorithm for Additive Increase Multiplicative Decrease protocols (like TCP). MarkMax sends a congestion signal to a selected connection whenever the total backlog reaches a given threshold. The selection mechanism is based on the state of large flows. Using a fluid model we derive some bounds that can be used to analyze the behavior of MarkMax and we compute the per-flow backlog. We conclude the chapter with simulation results, using NS-2, comparing MarkMax with DropTail and showing how MarkMax improves both fairness and link utilization when connections have significantly different RTTs. The results of this work are published in [OBA08].


6.2 Introduction

It has been known for a long time that if TCP connections with different RTTs share a bottleneck link, TCP connections with smaller RTTs take a larger share of the bandwidth [Flo91, Man90]. In [LM97] the authors observed that under synchronization assumptions a TCP connection obtains a share of the link capacity proportional to $RTT^{-\alpha}$ with $1 < \alpha < 2$. In [Bro00] the author used a fluid approximation to derive a more rigorous model for the case when connections have different RTTs. Then, in [ABL+00] it was observed that when synchronization is not complete and, especially, when RED [FJ93] is used, the distribution of the link capacity is fairer. In particular, the experiments of [ABL+00] suggested that a TCP connection obtains a share of the link capacity proportional to $RTT^{-0.85}$. This was later justified by an analytical model for the case of two competing TCP connections [AJN02]. In [AART06] the authors used a fluid model to analyze what happens if only one connection reduces its sending rate when multiple connections share the same bottleneck link, but they ignored backlog dynamics: whenever the total arrival rate at the bottleneck link is equal to its capacity, one of the connections reduces its sending rate, so that the backlog is always zero. In [SS07] the authors proposed the MLC(l) AQM algorithm to approach maxmin fairness. In particular, for l = 1 the MLC(l) algorithm performs similarly to RED and for l = 2 it performs similarly to CHOKe [PPP00]. The authors of [SS07] argue that by choosing a significantly large parameter l one can get arbitrarily close to maxmin fairness. The present work indicates that this does not appear to be the case. Building upon [AFG06, AART06] and [Sta07] we propose a new flow-aware active queue management packet dropping scheme (MarkMax). The main idea behind MarkMax is to identify which connection should reduce its sending rate instead of which packets should be dropped.
To improve fairness we propose to cut the flows with the largest sending rate at congestion moments. Several previously proposed AQM schemes do not discriminate between flows. Typically they drop every incoming packet with a certain probability that is a function of the state of the queue. When AQM was first introduced in the 1990s it was infeasible to classify incoming packets in real time on high speed links, but with technological advances this is now possible. Furthermore, to reduce the number of flows that need to be tracked, it is possible to concentrate on the larger flows, using the heavy-hitter counters of [Sta07] to identify them. Then, following [AABN04], we suggest treating short flows with priority and marking the large flows which have the largest backlog at congestion moments. We also suggest using ECN [RFB01] to minimize the number of dropped packets.

The chapter is organized as follows: in Section 6.3 we specify the algorithm. Then, in Section 6.4 we perform its theoretical analysis. We conclude the chapter with a section on NS-2 simulations illustrating the performance of MarkMax.

6.3 The MarkMax algorithm

The algorithm has three parameters: the thresholds $\theta$, $\theta_l$, $\theta_h$, selected in such a way that $\theta_l < \theta < \theta_h$. The threshold $\theta$ acts as a trigger: whenever the queue size is above this value one connection is cut. We propose two different ways of selecting which connection to cut, as described later on. The other two thresholds are needed because we are dealing with a packet based system with non-zero propagation and queueing delays. Let q be the queue size and flag be a Boolean variable initialized to true. The following algorithm is executed every time a new packet arrives:

    enqueue packet
    if q ≤ θl or q ≥ θh then
        flag ← true
    if q ≥ θ and flag = true then
        a. select connection with MarkMax-B (full backlog based MarkMax) or MarkMax-T (backlog tail based MarkMax)
        b. set the ECN flag in the first packet of the selected connection from the head of the queue
        c. flag ← false

The $\theta_h$ and $\theta_l$ thresholds are used to determine whether a congestion signal should be sent or not when $q \ge \theta$. After a congestion signal is sent the algorithm will not send another one as long as the queue remains in the interval $[\theta_l, \theta_h]$. The $\theta_h$ threshold acts as a safety mechanism covering the cases when a single cut in the sending rate of the selected connection might not be enough to reduce the total arrival rate to a value smaller than the capacity of the outgoing link. Whenever the queue size is above $\theta_h$ we keep sending congestion signals to the selected connection. This happens especially during the slow start phase, because of the fast increase of the congestion window.

Given that the system has non-zero propagation and queueing delays, whenever we set the ECN bit of a certain connection we need to wait for the sender to receive the corresponding acknowledgment before it reduces its sending rate. Before such a reduction is noticeable at the bottleneck link we still need to wait for the propagation and queueing delay between the sender and the bottleneck link. During this time the sending rate and the queue will keep growing so that, at the bottleneck link, it is not immediately possible to conclude whether a single cut is enough or not. Clearly if we set $\theta_h$ too high the system will respond slowly, whenever one


cut is not enough, and the queue will be larger. On the other hand, if we set $\theta_h$ too close to $\theta$, unnecessary multiple cuts can take place.

The lower threshold $\theta_l$ is needed because the queue size can oscillate around $\theta$, due to the arrival and departure of single packets and to the bursty nature of the arriving flows. If the queue size is close to $\theta$ the threshold can be crossed multiple times, so if we used only the one threshold $\theta$ this could generate multiple congestion signals, potentially causing the sender to reduce its sending rate multiple times (1). Furthermore, it could happen that different connections are selected, causing, again, multiple and unnecessary cuts. Because of these oscillations, using $\theta_l$ is the only way to determine whether the selected connection has reacted to the congestion signal. Even if a single cut is enough to reduce the total sending rate to a value smaller than the capacity of the outgoing link, the additive increase aspect of TCP will increase the sending rate again so that the backlog will, eventually, start to increase again. Clearly if we set $\theta_l$ too low the backlog might never reach it, forcing the algorithm to use only the $\theta_h$ threshold and to send multiple spurious congestion signals.

In the next section we use a fluid approximation to further discuss the selection of $\theta$ and $\theta_h$. Based on the simulations we ran, it looks reasonable to suggest that $\theta_h$ and $\theta_l$ can be set as follows: $\theta_h = 1.15 \cdot \theta$ and $\theta_l = 0.85 \cdot \theta$.

After enqueueing the arriving packet the algorithm sets the flag variable to TRUE if the queue size has grown too large or has sufficiently decreased. In both cases the queue size is sufficiently far from $\theta$ so that we should send a new congestion signal if $q \ge \theta$. This is done by the last if statement: at first a connection is selected, then the ECN flag is set in the first packet from the head of the queue of the selected connection. Finally the flag is set to FALSE to indicate the fact that one congestion signal has already been sent.

We propose two different criteria for selecting the connection to be cut: MarkMax-B selects the connection with the biggest (per connection) backlog and MarkMax-T selects the connection with the biggest backlog in the final part of the queue (the tail). As such the MarkMax-T variant has one extra parameter, expressed as a percentage, indicating the portion of the queue that will be considered. The per connection backlog is related to the sending rate of each connection. Clearly a larger sending rate will result in a larger backlog. More precisely, the connection with the biggest backlog is the connection with the largest average sending rate since the beginning of the current busy period. Larger values of $\theta$ and correspondingly larger queues lead to a larger averaging window, basically increasing the memory of the system. The idea behind MarkMax-T is to reduce the averaging window in order to identify the connection with the biggest instantaneous rate.

(1) The ECN specification does mention that senders should reduce the sending rate only once per round trip time, but this is not enough to guarantee that multiple cuts will not take place if we mark multiple packets.
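The per-packet procedure and the two selection criteria described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions, not the NS-2 implementation used in the thesis: the class and method names are hypothetical, the queue size is measured in packets, and each packet is represented by a small dictionary holding its connection identifier and an ECN bit.

```python
from collections import Counter, deque


class MarkMaxQueue:
    """Toy FIFO queue applying the MarkMax marking rule on every arrival."""

    def __init__(self, theta, variant="B", tail_fraction=0.10,
                 low_factor=0.85, high_factor=1.15):
        self.theta = theta
        self.theta_l = low_factor * theta    # suggested theta_l = 0.85 * theta
        self.theta_h = high_factor * theta   # suggested theta_h = 1.15 * theta
        self.variant = variant               # "B" (full backlog) or "T" (tail)
        self.tail_fraction = tail_fraction
        self.queue = deque()
        self.flag = True

    def _select(self):
        """Return the connection with the biggest backlog (whole queue or tail)."""
        packets = list(self.queue)
        if self.variant == "T":
            tail_len = max(1, int(len(packets) * self.tail_fraction))
            packets = packets[-tail_len:]
        counts = Counter(p["conn"] for p in packets)
        return counts.most_common(1)[0][0]

    def enqueue(self, conn):
        """Enqueue a packet of connection `conn`; return the marked connection or None."""
        self.queue.append({"conn": conn, "ecn": False})
        q = len(self.queue)
        if q <= self.theta_l or q >= self.theta_h:
            self.flag = True
        marked = None
        if q >= self.theta and self.flag:
            marked = self._select()
            for packet in self.queue:        # scan from the head of the queue
                if packet["conn"] == marked:
                    packet["ecn"] = True
                    break
            self.flag = False
        return marked

    def dequeue(self):
        return self.queue.popleft() if self.queue else None
```

For example, with `theta=20` a queue filled with 18 packets of connection "a" followed by 2 packets of connection "b" marks "a" under MarkMax-B and "b" under MarkMax-T with a 10% tail, which shows how the tail variant reacts to the most recent arrivals.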


6.4 Fluid model

Consider $N$ TCP connections sharing a single bottleneck link with capacity $\mu$. Let $RTT_i$ be the round trip time of the $i$-th connection ($i = 1, \ldots, N$) and $\lambda_i(t)$ its sending rate at time $t$. We approximate the behavior of the system using a fluid model. Data is represented by a fluid that flows into the buffer with rate $\lambda(t) = \sum_i \lambda_i(t)$, and it leaves the buffer with rate $\mu$ if there is a non-zero backlog. Fluid models have been successfully used to model TCP connections. In [AAB00] it is shown that such a model adequately describes the behavior of a TCP connection, provided the average sending rate is large enough.

As in [AAP05] we assume that, between congestion signals, senders increase their sending rate linearly. If at time $t_0$ the sending rate of the $i$-th sender is $\lambda_{0,i}$ then at time $t > t_0$ its sending rate is $\lambda_i(t) = \lambda_{0,i} + \alpha_i t$, where $\alpha_i = 1/RTT_i^2$. For the sake of simplicity we assume that $RTT_i$ is a constant, as is often done (see, for example, [ABN+95, SZC90, AAP05]). It is not too hard to see that, if at time $t_0$ the sending rates are $\lambda_{0,i}$ and the total backlog is $x_0$, the backlog $x(t)$ is given by:

$$x(t) = x_0 + (\lambda_0 - \mu)t + \frac{\alpha}{2} t^2, \tag{6.1}$$

where $\lambda_0 = \sum_i \lambda_{0,i}$ and $\alpha = \sum_i \alpha_i$, provided $x_0$ and $\lambda_0$ are such that $x_0 \ge \frac{(\lambda_0 - \mu)^2}{2\alpha}$. If $x_0 < \frac{(\lambda_0 - \mu)^2}{2\alpha}$ and $\lambda_0 < \mu$ then, after a decreasing phase, the buffer will be empty for a certain time and will finally start increasing again. In this case

$$x(t) = \begin{cases} x_0 + (\lambda_0 - \mu)t + \frac{\alpha t^2}{2}, & \text{if } t \le t_1 \\ 0, & \text{if } t_1 \le t \le \frac{\mu - \lambda_0}{\alpha} \\ \frac{\alpha}{2}\left(t - \frac{\mu - \lambda_0}{\alpha}\right)^2, & \text{if } t > \frac{\mu - \lambda_0}{\alpha} \end{cases} \tag{6.2}$$

where $t_1 = \frac{\mu - \lambda_0 - \sqrt{(\mu - \lambda_0)^2 - 2\alpha x_0}}{\alpha}$.

Solving $\lambda(t) = \lambda_0 + \alpha t$ for $t$ and substituting in (6.1) we have that

$$x(\lambda) = \frac{\lambda^2}{2\alpha} - \frac{\lambda\mu}{\alpha} + x_0 + \frac{\mu\lambda_0}{\alpha} - \frac{\lambda_0^2}{2\alpha}, \tag{6.3}$$

provided $x_0 \ge \frac{(\lambda_0 - \mu)^2}{2\alpha}$. A similar expression can be obtained by substituting the value of $t$ in (6.2).
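The trajectories (6.1), (6.2) and (6.3) are easy to evaluate numerically. The following Python sketch is an illustration written for this purpose, not the simulator mentioned later in this section; the function names and the sample parameter values are assumptions. Time is measured from $t_0 = 0$.

```python
import math


def backlog(t, x0, lam0, mu, alpha):
    """Total backlog x(t) according to (6.1) and (6.2), with t measured from t0 = 0."""
    parabola = x0 + (lam0 - mu) * t + 0.5 * alpha * t * t
    # The buffer never empties if the rate already exceeds mu or the
    # initial backlog is large enough: (6.1) applies for all t.
    if lam0 >= mu or x0 >= (lam0 - mu) ** 2 / (2.0 * alpha):
        return parabola
    t1 = (mu - lam0 - math.sqrt((mu - lam0) ** 2 - 2.0 * alpha * x0)) / alpha
    t2 = (mu - lam0) / alpha             # instant at which lambda(t) reaches mu
    if t <= t1:
        return parabola                  # decreasing phase
    if t <= t2:
        return 0.0                       # empty buffer
    return 0.5 * alpha * (t - t2) ** 2   # increasing phase


def backlog_vs_rate(lam, x0, lam0, mu, alpha):
    """Backlog as a function of the total rate, equation (6.3)."""
    return (lam ** 2 / (2.0 * alpha) - lam * mu / alpha + x0
            + mu * lam0 / alpha - lam0 ** 2 / (2.0 * alpha))


# Illustrative values (assumed): mu = 10, alpha = 2, lam0 = 6.
print(backlog(2.0, x0=5.0, lam0=6.0, mu=10.0, alpha=2.0))
print(backlog_vs_rate(10.0, x0=5.0, lam0=6.0, mu=10.0, alpha=2.0))
```

Both calls evaluate the same point of the trajectory, once parametrized by time and once by the total rate, so they should agree.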

Figure 6.1 shows some of the possible trajectories of the system. Note that all these parabolas have the same shape, in the sense that as $x_0$ and $\lambda_0$ vary the only thing that changes is the height of the vertex on the $\lambda = \mu$ line.

[Figure 6.1: Some of the possible trajectories in the state space. Horizontal axis: $\lambda$, with the points $\beta\mu$, $\lambda^+_*$, $\lambda^+$, $\mu$, $\lambda^-$ and $\lambda_{\max}$; vertical axis: $x(\lambda)$, with the level $\theta$; trajectories labelled (1), (2.1), (2.2) and (3).]

One possible way of adapting the MarkMax algorithm to the fluid case is as follows: every time the total backlog $x(t)$ reaches $\theta$ we can send a congestion signal to the corresponding connection by multiplying its sending rate by $\beta$ ($0 < \beta < 1$). Throughout the chapter we will use $\beta = 1/2$ (to model TCP New Reno) unless otherwise stated. The two selection methods previously discussed can easily be adapted as well: for MarkMax-B we select the connection with the biggest backlog, while for MarkMax-T we pick the connection with the biggest instantaneous sending rate. Recall that the idea behind MarkMax-T was exactly this and, with the fluid model, we know $\lambda_i(t)$ exactly so there is no need to approximate it.

To simplify the analysis, unless otherwise specified, we assume that the source reacts immediately to the congestion signals. Combining this with the fact that we know the sending rate after a cut and that there are no short term oscillations in the queue size, it suffices to use only one threshold ($\theta$). As a consequence, whenever the backlog reaches $\theta$ the chosen connection, say $j$, immediately changes its rate to $\beta\lambda_j$. If $\sum_{i \ne j} \lambda_i + \beta\lambda_j > \mu$ (that is, the arrival rate is still greater than $\mu$) the procedure is repeated by selecting a new connection to cut (it can be the same one or not, depending on the specific case) until the total sending rate is less than $\mu$. For the MarkMax-T version this procedure is guaranteed to terminate: eventually all connections will be cut. For MarkMax-B this is not the case: if multiple cuts are needed the algorithm will always pick the same connection, because without feedback delay the backlog does not change. If the sum of the rates of the other connections is greater than $\mu$ even an infinite number of cuts will not suffice and the algorithm will not terminate. Given that this happens only in the fluid model and only for very large (and unrealistic) values of $\theta$ we decided not to address the problem.
It is worth noting that using this fluid model it is also possible to compute exactly the per connection backlog, at any given time $t$, using an approach based on network calculus [Cru91]. Let $R_{i,in}(t)$ be the total amount of traffic sent by the $i$-th connection until time $t$ (this is generally called a process in network calculus), that is $R_{i,in}(t) = \int_0^t \lambda_i(u)\,du$. Similarly let $R_{i,out}(t)$ be the total amount of traffic of connection $i$ that has left the buffer until time $t$. Clearly the backlog at time $t$ is $x_i(t) = R_{i,in}(t) - R_{i,out}(t)$, so that we need to compute $R_{i,in}(t)$ and $R_{i,out}(t)$ to find $x_i(t)$.

Let $t_1, \ldots, t_n$ be the times at which a congestion signal was sent (to any of the connections). Between two congestion signals, say $t_j$ and $t_{j+1}$, we know that if $\lambda_i(t) = \lambda_{i,j} + \alpha_i (t - t_j)$ then $R_{i,in}(t) = S_{i,j} + \lambda_{i,j}(t - t_j) + \frac{\alpha_i}{2}(t - t_j)^2$, where $\lambda_{i,j} \triangleq \lambda_i(t_j)$ and $S_{i,j} \triangleq R_{i,in}(t_j)$. This way we can also compute $R_{i,in}(\tau)$ for any $\tau \le t$.

To compute $R_{i,out}(t)$ we can take advantage of the fact that we are dealing with a fluid FCFS queue with continuous inputs (the arrival rate is bounded), so that the delay for all the bits exiting at time $t$ is the same and is equal to $d(t) = \inf\{u \ge 0 \mid R_{in}(t - u) \le R_{out}(t)\}$, where $R_{in}(t) = \sum_i R_{i,in}(t)$ and $R_{out}(t) = \sum_i R_{i,out}(t)$. This implies that $R_{i,out}(t) = R_{i,in}(t - d(t))$. As we know $R_{i,in}(\tau)$ for any $\tau \le t$ we only need to find $d(t)$ to compute $R_{i,out}(t)$. Let $v \triangleq t - d(t)$, that is, the bits that are exiting at time $t$ joined the queue at time $v$. We can find $v$ by exploiting the fact that $R_{i,in}(v) = R_{i,out}(t)$ and that $R_{out}(t) = \mu(t - u)$, where $u$ is the beginning of the system busy period containing $t$, which can be found because $R_{i,in}(\tau)$ is known for all $\tau \le t$. We also have that, if $t_k \le \tau \le t_{k+1}$:

$$R_{in}(\tau) = S_k + \lambda_0(\tau - t_k) + \frac{\alpha}{2}(\tau - t_k)^2, \tag{6.4}$$

where $k \triangleq \max\{j \mid S_j < R_{in}(v)\}$ and $S_j \triangleq \sum_i S_{i,j}$. That is, the traffic exiting at time $t$ entered the buffer between $t_k$ and $t_{k+1}$. As $t_k \le v \le t_{k+1}$ we can use (6.4) to solve $R_{in}(v) = \mu(t - u)$ for $v$ and finally compute $d(t) = t - v$. Knowing $d(t)$ we can use $R_{i,out}(t) = R_{i,in}(t - d(t))$ to compute $x_i(t) = R_{i,in}(t) - R_{i,out}(t)$.

Using this method we wrote a simulator for the fluid model (in Python) that implements both variants of MarkMax. Using this simulator we have noticed that, provided the value of $\theta$ is not too big, MarkMax-B and MarkMax-T behave in a very similar way. In the remainder of this section we present some results that can be derived using the fluid model.

6.4.1 Guideline bounds

Let $t_\theta$ be such that $x(t_\theta) = \theta$. Let $\lambda^-$ and $\lambda^+$ be, respectively, the total sending rate before and after the cut(s) at time $t_\theta$. Let

$$g(\lambda) = \begin{cases} 0, & \text{if } \lambda \le \mu \\ \frac{(\lambda - \mu)^2}{2\alpha}, & \text{if } \lambda \ge \mu \end{cases}$$

(marked as (2.2) in Figure 6.1) and let $A = \{(\lambda, x) \mid x > g(\lambda)\}$. It is easy to verify that if $(\lambda_0, x_0) \in A$, then any trajectory starting at $(\lambda_0, x_0)$ stays in $A$. Furthermore, given that we send the congestion signal(s) whenever $x(t) = \theta$ and that there is no feedback delay, the maximum


rate $\lambda_{\max}$ corresponds to the intersection between $g(\lambda)$ and $x = \theta$ in Figure 6.1. It is easy to see that $\lambda_{\max} = \mu + \sqrt{2\alpha\theta}$.

Clearly all the trajectories described by (6.3) intersect the $x = \theta$ line twice, once to the left of the $\lambda = \mu$ line and once to the right. Only the intersection points to the right correspond to an increasing backlog phase, so that $\lambda^-$ is always between $\mu$ and $\lambda_{\max}$. We can also bound $\lambda^+$: as we keep sending congestion signals until the arrival rate is less than $\mu$ we have $\lambda^+ \le \mu$. The fact that $\lambda^- \ge \mu$ implies that $\lambda^+$ cannot be smaller than $\beta\mu$ (this happens when $\lambda^- = \mu$ and either there is only one connection or, in the case of multiple connections, the biggest one is significantly bigger than the others). Combining all this we have:

$$\mu \le \lambda^- \le \mu + \sqrt{2\alpha\theta}, \tag{6.5}$$

$$\beta\mu \le \lambda^+ \le \mu. \tag{6.6}$$

After the cut(s) the total sending rate will be reduced by a factor $\tilde\beta \triangleq \lambda^+/\lambda^-$, which is always smaller than $\beta$.

Lemma 6.1 If we use MarkMax-T then:

$$\frac{\beta}{1 + \sqrt{2\alpha\theta}/\mu} \le \tilde\beta \le \frac{N + \beta - 1}{N}. \tag{6.7}$$

Proof. Let $\lambda_i^-$ be the sending rate of the $i$-th connection at time $t_\theta$, so that $\lambda^- = \sum_i \lambda_i^-$, and let $j$ be such that $\lambda_j^- = \max_i\{\lambda_i^-\}$. Then:

$$\tilde\beta = \frac{\lambda^+}{\lambda^-} \le 1 - (1 - \beta)\frac{\lambda_j^-}{\lambda^-} \le 1 - (1 - \beta)\frac{\lambda^-}{N\lambda^-} = \frac{N - 1 + \beta}{N},$$

where the first equality is the definition of $\tilde\beta$ and the first inequality follows from the fact that $\lambda^+ \le \beta\lambda_j^- + \sum_{i \ne j} \lambda_i^- = \lambda^- - \lambda_j^-(1 - \beta)$; this inequality is true because the right hand side corresponds to the case where there is only a single cut, and in this case $\lambda^+$ is largest. The second inequality follows from the fact that $\lambda_j^- = \max_i\{\lambda_i^-\} \ge \lambda^-/N$.

By (6.6) and (6.5) we have that $\lambda^+ \ge \beta\mu$ and $\lambda^- \le \mu + \sqrt{2\alpha\theta}$; combining these inequalities with the definition of $\tilde\beta$ we have the lower bound.

Using the upper bounds in (6.5) and (6.7) we have:

$$\lambda^+ = \tilde\beta\lambda^- \le (\mu + \sqrt{2\theta\alpha})\,\frac{N + \beta - 1}{N}. \tag{6.8}$$

As the upper bound on $\tilde\beta$ corresponds to the case where only one connection is cut, if the right hand side of (6.8) is less than $\mu$ then a single cut of the connection with the biggest rate will be enough. The following lemma follows immediately by setting the right hand side of (6.8) less than or equal to $\mu$ and solving for $\theta$.


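Lemma 6.2 If we use MarkMax-T and if

$$\theta \le \frac{1}{2\alpha}\,\frac{\mu^2 (1 - \beta)^2}{(N - 1 + \beta)^2}, \tag{6.9}$$

then $\lambda^+ = \beta\lambda_j + \sum_{i \ne j} \lambda_i \le \mu$ (that is, after a single cut the total rate does not exceed $\mu$), where $\lambda_j = \max_i\{\lambda_i\}$.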
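The bounds (6.7) and (6.9) can be evaluated directly. The following Python sketch is a hedged illustration: the function names and the sample values of $N$, $\beta$, $\alpha$ and $\mu$ are assumptions, and the formulas are those of Lemmas 6.1 and 6.2.

```python
import math


def beta_tilde_bounds(n, beta, alpha, theta, mu):
    """Lower and upper bounds (6.7) on the overall reduction factor."""
    lower = beta / (1.0 + math.sqrt(2.0 * alpha * theta) / mu)
    upper = (n + beta - 1.0) / n
    return lower, upper


def single_cut_theta_max(n, beta, alpha, mu):
    """Largest threshold (6.9) for which one cut is guaranteed to be enough."""
    return mu ** 2 * (1.0 - beta) ** 2 / (2.0 * alpha * (n - 1.0 + beta) ** 2)


# Illustrative values (assumed): two connections, beta = 1/2, alpha = 2, mu = 10.
theta_max = single_cut_theta_max(n=2, beta=0.5, alpha=2.0, mu=10.0)
lower, upper = beta_tilde_bounds(n=2, beta=0.5, alpha=2.0, theta=theta_max, mu=10.0)
# At theta = theta_max the right hand side of (6.8) equals mu.
print(theta_max, (10.0 + math.sqrt(2.0 * 2.0 * theta_max)) * upper)
```

The guaranteed single-cut threshold shrinks quickly as $N$ grows, since the factor $(N - 1 + \beta)^2$ appears in the denominator of (6.9).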

Using the lower bound in (6.8) we can nd a lower bound on θ so that there will be no underow. That is the backlog is always positive and the link is fully utilized.

Lemma 6.3 If we use MarkMax-T and if

$$\theta > \frac{\mu^2(1-\zeta)^2}{2\alpha}, \tag{6.10}$$

where $\zeta = \dfrac{\beta}{1 + \sqrt{2\alpha\theta}/\mu}$, then the backlog is positive.

Proof. We have that:

$$\lambda^+ = \tilde\beta\lambda^- \ge \zeta\mu > \frac{\mu - \sqrt{2\alpha\theta}}{\mu}\,\mu = \mu - \sqrt{2\alpha\theta},$$

where the first inequality follows from (6.7) and (6.5) and the second from (6.10). It is easy to see that if $\lambda^+ > \lambda^+_* = \mu - \sqrt{2\alpha\theta}$ then the backlog is always positive (see Figure 6.1: we want the vertex of the parabola (6.3) to be on the x = 0 axis), which completes the proof.

We conclude with a bound that can be used as a guideline to set θh.
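Condition (6.10) is implicit in θ, because ζ itself depends on θ. As a worked step that is not part of the original text, and assuming 0 < β < 1, it can be rewritten in explicit form. The shorthand s is introduced here only for this derivation.

```latex
% Shorthand (an assumption of this note): s = sqrt(2*alpha*theta)/mu >= 0,
% so that zeta = beta/(1+s) and 0 < zeta < 1.
\begin{align*}
\theta > \frac{\mu^2(1-\zeta)^2}{2\alpha}
  &\iff s^2 > (1-\zeta)^2
   \iff s > 1 - \frac{\beta}{1+s} \\
  &\iff s(1+s) > 1 + s - \beta
   \iff s^2 > 1 - \beta \\
  &\iff \theta > \frac{\mu^2(1-\beta)}{2\alpha}.
\end{align*}
```

Under these assumptions the no-underflow condition of Lemma 6.3 is therefore equivalent to the explicit threshold θ > µ²(1 − β)/(2α).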

Lemma 6.4 At time $t = t_\theta + RTT_j$,

$$x(t) \le \theta + \sqrt{2\alpha\theta}\,RTT_j + \frac{\alpha}{2}RTT_j^2, \tag{6.11}$$

where $RTT_j = \max_i RTT_i$.

Proof. Consider a cycle that starts at time $t_\theta$; then at time $t = t_\theta + RTT_i$

$$\begin{aligned}
x(t) &= \theta + (\lambda^- - \mu)(t - t_\theta) + \frac{\alpha}{2}(t - t_\theta)^2 \\
     &= \theta + (\lambda^- - \mu)RTT_i + \frac{\alpha}{2}RTT_i^2 \\
     &\le \theta + \sqrt{2\alpha\theta}\,\max_i\{RTT_i\} + \frac{\alpha}{2}\max_i\{RTT_i^2\},
\end{aligned}$$

where the first equality follows from (6.1), and the inequality from the upper bound in (6.5).

Using (6.11) it is possible to know by how much the queue could grow between the time the threshold θ is reached and the time the slowest of the connections (i.e. the one with the biggest RTT) reacts to a congestion signal.
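As an illustration of how (6.11) could be used when dimensioning θh, the sketch below evaluates the right-hand side of the bound. The function name and the numerical values are assumptions made for this example; θ, α and the RTT only need to be expressed in mutually consistent units.

```python
import math


def queue_bound_after_rtt(theta, alpha, rtt_max):
    """Right-hand side of (6.11): upper bound on the backlog x(t)
    one (largest) round trip time after the threshold theta is reached."""
    return (theta
            + math.sqrt(2.0 * alpha * theta) * rtt_max
            + 0.5 * alpha * rtt_max ** 2)


# Illustrative values only (assumptions): theta = 50, alpha = 100, RTT = 0.1.
bound = queue_bound_after_rtt(theta=50.0, alpha=100.0, rtt_max=0.1)
extra_room_needed = bound - 50.0  # growth above theta before the slowest flow reacts
```

The difference between the bound and θ gives the headroom that the buffer needs above the threshold before the connection with the largest RTT reacts.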


6.5 Simulation results

We have modified the NS-2 simulator in order to simulate the behavior of the proposed algorithm. We have implemented both MarkMax-B and MarkMax-T, referred to in this section as MM-B and MM-T, respectively. For MM-T we only consider the last 10% of the queue (recall that this version considers only the final part of the queue when determining the connection with the biggest backlog). We have compared MarkMax with the standard DropTail (DT) policy, setting the queue size for DT equal to θ. For the MM case the buffer size was large enough to be considered unlimited, so that we could verify that MM can stabilize the queue size.

We consider three scenarios; the corresponding topologies are presented in Figures 6.2 and 6.3. Each node $s_i$ has a TCP connection with node $d_i$. All the connections have a Maximum Segment Size (MSS) of 540 B. The bottleneck link is the link between the nodes S and D and has capacity µ and propagation delay $a_{btlnk}$. The links $(s_i, S)$ and $(D, d_i)$ have capacity $\mu_i$ and propagation delay $a_i$. For the first scenario (see Figure 6.2) there are only two sources and two destinations, while for the second scenario there is an additional TCP connection sending traffic in the opposite direction on the bottleneck link, in order to introduce some variability in the flow of the acknowledgments for connections 1 and 2. The links used by this additional connection are represented as dotted lines in Figure 6.2. In the third scenario we consider 10 connections (see Figure 6.3) with all the traffic going in one direction. In all cases only the link (S, D) uses MM while all the other links use DT.

Let $\bar q$ be the average queue size at the bottleneck link and $\bar q_i$ (i = 1, ..., N) be the average queue size for the i-th connection. Using Little's formula we have that the average queueing delay at the bottleneck link is $\bar T = \bar q/\mu$. We can express the round trip time of the i-th connection as $RTT_i = 4a_i + 2a_{btlnk} + \bar T$, assuming the service time of each packet is negligible.

Let $\delta_i \triangleq RTT_i - \bar T = 4a_i + 2a_{btlnk}$. By increasing $\delta_i$ for some connections we model different propagation and queueing delays of multiple links that, for the sake of simplicity, are not explicitly considered. Let $t_f$ be the total simulation time. Given that all the sources start sending data at time 0, the bottleneck link could transmit at most $\mu t_f$ units of data. Let $D(t_f)$ be the total amount of data actually transmitted during the simulation, so that the utilization of the link is $\rho \triangleq D(t_f)/(\mu t_f)$. Let $D_i(t_f)$ be the total amount of data received by the i-th connection, so that $g_i = D_i(t_f)/t_f$ is the corresponding goodput. To compare the fairness of different solutions we use Jain's fairness index, which is defined as:

$$J = \frac{\left(\sum_{i=1}^{N} g_i\right)^2}{N\sum_{i=1}^{N} g_i^2}.$$

Note that $1/N \le J \le 1$ and that bigger values indicate greater fairness.
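The two performance measures defined above translate directly into code. This is a minimal sketch with hypothetical function names and made-up goodput values, not the post-processing scripts used for the tables below.

```python
def jain_index(goodputs):
    """Jain's fairness index J = (sum g_i)^2 / (N * sum g_i^2)."""
    n = len(goodputs)
    sum_sq = sum(g * g for g in goodputs)
    if n == 0 or sum_sq == 0:
        raise ValueError("need at least one connection with non-zero goodput")
    total = sum(goodputs)
    return total * total / (n * sum_sq)


def utilization(data_transmitted, mu, t_f):
    """Link utilization rho = D(t_f) / (mu * t_f), in consistent units."""
    return data_transmitted / (mu * t_f)


# Made-up goodputs (assumptions) illustrating the two extremes and one
# intermediate case.
j_equal = jain_index([5.0, 5.0, 5.0, 5.0])   # all connections get the same share
j_single = jain_index([10.0, 0.0, 0.0, 0.0])  # one connection takes everything
j_mixed = jain_index([3.0, 1.0])
```

The first two calls reproduce the extremes J = 1 and J = 1/N noted above.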

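Figure 6.2 Scenarios 1 and 2 (topology diagram: sources s1, s2, s3 and destinations d1, d2, d3 attached through access links labeled µi, ai to the bottleneck link (S, D), labeled µ, abtlnk)

Figure 6.3 Scenario 3 (topology diagram: sources s1, ..., s10 and destinations d1, ..., d10 attached through access links labeled µi, ai to the bottleneck link (S, D), labeled µ, abtlnk)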

6.5.1 Fluid model

Using the fluid model simulator we investigate the behavior of MarkMax-B for different values of θ. In this case µ = 70 Mbit/s, $RTT_1 = 12$ ms, $\alpha_i = 540\cdot 10^{-6}/RTT_i^2$ MB/s, i = 1, 2. Table 6.1 shows the values of Jain's index and bottleneck link utilization for this case. As θ increases the utilization increases as well, due to the increase in the average backlog size. When θ is not sufficiently large the utilization is less than one due to periodic underflows. For each value of θ, Jain's index decreases as the ratio RTT2/RTT1 grows, but it is not too far from 1.

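| RTT2/RTT1 | θ = 60 MSS: J | θ = 60 MSS: ρ | θ = 240 MSS: J | θ = 240 MSS: ρ | θ = 960 MSS: J | θ = 960 MSS: ρ |
|---|---|---|---|---|---|---|
| 3 | 0.9893 | 0.890 | 0.9906 | 0.9500 | 0.9815 | 0.9964 |
| 7 | 0.9874 | 0.892 | 0.9874 | 0.9401 | 0.9788 | 0.9990 |
| 10 | 0.9861 | 0.890 | 0.9869 | 0.9400 | 0.9760 | 0.9990 |
| 20 | 0.9846 | 0.889 | 0.9863 | 0.9440 | 0.9754 | 0.9990 |
| 50 | 0.9836 | 0.899 | 0.9821 | 0.9433 | 0.9664 | 0.9925 |

Table 6.1 Fluid Model : Jain's index, utilization.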

6.5.2 Scenario 1

For Scenario 1 we set µ = 70 Mbit/s, µ1 = µ2 = 300 Mbit/s, δ1 = 12 ms, θ = 240 MSS, θl = 200 MSS, θh = 280 MSS, θDT = 240 MSS. Table 6.2 gives the values of Jain's index and link utilization for different values of δ2/δ1 and different queue management algorithms. Both MM variants outperform DT except in the first case, when δ2/δ1 = 3. In this case Jain's index for DT is bigger but the utilization is somewhat lower. At the same time, the gap in Jain's index between DT and MM becomes large for larger values of δ2/δ1. Table 6.3 shows that the average queue size for the MM algorithms is somewhat larger than for DT. This is due to the increased link utilization obtained by MM. We have verified that in this case the hypotheses of Lemma 6.2 are satisfied, and in the simulations it is indeed the case that one cut is always enough to reduce the total sending rate to a value less than µ. As the difference between MM-B and MM-T is not significant, we only use MM-B in the remaining scenarios.

| δ2/δ1 | DT: J | DT: ρ | MM-B: J | MM-B: ρ | MM-T: J | MM-T: ρ |
|---|---|---|---|---|---|---|
| 3 | 0.9893 | 0.9751 | 0.9853 | 0.9999 | 0.9633 | 0.9999 |
| 7 | 0.7540 | 0.9720 | 0.9625 | 0.9999 | 0.9515 | 0.9999 |
| 10 | 0.5361 | 0.9563 | 0.9494 | 0.9999 | 0.9501 | 0.9997 |
| 20 | 0.5484 | 0.9993 | 0.9561 | 0.9994 | 0.9258 | 0.9997 |

Table 6.2 Scenario 1 : Jain's index, utilization.

| δ2/δ1 | DT: q̄/B | DT: T̄/ms | MM-B: q̄/B | MM-B: T̄/ms | MM-T: q̄/B | MM-T: T̄/ms |
|---|---|---|---|---|---|---|
| 3 | 78373 | 8.9 | 87257 | 9.9 | 86753 | 9.9 |
| 7 | 74802 | 8.5 | 81723 | 9.3 | 81547 | 9.3 |
| 10 | 69219 | 7.9 | 80019 | 9.1 | 79502 | 9.1 |
| 20 | 68268 | 7.8 | 74297 | 8.4 | 74189 | 8.4 |

Table 6.3 Scenario 1 : average queue size and delay.
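The delay columns of Table 6.3 can be cross-checked against the queue-size columns with Little's formula, T̄ = q̄/µ. The sketch below assumes that q̄ is expressed in bytes and that µ = 70 Mbit/s as stated for Scenario 1; the function and variable names are illustrative.

```python
MU_BITS_PER_S = 70e6  # bottleneck capacity used in Scenario 1


def queueing_delay_ms(avg_queue_bytes, mu_bits_per_s=MU_BITS_PER_S):
    """Average queueing delay T = q / mu, converted to milliseconds."""
    return avg_queue_bytes * 8.0 / mu_bits_per_s * 1e3


# (average queue size in bytes, delay in ms) pairs copied from Table 6.3.
table_6_3 = [
    (78373, 8.9), (87257, 9.9), (86753, 9.9),
    (74802, 8.5), (81723, 9.3), (81547, 9.3),
    (69219, 7.9), (80019, 9.1), (79502, 9.1),
    (68268, 7.8), (74297, 8.4), (74189, 8.4),
]

# Largest gap between the recomputed delay and the tabulated one.
max_gap_ms = max(abs(queueing_delay_ms(q) - t) for q, t in table_6_3)
```

Under these assumptions the recomputed delays agree with the tabulated ones to within 0.1 ms, which is the precision of the table.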

6.5.3 Scenario 2

The only difference between the first and second scenario is that there is one additional TCP connection (s3, d3) sending data in the opposite direction on the bottleneck link. All the parameters are the same as in Scenario 1, except that the buffer size for the DT queue between D and S (that is, the queue used by the data traffic of connection 3 and the acknowledgments of connections 1 and 2) is set to 240 MSS and δ3 = δ2. Table 6.4 shows that, as in the previous scenario, MM-B outperforms DT. Not surprisingly, the presence of traffic competing with the acknowledgments on the (D, S) link does alter the performance of MM-B: for lower values of δ2/δ1 there is a slight increase in Jain's index, but for higher values it decreases, and the utilization is always lower than in the previous case. Most likely this is due to the fact that the presence of traffic disrupting the flow of the acknowledgments increases the round trip time.

| δ2/δ1 | DT: J | DT: ρ | DT: q̄/B | MM-B: J | MM-B: ρ | MM-B: q̄/B |
|---|---|---|---|---|---|---|
| 7 | 0.8561 | 0.9338 | 34443 | 0.9637 | 0.9600 | 41966 |
| 10 | 0.7769 | 0.9497 | 32174 | 0.9632 | 0.9510 | 39486 |
| 20 | 0.6910 | 0.9146 | 28699 | 0.9228 | 0.9702 | 41350 |
| 50 | 0.5244 | 0.9262 | 29021 | 0.8572 | 0.9937 | 50408 |

Table 6.4 Scenario 2 : Jain's index, utilization and average queue size.

6.5.4 Scenario 3

In the last scenario we have 10 connections sharing the (S, D) link and no connections using the reverse link, µ = 70 Mbit/s, µi = 300 Mbit/s, i = 1, ..., 10, δ1 = 12 ms, $\delta_{i+1} = \sqrt{2}\,\delta_i$, i = 1, ..., 9, θ = 240 MSS, θl = 200 MSS, θh = 280 MSS, θDT = 240 MSS. Table 6.5 shows that MM-B has a significantly higher Jain's index and slightly higher utilization, at the expense of a moderate increase in the average queue size.

| Policy | J | ρ (%) | q̄/B | T̄/ms |
|---|---|---|---|---|
| DT | 0.5848 | 98.91 | 65207 | 7 |
| MM-B | 0.9313 | 99.99 | 98913 | 11 |

Table 6.5 Scenario 3 : Jain's index, utilization and average queue size and delay

6.6 Conclusion and future work

We have introduced MarkMax: a simple flow-aware AQM algorithm. We have used a fluid model to set the parameters of the algorithm as well as to analyze its behavior. We have also shown how to compute the per-flow backlog using such a model. We have simulated the two proposed variants (MarkMax-B and MarkMax-T) using NS-2, showing how they improve the fairness and link utilization compared to the standard DropTail algorithm.
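As a reminder of how the two variants differ in choosing the connection to signal, here is a minimal sketch. It assumes that MarkMax-B measures the per-connection backlog over the whole queue while MarkMax-T looks only at a final fraction of it (10% in the simulations above), that all packets have the same size, and that the queue is given as a list of connection identifiers from head to tail. The function name, the tie-breaking rule and the example queue are illustrative assumptions, not the NS-2 implementation.

```python
import math
from collections import Counter


def connection_to_signal(queue, tail_fraction=1.0):
    """Return the connection with the largest backlog in the inspected part
    of the queue, or None if the queue is empty.

    tail_fraction=1.0 inspects the whole queue (assumed MarkMax-B behavior);
    a smaller value inspects only the last part (assumed MarkMax-T behavior).
    Ties are broken in favor of the connection seen first in the window.
    """
    if not 0.0 < tail_fraction <= 1.0:
        raise ValueError("tail_fraction must be in (0, 1]")
    if not queue:
        return None
    window_len = max(1, math.ceil(len(queue) * tail_fraction))
    backlog = Counter(queue[-window_len:])  # packets per connection in the window
    return max(backlog, key=backlog.get)


# Hypothetical queue: connection c1 dominates overall, c2 dominates the tail.
queue = ["c1"] * 6 + ["c2"] * 4
whole_queue_choice = connection_to_signal(queue)
tail_choice = connection_to_signal(queue, tail_fraction=0.5)
```

In this made-up example the two variants pick different connections, which is the kind of situation in which the choice of the inspected fraction matters.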

These results are definitely promising and warrant further analysis. Of all the issues that we plan on addressing, we would like to mention performance and queue stability with a large number of connections, and the comparison between MarkMax-B and MarkMax-T. So far we have conducted simulations with up to 10 connections, but it is not immediately clear whether the algorithm would perform equally well with more connections. It is conceivable that, at least in some cases, cutting a single connection would not be enough to bring the total sending rate to a value smaller than µ. We would also like to determine whether MarkMax-B always outperforms MarkMax-T, as indicated by the simulations we have run so far, or whether the situation can be reversed by properly selecting the fraction of the queue that is considered while computing the per-connection backlog in MarkMax-T.


Chapter 7 Conclusions and perspectives

In the current thesis we propose several new contributions to improve the performance of computer networks. The obtained results concern resource sharing problems in Internet routers, Web servers and operating systems. We study several algorithms which decrease the mean waiting time in the system through efficient resource sharing and make it possible to introduce Quality of Service and flow differentiation into networks. We show the effectiveness of the proposed algorithms and study the possibility of implementing them in router queues. The studied problems open several directions for future work, some of which are the topics of our current research.

In Chapter 3 we study the TLPS scheduling scheme for the case of a hyper-exponential job size distribution and find an approximation of the optimal threshold for the case of a two-phase job size distribution. We show that, with the found threshold approximation, the mean sojourn time in the system can be reduced by up to 36% in comparison with the DropTail policy. The question of threshold selection when the job size distribution is hyper-exponential with many phases, or follows a different distribution, remains open. We consider this to be an important topic for future studies.

In Chapter 4 we prove the monotonicity of the mean conditional sojourn time in the DPS system under a restriction on the system parameters. As we did not find a counter-example, the found restriction is probably not a necessary condition, and we think that it is possible to prove the theorem for the general case without additional system constraints. The investigation of the system parameters to find the cases when the DPS system gives a significant gain in comparison with the PS system is also an interesting topic for future research.

In Chapter 5 we study the optimal Gittins policy in the multi-class single server queue. This topic opens a large area for future research, as we studied only several particular cases of the Gittins policy application. Taking into account the Internet traffic structure, we study the case when the jobs arrive to the system in two classes, which are Pareto distributed and represent mice and elephants in the Internet. For this case we describe the optimal system policy, find the analytical expression of the mean sojourn time and implement the algorithm in the router queue. With the simulation results we show that with the found optimal policy the gain in the system can reach 10% in comparison with the LAS policy and 36% in comparison with the DropTail policy. We also study several cases of particular interest when jobs arrive in classes with exponential distributions. As future research we propose to consider cases with more than two job classes in the system, as well as other types of service time distributions. It is important to investigate the system parameters to find when the Gittins policy gives a significant gain in comparison with the LAS policy. The applicability of our results in real systems like the Internet should also be more carefully evaluated.

In Chapter 6 we introduce a new flow-aware AQM scheme, MarkMax, which reduces the sending rate of the connection with the largest sending rate when the router buffer reaches some given threshold. With the fluid model we found guidelines for the threshold selection. Using the NS-2 simulator we implement MarkMax in the router queue and show that it improves fairness in the system and provides better performance than the DropTail policy. As a future research topic we propose to study more complex system topologies and cases where a large number of connections share the bottleneck link. For this case we propose to cut several connections at once. The selection of the number of connections to cut and its dependency on the number of connections present in the network constitute a challenging study.
A possible research direction is a combination of MarkMax and a flow differentiation scheduling policy like the TLPS or Gittins policies. Developing a new algorithm which gives priority to short flows and at the same time improves fairness between long flows can be an interesting and nontrivial task. We think that such an algorithm can improve both fairness and mean sojourn time in the system and provide better system performance.


Chapter 8 Presentation of the Thesis Work

8.1 Introduction

In the early 1970s, networks connecting computers and terminals began to appear. These networks were developed to share computer resources and to exchange data between computers. Since then, minimizing transmission costs and delays while maximizing the volume of transmitted data has been one of the most important problems in computer networks. As technical progress increases the capabilities of computers, the need for fast, efficient and reliable data transmission increases as well. The Internet is the largest computer network, connecting more than a billion users worldwide. The size of the Internet grows very quickly, so the network resources must be shared among a very large number of users. Incorrect resource sharing can lead to unreachable servers, long delays and other network problems, which leave users dissatisfied with the service provided. Although a great deal of work has been done to date to achieve optimal resource allocation, high performance and short delays, the behavior of the Internet is far from ideal and many problems remain to be solved.

In this thesis we address the problem of resource sharing in computer networks. We consider several scheduling algorithms and their application to flow scheduling in Internet routers. As the waiting time in the network is one of the most important characteristics for the user, we concentrate on the problem of minimizing the mean waiting time. Taking into account the structure of Internet traffic, we study several size-based differentiation algorithms that give priority to short flows and can significantly decrease the mean waiting time in the network. We introduce a new flow-aware packet dropping algorithm for Internet routers, which improves network performance and fairness between flows.

8.1.1 State of the art

Computer networks

A computer network is a set of computers or terminals interconnected by a communication network. Although computer networks are widely covered in the literature, see [Tan96, Sta03], in this introduction we describe some basics of computer networks to explain the motivation of the analysis carried out and of its results. Before discussing computer networks in detail, let us first answer the question: why are people interested in computer networks, and what are they used for? Broadly, we can classify the users of computer networks into two groups, companies and individuals. Companies use computer networks mainly to share resources (all programs, equipment and data are thus available to all of the company's staff), to achieve a high degree of reliability (the ability to continue working in case of hardware problems), to save money (compare the very high cost of large servers with that of small computers), to scale (the ability to add new workstations to the network and to increase system performance by adding new processors without a global change of the system structure), and to facilitate communication between the company's employees (reports, work discussions). For individual users of computers and the Internet, the most important uses are: access to online information (bank account checks, shopping, newspapers, magazines, journals, online libraries), private communication (electronic mail, virtual meetings, videoconferences), and entertainment (videos, films, television, radio, music, video games).
Computers thus occupy a large place in everyday life and can help us in many different areas. Now that we have described why we need computer networks, let us return to our subject, namely how these networks work. The main purpose of computer networks is to make it possible to exchange data between computers. In its simplest form, data communication takes place between two devices that are directly connected. In practice, however, it is frequently impossible for two devices to be connected point to point. This is the case when the devices are located far from each other. One example is the worldwide telephone network; another is the set of computers of a large company. The solution is then to connect each device to a communication network. In the rest of this thesis we refer to the devices that communicate with each other as stations or nodes. Stations can be computers, telephones or other communicating devices.

Communication networks can be classified according to the architecture and the techniques used to transfer data. Broadly, there are broadcast networks and switched (point-to-point) networks. In broadcast networks, a transmission from one station is broadcast to and received by all the other stations. In switched networks, data is transferred from the source to the destination through intermediate nodes. The purpose of each node is to pass the data from node to node until it reaches its destination. Switched networks are divided into circuit-switched networks and packet-switched networks. In circuit-switched networks, the path between a sender and a destination is fixed in advance and the data is then transmitted over this channel. In packet-switched networks, data is sent as a series of small pieces called packets. Each packet travels through the network from one node to another along a path leading from the source to the destination. At each intermediate node, called a router, the packet is received, stored briefly, and then transmitted to the next node. The router decides in which direction the packet is forwarded. Packet-switched networks are commonly used for computer-to-computer communication.
Computer networks are also classified by size into local area networks (LAN), which cover a campus less than a few kilometers in diameter, metropolitan area networks (MAN), which cover a group of offices or a city, and wide area networks (WAN), which cover a large geographical area. LANs and MANs generally do not require packet switching because of their small size. Examples of LANs are Ethernet and Token Ring; the best-known MANs include Distributed Queue Dual Bus (DQDB). The most famous and largest WAN is the Internet, which connects more than a billion users and allows them to exchange data. WANs generally use packet-switching technologies. In particular, the Internet is based on packet switching. In this work we study problems related to packet-switched networks and, in particular, to the Internet.

The architecture of computer networks

Nowadays, communication between computers in the Internet is mostly based on the Open System Interconnection (OSI) model, which consists of seven layers [Sta94]. The layered structure is used to decompose a complex problem of communication between computers into several smaller problems. The layers are autonomous and do not depend on one another. Each layer is responsible for certain functionality; it uses the functions of the lower levels and provides functionality to the upper levels. The layers are based on the concept of a protocol, the set of rules used to organize data transfer. The OSI layers are: Physical, Data Link, Network, Transport, Session, Presentation and Application. We do not give exhaustive descriptions of all the layers and their functionality here, but restrict ourselves to the Transport and Network layers, since they correspond to data transfer and provide error correction and flow control.

The Network layer accepts packets from the Transport layer and sends them from the sender to the destination through the network. The Network layer is based on the Internet Protocol (IP). IP does not verify whether packets have been delivered, so error correction must be done by the Transport layer or by other higher-level protocols. The Network layer provides an unreliable delivery service in the network.

The Transport layer accepts data from the Session layer, splits it into packets and passes the packets to the Network layer. The two most important Transport layer protocols are the Transmission Control Protocol (TCP) and, to a lesser extent, the User Datagram Protocol (UDP). Both use IP at the Network layer, which is why we always speak of the TCP/IP protocols. TCP verifies the arrival of packets at the destination and provides a reliable transport mechanism [Sta94]. UDP is rarely used in the Internet because of its lack of reliability.
UDP does not provide reliable packet delivery and is used by applications that have their own flow control and verify packet arrivals. UDP can also be used when some loss of the transferred data can be tolerated, as for example in IP telephony.

The most widely used protocol in the Internet is TCP. TCP is a reliable, connection-oriented protocol that makes it possible to transmit information from one machine to another without errors. It is designed to provide maximum throughput and reliable transfer over an unknown network. The different parts of a WAN can have different topologies, bandwidths, delays, packet sizes and other parameters. Moreover, all these parameters can change. TCP adapts dynamically to the properties of the network and is robust to several types of failure. TCP provides flow control to ensure that a fast sender does not overwhelm a slow receiver or the intermediate nodes with more information than they can process. Using the congestion control mechanism, TCP reduces its sending rate when a loss occurs in the network, and thus adapts its sending rate to the parameters of the receiver and of the network. We give a more detailed description in Subsection 8.1.1.

The applications most used in the Internet and in computer networks that rely on TCP are, cf. [Sta03]: Telnet (the virtual terminal); the File Transfer Protocol (FTP), used to transfer files between systems with different structures and properties; the Simple Mail Transfer Protocol (SMTP), used to transfer electronic mail; the Multipurpose Internet Mail Extension (MIME), which makes it possible to include images and other multimedia files in a message; the Domain Name System (DNS), used to map host names to their network addresses; the Hypertext Transfer Protocol (HTTP), used to transmit web pages in the Internet; and the Session Initiation Protocol (SIP), an Application layer protocol for session control in the network. In addition to these applications standardized by the IETF, non-standardized peer-to-peer applications such as BitTorrent and eDonkey have appeared in recent years, representing more than 50% of Internet traffic.

The structure of Internet traffic

We now give the most important characteristics of traffic in computer networks and the Internet, which we need in the subsequent analysis. In flow-level modeling, the flow is the basic unit of data traffic. A flow is defined as the uninterrupted stream of packets sent from the source to the destination. We can model a TCP connection as a flow, which opens, sends one or several files and closes, or we can define each file sent as a separate flow. In the present work we consider that a flow corresponds to the sending of one file. In this work we use the terms flow/connection/file/session interchangeably. In the context of stochastic scheduling we use the term job. A flow is characterized by its duration, its size and its sending rate.

The structure of Internet traffic is widely studied in the literature. In [CB97, TMW97, NMM98, SAM99] the authors analyze real data traffic on selected web servers over a sufficiently long period of time and describe the characteristics of the traffic. In [FML+03] the authors propose a monitoring system designed to analyze traffic measurements, and also provide the results they obtained with the proposed system. In [Wil01] the author describes traffic measurements and characteristics. In [BGG03] the authors provide the aggregation and analysis of traffic between several major servers in France. In [BFOBR02] the authors study Internet admission control applied to elastic and streaming flows. In more recent work [CC08] the authors propose a new characteristic, the Quality of Experience, to measure how the user perceives network performance, and provide measurement results from real traffic data.

In the Internet, traffic is divided between elastic and streaming flows. Elastic flows are file transfers, HTTP pages, etc. Streaming flows are created by video and audio applications. Elastic flows are still dominant in the Internet even though audio and video applications are used more and more, cf. [BFOBR02]. In the present work we study elastic flows and assume that streaming flows take a limited share of the bandwidth. Traffic carried by TCP represents 90% of all Internet traffic, cf. [CC08, FML+03, BFOBR02]. File inter-arrival times in the Internet are exponentially distributed, and flow arrivals in the network can be modeled by a Poisson process. An important characteristic of the Poisson process is the PASTA property [Wol89], which plays an important role in the mathematical analysis of network models.

Most of the flows (90-95%) transferred in the Internet are very small, but most of the traffic is created by the long flows, which are few and represent the remaining 5-10%, cf. [Wil01, All00, FML+03]. According to [CC08], 80% of the traffic is created by flows larger than 1 MB and 50% by flows larger than 10 MB. This is because the most frequent flows are created by e-mail and web page transfers, which are small, while long flows are generated by file transfers, peer-to-peer applications, etc., which are much rarer. Short flows are called mice and long ones elephants, and the phenomenon itself is called the mice-elephant effect. It has been shown that the distribution of file sizes in the Internet is well modeled by a long-tailed or heavy-tailed distribution, and also that it has a decreasing hazard rate (DHR).
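The mice-elephant effect and the decreasing hazard rate can both be read off the standard formulas for a Pareto distribution. The sketch below is only an illustration: the shape parameter 1.2 and the unit scale are assumptions chosen for the example, not values fitted to the measurements cited above, and the function names are hypothetical.

```python
def pareto_tail_fraction(x, shape, scale):
    """Fraction of flows larger than x: P(X > x) = (scale / x)^shape, x >= scale."""
    return (scale / x) ** shape


def pareto_volume_share(x, shape, scale):
    """Fraction of the total traffic volume carried by flows larger than x.

    For a Pareto distribution with shape > 1 this equals (scale / x)^(shape - 1).
    """
    if shape <= 1.0:
        raise ValueError("the mean is infinite for shape <= 1")
    return (scale / x) ** (shape - 1.0)


def pareto_hazard_rate(x, shape):
    """Hazard rate f(x) / P(X > x) = shape / x, decreasing in x (DHR)."""
    return shape / x


# Assumed parameters for illustration only.
shape, scale = 1.2, 1.0

# Size threshold above which only 5% of the flows (the "elephants") lie.
x_top5 = scale * 0.05 ** (-1.0 / shape)
elephant_volume_share = pareto_volume_share(x_top5, shape, scale)
```

With these assumed parameters the largest 5% of the flows carry roughly 61% of the total volume, which illustrates how a small number of elephants can dominate the traffic.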
In [NMM98], through the analysis of real data, the authors confirm that the file size distribution on the Internet can be modeled by the heavy-tailed Pareto distribution. In [CB97], the authors analyze the network traffic of Web servers and also find that the file size distribution is heavy-tailed. In [Rob01], the author shows that the durations of streaming flows are heavy-tailed as well.

In contrast with flow arrivals, packet arrivals are generally not Poisson. Because of the DropTail policy at the router, which creates global synchronization in the network, and because of the way the TCP protocol operates (discussed later in Subsection 8.1.1), packets tend to arrive in groups, called batches. Such arrivals are also called bursty arrivals. Packet sizes on the Internet range from the Maximum Transmission Unit (MTU) down to the size of an acknowledgement (ACK), which is 40 bytes. According to [Wil01, FML+03], large MTU-sized packets represent 50% of all packets in the network, ACKs represent 40%, and the remaining packets have sizes distributed randomly between these two values. Traffic on a link is usually bidirectional but not symmetric.
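The mice-elephant effect described above can be illustrated with a small sketch. It assumes that flow sizes follow a Pareto distribution with shape `alpha` and minimum size `x_min`; both values are illustrative assumptions and are not taken from the measurements cited in the text. The function uses the closed-form Pareto tail to return the share of flows and the share of traffic above a given size.

```python
def pareto_tail_shares(x, x_min=1.0, alpha=1.2):
    """Return (fraction of flows, fraction of traffic) with size above x.

    Assumes Pareto-distributed flow sizes with shape alpha > 1 and
    minimum size x_min (hypothetical parameters, for illustration only).
    """
    if alpha <= 1.0:
        raise ValueError("the mean flow size is infinite for alpha <= 1")
    if x < x_min:
        return 1.0, 1.0
    ratio = x_min / x
    # P(X > x) = ratio**alpha; share of volume E[X; X > x] / E[X] = ratio**(alpha - 1)
    return ratio ** alpha, ratio ** (alpha - 1.0)


flows, traffic = pareto_tail_shares(10.0)
```

With these illustrative parameters, roughly 6% of the flows (those at least ten times the minimum size) carry roughly 63% of the traffic, which is the qualitative pattern the measurements report.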

Traffic control in computer networks

One of the problems associated with computer networks is traffic control [Sta94], that is, regulating the amount of traffic entering the network so that network performance stays high. Traffic control can be separated into flow control and congestion control. Flow control is needed to prevent the sender from transmitting data at a rate higher than the rate at which the receiver can accept it; it governs the data transmission rate between two nodes. Congestion is the situation in which the data arrival rate is higher than the transmission capacity of the network. In that case the router cannot serve all arriving packets, which therefore accumulate in the router buffer and wait in the queue to be served. If the arrival rate does not decrease, the queue grows dramatically until there is no more room, and new packets are dropped and later retransmitted. Congestion in the network is responsible for most of the delay.

Congestion control techniques try to prevent congestion before it happens or, at least, to react to it properly, i.e., to decrease the data arrival rate. An efficient congestion control algorithm must avoid buffer overflow and at the same time try not to empty the queue completely, in order to achieve the highest possible throughput. To perform efficient traffic control, the sender needs to know the state of the router and of the network, which is not always easy and in general impossible. On the other hand, the router has no direct access to the data sender to control its sending rate.
In the Internet, traffic control is implemented by the combination of the DropTail policy at the router and the TCP protocol. DropTail is the simplest and most widely used buffer management policy in TCP/IP networks. With DropTail, the router drops the packets that arrive last, i.e., those arriving while the buffer is full. Queued packets are served according to the FCFS (First Come First Served) policy. The buffer size is limited. Even though it is technically possible to make the buffer very large, this is not done in practice, because a long queue creates queueing delay. Selecting the buffer size is an important problem that is not yet solved. More information on this subject can be found in [AANB02, AKM04, BGG+08], etc. The current implementation of TCP provides both flow control and congestion control. We describe the TCP protocol in more detail in the next subsection.
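The DropTail behavior described above can be sketched as a finite FCFS buffer. This is a minimal illustration only; the class name `DropTailQueue` and its methods are hypothetical and do not correspond to any router or simulator API.

```python
from collections import deque


class DropTailQueue:
    """Finite FCFS buffer that drops arrivals when it is full (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity   # maximum number of queued packets
        self.buffer = deque()
        self.dropped = 0           # number of packets dropped so far

    def enqueue(self, packet):
        """Accept the packet if there is room; otherwise drop it."""
        if len(self.buffer) >= self.capacity:
            self.dropped += 1
            return False
        self.buffer.append(packet)
        return True

    def dequeue(self):
        """Serve the oldest queued packet (FCFS), or None if the queue is empty."""
        return self.buffer.popleft() if self.buffer else None
```

The only parameter is the buffer capacity, which reflects the simplicity of the policy: no per-flow state is kept, and the drop decision ignores which flow the packet belongs to.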

The TCP protocol

The TCP protocol is now widely used in the Internet and plays an important role in determining network performance. The formal description of TCP is given in [Pos81]. The idea of a dynamic congestion window was proposed in [Jac88]. Later changes and extensions are given in [Bra89, JBB92, Ste97, APS99]. A description of TCP can also be found in books such as [Sta03, Tan96, Wil98] or in surveys, cf. [Kri00].

TCP is based on the end-to-end argument, which here means that the sending rate of a data flow is controlled by the end hosts, based on feedback from the receiver. To transfer data between two nodes, TCP must be installed on both nodes, the sender and the receiver. When TCP sends a data file, it splits it into packets (or segments) of a given size and sends each segment separately in the data flow. When the packets arrive at the destination, they are handed to the TCP entity, which reconstructs the original file. Since the IP protocol does not guarantee packet delivery, it is up to TCP to detect lost packets and retransmit them. To this end, each time the TCP receiver gets a packet it sends the sender a small packet, called an acknowledgement (ACK), which carries information about the received packet. The receiver acknowledges the last packet of the contiguous sequence of packets it has received. If a packet arrives out of order, TCP sends an ACK for the last packet that was received in order. If the sender receives the same ACK more than three times (the repeats are called duplicate ACKs), it infers that a packet of the flow was lost and retransmits it. The time between sending a packet and receiving the ACK for that packet is called the round-trip time (RTT) and is an important notion in the study of TCP.
The congestion control mechanism in TCP is implemented with the congestion window (cwnd), which controls the volume of data that can be sent without receiving an acknowledgement and thus, in effect, controls the sending rate. The algorithms used in congestion control are Slow Start, Congestion Avoidance, Fast Retransmit and Fast Recovery. Slow Start is used at the beginning of a file transfer to probe the capacity of the network. During Slow Start, TCP increases its congestion window by one packet for each ACK received. Slow Start ends when the congestion window reaches a given threshold. After Slow Start, the Congestion Avoidance algorithm is used. During Congestion Avoidance, the congestion window is increased by one packet per RTT, that is, by one packet once the data corresponding to the current congestion window size has been acknowledged. Congestion Avoidance is used until congestion is detected. Whereas Slow Start quickly increases the TCP sending rate at the beginning of the file transfer, during the Congestion Avoidance phase the rate increases slowly to avoid overloading the network.

To detect congestion and packet loss, TCP uses a timer. To retransmit a lost packet before the timer expires, the Fast Retransmit algorithm is used: the packet is retransmitted as soon as three duplicate ACKs have been received. With the Fast Recovery algorithm, the congestion window is halved when a packet loss is detected in this way. This helps TCP restore the congestion window faster than if it were reduced to one packet, and thus contributes to a higher throughput. The TCP Tahoe implementation includes Slow Start, Congestion Avoidance and Fast Retransmit. Reno includes the features of Tahoe plus Fast Recovery. NewReno is a slight modification of Reno that improves performance during Fast Retransmit and Fast Recovery. In our study and simulations we consider the NewReno version of TCP.
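The window dynamics described above can be summarized in a per-RTT sketch. It is a deliberately simplified, Reno-like model: the function name `next_cwnd`, the event labels, and the choice to cap Slow Start exactly at the threshold are assumptions made for illustration, not the behavior of any particular TCP implementation.

```python
def next_cwnd(cwnd, ssthresh, event):
    """Return (cwnd, ssthresh) in packets after one RTT (simplified sketch).

    event is one of:
      "ack"     - the whole window was acknowledged during this RTT
      "dupack"  - a loss was detected through three duplicate ACKs
      "timeout" - a loss was detected because the retransmission timer expired
    """
    if event == "ack":
        if cwnd < ssthresh:
            # Slow Start: +1 packet per ACK doubles the window every RTT
            return min(2 * cwnd, ssthresh), ssthresh
        # Congestion Avoidance: +1 packet per RTT
        return cwnd + 1, ssthresh
    half = max(cwnd // 2, 2)
    if event == "dupack":
        # Fast Retransmit / Fast Recovery: halve the window
        return half, half
    if event == "timeout":
        # Timer expiry: restart from one packet
        return 1, half
    raise ValueError(f"unknown event: {event}")
```

Starting from `(1, 8)`, four successive `"ack"` events give windows 2, 4, 8 and 9, which shows the switch from exponential to linear growth once the threshold is reached.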

8.1.2 Problems in computer networks and proposed solutions

Advantages and disadvantages of the traffic control protocols

TCP is the most widely used data transmission protocol on the Internet because it is flexible and provides reliable data transmission and traffic control. At the flow level, TCP tries to achieve a fair sharing of the server capacity among all flows present in the queue. Since in general the router does not discriminate between flows or use priority policies, capacity sharing at a bottleneck depends only on the sending rate of each flow. Hence, if the sending rates of all flows are identical, the bandwidth is shared equally. The advantage of the DropTail policy is its simplicity. It requires neither tuning several parameters nor keeping additional information about flows or the state of the queue.

However, the current combination of DropTail and the TCP protocol has many disadvantages. Among them are packet loss and retransmission, global synchronization, unfair bandwidth sharing, and the absence of Quality of Service. With DropTail, packets are dropped when the buffer is full, and TCP reduces its sending rate only after the packet loss is detected, which leads to multiple packet retransmissions in the network. DropTail does not distinguish between flows, so there is no Quality of Service. When several TCP connections share the same bottleneck, the bottleneck bandwidth is shared unfairly, and flows with small RTTs have an advantage over flows with large RTTs. This is because, in moments of congestion, all connections sharing the bottleneck decrease their sending rates, but for connections with small RTTs increasing the sending rate again takes less time than for connections with large RTTs. As a result, the volume of data finally transferred is much larger for fast connections than for slow ones. Moreover, the fact that all connections increase their sending rates at almost the same time creates global synchronization in the Internet, which causes underutilization of network capacity.

There have been many proposals to improve the performance of the Network and Transport layers of the Internet. Among them are network pricing, ECN, AQM and stochastic scheduling algorithms. Network pricing is a kind of congestion control in which a transmission cost is introduced. Charging for transmissions on the Internet can prevent congestion and push users to minimize the volume of traffic they generate. More information on this subject can be found in [Bre96, SCEH96, FR01, FR04, FORR98]. ECN (Explicit Congestion Notification) is the name of the information field used to warn the TCP sender about congestion in the network [Flo95, RF99, RFB01]. When congestion occurs, the router marks packets with the ECN field instead of dropping them. The ECN-marked packet arrives at the destination, and the receiver sends back an acknowledgement (ACK) carrying the ECN field. When the sender receives an ACK with the ECN field, it reduces its congestion window as if a packet loss had been detected. Thus, if the router marks packets with the ECN field instead of dropping them, the TCP congestion window is reduced but no retransmission is needed.
To avoid unfair resource sharing in the Internet, several Active Queue Management (AQM) schemes have been proposed. AQM is a family of packet dropping algorithms for FCFS queues that manage the length of packet queues by dropping packets when necessary. AQM algorithms inform the sender about possible congestion before a buffer overflow occurs. Among the AQM algorithms are RED [FJ93], GREEN [WZ02], BLUE [FSKS02], MLC(l) [SS07], CHOKe [PPP00], etc. None of them has been widely deployed in networks, because of their complexity and the non-trivial selection of parameter values.

From the user's point of view, the most important characteristic of computer networks is the waiting time, that is, the time between the mouse click and the appearance of the requested page on the screen. Delay in networks consists of the transmission delay, the propagation delay, the processing delay and the queueing delay. In networks, the queueing delay and the delay caused by packet drops and retransmissions make up most of the waiting time. Waiting delays in the network can be reduced by efficient scheduling algorithms. Whereas AQM algorithms determine which packets must be dropped to avoid congestion in the network, scheduling algorithms determine which packets are served next; they are used to reduce the waiting delay and to manage how bandwidth is divided among flows.

To develop an efficient scheduling algorithm, one must take into account the specific problems of its application domain. For computer networks these are: the large number of connections sharing the bottleneck link, the traffic characteristics, changes in sending rates, possible changes in the network topology and its properties, and so on. Even though a multitude of scheduling algorithms exist, it is not easy to find one that is efficient, scalable and easy to implement, and that does not need to know specific system parameters. In the next subsection we give a brief overview of the scheduling algorithms that have been proposed for computer networks and the Internet.

Modeling computer networks with stochastic scheduling

It is known from stochastic scheduling theory that applying different scheduling policies to a queue can greatly affect the characteristics of the system. The goal of stochastic scheduling is to find an algorithm that improves system performance and is at the same time simple to implement. Modeling the network at the packet level is quite difficult: packets arrive in bursts and the arrival streams do not follow a Poisson process, see Subsection 8.1.1. Networks are therefore often modeled at the flow level. Each file sent over a TCP connection is represented as a job and each router as a queue. By the size of a job we mean the service time of the job in the queue when there are no other jobs in the system. Thus, in the rest of this work we use the terms job size and service time interchangeably.

As mentioned in Subsection 8.1.2, the way TCP flows share the bottleneck bandwidth when their RTTs are of the same order is well modeled by the Processor Sharing (PS) policy, cf. [HLN97, NMM98, MR00, FBP+01, CJ07]. Under PS, each job present in the system receives an equal share of the processor capacity. The PS policy is simple to analyze: Kleinrock in his book [Kle76a, Sec. 4.4] obtained expressions for the mean and the conditional mean sojourn time in the M/G/1 system under PS. However, the PS policy does not minimize the mean sojourn time in the system.
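For reference, the classical M/G/1 PS expressions mentioned above take the following form. The notation is the usual textbook one and is an assumption here: $\lambda$ is the Poisson arrival rate of jobs, $m$ the mean job size, $\rho = \lambda m < 1$ the load, and $\overline{T}(x)$ the mean sojourn time of a job of size $x$.

```latex
\[
  \overline{T}^{PS}(x) = \frac{x}{1-\rho},
  \qquad
  \overline{T}^{PS} = \frac{m}{1-\rho}.
\]
```

The conditional mean sojourn time is linear in the job size and depends on the service time distribution only through its mean, which is one reason PS is convenient as a baseline when comparing size-aware and age-based policies.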

It is known that the Shortest Remaining Processing Time (SRPT) policy, cf. [Kle76a, Ch. 3], minimizes the mean sojourn time in the system, see also [Sch68]. SRPT requires knowledge of the job sizes, which is not always available, since the router has no information about the size of the file being sent. Kleinrock in his books [Kle76b, Kle76a] gives an overview of policies that do not use information about job sizes; these are called non-anticipating. In recent years such policies have received particular attention because of their possible application to resource sharing in computer networks.

It is shown in [Yas87] that the Least Attained Service (LAS) policy, also known as Foreground-Background (FB), cf. [Kle76a, Sec. 4.6], minimizes the mean sojourn time in the system among all non-anticipating scheduling policies when the service time distribution has a decreasing hazard rate. Since this is the case for the file size distribution in the Internet, LAS has received much attention and has been studied in particular in [RS89, FM03b, RUKB03, RUKVB04, RBUK05]. A survey on LAS is presented in [NW08]. However, LAS has some drawbacks: for example, it can be very unfair to long flows in some cases, and it greatly increases the time it takes to serve long flows, cf. [FM03b]. Moreover, the mean waiting time under LAS depends strongly on the service time distribution, cf. [RUKB03]. If a long flow in the system has almost finished service and another long flow arrives, the first flow must wait for almost the whole service time of the second flow before it can leave the system. The unfairness of LAS towards large jobs has been studied in [RUKB03, WHB03].
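The unfairness scenario just described can be worked out explicitly. The sketch below assumes a unit-rate server, exactly two jobs, and no further arrivals; the function name and arguments are hypothetical. It returns how much longer job 1 stays in the system after job 2 arrives, under LAS and under PS.

```python
def delay_after_arrival(size1, attained1, size2, policy):
    """Time job 1 still spends in a unit-rate server after job 2 arrives.

    Job 1 has total size size1 and has already received attained1 units of
    service; job 2 arrives with no attained service and total size size2.
    Assumes no other jobs and no further arrivals (illustrative model).
    """
    remaining1 = size1 - attained1
    if policy == "LAS":
        # Job 2 has priority until it catches up, then both share equally.
        # When job 1 completes, job 2 has received min(size1, size2).
        return remaining1 + min(size1, size2)
    if policy == "PS":
        # Equal sharing from the arrival instant: while job 1 is present,
        # job 2 receives min(remaining1, size2).
        return remaining1 + min(remaining1, size2)
    raise ValueError(f"unknown policy: {policy}")


las = delay_after_arrival(100.0, 99.0, 100.0, "LAS")   # 101.0
ps = delay_after_arrival(100.0, 99.0, 100.0, "PS")     # 2.0
```

A job of size 100 that needs only one more unit of service leaves after 2 time units under PS but after 101 under LAS, because the newcomer must first be served up to the same attained service.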
On this issue, it was shown in [Bro06] that when the second moment of the job size distribution is infinite, LAS can have a smaller conditional mean sojourn time than PS. For a job size distribution that is asymptotically Pareto with parameter α < 1.5, the conditional mean sojourn time under LAS is always smaller than under PS.

Both SRPT and LAS give priority to short flows and in this way minimize the mean sojourn time in the system (SRPT among all policies, LAS among non-anticipating policies when the service time distribution has a DHR). The file size distribution in the Internet is heavy-tailed and most flows are short, see Subsection 8.1.1. It therefore seems logical to give priority to short flows in the network. Differentiation between short and long flows in the Internet has been widely studied, see [GM01, NT02, GM02a, GM02b, RUKB02, RUKB03, FM03b, WBHB04, AANO04, AABN04].

Among the policies that differentiate between flows is Multi Level Processor Sharing (MLPS), introduced and described by Kleinrock, see [Kle76a, Sec. 4.7]. He shows that the mean sojourn time in the MLPS system can be considerably reduced compared with the PS system. Under MLPS, jobs are served according to their attained service, which is compared against a finite set of thresholds. In [AANO04, AANO05], the authors show that when the service time distribution has a DHR, MLPS decreases the mean sojourn time in the system compared with the PS discipline. In [AA06], the authors show that with MLPS the mean delay in the system can be very close to optimal when the service time distribution has a DHR. The particular case of MLPS with two levels, Two Level Processor Sharing (TLPS), and its application to resource sharing in computer networks has been studied in [AANO04, AABN04]. In [AABN04], based on the TLPS model, the authors develop the RuN2C algorithm and show that it considerably reduces the mean sojourn time in the system compared with the standard DropTail policy. The mean sojourn time in the TLPS model depends strongly on the choice of the threshold, which had not yet been studied analytically.

The main idea behind the LAS and TLPS policies is to give priority to short flows, but they offer no way to give preference to particular flows. In contrast, the Discriminatory Processor Sharing (DPS) policy makes it possible to introduce Quality of Service into the network. DPS provides a natural approach to modeling resource sharing between TCP flows with different RTTs, or the Round-Robin algorithm used in operating systems. In addition, DPS can be used to model pricing policies on a server, where different service levels are provided according to the rates paid. DPS was introduced by Kleinrock [Kle67]. Under DPS, jobs are divided into classes and served according to a weight vector, so that each class has its own priority in the system.
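The two sharing rules discussed above can be written down as instantaneous rate allocations. The sketch assumes a unit-rate server, the usual DPS rule in which a class-k job receives a share proportional to its class weight, and a TLPS variant with processor sharing inside each of the two levels; the function names are hypothetical.

```python
def dps_rates(jobs_per_class, weights):
    """Per-job service rate for each class in a unit-rate DPS server.

    A job of class k gets weights[k] / sum_j(weights[j] * jobs_per_class[j]).
    """
    total = sum(n * g for n, g in zip(jobs_per_class, weights))
    if total == 0:
        return [0.0] * len(weights)   # empty system
    return [g / total for g in weights]


def tlps_rates(attained, threshold):
    """Per-job service rates under two-level processor sharing.

    Jobs whose attained service is below the threshold share the server
    equally; jobs above it are served (equally) only when no such job exists.
    """
    low = [a < threshold for a in attained]
    active = low if any(low) else [True] * len(attained)
    share = 1.0 / sum(active) if attained else 0.0
    return [share if is_active else 0.0 for is_active in active]
```

With equal weights `dps_rates` reduces to plain PS, and with a very large threshold `tlps_rates` does as well, which makes both functions convenient for checking how far a given weight vector or threshold moves the allocation away from PS.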
The DPS policy has been studied in [FMI80, RS94, RS96, GM02b, AJK04, KNQB04, KNQB05, AABNQ05]. Most of the results obtained for the DPS queue are collected in the survey [AAA06]. However, selecting the weight vector in DPS is not trivial because of the complexity of the system.

The problem of finding an optimal policy among all non-anticipating scheduling policies in the M/G/1 queue was solved by Gittins in [Git89]. He showed that, in the M/G/1 queue, the policy that serves the job present in the system with the highest value of the Gittins index function minimizes the mean sojourn time among all non-anticipating scheduling policies. The well-known optimality of LAS for service time distributions with DHR can be obtained as a corollary of the optimality of the Gittins policy. However, this optimality result has not received much attention and has not been fully exploited.
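For orientation, the Gittins index of a job can be written in the form commonly used in the M/G/1 scheduling literature. The notation below is an assumption and is not taken from the thesis: $F$ is the job size distribution with density $f$, $\overline{F} = 1 - F$, $a$ is the attained service of the job, and $h(x) = f(x)/\overline{F}(x)$ is the hazard rate.

```latex
\[
  J(a,\Delta) = \frac{\int_0^{\Delta} f(a+t)\,dt}{\int_0^{\Delta} \overline{F}(a+t)\,dt},
  \qquad
  G(a) = \sup_{\Delta > 0} J(a,\Delta).
\]
% If h is decreasing (DHR), the supremum is approached as \Delta \to 0, so
\[
  G(a) = h(a), \quad \text{which is decreasing in } a .
\]
```

Under DHR the job with the highest index is therefore the job with the least attained service, which is how the optimality of LAS follows as a corollary.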


8.1.3 Contribution and organization of the thesis

In this thesis we study the problem of resource sharing in computer networks. We study several algorithms from stochastic scheduling theory and their application to computer networks. In Chapters 2 to 5 we study the problem of minimizing the mean sojourn time in the system with different stochastic scheduling algorithms. In Chapter 6 we study the congestion control problem in networks and introduce a new flow-aware packet dropping algorithm for Internet routers, which improves network performance and fairness between flows.

In Chapter 2 we study the Batch Processor Sharing (BPS) model, i.e., processor sharing with batch arrivals, with a hyper-exponential service time distribution. For this distribution we solve Kleinrock's integral equation for the conditional mean sojourn time function and prove the concavity of the solution with respect to the job size. We apply these results to find analytical expressions for the conditional and unconditional mean sojourn times in the TLPS system in Chapter 3. We also use the analysis of the queue with batch arrivals to derive an expression for the conditional mean response time in Chapter 5. The results of Chapter 2 are published in [Osi08a].

In Chapter 3 we analyze the TLPS stochastic scheduling policy with a hyper-exponential service time distribution and a Poisson arrival process. In the first part of the chapter we study the case where the service time distribution has two phases. The choice of the two-phase hyper-exponential distribution is motivated by the mice-elephant effect of file sizes in the Internet, see Subsection 8.1.1.
For the two-phase service time distribution, we find an analytical expression for the mean sojourn time in the system and an approximation of the optimal threshold value that minimizes the mean sojourn time. With numerical results we show that the mean sojourn time in the TLPS system is very close to optimal when the approximated optimal threshold is used. Using the NS-2 simulator, we run simulations with several threshold values in the TLPS system. The simulation results show that the approximated optimal threshold we found minimizes the mean response time in the TLPS system compared with the other threshold values and gives a significant relative gain compared with the DropTail policy.

In the second part of Chapter 3 we study the TLPS system with a hyper-exponential service time distribution with several phases. In this case we find a tight upper bound for the mean sojourn time conditioned on the job size. We show that as the variance of the service time distribution increases, the performance gain of the system also increases and the sensitivity to the choice of threshold decreases. The results of Chapter 3 are published in [ABO07].

In Chapter 4 we compare two DPS policies with different weight vectors. We show the monotonicity of the mean response time in the system as a function of the weight vector under certain conditions on the system. The restrictions are such that the result holds for systems in which the mean job sizes of the classes differ substantially from one another. The restrictions can be overcome by giving the same weight to classes with similar means. The condition is sufficient but not necessary, and it becomes less strict as the system load decreases. The results of this chapter can be found in [Osi08b].

In Chapter 5 we obtain the optimal scheduling policy in a multi-class single-server queue. We apply the results of Gittins [Git89], who found the policy that minimizes the mean sojourn time in the single-server M/G/1 queue among all non-anticipating policies. In this chapter we show that an extension of Gittins' results makes it possible to characterize the optimal scheduling policy in the multi-class M/G/1 queue. We apply the general result to several cases where the service time distribution has a decreasing hazard rate, such as Pareto and hyper-exponential. We show that in the multi-class case the optimal policy is a priority policy in which jobs of different classes are assigned to several priority levels according to their attained service. For each class we obtain an expression for the conditional mean sojourn time using a tagged-job approach.
With this, we numerically compare the mean sojourn time in the system under the Gittins policy and under popular policies such as PS, FCFS and LAS. Since the file size distribution in the Internet is heavy-tailed and has the DHR property, see Subsection 8.1.1, the optimal Gittins policy can be applied in Internet routers, where packets generated by different applications must be served. Typically, the router does not have access to the exact service requirement (in packets) of a TCP connection, but it can have access to the attained service of each connection. We therefore implement the optimal Gittins algorithm in NS-2 and run simulations to evaluate the achievable performance gain.

In Chapter 6 we introduce a new flow-aware packet dropping algorithm for Internet routers, called MarkMax. The main idea behind MarkMax is to determine which connections should reduce their sending rates, rather than which packets should be removed from the queue. In contrast with previously proposed AQM algorithms, MarkMax differentiates between the flows present in the system and cuts the sending rate of the flows with the largest sending rate. MarkMax sends a congestion signal to the selected connection when the total number of queued packets (the backlog) reaches a given threshold. The selection mechanism is based on the state of the long flows. Using a fluid model, we derive some bounds that can be used to analyze the behavior of MarkMax, and we compute the per-flow backlog. We provide NS-2 simulation results comparing MarkMax with DropTail. We show that MarkMax improves network performance and fairness between flows when the flows have sufficiently different RTTs. We specify the algorithm, carry out its theoretical analysis, and provide simulation results that illustrate the performance of MarkMax. The results of this chapter are published in [OBA08]. Conclusions and directions for future work are given in Chapter 7.
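The trigger-and-select idea behind MarkMax can be sketched as follows. This is only an illustration under a stated assumption: the connection to signal is taken to be the one with the largest per-flow backlog, used here as a proxy for the largest sending rate. The actual selection rule is specified in Chapter 6 and may differ; the function name and data layout are hypothetical.

```python
def markmax_select(backlog_per_flow, threshold):
    """Return the flow id to send a congestion signal to, or None.

    backlog_per_flow maps a flow id to its number of queued packets.
    A flow is selected only when the total backlog reaches the threshold.
    ASSUMPTION: the flow with the largest backlog is selected.
    """
    total = sum(backlog_per_flow.values())
    if not backlog_per_flow or total < threshold:
        return None
    return max(backlog_per_flow, key=backlog_per_flow.get)


target = markmax_select({"a": 3, "b": 10, "c": 2}, threshold=12)   # "b"
```

Unlike DropTail, the decision here needs per-flow state, which is the price paid for being able to target the connection that contributes most to the queue.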

8.2 Processor sharing queue with batch arrivals and a hyper-exponential service time distribution


In Chapter 2 we study the Batch Processor Sharing (BPS) model, i.e., processor sharing with batch arrivals. The BPS policy was studied by Kleinrock et al. in [KMR71]. They showed that the derivative of the conditional mean sojourn time function satisfies an integral equation, and they found the solution for service time distributions of the form $F(x) = 1 - p(x)e^{-\mu x}$, where $p(x)$ is a polynomial. BPS was subsequently studied in [Ban03], [RS93], [FM03a], [AAB05], [KK08].

We study BPS with a hyper-exponential service time distribution. For these distributions we solve Kleinrock's integral equation for the conditional mean sojourn time function and prove the concavity of the solution with respect to the job size. We note that the concavity of the mean sojourn time in the BPS system with hyper-exponential service times was obtained by a different method in [KK08].

One motivation for this study is that the BPS queue arises naturally when studying scheduling procedures that favor short connections in the Internet. Moreover, BPS makes it possible to study the structure of batch arrivals, which are frequent in modern systems such as web servers. The Two-Level Processor Sharing (TLPS) policy, proposed by Kleinrock in his book [Kle76b], favors short connections in the system. With the results on BPS we find the expression of the mean sojourn time in the TLPS system. We use these results in the study of the TLPS policy in Chapter 3.
We also use the analysis of the queue with batch arrivals to derive an expression for the conditional mean sojourn time in Chapter 5. The results of Chapter 2 are published in [Osi08a].

8.2.1 Analysis of a processor sharing queue with batch arrivals

We consider the M/G/1 system with batch arrivals and the Processor Sharing (PS) service policy. Batches arrive according to a Poisson process with rate $\lambda$. Let $n > 0$ be the mean batch size and $b > 0$ the mean number of jobs that arrive together with (and in addition to) an arbitrary job tagged on arrival. Let $B(x)$ be the service time distribution and $\bar{B}(x) = 1 - B(x)$ its complementary distribution. The load of the system is $\rho = \lambda n m$, where $m = \int_0^\infty x\, dB(x)$. We assume that the system is stable, $\rho < 1$.

The file size distribution in the Internet is well represented by heavy-tailed distributions, see Subsection 8.1.1, which are difficult to analyze. In [BM06, FW98], the authors show that heavy-tailed distributions can be approximated by hyper-exponential distributions. In our work we therefore use the hyper-exponential service time distribution

\[ B(x) = 1 - \sum_{i} p_i e^{-\mu_i x}, \qquad 1 < N \le \infty, \]

with $p_i > 0$, $\mu_i > 0$, $i = 1, \dots, N$, and $\sum_i p_i = 1$. Without loss of generality we can assume that

\[ 0 < \mu_N < \mu_{N-1} < \dots < \mu_2 < \mu_1 < \infty. \]

Let $\alpha(x)$ be the conditional mean response time in the BPS system for a job of size $x$, and $\alpha'(x)$ its derivative. In [Kle76a, Sec. 4.7], Kleinrock shows that $\alpha'(x)$ satisfies the following integro-differential equation:

\[ \alpha'(x) = \lambda n \int_0^\infty \alpha'(y)\,\bar{B}(x+y)\,dy + \lambda n \int_0^x \alpha'(y)\,\bar{B}(x-y)\,dy + b\,\bar{B}(x) + 1. \]

We prove the following theorem.

Theorem 8.1 The conditional expected response time in the BPS system with a hyper-exponential service time distribution is given by

\[
\alpha(x) = c_0 x - \sum_k \frac{c_k}{b_k} e^{-b_k x} + \sum_k \frac{c_k}{b_k}, \qquad \alpha(0) = 0,
\]

\[
c_0 = \frac{1}{1 - \rho}, \qquad
c_k = \frac{b}{2 \lambda n} \left( \frac{\prod_q (\mu_q^2 - b_k^2)}{b_k \prod_{q \ne k} (b_q^2 - b_k^2)} \right), \quad k = 1, \dots, N,
\]

where the coefficients $b_k$, $k = 1, \dots, N$, are the solutions of the equation $1 - \lambda n \sum_i \frac{p_i}{s + \mu_i} = 0$; they are all positive, distinct and real, and satisfy the inequalities $0 < b_N < \mu_N$, $\mu_{i+1} < b_i < \mu_i$, $i = 1, \dots, N - 1$.

Using Theorem 8.1 we show that the conditional expected sojourn time in the BPS system with a hyper-exponential service time distribution is a strictly concave function. This result is also shown in [KK08] by a different method. We show that the mean number of jobs that are in the system and have attained service $x$ is a decreasing function of $x$. Using Theorem 8.1 we also find the expected sojourn time in the BPS system, $\overline{T}^{BPS}$, with a hyper-exponential service time distribution:

\[
\overline{T}^{BPS} = \frac{m}{1 - \rho} + \sum_{i,j} \frac{p_i c_j}{\mu_i + b_j}.
\]
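The formulas of Theorem 8.1 can be evaluated numerically. The following is a minimal sketch, not the computation used in the thesis. The function names are hypothetical. It locates each positive $b_k$ by bisection on the interval given in the theorem, solving $1 - \lambda n \sum_i p_i / (\mu_i - b) = 0$, that is, it takes the roots of the stated equation to be $s = -b_k$. This sign convention is an assumption made here, chosen because it is the one under which the interval bounds hold and under which the single-phase case reduces to $b_1 = \mu_1 - \lambda n$.

```python
import math


def bps_solution(lam, n, b, p, mu):
    """Return (c0, bs, cs) of Theorem 8.1 for a hyper-exponential B(x).

    lam: batch arrival rate, n: mean batch size, b: mean number of jobs
    arriving together with a tagged job, p / mu: phase probabilities and
    rates, with mu sorted in strictly decreasing order.
    """
    m = sum(pi / mi for pi, mi in zip(p, mu))  # mean service time
    rho = lam * n * m
    if not 0 < rho < 1:
        raise ValueError("the system must be stable (0 < rho < 1)")

    def f(x):
        # Assumed sign convention: b_k solves 1 - lam*n*sum p_i/(mu_i - x) = 0.
        return 1.0 - lam * n * sum(pi / (mi - x) for pi, mi in zip(p, mu))

    # b_i lies in (mu_{i+1}, mu_i) for i < N and b_N lies in (0, mu_N);
    # f decreases from a positive to a negative value on each interval.
    lowers = list(mu[1:]) + [0.0]
    bs = []
    for lo, hi in zip(lowers, mu):
        lo, hi = lo + 1e-12 * hi, hi * (1.0 - 1e-12)
        for _ in range(200):  # plain bisection
            mid = 0.5 * (lo + hi)
            if f(mid) > 0:
                lo = mid
            else:
                hi = mid
        bs.append(0.5 * (lo + hi))

    cs = []
    for k, bk in enumerate(bs):
        num = math.prod(mq ** 2 - bk ** 2 for mq in mu)
        den = bk * math.prod(bq ** 2 - bk ** 2
                             for q, bq in enumerate(bs) if q != k)
        cs.append(b / (2.0 * lam * n) * num / den)
    return 1.0 / (1.0 - rho), bs, cs


def alpha(x, c0, bs, cs):
    """Conditional expected response time alpha(x) of Theorem 8.1."""
    return c0 * x + sum(ck / bk * (1.0 - math.exp(-bk * x))
                        for bk, ck in zip(bs, cs))


def mean_sojourn_bps(lam, n, b, p, mu):
    """Expected sojourn time m/(1-rho) + sum_{i,j} p_i c_j / (mu_i + b_j)."""
    c0, bs, cs = bps_solution(lam, n, b, p, mu)
    m = sum(pi / mi for pi, mi in zip(p, mu))
    return c0 * m + sum(pi * cj / (mi + bj)
                        for pi, mi in zip(p, mu)
                        for bj, cj in zip(bs, cs))
```

For a single exponential phase the sketch returns $b_1 = \mu_1 - \lambda n$ and $c_1 = b(\mu_1^2 - b_1^2)/(2 \lambda n b_1)$, which can be checked by substituting $\alpha'(x) = c_0 + c_1 e^{-b_1 x}$ into Kleinrock's equation.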


8.2.2 Analysis of the Two-Level Processor Sharing policy

In the second part of Chapter 2 we study the TLPS policy. We assume that the service time distribution $F(x)$ is hyper-exponential, and we let $\overline{F}(x) = 1 - F(x)$. The TLPS system is described in Section 8.3. Let $\overline{X_\theta^n}$ be the $n$-th moment and $\rho_\theta$ the utilization factor of the distribution $F(x)$ truncated at $\theta$. Let $W_\theta$ be the mean unfinished work in the system with the service time distribution truncated at $\theta$. Using Theorem 8.1, we find the expected sojourn time $\overline{T}(\theta)$ in the TLPS system with a hyper-exponential service time distribution:

\[
\overline{T}(\theta) = \frac{\overline{X_\theta^1} + W_\theta \overline{F}(\theta)}{1 - \rho_\theta}
+ \frac{m - \overline{X_\theta^1}}{1 - \rho}
+ \frac{W_\theta + \theta}{1 - \rho_\theta} \sum_{i,j} \frac{F_\theta^i \prod_q (\mu_q^2 - b_j^2(\theta))}{b_j(\theta) (\mu_i + b_j(\theta)) \prod_{q \ne j} (b_q^2(\theta) - b_j^2(\theta))},
\tag{8.1}
\]

where $b_i(\theta)$, $i = 1, \dots, N$, are the roots of the rational equation $1 - \frac{\lambda}{1 - \rho_\theta} \sum_i \frac{F_\theta^i}{s + \mu_i} = 0$ and satisfy the inequalities $0 < b_N(\theta) < \mu_N$, $\mu_{i+1} < b_i(\theta) < \mu_i$, $i = 1, \dots, N - 1$. Here $F_\theta^i = p_i e^{-\mu_i \theta}$, $i = 1, \dots, N$.

8.3 Optimal choice of the threshold for the Two-Level Processor Sharing queue

In Chapter 3 we analyze the TLPS stochastic scheduling policy. TLPS differentiates jobs by means of a threshold on their attained service and gives preference to short jobs. It can be used for size-based scheduling of access to resources in a TCP/IP network [AANO04, AABN04, FM03b] or on a Web server [GM02a, HBSBA03]. In [AA06] the authors show that when the service time distribution has a decreasing hazard rate, the performance of TLPS with an appropriately chosen threshold can be very close to optimal. The choice of the threshold is therefore a very important problem in the TLPS system.

In Chapter 3 we study the system scheduled by the TLPS policy with a hyper-exponential service time distribution and a Poisson arrival process. In the first part of the chapter we study the hyper-exponential distribution with two phases. The choice of two phases is motivated by the mice-elephant effect in Internet file sizes (see Subsection 8.1.1). For the two-phase hyper-exponential distribution we find an analytical expression for the expected sojourn time in the system and an approximation of the optimal threshold that minimizes it. Numerical results show that the expected sojourn time in the TLPS system is very close to optimal when the approximate value of the optimal threshold is used.

Using the NS-2 simulator, we run simulations of the TLPS system with several threshold values. The simulation results show that the approximate optimal threshold we found minimizes the expected sojourn time in the TLPS system with respect to the other threshold values, and gives a significant relative gain in comparison with the DropTail policy. DropTail is a packet-level FCFS policy in the queue (see Subsection 8.1.1). The results of Chapter 3 are published in [ABO07].

8.3.1 Model description

Let $\theta$ be a given threshold. There are two queues in the system, a high priority queue and a low priority queue. When a job arrives, it is served in the high priority queue until it has received an amount of service $\theta$. Once a job has been served up to the threshold $\theta$, it is moved to the low priority queue. Each of the two queues is served according to the PS policy. The low priority queue is served only when the high priority queue is empty. The low priority queue is a queue with batch arrivals, see also [Kle76a, Sec. 4.7]. Let $F(x)$ be the service time distribution function and $\overline{F}(x) = 1 - F(x)$, with

\[
F(x) = 1 - \sum_{i=1}^{N} p_i e^{-\mu_i x},
\tag{8.2}
\]

where $\mu_i > 0$, $p_i \ge 0$, $i = 1, \dots, N$, and $\sum_{i=1}^{N} p_i = 1$. Jobs arrive to the system according to a Poisson process with rate $\lambda$. Let $m = \int_0^\infty x \, dF(x)$ and $d = \int_0^\infty x^2 \, dF(x)$ be the first and second moments of the service time distribution. Let $\rho = \lambda m$ be the system load, $\rho < 1$. In Chapter 2 we find the expression for the expected sojourn time in the TLPS system, $\overline{T}(\theta)$, see (8.1), which we use in the present subsection.

8.3.2 Hyper-exponential service time distribution with two phases

We consider the case where the service time distribution function is hyper-exponential with two phases,

\[
F(x) = 1 - p_1 e^{-\mu_1 x} - p_2 e^{-\mu_2 x},
\]

where $p_1 + p_2 = 1$ and $p_1, p_2 > 0$. We assume that the mean job size is very small in the first class and very large in the second class, $1/\mu_2 \gg 1/\mu_1$. Let $\epsilon = \mu_2 / \mu_1$. We find an approximation of the optimal threshold as $\epsilon \to 0$ and prove the following theorem.

Theorem 8.2 Let $\theta_{opt}$ be the optimal threshold value, $\theta_{opt} = \arg\min \overline{T}(\theta)$. The value $\tilde{\theta}_{opt}$ given by

\[
\tilde{\theta}_{opt} = \frac{1}{\mu_1 - \mu_2} \ln \left( \frac{\mu_1 - \lambda}{\mu_2 (1 - \rho)} \right)
\]

approximates $\theta_{opt}$ in such a way that $\overline{T}'(\tilde{\theta}_{opt}) = \rho(\mu_2 / \mu_1)$.
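The closed form of Theorem 8.2 is straightforward to evaluate. A minimal sketch follows; the function name and the parameter values in the usage note are hypothetical and serve only as an illustration.

```python
import math


def approx_optimal_threshold(lam, p1, mu1, mu2):
    """Approximate optimal TLPS threshold of Theorem 8.2 (two phases).

    lam: Poisson arrival rate, p1: probability of the first (short) phase,
    mu1 > mu2 > 0: phase rates, so that the mean sizes satisfy 1/mu1 < 1/mu2.
    """
    p2 = 1.0 - p1
    rho = lam * (p1 / mu1 + p2 / mu2)  # system load rho = lam * m
    if not (mu1 > mu2 > 0 and 0 < rho < 1):
        raise ValueError("need mu1 > mu2 > 0 and a stable system (rho < 1)")
    # rho < 1 and mu1 > mu2 imply lam < mu1, so the logarithm is well defined.
    return math.log((mu1 - lam) / (mu2 * (1.0 - rho))) / (mu1 - mu2)
```

For instance, `approx_optimal_threshold(1.0, 0.9, 10.0, 0.5)` evaluates the formula for a load of 0.29 with short jobs of mean size 0.1 and long jobs of mean size 2.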

We show that, for a suitable choice of parameters, the performance of the TLPS system can be significantly better than that of the PS system. Numerical results show that the relative gain in expected sojourn time of the TLPS system with the approximate threshold over the PS system can reach 50% as the system load increases. We show that the expected sojourn time in the TLPS system with the approximate optimal threshold is very close to the optimal value. We study the sensitivity to the choice of the threshold and show that the system is not very sensitive to the threshold value near the optimum. It is better to choose a threshold that is too large than one that is too small.

Using NS-2, we simulate an implementation of the TLPS algorithm in a router queue. We run simulations for the case of two classes of jobs arriving to the system, and we compare the expected sojourn time in systems governed by the TLPS, LAS and DropTail policies (see Subsection 8.1.1). For the TLPS policy we give results for different threshold values. We show that the analytically obtained approximation of the threshold minimizes the expected sojourn time in the TLPS system among the threshold values considered. Moreover, the expected sojourn time in the TLPS system with the approximate threshold is very close to the expected sojourn time in the LAS system, which is optimal. In the simulations, the maximum relative gain of TLPS with $\theta = \tilde{\theta}_{opt}$ over DropTail is 35.7%, while the relative gain of the optimal LAS policy over DropTail is 36.7%.

8.3.3 Hyper-exponential service time distribution with several phases

In the second part of Chapter 3 we study the TLPS system with a hyper-exponential service time distribution with several phases. Let the service time distribution be given by (8.2), where $1 < N < \infty$. In this case, finding an explicit expression for the optimal threshold is very difficult. In the second part of Chapter 3 we therefore obtain an upper bound on the expected sojourn time in the TLPS system. The bound has a simple expression in terms of the system parameters and can easily be used in numerical computations.

We use the following notation: $F(x)$ is the service time distribution, and $m$ and $d$ are the first and second moments of $F(x)$. Let $\overline{X_\theta^n}$ be the $n$-th moment and $\rho_\theta$ the utilization factor of the distribution $F(x)$ truncated at $\theta$. Let $W_\theta$ be the mean unfinished work in the system with the service time distribution truncated at $\theta$. By the Pollaczek-Khinchin formula,

\[
W_\theta = \frac{\lambda \overline{X_\theta^2}}{2 (1 - \rho_\theta)}.
\]

Let $F_\theta^i = p_i e^{-\mu_i \theta}$, $i = 1, \dots, N$. Let $b(\theta)$ be the mean number of jobs that arrive in the same batch as, and in addition to, an arbitrary job that is tagged on arrival. We prove the following theorem.

Theorem 8.3 An upper bound $\Upsilon(\theta)$ on the expected sojourn time $\overline{T}(\theta)$ in the TLPS system with a hyper-exponential service time distribution with several phases is

\[
\overline{T}(\theta) \le \Upsilon(\theta) = \frac{\overline{X_\theta^1} + W_\theta \overline{F}(\theta)}{1 - \rho_\theta}
+ \frac{m - \overline{X_\theta^1}}{1 - \rho}
+ \frac{b(\theta)}{\overline{F}(\theta) (1 - \rho)} \sum_{i,j} \frac{F_\theta^i F_\theta^j}{\mu_i + \mu_j}.
\]

Numerical results show that this bound is very tight and can be used as an approximation of the expected sojourn time function. We show that as the variance of the service time distribution increases, the gain obtained by using TLPS rather than PS becomes considerable, and the sensitivity to a suboptimal choice of the threshold decreases.
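The truncated quantities that enter the bound $\Upsilon(\theta)$ have closed forms for the hyper-exponential distribution. The sketch below is an illustration under two assumptions that are not stated in this summary: the distribution truncated at $\theta$ is taken to be that of $\min(X, \theta)$, and the value of $b(\theta)$ is supplied by the caller, since its expression is not given here. Function names are hypothetical.

```python
import math


def truncated_quantities(lam, p, mu, theta):
    """Truncated moments, load and unfinished work for hyper-exponential F.

    Assumption: truncation at theta means the distribution of min(X, theta).
    """
    tail = sum(pi * math.exp(-mi * theta) for pi, mi in zip(p, mu))
    # E[min(X, theta)] = integral of the tail over [0, theta].
    x1 = sum(pi * (1.0 - math.exp(-mi * theta)) / mi for pi, mi in zip(p, mu))
    # E[min(X, theta)^2] = 2 * integral of x * tail(x) over [0, theta].
    x2 = sum(2.0 * pi * (1.0 - math.exp(-mi * theta) * (1.0 + mi * theta))
             / mi ** 2 for pi, mi in zip(p, mu))
    rho_theta = lam * x1
    w = lam * x2 / (2.0 * (1.0 - rho_theta))  # Pollaczek-Khinchin
    return {"tail": tail, "x1": x1, "x2": x2, "rho_theta": rho_theta, "w": w}


def tlps_upper_bound(lam, p, mu, theta, b_theta):
    """Bound of Theorem 8.3; b_theta (the value b(theta)) is an input."""
    q = truncated_quantities(lam, p, mu, theta)
    m = sum(pi / mi for pi, mi in zip(p, mu))
    rho = lam * m
    f_theta = [pi * math.exp(-mi * theta) for pi, mi in zip(p, mu)]
    double_sum = sum(fi * fj / (mi + mj)
                     for fi, mi in zip(f_theta, mu)
                     for fj, mj in zip(f_theta, mu))
    return ((q["x1"] + q["w"] * q["tail"]) / (1.0 - q["rho_theta"])
            + (m - q["x1"]) / (1.0 - rho)
            + b_theta / (q["tail"] * (1.0 - rho)) * double_sum)
```

With $\theta = 0$ and $b(0) = 0$ the expression reduces to $m / (1 - \rho)$, the expected sojourn time of the PS queue.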

8.4 Comparison of the Discriminatory Processor Sharing policies

In Chapter 4 we analyze the Discriminatory Processor Sharing (DPS) policy, which was introduced by Kleinrock [Kle67]. DPS can be applied in many areas, such as telecommunications, Web applications and the modeling of TCP flows. Under DPS, jobs arrive to the system in several classes. The service priority among the classes is controlled by a weight vector. By changing the weight vector, it is possible to control the service rates of the jobs, to give priority to certain classes and to optimize certain characteristics of the system. The choice of the weights is therefore a very important problem, and a very difficult one because of the complexity of the system.


The DPS system has been studied in [FMI80], [RS94], [RS96, KNQB05], [KNQB04], [GM02b], [AJK04]. Most of the results on the DPS policy are collected in the survey [AAA06]. The problem of weight selection in the DPS system with exponential service time distributions was studied in [AABNQ05], [KK06]. In [KK06] the authors show that the expected sojourn time in the DPS system is smaller than in the PS system when the weights decrease in the opposite order of the class means. Also in [KK06], the authors formulate a conjecture on the monotonicity of the expected sojourn time in the DPS system. The idea of the conjecture is that, when two DPS systems with different weight vectors are compared, the one whose weight vector is closer to the optimal policy, known as the cµ-rule [Rig94], has the smaller expected sojourn time. Using the method described in [KK06], we prove this conjecture under some restrictions on the system parameters. The restrictions are such that the result holds for systems in which the class means differ substantially from one another.

The DPS model is a single-server queue with several classes of arriving jobs. All jobs are organized in $N$ classes and share the same server. Jobs of class $k = 1, \dots, N$ arrive according to a Poisson process with rate $\lambda_k$ and have the service time distribution $F_k(x) = 1 - e^{-\mu_k x}$ with mean $1/\mu_k$. The load of class $k$ is $\rho_k$, $k = 1, \dots, N$, and the system load is $\rho = \sum_{k=1}^{N} \rho_k$. We assume that the system is stable, $\rho < 1$. Let $\lambda = \sum_{k=1}^{N} \lambda_k$. All jobs present in the system are served simultaneously, at rates controlled by the weight vector $g = (g_1, \dots, g_N)$. If at a given time there are $N_k$ jobs in class $k$, $k = 1, \dots, N$, each class-$k$ job is served at rate $g_k / \sum_{j=1}^{N} g_j N_j$. Hence the capacity received by each class depends on the number of jobs of each class present in the system. When all the weights $g_k$, $k = 1, \dots, N$, are equal, the DPS policy is equivalent to the PS policy.

Let $\overline{T}^{DPS}(g)$ be the expected sojourn time in the DPS system with weight vector $g$. We have

\[
\overline{T}^{DPS}(g) = \sum_{k=1}^{N} \frac{\lambda_k}{\lambda} \overline{T}_k(g),
\]

where $\overline{T}_k(g)$ is the expected sojourn time of class-$k$ jobs. The expressions for $\overline{T}_k(g)$, $k = 1, \dots, N$, can be found as the solution of the following linear system, see [FMI80]:

\[
\overline{T}_k(g) \left( 1 - \sum_{j=1}^{N} \frac{\lambda_j g_j}{\mu_j g_j + \mu_k g_k} \right)
- \sum_{j=1}^{N} \frac{\lambda_j g_j \overline{T}_j(g)}{\mu_j g_j + \mu_k g_k} = \frac{1}{\mu_k},
\qquad k = 1, \dots, N.
\]

We note that for the PS system the expected sojourn time is $\overline{T}^{PS} = \frac{\rho / \lambda}{1 - \rho}$. Using the method of [KK06], we prove the following theorem.


Theorem 8.4 Suppose that the service time distribution of each class is exponential with mean $1/\mu_i$, $i = 1, \dots, N$, and that the classes are numbered so that

\[
\mu_1 \ge \mu_2 \ge \dots \ge \mu_N.
\]

Consider two different weight vectors for the DPS policy, $\alpha$ and $\beta$, which satisfy

\[
\alpha_1 \ge \alpha_2 \ge \dots \ge \alpha_N, \qquad \beta_1 \ge \beta_2 \ge \dots \ge \beta_N.
\]

The expected sojourn times of the DPS systems with weight vectors $\alpha$ and $\beta$ satisfy

\[
\overline{T}^{DPS}(\alpha) \le \overline{T}^{DPS}(\beta)
\]

if the weight vectors $\alpha$ and $\beta$ satisfy

\[
\frac{\alpha_{i+1}}{\alpha_i} \le \frac{\beta_{i+1}}{\beta_i}, \qquad i = 1, \dots, N - 1,
\tag{8.3}
\]

and if the following restriction holds:

\[
\frac{\mu_{j+1}}{\mu_j} \le 1 - \rho
\tag{8.4}
\]

for every $j = 1, \dots, N$.

Remark 8.1 If condition (8.4) is not satisfied for some classes $j$ and $j + 1$, the result of Theorem 8.4 can still be used by choosing identical weights for these classes. For classes $j$ and $j + 1$ such that $\frac{\mu_{j+1}}{\mu_j} > 1 - \rho$, if we set $\alpha_{j+1} = \alpha_j$ and $\beta_{j+1} = \beta_j$, then condition (8.3) of Theorem 8.4 is satisfied.

Theorem 8.4 shows that the expected sojourn time $\overline{T}^{DPS}(g)$ is a monotone function of the choice of the weight vector $g$. The closer the weight vector is to the optimal policy, the cµ-rule, the smaller the expected sojourn time in the system. This closeness is expressed by condition (8.3), which states that the vector $\alpha$ is closer to the optimal cµ-rule than the vector $\beta$. Theorem 8.4 is subject to restriction (8.4). This restriction is a sufficient but not necessary condition on the system parameters. It requires the class means to differ substantially from one another. It can be removed by giving the same weights to the jobs of classes with close means. Condition (8.4) is less strict when the system load is smaller. The results can be used as guidelines for the choice of the weight vector in the DPS system.
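The linear system of [FMI80] quoted above can be solved numerically to compare weight vectors. The following is a minimal sketch in plain Python with a small Gaussian elimination; the function names and the parameter values in the usage note are hypothetical and are not taken from the thesis.

```python
def solve_linear(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def dps_class_sojourn_times(lam, mu, g):
    """Per-class expected sojourn times T_k(g) for exponential classes."""
    n = len(lam)
    a = [[0.0] * n for _ in range(n)]
    for k in range(n):
        a[k][k] = 1.0
        for j in range(n):
            coef = lam[j] * g[j] / (mu[j] * g[j] + mu[k] * g[k])
            a[k][k] -= coef  # from T_k * (1 - sum_j coef_kj)
            a[k][j] -= coef  # from - sum_j coef_kj * T_j
    return solve_linear(a, [1.0 / mk for mk in mu])


def dps_mean_sojourn(lam, mu, g):
    """Expected sojourn time sum_k (lam_k / lam) T_k(g)."""
    t = dps_class_sojourn_times(lam, mu, g)
    return sum(lk * tk for lk, tk in zip(lam, t)) / sum(lam)
```

As an illustration, with $\lambda = (1, 0.1)$ and $\mu = (10, 1)$ the load is $\rho = 0.2$ and $\mu_2 / \mu_1 = 0.1 \le 1 - \rho$, so restriction (8.4) holds for the pair of classes. The weight vectors $(4, 1)$, $(2, 1)$ and $(1, 1)$ are ordered as in condition (8.3), and the computed expected sojourn times increase in that order, with $(1, 1)$ giving the PS value $(\rho / \lambda) / (1 - \rho)$.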


8.5 Optimal scheduling policy in a multi-class single server queue

In Chapter 5 we obtain the optimal scheduling policy in a multi-class single server queue. We apply the results of Gittins [Git89], who found the optimal policy that minimizes the expected sojourn time in the single-server M/G/1 queue among all non-anticipating policies. In Chapter 5 we show that an extension of Gittins' results characterizes the optimal scheduling policy in the multi-class M/G/1 queue. File sizes in the Internet are well represented by heavy-tailed distributions with a decreasing hazard rate (see Subsection 8.1.1). In particular, we study optimal scheduling for the following service time distributions: two Pareto classes, several Pareto classes, and one hyper-exponential class together with one exponential class. Using the tagged-job approach and the method of collective marks [Kle76b], we obtain the conditional expected sojourn time for each class. We compare the expected sojourn time under the Gittins policy with that under popular policies such as PS, LAS and FCFS.

8.5.1 Gittins policy in a multi-class M/G/1 queue

We consider a single-server M/G/1 queue and non-anticipating policies. Let $\pi$ be a non-anticipating policy. Let $F(x)$ be the service time distribution, $f(x) = F'(x)$, $\overline{F}(x) = 1 - F(x)$, and let $h(x) = f(x) / \overline{F}(x)$ be the hazard rate function. Let $\overline{T}^{\pi}(x)$ be the conditional expected sojourn time of a job of size $x$ and $\overline{T}^{\pi}$ the expected sojourn time in the system under the scheduling policy $\pi$. Let $a$ be the service attained by a job. The Gittins index $G(a)$ is a function of the service attained by a job in the system (see Chapter 5) and is defined by Gittins in [Git89]. The Gittins policy $\pi_g$, see [Git89], is the scheduling policy that serves the job in the system with the highest Gittins index $G(a)$, where $a$ is the service attained by the job. The Gittins policy minimizes the expected sojourn time in the system among all non-anticipating scheduling policies.

We generalize Gittins' result to multi-class systems. Suppose that there are $N$ classes in the system. Let $F_i(x)$ be the service time distribution of class $i = 1, \dots, N$, with $f_i(x) = F_i'(x)$, $\overline{F}_i(x) = 1 - F_i(x)$ and $h_i(x) = f_i(x) / \overline{F}_i(x)$. The conditional expected sojourn time of a class-$i$ job of size $x$ is $\overline{T}_i^{\pi}(x)$, $i = 1, \dots, N$, and the expected sojourn time in the system under the scheduling policy $\pi$ is $\overline{T}^{\pi}$. For each class $i = 1, \dots, N$ we can define the Gittins index $G_i(a)$, and hence the Gittins policy for the multi-class system. In the multi-class M/G/1 system, the policy that serves the job with the highest index $G_i(a)$, where $a$ is the service attained by the job, is the optimal policy that minimizes the expected sojourn time in the system.

Using the definition of the Gittins index, we find that when the hazard rate of each class is non-increasing, $G_i(a) = h_i(a)$. We define the optimal policy in the multi-class system with service time distributions that have non-increasing hazard rates.

Optimal policy In the multi-class M/G/1 system with non-increasing hazard rates, the policy that serves the job with the highest value $h_i(a)$, where $a$ is the service attained by the job, is the optimal policy that minimizes the expected sojourn time in the system.

We note that in a system with a single class the multi-class Gittins policy is the LAS policy: all hazard rate functions are identical, so the optimal policy serves the job with the least attained service.

8.5.2 Two Pareto classes

We consider the case of two classes in the system, where the service time distribution of each class is Pareto and jobs arrive according to Poisson processes with rates $\lambda_1$ and $\lambda_2$:

\[
F_i(x) = 1 - \frac{b_i^{c_i}}{(x + b_i)^{c_i}}, \qquad i = 1, 2.
\tag{8.5}
\]

Here $b_i = m_i (c_i - 1)$, where $m_i$ is the mean of class $i$, $i = 1, 2$, the densities are $f_i(x) = b_i^{c_i} c_i / (x + b_i)^{c_i + 1}$, $i = 1, 2$, and the hazard rate functions are

\[
h_i(x) = \frac{c_i}{x + b_i}, \qquad i = 1, 2.
\]

Let $g(x)$ and $\theta$ be such that

\[
h_1(x) = h_2(g(x)), \qquad h_1(\theta) = h_2(0).
\]

We assume that $c_1 > c_2$ and describe the optimal policy.

The optimal policy Jobs in the system are served in two queues, a high priority queue and a low priority queue. Class-1 jobs with attained service $a < \theta$ are served in the high priority queue according to the LAS policy. When a class-1 job attains service $\theta$, it is moved to the low priority queue. Class-2 jobs are placed directly in the low priority queue. The low priority queue is served only if the high priority queue is empty. In the low priority queue, service is given to the job with the highest value of $h_i(a)$, where $a$ is the service attained by the job. That is, $h_1(a)$ is computed for each class-1 job with attained service $a$, and $h_2(a)$ for each class-2 job with attained service $a$. All the values $h_i(a)$ are then compared, and the job with the highest $h_i(a)$ is served.
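For the Pareto hazard rates $h_i(x) = c_i / (x + b_i)$, the defining relations $h_1(\theta) = h_2(0)$ and $h_1(x) = h_2(g(x))$ can be solved explicitly, giving $\theta = c_1 b_2 / c_2 - b_1$ and $g(x) = c_2 (x + b_1) / c_1 - b_2$. The sketch below illustrates these expressions together with the selection rule of the policy. The function names, the job representation and the parameter values in the usage note are hypothetical; this is not the NS-2 implementation of the thesis.

```python
def pareto_hazard(a, b, c):
    """Hazard rate h(a) = c / (a + b) of the Pareto distribution (8.5)."""
    return c / (a + b)


def threshold_theta(b1, c1, b2, c2):
    """Solve h1(theta) = h2(0), i.e. c1 / (theta + b1) = c2 / b2."""
    return c1 * b2 / c2 - b1


def g_of_x(x, b1, c1, b2, c2):
    """Solve h1(x) = h2(g), i.e. c1 / (x + b1) = c2 / (g + b2)."""
    return c2 * (x + b1) / c1 - b2


def select_job(jobs, params):
    """Index of the job with the highest hazard rate h_i(a).

    jobs: list of (class_id, attained_service) pairs,
    params: dict mapping class_id to the Pareto parameters (b, c).
    Ties are broken in favor of the job that appears first in the list.
    """
    return max(range(len(jobs)),
               key=lambda k: pareto_hazard(jobs[k][1], *params[jobs[k][0]]))
```

With $m_1 = 1$, $c_1 = 10$ (so $b_1 = 9$) and $m_2 = 10$, $c_2 = 1.25$ (so $b_2 = 2.5$), the threshold is $\theta = 11$: a class-1 job with attained service 5 is selected ahead of a fresh class-2 job, whereas a class-1 job with attained service 15 is not.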

The expected sojourn times Using the tagged-job approach and the method of collective marks [Kle76b], we find expressions for the conditional expected sojourn times of each class. Let $\overline{X_y^n}^{(i)}$ be the $n$-th moment and $\rho_y^{(i)}$ the utilization factor of the distribution $F_i(x)$ truncated at $y$, for $i = 1, 2$. Let $W_{x,y}$ be the mean unfinished work in the system consisting of class-1 jobs of size less than $x$ and class-2 jobs of size less than $y$. By the Pollaczek-Khinchin formula,

\[
W_{x,y} = \frac{\lambda_1 \overline{X_x^2}^{(1)} + \lambda_2 \overline{X_y^2}^{(2)}}{2 \left( 1 - \rho_x^{(1)} - \rho_y^{(2)} \right)}.
\]

We can now state the following theorem.

Theorem 8.5 In the M/G/1 queue with two classes with the Pareto distributions given by (8.5), scheduled by the Gittins policy, the conditional expected sojourn times of class-1 and class-2 jobs are

\[
\overline{T}_1(x) = \frac{x + W_{x,0}}{1 - \rho_x^{(1)}}, \qquad x < \theta,
\]

\[
\overline{T}_1(x) = \frac{x + W_{x,g(x)}}{1 - \rho_x^{(1)} - \rho_{g(x)}^{(2)}}, \qquad x \ge \theta,
\]

\[
\overline{T}_2(g(x)) = \frac{g(x) + W_{x,g(x)}}{1 - \rho_x^{(1)} - \rho_{g(x)}^{(2)}}, \qquad x \ge \theta.
\]

Numerical results Using numerical results, we compare the expected sojourn time in the system scheduled by the Gittins, LAS, PS and FCFS policies. We choose the system parameters so that class 1 represents small files and class 2 represents large files in the Internet. We show that the Gittins policy minimizes the expected sojourn time in the system and gives a relative gain over the LAS policy of about 20 to 25% when the system load is 90%.

Simulation results We implement the Gittins algorithm in a router queue for the case of two Pareto classes. Two TCP connections send files through the same bottleneck, and each connection represents one class in the system. The file size distributions are Pareto, with hazard rate functions $h_1(x)$ and $h_2(x)$. At the router we keep track of the service attained by each connection, for a certain period after which there are no more packets of the connection in the system. Whenever there is at least one packet to serve in the router queue, the algorithm serves the first packet of the connection with the largest $h_i(a)$, where $a$ is the service attained by the connection and $i$ is the class of the connection. We run simulations with the Gittins, LAS and DropTail scheduling policies (see Subsection 8.1.1) and compare the expected sojourn time in the system under these policies. We find that the relative gain in expected sojourn time of the Gittins policy over the LAS policy can be 10% when the load due to class 1 is 50%. When the load due to class 1 is lighter, the relative gain is smaller. In the corresponding analytical system, the relative gain of the Gittins policy over LAS is much larger and reaches 36%. We attribute this difference to the influence of the TCP mechanisms.

8.5.3 Other cases

We consider the case where jobs arrive to the system in several classes that are Pareto distributed, and the case where jobs arrive in two classes that follow hyper-exponential and exponential distributions. In these cases we describe the optimal policies and derive the expressions for the expected sojourn times of each class.

8.5.4 Conclusions

We apply the general result of Gittins in several cases where the service time distribution has a decreasing hazard rate, such as Pareto and hyper-exponential. We show that in the case of several classes the optimal policy is a priority policy in which jobs of different classes are assigned to several priority levels according to their attained service. For each class we obtain the expression for the conditional expected sojourn time using a tagged-job approach. With this, we compare numerically the expected sojourn time in the system under the Gittins policy and under popular policies such as PS, FCFS and LAS.

Since the file size distribution in the Internet is heavy-tailed and has a decreasing hazard rate (see Subsection 8.1.1), the optimal Gittins policy can be applied in Internet routers, where packets generated by different applications are served. Typically the router does not have access to the exact required service time (in packets) of a TCP connection, but it can have access to the service attained by each connection. We therefore implement the optimal Gittins algorithm in NS-2 and run simulations to evaluate the possible performance gain. We find that the algorithm improves the performance of the system compared with the DropTail and LAS policies.

8.6 Improving TCP fairness with the MarkMax policy

It is known that when TCP connections with different RTTs share the same bottleneck, the connections with smaller RTTs take the larger share of the bandwidth, see [Flo91, Man90]. This problem has been studied in [LM97], [Bro00], [ABL+00], [AJN02], [AART06], etc. To avoid unfair resource sharing in the Internet, several Active Queue Management (AQM) schemes have been proposed. AQM is a family of packet dropping algorithms for first-come first-served (FCFS) queues that manage the length of packet queues by dropping packets preventively. AQM algorithms inform the sender of possible congestion before a buffer overflow occurs. Examples of AQM algorithms are RED [FJ93], GREEN [WZ02], BLUE [FSKS02], MLC(l) [SS07], CHOKe [PPP00], etc. None of them has been widely deployed in networks, because of their complexity and the non-trivial selection of parameter values.

We introduce a new flow-aware packet dropping algorithm for Internet routers, called MarkMax. The main idea behind MarkMax is to determine which connections should reduce their sending rates, rather than to find which packets should be dropped from the queue. In contrast with previously proposed AQM algorithms, MarkMax differentiates between the flows present in the system and reduces the sending rate of the flow with the largest sending rate. MarkMax sends a congestion signal to the selected connection when the total number of packets in the queue (the backlog) reaches a given threshold. The selection mechanism is based on the state of the long flows. We also propose to use the ECN mechanism [RFB01] to reduce the number of packet retransmissions. The results of this chapter are published in [OBA08].

8.6.1 The MarkMax algorithm

The MarkMax algorithm uses three parameters, the thresholds θ, θ_l, θ_h, chosen so that θ_l < θ < θ_h. The threshold θ is a trigger: when the backlog size exceeds θ, the algorithm sends a congestion signal to the selected connection, which then reduces its sending rate. To send the congestion signal, the algorithm marks the first packet of that connection in the queue with an ECN flag. When the receiver gets a packet with an ECN flag, it sends back an ACK that also carries an ECN flag. When the sender receives the ACK with the ECN flag, it halves its congestion window. For more details, see Subsection 8.1.1. We propose two ways to choose the connection to cut, MarkMax-T and MarkMax-B (MM-T and MM-B), which we describe later. MarkMax needs the two additional thresholds because we consider a packet transmission system that is subject to a non-negligible delay due to propagation and queueing.

Let q be the queue size and flag a Boolean variable initialized to true. Each time a new packet arrives in the system, the following algorithm is executed:

    enqueue the packet
    if q ≤ θ_l or q ≥ θ_h then
        flag ← true
    if q ≥ θ and flag = true then
        a. choose the connection with MM-B or MM-T
        b. set the ECN flag in the first packet of the selected connection
        c. flag ← false

The thresholds θ_h and θ_l determine whether a congestion signal should be sent when q ≥ θ. Once a congestion signal has been sent, no further signal is sent while the queue size stays in the interval [θ_l, θ_h]. The threshold θ_h is a safety mechanism against the following situation: after the sending rate of the cut connection has been reduced, the total sending rate may still not have been reduced enough and may remain larger than the departure rate. To handle this, when the queue size exceeds θ_h, MM sends a congestion signal to the selected connection every time a packet arrives. The threshold θ_l is used to determine whether the connection has reacted to the congestion signal. Since the queue size fluctuates around the value θ, the reaction of the connection cannot be detected with a single threshold. Choosing the thresholds θ_h and θ_l is a non-trivial problem. Based on simulation results, we propose θ_h = 1.15 · θ and θ_l = 0.85 · θ.

We propose two different criteria for choosing the connection to cut: MM-T and MM-B. With MM-T we choose the connection with the highest sending rate, and with MM-B the connection with the most packets in the router backlog. The sending rate and the number of packets in the backlog are closely related: where MM-T cuts the connection with the largest instantaneous sending rate, MM-B cuts the connection with the largest average sending rate. Using the fluid model, we derive rules for choosing the thresholds θ and θ_h for the MM-T algorithm.
Using the fluid-model simulator (written in Python), we find that MM-T and MM-B behave very similarly when the threshold θ is not too large, so the values of θ and θ_h obtained for MM-T can also be used for MM-B.
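The per-arrival logic of this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the NS-2 implementation used in the thesis: the class name `MarkMaxQueue`, the packet representation (a dict with `flow` and `ecn` keys) and the method names are hypothetical, the queue size is counted in packets, and only the MM-B selection rule (most packets in the backlog) is shown.

```python
from collections import Counter, deque


class MarkMaxQueue:
    """Illustrative MarkMax queue with MM-B selection (hypothetical API)."""

    def __init__(self, theta, low_factor=0.85, high_factor=1.15):
        self.theta = theta
        self.theta_l = low_factor * theta   # theta_l = 0.85 * theta
        self.theta_h = high_factor * theta  # theta_h = 1.15 * theta
        self.flag = True                    # a signal may be sent when True
        self.queue = deque()                # packets: {"flow": id, "ecn": bool}

    def select_flow_mm_b(self):
        # MM-B: the connection with the most packets in the backlog.
        counts = Counter(pkt["flow"] for pkt in self.queue)
        return counts.most_common(1)[0][0]

    def on_arrival(self, flow):
        """Enqueue a packet; return the flow that was signalled, or None."""
        self.queue.append({"flow": flow, "ecn": False})
        q = len(self.queue)
        if q <= self.theta_l or q >= self.theta_h:
            self.flag = True
        if q >= self.theta and self.flag:
            chosen = self.select_flow_mm_b()
            # Set the ECN flag on the first queued packet of the chosen flow.
            for pkt in self.queue:
                if pkt["flow"] == chosen:
                    pkt["ecn"] = True
                    break
            self.flag = False
            return chosen
        return None

    def on_departure(self):
        """Serve the head-of-line packet (FCFS); None if the queue is empty."""
        return self.queue.popleft() if self.queue else None
```

With θ = 4, for example, the first three arrivals trigger nothing; the fourth arrival brings the queue to θ and the flow holding the most queued packets is marked. No further signal is sent until the queue leaves the interval [θ_l, θ_h].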


8.6.2 Fluid model

Consider N TCP connections that share a single bottleneck of capacity µ. Let RTT_i be the RTT of connection i = 1, ..., N and λ_i(t) its sending rate at time t. We approximate the system by a fluid model: data is represented by a fluid that enters the buffer at rate λ(t) = Σ_i λ_i(t) and leaves it at rate µ. We assume that the senders increase their sending rates linearly at rates α_i = 1/(RTT_i)^2, where the RTT_i are constant for i = 1, ..., N. Let α = Σ_i α_i and let β (0 < β < 1) be a given parameter. Each time the sending rate of a connection is reduced, it is multiplied by β. In our work we take β = 1/2. Let λ^- and λ^+ denote the total sending rate just before and just after a connection is cut, respectively.

Lemma 8.1 With MM-T, if

\[ \theta \le \frac{1}{2\alpha} \, \frac{\mu^2 (1-\beta)^2}{(N-1+\beta)^2}, \]

then λ^+ ≤ µ, i.e., after a single cut of the connection with the highest sending rate, the total sending rate is no larger than the departure rate.
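As a small numerical aid, the bound of Lemma 8.1 can be evaluated directly from the RTTs and the bottleneck capacity. The function names below are hypothetical, and the units are an assumption made for the example (RTTs in seconds, µ in packets per second, so that the threshold comes out in packets).

```python
def alpha_total(rtts):
    """Total linear increase rate: alpha = sum_i 1 / RTT_i**2."""
    return sum(1.0 / rtt ** 2 for rtt in rtts)


def lemma_8_1_threshold(mu, rtts, beta=0.5):
    """Largest theta allowed by Lemma 8.1.

    mu   : bottleneck capacity (assumed here to be in packets per second)
    rtts : list of round-trip times, one per connection (assumed seconds)
    beta : multiplicative decrease factor, 0 < beta < 1
    """
    n = len(rtts)
    alpha = alpha_total(rtts)
    return mu ** 2 * (1.0 - beta) ** 2 / (2.0 * alpha * (n - 1 + beta) ** 2)


if __name__ == "__main__":
    # Two connections with equal RTTs of 100 ms on a 100 packets/s bottleneck.
    print(lemma_8_1_threshold(mu=100.0, rtts=[0.1, 0.1]))
```

Because the bound decreases as α and N grow, many connections or short RTTs leave room only for a small θ if a single cut must bring the total rate below µ.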

Lemma 8.2 With MM-T, if

\[ \theta > \frac{\mu^2 (1-\zeta)^2}{2\alpha}, \qquad \text{where } \zeta = \frac{\beta}{1 + \sqrt{2\alpha\theta}/\mu}, \]

then the backlog in the system remains positive.

8.6.3 Simulation results

We modified the NS-2 simulator to implement the MM-T and MM-B algorithms. For MM-T, we approximate the exact sending rate as follows: we consider the last 10% of the queue and choose the connection with the most packets in that part of the queue. We compare MM-T and MM-B with the DropTail policy (see Subsection 8.1.1), with the DropTail queue size set to θ. For MM-T and MM-B the queue size is unlimited, so that we can verify that MM stabilizes the queue size.

To compare fairness in the system, we use Jain's index. We compute the amount of data g_i transmitted by each connection i = 1, ..., N during the simulation time. We then compute Jain's index, defined as

\[ J = \frac{\left( \sum_{i=1}^{N} g_i \right)^2}{N \sum_{i=1}^{N} g_i^2}. \]

Note that 1/N ≤ J ≤ 1 and that a larger value indicates greater fairness.
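For readers who want to reproduce the fairness metric, Jain's index as defined above can be computed as follows. The function name and the input format (a list of per-connection data volumes g_i) are choices made for this sketch only.

```python
def jain_index(volumes):
    """Jain's fairness index J = (sum g_i)^2 / (N * sum g_i^2)."""
    n = len(volumes)
    sum_sq = sum(g * g for g in volumes)
    if n == 0 or sum_sq == 0:
        raise ValueError("need at least one connection with non-zero volume")
    return sum(volumes) ** 2 / (n * sum_sq)


if __name__ == "__main__":
    print(jain_index([5.0, 5.0]))  # equal shares: J = 1
    print(jain_index([1.0, 0.0]))  # one connection takes everything: J = 1/N
```

The two calls illustrate the extreme values: equal shares give J = 1, and a single connection taking all the bandwidth gives J = 1/N.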

We run simulations for the case of two connections with very different RTTs in the system; the ratio between the RTTs ranges from three to fifty. We find that with MM-B and MM-T, Jain's index is much higher than with the DropTail policy when the difference between the RTTs of the connections is large. MM-T and MM-B also achieve higher system utilization than DropTail. We thus show that MarkMax improves network performance and fairness between flows when the flows have sufficiently different RTTs.
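The MM-T approximation described at the start of this subsection (pick the connection with the most packets in the last 10% of the queue) might look as follows in Python. This is a hedged sketch: the function name is hypothetical, the queue is represented simply as a list of flow identifiers ordered from head to tail, and "last 10%" is assumed to mean the most recently enqueued packets.

```python
from collections import Counter


def select_flow_mm_t(queue_flows, tail_percent=10):
    """Approximate MM-T: the flow with the most packets in the queue tail.

    queue_flows  : flow ids of the queued packets, head of queue first
    tail_percent : share of the queue, counted from the tail, that is inspected
    """
    if not queue_flows:
        raise ValueError("queue is empty")
    # Inspect at least one packet, even for very short queues.
    tail_len = max(1, (len(queue_flows) * tail_percent) // 100)
    tail = queue_flows[-tail_len:]
    return Counter(tail).most_common(1)[0][0]


if __name__ == "__main__":
    # Flow "a" dominates the backlog, but flow "b" sent the latest packets.
    print(select_flow_mm_t(["a"] * 18 + ["b"] * 2))
```

In this example MM-B would select flow "a", which holds most of the backlog, while the tail-based rule selects flow "b", whose packets arrived most recently. This reflects the distinction drawn in Subsection 8.6.1 between average and instantaneous sending rate.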


8.7 Conclusions and perspectives

In this thesis we propose several contributions to improve performance in computer networks. The results concern resource sharing problems in Internet routers, Web servers and operating systems. We study several algorithms that decrease the mean sojourn time in the system through efficient resource sharing and that make it possible to introduce Quality of Service, network pricing and flow differentiation in networks. We show the effectiveness of the proposed algorithms and study the possibility of applying them in router queues. The problems studied open several directions for future research, some of which are the subject of our current work.

In Chapter 3, we study the TLPS scheduling policy when the service time distribution is hyper-exponential, and we find an approximation of the optimal threshold for the case of a service time distribution with two phases. We show that, with the threshold approximation found, the mean waiting time in the system can be reduced by up to 36% compared with the DropTail policy. However, the question of threshold selection remains open when the service time distribution is hyper-exponential with more than two phases or is not hyper-exponential. We consider this an important topic for future study.

In Chapter 4, we prove the monotonicity of the conditional mean sojourn time in the DPS system when the service time distributions are exponential, under restrictions on the system parameters. Since the restrictions found are not necessary conditions, we believe that the theorem can be proved in the general case without additional conditions.
Investigating the system parameters to identify the cases in which the DPS policy yields a sufficient gain over the PS policy is also an interesting topic for future research.

In Chapter 5, we study the optimal Gittins policy for the multi-class single-server queue. This topic opens a wide area for future research, since we have studied only some particular cases of application of the Gittins policy. Taking the structure of Internet traffic into account, we study the case in which jobs arrive in two classes with Pareto size distributions, representing the mice and elephants of the Internet. For this case, we describe the optimal policy, find an analytical expression for the mean sojourn time and implement the algorithm in the router queue. Simulation results show that, with the optimal policy found, the gain can reach 10% compared with the LAS policy and 36% compared with the DropTail policy. We also study several cases of practical interest in which jobs arrive in classes with exponential distributions. As future research, we propose to consider cases with more than two job classes and other service time distributions. It is important to investigate the system parameters to find out when the Gittins policy yields a sufficient gain over the LAS policy. The application of the results to real systems such as the Internet also deserves closer study.

In Chapter 6, we introduce a new flow-aware packet-dropping algorithm, MarkMax, which reduces the sending rate of the connections with the highest sending rate when the router buffer reaches a given threshold. Using the fluid model, we derive guidelines for threshold selection. We implement MarkMax in the router queue with the NS-2 simulator and show that it improves fairness and performance compared with the DropTail policy. As topics for future research, we propose to study more complex topologies and the case in which a large number of connections share the bottleneck. In that case, we propose to reduce the sending rates of several connections each time the threshold is reached. Selecting the number of connections to cut, and its dependence on the number of connections present in the system, is a challenging subject of study.

One possible research direction is a combination of MarkMax with a stochastic scheduling policy that differentiates between flows, such as TLPS or Gittins. Developing algorithms that give priority to short flows while improving fairness between long flows could be an interesting and non-trivial task. We believe that such an algorithm could improve both fairness and the mean sojourn time in the system, and thus provide better system performance.


List of Acronyms

ACK      Acknowledgment
AQM      Active Queue Management
DNS      Domain Name System
DPS      Discriminatory Processor Sharing
DT       DropTail
FB       Foreground Background
FCFS     First Come First Served
FTP      File Transfer Protocol
HTTP     Hypertext Transfer Protocol
HE       Hyper-exponential
ICMP     Internet Control Message Protocol
IP       Internet Protocol
LAN      Local Area Network
LAS      Least Attained Service
MIME     Multipurpose Internet Mail Extensions
MLPS     Multi Level Processor Sharing
MM       MarkMax
MSS      Maximum Segment Size
MTU      Maximum Transmission Unit
NS       Network Simulator
OSI      Open Systems Interconnection
PASTA    Poisson Arrivals See Time Averages
PS       Processor Sharing
RED      Random Early Detection
RFC      Request for Comments
RTT      Round-Trip Time
SMTP     Simple Mail Transfer Protocol
SRPT     Shortest Remaining Processing Time
SPT      Shortest Processing Time
TCP      Transmission Control Protocol
Telnet   Remote terminal protocol
TLPS     Two Level Processor Sharing
UDP      User Datagram Protocol
WAN      Wide Area Network
WWW      World Wide Web


Bibliography

[AA06] S. Aalto and U. Ayesta. Mean delay analysis of Multilevel Processor Sharing disciplines. Proceedings of IEEE INFOCOM 2006, 2006.

[AA07] S. Aalto and U. Ayesta. Mean delay optimization for the M/G/1 queue with Pareto type service times. In Extended abstract in ACM SIGMETRICS 2007, San Diego, CA, pages 383-384, 2007.

[AAA06] E. Altman, K. Avrachenkov, and U. Ayesta. A survey on Discriminatory Processor Sharing. Queueing Syst., 53(1-2):53-63, 2006.

[AAB00] E. Altman, K. Avrachenkov, and C. Barakat. A stochastic model of TCP/IP with stationary random losses. In ACM SIGCOMM 2000, Stockholm, Sweden, volume 30, pages 231-242, 2000.

[AAB05] K. Avrachenkov, U. Ayesta, and P. Brown. Batch Arrival Processor-Sharing with Application to Multi-Level Processor-Sharing Scheduling. Queueing Systems, 50:459-480, 2005.

[AABN04] K. Avrachenkov, U. Ayesta, P. Brown, and E. Nyberg. Differentiation between short and long TCP flows: Predictability of the response time. In IEEE INFOCOM 2004, volume 2, pages 762-773, 2004.

[AABNQ05] K. Avrachenkov, U. Ayesta, P. Brown, and R. Núñez-Queija. Discriminatory Processor Sharing revisited. In INFOCOM, 24th Annual Joint Conference of the IEEE Computer and Communications Societies, pages 784-795. IEEE, 2005.

[AANB02] K. E. Avrachenkov, U. Ayesta, P. Nain, and C. Barakat. The effect of router buffer size on the TCP performance. In Proceedings of the LONIIS Workshop on Telecommunication Networks and Teletraffic Theory, pages 116-121, 2002.

[AANO04] S. Aalto, U. Ayesta, and E. Nyberg-Oksanen. Two-level processor-sharing scheduling disciplines: mean delay analysis. SIGMETRICS Perform. Eval. Rev., 32(1):97-105, 2004.

[AANO05] S. Aalto, U. Ayesta, and E. Nyberg-Oksanen. M/G/1/MLPS compared to M/G/1/PS. Operations Research Letters, 33(5):519-524, 2005.

[AAP05] K. Avrachenkov, U. Ayesta, and A. Piunovskiy. Optimal choice of the buffer size in the Internet routers. In Decision and Control, 2005 and 2005 European Control Conference. CDC-ECC '05. 44th IEEE Conference on, pages 1143-1148, December 2005.

[AART06] E. Altman, R. E. Azouzi, D. Ros, and B. Tuffin. Loss strategies for competing AIMD flows. Comput. Networks, 50(11):1799-1815, 2006.

[ABL+00] E. Altman, C. Barakat, E. Laborde, P. Brown, and D. Collange. Fairness analysis of TCP/IP. Decision and Control, 2000. Proceedings of the 39th IEEE Conference, 1:61-66, 2000.

[ABN+95] E. Altman, J. Bolot, P. Nain, D. Elouadghiri, M. Erramdani, P. Brown, and D. Collange. Performance modelling of TCP/IP in a Wide-Area network. In 34th IEEE Conference on Decision and Control, December 1995.

[ABO07] K. Avrachenkov, P. Brown, and N. Osipova. Optimal choice of threshold in Two Level Processor Sharing. Annals of Operations Research journal, 2007.

[AFG06] K. Avrachenkov, L. Finlay, and V. Gaitsgory. Analysis of TCP-AQM interaction via periodic optimization and linear programming: the case of sigmoidal utility function. In NEW2AN, Also LNCS v.4003, pages 517-529, December 2006.

[AJK04] E. Altman, T. Jimenez, and D. Kofman. DPS queues with stationary ergodic service times and the performance of TCP in overload. In Proceedings of IEEE Infocom, Hong-Kong, 2004.

[AJN02] E. Altman, T. Jiménez, and R. Núñez-Queija. Analysis of two competing TCP/IP connections. Perform. Eval., 49(1-4):43-55, 2002.

[AKM04] G. Appenzeller, I. Keslassy, and N. McKeown. Sizing router buffers. SIGCOMM Comput. Commun. Rev., 34(4):281-292, 2004.

[All00] M. Allman. A Web server's view of the Transport layer. ACM Computer Communication Review, 30, 2000.

[APS99] M. Allman, V. Paxson, and W. Stevens. TCP Congestion Control. RFC 2581 (Proposed Standard), April 1999. Updated by RFC 3390.

[Ban03] N. Bansal. Analysis of the M/G/1 Processor-Sharing queue with bulk arrivals. Operations Research Letters, 31(5):401-405, 2003.

[BFOBR02] N. Benameur, S. B. Fredj, S. Oueslati-Boulahia, and J. W. Roberts. Quality of service and flow level admission control in the Internet. Computer Networks, 40(1):57-71, 2002.

[BGG03] M. Barthelemy, B. Gondran, and E. Guichard. Spatial structure of the Internet traffic. Physica A Statistical Mechanics and its Applications, 319:633-642, March 2003.

[BGG+08] N. Beheshti, Y. Ganjali, M. Ghobadi, N. McKeown, and G. Salmon. Experimental study of router buffer sizing. Systems and Networking Laboratory Technical Report TR08-UT-SN, University of Toronto, Department of Computer Science, May 2008.

[BM06] F. Baccelli and D. R. McDonald. A stochastic model for the rate of non-persistent TCP flows. Proceedings of ValueTools 2006, 2006.

[BNM00] D. Bertsimas and J. Niño-Mora. Restless bandits, linear programming relaxations and a Primal-Dual index heuristic. Operations Research, 48:80-90, 2000.

[Bra89] R. Braden. Requirements for Internet hosts - communication layers. RFC 1122 (Standard), October 1989. Updated by RFCs 1349, 4379.

[Bre96] L. P. Breker. A survey of network pricing schemes. In University of Saskatchewan, 1996.

[Bro00] P. Brown. Resource sharing of TCP connections with different round trip times. In INFOCOM 2000, pages 1734-1741, 2000.

[Bro06] P. Brown. Comparing FB and PS scheduling policies. SIGMETRICS Perform. Eval. Rev., 34(3):18-20, 2006.

[BT01] T. Bu and D. Towsley. Fixed point approximations for TCP behavior in an AQM network. In SIGMETRICS '01: Proceedings of the 2001 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, pages 216-225, New York, NY, USA, 2001. ACM.

[BVW85] C. Buyukkoc, P. Varaiya, and J. Walrand. The cµ rule revisited. Adv. Appl. Prob., 17:237-238, 1985.

[CB97] M. E. Crovella and A. Bestavros. Self-similarity in World Wide Web traffic: evidence and possible causes. IEEE/ACM Transactions on Networking, 5:835-846, 1997.

[CC08] D. Collange and J.-L. Costeux. Passive estimation of Quality of Experience. Journal of Universal Computer Science, 14(5):625-641, 2008.

[CJ07] N. Chen and S. Jordan. Throughput in Processor-Sharing queues. IEEE Transactions on Automatic Control, 52(2):299-305, 2007.

[Cru91] R. L. Cruz. A calculus for network delay. I. Network elements in isolation. Information Theory, IEEE Transactions on, 37(1):114-131, 1991.

[CvdBB+05] S. K. Cheung, J. L. van den Berg, R. J. Boucherie, R. Litjens, and F. Roijers. An analytical packet/flow-level modelling approach for wireless LANs with quality-of-service support. In Proceedings of ITC-19, 2005.

[DGNM96] M. Dacre, K. Glazebrook, and J. Niño-Mora. The achievable region approach to the optimal control of stochastic systems. Journal of the Royal Statistical Society. Series B, Methodological, 61(4):747-791, 1996.

[FBP+01] S. B. Fred, T. Bonald, A. Proutiere, G. Régnié, and J. W. Roberts. Statistical bandwidth sharing: a study of congestion at flow level. SIGCOMM Comput. Commun. Rev., 31(4):111-122, 2001.

[FJ93] S. Floyd and V. Jacobson. Random early detection gateways for congestion avoidance. IEEE/ACM Trans. Netw., 1(4):397-413, 1993.

[Flo91] S. Floyd. Connections with multiple congested gateways in packet-switched networks part 1: one-way traffic. SIGCOMM Comput. Commun. Rev., 21(5):30-47, 1991.

[Flo95] S. Floyd. TCP and Explicit Congestion Notification. ACM Computer Communication Review, 24(5):10-23, 1995.

[FM03a] H. Feng and V. Misra. Asymptotic bounds for M^X/G/1 Processor Sharing queues. Technical report CUCS-00-04, Columbia University, 2003.

[FM03b] H. Feng and V. Misra. Mixed scheduling disciplines for network flows. ACM SIGMETRICS Performance Evaluation Review, 31(2):36-39, 2003.

[FMI80] G. Fayolle, I. Mitrani, and R. Iasnogorodski. Sharing a processor among many job classes. Journal of the ACM, 27:519-532, 1980.

[FML+03] C. Fraleigh, S. Moon, B. Lyles, C. Cotton, M. Khan, D. Moll, R. Rockell, T. Seely, and C. Diot. Packet-level traffic measurements from the Sprint IP backbone. IEEE Network, 17:6-16, 2003.

[FORR98] E. W. Fulp, M. Ott, D. Reininger, and D. S. Reeves. Paying for QoS: an optimal distributed algorithm for pricing network resources. Quality of Service, 1998. (IWQoS 98) 1998 Sixth International Workshop on, pages 75-84, May 1998.

[FR01] E. W. Fulp and D. S. Reeves. Optimal provisioning and pricing of differentiated services using QoS class promotion. In Proceedings of the INFORMATIK: Workshop on Advanced Internet Charging and QoS Technology, 2001.

[FR04] E. W. Fulp and D. S. Reeves. Bandwidth provisioning and pricing for networks with multiple classes of service. Comput. Netw., 46(1):41-52, 2004.

[FSKS02] W.-C. Feng, K. G. Shin, D. D. Kandlur, and D. Saha. The BLUE active queue management algorithms. IEEE/ACM Trans. Netw., 10(4):513-528, 2002.

[FW98] A. Feldmann and W. Whitt. Fitting mixtures of exponentials to long-tail distributions to analyze network performance models. Performance Evaluation, 31:245-258, 1998.

[FW99] E. Frostig and G. Weiss. Four proofs of Gittins' multiarmed bandit theorem. Applied Probability Trust, 1999.

[Git89] J. Gittins. Multi-armed Bandit Allocation Indices. Wiley, Chichester, 1989.

[GM01] L. Guo and I. Matta. The war between mice and elephants. Technical report, Boston University, Boston, MA, USA, 2001.

[GM02a] L. Guo and I. Matta. Differentiated control of web traffic: A numerical analysis. SPIE ITCOM, Boston, 2002.

[GM02b] L. Guo and I. Matta. Scheduling flows with unknown sizes: approximate analysis. In Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, pages 276-277, 2002.

[HBSBA03] M. Harchol-Balter, B. Schroeder, N. Bansal, and M. Agrawal. Size-based scheduling to improve web performance. ACM Transactions on Computer Systems, 21(2):207-233, 2003.

[HLN97] D. P. Heyman, T. V. Lakshman, and A. L. Neidhardt. A new method for analyzing feedback-based protocols with applications to engineering Web traffic over the Internet. In SIGMETRICS '97: Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, pages 24-38, New York, NY, USA, 1997. ACM.

[HT05] Y. Hayel and B. Tuffin. Pricing for heterogeneous services at a Discriminatory Processor Sharing queue. In Networking 2005, volume 3462/2005, pages 816-827. Springer Berlin / Heidelberg, 2005.

[Jac88] V. Jacobson. Congestion avoidance and control. In SIGCOMM '88: Symposium proceedings on Communications architectures and protocols, volume 18, pages 314-329, New York, NY, USA, August 1988. ACM Press.

[JBB92] V. Jacobson, R. Braden, and D. Borman. TCP extensions for high performance. RFC 1323 (Proposed Standard), May 1992.

[KK06] B. Kim and J. Kim. Comparison of DPS and PS systems according to DPS weights. Communications Letters, IEEE, 10(7):558-560, July 2006.

[KK08] J. Kim and B. Kim. Concavity of the conditional mean sojourn time in the M/G/1 Processor Sharing queue with batch arrivals. Queueing Systems, 58(1):57-64, 2008.

[Kle67] L. Kleinrock. Time-shared Systems: a theoretical treatment. J. ACM, 14(2):242-261, 1967.

[Kle76a] L. Kleinrock. Queueing systems, volume 2. John Wiley and Sons, 1976.

[Kle76b] L. Kleinrock. Queueing systems, volume 1. John Wiley and Sons, 1976.

[Kli74] G. Klimov. Time-sharing service systems. I. Theory of Probability and Its Applications, 19:532-551, 1974.

[Kli78] G. Klimov. Time-sharing service systems. II. Theory of Probability and Its Applications, 23:314-321, 1978.

[KMR71] L. Kleinrock, R. R. Muntz, and E. Rodemich. The Processor-Sharing queueing model for time-shared systems with bulk arrivals. Networks Journal, 1:1-13, 1971.

[KNQB04] G. Van Kessel, R. Núñez-Queija, and S. Borst. Asymptotic regimes and approximations for Discriminatory Processor Sharing. SIGMETRICS Perform. Eval. Rev., 32(2):44-46, 2004.

[KNQB05] G. Van Kessel, R. Núñez-Queija, and S. Borst. Differentiated bandwidth sharing with disparate flow sizes. INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE, 4:2425-2435, March 2005.

[Kri00] J. Kristoff. TCP Congestion Control. Technical report, DePaul University, 2000.

[KSH03] R. E. A. Khayari, R. Sadre, and B. R. Haverkort. Fitting world-wide web request traces with the EM-algorithm. Performance Evaluation, 52(2-3):175-191, 2003.

[Kur72] A. G. Kurosh. Higher algebra. MIR, 1972.

[LM97] T. V. Lakshman and U. Madhow. The performance of TCP/IP for networks with high bandwidth-delay products and random loss. IEEE/ACM Trans. Netw., 5(3):336-350, 1997.

[Man90] A. Mankin. Random drop congestion control. In SIGCOMM '90: Proceedings of the ACM symposium on Communications architectures & protocols, pages 1-7, 1990.

[MR00] L. Massoulié and J. W. Roberts. Bandwidth sharing and admission control for elastic traffic. In Telecommunication Systems, volume 15, pages 185-201. Springer, 2000.

[NMM98] M. Nabe, M. Murata, and H. Miyahara. Analysis and modeling of World Wide Web traffic for capacity dimensioning of Internet access lines. Perform. Eval., 34(4):249-271, 1998.

[NT94] P. Nain and D. Towsley. Optimal scheduling in a machine with stochastic varying processing rate. IEEE/ACM Transactions on Automatic Control, 39:1853-1855, 1994.

[NT02] W. Noureddine and F. Tobagi. Improving the performance of interactive TCP applications using service differentiation. In Computer Networks Journal, pages 2002354. IEEE, 2002.

[NW08] M. Nuyens and A. Wierman. The Foreground-Background queue: A survey. Perform. Eval., 65(3-4):286-307, 2008.

[OBA08] N. Osipova, A. Blanc, and K. Avrachenkov. Improving TCP fairness with the MarkMax policy. In Proceedings of the 15th International Conference on Telecommunications, ICT 2008, 2008.

[Osi07] N. Osipova. Batch Processor Sharing with Hyper-Exponential service time. Technical Report RR-6180, INRIA, 2007.

[Osi08a] N. Osipova. Batch Processor Sharing with Hyper-Exponential service time. Operations Research Letters, 36(3):372-376, 2008.

[Osi08b] N. Osipova. Comparison of the Discriminatory Processor Sharing Policies. Technical Report RR-6475, INRIA, 2008.

[Pos81] J. Postel. Transmission Control Protocol. RFC 793 (Standard), September 1981. Updated by RFC 3168.

[PPP00] R. Pan, B. Prabhakar, and K. Psounis. CHOKe: a stateless active queue management scheme for approximating fair bandwidth allocation. In INFOCOM 2000. Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings. IEEE, volume 2, pages 942-951, 2000.

[RBUK05] I. A. Rai, E. W. Biersack, and G. Urvoy-Keller. Size-based scheduling to improve the performance of short TCP flows. IEEE Network, 19:12-17, 2005.

[RF99] K. Ramakrishnan and S. Floyd. A Proposal to add Explicit Congestion Notification (ECN) to IP. RFC 2481 (Experimental), January 1999. Obsoleted by RFC 3168.

[RFB01] K. Ramakrishnan, S. Floyd, and D. Black. The Addition of Explicit Congestion Notification (ECN) to IP. RFC 3168 (Proposed Standard), September 2001.

[Rig94] R. Righter. Scheduling. M. Shaked and J. Shanthikumar (eds), Stochastic Orders, New York: Academic Press, pages 381-432, 1994.

[Rob01] J. Roberts. Traffic theory and the Internet. IEEE Communication Magazine, 39(1):94-99, 2001.

[RS89] R. Righter and J. Shanthikumar. Scheduling multiclass single server queueing systems to stochastically maximize the number of successful departures. Probability in the Engineering and Informational Sciences, 3:323-333, 1989.

[RS93] K. M. Rege and B. Sengupta. The M/G/1 Processor Sharing queue with bulk arrivals. In Proceedings of Modelling and Evaluation of ATM Networks, pages 417-432, 1993.

[RS94] K. M. Rege and B. Sengupta. A decomposition theorem and related results for the Discriminatory Processor Sharing queue. Queueing Systems, 18(3-4):333-351, 1994.

[RS96] K. M. Rege and B. Sengupta. Queue-length distribution for the Discriminatory Processor-Sharing queue. Operations Research, 44(4):653-657, 1996.

[RUKB02] I. A. Rai, G. Urvoy-Keller, and E. W. Biersack. Size-based scheduling with differentiated services to improve response time of highly varying flows. In Proceedings of the 15th ITC Specialist Seminar, Internet Traffic Engineering and Traffic Management, 2002.

[RUKB03] I. A. Rai, G. Urvoy-Keller, and E. W. Biersack. Analysis of LAS scheduling for job size distributions with high variance. In SIGMETRICS '03: Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, pages 218-228, New York, NY, USA, 2003. ACM.

[RUKVB04] I. A. Rai, G. Urvoy-Keller, M. K. Vernon, and E. W. Biersack. Performance analysis of LAS-based scheduling disciplines in a packet switched network. SIGMETRICS Perform. Eval. Rev., 32(1):106-117, 2004.

[SAM99] S. Ata, M. Murata, and H. Miyahara. Analysis and application of network traffic characteristics to design of high-speed routers. Internet. Conference No2, Boston, MA, USA (20/09/1999), 3842:14-24, 1999.

[SCEH96] S. Shenker, D. Clark, D. Estrin, and S. Herzog. Pricing in computer networks: reshaping the research agenda. SIGCOMM Comput. Commun. Rev., 26(2):19-43, 1996.

[Sch68] L. E. Schrage. A proof of the optimality of the Shortest Remaining Processing Time discipline. Operations Research, 16(3):678-690, 1968.

[Sev74] K. Sevcik. Scheduling for minimum total loss using service time distributions. Journal of the ACM, 21:66-75, 1974.

[SS07] R. Stanojevic and R. Shorten. Beyond CHOKe: Stateless fair queueing. In NETCOOP 2007, LNCS v.4465, pages 43-53, 2007.

[Sta94] W. Stallings. Data and Computer Communications: 4th edition. Macmillan Publishing Company, 1994.

[Sta03] W. Stallings. Computer Networking with Internet Protocols and Technology. Pearson Education, Inc., Prentice Hall, 2003.

[Sta07] R. Stanojevic. Router-based algorithms for improving Internet Quality of Service. PhD thesis, Hamilton Institute, National University of Ireland Maynooth, 2007.

[Ste97] W. Stevens. TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms. RFC 2001 (Proposed Standard), January 1997. Obsoleted by RFC 2581.

[SY92] J. Shanthikumar and D. Yao. Multiclass queueing systems: Polymatroidal structure and optimal scheduling control. Operations Research, 40(2):293-299, 1992.

[SZC90] S. Shenker, L. Zhang, and D. D. Clark. Some observations on the dynamics of a congestion control algorithm. SIGCOMM Comput. Commun. Rev., 20(5):30-39, 1990.

[Tan96] A. S. Tanenbaum. Computer networks: 3rd edition. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1996.

[TMW97] K. Thompson, G. J. Miller, and R. Wilder. Wide-area Internet traffic patterns and characteristics. IEEE Network, 11:10-23, 1997.

[Tsi93] J. N. Tsitsiklis. A short proof of the Gittins index theorem. In IEEE CDC, pages 389-390, 1993.

166 [VWB85]

P. Varaiya, J. Walrand, and C. Buyukkoc. Extensions of the multiarmed bandit problem : the discounted case. IEEE Transactions on Automatic Control, 30 :426 439, 1985. 70

[WBHB04]

A. Wierman, N. Bansal, and M. Harchol-Balter. A note comparing response times in the M/GI/1/FB and M/GI/1/PS queues. Operations Research Letters, 32(1):73-76, 2004. 13, 130

[Web92]

R. Weber. On the Gittins index for multiarmed bandits. Annals of Applied Probability, 2(4):1024-1033, 1992. 70

[WHB03]

A. Wierman and M. Harchol-Balter. Classifying scheduling policies with respect to unfairness in an M/GI/1. SIGMETRICS Perform. Eval. Rev., 31(1):238-249, 2003. 13, 130

[Whi88]

P. Whittle. Restless bandits: activity allocation in a changing world. Journal of Applied Probability, 25:287-298, 1988. 70

[Wil98]

F. Wilder. A guide to the TCP/IP protocol suite: second edition. Artech House, Inc., 1998. 9, 126

[Wil01]

C. Williamson. Internet traffic measurement. IEEE Internet Computing, 5:70-74, 2001. 7, 8, 71, 123, 124

[Wol89]

R. W. Wolff. Stochastic modeling and the theory of queues. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1989. 7, 124

[WZ02]

B. Wydrowski and M. Zukerman. GREEN: An Active Queue Management algorithm for a self managed Internet. In ICC, volume 4, pages 2368-2372, 2002. 11, 128, 147

[Yas87]

S. F. Yashkov. Processor-Sharing queues: some progress in analysis. Queueing Syst. Theory Appl., 2(1):1-17, 1987. 12, 130

[Yas92]

S. Yashkov. Mathematical problems in the theory of shared-processor systems. Journal of Mathematical Sciences, 58:101-147, 1992. 72

Summary

In this thesis we propose several contributions to improve performance in computer networks. The results concern resource sharing problems in Internet routers, Web servers and operating systems. We study several algorithms that decrease the mean sojourn time in the system through efficient resource sharing and that make it possible to introduce flow differentiation in networks. We show the effectiveness of the proposed algorithms and study the possibility of applying them in router queues. The most important results are the following. For the Two Level Processor Sharing (TLPS) policy with a two-phase hyper-exponential service time, we find an expression approximating the optimal threshold value that minimizes the mean sojourn time in the system. With simulation results we show that the TLPS policy improves system performance when the approximate threshold value is used. We apply the Gittins result to characterize the optimal scheduling policy in a multi-class single-server queue. The resulting policy minimizes the mean sojourn time in the system among all non-anticipating policies. We introduce a new flow-aware packet dropping algorithm for Internet routers, which improves network performance and fairness between flows.

Keywords: Stochastic scheduling, resource sharing, Internet, TCP, fairness, Two Level Processor Sharing, Discriminatory Processor Sharing, batch arrivals, hyper-exponential service time, Gittins index.

Abstract

In this thesis we propose several new contributions to improve the performance of computer networks. The results concern resource sharing problems in Internet routers, Web servers and operating systems. We study several stochastic scheduling algorithms which decrease the mean waiting time in the system through efficient resource sharing and make it possible to introduce Quality of Service and flow differentiation into networks. We show the effectiveness of the proposed algorithms and study the possibility of implementing them in router queues. The most important results are the following. For the Two Level Processor Sharing (TLPS) scheduling discipline with a two-phase hyper-exponential job size distribution, we find an approximation of the optimal threshold value that minimizes the expected sojourn time. With simulation results (NS-2) we show that TLPS significantly improves system performance when this approximation of the optimal threshold is used. We study the Discriminatory Processor Sharing policy and show that, under certain conditions on the system, the expected sojourn time is monotone in the weight vector. We apply the Gittins optimality result to characterize the optimal scheduling discipline in a multi-class single-server queue. The resulting policy minimizes the mean sojourn time in the system among all non-anticipating scheduling policies. In several cases of practical interest we describe the policy structure and provide simulation results (NS-2). For the congestion control problem in networks we propose a new flow-aware algorithm to improve the fair sharing of the bottleneck capacity.

Keywords: Stochastic scheduling, resource sharing, Internet, TCP, fairness, Two Level Processor Sharing, Discriminatory Processor Sharing, Batch Processor Sharing, hyper-exponential service time, Gittins index.
