On the Performance of Persistent Connection in Modern Web Servers Xiao-zhu LIN, Jing-jun ZHU and Hai-yan WU
Computer and Information Management Center Dept. of Computer Science and Technology, Tsinghua University Beijing, China
[email protected] ABSTRACT In the development of HTTP protocol, the technique to overlap multiple HTTP requests and replies on single TCP connection, called ‘keep-alive’ or ‘persistent connection’, has won great success. It is already verified that, persistent connection could help to save the cost of frequently creating TCP connection, and could also reduce the number of operations such like forking and destroying process. Many years passed, persistent connection mechanism has been implemented widely to support all kinds of web services. However recently, dramatic changes to networking conditions and server computing capacity challenge the motivations of such mechanism and reveal some of its drawbacks. This paper models the connection management procedure in concurrent web servers with Petri Nets, and determines the parameters for the model by measuring a bunch of key metrics in modern web servers and networks. Plus, experiments are carried out in real modern test-bed with real traffic. The analytical results yielded by the model, along with experimental results, help us to clarify the negative role that persistent connection actually plays in modern web servers, especially in those busy ones.
Keywords Web Servers, Persistent Connection, Performance Evaluation
1. INTRODUCTION
As various web services flourish, the workloads of web servers become explosively heavy, challenging qualities of web service such as reliability, responsiveness and fairness. These qualities are ensured by connection management schemes for web servers, namely the policies for admitting, queueing, processing and terminating connections. Connection management schemes have evolved together with the HTTP protocol. In early times, every HTTP request was served by an individual TCP connection. Since the price of creating a new connection was high in those times, the idea of persistent connection was proposed to enable several consecutive requests to reuse one TCP connection. By employing the
Figure 1: The Lifetime of Individual HTTP Connection in Web Server. The connection is created at t1, reused once at t4 and terminated due to keep-alive timeout at t6.
persistent feature, TCP connections and web server workers (processes or threads) can live longer, saving the server the cost of forking workers and opening/closing sockets. Moreover, persistent connection makes it possible for the client to pipeline requests, that is, to issue requests continuously without waiting for the previous one to be acknowledged. Though proved helpful for performance in the past, recent studies report that persistent connection tends to be abused, and therefore becomes harmful to the performance of busy web servers. Specifically, persistent connection leaves too many open sockets and waiting workers, which occupy large amounts of server resources and prevent other pending requests from getting served. It is reported that, in many modern web servers, persistent connection is becoming the major bottleneck. This paper tries to clarify the exact effect of persistent connection on both user-perceived metrics, such as latency and fairness, and server resource utilization metrics. To achieve this goal, we combine the predictions of a theoretical model based on SRN with the analysis of experimental data.
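The reuse of one TCP connection by consecutive requests can be observed with a few lines of Python's standard library. The sketch below is our own illustration, not part of the paper's measurements: a throwaway local HTTP/1.1 server is started, and two GET requests are issued over one http.client.HTTPConnection, the second riding on the same socket with no extra handshake.

```python
import http.client
import http.server
import socketserver
import threading

class Hello(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"        # HTTP/1.1 => keep-alive by default
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))  # needed for reuse
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):        # silence per-request logging
        pass

srv = socketserver.TCPServer(("127.0.0.1", 0), Hello)
threading.Thread(target=srv.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", srv.server_address[1])
conn.request("GET", "/")
b1 = conn.getresponse().read()
sock1 = conn.sock                        # the underlying TCP socket
conn.request("GET", "/")                 # second request: no new handshake
b2 = conn.getresponse().read()
assert conn.sock is sock1                # the same connection was reused
conn.close()
srv.shutdown()
```

Because the response carries a Content-Length and both sides speak HTTP/1.1, the connection stays open after the first reply, which is exactly the keep-alive behavior discussed above.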
2. PRIOR WORKS

Definitions for the original HTTP protocol can be found in [1]. The draft [2] discussed schemes for HTTP connection management and made a simple comparison. In [3], Mogul presented his motivation for persistent connection in HTTP/1.1. [4] used experiments to support the advantages of HTTP/1.1. Finally, this new feature was added to the modern HTTP protocol [5]. [6] provided a short overview of performance issues in HTTP/1.1. We also notice that some active connection management schemes were proposed, such as LRU [7], but they were never widely implemented. Recent research on connection management schemes for web servers includes [8], which performed a detailed benchmark of creating workers^1, pipes and sockets, and [9], which gave a complete survey of the performance of modern busy servers. These works are quite sound and enlightening; however, they do not touch the issue of persistent connection directly.
3. SYSTEM MODELING

3.1 Connection Management Procedure
Most mainstream web servers nowadays are of multi-process or multi-thread style. For them, the handling of one new HTTP request can be divided into four stages:

• Connect. Before any meaningful data is exchanged, a TCP connection must be established between client and server. This stage, namely the TCP handshaking routine, usually takes 1 round-trip time (RTT), because the last ACK can be attached to the first HTTP request.

• Accept. When the connection is built, the server creates a new socket and forks or wakes one worker to assign the new connection to. In POSIX-compliant operating systems, this is done by invoking socket() and fork(). Besides, some pipe system calls might be used for synchronization.

• Process. The server generates and sends back the requested page. During this stage, the worker uses its own CPU timeslice to parse and process the request, issues disk I/O operations, and may even deliver part of the task to a back-end application server.

• Keep-alive. After transmitting the required page, the worker keeps the connection open to wait for possible follow-up requests from the same client. Usually, the MaxKeepAlive (MKA) variable in the web server configuration indicates how long the worker should wait. (For example, in Apache 2.0.95, the advised MKA is 120 sec.) If no request is received within MKA seconds, the connection is shut down; otherwise the connection is reused by the next request and returns to the process stage.

Figure 1 shows the life cycle of connections in a web server. Note that accept and process are marked as resource-shared, for connections in these stages occupy common system resources such as CPU and disk. In the remaining stages, marked as private, connections do not directly interfere with each other. Only resource-shared stages have waiting queues to buffer requests.
3.2 Petri Net Model Construction
As an advanced form of Petri Net (PN) [10], the Stochastic Reward Net (SRN) [11] is a powerful tool for modeling complicated systems and is widely used in the analysis of networking protocols and server performance. Even though the presumptions of SRN analysis (e.g., exponential distribution of both request intervals and service times) cannot be fully satisfied by real HTTP traffic and real servers, SRN is still very helpful: we can adjust the parameters of the SRN model to give a conservative estimation, and thus obtain a lower bound on server performance. Hopefully, the prediction made by the SRN model can give us a more generic guide on persistent connection than experiments alone do. The SRN model for the connection management mechanism is shown in Fig. 2, and the semantics of its important terms are:

^1 In the following text, 'worker' refers to the concurrent unit in web servers (process or thread).
Figure 2: SRN Model for the Connection Management in Web Servers.

• Places describe the status of connections and queues in the server system: (1) p5 contains all requests that need to be served on new connections; it initially holds C tokens, where C is the total number of concurrent clients. (2) p6 is the waiting queue holding all connections that are established but not yet delivered to workers. (3) p7 represents all connections being served. (4) p1, p2 and p3 hold connections in keep-alive status. (5) p4 contains the pool of ready workers; in the initial state it holds P tokens, indicating that the server can afford at most P workers.

• Transitions are abstracted from operations such as processing and dispatching in web servers. They are classified into immediate transitions, which typically describe selection and preemption, and timed transitions, whose firing times obey exponential distributions with average rate λ. In our model, the transitions and their semantics are: (1) t5 is the TCP handshaking routine before creating a new connection. (2) t6 represents the action of allocating resources for the new connection. (3) t7 means parsing the request and generating or retrieving the required document. (4) t1 and t2 form a random switch, indicating whether a connection in keep-alive status will finally be reused or terminated. (5) t3 is the lingering procedure before disconnecting and re-creating connections. (6) t4 is the waiting period before the same connection is reused to handle the next request.

Key terms of the model are summarized in Table 1.
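To make the marking dynamics concrete, the net of Fig. 2 can be played as an (untimed) token game. The sketch below is our own simplified reading of the figure: the arc structure is inferred from the transition descriptions above, so the exact arcs - in particular that t3 returns the freed worker to p4 and the client to p5 - are assumptions, not the paper's definition.

```python
# Token-game sketch of the connection-management net (structure only, untimed).
# transition -> (input places, output places); all arc weights are 1.
PLACES = ["p1", "p2", "p3", "p4", "p5", "p6", "p7"]
TRANSITIONS = {
    "t5": (["p5"], ["p6"]),           # TCP handshake: new request -> queue
    "t6": (["p6", "p4"], ["p7"]),     # accept: queued conn + free worker
    "t7": (["p7"], ["p1"]),           # process: reply sent, enter keep-alive
    "t1": (["p1"], ["p2"]),           # random switch: connection will be reused
    "t2": (["p1"], ["p3"]),           # random switch: connection will time out
    "t4": (["p2"], ["p7"]),           # reuse: next request, worker still held
    "t3": (["p3"], ["p5", "p4"]),     # timeout: worker freed, client returns
}

def enabled(marking, t):
    ins, _outs = TRANSITIONS[t]
    return all(marking[p] >= 1 for p in ins)

def fire(marking, t):
    assert enabled(marking, t)
    ins, outs = TRANSITIONS[t]
    m = dict(marking)
    for p in ins:
        m[p] -= 1
    for p in outs:
        m[p] += 1
    return m

# One request's life: handshake, accept, process, keep-alive timeout.
m = {p: 0 for p in PLACES}
m["p5"], m["p4"] = 3, 2               # C = 3 clients, P = 2 workers
for t in ("t5", "t6", "t7", "t2", "t3"):
    m = fire(m, t)
assert m["p5"] == 3 and m["p4"] == 2  # client and worker both returned
```

Note how the worker token consumed by t6 is not released by t7: it stays bound to the connection through keep-alive, which is precisely the resource-holding behavior the model is built to expose.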
3.3 Parameter Determination

Different values for the parameters listed in Table 1 could lead to totally different conclusions about persistent connection. To ensure the reliability of the model analysis, we carry out measurements on a Sun XFire 2100 server running Linux 2.6.17, a very common combination for modern web servers. Besides, data related to client behavior are collected from the raw logs of web server A.edu, the portal site of our university, which usually handles 10^6 requests from all over the world per day. In the discussion of the SRN model below, we use M(p) to denote the number of tokens in place p. Also, 'server' is used in its queueing theory sense.
Table 1: Key Parameters for the SRN Model.

term        meaning
λdispatch   avg. firing rate of t5
λacc        avg. firing rate of t6
λprocess    avg. firing rate of t7
λreuse      avg. firing rate of t4
ptimeout    prob. of not reusing the connection
pagain      prob. of reusing the connection
C           number of concurrent clients
P           max number of workers in the server

3.3.1 Networking Latency

In the SRN model, t5 can be regarded as being served by an infinite server, and its delay is mainly contributed by a networking latency of about 1 RTT. Typical RTTs are measured by periodically executing the ping tool. We choose a common weekday for the test - pinging hosts from our test-bed once every hour and collecting the outputs. Three stable hosts are selected to represent clients at different networking distances:

• info.thu.edu (info), a web site within the same campus network;
• scut.edu.cn (scut), in south China, across almost the whole country;
• www.ubuntu.com (den), an overseas host located in Denmark.

The script runs for 24 hours, and the average RTTs to the three hosts are 0.57 ms (info), 39.96 ms (scut) and 539.2 ms (den). For an online server, networking latencies are a mixture of these three typical types. According to the IP classification of the real trace, 10% of requests come from inside the campus, 64% are domestic and the other 26% are overseas. To make a conservative estimation, the cost of re-establishing a connection is regarded as a bit higher. Thus we set:

λdispatch = M(p5) ∗ 10    (1)

3.3.2 Creating Connections and Processing Requests

t6 and t7 are associated with creating connections and processing requests, respectively. Both are CPU-consuming operations, and can be regarded as being served by a common server. According to the scheduling algorithms of modern OSes, the service policy is processor-sharing, so we have to determine the service rate of a single operation of each type. Most well-known concurrent web servers, such as Apache, mainly execute three kinds of system calls to create new connections; in POSIX-compliant OSes, they are fork(), socket() and pipe(). Also, write() and read() might be invoked to synchronize workers. According to our measurements, the time cost of fork() and the pipe operations is generally O(1) in the number of existing workers, ranging from 120 µs to 150 µs. Similarly, the time cost of calling socket() is almost constant in the number of existing sockets, ranging from 1000 µs to 1500 µs. Thus we can set:

λacc = 500 / (M(p7) + M(p6))    (2)

Regarding the processing rate: since workloads vary greatly, and the web log provides little information about processing time, estimating it for an individual request is quite hard. Luckily, a coarse estimation that is approximate in magnitude can support our analysis. Some research works point out that most documents retrieved by web servers are static [12], which is also true for our trace, and the time consumed to generate dynamic documents is in proportion to their sizes [13]. Based on these observations, we can estimate the request processing time from the distribution of file sizes recorded in the web log. In practice, we test the access times corresponding to different file sizes, from 1 Byte to 1M Bytes. Combining them with the file size distribution estimated from about 578,790 classified web documents, we obtain the distribution of processing time depicted in Fig. 3. An estimation of λprocess (ignoring networking delay) is given by the slope line in Fig. 3. Thus we have:

λprocess = 106 / (M(p7) + M(p6))    (3)

Figure 3: Distribution of Request Processing Time. The slope line shows one conservative estimation using an exponential distribution.

Figure 4: Distribution and Cumulative Distribution of Request Arrival Intervals.
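The per-syscall costs feeding Eq. (2) can be approximated with a quick micro-benchmark. The sketch below is our own illustration: the paper's measurements were presumably made at the C level, so absolute numbers from Python will differ because of interpreter overhead, and fork() is omitted since forking is awkward in sandboxed environments.

```python
import os
import socket
import time

def bench_us(fn, n=200):
    """Average wall-clock cost of one call to fn, in microseconds."""
    t0 = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - t0) / n * 1e6

def socket_cycle():
    # create and immediately destroy a TCP socket
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.close()

def pipe_cycle():
    # create and immediately destroy a pipe pair
    r, w = os.pipe()
    os.close(r)
    os.close(w)

us_socket = bench_us(socket_cycle)
us_pipe = bench_us(pipe_cycle)
print(f"socket(): {us_socket:.1f} us/call, pipe(): {us_pipe:.1f} us/call")
```

Whatever the absolute numbers, the relevant observation is the one the paper makes: these costs are roughly constant in the number of existing workers and sockets, so a single per-operation rate is a fair model parameter.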
3.3.3 Arrival Intervals

The arrival intervals, together with the different MKAs, determine the parameters of transitions t1, t2, t3 and t4. The intervals between HTTP requests directly determine the efficiency of the persistent connection mechanism. To estimate their distribution, we extract about 600,000 items from the web log and compute statistics for about 69,194 clients, counting the intervals between consecutive requests coming from the same client. The resulting probability density function, together with its cumulative distribution, is shown in Fig. 4. On the other side, a series of typical MKAs (3, 5, 10, 20, 50 sec) is chosen. Since an individual connection cannot stay in keep-alive status longer than MKA, pagain and ptimeout can be read
Table 2: Firing Rates and Enabling Probabilities for Transitions t1, t2, t3 and t4, Corresponding to Different MKAs.

MKA   pagain   ptimeout   λreuse   λreconn
3     0.68     0.32       1.20     1.0
5     0.72     0.28       0.70     0.6
10    0.76     0.24       0.50     0.3
20    0.78     0.22       0.23     0.15
50    0.81     0.19       0.11     0.06
out from the cumulative probability shown in Fig. 4 (upper subfigure), and are listed in Table 2. Now that the distribution of arrival intervals is cut into two parts by the MKA point, we can study each part separately. For intervals smaller than MKA, the connection is almost surely reused. For the different MKAs, we use different exponential functions to approximate this part of the distribution, as shown in Fig. 4; note that the intersections of the exponential functions with the real distribution curve are chosen to lie at the MKA points. For any arrival interval larger than MKA, the connection must have been terminated, either by the client side or due to keep-alive timeout on the server side. In this situation, the lifetime of a single connection obeys a cut-off distribution, and we approximate it by an exponential function with λreconn = 3/MKA. Under this approximation, the expectation of the connection lifetime is MKA/3, and the probability of the connection lifetime exceeding MKA is e^-3, which is only about 5%. In fact, this is a quite conservative estimation: in the real world, a connection could, in the worst case, stay open as long as MKA after serving its last request. At last, we define the necessary transition enabling conditions. For t5, it is M(p6) < 5, and for t6, M(p4) > 0.
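The 5% figure follows directly from the exponential approximation: with rate λreconn = 3/MKA, the survival probability at MKA is e^(-(3/MKA)·MKA) = e^(-3) ≈ 0.05, independent of the chosen MKA. A quick check of this arithmetic:

```python
import math

def p_exceed_mka(mka, k=3.0):
    """P(lifetime > MKA) when lifetime ~ Exp(rate = k / MKA)."""
    lam = k / mka
    return math.exp(-lam * mka)       # exponential survival function at MKA

# The exponent is always -(k/MKA) * MKA = -k, so the result is MKA-independent.
for mka in (3, 5, 10, 20, 50):
    assert abs(p_exceed_mka(mka) - math.exp(-3)) < 1e-12

print(f"P(lifetime > MKA) = {math.exp(-3):.4f}")   # 0.0498, about 5%
```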
3.4 Model Solution
An SRN model can be mapped uniquely to a continuous-time Markov chain (CTMC), with the markings of the SRN related to the states of the CTMC on a one-to-one basis. In practice, we use the SRN analysis tool SPNP [14] to solve for the steady-state probabilities of the CTMC states, and thus to calculate key metrics of the SRN model - in this paper, the average number of available workers and the service response latency perceived by clients. If we denote the steady-state probability of marking M as P[M], and ū_p4 as the expected number of tokens in place p4, the expected number of ready workers is:

ū_p4 = Σ_{j=0..P} j ∗ P[M(p4) = j]    (4)

This metric shows how idle the service is. Besides, if we treat p5, t5, p6 and t6 as one sub-queueing system, then according to Little's Law the client-perceived response time is

T̄respond = (ū_p5 + ū_p6) / R(t6, p7)    (5)

in which R(t6, p7) is the expected number of tokens that move through t6 into p7 in unit time. It can be expressed as

R(t6, p7) = Σ_{M∈E} P[M] ∗ λacc(M)    (6)

where E is the set of all markings that enable t6, and λacc(M) is the marking-dependent firing rate of t6 given by Eq. (2).
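SPNP automates the marking-to-state mapping and the steady-state solution; the idea underneath can be sketched in pure Python: build the infinitesimal generator Q of the CTMC and solve πQ = 0 subject to Σπ = 1. The two-state chain below is a toy stand-in for the much larger chain generated from Fig. 2; the solver itself is generic.

```python
def ctmc_steady_state(Q):
    """Solve pi @ Q = 0 with sum(pi) = 1 by Gaussian elimination."""
    n = len(Q)
    # Transpose Q (so the unknowns are a column vector) and replace the
    # last, redundant balance equation with the normalization sum(pi) = 1.
    A = [[Q[j][i] for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    # forward elimination with partial pivoting
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
            b[r] -= f * b[c]
    # back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (b[r] - s) / A[r][r]
    return x

# Toy two-state chain: rate a from state 0 to 1, rate b from 1 back to 0.
a, b_rate = 2.0, 3.0
Q = [[-a, a], [b_rate, -b_rate]]
pi = ctmc_steady_state(Q)
# Known closed form: pi = (b, a) / (a + b) = (0.6, 0.4)
assert abs(pi[0] - 0.6) < 1e-9 and abs(pi[1] - 0.4) < 1e-9
```

Metrics such as Eq. (4) and Eq. (6) are then plain expectations over the resulting π vector, summed over the markings of interest.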
Figure 5: Main Performance Metrics Given by SRN Analysis for underloaded situations (upper) and overloaded situations (lower). P=20.

In the solution, we alter the parameters of the SRN according to Table 2 to demonstrate how MKA affects performance. On the other hand, given P=20, we regulate C to produce both underloaded and overloaded situations. SPNP gives the metrics we desire, as shown in Fig. 5. When the number of concurrent connections is far below the server capacity (C=10), changing MKA does not bring any significant variation in service performance. As concurrency rises (C=20), the negative features of persistent connection emerge: the available workers in the server system fall to as few as 10%, making it harder for new requests to get admitted. Besides, the extra workers blocked in keep-alive status waste a considerable amount of memory. Persistent connection worsens performance further as the workload grows heavy (C=30), as shown in the lower subfigure. On one side, the latencies perceived by clients increase dramatically (from 5 sec to 25 sec or more) as MKA increases. On the other, most workers are blocked in keep-alive status while the queues in the system are jammed with requests (50% or more, as reported). This is obviously unreasonable. Recalling our assumptions in both the modeling and the parameter determination, we find that real conditions will be even worse than the model predicts. First, we underestimate the latency of request processing, and therefore the total latency perceived by clients. Besides, the sojourn time of a connection after its last request is modeled as an exponential distribution, which is certainly shorter than in real services. From the theoretical analysis by SRN, we have figured out the influence of persistent connection on server performance: for a slightly loaded server, persistent connection consumes extra workers and memory without bringing significant benefits, and for busy servers, a large MKA leads to a dramatic increase in response latency.
4. EXPERIMENTS

4.1 Test Configuration

To further validate the analytical results, we need experiments on a real server. As stated before, the validity of our evaluation highly depends on the reliability of the traffic trace. To ensure this reliability, a custom traffic generator tool called yatg (Yet Another Traffic Generator) is specially designed for the experiment. Driven by logs from web servers, yatg extracts access records for individual users, and
Figure 6: Total Number of Processes in Underload Scenario for MKA=10 (left), 20 (middle) and 50 (right).
Figure 7: Average User Perceived Request Latency in Overload Scenario for MKA=10 (left), 20 (middle) and 50 (right), every 2 seconds.
Figure 8: Average Number of Waiting Clients in Overload Scenario for MKA=10 (left), 20 (middle) and 50 (right), every 2 seconds.
Figure 9: Counts of Request Latency in Overload Scenario for MKA=10 (left), 20 (middle) and 50 (right).
Figure 10: Cumulative Probability Distribution of Request Latency for MKA=10 (left), 20 (middle) and 50 (right).
simulates each user's behavior with an independent thread, including connecting, issuing HTTP requests for documents of specific sizes, thinking time and disconnecting. In this way, yatg can generate very realistic and intricately overlapped HTTP request series. We use a segment of the web log from 9 PM to 10 PM of one weekday, the peak hour for web traffic, to feed yatg. In order to study both underloaded and overloaded situations, we set up two scenarios: in the former, the web server is configured with 150 workers; in the latter, the server is set to have only 50 workers, a bit fewer than it usually has. MKA is set to 10 sec, 20 sec and 50 sec respectively in both scenarios. Every scenario runs for almost 10 minutes. During runtime, the number of busy processes is recorded on the server side every 2 sec. The latency of every request is tracked by the clients, and the average of all latencies is calculated every 2 sec. After data collection, all latencies are counted. Also, the average number of waiting clients is recorded by yatg every 2 sec.
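The log-driven part of such a generator can be sketched as follows. This is our own hypothetical illustration, not yatg's actual code: the record format (timestamp, client, url, size) and the function names are assumptions, and the cap on think time is an arbitrary choice for the sketch.

```python
from collections import defaultdict

def sessions_from_log(records, think_cap=300.0):
    """Group (timestamp, client, url, size) log records into per-client
    request series, computing the think time that precedes each request.
    Hypothetical record format; a real tool would parse raw log lines."""
    by_client = defaultdict(list)
    for ts, client, url, size in sorted(records):
        by_client[client].append((ts, url, size))
    sessions = {}
    for client, reqs in by_client.items():
        series, prev = [], None
        for ts, url, size in reqs:
            think = 0.0 if prev is None else min(ts - prev, think_cap)
            series.append({"think": think, "url": url, "size": size})
            prev = ts
        sessions[client] = series
    return sessions

log = [
    (0.0, "10.0.0.1", "/index.html", 5120),
    (2.5, "10.0.0.1", "/style.css", 800),
    (1.0, "10.0.0.2", "/index.html", 5120),
]
s = sessions_from_log(log)
assert s["10.0.0.1"][1]["think"] == 2.5
assert len(s["10.0.0.2"]) == 1
```

Each per-client series would then be replayed by an independent thread - sleep for the think time, issue the request, repeat - which is what produces the realistic, overlapped request mix described above.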
4.2 Results Analysis
Under the same workload, the server system behaves quite differently with different MKAs. Just as our model predicts, when the server is underloaded, longer persistent connections put more workers to work (Fig. 6). And when it is overloaded, an increase in MKA degrades server performance: when MKA goes from 20 to 50, the average request latency increases by about 300% (Fig. 7). This magnitude is largely consistent with the model results. At the same time, a large number of clients are blocked in waiting status (Fig. 8). This is quite understandable, because the limited workers are all occupied by connections in keep-alive status. As the workload becomes heavier (after 200 sec), a larger MKA worsens performance (Fig. 7 and 8, right), just as the SRN model predicts. The experiments also reveal a fairness problem that is not reflected by the SRN modeling. The counts of all request latencies are shown in Fig. 9, and for convenience the corresponding cumulative probability functions are depicted in Fig. 10. These results indicate that the larger the MKA, the less fair the service. When MKA is relatively low, say 10, there is only one sharp peak in the counts function, which means most latencies are clustered tightly and the service is fair enough. As MKA goes from 10 to 50, the counts function becomes smoother and a second peak even emerges. The cumulative probability functions show it even more clearly: the requests are divided sharply into two groups, namely the already-admitted ones and the not-yet-admitted ones. Because the keep-alive period is long, there is a high possibility that workers are cyclically monopolized by the former group, while a large share of requests in the latter group have few chances to get served. Apparently, fairness is impaired by the increase in MKA.
Generally speaking, in the results coming from both the analysis and the experiments, we find that the merits of persistent connection, e.g., creating new connections and waking workers less frequently, are totally overshadowed by its major shortcomings - extra waiting workers, larger latency and excessive memory consumption. From this perspective, the reason why persistent connection is not as effective as it used to be might be that its underlying assumption - that forking and creating new connections are expensive operations while maintaining them is cheap - has been weakened. Specifically, a modern cutting-edge web server might easily afford forking 1k workers per second, yet suffer heavily from holding another 1k persistent connections. We also notice that there are many non-forking, single-threaded web servers such as thttpd. In fact, most of them lack support for persistent connection, since they basically do not need to fork to admit a new connection. Those designs are smart, and they accord with the conclusion we draw from implementation practice.
5. CONCLUSION

The main contribution of this paper is to re-evaluate the significance of persistent connection for modern web servers. We construct an SRN model of the connection management procedure in concurrent web servers, and determine its key parameters from measured metrics. Analytical results yielded by CTMC theory predict that, for an underloaded server, persistent connection may bring in extra busy workers and thus consume extra memory, while for an overloaded server, long persistent connections will increase request latency severely. Experiments using realistic traffic on a real test-bed not only verify the SRN model's predictions but also show that persistent connection impairs the fairness of service in overloaded servers. The results from both the theoretical model and the experiments challenge the long-held view of persistent connection, showing that for today's busy servers, the traditional persistent connection mechanism does not scale well with server capacity and workload.
6. REFERENCES

[1] T. Berners-Lee, R. Fielding and H. Frystyk. Hypertext Transfer Protocol - HTTP/1.0. RFC 1945, May 1996.
[2] J. Gettys and A. Freier. HTTP connection management. Internet draft, 1997.
[3] Venkata N. Padmanabhan and Jeffrey C. Mogul. Improving HTTP latency. In Proc. Second WWW Conference '94: Mosaic and the Web, pages 995-1005, Chicago, IL, October 1994.
[4] Henrik F. Nielsen and James Gettys. Network performance effects of HTTP/1.1, CSS1 and PNG. In Proc. ACM SIGCOMM '97, pages 155-166, Cannes, France, October 1997.
[5] R. Fielding, J. Gettys, J. C. Mogul, H. Frystyk and T. Berners-Lee. Hypertext Transfer Protocol - HTTP/1.1. RFC 2616, June 1999.
[6] Henrik Frystyk Nielsen. HTTP performance overview, 2003.
[7] Jeffrey C. Mogul. The case for persistent-connection HTTP. Research Report 95/4, Digital Equipment Corporation Western Research Laboratory, May 1995.
[8] Felix von Leitner. Scalable network programming. Technical report, Linux Congress, October 2003.
[9] Dan Kegel. The C10K problem, 2006.
[10] J. Peterson. Petri Net Theory and the Modeling of Systems. Prentice-Hall, Englewood Cliffs, NJ, 1989.
[11] G. Ciardo et al. Automated generation and analysis of Markov reward models using stochastic reward nets. Linear Algebra, Markov Chains, and Queueing Models, 48:145-191, November 1993.
[12] Mor Harchol-Balter, Bianca Schroeder, Nikhil Bansal and Mukesh Agrawal. Size-based scheduling to improve web performance. ACM Transactions on Computer Systems, 21:207-233, May 2003.
[13] M. S. Ghosh and S. Squillante. Analysis and control of correlated web server queues. Computer Communications, 27:1771-1785, May 2004.
[14] G. Ciardo, J. Muppala and K. S. Trivedi. SPNP: Stochastic Petri Net Package. In Proc. of Petri Nets and Performance Models, pages 142-151, Kyoto, Japan, December 1989.