Efficient Caching Mechanism in Proxy Servers

S. Rajeev¹, Member, IEEE, K. V. Sreenaath², A. S. Bharathi Manivannan³

¹ Department of Electronics and Communication Engineering, PSG College of Technology, Coimbatore
² Software Engineer, Motorola India Pvt. Ltd, Bangalore
³ Software Engineer, Caritor, Chennai

Abstract – Caching is the mechanism of storing frequently accessed objects. Web caching is used in proxy servers in the World Wide Web (WWW) to cache web pages accessed by clients. Web caching in proxy servers must be carried out in real time with high efficiency and accuracy. Data structures based on hashing, such as Bloom filters, are used for web caching in proxy servers. Conventional Bloom filters perform poorly when restrictions are placed on both time efficiency and accuracy. In this paper, extensions to Bloom filters are proposed for efficient caching in proxy servers.

Keywords – Web Caching, Proxy Server, World Wide Web, Bloom Filter



I. INTRODUCTION

Caching in proxy servers is the process of storing recently accessed web page URLs along with their contents. A proxy server acts on behalf of the real server in the World Wide Web (WWW) to which the client's request for a web page (URL) is addressed. The whole process, from the client's request for a web page to its delivery, is shown in Figure 1. The clients in the local network are connected to the Internet (WWW) through a proxy server.

Figure 1. Real-Time Internet Involving Proxy Servers

Suppose a client in the local network, say "Client 2", requests the URL "www.yahoo.com". The request goes through the proxy server, which first checks whether the requested URL is present in its cache. If it is, the proxy server responds to the client with the cached web page, and the request does not travel to the remote server through the public domain (Internet). If the requested URL is not cached in the proxy server, the proxy server forwards the request to the destination server, which in this case is the remote (Yahoo) server. The remote server processes the client's request and responds. The proxy server caches this page in its web cache so that further requests for that URL need not go to the destination server through the public domain; such requests are answered by the proxy server itself. However, a cached page is kept in the proxy server only until its time limit (timeout) expires. In this way, needless repeated requests for a URL already cached in the proxy server are kept off the public domain, resulting in less traffic there.

Conventional mechanisms based on hashing techniques, such as Bloom filters [1], [6], are used for web caching in proxy servers in the WWW. A Bloom filter uses hash transforms [2] to determine set membership [3]. For real-time use, however, it performs poorly because of its high error rate and low rejection time. Errors arise when hash transforms of different keys map to the same element. The membership test for a key K works by checking the elements that would have been updated if K had been inserted into the vector. If all the appropriate flag bits have been set, K is reported as a member of the set. If those elements were set by hashes of other keys, and not of K, the membership test incorrectly reports K as present. This is how errors (false positives) occur in a conventional Bloom filter: a non-member is reported as a member because its cells were filled by hash transforms of other keys. The rejection time of a Bloom filter is defined as the time taken to reject a non-member, measured as the number of hash functions that must be evaluated. In a conventional Bloom filter the rejection time is low. These two performance metrics (error rate and rejection time) play a very important role in the real-time use of Bloom filters for web caching.

Both metrics can be improved substantially for web caching by extending the conventional Bloom filter, as proven mathematically in Section III.
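The request flow just described (check the cache, forward on a miss, cache the response, expire it after a time limit) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the proxy implementation used in this work: the class `ProxyCache`, the callable `fetch_from_origin` and the fixed `ttl_seconds` timeout are hypothetical, and a plain dictionary stands in for the web cache. It also counts hits and misses, so the hit rate of the cache can be read off.

```python
import time
from typing import Callable, Dict, Tuple


class ProxyCache:
    """Sketch of the proxy-side flow: serve hits locally, forward misses.

    Hypothetical names and assumptions: pages live in a dict keyed by URL,
    every entry expires ttl_seconds after it is cached, and
    fetch_from_origin stands in for forwarding to the destination server.
    """

    def __init__(self, fetch_from_origin: Callable[[str], str],
                 ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[str, float]] = {}  # url -> (page, expiry)
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now < entry[1]:
            # Cache hit: answered by the proxy, nothing crosses the public network.
            self.hits += 1
            return entry[0]
        # Miss or timed-out entry: forward to the destination server and cache
        # the response so later requests for this URL can be served locally.
        self.misses += 1
        page = self._fetch(url)
        self._store[url] = (page, now + self._ttl)
        return page

    def hit_rate(self) -> float:
        """Fraction of requests answered from the cache."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


if __name__ == "__main__":
    fake_now = [0.0]  # controllable clock so the timeout can be demonstrated
    cache = ProxyCache(lambda url: f"<page for {url}>", ttl_seconds=60,
                       clock=lambda: fake_now[0])
    cache.get("www.yahoo.com")   # miss: forwarded to the remote server
    cache.get("www.yahoo.com")   # hit: served by the proxy
    fake_now[0] = 61.0
    cache.get("www.yahoo.com")   # timed out: forwarded again
    print(cache.hits, cache.misses, round(cache.hit_rate(), 3))  # prints 1 2 0.333
```

The clock is injectable only so that the timeout can be exercised without waiting; a real proxy would use the wall clock and the expiry rules of its protocol.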

II. CONVENTIONAL METHOD FOR CACHING IN PROXY SERVERS

A. Web Caching Mechanism
Proxy servers are widely used in the World Wide Web (WWW). The WWW can be viewed as a distributed system for delivering documents, which servers send to clients only upon request. Caching improves performance when clients obtain copies of files from neighboring servers (i.e., proxy servers) instead of from the destination server, which may be several slow network links away. Proxy servers intercept requests from clients and either fulfill the requests themselves or re-issue them to servers [4]. If the proxy can obtain a copy of the document from a cache (either its own or that of a nearby co-operating proxy), the document is retrieved from the cache and a cache hit is registered. The hit rate of a cache, the fraction of requests served as cache hits, is a measure of the cache's effectiveness. Proxies are typically deployed as hierarchies (which mimic logical network architectures) or as a series of co-operating proxies (without regard to network architecture) [4]. The performance of a web cache scheme depends on the size of its client community: the bigger the user community, the higher the probability that a cached document will soon be requested again.

B. Shortcomings of Conventional Bloom Filter
Conventional Bloom Filters perform poorly in conditions where both time efficiency and accuracy are critical [5]. The Conventional Bloom Filter can be made error-free by Modified Bloom Filters [5], which introduce additional fields in each cell that help identify all cells that have been set by the same key. Two cells that are set by different keys will both contain a set bit followed by a field (bit pattern) that differs between the two cells. A unique field that can be incorporated to make Conventional Bloom Filters error-free is the time of updating. Incorporating the time of updating would require 8 bytes (64 bits) of storage per cell, which is 64 times the single bit per cell required by a Conventional Bloom Filter. A solution to this problem is to reduce the precision of the time-of-updating field, with a corresponding reduction in accuracy. An obvious way of doing this is to use random numbers of limited precision instead.

III. MATHEMATICAL ANALYSIS AND PERFORMANCE RESULTS
The performance of Modified Bloom Filters is analyzed below.

A. Web Caching Mechanism
Bloom filters are used in web caches to efficiently determine the existence of an object (the existence of a URL) in a cache [3]. The proxy server implements web caching through Bloom filters. Whenever the presence of a URL in the cache needs to be checked, the URL is hashed and a membership test is performed on the Bloom filter. If the membership test succeeds, the URL is already cached by the proxy server, and the proxy server responds to the client with the corresponding page. If the membership test fails, the requested URL is not cached in the proxy server, so the request is forwarded to the destination server, which responds to the client. When the response passes back to the client through the proxy server, the proxy server caches the requested web page, hashes its URL and adds an entry to the Bloom filter, so that further requests for that URL go through the membership test and the whole process repeats.

Bloom filters are used for web cache sharing too. Web caches are shared to reduce message traffic. Caching proxies are implemented so that they do not transfer the exact contents of their caches (i.e., lists of URLs) but instead broadcast much smaller Bloom filters that represent the contents of the cache. If a proxy wants to determine whether another proxy has a page in its cache, it checks the appropriate Bloom filter.

INITIALIZE - The naive implementation requires O(N) time; however, if N is the size of a native data type (i.e., the whole filter fits in a single machine word), it can be done in constant time.
INSERT - Insertion requires the computation of m hash transforms, each of which requires O(1) time. (Since collision detection is not necessary, all the hash transforms have to do is compute values.) Insertion therefore takes O(m) time per key, or O(mK) for all K keys.
ISMEMBER - To check for membership we need to compute as many as m transforms. The time taken to reject a non-member is derived in subsection C below. In the average case, only two transforms will be required to reject a non-member. In the worst case, when the key is a member of the set, the time complexity is O(m).

B. Relationship between Parameters
The behavior of Modified (Randomized) Bloom Filters is determined by five parameters:
N - the number of elements (cells) in the filter
m - the number of hash transforms used
K - the keys inserted into the filter (K also denotes the number of keys)
f - the fraction of elements (cells) that are set in the filter
x - the number of decimal places in the random number used, i.e., the precision of the random number
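The INITIALIZE, INSERT and ISMEMBER operations of a conventional Bloom filter, used as the URL index of a proxy cache as described in subsection A, can be sketched in Python as below. The class name and the hash family (SHA-256 of the transform number and the key, reduced modulo N) are assumptions made for illustration; any family of independent hash transforms would serve. The membership test also returns how many transforms it evaluated, so the early exit behind the rejection time is visible.

```python
import hashlib
from typing import List, Tuple


class BloomFilter:
    """Conventional Bloom filter sketch with N one-bit cells and m transforms.

    Assumption: transform i is SHA-256 of "i:key" reduced modulo N; no
    particular hash family is prescribed here.
    """

    def __init__(self, n_cells: int, m_hashes: int) -> None:
        # INITIALIZE: clearing all N cells takes O(N) time.
        self.n = n_cells
        self.m = m_hashes
        self.cells: List[int] = [0] * n_cells

    def _index(self, i: int, key: str) -> int:
        digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.n

    def insert(self, key: str) -> None:
        # INSERT: m transforms, each O(1) and with no collision handling.
        for i in range(self.m):
            self.cells[self._index(i, key)] = 1

    def is_member(self, key: str) -> Tuple[bool, int]:
        """ISMEMBER: return (reported_member, transforms_evaluated).

        The scan stops at the first empty cell, so a non-member is often
        rejected early; a member (or a false positive) costs all m transforms.
        """
        for h in range(1, self.m + 1):
            if self.cells[self._index(h - 1, key)] == 0:
                return False, h
        return True, self.m


if __name__ == "__main__":
    index = BloomFilter(n_cells=1024, m_hashes=4)
    url = "www.yahoo.com"
    if not index.is_member(url)[0]:
        # Miss: the proxy would forward the request, cache the page, then
        # record the URL so later requests pass the membership test.
        index.insert(url)
    print(index.is_member(url))  # (True, 4)
```

Because each cell carries a single bit, a crowded filter can also report a URL that was never inserted; this is the false-positive case described in the introduction.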

Here we derive equations that describe the relationship between these parameters, in order to predict the amount of space the Modified Bloom Filter will require. The expected fraction of false positive results, given the parameter values, is the error rate. The rejection time is the expected number of hashes that will be required to determine that a key is not a member of the set.

C. Rejection Time
The rejection time is the time taken to reject a non-member, measured as the number of hash functions that must be evaluated before the non-member is rejected.

In the Conventional Bloom Filter, the rejection time when using m hash transforms is given by

$$\text{Rejection Time} = \sum_{h=1}^{\infty} h\,(1-f)\,f^{h-1} = \frac{1}{1-f}, \qquad 0 \le f < 1 \tag{1}$$

In the Modified Bloom Filter, the rejection time when using m hash transforms with random numbers taken up to the x-th decimal place is derived as follows. If f is the fraction of non-zero elements in the Modified Bloom Filter, then the probability of rejecting a non-member at the first hash transform is $1 - f$. Assuming that the hash transforms are independent, the probability of not rejecting a non-member at each further hash transform (the cell is non-zero and holds the same random number) is $f/10^x$. In general, then, the probability of h hashes being required to reject a non-member is $(1-f)\,(f/10^x)^{h-1}$, and

$$\text{Rejection Time} = \sum_{h=1}^{m} h\,(1-f)\left(\frac{f}{10^x}\right)^{h-1} = (1-f)\sum_{h=1}^{m} h\left(\frac{f}{10^x}\right)^{h-1} \tag{2}$$

$$\le (1-f)\sum_{h=1}^{\infty} h\left(\frac{f}{10^x}\right)^{h-1} \tag{3}$$

$$= (1-f)\sum_{h=1}^{\infty} h\,f^{h-1}\left(\frac{1}{10^x}\right)^{h-1} \tag{4}$$

Writing equation (4) in terms of differentiation, using $h\,f^{h-1} = \frac{d}{df}f^{h}$, we get

$$= (1-f)\sum_{h=1}^{\infty} \frac{d}{df}\left(f^{h}\right)\left(\frac{1}{10^x}\right)^{h-1} \tag{5}$$

Taking the differentiation outside the summation, we get

$$= (1-f)\,\frac{d}{df}\sum_{h=1}^{\infty} f^{h}\left(\frac{1}{10^x}\right)^{h-1} \tag{6}$$

Writing this in terms of the two geometric series $\sum_{h=1}^{\infty} f^{h}$ and $\sum_{h=1}^{\infty} (1/10^x)^{h-1}$,

$$= (1-f)\times\frac{d}{df}\sum_{h=1}^{\infty} f^{h}\times\sum_{h=1}^{\infty}\left(\frac{1}{10^x}\right)^{h-1}$$

Since each is in the form of a geometric series $a, ar, ar^2, \ldots$, we have

$$= (1-f)\times\frac{d}{df}\left(\frac{1}{1-f}\right)\times\left[1+\frac{1}{1-\frac{1}{10^x}}\right] = (1-f)\times\frac{1}{(1-f)^2}\times\left[1+\frac{1}{1-\frac{1}{10^x}}\right]$$

$$= \frac{1}{1-f}\left[1+\left(1-\frac{1}{10^x}\right)^{-1}\right], \qquad 0 \le f < 1 \tag{7}$$

The term $\left(1-\frac{1}{10^x}\right)^{-1}$ is in the form of $(1+y)^n$; expanding it using the binomial expansion gives $1+\frac{1}{10^x}+\frac{1}{10^{2x}}+\cdots$, which is never less than 1. The bracket in (7) is therefore at most $2\left(1-\frac{1}{10^x}\right)^{-1}$, so

$$\text{Rejection Time} \le \frac{1}{1-f}\times\frac{2}{1-\frac{1}{10^x}} \tag{8}$$

When using random numbers with the rather limited precision of two decimal places (x = 2), equation (8) becomes

$$\text{Rejection Time} \le \frac{1}{1-f}\times\frac{2}{1-0.01} \approx 2.02\times\frac{1}{1-f} \tag{9}$$

The achieved rejection time is thus about twice the rejection time of the Conventional Bloom Filter given by (1). The rejection times of the Conventional Bloom Filter and the Modified Bloom Filter are compared in Figure 2.
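To make the comparison concrete, the short Python sketch below evaluates equation (1), $1/(1-f)$, and the Modified Bloom Filter bound taken in the form $\frac{2}{(1-f)(1-10^{-x})}$, which is the form that gives the factor 2.02 quoted for x = 2. The function names are hypothetical; the code only evaluates these closed forms and does not simulate a filter.

```python
def conventional_rejection_time(f: float) -> float:
    """Equation (1): expected hashes to reject a non-member, 1 / (1 - f)."""
    if not 0.0 <= f < 1.0:
        raise ValueError("f must satisfy 0 <= f < 1")
    return 1.0 / (1.0 - f)


def modified_rejection_time_bound(f: float, x: int) -> float:
    """Equation (8), taken in the form (1 / (1 - f)) * 2 / (1 - 10**-x)."""
    return conventional_rejection_time(f) * 2.0 / (1.0 - 10.0 ** -x)


if __name__ == "__main__":
    for f in (0.1, 0.5, 0.9):
        conv = conventional_rejection_time(f)
        mod = modified_rejection_time_bound(f, x=2)
        print(f"f={f}: conventional={conv:.3f} modified={mod:.3f} ratio={mod / conv:.2f}")
```

In this form the ratio of the two expressions depends only on x, so for x = 2 it is 2/0.99, about 2.02, for every fill fraction f.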

Figure 2. Rejection Time Comparison

D. Error Rate
The expected fraction of false positive results (cases of non-members being falsely identified as members), given the parameter values, is called the error rate.

In the Conventional Bloom Filter, the error rate when using m hash transforms is given by

$$\text{Error Rate} = f^{m} \tag{10}$$

In the Modified Bloom Filter, with random numbers taken up to the x-th decimal place, a non-member is falsely accepted only if each of the m hash transforms maps to a cell that is non-zero and all those cells contain the same random number. This probability is given by

$$\text{Error Rate} = f\times\frac{f}{10^x}\times\frac{f}{10^x}\times\cdots\ (\text{up to } m \text{ terms}) = f\left(\frac{f}{10^x}\right)^{m-1} \tag{11}$$

Even when using only 2 hash transforms (m = 2) and random numbers with the rather limited precision of two decimal places (x = 2), equation (11) becomes

$$\text{Error Rate} = f\times\frac{f}{100} = \frac{f^2}{100} \tag{12}$$

The achieved error rate with only 2 hash transforms is thus 100 times lower than the error rate $f^2$ of a Conventional Bloom Filter. The error rates of the Conventional Bloom Filter and the Modified Bloom Filter for m = 2 are compared in Figure 3.

IV. EFFICIENT CACHING MECHANISM IN PROXY SERVERS
The error rate of a Conventional Bloom Filter is high, as shown in equation (10). Moreover, its rejection time is low, as shown in equation (1), which means that the probability that only non-members are rejected is low. Proxy servers, however, are special-purpose servers that must operate with high efficiency in real time. For efficient caching in proxy servers, the error rate should be low and the rejection time should be high (so that only non-members are rejected). This can be achieved by using the Modified Bloom Filter, whose error rate is much lower than that of a Conventional Bloom Filter (equations (10), (11) and (12)) and whose rejection time is higher than that of the conventional filter (equations (1), (8) and (9)).
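The error-rate comparison above rests on equations (10) to (12). A minimal Python sketch that evaluates them (the function names are hypothetical) shows that the ratio between the two error rates does not depend on f: it is $10^{x(m-1)}$, which equals 100 for m = 2 and x = 2.

```python
def conventional_error_rate(f: float, m: int) -> float:
    """Equation (10): all m cells probed for a non-member happen to be set."""
    return f ** m


def modified_error_rate(f: float, m: int, x: int) -> float:
    """Equation (11): the first cell is set (f) and each further cell is set
    and holds the same x-decimal random number (f / 10**x)."""
    return f * (f / 10 ** x) ** (m - 1)


if __name__ == "__main__":
    f, m, x = 0.5, 2, 2
    conv = conventional_error_rate(f, m)
    mod = modified_error_rate(f, m, x)
    print(conv, mod, round(conv / mod))  # prints 0.25 0.0025 100, as in equation (12)
```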


Figure 3. Error Rate Comparison for m = 2

Whenever a page is requested by a client and returned by the server, the proxy server through which the request and response pass caches the page. Let us assume that we use three hash functions h1, h2 and h3 in a Modified Bloom Filter of 8 cells. The proxy server hashes the URL of the page and adds entries in the corresponding cells of the Modified Bloom Filter. Let us assume that the URL of the page is "google.com" and that hashing it with the three hash functions gives

h1(google.com) = 3; h2(google.com) = 7; h3(google.com) = 1; time_of_updating(google.com) = '00:00:01'

where the time of updating is in the format Hours:Minutes:Seconds. The Modified Bloom Filter is then updated with the hash entries as follows (cell 4 holds an entry made for another key):

Cell      1         2   3         4         5   6   7         8
Content   00:00:01  0   00:00:01  00:00:03  0   0   00:00:01  0

When the page (URL "google.com") is requested by a client, the proxy server hashes the URL and performs the membership test on the Modified Bloom Filter. The three hash functions again give h1(google.com) = 3, h2(google.com) = 7 and h3(google.com) = 1, so the contents of cells 1, 3 and 7 are checked. Since all three cells are non-zero and hold the same time of updating, the membership test succeeds and the page for "google.com" is returned to the client from the web cache. If the membership test fails, the requested page is retrieved from the target web server.
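The walk-through above can be reproduced with a small Python sketch of a Modified Bloom Filter whose cells hold either 0 (empty) or a stamp identifying the key that set them. The class and helper names are hypothetical, the fixed hash values other than google.com's (3, 7, 1) are invented purely for illustration, and random stamps are modelled as integers with x decimal digits (the example above uses the time of updating instead). The key "mail.com" shows the case a one-bit filter cannot distinguish: all of its cells are non-zero, but their stamps disagree, so it is rejected.

```python
import hashlib
import random
from typing import Callable, List, Optional, Sequence, Tuple, Union

Stamp = Union[int, str]  # 0 means "empty"; anything else identifies the setter


class ModifiedBloomFilter:
    """Sketch of a Modified Bloom Filter whose cells store a per-key stamp.

    Assumptions for illustration: cells are numbered 1..N as in the example,
    each hash transform returns a cell number in 1..N, and a random stamp is
    an integer in 1..10**x - 1 standing in for a random number with x decimal
    places (0 is reserved for an empty cell).
    """

    def __init__(self, n_cells: int, hashes: Sequence[Callable[[str], int]],
                 x: int = 2, rng: Optional[random.Random] = None) -> None:
        self.hashes = list(hashes)
        self.x = x
        self.rng = rng if rng is not None else random.Random()
        self.cells: List[Stamp] = [0] * n_cells

    def insert(self, key: str, stamp: Optional[Stamp] = None) -> Stamp:
        """Write the same stamp (time of updating or random number) to all m cells."""
        if stamp is None:
            stamp = self.rng.randint(1, 10 ** self.x - 1)
        for h in self.hashes:
            # A later key that hashes to the same cell overwrites this stamp.
            self.cells[h(key) - 1] = stamp
        return stamp

    def is_member(self, key: str) -> Tuple[bool, int]:
        """Return (reported_member, transforms_evaluated).

        Reject at the first empty cell, or at the first cell whose stamp
        differs from the stamp in the first cell; accept only if all agree.
        """
        first: Stamp = 0
        for count, h in enumerate(self.hashes, start=1):
            value = self.cells[h(key) - 1]
            if value == 0 or (count > 1 and value != first):
                return False, count
            first = value
        return True, len(self.hashes)


def sha256_transform(i: int, n_cells: int) -> Callable[[str], int]:
    """Hypothetical hash transform number i, mapping a key to a cell in 1..n_cells."""
    def h(key: str) -> int:
        digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % n_cells + 1
    return h


if __name__ == "__main__":
    # Hypothetical fixed hash values; only google.com's (3, 7, 1) come from the example.
    table = {"google.com": (3, 7, 1), "news.com": (4, 5, 6), "mail.com": (1, 3, 4)}
    transforms = [lambda k, i=i: table[k][i] for i in range(3)]
    mbf = ModifiedBloomFilter(8, transforms)
    mbf.insert("google.com", stamp="00:00:01")
    mbf.insert("news.com", stamp="00:00:03")
    print(mbf.is_member("google.com"))  # (True, 3)
    # Cells 1, 3 and 4 are all non-zero, so a conventional filter would accept
    # mail.com, but cell 4 carries a different stamp, so it is rejected here.
    print(mbf.is_member("mail.com"))    # (False, 3)
```

How overlapping insertions should be resolved is left open here: the sketch simply overwrites, so an earlier key whose cell is later overwritten would then fail its own membership test.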







Modified Bloom Filters can also be used in cache digests. A cache digest is essentially a lossy compression of all cache keys with a lookup capability. Digests are made available via HTTP (the main network protocol of the WWW), and a cache downloads its neighbors' digests at startup. By checking a neighbor's digest, a cache can determine with certainty that a neighboring cache does not hold a given object. Their use in cache digests allows caches to inform each other efficiently about their contents without any per-request delays. The main goal is to reduce the 'cache directory' size while keeping the number of collisions low.

Modified Bloom Filters are thus an efficient way of web caching in proxy servers with a minimal error rate. They provide a space- and time-efficient method of keeping track of which URLs are requested, and they are much more efficient than the Conventional Bloom Filter for web caching in proxy servers, as shown by equations (1), (7), (8), (9), (10) and (11) and by Figures 2 and 3.

V. CONCLUSIONS
The caching mechanism deployed in proxy servers in real time needs high accuracy, that is, a low error rate. The main advantages of the Modified Bloom Filter are its reduced error rate and increased rejection time. The proposed model addresses these limitations of the Conventional Bloom Filter and is therefore better suited as an efficient caching mechanism for proxy servers in the real-time World Wide Web (WWW).

ACKNOWLEDGEMENTS
The authors wish to thank the Network System Design Center In-charge in the Department of Electronics and Communication Engineering, PSG College of Technology, and Intel Inc. for providing facilities to implement and test the Modified Bloom Filter on routers built using Intel IXP1200 and IXP2400 network processors.

REFERENCES
[1] Ronald L. Graham, Donald E. Knuth, and Oren Patashnik, "Concrete Mathematics: A Foundation for Computer Science," Chapter 8.5, Addison-Wesley Publishing Company, pp. 397-412, 1989.
[2] Zbigniew J. Czech, George Havas, and Bohdan S. Majewski, "An optimal algorithm for generating minimal perfect hash functions," Information Processing Letters, Vol. 43, No. 5, pp. 257-264, 1992.
[3] James Blustein and Amal El-Maazawi, "Bloom Filters: A Tutorial, Analysis, and Survey," Technical Report CS-2002-10, Faculty of Computer Science, Dalhousie University, Canada, 2002.
[4] Jia Wang, "A survey of web caching schemes for the Internet," ACM SIGCOMM Computer Communication Review, Vol. 29, No. 5, pp. 36-39, 1999.
[5] S. Rajeev, S. N. Sivanandam, Suren M., Govindraj J., "Randomized Bloom Filters for Time Critical Applications in Mobile Handheld Devices," Proc. Intl. Conf. on Communications, Devices and Intelligent Systems, India, 2004.
[6] Burton H. Bloom, "Space/time trade-offs in hash coding with allowable errors," Communications of the ACM, Vol. 13, No. 7, pp. 422-426, 1970.
