Enabling Efficient Content Location and Retrieval in Peer-to-Peer Systems by Exploiting Locality in Interests
Kunwadee Sripanidkulchai, Bruce Maggs, Hui Zhang
Carnegie Mellon University
Challenges posed by peer-to-peer
• Want scalable and high-performance content location and peer selection. Existing solutions provide scalable location, but have not addressed peer selection.
• Retrieval performance between end-hosts is highly variable and dynamic.
[Figure labels: Gnutella overlay; Peer list overlay; Content]
May 1, 2001
High variability (σ ≈ 1 sec) in ping times over a 24-hour period to a random end-host on the Internet (typical for 1/3 of 2400 end-hosts pinged in our experiments)
[Figure: Ping Time (ms) versus Time]
3) The list evolves as more content is retrieved and more peers are discovered.
Content location
Queries for content are sent to peers in the list.
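A minimal sketch of how queries could be sent along peer lists, covering the 1-hop case (asking peers in one's own list) and the 2-hop case (asking the peers of those peers). The function name `locate`, the dictionaries `peer_lists` and `holdings`, and the breadth-first order are illustrative assumptions, not the poster's implementation.

```python
from typing import Dict, List, Set


def locate(requester: str, doc: str,
           peer_lists: Dict[str, List[str]],
           holdings: Dict[str, Set[str]],
           hops: int = 1) -> List[str]:
    """Return the peers holding `doc` found within `hops` hops of peer lists.

    peer_lists maps a peer to the peers on its list (hypothetical structure).
    holdings maps a peer to the set of documents it stores (hypothetical).
    """
    frontier = list(peer_lists.get(requester, []))
    visited = {requester}
    for _ in range(hops):
        hits: List[str] = []
        next_frontier: List[str] = []
        for peer in frontier:
            if peer in visited:
                continue  # do not query the same peer twice
            visited.add(peer)
            if doc in holdings.get(peer, set()):
                hits.append(peer)
            # Remember this peer's own list in case another hop is needed.
            next_frontier.extend(peer_lists.get(peer, []))
        if hits:
            return hits  # stop at the nearest hop that has the content
        frontier = next_frontier
    return []  # a miss: no peer within the hop limit has the content


# Toy example: p1 lists p2, and p2 lists p3.
peer_lists = {"p1": ["p2"], "p2": ["p3"], "p3": []}
holdings = {"p2": {"A"}, "p3": {"B"}}
one_hop_a = locate("p1", "A", peer_lists, holdings, hops=1)
one_hop_b = locate("p1", "B", peer_lists, holdings, hops=1)
two_hop_b = locate("p1", "B", peer_lists, holdings, hops=2)
```

In this toy setup, document B is a miss at 1 hop and is found only when the query is extended to 2 hops.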
• Need to use up-to-date performance state to select a peer
• For scalability, cannot maintain up-to-date state for all peers
• Which peers should we maintain state for?
  - Peers that have locality in interests
Locality in interests
Observation: people share common interests. Can we exploit this to improve content location and retrieval?
[Figure: a peer holding content A, B, C and other peers, each annotated with the fraction of A, B, C that it holds: A, B, C, D (3/3); A, C, D, E (2/3); D, E, F (0/3); F, G, H (0/3).]

Potential benefits and overhead
• Use Boeing corporate web proxy traces to drive the request stream for the simulations
• Treat a request for a new document (a compulsory cache miss for a web cache) as a publish in the peer-to-peer system
• Ran simulations over periods of 5 minutes to 3 hours
• Content location algorithms are based on
  - Asking random peers
  - Asking peers with the same interests (1 hop)
  - Asking peers of peers with the same interests (2 hops)

Peer selection and content retrieval
Fine-grained dynamic performance state can be maintained for peers on the list.
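One way to keep fine-grained performance state for only the peers on the list is to smooth recent latency samples per peer and pick the best candidate. The sketch below uses an exponentially weighted moving average, which is an assumed technique chosen for illustration; the poster does not specify how performance state is tracked. The class `PerformanceState`, its `alpha` parameter, and the use of latency as the metric are all hypothetical.

```python
from typing import Dict, Iterable, Optional


class PerformanceState:
    """Per-peer smoothed latency, kept only for peers we choose to track."""

    def __init__(self, alpha: float = 0.5) -> None:
        self.alpha = alpha  # weight of the newest sample (assumed value)
        self.estimate: Dict[str, float] = {}  # peer -> smoothed latency (ms)

    def record(self, peer: str, latency_ms: float) -> None:
        """Fold a new latency sample into the peer's running estimate."""
        old = self.estimate.get(peer)
        if old is None:
            self.estimate[peer] = float(latency_ms)
        else:
            self.estimate[peer] = self.alpha * latency_ms + (1 - self.alpha) * old

    def select(self, candidates: Iterable[str]) -> Optional[str]:
        """Pick the candidate with the lowest smoothed latency, if any is known."""
        known = [p for p in candidates if p in self.estimate]
        if not known:
            return None
        return min(known, key=lambda p: self.estimate[p])


state = PerformanceState(alpha=0.5)
state.record("p2", 100)
state.record("p3", 300)
first_choice = state.select(["p2", "p3"])   # p2 is currently faster
state.record("p2", 900)                     # p2 degrades: estimate becomes 500
second_choice = state.select(["p2", "p3"])  # p3 is now the better choice
```

Because state is kept only for peers on the list, its size is bounded by the list length rather than by the number of peers in the system.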
[Figure labels: Miss rate (%); Number of peers; curves: random, content-based 1 hop, content-based 2 hops; max, min, average over 16 runs]

Proposed solution
A distributed algorithm for peers to self-organize into clusters based on interests (peer list).
Why is it easier to incorporate dynamic performance state when using locality in interests to locate and retrieve content?
- Only need to keep performance state for peers that are likely to provide the content one is looking for.
Peer list
1) Each peer maintains a list of peers who share the same interests.
2) Peer lists are initially bootstrapped using existing protocols, such as Gnutella, Tapestry, Pastry, CAN, or Chord. We use the following heuristic: peers that have the content you are looking for have the same interests.
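The heuristic above (a peer that supplied content you wanted is assumed to share your interests) could be sketched as follows. The class name `PeerList`, the `capacity` bound, and the least-recently-useful eviction are assumptions added for illustration; bootstrapping through Gnutella or another protocol is left outside the sketch.

```python
from collections import OrderedDict
from typing import List


class PeerList:
    """A bounded list of peers that have previously supplied wanted content."""

    def __init__(self, capacity: int = 10) -> None:
        self.capacity = capacity  # assumed bound on the list size
        # peer -> number of times it supplied content; order tracks recency
        self._peers: "OrderedDict[str, int]" = OrderedDict()

    def on_content_retrieved(self, provider: str) -> None:
        """Add or refresh a peer that just supplied content we asked for."""
        self._peers[provider] = self._peers.get(provider, 0) + 1
        self._peers.move_to_end(provider)
        while len(self._peers) > self.capacity:
            self._peers.popitem(last=False)  # evict the least recently useful peer

    def peers(self) -> List[str]:
        """Peers to query, most recently useful first."""
        return list(reversed(self._peers))


pl = PeerList(capacity=2)
for provider in ["p2", "p3", "p2", "p4"]:
    pl.on_content_retrieved(provider)
current = pl.peers()
```

Here `p3` is evicted when `p4` arrives, because `p2` supplied content more recently.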
[Figure: curves for content-based 1 hop and content-based 2 hops plotted against Simulation length (s)]
Locating content amongst peers with locality in interests results in low miss rates
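Computing a miss rate from a request trace, under the setup described in this poster (the first request for a document counts as a publish rather than a lookup), might look like the sketch below. The function `miss_rate`, the `peers_to_ask` callback, and the toy trace format are hypothetical, and the sketch assumes a requester ends up holding the document even after a miss.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def miss_rate(trace: Iterable[Tuple[str, str]],
              peers_to_ask: Callable[[str], List[str]]) -> float:
    """Return the miss rate (%) over a trace of (requester, document) pairs."""
    holdings: Dict[str, Set[str]] = {}
    published: Set[str] = set()
    lookups = 0
    misses = 0
    for requester, doc in trace:
        if doc not in published:
            # First request for a new document: treated as a publish.
            published.add(doc)
        else:
            lookups += 1
            asked = peers_to_ask(requester)
            if not any(doc in holdings.get(p, set()) for p in asked):
                misses += 1
        # Assumption: the requester holds the document afterwards either way.
        holdings.setdefault(requester, set()).add(doc)
    return 100.0 * misses / lookups if lookups else 0.0


# Toy trace and fixed peer lists (both made up for illustration).
trace = [("p1", "A"), ("p2", "A"), ("p3", "A"), ("p2", "B"), ("p1", "B")]
lists = {"p1": ["p2"], "p2": ["p1"], "p3": []}
rate = miss_rate(trace, lambda peer: lists.get(peer, []))
```

In the toy trace, three requests count as lookups and one of them (from `p3`, whose list is empty) is a miss.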
[Figure: x-axis Simulation length (s), 0 to 12000]
Maintaining a small list of peers who share the same interests provides good hit rate
Implementation status
• Refining the algorithm by ranking peers in one's list to select peers that are more likely to have content
• Exploring alternative mechanisms to bootstrap peer lists
• Developing techniques for incorporating dynamic performance state into the algorithm
• Implementing our solution using Gnutella to bootstrap peer lists