Enabling Efficient Content Location and Retrieval in ...


Enabling Efficient Content Location and Retrieval in Peer-to-Peer Systems by Exploiting Locality in Interests
Kunwadee Sripanidkulchai, Bruce Maggs, Hui Zhang, Carnegie Mellon University

Challenges posed by peer-to-peer
• Want scalable and high-performance content location and peer selection. Existing solutions provide scalable location, but have not addressed peer selection.
• Retrieval performance between end-hosts is highly variable and dynamic.

[Figure: Gnutella overlay and content peer list overlay.]

May 1, 2001

[Figure: ping time (ms) versus time of day.] High variability (σ ≈ 1 sec) in ping times over a 24-hour period to a random end-host on the Internet (typical for 1/3 of 2400 end-hosts pinged in our experiments).

3) The list evolves as more content is retrieved and more peers are discovered

Content location
Queries for content are sent to peers in the list.


• Need to use up-to-date performance state to select a peer
• For scalability, cannot maintain up-to-date state for all peers
• Which peers should we maintain state for?
  - Peers that have locality in interests
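One way to keep performance state current for only a small set of peers is a smoothed round-trip-time estimate per peer. The sketch below is illustrative only: the poster does not say how performance state is represented, so the `PeerPerformance` class, the exponentially weighted average, and the `alpha` value are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class PeerPerformance:
    """Smoothed round-trip-time estimates for a small set of peers (hypothetical)."""

    alpha: float = 0.25  # weight given to each new sample (assumed value)
    rtt_ms: Dict[str, float] = field(default_factory=dict)

    def record(self, peer: str, sample_ms: float) -> None:
        """Fold a fresh RTT sample into the peer's running estimate."""
        previous = self.rtt_ms.get(peer)
        if previous is None:
            self.rtt_ms[peer] = sample_ms
        else:
            self.rtt_ms[peer] = (1 - self.alpha) * previous + self.alpha * sample_ms

    def best(self, candidates: Iterable[str]) -> Optional[str]:
        """Return the measured candidate with the lowest estimate, or None."""
        measured = [p for p in candidates if p in self.rtt_ms]
        return min(measured, key=self.rtt_ms.__getitem__) if measured else None


perf = PeerPerformance()
for peer, sample in [("p1", 80.0), ("p2", 40.0), ("p1", 40.0), ("p2", 1200.0)]:
    perf.record(peer, sample)

# p2 looked faster at first, but its latest sample pushes its estimate above p1's.
chosen = perf.best(["p1", "p2", "p3"])
```

State is kept only for peers that have been measured, so an unmeasured peer such as `p3` is never selected; the cost of tracking grows with the size of the list rather than with the size of the network.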

Locality in interests
Observation: people share common interests. Can we exploit this to improve content location and retrieval?

[Figure: a peer interested in content A, B, C, and other peers labeled with their content and the fraction of A, B, C they hold: A, B, C, D (3/3); A, C, D, E (2/3); D, E, F (0/3); F, G, H (0/3).]

Peer selection and content retrieval
Fine-grained dynamic performance state can be maintained for peers on the list.

Potential benefits and overhead
• Use Boeing corporate web proxy traces to drive the request stream for the simulations
• Treat a request for a new document (a compulsory cache miss for a web cache) as a publish in peer-to-peer system
• Ran simulations over a period of 5 minutes to 3 hours
• Content location algorithms are based on
  - Asking random peers
  - Asking peers with same interests (1 hop)
  - Asking peers of peers with same interests (2 hops)
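A trace-driven comparison of the three content location algorithms can be sketched as follows. This is a minimal sketch and not the authors' simulator: the `simulate` function, the `fanout` parameter for random probing, the fallback lookup used after a miss, and the toy `trace` are all assumptions made for illustration. The first request for a document is treated as a publish, as described above.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Request = Tuple[str, str]  # (peer id, document id)


def simulate(requests: List[Request], strategy: str, fanout: int = 2, seed: int = 0) -> float:
    """Return the miss rate (0..1) over all lookups; publishes are not counted."""
    rng = random.Random(seed)
    holders: Dict[str, Set[str]] = defaultdict(set)       # document -> peers storing it
    peer_list: Dict[str, List[str]] = defaultdict(list)   # peer -> peers with shared interests
    peers: Set[str] = set()                               # peers seen so far
    lookups = misses = 0

    for peer, doc in requests:
        peers.add(peer)
        if not holders[doc]:
            holders[doc].add(peer)  # first request for a document counts as a publish
            continue
        if peer in holders[doc]:
            continue  # already stored locally, so no lookup is needed
        lookups += 1
        if strategy == "random":
            others = sorted(peers - {peer})
            asked = set(rng.sample(others, min(fanout, len(others))))
        else:
            asked = set(peer_list[peer])  # 1 hop: peers with the same interests
            if strategy == "2hop":
                for neighbor in peer_list[peer]:
                    asked.update(peer_list[neighbor])  # 2 hops: peers of those peers
        found = asked & holders[doc]
        if found:
            source = sorted(found)[0]
        else:
            misses += 1
            source = sorted(holders[doc])[0]  # assumed fallback to some global lookup
        if source not in peer_list[peer]:
            peer_list[peer].append(source)  # heuristic: the provider shares our interests
        holders[doc].add(peer)
    return misses / lookups if lookups else 0.0


# Toy trace (hypothetical): p learns about n, and n already knows h.
trace = [("h", "d2"), ("n", "d2"), ("n", "d3"), ("p", "d3"), ("h", "d"), ("p", "d")]
one_hop = simulate(trace, "1hop")
two_hop = simulate(trace, "2hop")
```

On this toy trace the final request from `p` for `d` misses with 1 hop, because `n` does not hold `d`, but succeeds with 2 hops through `n`'s own list. A realistic comparison against random probing needs a much longer trace than this.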

[Figures: miss rate (%) and number of peers versus simulation length (s), comparing random, content-based 1 hop, and content-based 2 hops; legend entries include max, min, and average over 16 runs.]

Proposed solution

A distributed algorithm for peers to self-organize into clusters based on interests (peer list).

Why is it easier to incorporate dynamic performance state when using locality in interests to locate and retrieve content?
- Only need to keep performance state for peers that are likely to provide the content one is looking for.

Peer list
1) Each peer maintains a list of peers who share the same interests.
2) Peer lists are initially bootstrapped using existing protocols, such as Gnutella, Tapestry, Pastry, CAN, or Chord. We use the following heuristic: peers that have the content you are looking for have the same interests.
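The peer list behaviour can be sketched as a small data structure that is consulted first and falls back to an existing protocol. Everything named here is hypothetical: `InterestPeerList`, the `bootstrap` callable standing in for a protocol such as Gnutella, the `max_size` bound, and the move-to-front eviction rule are assumptions, since the poster does not specify how the list is bounded or ordered.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical stand-in for an existing protocol (Gnutella, Chord, ...):
# given a document id, return some peer that holds it, or None.
Bootstrap = Callable[[str], Optional[str]]


class InterestPeerList:
    def __init__(self, bootstrap: Bootstrap, max_size: int = 10) -> None:
        self.bootstrap = bootstrap
        self.max_size = max_size    # assumed bound that keeps the list small
        self.peers: List[str] = []  # most recently useful peer first

    def _remember(self, peer: str) -> None:
        """Move the peer to the front and trim the list to max_size."""
        if peer in self.peers:
            self.peers.remove(peer)
        self.peers.insert(0, peer)
        del self.peers[self.max_size:]  # assumed eviction: drop least recently useful

    def locate(self, doc: str, has_doc: Callable[[str, str], bool]) -> Optional[str]:
        """Ask peers on the list first, then fall back to the bootstrap protocol."""
        for peer in list(self.peers):
            if has_doc(peer, doc):
                self._remember(peer)
                return peer
        peer = self.bootstrap(doc)
        if peer is not None:
            self._remember(peer)  # heuristic: a peer with the content shares our interests
        return peer


# Toy network (hypothetical): which peer stores which documents.
store: Dict[str, Set[str]] = {"p1": {"a", "b"}, "p2": {"c"}}


def has_doc(peer: str, doc: str) -> bool:
    return doc in store.get(peer, set())


def flood(doc: str) -> Optional[str]:
    return next((p for p in sorted(store) if doc in store[p]), None)


plist = InterestPeerList(bootstrap=flood, max_size=1)
first = plist.locate("a", has_doc)   # found through the bootstrap protocol
second = plist.locate("b", has_doc)  # found through the peer list
third = plist.locate("c", has_doc)   # not on the list, so bootstrap again
```

With `max_size=1` the third lookup replaces `p1` with `p2`, which shows the list evolving as new content is retrieved.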

Locating content amongst peers with locality in interests results in low miss rates.

Maintaining a small list of peers who share the same interests provides good hit rate.

Implementation status
• Refining algorithm by ranking peers in one’s list to select peers that are more likely to have content
• Exploring alternative mechanisms to bootstrap peer lists
• Developing techniques for incorporating dynamic performance state into algorithm
• Implementing our solution using Gnutella to bootstrap peer lists