Enabling Efficient Content Location and Retrieval in Peer-to-Peer Systems by Exploiting Locality in Interests
Kunwadee Sripanidkulchai, Bruce Maggs, Hui Zhang, Carnegie Mellon University

Challenges posed by peer-to-peer
• Want scalable and high-performance content location and peer selection. Existing solutions provide scalable location, but have not addressed peer selection.
• Retrieval performance between end-hosts is highly variable and dynamic.

Figure: Peer list overlay (labels: Gnutella overlay, Peer list overlay, Content)

May 1, 2001

Figure: Ping Time (ms) versus Time (ticks from 18:00:00 to 12:00:00). High variability (σ ≈ 1 sec) in ping times over a 24-hour period to a random end-host on the Internet (typical for 1/3 of 2400 end-hosts pinged in our experiments)

3) The list evolves as more content is retrieved and more peers are discovered

Content location
Queries for content are sent to peers in the list

• Need to use up-to-date performance state to select a peer
• For scalability, cannot maintain up-to-date state for all peers
• Which peers should we maintain state for?
  - Peers that have locality in interests

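Locality in interests
Observation: people share common interests. Can we exploit this to improve content location and retrieval?

Figure: a peer holding A, B, C and four other peers, each labeled with its content and the fraction of A, B, C it shares: A, B, C, D (3/3); A, C, D, E (2/3); D, E, F (0/3); F, G, H (0/3)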

Fine-grained dynamic performance state can be maintained for peers on the list
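A minimal sketch of how per-peer performance state might be kept and used for peer selection. The poster does not specify a metric or an update rule, so the `PeerState` structure, the use of round-trip time, and the exponentially weighted moving average (with `alpha = 0.25`) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class PeerState:
    """Performance state kept for one peer on the list (hypothetical structure)."""
    content: Set[str] = field(default_factory=set)
    rtt_ms: Optional[float] = None  # smoothed round-trip estimate, None until measured

    def observe(self, sample_ms: float, alpha: float = 0.25) -> None:
        # EWMA smoothing is an assumed choice, not taken from the poster.
        if self.rtt_ms is None:
            self.rtt_ms = sample_ms
        else:
            self.rtt_ms = (1 - alpha) * self.rtt_ms + alpha * sample_ms


def select_peer(peer_list: Dict[str, PeerState], item: str) -> Optional[str]:
    """Return the listed peer that holds `item` and has the lowest smoothed RTT."""
    candidates = [
        (state.rtt_ms, name)
        for name, state in peer_list.items()
        if item in state.content and state.rtt_ms is not None
    ]
    return min(candidates)[1] if candidates else None
```

Because state is only kept for peers on the list, the cost of refreshing it grows with the list size rather than with the size of the whole system.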

Potential benefits and overhead
• Use Boeing corporate web proxy traces to drive the request stream for the simulations
• Treat a request for a new document (a compulsory cache miss for a web cache) as a publish in the peer-to-peer system
• Ran simulations over a period of 5 minutes to 3 hours
• Content location algorithms are based on
  - Asking random peers
  - Asking peers with same interests (1 hop)
  - Asking peers of peers with same interests (2 hops)
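The simulation setup above can be sketched roughly as follows. This is an illustrative toy, not the authors' simulator: the trace format, the function name `simulate`, the parameter `k` (number of random peers asked), and the rule that a requester learns one holder after every lookup are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Request = Tuple[str, str]  # (peer id, document id), an assumed trace format


def simulate(trace: Iterable[Request], strategy: str, k: int = 2, seed: int = 0) -> float:
    """Return the miss rate (misses / lookups) for one location strategy.

    strategy is "random", "one_hop", or "two_hop".
    """
    rng = random.Random(seed)
    store: Dict[str, Set[str]] = defaultdict(set)      # documents held by each peer
    friends: Dict[str, List[str]] = defaultdict(list)  # interest-based peer lists
    lookups = misses = 0
    for peer, doc in trace:
        if doc in store[peer]:
            continue  # already held locally, so no lookup is needed
        holders = sorted(p for p, docs in store.items() if doc in docs)
        if holders:  # someone has it: a lookup (otherwise it is a publish)
            lookups += 1
            if strategy == "random":
                others = sorted(p for p in store if p != peer)
                asked = set(rng.sample(others, min(k, len(others))))
            else:
                asked = set(friends[peer])
                if strategy == "two_hop":
                    for friend in friends[peer]:
                        asked.update(friends[friend])
            if not asked & set(holders):
                misses += 1
            # Heuristic: a peer that had the content shares our interests.
            # Assumes a fallback search reveals a holder even after a miss.
            if holders[0] not in friends[peer]:
                friends[peer].append(holders[0])
        store[peer].add(doc)  # requester now holds (or has published) the document
    return misses / lookups if lookups else 0.0
```

A usage example on a six-request toy trace, where the second hop finds a document that the first hop misses:

```python
trace = [("a", "x"), ("b", "x"), ("c", "z"), ("a", "z"), ("c", "y"), ("b", "y")]
print(simulate(trace, "one_hop"), simulate(trace, "two_hop"))
```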


Peer selection and content retrieval

Plot legend: random, content-based 1 hop, content-based 2 hops; min, max, average over 16 runs.

Proposed solution

A distributed algorithm for peers to self-organize into clusters based on interests (peer list)

Why is it easier to incorporate dynamic performance state when using locality in interests to locate and retrieve content?
- Only need to keep performance state for peers that are likely to provide the content one is looking for.

Peer list
1) Each peer maintains a list of peers who share the same interests
2) Peer lists are initially bootstrapped using existing protocols, such as Gnutella, Tapestry, Pastry, CAN, or Chord. We use the following heuristic: peers that have the content you are looking for have the same interests.
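One way the peer list described above could look in code, as a sketch under stated assumptions. The `PeerList` class, the `fallback` callable (a stand-in for a search over an existing protocol such as Gnutella), the `holdings` map used to simulate queries, and the `max_size` cap are hypothetical names introduced here and do not come from the poster.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical stand-in for an existing lookup protocol: given a document id,
# return the id of some peer that holds it, or None if nobody does.
Fallback = Callable[[str], Optional[str]]


class PeerList:
    def __init__(self, fallback: Fallback, max_size: int = 10) -> None:
        self.fallback = fallback
        self.max_size = max_size    # assumed cap on the list length
        self.peers: List[str] = []  # most recently useful peer first

    def _remember(self, peer: str) -> None:
        # Move the peer to the front and trim the list to max_size.
        if peer in self.peers:
            self.peers.remove(peer)
        self.peers.insert(0, peer)
        del self.peers[self.max_size:]

    def locate(self, doc: str, holdings: Dict[str, Set[str]]) -> Optional[str]:
        """Ask listed peers first; on a miss, fall back and learn the new peer."""
        for peer in self.peers:
            if doc in holdings.get(peer, set()):
                self._remember(peer)
                return peer
        # Bootstrap step: a peer that has the content is assumed to share interests.
        peer = self.fallback(doc)
        if peer is not None:
            self._remember(peer)
        return peer
```

Keeping the most recently useful peer at the front is one simple eviction policy; other orderings would fit the same interface.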

Figure: Miss rate (%) versus simulation length (s), 0 to 12000 s. Locating content amongst peers with locality in interests results in low miss rates

Figure: Number of peers versus simulation length (s), 0 to 12000 s. Maintaining a small list of peers who share the same interests provides good hit rate

Implementation status
• Refining algorithm by ranking peers in one’s list to select peers that are more likely to have content
• Exploring alternative mechanisms to bootstrap peer lists
• Developing techniques for incorporating dynamic performance state into algorithm
• Implementing our solution using Gnutella to bootstrap peer lists
