Atomic Broadcast Generic Broadcast

Latency-optimal fault-tolerant replication
Piotr Zieliński
Computer Laboratory, University of Cambridge

May 24, 2005


Hotel booking system

Protocol
1. client → server: “book room 5”
2. server → client: “room booked”

[Figure: the client sends “book room 5” to the server, and the server replies “room booked”.]


Fault tolerance by replication

Problem: a single server crash blocks the entire system.

Solution: introduce many servers; the system remains usable despite some servers being down.

[Figure: clients communicating with a single server, and with replicas A, B, C.]


Consistency problems

Problem: messages might reach the replicas in different orders. Replicas A and B book the room for one client, replica C books it for the other client. The result is unpredictable.

Solution: ensure that replicas receive requests in the same order by using Atomic Broadcast to disseminate requests.

[Figure: two client requests reaching replicas A, B, C in different orders.]

Outline

1. Atomic Broadcast
   - Chandra-Toueg algorithm
   - Latency considerations
2. Generic Broadcast
   - General approach
   - Infinitely many instances with finite resources


Atomic Broadcast


System model and problem specification

Atomic Broadcast
- clients atomically broadcast messages
- replicas atomically deliver them
- replicas atomically deliver all messages in the same order
- fault-tolerant

[Figure: clients atomically broadcast messages into Atomic Broadcast, which atomically delivers them to replicas A, B, C.]

Our goal

Goal: an Atomic Broadcast algorithm that is as fast as possible when no failures occur.

System assumptions
- processes communicate by sending and receiving messages
- no time bounds for messages, no clocks
- processes can fail by crashing, no malicious faults
- less than a half/third of the servers can crash
- unreliable leader oracle Ω
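The two crash bounds in the assumptions can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name `max_crashes` is hypothetical, and the slide does not say which bound applies to which algorithm, so both are simply computed.

```python
def max_crashes(n: int, fraction: int) -> int:
    """Largest f such that f < n / fraction (strictly fewer than n/fraction crashes)."""
    if n < 1 or fraction < 1:
        raise ValueError("n and fraction must be positive")
    # f < n / fraction  <=>  f * fraction < n  <=>  f <= (n - 1) // fraction
    return (n - 1) // fraction


# For each number of servers, print the tolerated crashes under the
# "less than a half" bound and under the "less than a third" bound.
for n in (3, 4, 5, 7):
    print(n, max_crashes(n, 2), max_crashes(n, 3))
```

With three servers, for example, one crash is tolerated under the "half" bound and none under the "third" bound.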


Consensus

[Figure: processes A, B, C propose values x_A, x_B, x_C to Consensus (for example 1, 3 and 6), and all decide on the same value x (for example 3).]

Validity: Every decision was proposed by some process.
Agreement: No two processes decide differently.
Termination: All correct processes eventually decide.

Proposal and decision events can occur at different times.
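The two safety properties can be checked mechanically on a finished run. The following is a minimal sketch, not part of the talk: `check_consensus` is a hypothetical helper, and Termination is left out because a liveness property cannot be verified on a finite trace.

```python
from typing import Dict, Hashable


def check_consensus(proposals: Dict[str, Hashable],
                    decisions: Dict[str, Hashable]) -> Dict[str, bool]:
    """Check Validity and Agreement for one finished Consensus run."""
    proposed = set(proposals.values())
    decided = set(decisions.values())
    return {
        # Validity: every decided value was proposed by some process.
        "validity": decided <= proposed,
        # Agreement: at most one distinct value was decided.
        "agreement": len(decided) <= 1,
    }


# The example from the slide: A, B, C propose 1, 3, 6 and all decide 3.
result = check_consensus({"A": 1, "B": 3, "C": 6}, {"A": 3, "B": 3, "C": 3})
print(result)
```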


Implementing Consensus

[Figure: A, B, C propose 1, 3, 6; the leader's proposal is sent to all servers, a confirmation round follows, and all servers decide on the same value.]

Comments
- step 1: servers learn the proposal of the leader
- step 2: majority confirmation round, for fault tolerance
- if no decision is reached, repeat with a new leader
- the leader might not be unique
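The two steps in the comments can be illustrated with a toy, fully synchronous simulation. This is a sketch under strong assumptions and is not the Consensus algorithm used in the talk: crashes are fixed before the round starts, the leader is given explicitly instead of coming from the oracle Ω, and the value locking that a real algorithm needs across rounds is omitted. The names `leader_round` and `decide` are hypothetical.

```python
from typing import Dict, List, Optional, Set


def leader_round(proposals: Dict[str, int], leader: str,
                 crashed: Set[str]) -> Optional[int]:
    """Run one round; return the decided value, or None if no decision."""
    n = len(proposals)
    if leader in crashed:
        return None  # step 1 never happens; the caller retries with a new leader
    # Step 1: every live server learns the proposal of the leader.
    value = proposals[leader]
    live = [p for p in proposals if p not in crashed]
    # Step 2: live servers confirm; deciding needs a majority of all n servers.
    confirmations = len(live)
    return value if confirmations > n // 2 else None


def decide(proposals: Dict[str, int], leaders: List[str],
           crashed: Set[str]) -> Optional[int]:
    """Try successive leaders until some round decides."""
    for leader in leaders:
        value = leader_round(proposals, leader, crashed)
        if value is not None:
            return value
    return None


print(decide({"A": 1, "B": 3, "C": 6}, ["B"], set()))
```

If the first leader has crashed, the loop in `decide` moves on to the next candidate, which mirrors "if no decision, repeat with a new leader".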


Chandra-Toueg Atomic Broadcast algorithm

Chandra-Toueg algorithm
- uses a sequence of Consensus instances Cons_1, Cons_2, ...
- in each instance Cons_i, replicas
  1. propose the first i messages
  2. decide on some set {m_1, ..., m_i}
  3. atomically deliver m_1, ..., m_i
- no message is delivered twice
- instances Cons_i can run in parallel

[Figure: replicas A, B, C propose a set of messages to Cons_1 and decide on a set.]


Chandra-Toueg algorithm
Example

[Figure: replicas A, B, C run Cons_1 and then Cons_2; in each instance every replica proposes a set of messages and all decide on one set.]

Comments
- Cons_1: all replicas propose the set containing the first message, decide on it, and deliver that message
- Cons_2: all replicas propose the set containing both messages, decide on it, and deliver the second message
- all replicas deliver the first message before the second (even if failures occur)

Chandra-Toueg algorithm
Code

    when a client atomically broadcasts m do
        broadcast m to all replicas

    task proposing at every replica is
        for k = 1, 2, ... do
            wait for some message m_k
            propose M_k = {m_1, ..., m_k} to Consensus instance k

    task delivery at every replica is
        for k = 1, 2, ... do
            wait until Consensus instance k decides on some M_k
            atomically deliver all undelivered messages in M_k in order
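The reduction in the pseudocode above can be sketched in a few lines of Python. This is an illustrative simulation, not the author's implementation: Consensus is replaced by a stub (`pick_rotating`, hypothetical) that returns one of the proposals, all replicas are assumed to receive every message, and "in order" is taken to mean sorted order, which is one possible deterministic choice.

```python
from typing import Callable, Dict, FrozenSet, List

Proposals = Dict[str, FrozenSet[str]]


def atomic_broadcast(received: Dict[str, List[str]],
                     consensus: Callable[[int, Proposals], FrozenSet[str]]
                     ) -> Dict[str, List[str]]:
    """Return the delivery sequence of every replica."""
    rounds = min(len(msgs) for msgs in received.values())
    delivered: Dict[str, List[str]] = {r: [] for r in received}
    for k in range(1, rounds + 1):
        # Task "proposing": replica r proposes the first k messages it received.
        proposals = {r: frozenset(msgs[:k]) for r, msgs in received.items()}
        decision = consensus(k, proposals)  # the same M_k at every replica
        # Task "delivery": deliver undelivered messages of M_k in a fixed order.
        for r in received:
            for m in sorted(decision):
                if m not in delivered[r]:
                    delivered[r].append(m)
    return delivered


def pick_rotating(k: int, proposals: Proposals) -> FrozenSet[str]:
    """Stub Consensus: instance k adopts the proposal of the k-th replica."""
    names = sorted(proposals)
    return proposals[names[(k - 1) % len(names)]]


received = {
    "A": ["m2", "m1", "m3"],
    "B": ["m1", "m2", "m3"],
    "C": ["m1", "m3", "m2"],
}
out = atomic_broadcast(received, pick_rotating)
print(out)
```

Although the three replicas receive the messages in different orders, they deliver the same sequence, because every delivery step depends only on the decided sets.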


Chandra-Toueg algorithm

[Figure: replicas A, B, C receive two messages; each proposes a one-message set to Cons_1 and the two-message set to Cons_2; then all decide and all deliver.]

Comments
- one of the two messages is delayed on its way to A
- replicas start instances of Consensus at different times
- Cons_1: A proposes the set containing one message, B and C propose the set containing the other
- Cons_2: all replicas propose the set of both messages and decide on it (the order in which the elements are listed does not matter)
- if Cons_1 decides on the set containing one message, then replicas deliver that message followed by the other

Latency

Latency: the number of communication steps from atomically broadcasting a message to its atomic delivery.

Direct Broadcast: 1 step
Chandra-Toueg: 1 step + Consensus
- with a 2-step Consensus: 3 steps
- with a 1-step Consensus: 2 steps

[Figure: a client message reaching replicas A, B, C directly, and via one step followed by Consensus instance Cons_k.]

Latency of Consensus

Two kinds of Consensus algorithms
- 1-3 Consensus: 1 step if all proposals are the same, otherwise 3 steps
- 2-2 Consensus: 2 steps in all cases

Replicas receive client messages   1-3 Consensus                          2-2 Consensus
in the same order                  1 step (Atomic Broadcast: 2 steps)     2 steps (Atomic Broadcast: 3 steps)
in different orders                3 steps (Atomic Broadcast: 4 steps)    2 steps (Atomic Broadcast: 3 steps)

[Figure: message diagrams for processes A, B, C, D under both algorithms.]

1-2 Consensus

Properties
1. decides in one step if the proposals are the same
2. decides in two steps otherwise

Resulting latency
- replicas receive client messages in the same order: 1 step Consensus, 2 steps Atomic Broadcast
- replicas receive client messages in different orders: 2 steps Consensus, 3 steps Atomic Broadcast

[Figure: message diagram for processes A, B, C, D.]

1-2 Consensus

Implementation
We use three parallel Consensus instances:
1. instance 1 of 1-3 Consensus (to decide in one step)
2. instance 2 of 2-2 Consensus (to decide in two steps)
3. instance L of 1-3 Consensus (to agree on the leader)

[Figure: the proposals of A, B, C, D to Instance 1, Instance 2 and Instance L.]

1-2 Consensus
Code

    when propose(x) by process p do
        broadcast(x, p)
        propose_1(x)
        propose_2(x, p)
        propose_L(leader)

    task decide at process p is
        wait until decide_L(leader)
        wait until one of the conditions is true and decide on x
            condition 1: decide_1(x) and receive(x, leader)
            condition 2: decide_2(x, leader)
            condition 3: decide_1(x) and decide_2(y, q) with q ≠ leader
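The decision task above is a pure function of what a process has observed so far, which makes it easy to sketch. The function below is a hypothetical illustration, not the author's code: it assumes `None` means "not decided yet" (so `None` cannot be a proposal value) and that the three underlying instances are provided elsewhere.

```python
from typing import Hashable, Optional, Set, Tuple


def decide_1_2(leader: Optional[str],
               decided1: Optional[Hashable],
               decided2: Optional[Tuple[Hashable, str]],
               received: Set[Tuple[Hashable, str]]) -> Optional[Hashable]:
    """Return the 1-2 Consensus decision, or None if the process must keep waiting.

    leader   -- outcome of instance L, or None
    decided1 -- outcome x of instance 1, or None
    decided2 -- outcome (y, q) of instance 2, or None
    received -- set of (value, sender) broadcasts received so far
    """
    if leader is None:
        return None  # first wait until decide_L(leader)
    # Condition 1: decide_1(x) and receive(x, leader)
    if decided1 is not None and (decided1, leader) in received:
        return decided1
    if decided2 is not None:
        y, q = decided2
        # Condition 2: decide_2(x, leader)
        if q == leader:
            return y
        # Condition 3: decide_1(x) and decide_2(y, q) with q != leader
        if decided1 is not None:
            return decided1
    return None


print(decide_1_2("A", "x", ("y", "B"), set()))
```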


Condition 1
decide_L(leader) and decide_1(x) and receive(x, leader)
Example with leader = A: decide_L(A) and decide_1(x) and receive(x, A)

Condition 2
decide_L(leader) and decide_2(x, leader)
Example with leader = A: decide_L(A) and decide_2(x, A)

Condition 3
decide_L(leader) and decide_1(x) and decide_2(y, q) with q ≠ leader
Example with q = B and leader = A: decide_L(A) and decide_1(x) and decide_2(y, B) with B ≠ A

[Figure: for each condition, the proposals of the processes to Instance 1, Instance 2 and Instance L.]

Related work

Consensus                         identical proposals   different proposals
Schiper [1997], Lamport [2001]    2 steps               2 steps
Brasileiro et al. [2001]          1 step                3 steps
This work                         1 step                2 steps

Atomic Broadcast                  same order            different orders
Chandra and Toueg [1996]          3 steps               3 steps
Pedone and Schiper [1998]         2 steps               4 steps
This work                         2 steps               3 steps

The latencies of our algorithms are provably optimal.
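The Atomic Broadcast figures follow the rule from the latency slide, "1 step + Consensus". The short sketch below just replays that arithmetic; the dictionary keys and the function name are hypothetical labels, and matching the rows of the two tables is an observation about the numbers, not a claim about which Consensus algorithm each Atomic Broadcast paper uses.

```python
from typing import Dict, Tuple

# (steps with identical proposals, steps with different proposals)
CONSENSUS_STEPS: Dict[str, Tuple[int, int]] = {
    "2-2": (2, 2),
    "1-3": (1, 3),
    "1-2": (1, 2),
}


def atomic_broadcast_steps(consensus_steps: Tuple[int, int]) -> Tuple[int, int]:
    """One step to reach the replicas, then one Consensus instance."""
    same, different = consensus_steps
    return (same + 1, different + 1)


for name, steps in CONSENSUS_STEPS.items():
    print(name, atomic_broadcast_steps(steps))
```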


Generic Broadcast


Generic Broadcast

r: read x
w: write to x

[Figure: reads and writes sent to replicas A, B, C, first through Atomic Broadcast and then through Generic Broadcast, with the resulting delivery sequences compared.]

Observations
- ordering all messages is expensive (Atomic Broadcast)
- not all messages have to be ordered (Generic Broadcast)

Generic Broadcast

Meta-solution
1. Define the conflict relation. Only conflicting messages must be delivered in the same order.
2. Determine the partial order of conflicting messages.
3. Deliver messages in any total order consistent with that partial order.

[Figure: reads (r) and writes (w) illustrating the three steps.]
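The requirement in step 1 can be expressed as a check on two delivery sequences. The sketch below is illustrative: the conflict relation `conflicts_rw` is an assumption (two operations on x conflict unless both are reads), messages are encoded as strings starting with "r" or "w", and both sequences are assumed to contain the same messages.

```python
from itertools import combinations
from typing import Callable, Sequence


def conflicts_rw(a: str, b: str) -> bool:
    """Assumed conflict relation: operations on x conflict unless both are reads."""
    return not (a.startswith("r") and b.startswith("r"))


def consistent(seq1: Sequence[str], seq2: Sequence[str],
               conflict: Callable[[str, str], bool]) -> bool:
    """True if every pair of conflicting messages has the same order in both sequences."""
    position2 = {m: i for i, m in enumerate(seq2)}
    # combinations() yields (a, b) with a delivered before b in seq1.
    for a, b in combinations(seq1, 2):
        if conflict(a, b) and position2[a] > position2[b]:
            return False
    return True


print(consistent(["r1", "r2", "w1"], ["r2", "r1", "w1"], conflicts_rw))
```

Swapping the two reads is allowed, while swapping a read with a write would be rejected.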


Example

[Figure: graph of conflicting messages m1, ..., m6 and their precedence order, from which two delivery orders are built up step by step.]

Delivery rule: deliver a message when all undelivered conflicting messages are its successors.

order 1: m2, m3, m1, m4, m5, m6
order 2: m1, m2, m3, m4, m5, m6
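The delivery rule can be sketched as a small loop. The graph drawn on the slide does not survive in the text, so the `before` relation below is a hypothetical one, chosen only so that the two preference lists reproduce order 1 and order 2; the function name `delivery_order` and the use of a per-replica preference list for tie-breaking are likewise assumptions.

```python
from typing import List, Set, Tuple


def delivery_order(preference: List[str],
                   before: Set[Tuple[str, str]]) -> List[str]:
    """Deliver messages, preferring earlier entries of `preference`.

    `before` holds pairs (a, b): a and b conflict and a must precede b.
    """
    undelivered = list(preference)
    delivered: List[str] = []
    while undelivered:
        for m in undelivered:
            # m is blocked while some undelivered conflicting message precedes it.
            blocked = any((other, m) in before for other in undelivered)
            if not blocked:
                delivered.append(m)
                undelivered.remove(m)
                break
        else:
            raise ValueError("cycle among undelivered conflicting messages")
    return delivered


# Hypothetical precedence among conflicting messages (not taken from the slide).
before = {("m2", "m3"), ("m1", "m4"), ("m3", "m4"), ("m4", "m5"), ("m5", "m6")}
order1 = delivery_order(["m2", "m3", "m1", "m4", "m5", "m6"], before)
order2 = delivery_order(["m1", "m2", "m3", "m4", "m5", "m6"], before)
print(order1)
print(order2)
```

The two replicas end up with different total orders, but every conflicting pair appears in the same relative order in both.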


Problems

1. Different processes perceive different orders
   - use a separate Consensus instance for each pair of messages, all executed in parallel
2. Cycles
   - if no failures occur, the leader dictates the order, so there are no cycles
   - if failures occur, a (slow) cycle-resolution algorithm is employed
3. The graph contains all possible messages
   - infinitely many parallel instances of Consensus
   - most of them identical, only finitely many different
   - implementable with finite resources

[Figure: small precedence graphs over messages m1, ..., m6 illustrating each problem.]


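Determining the order of messages

When a message is sent
- replicas propose that it should precede a conflicting message
- Consensus decides which of the two messages should precede the other
- the message is delivered

This is done in parallel for all possible messages conflicting with it.

[Figure: the message reaches replicas A, B, C in 1 step, and Consensus then takes 1 or 2 steps.]

Latency: two steps if all conflicting messages arrive at the replicas in the same order, and three otherwise.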


Related work

Generic Broadcast             same order   no conflicts   no failures
Chandra and Toueg [1996]      3 steps      3 steps        3 steps
Pedone and Schiper [1998]     2 steps      4 steps        4 steps
Pedone and Schiper [1999]     4 steps      2 steps        4 steps
Aguilera et al. [2000]        4 steps      2 steps        4 steps
This work                     2 steps      2 steps        3 steps

The latency of our algorithm is provably optimal.

Summary of the talk

New algorithms
1. 1-2 Consensus
2. Fast Atomic Broadcast
3. Fast Generic Broadcast

Remarks
- messages delivered in 2 or 3 steps
- provably optimal

Infinitely many instances with finite resources


A single instance

[Figure: processes A, B, C, D, showing the state of A, the state of B, the state of C, and the state that D has collected.]

Comments
- A and B propose one value, and C proposes another
- D only collects information and does not propose
- consider the state of D as more and more messages arrive

Many instances at the same time

A proposes one value in instances 1–8 and another in instances 9–15
B proposes one value in instances 1–4 and another in instances 5–15
C proposes one value in instances 1–12 and another in instances 13–15

[Figure: the states of A, B, C and D across instances 1 to 15.]

Grouping

Before
Instances I_1, ..., I_15 are executed independently.

After
Instances I_k with the same state are simulated by the same interval instance I_{a,b} = { I_k : a < k ≤ b }.

A: I_{0,9}, I_{9,15}     (boundaries 0, 9, 15)
B: I_{0,3}, I_{3,15}     (boundaries 0, 3, 15)
C: I_{0,12}, I_{12,15}   (boundaries 0, 12, 15)


How it works

Algorithm
When a new message arrives:
1. split some instances, cloning the state
2. pass the message to the appropriate instances

Example: interval instances kept at D as messages from A, B and C arrive
initially:      I_{0,15}
after a split:  I_{0,9}, I_{9,15}
after a split:  I_{0,3}, I_{3,9}, I_{9,15}
after a split:  I_{0,3}, I_{3,9}, I_{9,12}, I_{12,15}

Remarks
finitely many (4) actual instances
created dynamically
generalizations:
    replacing 15 with ∞
    integers with reals
    intervals with other sets
    other algorithms
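The two steps of the algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation from the talk: the class name `IntervalInstances`, the use of a dictionary as the per-instance state, and the placeholder values "x" and "y" are hypothetical. Only the split-then-pass structure and the interval notation I_{a,b} = { I_k : a < k ≤ b } come from the slides.

```python
from bisect import bisect_left
from copy import deepcopy


class IntervalInstances:
    """Simulates instances I_1..I_n, keeping one state per interval I_{a,b}."""

    def __init__(self, n):
        self.bounds = [0, n]   # sorted interval boundaries
        self.states = [{}]     # states[i] belongs to (bounds[i], bounds[i+1]]

    def _split(self, at):
        """Make `at` a boundary, cloning the state of the interval it cuts."""
        i = bisect_left(self.bounds, at)
        if self.bounds[i] != at:
            self.bounds.insert(i, at)
            self.states.insert(i, deepcopy(self.states[i - 1]))

    def receive(self, sender, value, a, b):
        """Record that `sender` proposes `value` in instances a < k <= b."""
        if not 0 <= a < b <= self.bounds[-1]:
            raise ValueError("interval out of range")
        # Step 1: split some instances, cloning the state.
        self._split(a)
        self._split(b)
        # Step 2: pass the message to the appropriate instances.
        lo, hi = self.bounds.index(a), self.bounds.index(b)
        for i in range(lo, hi):
            self.states[i][sender] = value

    def intervals(self):
        return list(zip(self.bounds, self.bounds[1:]))


# Hypothetical run with the boundaries used in the example (9, 3 and 12).
d = IntervalInstances(15)
for sender, value, a, b in [("A", "x", 0, 9), ("A", "y", 9, 15),
                            ("B", "x", 0, 3), ("B", "y", 3, 15),
                            ("C", "x", 0, 12), ("C", "y", 12, 15)]:
    d.receive(sender, value, a, b)
print(d.intervals())  # [(0, 3), (3, 9), (9, 12), (12, 15)]
```

The number of stored states grows with the number of distinct boundaries seen so far, not with the number of simulated instances, which is why the same idea still works when 15 is replaced by an unbounded range.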


Summary of the talk

New algorithms
1. 1-2 Consensus
2. Fast Atomic Broadcast
3. Fast Generic Broadcast
4. Infinitely many parallel instances

Remarks
messages delivered in 2 or 3 steps
provably optimal

[Figures: recap of diagrams from the talk, including message exchanges among A, B, C and D, a Consensus instance with propose and decide, and the interval instances I_{0,3}, I_{3,9}, I_{9,12}, I_{12,15}.]


References

M. K. Aguilera, C. Delporte-Gallet, H. Fauconnier, and S. Toueg. Thrifty generic broadcast. Lecture Notes in Computer Science, 1914:268–282, 2000.

F. Brasileiro, F. Greve, A. Mostefaoui, and M. Raynal. Consensus in one communication step. Lecture Notes in Computer Science, 2127:42–50, 2001.

Tushar Deepak Chandra and Sam Toueg. Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 43(2):225–267, 1996.

Leslie Lamport. Paxos made simple. ACM SIGACT News, 32(4):18–25, December 2001.

Fernando Pedone and André Schiper. Optimistic atomic broadcast. In Proceedings of the 12th International Symposium on Distributed Computing, September 1998.

Fernando Pedone and André Schiper. Generic broadcast. In Proceedings of the Thirteenth International Symposium on Distributed Computing (DISC'99, formerly WDAG), 1999.

André Schiper. Early Consensus in an asynchronous system with a weak failure detector. Distributed Computing, 10(3):149–157, April 1997.


Atomic Broadcast

Validity: For any message m, every learner delivers m at most once, and only if m was broadcast by a proposer.

Agreement: If some learner delivers message m′ after message m, then every learner delivers m′ only after it has delivered m.

Termination (Validity): If a correct proposer broadcasts a message, then all correct learners will eventually deliver it.

Termination (Agreement): If a learner delivers a message, then all correct learners will eventually deliver it.
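The two safety properties can be checked on recorded delivery logs. The sketch below is illustrative only: the function names `check_validity` and `check_agreement` and the representation of a log as a Python list are assumptions, not part of the talk. The Termination properties speak about what eventually happens, so they cannot be decided from a finite log and are not checked here.

```python
def check_validity(broadcast, log):
    """Every message is delivered at most once, and only if it was broadcast."""
    return len(log) == len(set(log)) and set(log) <= set(broadcast)


def check_agreement(log_a, log_b):
    """If one learner delivers m' after m, the other delivers m' only after m.

    The condition is checked in both directions for two learners' logs.
    """
    def one_way(p, q):
        pos_q = {m: i for i, m in enumerate(q)}
        for i, m in enumerate(p):
            for later in p[i + 1:]:
                # q delivered `later` without having delivered m before it.
                if later in pos_q and (m not in pos_q or pos_q[m] > pos_q[later]):
                    return False
        return True

    return one_way(log_a, log_b) and one_way(log_b, log_a)


# A learner that lags behind is fine; a learner that reorders is not.
print(check_agreement(["m1", "m2"], ["m1"]))        # True
print(check_agreement(["m1", "m2"], ["m2", "m1"]))  # False
```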


Sequencer

Sequencer-based algorithm
clients broadcast messages to the main replica
the main replica assigns sequence numbers to them and broadcasts them to the other replicas
replicas deliver messages in order
if the main replica fails, another takes over

[Figure: a message tagged with sequence number k is sent to replicas A, B and C.]

Sequencer ensures the same order

Example
main replica A assigns 1 to the first message and 2 to the second
replicas A and C deliver both messages straight away
replica B waits with delivering message 2 until it has delivered message 1
all replicas deliver the two messages in the same order

[Figure: timelines of A, B and C; B receives message 2 before message 1.]

When the sequencer fails

Examples

Case 1: no failures
A assigns 1 to the first message and 2 to the second
all replicas deliver the first message followed by the second

Case 2: Sequencer fails, no message loss
A assigns 1 to the first message, and fails
B takes over and assigns 2 to the second message
all replicas deliver the first message before (possibly) the second

Case 3: Sequencer fails, message loss occurs
A crashes and its messages to the others are lost
B does not know about the first message; it assigns 1 to the second
A delivers the first message; replicas B and C deliver the second

Case 4: Sequencer is just very slow
A is correct but very slow
B thinks A crashed; it assigns 1 to the second message
A delivers the first message first; replicas B and C deliver the second message first
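Case 4 can be replayed with a few lines of Python. This is a rough sketch under assumptions that the slides do not state: the helper name `deliver_in_order` is hypothetical, messages are called "m" and "m2" for illustration, and a replica that sees two messages with the same sequence number is assumed to keep the first one it received.

```python
def deliver_in_order(received):
    """Deliver (message, seq) pairs in sequence-number order, starting at 1.

    Assumption: if two pairs carry the same number, the first one received wins.
    """
    by_seq = {}
    for message, seq in received:
        by_seq.setdefault(seq, message)
    log, k = [], 1
    while k in by_seq:          # stop at the first missing sequence number
        log.append(by_seq[k])
        k += 1
    return log


# A is slow but correct and numbers its message 1.
# B wrongly suspects A and also starts numbering from 1.
from_a = ("m", 1)
from_b = ("m2", 1)

log_a = deliver_in_order([from_a, from_b])  # A sees its own assignment first
log_b = deliver_in_order([from_b, from_a])  # B and C see B's assignment first
log_c = deliver_in_order([from_b, from_a])

print(log_a, log_b, log_c)  # ['m'] ['m2'] ['m2']
```

The replicas disagree on which message comes first, which is the situation the sequencer-based approach cannot rule out when a slow sequencer is mistaken for a crashed one.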

Sequencer-based algorithm

when a client executes abcast(m) do
    send m to the main replica a1

task sequencer at the main replica is
    for k = 1, 2, . . . do
        wait for a message m
        broadcast (m, k) to all replicas    { including itself }

task delivery at any replica is
    for k = 1, 2, . . . do
        wait for message (m, k)
        abdeliver(m)
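The sequencer and delivery tasks can be sketched in Python as below. This is a single-process illustration with no networking and no failure handling; the class names `Sequencer` and `Replica` and the `pending` buffer are assumptions introduced for the example, while the numbering from k = 1 and the in-order delivery follow the pseudocode.

```python
class Sequencer:
    """Main replica's sequencer task: tags each message with the next number."""

    def __init__(self):
        self.next_k = 1

    def assign(self, m):
        pair = (m, self.next_k)
        self.next_k += 1
        return pair


class Replica:
    """Delivery task: delivers (m, k) only once all smaller k were delivered."""

    def __init__(self):
        self.pending = {}     # sequence number -> message not yet delivered
        self.next_k = 1       # next sequence number to deliver
        self.delivered = []   # messages in delivery order

    def receive(self, m, k):
        self.pending[k] = m
        while self.next_k in self.pending:
            self.delivered.append(self.pending.pop(self.next_k))
            self.next_k += 1


seq = Sequencer()
first = seq.assign("book room 5")
second = seq.assign("book room 7")

a, b = Replica(), Replica()
a.receive(*first)
a.receive(*second)
b.receive(*second)   # arrives early and is held back
b.receive(*first)    # now both can be delivered

print(a.delivered == b.delivered)  # True
```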

Chandra-Toueg

when a client atomically broadcasts m do
    broadcast m to all replicas

task proposing at every replica is
    for k = 1, 2, . . . do
        wait for some message m_k
        propose M_k = {m_1, . . . , m_k} to Consensus instance k

task delivery at every replica is
    for k = 1, 2, . . . do
        wait until Consensus instance k decides on some M_k
        atomically deliver all undelivered messages in M_k in order
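The structure of the pseudocode above can be sketched in Python. This is a simplified illustration, not a fault-tolerant implementation: `FirstProposalConsensus` is a hypothetical stand-in that decides the first proposal it sees, the proposing and delivery tasks are merged into one `on_message` step, messages are assumed distinct, and "in order" is taken to mean sorted order, which is an assumption made for the example.

```python
class FirstProposalConsensus:
    """Stand-in for one Consensus instance: the first proposal is decided."""

    def __init__(self):
        self.decision = None

    def propose(self, value):
        if self.decision is None:
            self.decision = value
        return self.decision


class Replica:
    def __init__(self, instances):
        self.instances = instances   # list of Consensus instances shared by all replicas
        self.received = []           # m_1, m_2, ... in local arrival order
        self.delivered = []          # messages in delivery order

    def on_message(self, m):
        # Proposing task: after the k-th message, propose M_k = {m_1, ..., m_k}.
        self.received.append(m)
        k = len(self.received)
        while len(self.instances) < k:
            self.instances.append(FirstProposalConsensus())
        decided = self.instances[k - 1].propose(frozenset(self.received))
        # Delivery task: deliver the undelivered messages of the decided M_k
        # in a deterministic order (sorted order is assumed here).
        for x in sorted(decided):
            if x not in self.delivered:
                self.delivered.append(x)


shared = []
a, b = Replica(shared), Replica(shared)
a.on_message("m1")   # A receives m1 first
b.on_message("m2")   # B receives m2 first, but instance 1 already decided {m1}
a.on_message("m2")
b.on_message("m1")

print(a.delivered, b.delivered)  # ['m1', 'm2'] ['m1', 'm2']
```

Although the two replicas receive the messages in different orders, both deliver according to the sequence of decided sets, so their delivery orders coincide.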
