Automatic Verification and Discovery of Byzantine Consensus Protocols
Piotr Zieliński
Cavendish Laboratory, University of Cambridge, UK
[email protected]

Abstract. Model-checking of asynchronous distributed protocols is challenging because of the large size of the state and solution spaces. This paper tackles this problem in the context of low-latency Byzantine Consensus protocols. It reduces the state space by focusing on the latency-determining first round only, ignoring the order of messages in this round, and distinguishing between state-modifying actions and state-preserving predicates. In addition, the monotonicity of the predicates and of the verified properties allows one to use a Tarski-style fixpoint algorithm, which results in an exponential verification speed-up. This model checker has been applied to scan the space of possible Consensus algorithms in order to discover new ones. The search automatically discovered not only many familiar patterns but also several interesting improvements to known algorithms. Due to its speed and reliability, automatic protocol design is an attractive paradigm, especially in the notoriously difficult Byzantine case.

1. Introduction

In the Consensus problem, a fixed group of processes communicating through an asynchronous network cooperate to reach a common decision. Each of the processes proposes a value, say a number, and then they all try to agree on one of the proposals. Despite its apparent simplicity, Consensus is universal: it can be used to implement any sequential object in a distributed and fault-tolerant way [11].

Consensus is surprisingly difficult to solve in a fault-tolerant manner, so that even if some processes fail, the others still reach an agreement. This is especially true in the presence of malicious participants: protocols operating in such settings are extremely subtle and complicated, and the proofs of their correctness are lengthy. Moreover, slight changes in the requirements often require a complete redesign of the algorithm. A more efficient approach is necessary. This paper proposes such an approach: automatic verification and discovery through model-checking.

In automatic verification, the user provides a collection of decision rules such as "if all four processes report to have proposed x, then decide on x". These rules are given to the model-checker, which tests the correctness of the implied Consensus algorithm. In automatic discovery, the user specifies a set of latency conditions such as "if at most one non-leader process fails, then all correct processes must decide within two communication steps." A model-checker is then used to check all possible Consensus algorithms that satisfy those conditions, and to output the correct ones, if any.

The first challenge is choosing a good model: it should be general enough to express all "sensible" Consensus algorithms, but sufficiently specific to avoid an intractably large state space. Several such models have been used in the literature [26, 28]; however, none of them allows for Byzantine algorithms, which benefit from automated verification most.

Secondly, any sufficiently expressive model permits infinitely many correct Consensus algorithms, so we need a criterion for selecting the "best" ones. This paper focuses on the number of communication steps necessary to decide in typical runs. This criterion allows us to concentrate on the latency-determining first round only, thereby reducing the size of the state space. Nevertheless, we still need to ensure that processes always have enough information to decide even if the first round fails. The formalization of the required properties is provided by the Optimistically Terminating Consensus abstraction [30]. This abstraction can handle malicious participants and is flexible enough to match the latencies of all known asynchronous Consensus protocols in a single framework.

Depending on the chosen model, the search space can be small enough for a limited number of verifications; however, performing millions of verifications, one per candidate protocol, requires a different approach.
The method presented in this paper uses the monotonicity of Consensus properties. Instead of checking every possible state, the algorithm computes the “minimal possibly violating state” and checks whether that state actually violates the required properties. The resulting exponential speed-up makes automatic discovery practical.

    Processes   Correctness   Honesty     Behaviour
    P \ F       correct       honest      according to the specification
    F \ M       faulty        honest      according to the specification until it crashes
    M           faulty        malicious   completely arbitrary

Figure 1. Categories of processes

Roadmap. Section 2 provides a short summary of the Optimistically Terminating Consensus (OTC) framework, with an emphasis on the concepts relevant to automated verification. The OTC properties are used in Section 3 to construct an execution model and present an algorithm to verify the correctness of OTC protocols. Section 4 uses automated correctness testing to search the space of possible protocols in order to discover new ones. Section 5 presents the results. Section 6 concludes the paper.

1.1. Related work

A large amount of work has been done on asynchronous Consensus protocols; see [23] for a survey. A number of algorithms have been proposed, both for the crash-stop model [4, 16, 19, 24] and for Byzantine settings [3]. Automatic reasoning about protocols is common in security protocol research [2, 5, 17, 18, 20]. In the area of agreement protocols, Paxos [13] and its variants [7, 8, 15] seem to have undergone the most significant amount of formal analysis [12, 14, 22, 27, 28]. Other work on formal verification and/or model checking of Consensus algorithms can be found in [21, 26]. All those methods are restricted to crash failures and to verification only.

Bar-David and Taubenfeld [1] used a combination of model checking and program generation to automatically discover new mutual exclusion algorithms. Apart from their work, I am not aware of any previous attempt at automatic discovery of fault-tolerant distributed algorithms.

Figure 2. A run of Byzantine Consensus [3]

2. Introduction to OTC

2.1. System model

Processes. This paper assumes a system consisting of a fixed number n of processes. In the set of all processes P = {p1, ..., pn}, some processes F ⊆ P are faulty, and some M ⊆ F are malicious (Figure 1). Processes do not know the sets F and M; however, they do know the sets 𝓕 and 𝓜 of possible values of F and M. Note that the standard model of at most f faulty processes, out of which at most m are malicious, is a special case:

    𝓕 = { F ⊆ P | |F| ≤ f },    𝓜 = { M ⊆ P | |M| ≤ m }.

Channels. Processes communicate through asynchronous reliable channels: messages sent from one correct process to another correct process will eventually be received (reliability), but the message delay is unbounded (asynchrony). These assumptions are sufficient to implement the first round of Consensus; to construct the entire protocol and ensure its liveness, failure detectors or eventual synchrony assumptions are needed, but these are protocol-independent and do not affect the latency in typical runs [30].

2.2. Consensus

In Consensus, processes propose values and are expected to eventually agree on one of them. The following properties must hold:

Validity. The decision was proposed by some process.
Agreement. No two processes decide differently.
Termination. All correct processes eventually decide.
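For the threshold special case, the families 𝓕 and 𝓜 can be enumerated explicitly. A minimal Python sketch (the helper name `threshold_model` and the variable names are mine, not the paper's):

```python
from itertools import combinations

def threshold_model(processes, f, m):
    """Enumerate the families F and M of the threshold model:
    all failure sets of size <= f and all malicious sets of size <= m."""
    def subsets(bound):
        return [frozenset(c)
                for size in range(bound + 1)
                for c in combinations(sorted(processes), size)]
    return subsets(f), subsets(m)

P = {"p1", "p2", "p3", "p4"}
F_sets, M_sets = threshold_model(P, 1, 1)
assert len(F_sets) == 5   # the empty set plus four singletons
```

For n = 4 and f = m = 1 this gives the five sets ∅, {p1}, {p2}, {p3}, {p4} used in the examples later in the paper.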

In the Byzantine model, these requirements apply only to honest (non-malicious) processes. Since malicious processes can undetectably lie about their proposals, I also assume that Validity must be satisfied only if all processes in a particular run are honest.

In the most popular approach to solving Consensus, a distinguished process, called the leader or coordinator, tries to impose its proposal on the others. This succeeds if sufficiently many processes accept the coordinator's proposal. Otherwise, another process becomes the coordinator and repeats the protocol. Coordinators keep changing until one of them succeeds and makes all correct processes decide.

As an example, Figure 2 shows a single round of the Consensus protocol from [3], in a four-process system with one process possibly maliciously faulty. The coordinator p1 broadcasts its proposal. In the second step, processes rebroadcast the proposal received, to protect against a malicious coordinator broadcasting different proposals. In the third step, they broadcast again, this time to ensure the recoverability of the decision in case of failures, and decide. The first round can succeed only if p1 is correct. Therefore, if no decision has been made after a while, the next round is started with another coordinator, and so on, until all correct processes decide.

2.3. Optimistically Terminating Consensus

Optimistically Terminating Consensus (OTC) [30] is a formalization of the grey cloud in Figure 2. Various Consensus protocols can be constructed by changing the implementation of the OTC in one or more rounds, and keeping the rest of the algorithm intact. In fact, it is possible to match the latencies of all known asynchronous Consensus algorithms just by manipulating the first-round OTC [30]. This is why this paper focuses on constructing the first-round OTC only.

Treated as a black box, OTC communicates with the environment through predicates and actions (Figure 3). Actions can change a process' state (e.g., send messages) but do not return any information. On the other hand, predicates do not affect the state but return information. In a sense, the difference between predicates and actions is similar to that between reads and writes.

    Name         Type       Meaning                                       Definition in the OT(P, P, 1) example
    propose(x)   action     propose x                                     broadcast x if no stop before
    stop         action     stop processing                               broadcast ⊥ if no propose before
    decision(x)  predicate  if true, then x is the decision               x received from all 4 processes
    valid(x)     predicate  if true, then an honest process proposed x    x received from more than 1 process
    possible(x)  predicate  if any process ever decides on x, then true   non-x received from at most 1 process

Figure 3. Summary of the primitives provided by OTC

OTC-based Consensus algorithms [30] proceed in the same way as that in Figure 2. When a process receives the coordinator's proposal x, it executes the OTC action propose(x). Each process monitors its decision(y) predicate for all y; when it becomes true, y is the decision.

If the coordinator fails, some or all processes might never decide. When processes suspect this is the case, they execute the action stop of the first OTC, and start the second round with another coordinator. Since some processes might have decided in the first round, the second coordinator must choose its proposal with care. To this end, it monitors the first-round OTC predicates possible(x) and valid(x) for all x (Figure 3). It waits until either (i) possible(x) is false for all x, in which case it proposes its original proposal, or (ii) possible(x) holds for exactly one x and valid(x) holds, in which case it proposes x. In a sense, valid(x) and possible(x) ensure the recoverability of the decision in case of failures. Subsequent rounds proceed in a similar way; digital signatures are needed only if the first round does not decide [30].

Figure 4. OTC properties graphically (Permanent Validity and Permanent Agreement, together with Integrity and Possibility, are stronger than the standard Validity and Agreement properties of Consensus)
2.4. OTC properties

As summarized in Figure 3, OTC equips each process with two actions, propose(x) and stop, as well as three predicates: valid(x), possible(x), and decision(x). These primitives satisfy the following properties [30]:

Integrity. If valid(x) holds at an honest process, then an honest process executed propose(x).
Possibility. If decision(x) holds at an honest process, then possible(x) holds at all honest processes, at all times.
Permanent Validity. The statement possible(x) ⇒ valid(x) holds at any complete process (see below).
Permanent Agreement. The predicate possible(x) holds for at most one x at any complete process.
Optimistic Termination (X, C, k). If all processes in the set X propose x, all processes in the set C are correct, and none of them executes stop, then decision(x) will hold at all correct processes within k communication steps. The sets satisfy X ⊆ C ⊆ P = {p1, ..., pn}.

    when a process executes propose(x) do
        broadcast ⟨x : 1⟩
    when a process received ⟨x : i⟩ from all p ∈ XCi do
        broadcast ⟨x : i + 1⟩
    predicate decision(x) holds iff
        received ⟨x : i⟩ from all p ∈ XCi, for i = 1, ..., k

Figure 5. Predicate decision(x) for OT(X, C, k)
Properties Integrity and Possibility just formalize the definitions of valid(x) and possible(x) from Figure 3. In Permanent Validity and Permanent Agreement, an honest process p is complete if all correct processes executed stop, and p received all messages sent by those processes before or during executing stop.¹ These properties ensure that the second and later coordinators will not block waiting to establish a safe proposal. Permanent Validity and Permanent Agreement are stronger than the standard (Uniform) Validity and Agreement required by Consensus (Figure 4) [30].

Because of a more general failure model, our Optimistic Termination (OT) property is more general than that in [30]. It ensures that, under some favourable conditions described by the sets X and C, OTC will decide in k steps. In favourable runs, in which the decision is taken in the first round, the latency of Consensus is k+1: the latency k of OTC plus one step for the coordinator's proposal to reach the processes (Figure 2).

Example. Consider a system P = {p1, p2, p3, p4} with at most one pi (maliciously) faulty, that is, 𝓕 = 𝓜 = {∅, {p1}, {p2}, {p3}, {p4}}, and an OTC algorithm, similar to that in Figure 2, which decides in one step if all processes are correct, and in two steps otherwise. Such an algorithm satisfies five OT conditions: (P \ {pi}, P \ {pi}, 2) for i = 1, 2, 3, 4, and (P, P, 1).

Although each OT condition (X, C, k) is a requirement, it easily translates into an implementation of the predicate decision(x). For example, for the five OT conditions above, decision(x) must hold if (i) there is a three-process set C such that all processes in C report that all processes in C report proposing x, or (ii) all processes report proposing x. Figure 5 shows a simple algorithm defining decision(x) that satisfies a given OT(X, C, k), but not necessarily the other OTC properties. There, XCi means X for i = 1, and C otherwise. The five OT conditions above give rise to five corresponding message patterns.

1 Note that process p does not know which processes are correct, so it does not know whether it is complete or not.

Manual verification. This section shows an example of a manual verification of an OTC protocol satisfying a single OT condition (P, P, 1). This condition requires an OTC to decide if all processes are correct and propose the same value. General OTC constructions and proofs are in [30].

The predicate valid(x) ensures that at least one honest process proposed x. Since we assume at most one malicious process (and OTC properties must hold even if no decision is made), we can define valid(x) to hold when two or more processes report proposing x (Figure 3).

Condition OT(P, P, 1) implies that decision(x) holds if x was received from all four processes. At most one process is malicious, so if more than one process reports proposing a non-x or executing stop, the value x can never become a decision. Thus, possible(x) holds if at most one process reports proposing a non-x or executing stop.

Having constructed all three predicates, we can now check Permanent Validity and Permanent Agreement. A complete process has received all messages from at least three processes. If possible(x) holds, then at most one of these is a non-x, so at least two are x, which implies valid(x) (Permanent Validity). Those two x messages are non-y for any y ≠ x, so possible(y) cannot hold (Permanent Agreement).
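The manual argument above can also be replayed mechanically. The following brute-force sketch (the encodings of values and reports are mine) checks both properties of the OT(P, P, 1) construction over every complete view, i.e., every combination of reports from at least three of the four processes:

```python
from itertools import product

VALUES = ("a", "b")           # two candidate proposals
REPORTS = ("a", "b", "stop")  # what each process may report

def valid(reports, x):        # two or more processes report proposing x
    return sum(r == x for r in reports) >= 2

def possible(reports, x):     # at most one report is a non-x or a stop
    return sum(r != x for r in reports) <= 1

# A complete process has heard from at least three of the four processes.
for n_heard in (3, 4):
    for reports in product(REPORTS, repeat=n_heard):
        # Permanent Validity: possible(x) implies valid(x)
        assert all(valid(reports, x) for x in VALUES if possible(reports, x))
        # Permanent Agreement: possible holds for at most one value
        assert sum(possible(reports, x) for x in VALUES) <= 1
```

The loop exhausts all 27 + 81 report vectors, so it is a toy instance of the exhaustive search that Section 3 performs symbolically.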

3. Automated verification

For any two predicates p(x) and q(x), let "p(x) ≤ q(x)" be defined as "p(x) ⟹ q(x)", that is, as the standard arithmetic "≤" with TRUE = 1 and FALSE = 0. In this ordering, the predicate decision(x) is increasing in time (a decision cannot be unmade), whereas possible(x) is decreasing in time (a once impossible decision cannot become possible again).

With this formalism, the manual verification process can be described as follows. First, use the Validity condition to define the maximum possible predicate valid(x). Then, use OT to define the minimum decision(x), and use it together with Possibility to get the minimum possible(x):

    Validity                       ⟹  max valid(x)
    Optimistic Termination         ⟹  min decision(x)
    Possibility + min decision(x)  ⟹  min possible(x)

Having defined the predicates, we can check whether Permanent Validity and Permanent Agreement hold. Note that these two properties, viewed as Boolean formulae, are increasing w.r.t. valid(x) and decreasing w.r.t. possible(x). Therefore, if they do not hold for the maximum valid(x) and the minimum possible(x), they cannot hold at all.

To save space, the rest of this paper is devoted to verifying Permanent Agreement, which generally implies Permanent Validity [30]. For testing Permanent Validity, see [29].
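In Python, for instance, `False < True`, so the TRUE = 1, FALSE = 0 convention turns implication into an ordinary comparison:

```python
# With TRUE = 1 and FALSE = 0, "p <= q" on booleans is exactly "p implies q".
def implies(p, q):
    return p <= q

assert implies(False, False) and implies(False, True) and implies(True, True)
assert not implies(True, False)
```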

     1  state ← ∅                                   { the empty set }
     2  when a process executes propose(x) do
     3      incorporate ⟨x : ε⟩ into the state       { ε: the empty sequence }
     4  when a process receives ⟨x : q1...qi−1⟩ from qi do
     5      incorporate ⟨x : q1...qi⟩ into the state
     6  function incorporate ⟨x : q1...qi⟩ into the state is
     7      if ⟨y : q1...qi⟩ ∉ state for any y then
     8          insert ⟨x : q1...qi⟩ into state
     9          broadcast ⟨x : q1...qi⟩
    10  when a process executes stop do
    11      for all sequences q1...qi do             { including ε }
    12          incorporate ⟨⊥ : q1...qi⟩ into the state

Figure 6. Evolution of states
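The evolution of states in Figure 6 can be sketched in Python as follows; the class and method names are mine, events ⟨x : q1...qi⟩ are encoded as pairs `(x, seq)` with `seq` a tuple of process names, and "BOT" stands in for ⊥:

```python
from itertools import product

class Process:
    """A sketch of the Figure 6 state evolution."""
    def __init__(self, name):
        self.name, self.state, self.outbox = name, set(), []

    def incorporate(self, x, seq):
        # line 7: ignore the event if some <y : seq> is already in the state
        if not any(s == seq for (_, s) in self.state):
            self.state.add((x, seq))
            self.outbox.append((x, seq))   # broadcast <x : seq> to the others

    def propose(self, x):
        self.incorporate(x, ())            # <x : eps>

    def receive(self, x, seq, sender):
        self.incorporate(x, seq + (sender,))

    def stop(self, max_len, processes):
        # lines 10-12: fill every unused sequence with the bottom symbol
        for i in range(max_len + 1):
            for seq in product(processes, repeat=i):
                self.incorporate("BOT", seq)

p = Process("p1")
p.propose("v")                 # state now contains <v : eps>
p.receive("v", (), "p2")       # p2 claims it proposed v
p.receive("w", (), "p2")       # conflicting claim: ignored by the line-7 test
assert ("v", ("p2",)) in p.state and ("w", ("p2",)) not in p.state
```

After `stop`, every still-unused sequence is occupied by the ⊥ event, so the line-7 test can never succeed again, which freezes the state as the text describes.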

3.1. Events and states

I employ a full-information approach, in which each process' state represents its full knowledge about the system. When a process executes an action (propose(x) or stop) or receives a message, this event is added to the process' state and broadcast to the others. At any given time, the state of a process is the sequence of events experienced so far.

Consider runs in which no stop is performed. I assume that, in such runs, the order of events does not matter, and the state of a process is actually a set of events, not a sequence. If the order of events at some process p mattered and p failed, then the other processes could never learn about that order, rendering the OTC properties impossible to satisfy.

If the order of events does not matter, then the only effect of experiencing them is adding them to the state set and informing other processes about them. For example, executing propose(x) broadcasts ⟨x : ε⟩, where ε is the empty sequence (Figure 6). When a process receives ⟨x : ε⟩ from some process q1, it adds ⟨x : q1⟩ to its state and broadcasts ⟨x : q1⟩.² In general, receiving ⟨x : q1...qi−1⟩ from qi requires broadcasting ⟨x : q1...qi⟩. The event ⟨x : q1...qi⟩ means "qi claims that qi−1 claims that ... q1 claims to have proposed x". I say "claims" because some of the processes are malicious and can lie. This algorithm (Figure 6) can be viewed as a more refined version of that in Figure 5.

The test in line 7 deserves an explanation. No honest process proposes two different values, so if we receive ⟨x : ε⟩ and ⟨y : ε⟩ from the same process q1, we can ignore the latter. This means that honest processes never send ⟨x : q1⟩ with two different x's; if we receive two such events, then the sender is malicious and its messages can be ignored. By induction, we need to pay attention only to the first message of the form ⟨x : q1...qi⟩ for any given sequence q1...qi.

The purpose of the stop action is to put the process into a final, unchangeable state [30]. Lines 10–12 accomplish this by filling all the "unused" sequences in the state with the special symbol ⊥. After this operation, the test in line 7 can never succeed again.

² In this paper, pi is the name of the i-th process in P = {p1, ..., pn}, and qi is the i-th process in a given sequence of processes.

3.2. State formalism

A state is an arbitrary set of events, not necessarily received by the same process, and possibly conflicting (e.g., {⟨1 : p1p2⟩, ⟨2 : p1p2⟩}). Each event has the form ⟨x : α⟩, where α = q1...qi is a sequence of processes. For any set A of sequences, ⟨x : A⟩ denotes the state consisting of all events ⟨x : α⟩ with α ∈ A:

    ⟨x : A⟩ = { ⟨x : α⟩ | α ∈ A }.    (1)

For example, ⟨2 : {p1p2, p2}⟩ = {⟨2 : p1p2⟩, ⟨2 : p2⟩}. The opposite operation, extracting the set of sequences corresponding to a given proposal x, is accomplished by the operator

    S(x) = { α | ⟨x : α⟩ ∈ S }.    (2)

For example, if S = {⟨1 : p1⟩, ⟨2 : p2⟩, ⟨1 : p1p2⟩}, then S(1) = {p1, p1p2} and S(2) = {p2}.
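The two operators (1) and (2) are straightforward to express in Python; below, sequences are tuples and events are pairs (the function names `events` and `sequences` are mine):

```python
def events(x, seqs):
    """<x : A> from (1): all events <x : alpha> with alpha in A."""
    return {(x, a) for a in seqs}

def sequences(S, x):
    """S(x) from (2): the sequences alpha such that <x : alpha> is in S."""
    return {a for (y, a) in S if y == x}

# The example from the text, with sequences encoded as tuples:
S = {(1, ("p1",)), (2, ("p2",)), (1, ("p1", "p2"))}
assert sequences(S, 1) == {("p1",), ("p1", "p2")}
assert sequences(S, 2) == {("p2",)}
```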

3.3. Minimum decision(x) and possible(x)

Figure 5 defined a predicate decision(x) for a single OT condition (X, C, k). Using the model from this section, the minimum decision(x) predicate holds at a process iff it has experienced all the events ⟨x : D⟩ (see (1)), where

    D = { q1...qi | q1 ∈ X; q2, ..., qi ∈ C; 1 ≤ i ≤ k }.    (3)

For example, for OT({p1}, {p1, p2}, 3), we have D = {p1, p1p1, p1p2, p1p1p1, p1p1p2, p1p2p1, p1p2p2}. Denoting by 𝒟 the set of D's corresponding to all required OTs, the minimum predicate decision(x) in state S is

    decision(x)  ⟺  S(x) ⊇ D for some D ∈ 𝒟.    (4)

The minimum possible(x) holds if the possibility of decision(x) holding somewhere in the system is consistent with the process' knowledge. In other words, possible(x) means that we can add ⟨x : D⟩ to the process state and still get a consistent state. I will now present a formalism that allows us to express this requirement formally.
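Definitions (3) and (4) translate directly into code. A sketch (function names mine, sequences as tuples):

```python
from itertools import product

def decision_seqs(X, C, k):
    """D from (3): sequences q1...qi with q1 in X, q2..qi in C, 1 <= i <= k."""
    return {(q1,) + tail
            for i in range(1, k + 1)
            for q1 in X
            for tail in product(sorted(C), repeat=i - 1)}

def decision(S_x, Ds):
    """Minimum decision(x) from (4): S(x) contains some complete D in the family Ds."""
    return any(D <= S_x for D in Ds)

# The example from the text: OT({p1}, {p1, p2}, 3) yields seven sequences.
D = decision_seqs({"p1"}, {"p1", "p2"}, 3)
assert len(D) == 7 and ("p1", "p2", "p2") in D
```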

3.4. State consistency

The notion of state consistency is based on two concepts: event conflict and event inference.

Conflict. Events ⟨x : α⟩ and ⟨y : β⟩ conflict if they have different proposal values and the same sequence of processes (x ≠ y and α = β). For example, ⟨1 : p1p3⟩ and ⟨2 : p1p3⟩ conflict, whereas ⟨1 : p1p3⟩ and ⟨2 : p1p2⟩ do not. Only malicious processes produce conflicting events. For any state S, let conflict(S) be the set of sequences α for which some events ⟨z : α⟩ ∈ S conflict:

    conflict(S) = { α | ∃ x ≠ y : ⟨x : α⟩ ∈ S ∧ ⟨y : α⟩ ∈ S }.

For example, conflict({⟨1 : p1p2⟩, ⟨2 : p2⟩, ⟨2 : p1p2⟩}) = {p1p2}.

Prefixes and inference. I will define an operator infer(S, M), which takes a state S and a set of malicious processes M, and outputs the set of events whose occurrence can be inferred from S. The basic idea is that if an honest process claims that an event e has occurred, then e has indeed occurred. For example, if qi ∉ M, then ⟨x : q1...qi⟩ implies ⟨x : q1...qi−1⟩. If in addition qi−1 ∉ M, then ⟨x : q1...qi−2⟩ follows as well, and so on. In general, for a singleton state S = {⟨x : q1...qi⟩},

    infer(S, M) = ⟨x : prefs(q1...qi, M)⟩.

Here, the operator prefs(q1...qi, M) produces the set of sequences that can be obtained by removing a suffix of honest processes from the end of q1...qi:

    prefs(q1...qi, M) = { q1...qj | qj+1, ..., qi ∉ M }.

The definition of prefs extends to sets of sequences A in the obvious way:

    prefs(A, M) = ⋃_{α ∈ A} prefs(α, M).

Since α ∈ prefs(α, M), we have A ⊆ prefs(A, M). Using (2), we can define infer(S, M) = Ŝ, where

    Ŝ(x) = prefs(S(x), M) for any x ≠ ⊥.    (5)

The case x = ⊥ requires special treatment. The state-propagation algorithm in Figure 6 shows that the event ⟨⊥ : q1...qi⟩ can occur for two reasons: either because process qi claims that the event ⟨⊥ : q1...qi−1⟩ occurred, or because of qi's own stop action. Since the latter is always a possibility, the occurrence of ⟨⊥ : q1...qi⟩ does not imply the occurrence of ⟨⊥ : q1...qi−1⟩ anywhere in the system, even if qi is honest. Nothing can be inferred here, so Ŝ(⊥) = S(⊥), which completes (5).

Special sets. For any set Q of processes, define

    αQ = { q1...qi | qi ∈ Q } ∪ {ε},
    ᾱQ = { q1...qi | qi ∉ Q }.

Set αQ contains all process sequences ending with a process in Q, plus the empty sequence ε; set ᾱQ contains all the others.

Consistency. A state S is consistent only if all conflicting inferred events come from malicious processes:

    conflict(infer(S, M)) ⊆ αM.    (6)

Since ⟨x : ε⟩ ≡ propose(x), the reason for ε ∈ αM is that different processes can propose different values.
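The prefs and conflict operators can be sketched as follows (function names mine); the tests replay the examples from the text:

```python
def prefs(seqs, M):
    """prefs(A, M): prefixes obtained by removing a suffix of honest
    (non-M) processes from a sequence in A; includes the sequence itself."""
    out = set()
    for seq in seqs:
        j = len(seq)
        out.add(seq)
        while j > 0 and seq[j - 1] not in M:
            j -= 1
            out.add(seq[:j])
    return out

def conflict(S):
    """conflict(S): sequences alpha with <x:alpha>, <y:alpha> in S, x != y."""
    return {a for (x, a) in S for (y, b) in S if a == b and x != y}

# With M = {}, every prefix (down to the empty sequence) can be inferred:
assert prefs({("p1", "p2")}, set()) == {(), ("p1",), ("p1", "p2")}
# A malicious tail cannot be peeled off:
assert prefs({("p1", "p2")}, {"p2"}) == {("p1", "p2")}
# The conflict example from the text:
assert conflict({(1, ("p1", "p2")), (2, ("p2",)), (2, ("p1", "p2"))}) == {("p1", "p2")}
```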

3.5. Verifying Permanent Agreement

Permanent Agreement requires that, in any complete state, the predicate possible(z) holds for at most one z. This section presents an algorithm that checks whether a given OTC algorithm violates this property. It searches for a complete state in which possible(z) holds for two different z ∈ {x, y}. More precisely, we are looking for sets F ∈ 𝓕 and M ∈ 𝓜, with M ⊆ F, and a state S such that:

A1: State S can occur, that is, conflicting events come only from malicious processes (6):

    conflict(infer(S, M)) ⊆ αM.

A2: State S is complete, that is, the process has received all the events produced by the correct processes (∉ F) in lines 10–12 of Figure 6 and before:

    ᾱF ⊆ ⋃ₓ S(x)    (the union ranging over all x, including ⊥).

A3: The predicate possible(z) holds for two different z ∈ {x, y}. In other words, for each z ∈ {x, y}, the decision events ⟨z : Dz⟩ (3) are all consistent with state S. Formally, there are Dz ∈ 𝒟 (4) and a set of malicious processes Mz ∈ 𝓜 such that the combined state S ∪ ⟨z : Dz⟩ is still consistent (6):

    conflict(infer(S ∪ ⟨z : Dz⟩, Mz)) ⊆ αMz.

Note that the sets Mz (that is, Mx and My) are only possible sets of malicious processes, and can differ from the real M, which is unknown to the processes. In practice, I consider only sequences q1...qi no longer than the highest k in any required OT condition (X, C, k).

Without loss of generality, assume that the state S consists only of events of the form ⟨x : α⟩, ⟨y : α⟩, and ⟨⊥ : α⟩: any event ⟨u : α⟩ ∈ S with u ∉ {x, y, ⊥} can be replaced by ⟨⊥ : α⟩ without invalidating any of A1–A3, because replacing u with ⊥ only means we can infer less (5). For this reason, I assume S = ⟨x : Sx⟩ ∪ ⟨y : Sy⟩ ∪ ⟨⊥ : S⊥⟩ for some pairwise disjoint sets of sequences Sx, Sy, S⊥ (1). Given this assumption, Properties A1–A3 can be rewritten as follows.

A1: State S is consistent:

    prefs(Sx, M) ∩ prefs(Sy, M) ⊆ αM,    (a)
    prefs(Sx, M) ∩ S⊥ ⊆ αM,
    prefs(Sy, M) ∩ S⊥ ⊆ αM.

Since A ⊆ prefs(A, M) for all sets of sequences A, the first inequality implies prefs(Sx, M) ∩ Sy ⊆ αM and prefs(Sy, M) ∩ Sx ⊆ αM. Therefore, we can rewrite the last two inequalities as

    prefs(Sx, M) ∩ (Sy ∪ S⊥) ⊆ αM,    (b)
    prefs(Sy, M) ∩ (Sx ∪ S⊥) ⊆ αM.    (c)

This transformation is needed for (7) below.

A2: State S is complete: ᾱF ⊆ Sx ∪ Sy ∪ S⊥.

A3: The predicate possible(z) holds for z ∈ {x, y}. Defining x̄ = y and ȳ = x, and applying the same transformations as in Property A1, we get, for each z ∈ {x, y}:

    prefs(Sx, Mz) ∩ prefs(Sy, Mz) ⊆ αMz,    (a)
    prefs(Sx, Mz) ∩ (Sy ∪ S⊥) ⊆ αMz,    (b)
    prefs(Sy, Mz) ∩ (Sx ∪ S⊥) ⊆ αMz,    (c)
    prefs(Dz, Mz) ∩ prefs(Sz̄, Mz) ⊆ αMz,    (d)
    prefs(Dz, Mz) ∩ (Sz̄ ∪ S⊥) ⊆ αMz.    (e)

Property A2 is increasing with respect to Sx, Sy, S⊥; all the other properties are decreasing. For this reason, we can assume that ᾱF = Sx ∪ Sy ∪ S⊥, which makes Property A2 automatically satisfied. Then, since the sets Sx, Sy, S⊥ are pairwise disjoint, we have Sz̄ ∪ S⊥ = ᾱF \ Sz, so for any set A:

    A ∩ (Sz̄ ∪ S⊥) ⊆ αM  ⟺  (A ∩ ᾱF ∩ ᾱM) \ Sz = ∅  ⟺  A ∩ ᾱF ∩ ᾱM ⊆ Sz.

Thus, Properties A1(b,c) and A3(b,c,e) can be rewritten as

    prefs(Sz, M) ∩ ᾱF ∩ ᾱM ⊆ Sz,
    prefs(Sz, Mx) ∩ ᾱF ∩ ᾱMx ⊆ Sz,
    prefs(Sz, My) ∩ ᾱF ∩ ᾱMy ⊆ Sz,        (7)
    prefs(Dz, Mz) ∩ ᾱF ∩ ᾱMz ⊆ Sz.

The left-hand side of each of these inequalities is an increasing function of Sz. As a result, we can compute the smallest set Sz that satisfies these inequalities using Tarski's least-fixpoint algorithm (see below). It is then sufficient to check Properties A1(a) and A3(a,d) for the computed Sx and Sy, that is, whether

    prefs(Sx, M) ∩ prefs(Sy, M) ⊆ αM,
    prefs(Sx, Mx) ∩ prefs(Sy, Mx) ⊆ αMx,
    prefs(Sx, My) ∩ prefs(Sy, My) ⊆ αMy,    (8)
    prefs(Dz, Mz) ∩ prefs(Sz̄, Mz) ⊆ αMz    for z ∈ {x, y}.

    function PermanentAgreement(OTs) is
        for all Dx, Dy corresponding to the OTs (3) do
            for all F ∈ 𝓕 and M, Mx, My ∈ 𝓜 do
                if M ⊆ F then
                    compute the least fixpoints Sx and Sy of (7)
                    if the computed Sx and Sy satisfy (8) then
                        return FALSE
        return TRUE

Figure 7. Testing Permanent Agreement

If this is the case, then we have found a state S for which Permanent Agreement does not hold. If not, then the above statement is false for all supersets of Sx and Sy, because the function prefs is increasing. Testing all possible tuples (Dx, Dy, F, M, Mx, My) therefore ensures that Permanent Agreement is never violated (Figure 7).

Computing Sz as the least fixpoint. Inequalities (7) can be rewritten as φ(Sz) ⊆ Sz, where

    φ(Sz) = (prefs(Dz, Mz) ∩ ᾱF ∩ ᾱMz)
          ∪ (prefs(Sz, M) ∩ ᾱF ∩ ᾱM)
          ∪ (prefs(Sz, Mx) ∩ ᾱF ∩ ᾱMx)
          ∪ (prefs(Sz, My) ∩ ᾱF ∩ ᾱMy).

The function φ is increasing, which allows us to use Tarski's method [25] to find the smallest Sz such that φ(Sz) ⊆ Sz. This method constructs an increasing sequence Sz⁰ ⊆ Sz¹ ⊆ ··· defined by Sz⁰ = ∅ and Szⁱ⁺¹ = φ(Szⁱ). The first Szⁱ with Szⁱ = Szⁱ⁺¹ = φ(Szⁱ) is the least fixpoint of φ.

In the sequence Sz⁰ ⊂ ··· ⊂ Szⁱ, each set has at least one element more than its predecessor, so the number i of iterations does not exceed the maximum size of Sz, that is, the number of possible sequences q1...qi with i ≤ k. This number, 1 + n + ··· + nᵏ, is much smaller than the number of all states S = ⟨x : Sx⟩ ∪ ⟨y : Sy⟩ ∪ ⟨⊥ : S⊥⟩, which is of the order of 3^(1+n+···+nᵏ). Therefore, exploiting monotonicity results in an exponential speed-up of the search process.
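Tarski's iteration can be sketched generically. The toy operator below is illustrative only: its constant seed {2} stands in for the prefs(Dz, Mz) term of (7), and the i → i + 1 closure stands in for the prefs(Sz, ·) terms:

```python
def least_fixpoint(phi):
    """Tarski-style iteration for a monotone phi on finite subsets: the chain
    emptyset, phi(emptyset), phi(phi(emptyset)), ... is increasing, and its
    limit is the least fixpoint of phi (the smallest S with phi(S) <= S)."""
    S = frozenset()
    while True:
        T = frozenset(phi(S))
        if T == S:
            return S
        S = T

# Toy monotone operator on subsets of {0..5}: a constant seed {2},
# closed under i -> i + 1, capped at 5.
phi = lambda S: {2} | {min(i + 1, 5) for i in S}
assert least_fixpoint(phi) == frozenset({2, 3, 4, 5})
```

Each iteration adds at least one element until the fixpoint is reached, which is exactly the bound on the number of iterations argued above.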

4. Discovering new protocols Section 3 showed how we can construct an OTC algorithm satisfying given OT conditions, and test its correctness. This section goes a step further and attempts to discover new algorithms by generating possible sets T of OT conditions and testing whether they can be satisfied. The search starts with the empty T and recursively adds new OT conditions, while testing for correctness. Note that once we reach an incorrect T , we can safely backtrack because

OTC propose(x)

function OTCSearch(T ) is if PermValidity(T ) and PermAgreement(T ) then output T for all possible OT conditions T = (X, C, k) do if T is greater (“>OT ”) than all elements of T and T does not dominate any element of T and T is not dominated by any element of T then OTCSearch(T ∪ {T })

1 2 3 4 5 6 7 8

OTC decide(x)

p1 p2 p3 p4

OTC

Figure 9. Direct first-round OTC

Figure 8. Discovering new OTC protocols adding new OT conditions to an incorrect OTC algorithm T cannot produce a correct one. The details are shown in Figure 8. Lines 5–7 implement two optimization techniques described below: a linear order and a domination relation. OT order. The order of the OT conditions in T does not matter. Therefore, adding OTs to T in a specific order will result in the same set of conditions being analyzed several times; an n-element set can be obtained in n! different orders, slowing the algorithm down exponentially. To ensure that each set T is analysed only once, Line 5 guarantees that elements are added to T in some arbitrary but fixed total order “
Not all correct OTC algorithms are listed; I omit those that can be obtained from others by permuting the set of processes, and those that are dominated by others. (A set T is dominated by T 0 iff every OT condition in T is dominated by some in T 0 .) All considered OT (X, C, k) have k ≤ 3. In a normal round, processes wait for the coordinator’s proposal to propose to OTC (Figure 2). However, in the first round, processes can propose their proposals to the OTC directly (Figure 9) [30]. This incorporates the first step into the OTC, allowing for multi-coordinator algorithms and justifying the distinction between X and C in OT (X, C, k). This also implies that, as in the Validity property of Consensus, the valid(x) predicate of the first-round OTC can assume all processes to be honest. Therefore, as far as the first round is concerned, valid(x) becomes true once a process receives one event hx : εi, making Permanent Validity easily satisfied. All OTCs presented in section are firstround OTCs to which processes propose directly; complete Consensus algorithms can be constructed as shown in [30]. Crash-stop 3 processes with 1 failure. We have:

If this is the case, then the OTC algorithms corresponding to the sets 𝒯1 = {T1} and 𝒯12 = {T1, T2} are the same. Lines 6–7 prevent us from analysing sets 𝒯 that contain a pair of conditions such that one dominates the other.
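The enumeration with both pruning rules can be sketched in Python (the language of the independent verifier mentioned in Section 5). Here check stands in for the PermValidity/PermAgreement test of Figure 7, and dominates is a placeholder for the paper's domination relation on individual OT conditions; both names are assumptions of this sketch, not the actual implementation:

```python
def dominates(t1, t2):
    # Placeholder: the real relation compares OT conditions
    # (X1, C1, k1) and (X2, C2, k2); here we never prune.
    return False

def otc_search(ts, all_conditions, check, order):
    """Enumerate candidate OTC algorithms (sets of OT conditions),
    mirroring Figure 8.  ts is the current set, kept as a tuple;
    order maps each condition to its rank in the fixed total
    order '>OT'; check tests Permanent Validity and Agreement."""
    if check(ts):
        yield ts
    for t in all_conditions:
        # Line 5: only add conditions ranked above all current
        # elements, so each set is generated exactly once.
        if ts and order[t] <= max(order[u] for u in ts):
            continue
        # Lines 6-7: skip sets with a dominating/dominated pair.
        if any(dominates(t, u) or dominates(u, t) for u in ts):
            continue
        yield from otc_search(ts + (t,), all_conditions, check, order)

conds = ['a', 'b', 'c']
order = {c: i for i, c in enumerate(conds)}
sets = list(otc_search((), conds, lambda ts: True, order))
# with a trivial check and no domination, every subset of
# {a, b, c} is generated exactly once: 8 sets in total
```

Without the Line 5 ordering check, the same 3-element set would be reached along 3! = 6 recursion paths, which is exactly the exponential blow-up the fixed total order avoids.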

5. Results

I have implemented the algorithms from Figures 7 and 8 in C, and then verified the computed OTC algorithms using an independent Python implementation of the algorithm in Figure 7. This section presents computed algorithms for several choices of P, F, and M. While the verification is instantaneous, the search for new algorithms takes time:

n   failures        algs tested    found   time
3   1 crash-stop    360            1       0.03 sec
4   1 crash-stop    8,512          2       0.33 sec
5   1 crash-stop    341,312        3       0.83 sec
5   2 crash-stop    32,620,109     6       61.52 sec
4   1 malicious     47,990         7       0.41 sec
5   1 malicious     11.9 billion   6       39.4 hours

F = {∅, {p1}, {p2}, {p3}},   M = {∅}.
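For concreteness, these failure-model sets can be generated mechanically. The sketch below assumes F lists the possible sets of crashed processes and M the possible sets of malicious processes; the function name is invented for illustration:

```python
from itertools import combinations

def crash_stop_model(processes, max_failures):
    """Failure patterns for the crash-stop model with at most
    max_failures crashes: F is every subset of at most that many
    processes; M (possible sets of malicious processes) holds
    only the empty set, since no process behaves maliciously."""
    F = [set(c) for r in range(max_failures + 1)
                for c in combinations(processes, r)]
    M = [set()]
    return F, M

F, M = crash_stop_model(['p1', 'p2', 'p3'], 1)
# F contains ∅, {p1}, {p2}, {p3}; M contains only ∅,
# matching the sets given above for 3 processes, 1 failure
```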

The generated OTC algorithms correspond to the following two sets of OT conditions (X, C, k):

    { ⟨{p1, p2}, {p1, p2}, 1⟩, ⟨{p1}, {p1, p2}, 2⟩, ⟨{p1}, {p1, p3}, 2⟩ }

and

    { ⟨{p1, p2}, {p1, p2}, 1⟩, ⟨{p1, p3}, {p1, p3}, 2⟩, ⟨{p2, p3}, {p2, p3}, 2⟩ },

which can be depicted as in Section 2.4. Like many standard algorithms [10], the first one decides in two steps if the leader p1 and a majority of processes are correct. Additionally, it decides in one step if p1 and p2 propose the same value. Such an algorithm was presented in [9] for general n. The second algorithm decides if two processes propose the same value: in one step if those processes are p1 and p2, and in two steps otherwise. If all processes are correct,

but propose different values, this algorithm will not decide. The next sections exclude such algorithms by requiring at least one OT condition (X, C, k) to have X = {p1}.
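Under one plausible reading of an OT condition (X, C, k), namely that the OTC may decide in k steps once every process in X has proposed the same value (with C the set of processes assumed correct, which this sketch does not need), the behaviour of the second algorithm can be illustrated as follows. This reading is an assumption made here for illustration, not the paper's formal definition:

```python
def steps_to_decide(otc, proposals):
    """otc: list of OT conditions (X, C, k);
    proposals: dict mapping process name -> proposed value.
    Return the fewest steps k for which some condition's witness
    set X unanimously proposed one value, or None if none applies."""
    best = None
    for X, C, k in otc:
        if all(p in proposals for p in X):
            if len({proposals[p] for p in X}) == 1:   # X agrees
                best = k if best is None else min(best, k)
    return best

# Second generated OTC for n = 3: decides iff two processes agree,
# in one step for {p1, p2}, in two steps otherwise.
otc2 = [({'p1', 'p2'}, {'p1', 'p2'}, 1),
        ({'p1', 'p3'}, {'p1', 'p3'}, 2),
        ({'p2', 'p3'}, {'p2', 'p3'}, 2)]

steps_to_decide(otc2, {'p1': 'a', 'p2': 'a', 'p3': 'b'})  # → 1
steps_to_decide(otc2, {'p1': 'a', 'p2': 'b', 'p3': 'b'})  # → 2
steps_to_decide(otc2, {'p1': 'a', 'p2': 'b', 'p3': 'c'})  # → None
```

The last call reproduces the pathology noted above: with three distinct proposals no witness set ever agrees, so the algorithm never decides even though all processes are correct.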

Byzantine 5 processes with 1 failure. This case generated six variants of the following:

Crash-stop 4 processes with 1 failure. In this case, the following two OTCs have been generated. Both algorithms decide in two steps if the leader p1 and at least one other process are correct. The first decides in one step if p1 and p2 propose the same value; the second, if at least three processes including p1 propose the same value. Note that this protocol can sometimes decide even if two processes are faulty, although we assume at most one failure. In practice, this means that, in some situations, the algorithm need not wait for responses from slow, but formally correct, processes.

Crash-stop 5 processes with 2 failures. In this case, the following two OTCs have been generated.

Both decide in two steps if a majority of processes, including the leader, are correct. The first additionally decides in one step if p1, p2, p3 propose the same value; the other, if at least four processes, including p1, propose the same value.

Byzantine 4 processes with 1 failure. The algorithm by Castro and Liskov [3] decides in three steps if the leader p1 is correct and at most one other process is faulty. Later papers observed [6, 30] that if all processes are correct, then the decision can be made in two steps.

In this case, seven OTCs have been generated, two of which are extensions of the above protocol:

In addition to the original properties, these two algorithms can also decide in two steps in some runs with one process faulty and two processes proposing the same value.

The first three OT conditions correspond to the 4-process algorithm [3]. They decide if the leader p1 is correct and at most one of p1, ..., p4 is faulty; p5 is completely ignored. The next four conditions decide in two steps if at most one process is faulty (and that process is not p1). Finally, the algorithm can sometimes decide if two processes propose the same value and at most two processes are faulty or slow.

6. Conclusion

This paper presented a method for automatic verification and discovery of low-latency Consensus protocols through model checking. The main challenge here is the enormous size of the state and solution spaces. My method reduces the state space by focusing on the latency-determining first round only, ignoring the order of messages in this round, and distinguishing between state-modifying actions and state-preserving predicates. In addition, the monotonicity of the predicates and verified properties allows one to use a Tarski-style fixpoint algorithm, which results in an exponential verification speed-up.

While no ground-breaking protocols have been discovered, the method described here generated interesting improvements to existing algorithms. In particular, several combinations of fast one-step protocols and resilient leader-based algorithms have been discovered. Because of the number of cases involved, such composite algorithms are difficult to design and check manually.

I believe that automated protocol design is an interesting paradigm that can be successfully used together with the traditional manual process. From the practical point of view, it allows one to quickly analyse intricate interplays as well as discover subtle errors and improvements in existing protocols. This is especially useful in custom failure models with malicious participants. From the theoretical point of view, generated protocols as well as counterexample states can provide useful insights for general n-process protocols and lower bounds.

The security-protocol research community recognized some time ago that automated correctness testing is ideal for short algorithms that are difficult to get right. It seems to me that distributed agreement protocols would greatly benefit from a similar treatment. Atomic broadcast, atomic commitment, etc., especially in the crash-recovery model, are interesting problems to tackle in this way.

References

[1] Y. Bar-David and G. Taubenfeld. Automatic discovery of mutual exclusion algorithms. In Proc. of the 17th Int. Symposium on Distributed Computing, 2003.
[2] M. Burrows, M. Abadi, and R. Needham. A logic of authentication. ACM Transactions on Computer Systems, 8(1):18–36, 1990.
[3] M. Castro and B. Liskov. Practical Byzantine fault tolerance. In Proc. of the 3rd Symposium on Operating Systems Design and Implementation, pages 173–186, New Orleans, Louisiana, Feb. 1999.
[4] T. D. Chandra, V. Hadzilacos, and S. Toueg. The weakest failure detector for solving Consensus. Journal of the ACM, 43(4):685–722, 1996.
[5] E. M. Clarke, S. Jha, and W. Marrero. Verifying security protocols with Brutus. ACM Trans. on Software Engineering and Methodology, 9(4):443–487, 2000.
[6] P. Dutta, R. Guerraoui, and M. Vukolic. Asynchronous Byzantine Consensus: complexity, resilience and authentication. TR 200479, EPFL, Sept. 2004.
[7] E. Gafni and L. Lamport. Disk Paxos. In Int. Symposium on Distributed Computing, pages 330–344, 2000.
[8] J. Gray and L. Lamport. Consensus on transaction commit. TR 2003-96, Microsoft, Jan. 2004.
[9] R. Guerraoui and M. Raynal. The information structure of indulgent Consensus. TR 1531, IRISA, 2003.
[10] R. Guerraoui, M. Hurfin, A. Mostéfaoui, R. Oliveira, M. Raynal, and A. Schiper. Consensus in asynchronous distributed systems: a concise guided tour. In LNCS 1752, pages 33–47. Springer, 2000.
[11] M. P. Herlihy. Impossibility and universality results for wait-free synchronization. In Proc. of the 7th Annual ACM Symposium on Principles of Distributed Computing, pages 276–290, New York, USA, 1988.
[12] P. Kellomäki. An annotated specification of the Consensus protocol of Paxos using superposition in PVS. TR 36, Tampere University of Technology, 2004.
[13] L. Lamport. Paxos made simple. ACM SIGACT News, 32(4):18–25, Dec. 2001.
[14] L. Lamport. Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers. Addison-Wesley Professional, 2002.
[15] L. Lamport. Fast Paxos. TR MSR-TR-2005-112, Microsoft Research, July 2005.
[16] L. Lamport. The part-time parliament. ACM Transactions on Computer Systems, 16(2):133–169, 1998.
[17] G. Lowe. Breaking and fixing the Needham–Schroeder public-key protocol using FDR. In Proc. of the 2nd Int. Workshop on Tools and Algorithms for Construction and Analysis of Systems, pages 147–166, UK, 1996.
[18] J. C. Mitchell, M. Mitchell, and U. Stern. Automated analysis of cryptographic protocols using Murφ. In Proc. of the 1997 Symposium on Security and Privacy, pages 141–153, Washington, DC, USA, 1997.
[19] A. Mostéfaoui and M. Raynal. Solving Consensus using Chandra-Toueg's unreliable failure detectors: a general quorum-based approach. In Proc. of the 13th Int. Symposium on Distributed Computing, pages 49–63, London, UK, 1999.
[20] L. C. Paulson. The inductive approach to verifying cryptographic protocols. Journal of Computer Security, 6:85–128, 1998.
[21] A. Pogosyants, R. Segala, and N. Lynch. Verification of the randomized Consensus algorithm of Aspnes and Herlihy: a case study. Distributed Computing, 13, 2000.
[22] R. De Prisco, B. W. Lampson, and N. A. Lynch. Revisiting the Paxos algorithm. In Workshop on Distributed Algorithms, pages 111–125, 1997.
[23] M. Raynal. Consensus in synchronous systems: a concise guided tour. TR 1497, IRISA, July 2002.
[24] A. Schiper. Early Consensus in an asynchronous system with a weak failure detector. Distributed Computing, 10(3):149–157, Apr. 1997.
[25] A. Tarski. A lattice-theoretical fixpoint theorem and its applications. Pacific Journal of Mathematics, 5:285–309, 1955.
[26] T. Tsuchiya and A. Schiper. Model checking of Consensus algorithms. Technical report, EPFL, 2006.
[27] T. N. Win and M. D. Ernst. Verifying distributed algorithms via dynamic analysis and theorem proving. TR 841, MIT Lab for Computer Science, May 2002.
[28] T. N. Win, M. D. Ernst, S. J. Garland, D. Kırlı, and N. Lynch. Using simulated execution in verifying distributed algorithms. Software Tools for Technology Transfer, 6(1):67–76, July 2004.
[29] P. Zieliński. Minimizing latency of agreement protocols. PhD thesis, Computer Laboratory, University of Cambridge, UK, 2006. TR 667.
[30] P. Zieliński. Optimistically Terminating Consensus. In Proc. of the 5th Int. Symposium on Parallel and Distributed Computing, Timisoara, Romania, July 2006.
