Systems Research Barbara Liskov



Yüklə 402 Kb.
tarix08.08.2018
ölçüsü402 Kb.
#61182


Systems Research

  • Barbara Liskov

  • October 2007


Replication

  • Goal: provide reliability and availability by storing information at several nodes









Replication Issues

  • Semantics

  • What is being replicated

  • Failure assumptions



One-copy consistency

  • One-copy consistency

  • Or weaker



Only reads and writes

  • Only reads and writes

  • General operations

  • acct.deposit($$);

  • acct.withdraw($$$);



Replication protocols

  • Data replication

    • Quorums and voting
  • Operations



Issue 3: Failure Assumptions

  • Network is asynchronous

    • Eventual delivery
  • Network is malicious

    • Corruption
    • Replay
    • Spoofing
    • Handled via cryptography
  • Nodes are failstop or Byzantine



Failstop Failures

  • Nodes fail by crashing

    • A machine is either working correctly or it is doing nothing!
  • The assumption made in the 1980s



Failstop failures

  • Requires 2f+1 replicas

    • Operations must intersect at at least one replica
    • In general want availability for both reads and writes: f+1 nodes is sufficient
    • Read and write quorums








Data Replication

  • R.H. Thomas, A majority consensus approach to concurrency control for multiple copy databases, ACM TODS, 1979

  • D.K. Gifford, Weighted voting for replicated data, SOSP 1979

  • H. Attiya, A. Bar-Noy, and D. Dolev, Sharing memory robustly in message-passing systems, JACM , Jan. 1995



Quorum Consensus

  • Each data item has a version number

    • A sequence of values
  • write(d, val, v#)

    • Waits for f+1 oks
  • read(d) returns (val, v#)

    • Waits for f+1 matching v#’s
    • Else does a write-back


Replicas must execute operations in the same order

  • Replicas must execute operations in the same order

  • Implies replicas will have the same state, assuming

    • replicas start in the same state
    • operations are deterministic


Viewstamped replication: a new primary copy method to support highly available distributed systems, B. Oki and B. Liskov, PODC 1988

  • Viewstamped replication: a new primary copy method to support highly available distributed systems, B. Oki and B. Liskov, PODC 1988

    • Thesis, May 1988
  • Replication in the Harp file system, S. Ghemawat et. al, SOSP 1991

  • The part-time parliament, L. Lamport, TOCS 1998

  • Paxos made simple, L. Lamport, Nov. 2001



Use a primary

  • Use a primary

    • It orders the operations
    • Other replicas obey this order


System moves through a sequence of views



Client sends request to primary



Client sends request to primary

  • Client sends request to primary

  • Primary sends prepare message

  • Replicas receive prepare

    • Send prepare-ok message to the primary


Client sends request to primary

  • Client sends request to primary

  • Primary sends prepare message to all

  • Replicas receive prepare

    • Send prepare-ok message to the primary
  • Primary waits for f prepare-oks

    • Sends response to client


Normal Case

  • A 2-phase protocol:

    • Prepare; commit
  • Only 3 message delays



Byzantine Failures

  • Nodes fail arbitrarily

    • They lie, they collude
  • Causes

    • Malicious attacks
    • Non-deterministic software errors


3f+1 replicas are needed to survive f failures

  • 3f+1 replicas are needed to survive f failures

  • 2f+1 replicas is a quorum

    • Insures intersection
  • The minimum in an asynchronous network







BFT

  • M. Castro and B. Liskov, Practical Byzantine faulty tolerance and proactive recovery, ACM TOCS, 2002



Primary runs the protocol in the normal case

  • Primary runs the protocol in the normal case

  • Replicas watch the primary and do a view change if it fails

  • Key difference: replicas might lie

  • Solution: add a pre-prepare phase



Client sends request to primary

  • Client sends request to primary



Client sends request to primary

  • Client sends request to primary

  • Primary sends pre-prepare message to all



Client sends request to primary

  • Client sends request to primary

  • Primary sends pre-prepare message to all

    • Why not a prepare message?
    • Because primary might be malicious


Client sends request to primary

  • Client sends request to primary

  • Primary sends pre-prepare message to all

  • Replicas check the pre-prepare and if it is ok:

    • Send prepare messages to all


Replicas wait for 2f+1 matching prepares

  • Replicas wait for 2f+1 matching prepares

    • Send commit message to all


Replicas wait for 2f+1 matching prepares

  • Replicas wait for 2f+1 matching prepares

    • Send commit message to all
  • Replicas wait for 2f+1 matching commits

    • Execute operation and send result to client


Follow-on Work

    • BASE: using abstraction to improve fault tolerance, R. Rodrigo et al, SOSP 2001
    • R.Kotla and M. Dahlin, High Throughput Byzantine Fault tolerance. DSN 2004
    • J. Li and D. Mazieres, Beyond one-third faulty replicas in Byzantine fault tolerant systems, NSDI 07
    • Abd-El-Malek et al, Fault-scalable Byzantine fault-tolerant services, SOSP 05
    • HQ replications: a hybrid quorum protocol for Byzantine Fault tolerance, OSDI 06


Papers in SOSP 07

  • Monday 1:30-3:30

    • Zyzzyva: Speculative Byzantine fault tolerance
    • Tolerating Byzantine faults in database systems using commit barrier scheduling
    • Low-overhead Byzantine fault-tolerant storage
    • Attested append-only memory: making adversaries stick to their word
  • Tuesday: 11:00-12:00

    • PeerReview: practical accountability for distributed systems


Yüklə 402 Kb.

Dostları ilə paylaş:




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©genderi.org 2024
rəhbərliyinə müraciət

    Ana səhifə