Example Consider a chain of stores and suppose a manager



Date: 02.06.2018



Example

  • Consider a chain of stores and suppose a manager

    • wants to query all the stores,
    • find the inventory of toothbrushes at each, and
    • issue instructions to move toothbrushes from store to store in order to balance the inventory.
  • The operation is done by a global transaction T that has component Ti at the ith store and a component T0 at the office where the manager is located.

  • The sequence of activities performed by T is as follows:

    • Component T0 is created at the site of the manager.
    • T0 sends messages to all the stores instructing them to create components Ti.
    • Each Ti executes a query at store i to discover the number of toothbrushes in inventory and reports this number to T0.
    • T0 takes these numbers and determines, by some algorithm, what shipments of toothbrushes are desired.
    • T0 then sends messages such as "store 10 should ship 500 toothbrushes to store 7" to the appropriate stores.
    • Stores receiving instructions update their inventory and perform the shipments.


What can go wrong?

  • Example 1

  • Suppose a bug in the algorithm to redistribute toothbrushes causes store 10 to be instructed to ship more toothbrushes than it has.

    • T10 will abort, and no toothbrushes will be shipped from store 10;
    • neither will the inventory at store 10 be changed.
    • However, T7 detects no problems and commits at store 7, updating its inventory to reflect the supposedly shipped toothbrushes.
  • Now, not only has T failed to execute atomically (since T10 never completes), but it has left the distributed database in an inconsistent state.
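
The failure in Example 1 can be reproduced with a minimal sketch in which each component commits or aborts on its own, with no agreement step. The inventory figures and the function names `ship_out` and `receive` are hypothetical and chosen only for illustration.

```python
# Hypothetical starting inventories (illustrative numbers only).
inventory = {7: 100, 10: 300}

def ship_out(store, qty):
    """Component at the sending store: aborts if stock is insufficient."""
    if inventory[store] < qty:
        return False              # abort: inventory left unchanged
    inventory[store] -= qty
    return True                   # local commit

def receive(store, qty):
    """Component at the receiving store: sees no problem and commits."""
    inventory[store] += qty
    return True                   # local commit

total_before = sum(inventory.values())

# Buggy instruction: store 10 is told to ship 500 but holds only 300.
results = {10: ship_out(10, 500), 7: receive(7, 500)}

total_after = sum(inventory.values())
# One component aborted and the other committed, so the totals disagree:
# toothbrushes that were never shipped now appear in store 7's inventory.
```

Because the two components decide independently, `results` records one abort and one commit, and the combined inventory grows by the 500 toothbrushes that never moved.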

  • Example 2

  • Suppose T10 replies to T0's first message by telling its inventory of toothbrushes.

  • However, the machine at store 10 then crashes, and the instructions from T0 are never received by T10.

  • Can distributed transaction T ever commit? What should T10 do when its site recovers?



Two-Phase Commit

  • Assume a transaction T has parts executing at several sites. How can they agree to commit T?

  • Phase 1

  • 1. Coordinator (site originating the transaction) does:

    • Log <prepare T>.
    • Send prepare T messages to all sites involved in T.
  • 2. Each site receiving this message decides to commit T or abort T, and

    • Logs either <ready T> or <don't commit T>, and
    • Sends to the coordinator the corresponding message: either ready T or don't commit T.


Phase 2

  • Phase 2

  • 1. Coordinator decides to commit or abort. It commits only if a ready T message has been received from every site; it aborts if at least one site sends don't commit T or fails to reply before a timeout.

    • Log <commit T> or <abort T>.
    • Send commit T or abort T messages to all sites involved in T.
  • 2. Sites receiving these messages log them and follow the instruction.
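
The two phases can be sketched as a single-process simulation. This is an illustration under stated assumptions, not a networked implementation: the function name `two_phase_commit`, the vote strings, and the spelling of the log records (`<prepare T>`, `<ready T>`, `<don't commit T>`, `<commit T>`, `<abort T>`) are choices made for this sketch, and a vote of `None` stands in for a site whose reply times out.

```python
def two_phase_commit(votes):
    """Simulate one run of two-phase commit for a transaction T.

    votes maps site id -> "ready", "dont_commit", or None (no reply).
    Returns (decision, coordinator_log, site_logs).
    """
    # Phase 1, step 1: the coordinator logs the prepare record.
    coord_log = ["<prepare T>"]
    site_logs = {site: [] for site in votes}

    # Phase 1, step 2: each site that responds logs its vote before replying.
    for site, vote in votes.items():
        if vote == "ready":
            site_logs[site].append("<ready T>")
        elif vote == "dont_commit":
            site_logs[site].append("<don't commit T>")

    # Phase 2, step 1: commit only if every site voted ready.
    all_ready = all(vote == "ready" for vote in votes.values())
    decision = "commit" if all_ready else "abort"
    coord_log.append(f"<{decision} T>")

    # Phase 2, step 2: reachable sites log the decision and follow it.
    # Assumption: a site that never replied is treated as unreachable.
    for site, vote in votes.items():
        if vote is not None:
            site_logs[site].append(f"<{decision} T>")

    return decision, coord_log, site_logs


decision, coord_log, site_logs = two_phase_commit({1: "ready", 2: "dont_commit"})
```

A single negative vote or a single missing reply is enough to force an abort, which is the property that prevents one component from committing while another aborts.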



Exercise

  • Site 0 is the coordinator.

  • Sites 1 and 2 are the components.

  • Give an example of the sequence of messages that would occur if site 1 wants to commit and site 2 wants to abort.

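  • Answer, where (i,j,M) means that site i sends message M to site j, and P, R, D, A stand for prepare T, ready T, don't commit T, and abort T: (0,1,P) (0,2,P) (1,0,R) (2,0,D) (0,1,A) (0,2,A)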
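
The answer to the exercise can be generated mechanically. The sketch below reads each tuple (i, j, M) as "site i sends message M to site j" with P = prepare, R = ready, D = don't commit, A = abort. The function name `message_trace` and the letter C for a commit message are assumptions of this sketch, and the fixed ordering (all prepares, then all votes, then all decisions) is only one of several valid interleavings.

```python
def message_trace(votes, coordinator=0):
    """Return one valid message sequence for a run of two-phase commit.

    votes maps site id -> "ready" or "dont_commit".
    """
    vote_code = {"ready": "R", "dont_commit": "D"}
    sites = sorted(votes)

    # Phase 1: prepare messages go out, then the votes come back.
    trace = [(coordinator, site, "P") for site in sites]
    trace += [(site, coordinator, vote_code[votes[site]]) for site in sites]

    # Phase 2: the coordinator broadcasts its decision.
    all_ready = all(vote == "ready" for vote in votes.values())
    outcome = "C" if all_ready else "A"   # "C" for commit is an assumed code
    trace += [(coordinator, site, outcome) for site in sites]
    return trace


trace = message_trace({1: "ready", 2: "dont_commit"})
```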



Recovery

  • If a site fails with <commit T> or <abort T> on its log, it can redo or undo T, respectively, when it recovers.

  • If a site fails with <don't commit T> as its last T entry, it knows the coordinator could not have signaled commit, so it may abort T and undo.

  • If a site fails with <ready T> as its last T entry, it doesn't know what has happened to T; it must ask the coordinator (or any other site).
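
The recovery rules for a failed site amount to a lookup on the last log record for T. In the sketch below, the function name `recover_site`, the returned action strings, and the spelling of the log records are assumptions made for illustration. The final branch, for a site with no record of T at all, is also an assumption: such a site never voted ready, so the coordinator cannot have committed.

```python
def recover_site(last_entry):
    """Choose a recovery action from a site's last log record for T."""
    if last_entry == "<commit T>":
        return "redo"                      # T committed: reapply its changes
    if last_entry == "<abort T>":
        return "undo"                      # T aborted: roll its changes back
    if last_entry == "<don't commit T>":
        # This site voted no, so the coordinator cannot have committed.
        return "abort and undo"
    if last_entry == "<ready T>":
        # The outcome was decided elsewhere; this site cannot tell alone.
        return "ask coordinator or another site"
    # Assumption: no record of T means the site never voted ready.
    return "abort and undo"


action = recover_site("<ready T>")
```

Only the ready case leaves the site unable to decide on its own, which is why a site in that state may have to wait for an answer.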



Recovery When Coordinator Fails

  • Other sites elect a leader and try to figure out what happened.

  • 1. Some site has <commit T> on its log.

    • Then the original coordinator must have wanted to send commit T messages everywhere, and it is safe to commit T.
  • 2. Some site has <abort T> on its log.

    • Then the original coordinator must have decided to abort T, and it is safe for the new coordinator to order that action.
  • 3. No site has <commit T> or <abort T> on its log, but at least one site does not have <ready T> on its log.

    • Then we know that the old coordinator never received ready T from this site and therefore could not have decided to commit.
    • So, it’s safe for the new coordinator to decide to abort T.


Recovery When Coordinator Fails

  • 4. There is no <commit T> or <abort T> to be found, but every surviving site has <ready T>.

    • We can’t be sure whether the old coordinator found some reason to abort T or not;
      • it could have decided to do so because of actions at its own site, or
      • because of a don't commit T message from another failed site.
    • Or the old coordinator may have decided to commit T and already committed its local component of T.
    • Thus, the new coordinator is not able to decide whether to commit or abort T and must wait until the original coordinator recovers.
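
The four cases a newly elected coordinator must distinguish can be sketched as one decision function over the logs of the surviving sites. The function name `new_coordinator_decision`, the log-record spellings, and the return values are assumptions of this sketch. Leader election and message exchange are omitted.

```python
def new_coordinator_decision(surviving_logs):
    """Decide the fate of T from the logs of the surviving sites.

    surviving_logs maps site id -> list of that site's log records for T.
    Returns "commit", "abort", or "wait".
    """
    records = [set(log) for log in surviving_logs.values()]

    # Case 1: some site already logged the commit decision.
    if any("<commit T>" in r for r in records):
        return "commit"
    # Case 2: some site already logged the abort decision.
    if any("<abort T>" in r for r in records):
        return "abort"
    # Case 3: some site never logged ready, so the old coordinator
    # cannot have received ready T from every site.
    if any("<ready T>" not in r for r in records):
        return "abort"
    # Case 4: every survivor is ready and no decision is visible.
    return "wait"   # blocked until the original coordinator recovers


outcome = new_coordinator_decision({1: ["<ready T>"], 2: ["<ready T>"]})
```

The last branch is the blocking case: with every survivor ready and no decision record anywhere, neither committing nor aborting is safe.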

