Date: 02.06.2018

Example

Consider a chain of stores and suppose a manager wants to:

query all the stores,
find the inventory of toothbrushes at each, and
issue instructions to move toothbrushes from store to store in order to balance the inventory.

The operation is done by a global transaction T that has component Ti at the ith store and a component T0 at the office where the manager is located.
The sequence of activities performed by T is as follows:

1. Component T0 is created at the site of the manager.
2. T0 sends messages to all the stores instructing them to create components Ti.
3. Each Ti executes a query at store i to discover the number of toothbrushes in inventory and reports this number to T0.
4. T0 takes these numbers and determines, by some algorithm, what shipments of toothbrushes are desired.
5. T0 then sends messages such as "store 10 should ship 500 toothbrushes to store 7" to the appropriate stores.
6. Stores receiving instructions update their inventory and perform the shipments.

What can go wrong?

Example 1
Suppose a bug in the algorithm that redistributes toothbrushes causes store 10 to be instructed to ship more toothbrushes than it has.

T10 will abort: no toothbrushes will be shipped from store 10, and the inventory at store 10 will not be changed.
However, T7 detects no problems and commits at store 7, updating its inventory to reflect the supposedly shipped toothbrushes.

Now, not only has T failed to execute atomically (since T10 never completes), but it has left the distributed database in an inconsistent state.
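
The failure in Example 1 can be reproduced in a few lines. This is only a sketch: the starting quantities and the function names `ship_out` and `receive` are made up for illustration, and each component decides on its own, with no commit protocol between them.

```python
# Hypothetical inventories: store numbers follow the example, quantities are invented.
inventory = {7: 100, 10: 300}

def ship_out(store, qty):
    """Component at the sending store: abort if stock is insufficient."""
    if inventory[store] < qty:
        return "abort"           # nothing changes at this store
    inventory[store] -= qty
    return "commit"

def receive(store, qty):
    """Component at the receiving store: it sees no local problem, so it commits."""
    inventory[store] += qty
    return "commit"

total_before = sum(inventory.values())
outcome_10 = ship_out(10, 500)   # buggy instruction: store 10 holds only 300
outcome_7 = receive(7, 500)      # store 7 commits independently
total_after = sum(inventory.values())

print(outcome_10, outcome_7)     # abort commit
print(total_before, total_after) # 400 900
```

The database now records 500 toothbrushes that were never shipped, which is the inconsistent state described above.
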
Example 2
Suppose T10 replies to T0's first message by reporting its inventory of toothbrushes.
However, the machine at store 10 then crashes, and the instructions from T0 are never received by T10.
Can distributed transaction T ever commit? What should T10 do when its site recovers?

Two-Phase Commit

Assume a transaction T has parts executing at several sites. How can they agree to commit T?
Phase 1
1. Coordinator (site originating the transaction) does:

Log <Prepare T>.
Send prepare T messages to all sites involved in T.

2. Each site receiving this message decides to commit T or abort T, and

Logs either <Ready T> or <Don't commit T>, and
Sends to the coordinator the corresponding message: either ready T or don't commit T.

Phase 2
1. Coordinator decides to commit or abort. It commits if a ready T message was received from every site; it aborts if at least one site replied don't commit T or failed to reply before a timeout.

Log <Commit T> or <Abort T>.
Send commit T or abort T messages to all sites involved in T.

2. Sites receiving these messages log them and follow the instruction.
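
The two phases can be sketched as a small single-process simulation. This is an illustration under assumptions, not a real implementation: the function name `two_phase_commit` is hypothetical, site 0 is taken to be the coordinator, the log record names follow the usual textbook notation (`<Prepare T>`, `<Ready T>`, `<Don't commit T>`, `<Commit T>`, `<Abort T>`), and timeouts and crashes are not modeled.

```python
def two_phase_commit(votes):
    """Simulate one round of two-phase commit for transaction T.

    votes maps a site number to True (willing to commit) or False.
    Site 0 is assumed to be the coordinator.
    Returns (decision, messages, logs).
    """
    messages = []                    # (sender, receiver, message) triples
    logs = {0: ["<Prepare T>"]}      # the coordinator logs before sending
    for site in votes:
        logs[site] = []

    # Phase 1: the coordinator sends prepare T to every site ...
    for site in votes:
        messages.append((0, site, "prepare T"))
    # ... and each site logs its vote before replying.
    for site, ok in votes.items():
        logs[site].append("<Ready T>" if ok else "<Don't commit T>")
        messages.append((site, 0, "ready T" if ok else "don't commit T"))

    # Phase 2: commit only if every site answered ready T.
    decision = "commit" if all(votes.values()) else "abort"
    record = "<Commit T>" if decision == "commit" else "<Abort T>"
    logs[0].append(record)
    for site in votes:
        messages.append((0, site, decision + " T"))
        logs[site].append(record)    # sites log the instruction and follow it
    return decision, messages, logs


decision, messages, logs = two_phase_commit({1: True, 2: False})
print(decision)                      # abort
```

Each party writes its log record before sending the matching message, so a site that crashes can tell from its own log how far the protocol had progressed.
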

Exercise

Site 0 is the coordinator.
Sites 1 and 2 are the components.
Give an example of the sequence of messages that would occur if site 1 wants to commit and site 2 wants to abort.
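One possible sequence, written as (sender, receiver, message) with P = prepare T, R = ready T, D = don't commit T, and A = abort T:
(0,1,P) (0,2,P) (1,0,R) (2,0,D) (0,1,A) (0,2,A)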

Recovery

If a site fails with <Commit T> or <Abort T> on its log, it can redo or undo, respectively.
If a site fails with <Don't commit T> as its last T entry, it knows the coordinator could not have signaled commit, so it may abort T and undo.
If a site fails with <Ready T> as its last T entry, it doesn't know what has happened to T; it must ask the coordinator (or any other site).
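
The recovery rules for a single site amount to a decision on the last log record for T. The sketch below assumes the textbook record names `<Commit T>`, `<Abort T>`, `<Don't commit T>` and `<Ready T>`. The function `recover_site` and its `ask_others` callback are hypothetical, and the handling of an empty log is an added assumption that the rules above do not cover.

```python
def recover_site(log, ask_others):
    """Decide what a recovering site does with T, given its log records for T.

    ask_others is a hypothetical callback that contacts the coordinator or
    another site and returns "commit" or "abort".
    """
    last = log[-1] if log else None
    if last == "<Commit T>":
        return "redo"
    if last == "<Abort T>":
        return "undo"
    if last == "<Don't commit T>":
        # The coordinator cannot have committed without this site's ready T.
        return "undo"
    if last == "<Ready T>":
        # The outcome is unknown locally, so someone else must be asked.
        return "redo" if ask_others() == "commit" else "undo"
    # Assumption: with no record at all the site never voted, so T cannot
    # have committed and aborting is safe.
    return "undo"


print(recover_site(["<Ready T>"], lambda: "commit"))   # redo
```
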

Recovery When Coordinator Fails

The surviving sites elect a new coordinator (leader), which examines their logs to work out what happened. There are four cases:

1. Some site has <Commit T> on its log.

Then the original coordinator must have wanted to send commit T messages everywhere, and it is safe to commit T.

2. Some site has <Abort T> on its log.

Then the original coordinator must have decided to abort T, and it is safe for the new coordinator to order that action.

3. No site has <Commit T> or <Abort T> on its log, but at least one site does not have <Ready T> on its log.

Then we know that the old coordinator never received ready T from this site and therefore could not have decided to commit.
So, it is safe for the new coordinator to decide to abort T.

4. No surviving site has <Commit T> or <Abort T> on its log, but every surviving site has <Ready T>.

We can't be sure whether the old coordinator found some reason to abort T or not:

it could have decided to abort because of actions at its own site, or
because of a don't commit T message from another failed site.

Or the old coordinator may have decided to commit T and already committed its local component of T.
Thus, the new coordinator is not able to decide whether to commit or abort T and must wait until the original coordinator recovers.
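
The four cases can be collected into one decision function. This is a minimal sketch, assuming each surviving site reports its log records about T as a list of strings in the textbook notation; the name `new_coordinator_decision` is hypothetical.

```python
def new_coordinator_decision(surviving_logs):
    """Return "commit", "abort", or "wait" from the surviving sites' logs.

    surviving_logs is a list with one entry per surviving site; each entry
    is that site's list of log records about T.
    """
    records = [set(log) for log in surviving_logs]
    # Case 1: some site saw the commit decision.
    if any("<Commit T>" in r for r in records):
        return "commit"
    # Case 2: some site saw the abort decision.
    if any("<Abort T>" in r for r in records):
        return "abort"
    # Case 3: some site never voted ready, so no commit was possible.
    if any("<Ready T>" not in r for r in records):
        return "abort"
    # Case 4: every surviving site is ready, but the outcome is unknown.
    return "wait"


print(new_coordinator_decision([["<Ready T>"], ["<Ready T>"]]))   # wait
```

The "wait" result is the blocking case of two-phase commit: the surviving sites cannot make progress until the original coordinator recovers.
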
