The Chandy–Lamport algorithm is a snapshot algorithm used in distributed computing for recording a consistent global state of an asynchronous distributed system. It was introduced by K. Mani Chandy and Leslie Lamport in 1985 and is one of the foundational algorithms for distributed-system observation; later applications include termination detection, deadlock detection, distributed debugging, and checkpointing for fault tolerance.[1][2][3][4]

Background

edit

The need for a snapshot algorithm arises because, in an asynchronous distributed system without a shared global clock, no single process can observe the simultaneous state of all other processes and the messages in transit between them. A naive collection of local states from individual processes can produce a "global state" that no real execution ever passed through, because messages may be in flight while the snapshots are taken.[2] Recording a meaningful snapshot therefore requires coordination so that the recorded local states and the recorded channel contents together correspond to a state through which a possible execution of the system could have passed. Such a state is called a consistent global state.[1][3]

History

edit

According to Lamport, the algorithm was developed during a visit he made to Chandy, who was then on the faculty of the University of Texas at Austin. Chandy posed the snapshot problem to Lamport over dinner; Lamport later wrote that he worked out the solution in the shower the next morning and arrived at Chandy's office to find that Chandy had independently arrived at the same solution. Lamport described the algorithm as a straightforward application of ideas in his earlier paper Time, Clocks, and the Ordering of Events in a Distributed System.[5]

Chandy and Lamport published the algorithm in February 1985 in ACM Transactions on Computer Systems under the title Distributed Snapshots: Determining Global States of Distributed Systems.[1] The paper received the ACM SIGOPS Hall of Fame Award in 2013 and the Edsger W. Dijkstra Prize in Distributed Computing in 2014.[6][7] Lamport's Turing Award biography credits him and Chandy with introducing the first algorithm for taking a snapshot of the state of an arbitrary distributed system.[4]

Model and assumptions

edit

The algorithm is presented for a model of distributed computation in which:[1][2]

  • The system consists of a fixed set of processes connected by directed FIFO channels.
  • There is a path of channels between every pair of processes, so that any process can send a message to any other process directly or by relay.
  • Channels do not lose, duplicate, or reorder messages.
  • There are no process failures.
  • The snapshot algorithm runs concurrently with the underlying computation and does not alter the behavior of that computation, other than by introducing additional marker messages.

A snapshot may be initiated by any single process, and multiple snapshots can run concurrently if marker messages are tagged with snapshot identifiers.[3]

Algorithm

edit

The algorithm uses a special control message called a marker. Each process records its own local state and the contents of each of its incoming channels.[1][8]

  1. A process initiates the snapshot by recording its own state and then sending a marker on every outgoing channel before sending any further application messages on those channels.
  2. When a process p receives a marker on an incoming channel c:
    1. If p has not yet recorded its own state, it records its state, records the state of channel c as empty, and then sends a marker on every outgoing channel before sending any further application messages.
    2. If p has already recorded its state, it records the state of channel c as the sequence of application messages received on c after p recorded its own state and before it received the marker on c.

The algorithm terminates when every process has recorded its state and the state of all of its incoming channels. The collected local states and channel states constitute a consistent global state of the system.[1][2]

Diagram of the Chandy–Lamport algorithm with three processes exchanging messages and marker tokens
Three-process illustration of the Chandy–Lamport snapshot algorithm

Properties and applications

edit

Chandy and Lamport proved that the recorded global state corresponds to a possible execution of the system; that is, the recorded state is reachable from the initial state and the final state of the recorded execution is reachable from the recorded state.[1] The recorded state need not coincide with any state that actually occurred during the recorded execution, but it represents a state through which a possible execution could have passed and is therefore suitable for evaluating stable predicates—predicates that, once true, remain true—such as deadlock or termination.[2][3]

Distributed snapshot algorithms have been used in:[4][8]

  • Termination detection for distributed computations.
  • Deadlock detection in distributed databases and resource managers.
  • Checkpointing for fault tolerance and rollback recovery.
  • Distributed debugging and the evaluation of stable predicates.
  • Garbage collection in distributed object systems.

Subsequent work has generalized the algorithm to relax some of its assumptions. Variants exist for systems with non-FIFO channels, for systems with failures, and for systems with causal-message-ordering or other communication models.[3][8]

See also

edit

References

edit
  1. ^ a b c d e f g Chandy, K. Mani; Lamport, Leslie (February 1985). "Distributed snapshots: determining global states of distributed systems". ACM Transactions on Computer Systems. 3 (1): 63–75. doi:10.1145/214451.214456.
  2. ^ a b c d e Lynch, Nancy A. (1996). Distributed Algorithms. Morgan Kaufmann. ISBN 978-1-55860-348-6.
  3. ^ a b c d e Tel, Gerard (2000). Introduction to Distributed Algorithms (2nd ed.). Cambridge University Press. ISBN 978-0-521-79483-1.
  4. ^ a b c "Leslie Barry Lamport: A.M. Turing Award Laureate". Association for Computing Machinery. Retrieved May 15, 2026.
  5. ^ Lamport, Leslie. "The Writings of Leslie Lamport: Distributed Snapshots". lamport.azurewebsites.net. Retrieved May 15, 2026.
  6. ^ "ACM SIGOPS Hall of Fame Award". ACM Special Interest Group on Operating Systems. Retrieved May 15, 2026.
  7. ^ "The 2014 Edsger W. Dijkstra Prize in Distributed Computing". DISC 2014. Retrieved May 15, 2026.
  8. ^ a b c Ghosh, Sukumar (2014). Distributed Systems: An Algorithmic Approach (2nd ed.). Chapman and Hall/CRC. ISBN 978-1-4665-5297-5.
edit

📚 Artikel Terkait di Wikipedia

Leslie Lamport

resources at the same time, the Chandy–Lamport algorithm for the determination of consistent global states (snapshot), and the Lamport signature, one of the prototypes

K. Mani Chandy

systems. Chandy is known for work on BCMP networks, the Chandy–Lamport algorithm for distributed snapshots, the Chandy–Misra–Haas algorithm for distributed

List of algorithms

Algorithm Ricart–Agrawala Algorithm Snapshot algorithm: record a consistent global state for an asynchronous system Chandy–Lamport algorithm Vector clocks: generate

Snapshot algorithm

snapshot algorithm would avoid this as it makes sure to record the whole state in a point in time. Chandy–Lamport algorithm Lai–Yang algorithm Spezialetti–Kearns

Jayadev Misra

Leslie Lamport says: "The first major step in getting beyond traditional programming languages to describe concurrent algorithms was Misra and Chandy's Unity"

List of University of Texas at Austin faculty

the original on September 23, 2016. Retrieved September 9, 2016. "K. Mani Chandy | Infospheres". Archived from the original on October 6, 2008. Retrieved

Global state (computer science)

by K. Mani Chandy and Leslie Lamport in 1985, is also expressed as a consistent cut of the system's execution history. Chandy and Lamport proved that

Distributed operating system

systems. ACM Trans. Comput. Syst. 1, 3 (Aug. 1983), 222-238. Chandy, K. M. and Lamport, L. 1985. Distributed snapshots: determining global states of