Spring 2024
Distributed protocols are at the foundation of modern cloud computing services but are notoriously difficult to define and implement correctly. In this class, we will survey some important methods and concepts in distributed computing, including logical clocks, consistency models, fault tolerance, consensus, sharding, peer-to-peer protocols, cache coherence, and distributed transactions, as well as important systems that apply these ideas. The class will be divided roughly evenly between theory and practical application.
An overview of the concepts covered in the class can be found in the introductory slides from Spring 2023.
The class will consist of lectures and a series of lab assignments. The labs are protocol design and implementation exercises that build from simple replicated stores to a scalable, fault-tolerant system using distributed consensus. The labs will be implemented in Java. We will test our systems in a simulated distributed environment and use model checking, a technique of systematic state space exploration, to help us find subtle concurrency bugs that are difficult to find by testing alone. Grading will be based on the labs. There will be no final exam.
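To give a rough sense of what "systematic state space exploration" means, here is a minimal sketch of a toy model checker in Java. It is not the lab framework; the class name `TinyModelChecker`, its methods, and the two-process counter example are all hypothetical and chosen only for illustration. Two processes each increment a shared counter with a separate read step and write step, and a breadth-first search visits every reachable interleaving to look for a final state in which the counter is not 2.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Hypothetical toy example; not part of any course framework.
public class TinyModelChecker {
    // A global state is encoded as [counter, pc0, tmp0, pc1, tmp1].
    // pcN is process N's program counter: 0 = about to read the counter,
    // 1 = about to write back tmpN + 1, 2 = finished.
    static List<Integer> state(int counter, int pc0, int tmp0, int pc1, int tmp1) {
        return Arrays.asList(counter, pc0, tmp0, pc1, tmp1);
    }

    // All states reachable in one step: either process 0 or process 1 moves.
    static List<List<Integer>> successors(List<Integer> s) {
        int counter = s.get(0), pc0 = s.get(1), tmp0 = s.get(2), pc1 = s.get(3), tmp1 = s.get(4);
        List<List<Integer>> next = new ArrayList<>();
        if (pc0 == 0) next.add(state(counter, 1, counter, pc1, tmp1));   // P0 reads
        if (pc0 == 1) next.add(state(tmp0 + 1, 2, tmp0, pc1, tmp1));     // P0 writes
        if (pc1 == 0) next.add(state(counter, pc0, tmp0, 1, counter));   // P1 reads
        if (pc1 == 1) next.add(state(tmp1 + 1, pc0, tmp0, 2, tmp1));     // P1 writes
        return next;
    }

    // Breadth-first search over every reachable state. Returns a state that
    // violates the invariant "when both processes finish, counter == 2",
    // or null if no reachable state violates it.
    static List<Integer> findViolation() {
        List<Integer> initial = state(0, 0, 0, 0, 0);
        Set<List<Integer>> visited = new HashSet<>();
        Queue<List<Integer>> frontier = new ArrayDeque<>();
        visited.add(initial);
        frontier.add(initial);
        while (!frontier.isEmpty()) {
            List<Integer> s = frontier.remove();
            boolean bothDone = s.get(1) == 2 && s.get(3) == 2;
            if (bothDone && s.get(0) != 2) return s;
            for (List<Integer> t : successors(s)) {
                if (visited.add(t)) frontier.add(t);  // skip states already seen
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Integer> bad = findViolation();
        System.out.println(bad == null ? "invariant holds" : "violation found: " + bad);
    }
}
```

The search finds the lost-update interleaving in which both processes read the counter before either writes. A randomized test might only hit that schedule occasionally, whereas exhaustive exploration is guaranteed to reach it as long as the state space is small enough to enumerate.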