Scalable Coordination Of A Tightly-Coupled Service In The Wide Area

Document Type

Conference Proceeding

Publication Date


Published In

Proceedings Of The First ACM SIGOPS Conference On Timely Results In Operating Systems

Today's large-scale services generally exploit loosely-coupled architectures that restrict functionality requiring tight cooperation (e.g., leader election, synchronization, and reconfiguration) to a small subset of nodes. In contrast, this work presents a way to scalably deploy tightly-coupled distributed systems that require significant coordination among a large number of nodes in the wide area. Our design relies on a new group membership abstraction, the circuit breaker, to preserve efficient pairwise communication despite transient link failures. We characterize our abstraction in the context of a deployment of a distributed rate limiting (DRL) service on a global testbed infrastructure. Unlike most distributed services, DRL can safely operate in separate partitions simultaneously, but it requires timely snapshots of global state within each partition. Our DRL deployment leverages the circuit breaker abstraction along with a robust gossip-based communication protocol to meet its demanding communication requirements. Through local and wide-area experiments, we illustrate that DRL remains accurate and fair in the face of a variety of failure scenarios.
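The abstract does not spell out DRL's gossip protocol or its allocation policy. As a rough illustration only, the sketch below assumes push-sum style gossip averaging (a standard gossip aggregation technique, not necessarily the protocol used in this work): each limiter estimates the group's average demand from its peers, then takes a demand-proportional share of a global limit. All names here (`push_sum_round`, `estimate_average_demand`, `proportional_limits`) and the `peers_of` callback, which stands in for whatever membership view a layer like the circuit breaker would supply, are hypothetical, and the proportional split is an assumed rule.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: returns the peers node i can currently reach (its membership view).
PeerFn = Callable[[int], Sequence[int]]


def push_sum_round(values: List[float], weights: List[float],
                   peers_of: PeerFn, rng: random.Random) -> Tuple[List[float], List[float]]:
    """One synchronous push-sum round (assumed protocol, for illustration).

    Each node keeps half of its (value, weight) mass and pushes the other half
    to one randomly chosen reachable peer. Total value and total weight are
    conserved, so value/weight at every node drifts toward the group average.
    """
    n = len(values)
    new_values = [0.0] * n
    new_weights = [0.0] * n
    for i in range(n):
        half_v, half_w = values[i] / 2.0, weights[i] / 2.0
        new_values[i] += half_v
        new_weights[i] += half_w
        peers = list(peers_of(i))
        # With no reachable peer, the node simply keeps all of its mass.
        target = rng.choice(peers) if peers else i
        new_values[target] += half_v
        new_weights[target] += half_w
    return new_values, new_weights


def estimate_average_demand(demands: Sequence[float], peers_of: PeerFn,
                            rounds: int = 100, seed: int = 0) -> List[float]:
    """Run push-sum and return each node's estimate of the average demand."""
    rng = random.Random(seed)
    values = [float(d) for d in demands]
    weights = [1.0] * len(demands)  # weight 1 everywhere => estimate the average
    for _ in range(rounds):
        values, weights = push_sum_round(values, weights, peers_of, rng)
    # Weights never reach zero: each node keeps at least half its weight per round.
    return [v / w for v, w in zip(values, weights)]


def proportional_limits(global_limit: float, demands: Sequence[float],
                        avg_estimates: Sequence[float], group_size: int) -> List[float]:
    """Hypothetical allocation rule: limit_i = L * d_i / (n * avg_i)."""
    limits = []
    for d, avg in zip(demands, avg_estimates):
        total = group_size * avg  # this node's estimate of aggregate demand
        limits.append(global_limit * d / total if total > 0 else global_limit / group_size)
    return limits


if __name__ == "__main__":
    demands = [10.0, 30.0, 20.0, 40.0]
    n = len(demands)
    everyone = lambda i: [j for j in range(n) if j != i]  # fully connected view
    avgs = estimate_average_demand(demands, everyone)
    limits = proportional_limits(50.0, demands, avgs, n)
    print([round(x, 3) for x in limits])  # -> [5.0, 15.0, 10.0, 20.0]
```

Restricting `peers_of` to the nodes inside one partition makes each partition converge to its own average demand, which loosely mirrors the abstract's point that DRL can keep operating in separate partitions as long as each has a timely view of its own state. This sketch does not model how the limit should be divided between partitions.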

Published By

24th Symposium On Operating Systems Principles

Conference Dates

November 3-6, 2013

Conference Location

Farmington, PA