Michael (Mike) Spreitzer

Bison - overview

In the Bison project, we build a communication substrate called the Bulletin Board for a management system that dynamically allocates resources to a set of applications or virtual machines in a server farm.

Our bulletin board (BB) supports an eventually consistent, topic-based shared memory abstraction. It is used by various distributed controllers (the brains of the system) to support a variety of control loops involving agents and application containers running on each managed machine. The BB carries only management traffic, whose load is relatively fixed and related only qualitatively to application activity. The BB must provide a reliable shared memory with adequate latency, at acceptable cost, for the system size and throughput that this management work requires.
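To make "eventually consistent, topic-based shared memory" concrete, here is a minimal sketch. It is not the BB's actual interface or algorithm: `TopicStore`, `write`, `read`, and `merge` are hypothetical names, and the sketch assumes that each process writes only its own slot within a topic and tags every write with an increasing version number. Under that assumption, replicas that exchange state in any order converge to the same contents (in CRDT terms, a map of single-writer registers).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

# Hypothetical sketch, NOT the Bison BB API. Each topic maps a writer id to
# that writer's latest (version, value); only the owner writes its own slot.


@dataclass
class TopicStore:
    node_id: str
    # topic -> {writer_id: (version, value)}
    topics: Dict[str, Dict[str, Tuple[int, Any]]] = field(default_factory=dict)

    def write(self, topic: str, value: Any) -> None:
        """Publish this node's value for a topic, bumping its version."""
        entries = self.topics.setdefault(topic, {})
        version = entries.get(self.node_id, (0, None))[0] + 1
        entries[self.node_id] = (version, value)

    def read(self, topic: str) -> Dict[str, Any]:
        """Return the latest value known from each writer on a topic."""
        return {w: v for w, (_, v) in self.topics.get(topic, {}).items()}

    def merge(self, other: "TopicStore") -> None:
        """Absorb a peer's state; for each writer, the higher version wins."""
        for topic, entries in other.topics.items():
            mine = self.topics.setdefault(topic, {})
            for writer, (version, value) in entries.items():
                if writer not in mine or mine[writer][0] < version:
                    mine[writer] = (version, value)


a, b = TopicStore("a"), TopicStore("b")
a.write("load", 0.7)
b.write("load", 0.2)
a.merge(b)
b.merge(a)
print(a.read("load") == b.read("load"))  # True: both replicas now agree
```

Because each slot has a single owner with monotonically increasing versions, `merge` is commutative, associative, and idempotent, so the order and repetition of exchanges do not matter. The sketch deliberately ignores harder issues such as a writer restarting and losing its version counter.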

Robustness and administrative simplicity were the primary concerns affecting our design choices. Intentional dynamism, as well as failures (such as process crashes and network partitions) and flaky processes, are common at the scales we target; these should disrupt system operation as little as possible, and recovery should be autonomous, requiring minimal (if any) human intervention.

In addition, our system is dynamic: even in the absence of faults, application server processes, infrastructure processes, and even whole machines may start and stop frequently. The BB should therefore be incrementally scalable, allowing automated addition, removal, and restart of processes with minimal configuration effort. Finally, because accurate capacity planning is difficult and customers sometimes run extreme acceptance tests, the BB must also degrade gracefully under general overload.

These requirements led us to focus on decentralized approaches with a built-in ability to cope with dynamic changes autonomously. Our implementation is fully peer-to-peer, with each process maintaining the portion of the shared memory state that pertains to its topics of interest.
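The idea that each process holds only the state for its topics of interest can be sketched as peer-to-peer anti-entropy gossip. This is an illustrative assumption, not a description of the BB's actual dissemination protocol: `Peer`, `publish`, `absorb`, `gossip_with`, and `gossip_rounds` are hypothetical names. Peers exchange state only for topics they both subscribe to, and within a topic each writer's newest version wins.

```python
import random
from typing import Any, Dict, List, Set, Tuple

# Hypothetical sketch, not the BB's real protocol: each peer stores only the
# topics it is interested in and reconciles them with randomly chosen peers.

Entry = Tuple[int, Any]  # (version, value) written by a single owner


class Peer:
    def __init__(self, name: str, interests: Set[str]) -> None:
        self.name = name
        self.interests = set(interests)
        # topic -> {writer: (version, value)}, kept only for topics of interest
        self.state: Dict[str, Dict[str, Entry]] = {t: {} for t in self.interests}

    def publish(self, topic: str, value: Any) -> None:
        # Assumption of this sketch: writing to a topic implies interest in it.
        self.interests.add(topic)
        slot = self.state.setdefault(topic, {})
        version = slot.get(self.name, (0, None))[0] + 1
        slot[self.name] = (version, value)

    def absorb(self, topic: str, entries: Dict[str, Entry]) -> None:
        """Keep the newest version per writer; drop topics we do not follow."""
        if topic not in self.interests:
            return
        slot = self.state.setdefault(topic, {})
        for writer, (version, value) in entries.items():
            if writer not in slot or slot[writer][0] < version:
                slot[writer] = (version, value)

    def gossip_with(self, other: "Peer") -> None:
        """Push-pull exchange restricted to topics both peers care about."""
        for topic in self.interests & other.interests:
            other.absorb(topic, dict(self.state.get(topic, {})))
            self.absorb(topic, dict(other.state.get(topic, {})))


def gossip_rounds(peers: List[Peer], rounds: int, rng: random.Random) -> None:
    """Each round, every peer reconciles with one randomly chosen other peer."""
    for _ in range(rounds):
        for p in peers:
            q = rng.choice([x for x in peers if x is not p])
            p.gossip_with(q)


a = Peer("a", {"cpu"})
b = Peer("b", {"cpu", "mem"})
c = Peer("c", {"mem"})
a.publish("cpu", 0.5)
c.publish("mem", 0.3)
gossip_rounds([a, b, c], rounds=20, rng=random.Random(0))
print(sorted(b.state))  # ['cpu', 'mem']; a and c each hold only their own topic
```

Filtering by interest keeps each process's memory and traffic proportional to the topics it follows rather than to the whole system, and random peer selection needs no central coordinator, which fits the robustness and administrative-simplicity goals stated above. Real deployments would also need membership tracking and failure handling, which this sketch omits.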

A paper describing this work may be found here.