Diego Components and Architecture
This topic provides an overview of the structure and components of Diego, the new container management system for Cloud Foundry.
To deploy Diego, see the Diego-Release repository on GitHub.
This topic includes the following sections:
- Managing Desired Instances in Diego
- Diego Flow and Architecture Diagram
- Diego Component Glossary
Diego is a self-healing container management system that attempts to keep the correct number of instances running in Diego Cells to avoid network failures and crashes. Diego schedules and runs Tasks and Long-Running Processes (LRP):
- Tasks run only once and terminate.
- LRPs may have multiple instances.
You can submit, update, and retrieve the desired number of Tasks and LRPs using the Bulletin Board System (BBS) API.
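The distinction between Tasks and LRPs can be pictured with a minimal Python sketch. The class names, fields, and the `instance_keys` helper below are hypothetical illustrations and are not the types exposed by the BBS API; the only facts taken from the text are that a Task runs once and that an LRP has a desired number of instances.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """A unit of work that runs once and then terminates."""
    guid: str
    action: str  # hypothetical: one command standing in for a recipe of actions


@dataclass(frozen=True)
class DesiredLRP:
    """A long-running process with a desired number of instances."""
    process_guid: str
    action: str
    instances: int = 1


def instance_keys(lrp):
    """Return one (process_guid, index) key per desired instance."""
    return [(lrp.process_guid, i) for i in range(lrp.instances)]


web = DesiredLRP(process_guid="web", action="./start-server", instances=3)
print(instance_keys(web))
```

In this sketch, each desired instance is identified by its process GUID and an index, which is one simple way to compare desired and running instances later.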
When you push an application to Cloud Foundry using Diego, Cloud Foundry goes through the following process:
- Cloud Foundry signals the Diego Brain to set up Auctioneer, which creates an auction based on the desired instances configured in BBS.
- The Executor creates a Garden container and executes the work encoded in the Tasks and LRPs. This work is encoded as a generic, platform-independent recipe of composable actions.
- The Converger periodically analyzes snapshots of this representation and corrects discrepancies, ensuring that Diego is eventually consistent.
- Diego sends real-time streaming logs for Tasks and LRPs to the Loggregator system. Diego also registers its running LRP instances with the Gorouter to route external web traffic to them.
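The convergence step in the process above can be sketched as a comparison between desired and actual state. This is a simplified illustration, not the Converger's implementation: the `converge` function and its dictionary inputs are assumptions made for the example.

```python
def converge(desired, actual):
    """Compare desired instance counts with running instance indices.

    desired: dict mapping process guid -> desired instance count
    actual:  dict mapping process guid -> set of running instance indices
    Returns (to_start, to_stop), each a list of (guid, index) pairs.
    """
    to_start, to_stop = [], []
    for guid in sorted(set(desired) | set(actual)):
        wanted = set(range(desired.get(guid, 0)))
        running = actual.get(guid, set())
        # Instances that should exist but do not are started.
        to_start.extend((guid, i) for i in sorted(wanted - running))
        # Instances that exist but are no longer desired are stopped.
        to_stop.extend((guid, i) for i in sorted(running - wanted))
    return to_start, to_stop


desired = {"web": 3, "worker": 1}
actual = {"web": {0, 2}, "worker": {0, 1}, "old": {0}}
start, stop = converge(desired, actual)
print(start)  # [('web', 1)]
print(stop)   # [('old', 0), ('worker', 1)]
```

Running the comparison again after applying its output yields no further work, which is the sense in which repeated passes bring the system to a consistent state.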
View a larger version of this image at the Diego Design Notes repo.
The following summarizes the roles and responsibilities of the various components depicted in the Diego architecture diagram above.
Diego Cell directly manages and maintains Tasks and LRPs with the following components:
- Maintains a presence record for the Cell in the BBS.
- Participates in auctions to accept new Tasks and LRP instances.
- Runs Tasks and LRPs by telling its in-process Executor to create a container and then to run actions in it.
- Reacts to container events coming from the Executor.
- Periodically ensures its set of Tasks and ActualLRPs in the BBS is in sync with the containers actually present on the Cell.
- Is concerned with Tasks and LRPs and knows details about their lifecycles.
- Runs as a logical process inside the Rep.
- Manages container allocations against resource constraints on the Cell, such as memory and disk space.
- Implements the actions detailed in the API documentation.
- Streams stdout and stderr from container processes to the metron-agent running on the Cell, which in turn forwards to the Loggregator system.
- Periodically collects container metrics and emits them to Loggregator.
- Knows only how to manage a collection of containers and to run actions in these containers.
- Provides a platform-independent server and client to manage garden containers.
- Defines an interface to be implemented by container-runners, such as guardian and garden-windows.
- Knows nothing about actions and simply provides a concrete implementation of a platform-specific containerization technology that can run arbitrary commands in containers.
- Forwards application logs, errors, and application and Diego metrics to the Loggregator Doppler component.
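The way a Cell participates in an auction while respecting its memory and disk constraints can be sketched as follows. This is a toy model under stated assumptions: the `Cell` class, the `bid` scoring rule (prefer the Cell left with the most free memory), and `run_auction` are hypothetical and do not reproduce the scoring used by the auction package.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    cell_id: str
    free_memory_mb: int
    free_disk_mb: int

    def bid(self, memory_mb, disk_mb):
        """Return a score (lower is better), or None if the work does not fit."""
        if memory_mb > self.free_memory_mb or disk_mb > self.free_disk_mb:
            return None
        # Hypothetical scoring: prefer the cell left with the most free memory.
        return -(self.free_memory_mb - memory_mb)

    def reserve(self, memory_mb, disk_mb):
        """Record the allocation against this cell's remaining capacity."""
        self.free_memory_mb -= memory_mb
        self.free_disk_mb -= disk_mb


def run_auction(cells, memory_mb, disk_mb):
    """Collect bids from all cells and place the work on the best bidder."""
    bids = [(cell.bid(memory_mb, disk_mb), cell) for cell in cells]
    valid = [(score, cell) for score, cell in bids if score is not None]
    if not valid:
        return None  # no cell has room for this work
    # Ties on score are broken by cell_id so the result is deterministic.
    _, winner = min(valid, key=lambda b: (b[0], b[1].cell_id))
    winner.reserve(memory_mb, disk_mb)
    return winner.cell_id


cells = [Cell("cell-a", 1024, 4096), Cell("cell-b", 2048, 4096)]
print(run_auction(cells, memory_mb=512, disk_mb=1024))  # cell-b
```

A request that exceeds every Cell's remaining capacity returns `None` in this sketch; how the real system handles unplaceable work is not covered here.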
Diego Brain components distribute Tasks and LRPs to Diego Cells, and correct discrepancies between Actual and Desired counts to ensure fault-tolerance and long-term consistency. The Diego Brain consists of the Auctioneer:
- Holds auctions for Tasks and LRP instances.
- Runs auctions using the auction package. Auction communication goes over HTTP and is between the Auctioneer and the Cell Reps.
- Maintains a lock in consul to ensure only one auctioneer handles auctions at a time.
- Maintains a lock in consul to ensure that only one converger performs convergence. This exclusivity is primarily for performance considerations, as convergence is idempotent.
- Compares DesiredLRPs with ActualLRPs and takes action to enforce the desired state:
- Resends auction requests for Tasks that have been pending for too long and completion callbacks for Tasks that have remained completed for too long.
- Periodically sends aggregate metrics about DesiredLRPs, ActualLRPs, and Tasks to Loggregator.
- Serves static assets used by our various components, such as the App Lifecycle binaries.
- Brokers connections between SSH clients and SSH servers running inside instance containers.
- Authorizes access to CF app instances based on Cloud Controller roles.
- Provides dynamic service registration and load-balancing via DNS resolution.
- Provides a consistent key-value store for maintenance of distributed locks and component presence.
- Provides abstractions for locks and service registration that encapsulate interactions with consul.
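The idea of a distributed lock that lets only one auctioneer or converger act at a time can be illustrated with an in-memory stand-in for a key-value store. This sketch does not use the consul API: the `LockStore` class, its method names, and the expiry rule are assumptions made for illustration, and the caller passes in the current time so the example stays deterministic.

```python
class LockStore:
    """In-memory stand-in for a consistent key-value store with expiring locks."""

    def __init__(self):
        self._locks = {}  # key -> (owner, expires_at)

    def acquire(self, key, owner, ttl, now):
        """Take or renew the lock; return True if `owner` holds it afterwards."""
        holder = self._locks.get(key)
        if holder is not None and holder[0] != owner and holder[1] > now:
            return False  # another owner holds an unexpired lock
        self._locks[key] = (owner, now + ttl)
        return True

    def release(self, key, owner):
        """Drop the lock, but only if `owner` is the current holder."""
        if self._locks.get(key, (None, 0))[0] == owner:
            del self._locks[key]


store = LockStore()
print(store.acquire("auctioneer_lock", "auctioneer-1", ttl=10, now=0))   # True
print(store.acquire("auctioneer_lock", "auctioneer-2", ttl=10, now=5))   # False
print(store.acquire("auctioneer_lock", "auctioneer-2", ttl=10, now=11))  # True
```

In the last call the first holder has not renewed its lock before it expired, so the second process takes over. This models how a standby instance could become active when the active one stops renewing.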