Diego Components and Architecture

This topic provides an overview of the structure and components of Diego, the new container management system for Cloud Foundry.

To deploy Diego, see the Diego-Release repository on GitHub.

Managing Desired Instances in Diego

Diego is a self-healing container management system that attempts to keep the correct number of instances running in Diego Cells despite network failures and crashes. Diego schedules and runs Tasks and Long-Running Processes (LRPs):

  • Tasks
    Tasks run only once and terminate.

  • LRPs
    LRPs run continuously and may have multiple instances.

You can submit, update, and retrieve the desired number of Tasks and LRPs using the Bulletin Board System (BBS) API.
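
As a rough illustration of the distinction between the two kinds of work, the sketch below models a Task and an LRP as plain Python data classes. The class names, field names, and state names are assumptions made for this example; they do not mirror the actual BBS API schema.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical model of one-off work: it runs once and then terminates."""
    guid: str
    state: str = "pending"  # illustrative states: pending, running, completed

    def complete(self) -> None:
        self.state = "completed"


@dataclass
class DesiredLRP:
    """Hypothetical model of a long-running process with a desired instance count."""
    guid: str
    instances: int = 1

    def scale(self, instances: int) -> None:
        # Updating the desired count is a change to desired state only;
        # starting or stopping containers happens elsewhere.
        if instances < 0:
            raise ValueError("instances must be non-negative")
        self.instances = instances


task = Task(guid="stage-app")
task.complete()

web = DesiredLRP(guid="web", instances=2)
web.scale(3)
```

The point of the sketch is that a Task has a terminal state, while an LRP has only a desired instance count that the system continually works toward.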

Diego Flow and Architecture Diagram

When you push an application to Cloud Foundry using Diego, Cloud Foundry goes through the following process:

  1. Cloud Foundry signals the Diego Brain to set up the Auctioneer, which creates an auction based on the desired instances configured in the BBS.
  2. The Executor creates a Garden container and executes the work encoded in the Tasks and LRPs. This work is encoded as a generic, platform-independent recipe of composable actions.
  3. The Converger periodically analyzes snapshots of the cluster state maintained in the BBS and corrects discrepancies, ensuring that Diego is eventually consistent.
  4. Diego sends real-time streaming logs for Tasks and LRPs to the Loggregator system. Diego also registers its running LRP instances with the Gorouter to route external web traffic to them.
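
The auction in step 1 can be pictured with a deliberately simplified sketch. It assumes each Cell bids only its free memory and the highest bid wins; the function name, the data shapes, and the scoring rule are all hypothetical, and Diego's real auction weighs more factors than this.

```python
def run_auction(cells, instance_memory_mb, count):
    """Place `count` instances, each on the cell with the most free memory.

    `cells` maps a cell ID to its free memory in MB. Returns the list of
    winning cell IDs; instances that fit nowhere are left unplaced.
    """
    free = dict(cells)  # copy so the caller's mapping is not mutated
    placements = []
    for _ in range(count):
        # Each cell "bids" its free memory; the highest bid wins.
        # Iterating in sorted order makes tie-breaking deterministic.
        winner = max(sorted(free), key=lambda cell_id: free[cell_id], default=None)
        if winner is None or free[winner] < instance_memory_mb:
            break  # no cell can host another instance
        free[winner] -= instance_memory_mb
        placements.append(winner)
    return placements


placements = run_auction({"cell-a": 1024, "cell-b": 512}, 512, 3)
```

In this toy version, an instance that cannot be placed is simply dropped from the result. In the flow described above, unplaced work would remain pending until a later auction.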

Figure: Diego flow. View a larger version of this image at the Diego Design Notes repo.

Diego Component Glossary

The following summarizes the roles and responsibilities of the various components depicted in the Diego architecture diagram above.

Diego Cell Components

A Diego Cell directly manages and maintains Tasks and LRPs with the following components:

  • Rep

    • Maintains a presence record for the Cell in the BBS.
    • Participates in auctions to accept new Tasks and LRP instances.
    • Runs Tasks and LRPs by telling its in-process Executor to create a container and then to run actions in it.
    • Reacts to container events coming from the Executor.
    • Periodically ensures its set of Tasks and ActualLRPs in the BBS is in sync with the containers actually present on the Cell.
    • Is concerned with Tasks and LRPs and knows the details of their lifecycles.
  • Executor

    • Runs as a logical process inside the Rep.
    • Manages container allocations against resource constraints on the Cell, such as memory and disk space.
    • Implements the actions detailed in the API documentation.
    • Streams stdout and stderr from container processes to the metron-agent running on the Cell, which in turn forwards them to the Loggregator system.
    • Periodically collects container metrics and emits them to Loggregator.
    • Knows only how to manage a collection of containers and to run actions in these containers.
  • Garden

    • Provides a platform-independent server and client to manage Garden containers.
    • Defines an interface to be implemented by container-runners, such as guardian and garden-windows.
    • Knows nothing about actions and simply provides a concrete implementation of a platform-specific containerization technology that can run arbitrary commands in containers.
  • Metron Agent

    • Forwards application logs, errors, and application and Diego metrics to the Loggregator Doppler component.
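
To make the Executor's resource bookkeeping concrete, the following sketch tracks container allocations against a Cell's memory and disk limits. The `CellCapacity` class and its methods are hypothetical names invented for this example; they are not part of the Executor's actual interface.

```python
class CellCapacity:
    """Hypothetical tracker of a Cell's remaining memory and disk."""

    def __init__(self, memory_mb, disk_mb):
        self.memory_mb = memory_mb
        self.disk_mb = disk_mb
        self.allocations = {}  # container ID -> (memory_mb, disk_mb)

    def allocate(self, container_id, memory_mb, disk_mb):
        """Reserve resources for a container; return False if they do not fit."""
        if container_id in self.allocations:
            return False  # the ID is already in use
        if memory_mb > self.memory_mb or disk_mb > self.disk_mb:
            return False  # would exceed the Cell's remaining capacity
        self.memory_mb -= memory_mb
        self.disk_mb -= disk_mb
        self.allocations[container_id] = (memory_mb, disk_mb)
        return True

    def release(self, container_id):
        """Return a container's resources to the pool; unknown IDs are ignored."""
        memory_mb, disk_mb = self.allocations.pop(container_id, (0, 0))
        self.memory_mb += memory_mb
        self.disk_mb += disk_mb


cell = CellCapacity(memory_mb=1024, disk_mb=2048)
first = cell.allocate("c1", 512, 1024)
second = cell.allocate("c2", 768, 512)  # rejected: only 512 MB of memory left
cell.release("c1")
third = cell.allocate("c2", 768, 512)   # fits once c1 has been released
```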

Diego Brain

Diego Brain components distribute Tasks and LRPs to Diego Cells, and correct discrepancies between Actual and Desired counts to ensure fault-tolerance and long-term consistency. The Diego Brain consists of the Auctioneer and the Converger:

  • Auctioneer
    • Holds auctions for Tasks and LRP instances.
    • Runs auctions using the auction package. Auction communication goes over HTTP and is between the Auctioneer and the Cell Reps.
    • Maintains a lock in Consul to ensure that only one Auctioneer handles auctions at a time.
  • Converger
    • Maintains a lock in Consul to ensure that only one Converger performs convergence. This exclusivity is primarily for performance, as convergence is idempotent.
    • Compares DesiredLRPs and their ActualLRPs and takes action to enforce the desired state.
    • Resends auction requests for Tasks that have been pending for too long and completion callbacks for Tasks that have remained completed for too long.
    • Periodically sends aggregate metrics about DesiredLRPs, ActualLRPs, and Tasks to Loggregator.
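
The comparison of desired and actual state that the Converger performs can be sketched as a pure function. This is a minimal illustration under the assumption that state is reduced to instance counts per process GUID; the function name and the action tuples are hypothetical and are not the Converger's real data structures.

```python
def converge(desired, actual):
    """Return the start/stop actions needed to make `actual` match `desired`.

    Both arguments map a process GUID to an instance count.
    """
    actions = []
    for guid in sorted(set(desired) | set(actual)):
        want = desired.get(guid, 0)
        have = actual.get(guid, 0)
        if have < want:
            actions.append(("start", guid, want - have))
        elif have > want:
            actions.append(("stop", guid, have - want))
    return actions


actions = converge(
    {"web": 3, "worker": 1},
    {"web": 1, "worker": 1, "old": 2},
)
```

Because the function only derives actions from the two snapshots, running it again on state that already matches yields no actions. That is the sense in which convergence is idempotent.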

Database VMs

  • Diego Bulletin Board System
    Maintains a real-time representation of the state of the Diego cluster, including all desired LRPs, running LRP instances, and in-flight Tasks.

  • MySQL
    Provides a consistent key-value data store to Diego.

Access VMs

  • File Server

    • Serves static assets used by various Diego components, such as the App Lifecycle binaries.
  • SSH Proxy

    • Brokers connections between SSH clients and SSH servers running inside instance containers.
    • Authorizes access to CF app instances based on Cloud Controller roles.

Service Registration and Component Coordination

  • Consul
    • Provides dynamic service registration and load-balancing via DNS resolution.
    • Provides a consistent key-value store for maintenance of distributed locks and component presence.
  • Locket
    • Provides abstractions for locks and service registration that encapsulate interactions with Consul.
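
The idea of a distributed lock that lets only one Auctioneer or Converger be active at a time can be illustrated with an in-memory stand-in. This sketch assumes a lock with a time-to-live that another owner may take over once it expires; the `LockStore` class, the key name, and the explicit `now` parameter are hypothetical, and the code does not use or imitate the Consul or Locket APIs.

```python
class LockStore:
    """Hypothetical in-memory stand-in for a lock kept in a key-value store."""

    def __init__(self):
        self._locks = {}  # key -> (owner, expiry time)

    def acquire(self, key, owner, ttl, now):
        """Grant the lock if it is free, expired, or already held by `owner`."""
        holder = self._locks.get(key)
        if holder is not None and holder[0] != owner and holder[1] > now:
            return False  # another owner holds an unexpired lock
        self._locks[key] = (owner, now + ttl)
        return True


store = LockStore()
a1 = store.acquire("auctioneer_lock", "auctioneer-1", ttl=10, now=0)
b1 = store.acquire("auctioneer_lock", "auctioneer-2", ttl=10, now=5)
b2 = store.acquire("auctioneer_lock", "auctioneer-2", ttl=10, now=11)
```

Passing the current time in explicitly keeps the example deterministic. The second instance is refused while the first holds the lock and takes over only after the lock expires without renewal.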