Diego Components and Architecture

This topic provides an overview of the structure and components of Diego, the container management system for Cloud Foundry.

To deploy Diego, see the Diego-Release repository on GitHub.

Introduction to Diego

Diego is a self-healing container management system that attempts to keep the correct number of instances running in Diego Cells to avoid network failures and crashes. Diego schedules and runs Tasks and Long-Running Processes (LRPs). For more about Tasks and LRPs, see How the Diego Auction Allocates Jobs.

You can submit, update, and retrieve the desired number of Tasks and LRPs using the Bulletin Board System (BBS) API.
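The submit, update, and retrieve operations can be pictured with a small in-memory model. This is only a sketch: `ToyBBS`, `DesiredLRP`, and every field and method name below are hypothetical and are not the real BBS API or schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DesiredLRP:
    # Hypothetical fields for illustration; the real BBS record is richer.
    process_guid: str
    instances: int
    memory_mb: int = 256


@dataclass
class ToyBBS:
    """In-memory stand-in for the desired-state side of the BBS."""
    desired: Dict[str, DesiredLRP] = field(default_factory=dict)

    def desire_lrp(self, lrp: DesiredLRP) -> None:
        # Submit a new desired LRP; duplicates are rejected.
        if lrp.process_guid in self.desired:
            raise ValueError(f"{lrp.process_guid} already exists")
        self.desired[lrp.process_guid] = lrp

    def update_instances(self, process_guid: str, instances: int) -> None:
        # Update the desired instance count of an existing LRP.
        if instances < 0:
            raise ValueError("instances must be non-negative")
        self.desired[process_guid].instances = instances

    def get(self, process_guid: str) -> Optional[DesiredLRP]:
        # Retrieve the desired LRP, or None if it is unknown.
        return self.desired.get(process_guid)


bbs = ToyBBS()
bbs.desire_lrp(DesiredLRP("web-app", instances=2))
bbs.update_instances("web-app", 3)
print(bbs.get("web-app").instances)
```

The model only stores desired state; in Diego, acting on that state is the job of the other components described in this topic.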

Diego Flow and Architecture Diagram

When you push an app to Cloud Foundry using Diego, Cloud Foundry performs the following process:

  1. The Cloud Controller sends a request to the BBS API.
  2. The BBS contacts the Auctioneer to create an Auction based on the resources desired for the Task or LRP. The Rep accepts the Auction request.
  3. The Rep creates a Garden container and executes the work encoded in the Tasks and LRPs. This work is encoded as a generic, platform-independent recipe of composable actions.
  4. The Converger periodically compares snapshots of the desired state of the system with its actual state and corrects discrepancies, ensuring that Diego is eventually consistent.
  5. Diego sends real-time streaming logs for Tasks and LRPs to the Loggregator system. Diego also registers its running LRP instances with the Gorouter to route external web traffic to them.

[Figure: Diego flow]

Diego Component Glossary

The following sections summarize the roles and responsibilities of the components depicted in the Diego architecture diagram above.

Diego Cell Components

A Diego Cell directly manages and maintains Tasks and LRPs using the following components:

  • Rep

    • Maintains a presence record for the Cell in Consul/Locket
    • Participates in auctions to accept new Tasks and LRP instances
    • Runs Tasks and LRPs by creating a container and then running actions in it
    • Reacts to container events
    • Periodically ensures its set of Tasks and ActualLRPs in the BBS is in sync with the containers actually present on the Cell
    • Manages container allocations against resource constraints on the Cell, such as memory and disk space
    • Streams stdout and stderr from container processes to the metron-agent running on the Cell, which in turn forwards to the Loggregator system
    • Periodically collects container metrics and emits them to Loggregator
  • Garden

    • Provides a platform-independent server and client to manage Garden containers
    • Defines the API for creating and managing containers
  • Metron Agent

    • Forwards app logs, errors, and app and Diego metrics to the Loggregator Doppler component
  • Route-Emitter

    • Monitors DesiredLRP and ActualLRP states
    • Periodically emits route registration and unregistration messages for instances running on the local Cell
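The Rep's bookkeeping of container allocations against Cell resource constraints can be sketched as simple capacity accounting. The class name, fields, and MB units below are assumptions chosen for illustration, not the Rep's actual data model.

```python
from dataclasses import dataclass


@dataclass
class CellCapacity:
    """Remaining capacity on one Cell (hypothetical model, units in MB)."""
    memory_mb: int
    disk_mb: int

    def allocate(self, memory_mb: int, disk_mb: int) -> bool:
        # Refuse work that does not fit in the remaining memory or disk.
        if memory_mb > self.memory_mb or disk_mb > self.disk_mb:
            return False
        self.memory_mb -= memory_mb
        self.disk_mb -= disk_mb
        return True

    def release(self, memory_mb: int, disk_mb: int) -> None:
        # Return resources when a container goes away.
        self.memory_mb += memory_mb
        self.disk_mb += disk_mb


cell = CellCapacity(memory_mb=1024, disk_mb=4096)
accepted = cell.allocate(memory_mb=512, disk_mb=1024)   # fits
rejected = cell.allocate(memory_mb=1024, disk_mb=1024)  # exceeds remaining memory
cell.release(memory_mb=512, disk_mb=1024)
```

A failed allocation leaves the counters untouched, so a rejected request never leaks capacity.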

Diego Brain

The Diego Brain distributes Tasks and LRPs to Diego Cells and corrects discrepancies between Actual and Desired counts to ensure fault tolerance and long-term consistency. The Diego Brain consists of the Auctioneer.

  • Auctioneer
    • Holds auctions for Tasks and LRP instances
    • Distributes work using the auction algorithm. Auction communication is sent between the Auctioneer and the Cell Reps over HTTPS. For more information about the auction algorithm, see How the Diego Auction Allocates Jobs.
    • Maintains a lock in Consul/Locket to ensure only one auctioneer handles auctions at a time
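To give a feel for distributing work across Cells, here is a deliberately simplified placement loop. It is not the Diego auction algorithm, which this topic does not describe. It assumes each Cell reports only free memory and that each piece of work goes to the fitting Cell with the most free memory. All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def run_toy_auction(
    cells: Dict[str, int],
    work: List[Tuple[str, int]],
) -> Dict[str, Optional[str]]:
    """Place each (name, memory_mb) item on a cell, or None if nothing fits."""
    free = dict(cells)  # copy so the caller's view is not mutated
    placements: Dict[str, Optional[str]] = {}
    for name, memory_mb in work:
        # Only cells with enough free memory may "bid".
        bidders = [c for c, avail in free.items() if avail >= memory_mb]
        if not bidders:
            placements[name] = None
            continue
        # Most free memory wins; ties are broken by cell name.
        winner = sorted(bidders, key=lambda c: (-free[c], c))[0]
        free[winner] -= memory_mb
        placements[name] = winner
    return placements


placements = run_toy_auction(
    cells={"cell-a": 1024, "cell-b": 768},
    work=[("web/0", 512), ("web/1", 512), ("web/2", 512), ("web/3", 512)],
)
print(placements)
```

In this run the fourth instance stays unplaced because neither Cell has 512 MB left.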

Database VMs

  • BBS

    • Maintains a real-time representation of the state of the Diego cluster, including all desired LRPs, running LRP instances, and in-flight Tasks
    • Maintains a lock in Consul/Locket to ensure that only one BBS is active
    • Periodically compares DesiredLRPs and ActualLRPs and takes action to enforce the desired state
    • Resends auction requests for Tasks that have been pending for too long and completion callbacks for Tasks that have remained completed for too long
  • MySQL

    • Provides a consistent key-value data store to Diego
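The BBS comparison of DesiredLRPs and ActualLRPs amounts to a reconciliation pass. The sketch below assumes both sides can be reduced to instance counts per process and that the only corrective actions are starting or stopping instances. The function name and action tuples are illustrative, not BBS internals.

```python
from typing import Dict, List, Tuple


def converge(
    desired: Dict[str, int],
    actual: Dict[str, int],
) -> List[Tuple[str, str, int]]:
    """Return (action, process_guid, count) tuples that close the gap."""
    actions: List[Tuple[str, str, int]] = []
    for guid in sorted(set(desired) | set(actual)):
        want = desired.get(guid, 0)  # processes no longer desired want 0
        have = actual.get(guid, 0)   # processes not yet running have 0
        if have < want:
            actions.append(("start", guid, want - have))
        elif have > want:
            actions.append(("stop", guid, have - want))
    return actions


actions = converge(
    desired={"web": 3, "worker": 1},
    actual={"web": 2, "worker": 1, "old": 1},
)
print(actions)
```

Running the same pass repeatedly is what makes the system eventually consistent: once actual matches desired, the function returns an empty list.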

Access VMs

  • File Server

    • Serves static assets used in the app lifecycle
  • SSH Proxy

    • Brokers connections between SSH clients and SSH servers
    • Runs inside instance containers and authorizes access to app instances based on Cloud Controller roles

Cloud Controller Bridge Components

The Cloud Controller Bridge (CC-Bridge) components translate app-specific requests from the Cloud Controller to the BBS. These components include the following:

  • Stager

    • Translates staging requests from the Cloud Controller into generic Tasks and LRPs
    • Sends a response to the Cloud Controller when a Task completes
  • CC-Uploader

    • Mediates uploads from the Rep to the Cloud Controller
    • Translates simple HTTP POST requests from the Rep into complex multipart-form uploads for the Cloud Controller
  • Nsync Bulker

    • Periodically polls the Cloud Controller for each app to ensure that Diego maintains accurate DesiredLRP counts
  • Nsync Listener

    • Listens for app requests
    • Creates and updates DesiredLRPs and their instance counts through the BBS
  • TPS Listener

    • Provides the Cloud Controller with information about currently running LRPs to respond to cf apps and cf app APP_NAME requests
  • TPS Watcher

    • Monitors ActualLRP activity for crashes and reports them to the Cloud Controller
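The CC-Uploader's translation from a simple POST body into a multipart-form upload can be illustrated by wrapping raw bytes in a single-part `multipart/form-data` body. The form field name, filename, and content type below are placeholders. The fields the Cloud Controller actually expects are not covered in this topic.

```python
import uuid
from typing import Optional, Tuple


def to_multipart(
    field_name: str,
    filename: str,
    payload: bytes,
    boundary: Optional[str] = None,
) -> Tuple[str, bytes]:
    """Wrap raw bytes as one file part; returns (Content-Type header, body)."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return f"multipart/form-data; boundary={boundary}", head + payload + tail


# Hypothetical field name and filename, fixed boundary for readability.
content_type, body = to_multipart("file", "droplet.tgz", b"raw-bytes", "b123")
print(content_type)
```

A real uploader would stream large payloads instead of building the whole body in memory.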

Service Registration and Component Coordination

  • Consul
    • Provides dynamic service registration and load-balancing via DNS resolution
    • Provides a consistent key-value store for maintenance of distributed locks and component presence
  • Locket
    • Provides abstractions for locks and service registration
    • Relies on a SQL backend for persistence
    • Lives in the Database VM
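The "only one active instance" guarantee that the Auctioneer and BBS obtain through a lock can be sketched with an expiring lock. This is an in-memory illustration only. The TTL-based expiry, the class, and its methods are assumptions and do not reflect the Consul or Locket APIs.

```python
import time
from typing import Callable, Dict, Tuple


class ToyLockStore:
    """In-memory lock table: key -> (owner, expiry time)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._locks: Dict[str, Tuple[str, float]] = {}

    def acquire(self, key: str, owner: str, ttl: float) -> bool:
        now = self._clock()
        holder = self._locks.get(key)
        # Another owner holds an unexpired lock: refuse.
        if holder is not None and holder[0] != owner and holder[1] > now:
            return False
        # Free, expired, or ours already: take or renew it.
        self._locks[key] = (owner, now + ttl)
        return True

    def release(self, key: str, owner: str) -> None:
        holder = self._locks.get(key)
        if holder is not None and holder[0] == owner:
            del self._locks[key]


now = [0.0]  # fake clock so the example is deterministic
store = ToyLockStore(clock=lambda: now[0])
first = store.acquire("auctioneer", "auctioneer-0", ttl=15)
second = store.acquire("auctioneer", "auctioneer-1", ttl=15)
now[0] = 20.0  # the first holder stopped renewing and its lock expired
third = store.acquire("auctioneer", "auctioneer-1", ttl=15)
```

Expiry lets a standby instance take over if the active holder stops renewing, at the cost of a short window with no active holder.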

App Lifecycle Binaries

The following three platform-specific binaries deploy apps and govern their lifecycle:

  • The Builder, which stages an app. The CC-Bridge runs the Builder as a Task on every staging request. The Builder performs static analysis on the app code and performs any necessary pre-processing before the app is first run.
  • The Launcher, which runs an app. The CC-Bridge sets the Launcher as the Action on the DesiredLRP for the app. The Launcher executes the start command with the correct system context, including working directory and environment variables.
  • The Healthcheck, which performs a status check on running apps from inside the container.
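What it means for the Launcher to execute a start command "with the correct system context" can be shown with a small process launch that sets the working directory and environment variables. The `launch` helper and the `PORT` variable are illustrative choices, not the Launcher's implementation.

```python
import os
import subprocess
import sys
import tempfile
from typing import Dict, List


def launch(
    start_command: List[str],
    working_dir: str,
    env: Dict[str, str],
) -> subprocess.CompletedProcess:
    """Run the start command in the given directory and environment."""
    return subprocess.run(
        start_command,
        cwd=working_dir,
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise if the command exits non-zero
    )


with tempfile.TemporaryDirectory() as app_dir:
    child_env = dict(os.environ, PORT="8080")  # hypothetical app setting
    result = launch(
        [sys.executable, "-c",
         "import os; print(os.environ['PORT']); print(os.getcwd())"],
        app_dir,
        child_env,
    )
    port_line, cwd_line = result.stdout.splitlines()
    ran_in_app_dir = os.path.samefile(cwd_line, app_dir)
```

The child process sees `PORT` and runs in the temporary directory, while the parent's own environment and working directory stay unchanged.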

Current Implementations