Component: DEA Placement Algorithm
Note: This topic refers to versions of Cloud Foundry that use the deprecated Droplet Execution Agent (DEA) architecture instead of the current Diego architecture. See Diego Architecture for more information about how containers are managed in Cloud Foundry.
This topic describes how the Cloud Controller uses an algorithm to schedule apps in Cloud Foundry’s older Droplet Execution Agent (DEA) architecture. Newer versions of Cloud Foundry use the Diego Auction.
Whenever Cloud Foundry needs to spin up a new instance of an app, the Cloud Controller is responsible for selecting a droplet execution agent (DEA) to run it.
DEAs broadcast their availability to the Cloud Controller through a NATS message called an advertisement, which contains a stats hash with information about their available memory, available disk, the stack which the DEA runs, and an expiration time.
The Cloud Controller collects these advertisements in a construct called a pool. When the Cloud Controller needs to find a DEA to run an app, it runs through the following steps, using criteria (minimum thresholds for disk, memory, etc.) specific to the app that the chosen DEA will run:
- It removes the expired DEA advertisements from the pool.
- It filters the remaining advertisements to include only those:
  - With adequate disk
  - With adequate memory
  - Running the required stack (Linux or Windows)
- It then narrows its search to DEAs running in the availability zone with the fewest running instances (according to the information provided by the advertisements in the pool).
- It then narrows its search to the DEAs with the fewest running instances.
- It then narrows its search to the top half of the DEAs, sorted by memory.
- It then randomly selects one of the remaining DEAs.
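The steps above can be sketched in a few lines of Python. This is an illustration only, not the Cloud Controller's actual code. The `Advertisement` class, its field names, and the `find_dea` function are hypothetical. The sketch also makes several assumptions that the text does not specify: each advertisement carries a single running-instance count, zone totals are computed from the advertisements that pass the filters, ties between zones go to the first zone encountered, and "top half" rounds up.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Advertisement:
    """One DEA advertisement. All field names are hypothetical."""
    dea_id: str
    zone: str
    stack: str
    available_memory: int   # MB
    available_disk: int     # MB
    running_instances: int  # assumed: one count per DEA
    expires_at: float       # same clock as the `now` argument


def find_dea(pool, memory, disk, stack, now, rng=random):
    """Return the chosen Advertisement, or None if no DEA qualifies."""
    # Step 1: remove expired advertisements from the pool itself.
    pool[:] = [ad for ad in pool if ad.expires_at > now]

    # Step 2: keep only DEAs with adequate disk, memory, and the right stack.
    candidates = [
        ad for ad in pool
        if ad.available_disk >= disk
        and ad.available_memory >= memory
        and ad.stack == stack
    ]
    if not candidates:
        return None

    # Step 3: narrow to the zone with the fewest running instances
    # (assumption: totals come from the filtered advertisements).
    per_zone = {}
    for ad in candidates:
        per_zone[ad.zone] = per_zone.get(ad.zone, 0) + ad.running_instances
    best_zone = min(per_zone, key=per_zone.get)
    candidates = [ad for ad in candidates if ad.zone == best_zone]

    # Step 4: narrow to the DEAs with the fewest running instances.
    fewest = min(ad.running_instances for ad in candidates)
    candidates = [ad for ad in candidates if ad.running_instances == fewest]

    # Step 5: keep the top half by available memory (assumption: round up).
    candidates.sort(key=lambda ad: ad.available_memory, reverse=True)
    candidates = candidates[:math.ceil(len(candidates) / 2)]

    # Step 6: pick one of the remaining DEAs at random.
    return rng.choice(candidates)


pool = [
    Advertisement("a", "z1", "linux", 1024, 2048, 3, expires_at=200),
    Advertisement("b", "z2", "linux", 512, 2048, 0, expires_at=200),
    Advertisement("c", "z2", "linux", 2048, 2048, 0, expires_at=200),
    Advertisement("d", "z2", "windows", 4096, 2048, 0, expires_at=200),
    Advertisement("e", "z2", "linux", 8192, 2048, 0, expires_at=50),
]
chosen = find_dea(pool, memory=256, disk=1024, stack="linux", now=100)
print(chosen.dea_id)  # prints "c"
```

In this example, `e` has expired and `d` runs the wrong stack. Zone `z2` has fewer running instances than `z1`, so `a` is dropped. `b` and `c` tie on instance count, and `c` is the only DEA in the top half by memory, so the random selection has a single candidate.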
The Cloud Controller uses this algorithm to balance new app instances across DEAs at the time those instances are created. It does not rebalance app instances that are already running.
For example, suppose a set of apps is running on DEAs in two AZs, and the second AZ temporarily goes down. While the second AZ is down, all new instances are placed in the first AZ. After the second AZ comes back online, new instances are allocated to DEAs there, because the algorithm favors DEAs in the zone with the fewest running instances. However, instances already running in the first AZ are not moved to the second AZ, so the imbalance persists.
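The following sketch walks through that scenario with zone-level counts only. It is a simplified model, not Cloud Controller code: the `place` helper and the zone names are hypothetical, and the only rule it applies is "place each new instance in the available zone with the fewest running instances."

```python
def place(zone_counts, available_zones):
    """Place one new instance in the available zone with the fewest instances."""
    zone = min(available_zones, key=lambda z: zone_counts[z])
    zone_counts[zone] += 1
    return zone


counts = {"az1": 0, "az2": 0}

# Both AZs are up: four instances are spread evenly.
for _ in range(4):
    place(counts, ["az1", "az2"])
print(counts)  # {'az1': 2, 'az2': 2}

# az2 goes down: its instances are lost and recreated in az1.
lost = counts["az2"]
counts["az2"] = 0
for _ in range(lost):
    place(counts, ["az1"])
print(counts)  # {'az1': 4, 'az2': 0}

# az2 comes back and the app is scaled up by two instances.
# New instances go to az2, but nothing moves out of az1.
for _ in range(2):
    place(counts, ["az1", "az2"])
print(counts)  # {'az1': 4, 'az2': 2}
```

The final counts stay uneven because the rule applies only to instances being created, never to instances that are already running.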
An imbalance can also result from a deploy that changes the source code or stemcell of the DEAs.
You can rebalance app instances between AZs in two ways:
- Restart the app, which may result in brief downtime.
- Terminate and restart half of the instances, one by one.