Cloud Controller Multi-Process Mode (Puma)


This page explains how the Cloud Controller API handles traffic in Single-Process and Multi-Process modes, how to migrate from the older Single-Process mode to the newer Multi-Process mode, and how running the Cloud Controller in Multi-Process mode affects how you scale it.

How Cloud Controller Scaling Works

In Single-Process mode, the CF API is limited to scaling horizontally. Thin, the web server technology that Cloud Controller is built on in this mode, can only serve traffic using a single process.

With the introduction of Multi-Process support for Cloud Controller, operators can configure their foundations to use fewer VM instances to handle the same amount of traffic to the Cloud Controller API.

Scaling in Single-Process Mode

Using Single-Process mode (powered by Thin), Cloud Controller cannot utilize more than a single process per VM. This is because Thin cannot use multiple processes to serve traffic on the same port at the same time.

In Single-Process mode, Cloud Controller sees little benefit from being deployed to a multi-core machine, because it can only utilize a single core.

By default, Cloud Controller in Single-Process mode is configured to use 20 threads, meaning a single Cloud Controller process can comfortably handle up to 20 concurrent API requests before response times degrade because additional requests queue up.

Once a foundation’s API group is at its limit, the only way to support more HTTP requests is to add more API VMs.
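
For example, under this model an API group of 20 Single-Process instances, each configured with 20 threads, can serve roughly 20 × 20 = 400 concurrent requests.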

Each Single-Process API instance has 2 colocated job queue workers (called ‘local workers’) to handle package and droplet uploads to the blobstore. It’s generally sufficient for both the API process and the 2 local workers to run on a single core.

Scaling in Multi-Process Mode

Using Multi-Process mode (powered by Puma), Cloud Controller can utilize multiple processes (called ‘workers’) to serve traffic on the same port.

Cloud Controller only benefits from multiple workers if there is a CPU core available to each worker.

By default, each worker is configured to use 10 threads, meaning each worker can process up to 10 concurrent requests. We don’t recommend configuring more than 20 threads per worker.
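
For example, an instance with 4 Puma workers, each using the default 10 threads, can process roughly 4 × 10 = 40 concurrent requests.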

Running more workers on an individual instance results in higher memory utilization.

Note: A single Puma worker with 20 threads is generally not as resilient as a single Thin instance. However, a cluster of Puma workers is more resilient than a group of individual Thin instances, because if one Puma worker stalls or crashes, queued requests can be routed to the other workers.

You should configure each API instance to have twice as many local job workers as there are API workers. This can vary depending on foundation load. See Scaling Local Workers for more details.
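
For example, an API instance running 4 Puma workers would start with 8 local workers.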

Enable Multi-Process Mode

Enable Multi-Process mode with the following settings; a sketch of an example ops file follows the list. See Migrating from Single-Process to Multi-Process Mode below for how to adjust VM type, instance count, workers, and threads/worker to account for and take advantage of the migration.

  1. To enable Multi-Process mode, set the cc.experimental.puma property to true.

  2. Set cc.puma.workers to the number of CPU cores your machines have.

  3. Set cc.puma.max_threads to the desired number of threads. We recommend at least 10.

  4. Set cc.puma.max_db_connections_per_process to at least the value of cc.puma.max_threads.
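
The following is a minimal sketch of a cf-deployment-style BOSH ops file that sets these properties for a 4-core API VM. The instance group name (api), job name (cloud_controller_ng), and exact property paths are assumptions; adjust them to match your deployment manifest.

  # Sketch: enable Puma mode and tune workers/threads for a 4-core API VM.
  # The "api" instance group and "cloud_controller_ng" job names are assumptions.
  - type: replace
    path: /instance_groups/name=api/jobs/name=cloud_controller_ng/properties/cc/experimental?/puma
    value: true
  - type: replace
    path: /instance_groups/name=api/jobs/name=cloud_controller_ng/properties/cc/puma?/workers
    value: 4
  - type: replace
    path: /instance_groups/name=api/jobs/name=cloud_controller_ng/properties/cc/puma?/max_threads
    value: 10
  - type: replace
    path: /instance_groups/name=api/jobs/name=cloud_controller_ng/properties/cc/puma?/max_db_connections_per_process
    value: 10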

Migrating from Single-Process to Multi-Process Mode

When upgrading your Cloud Controller API server from Single-Process to Multi-Process mode, ensure the API instance group can support the same number of concurrent requests.

Typically, a Single-Process API instance supports 20 threads. To ensure parity when migrating to Multi-Process API instances, make sure that the total number of workers multiplied by the number of threads per worker is equal to or greater than the number of Single-Process instances multiplied by the number of threads per instance (usually 20).
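
For example, 20 Single-Process instances × 20 threads support 400 concurrent requests; a Multi-Process group of 10 instances, each with 2 workers and 20 threads per worker, also supports 10 × 2 × 20 = 400 (compare Profiles 1 and 2 in the Examples below).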

It’s recommended to initially over-allocate resources and then fine-tune resource allocations after migration, as described in Post-Migration Fine-Tuning. For example, you might follow any or all of the following strategies:

  • Provision one or more API instances beyond what the concurrent-request calculation above requires
  • Keep the thread count per worker less than 20
  • Increase the memory for your API instances in proportion to the number of workers each instance will run in Multi-Process mode.

Examples

Here are a few possible configurations that result in a similar total number of threads:

                                     Profile 1 (Thin)   Profile 2 (Puma)   Profile 3 (Puma)   Profile 4 (Puma)
VM type                              1 CPU              2 CPU              4 CPU              8 CPU
# instances                          20                 10                 8                  4
# API workers                        1 (Thin)           2                  4                  7
# threads/worker                     20                 20                 13                 15
Total concurrent requests supported  400                400                416                420

Profile 1 (Thin): A typical API instance group for Single-Process mode. The only recommended way to increase the maximum number of concurrent requests is to scale the number of instances.

Profile 2 (Puma): With only 2 cores, this profile is not taking full advantage of Multi-Process support. Since each worker is already configured to use 20 threads, this profile can only be scaled by increasing the number of machine cores or total instances.

Profile 3 (Puma): This profile can be scaled by increasing the number of cores, total instances, or threads.

Profile 4 (Puma): This profile runs only 7 API workers on its 8 CPU cores. Leaving one core unassigned can be helpful, but is not strictly necessary, to ensure that colocated processes have enough CPU resources to run. This profile can be scaled by increasing the number of workers to 8, as well as by increasing the number of cores, total instances, or threads. This profile is the most demanding in terms of memory footprint.

Post-Migration Fine-Tuning

After upgrading, watch the following metrics to see whether you can reduce your provisioned resources or need to scale them up:

  • cc.requests.outstanding.gauge: If this value is at or consistently near the total number of available Puma threads on the VM, you may need to scale your instance group.
  • system.cpu.user: This value should stay below 0.85 total utilization.
  • cc.vitals.cpu_load_avg: This value should stay below the total number of CPU cores.
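
For example, on an 8-core API VM running 7 Puma workers with 15 threads each (Profile 4 above), consider scaling up if cc.requests.outstanding.gauge is consistently near 7 × 15 = 105, and expect cc.vitals.cpu_load_avg to stay below 8.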