Cloud Foundry Component Metrics

This topic lists and describes the metrics available for Cloud Foundry (CF) system components. These metrics are streamed from the Loggregator Firehose.

About this Topic

The CF component metric names and descriptions listed in this topic may be out of date because CF component metrics change often. If you have questions about CF component metrics, consider contacting the component teams directly on their respective channels in the Cloud Foundry Slack organization. For example, you can contact the Diego team at #diego.

Cloud Controller

Default Origin Name: cc

Metric Name Description
diego_sync.invalid_desired_lrps Number of invalid DesiredLRPs found during CF apps and Diego DesiredLRPs periodic synchronization. Emitted every 30 seconds.
failed_job_count.<VM_NAME>-<VM_INDEX> Number of failed jobs in the <VM_NAME>-<VM_INDEX> queue. This is the number of delayed jobs where the failed_at column is populated with the time of the most recently failed attempt at the job. The failed job count is not specific to the jobs run by the Cloud Controller worker. By default, Cloud Controller deletes failed jobs after 31 days. Emitted every 30 seconds per VM.
diego_sync.duration Time in milliseconds that it took to synchronize CF apps and Diego DesiredLRPs. Emitted every 30 seconds.
failed_job_count.cc-generic Number of failed jobs in the cc-generic queue. By default, Cloud Controller deletes failed jobs after 31 days. Emitted every 30 seconds per VM.
failed_job_count.total Number of failed jobs in all queues. By default, Cloud Controller deletes failed jobs after 31 days. Emitted every 30 seconds per VM.
http_status.1XX Number of HTTP response status codes of type 1xx (informational). This resets when the Cloud Controller process is restarted and is incremented at the end of each request cycle.
http_status.2XX Number of HTTP response status codes of type 2xx (success). This resets when the Cloud Controller process is restarted and is incremented at the end of each request cycle. Emitted for each Cloud Controller request.
http_status.3XX Number of HTTP response status codes of type 3xx (redirection). This resets when the Cloud Controller process is restarted and is incremented at the end of each request cycle. Emitted for each Cloud Controller request.
http_status.4XX Number of HTTP response status codes of type 4xx (client error). This resets when the Cloud Controller process is restarted and is incremented at the end of each request cycle. Emitted for each Cloud Controller request.
http_status.5XX Number of HTTP response status codes of type 5xx (server error). This resets when the Cloud Controller process is restarted and is incremented at the end of each request cycle.
job_queue_length.cc-<VM_NAME>-<VM_INDEX> Number of background jobs in the <VM_NAME>-<VM_INDEX> queue that have yet to run for the first time. Emitted every 30 seconds per VM.
job_queue_length.cc-generic Number of background jobs in the cc-generic queue that have yet to run for the first time. Emitted every 30 seconds per VM.
job_queue_length.total Total number of background jobs in the queues that have yet to run for the first time. Emitted every 30 seconds per VM.
log_count.all Total number of log messages, sum of messages of all severity levels. The count resets when the Cloud Controller process is restarted. Emitted every 30 seconds per VM.
log_count.debug Number of log messages of severity “debug.” The count resets when the Cloud Controller process is restarted. Emitted every 30 seconds per VM.
log_count.debug1 Not used.
log_count.debug2 Number of log messages of severity “debug2.” The count resets when the Cloud Controller process is restarted. Emitted every 30 seconds per VM.
log_count.error Number of log messages of severity “error.” Error is the most severe level. It is used for failures and during error handling. Most errors can be found under this log level, for example, failed unbinding a service, failed to cancel a task, Diego app crashed error, staging completion errors, staging errors, and resource not found. The count resets when the Cloud Controller process is restarted. Emitted every 30 seconds per VM.
log_count.fatal Number of log messages of severity “fatal.” The count resets when the Cloud Controller process is restarted. Emitted every 30 seconds per VM.
log_count.info Number of log messages of severity “info.” Examples of info messages are droplet created, copying package, uploading package, access denied due to insufficient scope, job logging, blobstore actions, staging requests, and app running requests. The count resets when the Cloud Controller process is restarted. Emitted every 30 seconds per VM.
log_count.off Number of log messages of severity “off.” The count resets when the Cloud Controller process is restarted. Emitted every 30 seconds per VM.
log_count.warn Number of log messages of severity “warn.” Warn is also used for failures and during error handling, for example, diagnostics written to file, failed to capture diagnostics, app rollback failed, service broker already deleted, and UAA token problems. The count resets when the Cloud Controller process is restarted. Emitted every 30 seconds per VM.
requests.completed Number of requests that have been processed. Emitted for each Cloud Controller request.
requests.outstanding Number of requests that are currently being processed. Emitted for each Cloud Controller request.
staging.requested Cumulative number of requests to start a staging task handled by each Cloud Controller.
staging.succeeded Cumulative number of successful staging tasks handled by each Cloud Controller. Emitted every time a staging task completes successfully.
staging.succeeded_duration Time in milliseconds that the successful staging task took to run. Emitted each time a staging task completes successfully.
staging.failed Cumulative number of failed staging tasks handled by each Cloud Controller. Emitted every time a staging task fails.
staging.failed_duration Time in milliseconds that the failed staging task took to run. Emitted each time a staging task fails.
tasks_running.count Number of currently running tasks. Emitted every 30 seconds per VM. This metric is only seen in version 3 of the Cloud Foundry API.
tasks_running.memory_in_mb Memory being consumed by all currently running tasks. Emitted every 30 seconds per VM. This metric is only seen in version 3 of the Cloud Foundry API.
thread_info.event_machine.connection_count Number of open connections to event machine. Emitted every 30 seconds per VM.
thread_info.event_machine.resultqueue.num_waiting Number of scheduled tasks in the result. Emitted every 30 seconds per VM.
thread_info.event_machine.resultqueue.size Number of unscheduled tasks in the result. Emitted every 30 seconds per VM.
thread_info.event_machine.threadqueue.num_waiting Number of scheduled tasks in the threadqueue. Emitted every 30 seconds per VM.
thread_info.event_machine.threadqueue.size Number of unscheduled tasks in the threadqueue. Emitted every 30 seconds per VM.
thread_info.thread_count Total number of threads that are either runnable or stopped. Emitted every 30 seconds per VM.
total_users Total number of users ever created, including inactive users. Emitted every 10 minutes per VM.
vcap_sinatra.recent_errors 50 most recent errors. DEPRECATED
vitals.cpu Percentage of CPU used by the Cloud Controller process. Emitted every 30 seconds per VM.
vitals.cpu_load_avg System CPU load averaged over the last 1 minute according to the OS. Emitted every 30 seconds per VM.
vitals.mem_bytes The RSS bytes (resident set size) or real memory of the Cloud Controller process. Emitted every 30 seconds per VM.
vitals.mem_free_bytes Total memory available according to the OS. Emitted every 30 seconds per VM.
vitals.mem_used_bytes Total memory used (active + wired) according to the OS. Emitted every 30 seconds per VM.
vitals.num_cores The number of CPUs of a host machine. Emitted every 30 seconds per VM.
vitals.uptime The uptime of the Cloud Controller process in seconds. Emitted every 30 seconds per VM.
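
Because the http_status.* counters reset when the Cloud Controller process restarts, a consumer that wants a per-interval count has to tolerate readings that go backwards. The following is a minimal sketch of one way to do that, not part of Cloud Foundry itself. The function names and the dictionaries of readings are hypothetical, and treating any decrease as a restart from zero is an assumption.

```python
HTTP_STATUS_CLASSES = [
    "http_status.1XX",
    "http_status.2XX",
    "http_status.3XX",
    "http_status.4XX",
    "http_status.5XX",
]


def counter_delta(previous, current):
    """Return the increase between two readings of a cumulative counter.

    Assumption: a reading lower than the previous one means the process
    restarted and the counter began again from zero.
    """
    if current < previous:
        return current
    return current - previous


def server_error_ratio(previous_readings, current_readings):
    """Return the fraction of responses in the interval that were 5xx.

    Both arguments are hypothetical dicts that map a metric name to its
    latest counter value. Missing names are treated as zero.
    """
    deltas = {
        name: counter_delta(previous_readings.get(name, 0),
                            current_readings.get(name, 0))
        for name in HTTP_STATUS_CLASSES
    }
    total = sum(deltas.values())
    if total == 0:
        return 0.0
    return deltas["http_status.5XX"] / total
```

For example, two readings of 90 and 180 for http_status.2XX and 10 and 20 for http_status.5XX give a ratio of 0.1 for that interval.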

Diego

Diego metrics have the following origin names:

Default Origin Name: auctioneer

Metric Name Description
AuctioneerFetchStatesDuration Time in nanoseconds that the auctioneer took to fetch state from all the cells when running its auction. Emitted every 30 seconds during each auction.
AuctioneerLRPAuctionsFailed Cumulative number of LRP instances that the auctioneer failed to place on Diego cells. Emitted every 30 seconds during each auction.
AuctioneerLRPAuctionsStarted Cumulative number of LRP instances that the auctioneer successfully placed on Diego cells. Emitted every 30 seconds during each auction.
AuctioneerTaskAuctionsFailed Cumulative number of Tasks that the auctioneer failed to place on Diego cells. Emitted every 30 seconds during each auction.
AuctioneerTaskAuctionsStarted Cumulative number of Tasks that the auctioneer successfully placed on Diego cells. Emitted every 30 seconds during each auction.
LockHeld.v1-locks-auctioneer_lock Whether an auctioneer holds the auctioneer lock: 1 means the lock is held, and 0 means the lock was lost. Emitted every 30 seconds by the active auctioneer.
LockHeldDuration.v1-locks-auctioneer_lock Time in nanoseconds that the active auctioneer has held the auctioneer lock. Emitted every 30 seconds by the active auctioneer.
memoryStats.lastGCPauseTimeNS Duration in nanoseconds of the last garbage collector pause.
memoryStats.numBytesAllocated Instantaneous count of bytes allocated and still in use.
memoryStats.numBytesAllocatedHeap Instantaneous count of bytes allocated on the main heap and still in use.
memoryStats.numBytesAllocatedStack Instantaneous count of bytes used by the stack allocator.
memoryStats.numFrees Lifetime number of memory deallocations.
memoryStats.numMallocs Lifetime number of memory allocations.
numCPUS Number of CPUs on the machine.
numGoRoutines Instantaneous number of active goroutines in the process.
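
Diego components report durations in nanoseconds, while Cloud Controller reports them in milliseconds, so dashboards that combine both usually need a conversion. The sketch below shows that conversion and one possible way to derive a placement failure ratio from the cumulative auctioneer counters. The function names are hypothetical, and the inputs are assumed to be per-interval increases that the consumer has already computed.

```python
NANOSECONDS_PER_MILLISECOND = 1_000_000


def ns_to_ms(nanoseconds):
    """Convert a Diego duration metric from nanoseconds to milliseconds."""
    return nanoseconds / NANOSECONDS_PER_MILLISECOND


def placement_failure_ratio(started_delta, failed_delta):
    """Return the share of placement attempts in an interval that failed.

    started_delta: increase in AuctioneerLRPAuctionsStarted (placed).
    failed_delta:  increase in AuctioneerLRPAuctionsFailed (not placed).
    """
    attempts = started_delta + failed_delta
    if attempts == 0:
        return 0.0
    return failed_delta / attempts
```

The same ratio can be computed for Tasks by passing the increases in AuctioneerTaskAuctionsStarted and AuctioneerTaskAuctionsFailed.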

Default Origin Name: bbs

Metric Name Description
BBSMasterElected Emitted once when the BBS is elected as master.
ConvergenceLRPDuration Time in nanoseconds that the BBS took to run its LRP convergence pass. Emitted every 30 seconds when LRP convergence runs.
ConvergenceLRPPreProcessingActualLRPsDeleted Cumulative number of times the BBS has detected and deleted a malformed ActualLRP in its LRP convergence pass. Emitted every 30 seconds.
ConvergenceLRPPreProcessingMalformedRunInfos Cumulative number of times the BBS has detected a malformed DesiredLRP RunInfo in its LRP convergence pass. Emitted every 30 seconds.
ConvergenceLRPPreProcessingMalformedSchedulingInfos Cumulative number of times the BBS has detected a malformed DesiredLRP SchedulingInfo in its LRP convergence pass. Emitted every 30 seconds.
ConvergenceLRPRuns Cumulative number of times BBS has run its LRP convergence pass. Emitted every 30 seconds.
ConvergenceTaskDuration Time in nanoseconds that the BBS took to run its Task convergence pass. Emitted every 30 seconds when Task convergence runs.
ConvergenceTaskRuns Cumulative number of times the BBS has run its Task convergence pass. Emitted every 30 seconds.
ConvergenceTasksKicked Cumulative number of times the BBS has updated a Task during its Task convergence pass. Emitted every 30 seconds.
ConvergenceTasksPruned Cumulative number of times the BBS has deleted a malformed Task during its Task convergence pass. Emitted every 30 seconds.
CrashedActualLRPs Total number of LRP instances that have crashed. Emitted every 30 seconds.
CrashingDesiredLRPs Total number of DesiredLRPs that have at least one crashed instance. Emitted every 30 seconds.
Domain.cf-apps Whether the ‘cf-apps’ domain is up-to-date, so that CF apps from CC have been synchronized with DesiredLRPs for Diego to run. 1 means the domain is up-to-date; no data means it is not. Emitted every 30 seconds.
Domain.cf-tasks Whether the ‘cf-tasks’ domain is up-to-date, so that CF tasks from CC have been synchronized with tasks for Diego to run. 1 means the domain is up-to-date; no data means it is not. Emitted every 30 seconds.
ETCDLeader Index of the leader node in the etcd cluster. Emitted every 30 seconds.
ETCDRaftTerm Raft term of the etcd cluster. Emitted every 30 seconds.
ETCDReceivedBandwidthRate Number of bytes per second received by the follower etcd node. Emitted every 30 seconds.
ETCDReceivedRequestRate Number of requests per second received by the follower etcd node. Emitted every 30 seconds.
ETCDSentBandwidthRate Number of bytes per second sent by the leader etcd node. Emitted every 30 seconds.
ETCDSentRequestRate Number of requests per second sent by the leader etcd node. Emitted every 30 seconds.
ETCDWatchers Number of watches set against the etcd cluster. Emitted every 30 seconds.
LockHeld.v1-locks-bbs_lock Whether a BBS holds the BBS lock: 1 means the lock is held, and 0 means the lock was lost. Emitted every 30 seconds by the active BBS server.
LockHeldDuration.v1-locks-bbs_lock Time in nanoseconds that the active BBS has held the BBS lock. Emitted every 30 seconds by the active BBS server.
LRPsClaimed Total number of LRP instances that have been claimed by some cell. Emitted every 30 seconds.
LRPsDesired Total number of LRP instances desired across all LRPs. Emitted periodically.
LRPsExtra Total number of LRP instances that are no longer desired but still have a BBS record. Emitted every 30 seconds.
LRPsMissing Total number of LRP instances that are desired but have no record in the BBS. Emitted every 30 seconds.
LRPsRunning Total number of LRP instances that are running on cells. Emitted every 30 seconds.
LRPsUnclaimed Total number of LRP instances that have not yet been claimed by a cell. Emitted every 30 seconds.
memoryStats.lastGCPauseTimeNS Duration in nanoseconds of the last garbage collector pause.
memoryStats.numBytesAllocated Instantaneous count of bytes allocated and still in use.
memoryStats.numBytesAllocatedHeap Instantaneous count of bytes allocated on the main heap and still in use.
memoryStats.numBytesAllocatedStack Instantaneous count of bytes used by the stack allocator.
memoryStats.numFrees Lifetime number of memory deallocations.
memoryStats.numMallocs Lifetime number of memory allocations.
MetricsReportingDuration Time in nanoseconds that the BBS took to emit metrics about etcd. Emitted every 30 seconds.
MigrationDuration Time in nanoseconds that the BBS took to run migrations against its persistence store. Emitted each time a BBS becomes the active master.
numCPUS Number of CPUs on the machine.
numGoRoutines Instantaneous number of active goroutines in the process.
RequestCount Cumulative number of requests the BBS has handled through its API. Emitted for each BBS request.
RequestLatency Time in nanoseconds that the BBS took to handle requests to its API endpoints. Emitted when the BBS API handles requests.
TasksCompleted Total number of Tasks that have completed. Emitted every 30 seconds.
TasksPending Total number of Tasks that have not yet been placed on a cell. Emitted every 30 seconds.
TasksResolving Total number of Tasks locked for deletion. Emitted every 30 seconds.
TasksRunning Total number of Tasks running on cells. Emitted every 30 seconds.
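
The Domain.cf-apps and Domain.cf-tasks metrics signal a stale domain through the absence of data rather than through a 0 value, so a monitor has to track when it last saw a sample. The following sketch illustrates that idea. The function name is hypothetical, and the 90-second default (three missed 30-second emissions) is an assumed threshold, not a documented one.

```python
def domain_is_fresh(last_seen_epoch, now_epoch, max_age_seconds=90):
    """Return True if a Domain.* sample with value 1 was seen recently.

    last_seen_epoch: Unix time of the most recent sample, or None if no
                     sample has ever been received.
    max_age_seconds: assumed tolerance before the domain counts as stale.
    """
    if last_seen_epoch is None:
        return False
    return (now_epoch - last_seen_epoch) <= max_age_seconds
```

With the default threshold, a sample seen 60 seconds ago counts as fresh, while one seen 200 seconds ago, or never seen at all, counts as stale.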

Default Origin Name: cc_uploader

Metric Name Description
memoryStats.lastGCPauseTimeNS Duration in nanoseconds of the last garbage collector pause.
memoryStats.numBytesAllocated Instantaneous count of bytes allocated and still in use.
memoryStats.numBytesAllocatedHeap Instantaneous count of bytes allocated on the main heap and still in use.
memoryStats.numBytesAllocatedStack Instantaneous count of bytes used by the stack allocator.
memoryStats.numFrees Lifetime number of memory deallocations.
memoryStats.numMallocs Lifetime number of memory allocations.
numCPUS Number of CPUs on the machine.
numGoRoutines Instantaneous number of active goroutines in the process.

Default Origin Name: file_server

Metric Name Description
memoryStats.lastGCPauseTimeNS Duration in nanoseconds of the last garbage collector pause.
memoryStats.numBytesAllocated Instantaneous count of bytes allocated and still in use.
memoryStats.numBytesAllocatedHeap Instantaneous count of bytes allocated on the main heap and still in use.
memoryStats.numBytesAllocatedStack Instantaneous count of bytes used by the stack allocator.
memoryStats.numFrees Lifetime number of memory deallocations.
memoryStats.numMallocs Lifetime number of memory allocations.
numCPUS Number of CPUs on the machine.
numGoRoutines Instantaneous number of active goroutines in the process.

Default Origin Name: garden_linux

Metric Name Description
BackingStores Number of container backing store files. Emitted every 30 seconds.
DepotDirs Number of directories in the Garden depot. Emitted every 30 seconds.
LoopDevices Number of attached loop devices. Emitted every 30 seconds.
memoryStats.lastGCPauseTimeNS Duration in nanoseconds of the last garbage collector pause.
memoryStats.numBytesAllocated Instantaneous count of bytes allocated and still in use.
memoryStats.numBytesAllocatedHeap Instantaneous count of bytes allocated on the main heap and still in use.
memoryStats.numBytesAllocatedStack Instantaneous count of bytes used by the stack allocator.
memoryStats.numFrees Lifetime number of memory deallocations.
memoryStats.numMallocs Lifetime number of memory allocations.
MetricsReporting How long it took to emit the BackingStores, DepotDirs, and LoopDevices metrics. Emitted every 30 seconds.
numCPUS Number of CPUs on the machine.
numGoRoutines Instantaneous number of active goroutines in the process.

Default Origin Name: nsync_bulker

Metric Name Description
DesiredLRPSyncDuration Time in nanoseconds that the nsync-bulker took to synchronize CF apps and Diego DesiredLRPs. Emitted every 30 seconds.
LockHeld.v1-locks-nsync_bulker_lock Whether an nsync-bulker holds the nsync-bulker lock: 1 means the lock is held, and 0 means the lock was lost. Emitted every 30 seconds by the active nsync-bulker.
LockHeldDuration.v1-locks-nsync_bulker_lock Time in nanoseconds that the active nsync-bulker has held the nsync-bulker lock. Emitted every 30 seconds by the active nsync-bulker.
LRPsDesired Cumulative number of LRPs desired through the nsync API. Emitted on each request desiring a new LRP, every 30 seconds.
memoryStats.lastGCPauseTimeNS Duration in nanoseconds of the last garbage collector pause.
memoryStats.numBytesAllocated Instantaneous count of bytes allocated and still in use.
memoryStats.numBytesAllocatedHeap Instantaneous count of bytes allocated on the main heap and still in use.
memoryStats.numBytesAllocatedStack Instantaneous count of bytes used by the stack allocator.
memoryStats.numFrees Lifetime number of memory deallocations.
memoryStats.numMallocs Lifetime number of memory allocations.
NsyncInvalidDesiredLRPsFound Number of invalid DesiredLRPs found during nsync-bulker periodic synchronization. Emitted every 30 seconds.
numCPUS Number of CPUs on the machine.
numGoRoutines Instantaneous number of active goroutines in the process.

Default Origin Name: nsync_listener

Metric Name Description
memoryStats.lastGCPauseTimeNS Duration in nanoseconds of the last garbage collector pause.
memoryStats.numBytesAllocated Instantaneous count of bytes allocated and still in use.
memoryStats.numBytesAllocatedHeap Instantaneous count of bytes allocated on the main heap and still in use.
memoryStats.numBytesAllocatedStack Instantaneous count of bytes used by the stack allocator.
memoryStats.numFrees Lifetime number of memory deallocations.
memoryStats.numMallocs Lifetime number of memory allocations.
numCPUS Number of CPUs on the machine.
numGoRoutines Instantaneous number of active goroutines in the process.

Default Origin Name: rep

Metric Name Description
CapacityRemainingContainers Remaining number of containers this cell can host. Emitted every 60 seconds.
CapacityRemainingDisk Remaining amount in MiB of disk available for this cell to allocate to containers. Emitted every 60 seconds.
CapacityRemainingMemory Remaining amount in MiB of memory available for this cell to allocate to containers. Emitted every 60 seconds.
CapacityTotalContainers Total number of containers this cell can host. Emitted every 60 seconds.
CapacityTotalDisk Total amount in MiB of disk available for this cell to allocate to containers. Emitted every 60 seconds.
CapacityTotalMemory Total amount in MiB of memory available for this cell to allocate to containers. Emitted every 60 seconds.
CM Emitted every 30 seconds.
ContainerCount Number of containers hosted on the cell. Emitted every 30 seconds.
GardenContainerCreationDuration Time in nanoseconds that the rep Garden backend took to create a container. Emitted after every successful container creation.
LogMessage Emitted every 30 seconds.
logSenderTotalMessagesRead Count of application log messages sent by Diego Executor. Emitted every 30 seconds.
memoryStats.lastGCPauseTimeNS Duration in nanoseconds of the last garbage collector pause.
memoryStats.numBytesAllocated Instantaneous count of bytes allocated and still in use.
memoryStats.numBytesAllocatedHeap Instantaneous count of bytes allocated on the main heap and still in use.
memoryStats.numBytesAllocatedStack Instantaneous count of bytes used by the stack allocator.
memoryStats.numFrees Lifetime number of memory deallocations.
memoryStats.numMallocs Lifetime number of memory allocations.
numCPUS Number of CPUs on the machine.
numGoRoutines Instantaneous number of active goroutines in the process.
RepBulkSyncDuration Time in nanoseconds that the cell rep took to synchronize the ActualLRPs it has claimed with its actual garden containers. Emitted every 30 seconds by each rep.
UnhealthyCell Whether the cell has failed to pass its healthcheck against the garden backend. 0 signifies healthy, and 1 signifies unhealthy. Emitted every 30 seconds.
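
The rep reports total and remaining capacity separately, so utilization has to be derived by the consumer. Here is a minimal sketch of that arithmetic. The function names and the shape of the input dictionary are hypothetical.

```python
def used_percent(total, remaining):
    """Return the percentage of a cell resource that is allocated."""
    if total <= 0:
        raise ValueError("total capacity must be positive")
    return 100.0 * (total - remaining) / total


def cell_utilization(metrics):
    """Return used percentages for memory, disk, and containers.

    metrics is a hypothetical dict keyed by the rep metric names, for
    example {"CapacityTotalMemory": 16384, "CapacityRemainingMemory": 4096}.
    """
    return {
        resource: used_percent(metrics["CapacityTotal" + resource],
                               metrics["CapacityRemaining" + resource])
        for resource in ("Memory", "Disk", "Containers")
    }
```

For a cell with 16384 MiB of total memory and 4096 MiB remaining, used_percent returns 75.0.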

Default Origin Name: route_emitter

Metric Name Description
LockHeld.v1-locks-route_emitter_lock Whether a route-emitter holds the route-emitter lock: 1 means the lock is held, and 0 means the lock was lost. Emitted every 30 seconds by the active route-emitter.
LockHeldDuration.v1-locks-route_emitter_lock Time in nanoseconds that the active route-emitter has held the route-emitter lock. Emitted every 30 seconds by the active route-emitter.
memoryStats.lastGCPauseTimeNS Duration in nanoseconds of the last garbage collector pause.
memoryStats.numBytesAllocated Instantaneous count of bytes allocated and still in use.
memoryStats.numBytesAllocatedHeap Instantaneous count of bytes allocated on the main heap and still in use.
memoryStats.numBytesAllocatedStack Instantaneous count of bytes used by the stack allocator.
memoryStats.numFrees Lifetime number of memory deallocations.
memoryStats.numMallocs Lifetime number of memory allocations.
MessagesEmitted The cumulative number of registration messages that this process has sent. Emitted every 30 seconds.
numCPUS Number of CPUs on the machine.
numGoRoutines Instantaneous number of active goroutines in the process.
RouteEmitterSyncDuration Time in nanoseconds that the active route-emitter took to perform its synchronization pass. Emitted every 60 seconds.
RoutesRegistered Cumulative number of route registrations emitted from the route-emitter as it reacts to changes to LRPs. Emitted every 30 seconds.
RoutesSynced Cumulative number of route registrations emitted from the route-emitter during its periodic route-table synchronization. Emitted every 30 seconds.
RoutesTotal Number of routes in the route-emitter’s routing table. Emitted every 30 seconds.
RoutesUnregistered Cumulative number of route unregistrations emitted from the route-emitter as it reacts to changes to LRPs. Emitted every 30 seconds.

Default Origin Name: ssh_proxy

Metric Name Description
memoryStats.lastGCPauseTimeNS Duration in nanoseconds of the last garbage collector pause.
memoryStats.numBytesAllocated Instantaneous count of bytes allocated and still in use.
memoryStats.numBytesAllocatedHeap Instantaneous count of bytes allocated on the main heap and still in use.
memoryStats.numBytesAllocatedStack Instantaneous count of bytes used by the stack allocator.
memoryStats.numFrees Lifetime number of memory deallocations.
memoryStats.numMallocs Lifetime number of memory allocations.
numCPUS Number of CPUs on the machine.
numGoRoutines Instantaneous number of active goroutines in the process.

Default Origin Name: stager

Metric Name Description
memoryStats.lastGCPauseTimeNS Duration in nanoseconds of the last garbage collector pause.
memoryStats.numBytesAllocated Instantaneous count of bytes allocated and still in use.
memoryStats.numBytesAllocatedHeap Instantaneous count of bytes allocated on the main heap and still in use.
memoryStats.numBytesAllocatedStack Instantaneous count of bytes used by the stack allocator.
memoryStats.numFrees Lifetime number of memory deallocations.
memoryStats.numMallocs Lifetime number of memory allocations.
numCPUS Number of CPUs on the machine.
numGoRoutines Instantaneous number of active goroutines in the process.
StagingRequestFailedDuration Time in nanoseconds that the failed staging task took to run. Emitted each time a staging task fails.
StagingRequestsFailed Cumulative number of failed staging tasks handled by each stager. Emitted every time a staging task fails.
StagingRequestsSucceeded Cumulative number of successful staging tasks handled by each stager. Emitted every time a staging task completes successfully.
StagingRequestSucceededDuration Time in nanoseconds that the successful staging task took to run. Emitted each time a staging task completes successfully.
StagingStartRequestsReceived Cumulative number of requests to start a staging task. Emitted by a stager each time it handles a request.

Default Origin Name: tps_listener

Metric Name Description
memoryStats.lastGCPauseTimeNS Duration in nanoseconds of the last garbage collector pause.
memoryStats.numBytesAllocated Instantaneous count of bytes allocated and still in use.
memoryStats.numBytesAllocatedHeap Instantaneous count of bytes allocated on the main heap and still in use.
memoryStats.numBytesAllocatedStack Instantaneous count of bytes used by the stack allocator.
memoryStats.numFrees Lifetime number of memory deallocations.
memoryStats.numMallocs Lifetime number of memory allocations.
numCPUS Number of CPUs on the machine.
numGoRoutines Instantaneous number of active goroutines in the process.

Default Origin Name: tps_watcher

Metric Name Description
LockHeld.v1-locks-tps_watcher_lock Whether a tps-watcher holds the tps-watcher lock: 1 means the lock is held, and 0 means the lock was lost. Emitted every 30 seconds by the active tps-watcher.
LockHeldDuration.v1-locks-tps_watcher_lock Time in nanoseconds that the active tps-watcher has held the tps-watcher lock. Emitted every 30 seconds by the active tps-watcher.
memoryStats.lastGCPauseTimeNS Duration in nanoseconds of the last garbage collector pause.
memoryStats.numBytesAllocated Instantaneous count of bytes allocated and still in use.
memoryStats.numBytesAllocatedHeap Instantaneous count of bytes allocated on the main heap and still in use.
memoryStats.numBytesAllocatedStack Instantaneous count of bytes used by the stack allocator.
memoryStats.numFrees Lifetime number of memory deallocations.
memoryStats.numMallocs Lifetime number of memory allocations.
numCPUS Number of CPUs on the machine. Emitted every 30 seconds.
numGoRoutines Instantaneous number of active goroutines in the process.

DopplerServer

Default Origin Name: DopplerServer

Metric Name Description
dropsondeListener.currentBufferCount DEPRECATED
dropsondeListener.receivedByteCount DEPRECATED in favor of DopplerServer.udpListener.receivedByteCount.
dropsondeListener.receivedMessageCount DEPRECATED in favor of DopplerServer.udpListener.receivedMessageCount.
dropsondeUnmarshaller.containerMetricReceived Lifetime number of ContainerMetric messages unmarshalled.
dropsondeUnmarshaller.counterEventReceived Lifetime number of CounterEvent messages unmarshalled.
dropsondeUnmarshaller.errorReceived Lifetime number of Error messages unmarshalled.
dropsondeUnmarshaller.heartbeatReceived DEPRECATED
dropsondeUnmarshaller.httpStartStopReceived Lifetime number of HttpStartStop messages unmarshalled.
dropsondeUnmarshaller.logMessageTotal Lifetime number of LogMessage messages unmarshalled.
dropsondeUnmarshaller.unmarshalErrors Lifetime number of errors when unmarshalling messages.
dropsondeUnmarshaller.valueMetricReceived Lifetime number of ValueMetric messages unmarshalled.
httpServer.receivedMessages Number of messages received by Doppler’s internal MessageRouter. Emitted every 5 seconds.
LinuxFileDescriptor Number of file handles for the Doppler process.
memoryStats.lastGCPauseTimeNS Duration of the last Garbage Collector pause in nanoseconds.
memoryStats.numBytesAllocated Instantaneous count of bytes allocated and still in use.
memoryStats.numBytesAllocatedHeap Instantaneous count of bytes allocated on the main heap and still in use.
memoryStats.numBytesAllocatedStack Instantaneous count of bytes used by the stack allocator.
memoryStats.numFrees Lifetime number of memory deallocations.
memoryStats.numMallocs Lifetime number of memory allocations.
messageRouter.numberOfContainerMetricSinks Instantaneous number of container metric sinks known to the SinkManager. Emitted every 5 seconds.
messageRouter.numberOfDumpSinks Instantaneous number of dump sinks known to the SinkManager. Emitted every 5 seconds.
messageRouter.numberOfFirehoseSinks Instantaneous number of firehose sinks known to the SinkManager. Emitted every 5 seconds.
messageRouter.numberOfSyslogSinks Instantaneous number of syslog sinks known to the SinkManager.
messageRouter.numberOfWebsocketSinks Instantaneous number of WebSocket sinks known to the SinkManager. Emitted every 5 seconds.
messageRouter.totalDroppedMessages Lifetime number of messages dropped inside Doppler for various reasons (downstream consumer can’t keep up, internal object wasn’t ready for message, etc.).
sentMessagesFirehose.<SUBSCRIPTION_ID> Number of sent messages through the firehose per subscription id. Emitted every 5 seconds.
udpListener.receivedByteCount Lifetime number of bytes received by Doppler’s UDP Listener.
udpListener.receivedMessageCount Lifetime number of messages received by Doppler’s UDP Listener.
udpListener.receivedErrorCount Lifetime number of errors encountered by Doppler’s UDP Listener while reading from the connection.
tcpListener.receivedByteCount Lifetime number of bytes received by Doppler’s TCP Listener. Emitted every 5 seconds.
tcpListener.receivedMessageCount Lifetime number of messages received by Doppler’s TCP Listener. Emitted every 5 seconds.
tcpListener.receivedErrorCount Lifetime number of errors encountered by Doppler’s TCP Listener while handshaking, decoding or reading from the connection.
tlsListener.receivedByteCount Lifetime number of bytes received by Doppler’s TLS Listener. Emitted every 5 seconds.
tlsListener.receivedMessageCount Lifetime number of messages received by Doppler’s TLS Listener. Emitted every 5 seconds.
tlsListener.receivedErrorCount Lifetime number of errors encountered by Doppler’s TLS Listener while handshaking, decoding or reading from the connection.
TruncatingBuffer.DroppedMessages Number of messages intentionally dropped by Doppler for a specific sink. This counter event corresponds with the log message “Log message output is too high.” Emitted every 5 seconds.
TruncatingBuffer.totalDroppedMessages Lifetime total number of messages intentionally dropped by Doppler from all of its sinks due to back pressure. Emitted every 5 seconds.
listeners.totalReceivedMessageCount Total number of messages received across all of Doppler’s listeners (UDP, TCP, TLS).
numCpus Number of CPUs on the machine.
numGoRoutines Instantaneous number of active goroutines in the Doppler process.
signatureVerifier.invalidSignatureErrors Lifetime number of messages received with an invalid signature.
signatureVerifier.missingSignatureErrors Lifetime number of messages received that are too small to contain a signature.
signatureVerifier.validSignatures Lifetime number of messages received with valid signatures.
Uptime Uptime for the Doppler process.
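
Several of the Doppler metrics above are described as lifetime counts, so a raw value is mostly useful when compared against an earlier sample. The following is a minimal Python sketch, not part of Loggregator, of turning two successive samples into a per-second rate. The function names and sample values are hypothetical, and treating a decrease as a process restart that reset the counter to zero is an assumption.

```python
def counter_delta(previous, current):
    """Return the increase between two samples of a lifetime counter.

    Assumption: if the counter went backwards, the process restarted and
    the counter started again from zero.
    """
    if current < previous:
        return current
    return current - previous


def per_second_rate(previous, current, interval_seconds):
    """Average per-second increase of a lifetime counter over one interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return counter_delta(previous, current) / interval_seconds


# Hypothetical samples of TruncatingBuffer.totalDroppedMessages,
# taken 5 seconds apart.
dropped_rate = per_second_rate(previous=1200, current=1450, interval_seconds=5)
print(dropped_rate)  # 50.0
```

The same helper applies to any other lifetime counter in this topic, provided both samples come from the same process instance.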


Etcd

For more information, see the etcd stats API.

Default Origin Name: etcd

Metric Name Description
CompareAndDeleteFail CompareAndDeleteFail operation count. Emitted every 30 seconds.
CompareAndDeleteSuccess CompareAndDeleteSuccess operation count. Emitted every 30 seconds.
CompareAndSwapFail CompareAndSwapFail operation count. Emitted every 30 seconds.
CompareAndSwapSuccess CompareAndSwapSuccess operation count. Emitted every 30 seconds.
CreateFail CreateFail operation count. Emitted every 30 seconds.
CreateSuccess CreateSuccess operation count. Emitted every 30 seconds.
DeleteFail DeleteFail operation count. Emitted every 30 seconds.
DeleteSuccess DeleteSuccess operation count. Emitted every 30 seconds.
EtcdIndex X-Etcd-Index value from the /stats/store endpoint. Emitted every 30 seconds.
ExpireCount ExpireCount operation count. Emitted every 30 seconds.
Followers Number of etcd followers. Emitted every 30 seconds.
GetsFail GetsFail operation count. Emitted every 30 seconds.
GetsSuccess GetsSuccess operation count. Emitted every 30 seconds.
IsLeader 1 if the current server is the leader, 0 if it is a follower. Emitted every 30 seconds.
Latency Current latency in milliseconds from leader to a specific follower. Emitted every 30 seconds.
RaftIndex X-Raft-Index value from the /stats/store endpoint. Emitted every 30 seconds.
RaftTerm X-Raft-Term value from the /stats/store endpoint. Emitted every 30 seconds.
ReceivedAppendRequests Number of append requests this node has processed. Emitted every 30 seconds.
ReceivingBandwidthRate Number of bytes per second this node is receiving (follower only). Emitted every 30 seconds.
ReceivingRequestRate Number of requests per second this node is receiving (follower only). Emitted every 30 seconds.
SendingBandwidthRate Number of bytes per second this node is sending (leader only). This value is undefined on single member clusters. Emitted every 30 seconds.
SendingRequestRate Number of requests per second this node is sending (leader only). This value is undefined on single member clusters. Emitted every 30 seconds.
SentAppendRequests Number of requests that this node has sent. Emitted every 30 seconds.
SetsFail SetsFail operation count. Emitted every 30 seconds.
SetsSuccess SetsSuccess operation count. Emitted every 30 seconds.
UpdateFail UpdateFail operation count. Emitted every 30 seconds.
UpdateSuccess UpdateSuccess operation count. Emitted every 30 seconds.
Watchers Number of watchers. Emitted every 30 seconds.
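
The IsLeader metric is reported per server, so checking cluster health means combining the values from every node. The following is a minimal Python sketch, assuming the latest IsLeader value for each node has already been collected into a dictionary. The node names, function names, and status strings are hypothetical.

```python
def find_leaders(is_leader_by_node):
    """Return the sorted names of nodes whose IsLeader value is 1."""
    return sorted(node for node, value in is_leader_by_node.items() if value == 1)


def leadership_status(is_leader_by_node):
    """Classify IsLeader samples as having no, one, or several leaders."""
    leaders = find_leaders(is_leader_by_node)
    if not leaders:
        return "no leader"
    if len(leaders) == 1:
        return "ok"
    return "multiple leaders"


# Hypothetical samples from a three-node cluster.
samples = {"etcd/0": 0, "etcd/1": 1, "etcd/2": 0}
print(find_leaders(samples))       # ['etcd/1']
print(leadership_status(samples))  # ok
```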


Metron Agent

Default Origin Name: MetronAgent

Metric Name Description
MessageAggregator.counterEventReceived Lifetime number of CounterEvents aggregated in Metron.
MessageBuffer.droppedMessageCount Lifetime number of intentionally dropped messages from Metron’s batch writer buffer. Batch writing is performed over TCP/TLS only.
DopplerForwarder.sentMessages Lifetime number of messages sent to Doppler regardless of protocol. Emitted every 30 seconds.
dropsondeAgentListener.currentBufferCount Instantaneous number of Dropsonde messages read by UDP socket but not yet unmarshalled.
dropsondeAgentListener.receivedByteCount Lifetime number of bytes of Dropsonde messages read by UDP socket.
dropsondeAgentListener.receivedMessageCount Lifetime number of Dropsonde messages read by UDP socket.
dropsondeMarshaller.containerMetricMarshalled Lifetime number of ContainerMetric messages marshalled.
dropsondeMarshaller.counterEventMarshalled Lifetime number of CounterEvent messages marshalled.
dropsondeMarshaller.errorMarshalled Lifetime number of Error messages marshalled.
dropsondeMarshaller.heartbeatMarshalled Lifetime number of Heartbeat messages marshalled.
dropsondeMarshaller.httpStartStopMarshalled Lifetime number of HttpStartStop messages marshalled.
dropsondeMarshaller.logMessageMarshalled Lifetime number of LogMessage messages marshalled.
dropsondeMarshaller.marshalErrors Lifetime number of errors when marshalling messages.
dropsondeMarshaller.valueMetricMarshalled Lifetime number of ValueMetric messages marshalled.
dropsondeUnmarshaller.containerMetricReceived Lifetime number of ContainerMetric messages unmarshalled.
dropsondeUnmarshaller.counterEventReceived Lifetime number of CounterEvent messages unmarshalled.
dropsondeUnmarshaller.errorReceived Lifetime number of Error messages unmarshalled.
dropsondeUnmarshaller.heartbeatReceived DEPRECATED
dropsondeUnmarshaller.httpStartStopReceived Lifetime number of HttpStartStop messages unmarshalled.
dropsondeUnmarshaller.logMessageTotal Lifetime number of LogMessage messages unmarshalled.
dropsondeUnmarshaller.unmarshalErrors Lifetime number of errors when unmarshalling messages.
dropsondeUnmarshaller.valueMetricReceived Lifetime number of ValueMetric messages unmarshalled.
legacyAgentListener.currentBufferCount Instantaneous number of Legacy messages read by UDP socket but not yet unmarshalled.
legacyAgentListener.receivedByteCount Lifetime number of bytes of Legacy messages read by UDP socket.
legacyAgentListener.receivedMessageCount Lifetime number of Legacy messages read by UDP socket.
memoryStats.lastGCPauseTimeNS Duration of the last Garbage Collector pause in nanoseconds.
memoryStats.numBytesAllocated Instantaneous count of bytes allocated and still in use.
memoryStats.numBytesAllocatedHeap Instantaneous count of bytes allocated on the main heap and still in use.
memoryStats.numBytesAllocatedStack Instantaneous count of bytes used by the stack allocator.
memoryStats.numFrees Lifetime number of memory deallocations.
memoryStats.numMallocs Lifetime number of memory allocations.
numCpus Number of CPUs on the machine.
numGoRoutines Instantaneous number of active goroutines in the Metron process.
tcp.sendErrorCount Lifetime number of errors encountered while writing to Doppler over TCP.
tcp.sentByteCount Lifetime number of bytes sent to Doppler over TCP.
tcp.sentMessageCount Lifetime number of messages sent to Doppler over TCP.
tls.sendErrorCount Lifetime number of errors encountered while writing to Doppler over TLS.
tls.sentByteCount Lifetime number of bytes sent to Doppler over TLS. Emitted every 30 seconds.
tls.sentMessageCount Lifetime number of messages sent to Doppler over TLS. Emitted every 30 seconds.
udp.sendErrorCount Lifetime number of errors encountered while writing to Doppler over UDP.
udp.sentByteCount Lifetime number of bytes sent to Doppler over UDP.
udp.sentMessageCount Lifetime number of messages sent to Doppler over UDP.


Routing

Routing Release metrics have the following origin names:

Default Origin Name: gorouter

Metric Name Description
memoryStats.lastGCPauseTimeNS Duration of the last Garbage Collector pause in nanoseconds. Emitted every 10 seconds.
memoryStats.numBytesAllocated Instantaneous count of bytes allocated and still in use. Emitted every 10 seconds.
memoryStats.numBytesAllocatedHeap Instantaneous count of bytes allocated on the main heap and still in use. Emitted every 10 seconds.
memoryStats.numBytesAllocatedStack Instantaneous count of bytes used by the stack allocator. Emitted every 10 seconds.
memoryStats.numFrees Lifetime number of memory deallocations. Emitted every 10 seconds.
memoryStats.numMallocs Lifetime number of memory allocations. Emitted every 10 seconds.
numCPUS Number of CPUs on the machine. Emitted every 10 seconds.
numGoRoutines Instantaneous number of active goroutines in the Gorouter process. Emitted every 10 seconds.
logSenderTotalMessagesRead Lifetime number of application log messages. Emitted every 5 seconds.
backend_exhausted_conns Lifetime number of requests that have been rejected due to the limit on number of connections per backend having been reached for all backends tried. Emitted every 5 seconds.
bad_gateways Lifetime number of bad gateways. Emitted every 5 seconds.
latency Time in milliseconds that the Gorouter took to handle requests to its application endpoints. Emitted per router request.
latency.{component} Time in milliseconds that the Gorouter took to handle requests from each component to its endpoints. Emitted per router request.
registry_message.{component} Lifetime number of route register messages received for each component. Emitted per route-register message.
unregistry_message.{component} Lifetime number of route unregister messages received for each component. Emitted per route-unregister message.
rejected_requests Lifetime number of bad requests received on Gorouter. Emitted every 5 seconds.
requests.{component} Lifetime number of requests received for each component. Emitted per router request.
responses Lifetime number of HTTP responses. Emitted every 5 seconds.
responses.2xx Lifetime number of 2xx HTTP responses. Emitted every 5 seconds.
responses.3xx Lifetime number of 3xx HTTP responses. Emitted every 5 seconds.
responses.4xx Lifetime number of 4xx HTTP responses. Emitted every 5 seconds.
responses.5xx Lifetime number of 5xx HTTP responses. Emitted every 5 seconds.
responses.xxx Lifetime number of other HTTP responses (those outside the 2xx through 5xx ranges). Emitted every 5 seconds.
route_lookup_time Time in nanoseconds to look up a request URL in the route table. Emitted per router request.
websocket_upgrades Lifetime number of WebSocket upgrades. Emitted every 5 seconds.
websocket_failures Lifetime number of WebSocket failures. Emitted every 5 seconds.
routed_app_requests The collector sums up requests for all dea-{index} components for its output metrics. Emitted every 5 seconds.
total_requests Lifetime number of requests received. Emitted every 5 seconds.
ms_since_last_registry_update Time in milliseconds since the last route register message was received. Emitted every 30 seconds.
total_routes Current number of routes registered. Emitted every 30 seconds.
uptime Uptime for the router. Emitted every second.
file_descriptors Number of file descriptors currently used by Gorouter. Emitted every 5 seconds.
routes_pruned Lifetime number of stale routes that have been automatically pruned by Gorouter. Emitted every 5 seconds.
backend_tls_handshake_failed Lifetime number of failed TLS handshakes when connecting to a backend registered with TLS port. Corresponds to HTTP 525 error response from Gorouter. Emitted every 5 seconds.
backend_invalid_id Lifetime number of requests that were rejected because the backend presents a certificate with an invalid ID. Corresponds to HTTP 503 error response from Gorouter. Emitted every 5 seconds.
backend_invalid_tls_cert Lifetime number of requests that were rejected because the backend presents a certificate that is not trusted by Gorouter. Corresponds to HTTP 526 error response from Gorouter. Emitted every 5 seconds.
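
The Gorouter reports latency in milliseconds but route_lookup_time in nanoseconds, so the two values need a common unit before they can be compared. The following is a minimal Python sketch of that conversion. The function names and sample values are hypothetical, and expressing lookup time as a fraction of latency is only an illustration, not something the Gorouter emits.

```python
NANOSECONDS_PER_MILLISECOND = 1_000_000


def nanoseconds_to_milliseconds(value_ns):
    """Convert a duration in nanoseconds to milliseconds."""
    return value_ns / NANOSECONDS_PER_MILLISECOND


def lookup_fraction(route_lookup_time_ns, latency_ms):
    """Return route lookup time as a fraction of a latency value."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return nanoseconds_to_milliseconds(route_lookup_time_ns) / latency_ms


# Hypothetical samples: a 250,000 ns route lookup and a 50 ms latency.
print(nanoseconds_to_milliseconds(250_000))  # 0.25
print(lookup_fraction(250_000, 50.0))        # 0.005
```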

Default Origin Name: routing_api

Metric Name Description
memoryStats.lastGCPauseTimeNS Duration of the last Garbage Collector pause in nanoseconds. Emitted every 10 seconds.
memoryStats.numBytesAllocated Instantaneous count of bytes allocated and still in use. Emitted every 10 seconds.
memoryStats.numBytesAllocatedHeap Instantaneous count of bytes allocated on the main heap and still in use. Emitted every 10 seconds.
memoryStats.numBytesAllocatedStack Instantaneous count of bytes used by the stack allocator. Emitted every 10 seconds.
memoryStats.numFrees Lifetime number of memory deallocations. Emitted every 10 seconds.
memoryStats.numMallocs Lifetime number of memory allocations. Emitted every 10 seconds.
numCPUS Number of CPUs on the machine. Emitted every 10 seconds.
numGoRoutines Instantaneous number of active goroutines in the Routing API process. Emitted every 10 seconds.
key_refresh_events Total number of events in which a fresh token was fetched from UAA. Emitted every 30 seconds.
total_http_routes Number of HTTP routes in the routing table. Emitted every 30 seconds, or when a new HTTP route is added. The interval for emitting this metric can be configured with the manifest property metrics_reporting_interval.
total_http_subscriptions Number of HTTP route subscriptions. Emitted every 30 seconds. The interval for emitting this metric can be configured with the manifest property metrics_reporting_interval.
total_tcp_routes Number of TCP routes in the routing table. Emitted every 30 seconds, or when a new TCP route is added. The interval for emitting this metric can be configured with the manifest property metrics_reporting_interval.
total_tcp_subscriptions Number of TCP route subscriptions. Emitted every 30 seconds. The interval for emitting this metric can be configured with the manifest property metrics_reporting_interval.
total_token_errors Total number of UAA token errors. Emitted every 30 seconds. The interval for emitting this metric can be configured with the manifest property metrics_reporting_interval.

Default Origin Name: tcp_emitter

Metric Name Description
memoryStats.lastGCPauseTimeNS Duration of the last Garbage Collector pause in nanoseconds. Emitted every 10 seconds.
memoryStats.numBytesAllocated Instantaneous count of bytes allocated and still in use. Emitted every 10 seconds.
memoryStats.numBytesAllocatedHeap Instantaneous count of bytes allocated on the main heap and still in use. Emitted every 10 seconds.
memoryStats.numBytesAllocatedStack Instantaneous count of bytes used by the stack allocator. Emitted every 10 seconds.
memoryStats.numFrees Lifetime number of memory deallocations. Emitted every 10 seconds.
memoryStats.numMallocs Lifetime number of memory allocations. Emitted every 10 seconds.
numCPUS Number of CPUs on the machine. Emitted every 10 seconds.
numGoRoutines Instantaneous number of active goroutines in the TCP Emitter process. Emitted every 10 seconds.

Default Origin Name: tcp-router

Metric Name Description
memoryStats.lastGCPauseTimeNS Duration of the last Garbage Collector pause in nanoseconds. Emitted every 10 seconds.
memoryStats.numBytesAllocated Instantaneous count of bytes allocated and still in use. Emitted every 10 seconds.
memoryStats.numBytesAllocatedHeap Instantaneous count of bytes allocated on the main heap and still in use. Emitted every 10 seconds.
memoryStats.numBytesAllocatedStack Instantaneous count of bytes used by the stack allocator. Emitted every 10 seconds.
memoryStats.numFrees Lifetime number of memory deallocations. Emitted every 10 seconds.
memoryStats.numMallocs Lifetime number of memory allocations. Emitted every 10 seconds.
numCPUS Number of CPUs on the machine. Emitted every 10 seconds.
numGoRoutines Instantaneous number of active goroutines in the TCP Router process. Emitted every 10 seconds.
{session_id}.ConnectionTime Average connection time to backend in current session. Emitted every 60 seconds per session ID. The interval for this metric can be configured with the manifest property tcp_router.tcp_stats_collection_interval.
{session_id}.CurrentSessions Total number of current sessions. Emitted every 60 seconds per session ID. The interval for this metric can be configured with the manifest property tcp_router.tcp_stats_collection_interval.
AverageConnectTimeMs Average backend response time (in ms). Emitted every 60 seconds. The interval for this metric can be configured with the manifest property tcp_router.tcp_stats_collection_interval.
AverageQueueTimeMs Average time spent in queue (in ms). Emitted every 60 seconds. The interval for this metric can be configured with the manifest property tcp_router.tcp_stats_collection_interval.
TotalBackendConnectionErrors Total number of backend connection errors. Emitted every 60 seconds. The interval for this metric can be configured with the manifest property tcp_router.tcp_stats_collection_interval.
TotalCurrentQueuedRequests Total number of requests unassigned in queue. Emitted every 60 seconds. The interval for this metric can be configured with the manifest property tcp_router.tcp_stats_collection_interval.


Syslog Drain Binder

Default Origin Name: syslog_drain_binder

Metric Name Description
memoryStats.lastGCPauseTimeNS Duration of the last Garbage Collector pause in nanoseconds.
memoryStats.numBytesAllocated Instantaneous count of bytes allocated and still in use.
memoryStats.numBytesAllocatedHeap Instantaneous count of bytes allocated on the main heap and still in use.
memoryStats.numBytesAllocatedStack Instantaneous count of bytes used by the stack allocator.
memoryStats.numFrees Lifetime number of memory deallocations.
memoryStats.numMallocs Lifetime number of memory allocations.
numCPUS Number of CPUs on the machine.
numGoRoutines Instantaneous number of active goroutines in the Syslog Drain Binder process.
pollCount Number of times the Syslog Drain Binder has polled the Cloud Controller for syslog drain bindings. Emitted every 30 seconds.
totalDrains Number of syslog drains returned by the Cloud Controller. Emitted every 30 seconds.


Traffic Controller

Default Origin Name: LoggregatorTrafficController

Metric Name Description
dopplerProxy.containermetricsLatency Duration for serving container metrics via the containermetrics endpoint (milliseconds). Emitted every 30 seconds.
dopplerProxy.recentlogsLatency Duration for serving recent logs via the recentLogs endpoint (milliseconds). Emitted every 30 seconds.
memoryStats.lastGCPauseTimeNS Duration of the last Garbage Collector pause in nanoseconds.
memoryStats.numBytesAllocated Instantaneous count of bytes allocated and still in use.
memoryStats.numBytesAllocatedHeap Instantaneous count of bytes allocated on the main heap and still in use.
memoryStats.numBytesAllocatedStack Instantaneous count of bytes used by the stack allocator.
memoryStats.numFrees Lifetime number of memory deallocations.
memoryStats.numMallocs Lifetime number of memory allocations.
numCPUS Number of CPUs on the machine.
numGoRoutines Instantaneous number of active goroutines in the Traffic Controller process.
Uptime Uptime for the Traffic Controller process. Emitted every 30 seconds.
LinuxFileDescriptor Number of file handles for the Traffic Controller process.


User Account and Authentication (UAA)

Default Origin Name: uaa

Metric Name Description
audit_service.client_authentication_count Number of successful client authentication attempts since the last startup. Emitted every 30 seconds.
audit_service.client_authentication_failure_count Number of failed client authentication attempts since the last startup. Emitted every 30 seconds.
audit_service.principal_authentication_failure_count Number of failed non-user authentication attempts since the last startup. Emitted every 30 seconds.
audit_service.principal_not_found_count Number of times a non-user was not found since the last startup. Emitted every 30 seconds.
audit_service.user_authentication_count Number of successful authentications by the user since the last startup. Emitted every 30 seconds.
audit_service.user_authentication_failure_count Number of failed user authentication attempts since the last startup. Emitted every 30 seconds.
audit_service.user_not_found_count Number of times the user was not found since the last startup. Emitted every 30 seconds.
audit_service.user_password_changes Number of successful password changes by the user since the last startup. Emitted every 30 seconds.
audit_service.user_password_failures Number of failed password changes by the user since the last startup. Emitted every 30 seconds.
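
The audit_service metrics count successes and failures separately, each since the last UAA startup. The following is a minimal Python sketch of combining one such pair into a failure ratio. The function name and sample values are hypothetical.

```python
def failure_ratio(success_count, failure_count):
    """Return failures as a fraction of all attempts, or 0.0 with no attempts."""
    attempts = success_count + failure_count
    if attempts == 0:
        return 0.0
    return failure_count / attempts


# Hypothetical values of audit_service.user_authentication_count and
# audit_service.user_authentication_failure_count.
print(failure_ratio(success_count=190, failure_count=10))  # 0.05
```

Because both counters restart from zero when UAA restarts, the ratio describes only the period since the last startup.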

