synapse/docs/metrics-howto.rst
2018-04-04 19:46:28 +01:00

How to monitor Synapse metrics using Prometheus

  1. Install Prometheus:

    Follow the instructions at http://prometheus.io/docs/introduction/install/

  2. Enable Synapse metrics:

    Setting a (local) port number is enough to enable the metrics listener. Prometheus itself defaults to port 9090, so picking a port just above that for locally monitored services seems reasonable, e.g. 9092.

    Add to homeserver.yaml:

    metrics_port: 9092

    Also ensure that enable_metrics is set to True.

    Restart Synapse.

  3. Add a Prometheus target for Synapse.

    The target must set metrics_path to a non-default value (under scrape_configs):

    - job_name: "synapse"
      metrics_path: "/_synapse/metrics"
      static_configs:
        - targets: ["my.server.here:9092"]

    If your Prometheus is older than 1.5.2, replace static_configs in the above with target_groups.

    Restart prometheus.
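
Once both services have been restarted, it can be useful to confirm that Synapse is actually serving metrics before debugging the Prometheus side. The following is a minimal sketch, not part of Synapse: it assumes the listener is reachable over plain HTTP on localhost at the port chosen in step 2, and that the output uses the Prometheus text exposition format. The names METRICS_URL, fetch_metrics and metric_names are hypothetical helpers introduced for illustration.

```python
import urllib.request

# Assumed values based on the steps above; adjust host and port as needed.
METRICS_URL = "http://localhost:9092/_synapse/metrics"


def fetch_metrics(url: str = METRICS_URL, timeout: float = 5.0) -> str:
    """Return the raw text served by the metrics listener."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def metric_names(exposition_text: str) -> set:
    """Collect metric names from Prometheus text-format output.

    Blank lines and comment lines (starting with '#') are skipped. The
    name is everything before the first '{' (labels) or space (value).
    """
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return names
```

Calling metric_names(fetch_metrics()) on the Synapse host should return a non-empty set if the listener is up. A connection error usually points at the port setting or at enable_metrics rather than at Prometheus.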

Deprecated metrics removed in 0.28.0

Synapse 0.28.0 removes all of the metrics deprecated by 0.27.0, which are those listed under "Old name" below. This has been done to reduce the bandwidth used by gathering metrics and the storage requirements for the Prometheus server, as well as reducing CPU overhead for both Synapse and Prometheus.

Administrators should update any alerts or monitoring dashboards to use the "New name" listed below.

Block and response metrics renamed for 0.27.0

Synapse 0.27.0 begins the process of rationalising the duplicate *:count metrics reported for the resource tracking for code blocks and HTTP requests.

At the same time, the corresponding *:total metrics are being renamed, as the :total suffix no longer makes sense in the absence of a corresponding :count metric.

To enable a graceful migration path, this release just adds new names for the metrics being renamed. A future release will remove the old ones.

The following table shows the new metrics, and the old metrics which they are replacing.

New name Old name
synapse_util_metrics_block_count synapse_util_metrics_block_timer:count
synapse_util_metrics_block_count synapse_util_metrics_block_ru_utime:count
synapse_util_metrics_block_count synapse_util_metrics_block_ru_stime:count
synapse_util_metrics_block_count synapse_util_metrics_block_db_txn_count:count
synapse_util_metrics_block_count synapse_util_metrics_block_db_txn_duration:count
synapse_util_metrics_block_time_seconds synapse_util_metrics_block_timer:total
synapse_util_metrics_block_ru_utime_seconds synapse_util_metrics_block_ru_utime:total
synapse_util_metrics_block_ru_stime_seconds synapse_util_metrics_block_ru_stime:total
synapse_util_metrics_block_db_txn_count synapse_util_metrics_block_db_txn_count:total
synapse_util_metrics_block_db_txn_duration_seconds synapse_util_metrics_block_db_txn_duration:total
synapse_http_server_response_count synapse_http_server_requests
synapse_http_server_response_count synapse_http_server_response_time:count
synapse_http_server_response_count synapse_http_server_response_ru_utime:count
synapse_http_server_response_count synapse_http_server_response_ru_stime:count
synapse_http_server_response_count synapse_http_server_response_db_txn_count:count
synapse_http_server_response_count synapse_http_server_response_db_txn_duration:count
synapse_http_server_response_time_seconds synapse_http_server_response_time:total
synapse_http_server_response_ru_utime_seconds synapse_http_server_response_ru_utime:total
synapse_http_server_response_ru_stime_seconds synapse_http_server_response_ru_stime:total
synapse_http_server_response_db_txn_count synapse_http_server_response_db_txn_count:total
synapse_http_server_response_db_txn_duration_seconds synapse_http_server_response_db_txn_duration:total
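
When updating alerts or dashboards, the renames in the table above can be applied mechanically. The following is a rough sketch of one way to do that, assuming the expressions are available as plain strings. RENAMES and rewrite_expression are hypothetical names, not part of Synapse or Prometheus. The mapping is copied from the table, and the regular expression relies on Prometheus metric names consisting of letters, digits, '_' and ':'.

```python
import re

B = "synapse_util_metrics_block"
H = "synapse_http_server_response"

# Old name -> new name, copied from the table above.
RENAMES = {
    f"{B}_timer:count": f"{B}_count",
    f"{B}_ru_utime:count": f"{B}_count",
    f"{B}_ru_stime:count": f"{B}_count",
    f"{B}_db_txn_count:count": f"{B}_count",
    f"{B}_db_txn_duration:count": f"{B}_count",
    f"{B}_timer:total": f"{B}_time_seconds",
    f"{B}_ru_utime:total": f"{B}_ru_utime_seconds",
    f"{B}_ru_stime:total": f"{B}_ru_stime_seconds",
    f"{B}_db_txn_count:total": f"{B}_db_txn_count",
    f"{B}_db_txn_duration:total": f"{B}_db_txn_duration_seconds",
    "synapse_http_server_requests": f"{H}_count",
    f"{H}_time:count": f"{H}_count",
    f"{H}_ru_utime:count": f"{H}_count",
    f"{H}_ru_stime:count": f"{H}_count",
    f"{H}_db_txn_count:count": f"{H}_count",
    f"{H}_db_txn_duration:count": f"{H}_count",
    f"{H}_time:total": f"{H}_time_seconds",
    f"{H}_ru_utime:total": f"{H}_ru_utime_seconds",
    f"{H}_ru_stime:total": f"{H}_ru_stime_seconds",
    f"{H}_db_txn_count:total": f"{H}_db_txn_count",
    f"{H}_db_txn_duration:total": f"{H}_db_txn_duration_seconds",
}

# Match whole identifiers so that one name is never replaced inside another.
_METRIC_TOKEN = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def rewrite_expression(expr: str) -> str:
    """Replace every old metric name in expr with its new name."""
    return _METRIC_TOKEN.sub(
        lambda match: RENAMES.get(match.group(0), match.group(0)), expr
    )
```

For example, rewrite_expression("rate(synapse_http_server_response_time:total[5m])") gives "rate(synapse_http_server_response_time_seconds[5m])". Because several old :count metrics map to the same new name, an expression that previously combined them should be reviewed by hand after rewriting. The sketch also does not distinguish metric names from identical text inside label values.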

Standard Metric Names

As of Synapse version 0.18.2, the format of the process-wide metrics has been changed to fit Prometheus standard naming conventions. Additionally, the units have been changed from milliseconds to seconds.

New name Old name
process_cpu_user_seconds_total process_resource_utime / 1000
process_cpu_system_seconds_total process_resource_stime / 1000
process_open_fds (no 'type' label) process_fds
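
The "/ 1000" entries in the table express the change of unit: a value recorded in milliseconds under the old name corresponds to that value divided by 1000 under the new name. As a small worked sketch (PROCESS_METRICS and to_new_metric are hypothetical names used only for illustration), the mapping could be written as:

```python
# Old name -> (new name, divisor applied to the old value), from the table.
PROCESS_METRICS = {
    "process_resource_utime": ("process_cpu_user_seconds_total", 1000),
    "process_resource_stime": ("process_cpu_system_seconds_total", 1000),
    # No unit change here; the new metric simply has no 'type' label.
    "process_fds": ("process_open_fds", 1),
}


def to_new_metric(old_name: str, old_value: float):
    """Translate an old (name, value) sample to the new name and unit."""
    new_name, divisor = PROCESS_METRICS[old_name]
    return new_name, old_value / divisor
```

For instance, an old process_resource_utime reading of 2500 (milliseconds) corresponds to a process_cpu_user_seconds_total value of 2.5 (seconds). Alert thresholds written against the old names need the same scaling.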

The Python-specific garbage collector metrics have been renamed.

New name Old name
python_gc_time reactor_gc_time
python_gc_unreachable_total reactor_gc_unreachable
python_gc_counts reactor_gc_counts

The Twisted-specific reactor metrics have been renamed.

New name Old name
python_twisted_reactor_pending_calls reactor_pending_calls
python_twisted_reactor_tick_time reactor_tick_time