How to monitor Synapse metrics using Prometheus

  1. Install prometheus:

    Follow instructions at http://prometheus.io/docs/introduction/install/

  2. Enable synapse metrics:

    Setting a (local) port number enables the metrics listener. Prometheus itself defaults to port 9090, so a port just above that is a reasonable choice for locally monitored services, e.g. 9092.

    Add to homeserver.yaml:

    metrics_port: 9092

    Also ensure that enable_metrics is set to True.

    Restart synapse.

  3. Add a prometheus target for synapse.

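    The scrape job needs metrics_path set to a non-default value, since Synapse serves its metrics at /_synapse/metrics rather than the Prometheus default of /metrics. Add the following under scrape_configs: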

    - job_name: "synapse"
      metrics_path: "/_synapse/metrics"
      static_configs:
        - targets: ["my.server.here:9092"]

    If your prometheus is older than 1.5.2, you will need to replace static_configs in the above with target_groups.

    Restart prometheus.
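
Before relying on Prometheus, it can help to confirm that Synapse is actually serving metrics. The following is a minimal sketch, not part of Synapse: it assumes Synapse runs on the same host with metrics_port: 9092 as in the example above, and the helper names fetch_metrics_text and metric_names are made up for illustration. It fetches the text exposition and lists the metric names found in it.

```python
import urllib.request

# Assumed address: Synapse on this host with "metrics_port: 9092" as above.
METRICS_URL = "http://localhost:9092/_synapse/metrics"


def fetch_metrics_text(url=METRICS_URL, timeout=5):
    """Download the raw Prometheus text exposition from Synapse."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def metric_names(exposition_text):
    """Return the set of metric names found in a text exposition."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line is "name{labels} value" or "name value".
        head = line.split("{", 1)[0].split()
        if head:
            names.add(head[0])
    return names
```

Calling sorted(metric_names(fetch_metrics_text())) on the Synapse host should print a list of names if the listener is up; a connection error suggests the port or enable_metrics setting needs another look.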

Block and response metrics renamed for 0.27.0

Synapse 0.27.0 begins the process of rationalising the duplicate *:count metrics reported for the resource tracking for code blocks and HTTP requests.

At the same time, the corresponding *:total metrics are being renamed, as the :total suffix no longer makes sense in the absence of a corresponding :count metric.

To enable a graceful migration path, this release just adds new names for the metrics being renamed. A future release will remove the old ones.

The following table shows the new metrics, and the old metrics which they are replacing.

====================================================  ==================================================
New name                                              Old name
====================================================  ==================================================
synapse_util_metrics_block_count                      synapse_util_metrics_block_timer:count
synapse_util_metrics_block_count                      synapse_util_metrics_block_ru_utime:count
synapse_util_metrics_block_count                      synapse_util_metrics_block_ru_stime:count
synapse_util_metrics_block_count                      synapse_util_metrics_block_db_txn_count:count
synapse_util_metrics_block_count                      synapse_util_metrics_block_db_txn_duration:count
synapse_util_metrics_block_time_seconds               synapse_util_metrics_block_timer:total
synapse_util_metrics_block_ru_utime_seconds           synapse_util_metrics_block_ru_utime:total
synapse_util_metrics_block_ru_stime_seconds           synapse_util_metrics_block_ru_stime:total
synapse_util_metrics_block_db_txn_count               synapse_util_metrics_block_db_txn_count:total
synapse_util_metrics_block_db_txn_duration_seconds    synapse_util_metrics_block_db_txn_duration:total
synapse_http_server_response_count                    synapse_http_server_requests
synapse_http_server_response_count                    synapse_http_server_response_time:count
synapse_http_server_response_count                    synapse_http_server_response_ru_utime:count
synapse_http_server_response_count                    synapse_http_server_response_ru_stime:count
synapse_http_server_response_count                    synapse_http_server_response_db_txn_count:count
synapse_http_server_response_count                    synapse_http_server_response_db_txn_duration:count
synapse_http_server_response_time_seconds             synapse_http_server_response_time:total
synapse_http_server_response_ru_utime_seconds         synapse_http_server_response_ru_utime:total
synapse_http_server_response_ru_stime_seconds         synapse_http_server_response_ru_stime:total
synapse_http_server_response_db_txn_count             synapse_http_server_response_db_txn_count:total
synapse_http_server_response_db_txn_duration_seconds  synapse_http_server_response_db_txn_duration:total
====================================================  ==================================================
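
Existing dashboards and alert rules will still refer to the old names. As a rough sketch of how the renames listed above could be applied to query strings, the snippet below transcribes the table into a dictionary and rewrites whole metric names in an expression. The names RENAMES and rewrite_metric_names are hypothetical helpers, not part of Synapse or Prometheus, and the sketch only substitutes names: it does not check whether a rewritten query still means the same thing.

```python
import re

# Old name -> new name, transcribed from the table in this section.
RENAMES = {
    "synapse_util_metrics_block_timer:count": "synapse_util_metrics_block_count",
    "synapse_util_metrics_block_ru_utime:count": "synapse_util_metrics_block_count",
    "synapse_util_metrics_block_ru_stime:count": "synapse_util_metrics_block_count",
    "synapse_util_metrics_block_db_txn_count:count": "synapse_util_metrics_block_count",
    "synapse_util_metrics_block_db_txn_duration:count": "synapse_util_metrics_block_count",
    "synapse_util_metrics_block_timer:total": "synapse_util_metrics_block_time_seconds",
    "synapse_util_metrics_block_ru_utime:total": "synapse_util_metrics_block_ru_utime_seconds",
    "synapse_util_metrics_block_ru_stime:total": "synapse_util_metrics_block_ru_stime_seconds",
    "synapse_util_metrics_block_db_txn_count:total": "synapse_util_metrics_block_db_txn_count",
    "synapse_util_metrics_block_db_txn_duration:total": "synapse_util_metrics_block_db_txn_duration_seconds",
    "synapse_http_server_requests": "synapse_http_server_response_count",
    "synapse_http_server_response_time:count": "synapse_http_server_response_count",
    "synapse_http_server_response_ru_utime:count": "synapse_http_server_response_count",
    "synapse_http_server_response_ru_stime:count": "synapse_http_server_response_count",
    "synapse_http_server_response_db_txn_count:count": "synapse_http_server_response_count",
    "synapse_http_server_response_db_txn_duration:count": "synapse_http_server_response_count",
    "synapse_http_server_response_time:total": "synapse_http_server_response_time_seconds",
    "synapse_http_server_response_ru_utime:total": "synapse_http_server_response_ru_utime_seconds",
    "synapse_http_server_response_ru_stime:total": "synapse_http_server_response_ru_stime_seconds",
    "synapse_http_server_response_db_txn_count:total": "synapse_http_server_response_db_txn_count",
    "synapse_http_server_response_db_txn_duration:total": "synapse_http_server_response_db_txn_duration_seconds",
}

# Characters that may appear inside a Prometheus metric name. The lookarounds
# ensure only complete names match, never a fragment of a longer name.
_NAME_CHAR = r"[A-Za-z0-9_:]"
_OLD_NAME = re.compile(
    r"(?<!{c})(?:{names})(?!{c})".format(
        c=_NAME_CHAR,
        # Longest names first, so a longer old name wins over a shorter prefix.
        names="|".join(re.escape(n) for n in sorted(RENAMES, key=len, reverse=True)),
    )
)


def rewrite_metric_names(expr):
    """Replace every old metric name in a query string with its new name."""
    return _OLD_NAME.sub(lambda m: RENAMES[m.group(0)], expr)
```

For example, rewrite_metric_names("rate(synapse_http_server_response_time:total[5m])") yields "rate(synapse_http_server_response_time_seconds[5m])". Because several old :count series collapse into a single new name, any query that combined more than one of them is worth reviewing by hand after rewriting.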

Standard Metric Names

As of Synapse version 0.18.2, the format of the process-wide metrics has been changed to fit Prometheus standard naming conventions. Additionally, the units have been changed from milliseconds to seconds.

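==================================  =============================
New name                            Old name
==================================  =============================
process_cpu_user_seconds_total      process_resource_utime / 1000
process_cpu_system_seconds_total    process_resource_stime / 1000
process_open_fds (no 'type' label)  process_fds
==================================  =============================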
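
When comparing data recorded before 0.18.2 with current data, the unit change matters as much as the rename. The sketch below shows the arithmetic implied by the process-wide renames above: the two CPU counters are divided by 1000 to go from milliseconds to seconds, while the file descriptor gauge keeps its value. LEGACY_PROCESS_METRICS and convert_legacy_sample are illustrative names only and do not exist in Synapse.

```python
# Old name -> (new name, divisor), following the process-wide renames above.
# A divisor of 1000.0 converts milliseconds to seconds.
LEGACY_PROCESS_METRICS = {
    "process_resource_utime": ("process_cpu_user_seconds_total", 1000.0),
    "process_resource_stime": ("process_cpu_system_seconds_total", 1000.0),
    # The new process_open_fds has no 'type' label; the value is unchanged.
    "process_fds": ("process_open_fds", 1.0),
}


def convert_legacy_sample(name, value):
    """Map an old (name, value) sample onto the new name and unit.

    Names that are not in the mapping are returned unchanged.
    """
    new_name, divisor = LEGACY_PROCESS_METRICS.get(name, (name, 1.0))
    return new_name, value / divisor
```

For instance, an old process_resource_utime sample of 12500 (milliseconds) corresponds to a process_cpu_user_seconds_total value of 12.5 (seconds).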

The python-specific counts of garbage collector performance have been renamed.

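===========================  ======================
New name                     Old name
===========================  ======================
python_gc_time               reactor_gc_time
python_gc_unreachable_total  reactor_gc_unreachable
python_gc_counts             reactor_gc_counts
===========================  ======================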

The twisted-specific reactor metrics have been renamed.

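====================================  =====================
New name                              Old name
====================================  =====================
python_twisted_reactor_pending_calls  reactor_pending_calls
python_twisted_reactor_tick_time      reactor_tick_time
====================================  =====================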