# How to monitor Synapse metrics using Prometheus

1.  Install Prometheus:

    Follow the instructions at
    <http://prometheus.io/docs/introduction/install/>

1.  Enable Synapse metrics:

    There are two methods of enabling metrics in Synapse.

    The first serves the metrics as part of the usual web server and
    can be enabled by adding the `metrics` resource to the existing
    listener as such:

        resources:
          - names:
            - client
            - metrics

    This provides a simple way of adding metrics to your Synapse
    installation, and serves them under `/_synapse/metrics`. If you do
    not wish your metrics to be publicly exposed, you will need to either
    filter them out at your load balancer, or use the second method.

    The second method runs the metrics server on a different port, in a
    different thread to Synapse. This can make it more resilient to
    heavy load that would otherwise prevent metrics from being
    retrieved, and makes it easier to expose the metrics to internal
    networks only. The served metrics are available over HTTP only, and
    will be available at `/`.

    Add a new listener to `homeserver.yaml`:

        listeners:
          - type: metrics
            port: 9000
            bind_addresses:
              - '0.0.0.0'

    For both options, you will need to ensure that `enable_metrics` is
    set to `True`.

1.  Restart Synapse.

1.  Add a Prometheus target for Synapse.

    It needs to set the `metrics_path` to a non-default value (under
    `scrape_configs`):

        - job_name: "synapse"
          metrics_path: "/_synapse/metrics"
          static_configs:
            - targets: ["my.server.here:port"]

    where `my.server.here` is the IP address of Synapse, and `port` is
    the listener port configured with the `metrics` resource.

    If your Prometheus is older than 1.5.2, you will need to replace
    `static_configs` in the above with `target_groups`.

1.  Restart Prometheus.

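After both services have been restarted, it can help to confirm that the endpoint returns metrics before looking at Prometheus itself. The following is a minimal Python sketch and not part of Synapse: the helpers `fetch_metrics` and `metric_names`, the sample text, and its label values are assumptions made for illustration. It relies only on the Prometheus text exposition format, in which comment lines start with `#` and every sample line begins with the metric name.

```python
from urllib.request import urlopen


def fetch_metrics(url, timeout=5.0):
    """Download the raw text exposition from a metrics endpoint."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def metric_names(exposition):
    """Collect the metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line is "name{labels} value" or "name value".
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


# Hypothetical sample, shaped like what a scrape might return.
SAMPLE = """\
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 12.5
synapse_http_server_response_count{method="GET",servlet="ExampleServlet"} 42.0
"""

print(sorted(metric_names(SAMPLE)))
# ['process_cpu_seconds_total', 'synapse_http_server_response_count']
```

Against a live server, something like `metric_names(fetch_metrics("http://localhost:9000/"))` would fit the second method, assuming the listener shown earlier is reachable on `localhost`.
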
## Renaming of metrics & deprecation of old names in 1.2

Synapse 1.2 updates the Prometheus metrics to match the naming
convention of the upstream `prometheus_client`. The old names are
considered deprecated and will be removed in a future version of
Synapse.

| New Name | Old Name |
| --- | --- |
| python_gc_objects_collected_total | python_gc_objects_collected |
| python_gc_objects_uncollectable_total | python_gc_objects_uncollectable |
| python_gc_collections_total | python_gc_collections |
| process_cpu_seconds_total | process_cpu_seconds |
| synapse_federation_client_sent_transactions_total | synapse_federation_client_sent_transactions |
| synapse_federation_client_events_processed_total | synapse_federation_client_events_processed |
| synapse_event_processing_loop_count_total | synapse_event_processing_loop_count |
| synapse_event_processing_loop_room_count_total | synapse_event_processing_loop_room_count |
| synapse_util_metrics_block_count_total | synapse_util_metrics_block_count |
| synapse_util_metrics_block_time_seconds_total | synapse_util_metrics_block_time_seconds |
| synapse_util_metrics_block_ru_utime_seconds_total | synapse_util_metrics_block_ru_utime_seconds |
| synapse_util_metrics_block_ru_stime_seconds_total | synapse_util_metrics_block_ru_stime_seconds |
| synapse_util_metrics_block_db_txn_count_total | synapse_util_metrics_block_db_txn_count |
| synapse_util_metrics_block_db_txn_duration_seconds_total | synapse_util_metrics_block_db_txn_duration_seconds |
| synapse_util_metrics_block_db_sched_duration_seconds_total | synapse_util_metrics_block_db_sched_duration_seconds |
| synapse_background_process_start_count_total | synapse_background_process_start_count |
| synapse_background_process_ru_utime_seconds_total | synapse_background_process_ru_utime_seconds |
| synapse_background_process_ru_stime_seconds_total | synapse_background_process_ru_stime_seconds |
| synapse_background_process_db_txn_count_total | synapse_background_process_db_txn_count |
| synapse_background_process_db_txn_duration_seconds_total | synapse_background_process_db_txn_duration_seconds |
| synapse_background_process_db_sched_duration_seconds_total | synapse_background_process_db_sched_duration_seconds |
| synapse_storage_events_persisted_events_total | synapse_storage_events_persisted_events |
| synapse_storage_events_persisted_events_sep_total | synapse_storage_events_persisted_events_sep |
| synapse_storage_events_state_delta_total | synapse_storage_events_state_delta |
| synapse_storage_events_state_delta_single_event_total | synapse_storage_events_state_delta_single_event |
| synapse_storage_events_state_delta_reuse_delta_total | synapse_storage_events_state_delta_reuse_delta |
| synapse_federation_server_received_pdus_total | synapse_federation_server_received_pdus |
| synapse_federation_server_received_edus_total | synapse_federation_server_received_edus |
| synapse_handler_presence_notified_presence_total | synapse_handler_presence_notified_presence |
| synapse_handler_presence_federation_presence_out_total | synapse_handler_presence_federation_presence_out |
| synapse_handler_presence_presence_updates_total | synapse_handler_presence_presence_updates |
| synapse_handler_presence_timers_fired_total | synapse_handler_presence_timers_fired |
| synapse_handler_presence_federation_presence_total | synapse_handler_presence_federation_presence |
| synapse_handler_presence_bump_active_time_total | synapse_handler_presence_bump_active_time |
| synapse_federation_client_sent_edus_total | synapse_federation_client_sent_edus |
| synapse_federation_client_sent_pdu_destinations_count_total | synapse_federation_client_sent_pdu_destinations:count |
| synapse_federation_client_sent_pdu_destinations_total | synapse_federation_client_sent_pdu_destinations:total |
| synapse_handlers_appservice_events_processed_total | synapse_handlers_appservice_events_processed |
| synapse_notifier_notified_events_total | synapse_notifier_notified_events |
| synapse_push_bulk_push_rule_evaluator_push_rules_invalidation_counter_total | synapse_push_bulk_push_rule_evaluator_push_rules_invalidation_counter |
| synapse_push_bulk_push_rule_evaluator_push_rules_state_size_counter_total | synapse_push_bulk_push_rule_evaluator_push_rules_state_size_counter |
| synapse_http_httppusher_http_pushes_processed_total | synapse_http_httppusher_http_pushes_processed |
| synapse_http_httppusher_http_pushes_failed_total | synapse_http_httppusher_http_pushes_failed |
| synapse_http_httppusher_badge_updates_processed_total | synapse_http_httppusher_badge_updates_processed |
| synapse_http_httppusher_badge_updates_failed_total | synapse_http_httppusher_badge_updates_failed |

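Dashboards and alert rules that still use the old names can be migrated mechanically. The sketch below is one hedged way to do that in Python: `RENAMES_1_2` holds only a few rows copied from the 1.2 rename table, and `migrate_query` is a hypothetical helper, not something Synapse ships. It matches whole metric names only, so a name that already ends in `_total` is left alone.

```python
import re

# A few old -> new pairs copied from the 1.2 rename table (not the full list).
RENAMES_1_2 = {
    "python_gc_collections": "python_gc_collections_total",
    "process_cpu_seconds": "process_cpu_seconds_total",
    "synapse_util_metrics_block_count": "synapse_util_metrics_block_count_total",
    "synapse_federation_client_sent_pdu_destinations:count":
        "synapse_federation_client_sent_pdu_destinations_count_total",
    "synapse_federation_client_sent_pdu_destinations:total":
        "synapse_federation_client_sent_pdu_destinations_total",
}

# Characters that may appear in a Prometheus metric name.
_NAME_CHAR = "[A-Za-z0-9_:]"
# Try longer names first so a short name never shadows a longer one.
_OLD_NAMES = sorted(RENAMES_1_2, key=len, reverse=True)
_PATTERN = re.compile(
    "(?<!" + _NAME_CHAR + ")(?:"
    + "|".join(re.escape(name) for name in _OLD_NAMES)
    + ")(?!" + _NAME_CHAR + ")"
)


def migrate_query(expr):
    """Replace deprecated metric names in a query string with the new ones."""
    return _PATTERN.sub(lambda match: RENAMES_1_2[match.group(0)], expr)


print(migrate_query("rate(process_cpu_seconds[5m])"))
# rate(process_cpu_seconds_total[5m])
```

Because this works on plain text, it would also rewrite an old name that happens to appear inside a label value, so the output is worth reviewing by hand.
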
Removal of deprecated metrics & time-based counters becoming histograms in 0.31.0
---------------------------------------------------------------------------------

The duplicated metrics deprecated in Synapse 0.27.0 have been removed.

All duration-based metrics are now reported in seconds rather than
milliseconds. This affects:

| msec -> sec metrics |
| --- |
| python_gc_time |
| python_twisted_reactor_tick_time |
| synapse_storage_query_time |
| synapse_storage_schedule_time |
| synapse_storage_transaction_time |

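Any threshold that was tuned against the millisecond values needs dividing by 1000 for these metrics. As a small illustration, assuming a hypothetical helper named `rescale_threshold` and made-up threshold values:

```python
# Metrics whose unit changed from milliseconds to seconds in 0.31.0.
MSEC_TO_SEC_METRICS = {
    "python_gc_time",
    "python_twisted_reactor_tick_time",
    "synapse_storage_query_time",
    "synapse_storage_schedule_time",
    "synapse_storage_transaction_time",
}


def rescale_threshold(metric, threshold):
    """Return the threshold to use after the 0.31.0 unit change."""
    if metric in MSEC_TO_SEC_METRICS:
        return threshold / 1000.0
    # Metrics outside the list keep their existing unit.
    return threshold


# A hypothetical 250 ms alert threshold becomes 0.25 s.
print(rescale_threshold("synapse_storage_query_time", 250))  # 0.25
```
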
Several metrics have been changed to be histograms, which sort entries
into buckets and allow better analysis. The following metrics are now
histograms:

| Altered metrics |
| --- |
| python_gc_time |
| python_twisted_reactor_pending_calls |
| python_twisted_reactor_tick_time |
| synapse_http_server_response_time_seconds |
| synapse_storage_query_time |
| synapse_storage_schedule_time |
| synapse_storage_transaction_time |

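To make the bucket idea concrete, here is a minimal Python sketch of how observations end up in cumulative buckets, in the way Prometheus histograms expose them through the `le` ("less than or equal") label. The function name, bucket boundaries, and observations are assumptions chosen for illustration and are not the values Synapse uses.

```python
import bisect


def cumulative_buckets(observations, upper_bounds):
    """Count observations per bucket, cumulatively, with a final +Inf bucket."""
    bounds = sorted(upper_bounds) + [float("inf")]
    counts = [0] * len(bounds)
    for value in observations:
        # Index of the first bucket whose upper bound is >= value.
        counts[bisect.bisect_left(bounds, value)] += 1
    # Each bucket also includes everything from the smaller buckets.
    result = []
    running = 0
    for bound, count in zip(bounds, counts):
        running += count
        result.append((bound, running))
    return result


# Hypothetical durations in seconds and hypothetical bucket boundaries.
print(cumulative_buckets([0.003, 0.02, 0.02, 0.4, 3.0], [0.01, 0.1, 1.0]))
# [(0.01, 1), (0.1, 3), (1.0, 4), (inf, 5)]
```

Keeping the counts cumulative is what lets a query estimate, for example, the share of observations at or below a given boundary without access to the raw values.
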
Block and response metrics renamed for 0.27.0
---------------------------------------------

Synapse 0.27.0 begins the process of rationalising the duplicate
`*:count` metrics reported for the resource tracking for code blocks and
HTTP requests.

At the same time, the corresponding `*:total` metrics are being renamed,
as the `:total` suffix no longer makes sense in the absence of a
corresponding `:count` metric.

To enable a graceful migration path, this release just adds new names
for the metrics being renamed. A future release will remove the old
ones.

The following table shows the new metrics, and the old metrics which
they are replacing.

| New name | Old name |
| --- | --- |
| synapse_util_metrics_block_count | synapse_util_metrics_block_timer:count |
| synapse_util_metrics_block_count | synapse_util_metrics_block_ru_utime:count |
| synapse_util_metrics_block_count | synapse_util_metrics_block_ru_stime:count |
| synapse_util_metrics_block_count | synapse_util_metrics_block_db_txn_count:count |
| synapse_util_metrics_block_count | synapse_util_metrics_block_db_txn_duration:count |
| synapse_util_metrics_block_time_seconds | synapse_util_metrics_block_timer:total |
| synapse_util_metrics_block_ru_utime_seconds | synapse_util_metrics_block_ru_utime:total |
| synapse_util_metrics_block_ru_stime_seconds | synapse_util_metrics_block_ru_stime:total |
| synapse_util_metrics_block_db_txn_count | synapse_util_metrics_block_db_txn_count:total |
| synapse_util_metrics_block_db_txn_duration_seconds | synapse_util_metrics_block_db_txn_duration:total |
| synapse_http_server_response_count | synapse_http_server_requests |
| synapse_http_server_response_count | synapse_http_server_response_time:count |
| synapse_http_server_response_count | synapse_http_server_response_ru_utime:count |
| synapse_http_server_response_count | synapse_http_server_response_ru_stime:count |
| synapse_http_server_response_count | synapse_http_server_response_db_txn_count:count |
| synapse_http_server_response_count | synapse_http_server_response_db_txn_duration:count |
| synapse_http_server_response_time_seconds | synapse_http_server_response_time:total |
| synapse_http_server_response_ru_utime_seconds | synapse_http_server_response_ru_utime:total |
| synapse_http_server_response_ru_stime_seconds | synapse_http_server_response_ru_stime:total |
| synapse_http_server_response_db_txn_count | synapse_http_server_response_db_txn_count:total |
| synapse_http_server_response_db_txn_duration_seconds | synapse_http_server_response_db_txn_duration:total |

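One consequence of collapsing the `*:count` metrics is that every per-block total now shares a single count. As a rough illustration (a hypothetical helper with made-up readings, which treats a counter reset as "no data"), the mean time per block over an interval can be derived from two readings of `synapse_util_metrics_block_time_seconds` and `synapse_util_metrics_block_count`:

```python
from typing import Optional


def mean_seconds_per_block(
    time_before: float,
    time_after: float,
    count_before: float,
    count_after: float,
) -> Optional[float]:
    """Average seconds per block between two counter readings."""
    blocks = count_after - count_before
    if blocks <= 0:
        # No new blocks (or a counter reset): no meaningful average.
        return None
    return (time_after - time_before) / blocks


# Hypothetical readings taken one scrape apart.
print(mean_seconds_per_block(10.0, 16.0, 100, 112))  # 0.5
```

The same division applies to the other totals in the table, such as `synapse_util_metrics_block_db_txn_duration_seconds`, since they are all counted against the same `synapse_util_metrics_block_count`.
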
Standard Metric Names
---------------------

As of Synapse version 0.18.2, the format of the process-wide metrics has
been changed to fit Prometheus standard naming conventions. Additionally,
the units have been changed from milliseconds to seconds.

| New name | Old name |
| --- | --- |
| process_cpu_user_seconds_total | process_resource_utime / 1000 |
| process_cpu_system_seconds_total | process_resource_stime / 1000 |
| process_open_fds (no 'type' label) | process_fds |

The Python-specific counts of garbage collector performance have been
renamed.

| New name | Old name |
| --- | --- |
| python_gc_time | reactor_gc_time |
| python_gc_unreachable_total | reactor_gc_unreachable |
| python_gc_counts | reactor_gc_counts |

The Twisted-specific reactor metrics have been renamed.

| New name | Old name |
| --- | --- |
| python_twisted_reactor_pending_calls | reactor_pending_calls |
| python_twisted_reactor_tick_time | reactor_tick_time |