Drop synapse_storage_transaction_time_bucket

This metric has far too high a cardinality because the desc label can (at present) take 248 values. This results in over 3k series per Synapse. A Prometheus instance that monitors multiple Synapse instances therefore has to ingest a huge number of additional series. The metric is also not used in the Synapse dashboard, and the core team has indicated they're happy to drop it entirely.

Fixes #11081

Signed-off-by: Daniele Sluijters <daenney@users.noreply.github.com>
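
The "over 3k series" figure in the commit message can be roughly reproduced with a little arithmetic. The sketch below assumes the histogram used the Python Prometheus client's default bucket boundaries (15 including `+Inf`), which seems likely since the removed `Histogram(...)` call passes no explicit buckets. The function name and the instance count are hypothetical, chosen only for illustration.

```python
# Assumption: default prometheus_client histogram buckets, i.e. 14 finite
# boundaries plus the implicit +Inf bucket, for 15 in total.
DEFAULT_BUCKET_COUNT = 15

# Number of distinct values of the `desc` label quoted in the commit message.
DESC_LABEL_VALUES = 248


def bucket_series(label_values: int, buckets: int = DEFAULT_BUCKET_COUNT) -> int:
    """Return the number of `_bucket` series for one histogram.

    Each label value gets one series per bucket boundary (the `le` label).
    """
    return label_values * buckets


per_synapse = bucket_series(DESC_LABEL_VALUES)
print(per_synapse)  # 3720

# Hypothetical fleet size: one Prometheus scraping ten Synapse instances.
instances = 10
print(per_synapse * instances)  # 37200
```

This counts only the `_bucket` series; the per-label `_sum` and `_count` series are not included.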
2021-10-19 15:54:35 +02:00
parent a6c318735d
commit e08a76f3a8
2 changed files with 1 additions and 2 deletions
--- a/changelog.d/11124.removal
+++ b/changelog.d/11124.removal
@@ -0,0 +1 @@
+Remove the `synapse_storage_transaction_time_bucket` metric due to the high cardinality of the metric putting undue strain on Prometheus deployments. This metric is not used in Synapse's included Grafana dashboards.
--- a/synapse/storage/database.py
+++ b/synapse/storage/database.py
@@ -64,7 +64,6 @@ perf_logger = logging.getLogger("synapse.storage.TIME")
 sql_scheduling_timer = Histogram("synapse_storage_schedule_time", "sec")

 sql_query_timer = Histogram("synapse_storage_query_time", "sec", ["verb"])
-sql_txn_timer = Histogram("synapse_storage_transaction_time", "sec", ["desc"])


 # Unique indexes which have been added in background updates. Maps from table name
@@ -639,7 +638,6 @@ class DatabasePool:

            self._current_txn_total_time += duration
            self._txn_perf_counters.update(desc, duration)
-            sql_txn_timer.labels(desc).observe(duration)

    async def runInteraction(
        self,