Commit graph

18020 commits

Author SHA1 Message Date
Richard van der Hoff db13a8607e Merge remote-tracking branch 'origin/develop' into matrix-org-hotfixes 2020-10-01 11:22:36 +01:00
Richard van der Hoff cfb3096e33 Revert federation-transaction-transmission backoff hacks
This reverts b852a8247, 15b2a5081, 28889d8da.

I don't think these patches are required any more, and if they are, they should
be on mainline, not hidden in our hotfixes branch. Let's try backing them out:
if that turns out to be an error, we can PR them properly.
2020-10-01 11:22:19 +01:00
Richard van der Hoff c1ef579b63
Add prometheus metrics to track federation delays (#8430)
Add a pair of federation metrics to track the delays in sending PDUs to/from 
particular servers.
2020-10-01 11:09:12 +01:00
Erik Johnston 7941372ec8
Make token serializing/deserializing async (#8427)
The idea is that in future tokens will encode a mapping of instance to position. However, we don't want to include the full instance name in the string representation, so instead we'll have a mapping between instance name and an immutable integer ID in the DB that we can use instead. We'll then do the lookup when we serialize/deserialize the token (we could alternatively pass around an `Instance` type that includes both the name and ID, but that turns out to be a lot more invasive).
2020-09-30 20:29:19 +01:00
Richard van der Hoff a0a1ba6973
Merge pull request #8425 from matrix-org/rav/extremity_metrics
Add an improved "forward extremities" metric
2020-09-30 19:33:27 +01:00
Patrick Cloke 8b40843392
Allow additional SSO properties to be passed to the client (#8413) 2020-09-30 13:02:43 -04:00
Richard van der Hoff 32acab3fa2 changelog 2020-09-30 16:49:15 +01:00
Richard van der Hoff 20e7c4de26 Add an improved "forward extremities" metric
Hopefully, N(extremities) * N(state_events) is a more realistic approximation
to "how big a problem is this room?".
2020-09-30 16:49:15 +01:00
Richard van der Hoff 6d2d42f8fb Rewrite BucketCollector
This was a bit unweildy for what I wanted: in particular, I wanted to assign
each measurement straight into a bucket, rather than storing an intermediate
Counter which didn't do any bucketing at all.

I've replaced it with something that is hopefully a bit easier to use.

(I'm not entirely sure what the difference between a HistogramMetricFamily and
a GaugeHistogramMetricFamily is, but given our counters can go down as well as
up the latter *sounds* more accurate?)
2020-09-30 16:49:15 +01:00
Richard van der Hoff 1c8ca2c543 Fix _exposition.py to stop stripping samples
Our hacked-up `_exposition.py` was stripping out some samples it shouldn't
have been. Put them back in, to more closely match the upstream
`exposition.py`.
2020-09-30 16:45:43 +01:00
Richard van der Hoff ceafb5a1c6
Drop support for ancient prometheus_client (#8426)
Drop compatibility hacks for prometheus-client pre 0.4.0. Debian stretch and
Fedora 31 both have newer versions, so hopefully this will be ok.
2020-09-30 16:42:05 +01:00
Richard van der Hoff c429dfc300
Merge pull request #8420 from matrix-org/rav/state_res_stats
Report metrics on expensive rooms for state res
2020-09-30 10:37:52 +01:00
Erik Johnston ea70f1c362
Various clean ups to room stream tokens. (#8423) 2020-09-29 21:48:33 +01:00
Aaron Raimist 8238b55e08
Update description of server_name config option (#8415) 2020-09-29 13:50:25 -04:00
Richard van der Hoff d4274dd17e changelog 2020-09-29 17:35:20 +01:00
Richard van der Hoff 057f04fa9f Report state res metrics to Prometheus and log 2020-09-29 17:35:20 +01:00
Richard van der Hoff 8412c08a87 Move Measure calls into resolve_events_with_store 2020-09-29 17:35:20 +01:00
Richard van der Hoff ba700074c6 Expose a get_resource_usage method in Measure 2020-09-29 17:35:20 +01:00
Richard van der Hoff 937393abd8 Move resolve_events_with_store into StateResolutionHandler 2020-09-29 17:35:20 +01:00
Will Hunt c2bdf040aa
Discard an empty upload_name before persisting an uploaded file (#7905) 2020-09-29 12:15:27 -04:00
Andrew Morgan e154f7ccb5
Don't check whether a 3pid is allowed to register during password reset (#8414)
* Don't check whether a 3pid is allowed to register during password reset

This endpoint should only deal with emails that have already been approved, and
are attached with user's account. There's no need to re-check them here.

* Changelog
2020-09-29 16:42:25 +01:00
Erik Johnston b1433bf231
Don't table scan events on worker startup (#8419)
* Fix table scan of events on worker startup.

This happened because we assumed "new" writers had an initial stream
position of 0, so the replication code tried to fetch all events written
by the instance between 0 and the current position.

Instead, set the initial position of new writers to the current
persisted up to position, on the assumption that new writers won't have
written anything before that point.

* Consider old writers coming back as "new".

Otherwise we'd try and fetch entries between the old stale token and the
current position, even though it won't have written any rows.

Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>

Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2020-09-29 16:42:19 +01:00
Richard van der Hoff 2649d545a5
Mypy fixes for synapse.handlers.federation (#8422)
For some reason, an apparently unrelated PR upset mypy about this module. Here are a number of little fixes.
2020-09-29 15:57:36 +01:00
Andrew Morgan f43c66d23b Merge branch 'develop' of github.com:matrix-org/synapse into anoa/info-mainline-no-check-password-reset 2020-09-29 14:21:41 +01:00
Andrew Morgan 12f0d18611
Add support for running Complement against the local checkout (#8317)
This PR adds a script that:

* Builds the local Synapse checkout using our existing `docker/Dockerfile` image.
* Downloads [Complement](https://github.com/matrix-org/complement/)'s source code.
* Builds the [Synapse.Dockerfile](https://github.com/matrix-org/complement/blob/master/dockerfiles/Synapse.Dockerfile) using the above dockerfile as a base.
* Builds and runs Complement against it.

This set up differs slightly from [that of the dendrite repo](https://github.com/matrix-org/dendrite/blob/master/build/scripts/complement.sh) (`complement.sh`, `Complement.Dockerfile`), which instead stores a separate, but slightly modified, dockerfile in Dendrite's repo rather than running the one stored in Complement's repo. That synapse equivalent to that dockerfile (`Synapse.Dockerfile`) in Complement's repo is just based on top of `matrixdotorg/synapse:latest`, which we opt to build here locally.

Thus copying over the files from Complement's repo wouldn't change any functionality, and would result in two instances of the same files. So just using the dockerfile in Complement's repo was decided upon instead.
2020-09-29 13:47:47 +01:00
Will Hunt 8676d8ab2e
Filter out appservices from mau count (#8404)
This is an attempt to fix #8403.
2020-09-29 13:11:02 +01:00
Andrew Morgan 1c6b8752b8
Only assert valid next_link params when provided (#8417)
Broken in https://github.com/matrix-org/synapse/pull/8275 and has yet to be put in a release. Fixes https://github.com/matrix-org/synapse/issues/8418.

`next_link` is an optional parameter. However, we were checking whether the `next_link` param was valid, even if it wasn't provided. In that case, `next_link` was `None`, which would clearly not be a valid URL.

This would prevent password reset and other operations if `next_link` was not provided, and the `next_link_domain_whitelist` config option was set.
2020-09-29 12:36:44 +01:00
Richard van der Hoff 866c84da8d
Add metrics to track success/otherwise of replication requests (#8406)
One hope is that this might provide some insights into #3365.
2020-09-29 11:06:11 +01:00
Richard van der Hoff 1c262431f9
Fix handling of connection timeouts in outgoing http requests (#8400)
* Remove `on_timeout_cancel` from `timeout_deferred`

The `on_timeout_cancel` param to `timeout_deferred` wasn't always called on a
timeout (in particular if the canceller raised an exception), so it was
unreliable. It was also only used in one place, and to be honest it's easier to
do what it does a different way.

* Fix handling of connection timeouts in outgoing http requests

Turns out that if we get a timeout during connection, then a different
exception is raised, which wasn't always handled correctly.

To fix it, catch the exception in SimpleHttpClient and turn it into a
RequestTimedOutError (which is already a documented exception).

Also add a description to RequestTimedOutError so that we can see which stage
it failed at.

* Fix incorrect handling of timeouts reading federation responses

This was trapping the wrong sort of TimeoutError, so was never being hit.

The effect was relatively minor, but we should fix this so that it does the
expected thing.

* Fix inconsistent handling of `timeout` param between methods

`get_json`, `put_json` and `delete_json` were applying a different timeout to
the response body to `post_json`; bring them in line and test.

Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Erik Johnston <erik@matrix.org>
2020-09-29 10:29:21 +01:00
Andrew Morgan fe443acaee Changelog 2020-09-28 18:51:41 +01:00
Andrew Morgan d4605d1f16 Don't check whether a 3pid is allowed to register during password reset
This endpoint should only deal with emails that have already been approved, and
are attached with user's account. There's no need to re-check them here.
2020-09-28 18:46:59 +01:00
Erik Johnston bd380d942f
Add checks for postgres sequence consistency (#8402) 2020-09-28 18:00:30 +01:00
Richard van der Hoff 5e3ca12b15
Create a mechanism for marking tests "logcontext clean" (#8399) 2020-09-28 17:58:33 +01:00
Dagfinn Ilmari Mannsåker bd715e1278
Add ui_auth_sessions_ips table to synapse_port_db ignore list (#8410)
This table was created in #8034 (1.20.0).  It references
`ui_auth_sessions`, which is ignored, so this one should be too.

Signed-off-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
2020-09-28 15:35:02 +01:00
Richard van der Hoff 450ec48445
A pair of tiny cleanups in the federation request code. (#8401) 2020-09-28 13:15:00 +01:00
Matthew Hodgson 4b3a1faa08 typo 2020-09-28 00:23:35 +01:00
Patrick Cloke 31acc5c309
Escape the error description on the sso_error template. (#8405) 2020-09-25 11:05:54 -04:00
Richard van der Hoff fec6f9ac17
Fix occasional "Re-starting finished log context" from keyring (#8398)
* Fix test_verify_json_objects_for_server_awaits_previous_requests

It turns out that this wasn't really testing what it thought it was testing
(in particular, `check_context` was turning failures into success, which was
making the tests pass even though it wasn't clear they should have been.

It was also somewhat overcomplex - we can test what it was trying to test
without mocking out perspectives servers.

* Fix warnings about finished logcontexts in the keyring

We need to make sure that we finish the key fetching magic before we run the
verifying code, to ensure that we don't mess up our logcontexts.
2020-09-25 12:29:54 +01:00
Tdxdxoz abd04b6af0
Allow existing users to login via OpenID Connect. (#8345)
Co-authored-by: Benjamin Koch <bbbsnowball@gmail.com>

This adds configuration flags that will match a user to pre-existing users
when logging in via OpenID Connect. This is useful when switching to
an existing SSO system.
2020-09-25 07:01:45 -04:00
Erik Johnston 3e87d79e1c
Fix schema delta for servers that have not backfilled (#8396)
Fixes #8395.
2020-09-25 09:58:32 +01:00
Andrew Morgan c77c4a2fcd Merge branch 'master' into develop 2020-09-24 17:00:33 +01:00
Erik Johnston f112cfe5bb
Fix MultiWriteIdGenerator's handling of restarts. (#8374)
On startup `MultiWriteIdGenerator` fetches the maximum stream ID for
each instance from the table and uses that as its initial "current
position" for each writer. This is problematic as a) it involves either
a scan of events table or an index (neither of which is ideal), and b)
if rows are being persisted out of order elsewhere while the process
restarts then using the maximum stream ID is not correct. This could
theoretically lead to race conditions where e.g. events that are
persisted out of order are not sent down sync streams.

We fix this by creating a new table that tracks the current positions of
each writer to the stream, and update it each time we finish persisting
a new entry. This is a relatively small overhead when persisting events.
However for the cache invalidation stream this is a much bigger relative
overhead, so instead we note that for invalidation we don't actually
care about reliability over restarts (as there's no caches to
invalidate) and simply don't bother reading and writing to the new table
in that particular case.
2020-09-24 16:53:51 +01:00
Andrew Morgan ab903e7337 s/URLs/variables in changelog 2020-09-24 16:35:31 +01:00
Andrew Morgan 271086ebda s/accidentally/incorrectly in changelog 2020-09-24 16:33:49 +01:00
Andrew Morgan 5ce5a9f144 Update changelog wording 2020-09-24 16:26:57 +01:00
Andrew Morgan 920dd1083e 1.20.1 2020-09-24 16:25:33 +01:00
Patrick Cloke f3e5c2e702 Mark the shadow_banned column as boolean in synapse_port_db. (#8386) 2020-09-24 16:24:24 +01:00
Andrew Morgan 3f4a2a7064
Hotfix: disable autoescape by default when rendering Jinja2 templates (#8394)
#8037 changed the default `autoescape` option when rendering Jinja2 templates from `False` to `True`. This caused some bugs, noticeably around redirect URLs being escaped in SAML2 auth confirmation templates, causing those URLs to break for users.

This change returns the previous behaviour as it stood. We may want to look at each template individually and see whether autoescaping is a good idea at some point, but for now lets just fix the breakage.
2020-09-24 16:24:08 +01:00
Richard van der Hoff 11c9e17738
Add type annotations to SimpleHttpClient (#8372) 2020-09-24 15:47:20 +01:00
Erik Johnston 6fdf577593
Add new sequences to port DB script (#8387) 2020-09-24 13:43:49 +01:00