mirror of https://mau.dev/maunium/synapse.git synced 2024-06-09 22:28:55 +02:00
Commit graph

15898 commits

Author SHA1 Message Date
Erik Johnston c900d18647
Fix OIDC login regression (#17031)
Requests may require a User-Agent header, and the change in #16972
accidentally removed it, resulting in requests getting rejected causing
login to fail.
2024-03-26 13:26:46 +00:00
Richard van der Hoff b5322b4daf
Ensure that pending to-device events are sent over federation at startup (#16925)
Fixes https://github.com/element-hq/synapse/issues/16680, as well as a
related bug, where servers which we had *never* successfully sent an
event to would not be retried.

In order to fix the case of pending to-device messages, we hook into the
existing `wake_destinations_needing_catchup` process, by extending it to
look for destinations that have pending to-device messages. The
federation transmission loop then attempts to send the pending to-device
messages as normal.
2024-03-22 13:24:11 +00:00
Mathieu Velten b7af076ab5
Add OIDC config to add extra parameters to the authorize URL (#16971) 2024-03-22 10:35:11 +00:00
SpiritCroc 9ad49e7ecf
Do not refuse to set read_marker if previous event_id is in wrong room (#16990) 2024-03-21 18:43:07 +00:00
Hanadi f7a3ebe44d
Fix reject knocks on deactivating account (#17010) 2024-03-21 18:05:54 +00:00
Mathieu Velten 3ab9e6d524
OIDC: try to JWT decode userinfo response if JSON parsing failed (#16972) 2024-03-21 17:49:44 +00:00
Shay cf5adc80e1
Update power level default for public rooms (#16907) 2024-03-19 17:55:31 +00:00
Shay 8fb5b0f335
Improve event validation (#16908)
As the title states.
2024-03-19 17:52:53 +00:00
Mathieu Velten 74ab329eaa
Pass module API to OIDC mapping provider (#16974)
As done for SAML mapping provider, let's pass the module API to the OIDC
one so the mapper can do more logic in its code.
2024-03-19 17:20:10 +00:00
Richard van der Hoff 9635822cc1
Clarify docs for some room state functions (#16950)
State *before* an event is different to state *after* that event, and
people tend to assume the wrong one.
2024-03-19 17:16:37 +00:00
Richard van der Hoff 52f456a822
/sync: Fix edge-case in calculating the "device_lists" response (#16949)
Fixes https://github.com/element-hq/synapse/issues/16948. If the `join`
and the `leave` are in the same sync response, we need to count them as
a "left" user.
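
The rule described above can be illustrated with plain sets. This is a hypothetical sketch, not Synapse's actual code: the function name and the `(user_id, membership)` input shape are assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple


def classify_device_list_users(
    changes: Iterable[Tuple[str, str]],
) -> Tuple[Set[str], Set[str]]:
    """Return (changed, left) user IDs for a single sync response.

    `changes` holds (user_id, membership) pairs seen in the response.
    Hypothetical helper, for illustration only.
    """
    changes = list(changes)
    joined = {user_id for user_id, membership in changes if membership == "join"}
    left = {user_id for user_id, membership in changes if membership == "leave"}

    # A user with both a join and a leave in the same response is counted
    # as "left", so remove them from the "changed" set.
    changed = joined - left
    return changed, left
```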
2024-03-14 17:34:19 +00:00
Richard van der Hoff 6d5bafb2c8
Split up SyncHandler.compute_state_delta (#16929)
This is a huge method, which melts my brain.

This is a non-functional change which lays some groundwork for future
work in this area.
2024-03-14 17:18:48 +00:00
Mathieu Velten cb562d73aa
Improve lock performance when a lot of locks are waiting (#16840)
When a lot of locks are waiting for a single lock, notifying all locks
independently with `call_later` on each release is really costly and
incurs some kind of async contention, where the CPU is spinning a lot
for not much.

The included test took around 30s before the change and 0.5s
after.

It was found following failing tests with
https://github.com/element-hq/synapse/pull/16827.
2024-03-14 13:49:54 +00:00
dependabot[bot] 9b5eef95ad
Bump ruff from 0.1.14 to 0.3.2 (#16994) 2024-03-13 17:06:23 +00:00
dependabot[bot] e161103b46
Bump mypy from 1.5.1 to 1.8.0 (#16901) 2024-03-13 17:05:57 +00:00
dependabot[bot] 1e68b56a62
Bump black from 23.10.1 to 24.2.0 (#16936) 2024-03-13 16:46:44 +00:00
Gerrit Gogel 1f88790764
Prevent locking up while processing batched_auth_events (#16968)
This PR aims to fix #16895, caused by a regression in #7 and not fixed
by #16903. The PR #16903 only fixes a starvation issue, where the CPU
isn't released. There is a second issue, where the execution is blocked.
This theory is supported by the flame graphs provided in #16895 and the
fact that I see the CPU usage reducing and far below the limit.

Since the changes in #7, the method `check_state_independent_auth_rules`
is called with the additional parameter `batched_auth_events`:


6fa13b4f92/synapse/handlers/federation_event.py (L1741-L1743)


It makes the execution enter this if clause, introduced with #15195


6fa13b4f92/synapse/event_auth.py (L178-L189)

There are two issues in the above code snippet.

First, there is the blocking issue. I'm not entirely sure if this is a
deadlock, starvation, or something different. In the beginning, I
thought the copy operation was responsible. It wasn't. Then I
investigated the nested `store.get_events` inside the function `update`.
This was also not causing the blocking issue. Only when I replaced the
set difference operation (`-`) with a list comprehension was the blocking
resolved. Creating and comparing sets with a very large amount of
events seems to be problematic.

This is how the flamegraph looks now while persisting outliers. As you
can see, the execution no longer locks up in the above function.

![output_2024-02-28_13-59-40](https://github.com/element-hq/synapse/assets/13143850/6db9c9ac-484f-47d0-bdde-70abfbd773ec)

Second, the copying here doesn't serve any purpose, because only a
shallow copy is created. This means the same objects from the original
dict are referenced. This defeats the intention of protecting these
objects from mutation. The review of the original PR
https://github.com/matrix-org/synapse/pull/15195 had an extensive
discussion about this matter.
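
The shallow-copy point can be checked with a few lines of standard Python. The `Event` class below is a hypothetical stand-in, not Synapse's event type, and the attribute names are assumptions chosen for illustration.

```python
class Event:
    """Hypothetical stand-in for an event object."""

    def __init__(self, event_id: str) -> None:
        self.event_id = event_id
        self.rejected = False


auth_events = {"$a": Event("$a")}

# dict.copy() duplicates the mapping, not the values it points to.
shallow = auth_events.copy()
assert shallow is not auth_events
assert shallow["$a"] is auth_events["$a"]

# Mutating an object through the copy is visible in the original.
shallow["$a"].rejected = True
assert auth_events["$a"].rejected is True

# Only changes to the mapping itself (adding or removing keys) are isolated.
shallow["$b"] = Event("$b")
assert "$b" not in auth_events
```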

Various approaches to copying the auth_events were attempted:
1) Implementing a deepcopy caused issues due to
builtins.EventInternalMetadata not being pickleable.
2) Creating a dict with new objects akin to a deepcopy.
3) Creating a dict with new objects containing only necessary
attributes.

Concluding, there is no easy way to create an actual copy of the
objects. Opting for a deepcopy can significantly strain memory and CPU
resources, making it an inefficient choice. I don't see why the copy is
necessary in the first place. Therefore I'm proposing to remove it
altogether.

After these changes, I was able to successfully join these rooms,
without the main worker locking up:
- #synapse:matrix.org
- #element-android:matrix.org
- #element-web:matrix.org
- #ecips:matrix.org
- #ipfs-chatter:ipfs.io
- #python:matrix.org
- #matrix:matrix.org
2024-03-12 15:07:36 +00:00
Alexander Fechler 48f59d3806
deactivated flag refactored to filter deactivated users. (#16874)
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2024-03-11 16:08:04 +00:00
Patrick Cloke 696cc9e802
Stabilize support for Retry-After header (MSC4014) (#16947) 2024-03-08 09:33:46 +00:00
Quentin Gliech 4af33015af
Fix joining remote rooms when an on_new_event callback is registered (#16973)
Since Synapse 1.76.0, any module which registers a `on_new_event`
callback would brick the ability to join remote rooms.
This is because this callback tried to get the full state of the room,
which would end up in a deadlock.

Related:
https://github.com/matrix-org/synapse-auto-accept-invite/issues/18

The following module would brick the ability to join remote rooms:

```python
from typing import Any, Dict
import logging

from synapse.module_api import ModuleApi, EventBase

logger = logging.getLogger(__name__)

class MyModule:
    def __init__(self, config: None, api: ModuleApi):
        self._api = api
        self._config = config

        # Registering any `on_new_event` callback was enough to trigger the bug.
        self._api.register_third_party_rules_callbacks(
            on_new_event=self.on_new_event,
        )

    async def on_new_event(self, event: EventBase, _state_map: Any) -> None:
        # The callback does not need to read the state map for the bug to occur.
        logger.info(f"Received new event: {event}")

    @staticmethod
    def parse_config(_config: Dict[str, Any]) -> None:
        return None
```

This is technically a breaking change, as we are now passing partial
state on the `on_new_event` callback.
However, this callback was broken for federated rooms since 1.76.0, and
local rooms have full state anyway, so it's unlikely that it would
change anything.
2024-03-06 16:00:20 +01:00
Andrew Morgan 8a05304222
Revert "Improve DB performance of calculating badge counts for push. (#16756)" (#16979) 2024-03-05 12:27:27 +00:00
Erik Johnston cdbbf3653d
Don't lock up when joining large rooms (#16903)
Co-authored-by: Andrew Morgan <andrew@amorgan.xyz>
2024-02-20 14:29:18 +00:00
kegsay c51a2240d1
bugfix: always prefer unthreaded receipt when >1 exist (MSC4102) (#16927)
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2024-02-20 14:12:06 +00:00
Remi Rampin 0621e8eb0e
Add metric for emails sent (#16881)
This adds a counter `synapse_emails_sent_total` for emails sent. They
are broken down by `type`, which are `password_reset`, `registration`,
`add_threepid`, `notification` (matching the methods of `Mailer`).
2024-02-14 15:30:03 +00:00
Erik Johnston 7b4d7429f8
Don't invalidate the entire event cache when we purge history (#16905)
We do this by adding support to the LRU cache for "extra indices" based
on the cached value. This allows us to efficiently map from room ID to
the cached events and only invalidate those.
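
The "extra index" idea can be sketched as an LRU cache that also keeps a secondary map from an index key (derived from the cached value) to the cache keys stored under it. This is a minimal illustration, not Synapse's `LruCache`: the class name, method names, and the `index_fn` parameter are all assumptions.

```python
from collections import OrderedDict
from typing import Callable, Dict, Generic, Hashable, Optional, Set, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class IndexedLruCache(Generic[K, V]):
    """Hypothetical LRU cache with one extra index derived from each value."""

    def __init__(self, max_size: int, index_fn: Callable[[V], Hashable]) -> None:
        self._max_size = max_size
        self._index_fn = index_fn
        self._entries: "OrderedDict[K, V]" = OrderedDict()
        self._index: Dict[Hashable, Set[K]] = {}

    def get(self, key: K) -> Optional[V]:
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def set(self, key: K, value: V) -> None:
        if key in self._entries:
            self._remove(key)
        self._entries[key] = value
        self._index.setdefault(self._index_fn(value), set()).add(key)
        while len(self._entries) > self._max_size:
            self._remove(next(iter(self._entries)))  # evict least recently used

    def invalidate_index(self, index_key: Hashable) -> None:
        """Drop only the entries whose value maps to `index_key`."""
        for key in list(self._index.get(index_key, ())):
            self._remove(key)

    def _remove(self, key: K) -> None:
        value = self._entries.pop(key)
        index_key = self._index_fn(value)
        keys = self._index[index_key]
        keys.discard(key)
        if not keys:
            del self._index[index_key]


# Usage: index cached events by room ID, then invalidate a single room.
cache = IndexedLruCache(max_size=100, index_fn=lambda ev: ev["room_id"])
cache.set("$1", {"room_id": "!a"})
cache.set("$2", {"room_id": "!b"})
cache.invalidate_index("!a")
```

Purging one room then costs time proportional to that room's cached entries rather than requiring the whole cache to be cleared.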
2024-02-13 13:24:11 +00:00
Erik Johnston 01910b981f
Add a config to not send out device list updates for specific users (#16909)
List of users not to send out device list updates for when they register
new devices. This is useful to handle bot accounts.

This is undocumented as it's mostly a hack to test on matrix.org.

Note: This will still send out device list updates if the device is
later updated, e.g. end to end keys are added.
2024-02-13 13:23:03 +00:00
Erik Johnston ea1b30940e Merge remote-tracking branch 'origin/release-v1.101' into develop 2024-02-09 10:52:35 +00:00
Erik Johnston bfa93d1d3b
Only do one concurrent fetch per server in keyring (#16894)
Otherwise if we've stacked a bunch of requests for the keys of a server,
we'll end up sending lots of concurrent requests for its keys,
needlessly.
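
One way to get this effect is to share a single in-flight request per server among all callers. The sketch below uses `asyncio` and is only an illustration: Synapse is built on Twisted and the actual change may use a different mechanism (for example a per-server lock), and `PerServerFetcher` and `fetch` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class PerServerFetcher:
    """Hypothetical helper: at most one concurrent fetch per server name."""

    def __init__(self, fetch: Callable[[str], Awaitable[dict]]) -> None:
        self._fetch = fetch
        self._in_flight: Dict[str, "asyncio.Task[dict]"] = {}

    async def get_keys(self, server_name: str) -> dict:
        task = self._in_flight.get(server_name)
        if task is None:
            # First caller for this server starts the request.
            task = asyncio.ensure_future(self._fetch(server_name))
            self._in_flight[server_name] = task
            task.add_done_callback(
                lambda _task: self._in_flight.pop(server_name, None)
            )
        # Later callers wait on the same task; shield() stops one cancelled
        # caller from cancelling the shared request.
        return await asyncio.shield(task)
```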
2024-02-09 10:51:11 +00:00
Erik Johnston 02a147039c
Increase batching when fetching auth chains (#16893)
This basically reverts a change that was in
https://github.com/element-hq/synapse/pull/16833, where we reduced the
batching.

The smaller batching can cause performance issues on busy servers and
databases.
2024-02-09 10:51:00 +00:00
David Baker 71ca199165
Accept unprefixed form of MSC3981 recurse parameter (#16842)
Now that the MSC3981 has passed FCP
2024-02-06 09:48:39 +00:00
dependabot[bot] 871f51c270
Bump lxml-stubs from 0.4.0 to 0.5.1 (#16885) 2024-02-06 09:29:17 +00:00
Erik Johnston adf15c4f6b
Run ANALYZE after fiddling with stats (#16849)
Introduced in #16833

Fixes #16844
2024-01-24 13:57:12 +00:00
Erik Johnston c925b45567
Speed up e2e device keys queries for bot accounts (#16841)
This helps with bot accounts with lots of non-e2e devices.

The change is basically to change the order of the join for the case of
using `INNER JOIN`
2024-01-23 11:37:16 +00:00
Erik Johnston 23740eaa3d
Correctly mention previous copyright (#16820)
During the migration the automated script to update the copyright
headers accidentally got rid of some of the existing copyright lines.
Reinstate them.
2024-01-23 11:26:48 +00:00
Erik Johnston 14c725f73b
Preparatory work for tweaking performance of auth chain lookups (#16833) 2024-01-23 11:26:27 +00:00
Shay a68b48a5dd
Allow room creation but not publishing to continue if room publication rules are violated when creating a new room. (#16811)
Prior to this PR, if a request to create a public (public as in
published to the rooms directory) room violated the room list
publication rules set in the
[config](https://matrix-org.github.io/synapse/latest/usage/configuration/config_documentation.html#room_list_publication_rules),
the request to create the room was denied and the room was not created.

This PR changes the behavior such that when a request to create a room
published to the directory violates room list publication rules, the
room is still created but the room is not published to the directory.
2024-01-22 13:59:45 +00:00
Mo Balaa b99f6db039
Handle wildcard type filters properly (#14984) 2024-01-22 10:46:30 +00:00
Hanadi 42e1aaea68
feat: add msc4028 to versions api (#16787)
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2024-01-16 14:36:08 +00:00
Erik Johnston c43f751013
Optimize query for fetching to-device messages in /sync (#16805)
The current query supports passing in a list of users, which generates a
query using `user_id = ANY(..)`. This generates a less efficient
query plan that is notably slower than a simple `user_id = ?` condition.

Note: The new function is mostly a copy and paste and then a
simplification of the existing function.
2024-01-11 13:37:57 +00:00
Erik Johnston b11f7b5122
Improve DB performance of calculating badge counts for push. (#16756)
The crux of the change is to try and make the queries simpler and pull
out fewer rows. Before, there were quite a few joins against subqueries,
which caused postgres to pull out more rows than necessary.

Instead, let's simplify the query and do some of the filtering out in
Python instead, letting Postgres do better optimizations now that it
doesn't have to deal with joins against subqueries.

Review note: this is a complete rewrite of the function, so not sure how
useful the diff is.

---------

Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
2024-01-11 11:52:13 +00:00
Erik Johnston a986f86c82
Correctly handle OIDC config with no client_secret set (#16806)
In previous versions of authlib using `client_secret_basic` without a
`client_secret` would result in an invalid auth header. Since authlib
1.3 it throws an exception.

The configuration may be accepted by very lax servers, so we don't
want to deny it outright. Instead, let's default the
`client_auth_method` to `none`, which does the right thing. If the
config specifies `client_auth_method` and no `client_secret` then that
is going to be bogus and we should reject it
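
The defaulting rule can be sketched as a small validation function. This is an illustration under assumptions, not Synapse's config code: the function name is hypothetical, other ways of supplying a secret are ignored, and accepting an explicit `none` without a secret is an assumption of this sketch.

```python
from typing import Any, Dict


def resolve_client_auth_method(oidc_config: Dict[str, Any]) -> str:
    """Hypothetical helper: pick the client auth method for an OIDC provider."""
    has_secret = oidc_config.get("client_secret") is not None
    method = oidc_config.get("client_auth_method")

    if method is None:
        # Nothing specified: use basic auth if there is a secret,
        # otherwise fall back to `none`.
        return "client_secret_basic" if has_secret else "none"

    if not has_secret and method != "none":
        # A secret-based method was requested explicitly but no secret exists.
        raise ValueError(
            f"client_auth_method {method!r} requires a client_secret"
        )
    return method
```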
2024-01-10 17:16:49 +00:00
Erik Johnston cbe8a80d10
Faster load recents for sync (#16783)
This hopefully reduces the amount of state we need to keep in memory
2024-01-10 15:11:59 +00:00
Erik Johnston 0a96fa52a2
Pull less state out if we fail to backfill (#16788)
Sometimes we fail to fetch events during backfill due to missing state,
and we often end up querying the same bad events periodically (as people
backpaginate). In such cases it's likely we will continue to fail to get
the state, and therefore we should try *before* loading the state that
we have from the DB (as otherwise it's wasted DB and memory).

---------

Co-authored-by: reivilibre <oliverw@matrix.org>
2024-01-10 14:42:13 +00:00
Erik Johnston 578c5c736e
Reduce amount of state pulled out when querying federation hierarchy (#16785)
There are two changes here:

1. Only pull out the required state when handling the request.
2. Change the get filtered state return type to check that we're only
querying state that was requested

---------

Co-authored-by: reivilibre <oliverw@matrix.org>
2024-01-10 14:31:35 +00:00
Erik Johnston 4c67f0391b
Split up deleting devices into batches (#16766)
Otherwise for users with large numbers of devices this can cause a lot
of woe.
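
A minimal sketch of the batching idea follows. The helper names, the `delete_batch` callback, and the batch size of 100 are assumptions for illustration, not the values or functions used in Synapse.

```python
from typing import Callable, Iterator, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start : start + size]


def delete_devices_in_batches(
    user_id: str,
    device_ids: Sequence[str],
    delete_batch: Callable[[str, Sequence[str]], None],
    batch_size: int = 100,  # assumed value, for illustration only
) -> None:
    # Each call handles a bounded number of devices, so no single
    # operation has to touch every device at once.
    for batch in chunked(device_ids, batch_size):
        delete_batch(user_id, batch)
```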
2024-01-10 13:55:16 +00:00
Erik Johnston c3f2f0f063
Faster partial join to room with complex auth graph (#7)
Instead of persisting outliers in a bunch of batches, let's just do them
all at once.

This is fine because all `_auth_and_persist_outliers_inner` is doing is
checking the auth rules for each event, which requires the events to be
topologically sorted by the auth graph.
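
The ordering requirement can be illustrated with the standard library's `graphlib` (Python 3.9+). The event IDs and the shape of `auth_graph` below are made up for the example; this is not the code Synapse uses to sort events.

```python
from graphlib import TopologicalSorter

# Map each event to the auth events it depends on (hypothetical IDs).
auth_graph = {
    "$create": set(),
    "$member": {"$create"},
    "$power": {"$create", "$member"},
    "$msg": {"$create", "$member", "$power"},
}

# static_order() yields every event after all of its auth events.
order = list(TopologicalSorter(auth_graph).static_order())

checked = set()
for event_id in order:
    # In a single pass, every auth event has already been checked by the
    # time the event that depends on it is reached.
    assert auth_graph[event_id] <= checked
    checked.add(event_id)
```

Because the whole set is sorted up front, one pass over `order` suffices and no per-batch bookkeeping is needed.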
2024-01-10 12:29:42 +00:00
reivilibre a83a337c4d
Filter out rooms from the room directory being served to other homeservers when those rooms block that homeserver by their Access Control Lists. (#16759)
The idea here is that the directory server shouldn't advertise rooms
to a requesting server if the requesting server would not be allowed to
join or participate in the room.

Original commit schedule, with full messages:

1. Pass `from_federation_origin` down into room list retrieval code
2. Don't cache /publicRooms response for inbound federated requests
3. fixup! Don't cache /publicRooms response for inbound federated requests
4. Cap the number of /publicRooms entries to 100
5. Simplify code now that you can't request unlimited rooms
6. Filter out rooms from federated requests that don't have the correct ACL
7. Request a handful more when filtering ACLs so that we can try to avoid
shortchanging the requester
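
The capping, filtering, and over-fetching steps in the schedule above can be sketched as follows. Only the cap of 100 comes from the commit message; the function name, the callbacks, and the `extra` value of 5 are assumptions for illustration.

```python
from typing import Callable, List

MAX_PUBLIC_ROOMS = 100  # cap stated in the commit schedule


def public_rooms_for_server(
    fetch_rooms: Callable[[int], List[str]],
    is_allowed: Callable[[str], bool],
    limit: int,
    extra: int = 5,  # assumed "handful", for illustration only
) -> List[str]:
    """Hypothetical helper: return up to `limit` rooms the requester may see."""
    limit = min(limit, MAX_PUBLIC_ROOMS)

    # Ask for slightly more than needed so that rooms dropped by the ACL
    # filter are less likely to leave the response short.
    candidates = fetch_rooms(limit + extra)
    allowed = [room_id for room_id in candidates if is_allowed(room_id)]
    return allowed[:limit]
```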

---------

Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
2024-01-08 17:24:20 +00:00
Erik Johnston 5d3850b038
Port EventInternalMetadata class to Rust (#16782)
There are a couple of things we need to be careful of here:

1. The current python code does no validation when loading from the DB,
so we need to be careful to ignore such errors (at least on jki.re there
are some old events with internal metadata fields of the wrong type).
2. We want to be memory efficient, as we often have many hundreds of
thousands of events in the cache at a time.

---------

Co-authored-by: Quentin Gliech <quenting@element.io>
2024-01-08 14:06:48 +00:00
Erik Johnston 81b1c56288
Fix linting (#16780)
Introduced in #16762
2024-01-05 13:29:00 +00:00
Erik Johnston 7469fa7585
Simplify internal metadata class. (#16762)
We remove these fields as they're just duplicating data the event
already stores, and (for reasons 🤫) I'd like to simplify
the class to only store simple types.

I'm not entirely convinced that we shouldn't instead add helper methods
to the event class to generate stream tokens, but I don't really think
that's where they belong either
2024-01-05 13:03:20 +00:00