Mitigate a race where /make_join could 403 for restricted rooms (#15080)

Previously, when creating a join event in /make_join, we would decide
whether to include additional fields to satisfy restricted room checks
based on the current state of the room. Then, when building the event,
we would capture the forward extremities of the room to use as prev
events.

This is subject to race conditions. For example, when leaving and
rejoining a room, the following sequence of events leads to a misleading
403 response:
1. /make_join reads the current state of the room and sees that the user
   is still in the room. It decides to omit the field required for
   restricted room joins.
2. The leave event is persisted and the room's forward extremities are
   updated.
3. /make_join builds the event, using the post-leave forward extremities.
   The event then fails the restricted room checks.

To mitigate the race, we move the read of the forward extremities closer
to the read of the current state. Ideally, we would compute the state
based off the chosen prev events, but that can involve state resolution,
which is expensive.

Signed-off-by: Sean Quah <seanq@matrix.org>
This commit is contained in:
Sean Quah 2023-02-17 09:40:32 +00:00 committed by GitHub
parent ad1f3fa8e1
commit 4f4f27e57f
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
2 changed files with 16 additions and 1 deletions

1
changelog.d/15080.bugfix Normal file
View file

@ -0,0 +1 @@
Reduce the likelihood of a rare race condition where rejoining a restricted room over federation would fail.

View file

@ -952,7 +952,20 @@ class FederationHandler:
#
# Note that this requires the /send_join request to come back to the
# same server.
prev_event_ids = None
if room_version.msc3083_join_rules:
# Note that the room's state can change out from under us and render our
# nice join rules-conformant event non-conformant by the time we build the
# event. When this happens, our validation at the end fails and we respond
# to the requesting server with a 403, which is misleading — it indicates
# that the user is not allowed to join the room and the joining server
# should not bother retrying via this homeserver or any others, when
# in fact we've just messed up with building the event.
#
# To reduce the likelihood of this race, we capture the forward extremities
# of the room (prev_event_ids) just before fetching the current state, and
# hope that the state we fetch corresponds to the prev events we chose.
prev_event_ids = await self.store.get_prev_events_for_room(room_id)
state_ids = await self._state_storage_controller.get_current_state_ids(
room_id
)
@ -994,7 +1007,8 @@ class FederationHandler:
event,
unpersisted_context,
) = await self.event_creation_handler.create_new_client_event(
builder=builder
builder=builder,
prev_event_ids=prev_event_ids,
)
except SynapseError as e:
logger.warning("Failed to create join to %s because %s", room_id, e)