2014-09-12 17:44:07 +02:00
|
|
|
#
|
2023-11-21 21:29:58 +01:00
|
|
|
# This file is licensed under the Affero General Public License (AGPL) version 3.
|
|
|
|
#
|
2024-01-23 12:26:48 +01:00
|
|
|
# Copyright 2019 The Matrix.org Foundation C.I.C.
|
|
|
|
# Copyright 2014-2016 OpenMarket Ltd
|
2023-11-21 21:29:58 +01:00
|
|
|
# Copyright (C) 2023 New Vector, Ltd
|
|
|
|
#
|
|
|
|
# This program is free software: you can redistribute it and/or modify
|
|
|
|
# it under the terms of the GNU Affero General Public License as
|
|
|
|
# published by the Free Software Foundation, either version 3 of the
|
|
|
|
# License, or (at your option) any later version.
|
|
|
|
#
|
|
|
|
# See the GNU Affero General Public License for more details:
|
|
|
|
# <https://www.gnu.org/licenses/agpl-3.0.html>.
|
|
|
|
#
|
|
|
|
# Originally licensed under the Apache License, Version 2.0:
|
|
|
|
# <http://www.apache.org/licenses/LICENSE-2.0>.
|
|
|
|
#
|
|
|
|
# [This file includes modifications made by New Vector Limited]
|
2014-09-12 17:44:07 +02:00
|
|
|
#
|
|
|
|
#
|
2024-07-17 20:10:15 +02:00
|
|
|
import logging
|
2023-10-26 19:01:36 +02:00
|
|
|
from typing import List, Optional, Tuple, cast
|
|
|
|
|
2022-04-05 14:54:41 +02:00
|
|
|
from twisted.test.proto_helpers import MemoryReactor
|
2014-09-12 17:44:07 +02:00
|
|
|
|
Sliding Sync: Pre-populate room data for quick filtering/sorting (#17512)
Pre-populate room data for quick filtering/sorting in the Sliding Sync
API
Spawning from
https://github.com/element-hq/synapse/pull/17450#discussion_r1697335578
This PR is acting as the Synapse version `N+1` step in the gradual
migration being tracked by
https://github.com/element-hq/synapse/issues/17623
Adding two new database tables:
- `sliding_sync_joined_rooms`: A table for storing room meta data that
the local server is still participating in. The info here can be shared
across all `Membership.JOIN`. Keyed on `(room_id)` and updated when the
relevant room current state changes or a new event is sent in the room.
- `sliding_sync_membership_snapshots`: A table for storing a snapshot of
room meta data at the time of the local user's membership. Keyed on
`(room_id, user_id)` and only updated when a user's membership in a room
changes.
Also adds background updates to populate these tables with all of the
existing data.
We want to have the guarantee that if a row exists in the sliding sync
tables, we are able to rely on it (accurate data). And if a row doesn't
exist, we use a fallback to get the same info until the background
updates fill in the rows or a new event comes in triggering it to be
fully inserted. This means we need a couple extra things in place until
we bump `SCHEMA_COMPAT_VERSION` and run the foreground update in the
`N+2` part of the gradual migration. For context on why we can't rely on
the tables without these things see [1].
1. On start-up, block until we clear out any rows for the rooms that
have had events since the max-`stream_ordering` of the
`sliding_sync_joined_rooms` table (compare to max-`stream_ordering` of
the `events` table). For `sliding_sync_membership_snapshots`, we can
compare to the max-`stream_ordering` of `local_current_membership`
- This accounts for when someone downgrades their Synapse version and
then upgrades it again. This will ensure that we don't have any
stale/out-of-date data in the
`sliding_sync_joined_rooms`/`sliding_sync_membership_snapshots` tables
since any new events sent in rooms would have also needed to be written
to the sliding sync tables. For example a new event needs to bump
`event_stream_ordering` in `sliding_sync_joined_rooms` table or some
state in the room changing (like the room name). Or another example of
someone's membership changing in a room affecting
`sliding_sync_membership_snapshots`.
1. Add another background update that will catch-up with any rows that
were just deleted from the sliding sync tables (based on the activity in
the `events`/`local_current_membership`). The rooms that need
recalculating are added to the
`sliding_sync_joined_rooms_to_recalculate` table.
1. Making sure rows are fully inserted. Instead of partially inserting,
we need to check if the row already exists and fully insert all data if
not.
All of this extra functionality can be removed once the
`SCHEMA_COMPAT_VERSION` is bumped with support for the new sliding sync
tables so people can no longer downgrade (the `N+2` part of the gradual
migration).
<details>
<summary><sup>[1]</sup></summary>
For `sliding_sync_joined_rooms`, since we partially insert rows as state
comes in, we can't rely on the existence of the row for a given
`room_id`. We can't even rely on looking at whether the background
update has finished. There could still be partial rows from when someone
reverted their Synapse version after the background update finished, had
some state changes (or new rooms), then upgraded again and more state
changes happen leaving a partial row.
For `sliding_sync_membership_snapshots`, we insert items as a whole
except for the `forgotten` column ~~so we can rely on rows existing and
just need to always use a fallback for the `forgotten` data. We can't
use the `forgotten` column in the table for the same reasons above about
`sliding_sync_joined_rooms`.~~ We could have an out-of-date membership
from when someone reverted their Synapse version. (same problems as
outlined for `sliding_sync_joined_rooms` above)
Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.dz5x6ef4mxz7)
</details>
### TODO
- [x] Update `stream_ordering`/`bump_stamp`
- [x] Handle remote invites
- [x] Handle state resets
- [x] Consider adding `sender` so we can filter `LEAVE` memberships and
distinguish from kicks.
- [x] We should add it to be able to tell leaves from kicks
- [x] Consider adding `tombstone` state to help address
https://github.com/element-hq/synapse/issues/17540
- [x] We should add it `tombstone_successor_room_id`
- [x] Consider adding `forgotten` status to avoid extra
lookup/table-join on `room_memberships`
- [x] We should add it
- [x] Background update to fill in values for all joined rooms and
non-join membership
- [x] Clean-up tables when room is deleted
- [ ] Make sure tables are useful to our use case
- First explored in
https://github.com/element-hq/synapse/compare/erikj/ss_use_new_tables
- Also explored in
https://github.com/element-hq/synapse/commit/76b5a576eb363496315dfd39510cad7d02b0fc73
- [x] Plan for how can we use this with a fallback
- See plan discussed above in main area of the issue description
- Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.dz5x6ef4mxz7)
- [x] Plan for how we can rely on this new table without a fallback
- Synapse version `N+1`: (this PR) Bump `SCHEMA_VERSION` to `87`. Add
new tables and background update to backfill all rows. Since this is a
new table, we don't have to add any `NOT VALID` constraints and validate
them when the background update completes. Read from new tables with a
fallback in cases where the rows aren't filled in yet.
- Synapse version `N+2`: Bump `SCHEMA_VERSION` to `88` and bump
`SCHEMA_COMPAT_VERSION` to `87` because we don't want people to
downgrade and miss writes while they are on an older version. Add a
foreground update to finish off the backfill so we can read from new
tables without the fallback. Application code can now rely on the new
tables being populated.
- Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.hh7shg4cxdhj)
### Dev notes
```
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase
SYNAPSE_POSTGRES=1 SYNAPSE_POSTGRES_USER=postgres SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase
```
```
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.handlers.test_sliding_sync.FilterRoomsTestCase
```
Reference:
- [Development docs on background updates and worked examples of gradual
migrations
](https://github.com/element-hq/synapse/blob/1dfa59b238cee0dc62163588cc9481896c288979/docs/development/database_schema.md#background-updates)
- A real example of a gradual migration:
https://github.com/matrix-org/synapse/pull/15649#discussion_r1213779514
- Adding `rooms.creator` field that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/10697
- Adding `rooms.room_version` that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/6729
- Adding `room_stats_state.room_type` that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/13031
- Tables from MSC2716: `insertion_events`, `insertion_event_edges`,
`insertion_event_extremities`, `batch_events`
- `current_state_events` updated in
`synapse/storage/databases/main/events.py`
---
```
persist_event (adds to queue)
_persist_event_batch
_persist_events_and_state_updates (assigns `stream_ordering` to events)
_persist_events_txn
_store_event_txn
_update_metadata_tables_txn
_store_room_members_txn
_update_current_state_txn
```
---
> Concatenated Indexes [...] (also known as multi-column, composite or
combined index)
>
> [...] key consists of multiple columns.
>
> We can take advantage of the fact that the first index column is
always usable for searching
>
> *--
https://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys*
---
Dealing with `portdb` (`synapse/_scripts/synapse_port_db.py`),
https://github.com/element-hq/synapse/pull/17512#discussion_r1725998219
---
<details>
<summary>SQL queries:</summary>
Both of these are equivalent and work in SQLite and Postgres
Options 1:
```sql
WITH data_table (room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)}) AS (
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
)
INSERT INTO sliding_sync_non_join_memberships
(room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)})
SELECT * FROM data_table
WHERE membership != ?
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
Option 2:
```sql
INSERT INTO sliding_sync_non_join_memberships
(room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)})
SELECT
column1 as room_id,
column2 as user_id,
column3 as membership_event_id,
column4 as membership,
column5 as event_stream_ordering,
{", ".join("column" + str(i) for i in range(6, 6 + len(insert_keys)))}
FROM (
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
) as v
WHERE membership != ?
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
If we don't need the `membership` condition, we could use:
```sql
INSERT INTO sliding_sync_non_join_memberships
(room_id, membership_event_id, user_id, membership, event_stream_ordering, {", ".join(insert_keys)})
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
</details>
### Pull Request Checklist
<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->
* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
- Use markdown where necessary, mostly for `code blocks`.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
---------
Co-authored-by: Erik Johnston <erik@matrix.org>
2024-08-29 17:09:51 +02:00
|
|
|
from synapse.api.constants import EventContentFields, EventTypes, JoinRules, Membership
|
2024-07-17 20:10:15 +02:00
|
|
|
from synapse.api.room_versions import RoomVersions
|
|
|
|
from synapse.rest import admin
|
2019-09-06 17:45:51 +02:00
|
|
|
from synapse.rest.admin import register_servlets_for_client_rest_resource
|
2024-07-17 20:10:15 +02:00
|
|
|
from synapse.rest.client import knock, login, room
|
2022-04-05 14:54:41 +02:00
|
|
|
from synapse.server import HomeServer
|
2024-07-17 20:10:15 +02:00
|
|
|
from synapse.storage.databases.main.roommember import extract_heroes_from_room_summary
|
|
|
|
from synapse.storage.roommember import MemberSummary
|
2020-10-22 11:11:06 +02:00
|
|
|
from synapse.types import UserID, create_requester
|
2022-04-05 14:54:41 +02:00
|
|
|
from synapse.util import Clock
|
2014-09-12 17:44:07 +02:00
|
|
|
|
2018-07-09 08:09:20 +02:00
|
|
|
from tests import unittest
|
2021-12-21 17:12:05 +01:00
|
|
|
from tests.server import TestHomeServer
|
2022-08-17 11:42:01 +02:00
|
|
|
from tests.test_utils import event_injection
|
Sliding Sync: Pre-populate room data for quick filtering/sorting (#17512)
Pre-populate room data for quick filtering/sorting in the Sliding Sync
API
Spawning from
https://github.com/element-hq/synapse/pull/17450#discussion_r1697335578
This PR is acting as the Synapse version `N+1` step in the gradual
migration being tracked by
https://github.com/element-hq/synapse/issues/17623
Adding two new database tables:
- `sliding_sync_joined_rooms`: A table for storing room meta data that
the local server is still participating in. The info here can be shared
across all `Membership.JOIN`. Keyed on `(room_id)` and updated when the
relevant room current state changes or a new event is sent in the room.
- `sliding_sync_membership_snapshots`: A table for storing a snapshot of
room meta data at the time of the local user's membership. Keyed on
`(room_id, user_id)` and only updated when a user's membership in a room
changes.
Also adds background updates to populate these tables with all of the
existing data.
We want to have the guarantee that if a row exists in the sliding sync
tables, we are able to rely on it (accurate data). And if a row doesn't
exist, we use a fallback to get the same info until the background
updates fill in the rows or a new event comes in triggering it to be
fully inserted. This means we need a couple extra things in place until
we bump `SCHEMA_COMPAT_VERSION` and run the foreground update in the
`N+2` part of the gradual migration. For context on why we can't rely on
the tables without these things see [1].
1. On start-up, block until we clear out any rows for the rooms that
have had events since the max-`stream_ordering` of the
`sliding_sync_joined_rooms` table (compare to max-`stream_ordering` of
the `events` table). For `sliding_sync_membership_snapshots`, we can
compare to the max-`stream_ordering` of `local_current_membership`
- This accounts for when someone downgrades their Synapse version and
then upgrades it again. This will ensure that we don't have any
stale/out-of-date data in the
`sliding_sync_joined_rooms`/`sliding_sync_membership_snapshots` tables
since any new events sent in rooms would have also needed to be written
to the sliding sync tables. For example a new event needs to bump
`event_stream_ordering` in `sliding_sync_joined_rooms` table or some
state in the room changing (like the room name). Or another example of
someone's membership changing in a room affecting
`sliding_sync_membership_snapshots`.
1. Add another background update that will catch-up with any rows that
were just deleted from the sliding sync tables (based on the activity in
the `events`/`local_current_membership`). The rooms that need
recalculating are added to the
`sliding_sync_joined_rooms_to_recalculate` table.
1. Making sure rows are fully inserted. Instead of partially inserting,
we need to check if the row already exists and fully insert all data if
not.
All of this extra functionality can be removed once the
`SCHEMA_COMPAT_VERSION` is bumped with support for the new sliding sync
tables so people can no longer downgrade (the `N+2` part of the gradual
migration).
<details>
<summary><sup>[1]</sup></summary>
For `sliding_sync_joined_rooms`, since we partially insert rows as state
comes in, we can't rely on the existence of the row for a given
`room_id`. We can't even rely on looking at whether the background
update has finished. There could still be partial rows from when someone
reverted their Synapse version after the background update finished, had
some state changes (or new rooms), then upgraded again and more state
changes happen leaving a partial row.
For `sliding_sync_membership_snapshots`, we insert items as a whole
except for the `forgotten` column ~~so we can rely on rows existing and
just need to always use a fallback for the `forgotten` data. We can't
use the `forgotten` column in the table for the same reasons above about
`sliding_sync_joined_rooms`.~~ We could have an out-of-date membership
from when someone reverted their Synapse version. (same problems as
outlined for `sliding_sync_joined_rooms` above)
Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.dz5x6ef4mxz7)
</details>
### TODO
- [x] Update `stream_ordering`/`bump_stamp`
- [x] Handle remote invites
- [x] Handle state resets
- [x] Consider adding `sender` so we can filter `LEAVE` memberships and
distinguish from kicks.
- [x] We should add it to be able to tell leaves from kicks
- [x] Consider adding `tombstone` state to help address
https://github.com/element-hq/synapse/issues/17540
- [x] We should add it `tombstone_successor_room_id`
- [x] Consider adding `forgotten` status to avoid extra
lookup/table-join on `room_memberships`
- [x] We should add it
- [x] Background update to fill in values for all joined rooms and
non-join membership
- [x] Clean-up tables when room is deleted
- [ ] Make sure tables are useful to our use case
- First explored in
https://github.com/element-hq/synapse/compare/erikj/ss_use_new_tables
- Also explored in
https://github.com/element-hq/synapse/commit/76b5a576eb363496315dfd39510cad7d02b0fc73
- [x] Plan for how can we use this with a fallback
- See plan discussed above in main area of the issue description
- Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.dz5x6ef4mxz7)
- [x] Plan for how we can rely on this new table without a fallback
- Synapse version `N+1`: (this PR) Bump `SCHEMA_VERSION` to `87`. Add
new tables and background update to backfill all rows. Since this is a
new table, we don't have to add any `NOT VALID` constraints and validate
them when the background update completes. Read from new tables with a
fallback in cases where the rows aren't filled in yet.
- Synapse version `N+2`: Bump `SCHEMA_VERSION` to `88` and bump
`SCHEMA_COMPAT_VERSION` to `87` because we don't want people to
downgrade and miss writes while they are on an older version. Add a
foreground update to finish off the backfill so we can read from new
tables without the fallback. Application code can now rely on the new
tables being populated.
- Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.hh7shg4cxdhj)
### Dev notes
```
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase
SYNAPSE_POSTGRES=1 SYNAPSE_POSTGRES_USER=postgres SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase
```
```
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.handlers.test_sliding_sync.FilterRoomsTestCase
```
Reference:
- [Development docs on background updates and worked examples of gradual
migrations
](https://github.com/element-hq/synapse/blob/1dfa59b238cee0dc62163588cc9481896c288979/docs/development/database_schema.md#background-updates)
- A real example of a gradual migration:
https://github.com/matrix-org/synapse/pull/15649#discussion_r1213779514
- Adding `rooms.creator` field that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/10697
- Adding `rooms.room_version` that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/6729
- Adding `room_stats_state.room_type` that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/13031
- Tables from MSC2716: `insertion_events`, `insertion_event_edges`,
`insertion_event_extremities`, `batch_events`
- `current_state_events` updated in
`synapse/storage/databases/main/events.py`
---
```
persist_event (adds to queue)
_persist_event_batch
_persist_events_and_state_updates (assigns `stream_ordering` to events)
_persist_events_txn
_store_event_txn
_update_metadata_tables_txn
_store_room_members_txn
_update_current_state_txn
```
---
> Concatenated Indexes [...] (also known as multi-column, composite or
combined index)
>
> [...] key consists of multiple columns.
>
> We can take advantage of the fact that the first index column is
always usable for searching
>
> *--
https://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys*
---
Dealing with `portdb` (`synapse/_scripts/synapse_port_db.py`),
https://github.com/element-hq/synapse/pull/17512#discussion_r1725998219
---
<details>
<summary>SQL queries:</summary>
Both of these are equivalent and work in SQLite and Postgres
Options 1:
```sql
WITH data_table (room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)}) AS (
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
)
INSERT INTO sliding_sync_non_join_memberships
(room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)})
SELECT * FROM data_table
WHERE membership != ?
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
Option 2:
```sql
INSERT INTO sliding_sync_non_join_memberships
(room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)})
SELECT
column1 as room_id,
column2 as user_id,
column3 as membership_event_id,
column4 as membership,
column5 as event_stream_ordering,
{", ".join("column" + str(i) for i in range(6, 6 + len(insert_keys)))}
FROM (
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
) as v
WHERE membership != ?
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
If we don't need the `membership` condition, we could use:
```sql
INSERT INTO sliding_sync_non_join_memberships
(room_id, membership_event_id, user_id, membership, event_stream_ordering, {", ".join(insert_keys)})
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
</details>
### Pull Request Checklist
<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->
* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
- Use markdown where necessary, mostly for `code blocks`.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
---------
Co-authored-by: Erik Johnston <erik@matrix.org>
2024-08-29 17:09:51 +02:00
|
|
|
from tests.test_utils.event_injection import create_event
|
2024-07-17 20:10:15 +02:00
|
|
|
from tests.unittest import skip_unless
|
|
|
|
|
|
|
|
logger = logging.getLogger(__name__)
|
2014-12-10 19:00:57 +01:00
|
|
|
|
2014-09-12 17:44:07 +02:00
|
|
|
|
2019-09-06 17:45:51 +02:00
|
|
|
class RoomMemberStoreTestCase(unittest.HomeserverTestCase):
|
|
|
|
servlets = [
|
|
|
|
login.register_servlets,
|
|
|
|
register_servlets_for_client_rest_resource,
|
|
|
|
room.register_servlets,
|
|
|
|
]
|
|
|
|
|
2022-05-23 13:23:26 +02:00
|
|
|
def prepare(self, reactor: MemoryReactor, clock: Clock, hs: TestHomeServer) -> None: # type: ignore[override]
|
2014-09-12 17:44:07 +02:00
|
|
|
# We can't test the RoomMemberStore on its own without the other event
|
|
|
|
# storage logic
|
2022-02-23 12:04:02 +01:00
|
|
|
self.store = hs.get_datastores().main
|
Sliding Sync: Pre-populate room data for quick filtering/sorting (#17512)
Pre-populate room data for quick filtering/sorting in the Sliding Sync
API
Spawning from
https://github.com/element-hq/synapse/pull/17450#discussion_r1697335578
This PR is acting as the Synapse version `N+1` step in the gradual
migration being tracked by
https://github.com/element-hq/synapse/issues/17623
Adding two new database tables:
- `sliding_sync_joined_rooms`: A table for storing room meta data that
the local server is still participating in. The info here can be shared
across all `Membership.JOIN`. Keyed on `(room_id)` and updated when the
relevant room current state changes or a new event is sent in the room.
- `sliding_sync_membership_snapshots`: A table for storing a snapshot of
room meta data at the time of the local user's membership. Keyed on
`(room_id, user_id)` and only updated when a user's membership in a room
changes.
Also adds background updates to populate these tables with all of the
existing data.
We want to have the guarantee that if a row exists in the sliding sync
tables, we are able to rely on it (accurate data). And if a row doesn't
exist, we use a fallback to get the same info until the background
updates fill in the rows or a new event comes in triggering it to be
fully inserted. This means we need a couple extra things in place until
we bump `SCHEMA_COMPAT_VERSION` and run the foreground update in the
`N+2` part of the gradual migration. For context on why we can't rely on
the tables without these things see [1].
1. On start-up, block until we clear out any rows for the rooms that
have had events since the max-`stream_ordering` of the
`sliding_sync_joined_rooms` table (compare to max-`stream_ordering` of
the `events` table). For `sliding_sync_membership_snapshots`, we can
compare to the max-`stream_ordering` of `local_current_membership`
- This accounts for when someone downgrades their Synapse version and
then upgrades it again. This will ensure that we don't have any
stale/out-of-date data in the
`sliding_sync_joined_rooms`/`sliding_sync_membership_snapshots` tables
since any new events sent in rooms would have also needed to be written
to the sliding sync tables. For example a new event needs to bump
`event_stream_ordering` in `sliding_sync_joined_rooms` table or some
state in the room changing (like the room name). Or another example of
someone's membership changing in a room affecting
`sliding_sync_membership_snapshots`.
1. Add another background update that will catch-up with any rows that
were just deleted from the sliding sync tables (based on the activity in
the `events`/`local_current_membership`). The rooms that need
recalculating are added to the
`sliding_sync_joined_rooms_to_recalculate` table.
1. Making sure rows are fully inserted. Instead of partially inserting,
we need to check if the row already exists and fully insert all data if
not.
All of this extra functionality can be removed once the
`SCHEMA_COMPAT_VERSION` is bumped with support for the new sliding sync
tables so people can no longer downgrade (the `N+2` part of the gradual
migration).
<details>
<summary><sup>[1]</sup></summary>
For `sliding_sync_joined_rooms`, since we partially insert rows as state
comes in, we can't rely on the existence of the row for a given
`room_id`. We can't even rely on looking at whether the background
update has finished. There could still be partial rows from when someone
reverted their Synapse version after the background update finished, had
some state changes (or new rooms), then upgraded again and more state
changes happen leaving a partial row.
For `sliding_sync_membership_snapshots`, we insert items as a whole
except for the `forgotten` column ~~so we can rely on rows existing and
just need to always use a fallback for the `forgotten` data. We can't
use the `forgotten` column in the table for the same reasons above about
`sliding_sync_joined_rooms`.~~ We could have an out-of-date membership
from when someone reverted their Synapse version. (same problems as
outlined for `sliding_sync_joined_rooms` above)
Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.dz5x6ef4mxz7)
</details>
### TODO
- [x] Update `stream_ordering`/`bump_stamp`
- [x] Handle remote invites
- [x] Handle state resets
- [x] Consider adding `sender` so we can filter `LEAVE` memberships and
distinguish from kicks.
- [x] We should add it to be able to tell leaves from kicks
- [x] Consider adding `tombstone` state to help address
https://github.com/element-hq/synapse/issues/17540
- [x] We should add it `tombstone_successor_room_id`
- [x] Consider adding `forgotten` status to avoid extra
lookup/table-join on `room_memberships`
- [x] We should add it
- [x] Background update to fill in values for all joined rooms and
non-join membership
- [x] Clean-up tables when room is deleted
- [ ] Make sure tables are useful to our use case
- First explored in
https://github.com/element-hq/synapse/compare/erikj/ss_use_new_tables
- Also explored in
https://github.com/element-hq/synapse/commit/76b5a576eb363496315dfd39510cad7d02b0fc73
- [x] Plan for how can we use this with a fallback
- See plan discussed above in main area of the issue description
- Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.dz5x6ef4mxz7)
- [x] Plan for how we can rely on this new table without a fallback
- Synapse version `N+1`: (this PR) Bump `SCHEMA_VERSION` to `87`. Add
new tables and background update to backfill all rows. Since this is a
new table, we don't have to add any `NOT VALID` constraints and validate
them when the background update completes. Read from new tables with a
fallback in cases where the rows aren't filled in yet.
- Synapse version `N+2`: Bump `SCHEMA_VERSION` to `88` and bump
`SCHEMA_COMPAT_VERSION` to `87` because we don't want people to
downgrade and miss writes while they are on an older version. Add a
foreground update to finish off the backfill so we can read from new
tables without the fallback. Application code can now rely on the new
tables being populated.
- Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.hh7shg4cxdhj)
### Dev notes
```
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase
SYNAPSE_POSTGRES=1 SYNAPSE_POSTGRES_USER=postgres SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase
```
```
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.handlers.test_sliding_sync.FilterRoomsTestCase
```
Reference:
- [Development docs on background updates and worked examples of gradual
migrations
](https://github.com/element-hq/synapse/blob/1dfa59b238cee0dc62163588cc9481896c288979/docs/development/database_schema.md#background-updates)
- A real example of a gradual migration:
https://github.com/matrix-org/synapse/pull/15649#discussion_r1213779514
- Adding `rooms.creator` field that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/10697
- Adding `rooms.room_version` that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/6729
- Adding `room_stats_state.room_type` that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/13031
- Tables from MSC2716: `insertion_events`, `insertion_event_edges`,
`insertion_event_extremities`, `batch_events`
- `current_state_events` updated in
`synapse/storage/databases/main/events.py`
---
```
persist_event (adds to queue)
_persist_event_batch
_persist_events_and_state_updates (assigns `stream_ordering` to events)
_persist_events_txn
_store_event_txn
_update_metadata_tables_txn
_store_room_members_txn
_update_current_state_txn
```
---
> Concatenated Indexes [...] (also known as multi-column, composite or
combined index)
>
> [...] key consists of multiple columns.
>
> We can take advantage of the fact that the first index column is
always usable for searching
>
> *--
https://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys*
---
Dealing with `portdb` (`synapse/_scripts/synapse_port_db.py`),
https://github.com/element-hq/synapse/pull/17512#discussion_r1725998219
---
<details>
<summary>SQL queries:</summary>
Both of these are equivalent and work in SQLite and Postgres
Options 1:
```sql
WITH data_table (room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)}) AS (
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
)
INSERT INTO sliding_sync_non_join_memberships
(room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)})
SELECT * FROM data_table
WHERE membership != ?
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
Option 2:
```sql
INSERT INTO sliding_sync_non_join_memberships
(room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)})
SELECT
column1 as room_id,
column2 as user_id,
column3 as membership_event_id,
column4 as membership,
column5 as event_stream_ordering,
{", ".join("column" + str(i) for i in range(6, 6 + len(insert_keys)))}
FROM (
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
) as v
WHERE membership != ?
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
If we don't need the `membership` condition, we could use:
```sql
INSERT INTO sliding_sync_non_join_memberships
(room_id, membership_event_id, user_id, membership, event_stream_ordering, {", ".join(insert_keys)})
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
</details>
### Pull Request Checklist
<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->
* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
- Use markdown where necessary, mostly for `code blocks`.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
---------
Co-authored-by: Erik Johnston <erik@matrix.org>
2024-08-29 17:09:51 +02:00
|
|
|
self.state_handler = self.hs.get_state_handler()
|
|
|
|
persistence = self.hs.get_storage_controllers().persistence
|
|
|
|
assert persistence is not None
|
|
|
|
self.persistence = persistence
|
2014-09-12 17:44:07 +02:00
|
|
|
|
2019-09-06 17:45:51 +02:00
|
|
|
self.u_alice = self.register_user("alice", "pass")
|
|
|
|
self.t_alice = self.login("alice", "pass")
|
|
|
|
self.u_bob = self.register_user("bob", "pass")
|
2014-09-12 17:44:07 +02:00
|
|
|
|
2014-09-15 16:00:14 +02:00
|
|
|
# User elsewhere on another host
|
2015-01-23 12:47:15 +01:00
|
|
|
self.u_charlie = UserID.from_string("@charlie:elsewhere")
|
2014-09-15 16:00:14 +02:00
|
|
|
|
2022-04-05 14:54:41 +02:00
|
|
|
def test_one_member(self) -> None:
|
2019-09-06 17:45:51 +02:00
|
|
|
# Alice creates the room, and is automatically joined
|
|
|
|
self.room = self.helper.create_room_as(self.u_alice, tok=self.t_alice)
|
|
|
|
|
|
|
|
rooms_for_user = self.get_success(
|
2020-01-15 15:59:33 +01:00
|
|
|
self.store.get_rooms_for_local_user_where_membership_is(
|
2019-09-06 17:45:51 +02:00
|
|
|
self.u_alice, [Membership.JOIN]
|
|
|
|
)
|
2014-09-12 17:44:07 +02:00
|
|
|
)
|
2019-07-23 17:55:45 +02:00
|
|
|
|
2022-02-28 13:12:29 +01:00
|
|
|
self.assertEqual([self.room], [m.room_id for m in rooms_for_user])
|
2019-09-06 17:45:51 +02:00
|
|
|
|
2022-04-05 14:54:41 +02:00
|
|
|
def test_count_known_servers(self) -> None:
|
2019-09-06 17:45:51 +02:00
|
|
|
"""
|
|
|
|
_count_known_servers will calculate how many servers are in a room.
|
|
|
|
"""
|
|
|
|
self.room = self.helper.create_room_as(self.u_alice, tok=self.t_alice)
|
|
|
|
self.inject_room_member(self.room, self.u_bob, Membership.JOIN)
|
|
|
|
self.inject_room_member(self.room, self.u_charlie.to_string(), Membership.JOIN)
|
|
|
|
|
|
|
|
servers = self.get_success(self.store._count_known_servers())
|
|
|
|
self.assertEqual(servers, 2)
|
|
|
|
|
2022-04-05 14:54:41 +02:00
|
|
|
def test_count_known_servers_stat_counter_disabled(self) -> None:
|
2019-09-06 17:45:51 +02:00
|
|
|
"""
|
|
|
|
If enabled, the metrics for how many servers are known will be counted.
|
|
|
|
"""
|
|
|
|
self.assertTrue("_known_servers_count" not in self.store.__dict__.keys())
|
|
|
|
|
|
|
|
self.room = self.helper.create_room_as(self.u_alice, tok=self.t_alice)
|
|
|
|
self.inject_room_member(self.room, self.u_bob, Membership.JOIN)
|
|
|
|
self.inject_room_member(self.room, self.u_charlie.to_string(), Membership.JOIN)
|
|
|
|
|
2020-08-27 12:39:53 +02:00
|
|
|
self.pump()
|
2019-09-06 17:45:51 +02:00
|
|
|
|
|
|
|
self.assertTrue("_known_servers_count" not in self.store.__dict__.keys())
|
|
|
|
|
|
|
|
@unittest.override_config(
|
|
|
|
{"enable_metrics": True, "metrics_flags": {"known_servers": True}}
|
|
|
|
)
|
2022-04-05 14:54:41 +02:00
|
|
|
def test_count_known_servers_stat_counter_enabled(self) -> None:
|
2019-09-06 17:45:51 +02:00
|
|
|
"""
|
|
|
|
If enabled, the metrics for how many servers are known will be counted.
|
|
|
|
"""
|
|
|
|
# Initialises to 1 -- itself
|
|
|
|
self.assertEqual(self.store._known_servers_count, 1)
|
|
|
|
|
2020-08-27 12:39:53 +02:00
|
|
|
self.pump()
|
2019-09-06 17:45:51 +02:00
|
|
|
|
|
|
|
# No rooms have been joined, so technically the SQL returns 0, but it
|
|
|
|
# will still say it knows about itself.
|
|
|
|
self.assertEqual(self.store._known_servers_count, 1)
|
|
|
|
|
|
|
|
self.room = self.helper.create_room_as(self.u_alice, tok=self.t_alice)
|
|
|
|
self.inject_room_member(self.room, self.u_bob, Membership.JOIN)
|
|
|
|
self.inject_room_member(self.room, self.u_charlie.to_string(), Membership.JOIN)
|
|
|
|
|
2020-08-27 12:39:53 +02:00
|
|
|
self.pump(1)
|
2019-09-06 17:45:51 +02:00
|
|
|
|
|
|
|
# It now knows about Charlie's server.
|
|
|
|
self.assertEqual(self.store._known_servers_count, 2)
|
|
|
|
|
2022-04-05 14:54:41 +02:00
|
|
|
def test__null_byte_in_display_name_properly_handled(self) -> None:
|
2021-11-12 19:38:24 +01:00
|
|
|
room = self.helper.create_room_as(self.u_alice, tok=self.t_alice)
|
|
|
|
|
2023-10-26 19:01:36 +02:00
|
|
|
res = cast(
|
|
|
|
List[Tuple[Optional[str], str]],
|
|
|
|
self.get_success(
|
|
|
|
self.store.db_pool.simple_select_list(
|
|
|
|
"room_memberships",
|
|
|
|
{"user_id": "@alice:test"},
|
|
|
|
["display_name", "event_id"],
|
|
|
|
)
|
|
|
|
),
|
2021-11-12 19:38:24 +01:00
|
|
|
)
|
|
|
|
# Check that we only got one result back
|
|
|
|
self.assertEqual(len(res), 1)
|
|
|
|
|
|
|
|
# Check that alice's display name is "alice"
|
2023-10-26 19:01:36 +02:00
|
|
|
self.assertEqual(res[0][0], "alice")
|
2021-11-12 19:38:24 +01:00
|
|
|
|
|
|
|
# Grab the event_id to use later
|
2023-10-26 19:01:36 +02:00
|
|
|
event_id = res[0][1]
|
2021-11-12 19:38:24 +01:00
|
|
|
|
|
|
|
# Create a profile with the offending null byte in the display name
|
|
|
|
new_profile = {"displayname": "ali\u0000ce"}
|
|
|
|
|
|
|
|
# Ensure that the change goes smoothly and does not fail due to the null byte
|
|
|
|
self.helper.change_membership(
|
|
|
|
room,
|
|
|
|
self.u_alice,
|
|
|
|
self.u_alice,
|
|
|
|
"join",
|
|
|
|
extra_data=new_profile,
|
|
|
|
tok=self.t_alice,
|
|
|
|
)
|
|
|
|
|
2023-10-26 19:01:36 +02:00
|
|
|
res2 = cast(
|
|
|
|
List[Tuple[Optional[str], str]],
|
|
|
|
self.get_success(
|
|
|
|
self.store.db_pool.simple_select_list(
|
|
|
|
"room_memberships",
|
|
|
|
{"user_id": "@alice:test"},
|
|
|
|
["display_name", "event_id"],
|
|
|
|
)
|
|
|
|
),
|
2021-11-12 19:38:24 +01:00
|
|
|
)
|
|
|
|
# Check that we only have two results
|
|
|
|
self.assertEqual(len(res2), 2)
|
|
|
|
|
|
|
|
# Filter out the previous event using the event_id we grabbed above
|
2023-10-26 19:01:36 +02:00
|
|
|
row = [row for row in res2 if row[1] != event_id]
|
2021-11-12 19:38:24 +01:00
|
|
|
|
|
|
|
# Check that alice's display name is now None
|
2023-10-26 19:01:36 +02:00
|
|
|
self.assertIsNone(row[0][0])
|
2021-11-12 19:38:24 +01:00
|
|
|
|
2022-08-30 11:58:38 +02:00
|
|
|
def test_room_is_locally_forgotten(self) -> None:
|
2022-08-17 11:42:01 +02:00
|
|
|
"""Test that when the last local user has forgotten a room it is known as forgotten."""
|
|
|
|
# join two local and one remote user
|
|
|
|
self.room = self.helper.create_room_as(self.u_alice, tok=self.t_alice)
|
|
|
|
self.get_success(
|
|
|
|
event_injection.inject_member_event(self.hs, self.room, self.u_bob, "join")
|
|
|
|
)
|
|
|
|
self.get_success(
|
|
|
|
event_injection.inject_member_event(
|
|
|
|
self.hs, self.room, self.u_charlie.to_string(), "join"
|
|
|
|
)
|
|
|
|
)
|
|
|
|
self.assertFalse(
|
|
|
|
self.get_success(self.store.is_locally_forgotten_room(self.room))
|
|
|
|
)
|
|
|
|
|
|
|
|
# local users leave the room and the room is not forgotten
|
|
|
|
self.get_success(
|
|
|
|
event_injection.inject_member_event(
|
|
|
|
self.hs, self.room, self.u_alice, "leave"
|
|
|
|
)
|
|
|
|
)
|
|
|
|
self.get_success(
|
|
|
|
event_injection.inject_member_event(self.hs, self.room, self.u_bob, "leave")
|
|
|
|
)
|
|
|
|
self.assertFalse(
|
|
|
|
self.get_success(self.store.is_locally_forgotten_room(self.room))
|
|
|
|
)
|
|
|
|
|
|
|
|
# first user forgets the room, room is not forgotten
|
|
|
|
self.get_success(self.store.forget(self.u_alice, self.room))
|
|
|
|
self.assertFalse(
|
|
|
|
self.get_success(self.store.is_locally_forgotten_room(self.room))
|
|
|
|
)
|
|
|
|
|
|
|
|
# second (last local) user forgets the room and the room is forgotten
|
|
|
|
self.get_success(self.store.forget(self.u_bob, self.room))
|
|
|
|
self.assertTrue(
|
|
|
|
self.get_success(self.store.is_locally_forgotten_room(self.room))
|
|
|
|
)
|
|
|
|
|
2022-08-30 11:58:38 +02:00
|
|
|
def test_join_locally_forgotten_room(self) -> None:
|
Sliding Sync: Pre-populate room data for quick filtering/sorting (#17512)
Pre-populate room data for quick filtering/sorting in the Sliding Sync
API
Spawning from
https://github.com/element-hq/synapse/pull/17450#discussion_r1697335578
This PR is acting as the Synapse version `N+1` step in the gradual
migration being tracked by
https://github.com/element-hq/synapse/issues/17623
Adding two new database tables:
- `sliding_sync_joined_rooms`: A table for storing room meta data that
the local server is still participating in. The info here can be shared
across all `Membership.JOIN`. Keyed on `(room_id)` and updated when the
relevant room current state changes or a new event is sent in the room.
- `sliding_sync_membership_snapshots`: A table for storing a snapshot of
room meta data at the time of the local user's membership. Keyed on
`(room_id, user_id)` and only updated when a user's membership in a room
changes.
Also adds background updates to populate these tables with all of the
existing data.
We want to have the guarantee that if a row exists in the sliding sync
tables, we are able to rely on it (accurate data). And if a row doesn't
exist, we use a fallback to get the same info until the background
updates fill in the rows or a new event comes in triggering it to be
fully inserted. This means we need a couple extra things in place until
we bump `SCHEMA_COMPAT_VERSION` and run the foreground update in the
`N+2` part of the gradual migration. For context on why we can't rely on
the tables without these things see [1].
1. On start-up, block until we clear out any rows for the rooms that
have had events since the max-`stream_ordering` of the
`sliding_sync_joined_rooms` table (compare to max-`stream_ordering` of
the `events` table). For `sliding_sync_membership_snapshots`, we can
compare to the max-`stream_ordering` of `local_current_membership`
- This accounts for when someone downgrades their Synapse version and
then upgrades it again. This will ensure that we don't have any
stale/out-of-date data in the
`sliding_sync_joined_rooms`/`sliding_sync_membership_snapshots` tables
since any new events sent in rooms would have also needed to be written
to the sliding sync tables. For example a new event needs to bump
`event_stream_ordering` in `sliding_sync_joined_rooms` table or some
state in the room changing (like the room name). Or another example of
someone's membership changing in a room affecting
`sliding_sync_membership_snapshots`.
1. Add another background update that will catch-up with any rows that
were just deleted from the sliding sync tables (based on the activity in
the `events`/`local_current_membership`). The rooms that need
recalculating are added to the
`sliding_sync_joined_rooms_to_recalculate` table.
1. Making sure rows are fully inserted. Instead of partially inserting,
we need to check if the row already exists and fully insert all data if
not.
All of this extra functionality can be removed once the
`SCHEMA_COMPAT_VERSION` is bumped with support for the new sliding sync
tables so people can no longer downgrade (the `N+2` part of the gradual
migration).
<details>
<summary><sup>[1]</sup></summary>
For `sliding_sync_joined_rooms`, since we partially insert rows as state
comes in, we can't rely on the existence of the row for a given
`room_id`. We can't even rely on looking at whether the background
update has finished. There could still be partial rows from when someone
reverted their Synapse version after the background update finished, had
some state changes (or new rooms), then upgraded again and more state
changes happen leaving a partial row.
For `sliding_sync_membership_snapshots`, we insert items as a whole
except for the `forgotten` column ~~so we can rely on rows existing and
just need to always use a fallback for the `forgotten` data. We can't
use the `forgotten` column in the table for the same reasons above about
`sliding_sync_joined_rooms`.~~ We could have an out-of-date membership
from when someone reverted their Synapse version. (same problems as
outlined for `sliding_sync_joined_rooms` above)
Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.dz5x6ef4mxz7)
</details>
### TODO
- [x] Update `stream_ordering`/`bump_stamp`
- [x] Handle remote invites
- [x] Handle state resets
- [x] Consider adding `sender` so we can filter `LEAVE` memberships and
distinguish from kicks.
- [x] We should add it to be able to tell leaves from kicks
- [x] Consider adding `tombstone` state to help address
https://github.com/element-hq/synapse/issues/17540
- [x] We should add it `tombstone_successor_room_id`
- [x] Consider adding `forgotten` status to avoid extra
lookup/table-join on `room_memberships`
- [x] We should add it
- [x] Background update to fill in values for all joined rooms and
non-join membership
- [x] Clean-up tables when room is deleted
- [ ] Make sure tables are useful to our use case
- First explored in
https://github.com/element-hq/synapse/compare/erikj/ss_use_new_tables
- Also explored in
https://github.com/element-hq/synapse/commit/76b5a576eb363496315dfd39510cad7d02b0fc73
- [x] Plan for how can we use this with a fallback
- See plan discussed above in main area of the issue description
- Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.dz5x6ef4mxz7)
- [x] Plan for how we can rely on this new table without a fallback
- Synapse version `N+1`: (this PR) Bump `SCHEMA_VERSION` to `87`. Add
new tables and background update to backfill all rows. Since this is a
new table, we don't have to add any `NOT VALID` constraints and validate
them when the background update completes. Read from new tables with a
fallback in cases where the rows aren't filled in yet.
- Synapse version `N+2`: Bump `SCHEMA_VERSION` to `88` and bump
`SCHEMA_COMPAT_VERSION` to `87` because we don't want people to
downgrade and miss writes while they are on an older version. Add a
foreground update to finish off the backfill so we can read from new
tables without the fallback. Application code can now rely on the new
tables being populated.
- Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.hh7shg4cxdhj)
### Dev notes
```
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase
SYNAPSE_POSTGRES=1 SYNAPSE_POSTGRES_USER=postgres SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase
```
```
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.handlers.test_sliding_sync.FilterRoomsTestCase
```
Reference:
- [Development docs on background updates and worked examples of gradual
migrations
](https://github.com/element-hq/synapse/blob/1dfa59b238cee0dc62163588cc9481896c288979/docs/development/database_schema.md#background-updates)
- A real example of a gradual migration:
https://github.com/matrix-org/synapse/pull/15649#discussion_r1213779514
- Adding `rooms.creator` field that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/10697
- Adding `rooms.room_version` that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/6729
- Adding `room_stats_state.room_type` that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/13031
- Tables from MSC2716: `insertion_events`, `insertion_event_edges`,
`insertion_event_extremities`, `batch_events`
- `current_state_events` updated in
`synapse/storage/databases/main/events.py`
---
```
persist_event (adds to queue)
_persist_event_batch
_persist_events_and_state_updates (assigns `stream_ordering` to events)
_persist_events_txn
_store_event_txn
_update_metadata_tables_txn
_store_room_members_txn
_update_current_state_txn
```
---
> Concatenated Indexes [...] (also known as multi-column, composite or
combined index)
>
> [...] key consists of multiple columns.
>
> We can take advantage of the fact that the first index column is
always usable for searching
>
> *--
https://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys*
---
Dealing with `portdb` (`synapse/_scripts/synapse_port_db.py`),
https://github.com/element-hq/synapse/pull/17512#discussion_r1725998219
---
<details>
<summary>SQL queries:</summary>
Both of these are equivalent and work in SQLite and Postgres
Options 1:
```sql
WITH data_table (room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)}) AS (
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
)
INSERT INTO sliding_sync_non_join_memberships
(room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)})
SELECT * FROM data_table
WHERE membership != ?
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
Option 2:
```sql
INSERT INTO sliding_sync_non_join_memberships
(room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)})
SELECT
column1 as room_id,
column2 as user_id,
column3 as membership_event_id,
column4 as membership,
column5 as event_stream_ordering,
{", ".join("column" + str(i) for i in range(6, 6 + len(insert_keys)))}
FROM (
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
) as v
WHERE membership != ?
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
If we don't need the `membership` condition, we could use:
```sql
INSERT INTO sliding_sync_non_join_memberships
(room_id, membership_event_id, user_id, membership, event_stream_ordering, {", ".join(insert_keys)})
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
</details>
### Pull Request Checklist
<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->
* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
- Use markdown where necessary, mostly for `code blocks`.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
---------
Co-authored-by: Erik Johnston <erik@matrix.org>
2024-08-29 17:09:51 +02:00
|
|
|
"""
|
|
|
|
Tests if a user joins a forgotten room, the room is not forgotten anymore.
|
|
|
|
|
|
|
|
Since a room can't be re-joined if everyone has left. This can only happen with
|
|
|
|
a room with remote users in it.
|
|
|
|
"""
|
|
|
|
user1_id = self.register_user("user1", "pass")
|
|
|
|
user1_tok = self.login(user1_id, "pass")
|
|
|
|
|
|
|
|
# Create a remote room
|
|
|
|
creator = "@user:other"
|
|
|
|
room_id = "!foo:other"
|
|
|
|
room_version = RoomVersions.V10
|
|
|
|
shared_kwargs = {
|
|
|
|
"room_id": room_id,
|
|
|
|
"room_version": room_version.identifier,
|
|
|
|
}
|
|
|
|
|
|
|
|
create_tuple = self.get_success(
|
|
|
|
create_event(
|
|
|
|
self.hs,
|
|
|
|
prev_event_ids=[],
|
|
|
|
type=EventTypes.Create,
|
|
|
|
state_key="",
|
|
|
|
content={
|
|
|
|
# The `ROOM_CREATOR` field could be removed if we used a room
|
|
|
|
# version > 10 (in favor of relying on `sender`)
|
|
|
|
EventContentFields.ROOM_CREATOR: creator,
|
|
|
|
EventContentFields.ROOM_VERSION: room_version.identifier,
|
|
|
|
},
|
|
|
|
sender=creator,
|
|
|
|
**shared_kwargs,
|
|
|
|
)
|
|
|
|
)
|
|
|
|
creator_tuple = self.get_success(
|
|
|
|
create_event(
|
|
|
|
self.hs,
|
|
|
|
prev_event_ids=[create_tuple[0].event_id],
|
|
|
|
auth_event_ids=[create_tuple[0].event_id],
|
|
|
|
type=EventTypes.Member,
|
|
|
|
state_key=creator,
|
|
|
|
content={"membership": Membership.JOIN},
|
|
|
|
sender=creator,
|
|
|
|
**shared_kwargs,
|
|
|
|
)
|
2022-08-17 11:42:01 +02:00
|
|
|
)
|
|
|
|
|
Sliding Sync: Pre-populate room data for quick filtering/sorting (#17512)
Pre-populate room data for quick filtering/sorting in the Sliding Sync
API
Spawning from
https://github.com/element-hq/synapse/pull/17450#discussion_r1697335578
This PR is acting as the Synapse version `N+1` step in the gradual
migration being tracked by
https://github.com/element-hq/synapse/issues/17623
Adding two new database tables:
- `sliding_sync_joined_rooms`: A table for storing room meta data that
the local server is still participating in. The info here can be shared
across all `Membership.JOIN`. Keyed on `(room_id)` and updated when the
relevant room current state changes or a new event is sent in the room.
- `sliding_sync_membership_snapshots`: A table for storing a snapshot of
room meta data at the time of the local user's membership. Keyed on
`(room_id, user_id)` and only updated when a user's membership in a room
changes.
Also adds background updates to populate these tables with all of the
existing data.
We want to have the guarantee that if a row exists in the sliding sync
tables, we are able to rely on it (accurate data). And if a row doesn't
exist, we use a fallback to get the same info until the background
updates fill in the rows or a new event comes in triggering it to be
fully inserted. This means we need a couple extra things in place until
we bump `SCHEMA_COMPAT_VERSION` and run the foreground update in the
`N+2` part of the gradual migration. For context on why we can't rely on
the tables without these things see [1].
1. On start-up, block until we clear out any rows for the rooms that
have had events since the max-`stream_ordering` of the
`sliding_sync_joined_rooms` table (compare to max-`stream_ordering` of
the `events` table). For `sliding_sync_membership_snapshots`, we can
compare to the max-`stream_ordering` of `local_current_membership`
- This accounts for when someone downgrades their Synapse version and
then upgrades it again. This will ensure that we don't have any
stale/out-of-date data in the
`sliding_sync_joined_rooms`/`sliding_sync_membership_snapshots` tables
since any new events sent in rooms would have also needed to be written
to the sliding sync tables. For example a new event needs to bump
`event_stream_ordering` in `sliding_sync_joined_rooms` table or some
state in the room changing (like the room name). Or another example of
someone's membership changing in a room affecting
`sliding_sync_membership_snapshots`.
1. Add another background update that will catch-up with any rows that
were just deleted from the sliding sync tables (based on the activity in
the `events`/`local_current_membership`). The rooms that need
recalculating are added to the
`sliding_sync_joined_rooms_to_recalculate` table.
1. Making sure rows are fully inserted. Instead of partially inserting,
we need to check if the row already exists and fully insert all data if
not.
All of this extra functionality can be removed once the
`SCHEMA_COMPAT_VERSION` is bumped with support for the new sliding sync
tables so people can no longer downgrade (the `N+2` part of the gradual
migration).
<details>
<summary><sup>[1]</sup></summary>
For `sliding_sync_joined_rooms`, since we partially insert rows as state
comes in, we can't rely on the existence of the row for a given
`room_id`. We can't even rely on looking at whether the background
update has finished. There could still be partial rows from when someone
reverted their Synapse version after the background update finished, had
some state changes (or new rooms), then upgraded again and more state
changes happen leaving a partial row.
For `sliding_sync_membership_snapshots`, we insert items as a whole
except for the `forgotten` column ~~so we can rely on rows existing and
just need to always use a fallback for the `forgotten` data. We can't
use the `forgotten` column in the table for the same reasons above about
`sliding_sync_joined_rooms`.~~ We could have an out-of-date membership
from when someone reverted their Synapse version. (same problems as
outlined for `sliding_sync_joined_rooms` above)
Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.dz5x6ef4mxz7)
</details>
### TODO
- [x] Update `stream_ordering`/`bump_stamp`
- [x] Handle remote invites
- [x] Handle state resets
- [x] Consider adding `sender` so we can filter `LEAVE` memberships and
distinguish from kicks.
- [x] We should add it to be able to tell leaves from kicks
- [x] Consider adding `tombstone` state to help address
https://github.com/element-hq/synapse/issues/17540
- [x] We should add it `tombstone_successor_room_id`
- [x] Consider adding `forgotten` status to avoid extra
lookup/table-join on `room_memberships`
- [x] We should add it
- [x] Background update to fill in values for all joined rooms and
non-join membership
- [x] Clean-up tables when room is deleted
- [ ] Make sure tables are useful to our use case
- First explored in
https://github.com/element-hq/synapse/compare/erikj/ss_use_new_tables
- Also explored in
https://github.com/element-hq/synapse/commit/76b5a576eb363496315dfd39510cad7d02b0fc73
- [x] Plan for how can we use this with a fallback
- See plan discussed above in main area of the issue description
- Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.dz5x6ef4mxz7)
- [x] Plan for how we can rely on this new table without a fallback
- Synapse version `N+1`: (this PR) Bump `SCHEMA_VERSION` to `87`. Add
new tables and background update to backfill all rows. Since this is a
new table, we don't have to add any `NOT VALID` constraints and validate
them when the background update completes. Read from new tables with a
fallback in cases where the rows aren't filled in yet.
- Synapse version `N+2`: Bump `SCHEMA_VERSION` to `88` and bump
`SCHEMA_COMPAT_VERSION` to `87` because we don't want people to
downgrade and miss writes while they are on an older version. Add a
foreground update to finish off the backfill so we can read from new
tables without the fallback. Application code can now rely on the new
tables being populated.
- Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.hh7shg4cxdhj)
### Dev notes
```
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase
SYNAPSE_POSTGRES=1 SYNAPSE_POSTGRES_USER=postgres SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase
```
```
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.handlers.test_sliding_sync.FilterRoomsTestCase
```
Reference:
- [Development docs on background updates and worked examples of gradual
migrations
](https://github.com/element-hq/synapse/blob/1dfa59b238cee0dc62163588cc9481896c288979/docs/development/database_schema.md#background-updates)
- A real example of a gradual migration:
https://github.com/matrix-org/synapse/pull/15649#discussion_r1213779514
- Adding `rooms.creator` field that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/10697
- Adding `rooms.room_version` that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/6729
- Adding `room_stats_state.room_type` that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/13031
- Tables from MSC2716: `insertion_events`, `insertion_event_edges`,
`insertion_event_extremities`, `batch_events`
- `current_state_events` updated in
`synapse/storage/databases/main/events.py`
---
```
persist_event (adds to queue)
_persist_event_batch
_persist_events_and_state_updates (assigns `stream_ordering` to events)
_persist_events_txn
_store_event_txn
_update_metadata_tables_txn
_store_room_members_txn
_update_current_state_txn
```
---
> Concatenated Indexes [...] (also known as multi-column, composite or
combined index)
>
> [...] key consists of multiple columns.
>
> We can take advantage of the fact that the first index column is
always usable for searching
>
> *--
https://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys*
---
Dealing with `portdb` (`synapse/_scripts/synapse_port_db.py`),
https://github.com/element-hq/synapse/pull/17512#discussion_r1725998219
---
<details>
<summary>SQL queries:</summary>
Both of these are equivalent and work in SQLite and Postgres
Options 1:
```sql
WITH data_table (room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)}) AS (
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
)
INSERT INTO sliding_sync_non_join_memberships
(room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)})
SELECT * FROM data_table
WHERE membership != ?
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
Option 2:
```sql
INSERT INTO sliding_sync_non_join_memberships
(room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)})
SELECT
column1 as room_id,
column2 as user_id,
column3 as membership_event_id,
column4 as membership,
column5 as event_stream_ordering,
{", ".join("column" + str(i) for i in range(6, 6 + len(insert_keys)))}
FROM (
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
) as v
WHERE membership != ?
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
If we don't need the `membership` condition, we could use:
```sql
INSERT INTO sliding_sync_non_join_memberships
(room_id, membership_event_id, user_id, membership, event_stream_ordering, {", ".join(insert_keys)})
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
</details>
### Pull Request Checklist
<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->
* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
- Use markdown where necessary, mostly for `code blocks`.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
---------
Co-authored-by: Erik Johnston <erik@matrix.org>
2024-08-29 17:09:51 +02:00
|
|
|
remote_events_and_contexts = [
|
|
|
|
create_tuple,
|
|
|
|
creator_tuple,
|
|
|
|
]
|
|
|
|
|
|
|
|
# Ensure the local HS knows the room version
|
|
|
|
self.get_success(self.store.store_room(room_id, creator, False, room_version))
|
|
|
|
|
|
|
|
# Persist these events as backfilled events.
|
|
|
|
for event, context in remote_events_and_contexts:
|
|
|
|
self.get_success(
|
|
|
|
self.persistence.persist_event(event, context, backfilled=True)
|
|
|
|
)
|
|
|
|
|
|
|
|
# Now we join the local user to the room. We want to make this feel as close to
|
|
|
|
# the real `process_remote_join()` as possible but we'd like to avoid some of
|
|
|
|
# the auth checks that would be done in the real code.
|
|
|
|
#
|
|
|
|
# FIXME: The test was originally written using this less-real
|
|
|
|
# `persist_event(...)` shortcut but it would be nice to use the real remote join
|
|
|
|
# process in a `FederatingHomeserverTestCase`.
|
|
|
|
flawed_join_tuple = self.get_success(
|
|
|
|
create_event(
|
|
|
|
self.hs,
|
|
|
|
prev_event_ids=[creator_tuple[0].event_id],
|
|
|
|
# This doesn't work correctly to create an `EventContext` that includes
|
|
|
|
# both of these state events. I assume it's because we're working on our
|
|
|
|
# local homeserver which has the remote state set as `outlier`. We have
|
|
|
|
# to create our own EventContext below to get this right.
|
|
|
|
auth_event_ids=[create_tuple[0].event_id],
|
|
|
|
type=EventTypes.Member,
|
|
|
|
state_key=user1_id,
|
|
|
|
content={"membership": Membership.JOIN},
|
|
|
|
sender=user1_id,
|
|
|
|
**shared_kwargs,
|
2022-08-17 11:42:01 +02:00
|
|
|
)
|
|
|
|
)
|
Sliding Sync: Pre-populate room data for quick filtering/sorting (#17512)
Pre-populate room data for quick filtering/sorting in the Sliding Sync
API
Spawning from
https://github.com/element-hq/synapse/pull/17450#discussion_r1697335578
This PR is acting as the Synapse version `N+1` step in the gradual
migration being tracked by
https://github.com/element-hq/synapse/issues/17623
Adding two new database tables:
- `sliding_sync_joined_rooms`: A table for storing room meta data that
the local server is still participating in. The info here can be shared
across all `Membership.JOIN`. Keyed on `(room_id)` and updated when the
relevant room current state changes or a new event is sent in the room.
- `sliding_sync_membership_snapshots`: A table for storing a snapshot of
room meta data at the time of the local user's membership. Keyed on
`(room_id, user_id)` and only updated when a user's membership in a room
changes.
Also adds background updates to populate these tables with all of the
existing data.
We want to have the guarantee that if a row exists in the sliding sync
tables, we are able to rely on it (accurate data). And if a row doesn't
exist, we use a fallback to get the same info until the background
updates fill in the rows or a new event comes in triggering it to be
fully inserted. This means we need a couple extra things in place until
we bump `SCHEMA_COMPAT_VERSION` and run the foreground update in the
`N+2` part of the gradual migration. For context on why we can't rely on
the tables without these things see [1].
1. On start-up, block until we clear out any rows for the rooms that
have had events since the max-`stream_ordering` of the
`sliding_sync_joined_rooms` table (compare to max-`stream_ordering` of
the `events` table). For `sliding_sync_membership_snapshots`, we can
compare to the max-`stream_ordering` of `local_current_membership`
- This accounts for when someone downgrades their Synapse version and
then upgrades it again. This will ensure that we don't have any
stale/out-of-date data in the
`sliding_sync_joined_rooms`/`sliding_sync_membership_snapshots` tables
since any new events sent in rooms would have also needed to be written
to the sliding sync tables. For example a new event needs to bump
`event_stream_ordering` in `sliding_sync_joined_rooms` table or some
state in the room changing (like the room name). Or another example of
someone's membership changing in a room affecting
`sliding_sync_membership_snapshots`.
1. Add another background update that will catch-up with any rows that
were just deleted from the sliding sync tables (based on the activity in
the `events`/`local_current_membership`). The rooms that need
recalculating are added to the
`sliding_sync_joined_rooms_to_recalculate` table.
1. Making sure rows are fully inserted. Instead of partially inserting,
we need to check if the row already exists and fully insert all data if
not.
All of this extra functionality can be removed once the
`SCHEMA_COMPAT_VERSION` is bumped with support for the new sliding sync
tables so people can no longer downgrade (the `N+2` part of the gradual
migration).
<details>
<summary><sup>[1]</sup></summary>
For `sliding_sync_joined_rooms`, since we partially insert rows as state
comes in, we can't rely on the existence of the row for a given
`room_id`. We can't even rely on looking at whether the background
update has finished. There could still be partial rows from when someone
reverted their Synapse version after the background update finished, had
some state changes (or new rooms), then upgraded again and more state
changes happen leaving a partial row.
For `sliding_sync_membership_snapshots`, we insert items as a whole
except for the `forgotten` column ~~so we can rely on rows existing and
just need to always use a fallback for the `forgotten` data. We can't
use the `forgotten` column in the table for the same reasons above about
`sliding_sync_joined_rooms`.~~ We could have an out-of-date membership
from when someone reverted their Synapse version. (same problems as
outlined for `sliding_sync_joined_rooms` above)
Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.dz5x6ef4mxz7)
</details>
### TODO
- [x] Update `stream_ordering`/`bump_stamp`
- [x] Handle remote invites
- [x] Handle state resets
- [x] Consider adding `sender` so we can filter `LEAVE` memberships and
distinguish from kicks.
- [x] We should add it to be able to tell leaves from kicks
- [x] Consider adding `tombstone` state to help address
https://github.com/element-hq/synapse/issues/17540
- [x] We should add it `tombstone_successor_room_id`
- [x] Consider adding `forgotten` status to avoid extra
lookup/table-join on `room_memberships`
- [x] We should add it
- [x] Background update to fill in values for all joined rooms and
non-join membership
- [x] Clean-up tables when room is deleted
- [ ] Make sure tables are useful to our use case
- First explored in
https://github.com/element-hq/synapse/compare/erikj/ss_use_new_tables
- Also explored in
https://github.com/element-hq/synapse/commit/76b5a576eb363496315dfd39510cad7d02b0fc73
- [x] Plan for how can we use this with a fallback
- See plan discussed above in main area of the issue description
- Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.dz5x6ef4mxz7)
- [x] Plan for how we can rely on this new table without a fallback
- Synapse version `N+1`: (this PR) Bump `SCHEMA_VERSION` to `87`. Add
new tables and background update to backfill all rows. Since this is a
new table, we don't have to add any `NOT VALID` constraints and validate
them when the background update completes. Read from new tables with a
fallback in cases where the rows aren't filled in yet.
- Synapse version `N+2`: Bump `SCHEMA_VERSION` to `88` and bump
`SCHEMA_COMPAT_VERSION` to `87` because we don't want people to
downgrade and miss writes while they are on an older version. Add a
foreground update to finish off the backfill so we can read from new
tables without the fallback. Application code can now rely on the new
tables being populated.
- Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.hh7shg4cxdhj)
### Dev notes
```
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase
SYNAPSE_POSTGRES=1 SYNAPSE_POSTGRES_USER=postgres SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase
```
```
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.handlers.test_sliding_sync.FilterRoomsTestCase
```
Reference:
- [Development docs on background updates and worked examples of gradual
migrations
](https://github.com/element-hq/synapse/blob/1dfa59b238cee0dc62163588cc9481896c288979/docs/development/database_schema.md#background-updates)
- A real example of a gradual migration:
https://github.com/matrix-org/synapse/pull/15649#discussion_r1213779514
- Adding `rooms.creator` field that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/10697
- Adding `rooms.room_version` that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/6729
- Adding `room_stats_state.room_type` that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/13031
- Tables from MSC2716: `insertion_events`, `insertion_event_edges`,
`insertion_event_extremities`, `batch_events`
- `current_state_events` updated in
`synapse/storage/databases/main/events.py`
---
```
persist_event (adds to queue)
_persist_event_batch
_persist_events_and_state_updates (assigns `stream_ordering` to events)
_persist_events_txn
_store_event_txn
_update_metadata_tables_txn
_store_room_members_txn
_update_current_state_txn
```
---
> Concatenated Indexes [...] (also known as multi-column, composite or
combined index)
>
> [...] key consists of multiple columns.
>
> We can take advantage of the fact that the first index column is
always usable for searching
>
> *--
https://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys*
---
Dealing with `portdb` (`synapse/_scripts/synapse_port_db.py`),
https://github.com/element-hq/synapse/pull/17512#discussion_r1725998219
---
<details>
<summary>SQL queries:</summary>
Both of these are equivalent and work in SQLite and Postgres
Options 1:
```sql
WITH data_table (room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)}) AS (
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
)
INSERT INTO sliding_sync_non_join_memberships
(room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)})
SELECT * FROM data_table
WHERE membership != ?
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
Option 2:
```sql
INSERT INTO sliding_sync_non_join_memberships
(room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)})
SELECT
column1 as room_id,
column2 as user_id,
column3 as membership_event_id,
column4 as membership,
column5 as event_stream_ordering,
{", ".join("column" + str(i) for i in range(6, 6 + len(insert_keys)))}
FROM (
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
) as v
WHERE membership != ?
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
If we don't need the `membership` condition, we could use:
```sql
INSERT INTO sliding_sync_non_join_memberships
(room_id, membership_event_id, user_id, membership, event_stream_ordering, {", ".join(insert_keys)})
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
</details>
### Pull Request Checklist
<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->
* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
- Use markdown where necessary, mostly for `code blocks`.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
---------
Co-authored-by: Erik Johnston <erik@matrix.org>
2024-08-29 17:09:51 +02:00
|
|
|
# We have to create our own context to get the state set correctly. If we use
|
|
|
|
# the `EventContext` from the `flawed_join_tuple`, the `current_state_events`
|
|
|
|
# table will only have the join event in it which should never happen in our
|
|
|
|
# real server.
|
|
|
|
join_event = flawed_join_tuple[0]
|
|
|
|
join_context = self.get_success(
|
|
|
|
self.state_handler.compute_event_context(
|
|
|
|
join_event,
|
|
|
|
state_ids_before_event={
|
|
|
|
(e.type, e.state_key): e.event_id for e in [create_tuple[0]]
|
|
|
|
},
|
|
|
|
partial_state=False,
|
|
|
|
)
|
2022-08-17 11:42:01 +02:00
|
|
|
)
|
Sliding Sync: Pre-populate room data for quick filtering/sorting (#17512)
Pre-populate room data for quick filtering/sorting in the Sliding Sync
API
Spawning from
https://github.com/element-hq/synapse/pull/17450#discussion_r1697335578
This PR is acting as the Synapse version `N+1` step in the gradual
migration being tracked by
https://github.com/element-hq/synapse/issues/17623
Adding two new database tables:
- `sliding_sync_joined_rooms`: A table for storing room meta data that
the local server is still participating in. The info here can be shared
across all `Membership.JOIN`. Keyed on `(room_id)` and updated when the
relevant room current state changes or a new event is sent in the room.
- `sliding_sync_membership_snapshots`: A table for storing a snapshot of
room meta data at the time of the local user's membership. Keyed on
`(room_id, user_id)` and only updated when a user's membership in a room
changes.
Also adds background updates to populate these tables with all of the
existing data.
We want to have the guarantee that if a row exists in the sliding sync
tables, we are able to rely on it (accurate data). And if a row doesn't
exist, we use a fallback to get the same info until the background
updates fill in the rows or a new event comes in triggering it to be
fully inserted. This means we need a couple extra things in place until
we bump `SCHEMA_COMPAT_VERSION` and run the foreground update in the
`N+2` part of the gradual migration. For context on why we can't rely on
the tables without these things see [1].
1. On start-up, block until we clear out any rows for the rooms that
have had events since the max-`stream_ordering` of the
`sliding_sync_joined_rooms` table (compare to max-`stream_ordering` of
the `events` table). For `sliding_sync_membership_snapshots`, we can
compare to the max-`stream_ordering` of `local_current_membership`
- This accounts for when someone downgrades their Synapse version and
then upgrades it again. This will ensure that we don't have any
stale/out-of-date data in the
`sliding_sync_joined_rooms`/`sliding_sync_membership_snapshots` tables
since any new events sent in rooms would have also needed to be written
to the sliding sync tables. For example a new event needs to bump
`event_stream_ordering` in `sliding_sync_joined_rooms` table or some
state in the room changing (like the room name). Or another example of
someone's membership changing in a room affecting
`sliding_sync_membership_snapshots`.
1. Add another background update that will catch-up with any rows that
were just deleted from the sliding sync tables (based on the activity in
the `events`/`local_current_membership`). The rooms that need
recalculating are added to the
`sliding_sync_joined_rooms_to_recalculate` table.
1. Making sure rows are fully inserted. Instead of partially inserting,
we need to check if the row already exists and fully insert all data if
not.
All of this extra functionality can be removed once the
`SCHEMA_COMPAT_VERSION` is bumped with support for the new sliding sync
tables so people can no longer downgrade (the `N+2` part of the gradual
migration).
<details>
<summary><sup>[1]</sup></summary>
For `sliding_sync_joined_rooms`, since we partially insert rows as state
comes in, we can't rely on the existence of the row for a given
`room_id`. We can't even rely on looking at whether the background
update has finished. There could still be partial rows from when someone
reverted their Synapse version after the background update finished, had
some state changes (or new rooms), then upgraded again and more state
changes happen leaving a partial row.
For `sliding_sync_membership_snapshots`, we insert items as a whole
except for the `forgotten` column ~~so we can rely on rows existing and
just need to always use a fallback for the `forgotten` data. We can't
use the `forgotten` column in the table for the same reasons above about
`sliding_sync_joined_rooms`.~~ We could have an out-of-date membership
from when someone reverted their Synapse version. (same problems as
outlined for `sliding_sync_joined_rooms` above)
Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.dz5x6ef4mxz7)
</details>
### TODO
- [x] Update `stream_ordering`/`bump_stamp`
- [x] Handle remote invites
- [x] Handle state resets
- [x] Consider adding `sender` so we can filter `LEAVE` memberships and
distinguish from kicks.
- [x] We should add it to be able to tell leaves from kicks
- [x] Consider adding `tombstone` state to help address
https://github.com/element-hq/synapse/issues/17540
- [x] We should add it `tombstone_successor_room_id`
- [x] Consider adding `forgotten` status to avoid extra
lookup/table-join on `room_memberships`
- [x] We should add it
- [x] Background update to fill in values for all joined rooms and
non-join membership
- [x] Clean-up tables when room is deleted
- [ ] Make sure tables are useful to our use case
- First explored in
https://github.com/element-hq/synapse/compare/erikj/ss_use_new_tables
- Also explored in
https://github.com/element-hq/synapse/commit/76b5a576eb363496315dfd39510cad7d02b0fc73
- [x] Plan for how can we use this with a fallback
- See plan discussed above in main area of the issue description
- Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.dz5x6ef4mxz7)
- [x] Plan for how we can rely on this new table without a fallback
- Synapse version `N+1`: (this PR) Bump `SCHEMA_VERSION` to `87`. Add
new tables and background update to backfill all rows. Since this is a
new table, we don't have to add any `NOT VALID` constraints and validate
them when the background update completes. Read from new tables with a
fallback in cases where the rows aren't filled in yet.
- Synapse version `N+2`: Bump `SCHEMA_VERSION` to `88` and bump
`SCHEMA_COMPAT_VERSION` to `87` because we don't want people to
downgrade and miss writes while they are on an older version. Add a
foreground update to finish off the backfill so we can read from new
tables without the fallback. Application code can now rely on the new
tables being populated.
- Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.hh7shg4cxdhj)
### Dev notes
```
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase
SYNAPSE_POSTGRES=1 SYNAPSE_POSTGRES_USER=postgres SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase
```
```
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.handlers.test_sliding_sync.FilterRoomsTestCase
```
Reference:
- [Development docs on background updates and worked examples of gradual
migrations
](https://github.com/element-hq/synapse/blob/1dfa59b238cee0dc62163588cc9481896c288979/docs/development/database_schema.md#background-updates)
- A real example of a gradual migration:
https://github.com/matrix-org/synapse/pull/15649#discussion_r1213779514
- Adding `rooms.creator` field that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/10697
- Adding `rooms.room_version` that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/6729
- Adding `room_stats_state.room_type` that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/13031
- Tables from MSC2716: `insertion_events`, `insertion_event_edges`,
`insertion_event_extremities`, `batch_events`
- `current_state_events` updated in
`synapse/storage/databases/main/events.py`
---
```
persist_event (adds to queue)
_persist_event_batch
_persist_events_and_state_updates (assigns `stream_ordering` to events)
_persist_events_txn
_store_event_txn
_update_metadata_tables_txn
_store_room_members_txn
_update_current_state_txn
```
---
> Concatenated Indexes [...] (also known as multi-column, composite or
combined index)
>
> [...] key consists of multiple columns.
>
> We can take advantage of the fact that the first index column is
always usable for searching
>
> *--
https://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys*
---
Dealing with `portdb` (`synapse/_scripts/synapse_port_db.py`),
https://github.com/element-hq/synapse/pull/17512#discussion_r1725998219
---
<details>
<summary>SQL queries:</summary>
Both of these are equivalent and work in SQLite and Postgres
Options 1:
```sql
WITH data_table (room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)}) AS (
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
)
INSERT INTO sliding_sync_non_join_memberships
(room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)})
SELECT * FROM data_table
WHERE membership != ?
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
Option 2:
```sql
INSERT INTO sliding_sync_non_join_memberships
(room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)})
SELECT
column1 as room_id,
column2 as user_id,
column3 as membership_event_id,
column4 as membership,
column5 as event_stream_ordering,
{", ".join("column" + str(i) for i in range(6, 6 + len(insert_keys)))}
FROM (
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
) as v
WHERE membership != ?
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
If we don't need the `membership` condition, we could use:
```sql
INSERT INTO sliding_sync_non_join_memberships
(room_id, membership_event_id, user_id, membership, event_stream_ordering, {", ".join(insert_keys)})
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
</details>
### Pull Request Checklist
<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->
* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
- Use markdown where necessary, mostly for `code blocks`.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
---------
Co-authored-by: Erik Johnston <erik@matrix.org>
2024-08-29 17:09:51 +02:00
|
|
|
self.get_success(self.persistence.persist_event(join_event, join_context))
|
2022-08-17 11:42:01 +02:00
|
|
|
|
Sliding Sync: Pre-populate room data for quick filtering/sorting (#17512)
Pre-populate room data for quick filtering/sorting in the Sliding Sync
API
Spawning from
https://github.com/element-hq/synapse/pull/17450#discussion_r1697335578
This PR is acting as the Synapse version `N+1` step in the gradual
migration being tracked by
https://github.com/element-hq/synapse/issues/17623
Adding two new database tables:
- `sliding_sync_joined_rooms`: A table for storing room meta data that
the local server is still participating in. The info here can be shared
across all `Membership.JOIN`. Keyed on `(room_id)` and updated when the
relevant room current state changes or a new event is sent in the room.
- `sliding_sync_membership_snapshots`: A table for storing a snapshot of
room meta data at the time of the local user's membership. Keyed on
`(room_id, user_id)` and only updated when a user's membership in a room
changes.
Also adds background updates to populate these tables with all of the
existing data.
We want to have the guarantee that if a row exists in the sliding sync
tables, we are able to rely on it (accurate data). And if a row doesn't
exist, we use a fallback to get the same info until the background
updates fill in the rows or a new event comes in triggering it to be
fully inserted. This means we need a couple extra things in place until
we bump `SCHEMA_COMPAT_VERSION` and run the foreground update in the
`N+2` part of the gradual migration. For context on why we can't rely on
the tables without these things see [1].
1. On start-up, block until we clear out any rows for the rooms that
have had events since the max-`stream_ordering` of the
`sliding_sync_joined_rooms` table (compare to max-`stream_ordering` of
the `events` table). For `sliding_sync_membership_snapshots`, we can
compare to the max-`stream_ordering` of `local_current_membership`
- This accounts for when someone downgrades their Synapse version and
then upgrades it again. This will ensure that we don't have any
stale/out-of-date data in the
`sliding_sync_joined_rooms`/`sliding_sync_membership_snapshots` tables
since any new events sent in rooms would have also needed to be written
to the sliding sync tables. For example a new event needs to bump
`event_stream_ordering` in `sliding_sync_joined_rooms` table or some
state in the room changing (like the room name). Or another example of
someone's membership changing in a room affecting
`sliding_sync_membership_snapshots`.
1. Add another background update that will catch-up with any rows that
were just deleted from the sliding sync tables (based on the activity in
the `events`/`local_current_membership`). The rooms that need
recalculating are added to the
`sliding_sync_joined_rooms_to_recalculate` table.
1. Making sure rows are fully inserted. Instead of partially inserting,
we need to check if the row already exists and fully insert all data if
not.
All of this extra functionality can be removed once the
`SCHEMA_COMPAT_VERSION` is bumped with support for the new sliding sync
tables so people can no longer downgrade (the `N+2` part of the gradual
migration).
<details>
<summary><sup>[1]</sup></summary>
For `sliding_sync_joined_rooms`, since we partially insert rows as state
comes in, we can't rely on the existence of the row for a given
`room_id`. We can't even rely on looking at whether the background
update has finished. There could still be partial rows from when someone
reverted their Synapse version after the background update finished, had
some state changes (or new rooms), then upgraded again and more state
changes happen leaving a partial row.
For `sliding_sync_membership_snapshots`, we insert items as a whole
except for the `forgotten` column ~~so we can rely on rows existing and
just need to always use a fallback for the `forgotten` data. We can't
use the `forgotten` column in the table for the same reasons above about
`sliding_sync_joined_rooms`.~~ We could have an out-of-date membership
from when someone reverted their Synapse version. (same problems as
outlined for `sliding_sync_joined_rooms` above)
Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.dz5x6ef4mxz7)
</details>
### TODO
- [x] Update `stream_ordering`/`bump_stamp`
- [x] Handle remote invites
- [x] Handle state resets
- [x] Consider adding `sender` so we can filter `LEAVE` memberships and
distinguish from kicks.
- [x] We should add it to be able to tell leaves from kicks
- [x] Consider adding `tombstone` state to help address
https://github.com/element-hq/synapse/issues/17540
- [x] We should add it `tombstone_successor_room_id`
- [x] Consider adding `forgotten` status to avoid extra
lookup/table-join on `room_memberships`
- [x] We should add it
- [x] Background update to fill in values for all joined rooms and
non-join membership
- [x] Clean-up tables when room is deleted
- [ ] Make sure tables are useful to our use case
- First explored in
https://github.com/element-hq/synapse/compare/erikj/ss_use_new_tables
- Also explored in
https://github.com/element-hq/synapse/commit/76b5a576eb363496315dfd39510cad7d02b0fc73
- [x] Plan for how can we use this with a fallback
- See plan discussed above in main area of the issue description
- Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.dz5x6ef4mxz7)
- [x] Plan for how we can rely on this new table without a fallback
- Synapse version `N+1`: (this PR) Bump `SCHEMA_VERSION` to `87`. Add
new tables and background update to backfill all rows. Since this is a
new table, we don't have to add any `NOT VALID` constraints and validate
them when the background update completes. Read from new tables with a
fallback in cases where the rows aren't filled in yet.
- Synapse version `N+2`: Bump `SCHEMA_VERSION` to `88` and bump
`SCHEMA_COMPAT_VERSION` to `87` because we don't want people to
downgrade and miss writes while they are on an older version. Add a
foreground update to finish off the backfill so we can read from new
tables without the fallback. Application code can now rely on the new
tables being populated.
- Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.hh7shg4cxdhj)
### Dev notes
```
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase
SYNAPSE_POSTGRES=1 SYNAPSE_POSTGRES_USER=postgres SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase
```
```
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.handlers.test_sliding_sync.FilterRoomsTestCase
```
Reference:
- [Development docs on background updates and worked examples of gradual
migrations
](https://github.com/element-hq/synapse/blob/1dfa59b238cee0dc62163588cc9481896c288979/docs/development/database_schema.md#background-updates)
- A real example of a gradual migration:
https://github.com/matrix-org/synapse/pull/15649#discussion_r1213779514
- Adding `rooms.creator` field that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/10697
- Adding `rooms.room_version` that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/6729
- Adding `room_stats_state.room_type` that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/13031
- Tables from MSC2716: `insertion_events`, `insertion_event_edges`,
`insertion_event_extremities`, `batch_events`
- `current_state_events` updated in
`synapse/storage/databases/main/events.py`
---
```
persist_event (adds to queue)
_persist_event_batch
_persist_events_and_state_updates (assigns `stream_ordering` to events)
_persist_events_txn
_store_event_txn
_update_metadata_tables_txn
_store_room_members_txn
_update_current_state_txn
```
---
> Concatenated Indexes [...] (also known as multi-column, composite or
combined index)
>
> [...] key consists of multiple columns.
>
> We can take advantage of the fact that the first index column is
always usable for searching
>
> *--
https://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys*
---
Dealing with `portdb` (`synapse/_scripts/synapse_port_db.py`),
https://github.com/element-hq/synapse/pull/17512#discussion_r1725998219
---
<details>
<summary>SQL queries:</summary>
Both of these are equivalent and work in SQLite and Postgres
Options 1:
```sql
WITH data_table (room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)}) AS (
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
)
INSERT INTO sliding_sync_non_join_memberships
(room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)})
SELECT * FROM data_table
WHERE membership != ?
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
Option 2:
```sql
INSERT INTO sliding_sync_non_join_memberships
(room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)})
SELECT
column1 as room_id,
column2 as user_id,
column3 as membership_event_id,
column4 as membership,
column5 as event_stream_ordering,
{", ".join("column" + str(i) for i in range(6, 6 + len(insert_keys)))}
FROM (
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
) as v
WHERE membership != ?
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
If we don't need the `membership` condition, we could use:
```sql
INSERT INTO sliding_sync_non_join_memberships
(room_id, membership_event_id, user_id, membership, event_stream_ordering, {", ".join(insert_keys)})
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
</details>
### Pull Request Checklist
<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->
* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
- Use markdown where necessary, mostly for `code blocks`.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
---------
Co-authored-by: Erik Johnston <erik@matrix.org>
2024-08-29 17:09:51 +02:00
|
|
|
# The room shouldn't be forgotten because the local user just joined
|
|
|
|
self.assertFalse(
|
|
|
|
self.get_success(self.store.is_locally_forgotten_room(room_id))
|
|
|
|
)
|
|
|
|
|
|
|
|
# After all of the local users (there is only user1) leave and forgetting the
|
|
|
|
# room, it is forgotten
|
|
|
|
user1_leave_response = self.helper.leave(room_id, user1_id, tok=user1_tok)
|
|
|
|
user1_leave_event = self.get_success(
|
|
|
|
self.store.get_event(user1_leave_response["event_id"])
|
|
|
|
)
|
|
|
|
self.get_success(self.store.forget(user1_id, room_id))
|
|
|
|
self.assertTrue(self.get_success(self.store.is_locally_forgotten_room(room_id)))
|
|
|
|
|
|
|
|
# Join the local user to the room (again). We want to make this feel as close to
|
|
|
|
# the real `process_remote_join()` as possible but we'd like to avoid some of
|
|
|
|
# the auth checks that would be done in the real code.
|
|
|
|
#
|
|
|
|
# FIXME: The test was originally written using this less-real
|
|
|
|
# `event_injection.inject_member_event(...)` shortcut but it would be nice to
|
|
|
|
# use the real remote join process in a `FederatingHomeserverTestCase`.
|
|
|
|
flawed_join_tuple = self.get_success(
|
|
|
|
create_event(
|
|
|
|
self.hs,
|
|
|
|
prev_event_ids=[user1_leave_response["event_id"]],
|
|
|
|
# This doesn't work correctly to create an `EventContext` that includes
|
|
|
|
# both of these state events. I assume it's because we're working on our
|
|
|
|
# local homeserver which has the remote state set as `outlier`. We have
|
|
|
|
# to create our own EventContext below to get this right.
|
|
|
|
auth_event_ids=[
|
|
|
|
create_tuple[0].event_id,
|
|
|
|
user1_leave_response["event_id"],
|
|
|
|
],
|
|
|
|
type=EventTypes.Member,
|
|
|
|
state_key=user1_id,
|
|
|
|
content={"membership": Membership.JOIN},
|
|
|
|
sender=user1_id,
|
|
|
|
**shared_kwargs,
|
2022-08-17 11:42:01 +02:00
|
|
|
)
|
|
|
|
)
|
Sliding Sync: Pre-populate room data for quick filtering/sorting (#17512)
Pre-populate room data for quick filtering/sorting in the Sliding Sync
API
Spawning from
https://github.com/element-hq/synapse/pull/17450#discussion_r1697335578
This PR is acting as the Synapse version `N+1` step in the gradual
migration being tracked by
https://github.com/element-hq/synapse/issues/17623
Adding two new database tables:
- `sliding_sync_joined_rooms`: A table for storing room meta data that
the local server is still participating in. The info here can be shared
across all `Membership.JOIN`. Keyed on `(room_id)` and updated when the
relevant room current state changes or a new event is sent in the room.
- `sliding_sync_membership_snapshots`: A table for storing a snapshot of
room meta data at the time of the local user's membership. Keyed on
`(room_id, user_id)` and only updated when a user's membership in a room
changes.
Also adds background updates to populate these tables with all of the
existing data.
We want to have the guarantee that if a row exists in the sliding sync
tables, we are able to rely on it (accurate data). And if a row doesn't
exist, we use a fallback to get the same info until the background
updates fill in the rows or a new event comes in triggering it to be
fully inserted. This means we need a couple extra things in place until
we bump `SCHEMA_COMPAT_VERSION` and run the foreground update in the
`N+2` part of the gradual migration. For context on why we can't rely on
the tables without these things see [1].
1. On start-up, block until we clear out any rows for the rooms that
have had events since the max-`stream_ordering` of the
`sliding_sync_joined_rooms` table (compare to max-`stream_ordering` of
the `events` table). For `sliding_sync_membership_snapshots`, we can
compare to the max-`stream_ordering` of `local_current_membership`
- This accounts for when someone downgrades their Synapse version and
then upgrades it again. This will ensure that we don't have any
stale/out-of-date data in the
`sliding_sync_joined_rooms`/`sliding_sync_membership_snapshots` tables
since any new events sent in rooms would have also needed to be written
to the sliding sync tables. For example a new event needs to bump
`event_stream_ordering` in `sliding_sync_joined_rooms` table or some
state in the room changing (like the room name). Or another example of
someone's membership changing in a room affecting
`sliding_sync_membership_snapshots`.
1. Add another background update that will catch-up with any rows that
were just deleted from the sliding sync tables (based on the activity in
the `events`/`local_current_membership`). The rooms that need
recalculating are added to the
`sliding_sync_joined_rooms_to_recalculate` table.
1. Making sure rows are fully inserted. Instead of partially inserting,
we need to check if the row already exists and fully insert all data if
not.
All of this extra functionality can be removed once the
`SCHEMA_COMPAT_VERSION` is bumped with support for the new sliding sync
tables so people can no longer downgrade (the `N+2` part of the gradual
migration).
<details>
<summary><sup>[1]</sup></summary>
For `sliding_sync_joined_rooms`, since we partially insert rows as state
comes in, we can't rely on the existence of the row for a given
`room_id`. We can't even rely on looking at whether the background
update has finished. There could still be partial rows from when someone
reverted their Synapse version after the background update finished, had
some state changes (or new rooms), then upgraded again and more state
changes happen leaving a partial row.
For `sliding_sync_membership_snapshots`, we insert items as a whole
except for the `forgotten` column ~~so we can rely on rows existing and
just need to always use a fallback for the `forgotten` data. We can't
use the `forgotten` column in the table for the same reasons above about
`sliding_sync_joined_rooms`.~~ We could have an out-of-date membership
from when someone reverted their Synapse version. (same problems as
outlined for `sliding_sync_joined_rooms` above)
Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.dz5x6ef4mxz7)
</details>
### TODO
- [x] Update `stream_ordering`/`bump_stamp`
- [x] Handle remote invites
- [x] Handle state resets
- [x] Consider adding `sender` so we can filter `LEAVE` memberships and
distinguish from kicks.
- [x] We should add it to be able to tell leaves from kicks
- [x] Consider adding `tombstone` state to help address
https://github.com/element-hq/synapse/issues/17540
- [x] We should add it `tombstone_successor_room_id`
- [x] Consider adding `forgotten` status to avoid extra
lookup/table-join on `room_memberships`
- [x] We should add it
- [x] Background update to fill in values for all joined rooms and
non-join membership
- [x] Clean-up tables when room is deleted
- [ ] Make sure tables are useful to our use case
- First explored in
https://github.com/element-hq/synapse/compare/erikj/ss_use_new_tables
- Also explored in
https://github.com/element-hq/synapse/commit/76b5a576eb363496315dfd39510cad7d02b0fc73
- [x] Plan for how can we use this with a fallback
- See plan discussed above in main area of the issue description
- Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.dz5x6ef4mxz7)
- [x] Plan for how we can rely on this new table without a fallback
- Synapse version `N+1`: (this PR) Bump `SCHEMA_VERSION` to `87`. Add
new tables and background update to backfill all rows. Since this is a
new table, we don't have to add any `NOT VALID` constraints and validate
them when the background update completes. Read from new tables with a
fallback in cases where the rows aren't filled in yet.
- Synapse version `N+2`: Bump `SCHEMA_VERSION` to `88` and bump
`SCHEMA_COMPAT_VERSION` to `87` because we don't want people to
downgrade and miss writes while they are on an older version. Add a
foreground update to finish off the backfill so we can read from new
tables without the fallback. Application code can now rely on the new
tables being populated.
- Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.hh7shg4cxdhj)
### Dev notes
```
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase
SYNAPSE_POSTGRES=1 SYNAPSE_POSTGRES_USER=postgres SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase
```
```
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.handlers.test_sliding_sync.FilterRoomsTestCase
```
Reference:
- [Development docs on background updates and worked examples of gradual
migrations
](https://github.com/element-hq/synapse/blob/1dfa59b238cee0dc62163588cc9481896c288979/docs/development/database_schema.md#background-updates)
- A real example of a gradual migration:
https://github.com/matrix-org/synapse/pull/15649#discussion_r1213779514
- Adding `rooms.creator` field that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/10697
- Adding `rooms.room_version` that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/6729
- Adding `room_stats_state.room_type` that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/13031
- Tables from MSC2716: `insertion_events`, `insertion_event_edges`,
`insertion_event_extremities`, `batch_events`
- `current_state_events` updated in
`synapse/storage/databases/main/events.py`
---
```
persist_event (adds to queue)
_persist_event_batch
_persist_events_and_state_updates (assigns `stream_ordering` to events)
_persist_events_txn
_store_event_txn
_update_metadata_tables_txn
_store_room_members_txn
_update_current_state_txn
```
---
> Concatenated Indexes [...] (also known as multi-column, composite or
combined index)
>
> [...] key consists of multiple columns.
>
> We can take advantage of the fact that the first index column is
always usable for searching
>
> *--
https://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys*
---
Dealing with `portdb` (`synapse/_scripts/synapse_port_db.py`),
https://github.com/element-hq/synapse/pull/17512#discussion_r1725998219
---
<details>
<summary>SQL queries:</summary>
Both of these are equivalent and work in SQLite and Postgres
Options 1:
```sql
WITH data_table (room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)}) AS (
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
)
INSERT INTO sliding_sync_non_join_memberships
(room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)})
SELECT * FROM data_table
WHERE membership != ?
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
Option 2:
```sql
INSERT INTO sliding_sync_non_join_memberships
(room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)})
SELECT
column1 as room_id,
column2 as user_id,
column3 as membership_event_id,
column4 as membership,
column5 as event_stream_ordering,
{", ".join("column" + str(i) for i in range(6, 6 + len(insert_keys)))}
FROM (
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
) as v
WHERE membership != ?
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
If we don't need the `membership` condition, we could use:
```sql
INSERT INTO sliding_sync_non_join_memberships
(room_id, membership_event_id, user_id, membership, event_stream_ordering, {", ".join(insert_keys)})
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
</details>
### Pull Request Checklist
<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->
* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
- Use markdown where necessary, mostly for `code blocks`.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
---------
Co-authored-by: Erik Johnston <erik@matrix.org>
2024-08-29 17:09:51 +02:00
|
|
|
# We have to create our own context to get the state set correctly. If we use
|
|
|
|
# the `EventContext` from the `flawed_join_tuple`, the `current_state_events`
|
|
|
|
# table will only have the join event in it which should never happen in our
|
|
|
|
# real server.
|
|
|
|
join_event = flawed_join_tuple[0]
|
|
|
|
join_context = self.get_success(
|
|
|
|
self.state_handler.compute_event_context(
|
|
|
|
join_event,
|
|
|
|
state_ids_before_event={
|
|
|
|
(e.type, e.state_key): e.event_id
|
|
|
|
for e in [create_tuple[0], user1_leave_event]
|
|
|
|
},
|
|
|
|
partial_state=False,
|
|
|
|
)
|
|
|
|
)
|
|
|
|
self.get_success(self.persistence.persist_event(join_event, join_context))
|
|
|
|
|
|
|
|
# After the local user rejoins the remote room, it isn't forgotten anymore
|
2022-08-17 11:42:01 +02:00
|
|
|
self.assertFalse(
|
Sliding Sync: Pre-populate room data for quick filtering/sorting (#17512)
Pre-populate room data for quick filtering/sorting in the Sliding Sync
API
Spawning from
https://github.com/element-hq/synapse/pull/17450#discussion_r1697335578
This PR is acting as the Synapse version `N+1` step in the gradual
migration being tracked by
https://github.com/element-hq/synapse/issues/17623
Adding two new database tables:
- `sliding_sync_joined_rooms`: A table for storing room meta data that
the local server is still participating in. The info here can be shared
across all `Membership.JOIN`. Keyed on `(room_id)` and updated when the
relevant room current state changes or a new event is sent in the room.
- `sliding_sync_membership_snapshots`: A table for storing a snapshot of
room meta data at the time of the local user's membership. Keyed on
`(room_id, user_id)` and only updated when a user's membership in a room
changes.
Also adds background updates to populate these tables with all of the
existing data.
We want to have the guarantee that if a row exists in the sliding sync
tables, we are able to rely on it (accurate data). And if a row doesn't
exist, we use a fallback to get the same info until the background
updates fill in the rows or a new event comes in triggering it to be
fully inserted. This means we need a couple extra things in place until
we bump `SCHEMA_COMPAT_VERSION` and run the foreground update in the
`N+2` part of the gradual migration. For context on why we can't rely on
the tables without these things see [1].
1. On start-up, block until we clear out any rows for the rooms that
have had events since the max-`stream_ordering` of the
`sliding_sync_joined_rooms` table (compare to max-`stream_ordering` of
the `events` table). For `sliding_sync_membership_snapshots`, we can
compare to the max-`stream_ordering` of `local_current_membership`
- This accounts for when someone downgrades their Synapse version and
then upgrades it again. This will ensure that we don't have any
stale/out-of-date data in the
`sliding_sync_joined_rooms`/`sliding_sync_membership_snapshots` tables
since any new events sent in rooms would have also needed to be written
to the sliding sync tables. For example a new event needs to bump
`event_stream_ordering` in `sliding_sync_joined_rooms` table or some
state in the room changing (like the room name). Or another example of
someone's membership changing in a room affecting
`sliding_sync_membership_snapshots`.
1. Add another background update that will catch-up with any rows that
were just deleted from the sliding sync tables (based on the activity in
the `events`/`local_current_membership`). The rooms that need
recalculating are added to the
`sliding_sync_joined_rooms_to_recalculate` table.
1. Making sure rows are fully inserted. Instead of partially inserting,
we need to check if the row already exists and fully insert all data if
not.
All of this extra functionality can be removed once the
`SCHEMA_COMPAT_VERSION` is bumped with support for the new sliding sync
tables so people can no longer downgrade (the `N+2` part of the gradual
migration).
<details>
<summary><sup>[1]</sup></summary>
For `sliding_sync_joined_rooms`, since we partially insert rows as state
comes in, we can't rely on the existence of the row for a given
`room_id`. We can't even rely on looking at whether the background
update has finished. There could still be partial rows from when someone
reverted their Synapse version after the background update finished, had
some state changes (or new rooms), then upgraded again and more state
changes happen leaving a partial row.
For `sliding_sync_membership_snapshots`, we insert items as a whole
except for the `forgotten` column ~~so we can rely on rows existing and
just need to always use a fallback for the `forgotten` data. We can't
use the `forgotten` column in the table for the same reasons above about
`sliding_sync_joined_rooms`.~~ We could have an out-of-date membership
from when someone reverted their Synapse version. (same problems as
outlined for `sliding_sync_joined_rooms` above)
Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.dz5x6ef4mxz7)
</details>
### TODO
- [x] Update `stream_ordering`/`bump_stamp`
- [x] Handle remote invites
- [x] Handle state resets
- [x] Consider adding `sender` so we can filter `LEAVE` memberships and
distinguish from kicks.
- [x] We should add it to be able to tell leaves from kicks
- [x] Consider adding `tombstone` state to help address
https://github.com/element-hq/synapse/issues/17540
- [x] We should add it `tombstone_successor_room_id`
- [x] Consider adding `forgotten` status to avoid extra
lookup/table-join on `room_memberships`
- [x] We should add it
- [x] Background update to fill in values for all joined rooms and
non-join membership
- [x] Clean-up tables when room is deleted
- [ ] Make sure tables are useful to our use case
- First explored in
https://github.com/element-hq/synapse/compare/erikj/ss_use_new_tables
- Also explored in
https://github.com/element-hq/synapse/commit/76b5a576eb363496315dfd39510cad7d02b0fc73
- [x] Plan for how can we use this with a fallback
- See plan discussed above in main area of the issue description
- Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.dz5x6ef4mxz7)
- [x] Plan for how we can rely on this new table without a fallback
- Synapse version `N+1`: (this PR) Bump `SCHEMA_VERSION` to `87`. Add
new tables and background update to backfill all rows. Since this is a
new table, we don't have to add any `NOT VALID` constraints and validate
them when the background update completes. Read from new tables with a
fallback in cases where the rows aren't filled in yet.
- Synapse version `N+2`: Bump `SCHEMA_VERSION` to `88` and bump
`SCHEMA_COMPAT_VERSION` to `87` because we don't want people to
downgrade and miss writes while they are on an older version. Add a
foreground update to finish off the backfill so we can read from new
tables without the fallback. Application code can now rely on the new
tables being populated.
- Discussed in an [internal
meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.hh7shg4cxdhj)
### Dev notes
```
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase
SYNAPSE_POSTGRES=1 SYNAPSE_POSTGRES_USER=postgres SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase
```
```
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.handlers.test_sliding_sync.FilterRoomsTestCase
```
Reference:
- [Development docs on background updates and worked examples of gradual
migrations
](https://github.com/element-hq/synapse/blob/1dfa59b238cee0dc62163588cc9481896c288979/docs/development/database_schema.md#background-updates)
- A real example of a gradual migration:
https://github.com/matrix-org/synapse/pull/15649#discussion_r1213779514
- Adding `rooms.creator` field that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/10697
- Adding `rooms.room_version` that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/6729
- Adding `room_stats_state.room_type` that needed a background update to
backfill data, https://github.com/matrix-org/synapse/pull/13031
- Tables from MSC2716: `insertion_events`, `insertion_event_edges`,
`insertion_event_extremities`, `batch_events`
- `current_state_events` updated in
`synapse/storage/databases/main/events.py`
---
```
persist_event (adds to queue)
_persist_event_batch
_persist_events_and_state_updates (assigns `stream_ordering` to events)
_persist_events_txn
_store_event_txn
_update_metadata_tables_txn
_store_room_members_txn
_update_current_state_txn
```
---
> Concatenated Indexes [...] (also known as multi-column, composite or
combined index)
>
> [...] key consists of multiple columns.
>
> We can take advantage of the fact that the first index column is
always usable for searching
>
> *--
https://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys*
---
Dealing with `portdb` (`synapse/_scripts/synapse_port_db.py`),
https://github.com/element-hq/synapse/pull/17512#discussion_r1725998219
---
<details>
<summary>SQL queries:</summary>
Both of these are equivalent and work in SQLite and Postgres
Options 1:
```sql
WITH data_table (room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)}) AS (
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
)
INSERT INTO sliding_sync_non_join_memberships
(room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)})
SELECT * FROM data_table
WHERE membership != ?
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
Option 2:
```sql
INSERT INTO sliding_sync_non_join_memberships
(room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)})
SELECT
column1 as room_id,
column2 as user_id,
column3 as membership_event_id,
column4 as membership,
column5 as event_stream_ordering,
{", ".join("column" + str(i) for i in range(6, 6 + len(insert_keys)))}
FROM (
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
) as v
WHERE membership != ?
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
If we don't need the `membership` condition, we could use:
```sql
INSERT INTO sliding_sync_non_join_memberships
(room_id, membership_event_id, user_id, membership, event_stream_ordering, {", ".join(insert_keys)})
VALUES (
?, ?, ?,
(SELECT membership FROM room_memberships WHERE event_id = ?),
(SELECT stream_ordering FROM events WHERE event_id = ?),
{", ".join("?" for _ in insert_values)}
)
ON CONFLICT (room_id, user_id)
DO UPDATE SET
membership_event_id = EXCLUDED.membership_event_id,
membership = EXCLUDED.membership,
event_stream_ordering = EXCLUDED.event_stream_ordering,
{", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)}
```
</details>
### Pull Request Checklist
<!-- Please read
https://element-hq.github.io/synapse/latest/development/contributing_guide.html
before submitting your pull request -->
* [x] Pull request is based on the develop branch
* [x] Pull request includes a [changelog
file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog).
The entry should:
- Be a short description of your change which makes sense to users.
"Fixed a bug that prevented receiving messages from other servers."
instead of "Moved X method from `EventStore` to `EventWorkerStore`.".
- Use markdown where necessary, mostly for `code blocks`.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by
@github_username." or "Contributed by [Your Name]." to the end of the
entry.
* [x] [Code
style](https://element-hq.github.io/synapse/latest/code_style.html) is
correct
(run the
[linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
---------
Co-authored-by: Erik Johnston <erik@matrix.org>
2024-08-29 17:09:51 +02:00
|
|
|
self.get_success(self.store.is_locally_forgotten_room(room_id))
|
2022-08-17 11:42:01 +02:00
|
|
|
)
|
|
|
|
|
2019-07-23 17:55:45 +02:00
|
|
|
|
2024-07-17 20:10:15 +02:00
|
|
|
class RoomSummaryTestCase(unittest.HomeserverTestCase):
|
|
|
|
"""
|
|
|
|
Test `/sync` room summary related logic like `get_room_summary(...)` and
|
|
|
|
`extract_heroes_from_room_summary(...)`
|
|
|
|
"""
|
|
|
|
|
|
|
|
servlets = [
|
|
|
|
admin.register_servlets,
|
|
|
|
knock.register_servlets,
|
|
|
|
login.register_servlets,
|
|
|
|
room.register_servlets,
|
|
|
|
]
|
|
|
|
|
|
|
|
def prepare(self, reactor: MemoryReactor, clock: Clock, hs: HomeServer) -> None:
|
|
|
|
self.sliding_sync_handler = self.hs.get_sliding_sync_handler()
|
|
|
|
self.store = self.hs.get_datastores().main
|
|
|
|
|
|
|
|
def _assert_member_summary(
|
|
|
|
self,
|
|
|
|
actual_member_summary: MemberSummary,
|
|
|
|
expected_member_list: List[str],
|
|
|
|
*,
|
|
|
|
expected_member_count: Optional[int] = None,
|
|
|
|
) -> None:
|
|
|
|
"""
|
|
|
|
Assert that the `MemberSummary` object has the expected members.
|
|
|
|
"""
|
|
|
|
self.assertListEqual(
|
|
|
|
[
|
|
|
|
user_id
|
|
|
|
for user_id, _membership_event_id in actual_member_summary.members
|
|
|
|
],
|
|
|
|
expected_member_list,
|
|
|
|
)
|
|
|
|
self.assertEqual(
|
|
|
|
actual_member_summary.count,
|
|
|
|
(
|
|
|
|
expected_member_count
|
|
|
|
if expected_member_count is not None
|
|
|
|
else len(expected_member_list)
|
|
|
|
),
|
|
|
|
)
|
|
|
|
|
|
|
|
def test_get_room_summary_membership(self) -> None:
|
|
|
|
"""
|
|
|
|
Test that `get_room_summary(...)` gets every kind of membership when there
|
|
|
|
aren't that many members in the room.
|
|
|
|
"""
|
|
|
|
user1_id = self.register_user("user1", "pass")
|
|
|
|
user1_tok = self.login(user1_id, "pass")
|
|
|
|
user2_id = self.register_user("user2", "pass")
|
|
|
|
user2_tok = self.login(user2_id, "pass")
|
|
|
|
user3_id = self.register_user("user3", "pass")
|
|
|
|
_user3_tok = self.login(user3_id, "pass")
|
|
|
|
user4_id = self.register_user("user4", "pass")
|
|
|
|
user4_tok = self.login(user4_id, "pass")
|
|
|
|
user5_id = self.register_user("user5", "pass")
|
|
|
|
user5_tok = self.login(user5_id, "pass")
|
|
|
|
|
|
|
|
# Setup a room (user1 is the creator and is joined to the room)
|
|
|
|
room_id = self.helper.create_room_as(user1_id, tok=user1_tok)
|
|
|
|
|
|
|
|
# User2 is banned
|
|
|
|
self.helper.join(room_id, user2_id, tok=user2_tok)
|
|
|
|
self.helper.ban(room_id, src=user1_id, targ=user2_id, tok=user1_tok)
|
|
|
|
|
|
|
|
# User3 is invited by user1
|
|
|
|
self.helper.invite(room_id, targ=user3_id, tok=user1_tok)
|
|
|
|
|
|
|
|
# User4 leaves
|
|
|
|
self.helper.join(room_id, user4_id, tok=user4_tok)
|
|
|
|
self.helper.leave(room_id, user4_id, tok=user4_tok)
|
|
|
|
|
|
|
|
# User5 joins
|
|
|
|
self.helper.join(room_id, user5_id, tok=user5_tok)
|
|
|
|
|
|
|
|
room_membership_summary = self.get_success(self.store.get_room_summary(room_id))
|
|
|
|
empty_ms = MemberSummary([], 0)
|
|
|
|
|
|
|
|
self._assert_member_summary(
|
|
|
|
room_membership_summary.get(Membership.JOIN, empty_ms),
|
|
|
|
[user1_id, user5_id],
|
|
|
|
)
|
|
|
|
self._assert_member_summary(
|
|
|
|
room_membership_summary.get(Membership.INVITE, empty_ms), [user3_id]
|
|
|
|
)
|
|
|
|
self._assert_member_summary(
|
|
|
|
room_membership_summary.get(Membership.LEAVE, empty_ms), [user4_id]
|
|
|
|
)
|
|
|
|
self._assert_member_summary(
|
|
|
|
room_membership_summary.get(Membership.BAN, empty_ms), [user2_id]
|
|
|
|
)
|
|
|
|
self._assert_member_summary(
|
|
|
|
room_membership_summary.get(Membership.KNOCK, empty_ms),
|
|
|
|
[
|
|
|
|
# No one knocked
|
|
|
|
],
|
|
|
|
)
|
|
|
|
|
|
|
|
def test_get_room_summary_membership_order(self) -> None:
|
|
|
|
"""
|
|
|
|
Test that `get_room_summary(...)` stacks our limit of 6 in this order: joins ->
|
|
|
|
invites -> leave -> everything else (bans/knocks)
|
|
|
|
"""
|
|
|
|
user1_id = self.register_user("user1", "pass")
|
|
|
|
user1_tok = self.login(user1_id, "pass")
|
|
|
|
user2_id = self.register_user("user2", "pass")
|
|
|
|
user2_tok = self.login(user2_id, "pass")
|
|
|
|
user3_id = self.register_user("user3", "pass")
|
|
|
|
_user3_tok = self.login(user3_id, "pass")
|
|
|
|
user4_id = self.register_user("user4", "pass")
|
|
|
|
user4_tok = self.login(user4_id, "pass")
|
|
|
|
user5_id = self.register_user("user5", "pass")
|
|
|
|
user5_tok = self.login(user5_id, "pass")
|
|
|
|
user6_id = self.register_user("user6", "pass")
|
|
|
|
user6_tok = self.login(user6_id, "pass")
|
|
|
|
user7_id = self.register_user("user7", "pass")
|
|
|
|
user7_tok = self.login(user7_id, "pass")
|
|
|
|
|
|
|
|
# Setup the room (user1 is the creator and is joined to the room)
|
|
|
|
room_id = self.helper.create_room_as(user1_id, tok=user1_tok)
|
|
|
|
|
|
|
|
# We expect the order to be joins -> invites -> leave -> bans so setup the users
|
|
|
|
# *NOT* in that same order to make sure we're actually sorting them.
|
|
|
|
|
|
|
|
# User2 is banned
|
|
|
|
self.helper.join(room_id, user2_id, tok=user2_tok)
|
|
|
|
self.helper.ban(room_id, src=user1_id, targ=user2_id, tok=user1_tok)
|
|
|
|
|
|
|
|
# User3 is invited by user1
|
|
|
|
self.helper.invite(room_id, targ=user3_id, tok=user1_tok)
|
|
|
|
|
|
|
|
# User4 leaves
|
|
|
|
self.helper.join(room_id, user4_id, tok=user4_tok)
|
|
|
|
self.helper.leave(room_id, user4_id, tok=user4_tok)
|
|
|
|
|
|
|
|
# User5, User6, User7 joins
|
|
|
|
self.helper.join(room_id, user5_id, tok=user5_tok)
|
|
|
|
self.helper.join(room_id, user6_id, tok=user6_tok)
|
|
|
|
self.helper.join(room_id, user7_id, tok=user7_tok)
|
|
|
|
|
|
|
|
room_membership_summary = self.get_success(self.store.get_room_summary(room_id))
|
|
|
|
empty_ms = MemberSummary([], 0)
|
|
|
|
|
|
|
|
self._assert_member_summary(
|
|
|
|
room_membership_summary.get(Membership.JOIN, empty_ms),
|
|
|
|
[user1_id, user5_id, user6_id, user7_id],
|
|
|
|
)
|
|
|
|
self._assert_member_summary(
|
|
|
|
room_membership_summary.get(Membership.INVITE, empty_ms), [user3_id]
|
|
|
|
)
|
|
|
|
self._assert_member_summary(
|
|
|
|
room_membership_summary.get(Membership.LEAVE, empty_ms), [user4_id]
|
|
|
|
)
|
|
|
|
self._assert_member_summary(
|
|
|
|
room_membership_summary.get(Membership.BAN, empty_ms),
|
|
|
|
[
|
|
|
|
# The banned user is not in the summary because the summary can only fit
|
|
|
|
# 6 members and prefers everything else before bans
|
|
|
|
#
|
|
|
|
# user2_id
|
|
|
|
],
|
|
|
|
# But we still see the count of banned users
|
|
|
|
expected_member_count=1,
|
|
|
|
)
|
|
|
|
self._assert_member_summary(
|
|
|
|
room_membership_summary.get(Membership.KNOCK, empty_ms),
|
|
|
|
[
|
|
|
|
# No one knocked
|
|
|
|
],
|
|
|
|
)
|
|
|
|
|
|
|
|
def test_extract_heroes_from_room_summary_excludes_self(self) -> None:
|
|
|
|
"""
|
|
|
|
Test that `extract_heroes_from_room_summary(...)` does not include the user
|
|
|
|
itself.
|
|
|
|
"""
|
|
|
|
user1_id = self.register_user("user1", "pass")
|
|
|
|
user1_tok = self.login(user1_id, "pass")
|
|
|
|
user2_id = self.register_user("user2", "pass")
|
|
|
|
user2_tok = self.login(user2_id, "pass")
|
|
|
|
|
|
|
|
# Setup the room (user1 is the creator and is joined to the room)
|
|
|
|
room_id = self.helper.create_room_as(user1_id, tok=user1_tok)
|
|
|
|
|
|
|
|
# User2 joins
|
|
|
|
self.helper.join(room_id, user2_id, tok=user2_tok)
|
|
|
|
|
|
|
|
room_membership_summary = self.get_success(self.store.get_room_summary(room_id))
|
|
|
|
|
|
|
|
# We first ask from the perspective of a random fake user
|
|
|
|
hero_user_ids = extract_heroes_from_room_summary(
|
|
|
|
room_membership_summary, me="@fakeuser"
|
|
|
|
)
|
|
|
|
|
|
|
|
# Make sure user1 is in the room (ensure our test setup is correct)
|
|
|
|
self.assertListEqual(hero_user_ids, [user1_id, user2_id])
|
|
|
|
|
|
|
|
# Now, we ask for the room summary from the perspective of user1
|
|
|
|
hero_user_ids = extract_heroes_from_room_summary(
|
|
|
|
room_membership_summary, me=user1_id
|
|
|
|
)
|
|
|
|
|
|
|
|
# User1 should not be included in the list of heroes because they are the one
|
|
|
|
# asking
|
|
|
|
self.assertListEqual(hero_user_ids, [user2_id])
|
|
|
|
|
|
|
|
def test_extract_heroes_from_room_summary_first_five_joins(self) -> None:
|
|
|
|
"""
|
|
|
|
Test that `extract_heroes_from_room_summary(...)` returns the first 5 joins.
|
|
|
|
"""
|
|
|
|
user1_id = self.register_user("user1", "pass")
|
|
|
|
user1_tok = self.login(user1_id, "pass")
|
|
|
|
user2_id = self.register_user("user2", "pass")
|
|
|
|
user2_tok = self.login(user2_id, "pass")
|
|
|
|
user3_id = self.register_user("user3", "pass")
|
|
|
|
user3_tok = self.login(user3_id, "pass")
|
|
|
|
user4_id = self.register_user("user4", "pass")
|
|
|
|
user4_tok = self.login(user4_id, "pass")
|
|
|
|
user5_id = self.register_user("user5", "pass")
|
|
|
|
user5_tok = self.login(user5_id, "pass")
|
|
|
|
user6_id = self.register_user("user6", "pass")
|
|
|
|
user6_tok = self.login(user6_id, "pass")
|
|
|
|
user7_id = self.register_user("user7", "pass")
|
|
|
|
user7_tok = self.login(user7_id, "pass")
|
|
|
|
|
|
|
|
# Setup the room (user1 is the creator and is joined to the room)
|
|
|
|
room_id = self.helper.create_room_as(user1_id, tok=user1_tok)
|
|
|
|
|
|
|
|
# User2 -> User7 joins
|
|
|
|
self.helper.join(room_id, user2_id, tok=user2_tok)
|
|
|
|
self.helper.join(room_id, user3_id, tok=user3_tok)
|
|
|
|
self.helper.join(room_id, user4_id, tok=user4_tok)
|
|
|
|
self.helper.join(room_id, user5_id, tok=user5_tok)
|
|
|
|
self.helper.join(room_id, user6_id, tok=user6_tok)
|
|
|
|
self.helper.join(room_id, user7_id, tok=user7_tok)
|
|
|
|
|
|
|
|
room_membership_summary = self.get_success(self.store.get_room_summary(room_id))
|
|
|
|
|
|
|
|
hero_user_ids = extract_heroes_from_room_summary(
|
|
|
|
room_membership_summary, me="@fakuser"
|
|
|
|
)
|
|
|
|
|
|
|
|
# First 5 users to join the room
|
|
|
|
self.assertListEqual(
|
|
|
|
hero_user_ids, [user1_id, user2_id, user3_id, user4_id, user5_id]
|
|
|
|
)
|
|
|
|
|
|
|
|
def test_extract_heroes_from_room_summary_membership_order(self) -> None:
|
|
|
|
"""
|
|
|
|
Test that `extract_heroes_from_room_summary(...)` prefers joins/invites over
|
|
|
|
everything else.
|
|
|
|
"""
|
|
|
|
user1_id = self.register_user("user1", "pass")
|
|
|
|
user1_tok = self.login(user1_id, "pass")
|
|
|
|
user2_id = self.register_user("user2", "pass")
|
|
|
|
user2_tok = self.login(user2_id, "pass")
|
|
|
|
user3_id = self.register_user("user3", "pass")
|
|
|
|
_user3_tok = self.login(user3_id, "pass")
|
|
|
|
user4_id = self.register_user("user4", "pass")
|
|
|
|
user4_tok = self.login(user4_id, "pass")
|
|
|
|
user5_id = self.register_user("user5", "pass")
|
|
|
|
user5_tok = self.login(user5_id, "pass")
|
|
|
|
|
|
|
|
# Setup the room (user1 is the creator and is joined to the room)
|
|
|
|
room_id = self.helper.create_room_as(user1_id, tok=user1_tok)
|
|
|
|
|
|
|
|
# We expect the order to be joins -> invites -> leave -> bans so setup the users
|
|
|
|
# *NOT* in that same order to make sure we're actually sorting them.
|
|
|
|
|
|
|
|
# User2 is banned
|
|
|
|
self.helper.join(room_id, user2_id, tok=user2_tok)
|
|
|
|
self.helper.ban(room_id, src=user1_id, targ=user2_id, tok=user1_tok)
|
|
|
|
|
|
|
|
# User3 is invited by user1
|
|
|
|
self.helper.invite(room_id, targ=user3_id, tok=user1_tok)
|
|
|
|
|
|
|
|
# User4 leaves
|
|
|
|
self.helper.join(room_id, user4_id, tok=user4_tok)
|
|
|
|
self.helper.leave(room_id, user4_id, tok=user4_tok)
|
|
|
|
|
|
|
|
# User5 joins
|
|
|
|
self.helper.join(room_id, user5_id, tok=user5_tok)
|
|
|
|
|
|
|
|
room_membership_summary = self.get_success(self.store.get_room_summary(room_id))
|
|
|
|
|
|
|
|
hero_user_ids = extract_heroes_from_room_summary(
|
|
|
|
room_membership_summary, me="@fakeuser"
|
|
|
|
)
|
|
|
|
|
|
|
|
# Prefer joins -> invites, over everything else
|
|
|
|
self.assertListEqual(
|
|
|
|
hero_user_ids,
|
|
|
|
[
|
|
|
|
# The joins
|
|
|
|
user1_id,
|
|
|
|
user5_id,
|
|
|
|
# The invites
|
|
|
|
user3_id,
|
|
|
|
],
|
|
|
|
)
|
|
|
|
|
|
|
|
@skip_unless(
|
|
|
|
False,
|
|
|
|
"Test is not possible because when everyone leaves the room, "
|
|
|
|
+ "the server is `no_longer_in_room` and we don't have any `current_state_events` to query",
|
|
|
|
)
|
|
|
|
def test_extract_heroes_from_room_summary_fallback_leave_ban(self) -> None:
|
|
|
|
"""
|
|
|
|
Test that `extract_heroes_from_room_summary(...)` falls back to leave/ban if
|
|
|
|
there aren't any joins/invites.
|
|
|
|
"""
|
|
|
|
user1_id = self.register_user("user1", "pass")
|
|
|
|
user1_tok = self.login(user1_id, "pass")
|
|
|
|
user2_id = self.register_user("user2", "pass")
|
|
|
|
user2_tok = self.login(user2_id, "pass")
|
|
|
|
user3_id = self.register_user("user3", "pass")
|
|
|
|
user3_tok = self.login(user3_id, "pass")
|
|
|
|
|
|
|
|
# Setup the room (user1 is the creator and is joined to the room)
|
|
|
|
room_id = self.helper.create_room_as(user1_id, tok=user1_tok)
|
|
|
|
|
|
|
|
# User2 is banned
|
|
|
|
self.helper.join(room_id, user2_id, tok=user2_tok)
|
|
|
|
self.helper.ban(room_id, src=user1_id, targ=user2_id, tok=user1_tok)
|
|
|
|
|
|
|
|
# User3 leaves
|
|
|
|
self.helper.join(room_id, user3_id, tok=user3_tok)
|
|
|
|
self.helper.leave(room_id, user3_id, tok=user3_tok)
|
|
|
|
|
|
|
|
# User1 leaves (we're doing this last because they're the room creator)
|
|
|
|
self.helper.leave(room_id, user1_id, tok=user1_tok)
|
|
|
|
|
|
|
|
room_membership_summary = self.get_success(self.store.get_room_summary(room_id))
|
|
|
|
|
|
|
|
hero_user_ids = extract_heroes_from_room_summary(
|
|
|
|
room_membership_summary, me="@fakeuser"
|
|
|
|
)
|
|
|
|
|
|
|
|
# Fallback to people who left -> banned
|
|
|
|
self.assertListEqual(
|
|
|
|
hero_user_ids,
|
|
|
|
[user3_id, user1_id, user3_id],
|
|
|
|
)
|
|
|
|
|
|
|
|
def test_extract_heroes_from_room_summary_excludes_knocks(self) -> None:
|
|
|
|
"""
|
|
|
|
People who knock on the room have (potentially) never been in the room before
|
|
|
|
and are total outsiders. Plus the spec doesn't mention them at all for heroes.
|
|
|
|
"""
|
|
|
|
user1_id = self.register_user("user1", "pass")
|
|
|
|
user1_tok = self.login(user1_id, "pass")
|
|
|
|
user2_id = self.register_user("user2", "pass")
|
|
|
|
user2_tok = self.login(user2_id, "pass")
|
|
|
|
|
|
|
|
# Setup the knock room (user1 is the creator and is joined to the room)
|
|
|
|
knock_room_id = self.helper.create_room_as(
|
|
|
|
user1_id, tok=user1_tok, room_version=RoomVersions.V7.identifier
|
|
|
|
)
|
|
|
|
self.helper.send_state(
|
|
|
|
knock_room_id,
|
|
|
|
EventTypes.JoinRules,
|
|
|
|
{"join_rule": JoinRules.KNOCK},
|
|
|
|
tok=user1_tok,
|
|
|
|
)
|
|
|
|
|
|
|
|
# User2 knocks on the room
|
|
|
|
knock_channel = self.make_request(
|
|
|
|
"POST",
|
|
|
|
"/_matrix/client/r0/knock/%s" % (knock_room_id,),
|
|
|
|
b"{}",
|
|
|
|
user2_tok,
|
|
|
|
)
|
|
|
|
self.assertEqual(knock_channel.code, 200, knock_channel.result)
|
|
|
|
|
|
|
|
room_membership_summary = self.get_success(
|
|
|
|
self.store.get_room_summary(knock_room_id)
|
|
|
|
)
|
|
|
|
|
|
|
|
hero_user_ids = extract_heroes_from_room_summary(
|
|
|
|
room_membership_summary, me="@fakeuser"
|
|
|
|
)
|
|
|
|
|
|
|
|
# user1 is the creator and is joined to the room (should show up as a hero)
|
|
|
|
# user2 is knocking on the room (should not show up as a hero)
|
|
|
|
self.assertListEqual(
|
|
|
|
hero_user_ids,
|
|
|
|
[user1_id],
|
|
|
|
)
|
|
|
|
|
|
|
|
|
2019-07-23 17:55:45 +02:00
|
|
|
class CurrentStateMembershipUpdateTestCase(unittest.HomeserverTestCase):
|
2022-04-05 14:54:41 +02:00
|
|
|
def prepare(self, reactor: MemoryReactor, clock: Clock, hs: HomeServer) -> None:
|
|
|
|
self.store = hs.get_datastores().main
|
|
|
|
self.room_creator = hs.get_room_creation_handler()
|
2019-07-23 17:55:45 +02:00
|
|
|
|
2022-04-05 14:54:41 +02:00
|
|
|
def test_can_rerun_update(self) -> None:
|
2019-07-23 17:55:45 +02:00
|
|
|
# First make sure we have completed all updates.
|
2021-10-06 14:56:45 +02:00
|
|
|
self.wait_for_background_updates()
|
2019-07-23 17:55:45 +02:00
|
|
|
|
|
|
|
# Now let's create a room, which will insert a membership
|
|
|
|
user = UserID("alice", "test")
|
2020-10-22 11:11:06 +02:00
|
|
|
requester = create_requester(user)
|
2019-07-23 17:55:45 +02:00
|
|
|
self.get_success(self.room_creator.create_room(requester, {}))
|
|
|
|
|
|
|
|
# Register the background update to run again.
|
|
|
|
self.get_success(
|
2020-08-05 22:38:57 +02:00
|
|
|
self.store.db_pool.simple_insert(
|
2019-07-23 17:55:45 +02:00
|
|
|
table="background_updates",
|
|
|
|
values={
|
|
|
|
"update_name": "current_state_events_membership",
|
|
|
|
"progress_json": "{}",
|
|
|
|
"depends_on": None,
|
|
|
|
},
|
|
|
|
)
|
|
|
|
)
|
|
|
|
|
|
|
|
# ... and tell the DataStore that it hasn't finished all updates yet
|
2020-08-05 22:38:57 +02:00
|
|
|
self.store.db_pool.updates._all_done = False
|
2019-07-23 17:55:45 +02:00
|
|
|
|
|
|
|
# Now let's actually drive the updates to completion
|
2021-10-06 14:56:45 +02:00
|
|
|
self.wait_for_background_updates()
|