synapse

mirror of https://mau.dev/maunium/synapse.git synced 2024-12-16 23:33:50 +01:00

Author	SHA1	Message	Date
Eric Eastwood	1a6b718f8c	Sliding Sync: Pre-populate room data for quick filtering/sorting (#17512 ) Pre-populate room data for quick filtering/sorting in the Sliding Sync API Spawning from https://github.com/element-hq/synapse/pull/17450#discussion_r1697335578 This PR is acting as the Synapse version `N+1` step in the gradual migration being tracked by https://github.com/element-hq/synapse/issues/17623 Adding two new database tables: - `sliding_sync_joined_rooms`: A table for storing room meta data that the local server is still participating in. The info here can be shared across all `Membership.JOIN`. Keyed on `(room_id)` and updated when the relevant room current state changes or a new event is sent in the room. - `sliding_sync_membership_snapshots`: A table for storing a snapshot of room meta data at the time of the local user's membership. Keyed on `(room_id, user_id)` and only updated when a user's membership in a room changes. Also adds background updates to populate these tables with all of the existing data. We want to have the guarantee that if a row exists in the sliding sync tables, we are able to rely on it (accurate data). And if a row doesn't exist, we use a fallback to get the same info until the background updates fill in the rows or a new event comes in triggering it to be fully inserted. This means we need a couple extra things in place until we bump `SCHEMA_COMPAT_VERSION` and run the foreground update in the `N+2` part of the gradual migration. For context on why we can't rely on the tables without these things see [1]. 1. On start-up, block until we clear out any rows for the rooms that have had events since the max-`stream_ordering` of the `sliding_sync_joined_rooms` table (compare to max-`stream_ordering` of the `events` table). For `sliding_sync_membership_snapshots`, we can compare to the max-`stream_ordering` of `local_current_membership` - This accounts for when someone downgrades their Synapse version and then upgrades it again. This will ensure that we don't have any stale/out-of-date data in the `sliding_sync_joined_rooms`/`sliding_sync_membership_snapshots` tables since any new events sent in rooms would have also needed to be written to the sliding sync tables. For example a new event needs to bump `event_stream_ordering` in `sliding_sync_joined_rooms` table or some state in the room changing (like the room name). Or another example of someone's membership changing in a room affecting `sliding_sync_membership_snapshots`. 1. Add another background update that will catch-up with any rows that were just deleted from the sliding sync tables (based on the activity in the `events`/`local_current_membership`). The rooms that need recalculating are added to the `sliding_sync_joined_rooms_to_recalculate` table. 1. Making sure rows are fully inserted. Instead of partially inserting, we need to check if the row already exists and fully insert all data if not. All of this extra functionality can be removed once the `SCHEMA_COMPAT_VERSION` is bumped with support for the new sliding sync tables so people can no longer downgrade (the `N+2` part of the gradual migration). <details> <summary><sup>[1]</sup></summary> For `sliding_sync_joined_rooms`, since we partially insert rows as state comes in, we can't rely on the existence of the row for a given `room_id`. We can't even rely on looking at whether the background update has finished. There could still be partial rows from when someone reverted their Synapse version after the background update finished, had some state changes (or new rooms), then upgraded again and more state changes happen leaving a partial row. For `sliding_sync_membership_snapshots`, we insert items as a whole except for the `forgotten` column ~~so we can rely on rows existing and just need to always use a fallback for the `forgotten` data. We can't use the `forgotten` column in the table for the same reasons above about `sliding_sync_joined_rooms`.~~ We could have an out-of-date membership from when someone reverted their Synapse version. (same problems as outlined for `sliding_sync_joined_rooms` above) Discussed in an [internal meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.dz5x6ef4mxz7) </details> ### TODO - [x] Update `stream_ordering`/`bump_stamp` - [x] Handle remote invites - [x] Handle state resets - [x] Consider adding `sender` so we can filter `LEAVE` memberships and distinguish from kicks. - [x] We should add it to be able to tell leaves from kicks - [x] Consider adding `tombstone` state to help address https://github.com/element-hq/synapse/issues/17540 - [x] We should add it `tombstone_successor_room_id` - [x] Consider adding `forgotten` status to avoid extra lookup/table-join on `room_memberships` - [x] We should add it - [x] Background update to fill in values for all joined rooms and non-join membership - [x] Clean-up tables when room is deleted - [ ] Make sure tables are useful to our use case - First explored in https://github.com/element-hq/synapse/compare/erikj/ss_use_new_tables - Also explored in `76b5a576eb` - [x] Plan for how can we use this with a fallback - See plan discussed above in main area of the issue description - Discussed in an [internal meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.dz5x6ef4mxz7) - [x] Plan for how we can rely on this new table without a fallback - Synapse version `N+1`: (this PR) Bump `SCHEMA_VERSION` to `87`. Add new tables and background update to backfill all rows. Since this is a new table, we don't have to add any `NOT VALID` constraints and validate them when the background update completes. Read from new tables with a fallback in cases where the rows aren't filled in yet. - Synapse version `N+2`: Bump `SCHEMA_VERSION` to `88` and bump `SCHEMA_COMPAT_VERSION` to `87` because we don't want people to downgrade and miss writes while they are on an older version. Add a foreground update to finish off the backfill so we can read from new tables without the fallback. Application code can now rely on the new tables being populated. - Discussed in an [internal meeting](https://docs.google.com/document/d/1MnuvPkaCkT_wviSQZ6YKBjiWciCBFMd-7hxyCO-OCbQ/edit#bookmark=id.hh7shg4cxdhj) ### Dev notes ``` SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase SYNAPSE_POSTGRES=1 SYNAPSE_POSTGRES_USER=postgres SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.storage.test_events.SlidingSyncPrePopulatedTablesTestCase ``` ``` SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.handlers.test_sliding_sync.FilterRoomsTestCase ``` Reference: - [Development docs on background updates and worked examples of gradual migrations ](`1dfa59b238/docs/development/database_schema.md (background-updates)`) - A real example of a gradual migration: https://github.com/matrix-org/synapse/pull/15649#discussion_r1213779514 - Adding `rooms.creator` field that needed a background update to backfill data, https://github.com/matrix-org/synapse/pull/10697 - Adding `rooms.room_version` that needed a background update to backfill data, https://github.com/matrix-org/synapse/pull/6729 - Adding `room_stats_state.room_type` that needed a background update to backfill data, https://github.com/matrix-org/synapse/pull/13031 - Tables from MSC2716: `insertion_events`, `insertion_event_edges`, `insertion_event_extremities`, `batch_events` - `current_state_events` updated in `synapse/storage/databases/main/events.py` --- ``` persist_event (adds to queue) _persist_event_batch _persist_events_and_state_updates (assigns `stream_ordering` to events) _persist_events_txn _store_event_txn _update_metadata_tables_txn _store_room_members_txn _update_current_state_txn ``` --- > Concatenated Indexes [...] (also known as multi-column, composite or combined index) > > [...] key consists of multiple columns. > > We can take advantage of the fact that the first index column is always usable for searching > > -- https://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys --- Dealing with `portdb` (`synapse/_scripts/synapse_port_db.py`), https://github.com/element-hq/synapse/pull/17512#discussion_r1725998219 --- <details> <summary>SQL queries:</summary> Both of these are equivalent and work in SQLite and Postgres Options 1: ```sql WITH data_table (room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)}) AS ( VALUES ( ?, ?, ?, (SELECT membership FROM room_memberships WHERE event_id = ?), (SELECT stream_ordering FROM events WHERE event_id = ?), {", ".join("?" for _ in insert_values)} ) ) INSERT INTO sliding_sync_non_join_memberships (room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)}) SELECT * FROM data_table WHERE membership != ? ON CONFLICT (room_id, user_id) DO UPDATE SET membership_event_id = EXCLUDED.membership_event_id, membership = EXCLUDED.membership, event_stream_ordering = EXCLUDED.event_stream_ordering, {", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)} ``` Option 2: ```sql INSERT INTO sliding_sync_non_join_memberships (room_id, user_id, membership_event_id, membership, event_stream_ordering, {", ".join(insert_keys)}) SELECT column1 as room_id, column2 as user_id, column3 as membership_event_id, column4 as membership, column5 as event_stream_ordering, {", ".join("column" + str(i) for i in range(6, 6 + len(insert_keys)))} FROM ( VALUES ( ?, ?, ?, (SELECT membership FROM room_memberships WHERE event_id = ?), (SELECT stream_ordering FROM events WHERE event_id = ?), {", ".join("?" for _ in insert_values)} ) ) as v WHERE membership != ? ON CONFLICT (room_id, user_id) DO UPDATE SET membership_event_id = EXCLUDED.membership_event_id, membership = EXCLUDED.membership, event_stream_ordering = EXCLUDED.event_stream_ordering, {", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)} ``` If we don't need the `membership` condition, we could use: ```sql INSERT INTO sliding_sync_non_join_memberships (room_id, membership_event_id, user_id, membership, event_stream_ordering, {", ".join(insert_keys)}) VALUES ( ?, ?, ?, (SELECT membership FROM room_memberships WHERE event_id = ?), (SELECT stream_ordering FROM events WHERE event_id = ?), {", ".join("?" for _ in insert_values)} ) ON CONFLICT (room_id, user_id) DO UPDATE SET membership_event_id = EXCLUDED.membership_event_id, membership = EXCLUDED.membership, event_stream_ordering = EXCLUDED.event_stream_ordering, {", ".join(f"{key} = EXCLUDED.{key}" for key in insert_keys)} ``` </details> ### Pull Request Checklist <!-- Please read https://element-hq.github.io/synapse/latest/development/contributing_guide.html before submitting your pull request --> * [x] Pull request is based on the develop branch * [x] Pull request includes a [changelog file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog). The entry should: - Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from `EventStore` to `EventWorkerStore`.". - Use markdown where necessary, mostly for `code blocks`. - End with either a period (.) or an exclamation mark (!). - Start with a capital letter. - Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry. * [x] [Code style](https://element-hq.github.io/synapse/latest/code_style.html) is correct (run the [linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters)) --------- Co-authored-by: Erik Johnston <erik@matrix.org>	2024-08-29 16:09:51 +01:00
Erik Johnston	23740eaa3d	Correctly mention previous copyright (#16820 ) During the migration the automated script to update the copyright headers accidentally got rid of some of the existing copyright lines. Reinstate them.	2024-01-23 11:26:48 +00:00
Patrick Cloke	8e1e62c9e0	Update license headers	2023-11-21 15:29:58 -05:00
Patrick Cloke	9407d5ba78	Convert simple_select_list and simple_select_list_txn to return lists of tuples (#16505 ) This should use fewer allocations and improves type hints.	2023-10-26 13:01:36 -04:00
Patrick Cloke	3ac412b4e2	Require types in tests.storage. (#14646 ) Adds missing type hints to `tests.storage` package and does not allow untyped definitions.	2022-12-09 12:36:32 -05:00
reivilibre	e630722f11	Optimise `_update_client_ips_batch_txn` to batch together database operations. (#12252 ) Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>	2022-04-08 15:29:13 +01:00
Dirk Klimpel	d666fc02fa	Add type hints to some tests files (#12371 )	2022-04-05 13:54:41 +01:00
Richard van der Hoff	e24ff8ebe3	Remove `HomeServer.get_datastore()` (#12031 ) The presence of this method was confusing, and mostly present for backwards compatibility. Let's get rid of it. Part of #11733	2022-02-23 11:04:02 +00:00
Andrew Morgan	fe604a022a	Remove various bits of compatibility code for Python <3.6 (#9879 ) I went through and removed a bunch of cruft that was lying around for compatibility with old Python versions. This PR also will now prevent Synapse from starting unless you're running Python 3.6+.	2021-04-27 13:13:07 +01:00
Jonathan de Jong	4b965c862d	Remove redundant "coding: utf-8" lines (#9786 ) Part of #9744 Removes all redundant `# -- coding: utf-8 --` lines from files, as python 3 automatically reads source code as utf-8 now. `Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`	2021-04-14 15:34:27 +01:00
Richard van der Hoff	7b71695388	Combine the two sets of tests for CacheDescriptor	2020-10-21 15:38:29 +01:00
Richard van der Hoff	470dedd266	Combine the two sets of DeferredCache tests	2020-10-14 23:49:27 +01:00
Richard van der Hoff	4182bb812f	move DeferredCache into its own module	2020-10-14 23:38:14 +01:00
Richard van der Hoff	9f87da0a84	Rename Cache->DeferredCache	2020-10-14 23:38:14 +01:00
Patrick Cloke	c619253db8	Stop sub-classing object (#8249 )	2020-09-04 06:54:56 -04:00
Erik Johnston	a7bdf98d01	Rename database classes to make some sense (#8033 )	2020-08-05 21:38:57 +01:00
Amber Brown	7cb8b4bc67	Allow configuration of Synapse's cache without using synctl or environment variables (#6391 )	2020-05-11 18:45:23 +01:00
Patrick Cloke	509e381afa	Clarify list/set/dict/tuple comprehensions and enforce via flake8 (#6957 ) Ensure good comprehension hygiene using flake8-comprehensions.	2020-02-21 07:15:07 -05:00
Erik Johnston	756d4942f5	Move DB pool and helper functions into dedicated Database class	2019-12-05 10:46:37 +00:00
Erik Johnston	ee86abb2d6	Remove underscore from SQLBaseStore functions	2019-12-04 16:23:43 +00:00
Erik Johnston	326b3dace7	Make ObservableDeferred.observe() always return deferred. This makes it easier to use in an async/await world. Also fixes a bug where cache descriptors would occaisonally return a raw value rather than a deferred.	2019-10-30 11:35:46 +00:00
Amber Brown	a06614bd2a	UPSERT many functionality (#4644 )	2019-02-20 23:03:30 +11:00
black	8b3d9b6b19	Run black.	2018-08-10 23:54:09 +10:00
Amber Brown	b37c472419	Rename async to async_helpers because `async` is a keyword on Python 3.7 (#3678 )	2018-08-10 23:50:21 +10:00
Amber Brown	49af402019	run isort	2018-07-09 16:09:20 +10:00
Erik Johnston	29a4066a4d	Update test	2017-07-04 10:21:25 +01:00
Erik Johnston	eefd9fee81	Fix up tests	2017-03-30 14:14:46 +01:00
Erik Johnston	f85b6ca494	Speed up cache size calculation Instead of calculating the size of the cache repeatedly, which can take a long time now that it can use a callback, instead cache the size and update that on insertion and deletion. This requires changing the cache descriptors to have two caches, one for pending deferreds and the other for the actual values. There's no reason to evict from the pending deferreds as they won't take up any more memory.	2017-01-17 11:18:13 +00:00
Erik Johnston	45fd2c8942	Ensure invalidation list does not grow unboundedly	2016-08-19 16:09:16 +01:00
Erik Johnston	c0d7d9d642	Rename to on_invalidate	2016-08-19 15:13:58 +01:00
Erik Johnston	dc76a3e909	Make cache_context an explicit option	2016-08-19 15:02:38 +01:00
Erik Johnston	ba214a5e32	Remove lru option	2016-08-19 14:17:11 +01:00
Erik Johnston	4161ff2fc4	Add concept of cache contexts	2016-08-19 14:17:07 +01:00
Mark Haines	700487a7c7	Fix flake8 warnings for tests	2016-02-19 15:34:38 +00:00
Matthew Hodgson	6c28ac260c	copyrights	2016-01-07 04:26:29 +00:00
Erik Johnston	2df8dd9b37	Move all the caches into their own package, synapse.util.caches	2015-08-11 18:00:59 +01:00
Erik Johnston	b8e386db59	Change Cache to not use *args in its interface	2015-08-07 11:52:21 +01:00
Erik Johnston	7eea3e356f	Make @cached cache deferreds rather than the deferreds' values	2015-08-06 13:33:34 +01:00
Erik Johnston	d8866d7277	Caches should be bound to instances. Before, caches were global and so different instances of the stores would share caches. This caused problems in the unit tests.	2015-06-03 14:45:17 +01:00
Paul "LeoNerd" Evans	9ba6487b3f	Allow a choice of LRU behaviour for Cache() by using LruCache() or OrderedDict()	2015-03-25 19:05:34 +00:00
Paul "LeoNerd" Evans	7ab9f91a60	Unit-test that Cache() key eviction is ordered	2015-03-25 18:50:43 +00:00
Paul "LeoNerd" Evans	0f86312c4c	Pull out the cache logic from the @cached wrapper into its own class we can reuse	2015-03-20 18:25:42 +00:00
Paul "LeoNerd" Evans	f53fcbce97	Use cache.pop() instead of a separate membership test + del []	2015-02-23 18:30:45 +00:00
Paul "LeoNerd" Evans	e76d485e29	Allow @cached-wrapped functions to have a prefill method for setting entries	2015-02-23 15:41:54 +00:00
Paul "LeoNerd" Evans	ebc3db295b	Take named arguments to @cached() decorator, add a 'max_entries' limit	2015-02-19 18:36:02 +00:00

45 commits