Emphasize the right reasons to use (room_id, event_id) in a schema (#13915)

* Emphasize the right reasons to use (room_id, event_id)

Follow-up to:
 - https://github.com/matrix-org/synapse/pull/13701
 - https://github.com/matrix-org/synapse/pull/13771
Eric Eastwood 2022-09-27 14:43:16 -05:00 committed by GitHub
parent f5aaa55e27
commit 35e9d6a616
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
2 changed files with 17 additions and 15 deletions

changelog.d/13915.doc Normal file

@@ -0,0 +1 @@
+Emphasize the right reasons when to use `(room_id, event_id)` in a database schema.


@@ -195,23 +195,24 @@ There are three separate aspects to this:
 ## `event_id` global uniqueness
 
-In room versions `1` and `2` it's possible to end up with two events with the
-same `event_id` (in the same or different rooms). After room version `3`, that
-can only happen with a hash collision, which we basically hope will never
-happen.
-
-There are several places in Synapse and even Matrix APIs like [`GET
+`event_id`'s can be considered globally unique although there has been a lot of
+debate on this topic in places like
+[MSC2779](https://github.com/matrix-org/matrix-spec-proposals/issues/2779) and
+[MSC2848](https://github.com/matrix-org/matrix-spec-proposals/pull/2848) which
+has no resolution yet (as of 2022-09-01). There are several places in Synapse
+and even in the Matrix APIs like [`GET
 /_matrix/federation/v1/event/{eventId}`](https://spec.matrix.org/v1.1/server-server-api/#get_matrixfederationv1eventeventid)
 where we assume that event IDs are globally unique.
 
-But hash collisions are still possible, and by treating event IDs as room
-scoped, we can reduce the possibility of a hash collision. When scoping
-`event_id` in the database schema, it should be also accompanied by `room_id`
-(`PRIMARY KEY (room_id, event_id)`) and lookups should be done through the pair
-`(room_id, event_id)`.
+When scoping `event_id` in a database schema, it is often nice to accompany it
+with `room_id` (`PRIMARY KEY (room_id, event_id)` and a `FOREIGN KEY(room_id)
+REFERENCES rooms(room_id)`) which makes flexible lookups easy. For example it
+makes it very easy to find and clean up everything in a room when it needs to be
+purged (no need to use sub-`select` query or join from the `events` table).
 
-There has been a lot of debate on this in places like
-https://github.com/matrix-org/matrix-spec-proposals/issues/2779 and
-[MSC2848](https://github.com/matrix-org/matrix-spec-proposals/pull/2848) which
-has no resolution yet (as of 2022-09-01).
+A note on collisions: In room versions `1` and `2` it's possible to end up with
+two events with the same `event_id` (in the same or different rooms). After room
+version `3`, that can only happen with a hash collision, which we basically hope
+will never happen (SHA256 has a massive big key space).
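
To illustrate the purge argument in the new wording, here is a minimal sketch using Python's built-in `sqlite3` module. It is not taken from Synapse: the `event_annotations` table, the `purge_room` helper, and the sample room and event IDs are all hypothetical, and the `rooms` table is reduced to a single column. It only shows that, with `room_id` stored next to `event_id`, a room can be cleaned up with one `DELETE` and no join against an `events` table.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a room-scoped table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE rooms (room_id TEXT PRIMARY KEY);

        -- Hypothetical table for illustration; not part of the Synapse schema.
        CREATE TABLE event_annotations (
            room_id  TEXT NOT NULL,
            event_id TEXT NOT NULL,
            note     TEXT NOT NULL,
            PRIMARY KEY (room_id, event_id),
            FOREIGN KEY (room_id) REFERENCES rooms(room_id)
        );
        """
    )
    return conn


def purge_room(conn: sqlite3.Connection, room_id: str) -> int:
    """Delete every row belonging to one room; no sub-select or join is needed."""
    cur = conn.execute(
        "DELETE FROM event_annotations WHERE room_id = ?", (room_id,)
    )
    return cur.rowcount


conn = build_demo_db()
conn.executemany(
    "INSERT INTO rooms (room_id) VALUES (?)",
    [("!a:example.org",), ("!b:example.org",)],
)
conn.executemany(
    "INSERT INTO event_annotations (room_id, event_id, note) VALUES (?, ?, ?)",
    [
        ("!a:example.org", "$1", "first"),
        ("!a:example.org", "$2", "second"),
        ("!b:example.org", "$3", "third"),
    ],
)

deleted = purge_room(conn, "!a:example.org")
remaining = conn.execute(
    "SELECT room_id, event_id FROM event_annotations ORDER BY event_id"
).fetchall()
print(deleted, remaining)
```

If the table were keyed on `event_id` alone, the same cleanup would first have to discover which events belong to the room, for instance through a sub-`select` on the `events` table, which is the extra step the documentation change points out.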