Should fix the following issues or make a lot less worse when using
Postgres:
The main issue behind #2911: The client gives up after a certain time,
causing a cascade of context errors, because the response couldn't be
built up fast enough. This mostly happens on accounts with many rooms,
due to the inefficient way we're getting recent events and current state
For #2777: The queries for getting the membership events for history
visibility were being executed for each room (I think 185?), resulting
in a whooping 2k queries for membership events. (Getting the
statesnapshot -> block nids -> actual wanted membership event)
Both should now be better by:
- Using a LATERAL join to get all recent events for all joined rooms in
one go (TODO: maybe do the same for room summary and current state etc)
- If we're lazy loading on initial syncs, we're now not getting the
whole current state, just to drop the majority of it because we're lazy
loading members - we add a filter to exclude membership events on the
first call to `CurrentState`.
- Using an optimized query to get the membership events needed to
calculate history visibility
---------
Co-authored-by: kegsay <kegan@matrix.org>
This adds a new admin endpoint `/_dendrite/admin/purgeRoom/{roomID}`. It
completely erases all database entries for a given room ID.
The roomserver will start by clearing all data for that room and then
will generate an output event to notify downstream components (i.e. the
sync API and federation API) to do the same.
It does not currently clear media and it is currently not implemented
for SQLite since it relies on SQL array operations right now.
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
Co-authored-by: Till Faelligen <2353100+S7evinK@users.noreply.github.com>
The stale device lists table might contain entries for users we don't
share a room with anymore. This now asks the roomserver about left users
and removes those entries from the table.
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
Makes the following tests pass
```
/upgrade moves remote aliases to the new room
Local and remote users' homeservers remove a room from their public directory on upgrade
```
This optimizes history visibility checks by (mostly) avoiding database
hits.
Possibly solves https://github.com/matrix-org/dendrite/issues/2777
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
Adds `PUT
/_matrix/client/v3/directory/list/appservice/{networkId}/{roomId}` and
`DELTE
/_matrix/client/v3/directory/list/appservice/{networkId}/{roomId}`
support, as well as the ability to filter `/publicRooms` on networkID
and including all networks.
* Reprocess outliers that were previously rejected
* Might as well do all events this way
* More useful errors
* Fix queries
* Tweak condition
* Don't wrap errors
* Report more useful error
* Flatten error on `r.Queryer.QueryStateAfterEvents`
* Some more debug logging
* Flatten error in `QueryRestrictedJoinAllowed`
* Revert "Flatten error in `QueryRestrictedJoinAllowed`"
This reverts commit 1238b4184c.
* Tweak `QueryStateAfterEvents`
* Handle MissingStateError too
* Scope to room
* Clean up
* Fix the error
* Only apply rejection check to outliers
* Try optimising checking if server is allowed to see event
* Fix error
* Handle case where snapshot NID is 0
* Fix query
* Update SQL
* Clean up `CheckServerAllowedToSeeEvent`
* Not supported on SQLite
* Maybe placate the unit tests
* Review comments
* Try Ristretto cache
* Tweak
* It's beautiful
* Update GMSL
* More strict keyable interface
* Fix that some more
* Make less panicky
* Don't enforce mutability checks for now
* Determine mutability using deep equality
* Tweaks
* Namespace keys
* Make federation caches mutable
* Update cost estimation, add metric
* Update GMSL
* Estimate cost for metrics better
* Reduce counters a bit
* Try caching events
* Some guards
* Try again
* Try this
* Use separate caches for hopefully better hash distribution
* Fix bug with admitting events into cache
* Try to fix bugs
* Check nil
* Try that again
* Preserve order jeezo this is messy
* thanks VS Code for doing exactly the wrong thing
* Try this again
* Be more specific
* aaaaargh
* One more time
* That might be better
* Stronger sorting
* Cache expiries, async publishing of EDUs
* Put it back
* Use a shared cache again
* Cost estimation fixes
* Update ristretto
* Reduce counters a bit
* Clean up a bit
* Update GMSL
* 1GB
* Configurable cache sizees
* Tweaks
* Add `config.DataUnit` for specifying friendly cache sizes
* Various tweaks
* Update GMSL
* Add back some lazy loading caching
* Include key in cost
* Include key in cost
* Tweak max age handling, config key name
* Only register prometheus metrics if requested
* Review comments @S7evinK
* Don't return errors when creating caches (it is better just to crash since otherwise we'll `nil`-pointer exception everywhere)
* Review comments
* Update sample configs
* Update GHA Workflow
* Update Complement images to Go 1.18
* Remove the cache test from the federation API as we no longer guarantee immediate cache admission
* Don't check the caches in the renewal test
* Possibly fix the upgrade tests
* Update to matrix-org/gomatrixserverlib#322
* Update documentation to refer to Go 1.18
* Ensure we check powerlevel/origin before redacting an event
* Add passing test
* Use pl.UserLevel
* Make check more readable, also check for the sender
* Add Room Aliases tests
* Add Rooms table test
* Move StateKeyTuplerSorter to the types package
* Add StateBlock tests
Some optimizations
* Add State Snapshot tests
Some optimization
* Return []int64 and convert to pq.Int64Array for postgres
* Move []types.EventNID back to rows.Next()
* Update tests, rename SelectRoomIDs
* Assign room NIDs and state key NIDs outside of database transactions
* In roomserver storage package too
* Don't take a `txn` parameter, clean up SQLite
* Let's try to work out why this endpoint lies
* Try that again
* Fix `QueryPublishedRooms`
* Remove logging
* Remove unnecessary change
* Remove unnecessary change
Previously this error line would print because we were pulling out all user memberships, but now this is no longer necessary — an event state key that we don't know will no longer get passed to `SelectJoinedUsersSetForRooms` at all.
* Initial cut at fixing up MSC2946 to work with latest spec
* bugfix: send response back correctly
* Initial working version of MSC2946
* msc2946: handle suggested_only; remove custom database
As the MSC doesn't require reverse lookups, we can just pull
the room state and inspect via the roomserver database. To
handle this, expand QueryCurrentState to support wildcards.
Use all this and handle `?suggested_only`.
* Sort child rooms
* msc2946: Make TestClientSpacesSummary pass
* msc2946: allow invited rooms to be spidered
* msc2946: support basic federation requests
* fix up go mod
* Remove error when state keys are missing for user NIDs
There is still an actual bug here somewhere in the membership updater, but this check does more harm than good, since it means that the key consumers don't actually distribute updates to *anyone*. It's better just to deal with this silently for now.
To find these broken rows:
```
SELECT * FROM roomserver_membership AS m WHERE NOT EXISTS (
SELECT event_state_key_nid FROM roomserver_event_state_keys AS s
WHERE m.sender_nid = s.event_state_key_nid
);
```
* Logging
* Don't proactively cache event types and state keys when we don't know if the transaction has persisted yet
* Remove event type and state key caches altogether
* Only add events to `add_state_events` that haven't already been sent to the roomserver output before
* Filter on event NIDs instead, hopefully bring joy to SQLite
* UnsentFilter, review comments
* Ensure the input API only uses a single transaction
* Remove more of the dead query API call
* Tidy up
* Fix tests hopefully
* Don't do unnecessary work for rooms that don't exist
* Improve error, fix another case where transaction wasn't used properly
* Add a unit test for checking single transaction on RS input API
* Fix logic oops when deciding whether to use a transaction in storeEvent
* Check that we have a populated state snapshot when determining if we closed the gap
* Do the same in the query API
* Use HasState more opportunistically
* Try to avoid falling down the hole of using a trustworthy but empty state snapshot for non-create events
* Refactor missing state and make sure that we really solve the problem for the new event
* Comments
* Review comments
* Tweak that check again
* Tidy up that create check further
* Fix build hopefully
* Update sendOutliers to use OrderAuthAndStateEvents
* Don't go out of bounds on missingEvents
* Add transaction to all database tables in roomserver, rename latest events updater to room updater, use room updater for all RS input
* Better transaction management
* Tweak order
* Handle cases where the room does not exist
* Other fixes
* More tweaks
* Fill some gaps
* Fill in the gaps
* good lord it gets worse
* Don't roll back transactions when events rejected
* Pass through errors properly
* Fix bugs
* Fix incorrect error check
* Don't panic on nil txns
* Tweaks
* Hopefully fix panics for good in SQLite this time
* Fix rollback
* Minor bug fixes with latest event updater
* Some review comments
* Revert "Some review comments"
This reverts commit 0caf8cf53e.
* Fix a couple of bugs
* Clearer commit and rollback results
* Remove unnecessary prepares
The server ACL code on startup will grab all known rooms from
the rooms_table and then call `GetStateEvent` with each found
room ID to find the server ACL event. This can fail for stub
rooms, which will be present in the rooms table. Previously
this would result in an error being returned and the server
failing to start (!). Now we just return no event for stub
rooms.
* Generate m.room.canonical_alias instead of legacy m.room.aliases
* Add omitempty tags
* Add aliases endpoint to client API
* Check power levels when setting aliases
* Don't return null on /aliases
* Don't return error if the state event fails
* Update sytest-whitelist
* Don't send updated m.room.canonical_alias events
* Don't check PLs after all because for local aliases they are apparently irrelevant
* Fix some bugs
* Allow deleting a local alias with enough PL
* Fix some more bugs
* Update sytest-whitelist
* Fix copyright notices
* Review comments
* Add room membership and powerlevel checks for func SendBan
* Added non-error return to func GetStateEvent when no state events with the specified state key are found
* Add passing tests to whitelist
* Fixed formatting
* Update roomserver/storage/shared/storage.go
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
Co-authored-by: kegsay <kegan@matrix.org>
Co-authored-by: kegsay <kegsay@gmail.com>
* Add more optimised code path for checking if we're in a room
* Fix database queries
* Fix federation API test
* Fix logging
* Review comments
* Make separate API call for room membership