This should avoid 2 additional DB roundtrips if we don't want to use
relays.
So instead of possibly doing roughly 20k trips to the DB, we are now
"only" doing ~6600.
---------
Co-authored-by: devonh <devon.dmytro@gmail.com>
This should fix#3004 by making sure we also update our in-memory ACLs
after joining a new room.
Also makes use of more caching in `GetStateEvent`
Bonus: Adds some tests, as I was about to use `GetBulkStateContent`, but
turns out that `GetStateEvent` is basically doing the same, just that it
only gets the `eventTypeNID`/`eventStateKeyNID` once and not for every
call.
Use `IsBlacklistedOrBackingOff` from the federation API to check if we
should fetch devices.
To reduce back pressure, we now only queue retrying servers if there's
space in the channel.
There are cases where a dendrite instance is unaware of a pseudo ID for
a user, the user is not a member of that room. To represent this case,
we currently use the 'zero' value, which is often not checked and so
causes errors later down the line. To make this case more explict, and
to be consistent with `QueryUserIDForSender`, this PR changes this to
use a pointer (and `nil` to mean no sender ID).
Signed-off-by: `Sam Wedgwood <sam@wedgwood.dev>`
Background federated joins are currently broken since they timeout after
30s. This timeout didn't exist before the refactor. It should still exist but it needs to be extended to allow for the additional time it can take a server to generate the /send_join response when joining a complex room.
They are fundamentally different concepts, so should be represented as
such. Proto events are exchanged in /make_xxx calls over federation, and
made as "fledgling" events in /createRoom and general event sending.
*Building* events is a reasonably complex VERSION SPECIFIC process which
needs amongst other things, auth event providers, prev events, signing
keys, etc.
Requires https://github.com/matrix-org/gomatrixserverlib/pull/379
Requires https://github.com/matrix-org/gomatrixserverlib/pull/376
This has numerous upsides:
- Less type casting to `*Event` is required.
- Making Dendrite work with `PDU` interfaces means we can swap out Event
impls more easily.
- Tests which represent weird event shapes are easier to write.
Part of a series of refactors on GMSL.
We only use it in a few places currently, enough to get things to
compile and run. We should be using it in much more places.
Similarly, in some places we cast []PDU back to []*Event, we need to not
do that. Likewise, in some places we cast PDU to *Event, we need to not
do that. For now though, hopefully this is a start.
Replaced with types.HeaderedEvent _for now_. In reality we want to move
them all to gmsl.Event and only use HeaderedEvent when we _need_ to
bundle the version/event ID with the event (seriailsation boundaries,
and even then only when we don't have the room version).
Requires https://github.com/matrix-org/gomatrixserverlib/pull/373
As outlined in https://github.com/matrix-org/gomatrixserverlib/pull/368
The main change Dendrite side is that `RoomVersion` no longer has any
methods on it. Instead, you need to bounce via `gmsl.GetRoomVersion`.
It's very interesting to see where exactly Dendrite cares about this.
For some places it's creating events (fine) but others are way more
specific. Those areas will need to migrate to GMSL at some point.
This removes most of the code used for polylith/API mode.
This removes the `/api` internal endpoints entirely.
Binary size change roughly 5%:
```
51437560 Feb 13 10:15 dendrite-monolith-server # old
48759008 Feb 13 10:15 dendrite-monolith-server # new
```
This extends the dendrite monolith for pinecone to integrate the s&f
features into the mobile apps.
Also makes a few tweaks to federation queueing/statistics to make some
edge cases more robust.
This adds store & forward relays into dendrite for p2p.
A few things have changed:
- new relay api serves new http endpoints for s&f federation
- updated outbound federation queueing which will attempt to forward
using s&f if appropriate
- database entries to track s&f relays for other nodes
Adds wakeup broadcast handling to the pinecone demos.
This will reset their blacklist status and interrupt any ongoing
federation queue backoffs currently in progress for this peer.
The end result is that any queued events will quickly be sent to the
peer if they had disconnected while attempting to send events to them.
* Add `QueryRestrictedJoinAllowed`
* Add `Resident` flag to `QueryRestrictedJoinAllowedResponse`
* Check restricted joins on federation API
* Return `Restricted` to determine if the room was restricted or not
* Populate `AuthorisedVia` properly
* Sign the event on `/send_join`, return it in the `/send_join` response in the `"event"` key
* Kick back joins with invalid authorising user IDs, use event from `"event"` key if returned in `RespSendJoin`
* Use invite helper in `QueryRestrictedJoinAllowed`
* Only use users with the power to invite, change error bubbling a bit
* Placate the almighty linter
One day I will nuke `gocyclo` from orbit and everything in the world will be much better for it.
* Review comments
* Fix flakey sytest 'Local device key changes get to remote servers'
* Debug logs
* Remove internal/test and use /test only
Remove a lot of ancient code too.
* Use FederationRoomserverAPI in more places
* Use more interfaces in federationapi; begin adding regression test
* Linting
* Add regression test
* Unbreak tests
* ALL THE LOGS
* Fix a race condition which could cause events to not be sent to servers
If a new room event which rewrites state arrives, we remove all joined hosts
then re-calculate them. This wasn't done in a transaction so for a brief period
we would have no joined hosts. During this interim, key change events which arrive
would not be sent to destination servers. This would sporadically fail on sytest.
* Unbreak new tests
* Linting
* tidy up interfaces
* remove unused GetCreatorIDForAlias
* Add RoomserverUserAPI interface
* Define more interfaces
* Use AppServiceInternalAPI for consistent naming
* clean up federationapi constructor a bit
* Fix monolith in -http mode
* Initial cut at fixing up MSC2946 to work with latest spec
* bugfix: send response back correctly
* Initial working version of MSC2946
* msc2946: handle suggested_only; remove custom database
As the MSC doesn't require reverse lookups, we can just pull
the room state and inspect via the roomserver database. To
handle this, expand QueryCurrentState to support wildcards.
Use all this and handle `?suggested_only`.
* Sort child rooms
* msc2946: Make TestClientSpacesSummary pass
* msc2946: allow invited rooms to be spidered
* msc2946: support basic federation requests
* fix up go mod
* Use new event json types in gmsl
* Fix EventJSON to actually unmarshal events
* Update GMSL
* Bump GMSL and improve error messages
* Send back the correct RespState
* Update GMSL
* Put federation client functions into their own file
* Look for missing auth events in RS input
* Remove retrieveMissingAuthEvents from federation API
* Logging
* Sorta transplanted the code over
* Use event origin failing all else
* Don't get stuck on mutexes:
* Add verifier
* Don't mark state events with zero snapshot NID as not existing
* Check missing state if not an outlier before storing the event
* Reject instead of soft-fail, don't copy roominfo so much
* Use synchronous contexts, limit time to fetch missing events
* Clean up some commented out bits
* Simplify `/send` endpoint significantly
* Submit async
* Report errors on sending to RS input
* Set max payload in NATS to 16MB
* Tweak metrics
* Add `workerForRoom` for tidiness
* Try skipping unmarshalling errors for RespMissingEvents
* Track missing prev events separately to avoid calculating state when not possible
* Tweak logic around checking missing state
* Care about state when checking missing prev events
* Don't check missing state for create events
* Try that again
* Handle create events better
* Send create room events as new
* Use given event kind when sending auth/state events
* Revert "Use given event kind when sending auth/state events"
This reverts commit 089d64d271.
* Only search for missing prev events or state for new events
* Tweaks
* We only have missing prev if we don't supply state
* Room version tweaks
* Allow async inputs again
* Apply backpressure to consumers/synchronous requests to hopefully stop things being overwhelmed
* Set timeouts on roomserver input tasks (need to decide what timeout makes sense)
* Use work queue policy, deliver all on restart
* Reduce chance of duplicates being sent by NATS
* Limit the number of servers we attempt to reduce backpressure
* Some review comment fixes
* Tidy up a couple things
* Don't limit servers, randomise order using map
* Some context refactoring
* Update gmsl
* Don't resend create events
* Set stateIDs length correctly or else the roomserver thinks there are missing events when there aren't
* Exclude our own servername
* Try backing off servers
* Make excluding self behaviour optional
* Exclude self from g_m_e
* Update sytest-whitelist
* Update consumers for the roomserver output stream
* Remember to send outliers for state returned from /gme
* Make full HTTP tests less upsetti
* Remove 'If a device list update goes missing, the server resyncs on the next one' from the sytest blacklist
* Remove debugging test
* Fix blacklist again, remove unnecessary duplicate context
* Clearer contexts, don't use background in case there's something happening there
* Don't queue up events more than once in memory
* Correctly identify create events when checking for state
* Fill in gaps again in /gme code
* Remove `AuthEventIDs` from `InputRoomEvent`
* Remove stray field
Co-authored-by: Kegan Dougal <kegan@matrix.org>