* Add transaction to all database tables in roomserver, rename latest events updater to room updater, use room updater for all RS input
* Better transaction management
* Tweak order
* Handle cases where the room does not exist
* Other fixes
* More tweaks
* Fill some gaps
* Fill in the gaps
* good lord it gets worse
* Don't roll back transactions when events rejected
* Pass through errors properly
* Fix bugs
* Fix incorrect error check
* Don't panic on nil txns
* Tweaks
* Hopefully fix panics for good in SQLite this time
* Fix rollback
* Minor bug fixes with latest event updater
* Some review comments
* Revert "Some review comments"
This reverts commit 0caf8cf53e.
* Fix a couple of bugs
* Clearer commit and rollback results
* Remove unnecessary prepares
* PerformInvite: bugfix and rejig control flow
Local clients would not be notified of invites to rooms
Dendrite had already joined in all cases due to not returning
an `api.OutputNewInviteEvent` for local invites. We now do this.
This was an easy mistake to make due to the control flow of the
function which doesn't handle the happy case at the end of the
function and instead forks the function depending on if the
invite was via federation or not. This has now been changed to
handle the federated invite as if it were an error (in that we
check it, do it and bail out) rather than outstay our welcome.
This ends up with the local invite being the happy case, which
now both sends an `InputRoomEvent` to the roomserver _and_ a
`api.OutputNewInviteEvent` is returned.
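This is a minimal Go sketch of the reshaped control flow described above. `InviteRequest`, `OutputEvent` and the two callbacks are hypothetical simplifications, not Dendrite's real types or API. What "doing" the federated invite actually involves is hidden behind a callback.

```go
package main

import "fmt"

// InviteRequest is a hypothetical, heavily simplified stand-in for the real
// request type: only the fields the control flow needs are modelled.
type InviteRequest struct {
	EventID     string
	IsFederated bool // true when the invitee lives on a remote server
}

// OutputEvent is a hypothetical placeholder for api.OutputNewInviteEvent.
type OutputEvent struct {
	EventID string
}

// performInvite handles the federated invite like an error path (check it,
// do it, bail out) so that the local invite is the happy case at the end.
func performInvite(
	req InviteRequest,
	sendFederatedInvite func(eventID string) error, // hypothetical hook
	sendInputRoomEvent func(eventID string) error, // hypothetical hook
) ([]OutputEvent, error) {
	if req.IsFederated {
		if err := sendFederatedInvite(req.EventID); err != nil {
			return nil, fmt.Errorf("federated invite: %w", err)
		}
		return nil, nil // bail out early: nothing else to do here
	}

	// Local invite: send the event to the roomserver *and* return an
	// output event so that local clients are notified of the invite.
	if err := sendInputRoomEvent(req.EventID); err != nil {
		return nil, fmt.Errorf("roomserver input: %w", err)
	}
	return []OutputEvent{{EventID: req.EventID}}, nil
}
```

Because the happy path is the last thing in the function, forgetting to return the output event for local invites becomes much harder to miss.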
* Don't send invite pokes in PerformInvite
* Move event ID into logger
* Improve server selection somewhat
* Remove things from the map when we're done
* Be less panicky about auth event signatures in case they are not fatal after all
* Accept HasState in all cases
* Send join asynchronously
* Revert "Send join asynchronously"
This reverts commit 5b685bfcd0.
* Joins and leaves use background context
* Put federation client functions into their own file
* Look for missing auth events in RS input
* Remove retrieveMissingAuthEvents from federation API
* Logging
* Sorta transplanted the code over
* Use event origin failing all else
* Don't get stuck on mutexes
* Add verifier
* Don't mark state events with zero snapshot NID as not existing
* Check missing state if not an outlier before storing the event
* Reject instead of soft-fail, don't copy roominfo so much
* Use synchronous contexts, limit time to fetch missing events
* Clean up some commented out bits
* Simplify `/send` endpoint significantly
* Submit async
* Report errors on sending to RS input
* Set max payload in NATS to 16MB
* Tweak metrics
* Add `workerForRoom` for tidiness
* Try skipping unmarshalling errors for RespMissingEvents
* Track missing prev events separately to avoid calculating state when not possible
* Tweak logic around checking missing state
* Care about state when checking missing prev events
* Don't check missing state for create events
* Try that again
* Handle create events better
* Send create room events as new
* Use given event kind when sending auth/state events
* Revert "Use given event kind when sending auth/state events"
This reverts commit 089d64d271.
* Only search for missing prev events or state for new events
* Tweaks
* We only have missing prev if we don't supply state
* Room version tweaks
* Allow async inputs again
* Apply backpressure to consumers/synchronous requests to hopefully stop things being overwhelmed
* Set timeouts on roomserver input tasks (need to decide what timeout makes sense)
* Use work queue policy, deliver all on restart
* Reduce chance of duplicates being sent by NATS
* Limit the number of servers we attempt to reduce backpressure
* Some review comment fixes
* Tidy up a couple things
* Don't limit servers, randomise order using map
* Some context refactoring
* Update gmsl
* Don't resend create events
* Set stateIDs length correctly or else the roomserver thinks there are missing events when there aren't
* Exclude our own servername
* Try backing off servers
* Make excluding self behaviour optional
* Exclude self from g_m_e
* Update sytest-whitelist
* Update consumers for the roomserver output stream
* Remember to send outliers for state returned from /gme
* Make full HTTP tests less upsetti
* Remove 'If a device list update goes missing, the server resyncs on the next one' from the sytest blacklist
* Remove debugging test
* Fix blacklist again, remove unnecessary duplicate context
* Clearer contexts, don't use background in case there's something happening there
* Don't queue up events more than once in memory
* Correctly identify create events when checking for state
* Fill in gaps again in /gme code
* Remove `AuthEventIDs` from `InputRoomEvent`
* Remove stray field
Co-authored-by: Kegan Dougal <kegan@matrix.org>
The server ACL code on startup will grab all known rooms from
the rooms_table and then call `GetStateEvent` with each found
room ID to find the server ACL event. This can fail for stub
rooms, which will be present in the rooms table. Previously
this would result in an error being returned and the server
failing to start (!). Now we just return no event for stub
rooms.
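Here is a minimal Go sketch of that behaviour. It assumes a hypothetical in-memory `room` type in place of the real rooms table and storage interface, and a simplified `getStateEvent` rather than the real `GetStateEvent` signature.

```go
package main

import "fmt"

// stateKeyTuple identifies a piece of room state (hypothetical type).
type stateKeyTuple struct{ EventType, StateKey string }

// room is a hypothetical, simplified row from the rooms table. A stub room
// is known to the server but has no state stored for it yet.
type room struct {
	IsStub bool
	State  map[stateKeyTuple]string // tuple -> event JSON
}

// getStateEvent returns (nil, nil) rather than an error for stub rooms and
// for rooms that simply lack the requested state event.
func getStateEvent(rooms map[string]*room, roomID, evType, stateKey string) (*string, error) {
	r, ok := rooms[roomID]
	if !ok {
		return nil, fmt.Errorf("room %s not found", roomID)
	}
	if r.IsStub {
		return nil, nil // stub room: no state, but not an error either
	}
	ev, ok := r.State[stateKeyTuple{evType, stateKey}]
	if !ok {
		return nil, nil
	}
	return &ev, nil
}

// loadServerACLs mirrors the startup path: look up the server ACL event
// for every known room, skipping rooms that have none.
func loadServerACLs(rooms map[string]*room) (map[string]string, error) {
	acls := map[string]string{}
	for roomID := range rooms {
		ev, err := getStateEvent(rooms, roomID, "m.room.server_acl", "")
		if err != nil {
			return nil, err // before the fix, stub rooms ended up here
		}
		if ev != nil {
			acls[roomID] = *ev
		}
	}
	return acls, nil
}
```

Returning `(nil, nil)` lets the startup loop treat a stub room exactly like a room that has no ACL event.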
* Use named NATS durable consumers
* Build fixes
* Remove dupe call to SetFederationAPI
* Use namespaced consumer name
* Fix namespacing
* Fix unit tests hopefully
* Add NATS JetStream support
Update shopify/sarama
* Fix addresses
* Don't change Addresses in Defaults
* Update saramajetstream
* Add missing error check
Keep typing events for at least one minute
* Use all configured NATS addresses
* Update saramajetstream
* Try setting up with NATS
* Make sure NATS uses own persistent directory (TODO: make this configurable)
* Update go.mod/go.sum
* Jetstream package
* Various other refactoring
* Build fixes
* Config tweaks, make random jetstream storage path for CI
* Disable interest policies
* Try to set a sane default for the JetStream base path
* Try to use in-memory for CI
* Restore storage/retention
* Update nats.go dependency
* Adapt changes to config
* Remove unneeded TopicFor
* Dep update
* Revert "Remove unneeded TopicFor"
This reverts commit f5a4e4a339.
* Revert changes made to streams
* Fix build problems
* Update nats-server
* Update go.mod/go.sum
* Roomserver input API queuing using NATS
* Fix topic naming
* Prometheus metrics
* More refactoring to remove saramajetstream
* Add missing topic
* Don't try to populate map that doesn't exist
* Roomserver output topic
* Update go.mod/go.sum
* Message acknowledgements
* Ack tweaks
* Try to resume transaction re-sends
* Try to resume transaction re-sends
* Update to matrix-org/gomatrixserverlib@91dadfb
* Remove internal.PartitionStorer from components that don't consume keychanges
* Try to reduce re-allocations a bit in resolveConflictsV2
* Tweak delivery options on RS input
* Publish send-to-device messages into correct JetStream subject
* Async and sync roomserver input
* Update dendrite-config.yaml
* Remove roomserver tests for now (they need rewriting)
* Remove roomserver test again (was merged back in)
* Update documentation
* Docker updates
* More Docker updates
* Update Docker readme again
* Fix lint issues
* Send final event in `processEvent` synchronously (since this might stop Sytest from being so upset)
* Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that
* Go 1.16 instead of Go 1.13 for upgrade tests and Complement
* Revert "Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that"
This reverts commit 368675283f.
* Don't report any errors on `/send` to see what fun that creates
* Fix panics on closed channel sends
* Enforce state key matches sender
* Do the same for leave
* Various tweaks to make tests happier
Squashed commit of the following:
commit 13f9028e7a
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Tue Jan 4 15:47:14 2022 +0000
Do the same for leave
commit e6be7f05c3
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Tue Jan 4 15:33:42 2022 +0000
Enforce state key matches sender
commit 85ede6d64b
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Tue Jan 4 14:07:04 2022 +0000
Fix panics on closed channel sends
commit 9755494a98
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Tue Jan 4 13:38:22 2022 +0000
Don't report any errors on `/send` to see what fun that creates
commit 3bb4f87b5d
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Tue Jan 4 13:00:26 2022 +0000
Revert "Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that"
This reverts commit 368675283f.
commit fe2673ed7b
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Tue Jan 4 12:09:34 2022 +0000
Go 1.16 instead of Go 1.13 for upgrade tests and Complement
commit 368675283f
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Tue Jan 4 11:51:45 2022 +0000
Don't report event rejection errors via `/send`, since apparently this is upsetting tests that don't expect that
commit b028dfc085
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Tue Jan 4 10:29:08 2022 +0000
Send final event in `processEvent` synchronously (since this might stop Sytest from being so upset)
* Merge in NATS Server v2.6.6 and nats.go v1.13 into the in-process connection fork
* Add `jetstream.WithJetStreamMessage` to make ack/nak-ing less messy, use process context in consumers
* Fix consumer component name in federation API
* Add comment explaining where streams are defined
* Tweaks to roomserver input with comments
* Finish that sentence that I apparently forgot to finish in INSTALL.md
* Bump version number of config to 2
* Add comments around asynchronous sends to roomserver in processEventWithMissingState
* More useful error message when the config version does not match
* Set version in generate-config
* Fix version in config.Defaults
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Initial federation sender -> federation API refactoring
* Move base into own package, avoids import cycle
* Fix build errors
* Fix tests
* Add signing key server tables
* Try to fold signing key server into federation API
* Fix dendritejs builds
* Update embedded interfaces
* Fix panic, fix lint error
* Update configs, docker
* Rename some things
* Reuse same keyring on the implementing side
* Fix federation tests, `NewBaseDendrite` can accept freeform options
* Fix build
* Update create_db, configs
* Name tables back
* Don't rename federationsender consumer for now
* Add more logs
To help debug the migration issue in #1924 along with manual data-loss-inducing fixes.
Also log the origin server on processed txns to help debug buggy server origins.
* Fix query
* Do not store 'null' in the database for empty JSON arrays
This can cause issues, though it should be noted that the majority
of the time this will marshal/unmarshal just fine, see
https://play.golang.org/p/Doe2NZUgv7Q
* bugfix: sqlite migration should handle create events as having no 'before' snapshot
The state snapshot for any given event in the roomserver represents the state _before_
the event. For the create event, this is nothing, so the state snapshot nid should be 0.
In some cases this wasn't happening, resulting in a nice mix of possible options including:
- A state snapshot without any state blocks `[]` or `null`.
- A state snapshot with a single state block with a single event, the create event, causing
a circular loop. This is incorrect as it represents the state before the event, not after.
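A sketch of the rule and a matching sanity check, assuming a hypothetical `migratedEvent` row type (the real migration works on database rows rather than Go structs):

```go
package main

import "fmt"

// migratedEvent is a hypothetical, simplified event row. SnapshotNID is the
// NID of the state snapshot *before* the event.
type migratedEvent struct {
	Type        string
	StateKey    *string // nil for non-state events
	SnapshotNID int64
}

// isCreateEvent reports whether ev is the room's m.room.create event.
func isCreateEvent(ev migratedEvent) bool {
	return ev.Type == "m.room.create" && ev.StateKey != nil && *ev.StateKey == ""
}

// fixBeforeSnapshot forces the "before" snapshot of a create event to 0,
// since no state precedes the create event in a room.
func fixBeforeSnapshot(ev migratedEvent) migratedEvent {
	if isCreateEvent(ev) {
		ev.SnapshotNID = 0
	}
	return ev
}

// checkBeforeSnapshot is the matching assertion: a create event with a
// non-zero before-snapshot (e.g. one containing itself) is an error.
func checkBeforeSnapshot(ev migratedEvent) error {
	if isCreateEvent(ev) && ev.SnapshotNID != 0 {
		return fmt.Errorf("create event has before-snapshot NID %d, want 0", ev.SnapshotNID)
	}
	return nil
}
```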
* Add state key check
* Topologically sort outliers in SendEventWithState
* Knock in membership updater
* Update gomatrixserverlib
* Update gomatrixserverlib
* Get the NID of the knock event properly for the membership updater
* Generate m.room.canonical_alias instead of legacy m.room.aliases
* Add omitempty tags
* Add aliases endpoint to client API
* Check power levels when setting aliases
* Don't return null on /aliases
* Don't return error if the state event fails
* Update sytest-whitelist
* Don't send updated m.room.canonical_alias events
* Don't check PLs after all because for local aliases they are apparently irrelevant
* Fix some bugs
* Allow deleting a local alias with enough PL
* Fix some more bugs
* Update sytest-whitelist
* Fix copyright notices
* Review comments
* Add room membership and powerlevel checks for func SendBan
* Added non-error return to func GetStateEvent when no state events with the specified state key are found
* Add passing tests to whitelist
* Fixed formatting
* Update roomserver/storage/shared/storage.go
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
Co-authored-by: kegsay <kegan@matrix.org>
Co-authored-by: kegsay <kegsay@gmail.com>
* bugfix: retire invites even when we cannot talk to the remote server to make/send_leave
Also modify the leave response in /sync to include a fake event as this is ultimately
what clients (and sytest) will use to determine leave-ness.
* hash the event ID
* Base64 not hex
* Add more optimised code path for checking if we're in a room
* Fix database queries
* Fix federation API test
* Fix logging
* Review comments
* Make separate API call for room membership
* db migration: fix #1844 and add additional assertions
- Migration scripts will now check to see if there are any unconverted
snapshot IDs and fail the migration if there are any. This should
prevent people from getting a corrupt database in the event the root
cause is still unknown.
- Add an ORDER BY clause when doing batch queries in the postgres
migration. LIMIT and OFFSET without ORDER BY are undefined and must
not be relied upon to produce a deterministic ordering (e.g. row order).
See https://www.postgresql.org/docs/current/queries-limit.html
* Linting
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Hash-deduplicated state storage (and migrations) for PostgreSQL and SQLite
* Refactor roomserver database setup for migrations
* Fix conflict statements
* Update migration names
* Set a boundary for old to new block/snapshot IDs so we don't rewrite them more than once accidentally
* Create sequence if not exists
* Fix boundary queries
* Fix boundary queries
* Use Query
* Break out queries a bit
* More sequence tweaks
* Query parameters are not playing the game
* Injection escaping may not work for CREATE SEQUENCE after all
* Fix snapshot sequence name
* Use boundaried IDs in SQLite too
* Use IFNULL for SQLite
* Use COALESCE in PostgreSQL
* Review comments @Kegsay
* Check membership of room
* Use QueryStateAfterEventsResponse
* Fix complexity
* Add field ShouldHitAppservice to GetRoomIDForAlias
* Hit appservice when trying to join a non-existent alias
* remove unused
* Changes that I made a long time ago
* Rename to appserviceJoinedAtEvent
* Check membership in GetMemberships
* Update QueryMembershipsForRoom
* Tweaks in client API
* Update appserviceJoinedAtEvent
* Comments
* Try QueryMembershipForUser instead
* Undo some changes to client API that shouldn't be needed
* More /event tweaks
* Refactor /event bit
* Go back to QueryMembershipsForRoom because appservices are hard
* Fix bugs in onMessage
* Add comments
* More logical naming, clean up a bit
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Check membership of room
* Use QueryStateAfterEventsResponse
* Fix complexity
* Changes that I made a long time ago
* Rename to appserviceJoinedAtEvent
* Check membership in GetMemberships
* Update QueryMembershipsForRoom
* Tweaks in client API
* Update appserviceJoinedAtEvent
* Comments
* Try QueryMembershipForUser instead
* Undo some changes to client API that shouldn't be needed
* More /event tweaks
* Refactor /event bit
* Go back to QueryMembershipsForRoom because appservices are hard
* Fix bugs in onMessage
* Add comments
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Look up servers less often, don't hit API for missing auth events unless there are actually missing auth events
* Remove ResolveConflictsAdhoc (since it is already in GMSL), other tweaks
* Update gomatrixserverlib to matrix-org/gomatrixserverlib#254
* Fix resolve-state
* Initialise t.servers on first use
MSC1772 needs this because the create event contains information on whether
the room is a space or not. The create event itself isn't sensitive,
so other people may find this useful too.
* a very very WIP first cut of peeking via MSC2753.
doesn't yet compile or work.
needs to actually add the peeking block into the sync response.
checking in now before it gets any bigger, and to gather any initial feedback on the vague shape of it.
* make PeekingDeviceSet private
* add server_name param
* blind stab at adding a `peek` section to /sync
* make it build
* make it launch
* add peeking to getResponseWithPDUsForCompleteSync
* cancel any peeks when we join a room
* spell out how to run outside of docker if you want speed
* fix SQL
* remove unnecessary txn for SelectPeeks
* fix s/join/peek/ cargo-cult fail
* HACK: Track goroutine IDs to determine when we write by the wrong thread
To use: set `DENDRITE_TRACE_SQL=1` then grep for `unsafe`
* Track partition offsets and only log unsafe for non-selects
* Put redactions in the writer goroutine
* Update filters on writer goroutine
* wrap peek storage in goid hack
* use exclusive writer, and MarkPeeksAsOld more efficiently
* don't log ascii in binary at sql trace...
* strip out empty room deltas
* re-add txn to SelectPeeks
* re-add accidentally deleted field
* reject peeks for non-worldreadable rooms
* move perform_peek
* fix package
* correctly refactor perform_peek
* WIP of implementing MSC2444
* typo
* Revert "Merge branch 'kegan/HACK-goid-sqlite-db-is-locked' into matthew/peeking"
This reverts commit 3cebd8dbfb, reversing
changes made to ed4b3a58a7.
* (almost) make it build
* clean up bad merge
* support SendEventWithState with optional event
* fix build & lint
* fix build & lint
* reinstate federated peeks in the roomserver (doh)
* fix sql thinko
* todo for authenticating state returned by /peek
* support returning current state from QueryStateAndAuthChain
* handle SS /peek
* reimplement SS /peek to prod the RS to tell the FS about the peek
* rename RemotePeeks as OutboundPeeks
* rename remote_peeks_table as outbound_peeks_table
* add perform_handle_remote_peek.go
* flesh out federation doc
* add inbound peeks table and hook it up
* rename ambiguous RemotePeek as InboundPeek
* rename FSAPI's PerformPeek as PerformOutboundPeek
* setup inbound peeks db correctly
* fix api.SendEventWithState with no event
* track latestevent on /peek
* go fmt
* document the peek send stream race better
* fix SendEventWithRewrite not to bail if handed a non-state event
* add fixme
* switch SS /peek to use SendEventWithRewrite
* fix comment
* use reverse topo ordering to find latest extrem
* support postgres for federated peeking
* go fmt
* back out bogus go.mod change
* Fix performOutboundPeekUsingServer
* Fix getAuthChain -> GetAuthChain
* Fix build issues
* Fix build again
* Fix getAuthChain -> GetAuthChain
* Don't repeat outbound peeks for the same room ID to the same servers
* Fix lint
* Don't omitempty to appease sytest
Co-authored-by: Kegan Dougal <kegan@matrix.org>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Light-weight checking of state changes when updating forward extremities
* Only do this for non-state events, since state events will always result in state change at extremities
Squashed commit of the following:
commit e5e2d793119733ecbcf9b85f966e018ab0318741
Author: Neil Alexander <neilalexander@users.noreply.github.com>
Date: Wed Jan 13 17:28:12 2021 +0000
Add dendrite_roomserver_processroomevent_duration_millis to prometheus
* Add RoomInfo cache, remove RoomServerRoomNID cache, ensure caches are thread-safe
* Don't panic if the roomInfo isn't known yet
* LRU package is already thread-safe
* Use RoomInfo cache to find room version if possible in Events()
* Adding comments about RoomInfoCache safety
* Hit the database far less to find room NIDs for event NIDs
* Close the rows
* Fix SQLite selectRoomNIDsForEventNIDsSQL
* Give same treatment to room version lookups
* Update GMSL
* Add MSC2836EventRelationships to fedsender
* Call MSC2836EventRelationships in reqCtx
* auth remote servers
* Extract room ID and servers from previous events; refactor a bit
* initial cut of federated threading
* Use the right client/fed struct in the response
* Add QueryAuthChain for use with MSC2836
* Add auth chain to federated response
* Fix pointers
* under CI: more logging and enable mscs, nil fix
* Handle direction: up
* Actually send message events to the roomserver
* Add children and children_hash to unsigned, with tests
* Add logic for exploring threads and tracking children; missing storage functions
* Implement storage functions for children
* Add fetchUnknownEvent
* Do federated hits for include_children if we have unexplored children
* Use /ev_rel rather than /event as the former includes child metadata
* Remove cross-room threading impl
* Enable MSC2836 in the p2p demo
* Namespace mscs db
* Enable msc2836 for ygg
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Don't recalculate event IDs so often
* Revert invite change
* Make sure we're using the right NIDs
* Update gomatrixserverlib
* Update to NewEventFromTrustedJSONWithEventID
* Fix go.mod
* Update gomatrixserverlib to matrix-org/gomatrixserverlib#243
* Use BulkSelectEventID
* Add mscs/hooks package, begin work for msc2836
* Flesh out hooks and add SQL schema
* Begin implementing core msc2836 logic
* Add test harness
* Linting
* Implement visibility checks; stub out APIs for tests
* Flesh out testing
* Flesh out walkThread a bit
* Persist the origin_server_ts as well
* Edges table instead of relationships
* Add nodes table for event metadata
* LEFT JOIN to extract origin_server_ts for children
* Add graph walking structs
* Implement walking algorithm
* Add more graph walking tests
* Add auto_join for local rooms
* Fix create table syntax on postgres
* Add relationship_room_id|servers to the unsigned section of events
* Persist the parent room_id/servers in edge metadata
Other events cannot assert the true room_id/servers for the
parent event, only make claims to them, hence why this is
edge metadata.
* guts to pass through room_id/servers
* Refactor msc2836 to allow handling from federation
* Add JoinedVia to PerformJoin responses
* Fix tests; review comments
The previous implementation was only checking if room history was
"shared", which it wasn't for rooms where a user was invited, or for
world-readable rooms.
This implementation leverages the IsServerAllowed method, which already
implements the complete verification algorithm.
Signed-off-by: Mayeul Cantan <oss+matrix@mayeul.net>
Co-authored-by: Kegsay <kegan@matrix.org>
* Add basic storage methods
* Add internal api handler
* Add check for forgotten room
* Add /rooms/{roomID}/forget endpoint
* Add missing rsAPI method
* Remove unused parameters
* Add passing tests
Signed-off-by: Till Faelligen <tfaelligen@gmail.com>
* Add missing file
* Add postgres migration
* Add sqlite migration
* Use Forgetter to forget room
* Remove empty line
* Update HTTP status codes
It looks like the spec calls for these to be 400, rather than 403: https://matrix.org/docs/spec/client_server/r0.6.1#post-matrix-client-r0-rooms-roomid-forget
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Add configuration for max_message_bytes for sarama
* Log all errors when sending multiple messages
Signed-off-by: Till Faelligen <tfaelligen@gmail.com>
* Add missing config
* Better comments on what MaxMessageBytes is used for
Also sets the size the consumer may use
* Set RewritesState once
* Check if any new state provided
* Obey rewritesState
* Don't nuke everything the sync API knows when purging state
* Fix panic from duplicate insert
* Consistency
* Use HasState
* Remove nolint
* Clean up joined rooms on state rewrite
* Add resolve-state helper
* Tweaks
* Refactor forward extremities, again
* Tweaks
* Minor optimisation
* Make path a bit clearer
* Only process state/membership if forward extremities have changed
* Usage comments in resolve-state
* Fix SQLite locking bugs present on sytest
Comments do the explaining.
* Fix deadlock in sqlite mode
Caused by starting a writer whilst within a writer
* Only complain about invalid state deltas for non-overwrite events
* Do not re-process outlier unnecessarily
* Add KindOld
* Don't process latest events/memberships for old events
* Allow federationsender to ignore duplicate key entries when LatestEventIDs is duplicated by RS output events
* Signal to downstream components if an event has become a forward extremity
* Don't exclude from sync
* Soft-fail checks on KindNew
* Don't run the latest events updater at all for KindOld
* Don't make federation sender change after all
* Kind in federation sender join
* Don't send isForwardExtremity
* Fix syncapi
* Update comments
* Fix SendEventWithState
* Update sytest-whitelist
* Generate old output events
* Sync API consumes old room events
* Update comments
* Start Kafka connection for each component that needs one
* Fix roomserver unit tests
* Rename to naffkaInstance (@Kegsay review comment)
* Fix import cycle
* Capture errors
* Don't request only state key tuples needed for auth (we end up discarding room state this way)
* QueryStateAfterEvent returns all state when no tuples supplied
* Resolve state
* Comments
* Adjust backfill to send backward extremity with state before other backfilled events, include prev_events with no state amongst missing events
* Not finished refactor
* Fix test
* Remove isInboundTxn
* Remove debug logging
We expect to have missing events as we walk back in the DAG over federation
as we didn't always create the room. When checking if the server is allowed
to see those events, just give up and stop rather than fail the request.
* Rename serverkeyapi to signingkeyserver
We use "api" for public facing stuff and "server" for internal stuff.
As the server key API is internal only, we call it 'signing key server',
which also clarifies the type of key (as opposed to TLS keys, E2E keys, etc.)
* Convert docker/scripts to use signing-key-server
* Rename missed bits
* Deep forward extremity calculation
* Use updater txn
* Update error
* Update error
* Create previous event references in StoreEvent
* Use latest events updater to row-lock prev events
* Fix unexpected fallthrough
* Fix deadlock
* Don't roll back
* Update comments in calculateLatest
* Don't include events that we can't find references for in the forward extremities
* Add another passing test
* Don't send rewrite events
* Remove final traces of rewrite events
* Remove test that is no longer needed
* Revert "Remove test that is no longer needed"
This reverts commit 9a45babff6.
* Update test to use KindOutlier
* Resolve state after event against current room state when determining latest state changes
* Update sytest-whitelist
* Update sytest-whitelist, blacklist
* Try to ask other servers in the room for missing events if the origin won't provide them
* Logging
* More logging
* Implement QueryMissingAuthPrevEvents
* Try to get missing auth events badly
* Use processEvent
* Logging
* Update QueryMissingAuthPrevEvents
* Try to find missing auth events
* Patchy fix for test
* Logging tweaks
* Send auth events as outliers
* Update check in QueryMissingAuthPrevEvents
* Error responses
* More return codes
* Don't return error on reject/soft-fail since it was ultimately handled
* More tweaks
* More error tweaks
* Sanity-check room version on RS event input
* Update gomatrixserverlib
* Reject make_join when no room members are left
* Revert some changes from wrong branch
* Distinguish between room not existing and room being abandoned on this server
* nolint
* Replace all usages of txn.Stmt with sqlutil.TxStmt
Signed-off-by: Sam Day <me@samcday.com>
* Fix sign off link in PR template.
Signed-off-by: Sam Day <me@samcday.com>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* WIP Event rejection
* Still send back errors for rejected events
Instead, discard them at the federationapi /send layer rather than
re-implementing checks at the clientapi/PerformJoin layer.
* Implement rejected events
Critically, rejected events CAN cause state resolution to happen
as it can merge forks in the DAG. This is fine, _provided_ we
do not add the rejected event when performing state resolution,
which is what this PR does. It also fixes the error handling
when NotAllowed happens, as we were checking too early and needlessly
handling NotAllowed in more than one place.
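A sketch of that rule, assuming hypothetical `event`/`stateMap` types and a `resolve` callback standing in for the real state resolution algorithm. The forks at a rejected event are still merged; only the rejected event itself is left out.

```go
package main

// stateKey identifies a piece of room state (hypothetical type).
type stateKey struct{ Type, StateKey string }

// event is a hypothetical, minimal event: Key is nil for non-state events.
type event struct {
	ID       string
	Key      *stateKey
	Rejected bool
}

// stateMap maps each state key to the ID of the event that set it.
type stateMap map[stateKey]string

// stateAfterEvent computes the state after ev from the states at its prev
// events. When ev merges forks (several prev events), those states are
// resolved even if ev is rejected, but a rejected event is never itself
// added to the result.
func stateAfterEvent(ev event, prevStates []stateMap, resolve func([]stateMap) stateMap) stateMap {
	var before stateMap
	if len(prevStates) == 1 {
		before = prevStates[0]
	} else {
		before = resolve(prevStates)
	}
	// Copy so the caller's maps are never mutated.
	after := make(stateMap, len(before)+1)
	for k, v := range before {
		after[k] = v
	}
	if ev.Key != nil && !ev.Rejected {
		after[*ev.Key] = ev.ID
	}
	return after
}
```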
* Update test to match reality
* Modify InputRoomEvents to no longer return an error
Errors do not serialise across HTTP boundaries in polylith mode,
so instead set fields on the InputRoomEventsResponse. Add `Err()`
function to make the API shape basically the same.
* Remove redundant returns; linting
* Update blacklist
* SendEventWithState events as new
* Use cumulative state IDs for final event
* Error wrapping in calculateAndSetState
* Handle overwriting same event type and state key
* Hacky way to spot historical events
* Don't exclude from sync
* Don't generate output events when rewriting forward extremities
* Update output event check
* Historical output events
* Define output room event type
* Notify key changes on state
* Don't send our membership event twice
* Deduplicate state entries
* Tweaks
* Remove unnecessary nolint
* Fix current state upsert in sync API
* Send auth events as outliers, state events as rewrite
* Sync API don't consume state events
* Process events actually
* Improve outlier check
* Fix local room check
* Remove extra room check, it seems to break the whole damn world
* Fix federated join check
* Fix nil pointer exception
* Better comments on DeduplicateStateEntries
* Reflow forced federated joins
* Don't force federated join for possibly even local invites
* Comment SendEventWithState better
* Rewrite room state in sync API storage
* Add TODO
* Clean up all room data when receiving create event
* Don't generate output events for rewrites, but instead notify that state is rewritten on the final new event
* Rename to PurgeRoom
* Exclude backfilled messages from /sync
* Split out rewriting state from updating state from state res
Co-authored-by: Kegan Dougal <kegan@matrix.org>
* Remove QueryBulkStateContent from current state server
Expected fail due to db impl not existing
* Implement query bulk state content
* Fix up rejecting invites over federation
* Fix bulk content marshalling
* Move currentstateserver API to roomserver
Stub out DB functions for now, nothing uses the roomserver version yet.
* Allow it to startup
* Implement some current-state-server storage interface functions
* Add missing package
* Initial FIFOing of roomserver inputs
* Remove EventID response from api.InputRoomEventsResponse
* Don't send back event ID unnecessarily
* Fix ordering hopefully
* Reduce copies, use buffered task channel to reduce contention on other rooms
* Fix error handling
* Add Queryer and use embedded structs
* Add Inputer and factor out more RS API stuff
This neatly splits up the RS API based on the functionality it provides,
whilst providing a useful place for code sharing via the `helpers` package.
* Use federation sender for backfill and getting missing events
* Fix internal URL paths
* Update go.mod/go.sum for matrix-org/gomatrixserverlib#218
* Add missing server implementations in HTTP interface
- New package `perform` which contains all `Perform` functions
- New package `helpers` which contains helper functions used by both
perform and query/input functions.
- Perform invite/leave have no idea how to `WriteOutputEvents` and this
is now returned from `PerformInvite` or `PerformLeave` respectively.
Still to do:
- RSAPI is fed into the inviter/joiner/leaver, which introduces a circular
dependency that will need to be removed.
- Put query operations in a `query` package.
- Put input operations (and output) in an `input` package.
- Factor out helper functions as much as possible, possibly rejigging the
storage layer in the process.
* Initial work on roomserver NID caches
* Give caches to roomserver storage
* Populate caches
* Fix bugs
* Fix WASM build
* Don't hit cache twice in RoomNIDExcludingStubs
* Store reverse room ID-room NID mapping, consult caches when assigning NIDs
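The reverse mapping plus "consult caches when assigning NIDs" can be sketched as a two-way cache that is checked before falling back to the database. The `roomNIDCache` type and the `insert` callback (standing in for the DB call that allocates a NID) are hypothetical, and `RoomNID` is simply an `int64` here.

```go
package main

import "sync"

// RoomNID is a numeric room identifier; an int64 in this sketch.
type RoomNID int64

// roomNIDCache is a hypothetical two-way cache between room IDs and NIDs.
type roomNIDCache struct {
	mu    sync.RWMutex
	byID  map[string]RoomNID
	byNID map[RoomNID]string
}

func newRoomNIDCache() *roomNIDCache {
	return &roomNIDCache{byID: make(map[string]RoomNID), byNID: make(map[RoomNID]string)}
}

// store records the mapping in both directions.
func (c *roomNIDCache) store(roomID string, nid RoomNID) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.byID[roomID] = nid
	c.byNID[nid] = roomID
}

func (c *roomNIDCache) nid(roomID string) (RoomNID, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	nid, ok := c.byID[roomID]
	return nid, ok
}

func (c *roomNIDCache) roomID(nid RoomNID) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	id, ok := c.byNID[nid]
	return id, ok
}

// assignRoomNID returns the cached NID if there is one; otherwise it calls
// insert (a stand-in for the database) and caches the result.
func (c *roomNIDCache) assignRoomNID(roomID string, insert func(string) (RoomNID, error)) (RoomNID, error) {
	if nid, ok := c.nid(roomID); ok {
		return nid, nil
	}
	nid, err := insert(roomID)
	if err != nil {
		return 0, err
	}
	c.store(roomID, nid)
	return nid, nil
}
```

Two concurrent callers can both miss the cache and both call `insert`, so the database side still has to cope with a duplicate insert for the same room.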
* Offset updates take place using TransactionWriter
* Refactor TransactionWriter in current state server
* Refactor TransactionWriter in federation sender
* Refactor TransactionWriter in key server
* Refactor TransactionWriter in media API
* Refactor TransactionWriter in server key API
* Refactor TransactionWriter in sync API
* Refactor TransactionWriter in user API
* Fix deadlocking Sync API tests
* Un-deadlock device database
* Fix appservice API
* Rename TransactionWriters to Writers
* Move writers up a layer in sync API
* Document sqlutil.Writer interface
* Add note to Writer documentation
* Per-room input mutex
* GetMembership should use transaction when assigning state key NID
* Actually use writer transactions rather than ignoring them
* Limit per-room mutexes to Postgres
* Flip the check in InputRoomEvents
* Updated TransactionWriters, moved locks in roomserver, various other tweaks
* Fix redaction deadlocks
* Fix lint issue
* Rename SQLiteTransactionWriter to ExclusiveTransactionWriter
* Fix latest events updater not passing transactions through
* Make PerformJoin send input membership event
* Invite input room events in separate goroutine
* Don't limit roomserver input events using request context
* Synchronous input room events
* Nope, that didn't work
* Oops, send state key to GetMembership
* Don't generate stripped state in client API more times than necessary, generate output events on receiving end of federated invite
* Commit membership updater changes
* Tweaks
* Initial pass at refactoring config (not finished)
* Don't forget current state and EDU servers
* More shifting around
* Update server key API tests
* Fix roomserver test
* Fix more tests
* Further tweaks
* Fix current state server test (sort of)
* Maybe fix appservices
* Fix client API test
* Include database connection string in database options
* Fix sync API build
* Update config test
* Fix unit tests
* Fix federation sender build
* Fix gobind build
* Set Listen address for all services in HTTP monolith mode
* Validate config, reinstate appservice derived in directory, tweaks
* Tweak federation API test
* Set MaxOpenConnections/MaxIdleConnections to previous values
* Update generate-config
* Modify /state/{eventType}/{stateKey} to return the event at the time the user left
Or live, depending on their current state. Hopefully fixes some sytests!
* Linting
* Set HasBeenInRoom
* Fix cases for world-readable history visibility
* Fix bug in finding the requested state event
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Use TransactionWriter on other component SQLites
* Fix sync API tests
* Fix panic in media API
* Fix a couple of transactions
* Fix wrong query, add some logging output
* Add debug logging into StoreEvent
* Adjust InsertRoomNID
* Update logging
* Add a bit more logging to the fedsender
* bugfix: continue sending PDUs if ones are added whilst sending another PDU
Without this, the queue goes back to sleep on `<-oq.notifyPDUs`, which won't
fire because `pendingPDUs` is already > 0. This should fix a flaky sytest.
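A simplified sketch of the fixed loop: after sending a batch, re-check the pending list before waiting on the notification channel, so PDUs added during a send are picked up straight away. Only the `pendingPDUs` and `notifyPDUs` names come from the note above; `destinationQueue`, `add` and `run` are hypothetical.

```go
package main

import "sync"

// destinationQueue is a simplified, hypothetical outbound PDU queue.
type destinationQueue struct {
	mu          sync.Mutex
	pendingPDUs []string
	notifyPDUs  chan struct{} // capacity 1, so a wake-up is never lost
	send        func(batch []string)
}

func newDestinationQueue(send func([]string)) *destinationQueue {
	return &destinationQueue{notifyPDUs: make(chan struct{}, 1), send: send}
}

// add appends a PDU and pokes the worker without blocking.
func (oq *destinationQueue) add(pdu string) {
	oq.mu.Lock()
	oq.pendingPDUs = append(oq.pendingPDUs, pdu)
	oq.mu.Unlock()
	select {
	case oq.notifyPDUs <- struct{}{}:
	default: // a wake-up is already pending
	}
}

// run sends batches until stop is closed.
func (oq *destinationQueue) run(stop <-chan struct{}) {
	for {
		oq.mu.Lock()
		batch := oq.pendingPDUs
		oq.pendingPDUs = nil
		oq.mu.Unlock()
		if len(batch) > 0 {
			oq.send(batch)
			continue // more PDUs may have arrived while sending: check before sleeping
		}
		select {
		case <-oq.notifyPDUs:
		case <-stop:
			return
		}
	}
}
```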
* Break if no txn is sent
* WIP syncapi work
* More debugging
* Bump GMSL version to pull in working Event.Redact
* Remove logging
* Make redactions work on v3+
* Fix more tests
* Emit redacted_event from the roomserver when redactions are validated
- Consume them in the currentstateserver and act accordingly.
- Add integration test for the roomserver to check that injecting
`m.room.redaction` events results in `redacted_event` being emitted.
* Linting
* Ignore events that redact themselves
* Implement core redaction logic
- Add a new `redactions_table.go` which tracks the mapping of
the redaction event ID and the redacted event ID
- Mark redactions as 'validated' when we have both events.
- When redactions are validated, add `unsigned.redacted_because`
and modify the `eventJSON` accordingly.
Note: We currently do NOT redact the event content (this is gated
behind a feature flag) until we have tested redactions a bit more.
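The redaction bookkeeping above can be sketched in memory: track redaction event ID to redacted event ID, and mark a redaction validated once both events have been stored. `redactionStore` and `storeEvent` are hypothetical stand-ins for `redactions_table.go`, and self-redactions are skipped as in the earlier bullet.

```go
package main

// redaction records which event a redaction targets and whether both
// events have been seen.
type redaction struct {
	redactsEventID string
	validated      bool
}

// redactionStore is a hypothetical in-memory stand-in for a redactions table,
// mapping redaction event ID -> redacted event ID.
type redactionStore struct {
	redactions map[string]*redaction // keyed by redaction event ID
	haveEvent  map[string]bool       // IDs of events we have stored
}

func newRedactionStore() *redactionStore {
	return &redactionStore{
		redactions: make(map[string]*redaction),
		haveEvent:  make(map[string]bool),
	}
}

// storeEvent records eventID. If the event is a redaction, redacts holds the
// ID of its target (empty otherwise). It returns the IDs of redactions that
// became valid because of this event, i.e. both sides are now known.
func (s *redactionStore) storeEvent(eventID, redacts string) []string {
	s.haveEvent[eventID] = true
	if redacts != "" && redacts != eventID { // events that redact themselves are ignored
		s.redactions[eventID] = &redaction{redactsEventID: redacts}
	}
	var newlyValidated []string
	for id, r := range s.redactions {
		if !r.validated && s.haveEvent[id] && s.haveEvent[r.redactsEventID] {
			r.validated = true // the caller would now add unsigned.redacted_because
			newlyValidated = append(newlyValidated, id)
		}
	}
	return newlyValidated
}
```

The order of events does not matter: the redaction validates whether it arrives before or after the event it targets.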
* Linting
* Use content_value instead of membership
* Fix build
* Replace publicroomsapi with a combination of clientapi/roomserver/currentstateserver
- All public rooms paths are now handled by clientapi
- Requests to (un)publish rooms are sent to the roomserver via `PerformPublish`
which are stored in a new `published_table.go`
- Requests for public rooms are handled in clientapi by:
* Fetch all room IDs which are published using `QueryPublishedRooms` on the roomserver.
* Apply pagination parameters to the slice.
* Do a `QueryBulkStateContent` request to the currentstateserver to pull out
required state event *content* (not entire events).
* Aggregate and return the chunk.
Mostly but not fully implemented (DB queries on currentstateserver are missing)
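The "apply pagination parameters to the slice" step can be sketched as offset/limit paging over the published room IDs. Treating `since` as an integer offset and using -1 for "no further page" are assumptions of this sketch; `paginate` is a hypothetical helper.

```go
package main

// paginate applies offset/limit pagination to a slice of published room IDs.
// It returns the chunk plus the offsets of the next and previous pages, with
// -1 meaning there is no such page. A limit <= 0 means "no limit".
func paginate(roomIDs []string, since, limit int) (chunk []string, next, prev int) {
	if since < 0 {
		since = 0
	}
	if since > len(roomIDs) {
		since = len(roomIDs)
	}
	end := len(roomIDs)
	if limit > 0 && since+limit < end {
		end = since + limit
	}
	chunk = roomIDs[since:end]
	next, prev = -1, -1
	if end < len(roomIDs) {
		next = end
	}
	if since > 0 {
		prev = since - limit
		if limit <= 0 || prev < 0 {
			prev = 0
		}
	}
	return chunk, next, prev
}
```

The chunk would then be used to pick which rooms' state content to request before aggregating the response.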
* Fix pq query
* Make postgres work
* Make sqlite work
* Fix tests
* Unbreak pagination tests
* Linting
* Return remote errors from FS.PerformJoin
Follows the same pattern as PerformJoin on roomserver (no error return).
Also return the right format for incompatible room version errors.
Makes a bunch of tests pass!
* Handle network errors better when returning remote HTTP errors
* Linting
* Fix tests
* Update whitelist, pass network errors through in API=1 mode
* Add PerformInvite and refactor how errors get handled
- Rename `JoinError` to `PerformError`
- Remove `error` from the API function signature entirely. This forces
errors to be bundled into `PerformError` which makes it easier for callers
to detect and handle errors. On network errors, HTTP clients will make a
`PerformError`.
* Unbreak everything; thanks Go!
* Send back JSONResponse according to the PerformError
* Update federation invite code too
* Pass join errors through internal API boundaries
Required for certain invite sytests. We will need to think of a
better way of handling this going forwards.
* Include m.room.avatar in stripped state; handle trailing slashes when GETing state events
* Update whitelist
* Update whitelist
* Minor perf/debugging improvements
- publicroomsapi: Don't call QueryEventsByID with no event IDs
- appservice: Consume only if there are 1 or more ASes
- roomserver: don't keep a copy of the request "for debugging" - we trace now
* fedsender: return early if we have no destinations
* Unbreak tests
This is a wrapper around whatever impl we have which then logs
the function name/request/response/error.
Also tweak when we log on kafka streams: only log on the producer
side not the consumer side: we've never had issues with comms and
having 1 message rather than N would be nice.
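The tracing wrapper described above can be sketched as a type that implements the same interface as the real implementation and records the function name, request, response and error of each call. `RoomVersionQueryer`, `staticQueryer` and `tracingQueryer` are hypothetical; real code would write to a logger rather than collect lines in a slice.

```go
package main

import "fmt"

// RoomVersionQueryer is a hypothetical one-method internal API.
type RoomVersionQueryer interface {
	QueryRoomVersion(roomID string) (string, error)
}

// staticQueryer is a toy implementation backed by a map.
type staticQueryer map[string]string

func (s staticQueryer) QueryRoomVersion(roomID string) (string, error) {
	v, ok := s[roomID]
	if !ok {
		return "", fmt.Errorf("unknown room %s", roomID)
	}
	return v, nil
}

// tracingQueryer wraps any implementation and records the function name,
// request, response and error of every call. The lines are appended to a
// slice here so they can be inspected.
type tracingQueryer struct {
	impl RoomVersionQueryer
	logs []string
}

func (t *tracingQueryer) QueryRoomVersion(roomID string) (string, error) {
	res, err := t.impl.QueryRoomVersion(roomID)
	t.logs = append(t.logs, fmt.Sprintf("QueryRoomVersion req=%s res=%s err=%v", roomID, res, err))
	return res, err
}

// The wrapper satisfies the same interface as the implementation it wraps.
var _ RoomVersionQueryer = (*tracingQueryer)(nil)
```

Because the wrapper has the same interface, it can be swapped in around either the in-process or the HTTP implementation without callers noticing.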
* s/QueryBackfill/PerformBackfill/g
* OutputEvent now includes AddStateEvents which contain the full event of extra state events
* Only include adds not the current event
* Get adding state right
* Remove clientapi producers which aren't actually producers
They are actually just convenience wrappers around the internal APIs
for roomserver/eduserver. Move their logic to their respective `api`
packages and call them directly.
* Remove TODO
* unbreak ygg
* Split out adding HTTP routes from making internal APIs for clarity
* Split out more components
* Split out more things
* Finish converting
* internal mux for internal routes
* Move Updater structs to shared and use it for postgres
* Add constructors for NewXXXUpdater and a useTxns flag
In sqlite, we set useTxns=false and comment why.
* Handle nil txn
* Handle nil in transaction
* Missed one
* Close the txn at the right time
* Don't close the transaction as we reuse it between calls
* Add missing routing for PerformDirectoryLookupRequest
* Tweak output
* Fix some bugs in devices
* Don't default to federated room joins in response to invite
* Update sytest-whitelist
* Update comments
* Return correct room ID from PerformJoin
* Fix appservice and EDU server API setup, update sytest-whitelist
* Update sytest-whitelist
* Separate muxes for public and internal APIs
* Update client-api-proxy and federation-api-proxy so they don't add /api to the path
* Tidy up
* Consistent HTTP setup
* Set up prefixes properly
* sytest: Make 'Inbound federation can backfill events' pass
This breaks 'Outbound federation can backfill events' because now
we are returning the right number of events, which the previous
test was relying on.
Previously, /messages was backfilling the membership event, causing
the test to pass. Now we are no longer backfilling the membership
event due to the change in this commit, causing the test to fail.
The test should instead be returning the membership event locally
from syncapi's database, but it doesn't do so fast enough, resulting
in a no-op /sync response with a next_batch=s0_0 which will never
pick up the local membership event when it rolls in. The test
does attempt to retry, but doesn't use the new next_batch=s1_0,
so the membership event is missing from the /messages response.
* Linting
* Comment out updaters a bit, add overwrite flag to latest events
* Make sure we don't send fast-forwarded state changes over federation, start with empty set when overwriting
* Remove redundant check for overwrite