* Only add events to `add_state_events` that haven't already been sent to the roomserver output before
* Filter on event NIDs instead, hopefully bring joy to SQLite
* UnsentFilter, review comments
* Ensure the input API only uses a single transaction
* Remove more of the dead query API call
* Tidy up
* Fix tests hopefully
* Don't do unnecessary work for rooms that don't exist
* Improve error, fix another case where transaction wasn't used properly
* Add a unit test for checking single transaction on RS input API
* Fix logic oops when deciding whether to use a transaction in storeEvent
* Revert "Revert "Fix storage bug in PSQL events table""
This reverts commit cf447dd52a.
* Membership updater to use updater
* Fix membership updater to use transactions properly
* Add transaction to all database tables in roomserver, rename latest events updater to room updater, use room updater for all RS input
* Better transaction management
* Tweak order
* Handle cases where the room does not exist
* Other fixes
* More tweaks
* Fill some gaps
* Fill in the gaps
* good lord it gets worse
* Don't roll back transactions when events rejected
* Pass through errors properly
* Fix bugs
* Fix incorrect error check
* Don't panic on nil txns
* Tweaks
* Hopefully fix panics for good in SQLite this time
* Fix rollback
* Minor bug fixes with latest event updater
* Some review comments
* Revert "Some review comments"
This reverts commit 0caf8cf53e.
* Fix a couple of bugs
* Clearer commit and rollback results
* Remove unnecessary prepares
* Put federation client functions into their own file
* Look for missing auth events in RS input
* Remove retrieveMissingAuthEvents from federation API
* Logging
* Sorta transplanted the code over
* Use event origin failing all else
* Don't get stuck on mutexes:
* Add verifier
* Don't mark state events with zero snapshot NID as not existing
* Check missing state if not an outlier before storing the event
* Reject instead of soft-fail, don't copy roominfo so much
* Use synchronous contexts, limit time to fetch missing events
* Clean up some commented out bits
* Simplify `/send` endpoint significantly
* Submit async
* Report errors on sending to RS input
* Set max payload in NATS to 16MB
* Tweak metrics
* Add `workerForRoom` for tidiness
* Try skipping unmarshalling errors for RespMissingEvents
* Track missing prev events separately to avoid calculating state when not possible
* Tweak logic around checking missing state
* Care about state when checking missing prev events
* Don't check missing state for create events
* Try that again
* Handle create events better
* Send create room events as new
* Use given event kind when sending auth/state events
* Revert "Use given event kind when sending auth/state events"
This reverts commit 089d64d271.
* Only search for missing prev events or state for new events
* Tweaks
* We only have missing prev if we don't supply state
* Room version tweaks
* Allow async inputs again
* Apply backpressure to consumers/synchronous requests to hopefully stop things being overwhelmed
* Set timeouts on roomserver input tasks (need to decide what timeout makes sense)
* Use work queue policy, deliver all on restart
* Reduce chance of duplicates being sent by NATS
* Limit the number of servers we attempt to reduce backpressure
* Some review comment fixes
* Tidy up a couple things
* Don't limit servers, randomise order using map
* Some context refactoring
* Update gmsl
* Don't resend create events
* Set stateIDs length correctly or else the roomserver thinks there are missing events when there aren't
* Exclude our own servername
* Try backing off servers
* Make excluding self behaviour optional
* Exclude self from g_m_e
* Update sytest-whitelist
* Update consumers for the roomserver output stream
* Remember to send outliers for state returned from /gme
* Make full HTTP tests less upsetti
* Remove 'If a device list update goes missing, the server resyncs on the next one' from the sytest blacklist
* Remove debugging test
* Fix blacklist again, remove unnecessary duplicate context
* Clearer contexts, don't use background in case there's something happening there
* Don't queue up events more than once in memory
* Correctly identify create events when checking for state
* Fill in gaps again in /gme code
* Remove `AuthEventIDs` from `InputRoomEvent`
* Remove stray field
Co-authored-by: Kegan Dougal <kegan@matrix.org>
* Add more logs
To help debug the migration issue in #1924 along with manual data-loss-inducing fixes.
Also log the origin server on processed txns to help debug buggy server origins.
* Fix query
* Add more optimised code path for checking if we're in a room
* Fix database queries
* Fix federation API test
* Fix logging
* Review comments
* Make separate API call for room membership
* db migration: fix#1844 and add additional assertions
- Migration scripts will now check to see if there are any unconverted
snapshot IDs and fail the migration if there are any. This should
prevent people from getting a corrupt database in the event the root
cause is still unknown.
- Add an ORDER BY clause when doing batch queries in the postgres
migration. LIMIT and OFFSET without ORDER BY are undefined and must
not be relied upon to produce a deterministic ordering (e.g row order).
See https://www.postgresql.org/docs/current/queries-limit.html
* Linting
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Hash-deduplicated state storage (and migrations) for PostgreSQL and SQLite
* Refactor droomserver database setup for migrations
* Fix conflict statements
* Update migration names
* Set a boundary for old to new block/snapshot IDs so we don't rewrite them more than once accidentally
* Create sequence if not exists
* Fix boundary queries
* Fix boundary queries
* Use Query
* Break out queries a bit
* More sequence tweaks
* Query parameters are not playing the game
* Injection escaping may not work for CREATE SEQUENCE after all
* Fix snapshot sequence name
* Use boundaried IDs in SQLite too
* Use IFNULL for SQLite
* Use COALESCE in PostgreSQL
* Review comments @Kegsay
* Hit the database far less to find room NIDs for event NIDs
* Close the rows
* Fix SQLite selectRoomNIDsForEventNIDsSQL
* Give same treatment to room version lookups
* Add basic storage methods
* Add internal api handler
* Add check for forgotten room
* Add /rooms/{roomID}/forget endpoint
* Add missing rsAPI method
* Remove unused parameters
* Add passing tests
Signed-off-by: Till Faelligen <tfaelligen@gmail.com>
* Add missing file
* Add postgres migration
* Add sqlite migration
* Use Forgetter to forget room
* Remove empty line
* Update HTTP status codes
It looks like the spec calls for these to be 400, rather than 403: https://matrix.org/docs/spec/client_server/r0.6.1#post-matrix-client-r0-rooms-roomid-forget
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* Replace all usages of txn.Stmt with sqlutil.TxStmt
Signed-off-by: Sam Day <me@samcday.com>
* Fix sign off link in PR template.
Signed-off-by: Sam Day <me@samcday.com>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
* WIP Event rejection
* Still send back errors for rejected events
Instead, discard them at the federationapi /send layer rather than
re-implementing checks at the clientapi/PerformJoin layer.
* Implement rejected events
Critically, rejected events CAN cause state resolution to happen
as it can merge forks in the DAG. This is fine, _provided_ we
do not add the rejected event when performing state resolution,
which is what this PR does. It also fixes the error handling
when NotAllowed happens, as we were checking too early and needlessly
handling NotAllowed in more than one place.
* Update test to match reality
* Modify InputRoomEvents to no longer return an error
Errors do not serialise across HTTP boundaries in polylith mode,
so instead set fields on the InputRoomEventsResponse. Add `Err()`
function to make the API shape basically the same.
* Remove redundant returns; linting
* Update blacklist
* Move currentstateserver API to roomserver
Stub out DB functions for now, nothing uses the roomserver version yet.
* Allow it to startup
* Implement some current-state-server storage interface functions
* Add missing package
* Initial work on roomserver NID caches
* Give caches to roomserver storage
* Populate caches
* Fix bugs
* Fix WASM build
* Don't hit cache twice in RoomNIDExcludingStubs
* Store reverse room ID-room NID mapping, consult caches when assigning NIDs
* Offset updates take place using TransactionWriter
* Refactor TransactionWriter in current state server
* Refactor TransactionWriter in federation sender
* Refactor TransactionWriter in key server
* Refactor TransactionWriter in media API
* Refactor TransactionWriter in server key API
* Refactor TransactionWriter in sync API
* Refactor TransactionWriter in user API
* Fix deadlocking Sync API tests
* Un-deadlock device database
* Fix appservice API
* Rename TransactionWriters to Writers
* Move writers up a layer in sync API
* Document sqlutil.Writer interface
* Add note to Writer documentation
* Per-room input mutex
* GetMembership should use transaction when assigning state key NID
* Actually use writer transactions rather than ignoring them
* Limit per-room mutexes to Postgres
* Flip the check in InputRoomEvents
* Updated TransactionWriters, moved locks in roomserver, various other tweaks
* Fix redaction deadlocks
* Fix lint issue
* Rename SQLiteTransactionWriter to ExclusiveTransactionWriter
* Fix us not sending transactions through in latest events updater
* Initial pass at refactoring config (not finished)
* Don't forget current state and EDU servers
* More shifting around
* Update server key API tests
* Fix roomserver test
* Fix more tests
* Further tweaks
* Fix current state server test (sort of)
* Maybe fix appservices
* Fix client API test
* Include database connection string in database options
* Fix sync API build
* Update config test
* Fix unit tests
* Fix federation sender build
* Fix gobind build
* Set Listen address for all services in HTTP monolith mode
* Validate config, reinstate appservice derived in directory, tweaks
* Tweak federation API test
* Set MaxOpenConnections/MaxIdleConnections to previous values
* Update generate-config
* Emit redacted_event from the roomserver when redactions are validated
- Consume them in the currentstateserver and act accordingly.
- Add integration test for the roomserver to check that injecting
`m.room.redaction` events result in `redacted_event` being emitted.
* Linting
* Ignore events that redact themselves
* Implement core redaction logic
- Add a new `redactions_table.go` which tracks the mapping of
the redaction event ID and the redacted event ID
- Mark redactions as 'validated' when we have both events.
- When redactions are validated, add `unsigned.redacted_because`
and modify the `eventJSON` accordingly.
Note: We currently do NOT redact the event content - it's gated
behind a feature flag - until we have tested redactions a bit more.
* Linting
* Use content_value instead of membership
* Fix build
* Replace publicroomsapi with a combination of clientapi/roomserver/currentstateserver
- All public rooms paths are now handled by clientapi
- Requests to (un)publish rooms are sent to the roomserver via `PerformPublish`
which are stored in a new `published_table.go`
- Requests for public rooms are handled in clientapi by:
* Fetch all room IDs which are published using `QueryPublishedRooms` on the roomserver.
* Apply pagination parameters to the slice.
* Do a `QueryBulkStateContent` request to the currentstateserver to pull out
required state event *content* (not entire events).
* Aggregate and return the chunk.
Mostly but not fully implemented (DB queries on currentstateserver are missing)
* Fix pq query
* Make postgres work
* Make sqlite work
* Fix tests
* Unbreak pagination tests
* Linting
* Move Updater structs to shared and use it for postgres
* Add constructors for NewXXXUpdater and a useTxns flag
In sqlite, we set useTxns=false and comment why.
* Handle nil txn
* Handle nil in transaction
* Missed one
* Close the txn at the right time
* Don't close the transaction as we reuse it between calls