The idea is to remove some of the places we pass around `int`, where it can represent one of two things:
1. the position of an event in the stream; or
2. a token that partitions the stream, used as part of the stream tokens.
The valid operations are then:
1. did a position happen before or after a token;
2. get all events that happened before or after a token; and
3. get all events between two tokens.
(Note that we don't want to allow other operations as we want to change the tokens to be vector clocks rather than simple ints)
While working on https://github.com/matrix-org/synapse/issues/5665 I found myself digging into the `Ratelimiter` class and seeing that it was both:
* Rather undocumented, and
* causing a *lot* of config checks
This PR attempts to refactor and comment the `Ratelimiter` class, as well as encourage config file accesses to only be done at instantiation.
Best to be reviewed commit-by-commit.
A couple of changes of significance:
* remove the `_last_ack < federation_position` condition, so that
updates will still be correctly processed after restart
* Correctly wire up send_federation_ack to the right class.
Make sure that the AccountDataStream presents complete updates, in the right
order.
This is much the same fix as #7337 and #7358, but applied to a different stream.
For in memory streams when fetching updates on workers we need to query the source of the stream, which currently is hard coded to be master. This PR threads through the source instance we received via `POSITION` through to the update function in each stream, which can then be passed to the replication client for in memory streams.
We move the processing of typing and federation replication traffic into their handlers so that `Stream.current_token()` points to a valid token. This allows us to remove `get_streams_to_replicate()` and `stream_positions()`.
This is primarily for allowing us to send those commands from workers, but for now simply allows us to ignore echoed RDATA/POSITION commands that we sent (we get echoes of sent commands when using redis). Currently we log a WARNING on the master process every time we receive an echoed RDATA.
For direct TCP connections we need the master to relay REMOTE_SERVER_UP
commands to the other connections so that all instances get notified
about it. The old implementation just relayed to all connections,
assuming that sending back to the original sender of the command was
safe. This is not true for redis, where commands sent get echoed back to
the sender, which was causing master to effectively infinite loop
sending and then re-receiving REMOTE_SERVER_UP commands that it sent.
The fix is to ensure that we only relay to *other* connections and not
to the connection we received the notification from.
Fixes#7334.
* Factor out functions for injecting events into database
I want to add some more flexibility to the tools for injecting events into the
database, and I don't want to clutter up HomeserverTestCase with them, so let's
factor them out to a new file.
* Rework TestReplicationDataHandler
This wasn't very easy to work with: the mock wrapping was largely superfluous,
and it's useful to be able to inspect the received rows, and clear out the
received list.
* Fix AssertionErrors being thrown by EventsStream
Part of the problem was that there was an off-by-one error in the assertion,
but also the limit logic was too simple. Fix it all up and add some tests.
Specifically some tests for the typing stream, which means we test streams that fetch missing updates via HTTP (rather than via the DB).
We also shuffle things around a bit so that we create two separate `HomeServer` objects, rather than trying to insert a slaved store into places.
Note: `test_typing.py` is heavily inspired by `test_receipts.py`
The aim here is to move the command handling out of the TCP protocol classes and to also merge the client and server command handling (so that we can reuse them for redis protocol). This PR simply moves the client paths to the new `ReplicationCommandHandler`, a future PR will move the server paths too.
This changes the replication protocol so that the server does not send down `RDATA` for rows that happened before the client connected. Instead, the server will send a `POSITION` and clients then query the database (or master out of band) to get up to date.
* Port synapse.replication.tcp to async/await
* Newsfile
* Correctly document type of on_<FOO> functions as async
* Don't be overenthusiastic with the asyncing....
Currently we rely on `current_state_events` to figure out what rooms a
user was in and their last membership event in there. However, if the
server leaves the room then the table may be cleaned up and that
information is lost. So lets add a table that separately holds that
information.
Hopefully this time we really will fix#4422.
We need to make sure that the cache on
`get_rooms_for_user_with_stream_ordering` is invalidated *before* the
SyncHandler is notified for the new events, and we can now do so reliably via
the `events` stream.
* Rate-limiting for registration
* Add unit test for registration rate limiting
* Add config parameters for rate limiting on auth endpoints
* Doc
* Fix doc of rate limiting function
Co-Authored-By: babolivier <contact@brendanabolivier.com>
* Incorporate review
* Fix config parsing
* Fix linting errors
* Set default config for auth rate limiting
* Fix tests
* Add changelog
* Advance reactor instead of mocked clock
* Move parameters to registration specific config and give them more sensible default values
* Remove unused config options
* Don't mock the rate limiter un MAU tests
* Rename _register_with_store into register_with_store
* Make CI happy
* Remove unused import
* Update sample config
* Fix ratelimiting test for py2
* Add non-guest test
We want to wait until we have read the response body before we log the request
as complete, otherwise a confusing thing happens where the request appears to
have completed, but we later fail it.
To do this, we factor the salient details of a request out to a separate
object, which can then keep track of the txn_id, so that it can be logged.
on_notifier_poke no longer runs synchonously, so we have to do a different hack
to make sure that the replication data has been sent. Let's actually listen for
its arrival.
This is only used by filter_events_for_client, so we can simplify the whole
thing by just doing one user at a time, and removing a dead storage function to
boot.
* Split state group persist into seperate storage func
* Add per database engine code for state group id gen
* Move store_state_group to StateReadStore
This allows other workers to use it, and so resolve state.
* Hook up store_state_group
* Fix tests
* Rename _store_mult_state_groups_txn
* Rename StateGroupReadStore
* Remove redundant _have_persisted_state_group_txn
* Update comments
* Comment compute_event_context
* Set start val for state_group_id_seq
... otherwise we try to recreate old state groups
* Update comments
* Don't store state for outliers
* Update comment
* Update docstring as state groups are ints
Some streams will occaisonally advance their positions without actually
having any new rows to send over federation. Currently this means that
the token will not advance on the workers, leading to them repeatedly
sending a slightly out of date token. This in turns requires the master
to hit the DB to check if there are any new rows, rather than hitting
the no op logic where we check if the given token matches the current
token.
This commit changes the API to always return an entry if the position
for a stream has changed, allowing workers to advance their tokens
correctly.