Using the new `TaskScheduler` meant that we'ed create lots of new
metrics (due to adding task ID to the desc of background process),
resulting in requests for metrics taking an increasing amount of CPU.
* Allow user_id to be optional for room deletion
* Add module API method to delete a room
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Don't worry about the case block=True && requester_user_id is None
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
Add a (long) timeout to when a "busy" device is considered not online.
This does *not* match MSC3026, but is a reasonable thing for an
implementation to do.
Expands tests for the (unstable) busy presence with multiple devices.
Tracks presence on an individual per-device basis and combine
the per-device state into a per-user state. This should help in
situations where a user has multiple devices with conflicting status
(e.g. one is syncing with unavailable and one is syncing with online).
The tie-breaking is done by priority:
BUSY > ONLINE > UNAVAILABLE > OFFLINE
* Fix rare bug that broke looping calls
We can't interact with the reactor from the main thread via looping
call.
Introduced in v1.90.0 / #15791.
* Newsfile
Refactoring to use both the user ID & the device ID when tracking
the currently syncing users in the presence handler.
This is done both locally and over replication. Note that the device
ID is discarded but will be used in a future change.
Refactoring to pass the device ID (in addition to the user ID) through
the presence handler (specifically the `user_syncing`, `set_state`,
and `bump_presence_active_time` methods and their replication
versions).
Simplify some of the presence code by reducing duplicated code between
worker & non-worker modes.
The main change is to push some of the logic from `user_syncing` into
`set_state`. This is done by passing whether the user is setting the presence
via a `/sync` with a new `is_sync` flag to `set_state`. If this is `true` some
additional logic is performed:
* Don't override `busy` presence.
* Update the `last_user_sync_ts`.
* Never update the status message.
* Properly update retry_last_ts when hitting the maximum retry interval
This was broken in 1.87 when the maximum retry interval got changed from
almost infinite to a week (and made configurable).
fixes#16101
Signed-off-by: Nicolas Werner <nicolas.werner@hotmail.de>
* Add changelog
* Change fix + add test
* Add comment
---------
Signed-off-by: Nicolas Werner <nicolas.werner@hotmail.de>
Co-authored-by: Mathieu Velten <mathieuv@matrix.org>
This should only be called on HomeServer objects which are configured
to run background tasks, which is automatically (and properly) done via
the call to setup().
If we don't have all the auth events in a room then not all state events will have a chain cover index. Even so, we can still use the chain cover index on the events that do have it, rather than bailing and using the slower functions.
This situation should not arise for newly persisted rooms, as we check we have the full auth chain for each event, but can happen for existing rooms.
c.f. #15245
We were seeing serialization errors when taking out multiple read locks.
The transactions were retried, so isn't causing any failures.
Introduced in #15782.
* Fix the method signature of `run_db_interaction` on the module API
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
Misc. clean-ups to:
* Use keyword arguments.
* Return early (reducing indentation) of some functions.
* Removing duplicated / unused code.
* Use wrap_as_background_process.
* Add a module API function to provide `call_later`
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Add comments
* Update version number
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Add a cache invalidation clean-up task
* Run the cache invalidation stream clean-up on the background worker
* Tune down
* call_later is in millis!
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* fixup! Add a cache invalidation clean-up task
* Update synapse/storage/databases/main/cache.py
Co-authored-by: Eric Eastwood <erice@element.io>
* Update synapse/storage/databases/main/cache.py
Co-authored-by: Eric Eastwood <erice@element.io>
* MILLISEC -> MS
* Expand on comment
* Move and tweak comment about Postgres
* Use `wrap_as_background_process`
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
Co-authored-by: Eric Eastwood <erice@element.io>
For now this maintains compatible with old Synapses by falling back
to using transaction semantics on a per-access token. A future version
of Synapse will drop support for this.
Adds three new configuration variables:
* destination_min_retry_interval is identical to before (10mn).
* destination_retry_multiplier is now 2 instead of 5, the maximum value will
be reached slower.
* destination_max_retry_interval is one day instead of (essentially) infinity.
Capping this will cause destinations to continue to be retried sometimes instead
of being lost forever. The previous value was 2 ^ 62 milliseconds.
The location of the redacts field changes in room version 11. Ensure
it is copied to the *new* location for *old* room versions for
forwards-compatibility with clients.
Note that copying it to the *old* location for the *new* room version
was previously handled.
* Updates the rule ID.
* Use `event_property_is` instead of `event_match`.
This updates the implementation of MSC3958 to match the latest
text from the MSC.
The un_partial_stated_event_stream_sequence and
application_services_txn_id_seq were never properly configured
in the portdb script, resulting in an error on start-up.
Signed-off-by: Nicolas Werner <n.werner@famedly.com>
Co-authored-by: Nicolas Werner <n.werner@famedly.com>
Co-authored-by: Nicolas Werner <89468146+nico-famedly@users.noreply.github.com>
Co-authored-by: Hubert Chathi <hubert@uhoreg.ca>
And fix a bug in the implementation of the updated redaction
format (MSC2174) where the top-level redacts field was not
properly added for backwards-compatibility.
Allow configuring the set of workers to proxy outbound federation traffic through (`outbound_federation_restricted_to`).
This is useful when you have a worker setup with `federation_sender` instances responsible for sending outbound federation requests and want to make sure *all* outbound federation traffic goes through those instances. Before this change, the generic workers would still contact federation themselves for things like profile lookups, backfill, etc. This PR allows you to set more strict access controls/firewall for all workers and only allow the `federation_sender`'s to contact the outside world.
Make it more obvious which Python version runs on a given Linux distribution so when we end up dropping support for a given Python version, we can more easily find the reference to the Python version and remove any references for the distribution. We don't want to be running tests or building packages on a distribution that no longer has a supported Python version.
This way, we can avoid another situation like when we dropped support for Python 3.7 but forgot to drop the Debian Buster references everywhere (https://github.com/matrix-org/synapse/pull/15893)
Previously, if you just followed the instructions per the docs, you just ran into an error:
```sh
$ poetry run synapse_worker --config-path homeserver_generic_worker1.yaml
Missing mandatory `server_name` config option.
```
**Before:**
```
Error retrieving alias
```
**After:**
```
Error retrieving alias #foo:bar -> 401 Unauthorized
```
*Spawning from creating the [manual testing strategy for the outbound federation proxy](https://github.com/matrix-org/synapse/pull/15773).*
Unix socket support for `federation` and `client` Listeners has existed now for a little while(since [1.81.0](https://github.com/matrix-org/synapse/pull/15353)), but there was one last hold out before it could be complete: HTTP Replication communication. This should finish it up. The Listeners would have always worked, but would have had no way to be talked to/at.
---------
Co-authored-by: Eric Eastwood <madlittlemods@gmail.com>
Co-authored-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
Co-authored-by: Eric Eastwood <erice@element.io>
A lot of the functions have the same name in this space like `store_file`,
and we also do it multiple times for different reasons (main media repo,
other storage providers, thumbnails, etc) so it's good to differentiate
them so your head doesn't explode.
Follow-up to https://github.com/matrix-org/synapse/pull/15850
Tracing instrumentation to media `/upload` code paths to investigate https://github.com/matrix-org/synapse/issues/15841
Allow configuring the set of workers to proxy outbound federation traffic through (`outbound_federation_restricted_to`).
This is useful when you have a worker setup with `federation_sender` instances responsible for sending outbound federation requests and want to make sure *all* outbound federation traffic goes through those instances. Before this change, the generic workers would still contact federation themselves for things like profile lookups, backfill, etc. This PR allows you to set more strict access controls/firewall for all workers and only allow the `federation_sender`'s to contact the outside world.
The original code is from @erikjohnston's branches which I've gotten in-shape to merge.
Image.ANTIALIAS is not defined in current pillow releases. Since ANTIALIAS was just using LANCZOS anyways, this is just a cosmetic change, but makes synapse work with most recent pillow releases.
Signed-off-by: Giovanni Harting <539@idlegandalf.com>
Old device entries for the same user were being removed in individual
SQL commands, making the batch take way longer than necessary.
This combines the commands into a single one with a IN/ANY clause.
Example of log entry before the change, regularly observed with
"log_min_duration_statement = 10000" in PostgreSQL's config:
LOG: duration: 42538.282 ms statement:
DELETE FROM device_lists_stream
WHERE user_id = '@someone' AND device_id = 'someid1'
AND stream_id < 123456789
;
DELETE FROM device_lists_stream
WHERE user_id = '@someone' AND device_id = 'someid2'
AND stream_id < 123456789
;
[repeated for each device ID of that user, potentially a lot...]
With the patch applied on my instance for the past couple of days, I
no longer notice overly long statements of that particular kind.
Signed-off-by: pacien <pacien.trangirard@pacien.net>
* Fix use of config override directory in `devenv up`
`--config-directory` is for the generate config script; `-c` is for usage
* Add homeserver config override directory to gitignore
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
If you leave a room and forget it, then rejoin it, the room would be
missing from the next initial sync.
fixes#13262
Signed-off-by: Nicolas Werner <n.werner@famedly.com>
The port DB script would try and run database background tasks, which
could fail if the data they acted on was in the process of being ported.
These exceptions were non fatal.
Fixes#15789
We now only block the client to backfill when we see a large gap in the events (more than 2 events missing in a row according to `depth`), more than 3 single-event holes, or not enough messages to fill the response. Otherwise, we return the messages directly to the client and backfill in the background for eventual consistency sake.
Fix https://github.com/matrix-org/synapse/issues/15696
* Check required power levels earlier in createRoom handler.
- If a server was configured to reject the creation of rooms with E2EE
enabled (by specifying an unattainably high power level for
"m.room.encryption" in default_power_level_content_override), the 403
error was not being triggered until after the room was created and
before the "m.room.power_levels" was sent. This allowed a user to
access the partially-configured room and complete the setup of E2EE
and power levels manually.
- This change causes the power level overrides to be checked earlier and
the request to be rejected before the user gains access to the room.
- A new `_validate_room_config` method is added to contain checks that
should be run before a room is created.
- The new test case confirms that a user request is rejected by the new
validation method.
Signed-off-by: Grant McLean <grant@catalyst.net.nz>
* Add a changelog file.
* Formatting fix for black.
* Remove unneeded line from test.
---------
Signed-off-by: Grant McLean <grant@catalyst.net.nz>
See https://github.com/matrix-org/synapse/pull/14095#discussion_r990335492
This is useful because when see that a relevant event is an `outlier` or `soft-failed`, then that's a good unexpected indicator explaining why it's not showing up. `filter_events_for_client` is used in `/sync`, `/messages`, `/context` which are all common end-to-end assertion touch points (also notifications, relations).
Implements stable support for MSC3882; this involves updating Synapse's support to
match the MSC / the spec says.
Continue to support the unstable version to allow clients to transition.
Fix https://github.com/matrix-org/synapse/issues/15662
This manifests as purple lines that show up on all time series panels
that you can hover and see what version was deployed.
Also added a new "Deployed Synapse versions over time" panel
where the color block changes with each version. And mixed this
color block into the "Up" time series panel.
To get the Grafana dashboard JSON to copy here: use the **Share** icon at the top -> **Export** -> check the **Export for sharing externally** option -> **View JSON** or **Save to file**
The stubs have some issues so this has some generous cast
and ignores in it, but it is better than not having stubs.
Note that confusing that Element is a function which creates
_Element instances (and similarly for Comment).
* Fully qualified docker image names for the main Dockerfile and Complement related.
* Fully qualified docker image names for Dockerfiles associated with building Debian release artifacts.
This one is harder and is separate from the other commit in case it wasn't correct or was unwanted. I decided to
do the expansion on the docker images in the Dockerfile itself, instead of the various source places that build
which distribution that is selected, as it would have been more invasive with the scripts breaking up the string
for tagging and such. This one is untested.
* Changelog
* Update docker/Dockerfile-workers
* Update docker/complement/Dockerfile
---------
Co-authored-by: reivilibre <olivier@librepush.net>
Fix#15667
- Reiterate the importance of getting Rust installed and set up before attempting to install the Python dependencies.
- Mention the importance of confirming that `poetry install` completed successfully and include a typical error that the user might see if it did not.
- Expand on "Now edit homeserver.yaml" to give examples of things likely to need changing and to link to the relevant sections of the Synapse server documentation.
Updates the database schema to require a thread_id (by adding a
constraint that the column is non-null) for event_push_actions,
event_push_actions_staging, and event_push_actions_summary.
For PostgreSQL we add the constraint as NOT VALID, then
VALIDATE the constraint a background job to avoid locking
the table during an upgrade.
Each table is updated as a separate schema delta to avoid
deadlocks between them.
For SQLite we simply rebuild the table & copy the data.
The cached decorators always return a Deferred, which was not
properly propagated. It was close enough when wrapping coroutines,
but failed if a bare function was wrapped.
```
2023-05-21 09:30:09,288 - synapse.logging.opentracing - 940 - ERROR - POST-1 - @trace may not have wrapped StateStorageController.get_state_for_groups correctly! The function is not async but returned a coroutine
```
Tracing instrumentation for these functions originally introduced in https://github.com/matrix-org/synapse/pull/15610
This moves the deactivated user check to the method which
all login types call.
Additionally updates the application service tests to be more
realistic by removing invalid tests and fixing server names.
All the information needed is already in the `instance_map`, so
use that instead of passing the hostname / IP & port manually
for each replication request.
This consolidates logic for future improvements of using e.g.
UNIX sockets for workers.
Fix https://github.com/matrix-org/synapse/issues/15618
### Before
```
2023-05-17 22:51:36-0500 [-] 2023-05-17 22:51:36,889 - synapse.server - 338 - INFO - sentinel - Finished setting up.
```
### After
```
2023-05-19 18:16:20-0500 [-] synapse.server - 338 - INFO - sentinel - Finished setting up.
```
### Dev notes
The `Twisted.Logger` controls the `2023-05-19 18:16:20-0500 [-]` prefix, see : [`twisted/twisted` -> `src/twisted/logger/_format.py#L362-L374`](34b161e66b/src/twisted/logger/_format.py (L362-L374))
And we delegate our logs to the Twisted Logger for the tests which puts it in `_trial_temp/test.log`
The event_fields property in filters should use the proper
escape rules, namely backslashes can be escaped with
an additional backslash.
This adds tests (adapted from matrix-js-sdk) and implements
the logic to properly split the event_fields strings.
...to try to control memory usage. `HomeServerConfig`s hold on to
many Jinja2 objects, which come out to over 0.5 MiB per config.
Over the course of a full test run, the cache grows to ~360 entries.
Limit it to 8 entries.
Part of #15622.
Signed-off-by: Sean Quah <seanq@matrix.org>
Instrument `state` and `state_group` storage related things (tracing) so it's a little more clear where these database transactions are coming from as there is a lot of wires crossing in these functions.
Part of `/messages` performance investigation: https://github.com/matrix-org/synapse/issues/13356
R30v2 has been out since 2021-07-19 (https://github.com/matrix-org/synapse/pull/10332)
and we started collecting stats on 2021-08-16. Since it's been over a year now
(almost 2 years), this is enough grace period for us to now rip it out.
Synapse will no longer send (or respond to) the unstable flags
for faster joins. These were only available behind a configuration
flag and handled in parallel with the stable flags.
This change fixes two memory leaks during `trial` test runs.
Garbage collection is disabled during each test case and a gen-0 GC is
run at the end of each test. However, when the gen-0 GC is run, the
`TestCase` object usually still holds references to the `HomeServer`
used during the test. As a result, the `HomeServer` gets promoted to
gen-1 and then never garbage collected.
Fix this by periodically running full GCs.
Additionally, fix `HomeServer`s leaking after tests that touch inbound
federation due to `FederationRateLimiter`s adding themselves to a global
set, by turning the set into a `WeakSet`.
Resolves#15622.
Signed-off-by: Sean Quah <seanq@matrix.org>
If the previous read marker is pointing to an event that no longer exists
(e.g. due to retention) then assume that the newly given read marker
is newer.
To track changes in MSC2666:
- The change from `/mutual_rooms/{user_id}` to `/mutual_rooms?user_id={user_id}`.
- The addition of `next_batch_token` (and logic).
- Unstable flag now being `uk.half-shot.msc2666.query_mutual_rooms`.
- The error code when your own user is requested.
The second argument of `ConfigError` is a path, passed as an optional
`Iterable[str]` and not a `str`. If a string is passed directly,
Synapse unhelpfully emits "Error in configuration at
a.p.p._.s.e.r.v.i.c.e._.c.o.n.f.i.g._.f.i.l.e.s'" when the config
option has the wrong data type.
Signed-off-by: Sean Quah <seanq@matrix.org>
There are two situations which were previously not properly checked:
1. If the requested URL was replaced with an oEmbed URL, then the
oEmbed URL was not checked against url_preview_url_blacklist.
2. Follow-up URLs (either via autodiscovery of oEmbed or to pre-cache
images) were not checked against url_preview_url_blacklist.
We use the oldest Python version because later Python versions can include some overloads which don't work in the older versions which we still support.
We're using Python 3.8 instead of 3.7 which is our actual minimum support version because it's EOL is in a matter of weeks so can avoid the extra effort. And in any case, minimum Python 3.8 support is better than winging it on Python 3.11.
* Usage that is compatible with Python 3.8 and 3.11
> Since Python 3.10, instead of passing value and tb, an exception object can
be passed as the first argument. If value and tb are provided, the first
argument is ignored in order to provide backwards compatibility.
>
> -- https://docs.python.org/3/library/traceback.html
* Add changelog
Fix the following `mypy` errors when running `mypy` with Python 3.7:
```
synapse/storage/controllers/stats.py:58: error: "Counter" is not subscriptable, use "typing.Counter" instead [misc]
tests/test_state.py:267: error: "dict" is not subscriptable, use "typing.Dict" instead [misc]
```
Part of https://github.com/matrix-org/synapse/issues/15603
In Python 3.9, `typing` is deprecated and the types are subscriptable (generics) by default, https://peps.python.org/pep-0585/#implementation
MSC3389 proposes protecting the relation type & parent event ID
from redaction. This keeps the relation information intact after
redaction which helps with some UX flaws (e.g. deleting an
event causes it to no longer be in a thread, which is confusing).
Adds logging for key server requests which include a key ID.
This is technically in violation of the 1.6 spec, but is the only
way to remain backwards compatibly with earlier versions of
Synapse (and possibly other homeservers) which *did* include
the key ID.
I found the error in the **Before** really vague and obtuse and didn't realize port `5432` corresponded to the Postgres port until searching the codebase. It says to check the logs but that wasn't my first instinct. It's just more obvious if we just print the full thing which gives context of the error type and the traceback to the relevant area of code.
#### Before
```
$ poetry run python -m synapse.app.homeserver -c homeserver.yaml
**********************************************************************************
Error during initialisation:
connection to server at "localhost" (::1), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
There may be more information in the logs.
**********************************************************************************
```
#### After
```sh
$ poetry run python -m synapse.app.homeserver -c homeserver.yaml
**********************************************************************************
Error during initialisation:
Traceback (most recent call last):
File "/home/eric/Documents/github/element/synapse/synapse/app/homeserver.py", line 352, in setup
hs.setup()
File "/home/eric/Documents/github/element/synapse/synapse/server.py", line 337, in setup
self.datastores = Databases(self.DATASTORE_CLASS, self)
File "/home/eric/Documents/github/element/synapse/synapse/storage/databases/__init__.py", line 65, in __init__
with make_conn(database_config, engine, "startup") as db_conn:
File "/home/eric/Documents/github/element/synapse/synapse/storage/database.py", line 161, in make_conn
native_db_conn = engine.module.connect(**db_params)
File "/home/eric/.cache/pypoetry/virtualenvs/matrix-synapse-xCtC9ulO-py3.10/lib/python3.10/site-packages/psycopg2/__init__.py", line 122, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: connection to server at "localhost" (::1), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
There may be more information in the logs.
**********************************************************************************
```
* Add SSL options to redis config
* fix lint issues
* Add documentation and changelog file
* add missing . at the end of the changelog
* Move client context factory to new file
* Rename ssl to tls and fix typo
* fix lint issues
* Added when redis attributes were added
* Add master to the instance_map as part of Complement, have ReplicationEndpoint look at instance_map for master.
* Fix typo in drive by.
* Remove unnecessary worker_replication_* bits from unit tests and add master to instance_map(hopefully in the right place)
* Several updates:
1. Switch from master to main for naming the main process in the instance_map. Add useful constants for easier adjustment of names in the future.
2. Add backwards compatibility for worker_replication_* to allow time to transition to new style. Make sure to prioritize declaring main directly on the instance_map.
3. Clean up old comments/commented out code.
4. Adjust unit tests to match with new code.
5. Adjust Complement setup infrastructure to only add main to the instance_map if workers are used and remove now unused options from the worker.yaml template.
* Initial Docs upload
* Changelog
* Missed some commented out code that can go now
* Remove TODO comment that no longer holds true.
* Fix links in docs
* More docs
* Remove debug logging
* Apply suggestions from code review
Co-authored-by: reivilibre <olivier@librepush.net>
* Apply suggestions from code review
Co-authored-by: reivilibre <olivier@librepush.net>
* Update version to latest, include completeish before/after examples in upgrade notes.
* Fix up and docs too
---------
Co-authored-by: reivilibre <olivier@librepush.net>
Separate out a HTTP client for replication in preparation for
also supporting using UNIX sockets. The major difference from
the base class is that this does not use treq to handle HTTP
requests.
This stops media (and thumbnails) from being accessed from the
listed domains. It does not delete any already locally cached media,
but will prevent accessing it.
Note that admin APIs are unaffected by this change.
m.push_rules, like m.fully_read, is a special account data type that cannot
be set using the normal /account_data endpoint. Return an error instead
of allowing data that will not be used to be stored.
MSC3984 proxies /keys/query requests to appservices, but servers will
can also requests devices / keys from the /user/devices endpoint.
The formats are close enough that we can "proxy" that /user/devices to
appservices (by calling /keys/query) and then change the format of the
returned data before returning it over federation.
Behind a configuration flag this adds + to the list of allowed
characters in Matrix IDs. The main feature this enables is
using full E.164 phone numbers as Matrix IDs.
Add an `is_mine_server_name` method, similar to `is_mine_id`.
Ideally we would use this consistently, instead of sometimes comparing
against `hs.hostname` and other times reaching into
`hs.config.server.server_name`.
Also fix a bug in the tests where `hs.hostname` would sometimes differ
from `hs.config.server.server_name`.
Signed-off-by: Sean Quah <seanq@matrix.org>
A dont_notify action is a no-op (and coalesce is undefined). These are
both considered no-ops by the spec, per MSC3987 and the predefined
push rules were updated to remove dont_notify from the list of actions.
It seems that YouTube Short previews do not work in some
regions, but the oEmbed information for those areas is still
valid.
This causes YouTube Shorts to always use (only) the oEmbed
endpoint which is a minor regression for regions where the URL
preview was already working -- some of the additional video
metadata is lost. It is not likely that clients are using this today
and it is more beneficial to have a limited preview working everywhere
than unused metadata in the Open Graph response.
Enforce that we use index scans (rather than seq scans), which we also do for state queries. The reason to enforce this is that we can't correctly get PostgreSQL to understand the distribution of `stream_ordering` depends on `highlight`, and so it always defaults (on matrix.org) to sequential scans.
Updates the database schema to require a thread_id (by adding a
constraint that the column is non-null) for event_push_actions,
event_push_actions_staging, and event_push_actions_summary.
For PostgreSQL we add the constraint as NOT VALID, then
VALIDATE the constraint a background job to avoid locking
the table during an upgrade.
For SQLite we simply rebuild the table & copy the data.
Pushers tend to make many connections to the same HTTP host
(e.g. a new event comes in, causes events to be pushed, and then
the homeserver connects to the same host many times). Due to this
the per-host HTTP connection pool size was increased, but this does
not make sense for other SimpleHttpClients.
Add a parameter for the connection pool and override it for pushers
(making a separate SimpleHttpClient for pushers with the increased
configuration).
This returns the HTTP connection pool settings to the default Twisted
ones for non-pusher HTTP clients.
Adds an optional keyword argument to the /relations API which
will recurse a limited number of event relationships.
This will cause the API to return not just the events related to the
parent event, but also events related to those related to the parent
event, etc.
This is disabled by default behind an experimental configuration
flag and is currently implemented using prefixed parameters.
MSC3983 provides a way to request multiple OTKs at once from appservices,
this extends this concept to the Client-Server API.
Note that this will likely be spit out into a separate MSC, but is currently part of
MSC3983.
Cleans-up the schema delta files:
* Removes no-op functions.
* Adds missing type hints to function parameters.
* Fixes any issues with type hints.
This also renames one (very old) schema delta to avoid a conflict
that mypy complains about.
* Docs: Add Nginx loadbalancing example with sticky mxid for workers
Add example nginx configuration snippet that
* does load balancing for workers
* respects mxid part of the token
* from both url parameter and auth header
* and handles since parameter
Thanks to @olmari for pushing me to write this and testing the configs
Signed-off-by: Tatu Wikman <tatu.wikman@gmail.com>
* Add changelog entry
Signed-off-by: Tatu Wikman <tatu.wikman@gmail.com>
* Update codeblock formatter
Co-authored-by: Dirk Klimpel <5740567+dklimpel@users.noreply.github.com>
* Remove indirectly related nginx-config
Signed-off-by: Sami Olmari <sami@olmari.fi>
* Proper definition of action how to target username for worker
Signed-off-by: Sami Olmari <sami@olmari.fi>
* Change "nginx" to general "reverse proxy" as it's concept now.
Signed-off-by: Sami Olmari <sami@olmari.fi>
* Wording in better English
Co-authored-by: Tatu Wikman <tatu.wikman@gmail.com>
* rename changelog entry to have correct extension
---------
Signed-off-by: Tatu Wikman <tatu.wikman@gmail.com>
Signed-off-by: Sami Olmari <sami@olmari.fi>
Co-authored-by: Dirk Klimpel <5740567+dklimpel@users.noreply.github.com>
Co-authored-by: Sami Olmari <sami@olmari.fi>
Co-authored-by: Sami Olmari <sami+github@olmari.fi>
It can be useful to always return the fallback key when attempting to
claim keys. This adds an unstable endpoint for `/keys/claim` which
always returns fallback keys in addition to one-time-keys.
The fallback key(s) are not marked as "used" unless there are no
corresponding OTKs.
This is currently defined in MSC3983 (although likely to be split out
to a separate MSC). The endpoint shape may change or be requested
differently (i.e. a keyword parameter on the current endpoint), but the
core logic should be reasonable.
Before this change:
* `PerspectivesKeyFetcher` and `ServerKeyFetcher` write to `server_keys_json`.
* `PerspectivesKeyFetcher` also writes to `server_signature_keys`.
* `StoreKeyFetcher` reads from `server_signature_keys`.
After this change:
* `PerspectivesKeyFetcher` and `ServerKeyFetcher` write to `server_keys_json`.
* `PerspectivesKeyFetcher` also writes to `server_signature_keys`.
* `StoreKeyFetcher` reads from `server_keys_json`.
This results in `StoreKeyFetcher` now using the results from `ServerKeyFetcher`
in addition to those from `PerspectivesKeyFetcher`, i.e. keys which are directly
fetched from a server will now be pulled from the database instead of refetched.
An additional minor change is included to avoid creating a `PerspectivesKeyFetcher`
(and checking it) if no `trusted_key_servers` are configured.
The overall impact of this should be better usage of cached results:
* If a server has no trusted key servers configured then it should reduce how often keys
are fetched.
* if a server's trusted key server does not have a requested server's keys cached then it
should reduce how often keys are directly fetched.
These two lines:
```
config_obj = HomeServerConfig()
config_obj.parse_config_dict(config, "", "")
```
are called many times with the exact same value for `config`.
As the test suite is CPU-bound and non-negligeably time is spent in
`parse_config_dict`, this saves ~5% on the overall runtime of the Trial
test suite (tested with both `-j2` and `-j12` on a 12t CPU).
This is sadly rather limited, as the cache cannot be shared between
processes (it contains at least jinja2.Template and RLock objects which
aren't pickleable), and Trial tends to run close tests in different
processes.
* Switch InstanceLocationConfig to a pydantic BaseModel, apply Strict* types and add a few helper methods(that will make more sense in follow up work).
Co-authored-by: David Robertson <davidr@element.io>
* More precise type for LoggingTransaction.execute
* Add an annotation for stream_ordering_month_ago
This would have spotted the error that was fixed in "Add comma missing from #15382. (#15429)"
c.f. #15264
The two changes are:
1. Add indexes so that the select / deletes don't do sequential scans
2. Don't repeatedly call `SELECT count(*)` each iteration, as that's slow
The registration fallback is broken and unspecced. This removes it
since there is no plan to spec it.
Note that this does not modify the login fallback code.
* Change `store_server_verify_keys` to take a `Mapping[(str, str), FKR]`
This is because we already can't handle duplicate keys — leads to cardinality violation
* Newsfile
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
---------
Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
This moves `redacts` from being a top-level property to
a `content` property in a new room version.
MSC2176 (which was previously implemented) states to not
`redact` this property.
* raise a ConfigError on an invalid app_service_config_files
* changelog
* Move config check to read_config
* Add test
* Ensure list also contains strings
* Trust dtolnay/rust-toolchain
The author is a big deal in the Rust world and I'm happy to trust them.
I'm also bored of the dependabot updates tbh.
* Changelog
This change fixes a rare bug where initial /syncs would fail with a
`KeyError` under the following circumstances:
1. A user fast joins a remote room.
2. The user is kicked from the room before the room's full state has
been synced.
3. A second local user fast joins the room.
4. Events are backfilled into the room with a higher topological
ordering than the original user's leave. They are assigned a
negative stream ordering. It's not clear how backfill happened here,
since it is expected to be equivalent to syncing the full state.
5. The second local user leaves the room before the room's full state
has been synced. The homeserver does not complete the sync.
6. The original user performs an initial /sync with lazy_load_members
enabled.
* Because they were kicked from the room, the room is included in
the /sync response even though the include_leave option is not
specified.
* To populate the room's timeline, `_load_filtered_recents` /
`get_recent_events_for_room` fetches events with a lower stream
ordering than the leave event and picks the ones with the highest
topological orderings (which are most recent). This captures the
backfilled events after the leave, since they have a negative
stream ordering. These events are filtered out of the timeline,
since the user was not in the room at the time and cannot view
them. The sync code ends up with an empty timeline for the room
that notably does not include the user's leave event.
This seems buggy, but at least we don't disclose events the user
isn't allowed to see.
* Normally, `compute_state_delta` would fetch the state at the
start and end of the room's timeline to generate the sync
response. Since the timeline is empty, it fetches the state at
`min(now, last event in the room)`, which corresponds with the
second user's leave. The state during the entirety of the second
user's membership does not include the membership for the first
user because of partial state.
This part is also questionable, since we are fetching state from
outside the bounds of the user's membership.
* `compute_state_delta` then tries and fails to find the user's
membership in the auth events of timeline events. Because there
is no timeline event whose auth events are expected to contain
the user's membership, a `KeyError` is raised.
Also contains a drive-by fix for a separate unlikely race condition.
Signed-off-by: Sean Quah <seanq@matrix.org>
This uses the specced /_matrix/app/v1/... paths instead of the
"legacy" paths. If the homeserver receives an error it will retry
using the legacy path.
* Add IReactorUNIX to ISynapseReactor type hint.
* Create listen_unix().
Two options, 'path' to the file and 'mode' of permissions(not umask, recommend 666 as default as
nginx/other reverse proxies write to it and it's setup as user www-data)
For the moment, leave the option to always create a PID lockfile turned on by default
* Create UnixListenerConfig and wire it up.
Rename ListenerConfig to TCPListenerConfig, then Union them together into ListenerConfig.
This spidered around a bit, but I think I got it all. Metrics and manhole have been placed
behind a conditional in case of accidental putting them onto a unix socket.
Use new helpers to get if a listener is configured for TLS, and to help create a site tag
for logging.
There are 2 TODO things in parse_listener_def() to finish up at a later point.
* Refactor SynapseRequest to handle logging correctly when using a unix socket.
This prevents an exception when an IP address can not be retrieved for a request.
* Make the 'Synapse now listening on Unix socket' log line a little prettier.
* No silent failures on generic workers when trying to use a unix socket with metrics or manhole.
* Inline variables in app/_base.py
* Update docstring for listen_unix() to remove reference to a hardcoded permission of 0o666 and add a few comments saying where the default IS declared.
* Disallow both a unix socket and a ip/port combo on the same listener resource
* Linting
* Changelog
* review: simplify how listen_unix returns(and get rid of a type: ignore)
* review: fix typo from ConfigError in app/homeserver.py
* review: roll conditional for http_options.tag into get_site_tag() helper(and add docstring)
* review: enhance the conditionals for checking if a port or path is valid, remove a TODO line
* review: Try updating comment in get_client_ip_if_available to clarify what is being retrieved and why
* Pretty up how 'Synapse now listening on Unix Socket' looks by decoding the byte string.
* review: In parse_listener_def(), raise ConfigError if neither socket_path nor port is declared(and fix a typo)