Fixes#1834.
`get_new_events_for_appservice` internally calls `get_events_as_list`, which will filter out any rejected events. If all returned events are filtered out, `_notify_interested_services` will return without updating the last handled stream position. If there are 100 consecutive such events, processing will halt altogether.
Breaking the loop is now done by checking whether we're up-to-date with `current_max` in the loop condition, instead of relying on an empty `events` list.
Signed-off-by: Willem Mulder <14mRh4X0r@gmail.com>
If backfilling is slow then the client may time out and retry, causing
Synapse to start a new `/backfill` before the existing backfill has
finished, duplicating work.
This adds quite a lot of OpenTracing decoration for database activity. Specifically it adds tracing at four different levels:
* emit a span for each "interaction" - ie, the top level database function that we tend to call "transaction", but isn't really, because it can end up as multiple transactions.
* emit a span while we hold a database connection open
* emit a span for each database transaction - actual actual transaction.
* emit a span for each database query.
I'm aware this might be quite a lot of overhead, but even just running it on a local Synapse it looks really interesting, and I hope the overhead can be offset just by turning down the sampling frequency and finding other ways of tracing requests of interest (eg, the `force_tracing_for_users` setting).
The existing tracing reports an error each time there is a timeout, which isn't
really representative.
Additionally, we log things about the way `wait_for_events` works
(eg, the result of the callback) to the *parent* span, which is confusing.
So that they render nicely in mdbook (see #10086), and so that we no longer have a mix of structured text languages in our documentation (excluding files outside of `docs/`).
Empirically, this helped my server considerably when handling gaps in Matrix HQ. The problem was that we would repeatedly call have_seen_events for the same set of (50K or so) auth_events, each of which would take many minutes to complete, even though it's only an index scan.
* Make `invalidate` and `invalidate_many` do the same thing
... so that we can do either over the invalidation replication stream, and also
because they always confused me a bit.
* Kill off `invalidate_many`
* changelog
`keylen` seems to be a thing that is frequently incorrectly set, and we don't really need it.
The only time it was used was to figure out if we had removed a subtree in `del_multi`, which we can do better by changing `TreeCache.pop` to return a different type (`TreeCacheNode`).
Commits should be independently reviewable.
* Fix /upload 500'ing when presented a very large image
Catch DecompressionBombError and re-raise as ThumbnailErrors
* Set PIL's MAX_IMAGE_PIXELS to match homeserver.yaml
to get it to bomb out quicker, to load less into memory
in the case of super large images
* Add changelog entry for 10029
https://github.com/matrix-org/synapse/issues/9962 uncovered that we accidentally removed all but one of the presence updates that we store in the database when persisting multiple updates. This could cause users' presence state to be stale.
The bug was fixed in #10014, and this PR just adds a test that failed on the old code, and was used to initially verify the bug.
The test attempts to insert some presence into the database in a batch using `PresenceStore.update_presence`, and then simply pulls it out again.
Fixes: https://github.com/matrix-org/synapse/issues/9962
This is a fix for above problem.
I fixed it by swaping the order of insertion of new records and deletion of old ones. This ensures that we don't delete fresh database records as we do deletes before inserts.
Signed-off-by: Marek Matys <themarcq@gmail.com>
Also add support for giving a callback to generate the JSON object to
verify. This should reduce memory usage, as we no longer have the event
in memory in dict form (which has a large memory footprint) for extend
periods of time.
Instead of parsing the full response to `/send_join` into Python objects (which can be huge for large rooms) and *then* parsing that into events, we instead use ijson to stream parse the response directly into `EventBase` objects.
To be more consistent with similar code. The check now automatically
raises an AuthError instead of passing back a boolean. It also absorbs
some shared logic between callers.
- use a tuple rather than a list for the iterable that is passed into the
wrapped function, for performance
- test that we can pass an iterable and that keys are correctly deduped.
It's not obvious that instances of SQLBaseStore each need their own
instances of random.SystemRandom(); let's just use random directly.
Introduced by 52839886d6
Signed-off-by: Dan Callahan <danc@element.io>
We can get away with just catching UnicodeError here.
⋮
+-- ValueError
| +-- UnicodeError
| +-- UnicodeDecodeError
| +-- UnicodeEncodeError
| +-- UnicodeTranslateError
⋮
https://docs.python.org/3/library/exceptions.html#exception-hierarchy
Signed-off-by: Dan Callahan <danc@element.io>
Functionally identical, but more obviously cryptographically secure.
...Explicit is better than implicit?
Avoids needing to know that SystemRandom() implies a CSPRNG, and
complies with the big scary red box on the documentation for random:
> Warning:
> The pseudo-random generators of this module should not be used for
> security purposes. For security or cryptographic uses, see the
> secrets module.
https://docs.python.org/3/library/random.html
Signed-off-by: Dan Callahan <danc@element.io>
This should help ensure that equivalent results are achieved between
homeservers querying for the summary of a space.
This implements modified MSC1772 rules, according to MSC2946.
The different is that the origin_server_ts of the m.room.create event
is not used as a tie-breaker since this might not be known if the
homeserver is not part of the room.
Per changes in MSC2946, the C-S and S-S APIs for spaces summary
should use GET requests.
Until this is stable, the POST endpoints still exist.
This does not switch federation requests to use the GET version yet
since it is newly added and already deployed servers might not support
it. When switching to the stable endpoint we should switch to GET
requests.
MSC1772 specifies the m.room.create event should be sent as part
of the invite_state. This was done optionally behind an experimental
flag, but is now done by default due to MSC1772 being approved.