Update UPSERT comment now that native upserts are the default (#13924)

David Robertson 2022-09-29 19:10:47 +01:00 committed by GitHub
parent ebd9e2dac6
commit 15754d720f
GPG key ID: 4AEE18F83AFDEB23
2 changed files with 51 additions and 10 deletions

changelog.d/13924.misc Normal file

@@ -0,0 +1 @@
Update an inaccurate comment in Synapse's upsert database helper.


@@ -1141,17 +1141,57 @@ class DatabasePool:
desc: str = "simple_upsert",
lock: bool = True,
) -> bool:
"""
"""Insert a row with values + insertion_values; on conflict, update with values.
`lock` should generally be set to True (the default), but can be set
to False if either of the following are true:
1. there is a UNIQUE INDEX on the key columns. In this case a conflict
will cause an IntegrityError in which case this function will retry
the update.
2. we somehow know that we are the only thread which will be updating
this table.
As an additional note, this parameter only matters for old SQLite versions
because we will use native upserts otherwise.
All of our supported databases accept the nonstandard "upsert" statement in
their dialect of SQL. We call this a "native upsert". The syntax looks roughly
like:
INSERT INTO table VALUES (values + insertion_values)
ON CONFLICT (keyvalues)
DO UPDATE SET (values); -- overwrite `values` columns only
If (values) is empty, the resulting query is slightly simpler:
INSERT INTO table VALUES (insertion_values)
ON CONFLICT (keyvalues)
DO NOTHING; -- do not overwrite any columns
This function is a helper to build such queries.
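As a rough illustration of the two query shapes above, here is a minimal sketch using Python's standard `sqlite3` module rather than Synapse's `DatabasePool`. The helper `native_upsert`, the `profiles` table and its columns are hypothetical names chosen for this example, and the sketch assumes an SQLite version with upsert support (3.24.0 or later).

```python
import sqlite3


def native_upsert(conn, table, keyvalues, values, insertion_values=None):
    """Hypothetical helper (not Synapse's API): build and run a native upsert."""
    insertion_values = insertion_values or {}
    # Every column is supplied on insert: key columns, values, insertion_values.
    row = {**keyvalues, **values, **insertion_values}
    columns = ", ".join(row)
    placeholders = ", ".join("?" for _ in row)
    conflict_columns = ", ".join(keyvalues)
    if values:
        # On conflict, overwrite only the `values` columns.
        assignments = ", ".join(f"{col} = EXCLUDED.{col}" for col in values)
        action = f"DO UPDATE SET {assignments}"
    else:
        # Nothing to overwrite, so leave the existing row untouched.
        action = "DO NOTHING"
    sql = (
        f"INSERT INTO {table} ({columns}) VALUES ({placeholders}) "
        f"ON CONFLICT ({conflict_columns}) {action}"
    )
    conn.execute(sql, list(row.values()))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profiles (user_id TEXT, displayname TEXT, created_ts INTEGER)"
)
# The unique index on the key column is what lets the database detect a conflict.
conn.execute("CREATE UNIQUE INDEX profiles_user_id ON profiles (user_id)")

key = {"user_id": "@alice:example.org"}
native_upsert(conn, "profiles", key, {"displayname": "Alice"}, {"created_ts": 1})
native_upsert(conn, "profiles", key, {"displayname": "Alice B"}, {"created_ts": 2})
rows = conn.execute(
    "SELECT user_id, displayname, created_ts FROM profiles"
).fetchall()
```

After the second call the row keeps its original `created_ts` but carries the new `displayname`, because only the `values` columns are overwritten on conflict.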
In order for upserts to make sense, the database must be able to determine when
an upsert CONFLICTs with an existing row. Postgres and SQLite ensure this by
requiring that a unique index exist on the column names used to detect a
conflict (i.e. `keyvalues.keys()`).
If there is no such index, we can "emulate" an upsert with a SELECT followed
by either an INSERT or an UPDATE. This is unsafe: we cannot make the same
atomicity guarantees that a native upsert can, and we are very vulnerable to races
and crashes. Therefore if we wish to upsert without an appropriate unique index,
we must either:
1. Acquire a table-level lock before the emulated upsert (`lock=True`), or
2. VERY CAREFULLY ensure that we are the only thread and worker which will be
writing to this table, in which case we can proceed without a lock
(`lock=False`).
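To make the race concrete, here is a minimal sketch of an emulated upsert, again using Python's standard `sqlite3` module and not Synapse's actual implementation. The helper `emulated_upsert`, the `devices` table and the return convention (True when a new row was inserted) are assumptions made for this example. No locking is shown, so the sketch corresponds to the `lock=False` case and is only safe with a single writer.

```python
import sqlite3


def emulated_upsert(conn, table, keyvalues, values, insertion_values=None):
    """Hypothetical helper: SELECT, then INSERT or UPDATE. Not atomic."""
    insertion_values = insertion_values or {}
    where = " AND ".join(f"{col} = ?" for col in keyvalues)
    key_args = list(keyvalues.values())

    existing = conn.execute(
        f"SELECT 1 FROM {table} WHERE {where}", key_args
    ).fetchone()

    # Race window: another writer could insert a matching row right here,
    # and without a unique index nothing would stop a duplicate row.
    if existing is None:
        row = {**keyvalues, **values, **insertion_values}
        columns = ", ".join(row)
        placeholders = ", ".join("?" for _ in row)
        conn.execute(
            f"INSERT INTO {table} ({columns}) VALUES ({placeholders})",
            list(row.values()),
        )
        return True

    if values:
        # The row already exists: overwrite only the `values` columns.
        assignments = ", ".join(f"{col} = ?" for col in values)
        conn.execute(
            f"UPDATE {table} SET {assignments} WHERE {where}",
            list(values.values()) + key_args,
        )
    return False


conn = sqlite3.connect(":memory:")
# Deliberately no unique index: this is the case where emulation is needed.
conn.execute(
    "CREATE TABLE devices (user_id TEXT, device_id TEXT, display_name TEXT)"
)
key = {"user_id": "@bob:example.org", "device_id": "ABC"}
first = emulated_upsert(conn, "devices", key, {"display_name": "Phone"})
second = emulated_upsert(conn, "devices", key, {"display_name": "Laptop"})
rows = conn.execute(
    "SELECT user_id, device_id, display_name FROM devices"
).fetchall()
```

With a single writer the two calls leave one row behind. With concurrent writers and no lock, both could pass the SELECT and insert, which is the failure the table-level lock is there to prevent.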
Generally speaking, you should use `lock=True`. If the table in question has a
unique index[*], this class will use a native upsert (which is atomic and so can
ignore the `lock` argument). Otherwise this class will use an emulated upsert,
in which case we want the safer option unless we have been VERY CAREFUL.
[*]: Some tables have unique indices added to them in the background. Those
tables `T` are keys in the dictionary UNIQUE_INDEX_BACKGROUND_UPDATES,
where `T` maps to the background update that adds a unique index to `T`.
This dictionary is maintained by hand.
At runtime, we constantly check to see if each of these background updates
has run. If so, we deem the corresponding table safe to upsert into, because
we can now use a native upsert to do so. If not, we deem the table unsafe
to upsert into and require an emulated upsert.
Tables that do not appear in this dictionary are assumed to have an
appropriate unique index and therefore be safe to upsert into.
Args:
table: The table to upsert into