
# Replication Architecture

## Motivation

We'd like to be able to split some of the work that Synapse does into
multiple Python processes. In theory, multiple Synapse processes could
share a single PostgreSQL database and we'd scale up by running more
Synapse processes. However, much of Synapse assumes that only one process
is interacting with the database: for assigning unique identifiers
when inserting into tables, for notifying components about new updates,
and for invalidating its caches.

So running multiple copies of the current code isn't an option. One way
to run multiple processes would be to have a single writer process and
multiple reader processes connected to the same database. In order to do
this we'd need a way for the reader process to invalidate its in-memory
caches when an update happens on the writer. One way to do this is for
the writer to present an append-only log of updates which the readers
can consume to invalidate their caches and to push updates to listening
clients or pushers.
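
As a rough illustration of the idea above, the following sketch models a
writer-side append-only log and a reader that replays it to invalidate its
cache. `UpdateLog`, `Reader`, `updates_since` and `catch_up` are hypothetical
names invented for this example; they are not classes or methods from the
Synapse codebase, and a real reader would receive updates over the replication
protocol rather than by calling the log directly.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class UpdateLog:
    """Hypothetical append-only log kept by the single writer."""

    entries: List[Tuple[int, str]] = field(default_factory=list)

    def append(self, key: str) -> int:
        # Only the writer appends, so positions are unique and increasing.
        position = len(self.entries) + 1
        self.entries.append((position, key))
        return position

    def updates_since(self, position: int) -> List[Tuple[int, str]]:
        # Positions start at 1, so the entry at position N is at index N - 1.
        return self.entries[position:]


@dataclass
class Reader:
    """Hypothetical reader that caches values and tracks its log position."""

    cache: Dict[str, str] = field(default_factory=dict)
    position: int = 0

    def catch_up(self, log: UpdateLog) -> List[str]:
        invalidated = []
        for position, key in log.updates_since(self.position):
            self.cache.pop(key, None)  # drop the stale entry, if any
            invalidated.append(key)
            self.position = position
        return invalidated


log = UpdateLog()
reader = Reader(cache={"@alice:example.org": "old display name"})
log.append("@alice:example.org")
log.append("@bob:example.org")
invalidated = reader.catch_up(log)
```

Because the reader remembers the last position it processed, calling
`catch_up` again without new writes yields nothing to invalidate.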

Synapse already stores much of its data as an append-only log so that it
can correctly respond to `/sync` requests, so the code changes needed to
expose that log to the readers should be fairly minimal.

## Architecture

### The Replication Protocol

See [the TCP replication documentation](tcp_replication.md).

### The Slaved DataStore

There are read-only versions of the Synapse storage layer in
`synapse/replication/slave/storage` that use the responses of the
replication API to invalidate their caches.
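
A minimal sketch of what such a read-only store might look like, assuming a
simple read-through cache. `ReadOnlyProfileStore`, `get_displayname` and
`on_replication_rows` are hypothetical names chosen for this example, and the
"database" is a plain dictionary; the real classes in
`synapse/replication/slave/storage` are structured differently.

```python
from typing import Callable, Dict, Iterable


class ReadOnlyProfileStore:
    """Hypothetical read-only store; not a class from the Synapse codebase."""

    def __init__(self, fetch_from_db: Callable[[str], str]) -> None:
        self._fetch_from_db = fetch_from_db
        self._cache: Dict[str, str] = {}

    def get_displayname(self, user_id: str) -> str:
        # Read-through cache: only hit the database on a cache miss.
        if user_id not in self._cache:
            self._cache[user_id] = self._fetch_from_db(user_id)
        return self._cache[user_id]

    def on_replication_rows(self, rows: Iterable[str]) -> None:
        # Assumption: each row names a user whose profile changed on the writer.
        for user_id in rows:
            self._cache.pop(user_id, None)


# A dictionary stands in for the shared database in this example.
database = {"@alice:example.org": "Alice"}
store = ReadOnlyProfileStore(database.__getitem__)

first = store.get_displayname("@alice:example.org")
database["@alice:example.org"] = "Alice B."  # the writer updates the row
stale = store.get_displayname("@alice:example.org")  # served from the cache
store.on_replication_rows(["@alice:example.org"])
fresh = store.get_displayname("@alice:example.org")  # re-read from the database
```

The store never writes to the database itself; it only discards cached values
when told that the writer has changed them.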

### The TCP Replication Module

Information about how the TCP replication module is structured, including how
the classes interact, can be found in
`synapse/replication/tcp/__init__.py`.