Update the media repository documentation (#11415)

This commit is contained in:
Sean Quah 2021-11-29 15:37:56 +00:00 committed by GitHub
parent a82b90ab32
commit 7564b8e118
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
2 changed files with 71 additions and 19 deletions

1
changelog.d/11415.doc Normal file
View file

@ -0,0 +1 @@
Update the media repository documentation.

View file

@ -2,29 +2,80 @@
*Synapse implementation-specific details for the media repository*
The media repository is where attachments and avatar photos are stored.
It stores attachment content and thumbnails for media uploaded by local users.
It caches attachment content and thumbnails for media uploaded by remote users.
The media repository
* stores avatars, attachments and their thumbnails for media uploaded by local
users.
* caches avatars, attachments and their thumbnails for media uploaded by remote
users.
* caches resources and thumbnails used for
[URL previews](development/url_previews.md).
## Storage
All media in Matrix can be identified by a unique
[MXC URI](https://spec.matrix.org/latest/client-server-api/#matrix-content-mxc-uris),
consisting of a server name and media ID:
```
mxc://<server-name>/<media-id>
```
Each item of media is assigned a `media_id` when it is uploaded.
The `media_id` is a randomly chosen, URL safe 24 character string.
## Local Media
Synapse generates 24 character media IDs for content uploaded by local users.
These media IDs consist of upper and lowercase letters and are case-sensitive.
Other homeserver implementations may generate media IDs differently.
Metadata such as the MIME type, upload time and length are stored in the
sqlite3 database indexed by `media_id`.
Local media is recorded in the `local_media_repository` table, which includes
metadata such as MIME types, upload times and file sizes.
Note that this table is shared by the URL cache, which has a different media ID
scheme.
Content is stored on the filesystem under a `"local_content"` directory.
### Paths
A file with media ID `aabbcccccccccccccccccccc` and its `128x96` `image/jpeg`
thumbnail, created by scaling, would be stored at:
```
local_content/aa/bb/cccccccccccccccccccc
local_thumbnails/aa/bb/cccccccccccccccccccc/128-96-image-jpeg-scale
```
Thumbnails are stored under a `"local_thumbnails"` directory.
## Remote Media
When media from a remote homeserver is requested from Synapse, it is assigned
a local `filesystem_id`, with the same format as locally-generated media IDs,
as described above.
The item with `media_id` `"aabbccccccccdddddddddddd"` is stored under
`"local_content/aa/bb/ccccccccdddddddddddd"`. Its thumbnail with width
`128` and height `96` and type `"image/jpeg"` is stored under
`"local_thumbnails/aa/bb/ccccccccdddddddddddd/128-96-image-jpeg"`
A record of remote media is stored in the `remote_media_cache` table, which
can be used to map remote MXC URIs (server names and media IDs) to local
`filesystem_id`s.
Remote content is cached under `"remote_content"` directory. Each item of
remote content is assigned a local `"filesystem_id"` to ensure that the
directory structure `"remote_content/server_name/aa/bb/ccccccccdddddddddddd"`
is appropriate. Thumbnails for remote content are stored under
`"remote_thumbnail/server_name/..."`
### Paths
A file from `matrix.org` with `filesystem_id` `aabbcccccccccccccccccccc` and its
`128x96` `image/jpeg` thumbnail, created by scaling, would be stored at:
```
remote_content/matrix.org/aa/bb/cccccccccccccccccccc
remote_thumbnail/matrix.org/aa/bb/cccccccccccccccccccc/128-96-image-jpeg-scale
```
Older thumbnails may omit the thumbnailing method:
```
remote_thumbnail/matrix.org/aa/bb/cccccccccccccccccccc/128-96-image-jpeg
```
Note that `remote_thumbnail/` does not have an `s`.
## URL Previews
See [URL Previews](development/url_previews.md) for documentation on the URL preview
process.
When generating previews for URLs, Synapse may download and cache various
resources, including images. These resources are assigned temporary media IDs
of the form `yyyy-mm-dd_aaaaaaaaaaaaaaaa`, where `yyyy-mm-dd` is the current
date and `aaaaaaaaaaaaaaaa` is a random sequence of 16 case-sensitive letters.
The metadata for these cached resources is stored in the
`local_media_repository` and `local_media_repository_url_cache` tables.
Resources for URL previews are deleted after a few days.
### Paths
The file with media ID `yyyy-mm-dd_aaaaaaaaaaaaaaaa` and its `128x96`
`image/jpeg` thumbnail, created by scaling, would be stored at:
```
url_cache/yyyy-mm-dd/aaaaaaaaaaaaaaaa
url_cache_thumbnails/yyyy-mm-dd/aaaaaaaaaaaaaaaa/128-96-image-jpeg-scale
```