From 1381563988c6dc7a2b8801b736b1f0c663970da8 Mon Sep 17 00:00:00 2001 From: Patrick Cloke Date: Tue, 12 Jul 2022 15:01:58 -0400 Subject: [PATCH] Inline URL preview documentation. (#13261) Inline URL preview documentation near the implementation. --- changelog.d/13233.doc | 3 +- changelog.d/13261.doc | 1 + docs/SUMMARY.md | 1 - docs/admin_api/user_admin_api.md | 2 +- docs/development/url_previews.md | 62 ------------------- docs/media_repository.md | 5 +- synapse/rest/media/v1/preview_url_resource.py | 60 +++++++++++++++++- 7 files changed, 61 insertions(+), 73 deletions(-) create mode 100644 changelog.d/13261.doc delete mode 100644 docs/development/url_previews.md diff --git a/changelog.d/13233.doc b/changelog.d/13233.doc index b6babd7f1..3eaea7c5e 100644 --- a/changelog.d/13233.doc +++ b/changelog.d/13233.doc @@ -1,2 +1 @@ -Add a link to configuration instructions in the URL preview documentation. - +Move the documentation for how URL previews work to the URL preview module. diff --git a/changelog.d/13261.doc b/changelog.d/13261.doc new file mode 100644 index 000000000..3eaea7c5e --- /dev/null +++ b/changelog.d/13261.doc @@ -0,0 +1 @@ +Move the documentation for how URL previews work to the URL preview module. diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md index 8d6030e34..f54b571d3 100644 --- a/docs/SUMMARY.md +++ b/docs/SUMMARY.md @@ -35,7 +35,6 @@ - [Application Services](application_services.md) - [Server Notices](server_notices.md) - [Consent Tracking](consent_tracking.md) - - [URL Previews](development/url_previews.md) - [User Directory](user_directory.md) - [Message Retention Policies](message_retention_policies.md) - [Pluggable Modules](modules/index.md) diff --git a/docs/admin_api/user_admin_api.md b/docs/admin_api/user_admin_api.md index 1235f1cb9..0871cfebf 100644 --- a/docs/admin_api/user_admin_api.md +++ b/docs/admin_api/user_admin_api.md @@ -544,7 +544,7 @@ Gets a list of all local media that a specific `user_id` has created. 
These are media that the user has uploaded themselves ([local media](../media_repository.md#local-media)), as well as [URL preview images](../media_repository.md#url-previews) requested by the user if the -[feature is enabled](../development/url_previews.md). +[feature is enabled](../usage/configuration/config_documentation.md#url_preview_enabled). By default, the response is ordered by descending creation date and ascending media ID. The newest media is on top. You can change the order with parameters diff --git a/docs/development/url_previews.md b/docs/development/url_previews.md deleted file mode 100644 index 25f189683..000000000 --- a/docs/development/url_previews.md +++ /dev/null @@ -1,62 +0,0 @@ -URL Previews -============ -For information on how to enable URL previews in synapse, please see the [config manual](../usage/configuration/config_documentation.md#url_preview_enabled). - -The `GET /_matrix/media/r0/preview_url` endpoint provides a generic preview API -for URLs which outputs [Open Graph](https://ogp.me/) responses (with some Matrix -specific additions). - -This does have trade-offs compared to other designs: - -* Pros: - * Simple and flexible; can be used by any clients at any point -* Cons: - * If each homeserver provides one of these independently, all the HSes in a - room may needlessly DoS the target URI - * The URL metadata must be stored somewhere, rather than just using Matrix - itself to store the media. - * Matrix cannot be used to distribute the metadata between homeservers. - -When Synapse is asked to preview a URL it does the following: - -1. Checks against a URL blacklist (defined as `url_preview_url_blacklist` in the - config). -2. Checks the in-memory cache by URLs and returns the result if it exists. (This - is also used to de-duplicate processing of multiple in-flight requests at once.) -3. Kicks off a background process to generate a preview: - 1. 
Checks the database cache by URL and timestamp and returns the result if it - has not expired and was successful (a 2xx return code). - 2. Checks if the URL matches an [oEmbed](https://oembed.com/) pattern. If it - does, update the URL to download. - 3. Downloads the URL and stores it into a file via the media storage provider - and saves the local media metadata. - 4. If the media is an image: - 1. Generates thumbnails. - 2. Generates an Open Graph response based on image properties. - 5. If the media is HTML: - 1. Decodes the HTML via the stored file. - 2. Generates an Open Graph response from the HTML. - 3. If a JSON oEmbed URL was found in the HTML via autodiscovery: - 1. Downloads the URL and stores it into a file via the media storage provider - and saves the local media metadata. - 2. Convert the oEmbed response to an Open Graph response. - 3. Override any Open Graph data from the HTML with data from oEmbed. - 4. If an image exists in the Open Graph response: - 1. Downloads the URL and stores it into a file via the media storage - provider and saves the local media metadata. - 2. Generates thumbnails. - 3. Updates the Open Graph response based on image properties. - 6. If the media is JSON and an oEmbed URL was found: - 1. Convert the oEmbed response to an Open Graph response. - 2. If a thumbnail or image is in the oEmbed response: - 1. Downloads the URL and stores it into a file via the media storage - provider and saves the local media metadata. - 2. Generates thumbnails. - 3. Updates the Open Graph response based on image properties. - 7. Stores the result in the database cache. -4. Returns the result. - -The in-memory cache expires after 1 hour. - -Expired entries in the database cache (and their associated media files) are -deleted every 10 seconds. The default expiration time is 1 hour from download. 
diff --git a/docs/media_repository.md b/docs/media_repository.md index ba17f8a85..23e6da7f3 100644 --- a/docs/media_repository.md +++ b/docs/media_repository.md @@ -7,8 +7,7 @@ The media repository users. * caches avatars, attachments and their thumbnails for media uploaded by remote users. - * caches resources and thumbnails used for - [URL previews](development/url_previews.md). + * caches resources and thumbnails used for URL previews. All media in Matrix can be identified by a unique [MXC URI](https://spec.matrix.org/latest/client-server-api/#matrix-content-mxc-uris), @@ -59,8 +58,6 @@ remote_thumbnail/matrix.org/aa/bb/cccccccccccccccccccc/128-96-image-jpeg Note that `remote_thumbnail/` does not have an `s`. ## URL Previews -See [URL Previews](development/url_previews.md) for documentation on the URL preview -process. When generating previews for URLs, Synapse may download and cache various resources, including images. These resources are assigned temporary media IDs diff --git a/synapse/rest/media/v1/preview_url_resource.py b/synapse/rest/media/v1/preview_url_resource.py index 54a849eac..b36c98a08 100644 --- a/synapse/rest/media/v1/preview_url_resource.py +++ b/synapse/rest/media/v1/preview_url_resource.py @@ -109,10 +109,64 @@ class MediaInfo: class PreviewUrlResource(DirectServeJsonResource): """ - Generating URL previews is a complicated task which many potential pitfalls. + The `GET /_matrix/media/r0/preview_url` endpoint provides a generic preview API + for URLs which outputs Open Graph (https://ogp.me/) responses (with some Matrix + specific additions). - See docs/development/url_previews.md for discussion of the design and - algorithm followed in this module. 
+ This does have trade-offs compared to other designs: + + * Pros: + * Simple and flexible; can be used by any clients at any point + * Cons: + * If each homeserver provides one of these independently, all the homeservers in a + room may needlessly DoS the target URI + * The URL metadata must be stored somewhere, rather than just using Matrix + itself to store the media. + * Matrix cannot be used to distribute the metadata between homeservers. + + When Synapse is asked to preview a URL it does the following: + + 1. Checks against a URL blacklist (defined as `url_preview_url_blacklist` in the + config). + 2. Checks the URL against an in-memory cache and returns the result if it exists. (This + is also used to de-duplicate processing of multiple in-flight requests at once.) + 3. Kicks off a background process to generate a preview: + 1. Checks URL and timestamp against the database cache and returns the result if it + has not expired and was successful (a 2xx return code). + 2. Checks if the URL matches an oEmbed (https://oembed.com/) pattern. If it + does, update the URL to download. + 3. Downloads the URL and stores it into a file via the media storage provider + and saves the local media metadata. + 4. If the media is an image: + 1. Generates thumbnails. + 2. Generates an Open Graph response based on image properties. + 5. If the media is HTML: + 1. Decodes the HTML via the stored file. + 2. Generates an Open Graph response from the HTML. + 3. If a JSON oEmbed URL was found in the HTML via autodiscovery: + 1. Downloads the URL and stores it into a file via the media storage provider + and saves the local media metadata. + 2. Convert the oEmbed response to an Open Graph response. + 3. Override any Open Graph data from the HTML with data from oEmbed. + 4. If an image exists in the Open Graph response: + 1. Downloads the URL and stores it into a file via the media storage + provider and saves the local media metadata. + 2. Generates thumbnails. + 3. 
Updates the Open Graph response based on image properties. + 6. If the media is JSON and an oEmbed URL was found: + 1. Convert the oEmbed response to an Open Graph response. + 2. If a thumbnail or image is in the oEmbed response: + 1. Downloads the URL and stores it into a file via the media storage + provider and saves the local media metadata. + 2. Generates thumbnails. + 3. Updates the Open Graph response based on image properties. + 7. Stores the result in the database cache. + 4. Returns the result. + + The in-memory cache expires after 1 hour. + + Expired entries in the database cache (and their associated media files) are + deleted every 10 seconds. The default expiration time is 1 hour from download. """ isLeaf = True
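
Reviewer note: the cache layering described in the docstring above (blacklist check, in-memory cache, database cache, then generation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Synapse's implementation: `PreviewCache`, `CachedPreview`, `generate`, `is_blacklisted` and `now_ms` are hypothetical names, the "database" is a plain dict, and the download, oEmbed and thumbnailing steps are collapsed into the single `generate` callback.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

ONE_HOUR_MS = 60 * 60 * 1000  # default expiry mentioned in the docstring


@dataclass
class CachedPreview:
    """Hypothetical stand-in for a row in the database cache."""
    og: dict            # the Open Graph response
    response_code: int  # HTTP status of the original download
    expires_ms: int     # absolute expiry timestamp in milliseconds


class PreviewCache:
    """Illustrative two-level cache; all names here are assumptions."""

    def __init__(
        self,
        generate: Callable[[str], Tuple[dict, int]],
        is_blacklisted: Callable[[str], bool],
        now_ms: Callable[[], int] = lambda: int(time.time() * 1000),
    ) -> None:
        self._generate = generate
        self._is_blacklisted = is_blacklisted
        self._now_ms = now_ms
        self._memory: Dict[str, Tuple[dict, int]] = {}   # url -> (og, expiry)
        self._database: Dict[str, CachedPreview] = {}    # stand-in for the DB

    def preview(self, url: str) -> dict:
        # Step 1: refuse blacklisted URLs before doing any work.
        if self._is_blacklisted(url):
            raise PermissionError(f"URL is blacklisted: {url}")

        now = self._now_ms()

        # Step 2: in-memory cache, which expires after one hour.
        hit = self._memory.get(url)
        if hit is not None and hit[1] > now:
            return hit[0]

        # Step 3.1: database cache; only unexpired 2xx results are reused.
        row = self._database.get(url)
        if row is not None and row.expires_ms > now and 200 <= row.response_code < 300:
            self._memory[url] = (row.og, now + ONE_HOUR_MS)
            return row.og

        # Steps 3.2 to 3.6 (download, oEmbed, thumbnails) are collapsed here.
        og, code = self._generate(url)

        # Step 3.7: store the result in the database cache, then in memory.
        self._database[url] = CachedPreview(og, code, now + ONE_HOUR_MS)
        self._memory[url] = (og, now + ONE_HOUR_MS)

        # Step 4: return the result.
        return og
```

Injecting `now_ms` keeps the expiry logic testable with a fake clock. The sketch omits the de-duplication of in-flight requests and the periodic deletion of expired database entries that the docstring also describes.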