URL Previews
The GET /_matrix/media/r0/preview_url endpoint provides a generic preview API for URLs which outputs Open Graph responses (with some Matrix-specific additions).
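As a rough illustration of how a client might address this endpoint, the sketch below only builds the request URL. It assumes the `url` and optional `ts` query parameters described in the Matrix client-server specification; the homeserver address and the helper name `build_preview_request` are hypothetical. A real request would also need an access token (typically sent as an `Authorization: Bearer` header), which is left out here.

```python
from typing import Optional
from urllib.parse import urlencode

PREVIEW_PATH = "/_matrix/media/r0/preview_url"


def build_preview_request(
    homeserver: str, target_url: str, ts: Optional[int] = None
) -> str:
    """Return the full preview request URL (hypothetical helper)."""
    params = {"url": target_url}
    if ts is not None:
        # Assumed to be the preferred point in time, in milliseconds.
        params["ts"] = str(ts)
    # urlencode percent-encodes the target URL so it survives as one parameter.
    return homeserver.rstrip("/") + PREVIEW_PATH + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_preview_request("https://matrix.example.org", "https://example.com/a?b=1"))
```

The target URL is percent-encoded because it usually contains characters such as `?` and `&` that would otherwise be read as part of the preview request's own query string.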
This does have trade-offs compared to other designs:
- Pros:
  - Simple and flexible; can be used by any client at any point.
- Cons:
  - If each homeserver provides one of these independently, all the homeservers in a room may needlessly DoS the target URI.
  - The URL metadata must be stored somewhere, rather than just using Matrix itself to store the media.
  - Matrix cannot be used to distribute the metadata between homeservers.
When Synapse is asked to preview a URL it does the following:
- Checks against a URL blacklist (defined as url_preview_url_blacklist in the config).
- Checks the in-memory cache by URL and returns the result if it exists. (This is also used to de-duplicate processing of multiple in-flight requests at once.)
- Kicks off a background process to generate a preview:
  - Checks the database cache by URL and timestamp and returns the result if it has not expired and was successful (a 2xx return code).
  - Checks if the URL matches an oEmbed pattern. If it does, fetches the oEmbed response. If this is an image, replaces the URL to fetch and continues. If it is HTML content, uses the HTML as the document and continues.
  - If it doesn't match an oEmbed pattern, downloads the URL and stores it into a file via the media storage provider and saves the local media metadata.
  - If the media is an image:
    - Generates thumbnails.
    - Generates an Open Graph response based on image properties.
  - If the media is HTML:
    - Decodes the HTML via the stored file.
    - Generates an Open Graph response from the HTML.
    - If an image exists in the Open Graph response:
      - Downloads the URL and stores it into a file via the media storage provider and saves the local media metadata.
      - Generates thumbnails.
      - Updates the Open Graph response based on image properties.
  - Stores the result in the database cache.
- Returns the result.
The in-memory cache expires after 1 hour.
Expired entries in the database cache (and their associated media files) are deleted every 10 seconds. The default expiration time is 1 hour from download.
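The expiry behaviour can be pictured with a small time-to-live cache. This is only a sketch under stated assumptions: `ExpiringCache` and its methods are hypothetical names, the entries live in a plain dictionary rather than a database, and no media files are deleted. The one-hour default mirrors the expiration time given above, and `purge_expired` is the kind of function a periodic cleanup job would call.

```python
import time
from typing import Callable, Dict, Optional, Tuple

ONE_HOUR_SECONDS = 60 * 60


class ExpiringCache:
    """Toy cache whose entries expire a fixed time after being stored."""

    def __init__(
        self,
        ttl: float = ONE_HOUR_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def set(self, url: str, preview: dict) -> None:
        # Record the deadline alongside the cached preview.
        self._entries[url] = (self._clock() + self._ttl, preview)

    def get(self, url: str) -> Optional[dict]:
        entry = self._entries.get(url)
        if entry is None or entry[0] <= self._clock():
            return None  # missing or expired
        return entry[1]

    def purge_expired(self) -> int:
        """Delete expired entries and return how many were removed."""
        now = self._clock()
        expired = [
            url for url, (deadline, _) in self._entries.items() if deadline <= now
        ]
        for url in expired:
            del self._entries[url]
        return len(expired)
```

Checking the deadline in `get` as well as in `purge_expired` means a stale entry is never returned, even if the cleanup job has not yet run.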