TROUBLESHOOTING
- Useful program options
- Recovering from broken configurations
- Recovering from database corruption
- Trouble with reverse proxies and middlewares
- Timeouts while resolving domain names
Useful program options
Start the daemon with one or more of the following program options to make it easier to troubleshoot and perform maintenance:
- -debug increases the logging level to the terminal.
- -single will start in "single user mode", which is a convenience combination of the -nolisten -wa -console options described below.
- -safe is more restrictive than -single by denying outbound network requests and background database tasks.
- -nolisten will disable the loading of any listener sockets during startup.
- -nobackfill specifically skips the initial-backfill background task.
- -wa write-avoid will discourage (but not deny) writes to the database. This prevents a lot of background tasks and other noise, making it easier to conduct maintenance (implies -nobackfill).
- -ro read-only is more restrictive than -wa by denying all writes to the database.
- -slave allows read-only access to a live database by additional instances of Construct. Only one instance of Construct may have write access to a database at a time; additional instances use this option.
- -console convenience to immediately drop to the administrator console after startup.
Recovering from broken configurations
If your server ever fails to start because of an errant conf item, you can override any item using an environment variable before starting the program. To do this, replace the '.' characters with '_' in the name of the item when setting it in the environment. The name is otherwise unchanged and stays in lower case.
Otherwise, the program can be run with the option -defaults. This will
prevent initial loading of the configuration from the database. It will
not prevent environment variable overrides (as mentioned above). Values
will not be written back to the database unless they are explicitly set by
the user in the console.
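As a minimal sketch of the renaming rule described in this section, the shell function below derives the environment variable name from a conf item name. The function name `conf_env_name`, the binary name `construct`, and the value 65535 are assumptions for illustration; the item `ircd.client.max_client_per_peer` is borrowed from the reverse-proxy section of this page.

```shell
# Derive the environment variable name for a conf item by replacing
# every '.' with '_'. Case is left untouched.
conf_env_name() {
    printf '%s\n' "$1" | tr '.' '_'
}

# Print (not run) a launch line that overrides one item at startup.
# `construct` as the binary name and 65535 as the value are assumptions.
name=$(conf_env_name ircd.client.max_client_per_peer)
echo "${name}=65535 construct -single"
```

The launch line is only printed so it can be reviewed first; add whatever other arguments your installation normally passes to the daemon.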
Recovering from database corruption
In very rare cases after a hard crash the journal cannot completely restore the data written before the crash. Due to the design of RocksDB and the way we apply it for Matrix, data is lost in chronological order starting from the most recent transaction (matrix event). The database is consistent for all events up until the first corrupt event, called the point-in-time.
When any loss has occurred the daemon will fail to start normally. To enable
point-in-time recovery use the command-line option -recoverdb point at the next
invocation. Some recent events may be lost. If -recoverdb point does not work,
other techniques may be invoked as detailed below.
In some cases the daemon will start normally without the need for any recovery mode
but later encounter hard corruption. Only the -recoverdb repair mode is effective
against this.
❗ It is advised that any recovery is performed in -single mode. Additional program options such as -safe or -ro may be useful for some salvage techniques.
❗ After employing a salvage mode one should strongly consider an events dump and rebuild.
-recoverdb <option>
- 🟢 point - Recovery mode; rewinds the database to the last consistent state before corruption.
- 🔴 skip - Salvage mode; drops recent corrupt data, which will leave the database in an inconsistent state.
- 🔴 tolerate - Salvage mode; expert use only.
- 🔴 repair - Salvage mode; finds and drops deep corruption. This will leave the database in an inconsistent state.
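The advice in this section can be sketched as a small shell helper that builds a recovery command line: it always includes -single and accepts only the four -recoverdb options listed. The helper name `recover_cmd` and the binary name `construct` are assumptions, and any other arguments your installation normally passes are omitted.

```shell
# Build (print, not run) a recovery command line for one -recoverdb option.
# Recovery is advised in -single mode, so that flag is always included.
# `construct` as the binary name is an assumption.
recover_cmd() {
    case "$1" in
        point|skip|tolerate|repair)
            echo "construct -single -recoverdb $1"
            ;;
        *)
            echo "unknown -recoverdb option: $1" >&2
            return 1
            ;;
    esac
}

# Start with point-in-time recovery; the salvage modes are a later resort.
recover_cmd point
```

Printing the command instead of executing it leaves the decision to run a salvage mode with the administrator.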
Trouble with reverse proxies and middlewares
Construct is designed to be capable internet service software and should perform best when directly interfacing with remote parties. Nevertheless, some users wish to employ middlewares known as "reverse-proxies" through which all communication is forwarded. This gives the appearance, from the server's perspective, that all clients are connecting from the same IP address on different ports.
At this time there are some known issues with reverse proxies, which administrators may be able to mitigate by reviewing the following:
- Construct now supports plaintext listener sockets and this point can be ignored. If the proxy generates ACME certificates you can use those same certificates to encrypt the link to Construct. The proxy will have to be configured to forward SNI, for example with Caddy:
    https://construct.chat:8448
    reverse_proxy https://localhost:1234 {
        transport http {
            tls_server_name construct.chat
        }
    }
- If the proxy does not run on localhost, the connection limit from a single remote IP address must be raised from its default, for example by entering the following in !control or console:
conf set ircd.client.max_client_per_peer 65535
- Avoid rewriting the Host: header which is sent to Construct. The header should appear as sent by remote clients. This is no longer a hard requirement with recent versions of the Matrix protocol and this is likely not the source of your trouble.
- Ensure the reverse-proxy is not setting Connection: close when communicating to Construct. The ideal middleware is configured to maintain a pool of persistent connections and pipeline requests. As a hint based on Construct's default settings at the time of this writing, the optimal connection count from the middleware is 64, and up to 128.
Timeouts while resolving domain names
Due to the abnormal loads of Matrix, Construct implements custom DNS resolution directly over UDP. Construct does not use 127.0.0.1 or any locally provided DNS servers by default, after nearly all users reported issues which required them to reconfigure or upgrade their service. To ship the least-broken solution by default, Construct is pre-configured with an array of public servers to query in a load-balanced round-robin. To view the DNS configuration in its entirety use the command: conf ircd.net.dns
- Reduce the rate-limits to slow down queries made to the servers. This can be done with conf ircd.net.dns.resolver.send_rate which is a millisecond value to wait between requests; higher is slower: conf set ircd.net.dns.resolver.send_rate 300
- The conf ircd.net.dns.resolver.send_burst can be tweaked in conjunction with the conf ircd.net.dns.resolver.send_rate to more effectively shape the load as tolerated by the remote server's rate-limiting scheme. The burst is important to keep requests in flight to utilize the array while minimizing local delays for a lot of resolutions. If necessary, try setting a lower value, or 1 to never exceed the send_rate with any burst.
- Add or replace the default configured array of servers. The configuration at conf ircd.net.dns.resolver.servers is a string of IP addresses separated by spaces. It is better to add more servers than to replace the existing, but it is worse to add a server which is configured very differently from the others. The default servers were chosen because they have reasonably high rates and are consistent among themselves; they may not be the best choice for all users, especially in Europe and Asia.
👉 Administrators are tempted to simply replace the array with 127.0.0.1 to use their own high-performance service: this is okay, but the ircd.net.dns.resolver.send_rate may need to be configured significantly faster (lower) than the default if only one server is configured rather than the default six.
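To adjust the rate and burst together, the console commands can be generated by a small shell helper such as the sketch below. The helper name `dns_shape_cmds` is an assumption; the `conf set` syntax, the item names, and the example values (300 ms and a burst of 1) are taken from this section. The output is meant to be entered in !control or the console.

```shell
# Print the console commands that set the resolver's send rate
# (milliseconds between requests; higher is slower) and its send burst.
dns_shape_cmds() {
    rate_ms=$1
    burst=$2
    echo "conf set ircd.net.dns.resolver.send_rate ${rate_ms}"
    echo "conf set ircd.net.dns.resolver.send_burst ${burst}"
}

# Example values only: wait 300 ms between requests and use a burst of 1
# so the send_rate is never exceeded.
dns_shape_cmds 300 1
```

Suitable values depend on the rate-limiting of the servers you query, so treat these numbers as a starting point rather than a recommendation.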