This should NOT be backported to 20.09!
When 21.03 is released, the DB changes are about a year old and
operators had two release cycles for the upgrade. At this point it
should be fair to remove the compat layer to reduce the complexity of
the module itself.
In the current implementation, there's no possibility to modify the default
parameter for keepalive. This is a number that indicates how frequently
keepalive messages should be sent from the worker to the buildmaster,
expressed in seconds. The default (600) causes a message to be sent to
the buildmaster at least once every 10 minutes.
If the worker is behind a NAT box or stateful firewall, these messages
may help to keep the connection alive: some NAT boxes tend to forget about
a connection if it has not been used in a while. When this happens, the
buildmaster will think that the worker has disappeared, and builds will
time out. Meanwhile the worker will not realize than anything is wrong.
Since Buildbot 0.9.0, status targets were deprecated and ignored.
There's a very small line on startup explaining that, and status simply
isn't reported. Avoid others the same headaches, and do it right in the
NixOS module.
As there might have been changes in the way reporters are organized, and
configuration might need to be migrated remove the old option, and not
just provide an alias.
It's pbPort, and it's also a connection string, meaning
listen-on-localhost is also possible. Provide an alias for the old
option name, so old configs still work.
We already set the relevant env vars in the systemd services. That does
not help one when executing any of the executables outside a service,
e.g. when creating a new user.
Also removed `pkgs.hydra-flakes` since flake-support has been merged
into master[1]. Because of that, `pkgs.hydra-unstable` is now compiled
against `pkgs.nixFlakes` and currently requires a patch since Hydra's
master doesn't compile[2] atm.
[1] https://github.com/NixOS/hydra/pull/730
[2] https://github.com/NixOS/hydra/pull/732
Upgrades Hydra to the latest master/flake branch. To perform this
upgrade, it's needed to do a non-trivial db-migration which provides a
massive performance-improvement[1].
The basic ideas behind multi-step upgrades of services between NixOS versions
have been gathered already[2]. For further context it's recommended to
read this first.
Basically, the following steps are needed:
* Upgrade to a non-breaking version of Hydra with the db-changes
(columns are still nullable here). If `system.stateVersion` is set to
something older than 20.03, the package will be selected
automatically, otherwise `pkgs.hydra-migration` needs to be used.
* Run `hydra-backfill-ids` on the server.
* Deploy either `pkgs.hydra-unstable` (for Hydra master) or
`pkgs.hydra-flakes` (for flakes-support) to activate the optimization.
The steps are also documented in the release-notes and in the module
using `warnings`.
`pkgs.hydra` has been removed as latest Hydra doesn't compile with
`pkgs.nixStable` and to ensure a graceful migration using the newly
introduced packages.
To verify the approach, a simple vm-test has been added which verifies
the migration steps.
[1] https://github.com/NixOS/hydra/pull/711
[2] https://github.com/NixOS/nixpkgs/pull/82353#issuecomment-598269471
* nixos/buildkite: drop user option
This reverts 8c6b1c3eaa.
Turns out, buildkite-agent has logic to write .ssh/known_hosts files and
only really works when $HOME and the user homedir are in sync.
On top of that, we provision ssh keys in /var/lib/buildkite-agent, which
doesn't work if that other users' homedir points elsewhere (we can cheat
by setting $HOME, but then getent and $HOME provide conflicting
results).
So after all, it's better to only run the system-wide buildkite agent as
the "buildkite-agent" user only - if one wants to run buildkite as
different users, systemd user services might be a better fit.
* nixosTests.buildkite-agent: add node with separate user and no ssh key
This gets passed to BUILDKITE_SHELL, which will specify the shell being
used to executes script in.
Defaults to `${pkgs.bash}/bin/bash -e -c`, matching how buildkite
behaves on other distros.
SSH public keys aren't needed to clone private repos, and if we only
need to configure a single attribute, there's no need for the "openssh"
attrset anymore.
This applies [hydra PR #432](https://github.com/NixOS/hydra/pull/432)
to the NixOS module in nixpkgs:
```
commit 4efd078977e5ea20e1104783efc324cba11690bc
Author: Bas van Dijk <v.dijk.bas@gmail.com>
Date: Sun Dec 11 15:35:38 2016 +0100
Only set buildMachinesFiles when nix.buildMachines is defined
```
The following commit from 2016 in hydra removed the `--option
build-use-substitutes` from the hydra-queue-runner service:
```
commit ee2e9f5335c8c0288c102975b506f6b275793cfe
Author: Eelco Dolstra <edolstra@gmail.com>
Date: Fri Oct 7 20:23:05 2016 +0200
Update to reflect BinaryCacheStore changes
BinaryCacheStore no longer implements buildPaths() and ensurePath(),
so we need to use copyPath() / copyClosure().
```
It would be better if the hydra module in NixOS matches the upstream
module.
During the last update, `hydra-notify` was rewritten as a daemon which
listens to postgresql notifications for each build[1]. The module
uses the `hydra-notify.service` unit from upstream's Hydra module and
the VM test ensures that email notifications are sent properly.
Also updated `hydra-init.service` to install `pg_trgm` on a local
database if needed[2].
[1] c7861b85c4
[2] 8a0a5ec3a3
Currently there are two calls to curl in the reloadScript, neither which
check for errors. If something is misconfigured (like wrong authToken),
the only trace that something wrong happened is this log message:
Asking Jenkins to reload config
<h1>Bad Message 400</h1><pre>reason: Illegal character VCHAR='<'</pre>
The service isn't marked as failed, so it's easy to miss.
Fix it by passing --fail to curl.
While at it:
* Add $curl_opts and $jenkins_url variables to keep the curl command
lines DRY.
* Add --show-error to curl to show short error message explanation when
things go wrong (like HTTP 401 error).
* Lower-case the $CRUMB variable as upper case is for exported environment
variables.
The new behaviour, when having wrong accessToken:
Asking Jenkins to reload config
curl: (22) The requested URL returned error: 401
And the service is clearly marked as failed in `systemctl --failed`.