Noted by @Infinisil on IRC:
infinisil: Question regarding the confinement PR
infinisil: On line 136 you do different things depending on
RootDirectoryStartOnly
infinisil: But on line 157 you have an assertion that disallows that
option being true
infinisil: Is there a reason behind this or am I missing something
I originally left this in so that once systemd supports that, we can
just flip a switch and remove the assertion and thus support
RootDirectoryStartOnly for our confinement module.
However, this doesn't seem to be on the roadmap for systemd in the
foreseeable future, so I'll just remove this, especially because it's
very easy to add it again, once it is supported.
Signed-off-by: aszlig <aszlig@nix.build>
My implementation was relying on PrivateDevices, PrivateTmp,
PrivateUsers and others to be false by default if chroot-only mode is
used.
However there is an ongoing effort[1] to change these defaults, which
then will actually increase the attack surface in chroot-only mode,
because it is expected that there is no /dev, /sys or /proc.
If for example PrivateDevices is enabled by default, there suddenly will
be a mounted /dev in the chroot and we wouldn't detect it.
Fortunately, our tests cover that, but I'm preparing for this anyway so
that we have a smoother transition without the need to fix our
implementation again.
Thanks to @Infinisil for the heads-up.
[1]: https://github.com/NixOS/nixpkgs/issues/14645
Signed-off-by: aszlig <aszlig@nix.build>
From @edolstra at [1]:
BTW we probably should take the closure of the whole unit rather than
just the exec commands, to handle things like Environment variables.
With this commit, there is now a "fullUnit" option, which can be enabled
to include the full closure of the service unit into the chroot.
However, I did not enable this by default, because I disagree here:
*especially* things like environment variables or environment files
shouldn't be in the closure of the chroot.
For example if you have something like:
  { pkgs, ... }:
  {
    systemd.services.foobar = {
      serviceConfig.EnvironmentFile = pkgs.writeText "secrets" ''
        user=admin
        password=abcdefg
      '';
    };
  }
We really do not want the *file* to end up in the chroot, but rather
just the environment variables to be exported.
Another thing is that this makes it less predictable what actually will
end up in the chroot, because we have a "globalEnvironment" option that
will get merged in as well, so users adding stuff to that option will
also make it available in confined units.
I also added a big fat warning about that in the description of the
fullUnit option.
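As a minimal sketch, opting a single service into the full unit closure
could look like the following. The service name "foobar" and pkgs.hello
are placeholders for illustration only:

```nix
{ pkgs, ... }:

{
  systemd.services.foobar = {
    serviceConfig.ExecStart = "${pkgs.hello}/bin/hello";
    confinement.enable = true;
    # Off by default: includes the closure of the whole unit (and thus
    # anything referenced via Environment*) in the chroot.
    confinement.fullUnit = true;
  };
}
```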
[1]: https://github.com/NixOS/nixpkgs/pull/57519#issuecomment-472855704
Signed-off-by: aszlig <aszlig@nix.build>
Another thing requested by @edolstra in [1]:
We should not provide a different /bin/sh in the chroot, that's just
asking for confusion and random shell script breakage. It should be
the same shell (i.e. bash) as in a regular environment.
While I personally would even go as far as to have a very restricted
shell that is not even a shell and basically *only* allows "/bin/sh -c"
with only *very* minimal parsing of shell syntax, I do agree that people
expect /bin/sh to be bash (or the one configured by environment.binsh)
on NixOS.
So this should make both others and me happy in that I could just use
confinement.binSh = "${pkgs.dash}/bin/dash" for the services I confine.
[1]: https://github.com/NixOS/nixpkgs/pull/57519#issuecomment-472855704
Signed-off-by: aszlig <aszlig@nix.build>
Quoting @edolstra from [1]:
I don't really like the name "chroot", something like "confine[ment]"
or "restrict" seems better. Conceptually we're not providing a
completely different filesystem tree but a restricted view of the same
tree.
I already used "confinement" as a sub-option and I do agree that
"chroot" sounds a bit too specific (especially because not *only* chroot
is involved).
So this changes the module name and its option to use "confinement"
instead of "chroot" and also renames the "chroot.confinement" option to
"confinement.mode".
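A short sketch of the renamed options in use, assuming a placeholder
service called "foobar" and pkgs.hello as its payload. The mode names
are the ones used by this module ("full-apivfs" and "chroot-only"):

```nix
{ pkgs, ... }:

{
  systemd.services.foobar = {
    serviceConfig.ExecStart = "${pkgs.hello}/bin/hello";
    # Formerly chroot.enable
    confinement.enable = true;
    # Formerly chroot.confinement; the default is "full-apivfs"
    confinement.mode = "chroot-only";
  };
}
```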
[1]: https://github.com/NixOS/nixpkgs/pull/57519#issuecomment-472855704
Signed-off-by: aszlig <aszlig@nix.build>
Currently, if you want to properly chroot a systemd service, you could
do it using BindReadOnlyPaths=/nix/store (which is not what I'd call
"properly", because the whole store is still accessible) or use a
separate derivation that gathers the runtime closure of the service you
want to chroot. The former is the easier method and there is also a
method directly offered by systemd, called ProtectSystem, which still
leaves the whole store accessible. The latter however is a bit more
involved, because you need to bind-mount each store path of the runtime
closure of the service you want to chroot.
This can be achieved using pkgs.closureInfo and a small derivation that
packs everything into a systemd unit, which later can be added to
systemd.packages. That's also what I did several times[1][2] in the
past.
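Roughly, that manual approach can be sketched as follows. This is not
the code from [1] or [2]; the unit name, the use of pkgs.hello and the
choice of RootDirectory/TemporaryFileSystem are assumptions made for
illustration:

```nix
{ pkgs, ... }:

let
  # Placeholder for the service you actually want to chroot.
  servicePkg = pkgs.hello;

  # closureInfo lists the runtime closure, one store path per line, in
  # the file "store-paths" of its output.
  closure = pkgs.closureInfo { rootPaths = [ servicePkg ]; };

  # Small derivation that packs everything into a systemd unit.
  unit = pkgs.runCommand "hello-chrooted-unit" {} ''
    mkdir -p "$out/etc/systemd/system"
    {
      echo "[Service]"
      echo "ExecStart=${servicePkg}/bin/hello"
      # Assumption: an empty root with a tmpfs for the mount points.
      echo "RootDirectory=/var/empty"
      echo "TemporaryFileSystem=/"
      # One read-only bind mount per store path of the closure.
      while read -r path; do
        echo "BindReadOnlyPaths=$path"
      done < ${closure}/store-paths
    } > "$out/etc/systemd/system/hello-chrooted.service"
  '';
in {
  systemd.packages = [ unit ];
}
```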
However, this process got a bit tedious and I figured that a generic
solution would be useful for NixOS, so this very implementation was born.
Now if you want to chroot a systemd service, all you need to do is:
  { pkgs, ... }:
  {
    systemd.services.yourservice = {
      description = "My Shiny Service";
      wantedBy = [ "multi-user.target" ];
      chroot.enable = true;
      serviceConfig.ExecStart = "${pkgs.myservice}/bin/myservice";
    };
  }
If more than the dependencies for the ExecStart* and ExecStop* (which
btw. also includes "script" and {pre,post}Start) need to be in the
chroot, it can be specified using the chroot.packages option. By
default (which uses the "full-apivfs"[3] confinement mode), a user
namespace is set up as well and /proc, /sys and /dev are mounted
appropriately.
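For instance, pulling an additional package into the chroot might look
like this sketch, using the option names as of this commit.
pkgs.myservice is the same placeholder as above and pkgs.coreutils is
just an example of an extra dependency:

```nix
{ pkgs, ... }:

{
  systemd.services.yourservice = {
    chroot.enable = true;
    # Store paths needed in addition to the ExecStart*/ExecStop* closures
    chroot.packages = [ pkgs.coreutils ];
    serviceConfig.ExecStart = "${pkgs.myservice}/bin/myservice";
  };
}
```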
In addition - and by default - a /bin/sh executable is provided as well,
which is useful for most programs that use the system() C library call
to execute commands via shell. The shell providing /bin/sh is dash
instead of the default in NixOS (which is bash), because it's way more
lightweight and after all we're chrooting because we want to lower the
attack surface and it should be only used for "/bin/sh -c something".
Prior to submitting this here, I did a first implementation of this
outside[4] of nixpkgs, which duplicated the "pathSafeName" functionality
from systemd-lib.nix, just because it's only a single line.
However, I decided to just re-use the one from systemd here and
subsequently made it available when importing systemd-lib.nix, so that
the systemd-chroot implementation also benefits from fixes to that
functionality (which is now a proper function).
Unfortunately, we do have a few limitations as well. The first being
that DynamicUser doesn't work in conjunction with tmpfs, because it
already sets up a tmpfs in a different path and simply ignores the one
we define. We could probably solve this by detecting it and trying to
bind-mount our paths to that different path whenever DynamicUser is
enabled.
The second limitation/issue is that RootDirectoryStartOnly doesn't work
right now, because it only affects the RootDirectory option and not the
individual bind mounts or our tmpfs. It would be helpful if systemd
had a way to disable specific bind mounts as well or at least
some way to ignore failures for the bind mounts/tmpfs setup.
Another quirk we do have right now is that systemd tries to create a
/usr directory within the chroot, which subsequently fails. Fortunately,
this is just an ugly error and not a hard failure.
[1]: https://github.com/headcounter/shabitica/blob/3bb01728a0237ad5e7/default.nix#L43-L62
[2]: https://github.com/aszlig/avonc/blob/dedf29e092481a33dc/nextcloud.nix#L103-L124
[3]: The reason this is called "full-apivfs" instead of just "full" is
to make room for a *real* "full" confinement mode, which is even more
restrictive.
[4]: https://github.com/aszlig/avonc/blob/92a20bece4df54625e/systemd-chroot.nix
Signed-off-by: aszlig <aszlig@nix.build>
Most tests are fine (only 20 out of 173 fail, 10 because lsb_release is
not available or returns exit code 3 and the other 10 might also fail
due to the sandboxed build environment).
Manual tests show that distro works as intended in a normal environment.
See 9382d2ea76
Can cause issues where lightdm prefers the last saved session in .dmrc
even if it doesn't exist anymore, preventing people from logging in to
a session.