linux/net
Tejun Heo bd1060a1d6 sock, cgroup: add sock->sk_cgroup
In cgroup v1, dealing with cgroup membership was difficult because the
number of membership associations was unbound.  As a result, cgroup v1
grew several controllers whose primary purpose is either tagging
membership or pull in configuration knobs from other subsystems so
that cgroup membership test can be avoided.

net_cls and net_prio controllers are examples of the latter.  They
allow configuring network-specific attributes from cgroup side so that
network subsystem can avoid testing cgroup membership; unfortunately,
these are not only cumbersome but also problematic.

Both net_cls and net_prio aren't properly hierarchical.  Both inherit
configuration from the parent on creation but there's no interaction
afterwards.  An ancestor doesn't restrict the behavior in its subtree
in anyway and configuration changes aren't propagated downwards.
Especially when combined with cgroup delegation, this is problematic
because delegatees can mess up whatever network configuration
implemented at the system level.  net_prio would allow the delegatees
to set whatever priority value regardless of CAP_NET_ADMIN and net_cls
the same for classid.

While it is possible to solve these issues from controller side by
implementing hierarchical allowable ranges in both controllers, it
would involve quite a bit of complexity in the controllers and further
obfuscate network configuration as it becomes even more difficult to
tell what's actually being configured looking from the network side.
While not much can be done for v1 at this point, as membership
handling is sane on cgroup v2, it'd be better to make cgroup matching
behave like other network matches and classifiers than introducing
further complications.

In preparation, this patch updates sock->sk_cgrp_data handling so that
it points to the v2 cgroup that sock was created in until either
net_prio or net_cls is used.  Once either of the two is used,
sock->sk_cgrp_data reverts to its previous role of carrying prioidx
and classid.  This is to avoid adding yet another cgroup related field
to struct sock.

As the mode switching can happen at most once per boot, the switching
mechanism is aimed at lowering hot path overhead.  It may leak a
finite, likely small, number of cgroup refs and report spurious
prioidx or classid on switching; however, dynamic updates of prioidx
and classid have always been racy and lossy - socks between creation
and fd installation are never updated, config changes don't update
existing sockets at all, and prioidx may index with dead and recycled
cgroup IDs.  Non-critical inaccuracies from small race windows won't
make any noticeable difference.

This patch doesn't make use of the pointer yet.  The following patch
will implement netfilter match for cgroup2 membership.

v2: Use sock_cgroup_data to avoid inflating struct sock w/ another
    cgroup specific field.

v3: Add comments explaining why sock_data_prioidx() and
    sock_data_classid() use different fallback values.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Daniel Wagner <daniel.wagner@bmw-carit.de>
CC: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-08 22:02:33 -05:00
..
6lowpan 6lowpan: put mcast compression in an own function 2015-10-21 00:49:25 +02:00
9p IB/cma: Add support for network namespaces 2015-10-28 12:32:48 -04:00
802
8021q vlan: Do not put vlan headers back on bridge and macvlan ports 2015-11-17 14:38:35 -05:00
appletalk
atm net: Generalise wq_has_sleeper helper 2015-11-30 14:47:33 -05:00
ax25
batman-adv batman-adv: Act on NETDEV_*_TYPE_CHANGE events 2015-12-05 17:41:42 -05:00
bluetooth Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-12-03 21:09:12 -05:00
bridge net: add possibility to pass information about upper device via notifier 2015-12-03 11:49:25 -05:00
caif net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA 2015-12-01 15:45:05 -05:00
can can: avoid using timeval for uapi 2015-10-13 17:42:34 +02:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2015-11-13 09:24:40 -08:00
core sock, cgroup: add sock->sk_cgroup 2015-12-08 22:02:33 -05:00
dcb net/dcb: make dcbnl.c explicitly non-modular 2015-10-09 07:52:27 -07:00
dccp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-12-03 21:09:12 -05:00
decnet net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA 2015-12-01 15:45:05 -05:00
dns_resolver net: dns_resolver: convert time_t to time64_t 2015-11-18 16:27:46 -05:00
dsa net: dsa: move dsa slave destroy code to slave.c 2015-12-07 16:35:50 -05:00
ethernet
hsr net/hsr: fix a warning message 2015-11-23 14:56:15 -05:00
ieee802154 net: fix percpu memory leaks 2015-11-02 22:47:14 -05:00
ipv4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-12-03 21:09:12 -05:00
ipv6 ipv6: Only act upon NETDEV_*_TYPE_CHANGE if we have ipv6 addresses 2015-12-05 17:41:42 -05:00
ipx
irda TTY/Serial driver patches for 4.4-rc1 2015-11-04 21:35:12 -08:00
iucv Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-12-03 21:09:12 -05:00
key af_key: fix two typos 2015-10-23 03:05:19 -07:00
l2tp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-12-03 21:09:12 -05:00
l3mdev net: Add netif_is_l3_slave 2015-10-07 04:27:43 -07:00
lapb
llc
mac80211 mac80211: handle HW ROC expired properly 2015-12-07 11:06:37 +01:00
mac802154 mac802154: Delete an unnecessary check before the function call "kfree_skb" 2015-11-19 17:50:32 +01:00
mpls mpls: support for dead routes 2015-12-03 15:03:27 -05:00
netfilter net: wrap sock->sk_cgrp_prioidx and ->sk_classid inside a struct 2015-12-08 22:02:33 -05:00
netlabel
netlink mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
netrom
nfc net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA 2015-12-01 15:45:05 -05:00
openvswitch Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-12-03 21:09:12 -05:00
packet packet: Allow packets with only a header (but no payload) 2015-11-29 22:17:17 -05:00
phonet
rds RDS: fix race condition when sending a message on unbound socket 2015-11-24 17:20:09 -05:00
rfkill
rose
rxrpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-12-03 21:09:12 -05:00
sched net_sched: fix qdisc_tree_decrease_qlen() races 2015-12-03 14:59:05 -05:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-12-03 21:09:12 -05:00
sunrpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-12-03 16:02:46 -08:00
switchdev switchdev: respect SKIP_EOPNOTSUPP flag in case there is no recursion 2015-11-03 13:39:21 -05:00
tipc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-12-03 21:09:12 -05:00
unix af_unix: fix unix_dgram_recvmsg entry locking 2015-12-06 23:31:54 -05:00
vmw_vsock Revert "Merge branch 'vsock-virtio'" 2015-12-08 21:55:49 -05:00
wimax
wireless cfg80211: reg: Refactor calculation of bandwidth flags 2015-12-04 14:43:32 +01:00
x25
xfrm Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2015-10-30 20:51:56 +09:00
compat.c
Kconfig net: wrap sock->sk_cgrp_prioidx and ->sk_classid inside a struct 2015-12-08 22:02:33 -05:00
Makefile net: Introduce L3 Master device abstraction 2015-09-29 20:40:32 -07:00
socket.c net: fix sock_wake_async() rcu protection 2015-12-01 15:45:05 -05:00
sysctl_net.c net: sysctl: fix a kmemleak warning 2015-10-23 06:22:08 -07:00