linux/block
Tejun Heo 06b285bd11 blkcg: fix blkcg_policy_data allocation bug
e48453c386 ("block, cgroup: implement policy-specific per-blkcg
data") updated per-blkcg policy data to be dynamically allocated.
When a policy is registered, its policy data aren't created.  Instead,
when the policy is activated on a queue, the policy data are allocated
if there are blkg's (blkcg_gq's) which are attached to a given blkcg.
This is buggy.  Consider the following scenario.

1. A blkcg is created.  No blkg's attached yet.

2. The policy is registered.  No policy data is allocated.

3. The policy is activated on a queue.  As the above blkcg doesn't
   have any blkg's, it won't allocate the matching blkcg_policy_data.

4. An IO is issued from the blkcg and blkg is created and the blkcg
   still doesn't have the matching policy data allocated.

With cfq-iosched, this leads to an oops.

It also doesn't free policy data on policy unregistration assuming
that freeing of all policy data on blkcg destruction should take care
of it; however, this also is incorrect.

1. A blkcg has policy data.

2. The policy gets unregistered but the policy data remains.

3. Another policy gets registered on the same slot.

4. Later, the new policy tries to allocate policy data on the previous
   blkcg but the slot is already occupied and gets skipped.  The
   policy ends up operating on the policy data of the previous policy.

There's no reason to manage blkcg_policy_data lazily.  The reason we
do lazy allocation of blkg's is that the number of all possible blkg's
is the product of cgroups and block devices which can reach a
surprising level.  blkcg_policy_data is contrained by the number of
cgroups and shouldn't be a problem.

This patch makes blkcg_policy_data to be allocated for all existing
blkcg's on policy registration and freed on unregistration and removes
blkcg_policy_data handling from policy [de]activation paths.  This
makes that blkcg_policy_data are created and removed with the policy
they belong to and fixes the above described problems.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: e48453c386 ("block, cgroup: implement policy-specific per-blkcg data")
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-07-09 14:41:11 -06:00
..
partitions Merge branch 'for-3.20/core' of git://git.kernel.dk/linux-block 2015-02-12 14:13:23 -08:00
bio-integrity.c bio integrity: do not assume bio_integrity_pool exists if bioset exists 2015-07-07 07:46:47 -06:00
bio.c blkcg: implement bio_associate_blkcg() 2015-06-02 08:33:34 -06:00
blk-cgroup.c blkcg: fix blkcg_policy_data allocation bug 2015-07-09 14:41:11 -06:00
blk-core.c block: use FIELD_SIZEOF to calculate size of a field 2015-07-07 07:47:37 -06:00
blk-exec.c block: move PM request support to IDE 2015-05-05 13:40:42 -06:00
blk-flush.c blk-mq: support per-distpatch_queue flush machinery 2014-09-25 15:22:45 -06:00
blk-integrity.c writeback: separate out include/linux/backing-dev-defs.h 2015-06-02 08:33:34 -06:00
blk-ioc.c block: Substitute rcu_access_pointer() for rcu_dereference_raw() 2014-02-18 12:21:26 -08:00
blk-iopoll.c Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next 2014-06-03 12:57:53 -07:00
blk-lib.c block: Quiesce zeroout wrapper 2015-02-05 10:14:54 -07:00
blk-map.c blk_rq_map_user(): use import_single_range() 2015-04-11 22:27:13 -04:00
blk-merge.c block: only honor SG gap prevention for merges that contain data 2015-05-29 13:10:23 -06:00
blk-mq-cpu.c blk-mq: add file comments and update copyright notices 2014-05-28 10:15:41 -06:00
blk-mq-cpumap.c sched/topology: Rename topology_thread_cpumask() to topology_sibling_cpumask() 2015-05-27 15:22:15 +02:00
blk-mq-sysfs.c blk-mq: add blk_mq_init_allocated_queue and export blk_mq_register_disk 2015-03-13 08:26:53 -06:00
blk-mq-tag.c blk-mq: Shared tag enhancements 2015-06-01 14:35:56 -06:00
blk-mq-tag.h blk-mq: Shared tag enhancements 2015-06-01 14:35:56 -06:00
blk-mq.c Merge branch 'for-4.2/core' of git://git.kernel.dk/linux-block 2015-06-25 14:29:53 -07:00
blk-mq.h blk-mq: release mq's kobjects in blk_release_queue() 2015-01-29 08:30:51 -08:00
blk-settings.c block: fix blk_stack_limits() regression due to lcm() change 2015-03-31 09:45:50 -06:00
blk-softirq.c block: fix regression with block enabled tagging 2014-04-09 21:54:06 -06:00
blk-sysfs.c Merge branch 'for-4.2/writeback' of git://git.kernel.dk/linux-block 2015-06-25 16:00:17 -07:00
blk-tag.c block: support different tag allocation policy 2015-01-23 14:15:46 -07:00
blk-throttle.c blkcg: move block/blk-cgroup.h to include/linux/blk-cgroup.h 2015-06-02 08:33:33 -06:00
blk-timeout.c blk-mq: Allow requests to never expire 2015-01-08 08:59:01 -07:00
blk.h blk-mq: make plug work for mutiple disks and queues 2015-05-08 14:17:23 -06:00
bounce.c Merge branch 'for-4.2/writeback' of git://git.kernel.dk/linux-block 2015-06-25 16:00:17 -07:00
bsg-lib.c
bsg.c block: Simplify bsg complete all 2015-02-04 09:57:52 -07:00
cfq-iosched.c Merge branch 'for-4.2/writeback' of git://git.kernel.dk/linux-block 2015-06-25 16:00:17 -07:00
cmdline-parser.c block: remove unrelated header files and export symbol 2014-01-21 20:18:26 -08:00
compat_ioctl.c block, bdi: an active gendisk always has a request_queue associated with it 2014-09-08 10:00:35 -06:00
deadline-iosched.c block: Stop abusing csd.list for fifo_time 2014-02-24 14:46:32 -08:00
elevator.c Merge branch 'for-4.2/writeback' of git://git.kernel.dk/linux-block 2015-06-25 16:00:17 -07:00
genhd.c Merge branch 'for-4.2/writeback' of git://git.kernel.dk/linux-block 2015-06-25 16:00:17 -07:00
ioctl.c block: replace trylock with mutex_lock in blkdev_reread_part() 2015-05-20 09:05:45 -06:00
ioprio.c block: Fix computation of merged request priority 2014-10-31 08:30:43 -06:00
Kconfig block: Add T10 Protection Information functions 2014-09-27 09:14:59 -06:00
Kconfig.iosched
Makefile block: Add T10 Protection Information functions 2014-09-27 09:14:59 -06:00
noop-iosched.c elevator: Fix a race in elevator switching 2013-07-03 13:25:24 +02:00
partition-generic.c block: Fix dev_t minor allocation lifetime 2014-09-03 15:01:02 -06:00
scsi_ioctl.c block: fix bogus EFAULT error from SG_IO ioctl 2015-06-27 11:43:34 -06:00
t10-pi.c block: Add T10 Protection Information functions 2014-09-27 09:14:59 -06:00