Re: [opensuse-factory] systemd
On 12/24/2010 12:05 AM, Mike Galbraith wrote:
FYI, this isn't limited to openSUSE factory. Peterz has a repeatable testcase now (kvm image), and is tracing through it. Systemd is triggering a strange use after free cgroups problem.
Yep, but we knew that already. I was able to reproduce it with a vanilla kernel with the desktop config. CONFIG_PREEMPT seemed to have caused the difference.
In about 12 hours, I should have a copy of the thing to play with. Hopefully, Peter will have it all figured out before that, as cgroup.c is hard to read.
Even better. Thanks for looking into this.

-Jeff
-Mike https://lkml.org/lkml/2010/6/29/22
On Thu, 2010-12-23 at 13:33 +0100, Peter Zijlstra wrote:
systemd-1 0d..1. 2070793us : sched_destroy_group: se: f69e43c0, load: 1024
systemd-1 0d..1. 2070794us : sched_destroy_group: cfs_rq: f69e4720, nr: 1, load: 1024
systemd-1 0d..1. 2070794us : __print_runqueue: cfs_rq: f69e4720, nr: 1, load: 1024
systemd-1 0d..1. 2070795us : __print_runqueue: curr: (null)
systemd-1 0d..1. 2070796us : __print_runqueue: se: f6a8eb4c, comm: systemd-tmpfile/1243, load: 1024
systemd-1 0d..1. 2070796us : _raw_spin_unlock_irqrestore <-sched_destroy_group
So somehow it manages to destroy a group with a task attached.
It's even weirder:
systemd-1 0d..1. 1663489us : sched_destroy_group: se: f69e7360, load: 1024
systemd-1 0d..1. 1663489us : sched_destroy_group: cfs_rq: f69e72a0, nr: 1, load: 1024
systemd-1 0d..1. 1663491us : __print_runqueue: cfs_rq: f69e72a0, nr: 1, load: 1024, cgroup: /system/systemd-sysctl.service
systemd-1 0d..1. 1663491us : __print_runqueue: curr: (null)
systemd-1 0d..1. 1663493us : __print_runqueue: se: f69d95bc, comm: systemd-sysctl/1209, load: 1024, cgroup: /
systemd-1 0d..1. 1663496us : do_invalid_op <-error_code
The task enqueued on the cfs_rq doesn't match the cgroup. The thing is, I don't see a cpu_cgroup_attach/sched_move_task call in the log, and a BUG_ON() in account_entity_enqueue() that validates the task's cgroup against the cfs_rq's cgroup doesn't trigger either.
So it looks like a task changes cgroup without passing through the cgroup_subsys::attach method, which afaict isn't supposed to happen.
-- 
Jeff Mahoney
SUSE Labs
-- 
To unsubscribe, e-mail: opensuse-factory+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse-factory+help@opensuse.org
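Peter's observation in the quoted trace (an enqueue-time check that never fires, yet a run queue holding a task from another cgroup) can be pictured with a small user-space model. This is only a sketch under stated assumptions: `struct group`, `struct task`, `struct runqueue`, `rq_enqueue()`, `rq_consistent()` and `mismatch_after_silent_regroup()` are all hypothetical names invented for illustration, not kernel types or functions, and the model does not claim to reproduce the actual scheduler code path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for illustration; not kernel types. */
struct group {
    const char *path;
};

struct task {
    const char *comm;
    const struct group *grp;    /* group the task claims to belong to */
};

#define RQ_MAX 8

struct runqueue {
    const struct group *owner;  /* group this queue belongs to */
    const struct task *queued[RQ_MAX];
    size_t nr;
};

/* Enqueue with an enqueue-time check: refuse a task whose group does
 * not match the queue's owner (or when the queue is full). */
static bool rq_enqueue(struct runqueue *rq, const struct task *t)
{
    if (t->grp != rq->owner || rq->nr == RQ_MAX)
        return false;
    rq->queued[rq->nr++] = t;
    return true;
}

/* Later audit: does every queued task still belong to the owner? */
static bool rq_consistent(const struct runqueue *rq)
{
    for (size_t i = 0; i < rq->nr; i++)
        if (rq->queued[i]->grp != rq->owner)
            return false;
    return true;
}

/* The enqueue check passes, the group pointer is then rewritten behind
 * the queue's back, and only a later audit notices the mismatch.
 * Returns true when that mismatch is detected. */
static bool mismatch_after_silent_regroup(void)
{
    struct group root = { "/" };
    struct group svc = { "/system/systemd-sysctl.service" };
    struct task t = { "systemd-sysctl", &svc };
    struct runqueue rq = { .owner = &svc };

    bool enqueued = rq_enqueue(&rq, &t);   /* check passes here */
    t.grp = &root;                         /* regroup without requeue */
    return enqueued && !rq_consistent(&rq);
}
```

Calling `mismatch_after_silent_regroup()` from a small `main()` shows the point of the model: a check placed only at enqueue time cannot catch a membership change that happens afterwards without going through the queue.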
On Fri, 2010-12-24 at 13:41 -0500, Jeff Mahoney wrote:
On 12/24/2010 12:05 AM, Mike Galbraith wrote:
FYI, this isn't limited to openSUSE factory. Peterz has a repeatable testcase now (kvm image), and is tracing through it. Systemd is triggering a strange use after free cgroups problem.
Yep, but we knew that already. I was able to reproduce it with a vanilla kernel with the desktop config. CONFIG_PREEMPT seemed to have caused the difference.
In about 12 hours, I should have a copy of the thing to play with. Hopefully, Peter will have it all figured out before that, as cgroup.c is hard to read.
Even better.
Eyeballs fingered the bad spot before my dog-slow download finished, and Peter has subsequently confirmed/plugged the hole.

The problem was cgroup_exit() assigning exiting tasks to the root task group without actually moving them. In a CONFIG_PREEMPT kernel, preemption after that assignment means you'll be enqueued on the cgroup's cfs_rq, which can go away if you were the last task with a reference. When you get back to the CPU, boom, use after free.

-Mike
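The sequence Mike describes can be walked through in a toy user-space model. This is a sketch only: every name in it (`struct group`, `struct task`, `task_start()`, `buggy_exit()`, `group_destroy()`, `exit_race_leaves_dangling_entry()`) is hypothetical, the `destroyed` flag stands in for actually freeing memory so the example stays well-defined, and the preemption is represented simply by the order of the calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; names are illustrative, not kernel API. */
struct group {
    const char *path;
    int nr_members;     /* tasks whose group pointer refers to this group */
    int nr_queued;      /* tasks still sitting on this group's run queue */
    bool destroyed;     /* stands in for freeing, so the model stays safe */
};

struct task {
    struct group *grp;       /* membership, as the cgroup core sees it */
    struct group *queued_on; /* run queue the scheduler actually used */
};

static void task_start(struct task *t, struct group *g)
{
    t->grp = g;
    t->queued_on = g;
    g->nr_members++;
    g->nr_queued++;
}

/* Buggy exit: membership flips to root, the run queue entry stays put. */
static void buggy_exit(struct task *t, struct group *root)
{
    t->grp->nr_members--;
    t->grp = root;
    root->nr_members++;
    /* missing: dequeue from the old group, enqueue on root */
}

/* Destroy is allowed once no members remain, like an empty cgroup. */
static bool group_destroy(struct group *g)
{
    if (g->nr_members != 0)
        return false;
    g->destroyed = true;
    return true;
}

/* Returns true if the task ends up queued on a destroyed group, the
 * modelled equivalent of the use-after-free. */
static bool exit_race_leaves_dangling_entry(void)
{
    struct group root = { .path = "/" };
    struct group svc = { .path = "/system/example.service" };
    struct task t = { 0 };

    task_start(&t, &svc);
    buggy_exit(&t, &root);          /* preemption would land here */
    bool freed = group_destroy(&svc);
    return freed && t.queued_on->destroyed && svc.nr_queued == 1;
}
```

In the model, `group_destroy()` succeeds because the membership count has already dropped to zero, while `nr_queued` still records one entry on the old group. That gap between "who belongs here" and "who is queued here" is the window the description points at.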
On Sat, 25 Dec 2010 03:33:23 +0100 Mike Galbraith <mgalbraith@suse.de> wrote:
Eyeballs fingered the bad thing spot before my dog slow download finished, and Peter has subsequently confirmed/plugged the hole.
And in plain English this means: "it is fixed in kernel-*-2.6.xxx.yyy-zzz.rpm" for which xxx, yyy and zzz? Or for which rpm changelog entry do we need to look? ;)
Problem was cgroup_exit() assigning exiting tasks to the root task group without actually moving it. In a CONFIG_PREEMPT kernel, preemption after that assignment means you'll be enqueued on the cgroup cfs_rq, which can go away if you were the last task with a reference. When you get back to the CPU, boom, use after free.
Oops, nasty one. Luckily, it has only hit me on test VMs and never on my production laptop, which has been running fine for the last few weeks.

Thanks
-- 
Stefan Seyfried
"Dispatch war rocket Ajax to bring back his body!"
On Sat, 2010-12-25 at 12:15 +0100, Stefan Seyfried wrote:
And in plain english this means:
"it is fixed in kernel-*-2.6.xxx.yyy-zzz.rpm" for which xxx, yyy and zzz?
Or for which rpm changelog entry do we need to look? ;)
The fix was baked on Christmas eve.. Christmas day is reserved for swilling eggnog and whatnot, so you'll have to wait a bit for it to appear in any tree :)

-Mike

Subject: sched, cgroup: Use exit hook to avoid use-after-free crash
From: Peter Zijlstra <peterz@infradead.org>
Date: Fri, 24 Dec 2010 16:59:13 +0100
References: <AANLkTin49UHeVhfS-iFwWvPIg29HPhXaP3DorBAa-a0I@mail.gmail.com>

By not notifying the controller of the on-exit move back to init_css_set, we fail to move the task out of the previous cgroup's cfs_rq. This leads to an opportunity for a cgroup-destroy to come in and free the cgroup (there are no active tasks left in it after all) to which the not-quite dead task is still enqueued.

Cc: stable@kernel.org
Reported-by: Miklos Vajna <vmiklos@frugalware.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/sched.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

Index: linux-2.6.37.git/kernel/sched.c
===================================================================
--- linux-2.6.37.git.orig/kernel/sched.c
+++ linux-2.6.37.git/kernel/sched.c
@@ -613,6 +613,9 @@ static inline struct task_group *task_gr
 	struct task_group *tg;
 	struct cgroup_subsys_state *css;
 
+	if (p->flags & PF_EXITING)
+		return &root_task_group;
+
 	css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
 			lockdep_is_held(&task_rq(p)->lock));
 	tg = container_of(css, struct task_group, css);
@@ -9187,6 +9190,12 @@ cpu_cgroup_attach(struct cgroup_subsys *
 	}
 }
 
+static void
+cpu_cgroup_exit(struct cgroup_subsys *ss, struct task_struct *task)
+{
+	sched_move_task(task);
+}
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
 static int cpu_shares_write_u64(struct cgroup *cgrp, struct cftype *cftype,
 				u64 shareval)
@@ -9259,6 +9268,7 @@ struct cgroup_subsys cpu_cgroup_subsys =
 	.destroy	= cpu_cgroup_destroy,
 	.can_attach	= cpu_cgroup_can_attach,
 	.attach		= cpu_cgroup_attach,
+	.exit		= cpu_cgroup_exit,
 	.populate	= cpu_cgroup_populate,
 	.subsys_id	= cpu_cgroup_subsys_id,
 	.early_init	= 1,
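The shape of the patch (an optional `.exit` callback in a subsystem table, invoked when a task leaves, which requeues the task on the group it now belongs to) can be sketched in user space. The names below (`struct subsys`, `model_move_task()`, `core_exit()`, `with_hook`, `without_hook`, `queued_on_old_group_after_exit()`) are hypothetical and exist only for this illustration. The model loosely mirrors the role `sched_move_task()` plays in the patch and makes no claim about how the kernel implements it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of an optional per-subsystem exit hook. */
struct group {
    const char *path;
    int nr_queued;              /* tasks sitting on this group's queue */
};

struct task {
    struct group *grp;          /* group the task belongs to */
    struct group *queued_on;    /* group whose queue holds the task */
};

struct subsys {
    const char *name;
    void (*exit)(struct task *t);   /* optional; may be NULL */
};

/* Requeue the task on whatever group it now belongs to. */
static void model_move_task(struct task *t)
{
    if (t->queued_on == t->grp)
        return;
    t->queued_on->nr_queued--;
    t->grp->nr_queued++;
    t->queued_on = t->grp;
}

/* One table registers the hook, the other leaves it unset. */
static const struct subsys with_hook = { .name = "cpu", .exit = model_move_task };
static const struct subsys without_hook = { .name = "cpu" };

/* Core exit path: switch membership to root, then notify the
 * subsystem if it registered a hook. */
static void core_exit(const struct subsys *ss, struct task *t,
                      struct group *root)
{
    t->grp = root;
    if (ss->exit)
        ss->exit(t);
}

/* Returns how many tasks remain queued on the old group after exit. */
static int queued_on_old_group_after_exit(const struct subsys *ss)
{
    struct group root = { "/", 0 };
    struct group svc = { "/system/example.service", 1 };
    struct task t = { &svc, &svc };

    core_exit(ss, &t, &root);
    return svc.nr_queued;
}
```

With `with_hook` the old group's queue ends up empty, so destroying that group afterwards would be safe in the model. With `without_hook` one entry is left behind, which corresponds to the situation before the patch.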
On 12/25/2010 08:45 AM, Mike Galbraith wrote:
On Sat, 2010-12-25 at 12:15 +0100, Stefan Seyfried wrote:
And in plain english this means:
"it is fixed in kernel-*-2.6.xxx.yyy-zzz.rpm" for which xxx, yyy and zzz?
Or for which rpm changelog entry do we need to look? ;)
The fix was baked on Christmas eve.. Christmas day is reserved for swilling eggnog and whatnot, so you'll have to wait a bit for it to appear in any tree :)
I've accepted this into the master kernel. It fixes the crashes and I'll take the performance impact over a kernel oops until the real fix is released.

Upstream discussion here: http://groups.google.com/group/linux.kernel/browse_thread/thread/549060f2310...

-Jeff
-- 
Jeff Mahoney
SUSE Labs
On 12/27/2010 11:23 PM, Jeff Mahoney wrote:
On 12/25/2010 08:45 AM, Mike Galbraith wrote:
On Sat, 2010-12-25 at 12:15 +0100, Stefan Seyfried wrote:
And in plain english this means:
"it is fixed in kernel-*-2.6.xxx.yyy-zzz.rpm" for which xxx, yyy and zzz?
Or for which rpm changelog entry do we need to look? ;)
The fix was baked on Christmas eve.. Christmas day is reserved for swilling eggnog and whatnot, so you'll have to wait a bit for it to appear in any tree :)
I've accepted this into the master kernel. It fixes the crashes and I'll take the performance impact over a kernel oops until the real fix is released.

Thanks, my systemd test machine in VMware has not crashed since!

Bye,
CzP
Participants (4): Jeff Mahoney, Mike Galbraith, Peter Czanik, Stefan Seyfried