After a dozen hours of debugging, starting from start_kernel (the preempt count starts leaking there) and going through the lockdep code (bah!), I got to the stack dumping called from lockdep. I don't know where exactly, but dump_stack() also leaks preempt_count on x86_32.
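
For anyone who wants to narrow down a leak like this, the basic check is to compare the counter before and after the suspect call. Below is a rough userspace sketch of that idea, not kernel code: the model_* names and count_leak_across() are made up for illustration. In the kernel itself the same comparison would read preempt_count() on either side of the call being tested.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's preempt counter. */
static int model_preempt_count;

static void model_preempt_disable(void)
{
    model_preempt_count++;
}

static void model_preempt_enable(void)
{
    /* Catch an unbalanced enable (counter underflow) right away. */
    assert(model_preempt_count > 0);
    model_preempt_count--;
}

/* Balanced callee: every disable is paired with an enable. */
static void balanced_callee(void)
{
    model_preempt_disable();
    model_preempt_enable();
}

/* Leaky callee: returns with the counter still raised. */
static void leaky_callee(void)
{
    model_preempt_disable();
    /* the matching model_preempt_enable() is missing on purpose */
}

/*
 * Call fn and return how much the counter changed across the call.
 * A return value of 0 means the callee was balanced.
 */
static int count_leak_across(void (*fn)(void), const char *name)
{
    int before = model_preempt_count;
    int delta;

    fn();
    delta = model_preempt_count - before;
    if (delta != 0)
        fprintf(stderr, "%s leaked preempt count: %d -> %d\n",
                name, before, model_preempt_count);
    return delta;
}
```

Wrapping each suspect call this way and bisecting down the call chain is one way to find which function returns with the count changed.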