Hi, On Tue, 23 Jun 2009, Greg KH wrote:
No, it's to tell the kernel exactly when to initialize the code at what part during the init level processing, and link order matters.
"exactly when to initialize the code" == "addresses dependencies", isn't it?
No, see below for details.
I stand by the above claim after having read your mail. You essentially want to start some things early, before other things, hence there's a dependency from those other things to the first thing. Not out of correctness needs but out of speed needs. But the reasons for dependencies don't matter for making them dependencies :) In any case, it's just idle terminology and a minor point.
Excuse me for not being up-to-date wrt. the kernel anymore, but isn't this done via the .init sections?
Yes it is, but order within the .init sections matters.
Yes, understood. This is determined by the link order. There is no reason that the hypothetical lump-modules-together-and-attach-to-vmlinux program cannot also observe some ordering given from the outside. In fact part of this program will be calls to the normal linker to join together the individual .o files of modules (possibly after stripping away some uninteresting sections), and that again establishes a certain order in the newly created initcall section. That's the point where you turn knobs to start some things earlier than other things, much like you right now had changed the order in the Makefile.
fact isn't going to change this principle. You still have a .initcall section (well, two of them, one for the built kernel, one for the module lump) which the kernel proper would iterate over very early (after determining existence of the second initcall table).
We really have 8 different levels of init calls in the kernel these days:

    #define pure_initcall(fn)            __define_initcall("0",fn,0)
    #define core_initcall(fn)            __define_initcall("1",fn,1)
    #define core_initcall_sync(fn)       __define_initcall("1s",fn,1s)
    #define postcore_initcall(fn)        __define_initcall("2",fn,2)
    #define postcore_initcall_sync(fn)   __define_initcall("2s",fn,2s)
    #define arch_initcall(fn)            __define_initcall("3",fn,3)
    #define arch_initcall_sync(fn)       __define_initcall("3s",fn,3s)
    #define subsys_initcall(fn)          __define_initcall("4",fn,4)
    #define subsys_initcall_sync(fn)     __define_initcall("4s",fn,4s)
    #define fs_initcall(fn)              __define_initcall("5",fn,5)
    #define fs_initcall_sync(fn)         __define_initcall("5s",fn,5s)
    #define rootfs_initcall(fn)          __define_initcall("rootfs",fn,rootfs)
    #define device_initcall(fn)          __define_initcall("6",fn,6)
    #define device_initcall_sync(fn)     __define_initcall("6s",fn,6s)
    #define late_initcall(fn)            __define_initcall("7",fn,7)
    #define late_initcall_sync(fn)       __define_initcall("7s",fn,7s)
This doesn't change the picture, as long as this information is preserved in the module .o files ...
If you build any code as a module, all of these different levels change to be the "generic" module_init() call, which runs after all of these 8 levels have run. So you can't work backwards and figure out what level of init call the module really wanted to be run at if you only have a .o file.
... maybe you can't currently, but we are talking about ways to improve the situation. One of the necessary things would be to _not_ throw away this information. You would then end up with multiple .initcall sections also in modules. That's perfectly fine if the module loader simply iterates over all of them in order. The linker script can make sure that all these .initcall sections lie next to each other (i.e. just remain separate in the ELF section list), so that the current code doesn't even need to be changed, if I'm reading it right.
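To illustrate what "not throwing away this information" buys, here is a minimal user-space sketch, not kernel code. It assumes each initcall entry keeps its level tag, and it models the effect of a linker script that places the per-level sections next to each other: entries end up grouped by level, with link order preserved inside each level. The names tagged_initcall and layout_by_level are hypothetical, and a string stands in for the function pointer.

```c
#include <stddef.h>

/* Hypothetical model: an initcall entry that still carries its level. */
#define NUM_LEVELS 8    /* levels 0..7, as in the macro list quoted above */

struct tagged_initcall {
    int level;          /* init level the code asked for, 0..7 */
    const char *name;   /* stands in for the function pointer */
};

/*
 * Group entries by level while keeping the input (link) order inside
 * each level. 'out' must have room for n entries. Entries with an
 * out-of-range level are skipped. Returns the number of entries written.
 */
static size_t layout_by_level(const struct tagged_initcall *in, size_t n,
                              struct tagged_initcall *out)
{
    size_t k = 0;

    for (int lvl = 0; lvl < NUM_LEVELS; lvl++)
        for (size_t i = 0; i < n; i++)
            if (in[i].level == lvl)
                out[k++] = in[i];
    return k;
}
```

With the input order usb (6), pci (4), net (6), core (1), the result is core, pci, usb, net: a single contiguous list that a loader could walk front to back without knowing about levels at all.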
And then, within the different init call levels, we call the functions in the order in which they are linked into the kernel, which is driven by the Makefile.
Yes, understood. This will still work, except for one detail: you wouldn't have just one chunk of such .initcall sections (the one in vmlinux proper), but two. So before moving on to the next level you have to iterate the current level twice: once over the list in vmlinux, once over the list in module-lump. Then go to the next level and do the same. It is true then that all things in module-lump will run after the vmlinux things of the same level L. If that isn't wanted you need to introduce a new level L-0.5 for the module-lump things, et voilà, they are run before the vmlinux things at level L. Inside one level the link order again specifies the order: the link order of vmlinux (as specified in the Makefile) for vmlinux things, and the link order used when creating module-lump for those things.
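The two-table iteration described above can be sketched in plain user-space C. This is only a model under stated assumptions: struct initcall_table, run_initcalls, boot_order and the sample init functions are all hypothetical names, the per-level lists are ordinary NULL-terminated arrays rather than real ELF sections, and the _sync and rootfs variants are left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define NUM_LEVELS 8    /* levels 0..7 */
#define MAX_CALLS  4    /* arbitrary per-level capacity for this sketch */

typedef int (*initcall_t)(void);

/* One table stands for one chunk of .initcall sections.
 * Each level is a NULL-terminated list kept in link order. */
struct initcall_table {
    initcall_t level[NUM_LEVELS][MAX_CALLS + 1];
};

static char trace[256];  /* records the order in which the calls ran */

static void note(const char *name)
{
    strncat(trace, name, sizeof(trace) - strlen(trace) - 1);
    strncat(trace, " ", sizeof(trace) - strlen(trace) - 1);
}

static int vm_core(void)   { note("vmlinux:core");  return 0; }
static int vm_dev_a(void)  { note("vmlinux:dev_a"); return 0; }
static int vm_dev_b(void)  { note("vmlinux:dev_b"); return 0; }
static int lump_core(void) { note("lump:core");     return 0; }
static int lump_dev(void)  { note("lump:dev");      return 0; }

static const struct initcall_table vmlinux_table = {
    .level = { [1] = { vm_core }, [6] = { vm_dev_a, vm_dev_b } },
};

static const struct initcall_table lump_table = {
    .level = { [1] = { lump_core }, [6] = { lump_dev } },
};

/* For each level, drain every table in the given order, then move on. */
static void run_initcalls(const struct initcall_table *const *tables,
                          size_t ntables)
{
    for (int lvl = 0; lvl < NUM_LEVELS; lvl++)
        for (size_t t = 0; t < ntables; t++)
            for (const initcall_t *fn = tables[t]->level[lvl]; *fn; fn++)
                (*fn)();
}

/* Runs both tables, vmlinux first, and returns the recorded order. */
static const char *boot_order(void)
{
    const struct initcall_table *const tables[] = {
        &vmlinux_table, &lump_table
    };

    trace[0] = '\0';
    run_initcalls(tables, sizeof(tables) / sizeof(tables[0]));
    return trace;
}
```

In this model the lump entries of a level always run after the vmlinux entries of the same level. Getting the "L-0.5" behaviour would amount to visiting the lump table first for that level.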
Remember, we are talking about a whole boot time of the kernel to be less than a second right now, so optimizations like this are essential to get there.
Yes, understood. I don't see why that wouldn't be possible with the lumping together anymore, under the condition that modules will contain the original initcall sections in the future. I think now is the first time it actually could pay back that modules are relocatable objects instead of DSOs. The latter would be considerably more difficult to join together into one bucket, with relocatable objects it's trivial.

Ciao,
Michael.