[opensuse-factory] [boottime] The effect of preload
Hi,

I spent some time optimizing preload (still not done), but I wanted to share two bootcharts for my laptop: one with preload and one without:

http://en.opensuse.org/Image:Bootchart-nopreload.png (44s)
http://en.opensuse.org/Image:Bootchart-preload.png (34s)

Otherwise the systems are exactly the same. The only difference is a mv /sbin/preload{,.away} - both boot into an autologin icewm-lite.

I'll let you know when I submit it to Factory. For now I'm collecting patches in home:coolo:Factory.

Greetings, Stephan
Hi Stephan,

On Mon, 2009-01-26 at 11:01 +0100, Stephan Kulow wrote:
I spent some time optimizing preload (still not done), but I wanted to share two bootcharts for my laptop: one with preload and one without:
Interesting :-) A few questions:

* how does preload differ from sreadahead?
* do you take the moblin route:
  + of running preload asynchronously at the lowest I/O priority
  + of growing /sys/proc/sda/queue/nr_requests to 1024+
    [ supposedly so the fairness code works ;-) ]

Thanks,

Michael.

--
michael.meeks@novell.com <><, Pseudo Engineer, itinerant idiot
On Monday 26 January 2009, Michael Meeks wrote:

Hi Michael,

> * how does preload differ from sreadahead?

sreadahead:
- does open, seek and readahead
- reads parts synchronously
- runs 4 detached threads with low I/O priority
- uses boot order (by means of a kernel patch for ext3)
- needs pregenerated lists from boots not using sreadahead
- there is only one sreadahead, started very early

preload:
- does stat, open and fadvise
- reads whole files asynchronously
- runs one blocking process with high I/O priority
- uses a predefined list that can be regenerated without changing the boot
- there are several preload phases

All in all: they differ more than they have in common. preload is pretty close to prefetch though, but doesn't require kernel patches.

I guess sreadahead only helps with certain SSDs, where it's necessary to get the kernel to create throughput as quickly as possible and seeks don't hurt. That's different from my laptop's hdd. According to the sources, sreadahead doesn't need the kernel patch; it only optimizes the order. But still, calling sreadahead early doesn't improve my boot time. And it's much harder to deploy too, more suited to show cases ;)

> * do you take the moblin route:
>   + of running preload asynchronously at the lowest I/O priority
>   + of growing /sys/proc/sda/queue/nr_requests to 1024+
>     [ supposedly so the fairness code works ;-) ]

Doesn't change _anything_ - it defaults to 128 and the queue never gets that long. At least neither with sreadahead nor with preload. And yes, I tested (of course I used the correct name - around /sys/devices/pci0000:00/0000:00:1f.2/host2/target2:0:0/2:0:0:0/block/sda/queue/nr_requests; there is no /sys/proc). But nice anecdote.

I also tested different elevators. All others (anticipatory, noop, deadline) are slower than cfq with preload.

Greetings, Stephan
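To make the contrast concrete, here is a minimal sketch of the two prefetch idioms being compared: posix_fadvise(POSIX_FADV_WILLNEED) for the preload style and readahead(2) for the sreadahead style. The example path is hypothetical and this is not code from either project:

/* Sketch of the two prefetch idioms discussed above; not actual
 * preload or sreadahead code. Error handling trimmed for brevity. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* preload style: stat + open + fadvise. The WILLNEED hint queues an
 * asynchronous read-in and returns immediately. */
static void prefetch_async(const char *path)
{
    struct stat st;
    int fd;

    if (stat(path, &st) != 0 || (fd = open(path, O_RDONLY)) < 0)
        return;
    posix_fadvise(fd, 0, st.st_size, POSIX_FADV_WILLNEED);
    close(fd);
}

/* sreadahead style: open + readahead(2), which blocks until the
 * requested range is in the page cache - hence sreadahead's threads. */
static void prefetch_sync(const char *path, off_t offset, size_t len)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return;
    readahead(fd, offset, len);
    close(fd);
}

int main(void)
{
    /* Hypothetical example file. */
    prefetch_async("/lib/libc.so.6");
    prefetch_sync("/lib/libc.so.6", 0, 1 << 20);
    return 0;
}

The fadvise variant lets the caller race ahead and queue many files at once; the readahead variant only returns control once the data is actually resident.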
On Mon, 2009-01-26 at 10:58 +0100, Stephan Kulow wrote:
I guess sreadahead only helps with certain SSDs, where it's necessary to get the kernel to create throughput as quickly as possible and seeks don't hurt. That's different from my laptop's hdd. According to the sources, sreadahead doesn't need the kernel patch; it only optimizes the order. But still, calling sreadahead early doesn't improve my boot time. And it's much harder to deploy too, more suited to show cases ;)
We have a patched kernel in devel:playground:fastboot now, in case you feel like testing it. I'd guess using an unsorted block list can really hurt HDD performance.

I find sreadahead interesting, since it loads individual blocks, not entire files. I can test it on SSD quickly if you have a package I can drop into 11.1. I'm also interested in testing preload on SSD.

-- Hans Petter
On Monday 26 January 2009, Hans Petter Jansson wrote:
I find sreadahead interesting, since it loads individual blocks, not entire files. I can test it on SSD quickly if you have a package I can drop into 11.1.

Huh? What sreadahead do you have?
sreadahead uses readahead with what mincore returns and that's pretty much always the full file.

Greetings, Stephan
On Mon, 2009-01-26 at 13:14 +0100, Stephan Kulow wrote:
On Monday 26 January 2009, Hans Petter Jansson wrote:
I find sreadahead interesting, since it loads individual blocks, not entire files. I can test it on SSD quickly if you have a package I can drop into 11.1.
Huh? What sreadahead do you have?
sreadahead uses readahead with what mincore returns and that's pretty much always the full file.
It uses mincore() to discover which parts of each file are in-core, so it can avoid preloading blocks that weren't needed during the discovery boot. As for how much that is, YMMV, but it helps with files that aren't being read wholesale.

-- Hans Petter
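For reference, the discovery step can be pictured with a small stand-alone tool: map a file and count its resident pages with mincore(). This is an illustrative sketch, not the actual sreadahead-pack code:

/* Sketch of the mincore() discovery step: map a file and ask the
 * kernel which of its pages are resident in the page cache.
 * (Illustrative only.) */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    if (argc != 2)
        return 1;

    int fd = open(argv[1], O_RDONLY);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) != 0 || st.st_size == 0)
        return 1;

    void *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return 1;

    long page = sysconf(_SC_PAGESIZE);
    size_t npages = (st.st_size + page - 1) / page;
    unsigned char *vec = malloc(npages);
    if (!vec)
        return 1;

    /* Bit 0 of each vector entry says whether that page is in core. */
    size_t resident = 0;
    if (mincore(map, st.st_size, vec) == 0)
        for (size_t i = 0; i < npages; i++)
            if (vec[i] & 1)
                resident++;

    printf("%s: %zu of %zu pages in core (%.0f%%)\n",
           argv[1], resident, npages, 100.0 * resident / npages);

    munmap(map, st.st_size);
    free(vec);
    close(fd);
    return 0;
}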
Hi Stephan,

On Mon, 2009-01-26 at 10:58 +0100, Stephan Kulow wrote:
* how does preload differ from sreadahead ?
Wow - thanks :-) nice comparison; so to re-frame it:

* similarities:
  + both pre-load only file data => reading the inodes, crawling the directory structures etc. is all done synchronously, without much parallelism or I/O sorting [ modulo sreadahead's 4x threads ].

* differences:
  + sreadahead lets boot continue while pre-fetching, to interleave CPU/sleep-intensive loads [ eg. boot.udev+ in your chart ]; preload instead defers the work so we get better seek behaviour on rotating media.
  + sreadahead only forces in the parts of the files we know are used; preload forces in the whole file - in practice this makes no difference [you assert].
  + there are several phases of preload, a single phase for sreadahead.

Another might be:
  + sreadahead-pack is slow, opening ~all files on the system to call 'mincore' on them all [ presumably also vandalising its results to some degree while doing so ;-) ]

Other queries:
- uses boot order (by means of a kernel patch for ext3)
- needs pregenerated lists from boots not using sreadahead

vs.

- uses a predefined list that can be regenerated without changing the boot
So - preload allows you to re-generate when booting with preload? That sounds pretty neat - how do you elide the I/O caused by preload itself? By process-id? [ it seems the tools parse strace output to generate the preload lists ]

Reading the preload code, it looks rather nice :-) I guess my only concern is keeping the preload data itself up-to-date: apparently we don't ship it in SLED11, and eg. my /etc/preload.d/OpenOffice is obsolete.

At some level, it seems a shame that we cannot ask the kernel to dump all block-level I/O generated post boot, elide pages un-touched since they were pre-loaded [ presumably that is ~easy enough to detect ], do some quick & dirty sort on that and save it for the next boot. Presumably that would require work, a new API etc. Naturally, that's not going to work wonderfully for machines with RAM-size << boot-time-working-set, but it's simple & no worse than sreadahead.
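For illustration, that filtering step can be pictured as a toy scanner over strace -f output - the line format and the pid-skipping are assumptions here, and the real prepare_preload is a script rather than this C sketch:

/* Toy list-generation step: scan strace -f output on stdin for
 * successful open() calls and print each path, skipping lines from a
 * given pid (e.g. the prefetcher's own). Assumed line format:
 *   1234  open("/etc/fstab", O_RDONLY) = 3
 * Pipe the output through sort -u to deduplicate. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    long skip_pid = argc > 1 ? strtol(argv[1], NULL, 10) : -1;
    char line[4096];

    while (fgets(line, sizeof line, stdin)) {
        long pid;
        char path[4096];

        if (sscanf(line, "%ld open(\"%4095[^\"]", &pid, path) != 2)
            continue;           /* not an open() line */
        if (pid == skip_pid)
            continue;           /* elide the prefetcher's own I/O */
        if (strstr(line, ") = -1"))
            continue;           /* failed opens are uninteresting */
        printf("%s\n", path);
    }
    return 0;
}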
That's different from my laptop's hdd. According to the sources, sreadahead doesn't need the kernel patch; it only optimizes the order. But still, calling sreadahead early doesn't improve my boot time. And it's much harder to deploy too, more suited to show cases ;)
Heh :-) well, I can believe sreadahead is optimised for SSDs. Having said that, I don't see why preload shouldn't work just as nicely on SSDs - but there, of course, it would surely make sense to do the I/O at a low priority in the background (?)

As a crazy idea - do you think SSDs and rotating media are converses? I.e. if in an SSD world it makes sense to run preload at a really low I/O priority, perhaps in a rotating world it makes sense to run preload at an incredibly high priority [ while letting boot continue in parallel at the world's lowest I/O prio ]?
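Concretely, flipping between those two worlds is a single ioprio_set() call per process; a sketch, with the constants inlined from linux/ioprio.h since glibc has no wrapper (the real-time class needs root):

/* Sketch: put the current process in the I/O class given on the
 * command line ("rt" or "idle"), the way ionice(1) does. glibc has no
 * wrapper for ioprio_set, so we go through syscall(2). */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

#define IOPRIO_CLASS_RT     1   /* jumps the queue ahead of everyone */
#define IOPRIO_CLASS_IDLE   3   /* only runs when the disk is idle   */
#define IOPRIO_CLASS_SHIFT  13
#define IOPRIO_WHO_PROCESS  1
#define IOPRIO_PRIO_VALUE(cls, data) (((cls) << IOPRIO_CLASS_SHIFT) | (data))

int main(int argc, char **argv)
{
    int cls = (argc > 1 && strcmp(argv[1], "rt") == 0)
                  ? IOPRIO_CLASS_RT : IOPRIO_CLASS_IDLE;

    /* pid 0 means "the calling process". */
    if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
                IOPRIO_PRIO_VALUE(cls, 0)) != 0) {
        perror("ioprio_set");
        return 1;
    }
    /* ... exec or do the prefetch work at this priority ... */
    return 0;
}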
* do you take the moblin route:
  + of running preload asynchronously at the lowest I/O priority
  + of growing /sys/proc/sda/queue/nr_requests to 1024+
    [ supposedly so the fairness code works ;-) ]
Doesn't change _anything_ - it defaults to 128 and the queue never gets that long. At least neither with sreadahead nor with preload. And yes, I tested (of course I used the correct name - around /sys/devices/pci0000:00/0000:00:1f.2/host2/target2:0:0/2:0:0:0/block/sda/queue/nr_requests,
Ah well, some missing punctuation fluff; mine is: ;-)

/sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/queue/nr_requests

Out of interest, how long does the queue get?
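For completeness, the tweak itself is just a write to that sysfs attribute; a sketch using the short /sys/block alias (device name assumed, needs root):

/* Sketch of the nr_requests tweak: write the new queue depth into the
 * block device's sysfs attribute. /sys/block/sda is a stable alias
 * for the long per-device path above. */
#include <stdio.h>

int main(void)
{
    const char *attr = "/sys/block/sda/queue/nr_requests";
    FILE *f = fopen(attr, "w");

    if (!f) {
        perror(attr);
        return 1;
    }
    fprintf(f, "1024\n");   /* default is 128 */
    fclose(f);
    return 0;
}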
But nice anecdote.
It was an anecdote? Let me make it one: Arjan said this was a good idea, what should I know? ;-)

Looking at your boot-chart, it seems you're using blktrace to profile the first few preload runs, and stapio for the later ones, yet prepare_preload seems to work on strace output - is there a new way to prepare the preload output?

Thanks,

Michael.

--
michael.meeks@novell.com <><, Pseudo Engineer, itinerant idiot
On Wednesday 28 January 2009, Michael Meeks wrote:
Hi Stephan,
On Mon, 2009-01-26 at 10:58 +0100, Stephan Kulow wrote:
* how does preload differ from sreadahead ?
Wow - thanks :-) nice comparison; so to re-frame it:
* similarities:
  + both pre-load only file data => reading the inodes, crawling the directory structures etc. is all done synchronously, without much parallelism or I/O sorting [ modulo sreadahead's 4x threads ].

Hmm, preload does stats too, didn't I say that?
* differences:
  + sreadahead lets boot continue while pre-fetching, to interleave CPU/sleep-intensive loads [ eg. boot.udev+ in your chart ]; preload instead defers the work so we get better seek behaviour on rotating media.

Right. The eeepc might not have the bad seek performance laptops have.
+ sreadahead only forces in the parts of the files we know are used; preload forces in the whole file - in practice this makes no difference [you assert].

sreadahead-pack has a -d switch that will tell you how much is in mincore, and I get 100% everywhere.
+ there are several phases of preload, a single phase for sreadahead.

Yes, and these are still too few. I'm working on preloadNG :)
So - preload allows you to re-generate when booting with preload? That sounds pretty neat - how do you elide the I/O caused by preload itself? By process-id? [ it seems the tools parse strace output to generate the preload lists ]

Yes, preload execs are left out when looking at the pattern.
Reading the preload code, it looks rather nice :-) I guess my only concern is keeping the preload data itself up-to-date: apparently we don't ship it in SLED11, and eg. my /etc/preload.d/OpenOffice is obsolete.

I know, and I won't maintain these preload lists as they are. As you noticed yourself below, I'm working on a new idea.
As a crazy idea - do you think SSDs and rotating media are converses? I.e. if in an SSD world it makes sense to run preload at a really low I/O priority, perhaps in a rotating world it makes sense to run preload at an incredibly high priority [ while letting boot continue in parallel at the world's lowest I/O prio ]?
It wouldn't surprise me if there are SSDs that have a bank switching time and create a third class ;)
* do you take the moblin route:
  + of running preload asynchronously at the lowest I/O priority
  + of growing /sys/proc/sda/queue/nr_requests to 1024+
    [ supposedly so the fairness code works ;-) ]
Doesn't change _anything_ - it defaults to 128 and the queue never gets that long. At least neither with sreadahead nor with preload. And yes, I tested (of course I used the correct name - around /sys/devices/pci0000:00/0000:00:1f.2/host2/target2:0:0/2:0:0:0/block/sda/queue/nr_requests,
Ah well, some missing punctuation fluff; mine is: ;-)
/sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/queue/nr_requests
Out of interest, how long does the queue get?
No idea, but if it got larger than 128, then 1024 would have made a difference ;)
But nice anecdote.
It was an anecdote? Let me make it one: Arjan said this was a good idea, what should I know? ;-)
Looking at your boot-chart, it seems you're using blktrace to profile the first few preload runs, and stapio for the later ones, yet prepare_preload seems to work on strace output - is there a new way to prepare the preload output?
Don't get too close, you might burn your fingers ;)

Greetings, Stephan
On Mon, 2009-01-26 at 09:17 +0000, Michael Meeks wrote:
* do you take the moblin route:
  + of running preload asynchronously at the lowest I/O priority
  + of growing /sys/proc/sda/queue/nr_requests to 1024+
    [ supposedly so the fairness code works ;-) ]
By the way, I guess this isn't as effective if you only have one or two processes blocking on reads - won't they have only one outstanding request each at any given time?

Readahead notwithstanding, has anyone tried to make apps that need to read lots of small files read each one in a separate thread, simultaneously, to maximize the elevator benefits? (Thinking especially about GConf here - AFAIK it currently serializes concurrent requests.)

Relatedly, I'm packaging sreadahead in devel:playground:fastboot now - the kernel-source package in there has the requisite kernel patch.

-- Hans Petter
Stephan Kulow wrote:
I spent some time optimizing preload (still not done), but I wanted to share two bootcharts for my laptop: one with preload and one without:

Now that looks very nice! But it also shows that booting is still largely IO-bound. CPU load remains below 50% for a big part of the time, which means the second core is essentially idle. It would be interesting to run this with only one active core, to see how IO-bound it really is and how much the second core is helping us.
What I find interesting is the time after 3.5 seconds. It seems that only blogd is running, but it creates neither CPU nor disk load. Since fsck.ext3 has already finished, am I correct in the assumption that / is already mounted? In that case there would be ~1 second between the end of fsck and the end of blogd where not much is happening. That should be enough to completely preload the relevant parts of /etc and /bin, which take ~30MB on my system (not counting big stuff like /etc/cups/yes/ppds.dat or /bin/vim-normal). Doing this should also alleviate delays caused by reading many small files, e.g. by gconf as mentioned by Hans Petter. But it would need a change in the initrd, if I'm not mistaken.

Hmmm... Talking about the very beginning, what is init doing between 4.5 and 6.5 seconds? It looks like it is busy waiting for IO to complete, which could also be sped up by preloading in the initrd already.

Regards
nordi

--
Spam protection: All mail to me that does not contain the string "suse" goes to /dev/null.
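Such a preloader could be as small as this sketch, which walks /etc and /bin with nftw() and queues every regular file for asynchronous read-in - illustrative only; a real version would have to run from the initrd as described:

/* Sketch: walk /etc and /bin and hint every regular file into the
 * page cache while the CPU would otherwise be idle. */
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <ftw.h>
#include <unistd.h>
#include <sys/stat.h>

static int preload_file(const char *path, const struct stat *st,
                        int type, struct FTW *ftw)
{
    (void)ftw;
    if (type == FTW_F && st->st_size > 0) {
        int fd = open(path, O_RDONLY);
        if (fd >= 0) {
            /* Queue an asynchronous read-in and move on. */
            posix_fadvise(fd, 0, st->st_size, POSIX_FADV_WILLNEED);
            close(fd);
        }
    }
    return 0;   /* keep walking */
}

int main(void)
{
    nftw("/etc", preload_file, 16, FTW_PHYS);
    nftw("/bin", preload_file, 16, FTW_PHYS);
    return 0;
}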
On Monday 26 January 2009, Stephan Kulow wrote:
Hi,
I spent some time optimizing preload (still not done), but I wanted to share two bootcharts for my laptop: one with preload and one without: http://en.opensuse.org/Image:Bootchart-nopreload.png (44s) http://en.opensuse.org/Image:Bootchart-preload.png (34s)
Otherwise the systems are exactly the same. The only difference is a mv /sbin/preload{,.away} - both boot into an autologin icewm-lite.
I'll let you know when I submit it to Factory. For now I'm collecting patches in home:coolo:Factory.

What did you change? Files in /etc/preload.d/ are generated from /var/cache/preload, right? If so, I think that rebuilding those files on the installed system might help. I'd like to try, but you should post a very mini how-to about it...

Bye.
Daniele.
--
*** Linux user # 198661 ---_ ICQ 33500725 ***
*** Home http://www.kailed.net ***
*** Powered by openSUSE ***
On Monday 26 January 2009, Stephan Kulow wrote:
Hi,
I spent some time optimizing preload (still not done), but I wanted to share two bootcharts for my laptop: one with preload and one without:

http://en.opensuse.org/Image:Bootchart-nopreload.png (44s)
http://en.opensuse.org/Image:Bootchart-preload.png (34s)

Uh, here it's better without preload :/

http://www.kailed.net/nopreload-bootchart.png (34s)
http://www.kailed.net/preload-bootchart.png (48s)
Well, I removed some services and some preload files. I have only: gdm, kde, kdm, kdm.auto. Autologin in kde-4.1.3.

Hardware spec:
- Old Athlon 2800XP @ 2200MHz
- PATA HD
- 1GB RAM

Partitions:
- / and /home - ext3, mounted with relatime,barrier=0
- /boot - ext2
- /local - reiserfs with noatime

But it's the first time that I've run bootchart, so help me understand: with preload there is a "hole" between 21s and 31s. It seems that the culprit is "hald-probe-volu", right? I don't see the link between preload and "hald-probe-volu" :/

Bye.

--
*** Linux user # 198661 ---_ ICQ 33500725 ***
*** Home http://www.kailed.net ***
*** Powered by openSUSE ***
On Monday 26 January 2009, Daniele wrote:
On Monday 26 January 2009, Stephan Kulow wrote:
Hi,
I spent some time optimizing preload (still not done), but I wanted to share two bootcharts for my laptop: one with preload and one without:

http://en.opensuse.org/Image:Bootchart-nopreload.png (44s)
http://en.opensuse.org/Image:Bootchart-preload.png (34s)
Uh, here it's better without preload :/

http://www.kailed.net/nopreload-bootchart.png (34s)
http://www.kailed.net/preload-bootchart.png (48s)
Well, I removed some services and some preload files. I have only: gdm, kde, kdm, kdm.auto.
Autologin in kde-4.1.3.
Hardware spec:
- Old Athlon 2800XP @ 2200MHz
- PATA HD
- 1GB RAM
Partitions:
- / and /home - ext3, mounted with relatime,barrier=0
- /boot - ext2
- /local - reiserfs with noatime
But it's the first time that I've run bootchart, so help me understand: with preload there is a "hole" between 21s and 31s. It seems that the culprit is "hald-probe-volu", right? I don't see the link between preload and "hald-probe-volu" :/
That looks like a volume that did not react in time - i.e. you were simply lucky that it didn't happen in your nopreload boot. But yes, there is a reason I'm working on the way preload works.

Greetings, Stephan
participants (5):
- Daniele
- Hans Petter Jansson
- Michael Meeks
- nordi
- Stephan Kulow