Hello!
Maybe OpenSUSE packages should be compiled with LDFLAGS=-Wl,-O1?
Take a look here: http://forums.gentoo.org/viewtopic-t-226909-postdays-0-postorder-asc-start-0...
This little trick can give you a pretty nice reduction in application startup time, maybe an even better one when used with prelink?
Greets.
On Friday, 11 November 2005 17:38, piotrek wrote:
Hello!
Maybe OpenSUSE packages should be compiled with LDFLAGS=-Wl,-O1?
Take a look here: http://forums.gentoo.org/viewtopic-t-226909-postdays-0-postorder-asc-start-0.html
This little trick can give you a pretty nice reduction in application startup time, maybe an even better one when used with prelink?
Have you tested it yourself or do you have some trustworthy numbers? Gentoo users are often pretty good at improving performance just by wishful thinking.
I just tried to relink my qt3 build this way, and -Wl,-O1 increased the time spent on relocations by almost 50% according to LD_DEBUG=statistics. It's a debug build though, which increases the number of symbols; I have no idea how significantly.
--
Lubos Lunak
KDE developer
---------------------------------------------------------------------
SuSE CR, s.r.o.  e-mail: l.lunak@suse.cz , l.lunak@kde.org
Drahobejlova 27  tel: +420 2 9654 2373
190 00 Praha 9   fax: +420 2 9654 2374
Czech Republic   http://www.suse.cz/
Hello!
Maybe OpenSUSE packages should be compiled with LDFLAGS=-Wl,-O1 ?
Take a look here: http://forums.gentoo.org/viewtopic-t-226909-postdays-0-postorder-asc-start-0.html
This little trick can give you a pretty nice reduction in application startup time, maybe an even better one when used with prelink?
Have you tested it yourself or do you have some trustworthy numbers? Gentoo users are often pretty good at improving performance just by wishful thinking.
I just tried to relink my qt3 build this way, and -Wl,-O1 increased the time spent on relocations by almost 50% according to LD_DEBUG=statistics. It's a debug build though, which increases the number of symbols; I have no idea how significantly.
Well, I've checked this using amarok and the best benchmarking tool ever known called "time" :)
The results are:
WITHOUT LDFLAGS:
amaroK: [Loader] Don't run gdb, valgrind, etc. against this binary! Use amarokapp.
QPainter::setPen: Will be reset by begin()
QLayout: Adding KToolBar/mainToolBar (child of QVBox/unnamed) to layout for PlaylistWindow/PlaylistWindow
real 0m3.879s
user 0m0.132s
sys 0m0.020s
WITH LDFLAGS:
piotrek@ip-152-36:~> time amarok
amaroK: [Loader] Starting amarokapp..
amaroK: [Loader] Don't run gdb, valgrind, etc. against this binary! Use amarokapp.
QPainter::setPen: Will be reset by begin()
QLayout: Adding KToolBar/mainToolBar (child of QVBox/unnamed) to layout for PlaylistWindow/PlaylistWindow
real 0m2.199s
user 0m0.130s
sys 0m0.017s
I also use prelink, preloading and a 2.6.14-ck3 kernel.
The CFLAGS/CPPFLAGS on my system are: -march=athlon-4 -mtune=athlon-4 -pipe -fomit-frame-pointer -falign-functions=4 -s
I think that LDFLAGS are even better than prelinking, because they don't change anything inside the binary, but I still use prelink 'cause I want to tweak all the apps I install from RPMs.
And yes, I admit, I am a little nasty speed freak ;) Why don't I use Gentoo then? Well, SuSE is just cooler :)
Greets.
On Friday, 11 November 2005 19:00, piotrek wrote:
Well, I've checked this using amarok and the best benchmarking tool ever known called "time" :)
*cough*
The results are:
WITHOUT LDFLAGS:
amaroK: [Loader] Don't run gdb, valgrind, etc. against this binary! Use amarokapp.
QPainter::setPen: Will be reset by begin()
QLayout: Adding KToolBar/mainToolBar (child of QVBox/unnamed) to layout for PlaylistWindow/PlaylistWindow
real 0m3.879s
user 0m0.132s
sys 0m0.020s
If you used /usr/bin/time instead of just bash's time, one of the additional numbers you'd get would be CPU usage, which would be very low. Run 'LD_DEBUG=statistics amarok' several times, ignore the first result to avoid the effect of reading from the disk, and use the average. Moreover, as the debug output suggests, it would probably be better to use amarokapp.
--
Lubos Lunak
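The procedure described here (several runs, first one discarded, the rest averaged) could be scripted roughly as follows. This is only a minimal Python sketch under a few assumptions: the dynamic loader prints its statistics to stderr, the command exits on its own (a GUI program such as amarokapp would have to be closed by hand after each start), and the first "total startup time" line in the output belongs to the process of interest. The helper names parse_startup_cycles and measure are made up for this illustration.

```python
import os
import re
import statistics
import subprocess
import sys

# Matches the line printed for LD_DEBUG=statistics, for example:
#   25800: total startup time in dynamic loader: 499659055 clock cycles
STARTUP_RE = re.compile(r"total startup time in dynamic loader:\s*(\d+)")


def parse_startup_cycles(output):
    """Return the first 'total startup time' value found in output, or None."""
    match = STARTUP_RE.search(output)
    return int(match.group(1)) if match else None


def measure(command, runs=6):
    """Run command `runs` times and average the startup cycles of runs 2..n.

    The first run is always skipped, because it may include the cost of
    reading files from disk (cold caches).
    """
    env = dict(os.environ, LD_DEBUG="statistics")
    samples = []
    for i in range(runs):
        result = subprocess.run(
            command,
            env=env,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,  # assumption: statistics arrive on stderr
            text=True,
        )
        if i == 0:
            continue  # discard the cold-cache run
        cycles = parse_startup_cycles(result.stderr)
        if cycles is not None:
            samples.append(cycles)
    return statistics.mean(samples) if samples else None


if __name__ == "__main__":
    # Usage: python measure.py /path/to/program [args...]
    print(measure(sys.argv[1:]))
```

Comparing two builds would then mean calling measure once per binary and looking at the difference between the two averages rather than at single runs.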
Hi again!
So, I've just performed a few small tests using amaroK as my test animal and the results are amazing! Well, the amazing thing is that they're the same for both builds of amaroK, with and without LDFLAGS...
Forgive me this rather loooong post, but I just HAD to publish it, it's almost like NASA's TOP SECRET research results :)
LDFLAGS set:
piotrek@ip-152-36:~> LD_DEBUG=statistics /opt/kde3/bin/amarokapp
25800:
25800: runtime linker statistics:
25800: total startup time in dynamic loader: 499659055 clock cycles
25800: time needed for relocation: 492614988 clock cycles (98.5%)
25800: number of relocations: 27187
25800: number of relocations from cache: 83505
25800: number of relative relocations: 0
25800: time needed to load objects: 6300823 clock cycles (1.2%)
QPainter::setPen: Will be reset by begin()
QLayout: Adding KToolBar/mainToolBar (child of QVBox/unnamed) to layout for PlaylistWindow/PlaylistWindow
STARTUP
QObject::disconnect: Unexpected null parameter
QObject::connect: Cannot connect (null)::activePartChanged( KParts::Part * ) to KHTMLPart::slotActiveFrameChanged( KParts::Part * )
QObject::disconnect: Unexpected null parameter
QObject::connect: Cannot connect (null)::activePartChanged( KParts::Part * ) to KHTMLPart::slotActiveFrameChanged( KParts::Part * )
QObject::disconnect: Unexpected null parameter
QObject::connect: Cannot connect (null)::activePartChanged( KParts::Part * ) to KHTMLPart::slotActiveFrameChanged( KParts::Part * )
QObject::disconnect: Unexpected null parameter
QObject::connect: Cannot connect (null)::activePartChanged( KParts::Part * ) to KHTMLPart::slotActiveFrameChanged( KParts::Part * )
25800:
25800: runtime linker statistics:
25800: final number of relocations: 41052
25800: final number of relocations from cache: 98621
piotrek@ip-152-36:~>
###############################################
and WITHOUT ANY LDFLAGS:
piotrek@ip-152-36:~> LD_DEBUG=statistics /opt/kde3/bin/amarokapp
6838:
6838: runtime linker statistics:
6838: total startup time in dynamic loader: 522070984 clock cycles
6838: time needed for relocation: 514459824 clock cycles (98.5%)
6838: number of relocations: 27187
6838: number of relocations from cache: 83505
6838: number of relative relocations: 0
6838: time needed to load objects: 6659297 clock cycles (1.2%)
QPainter::setPen: Will be reset by begin()
QLayout: Adding KToolBar/mainToolBar (child of QVBox/unnamed) to layout for PlaylistWindow/PlaylistWindow
STARTUP
QObject::disconnect: Unexpected null parameter
QObject::connect: Cannot connect (null)::activePartChanged( KParts::Part * ) to KHTMLPart::slotActiveFrameChanged( KParts::Part * )
QObject::disconnect: Unexpected null parameter
QObject::connect: Cannot connect (null)::activePartChanged( KParts::Part * ) to KHTMLPart::slotActiveFrameChanged( KParts::Part * )
QObject::disconnect: Unexpected null parameter
QObject::connect: Cannot connect (null)::activePartChanged( KParts::Part * ) to KHTMLPart::slotActiveFrameChanged( KParts::Part * )
QObject::disconnect: Unexpected null parameter
QObject::connect: Cannot connect (null)::activePartChanged( KParts::Part * ) to KHTMLPart::slotActiveFrameChanged( KParts::Part * )
6838:
6838: runtime linker statistics:
6838: final number of relocations: 41050
6838: final number of relocations from cache: 98621
piotrek@ip-152-36:~>
But I'm still thinking about the results I got using the time (bash version) command... Could anyone (Lubos maybe? :) ) perform a similar test on his/her machine and explain this to me?
All my hopes of getting a lightning-fast Linux box are now buried... I think I'll just stick to simple and nice prelink :)
THX, greets.
Ha! My great discovery has already been discovered and will change the future of OpenSUSE! Here: https://bugzilla.novell.com/show_bug.cgi?id=133462
On Saturday, 12 November 2005 02:00, piotrek wrote:
Ha!
My great discovery has already been discovered and will change the future of OpenSUSE!
How exactly will it achieve that, given that even your own benchmark shows it doesn't improve anything?
As for your question about what's wrong with the time "benchmark": amarok is a small application that launches the main amarokapp application and (I assume) waits for it to finish startup before it quits. So time mostly measured how long it took to wait. And guessing from the big difference in the numbers, the first test was probably run with cold caches and the second one with warm caches.
--
Lubos Lunak
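To put numbers on the point about waiting: dividing the CPU time (user + sys) by the wall-clock time (real) from the two time runs posted earlier in the thread gives the share of the elapsed time that was actually spent computing. A minimal Python sketch, assuming the three figures mean what bash's time normally reports; the function name cpu_usage is made up for this illustration.

```python
def cpu_usage(real, user, system):
    """Fraction of wall-clock time during which the process used the CPU."""
    return (user + system) / real


# Figures copied from the two `time amarok` runs in this thread (seconds).
without_flags = cpu_usage(real=3.879, user=0.132, system=0.020)
with_flags = cpu_usage(real=2.199, user=0.130, system=0.017)

print(f"without LDFLAGS: {without_flags:.1%} CPU")  # 3.9% CPU
print(f"with LDFLAGS:    {with_flags:.1%} CPU")     # 6.7% CPU
```

Both values are in the single-digit percent range, and the CPU time itself differs by only about 5 ms between the two runs, which fits the explanation that most of the measured time was spent waiting.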
You wrote:
How exactly will it achieve that, given that even your own benchmark shows it doesn't improve anything?
His startup time in the dynamic loader went down from 522 million clock cycles to 500 million. That is a reduction of roughly 4.3%, not bad. On the other hand, 22 million clock cycles take 0.022 seconds on a 1 GHz CPU, so the difference is hard to measure and impossible to notice. Maybe during boot this could add up to an entire second.
Cheers
nordi
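The arithmetic behind this estimate can be checked from the exact cycle counts posted earlier in the thread. The sketch below is Python; the 1 GHz clock rate is the same assumption as in the message, not a measured value for the machine in question, and the variable names are chosen for this illustration only.

```python
# "total startup time in dynamic loader" from the two LD_DEBUG=statistics runs
cycles_without_flags = 522_070_984
cycles_with_flags = 499_659_055

saved_cycles = cycles_without_flags - cycles_with_flags
reduction = saved_cycles / cycles_without_flags

# Assumption: a 1 GHz clock, i.e. 1e9 cycles per second.
clock_hz = 1_000_000_000
saved_seconds = saved_cycles / clock_hz

print(f"cycles saved:  {saved_cycles}")       # 22411929
print(f"reduction:     {reduction:.1%}")      # 4.3%
print(f"time saved:    {saved_seconds:.3f} s")  # 0.022 s
```

Note that this compares one run of each build, so the difference may also just be run-to-run noise.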
Hi!
I think that it really MAY give you some performance improvement, but only if you recompile more packages with this option... I read on that Gentoo forum that it gives quite a noticeable performance gain after running make World on a Gentoo box...
So maybe the SuSE guys have run some more extensive benchmarks and it turned out to be a pretty useful trick...
I've been thinking of doing some more benchmarks, like compiling something bigger, e.g. Mozilla, maybe even recompiling the KDE 3.5 beta to see how it performs compared to the standard SuSE KDE 3.4, but my old Belinea 102020 died today (everything is RED and it blows my eyeballs up :)), so I'm gonna have to wait a few days to get a new monitor... :)
Greets.
Hi,
On Saturday 12 November 2005 23:30, piotrek wrote:
I think that it really MAY give you some performance improvement, but only if you recompile more packages with this option... I read on that Gentoo forum that it gives quite a noticeable performance gain after running make World on a Gentoo box...
So maybe the SuSE guys have run some more extensive benchmarks and it turned out to be a pretty useful trick...
I've been thinking of doing some more benchmarks, like compiling something bigger, e.g. Mozilla, maybe even recompiling the KDE 3.5 beta to see how it performs compared to the standard SuSE KDE 3.4, but my old Belinea 102020 died today (everything is RED and it blows my eyeballs up :)), so I'm gonna have to wait a few days to get a new monitor... :)
Some days ago I was thinking the same way. I used Gentoo about a year and a half ago. I thought that using C{XX}FLAGS and LDFLAGS would improve performance. Well, it does improve it, but not by much. If you search the Gentoo forums for optimization using gcc, you will find that the default flags are enough and that you don't gain much speed from the optimization flags.
After building some optimized RPMs for amarok and some kde-* packages, I talked with Pascal Bleser about optimization. Here are parts of our conversation log:
###
(22:03:58) liviudm: since you definitely have much more experience than me... how much of a waste of time is it really?
(22:04:00) Pascal Bleser: not much. it's usually not very productive.. you might gain 3% on most apps
(22:04:20) Pascal Bleser: on very few apps you might gain 10 to 20%, but it's the exception, not the rule
(22:04:32) Pascal Bleser: current gcc optimizations are enough for most things
[stuff deleted]
(22:18:10) Pascal Bleser: most apps spend their time in I/O and compiler flags don't help at all for that
(22:18:36) Pascal Bleser: for some applications that do a lot of computing (scientific, etc...), it makes sense
(22:19:21) liviudm: then, why are so many (Gentoo) users using these optimizations?
(22:19:39) Pascal Bleser: because they're not very experienced
(22:19:55) Pascal Bleser: and because they have a lot of time to lose, apparently, like building for a few days to install a linux system
###
Now it's up to you to make decisions about optimizing applications. I won't build "insane" optimized apps anymore, only if Andreas Girardet asks me to do it for the SUPER project. The only flags used will be -g -mtune=i686 -march=i686 and possibly -O3
Yours faithfully,
--
Damian Mihai Liviu
Mobile: +40 741 226993; Fax: +1 347-632-4117
Phone : +1 360-526-6441; +1 347-632-4117; +44 0870-3403339
URL: http://liviudm.blogspot.com
On Sunday, 13 November 2005 07:27, Damian Mihai Liviu wrote:
(...) After building some optimized RPMs for amarok and some kde-* packages, I talked with Pascal Bleser about optimization. Here are parts of our conversation log:
###
(22:03:58) liviudm: since you definitely have much more experience than me... how much of a waste of time is it really?
(22:04:00) Pascal Bleser: not much. it's usually not very productive.. you might gain 3% on most apps
(22:04:20) Pascal Bleser: on very few apps you might gain 10 to 20%, but it's the exception, not the rule
(22:04:32) Pascal Bleser: current gcc optimizations are enough for most things
[stuff deleted]
(22:18:10) Pascal Bleser: most apps spend their time in I/O and compiler flags don't help at all for that
(22:18:36) Pascal Bleser: for some applications that do a lot of computing (scientific, etc...), it makes sense
(22:19:21) liviudm: then, why are so many (Gentoo) users using these optimizations?
(22:19:39) Pascal Bleser: because they're not very experienced
(22:19:55) Pascal Bleser: and because they have a lot of time to lose, apparently, like building for a few days to install a linux system
###
Now it's up to you to make decisions about optimizing applications. I won't build "insane" optimized apps anymore, only if Andreas Girardet asks me to do it for the SUPER project. The only flags used will be -g -mtune=i686 -march=i686 and possibly -O3
Yours faithfully,
Hmm, 3% faster 'startup' "on most apps" - cool ;) why not? The question is: are there negative side effects?
Kind regards
On Sun, Nov 13, 2005 at 10:59:12AM +0100, Michael Lange wrote:
Hmm, 3% faster 'startup' "on most apps" - cool ;) why not? The question is: are there negative side effects?
Typically, the higher your optimization level, the more difficult it becomes to debug a program. The question is always whether an additional optimization is worth the amount of debuggability you lose for it.
Robert
--
Robert Schiele Tel.: +49-621-181-2214
Dipl.-Wirtsch.informatiker mailto:rschiele@uni-mannheim.de
Michael Lange wrote:
On Sunday, 13 November 2005 07:27, Damian Mihai Liviu wrote:
(...) After building some optimized RPMs for amarok and some kde-* packages, I talked with Pascal Bleser about optimization. Here are parts of our conversation log:
###
(22:03:58) liviudm: since you definitely have much more experience than me... how much of a waste of time is it really?
(22:04:00) Pascal Bleser: not much. it's usually not very productive.. you might gain 3% on most apps
(22:04:20) Pascal Bleser: on very few apps you might gain 10 to 20%, but it's the exception, not the rule
(22:04:32) Pascal Bleser: current gcc optimizations are enough for most things
[stuff deleted]
...
Hmm, 3% faster 'startup' "on most apps" - cool ;) why not? The question is: are there negative side effects?
Exactly. Why not? Because it's barely noticeable.
Because compilation time increases a lot (but that's a minor issue, given you don't make packages ;P), because it may affect the behaviour of the application, because the developers don't test their software with such insane optimizations.
And because it's not worth spending hours of building, testing, (proper) benchmarking and investigation on optimization flags when you barely win a few milliseconds on startup, if at all.
Let's take amarok as an example. You don't have to be an expert to see that the slowest things it does are:
- communication with the database (sqlite, mysql, postgresql): I/O and networking (or named pipes)
- reading mp3 files and streaming them into the sound driver: I/O
- converting compressed formats such as mp3 and ogg into wav: CPU
Actually, that format conversion is not even done by amarok itself, but by mad, libogg, arts, xine, gstreamer. So applying optimizations to the amarok build only affects the other 2 points.
Do you think that other LDFLAGS or CFLAGS will affect I/O or networking? Not at all, that's done in the kernel.
I didn't run benchmarks as, to me, it's a waste of time, but I would already guess that most of the time amarok is waiting on I/O. You want to spend hours, days of work to optimize maybe... 10% of the application, the part that doesn't do I/O and I/O waits?
Basically, that's my reasoning when I say it's a waste of time.
But hey, that's just my very personal opinion, and optimization + benchmarking (at least at that low level) is certainly not my domain of expertise, so I might be wrong here and there.
cheers
--
-o) Pascal Bleser http://linux01.gwdg.de/~pbleser/
/\\
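The reasoning about optimizing only a small share of the application is essentially Amdahl's law, which can be worked through with a couple of lines of Python. The input numbers are assumptions for illustration only: the 10% share is the guess made in the message above, and the 20% local gain is the upper end of the "10 to 20%" quoted from the chat log earlier in the thread. Nothing here was measured, and the function name overall_speedup is made up for this sketch.

```python
def overall_speedup(fraction_affected, local_speedup):
    """Amdahl's law: speedup of the whole run when only a fraction gets faster.

    fraction_affected: share of total run time that the optimization touches
    local_speedup:     how many times faster that share becomes
    """
    return 1.0 / ((1.0 - fraction_affected) + fraction_affected / local_speedup)


# Assumed values: 10% of the run time is CPU-bound code in the application
# itself, and the flags make that part 20% faster (a factor of 1.2).
speedup = overall_speedup(fraction_affected=0.10, local_speedup=1.20)
print(f"overall speedup: {speedup:.3f}x")  # 1.017x
```

Under these assumptions the whole program gets less than 2% faster, even in the optimistic case for the part that the flags can influence.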
On Sunday, 13 November 2005 13:47, Pascal Bleser wrote:
(...) because it may affect the behaviour of the application, because the developers don't test their software with such insane optimizations. And because it's not worth spending hours of building, testing, (proper) benchmarking and investigation on optimization flags when you barely win a few milliseconds on startup, if at all.
(...) Just out of interest...
Do you have an example of a program that runs without the LDFLAGS "-Wl,-O1" and doesn't run with it?
Kind regards
Michael Lange wrote:
On Sunday, 13 November 2005 13:47, Pascal Bleser wrote:
(...) because it may affect the behaviour of the application, because the developers don't test their software with such insane optimizations. And because it's not worth spending hours of building, testing, (proper) benchmarking and investigation on optimization flags when you barely win a few milliseconds on startup, if at all.
(...) Just out of interest...
Do you have an example of a program that runs without the LDFLAGS "-Wl,-O1" and doesn't run with it?
No, as I said, I don't spend time on that. But CFLAGS may affect the application and its runtime behaviour. I meant that for compiler optimization as a whole.
If -Wl,-O1 is an optional flag that must specifically be passed to the linker... and if it never, ever affects the binaries that result from it... why isn't it the default behaviour then?
cheers
--
Pascal Bleser
On Sunday, 13 November 2005 17:30, Pascal Bleser wrote:
No, as I said, I don't spend time on that. But CFLAGS may affect the application and its runtime behaviour. I meant that for compiler optimization as a whole.
If -Wl,-O1 is an optional flag that must specifically be passed to the linker... and if it never, ever affects the binaries that result from it... why isn't it the default behaviour then?
Do you compile a program with the same compiler version that the developers used? How many compilers do you have installed? ;) You use a different compiler? And what about the "runtime behaviour" then? ...
Kind regards
On Saturday, 12 November 2005 22:30, piotrek wrote:
Hi!
I think that it really MAY give you some performance improvement, but only if you recompile more packages with this option
You think it may? Hmm. Look, go to http://ktown.kde.org/akademy2005/unprocessed/ and find the talk which has "performance" in its name; the slides for it can be found at http://conference2005.kde.org/sched-devconf.php . I especially recommend the part with the picture of Odie, it should be somewhere in the first third of the talk. Until you realize why the sentence above is such funny reasoning, you're very likely just wasting (not only) your time. Working on performance is difficult, and just thinking that something may help doesn't count, not even coming from me.
... I read on that Gentoo forum that it gives quite a noticeable performance gain after running make World on a Gentoo box...
Oh please.
So maybe the SuSE guys have run some more extensive benchmarks and it turned out to be a pretty useful trick...
My guess is rather that those extensive benchmarks were more like "uhm, ok, looks sensible" (because it does look sensible after all) and it was applied. I'll talk to the person who did it, and unless I see some good proof that it improves the situation, it will most probably get reverted. See also http://people.redhat.com/drepper/dsohowto.pdf , the note at the top of page 7, for some more details.
--
Lubos Lunak
On Sunday, 13 November 2005 12:06, Lubos Lunak wrote:
I especially recommend the part with the picture of Odie.
Just to avoid a possible misunderstanding: in case the text in the picture is readable, it is completely irrelevant.
--
Lubos Lunak
participants (7)
- Damian Mihai Liviu
- Lubos Lunak
- Michael Lange
- nordi@addcom.de
- Pascal Bleser
- piotrek
- Robert Schiele