[opensuse-wiki] Performance Options
Hello Everyone,

I was looking at some site stats, and right now Google is reporting that the wiki pages are taking an average of between 2.5 and 3.5 seconds to load. This is slower than 50%+ of all sites. I am hoping to cut this down to under 2 seconds, which would put us in the fastest 25% or better. Here are a couple of options to do that:

Localization Caching - There is no conceivable reason not to do this, and it should provide a small speed increase.

Enable Gzip - This would help out a little, and there shouldn't be much risk to it.

File Caching - This would provide a major boost in speed. This alone could take nearly a second off page loading time. However, this does have a drawback. Taken from the manual page: "The file cache tends to cache aggressively; there is no set expiry date for the cached pages and pages are cached unconditionally even if they contain variables, extensions and other changeable output. Some extensions disable file cache for pages with dynamic content." How badly would this affect us? It would be easy enough to cron up a refresh of this cache if that would help.

Combine external files - Perhaps some of the skin designers might want to look at this. Right now, each page requires up to 13 CSS files and 9 Javascript files, a good portion of which are on static.opensuse.org. Combining some of these would make a noticeable difference.

Caching Proxy (Squid) - An alternative to file caching and a viable long term option. Would require some big changes and probably not justifiable right now...

I can work on the first two, but I would like to hear some input on the last three. In particular, I want to see if the file caching option is viable, since it has the largest potential right now.
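For reference, a minimal LocalSettings.php sketch of the first three options (gzip, localization caching, file caching), assuming a stock MediaWiki 1.16 setup; the cache directory paths are placeholders, not the wiki's actual configuration:

  <?php
  # Gzip output for clients that ask for it (only useful if the web server
  # is not already compressing responses itself).
  $wgUseGzip = true;

  # Localization cache (new in 1.16): setting a cache directory lets the
  # localization data be stored as local files instead of being rebuilt
  # from the database.
  $wgCacheDirectory = '/var/cache/mediawiki';          # placeholder path

  # File cache: write rendered pages for anonymous visitors to disk.
  # Subject to the caveats from the manual quoted above.
  $wgUseFileCache       = true;
  $wgFileCacheDirectory = '/var/cache/mediawiki/html'; # placeholder path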
On Tue, Aug 17, 2010 at 18:06, Matthew Ehle wrote:
File Caching - This would provide a major boost in speed. This alone could take nearly a second off page loading time. However, this does have a drawback. Taken from the manual page: "The file cache tends to cache aggressively; there is no set expiry date for the cached pages and pages are cached unconditionally even if they contain variables, extensions and other changeable output. Some extensions disable file cache for pages with dynamic content." How badly would this affect us? It would be easy enough to cron up a refresh of this cache if that would help.
Combine external files - Perhaps some of the skin designers might want to look at this. Right now, each page requires up to 13 CSS files and 9 Javascript files, a good portion of which are on static.opensuse.org. Combining some of these would make a noticeable difference.
Caching Proxy (Squid) - An alternative to file caching and a viable long term option. Would require some big changes and probably not justifiable right now...
I can work on the first two, but I would like to hear some input on the last three. In particular, I want to see if the file caching option is viable, since it has the largest potential right now.
APC Caching can show a major improvement in MediaWikis... at least 4x. If APC is not turned on, it probably should be.

C.
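If it helps, a quick way to check from PHP whether APC is actually loaded and enabled - a small throwaway diagnostic sketch, nothing openSUSE-specific:

  <?php
  // Diagnostic sketch: is the APC extension present and switched on?
  // Run once on a web server and delete afterwards.
  if ( extension_loaded( 'apc' ) ) {
      echo "APC loaded, apc.enabled = ", ini_get( 'apc.enabled' ), "\n";
      // Summary info about the cache (limited mode, no per-entry list).
      print_r( apc_cache_info( '', true ) );
  } else {
      echo "APC extension not loaded\n";
  }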
APC Caching can show a major improvement in MediaWikis... at least 4x. If APC is not turned on, it probably should be.
APC opcode caching was added for all the openSUSE blogs and wikis back in November, and it has been a major improvement. I shudder to think of how the new wiki would run without it.

However, this does bring up another point. Right now, we cache variables with memcached. We could switch variable caching to APC, which is arguably faster. This would most likely be a very minor improvement, but it is a possibility.

Minor correction from my last email: localization caching is a 1.16 feature. The 1.16 release is slightly slower than 1.15 by default, and enabling the localization cache will put it back on the same level.
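For context, the choice being discussed is essentially one setting in LocalSettings.php; a sketch with placeholder values:

  <?php
  # Current approach: object/variable cache in memcached.
  $wgMainCacheType    = CACHE_MEMCACHED;
  $wgMemCachedServers = array( '127.0.0.1:11211' );   # placeholder address

  # Possible alternative: use the local PHP accelerator (APC) for the
  # object cache as well - no network round trip, but the cache is then
  # private to each web server.
  # $wgMainCacheType = CACHE_ACCEL;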
On 2010-08-17 11:14:03 -0600, Matthew Ehle wrote:
APC Caching can show a major improvement in MediaWikis... at least 4x. If APC is not turned on, it probably should be.
APC opcode caching was added for all the openSUSE blogs and wikis back in November, and it has been a major improvement. I shudder to think of how the new wiki would run without it.
However, this does bring up another point. Right now, we cache variables with memcached. We could switch variable caching to APC, which is arguably faster. This would most likely be a very minor improvement, but it is a possibility.
I highly doubt that apc cache is much faster than memcached. In a proper config you got one memcached per webserver, and configure all memcacheds in the wiki.

a) you got shared storage, no matter on which webserver you get the request. I would assume that the APC is local to the machine.

b) you can scale to much more cache with memcached than with APC.

c) memcached is fast enough for wikipedia and even facebook (although they use it with udp in the mean time)

darix

--
openSUSE - SUSE Linux is my linux
openSUSE is good for you
www.opensuse.org
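The pooled setup darix describes would be a one-line difference on the MediaWiki side; a sketch with hypothetical addresses:

  <?php
  # Every wiki instance talks to all memcached daemons, so each key lives
  # in exactly one daemon and nothing is cached twice across web servers.
  $wgMainCacheType    = CACHE_MEMCACHED;
  $wgMemCachedServers = array(
      '192.0.2.10:11211',   # web server 1 (hypothetical address)
      '192.0.2.11:11211',   # web server 2 (hypothetical address)
  );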
Marcus Rueckert <darix@opensu.se> 8/17/2010 12:03 PM >>>
On 2010-08-17 11:14:03 -0600, Matthew Ehle wrote:
APC Caching can show a major improvement in MediaWikis... at least 4x. If APC is not turned on, it probably should be.
APC opcode caching was added for all the openSUSE blogs and wikis back in November, and it has been a major improvement. I shudder to think of how the new wiki would run without it.
However, this does bring up another point. Right now, we cache variables with memcached. We could switch variable caching to APC, which is arguably faster. This would most likely be a very minor improvement, but it is a possibility.
I highly doubt that apc cache is much faster than memcached.
It mostly depends on how the wiki is used (amount of read vs. write), but the general consensus is that APC is slightly faster for most situations.
In a proper config you got one memcached per webserver, and configure all memcacheds in the wiki.
Each web server has its own memcached daemon, which is used by only that web server. We can always pool them, but it would make very little difference for our setup.
a) you got shared storage, no matter on which webserver you get the request. I would assume that the APC is local to the machine.

Yes, and that is why it is slightly faster. APC is accessed locally, while memcached is accessed through a network socket, even on localhost.
b) you can scale to much more cache with memcached than with APC.

Yes, but we don't have enough data for that to be relevant quite yet. I run memcached to store up to 1GB of data, but I don't think we even come close to hitting that.
c) memcached is fast enough for wikipedia and even facebook (although they use it with udp in the mean time)
Agreed. I think that it is a perfectly acceptable solution. I just wanted to bring up the APC alternative as a possibility. I might have to look into running memcached with UDP, as that might cut some of the network overhead.
On 2010-08-17 12:37:01 -0600, Matthew Ehle wrote:
I highly doubt that apc cache is much faster than memcached.
It mostly depends on how the wiki is used (amount of read vs. write), but the general consensus is that APC is slightly faster for most situations.
As the former co-maintainer of the wiki, I know very well that the cache access was the fastest part of it. :)
In a proper config you got one memcached per webserver, and configure all memcacheds in the wiki.
Each web server has its own memcached daemon, which is used by only that web server. We can always pool them, but it would make very little difference for our setup.
Well, except that you cache things multiple times and might hit stale caches that way. I would recommend configuring it to use all running memcacheds.
a) you got shared storage, no matter on which webserver you get the request. I would assume that the APC is local to the machine.

Yes, and that is why it is slightly faster. APC is accessed locally, while memcached is accessed through a network socket, even on localhost.
1. Access via localhost is short-circuited in the kernel for TCP, so if you only use it via localhost the overhead is pretty low.
b) you can scale to much more cache with memcached than with APC.

Yes, but we don't have enough data for that to be relevant quite yet. I run memcached to store up to 1GB of data, but I don't think we even come close to hitting that.
c) memcached is fast enough for wikipedia and even facebook (although they use it with udp in the mean time)
Agreed. I think that it is a perfectly acceptable solution. I just wanted to bring up the APC alternative as a possibility. I might have to look into running memcached with UDP, as that might cut some of the network overhead.
if you want to keep the localhost only solution, the recent memcached versions support listening on a unix domain socket (see -s param).

anyway ... I think we are far away from the workload that warrants the switch to udp.

darix

--
openSUSE - SUSE Linux is my linux
openSUSE is good for you
www.opensuse.org
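A sketch of what the unix-domain-socket variant could look like; the socket path is hypothetical, and whether MediaWiki's bundled memcached client accepts a unix:// address in this form is an assumption that would need testing:

  <?php
  # memcached itself would be started with the -s parameter, roughly:
  #   memcached -d -m 1024 -s /var/run/memcached/memcached.sock
  # MediaWiki side (untested assumption about the address format):
  $wgMainCacheType    = CACHE_MEMCACHED;
  $wgMemCachedServers = array( 'unix:///var/run/memcached/memcached.sock:0' );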
Marcus Rueckert <darix@opensu.se> 8/17/2010 12:44 PM >>>
On 2010-08-17 12:37:01 -0600, Matthew Ehle wrote:
I highly doubt that apc cache is much faster than memcached.
It mostly depends on how the wiki is used (amount of read vs. write), but the general consensus is that APC is slightly faster for most situations.
As the former co-maintainer of the wiki, I know very well that the cache access was the fastest part of it. :)
In a proper config you got one memcached per webserver, and configure all memcacheds in the wiki.
Each web server has its own memcached daemon, which is used by only that web server. We can always pool them, but it would make very little difference for our setup.
Well, except that you cache things multiple times and might hit stale caches that way. I would recommend configuring it to use all running memcacheds.
It's true that it might not be the most efficient use of memory, but there are tons to spare on these servers. As for stale caches, this hasn't really been an issue unless there has been an outage or schema change in the database, and that would be an issue no matter how we have memcached set up.
a) you got shared storage, no matter on which webserver you get the request. I would assume that the APC is local to the machine.

Yes, and that is why it is slightly faster. APC is accessed locally, while memcached is accessed through a network socket, even on localhost.
1. Access via localhost is short-circuited in the kernel for TCP, so if you only use it via localhost the overhead is pretty low.
Yep, exactly why I have been running memcached in the way that I have. It cuts out a lot of latency. My understanding is that even localhost binds still carry some latency.
if you want to keep the localhost only solution, the recent memcached versions support listening on a unix domain socket (see -s param).
Yes! Thank you, that pretty much takes care of any possible advantage that APC would have. The nice thing here is that if we need to scale up, it would be a very easy change to run memcached back in network mode.
anyway ... I think we are far away from the workload that warrants the switch to udp.
In all honesty, we are only talking about differences measured in milliseconds anyway. The time used by a TCP handshake is pretty small compared to the caching and compression options. However, why let a perfectly good LoadRunner license go to waste? :)
Hello, on Tuesday, 17 August 2010, Matthew Ehle wrote:
I was looking at some site stats, and right now Google is reporting that the wiki pages are taking an average of between 2.5 and 3.5 seconds to load.
Hmm, Firebug (on Firefox 3.6) tells me something around 2.0 seconds (main page, full reload) and 1.1 seconds for a text-only content page with filled cache. Still, that can and should be optimized.

Caching issues I noticed when loading a page with filled cache (aka "click a link in the wiki"): The following files

- http://en.opensuse.org/skins/common/shared.css?207
- http://en.opensuse.org/skins/common/commonPrint.css?207
- http://en.opensuse.org/skins/common/wikibits.js?207
- http://en.opensuse.org/skins/common/ajax.js?207
- http://en.opensuse.org/extensions/FlaggedRevs/flaggedrevs.css?56
- http://en.opensuse.org/extensions/FlaggedRevs/flaggedrevs.js?56

are requested at every page load, even if they don't change. The server always answers with "304 Not Modified".

The problem seems to be the ?207, which probably has the intent of allowing a long client-side cache lifetime while still being able to deliver a changed file quickly. The HTTP headers look similar with and without the ?207 - maybe Firefox does a revalidation on every file with ?foo... At least Firefox does _not_ do the revalidation if I remove the ?207 ;-)

Proposed solution: remove the ?207 and modify the filename if needed (shared.207.css instead of shared.css?207). This could probably even be done with mod_rewrite if you don't want to keep shared.css on the webserver.

Saving: 4 requests, 0.25 seconds

Another problem is the Google Analytics pixel - it adds at least 0.1 seconds till the onLoad event triggers. Just for fun, I added "127.0.0.1 www.google-analytics.com" to /etc/hosts. Besides better privacy, this resulted in savings between 0.1 and 0.5 (!) seconds. Yes, statistics are nice - but are they worth a longer page load time?

Maybe the google JS can be modified to run at onLoad or so - this would mean it doesn't hurt load time. Googling for "js defer google analytics" brought up this:
http://phasetwo.org/post/defer-google-analytics-until-after-page-load.html
Looks promising ;-) but I didn't test it.

Besides that: I'd prefer a statistics tool hosted at openSUSE or Novell (maybe Piwik?) over Google.
File Caching - This would provide a major boost in speed. [...]
That's worth a try for sure ;-)

Possible disadvantage besides what you mentioned: maybe (I did not test it) the "This page was viewed n times" counter might not get updated.

Regarding the aggressive caching: I'd say let's just try it and check what breaks ;-) The cron job to clean up the file cache could move the files instead of deleting them, and make a diff at the next run. This should help to identify issues with over-cached pages.

In the beginning, I'd let the cronjob delete files older than an hour. That should reduce the impact if something is accidentally cached.
Combine external files - Perhaps some of the skin designers might want to look at this. Right now, each page requires up to 13 CSS files and 9 Javascript files, a good portion of which are on static.opensuse.org. Combining some of these would make a noticeable difference.
Yes, very good point. We could even save some more requests by using CSS sprites (combine multiple images/icons etc. into one file and use background-position to display the right one).
Caching Proxy (Squid) - An alternative to file caching and a viable long term option. Would require some big changes and probably not justifiable right now...
This would basically come with the same problems as the mediawiki file cache - and I doubt squid is much faster than the file cache.

Regards,

Christian Boltz
--
Why does root need a browser? For the Internet? *wuuhahaha* The day is really off to a good start... Such a highlight right in the morning, wonderful... [> Marcel Stein and Thomas Hertweck in suse-linux]
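The ?207 suffix presumably comes from MediaWiki's style-version mechanism ($wgStyleVersion); that is an assumption, not something verified here. A hypothetical illustration of the filename-based variant Christian proposes (shared.207.css instead of shared.css?207) - not existing MediaWiki code:

  <?php
  // Hypothetical helper: move the version number into the filename so the
  // browser sees a plain, long-cacheable URL without a query string.
  function versionedStylePath( $file, $styleVersion ) {
      // e.g. skins/common/shared.css -> skins/common/shared.207.css
      return preg_replace( '/\.(css|js)$/', '.' . $styleVersion . '.$1', $file );
  }

  echo versionedStylePath( 'skins/common/shared.css', 207 ), "\n";
  // A mod_rewrite rule (or symlinks) would then map shared.207.css back to
  // the real shared.css on disk, as suggested above.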
Hmm, Firebug (on Firefox 3.6) tells me something around 2.0 seconds (main page, full reload) and 1.1 seconds for a text-only content page with filled cache. Still, that can and should be optimized.
The Google stats average out all the visitors (at least those with Chrome), and probably some of them have slower internet connections. That would be my guess.
Caching issues I noticed when loading a page with filled cache (aka "click a link in the wiki"): The following files
- http://en.opensuse.org/skins/common/shared.css?207
- http://en.opensuse.org/skins/common/commonPrint.css?207
- http://en.opensuse.org/skins/common/wikibits.js?207
- http://en.opensuse.org/skins/common/ajax.js?207
- http://en.opensuse.org/extensions/FlaggedRevs/flaggedrevs.css?56
- http://en.opensuse.org/extensions/FlaggedRevs/flaggedrevs.js?56
are requested at every page load, even if they don't change. The server always answers with "304 Not Modified".
The problem seems to be the ?207, which probably has the intent of allowing a long client-side cache lifetime while still being able to deliver a changed file quickly.
The HTTP headers look similar with and without the ?207 - maybe Firefox does a revalidation on every file with ?foo... At least Firefox does _not_ do the revalidation if I remove the ?207 ;-)
Proposed solution: remove the ?207 and modify the filename if needed (shared.207.css instead of shared.css?207). This could probably even be done with mod_rewrite if you don't want to keep shared.css on the webserver.
Saving: 4 requests, 0.25 seconds
I think you are on the right track with that. I was wondering the same thing, but I was looking on the server side and couldn't see anything that was out of the ordinary. Cache directives are based on content type rather than file names/extensions, so there should be no reason Apache would be affected.
Another problem is the Google Analytics pixel - it adds at least 0.1 seconds till the onLoad event triggers.
Just for fun, I added "127.0.0.1 www.google-analytics.com" to /etc/hosts. Besides better privacy, this resulted in savings between 0.1 and 0.5 (!) seconds.
Yes, statistics are nice - but are they worth a longer page load time?
Maybe the google JS can be modified to run at onLoad or so - this would mean it doesn't hurt load time.
Googling for "js defer google analytics" brought up this: http://phasetwo.org/post/defer-google-analytics-until-after-page- load.html Looks promising ;-) but I didn't test it.
Awesome! Funny how Google doesn't complain about itself in its page suggestions. I have noticed requests to Google at the bottom of the browser when visiting the wiki, so there must be at least some appreciable amount of time for those requests.
File Caching - This would provide a major boost in speed. [...]
That's worth a try for sure ;-)
Possible disadvantage besides what you mentioned: maybe (I did not test it) the "This page was viewed n times" counter might not get updated.

That is very likely the case. I actually learned somewhere that disabling the counter entirely can shave off a decent amount of time, but who would want to get rid of something cool like that? ;)
Regarding the aggressive caching: I'd say let's just try it and check what breaks ;-) The cron job to clean up the file cache could move the files instead of deleting them, and make a diff at the next run. This should help to identify issues with over-cached pages.
In the beginning, I'd let the cronjob delete files older than an hour. That should reduce the impact if something is accidentally cached.

That is a possibility. I'll have to think about this.
Caching Proxy (Squid) - An alternative to file caching and a viable long term option. Would require some big changes and probably not justifiable right now...
This would basically come with the same problems as the mediawiki file cache - and I doubt squid is much faster than the file cache.
It probably isn't any faster, and I forgot that iChain also does at least some web acceleration. I think the reason that MediaWiki recommends it is that they provide some level of integration with Squid. I don't know the details of it though. We can probably bury the idea for now.
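For completeness, the two remaining knobs touched on in this exchange are also plain LocalSettings.php settings; a sketch, left commented out since neither change is actually being proposed here:

  <?php
  # Disable the "This page has been viewed n times" counter - saves a
  # database write per page view at the price of losing the statistic.
  # $wgDisableCounters = true;

  # MediaWiki's built-in Squid/reverse-proxy support, parked for now.
  # $wgUseSquid     = true;
  # $wgSquidServers = array( '127.0.0.1' );   # placeholder address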