Mailinglist Archive: opensuse (2459 mails)

Re: [opensuse] Re: Xen on production environments
  • From: "Ciro Iriarte" <cyruspy@xxxxxxxxx>
  • Date: Sun, 16 Mar 2008 12:51:09 -0400
  • Message-id: <a998a0140803160951wdeed36g3935e3dcce17842c@xxxxxxxxxxxxxx>
2008/3/13, Ian Marlier <ian.marlier@xxxxxxxxxxxxxxxxxxx>:
On 3/13/08 11:23 AM, "Jose" <jtc@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:

I've got a whole mess o' virtual machines (well north of 100), running on
(currently) 23 dual- and quad-core servers. openSUSE 10.1 is the OS for
most of them, with openSUSE 10.3 in the middle of rollout.

I think we'll go the SLES route; I don't want to reinstall everything
every X months just to keep patches available...


Performance has been pretty great, in my estimation. We've upped our useful
capacity by quite a bit, and managed to spread VMs around in such a way that
utilization of resources is pretty even.

That's done manually?


For most of our applications, I don't worry about HA, since almost
everything is redundant anyway. If hardware dies, I just rebuild the
virtual machines somewhere else that has some spare cycles (using an
autoyast setup -- takes about 20 minutes from the time of failure to having
a replacement VM up and running), and we're off and going.

For those machines that do need HA, I use drbd version 8 to replicate the
block devices on each machine, effectively a network RAID 1, and heartbeat
to monitor and failover. It's only been triggered once, but it worked
flawlessly in that case. Because of our setup there is downtime when the
failover happens, as the machine has to be started up fresh (as opposed to
migrated), but that downtime proved to be around 30 seconds for all
services.
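
A minimal sketch of how the replication state could be checked before trusting a failover, assuming the DRBD 8 status format in /proc/drbd, where each device line carries a cs: (connection state) and ds: (disk state) field. The function name drbd_healthy is hypothetical and not part of DRBD or heartbeat; the status file path is a parameter so the function can be tried against a saved copy.

```shell
# Hypothetical helper: succeed only if every DRBD device listed in the
# status file is connected and both sides report UpToDate data.
drbd_healthy() {
    status_file="${1:-/proc/drbd}"
    [ -r "$status_file" ] || return 1
    # Assumed device line format:
    #  0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
    devices=$(grep -c 'cs:' "$status_file" || true)
    healthy=$(grep -c 'cs:Connected .*ds:UpToDate/UpToDate' "$status_file" || true)
    # At least one device must exist, and all of them must be healthy.
    [ "$devices" -gt 0 ] && [ "$devices" -eq "$healthy" ]
}
```

Something like "drbd_healthy || echo 'DRBD degraded'" could run from cron, so that a broken mirror is noticed before the day heartbeat actually has to fail over.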

We are currently running heartbeat and drbd 0.7, but for HA I'm
thinking about a second site (apart from maintenance downtime). We'll
have access to the SAN, where we have mirrored storage...


As of openSUSE 10.3, the vm-install and virt-manager packages are included
as part of the OS, and provide for a very simple installation and management
mechanism.

I can't really speak to network storage. I briefly played with iSCSI as a
way to provide shared storage, but in the end decided that the overhead just
wasn't worth it -- if you assume that any machine could possibly die, having
a single storage server gives you a big ol' single-point-of-failure, and
that's bad. DRBD gets around that.

TCO (in terms of management time, etc) is similar to a bunch of individual
machines, though there is a bit of an upfront investment to learn how it all
works. And, as mentioned, I've been able to improve our utilization quite a
bit, which means less power needed for fewer servers, less heat (so less
cooling), a simpler physical network infrastructure, and less cost for
space...

The only big issue that I've run into has to do with the clock; there's a
bug in xen 3.1 that can cause kernel panics on a CPU that does frequency
scaling, because the hypervisor's internal clock goes nuts. The workaround
is to add this to /etc/init.d/boot.local:

echo "jiffies" >> /sys/devices/system/clocksource/clocksource0/current_clocksource
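
A slightly more defensive sketch of the same workaround: it only writes the new clocksource if the kernel lists it in available_clocksource, which sits next to current_clocksource in sysfs. The function name set_clocksource and the CLOCKSOURCE_DIR override are hypothetical, added here only so the logic can be tried against a scratch directory.

```shell
# Hypothetical helper: switch the kernel clocksource only if it is offered.
# CLOCKSOURCE_DIR may be overridden to point at a scratch directory for testing.
set_clocksource() {
    wanted="$1"
    dir="${CLOCKSOURCE_DIR:-/sys/devices/system/clocksource/clocksource0}"
    # available_clocksource holds a space-separated list of names.
    for cs in $(cat "$dir/available_clocksource"); do
        if [ "$cs" = "$wanted" ]; then
            echo "$wanted" > "$dir/current_clocksource"
            return 0
        fi
    done
    echo "clocksource '$wanted' is not available" >&2
    return 1
}
```

In /etc/init.d/boot.local this would be followed by a single call, "set_clocksource jiffies", in place of the bare echo.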

And I was wondering why the SLES guidelines stated that
disabling CPU frequency scaling was a good practice...

Related to that, I've seen some issues with Windows Server running on Xen,
related to the clock (though I have very little experience with this setup
overall -- only 2 windows machines running on Xen at all, only 1 of them Win
Server). Basically, I've come to the conclusion that with xen 3.1, one
ought not to put the Domain Controller role on a virtual machine. (This may
well be different now that there are Windows drivers designed to work with
qemu hardware, and now that xen 3.2 is out.)

Interesting... Maybe we'll have to use a Windows app to monitor
Oracle's AS behavior (oracle reports, oracle forms and other nasty
things)..


HTH,


- Ian


Thanks for your comments.

Ciro
