[TriLUG] clustering or server mirroring

Thu Apr 21 13:05:01 EDT 2005

Great discussion and information!  I think I took a step off the deep
end for playtime... won't be doing those things anytime soon.  Thanks
folks for your input!

David McD

On 4/20/05, Aaron S. Joyner <aaron at joyner.ws> wrote:
> Mike Johnson wrote:
> 
> > Um, wow.  You have to do all that [DRBD] to fail-over Cyrus?  Ick.
> > This is why maildir is so nice.  Between IMAP/POP and SMTP, it's
> > actually why maildir was created.  Keep your spools on an NFS system
> > and you can have multiple IMAP servers with simply an IP level load
> > balancer and you're set.  One of the IMAP servers dies?  No big deal.
> > The same can be said/done with SMTP.  Both can easily scale to
> > multiple systems.  This relies on a reliable NFS system, but those
> > aren't too expensive.
> 
> Well, keep in mind that this buys you more than just failover of Cyrus.
> It also provides that "reliable NFS" system you describe, in that
> the data is all safely mirrored between two ultimately redundant
> computers (which may also provide their own redundancy against hardware
> failure).  The maildir (usually read: qmail) setup you describe above
> only works in a situation where you have 3 servers or more.  That's
> usually not a problem, but it just pushes the "redundant single data
> store" problem farther back in the mail system.  Something still has to
> provide a single, redundant copy of the data.  It could very well be
> DRBD serving up NFS from the 3rd (and now 4th) machine in your picture.
> :)  Although at that point, unless load is a concern at the qmail level,
> you might as well integrate those 4 into a simple pair.
> 
> > On DRBD, what happens if the gigabit link between the systems fails?
> > Does it scrag your filesystem?
> 
> Nope, though the file systems will most likely go out of sync, depending
> on the circumstances.  If you have an additional path to monitor fail
> over (a null modem serial cable between the boxes is highly recommended,
> as well as monitoring on the front-end Ethernet interfaces), then the
> secondary will realize that only the gig-e link has failed.  It will
> receive no further updates of the file system until you repair this
> link.  Once the link is repaired, there is a "fast" checksum for
> restoring sync between the two boxes, so that you don't have to copy
> over the entire block device to resynchronize them.
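
The idea of resyncing by checksum instead of copying the whole block device can be sketched roughly as below. This is only an illustration of the general technique, not DRBD's actual mechanism; the block size and the function names (block_checksums, resync) are hypothetical, and both images are assumed to be the same size.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny block size chosen for illustration only


def block_checksums(device, block_size=BLOCK_SIZE):
    """Return one SHA-1 digest per fixed-size block of a device image."""
    return [
        hashlib.sha1(device[i:i + block_size]).digest()
        for i in range(0, len(device), block_size)
    ]


def resync(primary, secondary, block_size=BLOCK_SIZE):
    """Copy only the blocks whose checksums differ.

    primary is a bytes image, secondary is a bytearray of the same size
    that is updated in place. Returns the indices of the copied blocks.
    """
    copied = []
    primary_sums = block_checksums(primary, block_size)
    secondary_sums = block_checksums(bytes(secondary), block_size)
    for idx, (p, s) in enumerate(zip(primary_sums, secondary_sums)):
        if p != s:
            start = idx * block_size
            secondary[start:start + block_size] = primary[start:start + block_size]
            copied.append(idx)
    return copied
```

With a 16-byte image in which only the second block differs, only that one block is copied; the rest is left alone.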
> 
> In the case that the gig-e link fails, and that's your *only* way of
> knowing that the other system is up, the secondary node would shoot the
> primary in the head, mount up its copy of the block device, and
> continue on with life.  Now without STONITH (i.e. a way to remotely
> power-off the other machine), you're possibly in for some trouble...
> you'd end up with a split-brain scenario, but that has only happened
> because you've got a seriously poorly configured HA setup.  :)
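
The secondary's decision described above can be summarized as a small sketch. This is an assumption-laden illustration, not the logic of any particular HA package; the Action names and the decide() function are hypothetical.

```python
from enum import Enum


class Action(Enum):
    STAY_SECONDARY = "peer is alive; wait for the replication link to be repaired"
    FENCE_AND_TAKE_OVER = "power off the peer (STONITH), then mount and serve"
    TAKE_OVER_UNFENCED = "take over without fencing; split-brain is possible"


def decide(replication_link_up, other_paths_see_peer, have_stonith):
    """Decide what the secondary does, from the secondary's point of view.

    replication_link_up:  True if the gig-e replication link is working.
    other_paths_see_peer: iterable of booleans, one per extra monitoring
                          path (e.g. serial cable, front-end Ethernet).
    have_stonith:         True if the peer can be powered off remotely.
    """
    if replication_link_up or any(other_paths_see_peer):
        # Some path still reaches the primary, so it is not dead.
        return Action.STAY_SECONDARY
    if have_stonith:
        # No path reaches the primary: fence it first, then take over.
        return Action.FENCE_AND_TAKE_OVER
    # No path and no fencing: both nodes may end up writing at once.
    return Action.TAKE_OVER_UNFENCED
```

For example, decide(False, [True, False], True) stays secondary because the serial path still sees the peer, while decide(False, [], False) is the poorly configured case that risks split-brain.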
> 
> So in short, it's not quite as bad as you've made it out, Mike.
> Although, I'll be glad to concede that a mail system in general is not
> the most convenient thing to make redundant with just two boxen.