[TriLUG] mailman tuning

Fri Oct 19 16:09:16 EDT 2007

Christopher L Merrill wrote:
> "top" simply shows python consuming ~93% of the CPU.  Is there a
> better way to answer that question?

There are many ways to skin this cat.  top is a handy tool, but its
output is terribly inconsistent from distro to distro.  If top shows a
python process that runs continuously and consumes vast resources, note
its pid and see what that process actually is with "ps -fp [pid]".
Chances are it is one of Mailman's qrunners, and just figuring out which
of them is bottlenecked will go a long way towards knowing what needs to
be tuned.
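
If you would rather script that lookup, here is a minimal sketch that
pulls the runner name out of a line of "ps -fp [pid]" output.  It assumes
the qrunner was started with a "--runner=Name:slice:range" argument, as I
believe Mailman 2.1 does; the sample line, its path, and the helper name
runner_name are made up for illustration, so check them against what ps
prints on your box.

```python
import re

# Assumption: the qrunner command line carries "--runner=<Name>:<slice>:<range>".
# Verify against real "ps -fp <pid>" output on your system.
RUNNER_RE = re.compile(r"--runner=(\w+)")


def runner_name(ps_line):
    """Return the qrunner name found in a ps output line, or None if absent."""
    match = RUNNER_RE.search(ps_line)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical ps line; the install path will differ per distro.
    sample = ("mailman 4242 4200 93 16:01 ? 00:07:31 /usr/bin/python "
              "/usr/lib/mailman/bin/qrunner --runner=ArchRunner:0:1 -s")
    print(runner_name(sample))
```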

You can also poke around in ~mailman/qfiles/*/ and see if any of those
queues are backed up more than a few minutes.  With a little creativity
this is actually a handy way to tie into your system monitoring to make
sure Mailman is passing mail in a timely fashion.  Monitor the freshness
of your Mailman queues and your MTA queues, and when there is a problem
you can figure out where in the process the bottleneck exists simply by
observing which queue is stuck.  The Mailman qrunner threads take turns
running so if you have one archive qrunner that is stuck, all of the
other queues could get backed up as well.
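
As a rough sketch of that kind of freshness check, the following reports
the age of the oldest file in each queue directory and flags any queue
older than a threshold.  The function names and the five-minute default
are my own choices, not anything Mailman ships, and the directory you pass
in is whatever your qfiles directory turns out to be.

```python
import os
import time


def oldest_entry_age(qfiles_dir, now=None):
    """Map each queue subdirectory to the age (seconds) of its oldest file.

    Empty queues map to 0.0.
    """
    now = time.time() if now is None else now
    ages = {}
    for queue in sorted(os.listdir(qfiles_dir)):
        queue_path = os.path.join(qfiles_dir, queue)
        if not os.path.isdir(queue_path):
            continue
        mtimes = [
            os.path.getmtime(os.path.join(queue_path, name))
            for name in os.listdir(queue_path)
            if os.path.isfile(os.path.join(queue_path, name))
        ]
        ages[queue] = max(0.0, now - min(mtimes)) if mtimes else 0.0
    return ages


def stale_queues(ages, threshold_seconds=300):
    """Return the names of queues whose oldest file exceeds the threshold."""
    return sorted(q for q, age in ages.items() if age > threshold_seconds)


if __name__ == "__main__":
    # "~mailman" expands to the mailman user's home directory, if it exists.
    qfiles = os.path.expanduser("~mailman/qfiles")
    print(stale_queues(oldest_entry_age(qfiles)))
```

Any queue that is supposed to hold old files would need to be excluded
before you wire this into an alert.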

If you start passing many messages over a long period of time, chances
are you're going to run into serious performance issues with the archive
qrunner.  Bluntly, the archiver in Mailman sucks.  It has sucked for
years, and the Mailman developers have known of its sucktitude.  But
AFAIK nobody has stepped up to the plate to slay that beast yet.

If your outbound queue in Mailman is stuck, it may be an MTA tuning
issue (how fast it will accept incoming mail) or one of the outbound
tuning options in Mailman.  Using VERP will make mail take a whole lot
longer to go out, especially with 15K+ subscribers.  But without VERP,
the handoff to your local MTA should be measured in seconds, not hours
or days.

Mailman can also become I/O bound fairly easily, though this has
happened to me less often in recent years with the advent of cheap, fast
disks.  You can see what your disks are up to with iostat.

~mailman/logs/smtp will give you an idea of how long the smtp handoffs
are taking between Mailman and the MTA.
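
To pull the slow handoffs out of that log, something like the sketch
below may help.  It assumes the log lines end in the form "smtp to LIST
for N recips, completed in S seconds", which is what I believe the default
logging format produces; if your site has changed the format, adjust the
pattern.  The function name, sample line, and 60-second threshold are
illustrative only.

```python
import re

# Assumed line shape (check your own log before relying on this), e.g.:
#   Oct 19 16:09:16 2007 (1234) <id@example.com> smtp to mylist for 15000 recips, completed in 42.500 seconds
HANDOFF_RE = re.compile(
    r"smtp to (\S+) for (\d+) recips, completed in ([\d.]+) seconds"
)


def slow_handoffs(lines, threshold_seconds=60.0):
    """Return (listname, recipients, seconds) for handoffs over the threshold."""
    slow = []
    for line in lines:
        match = HANDOFF_RE.search(line)
        if not match:
            continue  # skip lines that are not handoff summaries
        listname = match.group(1)
        recips = int(match.group(2))
        seconds = float(match.group(3))
        if seconds > threshold_seconds:
            slow.append((listname, recips, seconds))
    return slow


if __name__ == "__main__":
    import os
    # "~mailman" expands to the mailman user's home directory, if it exists.
    with open(os.path.expanduser("~mailman/logs/smtp")) as log:
        for entry in slow_handoffs(log):
            print(entry)
```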

Still, even with VERP enabled, mail should be able to exit Mailman's
queue in a relatively hasty fashion.  Delivery then becomes your MTA's
problem.  This is almost certainly fixable, once a little detective work
is applied to determine where and how you're bottlenecking.