[TriLUG] "Light" monitoring

Matt Pusateri mpusateri at wickedtrails.com
Fri Mar 29 12:31:42 EDT 2013


On Mar 29, 2013, at 11:57 AM, John Vaughters <jvaughters04 at yahoo.com> wrote:

> +1 on Monit, simple and lightweight. However, for even more basic information on really skinny embedded systems, consider "uptime" for CPU load and "free" for memory usage. You can easily script that and send it to another computer for analysis.
> 
> To answer your question about multiple CPUs simply: it is 1.00 per processor, so a quad-CPU system is fully loaded at 4.00. Having said that, it really is not that simple, because it depends on how threads are distributed across processors. A single thread can saturate a single core, and that process may suffer even though the overall load looks low. I would recommend that you start logging the CPU load and see whether you can find a correlation between failures and the uptime output. BTW, uptime reports the same load averages as top.
> 
> Good Luck!
> 
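
The "1.00 per processor" rule and the suggestion to log the load can be sketched roughly as follows. This is only an illustration, assuming Python is available on the box; the function names load_per_cpu and sample_line are made up for this example, and the log format is arbitrary.

```python
import os
import time


def load_per_cpu(load, ncpus):
    """Normalize a load average by CPU count; about 1.0 means every CPU is busy."""
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    return load / ncpus


def sample_line(loads, ncpus, now=None):
    """Build one log line from the 1/5/15-minute load averages.

    The line format is an assumption for this sketch, not a standard.
    """
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    one, five, fifteen = loads
    return "%s load=%.2f,%.2f,%.2f per_cpu=%.2f" % (
        stamp, one, five, fifteen, load_per_cpu(one, ncpus))


if __name__ == "__main__":
    # os.getloadavg() returns the same three numbers that uptime and top show.
    ncpus = os.cpu_count() or 1
    print(sample_line(os.getloadavg(), ncpus))
```

Run from cron once a minute and appended to a file, this gives a history to line up against the times the system misbehaves. A per_cpu value that stays well above 1.00 suggests more runnable work than processors.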


The one thing I don't like about Munin is that it rolls up its historical data, so if you want to go back, say, 3 months and look closely at a spike in the graph, you can't, because the data has been rolled up. Cacti is much better in this respect, as you can drill down. Probably not an issue in this case, but something to think about in larger setups.


Also, have you thought about installing sysstat (the package that provides sar) and then looking at the sar output when the system is having issues? You might see memory, disk I/O, or wait-time figures in the data that point to what is going on.
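
As a rough sketch of pulling those reports from a script, the helper below wraps a few sar options: -u (CPU, including %iowait), -r (memory), -b (I/O and transfer rates), -q (run queue and load averages), and -f to read a saved daily data file. The names SAR_REPORTS, sar_command and run_sar are invented for this example, and where the daily files live depends on the distribution.

```python
import subprocess

# Short names mapped to sar report options.
SAR_REPORTS = {
    "cpu": "-u",     # CPU utilization, including %iowait
    "memory": "-r",  # memory utilization
    "io": "-b",      # I/O and transfer rate statistics
    "load": "-q",    # run queue length and load averages
}


def sar_command(report, data_file=None):
    """Return the sar argument list for one report type.

    data_file is an optional saved daily data file; its location is
    distribution specific, so the caller has to supply it.
    """
    cmd = ["sar", SAR_REPORTS[report]]
    if data_file is not None:
        cmd += ["-f", data_file]
    return cmd


def run_sar(report, data_file=None):
    """Run sar and return its text output; requires sysstat to be installed."""
    result = subprocess.run(sar_command(report, data_file),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, run_sar("cpu") prints today's CPU figures so far, and passing a data file for the day of an incident shows what the system looked like around that time.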


Finally, I'm pretty sure you can change how often Nagios polls. I've mostly used Nagios in the past, but it seems to have performance issues sometimes. My brother switched his setup to Icinga and it performed much better. I'm getting ready to put monitoring in at $work in a multi-datacenter setup, and I'm really leaning towards Shinken for its distributed nature.


Matt P.

