[TriLUG] Tuning WAN links

Shawn Hood shawnlhood at gmail.com
Thu Nov 1 13:25:16 EDT 2007


Thanks for the response, guys.

I'm mostly satisfied with the local network performance on both ends.  What
could be done to increase performance without modifying settings on each
box involved?  Could a box at each end be used to shape the traffic in a way
that would optimize this link, or do we really need to tune each box?

Shawn

On 11/1/07, Aaron S. Joyner <aaron at joyner.ws> wrote:
>
> Jeremy Portzer wrote:
> > Shawn Hood wrote:
> >
> >> I've run iperf between two RHEL4 boxes connected to the 3560s.  The
> >> most throughput I've been able to get is ~45Mbit by increasing the
> >> buffer sizes in /etc/sysctl and using massive window sizes on iperf.
> >> I was hoping you guys could point me in the right direction.  I need
> >> to do some reading about how to get the most out of this link, and
> >> any reference would be greatly appreciated.  Will this be a matter of
> >> creating a Linux router on each end to shape the traffic destined for
> >> this link?  Is this something better suited for proprietary
> >> technology that claims to 'auto-tune' these kinds of links?  I'm
> >> fairly fluent when it comes to talking about this stuff 'in theory,'
> >> but have yet to get any hands-on experience.
> >>
> >>
> >
> > I don't know too much about iperf*, but my general "scientific method"
> > approach to troubleshooting makes me wonder if you have a "control" in your
> > experiment.  What is the maximum throughput you can get between two Red
> > Hat boxes with the same type of interface (I assume single GigE card?),
> > just directly connected?  What about the capabilities of the switches or
> > other network equipment between the RH boxes and the routers?
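> >
> > For that directly-connected control run, an iperf pair along these
> > lines would give a baseline number (hostnames, the 4M window, and the
> > 30-second run time are only illustrative values, not a recommendation):
> >
> >     box1$ iperf -s -w 4M
> >     box2$ iperf -c box1 -w 4M -t 30 -i 5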
> >
> > Just wanted to make sure you're pointing your finger at the right
> > culprit.
> >
> > --Jeremy
> >
> > *meaning, I've never heard of it before reading your post!
> >
> To follow onto Jeremy's very good suggestion... Although I think you've
> already picked up on this from your explicit mention that it's a high
> latency link, that particular fact is a big deal.  To really simulate
> that problem, it's fun to set up 3 computers, two boxes with a cross-over
> cable between them.  Benchmark their throughput shoveling data around
> with your tool of choice (I like netcat and some interface counter
> scripts, iperf will probably work fine).  Then drop in an OpenBSD box in
> the middle of that cross-over cable, running pf, and inject some latency
> into the link.  The pf firewall has particularly good support for doing
> this kind of lab testing, although the same can be accomplished with
> iptables and tc, with some care.  It's been some years since I had the
> time to sit down and do this, but it can be really enlightening to see
> how the queuing degrades under latency, at different line rates.  What
> you're likely to notice is that with the default linux tcp settings,
> higher throughputs suffer more from higher latencies than lower
> throughputs do.  That is to say, your 3Mbit Cable or DSL line is
> reasonably okay with a 100ms latency and doesn't suffer much of a rate
> loss due to the latency.  Even a 100Mbit connection will see dramatic
> losses in throughput with the addition of even 5 to 10ms of latency.
> Gigabit ethernet doesn't even work at gigabit with NO latency and the
> traditional stock tcp settings.  :)  Turning up the buffer sizes allows
> you to get reasonable throughput, until you inject minor latency, then
> it all goes to hell again.  Getting reasonable throughput on high
> latency gigabit ethernet lines requires *extremely* large tcp buffer
> sizes, to allow enough packets to be in flight until ACKs for them are
> received.
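>
> A minimal sketch of the iptables/tc variant, assuming a Linux box
> routing between the two test machines with interfaces eth0 and eth1
> (the interface names and the 50ms figure are just illustrative):
>
>     # add 50ms of delay in each direction with netem
>     tc qdisc add dev eth0 root netem delay 50ms
>     tc qdisc add dev eth1 root netem delay 50ms
>     # remove it again when done
>     tc qdisc del dev eth0 root
>     tc qdisc del dev eth1 root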
>
> For those interested, a short digression into math.  On a 1Gbit link,
> you're sending 1,000,000,000 bits every second, or 125,000,000 bytes,
> aka 125MBytes.  This works out to sending 125KB every 1ms.  By default,
> the linux buffers (tcp_wmem, tcp_rmem) on most older linux systems for
> writing tcp data are 13KB; on newer systems like my gutsy gibbon laptop
> I'm writing from, it's around 4MB.  You can probably imagine a 13KB
> buffer is hard for most applications to refill reliably at the required
> 0.1ms intervals to sustain 125KB every 1ms.  Thus, by increasing this
> buffer to a larger size, your app can write data in larger chunks,
> increasing the likelihood that the buffer will have data in it to keep
> the flow going.  Also, and really more importantly, these values are
> used by the kernel to calculate the dynamic window size for the tcp
> connection.  In short, that roughly corresponds to the amount of data
> that's allowed to be in flight at any given moment, before an ack is
> received.  This is required because the TCP stack may be asked to
> retransmit any of those packets that are dropped on the way to the
> other end, so it has to keep them on hand for that eventuality.  So, if
> we have 5ms of latency, we need to keep 5*125KB of buffer space, aka
> 625KB.  If we have 50ms of latency, that's 6,250KB, or roughly 6MB.  If
> things get really crazy and we have 500ms of latency (don't laugh, this
> is a fact of life, or at least physics, in the satellite world), that's
> over 60MB of buffer space.  It gets worse when you consider packet loss
> on the link.  As the likelihood that you have to retransmit a packet
> goes up (i.e. the first packet is lost, then the retransmitted packet
> is also lost), things get exponentially ugly, so to speak.  :)
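>
> To put rough numbers on that, a sysctl sketch sized for the 50ms, 1Gbit
> case above (about 6MB of buffer; the exact figures are illustrative,
> not a recommendation) might look like:
>
>     # /etc/sysctl.conf -- illustrative values, ~8MB maximum buffers
>     net.core.rmem_max = 8388608
>     net.core.wmem_max = 8388608
>     # min / default / max, in bytes
>     net.ipv4.tcp_rmem = 4096 87380 8388608
>     net.ipv4.tcp_wmem = 4096 65536 8388608
>
> loaded with "sysctl -p".  Both ends of the connection need the larger
> buffers to see the benefit.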
>
> Anyway, it's late, I'm rambling, so it's time for bed.  Hopefully the
> rambling has been educational for someone.
>
> Aaron S. Joyner
> --
> TriLUG mailing list        : http://www.trilug.org/mailman/listinfo/trilug
> TriLUG Organizational FAQ  : http://trilug.org/faq/
> TriLUG Member Services FAQ : http://members.trilug.org/services_faq/
>



-- 
Shawn Hood
(910) 670-1819 Mobile
