[TriLUG] OT - gigabit switches

Aaron S. Joyner aaron at joyner.ws
Sat Sep 23 13:53:40 EDT 2006


Christopher L Merrill wrote:

> Wow!  What a great bunch of responses, especially those from Greg,
> Ryan and Aaron.  Most of it was over my head...but I'm hoping to absorb
> just enough to make an intelligent decision for our test lab.
>
> I've tried to identify which specs of a switch are actually important
> for our use-case.  To recap, we're moving towards having (at most)
> 20 computers in our test lab with GigE NICs (some with multiples).
> When we care about the performance, the scenario will be that most
> of those computers will be hammering one or more web servers, also on
> the same switch, with as much traffic as it/they can handle.  In some
> cases, each "load engine" will be aliasing multiple IP addresses on
> each NIC.  All of the machines will be on the same subnet.  When we
> run tests, we would like the network to be invisible...meaning that
> it is never the bottleneck.
>
> So I've seen a few specs mentioned in switch literature and mentioned
> in the discussions -- I am trying to assess how those relate to our
> situation.
> 1. MTU - larger is better to improve bandwidth efficiency

Jumbo frames are a cool technology, but you can't mix and match if your 
network contains any computers that can't handle jumbo frames (e.g. any 
10/100 clients).  Generally, you can get dramatically improved 
throughput by tuning the TCP connection parameters, without the need 
for jumbo frames.  Check out /proc/sys/net/ipv4/tcp_*mem
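
For example, on Linux those /proc files hold the receive and send 
buffer limits as "min default max" triples, in bytes.  A rough sketch 
of the kind of change I mean -- the numbers are examples only, so tune 
them to your own RAM and links, and do it on both ends of the 
connection:

  # current settings:
  cat /proc/sys/net/ipv4/tcp_rmem /proc/sys/net/ipv4/tcp_wmem
  # raise the maximum buffer sizes (as root):
  echo "4096 87380 16777216" > /proc/sys/net/ipv4/tcp_rmem
  echo "4096 65536 16777216" > /proc/sys/net/ipv4/tcp_wmem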

> 2. # of MAC addresses - since we have a small number of computers
> on a small network, I would guess this is unimportant to us.

You're not likely to be affected by this unless you have more than 8,000 
computers on the same Ethernet segment.  If you do, fix that problem.  :)

> 3. Switching Capacity - pretty important to us, I would think, but
> also seems to be the same for all models within a given line from a
> given manufacturer - is the published number meaningful?

Generally, the published number is not meaningful.  We could get into a 
lot of specifics about the internals and when they do matter, but 
unless you're going to do a *lot* of empirical testing, take apart the 
switch and examine its internals, or work closely with the 
manufacturer to get design and implementation details -- what chips 
are used and how they're wired to which ports -- you're not likely to 
come to any meaningful conclusions.  Also, unless you're stressing the 
daylights out of the switch, it's not likely to matter to you much.

The potential exception to this that I see in your use case is that at 
larger port densities, the switch design gets less efficient 
internally.  The designs are often implemented with single chips that 
can do non-blocking gig-e across 8 to 16 ports at a time.  If you have 
a 24 port switch, for example, you might have two gig-e chipsets that 
each talk to 12 ports.  The connection between any two ports on the 
same chipset will be 1 gig non-blocking, and ideally the 
interconnection between the two chipsets will be fully non-blocking as 
well.  In practice that's often not the case, since it requires a 
12 gig channel between them.  In the better switch gear it generally 
is, but in a lot of cases you either don't have full non-blocking 
throughput between the two chipsets, or even if you do, the switch may 
not be able to push the required traffic out the required port fast 
enough, so it ends up dropping packets internally, running into 
congestion problems, etc. -- and that often isn't handled gracefully 
on the inter-chipset links.

So, consider a scenario where you have 1 server and 30 clients 
flooding that server with data.  It's not quite your scenario, as 
you're more likely to be pushing orders of magnitude more data *to* 
the clients than from them, and this doesn't apply in that direction, 
but bear with me.  :)  If those 30 clients are all flooding 1 gigabit 
of traffic into the switch, and 15 of them are on the same chipset as 
the server and 15 are not, you may begin to find that the 15 computers 
on the second chipset exhibit subtly, or maybe markedly, different 
behavior from those on the first.  They may see higher packet loss, 
and thus higher connection failure rates, lower throughput, etc.
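
If you want to check empirically whether a given switch behaves this 
way, even something as simple as iperf will show it -- the hostname 
below is just a placeholder for your web server:

  # on the server:
  iperf -s
  # on each client, alone and then with the others running in parallel:
  iperf -c server -t 30

Compare the numbers you get from clients plugged into the first dozen 
ports against the numbers from the second dozen while everything is 
running at once.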

> 4. Forwarding Rate - I have no idea what this is...important?

This is the same as Switching Capacity, essentially.
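
Forwarding rate is usually quoted in packets per second, where 
switching capacity is quoted in bits per second, but they're two views 
of the same limit.  For a sense of scale, a single gig-e port at wire 
speed with minimum-size frames works out to roughly 1.49 million 
packets per second:

  # 64-byte frame + 8-byte preamble + 12-byte inter-frame gap = 84 bytes
  echo $(( 1000000000 / (84 * 8) ))   # => 1488095 packets/sec per port

So a 24 port gig-e switch claiming a forwarding rate of around 35.7 
million packets per second is really just restating "all 24 ports at 
wire speed".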


> One other point that I wanted to verify is that one of the jobs of
> the switch is to keep traffic away from parts of the network that
> are not involved with the sender or receiver.  For example - the
> switch in our test lab is hooked to the switch for the rest of the
> office, to which the rest of our desktops are connected.  So when
> we are running tests in the lab, none of that traffic bleeds into
> the rest of the network affecting performance there.  My
> understanding (and anecdotal evidence) is that this is true...is it?

So, how does a switch work?  :)  A switch learns what MAC addresses 
are connected to what port by looking at the source addresses of 
incoming traffic, and associating those addresses with the port they 
arrived on.  Then, when another packet is received, it can look in the 
table of previously-learned entries and send the traffic only to the 
port where it has learned that MAC address lives.  If it gets a packet 
addressed to a MAC it doesn't know about, or to the Ethernet broadcast 
address, it forwards that packet to all ports except the one it was 
received on.  This is great for established, simple flows.  It does 
not totally isolate one segment of the network from another, though.  
Broadcast packets, such as ARP requests, DHCP lease requests, and some 
service discovery protocols (NetBIOS/NetBEUI name and browser 
services, mDNS, etc.), are addressed to the Ethernet broadcast address 
(FF:FF:FF:FF:FF:FF) or a link-local multicast address, and those 
packets will be delivered to every end point on the network.  These 
protocols do not cross broadcast domains (god, I really sound like a 
network guy these days), whose borders are the edges of the routed 
subnet -- to cross those you need a router.
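
Incidentally, if you ever want to watch that learning happen, a Linux 
box running bridge-utils does exactly the same thing in software.  A 
rough sketch -- the bridge and interface names are just examples:

  # turn a box with two NICs into a two-port learning switch:
  brctl addbr br0
  brctl addif br0 eth0
  brctl addif br0 eth1
  ifconfig br0 up
  # dump the table of learned MAC addresses and the ports they live on:
  brctl showmacs br0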

So, the single take-away recommendation I would make is that you need 
to break your test network out into a separate subnet.  There are 
numerous ways to do this, but I would suggest getting a layer 3 
switch, setting up two VLANs, assigning the appropriate ports to the 
test lab VLAN and the rest to the "office" VLAN, and having the switch 
route freely between those two subnets.  This will still allow you to 
easily talk to machines in the test lab, but provide some insulation 
against traffic inadvertently bleeding from one network to the other.  
If you wanted to be even more cautious about it, you could implement 
ACLs on the switch to, say, only allow TCP port 22 or port 80 traffic 
between the two subnets.  You could also implement this on the cheap 
with an old Linux box with two network cards and a simple layer 2 
gig-e switch for the lab (a rough sketch of that is below).  This is 
probably the simpler solution, and it better leverages your existing 
knowledge, but I think you'll find that once you start playing with a 
nicer switch, you'll come up with more interesting ways to segment the 
lab and get more use out of it in the long run.  Then again, perhaps 
not.  :)  That's why you get to make the decision, not me.
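
For what it's worth, the Linux-box-as-router version might look 
something like this -- the interface names and the choice of ports are 
only examples, so adjust the rules for your own situation:

  # let the kernel route between the two NICs:
  echo 1 > /proc/sys/net/ipv4/ip_forward
  # only allow ssh and http from the office side (eth0) into the lab
  # (eth1), and let the replies back out; drop everything else:
  iptables -A FORWARD -i eth0 -o eth1 -p tcp --dport 22 -j ACCEPT
  iptables -A FORWARD -i eth0 -o eth1 -p tcp --dport 80 -j ACCEPT
  iptables -A FORWARD -i eth1 -o eth0 \
      -m state --state ESTABLISHED,RELATED -j ACCEPT
  iptables -P FORWARD DROP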

Aaron S. Joyner



