As a completely unintentional side-effect of the knife put to our throats last week, where my involvement is in the security and mitigation area, I have more closely analysed some of the Kolab Now traffic patterns over the past months.
A service like ours shows a predictable pattern. We receive a mail once, and it gets downloaded and read multiple times. Or a customer makes an appointment in the web client, which then synchronizes to a desktop at work, one at home, and a mobile, and a couple of invitations get sent out. Assets get refreshed and downloaded. You catch my drift: you, too, would be surprised if we said we receive more data than we send. In other words, we have more TX than we have RX.
More detail on the particulars of these traffic patterns over time, however, helps us determine whether a given pattern is out of bounds or within the margin of variance at that particular time.
Having zoomed in on things passing through our networks once more, I discovered the interesting case where a limited number of users are responsible for the majority of the traffic generated.
As soon as I mitigated the out-of-bounds traffic pattern by forcing it back within bounds, we received feedback about greater “snappiness”.
This article is about that mitigation strategy, and where we’re going with it.
Firstly, we need to understand that no single connection to any service should be allowed to just consume whatever it can. Any increase in bandwidth for one connection will, implicitly, create drag on all other connections. This is a consequence of FIFO scheduling, and it happens at the packet level.
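You can see this FIFO behaviour on any Linux box: unless you configured otherwise, egress packets queue up behind one another on the default queueing discipline. A quick look (the interface name `eth0` is an assumption; substitute your own):

```shell
# Show the root queueing discipline on an interface. On many systems
# this is pfifo_fast: packets within a priority band leave strictly in
# arrival order, so one greedy flow filling the queue makes every
# other flow wait behind it.
tc qdisc show dev eth0
```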
Secondly, we need to understand there is no comprehending the depths to which this rabbit hole reaches. You cannot encapsulate all the options into some administrator web UI, though. You are going to need the command line. On that command line, things will appear weird and awkward at first. It is as if these commands were made for some higher class of people who know what they are doing, rather than just anyone.
Simply firewalling won’t cut it. iptables can drop traffic, but it can’t limit bandwidth by classification. Only protocols like TCP would be able to recover from the drops, at the cost of retransmission overhead and overall more significant delay (for the round trips involved).
Simple traffic shaping won’t cut it. You can limit bandwidth based on certain parameters, and classify and prioritize, but you can’t recognize the individual traffic pattern (with sufficient accuracy and autonomy) to act on the semantics of the individual connection.
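To illustrate what “simple traffic shaping” looks like, and why it falls short: a token bucket filter caps an interface as a whole, but it is blind to which connection is which. The interface name and numbers below are illustrative, not our production values:

```shell
# Cap all egress on eth0 at 50 Mbit/s with a token bucket filter.
# Effective as a blunt instrument, but it cannot tell an abusive
# sync loop apart from a legitimate IMAP fetch -- everyone shares
# the same bucket.
tc qdisc add dev eth0 root tbf rate 50mbit burst 64kb latency 50ms
```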
What you want is to limit actual bandwidth, rather than duct-taping together some form of effective bandwidth, based on the semantics of the connection that triggered your attention, and then never have to pay attention to it again. It is a tango between two pieces of the network stack, and it takes two to tango.
Here’s a two-step summary of what you do:
- You use iptables to recognize the traffic you choose to schedule, and mark it.
- You use iproute2 to apply whatever policy you choose to the traffic you have marked.
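A minimal sketch of the two steps, assuming the traffic of interest is IMAPS on `eth0` and picking mark value 10 and the rates purely for illustration:

```shell
# Step 1 (iptables): recognize the traffic and mark it. Matching on
# source port 993 is just an example classification; match on whatever
# pattern triggered your attention.
iptables -t mangle -A OUTPUT -p tcp --sport 993 -j MARK --set-mark 10

# Step 2 (iproute2): act on the mark. An HTB hierarchy sends marked
# flows into a class capped at 1 MB/s (8 Mbit/s), while everything
# else rides the roomy default class.
tc qdisc add dev eth0 root handle 1: htb default 20
tc class add dev eth0 parent 1: classid 1:10 htb rate 8mbit
tc class add dev eth0 parent 1: classid 1:20 htb rate 1000mbit
tc filter add dev eth0 parent 1: protocol ip handle 10 fw flowid 1:10
```

The `fw` filter is the handshake between the two: iptables writes the firewall mark on the packet, and tc classifies on it, so each side only has to do the job it is good at.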
This allows you a virtually unlimited set of policies to apply to who-knows-what traffic pattern:
Go over 10MB/s for 5 consecutive seconds? Limit bandwidth to 1MB/s. Optionally, insert a clause of “until you learn how to behave”; in reality, the opposite is the cherry on the cake, technically speaking.
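One way to approximate such a policy in the marking step is iptables’ hashlimit match, which tracks per-host rates in kernel. A hedged sketch, with the threshold, mark value, and hash name all made up for illustration; note that hashlimit enforces a sustained rate rather than the literal “5 consecutive seconds” clause, for which you would need something watching the counters over time:

```shell
# Mark only the packets of destinations that exceed a sustained
# 10 MB/s of egress. Flows under the threshold stay unmarked and
# therefore unthrottled; a tc fw filter matching mark 10 then steers
# the offenders into a slow class.
iptables -t mangle -A OUTPUT -p tcp \
  -m hashlimit --hashlimit-mode dstip \
  --hashlimit-above 10mb/s --hashlimit-name heavyhitters \
  -j MARK --set-mark 10
```

The pleasant property is the “opposite” clause from above: the moment the host’s rate falls back under the threshold, packets stop matching, the mark disappears, and full bandwidth returns on its own, with no administrator attention required.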
Using this level of the stack, already available to us, we hear things like “Yeah, stick it to them!” followed by a “Hey, I thought you might be under attack, but it’s actually just faster now!”.