Managing bandwidth needs and consumption is a discipline that has existed since the first 150 bps modems connected computer systems. Today, with multi-gigabit links an everyday reality, we are still facing the same conundrums – only the number of zeros has changed.
One thing to remember is that bandwidth and bandwidth management are a means to an end, not an end in themselves. We need adequate performance for our applications and users at minimum expenditure. To do this effectively, we must first and foremost understand what ‘adequate’ means for those users and applications. Only once this often-omitted step is completed can we employ any bandwidth management techniques.
Here are five quick tips on better bandwidth management:
1. More Granularity
Effective bandwidth management is all about having good information. The more important a link, the more granular our information needs to be. Five-minute polling may be fine for much of the infrastructure, but it is wholly inadequate for a critical link carrying latency-sensitive data. A 10-second spike at 100% utilization – enough to completely obliterate the SLAs of all critical applications – is reduced to a mere 3% bump (or even less) on a 5-minute polling cycle. Even a 30-second event is reduced to only 10%. Being able to poll critical links at high frequency – every 10 seconds or less – gives you the information you need to address transient issues (if you can call an issue lasting 30 seconds transient).
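The dilution effect described above is simple arithmetic: a short spike averaged over a long polling interval all but disappears. A minimal sketch (the function name and values are illustrative, not from any monitoring product):

```python
# How polling interval dilutes a short utilization spike:
# a spike averaged over one poll cycle shrinks in proportion
# to spike_seconds / poll_seconds.

def averaged_utilization(spike_seconds: float, poll_seconds: float,
                         spike_util: float = 1.0, base_util: float = 0.0) -> float:
    """Average utilization reported for one polling interval
    containing a spike of the given duration on an otherwise idle link."""
    spike_seconds = min(spike_seconds, poll_seconds)
    return (spike_seconds * spike_util +
            (poll_seconds - spike_seconds) * base_util) / poll_seconds

# A 10-second 100% spike on a 5-minute (300 s) cycle reads as ~3.3%
print(f"{averaged_utilization(10, 300):.1%}")   # 3.3%
# A 30-second spike reads as 10%
print(f"{averaged_utilization(30, 300):.1%}")   # 10.0%
# With 10-second polling, the same spike shows up at full strength
print(f"{averaged_utilization(10, 10):.0%}")    # 100%
```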
2. Latency Measurements
Whether you have sufficient bandwidth or not, what really matters to the majority of today’s applications are the latency and jitter characteristics of the link. With modern queuing technologies, even high bandwidth utilization may not have an adverse impact on the quality of the services provided by the link. Technologies such as Cisco IP SLA and Juniper RPM can continuously monitor these very important KPIs at very good resolution. Since you can use these technologies to run tests that look like the real applications (HTTP, DNS, specific ports – not just ICMP), the traffic they generate will be queued and treated as if it were the real thing, giving you a very accurate picture. Monitoring these KPIs continuously helps build good baselines and spot deviations immediately.
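To make the KPIs concrete, here is a hedged sketch of computing average latency and a simple jitter figure (mean absolute difference between consecutive round-trip times) from a series of probe results. The function name and sample values are illustrative; IP SLA and RPM compute these on the device itself:

```python
# Sketch: deriving latency and jitter KPIs from RTT probe samples,
# similar in spirit to what IP SLA / RPM operations report.

def latency_stats(rtts_ms: list[float]) -> dict[str, float]:
    """Average latency, jitter (mean absolute delta between consecutive
    RTTs -- a simple jitter estimate), and worst-case RTT."""
    avg = sum(rtts_ms) / len(rtts_ms)
    deltas = [abs(b - a) for a, b in zip(rtts_ms, rtts_ms[1:])]
    jitter = sum(deltas) / len(deltas) if deltas else 0.0
    return {"avg_ms": avg, "jitter_ms": jitter, "max_ms": max(rtts_ms)}

samples = [21.0, 23.5, 20.8, 35.2, 22.1]   # hypothetical probe results
print(latency_stats(samples))
```

Feeding a continuous stream of such results into your baseline lets the 35 ms outlier above stand out immediately, even though the average still looks healthy.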
3. Flow Analysis
NetFlow, Flexible NetFlow, jFlow, sFlow and all of their brethren have one thing in common – they help you understand what makes up the traffic behind your bandwidth utilization. Without the need for probes or sniffers, you can quickly and continuously monitor the makeup of the traffic traversing your most critical links. This lets you quickly spot bandwidth hogs and wasteful traffic, and gives you an understanding of the applications using your network. That understanding is critical in deciding how best to prioritize traffic, and furthermore in creating policies that eliminate wasteful traffic (torrents, inappropriate streaming media, etc.).
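The kind of flow analysis described above boils down to aggregating exported flow records into views such as "top talkers" and "top applications". A minimal sketch, assuming simplified records (real NetFlow/sFlow collectors expose far richer schemas; the addresses and byte counts here are invented):

```python
# Sketch: aggregating flow records into top-talker / top-port views.
from collections import Counter

flows = [  # (src, dst, dst_port, bytes) -- hypothetical flow records
    ("10.0.0.5", "10.0.1.9", 443,  1_200_000),
    ("10.0.0.5", "10.0.1.9", 443,    800_000),
    ("10.0.0.7", "10.0.2.3", 6881, 5_500_000),   # a classic BitTorrent port
    ("10.0.0.8", "10.0.1.9", 80,     300_000),
]

bytes_by_host = Counter()
bytes_by_port = Counter()
for src, dst, port, nbytes in flows:
    bytes_by_host[src] += nbytes
    bytes_by_port[port] += nbytes

print(bytes_by_host.most_common(2))  # top talkers by bytes sent
print(bytes_by_port.most_common(2))  # top applications by destination port
```

Even this toy aggregation immediately surfaces the torrent-style bandwidth hog, which is exactly the insight that drives prioritization and elimination policies.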
4. Classification and Prioritization
Once you have a firm understanding of the traffic on the network, the next logical step is to classify and prioritize it. This allows truly important traffic to fly unencumbered even under heavy bandwidth utilization, while the rest receives best-effort service. Monitoring the utilization of the queues is key both for verifying your configuration (which can often be complex) and for continuous optimization. Baselining the performance of each queue and alerting when anomalies occur – before the entire circuit is affected – lets you be proactive. Comparing the configuration of the QoS queues with the information derived from NetFlow ensures that your enterprise applications are always given the level of service they require.
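To illustrate why classified traffic keeps flowing under load, here is a toy model of strict-priority scheduling between two queues. Real routers use hardware queues and far more sophisticated schedulers (CBWFQ, LLQ, and the like); the DSCP-style field name and packets below are purely illustrative:

```python
# Sketch: strict-priority scheduling between a priority queue
# and a best-effort queue.
from collections import deque

priority, best_effort = deque(), deque()

def enqueue(packet: dict) -> None:
    # Classify on a DSCP-like mark; "EF" (Expedited Forwarding)
    # goes to the priority queue, everything else waits.
    (priority if packet.get("dscp") == "EF" else best_effort).append(packet)

def dequeue():
    """Always serve the priority queue first (strict priority)."""
    if priority:
        return priority.popleft()
    return best_effort.popleft() if best_effort else None

enqueue({"dscp": "BE", "app": "web"})
enqueue({"dscp": "EF", "app": "voice"})
print(dequeue()["app"])   # voice is served first despite arriving later
```

Monitoring per-queue depth and drops in a model like this – or on the real device – is what turns a complex QoS configuration from a leap of faith into something verifiable.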
5. Baselining and Trending
As I’ve already alluded to above, continuous monitoring is a signature component of the discipline of bandwidth management. Trending and baselining bring the automation and proactive prediction that most other techniques miss. Using highly granular baselines (15 minutes or under) over predictable business intervals (seasons, months, weeks) allows us to quickly spot deviations from the norm before they affect application performance and the quality of the user experience. Trending, on the other hand, lets us predict what may lie ahead, based on granular historical data. Generally, a meaningful prediction requires a significant amount of historical data – about six times the length of the interval we are trying to predict (e.g. for a one-month prediction, we need about six months of historical data). This data must be granular (preferably raw/non-aggregated) so that the statistical analysis can be as sophisticated as possible – accounting for peaks, valleys, and repeating patterns.
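The six-months-to-predict-one rule of thumb can be sketched with the simplest possible trend model: an ordinary least-squares line over historical samples. The utilization numbers are invented, and a production forecast would also model the seasonality the paragraph mentions; this only shows the mechanics:

```python
# Sketch: least-squares trend over historical utilization to project growth.

def linear_trend(samples: list[float]) -> tuple[float, float]:
    """Fit y = slope*x + intercept over x = 0..n-1 (ordinary least squares)."""
    n = len(samples)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(samples) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, samples))
    var = sum((x - x_mean) ** 2 for x in xs)
    slope = cov / var
    return slope, y_mean - slope * x_mean

# Six months of peak utilization (%) used to project month seven:
history = [42, 45, 47, 51, 55, 58]
slope, intercept = linear_trend(history)
print(round(slope * 6 + intercept, 1))   # projected next-month peak: 61.1
```

With raw, non-aggregated data you would fit far richer models, but even this straight line turns "the link feels busier" into a date by which it will be full.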
How are you handling your bandwidth management needs? Leave a comment and let's discuss.