Better Bandwidth Management
Join SevOne's Dave Hegenbarth, Systems Engineer for Global Strategic Alliances, as he discusses the topic of managing bandwidth. Within the video, Dave demonstrates SevOne's unique capabilities of visualizing key bandwidth data and how it can be efficiently used. Learn more about this process and how SevOne can be tailored to fit specific needs.
Good morning, everyone! Thank you for joining us today for our Demo with Dave on bandwidth management. Just some housekeeping before Dave starts his presentation: at the end of today's demo, you will be directed to take a survey on today's presentation, and we ask that all of you help us help you. Your honest feedback is appreciated to make these demos better, and we appreciate all of your time. With that, I'm going to turn it over to Dave.
Thank you, Alex. Good morning, good afternoon, wherever you may be in the world today and thank you for spending a short period of your Friday with Demo with Dave. I am Dave and we're going to talk about the ability to do bandwidth management with SevOne. How do you know what your most utilized, least utilized, etc. links may be? How do we display all that and how do we do that with great speed and simplicity? As always, if you haven't been here before, I kick off with just two or three short slides followed by an actual demo of the product and then we'll wrap up with questions. If you have questions please chat them in the Q.A. box, not the comments box or the messages box, whatever it might be called, I can't remember, but the Q.A. chat box and then at the end of the presentation I'll go ahead and try to answer as many of those as I can. If I can't get to all of them, if there are so many, then we will take those offline and make sure to get back to you individually.
With that, who are we and why are we different? Well, this is SevOne. We are performance and infrastructure monitoring, and what we did differently in the world is we came up with this SevOne Cluster Technology. So each of our appliances is the data collection tool, it is the database, it is the analysis engine, and it's the reporting engine. And every appliance that you might have will do all of those functions. If you have either a geographically dispersed network or a lot of devices to monitor, you may have multiple appliances; you may just have one. It just depends on the size of the network. In any case, they all do the same thing, and for those folks who are geographically distributed, we give you the ability to log into a single web page to get all of your performance stats. Whether you're doing servers or routers, it doesn't matter what you're monitoring: we give you the ability to see it in a single pane of glass, regardless of where it is in the world, and we do that very, very quickly in terms of reporting speed. We do it very quickly in terms of deployment as well.
In deploying these appliances, they do come completely ready to go; all those things I mentioned are already installed and ready for you to use. We want you to monitor the network and the infrastructure; we don't want you to have to build servers. We do that for you. What do we monitor? Well, we pull in almost any kind of IP performance stat you would like. We start off where most network performance monitoring tools do, grabbing SNMP stats from routers and switches, and firewalls, and servers. Then we've added so many more technologies to this same platform. We're adding virtual servers, connecting to VMware and understanding host and guest resource utilization. We're able to monitor voice over IP systems like Cisco. We're pulling in cloud-based app statistics using technologies like Cisco IP SLA to measure connect, response, DNS time, etc. We're able to pull DNS and HTTP response times to pages and show you those as well. We're trying to give you an entire end-to-end view of your IP infrastructure performance.
What do we do once we have all that? Well, the one thing we did was build an open API that allows you to integrate with other things in your environment. We are a performance monitoring tool; we are not config management, we are not a complete fault solution, we are squarely in performance management. We knew there were going to be other systems in the data center we would need to speak with, such as fault and event management, service desk, management portals, things like that, so the API gives you a programmatic way to control our product. Whether it's adding devices, deleting devices, creating and running reports, etc., that can all be done with the API, so that all of our reporting can be automated. You don't need to use the API; we have a great GUI that will produce nice reports, and we're going to look at that GUI in just a minute, but you can use the API if necessary.
Lastly, we built a phone app, or a mobile app if you will, for Android and iOS. You can get the status of the performance of your network on your phone or iPad or what have you, anywhere you go. The other thing I might add is that the interface for this is all HTML. You could actually use the browser on your phone or on your iPad to run reports and things as well.
Then just a quick slide: some of our customers. The reason that I put up this slide is not only to introduce you to some of those who have chosen SevOne but also to show you that there are a number of different verticals we serve, but we actually serve them in a horizontal manner. They all use the same product, and they're all producing a lot of reports similar to the ones you'll see me produce today. The key thing, one of the things that ties all of these guys together, is that their network is key to their business, or their IP infrastructure is key to their business, as it is for most businesses today. These guys all depend on the network and the routers and the switches and the servers being there. They need the performance monitoring to understand how things are going in their environment, which is really what I'm going to show you in my demo network today.
Now, I'm going to flip over and hopefully you're able to see my SevOne browser. I've pulled up Firefox, I've gone to my demo box, and I've logged in as a guy named WAN because we're going to take a look at my WAN performance. There are a couple of different ways to look at bandwidth management. One is in a real time view, so I can see here that I have 5 critical alerts affecting my network. What I've done is this particular user, WAN, is limited to my WAN routers so that he is looking at performance and bandwidth across the infrastructure. If I click on my alerts, one of our favorite demos here (if you've ever seen any of these before, you've probably seen me do this) is to look at my top alerts. Right here I see that my New York voice gateway Ethernet 1/0 has a lot of bits over 100 bits. I set the threshold to say: anything across that interface greater than 100 bits, which is pretty much anything, throw an alert for me.
What are some of the things I can do from that to look at that bandwidth? Well, if I click on the object here and run a quick object summary, I'm going to understand the bandwidth of that particular MPLS link headed from New York to Newark. I can see the availability is 100% and the interface is enabled. I have frame statistics, so frames in, frames out. I have a bunch of in statistics and a bunch of out statistics: in bytes, out bytes, in discards, out discards. We can see any queue drops that might be going on. All the way at the bottom is just a graph that shows my total bytes. One of the cool things that we can do with this is we can click on it and attach it to what we call an instant report.
In our instant report, wow, we see a couple of spikes here just recently. We see a dashed line that runs through here; this is our baseline. SevOne creates a baseline, or an understanding of normal, for every KPI (Key Performance Indicator) that we track. Whether that's in bytes, out bytes, temperature, or volts, it doesn't matter what it is you're monitoring; if you're monitoring it, we automatically create that baseline. What we can see in this graph is that the actual traffic is well above the baseline traffic. Now usually, when we run a bandwidth report like this, our next question is, well, who's doing that? Why do I have these spikes in this particular graph? One of the great features in SevOne is our ability to do what we call a quick chain. This takes the output of that first report and makes it the input of our next report. What we can see here is we are presented with a NetFlow graph that is built from the output of the graph up here.
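The idea of comparing live traffic against a baseline can be sketched in a few lines. This is only an illustration: the talk does not describe how SevOne computes its baselines, so the hour-of-day averaging and the names `build_baseline` and `deviation_from_baseline` are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def build_baseline(history):
    """Average historical samples per hour of day.

    history: iterable of (hour_of_day, value) pairs, e.g. (9, 1200.0).
    Returns a dict mapping hour_of_day -> mean value seen at that hour.
    (Hour-of-day bucketing is an assumption, not SevOne's documented method.)
    """
    buckets = defaultdict(list)
    for hour, value in history:
        buckets[hour].append(value)
    return {hour: mean(values) for hour, values in buckets.items()}


def deviation_from_baseline(baseline, hour, value):
    """Return observed/expected ratio, or None when no usable baseline exists."""
    expected = baseline.get(hour)
    if not expected:  # missing hour or a zero baseline
        return None
    return value / expected
```

A ratio well above 1 corresponds to the situation in the graph, where the actual traffic sits well above the dashed baseline.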
We see our two spikes here and we see our NetFlow graph here, and then we see it's Bob Parsons, that did a lot of that talking right there. We have the ability to go from there actually into a number of different views. This first graph was our top talkers, so we see that it's Bob that's using a lot of the traffic. We have the ability to drill down to things like top traffic, or conversations. Here we're going to see that Bob was talking to this particular server called Time Tracks. I don't know maybe he was entering his time whatever. If we want to know more about that conversation, we can drill even further and we can go into something like let's see, top flows with next hop and DSCP. This gives us all the goodness that NetFlow has, we'll see that we have our application IP it's Bob, he's talking to Time Tracks, his default gateway is listed here as 13.1, protocol TCP, application port is this apparent net test packet sequencer and he sent it here. So, Bob's doing some sort of packet transfer from himself to a server called Time Tracks, and that dwarfs most of the other traffic that happened to be on the link at that particular time.
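A top talkers view like the one described here boils down to summing bytes per source across flow records and ranking the totals. The sketch below assumes flow records are plain dictionaries with hypothetical `src` and `bytes` keys; it is not SevOne's NetFlow data model.

```python
from collections import Counter


def top_talkers(flows, n=5):
    """Rank sources by total bytes sent.

    flows: iterable of dicts with hypothetical keys "src" and "bytes".
    Returns a list of (src, total_bytes) tuples, largest first.
    """
    totals = Counter()
    for flow in flows:
        totals[flow["src"]] += flow["bytes"]
    return totals.most_common(n)
```

Grouping by a (source, destination) pair instead of the source alone would give the conversations view mentioned in the same walkthrough.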
What we see here is an alert, right, so we started from an alert. From that alert we went right into the traffic that made up that alert. From that traffic we were able to do two things. One, easily see that it was Bob Parsons who was creating the traffic, and two, build this sort of dynamic dashboard that maybe we want to keep. I can come up here and I can say top alert, traffic to NetFlow detail. I can save that, and if I wanted to share it with other people I could share it out. If I wanted someone to see this report every day I could say, you know what, firstname.lastname@example.org, you should see this traffic all the time. Maybe I'll give it a good description to go along with it. Then I click save and there I have my report. We'll look at some of the other reports.
One of the other things I can do is mark it as a favorite. When I mark it as a favorite, and I log out and come back to my home page, what we'll see is that I have my top alert to NetFlow detail report. These reports are unique to each person who logs in, so when WAN logs in he's going to be able to click and view that particular report, and we can see that really quickly. We can make it rerun. The other thing you'll notice is the speed of our reporting.
Another thing you notice about these is they're dynamic. That was the past 24 hours; maybe I want to see the past 7 days, and all of a sudden, just that quickly, I can get a graph of the past 7 days. We see the bandwidth associated with it, and we see that spike dominating the traffic in there. We can tell what type of traffic it is. On to some of our other favorites: as a WAN guy, I probably want to know how my world is. I built the WAN overview report here and I've included a lot of different graphs and parts and pieces to help me understand my bandwidth management.
What we can see here is I have some real time alerts. As we saw, we were looking at that New York alert just a minute ago, and it was some number of bits over 100. I have my map so I can see it as a visual. I can see the sites that are running well, and I can see sites that might have an issue. If I come over to my New York guy, I have the same sort of thing. You can see right here, here's my alert: I have 11 megabits, which is way greater than 100 bits, over a 30 minute period. I had a sustained burst and that object is alive. We come to Chicago and we can see very much the same thing. This time we're looking at round trip time.
One of the other things about bandwidth management is you can have a link whose bandwidth is fine. It's at 50% of what it's supposed to carry, but if it still takes a very long time to get to some point and back, then we want to know about that too. In this case what we're looking at is the round trip time from Chicago to Newark, which is our corporate headquarters. We see that it's now 27 milliseconds, which is greater than 25 milliseconds over a 15 minute period. If we want to examine that a little closer, we can click on our object summary.
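An alert of the form "greater than 25 milliseconds over a 15 minute period" can be sketched as a check over a time window. The talk does not say whether the product evaluates the average, the minimum, or every sample in the window, so this example assumes the average; the function name and sample format are also hypothetical.

```python
def sustained_breach(samples, threshold, window_seconds, now):
    """Return True when the mean of recent samples exceeds the threshold.

    samples: list of (timestamp_seconds, value) pairs.
    Only samples with now - window_seconds <= timestamp <= now are considered.
    Using the window average is an assumption made for this sketch.
    """
    recent = [value for ts, value in samples
              if now - window_seconds <= ts <= now]
    if not recent:
        return False  # no data in the window, so nothing to alert on
    return sum(recent) / len(recent) > threshold
```

With round trip samples of 26, 27, 28 and 27 ms taken five minutes apart, a 25 ms threshold over a 900 second window would fire, matching the Chicago to Newark case above.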
When I click object summary I can get the same sort of chart, only this is for our Cisco IP SLA test from Chicago to Newark. Availability: it's run 100% of the time. Average statistics: we're looking at delay, destination to source and source to destination. We can see things like the average bandwidth. We can see a MOS score, so a Mean Opinion Score, all out of this one IP SLA test. The Mean Opinion Score for voice goes: 5 is perfect, 0 is inaudible. If I had a voice over IP call going across this link, I would understand that the quality is very good; it's a 4 or above.
We have all kinds of different statistics provided, such as packets out of sequence, if there are any; we have all kinds of different statistics that Cisco provides us. The one we're looking at, remember, was that round trip time. That allows me to call out in a visual manner very quickly what's going on. I can also call out in real time how things are running. This is my real time graph of my top 5 interfaces across the network. That's also followed by a couple more graphs down this right hand side: my most discarding interfaces and my most errored interfaces. Maybe I want to put those guys like this because I want them in that way. I do have a couple of errored interfaces that have some minute errors we might want to investigate. Again, my top most utilized interfaces in table form. I've also associated my top talkers across all my WAN links, who's doing the most talking.
We can see here, I think these guys are a couple of network devices, then followed by Bob Parsons again, as we saw. Then some guys going out to Google Cache; probably that guy's providing content for web videos and that kind of stuff. I've also followed along with some of my slowest links. If I was going to go after my worst round trip times and who might be complaining, I'd have them stack ranked here. I have my slowest links, and again I'm using an IP SLA response time test from my sites. I can also look at that in round trip times, so this is ICMP, this is UDP I believe, and again the round trip time from Chicago to Newark is the slowest. New York to Chicago in that other link is a little slower as well. Then you can see going down.
Lastly, what we want to do is bring up the ability to actually do some planning. I do have one charted here, and we've actually said, "Hey, you know what, we'd like to see how this network is going to run three months out." We actually have the past 24 hours in this middle column, as it's happened in the last 24 hours. We have a column over here which is the next one week, and then lastly we have sorted by our 90 day projection. Because we know if we needed another circuit, it probably would take us 90 days to get it turned up and vetted out before we turned it over for production, so we probably want to see where we're going.
In case we're headed to Boston or we're headed to Chicago we're 51% our last 24 hours showed us 62% but our one week showed us 13 so we can see sort of the trend that's going on there. It's probably not going to happen inside that 90 days but outside that 90 days another 90 days we really need to look at bandwidth.
Another way to look at capacity planning and bandwidth management: I have another capacity planning report up here. I've gone through and again shown my most utilized interfaces for today, one week, and the next one month. You can see here even some negative numbers, so we can trend in the other way. Even though today is 64% headed towards Denver, I don't think I need to get real nervous, because one week out we're looking at 13%, which tells me the historical data is sloping downward.
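One way to see how a projection can come out negative is a straight-line trend fitted to past utilization and extended forward without clamping. The talk does not state which projection method SevOne uses, so the least-squares fit below, and the name `project_linear`, are assumptions for illustration only.

```python
def project_linear(points, future_x):
    """Fit y = slope * x + intercept by least squares and evaluate at future_x.

    points: list of (x, y) pairs, e.g. (day_number, utilization_percent).
    The result is deliberately not clamped to 0..100, which is why a
    downward trend can project to a negative percentage.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope * future_x + intercept
```

For example, utilization falling from 70% to 66% over three days projects to a negative figure 40 days out, the same kind of downward-sloping result described for the Denver link.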
Highest response times in terms of round trip times from those IP SLA tests. We can see here that 29 milliseconds headed towards some pretty big numbers in the 3 month period. Although the negatives in the center would say that maybe there's some data skew going on there or some math that would suggest the downward slope before the upwards. Sometimes spikes are something that we look at.
One of the ways we could do that is we can actually chain from these into the actual data that makes up these numbers. Sort of drilling down, we get a view, and we can see here that there's a spike that's probably skewing our projections just a little bit. This is a look at today, and if we want, it's probably best to go look at, say, 4 weeks' worth of data, and we can do it just that quickly. We can make some assumptions about what's going on from the real time data, as opposed to that network overview data, as I stretch this out just a bit. We can see a lot of the average times here are in the 150 millisecond range, plus or minus. We do have every once in a while some big spikes. We also have the ability to work with these in terms of zooming in. We can zoom into a particular spike later if we want to see this.
We're starting to make some analysis or some idea about how we're running depending on what segment of time we look at, we can also look at last month. How did October compare to November? Things like that. We can begin to understand where it is we need to take ourselves in terms of bandwidth.
The last thing is people take a look at these reports and they say, "Hey, you know what, those are fine, but how long does it take somebody to make a report like this?" I thought I would walk you through that very quickly in our product. We have a nice report wizard; if I want to know my top network interfaces I would use the top N square here. I can do next, and I will get a list of devices, and again maybe I'm only interested in the WAN routers, so I will pick the wide area network device group, and we'll do next. Let's stick with most utilized interfaces for right now; I will do next. I'll say, well, we have today as a default, but maybe I'd like to see, as we did before, the next 7 days, the next three months, on out. I can do next. I'm presented with the columns, so maybe all I want to see is my device name and my object name, I don't need my alias, and then I do next. I get a summary of what we have and I click finish. Just that quickly I have that particular chart in there.
If I want another one I can come back over here. In here maybe I want to add the IP SLA test we saw; very easy again. I'm going to stick with top N right this second, and I'm going to come in and pick a device group; it's my wide area guys I want to pick. I come in and I want to do IP SLA. These are all the different test types I have, and here's the highest IP SLA jitter response time test. I can do next, and if I don't care about time range I can just say finish and immediately I get the finished chart. But you can see indicator alias and object description are not well populated for these devices, so if we want to go in and edit, we go into our visualizations and we say for our table, we don't want the object description and we don't want the indicator alias, and we'll redraw one more time. There we have a nice chart.
Then if we want the real time graphs as I showed you earlier, we can do that quick chain and we can actually get the real time graph of the data as it happened over time. We can come up here and name it our favorite bandwidth report. Away we go: I can save that, I can make it a favorite, and I can say, you know what, that's good. When I come back here and I click the home button, I now have my favorite bandwidth report, which will open right like that. Just that quickly we have a report. A lot of this is for today; if we globally want it to not be today, we can just as easily say I want to see the past seven days, and just that quickly we have a 7 day report. There are a lot of different things you can do in the product. It's built to do them very easily, and it's built for reporting speed. It's built to give you the information around any particular IP or IT performance indicator you want. Today we were talking a lot about bandwidth, which is just in bytes and out bytes, which we can trend, and we can do the same things for a lot of different topics.
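Since bandwidth here comes down to in bytes and out bytes, a utilization figure can be derived from two readings of a byte counter and the link speed. The sketch below is a generic calculation, not SevOne's implementation; the function name, the parameters, and the default 64-bit counter width are assumptions.

```python
def utilization_percent(prev_octets, curr_octets, interval_seconds,
                        link_bps, counter_bits=64):
    """Percent of link capacity used between two byte-counter readings.

    prev_octets / curr_octets: counter values at the start and end of the
    polling interval. The modulo handles a single counter wrap.
    link_bps: link speed in bits per second.
    """
    if interval_seconds <= 0 or link_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    delta_octets = (curr_octets - prev_octets) % (2 ** counter_bits)
    bits_per_second = delta_octets * 8 / interval_seconds
    return bits_per_second / link_bps * 100
```

For instance, 37,500,000 bytes transferred in a 300 second poll on a 10 Mbps link works out to about 10% utilization.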
I think I'm going to finish my demo there and see if we have any questions in the Q.A. panel that I can answer.
One of the questions that came in was: When viewing the reports can you move the elements on the page so we can put the most important information first?
I think you guys saw me do that, but yes. I can grab any one particular object and place it in front of a different object. Absolutely, we can make them smaller, bigger whatever we want to do we can move them around. We can add separators too so we have the idea of a separator to pretty up your report a little bit. We can call it WAN's stats. When I insert that I can move it around as well, so maybe I want to break up these two graphs and I want to have a graph that goes like this. Absolutely can do that.
Another question that came in was: On the slowest link ranking, was this a ping from SevOne appliance to device or device to device?
We have the capability to do either. SevOne will go out and ping each of the devices that it's monitoring, and we do that with 5 packets. We use ICMP packets, and that gives us min, max, jitter, and an average round trip time. You can alter that: you can have more than 5 packets if you wish, and you can alter the size of the packets that SevOne issues to a device as well. We also, as you saw, support the Cisco IP SLA test. That gives us more of a site-to-site view, so the round trip times that I had in there from Chicago to New York were not measured from SevOne but from the edge voice gateways that sit in each of those locations to the other locations. It gives us that device-to-device sort of measurement.
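The statistics mentioned for the 5-packet ping (min, max, jitter, average) can be derived from a handful of round trip samples. The talk does not define how jitter is calculated, so this sketch assumes the mean absolute difference between consecutive round trip times; `rtt_summary` is a hypothetical name.

```python
def rtt_summary(rtts_ms):
    """Summarize round trip times (milliseconds) from one ping burst.

    Jitter is taken as the mean absolute difference between consecutive
    samples, which is an assumption made for this sketch.
    """
    if not rtts_ms:
        raise ValueError("need at least one round trip sample")
    diffs = [abs(b - a) for a, b in zip(rtts_ms, rtts_ms[1:])]
    return {
        "min": min(rtts_ms),
        "max": max(rtts_ms),
        "avg": sum(rtts_ms) / len(rtts_ms),
        "jitter": sum(diffs) / len(diffs) if diffs else 0.0,
    }
```

Five samples of 20, 22, 21, 25 and 22 ms give a 22 ms average and 2.5 ms of jitter under this definition.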
One of the other questions that came in was: Can we create reports based on specific OIDs?
Absolutely. All of these that you looked at were specific OIDs; a lot of times they are in bytes and out bytes. If you had any sort of statistical data, that's data that changes over time, then you could grab that. Other things we do: temperature; another OID is how many users are associated with a particular wireless access point. Yes, we absolutely can grab any sort of OID that has statistical data associated with it.
Well guys I think we're coming right up on the bottom of the hour. I don't see any more questions in the chat box right now so I'm going to say thank you very much for your time today and spending your Friday. The next Demo with Dave is going to be December 6th, we're going to talk about best practices for cutting mean time to repair in half. How do we take some of this bandwidth information and either prevent bad things from happening? Or when bad things in our IT infrastructure happen how can we quickly identify them and get folks to repair them. Again thank you everybody for joining me on a Friday, I hope you have a great weekend, see you next time.
Thanks guys and please don't forget to take that survey.