Determine If Your Network is Ready for VoIP


VoIP applications on enterprise networks present specific challenges for network operations and telecommunications teams. Because VoIP is sensitive to latency, jitter, and packet loss, VoIP deployments demand optimal network performance to ensure an acceptable quality of experience for end users. Join SevOne's VP of Systems Engineers, Tom Griffin, as he guides users through the tough decision of determining whether or not networks are ready for VoIP.



Good afternoon, everyone, or good morning, or good evening for some of you. Thank you for joining us for today's Demo with Dave. My name is Alex. A few housekeeping items before we start. If you have questions, please chat them to me in the chat window, and I'll be sure that we get them answered before the end of the presentation. With that, I will turn it over to Dave.

Hello, everyone. Welcome to another segment of Demo with Dave. As Alex said, good morning, good afternoon, or good evening, depending on where you are in the world. Today, we're going to talk about determining whether your network is ready for VoIP. This really extends beyond VoIP to almost any IP-based application, because that's what voice over IP is: an application running over IP, across your network. It just happens to have one of the best performance detection systems ever, which is people's ears. People want their calls to go through, and they want to be able to hear clearly on the call. Voice became one of those applications that made us build and engineer better networks.

A little bit about SevOne before I get started, if you've never joined us for Demo with Dave before. How is SevOne different from a lot of the other products out there? SevOne designed a cluster technology that allows you to do real-time performance metric reporting on anything in your data center. We ship it as an appliance, an all-in-one hardware and software solution that is ready to go when you get it, and it allows you to run reports very quickly on the performance characteristics that are important to your business. As I mentioned, today we're talking about voice over IP. This cluster technology allows us to scale: you can start very small and grow to a very large network, with all the same features whether you're small or large. The nice thing is that, as you grow, your reporting speeds don't diminish. They actually get better as we add more appliances to the cluster.

SevOne is a very open technology. When we started this product eight years ago, we knew that we weren't going to be the only performance tool or the only monitoring device in the network, so we built it with an open API. That allows us to take in input from everything shown at the bottom of the slide, whether it's virtual servers, networks, regular physical servers, voice over IP, as we'll talk about today, and all these other things. We're gathering performance metrics across a wide range of devices.

The open API gives you a programmatic way to get that information back out, or to take information in from things like configuration systems. Maybe we're going to push fault information to fault correlation systems. Maybe you have your own corporate-branded portal, and you just want a piece of the performance data in the left-hand corner to show how things are running. You can grab that via the API if you wish and put it into your portal. The same goes for service management systems, and so on.

Then, as you see in the middle of the slide, to the left, we have the SevOne mobile app. You can get the status of how things are working by device group or by object group. For example, all my voice routers might be a group, and I get a notification if something's not working correctly. You'll see that in my demo in a minute. There's also the reporting aspect. The front end is all HTML, so there's no client to install. You can use it on your iPad, your iPhone, or your Android device. It doesn't matter: as long as you have a web browser, you have access to your performance data.

Here's a slide showing some of our current customers. The takeaway is that a lot of brand-name companies use SevOne, but what do they have in common? Whether they're in telco, finance, tech, media, or other industries, they all have networks that are very important to their business. You may be getting ready for voice, you may already have voice, or you may be preparing for the next application that's going to hit your network. In each case, you want to understand how your network performs before and after. You also want real-time feedback so you can make changes to improve its performance if necessary. Voice is the application today, but we're going to go through the different things you might want to monitor.

The takeaway from this slide is that these companies' networks are important to their business. They can't do without them, and they need to understand the performance of those networks across the board. That's all the slides I usually show. I'll flip over to my demo box, and we'll do a bit of a live demo.

When I talk about voice over IP, before I log in, I'd like everybody to visualize a three-layer approach to voice. The first layer is the network. The questions become: do I have enough bandwidth, and do I have low enough latency between the sites where I want to deploy my application, in this case, voice? Bandwidth is not the whole story. On top of bandwidth, we want to monitor response time, the time it takes to get to a site and back. Within response time, we want to look at jitter and related measures. So this first layer is really bandwidth and response time: do I have the pipes?

The next layer up is: do I have the appropriate quality of service configured? We know the pipes are not infinite, and we can't dramatically change response time. If you have a site in Hong Kong, the round trip from San Francisco to Hong Kong is always going to be 160 or 170 milliseconds. I can't make that any better, so we layer QoS, quality of service, on top. What I can do is send the most important packets first. Because our ears monitor voice so closely, we want to know that voice packets get priority over other packets. We have the pipe, which is my circuit to Hong Kong. On top of that we have QoS, which prioritizes certain packets so they get to use that pipe first.
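The following minimal Python sketch makes "send the most important packets first" concrete with strict-priority scheduling, the core idea behind low-latency queuing. It is a toy model, not router code. Real implementations also police the priority queue to a configured rate so that voice cannot starve other traffic, and this sketch omits that. The class name and packet labels are hypothetical.

```python
from collections import deque

class StrictPriorityScheduler:
    """Toy model: the voice queue is always drained before the default queue.

    Real low-latency queuing also polices the priority queue to a configured
    rate so voice cannot starve other traffic; that is omitted here.
    """

    def __init__(self):
        self.priority = deque()  # voice packets
        self.default = deque()   # everything else

    def enqueue(self, packet, is_voice):
        (self.priority if is_voice else self.default).append(packet)

    def dequeue(self):
        # Voice leaves first; other traffic only goes when no voice is waiting.
        if self.priority:
            return self.priority.popleft()
        if self.default:
            return self.default.popleft()
        return None

sched = StrictPriorityScheduler()
for pkt, voice in [("web-1", False), ("rtp-1", True), ("web-2", False), ("rtp-2", True)]:
    sched.enqueue(pkt, is_voice=voice)
order = [sched.dequeue() for _ in range(4)]
print(order)  # ['rtp-1', 'rtp-2', 'web-1', 'web-2']
```

The voice packets jump ahead of web traffic that was queued before them. That reordering is the whole point, and it is also why the priority queue needs a rate limit in practice.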

The last layer, which is important in any application but especially in voice, is what we call call control. When I punch the buttons on my phone, something has to understand what that means. Something has to know where all the phones in the world are and connect that circuit. So we have a layer of application servers. In the world of voice, we often talk about Call Manager for Cisco. Cisco, Avaya, and the other vendors all have a server that controls calls at this top level.

I'm going to log into my application. Some of you may have seen me use this dashboard before. This is my voice dashboard. In this dashboard, I have three sites: Newark, New York, and Chicago. Those three sites form a triangle, which allows calls to go from any site to any other site. For the traffic passing between those sites, I want to be sure that I have sufficient bandwidth and sufficient queue depth, and that I can actually pass the packets back and forth.
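If every pair of sites gets its own test path, the number of legs grows as n(n-1)/2. That matters when you provision one test per leg. Below is a small sketch using the three sites from the demo. The helper names are hypothetical.

```python
from itertools import combinations

def legs(sites):
    """Every unordered site pair: one test path per leg of a full mesh."""
    return list(combinations(sites, 2))

def leg_count(n_sites):
    return n_sites * (n_sites - 1) // 2

demo_legs = legs(["Newark", "New York", "Chicago"])
for a, b in demo_legs:
    print(f"{a} <-> {b}")
print(leg_count(10))  # 45 legs for a ten-site mesh
```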

We'll bring up the dashboard. The first thing you'll see is the bar at the top, which shows the health of all my voice queues within the application. I've created what we call an object group: I took every voice queue, the ones from Newark, New York, and Chicago, and grouped them together. I set a threshold that said, "If I use more than a certain amount of a particular queue, I want to know about it." Then I rolled it up into the single green bar you see at the top here.

I can look at this dashboard very quickly and say, "I don't have any problems with voice queue utilization," because I've grouped all of the queues and looked across them. If there were an issue, I would see an orange or red bar here indicating some sort of performance issue. Right below it, the empty panel that says "No records to display" is just reiterating what the summary bar says: there aren't any issues. If there were an issue, we would see an event or error message listed there.
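The roll-up behaves like a worst-status-wins aggregation over the object group. Here is a minimal sketch. It assumes hypothetical warning and critical thresholds of 70% and 90% queue utilization. SevOne's actual thresholds are configured in its GUI and are not shown in the demo.

```python
WARN_PCT = 70.0  # hypothetical warning threshold (% of voice queue used)
CRIT_PCT = 90.0  # hypothetical critical threshold

SEVERITY = {"green": 0, "orange": 1, "red": 2}

def queue_status(util_pct):
    if util_pct >= CRIT_PCT:
        return "red"
    if util_pct >= WARN_PCT:
        return "orange"
    return "green"

def roll_up(group):
    """Return (worst status, queues that would appear in the event list)."""
    statuses = {name: queue_status(util) for name, util in group.items()}
    worst = max(statuses.values(), key=SEVERITY.__getitem__, default="green")
    events = [name for name, status in statuses.items() if status != "green"]
    return worst, events

# Hypothetical utilization readings for the object group.
voice_queues = {"newark-wan0": 3.0, "newyork-wan0": 1.5, "chicago-wan0": 2.2}
print(roll_up(voice_queues))  # ('green', [])
```

One red queue turns the whole bar red, which is what lets a single summary bar stand in for every queue in the group.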

The next three graphs cover the network layer I talked about before. My bandwidth graphs are a little further down, I think, but on top of bandwidth, I want to measure response time. We use a technology called Cisco IP SLA, which is typically deployed on your edge routers. The nice thing about Cisco IP SLA is that it comes with your Cisco router. In a lot of cases you paid for a premium router. The same is true of Juniper and many other vendors: they have their own tests built in, so you have a way to test through the network that shipped with your router. SevOne can turn on, or provision, Cisco IP SLA tests for you from within the GUI, but you can also configure them at the command line. If you do it at the command line on Juniper or Cisco, we'll discover all the tests configured on that device when we discover the device itself, and we'll begin to give you these performance graphs. We'll talk more about bandwidth in a minute, and I'll show you that after these graphs.

In this first graph, what we have is a MOS score. This is a simulated score: the mean opinion score (MOS) is a voice quality score. This test runs between my three sites. Based on packet loss, jitter, and response time, it estimates what the call quality would be on a scale of one to five, one being almost inaudible and five being perfect. Here we have about 4.27 from Newark to Chicago, 4.34 from Newark to New York, and 4.33 from Chicago to New York. Across all three legs, call quality is above 4, which is really what we want to see. These measurements, plus the QoS you'll see in a minute, tell us we'll be doing pretty well when we put phones at these sites.
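Routers compute their own MOS estimate, and the exact algorithm varies by vendor. To illustrate roughly how latency, jitter, and loss can feed a single score, here is a widely circulated simplified approximation of the ITU-T G.107 E-model. An "effective latency" heuristic produces an R-factor, which is then mapped to MOS with the G.107 conversion formula. The effective-latency constants are an industry rule of thumb, not the method Cisco IP SLA uses, so treat the output as indicative only.

```python
def estimate_mos(latency_ms, jitter_ms, loss_pct):
    """Rough MOS estimate; returns (r_factor, mos).

    latency_ms is one-way delay. The effective-latency constants are a common
    rule of thumb, not a vendor algorithm.
    """
    # Jitter is weighted double, plus ~10 ms allowance for codec processing.
    effective = latency_ms + 2.0 * jitter_ms + 10.0
    if effective < 160.0:
        r = 93.2 - effective / 40.0
    else:
        r = 93.2 - (effective - 120.0) / 10.0
    # Each percent of packet loss costs roughly 2.5 R-factor points.
    r -= 2.5 * loss_pct
    r = max(0.0, min(100.0, r))
    # R-to-MOS conversion from ITU-T G.107.
    mos = 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)
    return r, mos

for latency, jitter, loss in [(20, 2, 0.08), (150, 40, 1.0)]:
    r, mos = estimate_mos(latency, jitter, loss)
    print(f"latency={latency}ms jitter={jitter}ms loss={loss}% -> R={r:.1f} MOS={mos:.2f}")
```

The G.107 mapping tops out at 4.5 rather than 5. This is one reason estimated scores in the low 4s are generally considered good.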

Another measurement, as I mentioned, is jitter. In this graph, we're recording jitter over time, measured every sixty seconds. For those of you who haven't seen SevOne before, one of the neat things is that we keep that data for one year on a rolling, 365-day basis. You can go back to January, or December of last year, or as far back as you've had it running, and see the performance of this circuit. If you set these tests up well ahead of deploying voice or the next application, you'll understand how your circuit performed even before that new application reaches it.

Jitter is very important in voice and in video. Latency is not as much of a factor. We could have a circuit to Hong Kong with latency of 180 or 190 milliseconds, and voice will traverse that; video isn't too bad either. The problem is when jitter affects it. Say a set of packets arrives in 100 milliseconds, then another set arrives in 150 milliseconds, and then another set in 90. That variation in delivery time is what we call jitter, and it's what IP end devices have a hard time with. Phones, video endpoints, and even the codecs on your PC (when you use Skype, for example) have what's called a jitter buffer. It buffers packets as they come in and plays them out to you nice and evenly.
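A jitter buffer trades a small, fixed delay for smooth playout. The sketch below models a fixed-delay buffer. Each packet is scheduled to play at the first packet's arrival time, plus a buffer delay, plus its position in the stream. Packets that arrive after their slot are discarded, and the listener hears a gap. The 20 ms packet interval and the buffer sizes are assumptions for illustration. The transit times reuse the 100/150/90 ms example above.

```python
def jitter_buffer(transit_ms, interval_ms=20.0, buffer_ms=40.0):
    """Fixed-delay jitter buffer model.

    transit_ms[i] is the network transit time of packet i, which was sent at
    i * interval_ms. Playout starts buffer_ms after the first packet arrives.
    Returns (played, late) lists of sequence numbers.
    """
    first_arrival = transit_ms[0]
    played, late = [], []
    for seq, transit in enumerate(transit_ms):
        arrival = seq * interval_ms + transit
        playout_at = first_arrival + buffer_ms + seq * interval_ms
        # A packet that misses its playout slot is useless and is discarded.
        (played if arrival <= playout_at else late).append(seq)
    return played, late

transits = [100, 150, 90, 100]  # ms, echoing the example above
print(jitter_buffer(transits, buffer_ms=40))  # ([0, 2, 3], [1])
print(jitter_buffer(transits, buffer_ms=60))  # ([0, 1, 2, 3], [])
```

A deeper buffer absorbs more variation, but it adds that much delay to every packet. This is why keeping jitter low is preferable to simply enlarging the buffer.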

What you want to make sure of is that you don't have a large variation in delivery time. We usually say that jitter of around thirty milliseconds or below is where you want to be; you don't want a deviation of more than thirty milliseconds. Here are the averages over the last day, or really the past fourteen hours. Newark to Chicago is 7.6 milliseconds, and the other two sites are in the one-to-four millisecond range. You are going to have jitter. It's going to happen on a circuit, and it depends largely on your QoS strategy. But as long as you keep your averages below thirty milliseconds, you're probably good.
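One standard way to turn per-packet transit times into a single jitter number is the running estimator from RFC 3550 (the RTP specification). It smooths the change in transit time between consecutive packets with a gain of 1/16. Below is a minimal sketch paired with the thirty-millisecond rule of thumb from the talk. Whether a particular router or report uses exactly this estimator is not stated here.

```python
JITTER_BUDGET_MS = 30.0  # rule of thumb quoted in the talk

def rfc3550_jitter(transit_ms):
    """Running interarrival jitter estimate (RFC 3550, section 6.4.1).

    transit_ms[i] is arrival time minus send time for packet i; only the
    change between consecutive packets matters, so clock offset cancels out.
    """
    j = 0.0
    for prev, cur in zip(transit_ms, transit_ms[1:]):
        d = abs(cur - prev)
        j += (d - j) / 16.0  # exponential smoothing with gain 1/16
    return j

def within_budget(jitter_ms, budget_ms=JITTER_BUDGET_MS):
    return jitter_ms <= budget_ms

steady = [100.0] * 50
wobbly = [100.0 if i % 2 == 0 else 110.0 for i in range(500)]
print(rfc3550_jitter(steady), within_budget(rfc3550_jitter(steady)))  # 0.0 True
print(round(rfc3550_jitter(wobbly), 2))  # 10.0
```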

Then we want to watch packet drops, which is almost self-explanatory. A lot of voice traffic runs over UDP, so a lost packet doesn't come back. It isn't regenerated, and that word just drops out. Here we see .08, so not a lot of packets going away. If I lose enough packets, though, you won't be able to understand what I'm saying on the phone.
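Loss is usually reported as a percentage of probe packets sent. Here is a minimal sketch. It assumes the .08 figure on the graph is a percentage and uses hypothetical probe counters.

```python
def loss_percent(sent, received):
    """Packet loss as a percentage of packets sent."""
    if sent == 0:
        return 0.0
    return 100.0 * (sent - received) / sent

# Hypothetical probe counters; UDP voice is not retransmitted, so every
# missing packet is missing audio.
print(loss_percent(10_000, 9_992))  # 0.08
```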

In the next set of graphs, I'm looking at my voice queue utilization. Again, if voice is the application I'm going to deploy, I want to build those queues and understand whether or not I have traffic in them. Here, on a scale of zero to 100, there's almost nothing going on in the voice queue right now, so in my case I have plenty of room to deploy more phones. Suppose I looked at it and saw traffic, but there were no phones. Then I might have to go back to my voice queue configuration and check whether I have the right traffic classified. We're going to look a little more at that in a second, too.
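Queue headroom is easier to reason about as a number of calls per direction. The sketch below shows the usual per-call bandwidth arithmetic. It assumes IPv4 with no RTP header compression, untagged Ethernet framing (a 14-byte header plus a 4-byte FCS), and 20 ms packetization. The 1 Mbps priority queue is hypothetical.

```python
IP_UDP_RTP_BYTES = 40  # 20 IPv4 + 8 UDP + 12 RTP, no header compression
ETHERNET_BYTES = 18    # 14-byte header + 4-byte FCS, no 802.1Q tag

def per_call_bps(codec_bps, ptime_ms=20, l2_bytes=ETHERNET_BYTES):
    """Bandwidth for one direction of one call, including headers."""
    payload_bytes = codec_bps * ptime_ms / 1000 / 8
    packets_per_second = 1000 / ptime_ms
    frame_bytes = payload_bytes + IP_UDP_RTP_BYTES + l2_bytes
    return frame_bytes * 8 * packets_per_second

def calls_that_fit(queue_bps, codec_bps, ptime_ms=20):
    return int(queue_bps // per_call_bps(codec_bps, ptime_ms))

print(per_call_bps(64_000))  # G.711: 87200.0
print(per_call_bps(8_000))   # G.729: 31200.0
print(calls_that_fit(1_000_000, 64_000))  # 11 G.711 calls in a hypothetical 1 Mbps queue
```

Headers dominate at low codec rates: a G.729 call carries 8 kbps of audio but needs about 31 kbps on the wire under these assumptions.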

Then I have a table here that we call a TopN report; it could be our top anything. In this case, I'm looking at a different Cisco IP SLA test. This test is actually a small script that runs on the voice routers. Again, it comes with the software you buy from Cisco, and it allows me to actually place a call. You'll see here that I have voice over IP calls to extensions 6003 and 7002. I'm doing actual call setups, just as I would if there were handsets attached to the outside edge of those routers, so I understand the call setup delay. Cisco will tell you that anything less than a quarter second, 250 milliseconds, is probably acceptable for call setup. Today's averages are 187 milliseconds on my link from New York to Chicago and 184 from Chicago to New York. That's pretty consistent between sites, with only a few milliseconds of difference between the two directions. Anyway, it's interesting to see. We're looking for something less than 250 milliseconds.
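The call setup check reduces to averaging the probe results per direction and comparing each average against the 250 ms guideline. Here is a minimal sketch. The sample values and leg names are hypothetical, not the figures from the demo.

```python
SETUP_LIMIT_MS = 250.0  # Cisco guidance quoted in the talk

def setup_report(samples_by_leg, limit_ms=SETUP_LIMIT_MS):
    """Average call setup delay per direction and whether it meets the limit."""
    report = {}
    for leg, samples in samples_by_leg.items():
        avg = sum(samples) / len(samples)
        report[leg] = (round(avg, 1), avg < limit_ms)
    return report

# Hypothetical probe results, not the figures from the demo.
samples = {
    "NewYork->Chicago": [180.0, 190.0, 185.0],
    "Chicago->NewYork": [240.0, 260.0, 275.0],
}
for leg, (avg, ok) in setup_report(samples).items():
    print(f"{leg}: {avg} ms {'OK' if ok else 'SLOW'}")
```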

Finally, we want to watch the traffic going through our queue, to understand what traffic is actually in the voice queue. This graph tells me that somebody has misconfigured my voice queue. In the top talkers for this queue, I would expect to see traffic between UDP ports associated with calls. That's the RTP range for voice, typically 16384 to 32767. I would also expect to see DSCP values much higher than these. A DSCP (differentiated services code point) value of zero, which is also a ToS (type of service) value of zero, means one of two things. Either I'm not tagging this traffic appropriately, or I am tagging it but no voice traffic is passing through this queue. Actually, I misspoke slightly: this view is filtered on a particular port, and what I'm seeing here is my untagged traffic.
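DSCP occupies the upper six bits of the old ToS byte. EF (DSCP 46), the marking commonly used for voice, therefore shows up as ToS 184 (0xB8). The sketch below flags flows in the voice queue that do not look like voice, meaning they are not UDP in the RTP port range marked EF. The flow record layout is hypothetical. The EF expectation and port range are typical defaults, not anything confirmed for this particular network.

```python
EF_DSCP = 46                      # Expedited Forwarding, commonly used for voice
RTP_PORTS = range(16384, 32768)   # typical Cisco voice RTP range, 16384-32767

def tos_to_dscp(tos_byte):
    return tos_byte >> 2          # DSCP is the top six bits of the ToS byte

def dscp_to_tos(dscp):
    return dscp << 2

def looks_like_voice(flow):
    """flow: hypothetical dict with 'proto', 'dport', and 'tos' keys."""
    return (flow["proto"] == "udp"
            and flow["dport"] in RTP_PORTS
            and tos_to_dscp(flow["tos"]) == EF_DSCP)

def misclassified(flows_in_voice_queue):
    """Flows sitting in the voice queue that do not look like voice."""
    return [f for f in flows_in_voice_queue if not looks_like_voice(f)]

flows = [
    {"proto": "udp", "dport": 16500, "tos": 184},  # EF-marked RTP: expected
    {"proto": "udp", "dport": 16502, "tos": 0},    # right ports, unmarked
    {"proto": "tcp", "dport": 443, "tos": 0},      # not voice at all
]
print(dscp_to_tos(EF_DSCP))       # 184
print(len(misclassified(flows)))  # 2
```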

So I've looked at my QoS layer, and I've looked at response time and bandwidth. Remember, of the three layers, the last one is call control. I have a different dashboard over here for my voice servers. I want to look at my Cisco Call Manager servers and see how they're running. Just as with any application, I want to verify that I have enough CPU, memory, disk space, and so on.

The first graph shows calls completed, so we can see which gateway is handling how much load. Over this period we've seen twenty calls. This is a small site deployment, and that works out to about .14 calls per second. We're also looking at CPU for the server itself and at total octets. Then we look at disk space across my Call Manager servers, and we can project it forward. Here I'm looking at my publisher and my subscriber. I have disk utilization over the past thirty days, projected over the next thirty, ninety, and 180 days. If my disk usage were climbing steeply, I'd know I need to make a change to that call control server, the Call Manager. That might mean adding disk, trimming CDRs (call detail records) sooner, or something like that.
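A straight-line least-squares fit over the past thirty days of samples can approximate a 30/90/180-day projection. This is a sketch of the general technique, not necessarily how SevOne computes its trend. The disk history is synthetic.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def project(daily_pct, horizons=(30, 90, 180)):
    """Projected % used at each horizon (days after the last sample), capped at 100."""
    days = range(len(daily_pct))
    slope, intercept = linear_fit(days, daily_pct)
    last = len(daily_pct) - 1
    return {h: min(100.0, intercept + slope * (last + h)) for h in horizons}

def days_until_full(daily_pct, limit_pct=100.0):
    """Days after the last sample until the trend line hits limit_pct, or None."""
    slope, intercept = linear_fit(range(len(daily_pct)), daily_pct)
    if slope <= 0:
        return None
    return (limit_pct - intercept) / slope - (len(daily_pct) - 1)

# Synthetic history: 30 days of disk use growing 0.1 points/day from 40%.
history = [40.0 + 0.1 * d for d in range(30)]
print({h: round(p, 1) for h, p in project(history).items()})  # {30: 45.9, 90: 51.9, 180: 60.9}
print(round(days_until_full(history)))  # 571
```

A linear fit is the simplest choice. Usage that grows in bursts, such as CDR accumulation between purges, may be better served by a longer window or a different model.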

I'm also ranking CPU usage, so we can see which call control server is using the most CPU. Then I have some other Cisco-specific stats here, such as calls completed and phones registered. I don't have a lot of phones registered on this particular box, but we're collecting that stat. If we see a big change in it, we'll know we need to do something, or that something bad may have happened at either the call control layer or the network layer. We can also see the heartbeat between our Call Manager subscriber and publisher.
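Watching registered phones for a big change can be as simple as alerting on a percentage drop between polls. Below is a sketch with a hypothetical 10% threshold and hypothetical polled counts.

```python
def registration_alert(previous, current, max_drop_pct=10.0):
    """True if registered phones fell by more than max_drop_pct between polls."""
    if previous <= 0:
        return False
    drop_pct = 100.0 * (previous - current) / previous
    return drop_pct > max_drop_pct

# Hypothetical polled counts of registered phones.
polls = [200, 198, 199, 150]
alerts = [registration_alert(a, b) for a, b in zip(polls, polls[1:])]
print(alerts)  # [False, False, True]
```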

Lastly, we included a call detail report, so we can see who called whom and which gateway they used. We can see the latency for each call, the R-factor, and the MOS score. That one was a really bad call. In bigger deployments, we've seen bad MOS scores tend to cluster around one particular gateway. When they all point to one gateway, that pretty well tells you where the problem is. Either the network path to that point is at fault, or the voice gateway itself is. That gateway is the device that converts your IP traffic into PSTN, or telephone, traffic. You can then go back and look at all the other metrics for that particular device as well.
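Spotting the pattern where bad calls cluster on one gateway is a group-and-count over call detail records. Here is a minimal sketch. The record fields, the 3.5 MOS cut-off, and the gateway names are all hypothetical.

```python
from collections import Counter

BAD_MOS = 3.5  # hypothetical cut-off for a "bad" call

def bad_calls_by_gateway(cdrs):
    """Return (gateway, bad_calls, bad_fraction), worst gateway first.

    cdrs: list of dicts with hypothetical 'gateway' and 'mos' keys.
    """
    total = Counter(c["gateway"] for c in cdrs)
    bad = Counter(c["gateway"] for c in cdrs if c["mos"] < BAD_MOS)
    rows = [(gw, n, n / total[gw]) for gw, n in bad.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Hypothetical call detail records.
cdrs = [
    {"gateway": "gw-newark", "mos": 4.3},
    {"gateway": "gw-newark", "mos": 4.2},
    {"gateway": "gw-chicago", "mos": 2.1},
    {"gateway": "gw-chicago", "mos": 2.8},
    {"gateway": "gw-chicago", "mos": 4.1},
    {"gateway": "gw-chicago", "mos": 1.9},
]
print(bad_calls_by_gateway(cdrs))  # [('gw-chicago', 3, 0.75)]
```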

I also included a roll-up at the bottom. With different PBXs, we want to know whether any performance issues have rolled up for each of these PBXs. As always, these reports can be exported as PDFs, so you get a what-you-see-is-what-you-get PDF of this report. You can have it mailed to you or generate it on the fly. If someone doesn't have access to the SevOne GUI for whatever reason, you can even email them this particular report.

Let me share that with you quickly, because you won't be able to see it until I click the button. Now you should be able to see that what you see is what you get. You could have this mailed to you every morning, afternoon, or evening. As I mentioned, we hold all the data for a year. We can also trend data, as you saw in the 30, 60, 90, and 180-day columns. That gives you a way to figure out when you're going to exhaust resources on your network badly enough to hurt performance.

I talked about bandwidth first, but then skipped ahead to my voice metrics. This is a different way of looking at bandwidth, or the health of the pipes, whether you're about to put your application across them or have already deployed it. What we're seeing here are all the real-time alerts associated with the devices, and we can see the devices on the map. Then I've picked my top interfaces, and I can see how they're performing over time. Below that, I have more detail about bandwidth: who's consuming what? These are the application IPs that are consuming bandwidth. Our dashboards let us drill down for more real-time information. We could drill into this and see who's talking to whom, what protocol they're using, and whether any quality of service has been applied to that conversation.
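Interface utilization graphs are typically derived from SNMP octet counters such as IF-MIB ifHCInOctets. Take the counter difference between two polls, convert it to bits, and divide by the interval times the interface speed. Here is a minimal sketch. It assumes 64-bit counters and at most one counter wrap between polls.

```python
COUNTER64_MODULUS = 2 ** 64

def utilization_pct(octets_before, octets_after, interval_s, if_speed_bps):
    """Percent utilization from two polls of a 64-bit octet counter."""
    # The modulo handles a single counter wrap between polls.
    delta_octets = (octets_after - octets_before) % COUNTER64_MODULUS
    return 100.0 * delta_octets * 8 / (interval_s * if_speed_bps)

# Hypothetical 60-second poll on a 100 Mbps interface.
print(utilization_pct(0, 75_000_000, 60, 100_000_000))  # 10.0
```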

While that builds, we've also set up some TopN reports, such as most discarding interfaces. If an interface is discarding packets or throwing errors, we want to know, because that's obviously a big part of understanding what's happening on the network. These dashboards can also be PDFs, mailed out every hour, day, or week as you wish. Here we can see who's talking to whom and the protocol. At this point, it looks like a couple of UDP sessions are getting a ToS, or type of service, marking, and there's also a lot of ICMP going on. Then I've done a TopN of my slowest links. This is response time: how long did it take to get there and back? For some lab routers here, it takes 25.33 milliseconds on average to get to Dallas today. That's an ICMP round trip.
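A top-talkers table is an aggregation of flow records by conversation followed by a sort. Below is a sketch over hypothetical flow tuples of the form (source, destination, protocol, ToS, bytes). The addresses come from documentation ranges.

```python
from collections import defaultdict

def top_talkers(flows, n=5):
    """Sum bytes per conversation and return the n largest.

    flows: iterable of hypothetical (src, dst, proto, tos, nbytes) tuples.
    """
    totals = defaultdict(int)
    for src, dst, proto, tos, nbytes in flows:
        totals[(src, dst, proto, tos)] += nbytes
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]

flows = [
    ("192.0.2.10", "198.51.100.5", "udp", 184, 40_000),
    ("192.0.2.10", "198.51.100.5", "udp", 184, 35_000),
    ("192.0.2.20", "198.51.100.9", "icmp", 0, 5_000),
    ("192.0.2.30", "198.51.100.7", "tcp", 0, 60_000),
]
for (src, dst, proto, tos), total in top_talkers(flows, n=2):
    print(src, "->", dst, proto, f"tos={tos}", total)
```

Keeping ToS in the grouping key is what lets a report like this show whether a conversation was marked, not just how much it sent.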

We've also used IP SLA from the edge routers, and we're looking at IP SLA round-trip times as well. An IP SLA test differs from a simple ICMP test because you can choose the traffic used for the round trip. That's true of Cisco IP SLA, and many Juniper and other tests are similar. I can make it UDP and put it on a specific UDP port. I can also give it a specific ToS or DSCP (differentiated services code point) marking, so I can watch how it performs as it traverses the network through the queue I've set up. That lets me compare response times for non-queued and queued traffic.
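An ordinary UDP socket can illustrate the idea of choosing a probe's protocol, port, and marking. Set the IP_TOS option so probes carry the same DSCP as voice, send them to a responder, and time the round trips. This is a stand-in sketch, not Cisco IP SLA. The local echo responder substitutes for a far-end router, and some operating systems may ignore or restrict the IP_TOS setting.

```python
import socket
import threading
import time

EF_TOS = 46 << 2  # DSCP EF expressed as a ToS byte (184, 0xB8)

def start_echo_responder(host="127.0.0.1"):
    """Tiny UDP echo responder standing in for the far-end router."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind((host, 0))  # port 0: let the OS pick a free port

    def serve():
        while True:
            data, addr = srv.recvfrom(2048)
            srv.sendto(data, addr)

    threading.Thread(target=serve, daemon=True).start()
    return srv.getsockname()

def udp_probe(target, count=5, tos=EF_TOS, timeout_s=1.0):
    """Send marked UDP probes and return round-trip times in milliseconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    except (AttributeError, OSError):
        pass  # marking unsupported here; probes go out unmarked
    sock.settimeout(timeout_s)
    rtts = []
    try:
        for seq in range(count):
            payload = seq.to_bytes(4, "big")
            start = time.perf_counter()
            sock.sendto(payload, target)
            try:
                data, _ = sock.recvfrom(2048)
            except socket.timeout:
                continue  # no reply in time: treat as a lost probe
            if data == payload:  # ignore stale replies to earlier probes
                rtts.append((time.perf_counter() - start) * 1000.0)
    finally:
        sock.close()
    return rtts

target = start_echo_responder()
marked = udp_probe(target)             # EF-marked, like queued voice
unmarked = udp_probe(target, tos=0)    # best effort, for comparison
print(len(marked), len(unmarked))
```

Over loopback both runs look alike. The comparison only becomes meaningful on a real path where the EF marking places the probes in a priority queue.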

I'm coming right to the bottom of the hour. That's a lot of information in a very short period of time. I'm wondering, Alex, do we have questions?

None yet.

No questions from the field. I hope I've covered a lot of this. One usual question is whether you can provision IP SLA on your edge routers. The answer is yes. Another is whether we support voice platforms other than Cisco. The answer to that is yes as well. We can consume and produce telephony reports, which show who called whom, like the chart I showed you with the MOS scores, packet loss, R-factor, and jitter. We can take that data from Cisco, Avaya, or a generic SIP platform such as Asterisk. SIP, Avaya, and Call Manager probably round out most of the phone systems you'd have today, mostly anything IP-based. No questions from the field? Anybody? Somebody's got to have a question. You can use the Q&A chat window to send them in, or you can raise your hand, I think.

All right. What do we have? "What specific indicator OID are you using in your global voice or IP service dashboard for QoS?" That comes out of Cisco's class-based QoS MIB. The actual OID escapes me at the moment, but we are pulling those values from the Cisco class-based QoS MIB. Good question.

What about Lync? Today, we don't support Microsoft Lync, though we have had a couple of inquiries. If you'd like to reach out to me directly, I'm always looking for people to talk to about alternate technologies, and Lync is certainly one of them. I don't know it well, and I don't believe it uses SIP. If it's a SIP-based protocol, then yes, we would support it. Feel free to reach out; I'll put my contact information up at the end.

You can reach me at if you have other questions. Anything else? No, I think that's all the questions we had. I thank everybody for their time today. I think, week after next, we have another Demo with Dave.

Yeah, one more two weeks from today.

Excellent. Thank you, everyone, for joining.