Improving VoIP Performance
This Demo with Dave explores how SevOne can help monitor VoIP environments while simultaneously improving VoIP performance. Take an in-depth tour of the metrics and reports that are generated for VoIP.
Hey guys, it's Dave with Demo with Dave. We're going to wait one more minute here and then we're going to get started. Thanks for your patience. Appreciate it. All right. Good afternoon, good morning, good evening, wherever you may be. Welcome to Demo with Dave, and today's topic is Voice over IP. I'm going to take you through just a couple of slides, as we do. I don't know if you've been with us before for a demo. Just a few slides on what we do and then on into the demo of how SevOne is able to help you monitor your Voice over IP environment. Just a little bit about who we are, what we are and how we're different in a marketplace with a lot of Voice over IP, network management and infrastructure performance applications out there.
How is SevOne different? How are we able to really help you dig into many different aspects of a voice deployment? SevOne is built on top of a clustered technology. Now, what does that mean? That means that each of our appliances is a collector of data. It's a reporter of the data (dashboards, PDFs, all kinds of good things) and it also has the ability to talk to other SevOne appliances should your environment need this. Now, there are two reasons you would have multiple appliances. One reason is because you're geographically dispersed and we don't want to pull NetFlow data and polled data and API calls across these WAN links and chew up bandwidth.
Each of the appliances is able to act as its own collector and then the appliances peer together, as we call it, to bring in a single pane of glass view. You see here, I have five appliances spread around the world and I might want to log in to any one of them. I might be in Germany, I might be in Africa, I might be in South America, but when I log in as an admin, I want to be able to see all of my data all over the globe. We do that with this concept of peering. The data stays local so I'm not transferring data across wide area links, yet I get a view into my global Voice over IP deployment.
The other reason you might have more than one appliance is simply for scale. If we had thousands and thousands and thousands of routers, we'd probably want more than one appliance, but again, the peering concept comes back in. If I have even 15, 20, 30,000 voice gateways out there, we can monitor all those with a number of appliances and then bring them back together in a single GUI, and because each of the appliances is doing its own collection and its own portion of the reporting, reports run very, very fast whether you have one appliance or many.
The last thing that's depicted in this picture is two boxes, one on top of another. If the performance monitoring aspect of your voice deployment is important, we have a hot standby. Should the primary fail, the secondary will pick up. It is a one-to-one hot standby and the reason for that is each of our appliances is the reporter and the collector. They are also the holder of the data set that they report and collect on, and that data set is going to be a year of as-polled data. You get 365 days exactly as it happened at the polling frequency you set, and we have the high availability solution below that, so obviously those two replicate.
When one fails, the other one can pick up, and it has all the disk and all the ability to show you those metrics for the complete year should the primary fail. What do we do? Well, SevOne is an awesome aggregator of data, and that data can come from all kinds of different places, all kinds of different devices and all kinds of different methods. We started out like most people with ICMP and SNMP, and we support versions 1, 2 and 3 of SNMP. We started out collecting data that way and then we added a myriad of other ways to get that data. We can go out and grab data via WMI for Windows servers, for example.
Today, we'll talk a lot about Cisco Call Manager and some of the voice monitoring around that. We can go out via the AXL API and grab that. We can look at virtual servers, so we have the ability to talk to vCenter at the virtual server level. Cloud-based apps: some of the ways we get data in the cloud are through HTTP response measurements, DNS response measurements and other things like that. We have the ability to bring in flow data as well. There's a wide variety of different ways that we can get data into SevOne.
Once it's in SevOne, what do we do? Well, obviously reports and graphs are a big part of what we do. We have a mobile app for event notification. If your voice queues are full, or whatever your priorities may be to look at, we can show you right on your Android or your iOS device. It's able to show those events that happened. Lastly, with the open API, we have the ability to integrate other things in the environment or give data back. Config management: if you have a source of truth for all the devices you're managing, we can use our API to manage our configuration inventory to be the same as that one.
Fault and event management: we can send northbound alerts when something goes wrong. We can display graphs and charts inside an already built corporate portal using the API, and we can interface with service management tools like service desk and other things, usually grabbing some performance metrics out of SevOne, such as interface rates or whatever it might be, and combining that with business metrics to give a bigger and better picture. Those are all the ways we get stuff in and the ways we get stuff back. We do charts and reports really well.
The marketing slide and some of our customers. The takeaway from this slide for me is not necessarily the customers or the verticals, but the fact that the thing all of these customers have in common is that the network is important to them. Their servers are important to their business. The IT infrastructure out there, voice and everything else, is very key to them providing services to their customers. They have all needed a higher level or better way to monitor all of the statistics that go into making up a service like voice, or whatever applications they're serving, and to be able to get real-time information on how to make that particular application better.
With that, I'm going to jump over to the dashboard. First, I will talk for a moment about Voice over IP, before I go in and show you a bunch of graphs and lines and charts. I see Voice over IP as three distinct areas. The first distinct area is IP. This is IP traffic that traverses the network. We have switches and we have routers and we have voice gateways that get call traffic from one place to another. There are things within those: interface utilization, queue utilization, delay, jitter and packet loss all go along with that. The second piece is the health of the servers that control all that.
What I'll show you here is the Cisco Call Manager, but whether the Call Manager is Cisco or Avaya or Nortel, it doesn't matter. Those physical servers that set up these IP connections have server needs. Right? They have CPU, they have memory, they have disk, they have the I/O for the Ethernet cards that help them do all the setting up of calls and reporting and things like that. We need to understand the performance of those servers as well, because if they're not performing or one fails, even in a redundant scenario, we want to know that so we can get things fixed. Lastly, there's a bit of call detail.
Who calls who? How long did they talk? What's the quality of that particular call? That's another report that SevOne could help generate. You could put the whole end-to-end spectrum together, from the network layer of switches and routers, through the call control layer and servers of that sort, and finally into what they produce: who called who, when and for how long. That's really what we're going to see in this demo. I'm going to pop over to my demo server and I'm actually going to log out. I've built a particular dashboard. This particular dashboard is for the voice team to log in and take a look at how their voice deployment might be running.
When I click on this, what we're going to see is a dashboard that immediately pops up, and this is my global Voice over IP service dashboard. What does that mean? Well, the first thing we have the ability to see here is this green bar. I'm taking the performance violations of all my network queues and I'm putting those together in a way that says, "Hey, do I have any violations of queue performance throughout my entire infrastructure today?" We get to see a nice green bar and we get to see that zero of my 52 voice queues are affected. I have no voice queue issues.
If there was something affected, I would get a real-time display here. I would get an alert saying, "Hey, the voice queue on this particular router is no good." Right now, today, we're looking pretty good. Now, what I've done is I have three voice sites. I have a Chicago, a New York and a Newark voice deployment, kind of a triangle if you will, and I want to know some things about that. In this first graph, on the left here, I'm looking at MOS score, Mean Opinion Score. This score is actually mathematically calculated now. Bell started it a long time ago.
They took a bunch of operators. They put them in a room. They made a phone call and they said, "How did that call sound on a scale of zero to five?" Zero being you couldn't understand it at all. Five being it sounded perfectly clear. The operators would give them a subjective voice score. Now, we've taken that and we no longer have people calling people, but what we have here is some Cisco routers, and they're performing a Cisco IP SLA test for voice, actually sending UDP packets just like the voice traffic would be down a path and getting them back, and then, using packet latency, jitter, packet loss and some other metrics, we come up with this relative voice score.
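To make the idea of a calculated voice score concrete, here is a minimal sketch in Python. The `r_factor_to_mos` function uses the standard R-factor to MOS conversion from the ITU-T G.107 E-model. The `estimate_r_factor` function is only a rough, commonly quoted heuristic for turning latency, jitter and loss into an R-factor; it is an assumption for illustration and is not the algorithm Cisco IP SLA or SevOne uses. All function names and sample values are hypothetical.

```python
def r_factor_to_mos(r: float) -> float:
    """Map an E-model R-factor to an estimated MOS (ITU-T G.107 conversion)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


def estimate_r_factor(latency_ms: float, jitter_ms: float, loss_pct: float) -> float:
    """Rough heuristic (an assumption, not a vendor algorithm):
    start from a default R of 93.2 and subtract penalties for delay and loss."""
    # Jitter is weighted double because it hurts playback more than plain delay.
    effective_latency = latency_ms + 2 * jitter_ms + 10
    if effective_latency < 160:
        r = 93.2 - effective_latency / 40
    else:
        r = 93.2 - (effective_latency - 120) / 10
    r -= 2.5 * loss_pct
    return max(0.0, min(100.0, r))


if __name__ == "__main__":
    # Hypothetical measurements for one path between two sites.
    r = estimate_r_factor(latency_ms=20, jitter_ms=4, loss_pct=0.0)
    print(f"R = {r:.2f}, estimated MOS = {r_factor_to_mos(r):.2f}")
```

With low delay, low jitter and no loss the estimate lands in the low-to-mid 4s, and raising jitter or loss pulls it down.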
You can see here on this deployment that we're running very well, 4.34. We did have one little dip earlier this morning, somewhere around midnight. You can see that overall, we would expect the calls from all three sites to be sounding pretty good. We're also looking, in the center graph here, at jitter, measured by the same IP SLA test. Jitter tells us a lot. Jitter is really the enemy of Voice over IP. A lot of people think latency between sites is a bad thing, and it can be. However, the human ear can tolerate a lot of latency if the call quality is clear.
We look for round trip times for Voice over IP between two sites to be under 250 milliseconds, under a quarter of a second, and as long as you have very low jitter (and jitter is the variation in the way the packets are delivered), voice quality even with a long latency will sound good. Conversely, even if I had a round trip time of two milliseconds, if I have very high jitter, meaning that one time I made it in two milliseconds, the next time I made it in 38 milliseconds, the next time in 4 milliseconds, the next time in 100 milliseconds and then in 6 milliseconds, that drives Voice over IP end points crazy.
The reason it drives them crazy is most Voice over IP end points have what they call jitter buffers built into them. They take in a bunch of packets and then they play that just the way your ear wants to hear it. If it comes in on a regular pattern, it's very easy to play out to you. When it comes in a bunch, then a little, then a bunch, then a little, the jitter buffers can't handle it and that's when we start to hear that underwater thing. One of the things we're looking for in our thresholds will be jitter over about 30 milliseconds. We can see here that we had one or two spikes that went to 16 milliseconds.
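As a small illustration of jitter as variation in delivery, the sketch below averages the absolute change between consecutive delay samples and compares it to the 30 millisecond threshold mentioned above. The erratic samples are the delays from the example; the steady samples are made up. This simple average is an assumption for illustration; RTP end points typically use a smoothed estimator instead, and the IP SLA test may compute it differently.

```python
from statistics import mean

JITTER_THRESHOLD_MS = 30  # threshold mentioned in the demo


def mean_jitter_ms(delays_ms):
    """Average absolute change between consecutive packet delays."""
    if len(delays_ms) < 2:
        return 0.0
    return mean(abs(b - a) for a, b in zip(delays_ms, delays_ms[1:]))


# Hypothetical steady path: higher latency, but very regular delivery.
steady = [20, 21, 20, 22, 21]
# Delays from the spoken example: low latency at times, but wildly irregular.
erratic = [2, 38, 4, 100, 6]

for name, samples in (("steady", steady), ("erratic", erratic)):
    jitter = mean_jitter_ms(samples)
    status = "VIOLATION" if jitter > JITTER_THRESHOLD_MS else "ok"
    print(f"{name}: jitter {jitter} ms ({status})")
```

The steady path stays far under the threshold even though its delay is higher, while the erratic path blows past it.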
On average, our jitter here is at worst 4 milliseconds between Newark and Chicago, and at best, between Newark and New York, which are much closer, we're running about one millisecond worth of jitter. It's pretty easy to tell that it looks like we've been doing a pretty good job. And then packet loss: all zeroes. We always want to see all zeroes. The next set of graphs I have here is my queue utilization. We can see that my voice queues are way underutilized, with almost no percentage running through a couple of them and a little more through my Chicago voice queue.
If someone would come to me and say, "Dave, we want to put another 500 phones in the Chicago location," I'm running through my voice queue utilization and I'm saying, "Hey, you know what? We've got plenty of bandwidth because we haven't seen our voice queue tapped almost at all." Below this, I have a table, and this table has a little different Cisco IP SLA. This is actually making a call from one router to the other and looking at the average call setup delay time. Here again, we want to see about 250 milliseconds or less. We want the call to be picked up in a timely fashion, and I'm able to make those calls all the way around.
Chicago to New York, New York to Newark, Newark to Chicago. Around the triangle, you can see the extensions I'm calling here and you can see my average time, ranked worst to best basically. Today, it looks like I'm averaging about 118 milliseconds for that particular call, 115, etc. I have the ability to understand how call setup is in my network. Lastly, I'm looking at flow data for a particular queue. What I'm going to be looking for in this queue is that I have consistent QoS markings. I don't have very much traffic on this queue right now, but this is using NetFlow.
The voice gateway is sending me NetFlow and I've filtered it to say, for this particular voice queue, I would like to understand the types of traffic that are going through. What we see here is we have a host called mail relay, who's talking to a particular client. The protocol is UDP. The application is SIP and the DSCP, the Differentiated Services Code Point, is AF41. This is as we would expect, because I know, as the designer of the voice network, that all voice traffic should be tagged AF41. You may tag yours EF. This is just the call setting or the queue priority setting that you're going to have on your access list that tags the traffic.
I've set all mine to AF41, and for any traffic that's going through our voice queue, I personally, in my network, the way it was configured, want to see AF41. If for some reason it's not tagged that way, that means that some router somewhere along the path was misconfigured, and one of the biggest problems you can have in a voice network is the misconfiguration of those voice queues. It happens a lot in a very large network. If you have a thousand voice gateways or whatever, it's probably not the same guy configuring all thousand voice gateways, and somewhere along the way, somebody decides to mark a queue differently.
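The marking check described here can be sketched as a simple filter over flow records. AF41 is DSCP decimal 34 and EF is decimal 46; everything else in this example (the record layout, host names, addresses and function name) is a hypothetical stand-in and not SevOne's NetFlow schema or API.

```python
# Standard DSCP code points (decimal values).
DSCP_NAMES = {0: "BE", 34: "AF41", 46: "EF"}
EXPECTED_DSCP = 34       # this network's design choice: voice is tagged AF41
EXPECTED_PROTOCOL = "UDP"

# Hypothetical flow records already filtered to one voice queue.
flows = [
    {"src": "mail-relay", "dst": "client-a", "protocol": "UDP", "app": "SIP", "dscp": 34},
    {"src": "192.0.2.15", "dst": "198.51.100.199", "protocol": "TCP", "app": "unknown", "dscp": 0},
]


def find_mismarked(records, expected_dscp=EXPECTED_DSCP, expected_protocol=EXPECTED_PROTOCOL):
    """Return flows in the voice queue that are not marked or typed as expected."""
    return [
        r for r in records
        if r["dscp"] != expected_dscp or r["protocol"] != expected_protocol
    ]


for flow in find_mismarked(flows):
    marking = DSCP_NAMES.get(flow["dscp"], str(flow["dscp"]))
    print(f"{flow['src']} -> {flow['dst']}: {flow['protocol']} marked {marking}, expected AF41/UDP")
```

A network that tags voice as EF would set `EXPECTED_DSCP` to 46 instead.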
When you have queue mis-marks, that's where voice call quality can become a real challenge. That's a dashboard that's set up for me to understand that, "Hey, today we're doing really well." Say I ask, "How did we do this week, so I can do a weekly report?" One of the powerful things about SevOne is the speed at which I get my reports back. Basically, what I can see now is that over the past week, one of my queues had a problem. I can see here: one out of 52. For a very short period of time yesterday, I had a performance violation of one of my voice queues, and it's pretty easy to see because it's down here in the Chicago voice queue. We can see that it was driven to more than 145%.
One of the great things about the SevOne product is that the graphs are interactive. In my dashboard, I can go in and zoom in or zoom out to a particular point, and I can see the performance of a graph any way I want to see it, whether I want to see the past hour or the past week, whatever it is. We can see here we're definitely above 100% on the voice queue. Something was going through that voice queue that probably shouldn't have been there. The question becomes what. This is a bandwidth graph. Down here, we can see this little blip right here. That is our baseline. For everything we track, whether it's jitter or latency or packet loss, we also generate baseline values.
Our baseline values let us understand what normal looks like, and here we're way, way above normal, but looking at this particular graph, all we know is that something was driving bandwidth. Fortunately, we have what we call chaining. We have the ability at SevOne to take the output of one graph and make it the input of the next. I can actually chain this to what we call a NetFlow report, and what I'm saying is, for this particular time frame on this particular QoS queue on this particular router, create a NetFlow graph that shows what was going on. Now, what do I see? Now, I see this guy 10.2.15.
He's talking to this guy, 50.199. The protocol is TCP and there's a lot of TCP in there. This shouldn't be. This should be a voice queue. What I know is that somehow this queue, earlier yesterday, was misconfigured. It looked good for today, but we went back and we drew a week's worth of everything, the same things that were on the graph before for today, and we found this issue. It will probably correspond with the help desk tickets if I go back and look and say, "Hey, was there really bad voice yesterday morning for these people out of Chicago?" Oh yeah, there was, because something was misconfigured.
It was misconfigured not only in terms of TCP in the queue, but we can also see here that wherever this traffic was passing through, its QoS markings were incorrect or not marked at all. This traffic somehow made it into a queue as TCP when it should only be UDP, and that can only happen through a router misconfiguration at some point. It also says that wherever this guy sits, he's going to be crossing a router that has a QoS setting that's just completely incorrect. This gives me the ability to go to the Ops guys or the support guys or the router guys and say, "Hey look. We had a problem yesterday and I know what it was."
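The chaining idea, where the time window from one graph becomes the filter for the next report, can be sketched roughly like this. The sample data, field names and helper functions are all hypothetical; this only illustrates the technique and is not how SevOne implements chaining.

```python
from collections import Counter


def violation_window(samples, limit_pct=100.0):
    """Return (start, end) timestamps spanning the samples above limit_pct, or None."""
    over = [ts for ts, pct in samples if pct > limit_pct]
    return (min(over), max(over)) if over else None


def top_talkers(flows, window, n=3):
    """Sum bytes per (src, dst, protocol) for flows that fall inside the window."""
    start, end = window
    totals = Counter()
    for flow in flows:
        if start <= flow["ts"] <= end:
            totals[(flow["src"], flow["dst"], flow["protocol"])] += flow["bytes"]
    return totals.most_common(n)


# Hypothetical queue utilization samples: (minute offset, percent of queue).
utilization = [(0, 12.0), (5, 14.0), (10, 145.0), (15, 130.0), (20, 11.0)]

# Hypothetical flow records for the same queue on the same router.
flows = [
    {"ts": 5, "src": "phone-a", "dst": "phone-b", "protocol": "UDP", "bytes": 40_000},
    {"ts": 10, "src": "host-x", "dst": "host-y", "protocol": "TCP", "bytes": 9_000_000},
    {"ts": 12, "src": "phone-a", "dst": "phone-b", "protocol": "UDP", "bytes": 45_000},
    {"ts": 15, "src": "host-x", "dst": "host-y", "protocol": "TCP", "bytes": 7_500_000},
]

window = violation_window(utilization)
if window:
    for (src, dst, proto), total in top_talkers(flows, window):
        print(f"{src} -> {dst} {proto}: {total} bytes")
```

The first function plays the role of the bandwidth graph and the second plays the role of the chained flow report, so the heavy TCP conversation surfaces at the top for the violation window.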
It wasn't jitter. It really wasn't MOS score, except for probably a small period of time when we had the spike, and I can tell you that the spike occurred between these couple of people on these protocols. I have evidence that, "Hey, by the way, we know that something was misconfigured yesterday when the Chicago people were calling and complaining." It's a great way to look back. As I mentioned before, I think, the SevOne appliances hold all of the as-polled data for an entire year. I can go back to last week. I could go back to last month. I can go to this time last year and show you the performance of a particular KPI, maybe it's the response time in milliseconds for the call setup or whatever it is.
I can show that over a great period of time. These dashboards we have are configurable. I could say I want to change the whole dashboard's time span. You saw me pick a week report. I could say I want to change the time span to just the past two hours. How have they been running in the last two hours? I can just as easily go up and say, "How are we running over the past four weeks?" I can get a four-week history just as easily. Another thing I can do is export to PDF. If I wanted to send this off to someone who didn't have access to the dashboard, I could create what we call pretty much a what-you-see-is-what-you-get report in PDF form for whoever might need to see it.
If they don't have access, or I want to file this away as a "Huh, I knew those guys made a mistake at that point in time," I can do that. This is a way to look into that first piece that we talked about, which was infrastructure health: the routers and switches and ports that make up the voice network. If we go back to our report, the second piece we could take a look at is the health of our servers. We want to understand that those devices that are helping set up the calls are running appropriately. Actually, the first thing you see in here is the trend report.
Now, this is my very underutilized voice deployment, but what we see here is the number of calls completed in terms of a certain value. These are actually a per-second value, and in a call center these numbers will be way higher, but it's how fast the calls are being set up and then, given the past 30 days, where am I going? Now, obviously when you're well underutilized to begin with, the trend happens to go downward. Then we're looking at the storage use. Right. If my disk runs out on my Call Managers, obviously there are going to be bad things happening. Not collecting CDRs.
I'm not doing whatever. It would probably be a bad thing. We can see here the past 30 days, and then we're projecting out the next 30, 90 and 180 days. You can see that this particular storage partition runs about the same always. It's taking CDR records. It's filling them up and it's purging them at about the same rate. We can tell that we're not going to run out of any disk in the next 180 days, which is a comforting factor. We can also see CPU load. We could see 80% today but we're dropping quickly. We're not going to have a CPU. We're going to be out of business in 180 days apparently, but you can see that the CPU is well underutilized, so we probably don't need to worry about adding another subscriber any time soon.
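A trend projection like the 30, 90 and 180 day view can be approximated with a straight-line fit over the past 30 daily values. The sketch below uses ordinary least squares; that choice, the function names and the sample data are assumptions for illustration, since the demo does not say which projection method SevOne applies.

```python
def linear_fit(values):
    """Least-squares line through (0, values[0]), (1, values[1]), ...
    Returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project(values, days_ahead):
    """Extend the fitted line days_ahead days past the last sample."""
    slope, intercept = linear_fit(values)
    return intercept + slope * (len(values) - 1 + days_ahead)


if __name__ == "__main__":
    # Hypothetical disk usage (percent) for the past 30 days, growing 0.1 per day.
    disk_pct = [40 + 0.1 * day for day in range(30)]
    for horizon in (30, 90, 180):
        print(f"+{horizon} days: {project(disk_pct, horizon):.1f}% used")
```

A partition that fills and purges at about the same rate produces a slope near zero, so every horizon projects roughly the current value.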
Other ways to look at this? This is the CPU of my publisher alongside the CPU of my subscriber, and I can look at this over time. Same thing for memory. Same thing for Ethernet traffic in and out. Again, you see the baseline here. The baseline looks like we're right on about normal. We can look at calls completed over time; not many calls today. We can look at the number of phones registered to the subscriber and to the publisher. We can see that just a little while ago, a couple of phones dropped off. Heartbeat: this is the beat between the two to keep them alive and to tell if a failover needs to happen.
We can see here that it's 0.50 as a number. I don't know what that really means, but what I do know is it's the same as its baseline. I understand on a rolling basis what normal is, and I have the ability to say, "Hey, you know what? This is perfectly normal. It's always run like this and it's running like this now." I have here the same kind of event summary chart we had before, and I'm looking at the different PBXs in my environment, a couple of which have been turned down. The rest are turned up, but we can see how things are running in the environment.
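The point about judging a value against its own baseline rather than its absolute meaning can be sketched as follows. Here the baseline is just the mean of earlier readings and "normal" means within a few standard deviations; both choices, along with the names and sample values, are assumptions for illustration and not SevOne's baseline calculation.

```python
from statistics import mean, pstdev


def baseline(history):
    """Return (average, spread) of earlier readings for the same metric."""
    return mean(history), pstdev(history)


def is_normal(current, history, k=3.0, min_band=0.01):
    """True when the current reading sits within k standard deviations of the
    baseline. min_band keeps a perfectly flat history from flagging tiny noise."""
    average, spread = baseline(history)
    return abs(current - average) <= max(k * spread, min_band)


# Hypothetical heartbeat readings from earlier periods at the same time of day.
heartbeat_history = [0.50] * 8

print(is_normal(0.50, heartbeat_history))  # same as baseline
print(is_normal(0.90, heartbeat_history))  # well away from baseline
```

The reading of 0.50 needs no interpretation to be called normal; it only has to match what the metric has always done.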
Lastly, I talked a little bit about the last layer which is the call volume. Who's called who? In terms of the top layer if you will. What was their call quality like? In this report we can see a particular phone or a destination address whether it's voice mail or these extensions, who called them if possible? Might be gateways. We don't know. The IP address, the duration and then we get into our latency, our R-Factor, our MOS score and any packet loss.
We can take these reports and show them by destination, by source, or just by PBX rolled up. We can set alerts for when MOS score is not a good value, probably below 3.5. We want alerting. There are a lot of different ways we can look at the data that goes on in a Voice over IP network. I've been a bit long-winded today. We're actually at the 11:30 mark and I want to stop there, but I will open it up for questions and I will be around to answer any questions that people might have.
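As a rough illustration of rolling call records up by source, destination or PBX and alerting on a MOS below 3.5, consider the sketch below. The record fields, host names and values are invented for the example and do not reflect the actual CDR or CMR layout.

```python
from collections import defaultdict
from statistics import mean

MOS_ALERT_THRESHOLD = 3.5  # "not a good value" per the demo

# Hypothetical call records.
calls = [
    {"pbx": "cucm-chi", "source": "1001", "destination": "voicemail", "duration_s": 42, "mos": 4.4},
    {"pbx": "cucm-chi", "source": "1002", "destination": "2001", "duration_s": 310, "mos": 3.1},
    {"pbx": "cucm-nyc", "source": "2001", "destination": "1001", "duration_s": 95, "mos": 4.3},
]


def rollup(records, key):
    """Average MOS grouped by the given field (e.g. 'pbx', 'source', 'destination')."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["mos"])
    return {group: mean(scores) for group, scores in groups.items()}


def low_quality_calls(records, threshold=MOS_ALERT_THRESHOLD):
    """Calls whose MOS fell below the alert threshold."""
    return [r for r in records if r["mos"] < threshold]


print(rollup(calls, "pbx"))
for call in low_quality_calls(calls):
    print(f"ALERT: {call['source']} -> {call['destination']} MOS {call['mos']}")
```

Changing the `key` argument switches the same report between the by-source, by-destination and by-PBX views.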
I have a question for you. Do you guys do anything to collect MOS scores from anywhere else besides IP SLA?
Yes. We will generate this call volume report down at the bottom here. These MOS scores can come from what they call, in the world of Cisco, the CMR: the Call Manager will give us the CMR, the Call Manager record database for the calls. With Avaya and Nortel, we can be an RTCP collector, and then out of the RTCP streams that we get, we can produce information like this. Lastly, also for SIP conversations, we can collect that data and produce the actual MOS scores. These are actually MOS scores reported by Cisco phones. We can see the Call Manager, so that's how I know that's my Cisco. These were actually the phones reporting their MOS score to the Cisco Call Manager, which we then extracted.
All right. Do you do anything with any of the other vendors like Polycom or ShoreTel or Mitel?
Sure, as long as they're running SIP. If they're able to give us an RTCP or RTCP XR stream, we can grab the same values basically.
Okay. Cool. Thanks.
Any other questions?
You were saying that this supports Avaya. Is there something that you could show? Because obviously what you've shown up till now seems to be the Cisco Call Manager. You were saying it supports RTCP. Do you have to have SIP running on Avaya in order to gather any of that data?
Not SIP, but you do need to point the RTCP stream from the Avaya Call Manager to us. Actually, I don't have one in my lab, but it's a fairly simple config option. It may be on already for other collections, and you just add another IP address, which would be the SevOne server. You'll specify a port number to send it to, so the IP and port number, and we'll listen on the same IP and port number.
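To show what listening on an IP and port for an RTCP stream involves at the most basic level, here is a minimal sketch of a UDP listener that decodes the fixed four-byte RTCP header (version, count, packet type, length) defined in RFC 3550. The port number 5005 and the function names are hypothetical, and this is not SevOne's collector; a real collector would go on to parse the report blocks for loss and jitter.

```python
import socket
import struct

# Common RTCP packet types (RFC 3550, plus XR from RFC 3611).
RTCP_TYPES = {200: "SR", 201: "RR", 202: "SDES", 203: "BYE", 204: "APP", 207: "XR"}


def parse_rtcp_header(datagram: bytes) -> dict:
    """Decode the fixed 4-byte header at the start of an RTCP packet."""
    if len(datagram) < 4:
        raise ValueError("too short for an RTCP header")
    first, packet_type, length_words = struct.unpack("!BBH", datagram[:4])
    version = first >> 6
    if version != 2:
        raise ValueError(f"unexpected RTCP version {version}")
    return {
        "version": version,
        "count": first & 0x1F,  # report count (or subtype, depending on type)
        "type": RTCP_TYPES.get(packet_type, str(packet_type)),
        "length_bytes": (length_words + 1) * 4,  # length field is in 32-bit words minus one
    }


def listen(bind_ip: str = "0.0.0.0", port: int = 5005) -> None:
    """Receive datagrams forever on the given IP and port (port is a placeholder)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_ip, port))
        while True:
            data, sender = sock.recvfrom(2048)
            try:
                print(sender, parse_rtcp_header(data))
            except ValueError as exc:
                print(sender, "ignored:", exc)
```

Calling `listen()` blocks and prints one line per received packet; the call manager side would be configured to send its RTCP stream to the same address and port.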
Hi Dave. I've got one more question. Do you guys check specifically for video traffic or video conference traffic?
Not beyond monitoring video queues. Typically, folks will have queues set up on the routers for video traffic. If you have any number of different packet shapers, we can get statistics on video as it goes through the packet-shaping queues. If you know the port ranges you use for video, you certainly could produce NetFlow graphs that show the volume of the video traffic as well.
That answered my question. Thank you.
All right. Everyone thank you.