Managing VoIP


This SevOne demonstration, led by Dave Hegenbarth, SE Director of Global Strategic Alliances, discusses VoIP and provides an in-depth analysis of VoIP management solutions. Learn more about the hardware that drives VoIP and see how SevOne's Open API can help monitor and manage VoIP performance.



Welcome. This is Demo with Dave. We're here to talk a little bit about VoIP. Before I get into VoIP, which has a lot of facets to it, I'm going to talk a little bit about the SevOne solution. SevOne is an appliance collecting data from all kinds of different sources with the goal of putting together a global dashboard to see, in the case of what we're going to talk about today, voice performance. It is delivered as a solution on an appliance, and that appliance has the ability to give you very fast, real-time reports at scale. If you have lots and lots of things today talking around voice, lots of voice queues, voice servers, call managers controlling the voice, all that stuff can be brought together and reported on in real-time.

A lot of what allows that real-time reporting is our ability to scale. Our ability to scale is really fostered by a collection of devices. In some organizations you'll have a single appliance. In larger organizations you'll have multiples. Regardless of how many of those appliances you have, you have a single front-end webpage that you go to for reporting.

This particular slide shows you the sources of information that SevOne is capable of bringing in. Obviously today we're going to talk about VoIP. A lot of parts and pieces in VoIP have to do with things like our ability to deploy and monitor QoS, our ability to understand the hardware that drives the VoIP, our phones or our call managers, etc. Obviously that runs over IP. It runs over a network, and it could be that our call managers are virtualized. There may be virtual servers in the mix as well. All these things come into SevOne.

Then SevOne also through a very open API structure has the ability to send them back out. We have the ability to take or get information from service and config management tools maybe to populate SevOne or to give back some performance information into the provisioning database.

We also have the ability to feed portals. You could build your own web-based portals and use the API to drive some of the graphs and charts in a portal. Then for fault and event management, as performance violations happen, we have the ability to send out notifications. We now have an app for the iPhone and for Android that allows you to see the performance or fault notifications of your network on your handheld.

Then a review of our customers. SevOne's ability to measure and report on infrastructure performance goes across tiers. We have MSPs and telecom, but we also have banking and finance and media and retail and others. We have a lot of different verticals that we service because all of them need this ability to monitor the performance of their infrastructure.

With that I'll flip over to the SevOne product itself. I'm going to log out of here and log in as a voice user. If we talk a little bit about VoIP, it has a lot of different components. As I mentioned, you have call managers and servers that actually serve up the voice. You rely on an IP-based network to transport the voice. You have end handsets, etc.

This first dashboard you see we put together to understand pretty much the underlying infrastructure of our voice network. The top green bar is the measurement of all our voice queues. We can see that for today 0 of our 34 voice queues have an issue. We like to see a green bar here. I've also put up in this right-hand box some active real-time alerts. We can see right here the New York voice gateway has had a problem where its in bits are greater than 100. We have this here so you can see what a real-time alert looks like. You can see it's occurred 13,000 times. It's just a static alert to show you what a real-time alert would look like.

In the next 3 boxes what we have is the ability to monitor our 3 sites. We have 3 sites. If you picture a triangle, we have from Newark to New York, from New York to Chicago, and from Chicago back to Newark. We have that triangle of sites. We're measuring a couple of key voice performance statistics. In this first graph we're measuring the mean opinion score. The mean opinion score was something that Bell set out a really long time ago to understand the quality or the sound of voice. It was just that. It was an opinion at first. Now it's more mathematically calculated.

Actually we use here at SevOne an IP SLA test from Cisco that emulates a phone call across the network. What we're displaying is that phone call to all 3 sites. We have Newark to Chicago. We have Newark to New York, and we have New York to Chicago, all 3 legs. We can see here today it's 4.34. The scale is 0, which would be inaudible, to 5, which would be perfect. We can see here that we're doing pretty good at a 4.34.

The middle graph shows jitter across those same 3 links, or links to those 3 sites in the triangle. What we're looking for is very low jitter rates, even more so than low latency. Latency in a voice network in itself is not really a big deal. If you have less than 250 milliseconds roundtrip, you usually have very good voice. What really hurts is jitter. Jitter is the variance in the delivery time of packets. When some packets get there really quickly, followed by some packets that take a really long time, followed by some packets that are really quick, the voice becomes choppy. Playout buffers do not handle that as well as they handle a straight-through, consistent delivery time in terms of milliseconds.
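
To make the jitter idea concrete, here is a minimal Python sketch of a running jitter estimate in the style of RFC 3550 (the RTP specification). It is only an illustration: the talk does not say how the Cisco IP SLA test or SevOne computes jitter, and the function name and the sample timestamps are hypothetical.

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """Running jitter estimate in the style of RFC 3550, section 6.4.1.

    Both arguments are lists of timestamps in milliseconds, one entry per
    packet, in the order the packets were sent.
    """
    jitter = 0.0
    for i in range(1, len(send_times_ms)):
        # Transit time of the previous and the current packet.
        transit_prev = recv_times_ms[i - 1] - send_times_ms[i - 1]
        transit_curr = recv_times_ms[i] - send_times_ms[i]
        # How much the delivery time changed between the two packets.
        d = abs(transit_curr - transit_prev)
        # Smooth the estimate with a gain of 1/16, as RFC 3550 does.
        jitter += (d - jitter) / 16.0
    return jitter


# Hypothetical samples: packets sent every 20 ms.
sent = [0, 20, 40, 60]
steady = interarrival_jitter(sent, [50, 70, 90, 110])  # constant 50 ms transit
choppy = interarrival_jitter(sent, [50, 95, 92, 140])  # transit varies
```

A constant delivery time gives an estimate of zero no matter how large the latency is, which is the point made above: consistent delay is easier on a playout buffer than variable delay.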

What we're looking at in this graph, anyway, is jitter for the 3 sites. We can see our average jitter is around 1.25 milliseconds. As long as we're below 30 milliseconds or so, we're probably doing pretty good. Then there is packet loss for the 3 links, and you can see there is no packet loss. We're probably doing pretty good.
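
The rules of thumb mentioned in the talk (roundtrip latency under about 250 milliseconds, jitter under about 30 milliseconds, no packet loss) can be written down as a simple check. This is a sketch only: the class, the field names, and the sample roundtrip value are hypothetical, and the limits are the speaker's rough guidance rather than SevOne or Cisco defaults.

```python
from dataclasses import dataclass

# Rule-of-thumb limits quoted in the talk; not vendor defaults.
MAX_ROUND_TRIP_MS = 250.0
MAX_JITTER_MS = 30.0


@dataclass
class LinkSample:
    name: str
    round_trip_ms: float
    jitter_ms: float
    packet_loss_pct: float


def link_problems(sample):
    """Return the names of the metrics that break a rule of thumb."""
    problems = []
    if sample.round_trip_ms >= MAX_ROUND_TRIP_MS:
        problems.append("latency")
    if sample.jitter_ms >= MAX_JITTER_MS:
        problems.append("jitter")
    if sample.packet_loss_pct > 0.0:
        problems.append("packet loss")
    return problems


# The 80 ms roundtrip is a made-up value; 1.25 ms jitter and 0% loss
# are the figures read off the dashboard in the talk.
healthy = LinkSample("Newark-Chicago", 80.0, 1.25, 0.0)
degraded = LinkSample("New York-Chicago", 80.0, 45.0, 1.0)
```

Here `link_problems(healthy)` returns an empty list, while `link_problems(degraded)` flags jitter and packet loss.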

The next 3 graphs are measuring the utilization of our QoS queues between these sites. What we can see today here is that while New York's had some phone calls, it looks very quiet in terms of voice between the other 2 sites, probably a holiday week, not a whole lot of people working in those sites. We can measure the queue utilization from 0 to 100% to understand how full our voice queues are. Are we in danger of overrunning them?

If I go down, this is a tabular view of a few of the different IP SLA tests. These are again using a router to call a router. We're looking at the time it took the call to establish, as if you, the end user, were picking up the handset on your phone. Again, we want to see about 250 milliseconds or less. We've ranked these from slowest to fastest. We'll see here that the slowest is about 142 milliseconds, well below 250 milliseconds. We can see our voice network is doing its job.

Then lastly we have a NetFlow graph looking at our QoS queues. We know that our QoS queues should be all UDP traffic. We're looking to make sure there are no TCP flows flowing through our voice queue, and we're also looking for consistent DSCP, or differentiated services code point, markings. If we see inconsistent markings, maybe a mix of EF (46) and 0, we know that the traffic is not consistently being marked with QoS markings through the network. Here we're looking at a nice clean queue. All our markings are the same. Our bandwidth utilization is what we would expect it to be.
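
The two checks described here (no TCP in the voice queue, and one consistent DSCP marking) could be sketched over exported flow records as follows. The record layout, a dict with `protocol` and `dscp` keys, is an assumption made for illustration and is not SevOne's NetFlow schema. The protocol numbers (6 for TCP, 17 for UDP) and the DSCP values (46 for EF, 34 for AF41) are the standard ones.

```python
from collections import Counter

UDP = 17  # IANA protocol number for UDP
TCP = 6   # IANA protocol number for TCP


def audit_voice_queue(flows):
    """Check flow records seen in a voice queue.

    `flows` is an iterable of dicts with 'protocol' (IP protocol number)
    and 'dscp' (0-63). This layout is hypothetical.
    """
    flows = list(flows)
    non_udp = [f for f in flows if f["protocol"] != UDP]
    dscp_counts = Counter(f["dscp"] for f in flows)
    return {
        "non_udp_flows": non_udp,
        "dscp_values": sorted(dscp_counts),
        # Consistent means every flow carries the same marking.
        "consistent_marking": len(dscp_counts) <= 1,
    }


# A clean queue: all UDP, all marked EF (46).
clean = audit_voice_queue([
    {"protocol": UDP, "dscp": 46},
    {"protocol": UDP, "dscp": 46},
])

# A leaky queue: TCP flows marked AF41 (34) or not marked at all (0).
leaky = audit_voice_queue([
    {"protocol": UDP, "dscp": 46},
    {"protocol": TCP, "dscp": 34},
    {"protocol": TCP, "dscp": 0},
])
```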

One of the things about SevOne as I mentioned is our speed of reporting. All of these graphs were from today. We can change that. We can actually say you know what? I want to change the timespan on this report to the past 7 days. Now we're going to get a 7-day view of this to take a little broader look to see did we have any violations over the past week. We can see here that there was 1 queue that had a violation. If we look down in here, we'll see that, wow, we hit for a little bit of time 152% in our voice queue from Chicago.

SevOne has the ability to update these dashboards in real-time. We also give you the ability to work with the dashboards in real-time. I can actually click on this, and I can zoom in to what happened. I can see here that my queue utilization went up on Monday, the 31st, at 11:54 and stayed up for about an hour. We also automatically baseline all the metrics that we track. Looking at these invites in this voice queue you can see that we have a spike up here, but we also notice that our baseline shows us that about that time every week we do have an increase in traffic.
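
The talk does not describe how SevOne computes its baselines. As one simple way to picture a baseline that knows "about that time every week", the sketch below averages past samples by hour of the week and reports how far a new sample sits from that average. The function names and the sample history are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def weekly_baseline(history):
    """Average past values per hour-of-week slot (0..167).

    `history` is an iterable of (hour_of_week, value) pairs.
    """
    buckets = defaultdict(list)
    for hour_of_week, value in history:
        buckets[hour_of_week].append(value)
    return {slot: mean(values) for slot, values in buckets.items()}


def deviation_from_baseline(baseline, hour_of_week, value):
    """Return value minus baseline for the slot, or None without history."""
    expected = baseline.get(hour_of_week)
    if expected is None:
        return None
    return value - expected


# Made-up history: slot 11 is Monday 11:00, which is busier than slot 12.
history = [(11, 40.0), (11, 60.0), (12, 10.0)]
baseline = weekly_baseline(history)
```

With this history the baseline for slot 11 is 50.0, so a reading of 152.0 in that slot is 102.0 above normal, while a slot with no history returns `None`.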

The next thing that happens is usually people want to know what was doing that, what caused that spike in traffic in my voice queue. SevOne has the ability to do what we call chain. We can take the output of one graph and make it the input of another graph. In this case what we want to do is take this router, this interface, and this time period that had this traffic and send it over to our NetFlow engine. We can see in a new NetFlow graph we have some other traffic in there. Before, when we looked at just today and not at Monday, when the spike happened, we had all UDP traffic in there. For that past Monday, we can see that we had a lot of different types of TCP traffic in there.

Obviously something on Monday was misconfigured that allowed TCP traffic to get into the voice queue. We can see inconsistent markings as well. We have this guy talking to that guy on this particular port, and his traffic is marked with a differentiated services code point of AF41, while other traffic driven to this guy has no marking whatsoever. We're not marking our traffic consistently, and somehow this traffic has actually leaked its way into our voice queue. That's the benefit of having the ability to take an SNMP measurement, a volume of traffic, and display the corresponding traffic in NetFlow.

One of the other things we can do with these dashboards is generate a PDF report. We can come up here and say export to PDF. Maybe I need to give a PDF report to my boss or to someone who doesn't feel like logging into this system. Basically you get a what-you-see-is-what-you-get view of the report. Then I can go through and show people that yes, we did have a spike there, and the spike was definitely caused by this TCP traffic that was going through that particular queue on that particular day. There are a lot of different things you can do around manipulating voice and understanding the performance of voice on your network.

I have some other voice reports too. One of the other ones we were looking at was voice calls versus available bandwidth. What I have done is I've used the SevOne integration with the Cisco call manager to understand calls by site. How many calls am I making to Chicago, and how many calls am I making to New York? Right beside that I put the QoS queue volume for those sites as well.

The question a lot of times is: hey, we're going to add more phones to a remote site. Do we have enough bandwidth? Do we have enough queue capacity to handle additional calls? This is a lab environment. You can see that there are only a couple of calls. We can see that at one point we drove our utilization to about 12%. If someone asked, "Dave, could we put 20 more people in Chicago?", I would say sure, because I have the ability to look and say, hey, over the past week there haven't been very many calls, and they've used just about this much queue space. In this case, 12% one time on my New York. Look, it's 2.4% of the queue space I have that I can use. I can pretty easily answer the question of whether I could have more people at this remote site.
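
The reasoning behind "could we add 20 more people" can be sketched as a back-of-the-envelope calculation. This assumes that queue utilization grows linearly with the number of concurrent calls, which is a simplification, and the function name and the call count used in the example are hypothetical (the talk only says there were "a couple" of calls).

```python
def estimated_additional_calls(peak_calls, peak_util_pct, max_util_pct=100.0):
    """Estimate how many more concurrent calls fit in a voice queue.

    Assumes each call consumes the same share of the queue, so the
    per-call cost is peak utilization divided by peak concurrent calls.
    """
    if peak_calls <= 0 or peak_util_pct <= 0:
        raise ValueError("need a peak with at least one call and some utilization")
    per_call_pct = peak_util_pct / peak_calls
    headroom_pct = max_util_pct - peak_util_pct
    return max(0, int(headroom_pct // per_call_pct))


# Hypothetical reading: 3 concurrent calls drove the queue to 12%.
room_for = estimated_additional_calls(peak_calls=3, peak_util_pct=12.0)
```

Under those assumed numbers each call costs 4% of the queue, leaving room for 22 more. Passing a lower `max_util_pct` keeps some safety margin instead of planning to fill the queue completely.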

We use voice as one way of showing how you can measure an application. Voice is the application at this point. A lot of times the question is how many more users could I add to application xyz. The process would be very much the same. The source of the data might not be call manager. It might be a synthetic transaction from IP SLA, or it might be monitoring a server, or it might be even adding NetFlow on a particular port for that application.

The same dashboard concept would be there. I would be able to say yes, I see this much application traffic. It's consuming this percentage of the voice queue or the SAS queue or whatever queues you set up on your routers, and I can confidently say that you could add another 10 or 20 or 50 people based on the fact that you're only using x% of your voice queue. This is just another way we show some data put together.

Then the last piece is your ability to look at the health of the call managers themselves. I want to monitor the servers that run this application called voice. What we're looking at here is 2 servers, what Cisco calls a publisher and a subscriber. There's a primary and a backup. I'm measuring the CPU in this graph on the two. I'm looking at CPU for my primary and CPU for my secondary. In the center here I'm watching memory utilization for my primary and my secondary.

Here I'm actually looking at the throughput of the Ethernet port on those servers. I've put together a group to alert on. If there are any performance alerts, I would see those in there. Then I'm looking at things like calls handled by the call manager and the number of phones registered. We can see I have 12 phones active on the primary and 2 phones active on the subscriber. Then I have this thing that they call a heartbeat. The key here is that the heartbeat happens every so often. What you'll see here is I'm baselining the heartbeats. I want to know when the communication between my primary and my backup is not normal, when it's not regular. That's how I measure versus baseline.

We can see here that over today I'm averaging 0.5 heartbeats per second, and my baseline is 0.5, and it's the same thing for my subscriber. I know they're right on normal. They're having no problem communicating with each other. I'm also looking at a number of PBXs in my environment, and I'm glad to see that there have really been no issues with those PBXs over the last 24 hours.

Lastly I have a list of phone calls that have been made just recently. I can see that this person at this extension with this IP address called someone. The call was 1 minute 36 seconds. It had an average of 1 millisecond worth of jitter, but the MOS score looks very low for some reason. I might want to investigate that call. Somebody else called from this extension. Their conversation was 1 minute 14 seconds. They had no jitter really to speak of. They had a nice high R factor and a calculated mean opinion score of 4.46. These guys probably had great calls. We might want to look into why this guy had an almost inaudible call. We'll probably want to investigate that.
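
For readers wondering how an R factor relates to a calculated mean opinion score, the sketch below uses the conversion published with the ITU-T G.107 E-model. The talk does not say which formula the phones or the call manager actually use, so treat this as background rather than a description of the product. The function name is hypothetical, and clamping the result to a minimum of 1.0 is a choice made here for tidiness.

```python
def r_factor_to_mos(r):
    """Convert an E-model R factor to an estimated MOS (ITU-T G.107 style)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    mos = 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6
    # The polynomial dips slightly below 1 for very small R; clamp it.
    return max(1.0, mos)


# R = 93.2 is the commonly quoted default for a clean narrowband call.
default_mos = r_factor_to_mos(93.2)
poor_mos = r_factor_to_mos(50.0)
```

On this curve the estimate tops out at 4.5, and an R factor of 93.2 maps to roughly 4.41, so a calculated score in the mid 4s goes with a high R factor.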

Again, I'm showing you now from a server perspective how the servers are running, and then from a call detail perspective. SevOne addresses call detail not from a perspective of billing. I'm not really totaling up how many people called how many people. I'm actually just taking a look at how many people had a call that was good or bad. It's really a tool to understand the quality or the performance of the VoIP network as it relates to an end user. These are actual stats that came from the phones themselves. They give you a little better idea how that might've worked or what the quality was like.