Monitoring F5 Load Balancers


Join SevOne's Dave Hegenbarth, SE of Global Strategic Alliances, as he discusses F5 and demonstrates SevOne's ability to monitor load balancing infrastructures. SevOne technology helps visualize many key performance metrics; learn more about the process and how SevOne can address specific needs.


Today we're going to talk about monitoring your F5 LTM deployment using the SevOne application. Some of the key things we should look for in a load balancing environment, how we track those over time, and give you the ability to visualize and alert on any performance problems you might have within your load balancing infrastructure. As always I have about two slides before I begin my demo just to show you a little bit about who SevOne is and how we do great things in the world.

SevOne is the leading performance monitoring solution today in terms of next generation solutions. What's next generation about SevOne really? It really is this, our clustering technology. We've solved the ability to make very fast reporting, real time reporting, even when your environment's very, very large. We did that with this clustered architecture. The key thing to know about SevOne is that we ship as an appliance. Each of the appliances is the poller, it is the database, it is the reporting engine that builds this very fast report, and it comes all wrapped up in one very nice appliance. That appliance can be physical or that appliance can be virtual. We take care of the hardware, the software, everything that goes on in there. We take care of it and let you manage your network.

What do we monitor? Today we're going to talk about F5 and load balancing, but we monitor a lot of things in the IT environment. We started out as SNMP and ICMP. We moved into Windows server monitoring with WMI. We have the ability to pick up virtual server performance with VMware, cloud based applications and application response time, Voice over IP, all kinds of great things. We pull all of those statistics including your load balancing into a dashboard so you can make quick decisions in real time about how your network is performing, how load balancing is performing, etc. We also have an open API that ships with our product that allows you to give that information back. You can export alerts or you can export data. You can import things from config management, etc., to update your appliance. Lastly, we have a mobile app you can download from either the Apple Store or Android and put on your phone to understand the performance of your network in real time on your mobile device.

I set up a little drawing here just to understand where we are. I have a very simple F5 load balancing solution set up here. As you can see, folks come into my demo server. They go through a firewall. They hit my virtual server at the address of 192.168.50.18 and that load is split between two other SevOne servers, one's a primary, one's a backup. Either is capable of doing reporting. All of that happens across my management network. With that, we're going to scoot on over to the demo and I'll show you a little bit about how we're able to pull stats from our F5 appliance.

I built an F5 dashboard here. We can talk a little later how this dashboard came about. It's a very easy wizard click and drag kind of thing to build these dashboards. What I've done is I've taken that green bar at the very top of the health status. That health status you can see is made up of 39 different metrics that are monitoring about my F5 LTM box. My F5 happens to be an image running on a virtual server in VMware. You saw the drawing there, I'm load balancing a couple of servers. Currently all of the different metrics about the F5, CPU, memory, network statistics, and I'm going to show you some of those in graphs right below here, are all running within the thresholds that I've defined for the F5 box. If there were any performance violations, we would see them here in the real time alerts report.

What are some of the things that I'm measuring and alerting on? The first and foremost is I'm taking a look in this graph here about my connections to the virtual server. I've directed folks to my demo server through this 50.18 VIP or virtual IP address or virtual server depending how you visualize it in your mind. These are the connections or connections per second that are hitting that particular VIP. We can see here we're bouncing around 10 or so. What you might not be able to see unless you look very closely is this square bracket line back here. You'll see it coming across here as well. This is what we call our baseline. For every indicator that SevOne tracks, we track a baseline. Whether it's in-bytes, out-bytes, virtual connections, temperature, voltage, whatever in your IT environment, disk utilization, IO rates, whatever it is, we build a baseline or an understanding of normal such that you could alert if something is not normal. We'll see here that my connection rate, although it's maxed at 10 connections per second, we could see that that is just above normal today but not so significant that I probably would want to throw an alert.

It's important to know that the baselines in SevOne are turned on for you. Every time I add a device, I add some KPI or indicator into SevOne, we automatically start tracking that baseline or understanding of normal. This allows you to set thresholds or visualize things over a period of time and be assured that the baseline's there. A lot of products you have to turn this on. A lot of times when we turn it on is after something bad has happened. That's way too late. We want to have those baselines from the beginning so we understand normal behavior.
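To make the idea of a baseline concrete, here is a minimal Python sketch. It is not SevOne's actual algorithm, which the talk does not describe. It assumes a baseline is simply the mean and standard deviation of past samples that fall in the same hour-of-week slot, and it flags a new sample when it sits more than a chosen number of standard deviations from that mean. The function names, the 168-slot scheme, and the `max_sigmas` parameter are all illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev


def hour_of_week(ts: datetime) -> int:
    """Map a timestamp to one of 168 slots (Monday 00:00 is slot 0)."""
    return ts.weekday() * 24 + ts.hour


def build_baseline(samples):
    """samples: iterable of (datetime, value). Returns {slot: (mean, stdev)}."""
    by_slot = defaultdict(list)
    for ts, value in samples:
        by_slot[hour_of_week(ts)].append(value)
    return {slot: (mean(vals), pstdev(vals)) for slot, vals in by_slot.items()}


def deviates(baseline, ts, value, max_sigmas=2.0):
    """True if value is more than max_sigmas standard deviations from normal."""
    slot = hour_of_week(ts)
    if slot not in baseline:
        return False  # no history yet for this slot, so nothing to compare to
    avg, sd = baseline[slot]
    if sd == 0:
        return value != avg  # perfectly flat history: any change is unusual
    return abs(value - avg) > max_sigmas * sd
```

Because the baseline is built from whatever history exists, it only helps if collection starts when the indicator is first added, which is the point being made above about having baselines on from the beginning.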

Another neat thing about our dashboards is that they are all interactive. If I want to zoom in to a particular time frame, I can do that. If I want to zoom out to a particular time frame, I can say, you know, I'd really like to see how this performed over the past four weeks or a month. Just that quickly, I'm going to be able to draw the graph for a month's worth of data. SevOne is a very fast reporting engine giving you either real time stats as current as the last poll that we had or giving you that month long view of data going back very quickly.

In the graph to the right here, what you'll see is a bar graph. I'm tracking 3 really important things for a web based application. If I put them in the order of the way they happen, I'm tracking my DNS time, so the time it takes for me to resolve Sev demo to its IP address is very important. I'm also tracking the time it takes to make the initial TCP connection to the particular website. Lastly, the green bar is the amount of time it takes to pull down the front page of this particular website. What we can see here is that DNS time and initial connect time are very short, and the actual time it takes to pull down the page measured in milliseconds is hanging around an average of 13.5 msec and we had a peak here at 7.8 which was right around today at 5:30 this morning. That peaked right up on to about almost 50 msec. Obviously my server is pretty close to my poller. These particular stats come from Cisco's IP SLA measurements that I've set up to go through the VIP to reach it. That's the bar graph on the right.
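The three timings in that bar graph can be illustrated with a short Python sketch. In the demo the numbers come from Cisco IP SLA probes; the code below is a stand-in that uses only the Python standard library to time the same three phases in order (name resolution, TCP connect, page download) for a single plain-HTTP request. The function name `time_http_phases` and its parameters are assumptions for illustration, and it does not handle HTTPS or redirects.

```python
import socket
import time


def time_http_phases(host, port=80, path="/", timeout=5.0):
    """Return (dns_ms, connect_ms, download_ms) for one plain-HTTP GET."""
    t0 = time.perf_counter()
    # Phase 1: resolve the host name to an address.
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    t1 = time.perf_counter()

    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        # Phase 2: the initial TCP connection.
        sock.connect(addr)
        t2 = time.perf_counter()
        # Phase 3: request the page and read until the server closes.
        # HTTP/1.0 is used so the server closes the connection when done.
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        while sock.recv(65536):
            pass
        t3 = time.perf_counter()
    finally:
        sock.close()
    phases = ((t0, t1), (t1, t2), (t2, t3))
    return tuple(round((end - start) * 1000, 3) for start, end in phases)
```

Pointing a probe like this at the VIP rather than at an individual back-end server is what makes it a measure of the user's experience through the load balancer.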

Scrolling down, I have some more statistics about how my F5 is performing. I've broken these out into in things and out things. On the left here, you can see in virtual server, these are in-bytes, in-packets, in-PVAs if I had them and out-PVA's. The same thing on the outside, out-bytes, out-this, out-that, just good statistics to know about how things are moving through the F5 LTM. I've again included the baselines to understand where I am in regards to normal. Some statistics just about the appliance itself. We're tracking CPU, so we can see our CPU utilization for both of the actual physical CPUs in the box. I could have also selected the virtual CPUs as well. Right now, just looking at physical stats, we can see physical CPUs here.
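Statistics like in-bytes and out-bytes are typically exposed over SNMP as ever-increasing counters, so a graphable rate has to be derived from two successive polls. The sketch below shows that arithmetic in Python. It is a generic illustration, not SevOne's poller code; the function name and the single-wrap assumption are mine. A counter that went backwards is treated as having wrapped once, which cannot be distinguished from a device reset using these two samples alone.

```python
def counter_rate(prev_value, prev_time, curr_value, curr_time, bits=64):
    """Per-second rate between two polls of an ever-increasing counter.

    Times are in seconds. bits is the counter width (32 or 64).
    """
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("polls must be in increasing time order")
    delta = curr_value - prev_value
    if delta < 0:
        # Assumption: the counter wrapped exactly once between polls.
        delta += 2 ** bits
    return delta / elapsed
```

For example, `counter_rate(1000, 0, 4000, 300)` gives 10.0 bytes per second for a five-minute poll interval; multiply by 8 for bits per second.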

Memory, we can just visually see that memory's tracking down just a little bit. If we want to know that's normal, again we can come in and say how has this run over the past 48 hours or how does this run over even the past 7 days. We can see that memory tends to hit a certain point and then gets freed up and used for a while and then some sort of job runs, frees up our memory, and we begin to use it again. We can see that this is a pretty normal event. Lastly, disk utilization. We can see that I'm caching a lot. This is a lab so our disk utilization is pretty flat. For both of these disk partitions we can see the disk utilization. Again, you might want to do that over time and you can come in here and say well that's fine for today but what did it look like over the past 7 days and you would see it there.

Next what I've done is I've added a graph here that is generated by taking flow data. The flow data in this particular graph is coming from my router, the router interface closest to the VIP. As traffic is passing through, what I said was please only graph the VIP traffic or the traffic destined for this particular IP address. I'm filtering all the flows I'm getting from the router down to my VIP address. I can basically see that there's this one test client, again I'm in my lab here. My one guy here, Bob Parsons, he's hitting the website pretty hard to the tune of a little less than 2 megabytes per second. SevOne gives you the ability to map flow data or bring flow data into a report just like we do data collected via SNMP or WMI or VMware in order for you to better understand how your environment is performing. We can see here we have CPU running in this top left graph, it's running around 5%. We can come right down and say, you know what, Bob is generating most of the traffic that's going to this particular VIP. If there was a big spike in CPU, we probably could look here to see if we had somehow added 1000 more users or something like that and we'd be able to understand that.
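The filtering step described here, keeping only flows destined for the VIP and then seeing who is sending the most, can be sketched in a few lines of Python. The `Flow` record and its field names are hypothetical and chosen for illustration; real NetFlow or sFlow records carry many more fields, and this is not how SevOne implements the report.

```python
from collections import Counter
from typing import NamedTuple


class Flow(NamedTuple):
    """A simplified flow record (hypothetical field names)."""
    src: str
    dst: str
    dst_port: int
    num_bytes: int


def top_talkers(flows, vip, n=5):
    """Sum bytes per source for flows destined to vip; return the top n."""
    totals = Counter()
    for flow in flows:
        if flow.dst == vip:  # keep only traffic headed for the VIP
            totals[flow.src] += flow.num_bytes
    return totals.most_common(n)
```

Called with the router's exported flows and `vip="192.168.50.18"`, this returns a list of `(source, total_bytes)` pairs, largest first, which is the same shape of answer as the single-client graph in the demo.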

We also have the ability to drill into this traffic. We know it's HTTP because it is, but we do have the ability to drill in and see each of the sessions that Bob may have been doing to this particular VIP. We can see here that yep, it is Bob. He's hitting port 80. He has some random different ones and each of his sessions is one-point-something megabytes or a couple hundred packets in length. It does give us a good way to describe the traffic that might be going through any particular VIP we are using. Then I've actually embedded a map or that drawing that you saw before. I've laid on top of it the status of different devices. As I come in I can see that my firewall status is looking pretty good or my switch actually I guess there. I can see that my core router has a bit of an issue. I can look here and see that happens to be that VLAN 11 is greater than one standard deviation, probably not relevant to us because we're not running on VLAN 11, but that is the status of that particular switch.

I can see my VIP here. I can go in and even take a look at some of the stats for this particular object. If I go to an object quick summary real quick, I can see that this is my HTTP poller or the test I have set up for testing this particular VIP, because the VIP's not a real thing, it's this virtual thing. I can see it's been available all day. I can see the download byte rate. I can see if there's cert errors. Any error statuses reported over the time period, number of retries. Looks like a little earlier we had some, actually that was probably yesterday. Time statistics. Again these are the time for look up, time for connect, time for DNS resolve, etc. I see server availability. That was today. We might want to see it over the past week and so we could redraw very quickly into the past week and see the retries, server availability. Looks like we had two little dips, very short dips, this week, etc.

I talked a little bit about how I built that dashboard. One of the ways to build a dashboard is to do what we call instant dashboard. We can do an instant dashboard simply by clicking on any one of these particular graphs. It opens this up right into a dashboard that I can save a little bit later. If I come back here, I can click on bytes downloaded. Maybe I want to do retries, server availability and time statistics. I've clicked on each of these, nothing you could see was happening, but over here those come into my dashboard. I can put them together however I wish. I can drag and drop in kind of a three-column layout. From there I can come up and save this. I could say this is my F5 health report or something like that. I may as well describe it the same way. I can share it with other SevOne users. If I mark it private, I could then come in and pick who can see it or I can just not check private and then everybody using the SevOne application on this particular server will be able to see it.

Lastly, if I want it emailed to me, I can say you know what, I'd really like this to go to I can pick, probably every hour is a little excessive, maybe once a week I want to see this and I want it on every Monday really. I'm going to do the 16th and I want it at 8 a.m. because every Monday morning I want to come in and I want to see how the performance of my F5 server ran over the past week.

Having done that, most of the graphs in here, let's see they're all past week so we're all good there too. I was about to say you can actually change an entire dashboard. I could come in here and say I want to see the past 2 hours and I'll redraw everybody to the past two hours or I can do it. You can see that I can change my time in lots of different places. I have the ability to bring together lots of different pieces of information in a single dashboard. Then lastly I can roll this all up. If this was going to come to me in a report, what you would see is my PDF in my email and it would look a lot like this. I have my health on top. I have the graphs and charts that we've been looking at, it just happens to be in a PDF form. This is great for me getting a report in my inbox. It's also great if I need to share information with someone who may not have access to the application. I created that by coming up here and clicking on the create PDF button. It was export as PDF and away I go and I got my PDF, my what you see is what you get PDF report.

Lots of different ways to get information. Most of the information that I am pulling here from my F5 box is all SNMP based. One thing to mention about the flow data, right now this flow data is coming from my router. As of 11.4, in version 11 for the F5 box, they have introduced the ability to generate NetFlow, or actually sFlow, which we can consume. I'm looking forward to upgrading this box to that version so I can do that. Then they'll be able to tell you both flow statistics, so who was talking for how much. You will also be able to get CPU, memory and some other statistics in that flow based form. Then I won't need to rely on my external router that feeds this particular VIP, but I'll be able to get it straight off the LTM appliance itself.

A lot of different ways that we can look at data in the SevOne product. We saw a little bit about how to build that instant dashboard to very quickly be able to report on the health of our LTM environment. As I mentioned, SevOne is built to scale, so whether you have one VIP or 1000 VIPs, your report speed runtimes are going to be that very quick web-based runtime that you felt and saw in this particular demo.

That's really all I had for today. I'm going to open up for questions. I think we're going to do those via chat. Is that correct?

Speaker 2:
Yeah. No one has submitted anything yet.

I would urge folks to send a chat either to Host or to SevOne Marketing or Marketing SevOne in the chat box there and I'd be glad to answer any of those. Sometimes questions come in around supporting multiple VIPs, multiple LTMs, and as I mentioned we scale very nicely. Sometimes questions come in around monitoring response time. The new 11.3 introduced a response time console within the F5 LTM. I think it's AVR. The AVR response times we're working to get into our product right now. They are not available via SNMP. They are available via the F5 GUI. They are available via some TCL scripting that I'm doing. Right now, that's the way to do that.

F5 has told me that there will hopefully be a RESTful API that we could call to pull in some of those stats. The one stat that folks like to see is how much time did it take to get from here to there. It's fine to understand that there's 1000 connections. Those 1000 connections take 100 megabytes per second. That still really doesn't tell you about the experience. One way to view the experience is as I'm measuring here on the right. In my graph right here, these are my HTTP response times. I've gone out and I've gotten a web page and I can tell you what that looks like. The LTM will also be able to produce these stats and we will pull those into SevOne here in the near future.

Okay guys, we've got a couple questions coming in. I'm just going to take a look at them for a minute, I apologize for the silence. The first question we had was, are the F5 VIPRION 2400 models supported? That I'm not sure of just yet. I haven't had a chance to really converse with someone who had them, although I would think what we're going to see is the VIPRION 2400 will be a chassis that we'll be able to monitor and then the LTM on top of that. I could probably take that one offline and get back to you.

Another question we had, is there no baseline for CPU utilization? Actually there is. There is a baseline for everything we monitor. In that particular graph I had not turned it on. You can choose to visualize or not visualize baselines, but absolutely we do have the visualization of baselines. All baselines are always on. Your choice whether you see them in the graph, but we're always generating that understanding of normal.

Let me just take another look here. The question we had is, is there any experience with SNMP reliability? We have seen in the newer current versions, the 11 series software, that it seems to be pretty good. Some of the older models, I know SNMP always ranks low in the processing order of things. We want to pass packets and load balance things first. We've seen with the version 11 stuff that we have a much better hit rate or reliability rate with polling with SNMP.

A couple other questions here. Can the data generated by SevOne be pushed to a third party dashboard such as CA Nimsoft? Yes. Data can be extracted. In the slide I had early in the presentation, you saw the inbound and outbound things we could do. You could actually take the API and use that to build graphs in another portal, might be your corporate portal, might be whatever. Anyway, yes you absolutely can take data out. You could do it in the form of a URL or you could actually extract the data points and graph them with another tool using the API.

Let's see if I have one or two more questions here before we go. The question is around the topology that I had, the map that I showed there. That's created by hand. That was actually a drawing done. I simply imported it as a PNG and then I placed my devices on top of that. The little green/red/yellow dots I have on top of the icon or the picture to draw that. Then you get the ability to wherever I place the device or an object, the health of that object comes through. If there's a violation in performance of some sort, you see that turn yellow.

A great question here. Capability on top talkers per VIP? Absolutely we could do that in a couple of different ways. The first is we have a TopN capability I did not really represent in this dashboard. We could just do our TopN VIPs. From that we could actually do, using Flow, we could see our top talkers per VIP. In fact, that's what I'm showing in my dashboard where I have flows. Unfortunately, Bob Parsons happens to be the only person using this particular VIP, so he is the top talker, but absolutely you could represent the top talkers per VIP using flow data. That could be from the, as I said, 11.4 will give us the flow data from the F5 or you could process flow data from your routers filtered onto VIP addresses.

The last question was around the SevOne version needed for full support of F5. We at SevOne have been fully supporting F5 probably since 3.26, which is about 5 to 5-1/2 years ago. There are different SNMP certifications for each, well, a lot of the F5 versions. If you are running SevOne in a version that may be a little older, those SNMP definitions can be imported into any version.

Here at SevOne, we disassociate our SNMP definitions from our versions. You could have a very new version of SevOne and all the new definitions of SNMP, you could have a very old version and have the old definitions, but you can always take our SNMP definitions and load them into any version. They're not tied together. We don't upgrade SNMP definitions necessarily when we do upgrades. If you're missing something or would like something new added to the version that you have, whether it's old or new, we do that as part of an SNMP device certification. That is independent of the version of SevOne you're running. If there's something in an F5 box that you don't see that you would like to see, please reach out to SevOne support. They can give you or create you, as part of your maintenance contract, a new definition.

I think that's about all the time we have today. We've run 2 minutes past our stop time. I really appreciate everyone taking the time out of their day to join us here at Demo with Dave. I am Dave. If you have any questions and you want to reach out to me, I have an email that is I'd be glad to answer any of your further questions and look forward to seeing you guys on another Demo with Dave. Thanks.