Data Center Virtualization Management
Join SevOne's Manager of Product Marketing, Scott Frymire, and EMA's Managing Research Director, Jim Frey, as they discuss best practices and strategies regarding virtualization. Learn how organizations can greatly benefit from SevOne's complete performance management solution for virtualized environments.
Good day to everybody and thank you for joining our webinar today on network performance and consolidating virtualized environments. I appreciate you taking time out of your busy day to join us. The goal of the session today is to talk about some of the best practices and best ways to manage your consolidated and converging virtualized environments, not only from a best practice and strategy perspective, but also from the perspective of the monitoring tools that you use to monitor these environments.
My name is Scott Frymire. I am manager of product marketing with SevOne. My co-presenter today is Jim Frey, managing research director with EMA. Jim is going to come on in just a bit. I’ll introduce him in a few minutes. He's going to be talking a lot about those best practices and strategies and the key challenges around virtualization. Then I'll be coming in throughout the presentation to lend further support, talking about the monitoring tools, and specifically how a tool like SevOne provides a complete management, IT management solution over your hybrid physical and virtual environments.
For the agenda today, we're going to start off talking about network performance management in virtual environments: what's the big deal, and what are some of the key challenges that organizations like yours are facing today in regard to managing the virtual infrastructure. There have been a lot of questions on how virtualization is impacting performance management practices. We're going to address some of the trends in this area.
Then next we’re going to touch on convergence and management. In other words, convergence that is happening around different management domains, functions, and applications, and what opportunities exist to better manage this consolidation of platforms. Then finally we'll talk about advanced analytics, specifically advanced performance management analytics, what do they mean, what are they, and why are they important. We're going to take a look at some practical examples of how this applies in the real world.
Of course we’ll wrap up with a Q&A at the end. If you have any questions along the way, feel free to chat them in using the WebEx Q&A panel. You should see that on the right hand side of your screen. Go ahead. We're going to have everybody muted throughout the call. We do have quite a large audience today and it could get a little congested if we open up the phone lines for everybody. You just want to use the Q&A panel on your right hand side. Submit any questions you may have along the way. We will try to address them. Again, we’ll have a Q&A at the end, and we hope to get to everybody's questions. If by chance we're not able to get to your question we will certainly follow up with you as soon as the webinar is over.
Before we get started I'd like to give you a quick 60-second pitch on SevOne, something I would typically save for the end of the presentation. As I mentioned earlier, we're going to relate back to SevOne throughout the presentation as a solution that can help you address the challenges of monitoring virtual environments, so it's probably good that you understand upfront what SevOne does in a nutshell, the quick elevator pitch so to speak.
SevOne does provide end-to-end IT performance management. We're talking about a solution that specializes in monitoring and reporting around your networks, applications, systems, in both the cloud and virtualized environments and hybrid environments. It’s certainly important to the topic today. I think the point is you never know what might go wrong within your IT infrastructure, so having a solution that allows you to monitor everything end-to-end without the need for additional resources is certainly a preferred path to take.
SevOne is an all-in-one appliance-based solution. That means that the monitoring, the data collection, the analyzing, and the reporting around your infrastructure are all done by a single machine. There are no additional pollers and no additional hardware to implement. Everything is out-of-the-box ready to go. If you're monitoring and collecting data on SNMP or NetFlow, NBAR, WMI, whatever it may be, it's all included out-of-the-box.
How do we typically differentiate ourselves within this market? There are certainly other IT and network performance management solutions out there. We believe SevOne excels in three areas: speed, scale, and simplicity, what we call the three S's.
From a speed perspective, if you talk to our customers I think a lot of them will tell you that we have the fastest IT reporting on the market, bar none: reports that are generated in seconds, regardless of how big your monitored domain is. We literally have customers that monitor millions of elements and calculate trillions of baseline analytics, and the reports are generated in real time, in seconds, not hours or even days.
Scale is another area where we excel. A single SevOne appliance, which I just showed you, can monitor up to 200,000 objects in your infrastructure. If you extrapolate that out you could potentially monitor over four million objects from a single server rack. We are by far the most scalable solution on the market today.
Simplicity, again, it's an out-of-the-box inclusive all-in-one solution. There's no additional hardware, no additional software that needs to be installed. It's an agentless solution, so we’re not having to deploy additional SevOne agents. You don't have to worry about additional licensing or security updates or anything like that. It’s a very simple solution to deploy, it literally deploys in minutes onto your network and you can get useful information reports back right away. That's SevOne in a nutshell, just so you have that frame of reference as we move forward.
Now I'm going to turn things over to Jim Frey from EMA. Jim, thank you for joining us. I see you were able to successfully dig yourself out of the snowstorm that hit the northeast region of the country just this weekend. We're glad to have you on the call.
Thanks Scott. Great to be here, and thanks everyone for joining. Yes, it's just an average little winter snowstorm for us up here, but no problems. Fortunately electronic communications are always there at the ready. We have to keep them performing well, and that's part of what our topic is today. We want to talk about the specific adaptations and changes and challenges around adapting network performance management to deal with virtualization in our infrastructures.
I'm going to start by talking a little bit about the background that I've gathered in research. I've been with EMA as a research analyst for about four years now. I've been tracking virtualization in the compute tier and the effect that it's having on network management tools and technologies and practices all along. This is built on the balance of my 20 years of experience in the network management sector, watching how the practice has evolved in parallel with how the technologies have evolved. I think it's a very timely topic. Hopefully we can pass along some good wisdom to folks on the call today.
First off, let's talk a little bit about some of the key challenges that virtualized infrastructure brings to network managers, and to network operations and planners in particular. Some of these will be obvious. You probably won't be surprised by some of these, and some of them might be a little less obvious.
First off, of course there are a bunch of new elements that we have to deal with. There are virtual switches, there are virtual network interface cards within virtualized environments, virtualized compute environments that is, and these new elements create new paths, new topologies. There are a bunch of new components to get your hands around.
The challenge with these is that they are not all easily recognized or managed. We've lost traditional visibility, complete visibility into the network and the path that’s going from point A to point B. This is the common refrain that I'm hearing from folks who are trying to make the adaptation to deal with these environments. This creates some problems in how you actually conduct and execute your network operations and planning.
In parallel with that there's some loss of control, largely because a lot of these virtualized environments are managed by a group other than the traditional networking group. They're managed by folks that are more traditionally system administrators, so control over setting up access, and the ability to manage change and configuration on these virtual devices, is not always clean and clear, and in many organizations still is not shared back with the networking team. That creates another set of related challenges.
Of course as we all well know the big draw of virtualization is the dynamic aspect, the flexible aspect of it. Of course what that means is there’s a lot of change happening, there's a lot of new end points that are being added to the network on a much more frequent basis than used to be the case. This means new flows of traffic, new sources and new destinations. The rate of change with this has gone up dramatically over what used to be the case in non-virtualized data centers.
The last piece here that I'm going to point out is that there really is no such thing as a pure virtual environment. Now I'm going to say that, and then I'm going to acknowledge that there is an almost pure virtual environment, and that is for organizations that are putting some of their workloads up in the cloud. That is absolutely leveraging the concepts of virtualization. But you as an enterprise, if you're an enterprise doing that, won't necessarily be able to see or touch anything but the virtual components.
However, you have to realize of course that in all of these cases there is some physical infrastructure that is allowing these virtual environments to exist. You'll hear this as a recurring theme as we go throughout here today: you cannot forget that there's a physical infrastructure there, and you can't ignore the fact that it has got to be up and working and functioning for all of the virtual infrastructure to function as expected.
These tasks have impacts across the entire life cycle, in all of the different aspects of managing the network from planning where there's these higher rates of growth and more dynamic growth so you have to allow more additional headroom, you have to be anticipating the rates of change going up and therefore give more space on network links for accommodating this growth rate. It also becomes really important to really pay attention not just to conceptual plans but also the actual trends of what's going on, the actual use rates.
Folks that I talk to tell me, almost without exception, that VM deployments and the rate of growth of VMs are always greater than they expected, and one of the impacts is that you have to take that into account in capacity planning. The best way to do that is to use live and real data out of the network to continuously feed back into your planning efforts.
Change and configuration management still becomes pretty important. There are some very specific and discrete change and configuration practices required to make sure that these extended virtual networks are going to work properly. The fact that they're changing all the time also makes this more of a key challenge because, as we all know, network changes are often at the root of problems later on, of performance issues later on. Faster rates of change unfortunately in many cases mean more and faster rates of potential problems being created. You have to pay more attention to the changes there.
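To make the capacity-planning point concrete, here is a minimal Python sketch of feeding measured utilization back into a forecast. It fits a straight line to equally spaced samples and estimates how many periods remain before a link crosses a planning threshold. The sample values, the 80% threshold, and the function names are all assumptions made up for illustration; they are not figures from the research or from any product.

```python
def linear_trend(samples):
    """Least-squares slope and intercept for equally spaced samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until(samples, threshold):
    """Periods after the last sample until the trend line reaches threshold.

    Returns None when the trend is flat or declining.
    """
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - (len(samples) - 1))


# Hypothetical weekly peak utilization of one link, in percent.
weekly_peak_utilization = [40.0, 44.0, 48.0, 52.0, 56.0]
weeks_left = periods_until(weekly_peak_utilization, 80.0)
print(weeks_left)
```

With these made-up samples the trend is four points per week, so the estimate is six weeks of headroom. A straight line is the simplest possible model; VM growth is often lumpier, which is one more reason to keep refitting against live data.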
Here is this first opportunity for automation. We're going to talk about this in a few ways today. In this case the opportunity for automation is to take change event triggers coming from the configuration management system for virtual network elements and use that to adjust and keep up with the changes that you need to make on the monitoring side.
Over on the monitoring side, in that part of the life cycle, you definitely are going to have to deal with scale. Scale is going to be a factor because you now are going to have many, many more devices and elements and paths and components that you've got to worry about, and relationships between them.
Here again is the other side of that potential for automation. The big challenge is keeping up with that rate of change and making sure you're monitoring everything you need to be monitoring. Using change event as a trigger to make sure that you're keeping pace with those changes and recognizing the new elements to be managed becomes more and more important.
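The idea of using change events as a trigger can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ChangeEvent` type, the three event kinds, and the `MonitoringInventory` class are hypothetical names, and they do not correspond to the event types of vCenter or any other real configuration management system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change event from a configuration management system."""
    kind: str   # assumed kinds: "created", "migrated", "removed"
    vm_id: str
    host: str


class MonitoringInventory:
    """Tracks which VMs are monitored and on which host they currently run."""

    def __init__(self):
        self.monitored = {}  # vm_id -> current host

    def apply(self, event):
        if event.kind in ("created", "migrated"):
            # Start monitoring a new VM, or follow an existing one to its new host.
            self.monitored[event.vm_id] = event.host
        elif event.kind == "removed":
            # Stop monitoring a VM that no longer exists.
            self.monitored.pop(event.vm_id, None)
        else:
            raise ValueError(f"unknown event kind: {event.kind}")


inventory = MonitoringInventory()
for event in [
    ChangeEvent("created", "vm-1", "host-a"),
    ChangeEvent("created", "vm-2", "host-a"),
    ChangeEvent("migrated", "vm-1", "host-b"),
    ChangeEvent("removed", "vm-2", "host-a"),
]:
    inventory.apply(event)
print(inventory.monitored)
```

The point of the design is that the monitoring side never has to be told by hand that something changed; every configuration change becomes an inventory update.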
Throughout this, throughout all parts of the life cycle, I mentioned this before and we'll keep mentioning it, you have to maintain that awareness of the fact that while there are all of these new virtual elements, great, there are physical infrastructure components that have to be working in order for them to do their jobs. You have to keep an eye on that at all times.
Here's a little bit of a look at some of the research we've been doing in this area and some of the things we've been hearing, which I think reflects a lot of folks' experience around the changes that have been brought on by embracing, in this case, both the cloud and server virtualization. Cloud is certainly a higher level evolution of service-oriented approaches to leveraging server virtualization, and so they have similar impacts. They're a little different, cloud is more service-oriented for instance, but both of them change a lot of the practices around how you're going to keep up with network change and network monitoring.
The top one here and the big thing I want to point out here is, well, two things. First of all, there are impacts being felt across all different types of typical network management and network optimization functions and activities, the most pronounced being performance monitoring. Behind this, by the way, one of the things we continually find is that despite the fact that server virtualization primarily affects, from a technology perspective, the server tier, the compute tier, it very often creates additional workload for the networking folks, because when there are performance issues, a lot of those performance issues, even though they may be rooted up in that virtualized server tier, end up being things that the network guys have to take on and troubleshoot and plan around.
I want to move at this point and start focusing on a subset of those practices around monitoring and assurance. That's really our focus for today. We'll tie back to the other practices occasionally, but especially when we talk about monitoring and assurance, and by assurance I mean the activities that are focused on making sure that the infrastructure, the network infrastructure and that infrastructure which the network connects, is available and performing at top potential levels. Highly available, highly performing, that's what we mean by assurance.
Of course the demand for performance and quality has never been higher. I've yet to talk to anybody who's going through a transformation and virtualizing their infrastructure and has come back and told me, "You know what? Things are simpler now and there's less pressure." I have yet to hear that. It's always, always the opposite.
We are still focused, as we always have been, on job number one being making sure that we are keeping these infrastructures up and running, and that when there is a problem we're turning around, finding problems as quickly as possible, and restoring services and restoring healthy activity as quickly as possible. We really want to drive this MTTR as close to zero as we can. It really does require a much more integrated approach to monitoring.
So how do you get started? We're going to talk more details here in a minute, but essentially you're going to have to start from where you are today, look at the monitoring tools you have, and figure out what you can take and extend and what you might have to go find new in the process, to make sure you've got the technology on the tool side to give you the monitoring data you need to cover this virtualized environment.
Ultimately I really think, and I continue to advocate, that when you are refreshing your tools approach around this kind of a change you have to embrace automation as much as you can, more than ever before. It becomes really critical to embrace automation when you're dealing with virtualized environments. Because of the number of components that you're dealing with and the rate of change, it becomes pretty quickly impractical to keep up with monitoring and assurance when you're relying on more traditional, time-honored manual approaches to keeping up with the environment. They just don't scale anymore.
A little bit of insight before we move to our first core topic of practices which is some insight we've been tracking over the years, the last couple of years on the impact and the status that folks feel that they’re in with respect to visibility. This is a big key piece for monitoring and assurance. Do you have a view into this virtualized infrastructure to do what you need, to see what you need to see in order to understand current health and to troubleshoot when there are problems?
As you can see, over a period of a year, February 2011 to February 2012 (we haven't refreshed this yet for 2013, and that's hopefully going to come up soon here), the bulk of people feel like they have some visibility, but they could always use more. The good news, I suppose, if there is some in this particular view of this data and responses and the trends that we see, is that a relatively small number of folks really feel like it's still a major gap problem, but certainly a significant majority feel like they still don't have the visibility they need. This will continue to improve over time, but it's definitely going to continue to take time to solve this problem.
Good, with that actually what I want to do is move to the first set of more specifics here. The first specifics we want to talk about is what can you do. I mentioned the key starting point is to take your existing tools, your existing practices and extend and adapt them, figure out the changes you're going to have to make specifically to accommodate this change in the infrastructure. So what are you going to do first?
Here's just a quick list of the kind of questions that I hear on a regular basis from folks when we're having this conversation: Can we really use our current practices? What should be our priorities? Where do we start first? Who in the organization should have responsibility and access? The organizational side of these things is oftentimes the bigger challenge than the actual technology side of these things.
Essentially I think probably the most important thing to understand here and perhaps the most heartwarming part of this is the research that we do and the conversations I have indicate that traditional practices for network management still absolutely apply, you still want to be able to do the same kinds of things in this environment that you were doing in the pre-virtualized environment.
Remember, it's not just about the virtualized environment here. You also still have to pay attention to the traditional pre-virtualized technologies because they are an essential part of the whole here. You can see here some research we did just to check in on this and see what priority folks were placing on fairly traditional practices updated here to include what’s different in the virtualized environment.
Clearly 70-80% are saying, “Yeah, we absolutely have to do this,” or, “This is really important still,” traditional kind of things, doing packet-based analysis when you need to for troubleshooting, collecting NetFlow so you can see the ebb and flow of activity. Looking specifically at the virtual switches themselves to understand if they're healthy and performing as expected, and then understanding and mapping the topology as it changes and it relates to both the physical network but also then the virtual network that's being set up, the virtual networks that are being set up as part of these environments.
A little bit more about this, I want to put in context to some degree what we're needing to keep in mind when we think now about the adjustments to make in the tools. The first point here to be made is that these virtual devices and the VMs are software components or software constructs. This creates truly virtual connectivity at times. There are times when there are network flows that are really important to understand from a network operations perspective that never touch the physical network. This pushes you back to realizing that you need some way to see inside hypervisors to understand VM-to-VM communications.
Now most VM-to-VM communications will at some point traverse a distributed switch or traverse an external physical switch, and you'll be able to at least see what's going on there. But there are cases where that traffic is not visible, so you have to be aware of that as you think about how to adjust and plan for better visibility.
The other big one of course, and this is the essence of the dynamic nature of these environments, is what I like to call the vMotion factor. It's vMotion. It's also the high availability and load balancing technologies that exist in these environments, where essentially the virtual machines, the VMs, are moving.
Moving VMs is great from a load balancing perspective, but one of the big challenges really is when you are trying to understand health, track activity, especially when you're trying to troubleshoot a problem knowing where the VM is now is important, but knowing where the VM was when the problem occurred, if you're trying to do this in a retrospective fashion, becomes really essential.
If the VM isn't now where it was when the problem occurred, well, you actually may not be looking at the right things to figure out where the source of the issue was. Some historical perspective becomes really important here in terms of recognizing the changes that have been happening, not only from a configuration perspective, but also in where the VMs are living now versus prior.
The takeaway here is that one of the things that doesn't change during all of this is the physical side, the infrastructure. That is still there and still present. Making sure you're watching closely what's happening from the physical infrastructure viewpoint can oftentimes give you really important clues into things that are happening in the virtual layer, but that might be changing. Sometimes it might have self-resolved because of load balancing. Sometimes it became worse because of a vMotion or a load balancing activity. Again, it takes constant awareness of the fact that there is this relationship between the physical and virtual world that you have to keep on top of.
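One way to picture that historical perspective is a placement log that can answer "which host was this VM on at time T?". The Python sketch below is a minimal illustration; the `PlacementHistory` class, its method names, and the integer timestamps are assumptions invented for the example and do not describe how any particular tool stores this data.

```python
from bisect import bisect_right


class PlacementHistory:
    """Records VM moves and answers where a VM was at a given time."""

    def __init__(self):
        self._times = {}  # vm_id -> sorted list of move timestamps
        self._hosts = {}  # vm_id -> hosts, parallel to the timestamps

    def record_move(self, vm_id, timestamp, host):
        times = self._times.setdefault(vm_id, [])
        hosts = self._hosts.setdefault(vm_id, [])
        # Insert in timestamp order so late-arriving records still sort correctly.
        index = bisect_right(times, timestamp)
        times.insert(index, timestamp)
        hosts.insert(index, host)

    def host_at(self, vm_id, timestamp):
        """Host the VM was on at the timestamp, or None if not yet placed."""
        times = self._times.get(vm_id, [])
        index = bisect_right(times, timestamp)
        if index == 0:
            return None
        return self._hosts[vm_id][index - 1]


history = PlacementHistory()
history.record_move("vm-1", 100, "host-a")
history.record_move("vm-1", 200, "host-b")

# A problem was reported at time 150, but the VM has since moved.
print(history.host_at("vm-1", 150))  # host-a
print(history.host_at("vm-1", 250))  # host-b
```

Troubleshooting a problem from time 150 against the current host would mean looking at host-b, when the evidence is on host-a.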
Lastly here on this topic I do want to say a word on the sources of data. Performance management is all about finding and collecting sources of data that you can accumulate and then track over time. The challenge here is that not all of the virtual infrastructure, the virtual network infrastructure, provides the same sort of interfaces to gather data. In the new version of vSphere with the virtual distributed switch, and then with soft switches like the Nexus 1000V, they do generate NetFlow records, so that's great. If you have a NetFlow collection platform you can harvest records from those virtual switches and have very good visibility into what's going on.
If you're not using those kinds of technologies, if you have the more traditional non-distributed vSwitch, well, for that there is unsupported NetFlow in some cases that some folks have tried using, but my insights are that some are okay with that and others have said, "No, that's really not very good, it's not accurate." Basically the data that you're going to gather from those non-distributed switches essentially has to be gathered through an element management system like a vCenter. We're going to talk mostly about VMware today because that's what most folks are familiar with and have in place. This of course could apply to other hypervisor platforms. But the vCenter is there and is being used in many cases for configuring and changing and setting up these vSwitches.
Unfortunately it's really not sufficient as an enterprise network performance platform, for a lot of I think hopefully fairly obvious reasons. Each instance of vCenter can manage a pool of hypervisors, and any configurations going on with respect to those hypervisors. It can give you some insight into the health and activity of the various different components of those virtual servers, including the switches, but it's not designed to do anything enterprise wide.
If you need an enterprise-wide viewpoint you need another layer of management tools that's going to harvest data from those vCenter instances. You really don't have in vCenter any ability to do sustained monitoring, let alone any really significant troubleshooting capabilities that are focused on the network aspects of the infrastructure. That's important to know. vCenter becomes important, it's essential in some cases here, but it's not going to be adequate from a performance management perspective.
My recommendation out of this, from a tool adaptation perspective and from a data source perspective, is to use the approaches that you know how to use. Collect NetFlow from wherever it's available, and it's available in more and more places, NetFlow and/or its variants, I mean there are other flow record types of course. Collect SNMP stats wherever you can too so you can understand the individual component health. Use the vCenter APIs though, use them where you have to, to collect stats. It's not the most efficient approach, but if that's the only way you can get them it's a reliable source. Then of course use that vCenter API to gather the change events and use that as a technique to trigger adaptations in your monitoring practices and policies.
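That recommendation amounts to a simple preference order, which can be written down as a short Python sketch. The capability strings and the `plan_collection` function are hypothetical labels chosen for this example; they are not identifiers from vCenter, SNMP tooling, or any NetFlow collector.

```python
def plan_collection(capabilities):
    """Choose collection methods for one element from what it supports.

    capabilities is a set of assumed labels: "netflow", "snmp", "vcenter_api".
    """
    plan = []
    if "netflow" in capabilities:
        # Flow records wherever they are available.
        plan.append("netflow")
    if "snmp" in capabilities:
        # Component health stats wherever possible.
        plan.append("snmp")
    elif "vcenter_api" in capabilities:
        # Less efficient, but a reliable fallback when it is the only source.
        plan.append("vcenter_api_stats")
    if "vcenter_api" in capabilities:
        # Change events are gathered regardless, to trigger monitoring updates.
        plan.append("vcenter_api_change_events")
    return plan


# A distributed virtual switch that exports flow records and answers SNMP.
print(plan_collection({"netflow", "snmp", "vcenter_api"}))
# A traditional vSwitch reachable only through the management system.
print(plan_collection({"vcenter_api"}))
```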
With that actually I want to head back to Scott. I’ve been talking for a while. Scott, why don’t you tell us a little bit about what SevOne's doing with respect to making this adjustment to the virtual environments.
Sure, thanks Jim, if you want to pass me control back there. I think one of the key points that Jim just made is that partial views of your IT environment are insufficient, and monitoring tools such as vCenter alone are insufficient for exactly that reason. In this case we're talking about having visibility of your virtual environment without having visibility of how that virtual environment relates to and impacts the network performance and the physical world around it.
If I could illustrate this point just a bit. Can anybody tell me what the name of this painting is by chance? I know you guys are all muted so I’m not going to expect anybody to shout it out, but do you know the name of this painting? Okay, well you should because it's probably the most famous painting in the world, the Mona Lisa. But without that additional information, without a complete picture of what's going on you might miss what otherwise would be considered really quite obvious. That's what we're talking about here.
One of the issues with a tool like vCenter is that it's only looking at a portion of the information you need, so here's a screen capture. If you use vCenter you might be familiar with this. Even within the virtualized environment, tools like vCenter may not provide enough context. There are many metrics beyond CPU, memory, and disk I/O that provide additional insight into the health of the environment.
For example, vCenter is going to tell you when your CPU is maxed out at 100%, but it won't tell you what process is driving that spike, and vCenter won't tell you whether your web page is loading fast or how long it takes to do DNS resolution. So in many cases you may be asking yourself, "Well, okay, there might be a potential issue here, but if the actual response time is not impacting my end user, then do I really care?" Please understand, I'm not knocking vCenter by any means. It's a great tool. I'm just simply saying that a complete performance management solution is going to tell you more about your virtual environment, which can help you detect and avoid performance issues.
What is SevOne bringing to the table when it comes to monitoring the VMs? Well with SevOne you're going to get the same information that we just saw on vCenter, but you're also going to get a lot more. I know screen captures like this are very hard to read when you’re trying to shrink everything down into a PowerPoint slide, so I'm going to try to highlight a few areas and just illustrate and talk through the points that they're making here.
At the top of this dashboard we have information on your host: CPU percentage, memory allocated, throughput. Then below that in the same dashboard we have all the guests associated with that host, and not only what their CPU and memory is. Here are the guest memory and the guest CPU, and below that it's going to tell you what percent that is consuming on the host as well. Your CPU percent may be around 7% for that guest, but it's 1.8% or so of the host consumption there.
Now if there were a spike in usage, you have the ability then, and this is where we get more of the complete picture, to click on that chart and drill down into more information. What are we looking at here? Well, now we're bringing NetFlow into the picture. On the top right you can view NetFlow data to show who the top talkers were at the time, who the bandwidth hogs are. Then in the bottom left we're using SNMP process monitoring to see what processes were running when the spike occurred. Again, a much more complete picture of what's going on. To the right of that we're looking at CPU process over time.
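The gap between the two percentages comes from the guest owning only a slice of the host. The dashboard's exact calculation is not described in the webinar, so the Python sketch below shows one plausible reading, under the assumption that the guest figure is scaled by its share of the host's cores. The 2 vCPU and 8 core values are invented to land near the quoted numbers, and the model ignores clock speed and scheduling effects.

```python
def guest_share_of_host(guest_cpu_percent, guest_vcpus, host_cores):
    """Approximate a guest's CPU use as a percentage of the whole host.

    Assumes every vCPU maps to one host core of equal capacity.
    """
    if guest_vcpus <= 0 or host_cores <= 0:
        raise ValueError("vCPU and core counts must be positive")
    return guest_cpu_percent * guest_vcpus / host_cores


# Hypothetical sizing: a 2 vCPU guest at 7% on an 8 core host.
share = guest_share_of_host(7.0, 2, 8)
print(share)  # 1.75
```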
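A top talkers view is, at its core, an aggregation of flow records by source over a time window. The Python sketch below illustrates that idea only; the record layout (`timestamp`, `src`, `bytes`), the sample addresses, and the `top_talkers` function are assumptions for the example and are not the NetFlow export format or SevOne's implementation.

```python
from collections import Counter


def top_talkers(flows, start, end, n=3):
    """Return the n sources that sent the most bytes in [start, end)."""
    totals = Counter()
    for flow in flows:
        if start <= flow["timestamp"] < end:
            totals[flow["src"]] += flow["bytes"]
    return totals.most_common(n)


# Hypothetical, already-decoded flow records.
flows = [
    {"timestamp": 100, "src": "10.0.0.5", "bytes": 5000},
    {"timestamp": 110, "src": "10.0.0.7", "bytes": 1200},
    {"timestamp": 120, "src": "10.0.0.5", "bytes": 3000},
    {"timestamp": 400, "src": "10.0.0.9", "bytes": 9000},
]

# Who was talking during the spike between time 100 and time 200?
print(top_talkers(flows, 100, 200, n=2))
```

Restricting the window to the time of the spike matters: the largest sender overall (10.0.0.9 here) was not active when the spike happened.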
Again, it's all about having a holistic approach to understanding your environment from a single pane of glass. It's about understanding who's using your virtualized resources and what caused that anomaly, information that you might not be able to get from something like vCenter. Of course it might not be an anomaly at all. If you look at a comparison to your baseline activity which SevOne presents you may realize that this is just a normal predictable behavior for this time of day and this time of week. But if it's not normal you're going to get an alert that there may be an issue here.
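The baseline comparison can be illustrated with a small Python sketch that keeps a separate average for each hour of each weekday and flags values that stray too far from it. This is a generic technique shown under assumptions, not SevOne's actual baseline algorithm; the function names, the three-deviation tolerance, the one-point floor on the spread, and the sample data are all invented for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_baseline(history):
    """Build per (weekday, hour) mean and spread from (weekday, hour, value) samples."""
    buckets = defaultdict(list)
    for weekday, hour, value in history:
        buckets[(weekday, hour)].append(value)
    return {key: (mean(values), pstdev(values)) for key, values in buckets.items()}


def is_anomalous(baseline, weekday, hour, value, tolerance=3.0):
    """True when value deviates from that time slot's baseline by more than tolerance spreads."""
    if (weekday, hour) not in baseline:
        return False  # no history for this slot yet, so nothing to compare against
    average, spread = baseline[(weekday, hour)]
    # Floor the spread so a perfectly flat history does not flag tiny changes.
    return abs(value - average) > tolerance * max(spread, 1.0)


# Hypothetical CPU percentages: busy Monday mornings, quiet Sunday nights.
history = [
    ("Mon", 9, 70.0), ("Mon", 9, 72.0), ("Mon", 9, 68.0), ("Mon", 9, 70.0),
    ("Sun", 3, 10.0), ("Sun", 3, 12.0), ("Sun", 3, 8.0), ("Sun", 3, 10.0),
]
baseline = build_baseline(history)

print(is_anomalous(baseline, "Mon", 9, 73.0))  # False: normal for Monday at 9
print(is_anomalous(baseline, "Sun", 3, 70.0))  # True: far above Sunday at 3
```

The same 70% reading is unremarkable in one time slot and alarming in another, which is why a single static threshold tends to either miss problems or raise false alerts.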
Now one more thing. Jim spoke about the vMotion factor, and we actually had somebody chat in a question about vMotion, so hopefully I will address that here. Of course the beauty of vMotion is that you can more easily move VMs based on the resources that they're consuming. But one thing you need to be sure of is that you maintain consistency of historical data as guests move from one host to another.
Now SevOne does maintain that link when you’re reshuffling your virtual deck. But if you're evaluating performance monitoring solutions right now make sure you ask about this upfront, because the last thing you want is to discover after the fact that you’ve lost your entire historical data when you start rotating VMs around to different hosts.
Jim, I'm going to turn things back over to you. I believe you wanted to talk about convergence and management systems and data for a little bit.
Yes, thanks Scott. In fact something you said a few minutes ago about bringing all that data together really does speak to a broader trend that exists and is worth talking about, specifically with respect to virtual management and performance management in virtual environments. We call that convergence in management at EMA. What we mean by this is bringing together management tools and/or data and/or practices, any of these things, across technology domains. This means server, network, storage, and application, across management functions, organized around a service-oriented or an application-oriented viewpoint.
Convergence gets used a lot as a term. When we use it with respect to management this is what we're talking about. The compelling reason to really focus on converging your management tools and data in these virtualized environments is the fact that there is now this really direct, inescapable relationship between server health and a part of the network path health. When you've got host servers that are actually hosting and executing soft switches, what's going on in that host can and will have a direct impact on the efficiency and health of the network components that are functioning on that server. Now there is a tighter alignment than there ever has been between these two worlds.
Ultimately that is one aspect of what's driving a need for convergence in management. The other is virtualization becoming more and more predominant. I mean most shops have some amount, maybe a majority now in some, and almost completely virtualized compute environments. The next natural evolutionary step for all of this activity is to essentially be a rallying point for cloud transformation and to focus really on the end goals of delivering high-performing applications and/or services, so all of the management viewpoints.
We see this from all of the different management domains and practice areas that EMA tracks. There is this steady but slow and continuous move for all of these approaches, practices, and tools, to continue aligning around the applications and services. In the enterprise environment it's because that is where value is delivered by IT to the business. Business consumes applications and services. They don't consume servers and networks.
When you're in a service provider setting it's pretty clear. Money is made off of delivery of services and high quality delivery of services. This requires a more integrated approach of understanding not only all of the domains and the data that comes from those domains to indicate health and performance, but then also synthesizing and aligning those based on the applications and services being delivered. It means not only bringing together data from a monitoring perspective, but doing so across all aspects of the life cycle, so that this orientation and awareness is brought into the planning process, it's brought into the deployment and then trialing and testing process, and it is used during the troubleshooting process as well. That’s what we mean by convergence in management.
A little data to back this up and this demand that we've seen. This came from a study we did and published last year where we were asking about network management demands and requirements. One of the things we asked was, “If you have a preference for your management products, do you want to find things that are integrated or not?” The answer is pretty resoundingly yes if you look at this, especially if you look at the organizations with 10,000 or more employees. The bigger organizations really are driving demand. Here we've got two-thirds of folks saying they really would prefer tightly integrated or fully integrated approaches to network management.
Now I emphasize prefer because in reality my experience is that many shops, while they would like to go in this direction, are not there for a variety of reasons. Number one, there's really no case where one management tool will do everything for you, so there's always going to be some mix. But certainly the drive towards moving in this direction is pretty strong. Behind these numbers too, by the way, those shops that were using cloud services and/or are going through an internal cloud transformation, changing their IT to a service-oriented bureau sort of approach, show an even more dramatic shift, where you're seeing on the order of 70% of that subgroup looking heavily towards or preferring full or tightly integrated approaches.
This is part of what’s driving convergence: these higher level macro effects, but then lower down, the recognition that there are really more and more interdependencies in play is driving the need for convergence in management here. What do we need to think about specifically when it comes down to architecture for converging performance management? Let’s just look at that right now.
The key here is really in three areas. The first one is making sure that the platform can gather data of a variety of different types. That is, if you're looking at network management tools or tools that are going to do infrastructure-facing performance management, you'll need to be able to collect data on the flows going across the network, and you need the ability to collect statistics from the devices and elements. Then in general you also can benefit from bringing in any number of related time based metrics that you can add into the picture and overlay with the other monitoring data you have, so that you can recognize these bigger broader systemic issues.
From a technology perspective you have to be able to collect, you have to be able to poll and have to be able to go and connect directly to APIs. Flexibility from a data type collection and a data technique collection perspective is very, very important.
From a coverage and support standpoint, of course mixed networking infrastructure is always something that's a requirement, especially for the bigger shops. We are starting to approach a time here where we are really starting to see truly mixed hypervisor environments. There's still an awful lot of VMware ESX out there, but there is a growing number of Hyper-V deployments, and there's a lot of Xen, especially in the service provider sectors. Today, get started with whatever is most likely in your environment. That's probably VMware ESX, but recognize that the direction that you go needs to accommodate the fact that you're going to have more than one hypervisor type out there in the future.
Then the last big set of requirements here is the ability to relate this data. Data can be brought in and integrated from multiple sources. That's one set of challenges. The other set of challenges, the other set of opportunities is to find tools that will do that integration for you in an organic manner by the way the tools are architected and the way the tool can actually incorporate data.
Ultimately what you're trying to do by bringing multiple types of data together is to align them and understand the causal relationships between them, the cause and effect that you can experience or you can see or visualize by looking at aligned metrics. Those alignments are typically going to be done on a time basis and/or on a relationship basis. Certainly some metrics are related to others while others may not be. But there's plenty of relationships in these environments that you need to pay attention to in correlating data.
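To make the idea of aligning metrics on a time basis concrete, here is a small sketch. It assumes two series stored as dictionaries from timestamp to value, keeps only the timestamps both series share, and then computes a Pearson correlation coefficient on the aligned values. The metric names and numbers are invented for illustration, and a high coefficient only suggests where to look; it does not by itself establish cause and effect.

```python
import math


def align(series_a, series_b):
    """Keep only timestamps present in both series; return two aligned value lists."""
    common = sorted(set(series_a) & set(series_b))
    return [series_a[t] for t in common], [series_b[t] for t in common]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length value lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Hypothetical samples: timestamp in seconds -> value
host_cpu_percent = {0: 20.0, 60: 40.0, 120: 60.0, 180: 80.0}
vswitch_latency_ms = {60: 2.0, 120: 3.0, 180: 4.0, 240: 5.0}

cpu, latency = align(host_cpu_percent, vswitch_latency_ms)
print(pearson(cpu, latency))
```

The same two functions would apply to any pair of time based metrics, such as terminal footfall against wireless client counts.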
The ultimate goal of course with this is faster understanding of diagnostics and being able to communicate more clearly both within the team and outside of the team to the served business community, line of business, even out to customers and subscribers in the service provider case.
Scott, why don’t I hand it back over to you so you can talk a little about SevOne’s take on management convergence and what sort of things you’re doing in this area?
Yeah, great, thanks Jim. Talking about convergence I’d say the need, it’s a desire for a single IT management platform that allows you to gather multiple sources of data from multiple vendors, multiple infrastructures, and then turn that information into insight about your entire IT environment. So how does SevOne handle this? Well let's take a look.
With SevOne you can pull any time based data from a third-party application and bring it into the system for reporting and analysis. In this somewhat convoluted chart right here you're looking at the SevOne appliance in the center of the screen, the orange appliance there, and across the bottom everything that we monitor, from virtual servers, networks, physical servers, applications, VoIP, cloud apps. But to bring in any third party time based data we have an open API. So if you have other applications that you're using for service management, portals management, you can bring that in. SevOne brings it all into the system, baselines it, and treats it at that point like SevOne data. It allows you to overlay this information on top of visualizations of your IT infrastructure health.
For example, we have customers right now that use SevOne to monitor things like outside temperature, humidity alongside their server room data center conditions, so just temperature information that they're bringing in to SevOne as part of their infrastructure management. It is an incredibly flexible IT reporting platform and I think it meets the needs of a lot of organizations who are going along this path of convergence and consolidation.
Let's just take a second. We’ll look at a couple of practical examples of converging sources of data here. Now I'm not sure if any of you on the webinar today are in the financial sector. If you are you might be familiar with securities trading. You're probably aware that the majority of trades that are executed nowadays are executed automatically by computers, not humans. In fact, I think something like 84% of all stock trades today are done by high-frequency computers, only 16% or so are done by humans. The image here is a depiction of what the computer sees as it engages in this high frequency trading.
Now these computer-driven trades are literally happening in milliseconds. There's a lot of emphasis placed on speed. For our customers who deal with high frequency trading there's a lot of value in being able to import the transactional data that's recorded by a third party system. They might be using something like nPulse or Endace or Arista and overlay that transactional information about high-frequency trading on top of charts of their network latency graphs in order to see the correlation that happens between the two. Again, this convergence of data, looking at transactional high frequency trade information, laying and overlaying that on top of your network latency to see the correlation.
Another example. An airport. Just like retail stores, airport terminals typically have metrics on foot traffic, or what's known as footfall. That's how many people are walking through the terminal at any given time of the day or week. Now what would this have to do with monitoring your IT infrastructure? Well there's probably a very strong correlation between footfall and wireless access requirements within the terminal. If you could overlay these two metrics in a single dashboard or report you’re going to have a more complete picture of how one impacts the other and you can plan accordingly for those capacity demands.
I think being able to bring together two or more disparate sources of information in order to provide that enhanced insight helps you avoid what we call swivel chair monitoring, where you're constantly looking at two or more different screens, trying to put the pieces together yourself from different applications. We'll bring that all into one platform here, a platform like SevOne that again supports that goal of convergence within your organization. Again, it's not just being able to look at converged physical and virtual environments. It's also being able to bring together any time based data or metrics from other applications or devices onto a single platform.
Now the real benefit here is that having that single repository for IT management data allows for advanced analytics around the health of your network and IT environment. Jim, I think that's going to be a perfect segue into you because I think you wanted to talk a little bit about advanced analytics.
Yeah, thanks Scott. Perfect set up. Well done. Yeah, that's our last topical area here. One I think it makes a lot of sense to address specifically for the virtualized environments and the adaptations we need to make there. What we mean by advanced analytics, analytics gets thrown around quite a bit as a term. What we're specifically looking at here and what EMA tracks as analytics is basically I think Wikipedia captures it really well. Long ago I’ve adopted the Wikipedia definition because it's a good standard to use to see whether we're really talking about analytics or it’s just a marketing term that's being kicked around.
Analytics as I like to define it is this way: it's really looking at data, large volumes of data, and finding meaningful patterns, and then finding a way not only to find the meaningful patterns, but then to turn them into something that’s actionable. Discovery is a part of it, but communication is very important as well. When it comes to performance management technologies what we usually are looking at here in terms of analytics or advanced analytics is ways to interpret the significant volumes of data that can be generated and collected through a performance management platform like SevOne for instance, really huge volumes of data.
In fact, we had I think in this sector a big data challenge long before big data was a fashionable term being tossed around, meaning that there is a lot of data here to go through. The challenge is, number one, to be able to collect it all, but then number two, to be able to interpret it in a timely manner to give some actionable operational insights.
Things like trends, things like identifying and seeing how usage is shifting, activity is shifting, understanding baselines, what's normal versus what's not normal, and figuring out how to correlate data, these are all the types of things that we look for in terms of analytics capabilities in performance management. This is clearly a way where automation within the management tier can add value.
Now what does this look like in terms of specific features? You'll see things like dynamic thresholds. Dynamic thresholds are self-adjusting thresholds that adapt to changes so that you're not constantly banging up against statically defined thresholds or having to redefine your threshold because your environment has changed or evolved.
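One common way to build a self-adjusting threshold is sketched below, as an illustration rather than a description of any particular product. It assumes a rolling window of recent samples and flags a value that exceeds the window mean by more than `k` standard deviations. The `DynamicThreshold` class, the window size, and the value of `k` are all hypothetical choices.

```python
import statistics
from collections import deque


class DynamicThreshold:
    """Flags values above mean + k * stdev of a rolling window of recent samples."""

    def __init__(self, window=20, k=3.0):
        self.values = deque(maxlen=window)  # oldest samples fall off automatically
        self.k = k

    def check(self, value):
        """Return True if value breaches the current threshold, then learn from it."""
        breached = False
        if len(self.values) >= 2:  # stdev needs at least two samples
            mean = statistics.mean(self.values)
            stdev = statistics.stdev(self.values)
            breached = value > mean + self.k * stdev
        self.values.append(value)
        return breached


threshold = DynamicThreshold(window=10, k=3.0)
normal_results = [threshold.check(v) for v in [50, 52, 48, 51, 49, 50]]
spike_result = threshold.check(90)

print(normal_results)
print(spike_result)
```

Because every sample is added to the window, the threshold drifts with the environment, so a sustained shift stops alerting once it becomes the new normal. Whether that is desirable depends on the metric.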
Concepts of baseline, these are becoming very common and actually are very powerful, very helpful. What we mean by this is recognizing that there are normal variations in key metrics that you’re going to be looking at, and essentially all metrics you’re going to be looking at with respect to network performance, based on the time of day or the day of the week, even sometimes the day of the month. Now there are even some systems out there that will do month of the year and keep a 13-month model running so they can understand whether this January 3rd is different from last January 3rd.
The essential thing, the value this adds, is just recognizing that there is no one single flat normal or abnormal. What you have is a relative normal and relative abnormal based on what is most typically happening at this time of the day or this day of the week. Monday is a very different day from Saturday. Wednesday is a very different day from Sunday. Middle of the night you're going to see a lot of backup traffic maybe. Middle of the day at lunch hour you'll probably see a little bit of extra web surfing traffic, like today it might be looking at commercial sites for Valentine's Day gifts. I know that's what I’m going to be doing as soon as I get out of this. Recognizing that there are normal variations in traffic is very, very important in trying to figure out what's important for me to pay attention to and what I can afford to say, “No worries, that's just a normal variation,” about.
Lastly, and this is the real payoff, is converting these changes in behavior into notifications back to the operations team so that they can take some sort of preventative or proactive action, or at least be aware that a situation has arisen, recognized through analytics, that requires at least analysis if not direct action.
I just want to put this in some visual context about the relative difference this provides. What you'll see here is this is the life cycle of an incident in a typical sequence of events that most of you should be familiar with. It starts out with something changed, it was a configuration change, something stopped working well, a new application got added, whatever it was it started to create an adverse impact on the business, and that impact grows over time.
It happens before it’s noticed. When helpdesk calls start coming in usually there's already been some amount of impact. A typical sequence of events is the issue gets escalated, and it usually gets dumped on the network guys to go figure it out. If they can't figure it out they have to hand it off to other people. If they can't figure it out, and it's not obvious, you even put together the tiger team. Time keeps going on, and eventually you get to the end where you figure out the cause and you apply a fix and you get back to normal. In the meantime that whole area under this triangle is the impact that's accumulating on the business.
Now a better situation is to say if we've got an ability to just even correlate data across the different domains, it puts you in a position where you are already doing an integrated analysis. When this problem comes to you, you’re able to see the relationships between the server tier and the network as well as the physical infrastructure and the virtual infrastructure. Bringing it all together in one place means you're not kicking it around from group to group, you've got all the data in one place and you're able to understand things more quickly and turn them around more quickly. That makes you more efficiently reactive, is the way I like to put that.
The ideal of course is some of the early indicators when you recognize problems that are pending, when you recognize a shift in a baseline for instance that says, “Look, something fundamental has shifted here, I should go take a look at this.” Those early warnings give you a chance to really be truly proactive and preventative and intervene and deal with problems, maybe not before they’re noticed, but be on the task by the time the helpdesk calls start coming in. The whole goal here is to shrink this impact triangle to the left, and that’s back to job one we were talking about earlier, which is shorten MTTR, keep the infrastructure up and running as expected.
Scott, why don’t I hand this to you and you can talk a little about what SevOne is doing around advanced analytics.
Thanks Jim, and I really think this chart you have here is pretty critical because again I know from SevOne's perspective a lot of our focus on the development side and enhancing our tools and our solution is on being able to warn people ahead of time, even if something starts to occur, being able to find out about it before the end user is impacted. The last thing we want is somebody to be coming into our office saying, “Hey, why is my service not as expected,” or, “This is running slow,” and point fingers at IT, the network guy.
A key element of SevOne is that it baselines everything under the sun. Every metric that we collect we baseline, whether that's data from SNMP, or NetFlow, or NBAR like I mentioned before, or any third-party data that’s brought in, it's all baselined. These baselines are recorded on a 10-week rolling basis at 15-minute intervals, so you can go back and compare to this exact time over the past 10 weeks and see what my performance was. Off those baselines we can then establish thresholds, whether they're fixed thresholds or maybe they're based on deviation from baseline.
That's important because I know there are tools out there that you might be using or you might have looked at where you can set those hard thresholds yourself but they’re fixed. With SevOne you can look at a certain percent deviation from the baseline performance, so what's normal, and then are there any anomalies or problems that are trending away from what would be normal for this given point in time.
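The baseline scheme described here, one value per 15-minute slot of the week kept for a rolling 10 weeks, can be sketched roughly as follows. This is an assumption-laden illustration, not SevOne's implementation: the `WeeklyBaseline` class, the use of a simple mean as the baseline, and the 25 percent default deviation are all hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WEEKS = 10          # rolling history per slot, as described in the talk
SLOT_MINUTES = 15   # baseline granularity, as described in the talk


def slot_of(ts):
    """Map a timestamp to (weekday, index of the 15-minute slot within the day)."""
    return ts.weekday(), (ts.hour * 60 + ts.minute) // SLOT_MINUTES


class WeeklyBaseline:
    def __init__(self):
        # slot -> the most recent WEEKS observations for that slot
        self._history = defaultdict(lambda: deque(maxlen=WEEKS))

    def learn(self, ts, value):
        self._history[slot_of(ts)].append(value)

    def baseline(self, ts):
        """Mean of the stored observations for this slot, or None if none exist."""
        samples = self._history.get(slot_of(ts))
        if not samples:
            return None
        return sum(samples) / len(samples)

    def deviates(self, ts, value, percent=25.0):
        """True if value differs from the slot baseline by more than percent."""
        base = self.baseline(ts)
        if base is None or base == 0:
            return False  # no usable baseline, so do not alert
        return abs(value - base) / base * 100.0 > percent


model = WeeklyBaseline()
monday_9am = datetime(2024, 1, 1, 9, 0)  # a Monday
for week in range(12):  # 12 weeks of history; only the last 10 are kept
    model.learn(monday_9am + timedelta(weeks=week), 100.0 + week)

probe = monday_9am + timedelta(weeks=12, minutes=7)  # falls in the same slot
print(model.baseline(probe))
print(model.deviates(probe, 110.0), model.deviates(probe, 200.0))
```

A fixed threshold would treat Monday at 9:00 and Saturday at 3:00 identically. Here each slot carries its own notion of normal.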
Here you can see just a little bit of a baseline chart where you actually have the current performance. In this particular case it’s different voice gateways, but you're looking at current performance versus the dashed lines, which are your baseline performance. If something spikes above baseline you can get an alert there.
Now we're talking about advanced analytics and just anomaly detection based off of baseline certainly I wouldn’t call it advanced, but again, it does alert you if something in your virtual environment is trending the wrong way. Now the problem with basic anomaly detection is that occasionally you're going to get some false positive alerts. But what if we could get alerts that were based on a variety of conditions within our infrastructure, conditions that when they all occur simultaneously that’s going to indicate a much greater problem and something you're going to want to avoid so you don't feel the financial ramifications of the problem?
To illustrate this concept let’s turn our attention to the field of meteorology just for a second. I think when you look at meteorology it’s certainly an area that’s benefited significantly over the years from the use of better data models to improve forecasting. I remember when I was much younger, I live on the east coast, I’m just a little bit north of Philadelphia, anytime a hurricane was coming out of the Atlantic and headed toward the east coast they’d give you that, what did they call it? The cone of uncertainty for the hurricane path. I think when I was a kid that cone of uncertainty could’ve stretched anywhere from Key West up to Cape May, New Jersey. Nowadays they're much more accurate because of the data analysis that they do and the models that they have in projecting where that storm is going to hit.
I think the point I'm going to make here is we’re looking for comparators, we’re looking for multiple factors that when they come together at any point could indicate a really big problem. Say you have a cold air mass that’s pushing down from the north here, and then you have a warm jet stream that’s shooting up from the west headed up to the northeast, and at the same time you have tropical moisture that's been absorbed by a coastal low coming off the Atlantic here. Any one of these conditions on its own is maybe not such a big deal. Okay, we’ve got some tropical moisture, it's going to rain, I’ll put on the raincoat and grab an umbrella. Colder air, I better bundle up and get the firewood. Alone it doesn't mean much.
But now you put these three conditions together at the same time and that's when you have what they refer to as the perfect storm. Some of us have lived through that. The perfect storm is certainly something you do not want in your IT infrastructure. Again, we're talking about comparators. With SevOne you can create an alert policy that's based on conditions across multiple devices.
You may say, “I don't want an alert when one piece of infrastructure goes awry. But when all three do in unison or they’re all trending the same way at the same time, that's something I want to be alerted to.” For example, maybe if my web server CPU and my database CPU and the middleware CPU, the average CPU is all trending above a certain percent over a 30-minute time frame. That's something I need to be alerted to.
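The web, database, and middleware example can be expressed as a composite condition that fires only when every tier breaches at once. The sketch below assumes six samples per tier, one every 5 minutes, to cover the 30-minute window, and an 80 percent threshold. Both numbers, and the function names, are illustrative assumptions rather than SevOne policy syntax.

```python
def tier_breaches(samples, threshold_percent):
    """True if the average CPU over the window is above the threshold."""
    return bool(samples) and sum(samples) / len(samples) > threshold_percent


def composite_alert(window_by_tier, threshold_percent=80.0):
    """Alert only when every tier breaches in the same window."""
    return bool(window_by_tier) and all(
        tier_breaches(samples, threshold_percent)
        for samples in window_by_tier.values()
    )


# Hypothetical 30-minute windows, one CPU sample every 5 minutes
one_tier_hot = {
    "web": [95, 96, 97, 94, 95, 96],
    "database": [40, 42, 41, 39, 40, 43],
    "middleware": [55, 54, 56, 57, 53, 55],
}
perfect_storm = {
    "web": [95, 96, 97, 94, 95, 96],
    "database": [88, 90, 91, 89, 92, 93],
    "middleware": [85, 84, 86, 87, 83, 88],
}

print(composite_alert(one_tier_hot))
print(composite_alert(perfect_storm))
```

Requiring all conditions to hold at once is what cuts down false positives: a single hot web server does not page anyone, while the three-way coincidence does.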
Again, this limits the false positives that you get. The last thing you want is more alerts in your inbox that aren't indicative of the true issue with your IT environment. Being able to use these comparators I think is much more effective in predicting a potential major issue. As Jim said, let's get ahead of it and address and resolve it before the end user feels the impact.
Talking about getting ahead of things there's also predictive analytics. There’s proactive, which is what we just talked about, using comparators and baselines, but there’s predictive as well, especially in the VM world where we're looking at capacity issues. Essentially we're talking about capacity planning, I guess you could say. It takes proactive alerts a step further. Here we're looking at the top capacity utilization of some different guests and different hosts.
You can see that this is actually projecting out over a three-month period. You can go six months out, but it's going to do a linear projection out to be able to show you, okay, right now you don't have a problem. Three, six months down the road, this is where your utilization is going to be. Start planning ahead of time, before there's a big issue, so that you don't hit that max-out point.
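A linear projection of this kind can be approximated with an ordinary least-squares line through historical utilization. The sketch below is an assumption about how such a projection might be computed, not a description of the product, and the sample figures are invented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    spread = sum((x - mean_x) ** 2 for x in xs)
    if spread == 0:
        raise ValueError("all x values are identical")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / spread
    return slope, mean_y - slope * mean_x


def project(xs, ys, future_x):
    """Extrapolate the fitted line to future_x."""
    slope, intercept = fit_line(xs, ys)
    return slope * future_x + intercept


def days_until(limit, xs, ys):
    """Days after the last sample until the fitted line reaches limit, or None."""
    slope, intercept = fit_line(xs, ys)
    if slope <= 0:
        return None  # flat or falling trend never reaches the limit
    return (limit - intercept) / slope - xs[-1]


# Hypothetical host utilization sampled every 30 days
days = [0, 30, 60, 90]
utilization_percent = [40.0, 46.0, 52.0, 58.0]

print(project(days, utilization_percent, 180))  # roughly three months past the last sample
print(days_until(100.0, days, utilization_percent))
```

With these invented figures the trend is 0.2 percentage points per day, so the projection lands near 76 percent at day 180 and reaches 100 percent about 210 days after the last sample. Real utilization is rarely this linear, so the result is a planning prompt rather than a forecast.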
By the way, the other thing I think I neglected to mention earlier, SevOne is also going to tell you about the flip side of this equation, and that is which physical resources are underutilized and may be prime candidates for virtualization. We're always looking for ways to reap the benefits of virtualization. But if you're not getting that information from a tool that only looks at your virtual environment as we said earlier, not the physical, you want that holistic view of both so that you can appropriately plan for capacity issues.
Before I toss the show back to Jim for a wrap up if I could offer one other bit of advice to keep in mind, we're talking about advanced analytics. Be sure to look at the data aggregation and retention policies when you're evaluating different monitoring tools that you’re going to use to monitor your IT environment.
There's a lot of popular monitoring solutions that aggregate historical data over time. For example, these solutions might roll up the past 30 days of polled data into daily averages, or the past six months of daily averages into weekly averages. You can’t engage in advanced network analytics with limited data points. You have to have granular historical data. Basing your analytics off averages of averages is not really a useful endeavor.
Of course, you know what I'm going to say here. SevOne does maintain a year of as-polled data for historical reporting and analysis. The past year of as-polled data without that roll up gives you a much more granular view and doesn't mask issues. Once you roll things up into hourly, daily, weekly averages a lot of issues get masked and it's not a good basis to perform analytics on top of.
With that, Jim, I’m going to hand things back to you to offer some key recommendations before we get to the Q&A portion of the event.
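The masking effect of roll-ups is easy to demonstrate with made-up numbers. The sketch below assumes one poll every 5 minutes (288 samples a day), a steady 20 percent utilization, and a single 15-minute saturation event. The polling interval and values are assumptions chosen only for illustration.

```python
def rollup(samples, bucket_size):
    """Average consecutive groups of bucket_size samples (last group may be shorter)."""
    averages = []
    for start in range(0, len(samples), bucket_size):
        bucket = samples[start:start + bucket_size]
        averages.append(sum(bucket) / len(bucket))
    return averages


SAMPLES_PER_DAY = 288  # assumes one poll every 5 minutes

as_polled = [20.0] * SAMPLES_PER_DAY
as_polled[120:123] = [100.0, 100.0, 100.0]  # a 15-minute saturation event

daily_average = rollup(as_polled, SAMPLES_PER_DAY)

print(max(as_polled))               # the spike is visible in as-polled data
print(round(daily_average[0], 2))   # the daily roll-up barely moves
```

The as-polled series still shows the interface pinned at 100 percent, while the daily average rises by less than one percentage point over the quiet level, which is why analytics built on rolled-up data can miss the event entirely.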
Great, thanks Scott. Just to wrap up from my viewpoint, a few things I hope you’ve heard and can take away today. Remember that it's really essential to look at taking the existing practices you know have worked in prior infrastructure, make the adaptations, and build on those for these mixed environments. You will have to find ways to extend them to find the data that's now slightly different, in slightly different places, or gathered with different techniques in these virtual environments, but you're not creating any new ways to manage, you're extending the ones you have.
Make sure that you take advantage of the fact that you have sources of data that will help you keep monitoring in pace with change, because change is much more frequent in these environments. Look at ways to bridge data sources. You can do it by a higher level aggregation tool. You can also find tools that will do that integration for you. That becomes a really essential ingredient to success in these environments.
Look to these advanced analytics. We talked about analytics and advanced analytics today, all of that data processing, that's what these tools really should be doing for you. Data collectors is one thing but data processors is another important thing in performance management. Look for solutions that give you more than just the ability to collect and do a chart. They should be interpreting it for you.
Last I will hammer on this one more time, pay attention to the fact, you should always be keeping in mind that virtual environments function because they are supported by a physical infrastructure. There are essential relationships to understand and keep an eye on there. Scott?
Great, thanks Jim. I know we're just a couple of minutes past the top of the hour. We do have a Q&A portion coming up with a couple of questions that have come in, and we'd certainly like to open the floor, or at least the online Q&A panel, if you have any questions you want to submit. Just a couple of housekeeping notes real quick.
First, I neglected to mention upfront. We are recording the session today, so if you're interested in sharing the recording of the session or going back and reviewing the slides and the content, we’ll be sure to send that out to everybody that registered for the event by the end of the week, so you have access to that.
First question I had, Rich asked a question about big data a little bit earlier. Rich was asking how SevOne handles the big data world. I guess he's implying how does it scale to handle the big data needs of network analytics. What I would say, as I mentioned, is scale is one of the areas where SevOne absolutely excels in the market. That scale applies not only to the ability to monitor millions of elements and objects on your network, but also the ability to handle the volume, the variety, the velocity of this big data world that we're in.
I know we talk about big data and sometimes it's a little bit of a catch phrase, but there's different flavors of big data. I think a lot of people, Jim might have mentioned earlier, yeah, a lot of people will think different things when they hear big data. Some people might think of the marketing efforts of companies trying to uncover new market opportunities and understand people's behavior so they can personalize their services better. Big data is certainly something that financial institutions have been dealing with when it comes to fraud detection or trying to determine the risk of a portfolio.
But there's this other subset of big data which is what we call at SevOne big data network analytics. By far there's more data generated today by machines than there is by people's actions, the machines and the devices on your network, the routers, the switches, the firewalls. Being able to harness all that data and make sense of it and present it back to you in a way that provides insight about the health of your business, that’s somewhere where SevOne absolutely excels. It is a complete IT management and reporting platform.
The important thing is a lot of companies will say, “Oh yeah, we can take all kinds of volumes and variety of data,” but can they turn that around into reports that are generated instantly? That's the key and that's what I think differentiates SevOne from others, being able to get reports in seconds, where frankly there's other solutions out there that may take hours or even days to generate the reports you need. That's not a pace that we can afford to move at in this world today. Rich, I hope that addresses your question.
Let’s see, we have a couple other coming in here. Jim, I’m going to pass one back to you. Somebody chatted in a little while ago. There was a chart there about preferred methods of convergence. Which of the following strategies does your organization most prefer to follow in deploying network management products where they’re fully integrated? I'll see if I can bring it up again. But somebody is asking, like that seems to indicate that’s what people prefer, but what’s the reality right now in that are people all fully integrated, or do they have loosely integrated suites from a single vendor? What's the reality of the convergence in the market lately?
I think I mentioned this one. I had the slide up, but just to reemphasize. Yeah, this is preference. This is where people are thinking they want to move towards. It's going to be a factor as people look at whether or not for instance the management tools they have in place now are going to help them do this transition to these integrated virtualized environments, mixed physical virtual environments.
But no, in reality far fewer are actually there than want to be there. There is always going to be some mix of tools. There is a history in the sector of attempts to build frameworks that will do everything for you. There are some cases of shops that have actually gone that far, but I would say it's a far smaller percentage than those who wish they could accomplish that.
My takeaway from that, by the way, is to look for integration wherever you can and claim victory where it's possible, given the tools that you have today or the ones that you're looking at buying. Anywhere that you can achieve better convergence by bringing data together or bringing management functions together will pay dividends. That's the takeaway from that.
Great, thanks Jim. I did have somebody ask a question about creating different dependency and status maps within SevOne. Is that something that we feature? Yes it is. Within SevOne, if you need to map out your infrastructure, whether it’s a geography map or maybe it's a map of your datacenter, and you want to drop in different infrastructure objects and be able to get really quick red, yellow, green indicators about the health of those objects and how they relate to other objects, linking to other objects and maybe remote locations or data centers on the map, we do provide that within SevOne.
Jim, we’re about 10 minutes past the hour here. I do want to respect people's time. Maybe just a quick point, if you would elaborate a little bit. People are always asking about the ROI, return on investment, of consolidation, and some of it seems probably pretty obvious on the surface, but there are many aspects to it. Could you maybe just touch on briefly where you see different organizations getting the most bang for the buck in consolidating in virtualized environments and consolidating the management tools that they use to oversee those environments?
Sure, you bet. Actually just real quickly, there are three key areas to look for in terms of where you can capture real monetary savings by using a converging approach. One is it can reduce the number of tools you need, and that reduces your licensing, support, and administrative costs. That’s a good thing. Number two, it can actually reduce the headcount you need in order to run your operations. It may not eliminate positions, but it might help you prevent the need to hire more people as you grow.
Then lastly, in many cases, going back to that last set of slides I showed you about trying to improve and accelerate your mean time to resolution, there are a lot of cases where this is going to actually improve and protect the top line of the organization by protecting and restoring services, getting business back up and running, getting people back on their keyboards and being productive. That has real measurable results, especially if you're in an organization that has experienced any number of recognizable and quantifiable outages or, more challenging these days, degradations. Things don't necessarily go down completely, but they degrade, they slow down, and that hurts both the productivity of employees and business success for online and connected businesses.
You can look for top line savings. You can look for bottom line savings. We've seen significant ROI on investments in these kinds of products and technologies, very commonly paying back the investment in less than a year, and certainly paying back within two years. It's pretty compelling, the capabilities that these tools can bring to improving operational efficiency.
Jim, thanks. It's certainly been a pleasure presenting with you today. I hope this has been beneficial for our audience. There are a couple of questions that came through at the last minute. We are going to follow up with you guys, Shane and some others, offline after this event, but we are almost a quarter after 12 now Eastern Time, so I want to give everybody a chance to get back to their jobs. I'm sure there are more important things to do right now.
Again, thank you everybody for your time. We very much appreciate it. Thank you Jim, and we look forward to continuing the conversation. If there's anything we can do for you at SevOne, we’d be happy to talk to you further. Thanks, everybody. Have a good day.