Anomaly Detection – A Challenging Task in Today’s Dynamic Networks


Using anomaly detection to help maintain network performance seems like it should be a fairly straightforward activity. Track enough performance data to determine what’s normal. Compare current activity to your ‘normal’ baseline. Look for things that are odd, things that deviate from normal. Those are your anomalies. Zoom in on them, figure out what’s going on, and if there’s a problem, fix it.

Pretty simple, right?

Wrong. Using anomaly detection in network performance monitoring and management has always been tricky. But over the past few years it has become even more complex and difficult thanks to several contributing factors. We’ll discuss some of the factors and approaches for overcoming them. Before we do, let’s start with a basic definition of anomaly detection and its use in network performance management.

The Basics – Anomaly Detection Defined

In any sort of data analysis, anomaly detection is the identification of unusual items, events, or observations. Because they occur rarely, or because they differ significantly from the majority of the data, anomalies raise suspicions.

Using anomaly detection in network performance management starts with capturing large amounts of performance data and using it to build baselines that show what normal network activity looks like at particular times of day and on particular days of the week.

Organizations also typically build other baselines that show normal activity levels for different geographic locations, network segments, business units, and so on. Whatever the focus, once your dataset of normal activity is robust enough, you have a baseline against which you can compare current activity.
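Below is a minimal sketch of the baselining step described above. It assumes each sample arrives as a (segment, timestamp, value) tuple, and it summarizes "normal" as a mean and standard deviation for each segment, weekday, and hour. The helper names, the tuple layout, and the `MIN_SAMPLES` cutoff of 20 are hypothetical choices for illustration, not part of any product.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev

# Sketch only: hypothetical helper names and data layout, not a product API.
# Hypothetical cutoff: a bucket needs this many samples before it counts as
# a robust baseline.
MIN_SAMPLES = 20

def bucket_key(segment: str, ts: datetime) -> tuple[str, int, int]:
    """Group samples by network segment, day of week (0=Monday) and hour of day."""
    return (segment, ts.weekday(), ts.hour)

def build_baselines(samples):
    """Return {(segment, weekday, hour): (mean, std)} for buckets with enough data.

    `samples` is an iterable of (segment, timestamp, value) tuples.
    """
    buckets = defaultdict(list)
    for segment, ts, value in samples:
        buckets[bucket_key(segment, ts)].append(value)

    baselines = {}
    for key, values in buckets.items():
        if len(values) >= MIN_SAMPLES:  # skip buckets that are not yet robust
            baselines[key] = (mean(values), pstdev(values))
    return baselines
```

Buckets with fewer than `MIN_SAMPLES` observations are left out. A newly added segment therefore gets no baseline, and no anomaly alerts, until enough history has accumulated.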

When any of the network performance activity you’re tracking deviates significantly from its relevant baseline – Bingo! You have an outlier, an anomaly. And since anomalies are inherently suspicious, and are often indicators of a problem that’s building or already occurring, they can be used to generate alerts. Those warnings usually trigger quick scrutiny and further action, like investigation, analysis, and remediation steps.
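Once a baseline exists, flagging an outlier can be as simple as measuring how many standard deviations the current value sits from the baseline mean. This sketch assumes a baseline summarized as a (mean, standard deviation) pair, like the one described above. The `check_sample` helper, the `Alert` record, and the cutoff `k = 3` are hypothetical, and real products may use different statistics.

```python
from dataclasses import dataclass

# Sketch only: hypothetical helper and a generic z-score rule with an assumed cutoff k.

@dataclass
class Alert:
    metric: str
    value: float
    expected: float
    deviations: float  # how many standard deviations from the baseline mean

def check_sample(metric: str, value: float, baseline_mean: float,
                 baseline_std: float, k: float = 3.0):
    """Return an Alert if `value` is more than k standard deviations from the mean."""
    if baseline_std == 0:
        # A perfectly flat baseline: any change at all counts as a deviation.
        if value == baseline_mean:
            return None
        return Alert(metric, value, baseline_mean, float("inf"))
    z = abs(value - baseline_mean) / baseline_std
    if z > k:
        return Alert(metric, value, baseline_mean, z)
    return None
```

For example, `check_sample("if_util_pct", 92.0, 40.0, 10.0)` returns an `Alert` with `deviations` of about 5.2. A reading of 48.0 against the same baseline returns `None`. The metric name is hypothetical.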

But there are those challenges we mentioned...

Hurdles to Using Anomaly Detection in Network Performance Management

Lots of organizations today have deployed various types of next-generation networking technologies. They’re tapping the power and scalability of things like virtualized network devices, software-defined networking, and cloud-based infrastructures to provision applications and services much faster and with much more flexibility than ever before. In these complex and fast-changing environments, network resources, apps, and services are often here one minute and gone the next. That makes keeping baselines accurate and chasing legitimate anomalies a very difficult task.

Many organizations haven’t fully transitioned to next-gen network architectures, and as a result, they are simultaneously running new and traditional network environments. Since network performance issues can straddle both old and new segments, IT and NetOps teams need to manually combine performance data, baseline information, anomaly specifics and more in order to locate, analyze and fix a problem.
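One way to reduce that manual stitching is to map records from each environment onto a single schema before baselining and analysis. In the sketch below, the source formats (a legacy poller row and a cloud/SDN API payload), every field name, and the helper functions are hypothetical examples chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical common schema and source formats, for illustration only.

@dataclass(frozen=True)
class MetricSample:
    device: str
    metric: str
    timestamp: int  # Unix epoch seconds
    value: float
    source: str

def from_legacy_poller(row: dict) -> MetricSample:
    # Hypothetical legacy row: {"host", "oid_name", "epoch" (seconds), "val"}
    return MetricSample(row["host"], row["oid_name"], int(row["epoch"]),
                        float(row["val"]), "legacy")

def from_cloud_api(item: dict) -> MetricSample:
    # Hypothetical cloud payload: {"resourceId", "metricName", "ts_ms" (milliseconds), "value"}
    return MetricSample(item["resourceId"], item["metricName"],
                        int(item["ts_ms"]) // 1000, float(item["value"]), "cloud")

def unified_timeline(legacy_rows, cloud_items, device_ids):
    """Merge both sources for the given devices into one time-ordered list."""
    samples = [from_legacy_poller(r) for r in legacy_rows]
    samples += [from_cloud_api(i) for i in cloud_items]
    wanted = set(device_ids)
    return sorted((s for s in samples if s.device in wanted),
                  key=lambda s: s.timestamp)
```

Once both environments share a record type, the same baselining and alerting logic can run across old and new segments alike.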

That brings us to a related issue: the data collection and management problems associated with building baselines. There are more types of devices on our networks now, and many more of them. And given how fast things change in our environments, the old standard 5-minute device polling interval has given way in many shops to a 1-minute interval. The fact is, modern network environments generate mountains of performance data, and for effective anomaly detection, all of that data needs to be collected and analyzed.
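The effect of the shorter interval is easy to quantify with a short calculation, assuming one sample per metric per poll. A week has 7 × 24 × 60 = 10,080 minutes, so moving from 5-minute to 1-minute polling multiplies the data for every metric by five. The function name is a hypothetical helper.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080

def samples_per_week(poll_interval_minutes: int) -> int:
    """Number of polls of a single metric in one week at the given interval (hypothetical helper)."""
    return MINUTES_PER_WEEK // poll_interval_minutes

for interval in (5, 1):
    print(f"{interval}-minute polling: {samples_per_week(interval):,} samples per metric per week")
```

That works out to 2,016 samples per metric per week at a 5-minute interval versus 10,080 at a 1-minute interval.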

That means that for a sizable organization to spot a true anomaly in its network performance, it needs to perform on the order of hundreds of millions of data-point calculations every week. Speed matters too: the calculations must be done rapidly enough to generate timely alerts.
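To see how quickly the numbers reach that scale, assume a hypothetical fleet of 1,000 devices, each with 20 metrics polled at a 1-minute interval. That gives 1,000 × 20 × 10,080 = 201,600,000 samples per week, and each one must be compared against its baseline. At that rate, baselines that update incrementally are preferable to baselines recomputed from raw history. The sketch below uses Welford's online algorithm, a standard technique swapped in here for illustration, so each new sample costs a constant amount of work. The article does not say how any product computes its baselines.

```python
import math

class RunningBaseline:
    """Online mean/variance (Welford's algorithm): O(1) work per new sample."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared differences from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        """Population standard deviation of all samples seen so far."""
        return math.sqrt(self._m2 / self.n) if self.n else 0.0

# Hypothetical fleet size, used only to show the order of magnitude.
DEVICES, METRICS_PER_DEVICE, POLLS_PER_WEEK = 1_000, 20, 7 * 24 * 60
weekly_samples = DEVICES * METRICS_PER_DEVICE * POLLS_PER_WEEK
print(f"{weekly_samples:,} baseline updates per week")  # 201,600,000
```

Because `update` never revisits old samples, the cost per poll stays the same no matter how much history a baseline already holds.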

And speaking of alerts, there’s the challenge of acting quickly and effectively on the true positives: the anomalies that indicate user-impacting performance issues that are building or already occurring. Once the alert is in hand, locating and diagnosing the problem and initiating remediation usually require unified analysis of metrics, flows, and logs. Being able to pivot from a baseline metric to the specific log files or flow records for the device that generated the alert, all in the context of a single dashboard, can shorten mean time to repair. Automated anomaly detection is the key here, because manual approaches are neither fast nor accurate enough.
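At its core, pivoting from an alert to the related flow records and log entries means filtering each data source by the alerting device and a time window around the alert. The record layout, the `related_evidence` helper, and the 5-minute window below are hypothetical. A real platform would run this as an indexed query rather than a list scan.

```python
from dataclasses import dataclass

# Hypothetical record layout and helper, for illustration only.

@dataclass
class Record:
    device: str
    timestamp: int  # Unix epoch seconds
    detail: str

def related_evidence(alert_device: str, alert_ts: int,
                     flows: list[Record], logs: list[Record],
                     window_s: int = 300) -> dict[str, list[Record]]:
    """Collect flows and logs from the alerting device within +/- window_s seconds."""
    def in_scope(r: Record) -> bool:
        return r.device == alert_device and abs(r.timestamp - alert_ts) <= window_s

    return {
        "flows": sorted((r for r in flows if in_scope(r)), key=lambda r: r.timestamp),
        "logs": sorted((r for r in logs if in_scope(r)), key=lambda r: r.timestamp),
    }
```

Widening `window_s` trades a longer evidence list for a better chance of catching the event that started the problem.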

When you roll up comprehensive data collection, rapid polling, automated baselining, trend analysis for multi-vendor environments, and the automatic creation of troubleshooting workflows, handling it all takes a lot of compute power.

It’s definitely not a job for a lightweight or ‘point’ solution. Industrial-strength network performance monitoring is required, and that’s exactly what SevOne brings to the table.

Put the Power of Anomaly Detection in Your Hands

Our SevOne Network Data Platform is an industry-leading network performance monitoring solution. It has the power, flexibility, and scalability required to handle all of the challenges described above – and more.

The SevOne Network Data Platform uses machine learning to establish an accurate and up-to-date baseline for every performance metric it collects. This gives you an understanding of what "normal" behavior is for any given time of day and day of the week in any segment or area of your network. Alerts based on standard deviations from baseline performance norms notify teams whenever exception conditions occur. This method provides a more reliable predictor of service-impacting events, while also decreasing false positives.

Benefits of the SevOne Network Data Platform include:

  • Instant alerts when performance metrics deviate from expected behavior
  • Easy access to intuitive visualizations that compare typical performance to real-time metrics
  • The ability to easily extend an analysis in north/south and east/west directions whenever performance metrics for objects or devices merit further investigation
  • Greater granularity with baselines calculated at shorter polling intervals
  • Dynamic thresholds that generate fewer false-positive alerts than static thresholds
  • Integrated metric, flow and log troubleshooting workflows
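A toy example shows the difference between static and dynamic thresholds. The hourly baselines, the 70% static limit, and k = 3 are invented numbers for the sketch. The per-hour "mean plus k standard deviations" rule is a generic one-sided technique, not a description of SevOne's model. A link that routinely runs hot during a nightly backup window trips the static limit every night, while the dynamic threshold fires only when traffic is unusual for that hour.

```python
STATIC_LIMIT = 70.0  # hypothetical fixed utilization threshold, percent
K = 3.0              # hypothetical sensitivity: standard deviations above the mean

# Hypothetical per-hour baselines for one link: hour -> (mean %, std dev %)
BASELINE = {2: (85.0, 4.0), 14: (35.0, 5.0)}  # 02:00 backup window, 14:00 business hours

def static_alert(value: float) -> bool:
    return value > STATIC_LIMIT

def dynamic_alert(hour: int, value: float) -> bool:
    mean, std = BASELINE[hour]
    return value > mean + K * std  # one-sided: the threshold moves with the expected level

observations = [(2, 88.0), (14, 40.0), (14, 62.0)]
for hour, value in observations:
    print(f"{hour:02d}:00 {value:5.1f}%  static={static_alert(value)}  dynamic={dynamic_alert(hour, value)}")
```

At 02:00, the static limit fires on normal backup traffic while the dynamic threshold (97%) stays quiet. At 14:00, a 62% reading passes the static check but exceeds that hour's dynamic threshold of 50%.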

Using anomaly detection in network performance monitoring is daunting. There are lots of facets and many moving parts that need to come together to get it done properly. When it is done properly, however, anomaly detection takes an organization’s network performance management to entirely new levels. And that can make a real difference for users and for the organization as a whole.

If you’d like to learn more about how our SevOne Network Data Platform can put the power of effective anomaly detection in your hands, our experts are ready to speak with you. To take your first step toward better network performance monitoring and management, contact us.