Song Pang is the senior vice president of customer engineering at NetBrain, a market leader for NetOps automation.
Outage prevention has long been considered the holy grail in networking. If potential network problems can be detected and corrected early, production services can be preserved.
But how does one find potential problems? How does an enterprise organization proactively look for conditions that would lead to outages if left unchecked?
We may have had the answer all along, but we never connected the dots to standard operating procedures or implemented the technology to make prevention part of the network operations function.
Network assessment is part of the answer, but both networks and network assessment have changed. Today’s enterprise infrastructures are complex multi-vendor and multi-cloud environments that can no longer be maintained reactively.
The Challenges With Network Assessments
Every network operations team knows that they should be doing network assessments regularly. That said, network assessments rarely happen due to cost, complexity, competing priorities and perceived value.
Assessments often happen only when a fiscal event triggers them, such as preparing for a data center or cloud migration project, an audit, or merger and acquisition activity. This infrequency hints at a missed opportunity.
Additionally, network assessments are usually conducted by a third-party service provider. To keep costs low, they only include a “representative sample” of all of the components involved in service delivery. Project-oriented assessments can also take so long to complete that they are already out of date and have little or no value by the time they are published.
A big part of the problem is that assessments were imagined in much simpler times before enterprises spanned the globe and were dominated by virtualized multi-cloud services. In those simpler times, the focus of device-oriented assessments was often network topology alone. The output of those simple network assessments was a network diagram or map that the NetOps team stored away until asked for a copy of it.
Today, the simple connectivity of the underlying devices is far less important to the business; the crucial part is that the aggregation of all those devices delivers business services reliably.
The Strategic Network Assessment: Scope And Scale
The core problem keeping network assessments from being more strategic is one of scope and scale. It is hard to capture all of the ways that the many operational teams need to be assessing the infrastructure.
Even if they could articulate all of those needs, the time it would take to manually check every network service, device and port for those desired conditions is intimidating. In turn, this means the assessment can’t be done as often.
Here are some examples of the kind of operational goals that should be assessed:
• The existence of the suitable quality of service (QoS) profiles required to deliver crisp VoIP calls.
• The prevention of insecure protocol use (e.g., TELNET).
• The performance of key applications, measured against latency thresholds.
• The overall throughput to the public cloud, which is shared by many applications.
• The CPU and memory utilization of all of the core routers and firewalls.
• The mirroring of high-availability pairs to ensure that configurations always match.
• The ports that are accessible on every device connected to the network.
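A few of the goals above can be expressed as small automated checks. The following is only a minimal sketch in Python, not a description of any particular product: the function names, the 80% utilization limit and the Cisco IOS-style "transport input" syntax are all assumptions made for illustration, and real device data would have to be collected separately.

```python
def uses_telnet(config: str) -> bool:
    """Return True if a 'transport input' line permits TELNET.

    Assumes Cisco IOS-style configuration syntax (an illustrative assumption).
    """
    for line in config.splitlines():
        words = line.split()
        if words[:2] == ["transport", "input"] and {"telnet", "all"} & set(words[2:]):
            return True
    return False


def ha_config_drift(primary: str, secondary: str,
                    ignore_prefixes: tuple = ("hostname",)) -> list:
    """Return config lines present on only one member of a high-availability pair.

    Lines starting with ignore_prefixes are expected to differ and are skipped.
    Line order is ignored; this is a simplification.
    """
    def significant(config: str) -> set:
        lines = (line.strip() for line in config.splitlines())
        return {l for l in lines if l and not l.startswith(ignore_prefixes)}

    return sorted(significant(primary) ^ significant(secondary))


def over_threshold(utilization: dict, limit: float = 80.0) -> list:
    """Return the names of devices whose utilization percentage exceeds limit."""
    return sorted(name for name, pct in utilization.items() if pct > limit)
```

Each function answers one yes-or-no question about one goal, which keeps the list of goals easy to extend as other operations teams contribute their own.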
How Network Assessments Can Evolve
Changing network assessments from a simple audit and documentation tactic to a powerful operational strategy requires that they become both deeper and more frequent. The more goals that can be checked, the more issues can be caught and fixed before they affect users; the more often the assessment can be run, the faster those issues can be found.
There are a few ways to accomplish this:
1. In theory, NetOps could talk to other IT departments, build a master list of these goals and hire an army of engineers to check them regularly. The downside is that this would be expensive.
2. Another option is to build a team of automation engineers who have both the coding experience and network knowledge to build scripts and tools to check all of these goals. These people are not easy to find but a well-funded IT department could do it. This is a good option for enterprises with networks that are too complex and fast-changing for off-the-shelf automation tools to handle.
3. The final option is to use a low-code or no-code automation tool. In recent years, network automation tools have become more common as automation has spread beyond just business processes. In this scenario you don’t need to hire automation engineers; NetOps, security or other IT operations employees can build the necessary automation even if they don’t know how to code. This results in a much smaller and faster project, but efforts will be limited by the automation tool chosen for this. The more tailored it is to a NetOps use case, the easier the project will be.
Continuous Network Assessment Prevents Outages
When broad network assessments are conducted continuously, many network outages and service disruptions can be prevented. Every service, every device and port, and every desired behavior can be assessed in real time against the long list of desired operational and performance goals to identify and correct discrepancies.
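To make the idea concrete, one pass of such an assessment can be thought of as running every check against every device and collecting the discrepancies. The sketch below is a hypothetical illustration in Python: the `Finding` record, the `run_assessment` function, the `cpu_check` example and the shape of the inventory data are all assumptions, not any vendor’s API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Finding:
    device: str   # name of the device that failed a check
    check: str    # name of the goal that was not met
    detail: str   # human-readable description of the discrepancy


# A check receives one device's collected state and returns a description
# of the problem, or None when the goal is met.
Check = Callable[[dict], Optional[str]]


def run_assessment(inventory: Dict[str, dict],
                   checks: Dict[str, Check]) -> List[Finding]:
    """Run every check against every device and collect the discrepancies."""
    findings = []
    for device, state in sorted(inventory.items()):
        for name, check in checks.items():
            detail = check(state)
            if detail is not None:
                findings.append(Finding(device, name, detail))
    return findings


def cpu_check(state: dict) -> Optional[str]:
    """Example goal: CPU utilization stays at or below 80% (assumed limit)."""
    cpu = state.get("cpu_pct", 0.0)
    return f"CPU at {cpu}%" if cpu > 80.0 else None
```

Running a function like this from a scheduler at a short interval, rather than once per project, is what would turn a periodic assessment into a continuous one.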
Continuous network assessment should, therefore, become the core of a modern network outage prevention strategy. In practice, NetOps teams can verify that the network meets design compliance standards, security teams can verify their protection mechanisms are doing what they should be and application teams can be assured that their service delivery expectations are being met.
As a side benefit, automated continuous assessments mean less work to prepare for network compliance audits since much of the important data is already being gathered and is available for any purpose. The 2023 Gartner Hype Cycle for Infrastructure Strategy (subscription required) predicted significant growth in network automation over the next few years, with use cases that have largely been overlooked.
I suggest that NetOps leaders take a look at continuous network assessment, collaborate with their other operations service colleagues and begin to address the long-standing outage prevention and risk-reduction goals that have been lingering for years.
Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.