IT today still operates in silos and, as a result, visibility into IT Operations is significantly limited. According to a 2015 Application Performance Monitoring survey, 65% of surveyed companies own more than 10 different monitoring tools, yet research indicates that half or fewer of the tools companies have purchased are actively being used.
One of the key issues is that each tool provides organizations with only a limited, compartmentalized view rather than a picture of the entire IT environment. This narrow, isolated understanding makes it difficult and time-consuming to identify the root causes of problems and to prevent (or resolve) abnormalities. To establish a holistic view of operations and activities in an IT environment, data silos need to be consolidated, correlated, and annotated.
Correlating cross-silo information has been a difficult problem due to the unstructured and heterogeneous nature of the data, the volume of collected measurements, and the fact that most monitoring tools cannot provide a broader perspective. Recent advances in big data and data science technologies allow companies to bridge the gap by correlating information across silos, extracting patterns, automatically identifying anomalies, and reasoning about root causes. IT Operations Analytics (ITOA) thus gains a broader view, giving professionals the ability to analyze IT environments more completely, accurately, and efficiently.
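To make the idea of cross-silo correlation concrete, here is a minimal sketch in Python. The records and field names are purely illustrative (they do not come from any particular monitoring product): it pairs events from two hypothetical silos, application errors and infrastructure alerts, whenever they occur within a short time window of each other.

```python
from datetime import datetime, timedelta

# Hypothetical sample records from two separate monitoring silos:
# application logs and infrastructure metrics (all names are illustrative).
app_errors = [
    {"time": datetime(2016, 3, 1, 10, 0, 5), "msg": "checkout latency spike"},
    {"time": datetime(2016, 3, 1, 11, 30, 0), "msg": "payment timeout"},
]
infra_alerts = [
    {"time": datetime(2016, 3, 1, 10, 0, 2), "host": "db-01", "alert": "high CPU"},
    {"time": datetime(2016, 3, 1, 14, 0, 0), "host": "web-03", "alert": "disk full"},
]

def correlate(events_a, events_b, window=timedelta(seconds=30)):
    """Pair events from two silos that occurred within `window` of each other."""
    pairs = []
    for a in events_a:
        for b in events_b:
            if abs(a["time"] - b["time"]) <= window:
                pairs.append((a, b))
    return pairs

pairs = correlate(app_errors, infra_alerts)
for app, infra in pairs:
    print(f'{app["msg"]} <-> {infra["alert"]} on {infra["host"]}')
```

In practice the timestamps would come from dozens of heterogeneous sources and the join would run on a distributed platform, but the core operation — aligning events from separate silos on a common dimension such as time — is the same.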
IT Operations regularly faces big data issues of exactly this kind: unstructured, heterogeneous measurements arriving in very large volumes.
A recent boom in the availability of big data technologies allows practitioners to address these issues effectively by deploying distributed storage, indexing, and processing algorithms. However, despite the increase in instrumentation capabilities and the amount of collected data, enterprises barely use the significantly larger data sets to improve the effectiveness of availability and performance processes through root cause analysis and incident prediction. In a Gartner report released in October 2015, W. Cappelli emphasized that “although availability and performance data volumes have increased by an order of magnitude over the last 10 years, enterprises find data in their possession insufficiently actionable … Root causes of performance problems have taken an average of seven days to diagnose, compared to eight days in 2005 and only 3% of incidents were predicted, compared to 2% in 2005.” The key question is: how do organizations make sense of this data?
Machine learning is a field that studies how to design algorithms that can learn by observing data. It has traditionally been used to discover new insights in data, to develop systems that can automatically adapt and customize themselves, and to build systems where it is too complex or too expensive to explicitly program all possible circumstances, self-driving cars being one example.
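A minimal sketch of "learning by observing data", using only the Python standard library and made-up baseline numbers: the model here is simply the mean and standard deviation of historical response times, and new measurements are flagged as anomalous when they fall far outside what was observed during normal operation.

```python
from statistics import mean, stdev

# Hypothetical response times (ms) observed during normal operation.
# The "learning" step is estimating the baseline behavior from this data.
baseline = [102, 98, 105, 99, 101, 97, 103, 100, 104, 96]

mu = mean(baseline)
sigma = stdev(baseline)

def is_anomalous(value, threshold=3.0):
    """Flag a measurement whose z-score exceeds the threshold."""
    return abs(value - mu) / sigma > threshold

print(is_anomalous(101))  # typical value -> False
print(is_anomalous(450))  # large latency spike -> True
```

Real ITOA systems use far richer models (seasonality, multivariate behavior, classification of incident types), but the principle is the same: the notion of "normal" is induced from observed data rather than hand-coded for every circumstance.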
The IT Operations domain is a good fit for machine learning due to large amounts of data available for analysis, learning, and inducing new concepts. And given the growing progress of machine learning theory, algorithms, and computational resources on demand, it is no surprise that we see more and more machine learning applications in ITOA.
For example, VSE Corporation, one of the largest US government contractors, relies on its IT Operations team to be highly responsive to changing business requirements while maintaining strong control over the IT environment. However, as the IT environment grew more complex and dynamic, investigating incidents became increasingly painful, time-consuming, and labor-intensive. VSE implemented an analytics solution to crunch the vast amount of data, delivering insights that dramatically cut incident investigation time, facilitated validation of environment changes, and helped VSE stay in compliance effectively and efficiently.