IT Operational Analytics (ITOA) - overview
Logs generated by the various components of an IT infrastructure contain rich information that can be used for tasks such as troubleshooting, capacity management, performance optimization, and security operations, as well as for gaining insight into the user experience. The intuition is that machine-generated logs can be good indicators of brewing problems. In addition to logs, various other kinds of configuration, machine state, and event data can be collected from the servers in a cloud environment.
However, extracting insights from this unstructured log data is challenging. The backend technologies depend on a combination of information extraction, indexing, data mining, data visualization, machine learning, and natural language processing (NLP) techniques running on large sets of machine- and human-generated data.
The goal of this project is to build technology to enable state-of-the-art log and machine data analytics. There are several research challenges that we are addressing:
- Indexing large amounts of evolving configuration and machine state data
- Maintaining historical state and being able to efficiently query it
- Mining the logs to detect signature patterns (templates) that considerably reduce the log footprint
- Creating a control-flow graph by mining dependency patterns across these templates
- Detecting anomalies for root cause analysis and alerting
- Visualization for log navigation and analysis
- Scaling analysis to large amounts of log data over large periods of time
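The template mining and dependency mining challenges above can be illustrated with a minimal sketch. It assumes that variable fields (IP addresses, hex values, numbers) can be masked with regular expressions, and it uses counts of consecutive template pairs as a crude stand-in for dependency patterns. The masking rules, function names, and sample log lines are all hypothetical and are not taken from the project itself.

```python
import re
from collections import Counter

# Hypothetical masking rules: each replaces a variable field with a placeholder.
# Order matters: IP addresses must be masked before plain numbers.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(line):
    """Mask variable fields so lines that differ only in values collapse together."""
    for pattern, placeholder in MASKS:
        line = pattern.sub(placeholder, line)
    return line


def mine_templates(lines):
    """Map each template to the number of raw log lines it covers."""
    return Counter(to_template(line) for line in lines)


def mine_transitions(lines):
    """Count consecutive template pairs (an assumed, simplified dependency signal)."""
    templates = [to_template(line) for line in lines]
    return Counter(zip(templates, templates[1:]))


# Invented sample data, for illustration only.
SAMPLE_LOGS = [
    "Connection from 10.0.0.1 closed after 35 ms",
    "Connection from 10.0.0.7 closed after 120 ms",
    "Disk usage at 91 percent on volume 3",
    "Connection from 192.168.1.4 closed after 8 ms",
    "Disk usage at 95 percent on volume 1",
]

templates = mine_templates(SAMPLE_LOGS)
transitions = mine_transitions(SAMPLE_LOGS)

for template, count in templates.most_common():
    print(count, template)
```

In this toy input, five raw lines collapse into two templates, which is the sense in which signature patterns reduce the log footprint. A production system would likely need a more robust parser than fixed regular expressions and would restrict transitions to lines from the same component or request.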
Automated, high-precision detection of infrastructure problems from human-generated tickets is valuable for identifying problem areas within IT domains and, in turn, the opportunities for automation. The intuition is that problem descriptions written by humans in these environments are rich with relevant information. However, parsing incomplete and unstructured sentences to derive meaningful semantic context is a challenge. We present a ticket analytics system in SCA-LA that:
- parses the unstructured text to derive the problem category, which in turn helps with hot spot identification, problem trending, and identification of the resolver group or automation script
- parses the resolution actions taken in the past to create a repository of next best actions for each problem category.
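The two steps above can be sketched roughly as follows. This is a simplified illustration that substitutes keyword matching for real parsing of ticket text. The category names, keyword lists, ticket history, and function names are hypothetical and do not describe the actual SCA-LA implementation.

```python
from collections import Counter, defaultdict

# Hypothetical keyword lists; a real system would derive categories by parsing the text.
CATEGORY_KEYWORDS = {
    "disk": ["disk", "filesystem", "file system", "space"],
    "network": ["ping", "network", "unreachable", "packet"],
    "password": ["password", "login", "locked"],
}


def categorize(ticket_text):
    """Pick the category whose keywords occur most often, or 'unknown' if none match."""
    text = ticket_text.lower()
    scores = {
        category: sum(text.count(keyword) for keyword in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def build_action_repository(history):
    """Count, per problem category, how often each resolution action was used."""
    repository = defaultdict(Counter)
    for description, resolution in history:
        repository[categorize(description)][resolution] += 1
    return repository


def next_best_actions(repository, ticket_text, top_n=2):
    """Return the most frequent past resolutions for the ticket's category."""
    actions = repository.get(categorize(ticket_text), Counter())
    return [action for action, _ in actions.most_common(top_n)]


# Invented ticket history of (description, resolution) pairs, for illustration only.
HISTORY = [
    ("filesystem /var full on srv01", "clean up old log files"),
    ("disk space low pls check", "clean up old log files"),
    ("disk alert on db host", "extend the volume"),
    ("user cant login, acct locked", "reset password"),
    ("server unreachable, ping fails", "restart network interface"),
]

repository = build_action_repository(HISTORY)
suggestions = next_best_actions(repository, "no space left on disk, app crashing")
print(suggestions)
```

Ranking actions by how often they were used is only one possible choice. A fuller system might also weigh whether each past resolution actually closed the ticket.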
These technologies provide the building blocks for automated IT problem diagnostics and remediation, on which the IT self-service agents of the future will be built.