IT Operational Analytics (ITOA)     


Ingo Averdunk, Mitch Gusat

IT Operational Analytics (ITOA) - overview

Log Analytics

Logs generated by the various components of an IT infrastructure contain rich information that can be used for tasks such as troubleshooting, capacity management, performance optimization, and security operations, as well as for gaining insight into the user's experience. The intuition is that machine-generated logs, when used for troubleshooting, can be good indicators of brewing problems. In addition to logs, various other kinds of configuration, machine state, and event data can be collected from the servers in a cloud environment.

However, getting insights from this unstructured log data is challenging. The backend technologies depend on a combination of information extraction, indexing, data mining, data visualization, machine learning, and natural language processing (NLP) techniques running on large sets of machine- and human-generated data.

The goal of this project is to build technology that enables state-of-the-art log and machine data analytics. We are addressing several research challenges:

  • Indexing large amounts of evolving configuration and machine state data
  • Maintaining historical state and being able to efficiently query it
  • Mining the logs to detect signature patterns (templates) that considerably reduce the log footprint
  • Creating the control-flow graph by mining dependency patterns across these templates
  • Detecting anomalies for root cause analysis and alerting
  • Visualization for log navigation and analysis
  • Scaling analysis to large amounts of log data over long periods of time
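
To make the template-mining and control-flow items above more concrete, here is a minimal sketch in Python. It is not the project's actual method: it assumes that variable fields (IP addresses, hex values, integers) can be masked with simple regular expressions, and that counting consecutive template pairs gives a rough approximation of control-flow edges. The names `MASKS`, `to_template`, `mine_templates`, and `transition_counts`, as well as the sample log lines, are hypothetical.

```python
import re
from collections import Counter

# Hypothetical masking rules (an assumption, not the project's rules).
# Order matters: more specific patterns are applied first.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(line: str) -> str:
    """Replace variable-looking tokens with placeholders."""
    for pattern, placeholder in MASKS:
        line = pattern.sub(placeholder, line)
    return line


def mine_templates(lines):
    """Count how many raw lines collapse into each template."""
    return Counter(to_template(line) for line in lines)


def transition_counts(lines):
    """Count consecutive template pairs, a crude stand-in for control-flow edges."""
    templates = [to_template(line) for line in lines]
    return Counter(zip(templates, templates[1:]))


# Made-up sample data for illustration only.
sample_logs = [
    "Connection from 10.0.0.1 port 5321",
    "Session 17 opened",
    "Connection from 10.0.0.2 port 6110",
    "Session 18 opened",
]

templates = mine_templates(sample_logs)
edges = transition_counts(sample_logs)

for template, count in templates.items():
    print(count, template)
```

In this toy input, four raw lines collapse into two templates, which illustrates how signature patterns can shrink the log footprint. Templates or transitions that occur rarely could then serve as candidate signals for anomaly detection, although a production system would likely need a more robust approach.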

Ticket Analytics

Automated, high-precision detection of infrastructure problems from human-generated tickets is valuable for identifying problem areas within IT domains and, therein, the opportunities for automation. The intuition is that problem descriptions written by humans in these environments are rich in relevant information. However, parsing incomplete and unstructured sentences to derive meaningful semantic context is a challenge. We present a ticket analytics system in SCA-LA that:

  • parses the unstructured text to derive the problem category, which in turn helps with hot spot identification, problem trending, and identification of the resolver group or automation script
  • parses the resolution actions taken in the past to create a repository of next best actions for a particular problem category
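
The two capabilities above can be illustrated with a small Python sketch. It does not describe how SCA-LA works internally: it assumes a crude keyword-based categorizer in place of real NLP parsing, and it treats the "next best action" as simply the most frequent past resolution for a category. The keyword table, function names, and ticket history are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical keyword rules (an assumption); a real system would likely
# learn categories from labelled tickets rather than use substring matching.
CATEGORY_KEYWORDS = {
    "disk": ["disk", "filesystem", "no space"],
    "network": ["ping", "unreachable", "timeout"],
    "auth": ["password", "login", "locked"],
}


def categorize(ticket_text: str) -> str:
    """Pick the category with the most keyword hits, or 'unknown' if none match."""
    text = ticket_text.lower()
    scores = {
        category: sum(keyword in text for keyword in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def build_action_repository(history):
    """Map each problem category to a count of resolution actions taken in the past."""
    repo = defaultdict(Counter)
    for text, action in history:
        repo[categorize(text)][action] += 1
    return repo


def next_best_actions(repo, ticket_text, k=2):
    """Return up to k of the most frequent past actions for the ticket's category."""
    actions = repo.get(categorize(ticket_text), Counter())
    return [action for action, _ in actions.most_common(k)]


# Made-up ticket history: (problem description, resolution action).
history = [
    ("Filesystem /var full, no space left", "clean old logs"),
    ("disk usage at 98% on db host", "clean old logs"),
    ("disk alert on app server", "extend volume"),
    ("user cannot login, account locked", "unlock account"),
]

repo = build_action_repository(history)
hot_spots = Counter(categorize(text) for text, _ in history)

print(hot_spots.most_common(1))
print(next_best_actions(repo, "server disk almost full"))
```

Counting categories over the ticket history gives a simple form of hot spot identification, and the same counts tracked over time would support problem trending.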

These technologies provide the building blocks for automated IT problem diagnostics and remediation, on which the IT self-service agents of the future will be built.