Knowledge Discovery and Data Mining - overview

Knowledge Discovery and Data Mining (KDD) is an interdisciplinary area focused on methodologies for extracting useful knowledge from data. The rapid growth of online data due to the Internet and the widespread use of databases have created an immense need for KDD methodologies. Extracting knowledge from data draws upon research in statistics, databases, pattern recognition, machine learning, data visualization, optimization, and high-performance computing to deliver advanced business intelligence and web discovery solutions.

IBM Research has been at the forefront of this area from the very beginning. For over a quarter century, an active statistics research program has explored a broad range of issues in theory and practice. The pioneering work of Benoit Mandelbrot on self-similarity (fractals) and long-range dependent statistical models has had significant impact on many scientific disciplines, including hydrology, finance, and the analysis of communications networks and computer systems. Analysis of time-dependent data and non-standard distributions is another influential area of IBM’s statistics research. An example is L-moments distribution theory, which led to innovative statistical methods for characterizing and estimating distributions, especially of heavy-tailed data in finance, risk management, and IT-system monitoring. Leadership in knowledge discovery and data mining (KDD) research was established in the 1990s by Rakesh Agrawal’s introduction of association rule mining. IBM’s other major contributions in KDD include mining high-throughput data streams with lightweight data analysis techniques, high-performance mining techniques for parallel execution environments, and pioneering the area of privacy-preserving data mining.

With the explosive growth of online data and IBM’s expansion of offerings in services and consulting, data-based solutions are increasingly crucial. Accordingly, methodological development for business intelligence, as well as for IT-system and business process monitoring, has become a focal point of statistics and KDD research at IBM. In these areas, monitoring data collected over time is used to make processes more efficient, effective, predictable, and profitable. Challenging aspects include handling large time-dependent data sets with varied characteristics, producing accurate and practical forecasting methods, and developing analytics relevant for business decision-making. Two specific problems that IBM Research is currently addressing, for example, are customer targeting and business metric forecasting.