Knowledge Discovery and Data Mining - overview

Knowledge Discovery and Data Mining (KDD) is an interdisciplinary field focused on methodologies for extracting useful knowledge from data. The rapid growth of online data, driven by the Internet and the widespread use of databases, has created an immense need for KDD methodologies. Extracting knowledge from data draws upon research in statistics, databases, pattern recognition, machine learning, data visualization, optimization, and high-performance computing to deliver advanced business intelligence and web discovery solutions.

IBM Research has been at the forefront of this area from the very beginning. For over a quarter century, an active statistics research program has explored a broad range of issues in theory and practice. The pioneering work of Benoit Mandelbrot on self-similarity (fractals) and long-range dependent statistical models has had a significant impact on many scientific disciplines, including hydrology, finance, and the analysis of communications networks and computer systems. Analysis of time-dependent data and non-standard distributions is another influential area of IBM’s statistics research. One example is the theory of L-moments, which led to innovative statistical methods for characterizing and estimating distributions, especially of heavy-tailed data in finance, risk management, and IT-system monitoring. IBM’s leadership in KDD research was established in the 1990s with Rakesh Agrawal’s introduction of association rule mining. IBM’s other major contributions in KDD include mining high-throughput data streams with lightweight analysis techniques, high-performance mining techniques for parallel execution environments, and pioneering work in privacy-preserving data mining.

With the explosive growth of online data and IBM’s expansion of offerings in services and consulting, data-based solutions are increasingly crucial. Accordingly, methodological development for business intelligence, as well as for IT-system and business process monitoring, has become a focal point of statistics and KDD research at IBM. In these areas, monitoring data collected over time is used to make processes more efficient, effective, predictable, and profitable. Challenging aspects include handling large time-dependent data sets with varied characteristics, producing accurate and practical forecasting methods, and developing analytics relevant to business decision-making. Two specific problems that IBM Research is currently addressing are customer targeting and business metric forecasting.


Yada Zhu

KDD PIC is proud to support

ML Symposium NYAS 2019

This symposium, the thirteenth in an ongoing series presented by the Machine Learning Discussion Group at the New York Academy of Sciences, will feature Keynote Presentations from leading scientists in both applied and theoretical Machine Learning, as well as Spotlight Talks, a series of short presentations by early-career investigators on a variety of topics at the frontier of Machine Learning.

CIKM 2018

The 27th ACM International Conference on Information and Knowledge Management takes place on October 22 - 26, 2018 at the Lingotto, Turin, Italy. The theme for 2018 is "From Big Data and Big Information to Big Knowledge".


ECML PKDD 2018

The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases will take place at the Croke Park Conference Centre, Dublin, Ireland, on September 10 - 14, 2018.

COLT 2018

The 31st edition of the Conference on Learning Theory will take place at KTH Royal Institute of Technology, Stockholm, Sweden, July 5 - 9, 2018.