Knowledge Discovery and Data Mining - KDD Speaker Day



Event Host: Amit Dhurandhar

 

We hosted a few select speakers on August 28th, 2014, following the KDD conference. Here are the video links (active only for a year):

Morning Session (Jieping Ye and Geoff Webb):  http://watkvideo01.watson.ibm.com/Mediasite/Play/9a548dbc2a9e4f9697c9e049f9b689261d

 

Afternoon Session (Dafna Shahaf, Jennifer Neville and Cynthia Rudin):   http://watkvideo01.watson.ibm.com/Mediasite/Play/b1e3a79d67ef4ebabf92e801a4b610da1d

 

Room: 20-001

Call-in Number: 1-888-426-6840 (PPC: 10136130)

 

9:45 - 10 am  Kick-off

 

10 am - 11 am

Speaker: Jieping Ye

Host: Abhishek Kumar (abhishk@us.ibm.com)

Title: Exact Data Reduction for Big Data

Slides: PDF 

Abstract:
Recent technological innovations have enabled data collection of unprecedented size and complexity. Examples include web text data, social media data, gene expression images, neuroimages, and genome-wide association study (GWAS) data. Such data have incredible potential to address complex scientific and societal questions; however, analyzing them poses major challenges for scientists. As an emerging and powerful tool for analyzing massive collections of data, data reduction, in terms of the number of variables and/or the number of samples, has attracted tremendous attention in the past few years and has achieved great success in a broad range of applications. The intuition behind data reduction is the observation that many real-world datasets with complex structures and billions of variables and/or samples can usually be well explained by a small number of the most relevant features and/or samples. Most existing methods for data reduction are based on sampling or random projection, so the final model built on the reduced data is only an approximation of the true (original) model. In this talk, I will present fundamentally different approaches for data reduction in which there is no approximation: the final model constructed from the reduced data is identical to the one constructed from the complete data. Finally, I will use several real-world examples to demonstrate the potential of exact data reduction for analyzing big data.
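One well-known family of exact reduction methods is "safe screening" for the lasso (the SAFE rule of El Ghaoui et al.), which discards features that are provably zero at the solution, so refitting on the kept features recovers the original model exactly. A minimal pure-Python sketch of the SAFE test (illustrative only, not necessarily the method presented in the talk):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def safe_screen(X_cols, y, lam):
    """SAFE screening for the lasso (1/2)||y - X b||^2 + lam * ||b||_1.
    Returns indices of features that may have nonzero coefficients; every
    discarded feature is guaranteed to be zero at the solution, so the model
    refit on the kept features equals the full model exactly.
    X_cols: list of feature columns; y: response vector; requires lam <= lam_max."""
    corrs = [abs(dot(x, y)) for x in X_cols]
    lam_max = max(corrs)  # smallest lam at which all coefficients are zero
    keep = []
    for j, x in enumerate(X_cols):
        # SAFE bound: discard feature j if |x_j . y| falls below it
        bound = lam - norm(x) * norm(y) * (lam_max - lam) / lam_max
        if corrs[j] >= bound:
            keep.append(j)
    return keep
```

On a toy problem with one strongly correlated and one weakly correlated feature, the weak feature is safely discarded before any lasso solver is run.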
 
Bio:
Jieping Ye is an Associate Professor of Computer Science and Engineering at Arizona State University. He is a core faculty member of the Biodesign Institute at ASU. He received his Ph.D. degree in Computer Science from the University of Minnesota, Twin Cities in 2005. His research interests include machine learning, data mining, and biomedical informatics. He has served as Senior Program Committee member, Area Chair, or Program Committee Vice Chair of many conferences, including NIPS, KDD, IJCAI, ICDM, SDM, ACML, and PAKDD. He serves as an Associate Editor of Data Mining and Knowledge Discovery, IEEE Transactions on Knowledge and Data Engineering, and IEEE Transactions on Pattern Analysis and Machine Intelligence. He won the NSF CAREER Award in 2010. His papers have been selected for the ICML outstanding student paper award in 2004, the KDD best research paper honorable mention in 2010, the KDD best research paper nomination in 2011 and 2012, the SDM best research paper runner-up in 2013, the KDD best research paper runner-up in 2013, and the KDD best student paper award in 2014.

--------------------------------------------------------------------------------------------------------------------------

11 am - 12 pm

Speaker: Geoffrey Webb

Host: Fei Wang (fwang@us.ibm.com)

Title: Scalable learning of Bayesian network classifiers

Slides: PDF

Abstract:

I present our work on highly scalable out-of-core techniques for learning well-calibrated Bayesian network classifiers. Our techniques are based on a novel hybrid generative and discriminative learning paradigm. These algorithms:
- provide straightforward mechanisms for managing the bias-variance trade-off,
- have training time that is linear with respect to training set size,
- require as few as one and at most four passes through the training data,
- allow for incremental learning,
- are embarrassingly parallelisable,
- support anytime classification,
- provide direct, well-calibrated prediction of class probabilities,
- can learn using arbitrary loss functions,
- support direct handling of missing values, and
- exhibit robustness to noise in the training data.
Despite their computational efficiency, the new algorithms deliver classification accuracy that is competitive with state-of-the-art in-core discriminative learning techniques.
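For context, the simplest Bayesian network classifier is naive Bayes, in which every feature depends only on the class; note that it can be trained in a single pass over the data. A minimal categorical naive Bayes with Laplace smoothing (a generic illustration, not the speaker's hybrid algorithm):

```python
import math
from collections import Counter

def train_nb(X, y, alpha=1.0):
    """Train a categorical naive Bayes classifier and return a predict function.
    X: list of feature tuples; y: list of class labels; alpha: Laplace smoothing."""
    class_counts = Counter(y)
    n_features = len(X[0])
    feat_counts = Counter()  # (feature index, value, class) -> count
    feat_values = [set() for _ in range(n_features)]
    for xs, cls in zip(X, y):  # one pass over the training data
        for j, v in enumerate(xs):
            feat_counts[(j, v, cls)] += 1
            feat_values[j].add(v)
    total = sum(class_counts.values())

    def predict(xs):
        best_cls, best_score = None, float("-inf")
        for cls, cc in class_counts.items():
            score = math.log(cc / total)  # log prior
            for j, v in enumerate(xs):    # smoothed log likelihood per feature
                num = feat_counts[(j, v, cls)] + alpha
                den = cc + alpha * len(feat_values[j])
                score += math.log(num / den)
            if score > best_score:
                best_cls, best_score = cls, score
        return best_cls

    return predict
```

Because training only accumulates counts, this style of model also supports incremental learning: new examples simply update the counters.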

Bio:

Geoff Webb is a Professor of Information Technology Research in the Faculty of Information Technology at Monash University, where he heads the Centre for Data Science. His primary research areas are machine learning, data mining, user modelling, and computational structural biology. His commercial data mining software, Magnum Opus, incorporates many techniques from his association discovery research. Many of his learning algorithms are included in the widely used Weka machine learning workbench. He is editor-in-chief of Data Mining and Knowledge Discovery, co-editor of the Springer Encyclopedia of Machine Learning, a member of the advisory board of Statistical Analysis and Data Mining, a member of the editorial board of Machine Learning, and was a foundation member of the editorial board of ACM Transactions on Knowledge Discovery from Data. He is PC Co-Chair of the 2015 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, was PC Co-Chair of the 2010 IEEE International Conference on Data Mining, and was General Co-Chair of the 2012 IEEE International Conference on Data Mining. He has received the 2013 IEEE ICDM Service Award and a 2014 Australian Research Council Discovery Outstanding Researcher Award.

-----------------------------------------------------------------------------------------------------------

12 pm - 2 pm             Lunch break

-----------------------------------------------------------------------------------------------------------

2 pm - 3 pm

Speaker: Dafna Shahaf

Host: Aurelie Lozano (aclozano@us.ibm.com)

Title: The Aha! Moment: From Data to Insight

Slides: https://www.dropbox.com/s/cmc3317s635bq34/talkIBM.pptx

Abstract:
The amount of data in the world is increasing at incredible rates. Large-scale data has the potential to transform almost every aspect of our world, from science to business; for this potential to be realized, we must turn data into insight. In this talk, I will describe two of my efforts to address this problem computationally. The first project, Metro Maps of Information, aims to help people understand the underlying structure of complex topics, such as news stories or research areas. Metro Maps are structured summaries that can help us understand the information landscape, connect the dots between pieces of information, and uncover the big picture. The second project proposes a framework for automatic discovery of insightful connections in data. In particular, we focus on identifying gaps in medical knowledge: our system recommends directions of research that are both novel and promising. I will formulate both problems mathematically and provide efficient, scalable methods for solving them. User studies on real-world datasets demonstrate that our methods help users acquire insight efficiently across multiple domains.

Bio:
Dafna Shahaf is a postdoctoral fellow at Stanford University. She received her Ph.D. from Carnegie Mellon University; prior to that, she earned an M.S. from the University of Illinois at Urbana-Champaign and a B.Sc. from Tel Aviv University. Dafna's research focuses on helping people make sense of massive amounts of data. She has won a best research paper award at KDD 2010, a Microsoft Research Fellowship, a Siebel Scholarship, and a Magic Grant for innovative ideas.

--------------------------------------------------------------------------------------------------------------------------

 3 pm - 4 pm

Speaker: Jennifer Neville

Host: Peder Olsen (pederao@us.ibm.com)

Title: How to Exploit Network Properties to Improve Learning in Relational Domains

Slides: PDF

Abstract:

Although relational data from online social networks offer several opportunities to improve predictive models of user behaviors and interactions, the unique characteristics of real-world datasets present a number of challenges to accurately incorporating relational information into machine learning algorithms. In this talk, I will discuss the impact of relational data characteristics on the choice of model representation, objective function, and search procedure, and show how complex interactions between local model properties, global network structure, and the availability of observed attributes affect subsequent predictive performance. By understanding the impact of these interactions on model and algorithm performance (e.g., learning, inference, and evaluation), we can develop more accurate and efficient methods that improve analysis of large, partially observable social network and social media datasets.

Bio:
Jennifer Neville is an associate professor at Purdue University with a joint appointment in the Departments of Computer Science and Statistics. She received her PhD from the University of Massachusetts Amherst in 2006. In 2012, she was awarded an NSF CAREER Award; in 2008, she was chosen by IEEE as one of "AI's 10 to watch"; and in 2007, she was selected as a member of the DARPA Computer Science Study Group. Her research focuses on developing data mining and machine learning techniques for relational domains, including citation analysis, fraud detection, and social network analysis.

 

--------------------------------------------------------------------------------------------------------------------------

4 pm - 5 pm

Speaker: Cynthia Rudin

Host: Marek Petrik (mpetrik@us.ibm.com) 

Title: Algorithms for Interpretable Machine Learning

Slides: https://prezi.com/elqb9kyvf5jt/

Abstract:
It is extremely important in many application domains to have transparency in predictive modeling. Domain experts tend not to prefer "black box" predictive models: they would like to understand how predictions are made, and they often prefer models that emulate the way a human expert might make a decision, with a few important variables and a clear, convincing reason for each prediction.
I will discuss recent work on interpretable predictive modeling with decision lists and sparse integer linear models. I will describe several approaches, including an algorithm based on discrete optimization, and an algorithm based on Bayesian analysis. I will show examples of interpretable models for stroke prediction in medical patients and prediction of violent crime in young people raised in out-of-home care.
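To make the model class concrete: a decision list is an ordered sequence of if-then rules, where the first rule whose condition matches determines the prediction. A toy, hand-written example with hypothetical features (illustrative only; in practice the rules and their order are learned from data):

```python
# A decision list is an ordered sequence of if-then rules; the FIRST rule
# whose condition matches determines the prediction. The features, rules,
# and thresholds below are hypothetical, chosen only to show the structure.
def predict_risk(age, blood_pressure, smoker):
    if age > 75 and blood_pressure > 140:
        return "high risk"
    elif smoker and age > 60:
        return "elevated risk"
    elif blood_pressure > 160:
        return "elevated risk"
    else:
        return "low risk"
```

Such a model is transparent by construction: for any prediction, the single rule that fired is itself the explanation.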

 

Bio:
Cynthia Rudin is an associate professor of statistics at the Massachusetts Institute of Technology associated with CSAIL and the Sloan School of Management, and directs the Prediction Analysis Lab. Previously, Prof. Rudin was an associate research scientist at the Center for Computational Learning Systems at Columbia University, and prior to that, an NSF postdoctoral research fellow at NYU. She holds an undergraduate degree from the University at Buffalo where she received the College of Arts and Sciences Outstanding Senior Award in Sciences and Mathematics, and she received a PhD in applied and computational mathematics from Princeton University in 2004. She is the recipient of the 2013 INFORMS Innovative Applications in Analytics Award. She was given an NSF CAREER award in 2011. Her work has been featured in IEEE Computer, Businessweek, The Wall Street Journal, the Boston Globe, the Times of London, Fox News (Fox & Friends), the Toronto Star, WIRED Science, Yahoo! Shine, U.S. News and World Report, Slashdot, CIO magazine, and on Boston Public Radio.

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------