Center for Computational Health - overview

Research at the Interface of Data Science and Health

We pursue research in the application of data science to healthcare across the entire continuum from the health of individuals, to that of populations, to the healthcare system itself.

Healthcare is in the midst of dramatic changes on many levels, driven in no small part by the expanding role of data in achieving a deeper understanding of disease, behavior and the interaction of complex systems. New types of data, such as genomic and sensor data, combined with the increasing electronic availability of traditional health data, are having a major impact on conceptual models of how disease is diagnosed and treated.

The Center for Computational Health at IBM Research consists of a multi-disciplinary team of researchers with expertise in machine learning, data mining, visual analytics, biomedical & medical informatics, statistics, behavioral and decision sciences, and medicine. We work on developing cutting-edge methodologies to derive insights from diverse sources of health data, to support use cases in personalized care delivery and management, real world evidence, health behavior modeling, cognitive health decision support, and translational informatics.

Program Director: Jianying Hu

Team Locations:

IBM T.J. Watson Research Center, Yorktown Heights, New York

IBM Research Cambridge, Cambridge, Massachusetts

Research Areas

Patient Similarity Analytics

Similarity analytics that incorporate diverse patient attributes and apply advanced machine learning methods to identify precision cohorts, combined with personalized predictive models that rank risk factors at the patient level, leading to more targeted and actionable insights.

Predictive Modeling

Advanced machine learning approaches to address challenges in developing effective and efficient predictive models from observational healthcare data in different use cases. Examples include matrix-based methods to address sparsity, feature engineering (e.g., temporal pattern mining, factor analysis), feature selection, a scalable predictive modeling platform, personalized predictive modeling leveraging precision cohorts, and multi-task learning for comprehensive risk assessment.

Disease Progression Modeling

Understanding disease onset, characteristics of disease stages, rate of progression from asymptomatic to symptomatic disease, from earlier to more severe stages, and factors that influence disease progression pathways.   

Translational Informatics

Drug similarity analytics combined with advanced machine learning methods such as joint matrix factorization can help pharmaceutical researchers quickly identify drugs that have similar characteristics to target drugs, supporting three distinct but equally important use cases: drug safety, drug repositioning, and personalized medicine.

Visual Analytics and Cognitive Decision Support

Innovative visual analytics platforms and user interfaces that accelerate the process of exploring and mining data to derive new insights that can be translated into more effective therapeutics and processes.

Contextual & Behavioral Modeling

Combining real-time data from wearable devices, self-reported activity, and clinical data allows us to model behavior, both for prediction and for wellness and fitness strategies customized to an individual’s unique needs.

Recent News and Posts

4/7/17 - IBM granted U.S. Patent 9,536,194: Method and system for exploring the associations between drug side-effects and therapeutic indications.

Articles of interest related to the CHF prediction work recently published in Circulation: Cardiovascular Quality and Outcomes include an IEEE Spectrum article and a post.

Recent Presentations & Events

Keynote: IEEE ICHI 2017 - 8/23-26/2017, Park City, Utah
Keynote: Computational Methods for Next Generation Health Care
Presenter: Jianying Hu

Keynote: 7th Digital Health Conference 2017 - 7/2-5/2017, London, England
Keynote: Health Innovation – An IBM Perspective
Presenter: Ching-Hua Chen

American Medical Informatics Association (AMIA) 2016 Annual Symposium, 11/12-16, Chicago, IL
  • Characterizing Physicians Practice Phenotype from Unstructured Electronic Health Records
    Presenter: Sanjoy Dey
  • Data-Driven Prediction of Beneficial Drug Combinations in Spontaneous Reporting Systems
    Presenter: Ying Li
  • Predicting Negative Events: Using Post-discharge Data to Detect High-Risk Patients
    Authors: Lina Sulieman, Daniel Fabbri, Fei Wang, Jianying Hu, Bradley Malin

Data Analytics Challenge Win: IEEE International Conference on Healthcare Informatics (ICHI), 10/4-7/2016, Chicago, IL
Winner of Data Analytics Challenge - Team HARG, IBM T.J. Watson Research Center
Submitters: Janu Verma, Bum Chul Kwon, Yu Cheng, Soumya Ghosh, Kenney Ng

Best Paper Win: European Semantic Web Conference (ESWC), 5/29-6/2/2016, Anissaras, Crete, Greece
Best In-Use/Industrial Paper Award - Predicting Drug-Drug Interactions through Large-scale Similarity-Based Link Prediction
Authors: Achille Fokoue, Mohammad Sadoghi, Oktie Hassanzadeh, and Ping Zhang

Featured: IBM Watson Health Showcases on Tackling Diabetes at American Diabetes Association’s 76th Scientific Sessions, June 10-14, 2016, New Orleans, LA
Personalized predictive modeling work led by Kenney Ng featured in the press release.

2016 SIAM International Conference on Data Mining, May 5-7, 2016, Miami, FL
Tutorial Presentation: Biomedical Data Mining with Matrix Models
Presenter: Ping Zhang

Keynote: 6th International Conference on Digital Health, April 11-13, 2016, Montreal, Quebec, Canada
Keynote Presentation - "Health Innovation - An IBM Perspective"
Presenter: Ching-Hua Chen

Special Session: ENDO 2016, April 1-4, 2016, Boston, MA
Symposium: Advanced Healthcare Informatics Analytics in the Areas of Precision Medicine, Translational Medicine and Population Health
Presenters: Kenney Ng, Yarra Goldschmidt, Ching-Hua Chen

Plenary Speech: 2016 Asian American Engineer of the Year Symposium, March 12, 2016, New Brunswick, NJ
Plenary speech on Data Driven Healthcare Analytics
Plenary Speaker: Jianying Hu

Invited Presentation: CHDI’s 11th Annual HD Therapeutics Conference, February 22–25, 2016, Palm Springs, CA
Invited closing presentation: Understanding Huntington’s disease progression: A multi-level probabilistic modeling approach
Presenter: Jianying Hu

Invited Panel Presentation: SINAInnovations 2015, October 27-28, 2015, New York, NY
Day One Panel Discussion - Precision Medicine
Invited Panel Presenter: Jianying Hu

Machine Learning in Healthcare, August 8-9, 2014, Los Angeles, CA
Keynote: Data Driven Analytics for Personalized Healthcare
Presenter: Jianying Hu

Selected Publications

MindfulWatch: A Smartwatch-based System for Real-time Respiration Monitoring During Meditation. 
Tian Hao, Chongguang Bi, Guoliang Xing, Roxane Chan, Linlin Tu 
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT/UbiComp), 2017

Added Value from Secondary Use of Person Generated Health Data in Consumer Health Informatics. 
P.S. Hsueh, F. Martin-Sanchez, K. Kim, S. Peterson, S. Dey, Y-K Cheung, T. Wetter
IMIA Yearbook of Medical Informatics 26(1), 2017. 

Present and Future Trends in Consumer Health Informatics and Patient Generated Health Data for Consumer Health Informatics. 
Lai, A., P.S. Hsueh, Y. Choi, R. Austin
IMIA Yearbook of Medical Informatics 2017. 

Making sense of Patient Generated Health Data (PGHD) with better interpretability: The transition from "more" to "better". 
P.S. Hsueh, S. Das, S. Dey, T. Wetter
The 16th World Congress on Medical and Health Informatics (MedInfo), 2017

cHRV: Uncovering Daily Stress Dynamics using Bio-signal from Consumer Wearables. 
Tian Hao, Henry Chang, Marion Ball, Kun Lin, Xinxin Zhu 
The 16th World Congress on Medical and Health Informatics (MedInfo), 2017

The Power of the Patient Voice: Learning Indicators of Hormonal Therapy Adherence from an Online Breast Cancer Forum.
Z. Yin, M. Bradley, J. Warner, P.S. Hsueh, C-H. Chen
The 11th International AAAI Conference on Web and Social Media (ICWSM), 2017

FamilyLog: A Mobile System for Monitoring Family Mealtime Activities. 
Chongguang Bi, Guoliang Xing, Tian Hao, Jina Huh, Wei Peng, Mengyan Ma 
IEEE International Conference on Pervasive Computing and Communications (Percom), 2017 

Toward Predicting Social Support Needs in Online Health Social Networks.
Choi, M.-J., Kim, S.-H., Lee, S., Kwon, B. C., Yi, J. S., Choo, J., & Huh, J.
Journal of Medical Internet Research (2017)

Sampling for Scalable Visual Analytics.
Kwon, B. C., Verma, J., Demiralp, C., & Haas, P.
IEEE Computer Graphics and Applications. 37(1), 100-108 (2017).

AxiSketcher: Interactive Nonlinear Axis Mapping through User's Drawing on Visualization.
Kwon, B. C., Kim, H., Wall, E., Choo, J., Park, H., & Endert, A.
IEEE Transactions on Visualization and Computer Graphics (2017).

VLAT: Development of a Visualization Literacy Assessment Test.
Lee, S., Kim, S.-H., & Kwon, B. C.
IEEE Transactions on Visualization and Computer Graphics (2017).

Optimal Expert Knowledge Elicitation for Bayesian Network Structure Identification.
Cao Xiao, Yan Jin, Ji Liu, Bo Zeng, and Shuai Huang
IEEE Transactions on Automation Science and Engineering, IEEE, 2017

Unsupervised Sequential Outlier Detection with Deep Architectures.
Weining Lu, Yu Cheng, Cao Xiao, Shiyu Chang, Shuai Huang, Bin Liang, and Thomas Huang
IEEE Transactions on Image Processing, IEEE, 2017

Patient Subtyping via Time-Aware LSTM Networks.
Inci Baytas, Cao Xiao, Xi Zhang, Fei Wang, Anil Jain and Jiayu Zhou
Proceedings of the 23rd SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2017)

An Adaptive Pattern Learning Framework to Personalize Online Seizure Prediction.
Cao Xiao, Shouyi Wang, Leon Iasemidis, Stephen Wong, Art Chaovalitwongse
IEEE Transactions on Big Data, 2017

An RNN Architecture with Dynamic Temporal Matching for Personalized Predictions of Parkinson's Disease.
Chao Che*, Cao Xiao*, Jian Liang, Bo Jin, Jiayu Zhou, Fei Wang
SIAM International Conference on Data Mining (SDM), 2017

MELD-Na score predicts incident major cardiovascular events, in patients with nonalcoholic fatty liver disease (NAFLD). 
Simon T, Kartoun U, Zheng H, Chan A, Chung R, Shaw S, Corey K.
Hepatology Communications 2017.

Predictive modeling of physician-patient dynamics that influence sleep medication prescriptions and clinical decision-making.
Beam AL, Kartoun U, Pai JK, Chatterjee AK, Fitzgerald TP, Shaw SY, Kohane IS. 
Scientific Reports 7:42282 (2017).

Minimum redundancy maximum relevance feature selection approach for temporal gene expression data.
Milos Radovic, Mohamed Ghalwash, Nenad Filipovic, Zoran Obradovic
BMC Bioinformatics, (2017)

Ranking Based Multitask Learning of Scoring Functions.
Stojkovic I, Ghalwash M, Obradovic Z.
Proc. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML), Skopje, Macedonia, September 2017.

Cost Sensitive Time-Series Classification.
Roychoudhury S, Ghalwash M, Obradovic Z.
Proc. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML), Skopje, Macedonia, September 2017.

A Fast Structured Regression for Large Networks.
Fang Zhou, Mohamed Ghalwash, Zoran Obradovic
Proc. The IEEE International Conference on Big Data 2016, Washington, DC USA, Dec. 2016. 

Continuous Conditional Dependent Network for Structured Regression.
Chao Han, Mohamed Ghalwash, Zoran Obradovic
The Thirty-First AAAI Conference on Artificial Intelligence AAAI-17, San Francisco, California USA, Feb. 2017.

Clinical Trials.Gov: A Topical Analyses.
Vibha Anand, Amos Cahan, Soumya Ghosh
AMIA Joint Summits, March 2017.

Adverse Drug Reaction Prediction with Symbolic Latent Dirichlet Allocation.
C. Xiao, P. Zhang, W. Chaovalitwongse, J. Hu and F. Wang
The 31st AAAI Conference on Artificial Intelligence (AAAI 2017)

Multitask Dyadic Prediction and Its Application in Prediction of Adverse Drug-Drug Interaction.
Jin, H. Yang, C. Xiao, P. Zhang, and F. Wang.
The 31st AAAI Conference on Artificial Intelligence (AAAI 2017).

Clustervision: Visual Supervision of Unsupervised Clustering.
Kwon BC, Eysenbach B, Verma J, Ng K, deFilippi C, Stewart WF, Perer A.
IEEE Trans Vis Comput Graph. 2017

Personalizing Gesture Recognition Using Hierarchical Bayesian Neural Networks
A Joshi, S Ghosh, M Betke, S Sclaroff, H Pfister
Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) 2017

Improving precision medicine using individual patient data from trials.
Cahan A, Cimino JJ.
CMAJ 2017 Feb 6;189(5):E204-E207. doi: 10.1503/cmaj.160267.

Identifying and investigating unexpected response to treatment: a diabetes case study.
Michal Ozery-Flato, Liat Ein-Dor, Naama Parush-Shear-Yashuv, Ranit Aharonov, Hani Neuvirth, Martin S. Kohn, Jianying Hu
Big Data 4(3), 148-159, Mary Ann Liebert, Inc., 2016

Early detection of heart failure using electronic health records: practical implications for time before diagnosis, data diversity, data quantity, and data density.
Ng K, Steinhubl SR, deFilippi C, Dey S, Stewart WF.
Circulation: Cardiovascular Quality and Outcomes. 2016;9:649-658.

Characterizing physicians practice phenotype from unstructured electronic health records.
Dey S, Wang Y, Byrd R, Ng K, Steinhubl S, deFilippi C, Stewart W
American Medical Informatics Association Annual Symposium (AMIA), 2016.

Data-driven prediction of beneficial drug combinations in spontaneous reporting systems.
Ying Li, Ping Zhang, Zhaonan Sun, Jianying Hu
American Medical Informatics Association Annual Symposium (AMIA), 2016.

Integrated machine learning approaches for predicting ischemic stroke and thromboembolism in atrial fibrillation.
Xiang Li, Haifeng Liu, Xin Du, Ping Zhang, Gang Hu, Guotong Xie, Shijing Guo, Meilin Xu, Xiaoping Xie
American Medical Informatics Association Annual Symposium (AMIA), 2016.

Predicting negative events: Using post-discharge data to detect high-risk patients.
Lina Sulieman, Daniel Fabbri, Fei Wang, Jianying Hu, Bradley Malin.
American Medical Informatics Association Annual Symposium (AMIA), 2016.

DPDR-CPI, a server that predicts drug positioning and drug repositioning via chemical-protein interactome.
Heng Luo, Ping Zhang, Xi Hang Cao, Dizheng Du, Hao Ye, Hui Huang, Can Li, Shengying Qin, Chunling Wan, Leming Shi, Lin He, Lun Yang
Scientific Reports, Nature Publishing Group, 2016.

Deep state space models for computational phenotyping.
Soumya Ghosh, Yu Cheng, and Zhaonan Sun
IEEE International Conference on Healthcare Informatics (ICHI), 2016.

Correlating eligibility criteria generalizability and adverse events using Big Data for clinical trials.
Sen A, Ryan PB, Goldstein A, Chakrabarti S, Wang S, Koski E, Weng C.
Ann N Y Acad Sci. 2016 Sep 6. doi: 10.1111/nyas.13195.

Using frequent item set mining and feature selection methods to identify interacted risk factors - The atrial fibrillation case study.
Xiang Li, Haifeng Liu, Xin Du, Gang Hu, Guotong Xie, Ping Zhang
Medical Informatics Europe (MIE), 2016.

Visual assessment of the similarity between a patient and trial population: Is this clinical trial applicable to my patient?
Cahan A, Cimino JJ.
Applied Clinical Informatics, 2016 Jun 8;7(2):477-488.

Predicting drug-drug interactions through large-scale similarity-based link prediction.
Achille Fokoue, Mohammad Sadoghi, Oktie Hassanzadeh, Ping Zhang
Extended Semantic Web Conference (ESWC), 2016

Interacting with Predictions: Visual Inspection of Black-box Machine Learning Models
Krause J, Perer A, and Ng K.
Proceedings of the 2016 CHI Conference in Human Factors in Computing Systems, 2016

Integrating population-based patterns with personal routine to re-engage Fitbit use.
Chung C, Danis C.
Proceedings of PervasiveHealth 2016, 2016

Risk prediction with electronic health records: A deep learning approach.
Cheng Y, Wang F, Zhang P, Hu J.
SIAM International Conference on Data Mining (SDM), 2016.

Clustering of elderly patient subgroups to identify medication-related readmission risks.
Catherine H. Olson, Sanjoy Dey, Vipin Kumar, Karen A. Monsen, Bonnie L. Westra
International Journal of Medical Informatics 2016 Jan;85(1):43-52, Elsevier.

Wearable technologies and telehealth in care management for chronic illness.
Xinxin Zhu, Amos Cahan
In: Healthcare Information Management Systems, Charlotte A. Weaver, Marion J. Ball, George R. Kim, and Joan M. Kiel, Eds., Springer International Publishing, 2016.

Mining and exploring care pathways from electronic medical records with visual analytics.
A. Perer, F. Wang, and J. Hu.
Journal of Biomedical Informatics (JBI). 2015

Label Propagation Prediction of Drug-Drug Interactions Based on Clinical Side Effects.
Zhang P, Wang F, Hu J, Sorrentino R.
Sci Rep. 2015 Jul 21;5:12339.

Towards actionable risk stratification: a bilinear approach
Wang X, Wang F, Hu J., Sorrentino, R
Journal of Biomedical Informatics (JBI). 2015

Personalized Predictive Modeling and Risk Factor Identification using Patient Similarity
Ng K, Sun J, Hu J, Wang F
AMIA Jt Summits Transl Sci Proc. 2015 Mar 25;2015:132-6.

LINKAGE: An Approach for Comprehensive Risk Prediction for Care Management
Sun Z, Wang F, Hu J
Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2015

Early detection of heart failure with varying prediction windows by structured and unstructured data in electronic health records.
Yajuan Wang, Ng K, Byrd RJ, Jianying Hu, Ebadollahi S, Daar Z, deFilippi C, Steinhubl SR, Stewart WF.
Conf Proc IEEE Eng Med Biol Soc. 2015 Aug;2015:2530-3.

Clinicians' evaluation of computer-assisted medication summarization of electronic medical records.
Zhu X, Cimino JJ. 
Comput Biol Med. 2015 Apr;59:221-31.

Prescription Extraction from Clinical Notes: Towards Automating EMR Medication Reconciliation
Wang Y, Steinhubl SR, Defilippi C, Ng K, Ebadollahi S, Stewart WF, Byrd RJ.
AMIA Jt Summits Transl Sci Proc. 2015 Mar 25;2015:188-93.

Relative Patterns Discovery toward Big Data Analytics
Pai H, Wu F, Hsueh PY, Lin G, Chan Y-H.
Proceedings of the 2015 IEEE 12th International Conference on e-Business Engineering (ICEBE), 2015

PARAMO: a PARAllel predictive MOdeling platform for healthcare analytic research using electronic health records
Ng K, Ghoting A, Steinhubl SR, Stewart WF, Malin B, Sun J
Journal of Biomedical Informatics (JBI), 2014

Unsupervised Learning of Disease Progression Models
Wang X, Sontag D, Wang F
Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD), 2014

Predicting changes in hypertension control using electronic health records from a chronic disease management program
Sun J, McNaughton CD, Zhang P, Perer A, Gkoulalas-Divanis A, Denny JC, Kirby J, Lasko T, Salp A, Malin BA
Journal of American Medical Informatics Association (JAMIA), 2014

From micro to macro: data driven phenotyping by densification of longitudinal electronic medical records
Zhou J, Wang F, Hu J, Ye J
Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD), pages 135-144, 2014

Towards personalized medicine: leveraging patient similarity and drug similarity analytics
Zhang P, Wang F, Hu J, Sorrentino R
Proceedings of AMIA Joint Summits on Translational Sciences, 2014

Exploring joint disease risk prediction
Wang X, Wang F, Hu J, Sorrentino R.
Proceedings of AMIA Annual Symposium, 2014