Sayan Mandal, Aldo Guzmán-Sáenz, et al.
AICoB 2020
GWAS focuses on significance, shedding false positives; machine learning probes sub-significant features, relying on predictivity. Yet these are far from orthogonal. We sought to explore how the two inform each other in sub-genome-wide-significant situations to define relevance for predictive features. We introduce the SVM-based RubricOE, which selects heavily cross-validated feature sets, and use LDpred2 PRS as a strong contrast to SVM, to explore significance and predictivity. Our Alzheimer's test case notoriously lacks strong genetic signals except for a few very strong phenotype-SNP associations, which suits the problem we are exploring. We found that the most significant SNPs among ML- and PRS-selected SNPs captured most of the predictivity, while weaker associations also tend to contribute weakly to predictivity. SNPs with weak associations tend not to contribute to predictivity, and deleting these features does not harm it. Significance provides a ranking that helps identify weakly predictive features.
Aritra Bose, Filippo Utro, et al.
Algorithms
Mirvat El-Sibai, Daniel E. Platt, et al.
Annals of Human Genetics
Mustafa Hajij, Karthikeyan Natesan Ramamurthy, et al.
NeurIPS 2021