Probing omics data via harmonic persistent homology
Davide Gurnari, Aldo Guzmán-Sáenz, et al.
RECOMB 2024
GWAS focuses on significance to eliminate false positives, while machine learning probes sub-significant features by relying on predictivity. Yet the two are far from orthogonal. We explored how they inform each other in settings below genome-wide significance, in order to define the relevance of predictive features. To explore significance and predictivity, we introduce RubricOE, an SVM-based method that selects heavily cross-validated feature sets, and use LDpred2 polygenic risk scores (PRS) as a strong contrast to the SVM. Our Alzheimer's disease test case notoriously lacks strong genetic signals apart from a few very strong phenotype-SNP associations, which suits the problem we are exploring. We found that the most significant of the ML- and PRS-selected SNPs captured most of the predictivity, while more weakly associated SNPs also tended to contribute only weakly to predictivity. SNPs with weak associations tend not to contribute to predictivity, and deleting these features does not impair it. Significance thus provides a ranking that helps identify weakly predictive features.
Sonia Youhanna, Daniel E. Platt, et al.
Atherosclerosis
Paul Brotherton, Wolfgang Haak, et al.
Nature Communications
GaneshPrasad ArunKumar, David F. Soria-Hernanz, et al.
PLoS ONE