Gosia Lazuka, Andreea Simona Anghel, et al.
SC 2024
Molecular dynamics simulation is well established as a technique contributing to drug and materials discovery. Increasingly important is its use as a data source for training AI models. Scaling the scope and size of such data sets will be key to building foundation models based on large-scale and diverse information. We use an IBM-developed open-source toolkit, Simulation Toolkit for Scientific Discovery (ST4SD), to automate simulation workflows. These workflows can be readily scaled to take full advantage of traditional high-performance computing and emerging OpenShift clusters. We then show how large-scale simulation data can be digested by graph-based deep neural networks that our team has designed. We build a model for antigen-peptide immunogenicity prediction that outperforms a model built on hand-engineered features and trained on the same dataset, and we further show that it outperforms state-of-the-art sequence-based models in the low-data regime.
Yidi Wu, Thomas Bohnstingl, et al.
ICML 2025
Ben Fei, Jinbai Liu
IEEE Transactions on Neural Networks
Robert Farrell, Rajarshi Das, et al.
AAAI-SS 2010