Meta-omics - overview
Metagenomics is the study of metagenomes, which are mixtures of genetic material from multiple organisms. Metagenomic sequencing (genome-wide metagenomic and metatranscriptomic sequencing, or sequencing targeted to specific genes) is increasingly used in human and animal health, food safety, and environmental studies.
Accurate identification of metagenomic sample content
One of the key questions for a metagenomic sample is which organisms are present in the mixture: the goal is to identify all of them while avoiding false positive calls. We approach this question by using the mappings of all sequencing reads to reference genomes, where a single read often maps to several genomes.
Our approach is based on the promiscuity of reads, i.e., reads that map to multiple organisms, in contrast to current approaches that rely on read abundance. Ranking the potential matches for each read, we demonstrate through simulations that the rank frequency distribution of the reads of true positive organisms peaks at rank 1. To further enrich the true positives, we define a normalized, promiscuity-based score per organism. When organisms are sorted by this score, the false positives sink to the bottom. Our preliminary experiments demonstrate that false positive organisms can be substantially reduced without losing any true positives. This research was presented as a talk at the International Association for Food Protection (IAFP) 2016 Annual Meeting.
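The per-read ranking and per-organism scoring described above can be illustrated with a minimal Python sketch. It assumes each read's mappings are available as (organism, alignment score) pairs, ranks organisms per read with competition ranking (tied scores share a rank), and uses a hypothetical promiscuity-weighted score: a read that maps to k organisms contributes 1/k to each organism it ranks at 1, averaged over that organism's reads. The exact normalized score used in this work is not specified here, so the formula, the function names, and the toy data are illustrative assumptions only.

```python
from collections import Counter, defaultdict

# Hypothetical input: read ID -> list of (organism, alignment score) pairs,
# one pair per reference genome the read maps to.
example_hits = {
    "r1": [("E_coli", 150), ("Shigella", 148)],
    "r2": [("E_coli", 150)],
    "r3": [("E_coli", 140), ("Shigella", 140), ("Salmonella", 120)],
    "r4": [("Shigella", 130), ("E_coli", 145)],
}


def rank_matches(hits):
    """Rank the organisms hit by one read (1 = best alignment score).

    Competition ranking: tied scores share a rank."""
    return {org: 1 + sum(1 for _, other in hits if other > score)
            for org, score in hits}


def rank_frequencies(read_hits):
    """Histogram of the ranks each organism received across all reads."""
    freqs = defaultdict(Counter)
    for hits in read_hits.values():
        for org, rank in rank_matches(hits).items():
            freqs[org][rank] += 1
    return freqs


def promiscuity_scores(read_hits):
    """Hypothetical normalized score per organism.

    A read mapping to k organisms contributes 1/k to each organism it ranks
    at 1, and 0 otherwise; the sum is divided by the organism's read count."""
    totals, weighted = Counter(), Counter()
    for hits in read_hits.values():
        k = len(hits)  # promiscuity: number of organisms this read maps to
        for org, rank in rank_matches(hits).items():
            totals[org] += 1
            if rank == 1:
                weighted[org] += 1.0 / k
    return {org: weighted[org] / totals[org] for org in totals}


if __name__ == "__main__":
    print(dict(rank_frequencies(example_hits)))
    scores = promiscuity_scores(example_hits)
    # Sorting by score pushes organisms supported only by promiscuous reads down.
    for org in sorted(scores, key=scores.get, reverse=True):
        print(f"{org}\t{scores[org]:.3f}")
```

In the toy data, Salmonella is supported only by a low-ranked hit of a promiscuous read and ends up at the bottom, while E_coli, which also has a uniquely mapping read, scores highest.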
An application of Topological Data Analysis to the problem of separating truly present organisms from false positives was presented at APBC 2019.
Characterization and comparison of metagenomes
We are exploring the use of RoDEO, our method for differential gene expression analysis, for comparing samples and OTU abundances. Initial results from this ongoing work were presented at the 13th International Conference on Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2016).
An extended version, including results from using the most differentially abundant OTUs for phenotype prediction, was published in Lecture Notes in Computer Science (LNCS, 2017).
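As an illustration of using the most differentially abundant OTUs as a reduced feature set, the sketch below ranks OTUs by how strongly their abundances differ between two sample groups and keeps the top k. It does not reproduce RoDEO: a Mann-Whitney U effect size is swapped in as a simple rank-based stand-in, and all function names and the toy data are hypothetical.

```python
# Stand-in for RoDEO: rank OTUs by a Mann-Whitney U effect size and keep the top k.


def mann_whitney_u(x, y):
    """U statistic for sample x versus sample y; ties count as one half."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def top_differential_otus(abundances, labels, k):
    """Return the k OTUs whose abundances differ most between groups 0 and 1.

    abundances: OTU -> per-sample abundance list; labels: group (0/1) per sample."""
    effects = {}
    for otu, values in abundances.items():
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        half = len(g0) * len(g1) / 2  # expected U when the groups do not differ
        # Scaled distance from the null expectation, in [0, 1].
        effects[otu] = abs(mann_whitney_u(g1, g0) - half) / half
    return sorted(effects, key=lambda o: (-effects[o], o))[:k]


def reduce_samples(abundances, otus):
    """Project each sample onto the selected OTUs (one feature row per sample)."""
    n_samples = len(next(iter(abundances.values())))
    return [[abundances[o][i] for o in otus] for i in range(n_samples)]


if __name__ == "__main__":
    # Toy data (hypothetical): 6 samples, the first three in group 0.
    abundances = {
        "OTU_A": [10, 12, 11, 1, 2, 0],
        "OTU_B": [5, 6, 5, 5, 6, 5],
        "OTU_C": [0, 1, 0, 3, 4, 2],
    }
    labels = [0, 0, 0, 1, 1, 1]
    top = top_differential_otus(abundances, labels, k=2)
    print(top, reduce_samples(abundances, top))
```

The reduced sample-by-OTU matrix could then be passed to any classifier for phenotype prediction. Both groups must be non-empty for the effect size to be defined.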
We have also developed PRROMenade for efficient and accurate functional classification of metagenomic and metatranscriptomic reads within a functional annotation hierarchy. It takes advantage of the fact that microbial sequences can be annotated relative to established tree structures: we built a highly scalable read classifier by enhancing the generalized Burrows-Wheeler transform with a labeling step that directly assigns each read to the corresponding lowest taxonomic unit in the annotation tree.
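The labeling idea can be sketched in a few lines of Python under strong simplifying assumptions. Instead of a generalized Burrows-Wheeler transform, this hypothetical sketch indexes fixed-length k-mers in a dictionary and labels each k-mer with the lowest common ancestor (LCA) of the annotations of all references containing it; a read is then assigned to the deepest label consistent with its k-mer hits, or to the LCA when the hits conflict. The toy hierarchy, the read-assignment rule, and all names are illustrative assumptions, not PRROMenade's implementation.

```python
from collections import defaultdict

# Toy functional hierarchy (hypothetical): child -> parent, "root" at the top.
TOY_PARENT = {
    "EC:1.1.1.1": "EC:1.1.1", "EC:1.1.1.2": "EC:1.1.1",
    "EC:1.1.1": "EC:1.1", "EC:1.1": "EC:1", "EC:1": "root",
    "EC:2.7.1": "EC:2.7", "EC:2.7": "EC:2", "EC:2": "root",
}
# Toy annotated references (hypothetical): name -> (sequence, annotation node).
TOY_REFERENCES = {
    "seqA": ("ACGTACGTGG", "EC:1.1.1.1"),
    "seqB": ("ACGTACGTCC", "EC:1.1.1.2"),
    "seqC": ("TTTTGGGGAA", "EC:2.7.1"),
}


def ancestors(node, parent):
    """Path from node up to the root, deepest node first."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lowest_common_ancestor(nodes, parent):
    """Deepest node that lies on the root path of every node in `nodes`."""
    nodes = list(nodes)
    common = ancestors(nodes[0], parent)
    for other in nodes[1:]:
        on_path = set(ancestors(other, parent))
        common = [n for n in common if n in on_path]
    return common[0]


def build_index(references, parent, k):
    """Label every k-mer with the LCA of the annotations of all references containing it."""
    occurrences = defaultdict(set)
    for seq, label in references.values():
        for i in range(len(seq) - k + 1):
            occurrences[seq[i:i + k]].add(label)
    return {kmer: lowest_common_ancestor(labels, parent)
            for kmer, labels in occurrences.items()}


def classify_read(read, index, parent, k):
    """Assign a read to the lowest annotation node consistent with its k-mer labels."""
    labels = {index[read[i:i + k]] for i in range(len(read) - k + 1)
              if read[i:i + k] in index}
    if not labels:
        return None  # no matching k-mer: the read stays unclassified
    deepest = max(labels, key=lambda n: len(ancestors(n, parent)))
    if labels <= set(ancestors(deepest, parent)):
        return deepest  # all labels lie on one root-to-node path
    return lowest_common_ancestor(labels, parent)


if __name__ == "__main__":
    index = build_index(TOY_REFERENCES, TOY_PARENT, k=4)
    for read in ["TACGTGG", "ACGTAC", "CGTGGGGG", "AAAAAA"]:
        print(read, classify_read(read, index, TOY_PARENT, k=4))
```

A dictionary keeps the sketch short, but unlike a BWT-based index it stores every k-mer explicitly and supports matches of only one fixed length.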
In addition, we have applied machine learning to phenotype prediction from human gut, oral, and skin microbiomes, specifically to predicting chronological age, in order to better understand age-related changes in the microbiome; see also https://www.ibm.com/blogs/research/2020/02/ai-predict-age-based-on-microbiome/.
Sequencing the Food Supply Chain
The Consortium for Sequencing the Food Supply Chain (SFSC), founded by IBM Research and Mars, Inc., examines the global food chain (from farms, transport, processing facilities, and distribution channels to restaurants and grocery stores) and applies genomics and analytics techniques to mitigate foodborne illness and other risks in food management.
Our meta-omics research is closely linked to the consortium's efforts to understand and characterize the microbiomes of food samples. For more information on the consortium, see Consortium for Sequencing the Food Supply Chain.
Outcomes include the MCAW compute service for processing the large volumes of metagenomic and metatranscriptomic data involved. Results on this topic have been presented at several venues, including Food Micro 2018.
The consortium work also resulted in a pipeline and publication on food authentication from shotgun sequencing reads; see also https://www.ibm.com/blogs/research/2019/11/food-authentication/.
Understanding False Positives in Mapping of Microbiome Sequence Data Using In-Silico Simulations. Talk by Niina Haiminen, IAFP Annual Meeting, St. Louis, Missouri, Aug 2016.
Dimension reduction of metagenome data using RoDEO improves phenotype prediction. CIBB, Stirling, UK, Sept 2016.
Host phenotype prediction from differentially abundant microbes using RoDEO. Lecture Notes in Computer Science, pp. 27-41, Springer, 2017.
Design of the MCAW compute service for food safety bioinformatics. IBM Journal of Research and Development 60(5/6), 2016.
Deep metatranscriptomic sequencing indicates stable microbial community across seasons and suppliers for protein meal factory ingredient. Talk by Niina Haiminen, Food Micro Conference, Berlin, Germany, Sept 2018.
Signal enrichment with strain-level resolution in metagenomes using topological data analysis. BMC Genomics 20:2, 194, 2019.
Food authentication from shotgun sequencing reads with an application on high protein powders. npj Science of Food 3(24), 2019.
Hierarchically labeled database indexing allows scalable characterization of microbiomes. iScience 23(4), 2020.
Human Skin, Oral, and Gut Microbiomes Predict Chronological Age. mSystems 5(1), 2020.