Meta-omics - overview
Accurate identification of metagenomic sample content
One of the key questions about a metagenomic sample is how to identify all the Operational Taxonomic Units (OTUs, e.g., species or genera) present in the mixture while avoiding false positive calls. We approach this question by using all of the sequencing reads' mappings to reference genomes, where a single read often maps to multiple genomes.
Our approach is based on the promiscuity of reads, i.e., reads that map to multiple OTUs, in contrast to current approaches that rely on read abundance. By ranking the potential OTU matches for each read, we demonstrate through simulations that the rank frequency distribution of reads from true positive OTUs peaks at rank 1. To further enrich the true positives, we define a normalized per-OTU score based on promiscuity. When OTUs are sorted by this score, the false positives sink to the bottom. Our preliminary experiments demonstrate that false positive OTUs can be substantially reduced without losing any true positives.
Research on this topic will be presented as a talk at the International Association for Food Protection 2016 annual meeting.
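The ranking and scoring idea described in this section can be illustrated with a small sketch. It is not the published method: the input layout (`read_hits`), the dense ranking of ties, and in particular the score formula (the mean of 1/k over an OTU's reads, where k is the number of OTUs a read maps to) are assumptions made here for illustration only, and the toy data is invented.

```python
from collections import Counter, defaultdict


def rank_frequencies(read_hits):
    """Count, per OTU, how often it appears at each rank among a read's hits.

    read_hits: dict mapping read id -> dict of OTU -> alignment score
    (higher is better). This layout is a hypothetical choice for the sketch.
    """
    freq = defaultdict(Counter)
    for hits in read_hits.values():
        # Dense ranking: OTUs with equal alignment scores share a rank.
        distinct = sorted(set(hits.values()), reverse=True)
        rank_of = {score: i + 1 for i, score in enumerate(distinct)}
        for otu, score in hits.items():
            freq[otu][rank_of[score]] += 1
    return freq


def promiscuity_scores(read_hits):
    """Hypothetical normalized score: mean of 1/k over the reads of each OTU,
    where k is the number of OTUs that read maps to. Uniquely mapping reads
    contribute 1; promiscuous reads contribute less."""
    totals = defaultdict(float)
    counts = Counter()
    for hits in read_hits.values():
        weight = 1.0 / len(hits)
        for otu in hits:
            totals[otu] += weight
            counts[otu] += 1
    return {otu: totals[otu] / counts[otu] for otu in totals}


# Invented toy data: OTU "B" only ever receives reads shared with other OTUs.
read_hits = {
    "r1": {"A": 60},
    "r2": {"A": 58, "B": 40},
    "r3": {"A": 55, "B": 30, "C": 29},
    "r4": {"C": 50},
}

freq = rank_frequencies(read_hits)
scores = promiscuity_scores(read_hits)
ranking = sorted(scores, key=scores.get, reverse=True)

for otu in ranking:
    print(otu, round(scores[otu], 3), dict(freq[otu]))
```

In this toy input, every read that hits "A" ranks it first, while "B" is never ranked first and only receives shared reads, so it ends up last when sorted by the assumed score. A real analysis would need a score definition and thresholds validated against simulations, as the text describes.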
Characterization and comparison of metagenomes
We are exploring the use of RoDEO, our method for differential gene expression, for sample comparisons and OTU abundance comparisons. First results from this ongoing work were presented at the 13th International Conference on Computational Intelligence Methods for Bioinformatics and Biostatistics (2016).
An extended journal version, including results obtained using the most differentially abundant OTUs, was published in LNCS (2017).
Sequencing the Food Supply Chain
The Consortium for Sequencing the Food Supply Chain (SFSC), founded by IBM Research and Mars, Inc., examines the global food chain, from farms, transport, processing facilities and distribution channels to restaurants and grocery stores, and applies genomics and analytics techniques to mitigate foodborne illness and other risks in food management.
Our research on meta-omics is closely linked to the consortium efforts to understand and characterize microbiomes of food samples. For more information on the consortium, see Consortium for Sequencing the Food Supply Chain.
The outcomes include the development of the MCAW compute service for processing massive amounts of metagenomic and metatranscriptomic data.
Understanding False Positives in Mapping of Microbiome Sequence Data Using In-Silico Simulations, IAFP Annual Meeting, St. Louis, Missouri, Aug 2016.
Dimension reduction of metagenome data using RoDEO improves phenotype prediction, CIBB, Stirling, UK, Sept 2016.
Host phenotype prediction from differentially abundant microbes using RoDEO, Lecture Notes in Computer Science, pp. 27-41, Springer, 2017.
Design of the MCAW compute service for food safety bioinformatics, IBM Journal of Research and Development 60(5/6), 2016.