Meta-omics - overview

Metagenomics is the study of metagenomes, which are mixtures of genetic material from several organisms. Metagenomic sequencing (genome-wide metagenomics and metatranscriptomics, or sequencing targeted to specific genes) is increasingly used in human and animal health, food safety, and environmental studies.


Accurate identification of metagenomic sample content

One of the key questions relating to metagenomic samples is how to identify all the organisms present in the mixture while avoiding false positive calls. We approach this question by using all the mappings of each sequencing read, which often span multiple reference genomes.


Our approach is based on the promiscuity of reads, i.e., reads mapping to multiple organisms, in contrast to current approaches that rely on read abundance. After ranking the potential matches for each read, we demonstrate through simulations that the rank frequency distribution of reads from true positive organisms peaks at rank 1. To further enrich the true positives, we define a normalized per-organism score based on promiscuity. When organisms are sorted by this score, the false positives sink to the bottom. Our preliminary experiments demonstrate that false positive organisms can be substantially reduced without losing any true positives. This research was presented as a talk at the International Association for Food Protection 2016 annual meeting [1].
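The idea of rank frequencies and a promiscuity-based score can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the input format (each read mapped to a best-first list of candidate organisms), the function names, the toy organism labels, and in particular the score itself, which is defined here as the mean of 1/(number of organisms a read maps to) over all reads hitting an organism. The exact score used in our work is not specified on this page and may differ.

```python
from collections import Counter, defaultdict


def rank_frequencies(read_hits):
    """Count, per organism, how often it appears at each rank.

    read_hits maps a read id to a list of organisms ordered from best
    match (rank 1) to worst. This input format is an assumption.
    """
    freq = defaultdict(Counter)
    for hits in read_hits.values():
        for rank, organism in enumerate(hits, start=1):
            freq[organism][rank] += 1
    return freq


def promiscuity_scores(read_hits):
    """Hypothetical normalized score in (0, 1] per organism.

    Each read contributes 1 / (number of organisms it maps to), so a
    uniquely mapping read contributes 1 and a promiscuous read less.
    The contributions are averaged over the reads hitting the organism.
    """
    totals = defaultdict(float)
    counts = Counter()
    for hits in read_hits.values():
        for organism in hits:
            totals[organism] += 1.0 / len(hits)
            counts[organism] += 1
    return {org: totals[org] / counts[org] for org in counts}


# Toy data with made-up organism labels, not real results.
reads = {
    "r1": ["org_A"],
    "r2": ["org_A", "org_B"],
    "r3": ["org_A", "org_B", "org_C"],
    "r4": ["org_D"],
}

freq = rank_frequencies(reads)
scores = promiscuity_scores(reads)

# Sort organisms by score, highest first; likely false positives sink.
ranked = sorted(scores, key=scores.get, reverse=True)
for organism in ranked:
    print(organism, round(scores[organism], 3), dict(freq[organism]))
```

In this toy input, org_A is always the best match for its reads, so its rank frequency distribution peaks at rank 1, while org_B and org_C only ever appear at lower ranks on shared reads and receive lower scores.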

An application of Topological Data Analysis to the problem of separating truly present organisms from false positives will be presented at APBC 2019 [6].

Characterization and comparison of metagenomes

RoDEO metagenomics

We are exploring the use of RoDEO, our method for differential gene expression analysis, for sample comparisons and OTU abundance comparisons. First results of this ongoing work were presented at the 13th International Conference on Computational Intelligence Methods for Bioinformatics and Biostatistics (2016) [2].

An extended version, including results obtained using the most differentially abundant OTUs, was published in LNCS (2017) [3].

Figure: RoDEO top differentially abundant (DA) OTUs

Sequencing the Food Supply Chain

The Consortium for Sequencing the Food Supply Chain (SFSC), founded by IBM Research and Mars, Inc., examines the global food chain, from farms, transport, processing facilities, and distribution channels to restaurants and grocery stores, and applies genomics and analytics techniques to mitigate foodborne illness and other risks in food management.

Our research on meta-omics is closely linked to the consortium efforts to understand and characterize microbiomes of food samples. For more information on the consortium, see Consortium for Sequencing the Food Supply Chain.

The outcomes include the development of the MCAW compute service for processing massive amounts of metagenomic and metatranscriptomic data [4]. Results on this topic will be presented at, among other venues, Food Micro 2018 [5].


[1] Understanding False Positives in Mapping of Microbiome Sequence Data Using In-Silico Simulations, talk by Niina Haiminen, IAFP Annual Meeting, St. Louis, Missouri, Aug 2016.

[2] Dimension reduction of metagenome data using RoDEO improves phenotype prediction, CIBB, Stirling, UK, Sept 2016.

[3] Host phenotype prediction from differentially abundant microbes using RoDEO, Lecture Notes in Computer Science, pp. 27-41, Springer, 2017.

[4] Design of the MCAW compute service for food safety bioinformatics, IBM Journal of Research and Development 60(5/6), 2016.

[5] Deep metatranscriptomic sequencing indicates stable microbial community across seasons and suppliers for protein meal factory ingredient, talk by Niina Haiminen, Food Micro Conference, Berlin, Germany, Sept 2018.

[6] Signal Enrichment of Metagenome Sequencing Reads using Topological Data Analysis, Proc. APBC 2019, to appear in BMC Genomics.