Plant Genomics - overview
The Cacao Genome collaboration has provided us with fascinating algorithmic challenges in plant genomics. The cacao genome and its application to mapping pod color were published in a Special Issue on Plant Genomics in Genome Biology [1].
The article has been accessed over 18,000 times (July 2014) and has generated considerable online buzz, ranking in the top 5% of all articles tracked by attention.
An application of ancestral recombination graphs to the classification of cacao cultivars is presented in [3], and methodological advances are described below for haplotype inference [2] and assembly evaluation [4,5,6].
Haplotype Inference and Phenotype Association
We have designed iXora [2] (identifying crossovers and recombining alleles) for exact haplotype inference and trait association; see the iXora project page.
Haplotype Assembly of Polyploids
Haplotype assembly of polyploids is an open issue in plant genomics. Recent experimental studies have shown that available methods do not deliver satisfactory results in practice. We have investigated models for haplotype assembly of tetraploid potato samples. Our haplotype assembly tool is hosted on GitHub.
Genome assembly has become more of an art than a science. As the community assembles genomes at a faster rate than ever before, it becomes imperative to ask: how good are the assemblies? We explore this question in publications [4,5,6].
Genetic Trait Prediction
Whole-genome prediction of complex phenotypic traits using high-density genotyping arrays is highly relevant to plant and animal breeding and to genetic epidemiology. Given a set of plant, animal, or human biallelic molecular markers, such as SNPs, the goal is to predict the values of certain traits, which are usually highly polygenic and quantitative. Genomic selection models all marker effects simultaneously, in contrast to traditional GWAS. Our current work focuses on innovative methods to tackle this hard problem and to address issues such as the high dimension of the data relative to the sample size, gene-by-environment effects, epistasis, heritability, and the many computational challenges. Our current publications on this topic include [7].
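To make "simultaneously modeling all marker effects" concrete, here is a minimal sketch using ridge regression, one common baseline for genomic selection. It is not the method of the publications cited here. The function names, the 0/1/2 allele-count coding, and the `penalty` parameter are assumptions made for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ridge(genotypes, phenotypes, penalty=1.0):
    """Estimate all marker effects jointly under an L2 penalty.

    genotypes: one list of 0/1/2 allele counts per sample (assumed coding).
    Returns (intercept, effects). A positive penalty keeps the system
    solvable even when markers outnumber samples.
    """
    n, p = len(genotypes), len(genotypes[0])
    col_means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    y_mean = sum(phenotypes) / n
    x = [[row[j] - col_means[j] for j in range(p)] for row in genotypes]
    y = [v - y_mean for v in phenotypes]
    # Normal equations: (X^T X + penalty * I) beta = X^T y
    xtx = [[sum(x[i][j] * x[i][k] for i in range(n))
            + (penalty if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    xty = [sum(x[i][j] * y[i] for i in range(n)) for j in range(p)]
    effects = solve_linear(xtx, xty)
    intercept = y_mean - sum(mu * e for mu, e in zip(col_means, effects))
    return intercept, effects


def predict(intercept, effects, genotype):
    """Predicted trait value for one sample."""
    return intercept + sum(g * e for g, e in zip(genotype, effects))
```

A call such as `fit_ridge(genotypes, phenotypes, penalty=10.0)` followed by `predict(...)` on held-out samples gives a baseline against which more specialized methods can be compared. In practice the penalty would be chosen by cross-validation.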
Constructing populations with pre-specified characteristics is a fundamental problem in population genetics and plant breeding, among other areas. One of the major challenges in handling realistic simulations for plant and animal breeding is the sheer number of markers. Due to advancing technologies, the requirement has quickly grown from hundreds of markers to millions. We present a scheme for representing and manipulating such realistic-size genomes without any loss of information [8].
SimBA* denotes algorithms for accurately simulating populations, with specific linkage and allele distributions for non-generative simulations, and recombination models for forward simulations [9]. See the SimBA project page for details.
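As a rough illustration of forward simulation with a compact genome representation, the sketch below stores each haplotype as a short list of founder segments rather than millions of explicit alleles. It is not the SimBA algorithm or the indexing scheme of the cited papers. The single-crossover model, random mating, and all function names are simplifying assumptions.

```python
import random


def crossover(hap_a, hap_b, cut):
    """Take markers [0, cut) from hap_a and [cut, end) from hap_b.

    A haplotype is a list of (start_marker, founder_id) segments sorted by
    start, beginning at marker 0; each segment runs to the next start.
    """
    left = [seg for seg in hap_a if seg[0] < cut]
    # Founder that hap_b carries at the cut position.
    founder_at_cut = [f for s, f in hap_b if s <= cut][-1]
    right = [(cut, founder_at_cut)] + [seg for seg in hap_b if seg[0] > cut]
    merged = left + right
    # Drop boundaries where the founder does not actually change.
    compact = [merged[0]]
    for seg in merged[1:]:
        if seg[1] != compact[-1][1]:
            compact.append(seg)
    return compact


def make_gamete(parent, n_markers, rng):
    """One crossover per gamete at a uniform position (an assumption)."""
    hap_a, hap_b = parent
    if rng.random() < 0.5:
        hap_a, hap_b = hap_b, hap_a
    return crossover(hap_a, hap_b, rng.randrange(1, n_markers))


def next_generation(population, n_markers, rng):
    """Random mating with constant population size."""
    children = []
    for _ in range(len(population)):
        mother, father = rng.sample(population, 2)
        children.append((make_gamete(mother, n_markers, rng),
                         make_gamete(father, n_markers, rng)))
    return children


def expand(haplotype, founders, n_markers):
    """Recover explicit alleles from the segment list, losing nothing."""
    alleles = []
    for i, (start, founder) in enumerate(haplotype):
        end = haplotype[i + 1][0] if i + 1 < len(haplotype) else n_markers
        alleles.extend(founders[founder][start:end])
    return alleles
```

Starting from individuals of the form `([(0, 0)], [(0, 1)])`, repeated calls to `next_generation` grow the segment lists by at most one boundary per crossover, so storage depends on the number of recombination events rather than on the number of markers.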
Differential Gene Expression
RoDEO (Robust DE Operator) is our novel framework for detecting differentially expressed genes and stable genes between RNA-seq experiments [10]; see the RoDEO project page.
- J. C. Motamayor, K. Mockaitis, J. Schmutz, N. Haiminen, D. Livingstone, O. Cornejo, S. D. Findley, P. Zheng, F. Utro, S. Royaert, C. Saski, J. Jenkins, R. Podicheti, M. Zhao, B. E. Scheffler, J. C. Stack, F. A. Feltus, G. M. Mustiga, F. Amores, W. Phillips, J. Philippe Marelli, G. D. May, H. Shapiro, J. Ma, C. D. Bustamante, R. J. Schnell, D. Main, D. Gilbert, L. Parida, D. N. Kuhn: The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color. Genome Biology 14:6 R53, 2013.
- F. Utro, N. Haiminen, D. Livingstone, O.E. Cornejo, S. Royaert, R. Schnell, J.C. Motamayor, D.N. Kuhn, L. Parida: iXora: Exact haplotype inferencing and trait association. BMC Genetics 14(1), 48, 2013.
- F. Utro, O.E. Cornejo, D. Livingstone, J.C. Motamayor, L. Parida: ARG-based genome-wide analysis of cacao cultivars. BMC Bioinformatics 13(Suppl 19), S17, 2012.
- F.A. Feltus, C.A. Saski, K. Mockaitis, N. Haiminen, L. Parida, Z.M. Smith, J.B. Ford, M.E. Staton, S.P. Ficklin, B.P. Blackmon, R.J. Schnell, D.N. Kuhn, J.-C. Motamayor: Sequencing of a QTL-rich Region of the Theobroma cacao Genome using Pooled BACs, BMC Genomics, 2011.
- N. Haiminen, D. Kuhn, L. Parida, I. Rigoutsos: Evaluation of Methods for De Novo Genome Assembly from High-Throughput Sequencing Reads Reveals Dependencies That Affect the Quality of the Results, PLoS ONE, 2011.
- N. Haiminen, F.A. Feltus, L. Parida: Assessing Pooled BAC and Whole Genome Shotgun Strategies for Complex Genome Assembly, BMC Genomics, 2011.
- D. He, I. Rish, D. Haws, S. Teyssedre, Z. Karaman, L. Parida: MINT: Mutual Information based Transductive Feature Selection for Genetic Trait Prediction, MLSB workshop 2013.
- N. Haiminen, F. Utro, C. Lebreton, P. Flament, Z. Karaman, L. Parida: Efficient in silico Chromosomal Representation of Populations via Indexing, Algorithms 6(3), pp. 430-441, 2013.
- N. Haiminen, C. Lebreton, L. Parida: Best-Fit in Linear Time for Non-generative Population Simulation. Algorithms in Bioinformatics, in Lecture Notes in Computer Science 8701, 247-262, Springer, 2014.
- N. Haiminen, M. Klaas, Z. Zhou, F. Utro, P. Cormican, T. Didion, C.S. Jensen, C. Mason, S. Barth, L. Parida: Comparative Exomics of Phalaris cultivars under salt stress. BMC Genomics 15(Suppl 6):S18, 2014.