SimBA: Simulation using Best-fit Algorithms     


SimBA: Simulation using Best-fit Algorithms - overview

SimBA (Simulation using Best-fit Algorithms) is a suite of algorithms for accurately simulating populations that simultaneously fit several specified distributions. The methods construct a non-generative population under given constraints, fitting the input distributions accurately and in linear time.

SimBA-LD: Linkage disequilibrium and allele frequency simulation

The accuracy and sensitivity of population simulations are critical to the quality of the output of the applications that use them. We therefore devised novel, efficient methods for constructing populations with given allele frequency and linkage disequilibrium constraints [1]. The methods also handle subpopulation structure and show increased accuracy compared to existing methods [2]. The latest development is gapped population construction, which allows high-density markers to be simulated from a smaller initial set of markers [3,4].
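To make the two kinds of constraint concrete, the following minimal sketch computes the minor allele frequency of a marker and the pairwise linkage disequilibrium (as r squared) between two markers from phased haplotypes. It uses the standard textbook definitions only and is not the SimBA algorithm. The 0/1 allele coding and the function names are assumptions made for illustration.

```python
from typing import Sequence


def allele_frequency(marker: Sequence[int]) -> float:
    """Frequency of the allele coded 1 at a biallelic marker (one entry per haplotype)."""
    if not marker:
        raise ValueError("marker must contain at least one haplotype")
    return sum(marker) / len(marker)


def minor_allele_frequency(marker: Sequence[int]) -> float:
    """Frequency of the rarer of the two alleles."""
    p = allele_frequency(marker)
    return min(p, 1.0 - p)


def ld_r2(marker_a: Sequence[int], marker_b: Sequence[int]) -> float:
    """Pairwise LD as r^2 = D^2 / (pA (1 - pA) pB (1 - pB)), with D = pAB - pA * pB."""
    if len(marker_a) != len(marker_b):
        raise ValueError("both markers must cover the same haplotypes")
    p_a = allele_frequency(marker_a)
    p_b = allele_frequency(marker_b)
    # Fraction of haplotypes carrying allele 1 at both markers.
    p_ab = sum(x * y for x, y in zip(marker_a, marker_b)) / len(marker_a)
    d = p_ab - p_a * p_b
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("r^2 is undefined when a marker is monomorphic")
    return d * d / denom
```

A simulator that fits LD and allele frequency distributions aims to produce haplotypes for which statistics like these match the targets supplied as input.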

The non-generative population simulation algorithm for fitting linkage disequilibrium (LD) and minor allele frequency (MAF) distributions is available for non-commercial use [2].

  • A Unix executable (64-bit) is available, provided the user accepts the included license (contact)
  • A user manual is available [PDF]

SimBA-hap: Polyploid dosage and founder distribution simulation

The non-generative population simulation algorithm for autopolyploid dosage and founder distribution fitting [4] is available on GitHub:


Recombination simulation

We have also designed and implemented accurate and efficient recombination simulation methods for forward simulations following the Kosambi, Haldane, and Morgan models [1].

  • A Unix executable (64-bit) and manual are available upon request (contact)
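For readers unfamiliar with the three models, the sketch below gives the standard map functions that convert a genetic distance d (in Morgans) into a recombination fraction, together with a simple single-gamete crossover routine built on them. It illustrates the textbook formulas under stated assumptions and is not the SimBA implementation. The function names, the list-based haplotype representation, and the treatment of marker intervals as independent are all hypothetical choices made for this example.

```python
import math
import random
from typing import Callable, List, Sequence


def haldane(d: float) -> float:
    """Haldane map function (no crossover interference): r = (1 - exp(-2d)) / 2."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def kosambi(d: float) -> float:
    """Kosambi map function (partial interference): r = tanh(2d) / 2."""
    return 0.5 * math.tanh(2.0 * d)


def morgan(d: float) -> float:
    """Morgan map function (complete interference): r = d, capped at 0.5."""
    return min(d, 0.5)


def make_gamete(
    hap_a: Sequence[int],
    hap_b: Sequence[int],
    distances: Sequence[float],
    map_fn: Callable[[float], float],
    rng: random.Random,
) -> List[int]:
    """Build one gamete from two parental haplotypes.

    distances[i] is the genetic distance in Morgans between markers i and i + 1.
    Assumption for this sketch: each interval recombines independently with the
    probability given by map_fn.
    """
    if len(hap_a) != len(hap_b) or len(distances) != len(hap_a) - 1:
        raise ValueError("need equal-length haplotypes and one distance per interval")
    parents = (hap_a, hap_b)
    current = rng.randrange(2)  # start on a randomly chosen parental haplotype
    gamete = [parents[current][0]]
    for i, d in enumerate(distances):
        if rng.random() < map_fn(d):
            current = 1 - current  # crossover: switch to the other haplotype
        gamete.append(parents[current][i + 1])
    return gamete
```

For small distances the three functions nearly coincide. They diverge as d grows, with Haldane giving the smallest recombination fraction and Morgan the largest for the same distance.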


Related publications

  1. N. Haiminen, C. Lebreton, L. Parida: Best-Fit in Linear Time for Non-generative Population Simulation. Algorithms in Bioinformatics, in Lecture Notes in Computer Science 8701, 247-262, Springer, 2014.
  2. L. Parida and N. Haiminen: SimBA: Simulation Algorithm to Fit Extant-Population Distributions. BMC Bioinformatics 16:82, 2015.
  3. L. Parida and N. Haiminen: Scalable Algorithms at Genomic Resolution to fit LD Distributions. In Proc. ACM BCB, 2016.
  4. E. Siragusa, N. Haiminen, F. Utro, L. Parida: Linear time algorithms to construct populations fitting multiple constraint distributions at genomic scales. IEEE/ACM Transactions on Computational Biology and Bioinformatics PP(99), 2017.