The Scalable Enterprise Genomics Platform


Mahtab Mirmomeni

The Scalable Enterprise Genomics Platform - overview

As the cost of genome sequencing continues to fall, its adoption as a routine procedure in clinical settings is fast approaching. It is therefore essential to build systems that can handle and process the massive amounts of data that hospitals, public health labs and other clinical settings will produce continuously once they adopt this technology. Such systems require mechanisms for uploading and managing sequencing data, associating metadata with data, keeping an audit trail, and supporting analysis, visualization and analytics. They also need to be easy to use for lab technicians who lack advanced bioinformatics expertise. Because genome sequencing may be used in many domains, such as cancer genomics and bacterial genomics, these systems should be flexible rather than bound to a specific type of analysis, and highly configurable so that they can incorporate the requirements of different genomics analyses.

We introduce SEGP, a Scalable Enterprise Genomics Platform, created to facilitate the routine use of genome sequencing in clinical settings. SEGP links metadata (information about the sample source) with sequencing data and stores the metadata as XML documents in a DB2 database, which enables further analytics to be run on the data captured in the system. The sequencing files themselves can be stored in a file system outside the database. All metadata are searchable within SEGP. The output of running analyses on the metadata and the associated data is captured as metaresults and results. Metaresults are information about the result files; they are saved in the database and are also searchable. The result files (for example, a BAM file produced by mapping) reside on the file system.
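The split between searchable XML metadata in a database and bulk sequencing files on a file system can be illustrated with a minimal sketch. It is not SEGP's implementation: an in-memory SQLite database stands in for DB2, and the class `MetadataStore`, the table layout, the field names and the file paths are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_metadata_xml(sample_id, fields):
    """Serialize sample metadata as a small XML document (hypothetical layout)."""
    root = ET.Element("sample", id=sample_id)
    for name, value in fields.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


class MetadataStore:
    """Keeps XML metadata in a database and only a path to the sequencing file.

    SQLite is an assumption used here as a stand-in for DB2.
    """

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE metadata ("
            "sample_id TEXT PRIMARY KEY, "
            "xml_doc TEXT NOT NULL, "
            "data_path TEXT NOT NULL)"
        )

    def register(self, sample_id, fields, data_path):
        """Link a sample's metadata to a sequencing file stored outside the database."""
        xml_doc = build_metadata_xml(sample_id, fields)
        self.conn.execute(
            "INSERT INTO metadata VALUES (?, ?, ?)",
            (sample_id, xml_doc, data_path),
        )

    def search(self, field, value):
        """Return (sample_id, data_path) pairs whose metadata field equals value."""
        hits = []
        rows = self.conn.execute(
            "SELECT sample_id, xml_doc, data_path FROM metadata ORDER BY sample_id"
        )
        for sample_id, xml_doc, data_path in rows:
            if ET.fromstring(xml_doc).findtext(field) == value:
                hits.append((sample_id, data_path))
        return hits


store = MetadataStore()
store.register("S001", {"organism": "E. coli", "source": "blood"}, "/data/seq/S001.fastq")
store.register("S002", {"organism": "S. aureus", "source": "swab"}, "/data/seq/S002.fastq")
matches = store.search("organism", "E. coli")
```

Metaresults could be handled the same way in such a design: a second table of XML documents describing each result file, with the result file itself (for example a BAM file) left on the file system.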


SEGP has been designed to be easily adapted to different genomics applications. All data types of a particular system, together with the rules that apply to them, are configured outside the platform using XML schema. This makes it possible to adapt the platform to different settings without modifying the source code.
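The idea of keeping data types and their rules outside the source code can be sketched as follows. This is an illustration under stated assumptions, not SEGP's configuration format: it uses a simplified, hypothetical XML configuration rather than real XML Schema (XSD), which the Python standard library cannot validate, and the names `CONFIG_XML`, `load_rules` and `validate` are invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical external configuration describing one data type and its rules.
CONFIG_XML = r"""
<datatype name="bacterial_sample">
  <field name="organism" required="true"/>
  <field name="collection_date" required="true" pattern="\d{4}-\d{2}-\d{2}"/>
  <field name="notes" required="false"/>
</datatype>
"""


def load_rules(config_xml):
    """Parse the configuration into a dict of per-field rules."""
    rules = {}
    for field in ET.fromstring(config_xml).findall("field"):
        rules[field.get("name")] = {
            "required": field.get("required") == "true",
            "pattern": field.get("pattern"),  # None when no pattern is configured
        }
    return rules


def validate(record, rules):
    """Return a list of rule violations for a metadata record (empty if valid)."""
    errors = []
    for name, rule in rules.items():
        value = record.get(name)
        if value is None:
            if rule["required"]:
                errors.append(f"missing required field: {name}")
            continue
        if rule["pattern"] and not re.fullmatch(rule["pattern"], value):
            errors.append(f"invalid value for {name}: {value!r}")
    for name in record:
        if name not in rules:
            errors.append(f"unknown field: {name}")
    return errors


rules = load_rules(CONFIG_XML)
problems = validate({"organism": "E. coli", "collection_date": "2015-03-01"}, rules)
```

Adapting this sketch to another domain, such as cancer genomics, would mean supplying a different configuration document; `load_rules` and `validate` would stay unchanged.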