ProvLake - Implementation details


This prototype of ProvLake was evaluated with Python scripts that execute data processing workflows.

ProvLake's server components were implemented as Python web servers and deployed as Docker images in a Kubernetes cluster. The web servers run on uWSGI, which processes in parallel the work queues of requests sent to the server at workflow runtime.
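The idea of draining a work queue of requests with parallel workers can be sketched with the Python standard library alone. This is only an illustration, not ProvLake's server code and not how uWSGI is configured: the `process` handler, the request shape, and the `run_workers` helper are all hypothetical names introduced for this example.

```python
import queue
import threading


def process(request):
    # Hypothetical handler: a real server would persist the provenance record.
    return {"task": request["task"], "status": "stored"}


def run_workers(requests, num_workers=4):
    """Drain a queue of requests with a pool of worker threads."""
    work_queue = queue.Queue()
    for request in requests:
        work_queue.put(request)

    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                request = work_queue.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker exits
            outcome = process(request)
            with results_lock:  # protect the shared results list
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Calling `run_workers([{"task": "t1"}, {"task": "t2"}])` returns one result per request, in whatever order the workers finish. In this sketch the queue is filled before the workers start; a long-running server would instead keep workers alive and block on the queue.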

ProvLake's data representation is implemented as an OWL ontology that extends W3C PROV-O to represent multi-store data relationships jointly with multi-workflow data dependencies.

The PLView's DBMS is AllegroGraph, which serves analytical graph traversal queries over millions of RDF triples.
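To make "graph traversal query" concrete, here is a minimal sketch of a lineage traversal over a handful of triples, written in plain Python. It is not ProvLake's query code: in AllegroGraph such a query would normally be expressed in SPARQL. The triples use the standard PROV-O properties `prov:used` and `prov:wasGeneratedBy`, but the data identifiers and the `upstream` function are hypothetical.

```python
from collections import deque

# Toy triples using PROV-O property names; the identifiers are hypothetical.
TRIPLES = [
    ("model.pkl", "prov:wasGeneratedBy", "train_task"),
    ("train_task", "prov:used", "features.csv"),
    ("features.csv", "prov:wasGeneratedBy", "prepare_task"),
    ("prepare_task", "prov:used", "raw_records"),
]

# Edges followed when walking from a data item back to its origins.
LINEAGE_PREDICATES = {"prov:wasGeneratedBy", "prov:used"}


def upstream(triples, start):
    """Return every node reachable from `start` along lineage edges (BFS order)."""
    edges = {}
    for subject, predicate, obj in triples:
        if predicate in LINEAGE_PREDICATES:
            edges.setdefault(subject, []).append(obj)

    seen, order = {start}, []
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        for neighbor in edges.get(node, []):
            if neighbor not in seen:  # guard against cycles and repeats
                seen.add(neighbor)
                order.append(neighbor)
                frontier.append(neighbor)
    return order
```

For the toy data, `upstream(TRIPLES, "model.pkl")` walks back through both tasks to `raw_records`. A triple store answers the same kind of question over millions of triples using indexes rather than an in-memory dictionary.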

The polystore is implemented in PostgreSQL version 11 with Foreign Data Wrappers to MongoDB and AllegroGraph. The polystore uses a global data schema that is generated dynamically, following a methodology for multi-workflow data design.