Declarative large-scale machine learning (ML) in SystemML aims at flexible specification of ML algorithms and automatic generation of hybrid runtime plans, ranging from single-node, in-memory computations to distributed computations on MapReduce or Spark. ML algorithms are expressed in an R-like syntax that includes linear algebra primitives, statistical functions, and ML-specific constructs. This high-level language significantly increases the productivity of data scientists because it provides (1) full flexibility in expressing custom analytics and (2) data independence from the underlying input formats and physical data representations. Automatic optimization according to data and cluster characteristics ensures both efficiency and scalability. As such, SystemML differs from existing work on large-scale ML libraries, which mostly provide fixed algorithms and runtime plans.
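To make the idea of choosing between single-node and distributed execution from data and cluster characteristics more concrete, the following is a minimal sketch in Python. It is not SystemML's optimizer: the names `MatrixInfo`, `estimate_memory_bytes`, `choose_execution_type`, and `plan_matmult`, as well as the simple cost model (8 bytes per non-zero cell, worst-case dense output), are assumptions made purely for illustration.

```python
from dataclasses import dataclass

BYTES_PER_DOUBLE = 8  # assumption: every stored cell is a 64-bit float


@dataclass(frozen=True)
class MatrixInfo:
    """Hypothetical size metadata for a matrix operand."""
    rows: int
    cols: int
    sparsity: float = 1.0  # fraction of non-zero cells; 1.0 means dense


def estimate_memory_bytes(m: MatrixInfo) -> int:
    """Rough in-memory size estimate (illustrative cost model only)."""
    return int(m.rows * m.cols * m.sparsity * BYTES_PER_DOUBLE)


def choose_execution_type(inputs, output, memory_budget_bytes: int) -> str:
    """Pick single-node execution if all operands fit in the budget."""
    total = sum(estimate_memory_bytes(m) for m in inputs)
    total += estimate_memory_bytes(output)
    return "single_node" if total <= memory_budget_bytes else "distributed"


def plan_matmult(a: MatrixInfo, b: MatrixInfo, memory_budget_bytes: int) -> str:
    """Choose an execution type for the matrix product a %*% b."""
    if a.cols != b.rows:
        raise ValueError("inner dimensions must match")
    # Assume a dense output as the worst case for the memory estimate.
    out = MatrixInfo(a.rows, b.cols)
    return choose_execution_type([a, b], out, memory_budget_bytes)


if __name__ == "__main__":
    budget = 2 * 1024**3  # assumed 2 GiB memory budget on the driver node
    small = plan_matmult(MatrixInfo(1_000, 100), MatrixInfo(100, 10), budget)
    large = plan_matmult(MatrixInfo(10_000_000, 1_000), MatrixInfo(1_000, 10), budget)
    print(small, large)
```

Under these assumptions, the same high-level operation is mapped to different runtime plans depending only on operand sizes and the available memory, which is the kind of decision the user does not have to encode in the algorithm itself.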
In June 2015, we announced the open-sourcing of SystemML. The open-source repository is available at https://github.com/SparkTC/systemml