Randomized Numerical Linear Algebra for Large Scale Data Analysis - overview
The Sketching Linear Algebra Kernel is a library for matrix computations suitable for general statistical data analysis and optimization applications.
Many tasks in machine learning and statistics ultimately reduce to problems involving matrices. Whether you're matching lenders and loans in the microfinance space, finding the key players in the Bitcoin market, or inferring where tweets came from, you'll want a toolkit for low-rank matrix approximation, least-squares and robust regression, eigenvector analysis, CUR and non-negative matrix factorizations, and other matrix computations.
Sketching is a way to compress matrices that preserves key matrix properties; it can be used to speed up many matrix computations. Sketching takes a given matrix A and produces a sketch matrix B that has fewer rows and/or columns than A. For a good sketch B, if we solve a problem with input B, the solution will also be close to optimal for input A. For some problems, sketches can also be used to get faster ways to find high-precision solutions to the original problem. In other cases, sketches can be used to summarize the data by identifying the most important rows or columns.
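To make the idea concrete, here is a minimal NumPy sketch of "sketch and solve" for least-squares regression. It is an illustration under stated assumptions, not this library's API: the function name `sketched_least_squares`, the problem sizes, and the choice of a dense Gaussian sketching matrix S (so that B = SA) are all picked for the example.

```python
import numpy as np

def sketched_least_squares(A, b, sketch_rows, rng):
    """Approximately solve min_x ||A x - b|| via the smaller problem min_x ||S A x - S b||.

    Hypothetical helper for illustration; S is a dense Gaussian sketching matrix.
    """
    n = A.shape[0]
    # Scale so that E[S^T S] = I, which preserves norms in expectation.
    S = rng.standard_normal((sketch_rows, n)) / np.sqrt(sketch_rows)
    B = S @ A          # the sketch: sketch_rows x d instead of n x d
    c = S @ b
    x_sketch, *_ = np.linalg.lstsq(B, c, rcond=None)
    return x_sketch

rng = np.random.default_rng(0)
n, d = 2000, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Exact solution on the full data, for comparison.
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
x_sketch = sketched_least_squares(A, b, sketch_rows=200, rng=rng)

# Evaluate both solutions on the ORIGINAL problem.
res_exact = np.linalg.norm(A @ x_exact - b)
res_sketch = np.linalg.norm(A @ x_sketch - b)
ratio = res_sketch / res_exact
print(f"residual ratio (sketched / exact): {ratio:.3f}")
```

The ratio can never drop below 1, because the exact solution minimizes the residual; a good sketch keeps it close to 1 while solving a problem with ten times fewer rows. A dense Gaussian sketch is used here only for simplicity, since forming SA costs as much as solving the original problem; faster transforms are what make sketching pay off in practice.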
A simple example of sketching is just sampling the rows (and/or columns) of the matrix, where each row (and/or column) is equally likely to be sampled. This uniform sampling is quick and easy, but doesn't always yield good sketches; however, there are sophisticated sampling methods that do yield good sketches.
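The following NumPy sketch contrasts uniform row sampling with one well-known non-uniform scheme, leverage-score sampling. The text above does not say which sampling methods it has in mind, so leverage scores are an assumption chosen for illustration; the helper names and the test matrix are likewise hypothetical. The matrix is built so that a single row carries all the information about the last column, which uniform sampling usually misses.

```python
import numpy as np

def leverage_scores(A):
    """Row leverage scores: squared row norms of an orthonormal basis for range(A)."""
    Q, _ = np.linalg.qr(A)           # reduced QR, Q is n x d
    return np.sum(Q**2, axis=1)

def sample_rows(A, probs, m, rng):
    """Sample m rows with replacement and rescale so E[B^T B] = A^T A."""
    idx = rng.choice(A.shape[0], size=m, replace=True, p=probs)
    scale = 1.0 / np.sqrt(m * probs[idx])
    return A[idx] * scale[:, None], idx

rng = np.random.default_rng(0)
n, d, m = 1000, 5, 100
A = rng.standard_normal((n, d))
A[:, -1] = 0.0
A[0, -1] = 100.0                     # only row 0 is nonzero in the last column

lev = leverage_scores(A)
uniform_probs = np.full(n, 1.0 / n)
lev_probs = lev / lev.sum()

B_unif, idx_unif = sample_rows(A, uniform_probs, m, rng)
B_lev, idx_lev = sample_rows(A, lev_probs, m, rng)

print("uniform sample contains row 0: ", 0 in idx_unif)
print("leverage sample contains row 0:", 0 in idx_lev)
print("rank of uniform sketch: ", np.linalg.matrix_rank(B_unif))
print("rank of leverage sketch:", np.linalg.matrix_rank(B_lev))
```

Row 0 has leverage score 1 out of a total of d = 5, so leverage-score sampling picks it with probability 1/5 per draw, while uniform sampling picks it with probability 1/1000. Computing exact leverage scores through a QR factorization is as expensive as solving the original problem, so this is only a demonstration of why non-uniform sampling helps, not a fast algorithm.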
The goal of this project is to build a sketching-based open-source software stack for numerical linear algebra (NLA) and its applications. The layers, from top to bottom, are as shown:
Applications: Matrix Completion | Nonlinear RLS, SVM, PCA | Robust Regression | Other applications
Python: Python-based data analytics scripting layer
PythonBinding: C++ to Python bindings
NLA: Numerical Linear Algebra primitives (least squares regression, low-rank approximation, randomized estimators)
Sketch: Sketching kernels (JL, FJL, Gaussian, Sign, Sparse Embedding)
Third-Party Libraries: MPI, Elemental, BLAS, CombBLAS, FFTW, Boost