Wei Tan


Research Staff Member - GPU, Spark, NoSQL, services computing.
Thomas J. Watson Research Center, Yorktown Heights, NY USA


Professional Associations: ACM | IEEE Member


I currently work on big data and distributed systems. Specifically, I accelerate large-scale machine learning algorithms using scale-out (e.g., Spark) and scale-up (e.g., GPU) approaches. I also work on NoSQL and services computing.

My work and code have been incorporated into IBM's patent portfolio and into software products such as BigInsights and Cognos. I am an adjunct professor in the Department of Automation, Tsinghua University, China, and an associate editor of IEEE Transactions on Automation Science and Engineering.

What's New.

Tutorial: "Large-Scale Matrix Factorization," with Prof. Fei Wang, at IEEE BigData 2016. [pdf] [bibliography]

cuMF_SGD, the newest member of the cuMF family! It is a stochastic gradient descent (SGD) version that complements the previously released ALS version, and it outperforms all previous approaches using a single GPU! [arXiv]
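
For readers new to SGD-based matrix factorization, here is a minimal CPU sketch of the general technique in pure Python. It is not cuMF_SGD's CUDA implementation. It approximates each observed rating r_ui by the dot product of a user factor p_u and an item factor q_i. Visiting the observed entries in random order, it takes one gradient step per entry on the squared error plus an L2 penalty. The function names `sgd_mf` and `rmse`, the list-of-triples input format, and the hyperparameters (`rank`, `lr`, `reg`, `epochs`) are hypothetical choices made for illustration.

```python
import math
import random


def sgd_mf(ratings, n_users, n_items, rank=2, lr=0.02, reg=0.01, epochs=2000, seed=0):
    """Hypothetical sketch: factorize R ~= P Q^T with plain SGD.

    ratings: list of (user, item, value) triples (observed entries only).
    Returns P (n_users x rank) and Q (n_items x rank) as nested lists.
    """
    rng = random.Random(seed)
    # Small positive random initialization of both factor matrices.
    P = [[rng.uniform(0, 0.5) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.5) for _ in range(rank)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)  # visit observed entries in a new random order each epoch
        for u, i, r in data:
            err = r - sum(P[u][f] * Q[i][f] for f in range(rank))
            for f in range(rank):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on (r - p_u . q_i)^2 + reg * (|p_u|^2 + |q_i|^2).
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings, P, Q):
    """Root-mean-square error of the factor model on the given triples."""
    se = 0.0
    for u, i, r in ratings:
        pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
        se += (r - pred) ** 2
    return math.sqrt(se / len(ratings))
```

Each update reads and writes only one row of P and one row of Q. In general, this locality is what makes SGD amenable to parallel updates; how cuMF_SGD actually schedules updates on the GPU is described in its paper, not in this sketch.
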
cuMF is a CUDA-based matrix factorization (MF) library that optimizes the alternating least squares (ALS) method to solve very large-scale MF problems.
cuMF maximizes performance on single and multiple GPUs. It can be used in recommender systems, embedding layers in deep neural networks, and topic modeling.

Using a single machine with four NVIDIA GPUs, cuMF can be 6-10 times as fast, and 33-100 times as cost-efficient, as state-of-the-art distributed CPU solutions. Moreover, cuMF can solve the largest matrix factorization problem reported in the literature to date.

[HPDC 16 Paper] [PPT] [video] [GitHub] [IBM packages for Apache Spark version 2]
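
ALS alternates between two convex least-squares problems. Write the model as R ≈ X Θ^T, with X holding user factors and Θ holding item factors. With Θ fixed, each user row is updated as x_u = (Θ_u^T Θ_u + λI)^(-1) Θ_u^T r_u, where Θ_u stacks the factors of the items user u rated and r_u holds those ratings. Items are then updated symmetrically with X fixed. The sketch below is a small pure-Python CPU illustration of this textbook update, not cuMF's GPU code. It makes several assumptions: a plain λ penalty (weighted variants exist), a hand-written Gaussian-elimination solver in place of a linear-algebra library, and hypothetical names (`als_mf`, `update_side`, `solve`, `rmse`).

```python
import math
import random


def solve(A, b):
    """Solve A x = b for a small dense matrix by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def update_side(fixed, groups, n_rows, rank, lam):
    """Hypothetical helper: re-solve every row of one factor matrix with `fixed` held constant.

    groups[row] lists the (col, value) pairs observed for that row.
    Each row solves (F^T F + lam * I) x = F^T r, which is symmetric positive definite for lam > 0.
    """
    out = []
    for row in range(n_rows):
        A = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for col, val in groups[row]:
            v = fixed[col]
            for i in range(rank):
                b[i] += v[i] * val
                for j in range(rank):
                    A[i][j] += v[i] * v[j]
        out.append(solve(A, b))
    return out


def als_mf(ratings, n_users, n_items, rank=2, lam=0.1, iters=50, seed=0):
    """Hypothetical sketch: factorize R ~= X Theta^T by alternating least squares."""
    rng = random.Random(seed)
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    X = [[rng.uniform(0, 1) for _ in range(rank)] for _ in range(n_users)]
    Theta = [[rng.uniform(0, 1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(iters):
        X = update_side(Theta, by_user, n_users, rank, lam)  # fix items, solve users
        Theta = update_side(X, by_item, n_items, rank, lam)  # fix users, solve items
    return X, Theta


def rmse(ratings, X, Theta):
    """Root-mean-square error of the factor model on the given triples."""
    se = sum((r - sum(p * q for p, q in zip(X[u], Theta[i]))) ** 2 for u, i, r in ratings)
    return math.sqrt(se / len(ratings))
```

Each row's small rank × rank system is independent of the others, so all user (or item) updates in one half-iteration can run in parallel. Forming and solving many such systems is the work a GPU ALS library has to make fast.
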

Brief Bio.

From 2008 to 2010, I worked at the Computation Institute of the University of Chicago and Argonne National Laboratory on the caGrid Workflow Toolkit, a web-service-based scientific workflow platform for the cancer Biomedical Informatics Grid (caBIG). The toolkit was funded by the US National Cancer Institute and adopted by many major US bioinformatics projects.

My awards include the Outstanding Technology Accomplishment Award from IBM (2014), the Best Student Paper Award at CCGrid (2015), the Best Student Paper Award at IEEE ICWS (2014), the Best Paper Award at IEEE SCC (2011), the Pacesetter Award from Argonne National Laboratory (2010), and the caBIG Teamwork Award from the NIH (2008). I received my Ph.D. in Control Science and Engineering from Tsinghua University, China.

Research Streams.

GPU: cuMF (HPDC 16, NIPS 15 workshop).

Big Data: HBase Index (EDBT 14, CCGrid 15), NoSQL (ICWS 14 tutorial).

Web service: CACM 16, IEEE T-ASE 14, IEEE Computer 09.

Distributed and cloud computing: IEEE T-ASE 13, IEEE T-ASE 12.