Yuanyuan is currently a Research Staff Member at IBM Almaden Research Center. She received her PhD (2008) and MS (2005) degrees in Computer Science & Engineering, both from the University of Michigan, and her BS degree in Computer Science & Technology with honors from Peking University in 2003.
SQL for Big Data: Yuanyuan's work in this area includes SQL-on-Hadoop, Hybrid Warehouse (integration of Hadoop with enterprise data warehouses), and HTAP (Hybrid Transactional and Analytical Processing) for Big Data. She has collaborated closely with the IBM Software Group: her work on SQL-on-Hadoop was tied to the IBM Db2 Big SQL product, and the Wildfire HTAP system she co-developed has been released as the IBM Db2 Event Store product.
Selected Papers: HTAP for Big Data (BigData'19, SIGMOD'19, EDBT'19, CIDR'17, SIGMOD'16 Demo), Hybrid Warehouses (TODS'16, EDBT'15), CoHadoop (PVLDB'11), Hadoop Joins (SIGMOD'10)
Graph Analytics: Yuanyuan has a long-standing interest in graph analytics. She has written two books on large-scale graph processing. Her current research includes building distributed graph-processing systems, designing distributed graph algorithms, and social network analysis. Her PhD thesis was on querying graph databases.
Graph Processing/Databases Papers: IBM Db2 Graph (VLDB'19), Dynamic Graph Analysis (ICDE'15), Giraph++ (PVLDB'13), Graph Summarization (CIKM'14, ICDE'10, SIGMOD'08), Graph Matching (ICDE'08, Bioinformatics'07)
Social Network Analysis Papers: Topic-Specific Influence Analysis (WSDM'14), Event-Based Social Network (SIGKDD'12).
System Support for Machine Learning: Yuanyuan is a co-inventor and a lead developer of SystemML, a large-scale machine learning system that is now a top-level Apache open source project. Recently, she has worked on integrating big SQL and big ML systems, and on designing novel distributed time-biased sampling algorithms for online ML model management.
Selected Papers: Time-biased Sampling for Online Model Management (TODS'19, SIGMOD Record'19, EDBT'18), Integration of SQL and ML (EDBT'15), SystemML on YARN (SIGMOD'15), SystemML Optimizer (IEEE DE Bulletin'14), ParFor in SystemML (PVLDB'14), Numerical Stability in SystemML (ICDE'12), SystemML Architecture (ICDE'11).
2019 IBM A-Level Accomplishment for contribution to IBM Db2 Event Store, IBM Research
2019 Invention Achievement Award, IBM
2019 VLDB 2019 Distinguished Reviewer Award, VLDB 2019
2019 SIGMOD 2019 Research Highlight Award, "Online Model Management via Temporally Biased Sampling", SIGMOD 2019
2019 Research Division Award for the work in declarative machine/deep learning, IBM
2019 Outstanding Technical Achievement Award for the work in large-scale graph analytics and infrastructure, IBM
2018 EDBT Best Paper Award, "Temporally-Biased Sampling for Online Model Management", EDBT 2018
2018 IBM A-Level Accomplishment for the work in large-scale graph analytics and infrastructure, IBM Research
2018 IBM A-Level Accomplishment for the work in declarative machine/deep learning (SystemML), IBM Research
2016 Outstanding Technical Achievement Award for the work in join algorithms for big data, IBM
2016 Eminence & Excellence Award, IBM Research
2015 IBM A-Level Accomplishment for the work in join algorithms for big data, IBM Research
2015 IBM A-Level Accomplishment for the contributions to the SystemML project, IBM Research
2013 High Value Patent Application Award, IBM Research
2012 Eminence & Excellence Award, IBM Research
2011 Eminence & Excellence Award, IBM Research
2008 Distinguished Achievement Award, University of Michigan
2007 2nd Place, CSE Honor Competition, University of Michigan
2007 Rackham Predoctoral Fellowship, University of Michigan
2003 Rackham Graduate Fellowship, University of Michigan
Editor: Associate Editor for VLDB Journal (since 2019), Associate Editor for PVLDB Vol. 11 (VLDB 2018), Section Editor for Encyclopedia of Big Data Technologies.
Workshop Chair: 3rd Workshop on Large Scale Network Analysis (LSNA 2014), 5th Workshop on Graph Data Management (GDM 2014), 2nd Workshop on Large Scale Network Analysis (LSNA 2013), 4th Workshop on Graph Data Management (GDM 2013), 1st Workshop on Large Scale Network Analysis (LSNA 2012)
- NSF Advisory Panel, 2013 & 2016.
- NSF Career Mentoring Panel, ICDE 2012.
My two cents on "How to Be Competitive for Industrial Research Jobs", presented at this career panel.
PC Member: SIGMOD 2020, VLDB 2019, SIGMOD 2018, VLDB 2017, VLDB 2016 Industrial Track, TKDE 2016 Poster Track, VLDB 2015, ICDE 2014, WISE 2013, SIGMOD 2012, GDM 2012, VLDB 2011 Industrial Track, DBSocial 2011, GDM 2011, ICDE 2011, GDM 2010, VLDB 2009.
Reviewer for Journals: VLDB Journal (2014, 2017), TODS (2013, 2015), Statistical Analysis and Data Mining (2009), Information Systems (2010, 2011, 2013), ACM Transactions on Intelligent Systems and Technology (2010), Distributed and Parallel Databases (2012).
Reviewer for Books: Data Processing Techniques in the Era of Big Data.
Reviewer for Research Grants: Research Grants Council (RGC) of Hong Kong (2010, 2011).
Reviewer for Awards: The NCWIT Award for Aspirations in Computing.
Hybrid Transactional/Analytical Processing (Tutorial), [YouTube Video], SIGMOD 2017, May 2017.
Big Graph Analytics Platforms (Tutorial), [Slides], SIGMOD 2016, June 2016.
Giraph++: From "Think Like a Vertex" to "Think Like a Graph", Facebook, Nov 2013.
Large Scale Topic-specific Influence Analysis on Microblogs, UC Santa Barbara, May 2013.
Large Scale Topic-specific Influence Analysis on Microblogs, UC Santa Cruz, May 2013.
SystemML: Large Scale Machine Learning on MapReduce, Peking University, Beijing, China, Aug 2012.
SystemML: Large Scale Machine Learning on MapReduce, IBM China Research Lab, Beijing, China, Aug 2012.
SystemML: Large Scale Machine Learning on MapReduce, University of Maryland, College Park, Maryland, Apr 2012.