Contact Information

David P. Woodruff
Research Staff Member
Almaden Research Center, San Jose, CA, USA

Tab navigation

My current interests are communication complexity, data stream algorithms and lower bounds, graph algorithms, machine learning, numerical linear algebra, sketching, and sparse recovery.

Contact: dpwoodru (at) us (dot) ibm (dot) com  

Copyright: Persons copying the material below should adhere to the terms of each author's copyright.

  • Sketching as a Tool for Numerical Linear Algebra, Foundations and Trends in Theoretical Computer Science, vol 10, issue 1-2, pp. 1-157, 2014. You can download a free copy (for personal use only) here .

    v3. fixed typos/minor errors in section 4.3, in particular in Fact 6 and its propagation, clarified when Lemma 4.2 can be applied, typos in section 4.2.3 (G should be applied on the left), other minor typos throughout

    v2. Some errors in the proof of Lemma 4.2 were fixed, as well as some typos. Thanks for feedback!

Summer School 2016 Course Slides (Sketching as a Tool for Numerical Linear Algebra)

Below are slides for all lectures on days 1-3 and 3/4 of day 4


The other slides for day 4 are regressionM and lowRankM and weighted


Lecture Notes (MADALGO and BASICS)

Here are three lectures, slight variants of which were given at the MADALGO summer school on streaming 2015 as well as the BASICS summer school on communication complexity 2015. The first lecture is an introduction to information theory for data streams, the second contains direct sum theorems for data streams, and the third covers multiplayer communication complexity.

Lecture 1

Lecture 2

Lecture 3





  • SODA, Low-Rank PSD Approximation in Input-Sparsity Time (to appear)
    (with Ken Clarkson)
  • SODA, Adaptive Matrix Vector Product (to appear)
    (with Santosh Vempala)



  • Scientific Reports, True Randomness from Big Data
    (with Periklis Papakonstantinou and Guang Yang)
    Full version here
  • NIPS, Communication-Optimal Distributed Clustering (to appear)
    (with Jiecao Chen, He Sun, and Qin Zhang)
  • NIPS, Sublinear Time Orthogonal Tensor Decomposition (to appear)
    (with Zhao Song and Huan Zhang)
  • SPAA, Brief Announcement: Applications of Uniform Sampling: Densest Subgraph and Beyond pdf
    (with Hossein Esfandiari and Mohammad Taghi Hajiaghayi)
    Full version on arXiv
  • RANDOM, Tight Bounds for Sketching the Operator Norm, Schatten Norms, and Subspace Embeddings pdf
    (with Yi Li)
  • ESA, Stochastic Streams: Sample Complexity vs. Space Complexity pdf
    (with Michael Crouch, Andrew McGregor, and Gregory Valiant)
  • KDD, Communication Efficient Distributed Kernel Principal Component Analysis pdf
    (with Maria-Florina Balcan, Yingyu Liang, Le Song, and Bo Xie)
    Full version on arXiv
  • ICML, How to Fake Multiply by a Gaussian Matrix pdf
    (with Michael Kapralov and Vamsi k. Potluru)
  • ICALP, Optimal Approximate Matrix Product in Terms of Stable Rank pdf
    (with Michael B. Cohen and Jelani Nelson)
    Full version on arXiv
  • ICDT (invited paper), New Algorithms for Heavy Hitters in Data Streams
    Full version on arXiv
  • PODS, An Optimal Algorithm for l1-Heavy Hitters in Insertion Streams and Related Problems pdf
    Full version on arXiv
    (with Arnab Bhattacharyya and Palash Dey)
  • PODS, Streaming Space Complexity of Nearly All Functions of One Variable on Frequency Vectors pdf
    Full version on arXiv
    (with Vladimir Braverman, Stephen R. Chestnut, and Lin F. Yang)
  • CCC, New Characterizations in Turnstile Streams with Applications pdf
    (with Yuqing Ai, Wei Hu, and Yi Li)
  • STOC, Beating CountSketch for Heavy Hitters in Insertion Streams pdf 
    (with Vladimir Braverman, Stephen R. Chestnut, and Nikita Ivkin)
    Full version on arXiv
  • STOC, On Approximating Functions of the Singular Values in a Stream pdf
    (with Yi Li)
    Full version on arXiv
  • STOC, Communication Lower Bounds for Statistical Estimation Problems via a Distributed Data Processing Inequality pdf
    Full version on arXiv
    (with Mark Braverman, Ankit Garg, Tengyu Ma, and Huy L. Nguyen)
  • STOC, Optimal Principal Component Analysis in Distributed and Streaming Models pdf
    Full version on arXiv
    (with Christos Boutsidis and Peilin Zhong)
  • STOC, Weighted Low Rank Approximations with Provable Guarantees pdf
    (with Ilya Razenshteyn and Zhao Song)
  • ICDE, Distributed Low Rank Approximation of Implicit Functions of a Matrix pdf
    (with Peilin Zhong)
  • SODA, Nearly-optimal bounds for sparse recovery in generic norms, with applications to k-median sketching pdf
    (with Arturs Backurs, Piotr Indyk and Ilya Razenshteyn)
  • ITCS, On Sketching Quadratic Forms pdf
    Merging of full versions of papers arXiv and arXiv
    (with Alexandr Andoni, Jiecao Chen, Bo Qin, Robert Krauthgamer, Qin Zhang)



  • FOCS, Input Sparsity and Hardness for Robust Subspace Approximation pdf
    (with Ken Clarkson)
  • APPROX, Tight Bounds for Graph Problems in Insertion Streams pdf
    (with Xiaoming Sun)
  • ICALP, The Simultaneous Communication of Disjointness with Applications to Data Streams pdf
    (with Omri Weinstein)
  • ICALP, Amplification of One-Way Information Complexity via Codes and Noise Sensitivity
    Full version on ECCC
    (with Marco Molinaro and Grigory Yaroslavtsev)
  • PODS, The Communication Complexity of Distributed Set-Joins with Applications to Matrix Multiplication pdf
    (with Dirk Van Gucht, Ryan Williams, and Qin Zhang)
  • SODA, Sketching for M-Estimators: A Unified Approach to Robust Regression pdf
    (with Ken Clarkson)


  • Bulletin of EATCS, Data Streams and Applications pdf
  • NIPS, Subspace Embeddings for the Polynomial Kernel pdf
    (with Haim Avron and Huy L. Nguyen)
  • NIPS, Low Rank Approximation Lower Bounds in Row-Update Streams pdf
  • NIPS, Improved Distributed Principal Component Analysis pdf
    (with Nina Balcan, Vandana Kanchanapally, and Yingyu Liang)
  • DISC, On the Communication Complexity of Linear Algebraic Problems in the Message Passing Model
    Full version on arXiv
    (with Yi Li, Xiaoming Sun, and Chengu Wang)
  • RANDOM, Certifying Equality with Limited Interaction pdf
    (with Joshua Brody, Amit Chakrabarti, Ranganath Kondapally, and Grigory Yaroslavtsev)
  • KDD, Improved Testing of Low Rank Matrices pdf
    Invited to the special issue for KDD, 2014 in Transactions on Knowledge Discovery from Data
    (with Yi Li and Zhengyu Wang)
  • COLT, Principal Component Analysis and Higher Correlations for Distributed Data
    Full version: pdf
    (with Ravi Kannan and Santosh Vempala)
  • PODC, Spanners and Sparsifiers for Dynamic Streams pdf
    Invited to the special issue for PODC, 2014 in Distributed Computing
    (with Michael Kapralov)
  • PODC, Beyond Set Disjointness: The Communication Complexity of Finding the Intersection pdf
    (with Joshua Brody, Amit Chakrabarti, Ranganath Kondapally, and Grigory Yaroslavtsev)
  • PODS, Is Min-Wise Hashing Optimal for Summarizing Set Intersection? pdf
    (with Rasmus Pagh and Morten Stockel)
  • STOC, Optimal CUR Matrix Decompositions
    Full version on arXiv
    (wtih Christos Boutsidis)
  • STOC, Turnstile Streaming Algorithms Might as Well Be Linear Sketches pdf
    (with Yi Li and Huy L. Nguyen)
  • SODA, On Sketching Matrix Norms and the Top Singular Vector
    (with Yi Li and Huy L. Nguyen)
  • SODA, An Optimal Lower Bound for Distinct Elements in the Message Passing Model
    (with Qin Zhang)
  • VLDB, Multi-Tuple Deletion Propagation: Approximations and Complexity (in PVLDB 2013 proceedings, presented in VLDB 2014)
    (with Benny Kimelfeld and Jan Vondrak)


  • NIPS, Sketching Structured Matrices for Faster Nonlinear Regression
    (with Haim Avron and Vikas Sindhwani)
  • DISC, When Distributed Computation is Communication Expensive
    Full version on arXiv
    Invited to the special issue for DISC, 2013, in Distributed Computing.
    (with Qin Zhang)
  • RANDOM, A Tight Lower Bound for High Frequency Moment Estimation with Small Error (draft of full version)
    (with Yi Li)
  • COLT, Subspace Embeddings and L_p Regression Using Exponential Random Variables
    Full version on arXiv
    (with Qin Zhang)
  • STOC, Low Rank Approximation and Regression in Input Sparsity Time
    Full version on arXiv
    Co-Winner of STOC Best Paper Award
    Pat Goldberg Best Paper Award, 2013
    Invited to the Journal of the ACM
    (with Ken Clarkson)
  • STOC, How Robust Are Linear Sketches to Adaptive Inputs?
    Full version on arXiv
    (with Moritz Hardt)
  • SODA, Beating the Direct Sum Theorem in Communication Complexity with Implications for Sketching pdf
    (with Marco Molinaro and Grigory Yaroslavtsev)
  • SODA, Lower Bounds for Adaptive Sparse Recovery pdf
    Full version on arXiv
    (with Eric Price)
  • SODA, Faster Robust Linear Regression pdf
    Full version on arXiv
    (with K.L. Clarkson, P. Drineas, M. Magdon-Ismail, M.W. Mahoney, and X. Meng)


  • RANDOM, On Deterministic Sketching and Streaming for Sparse Recovery and Norm Estimation pdf
    (with Jelani Nelson and Huy L. Nguyen)
  • ICML, Fast approximation of matrix coherence and statistical leverage. pdf . Full version on arXiv
    (with Petros Drineas, Malik Magdon-Ismail, and Michael Mahoney)
  • SSP, Reusable Low-Error Compressive Sampling Schemes Through Privacy pdf
    (with Anna Gilbert, Brett Hemenway, Martin Strauss, and Mary Wootters)
  • ISIT, Applications of the Shannon-Hartley Theorem to Data Streams and Sparse Recovery pdf
    (with Eric Price)
  • PODS, Rectangle-Efficient Aggregation in Spatial Data Streams pdf
    (with Srikanta Tirthapura)
    Talk: ppt
  • PODS, Space-Efficient Estimation of Statistics over Sub-Sampled Streams pdf
    (with Andrew McGregor, A. Pavan, and Srikanta Tirthapura)
    Talk: pptx
  • STOC, Tight Bounds for Distributed Functional Monitoring pdf . Earlier full version on arXiv
    (with Qin Zhang)
    Talk: .ppt
    More recent talk: .ppt
  • ICDE, A General Method for Estimating Correlated Aggregates over a Data Stream pdf
    (with Srikanta Tirthapura)


  • FOCS, (1+eps)-Approximate Sparse Recovery pdf
    (with Eric Price)
    Talk: .ppt
  • FOCS, On the Power of Adaptivity in Sparse Recovery pdf
    (with Piotr Indyk and Eric Price)
  • DISC, Optimal Random Sampling from Distributed Streams Revisited pdf
    (with Srikanta Tirthapura)
  • ESA, Tolerant Algorithms pdf
    (with Rolf Klein, Rainer Penninger, and Christian Sohler)
  • RANDOM, Streaming Algorithms with One-Sided Estimation pdf
    (with Joshua Brody)
  • ICALP, Linear Programming and Combinatorial Bounds on Steiner Transitive-Closure Spanners pdf . Full version on arXiv
    (with Piotr Berman, Arnab Bhattacharyya, Elena Grigorescu, Sofya Raskhodnikova, and Grigory Yaroslavtsev)
  • STOC, Near-Optimal Private Approximation Protocols via a Black Box Transformation pdf
    Talk: .ppt
  • STOC, Subspace Embeddings for the L_1-norm with Applications pdf
    (with Christian Sohler)
    Talk: .ppt
  • STOC, Fast Moment Estimation in Data Streams in Optimal Space pdf , Full version on ArXiv
    (with Daniel M. Kane, Jelani Nelson, and Ely Porat)
    Talk: .ppt
  • ICS, The Complexity of Linear Dependence Problems in Vector Spaces pdf
    (with Arnab Bhattacharyya, Piotr Indyk, and Ning Xie)
    Talk: .ppt
  • SODA, Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Low Error pdf
    Full version in the special issue for Soda, 2011 in Transactions on Algorithms pdf
    (with T.S. Jayram)
    Talk: .ppt


  • FOCS, Sublinear Optimization for Machine Learning. Short version: pdf , Full version on ArXiv
    Full version in Journal of the ACM, 2012.
    Pat Goldberg Best Paper Award, 2012
    (with Ken Clarkson and Elad Hazan)
    Talk: .ppt
  • RANDOM, A Quadratic Lower Bound for Three-Query Linear Locally Decodable Codes over Any Field pdf
    Talk: .ppt
  • RANDOM, Lower Bounds for Local Monotonicity Reconstruction from Transitive-Closure Spanners Conference version: pdf
    Draft of Full version: pdf
    (with Arnab Bhattacharyya, Elena Grigorescu, Madhav Jha, Kyomin Jung, and Sofya Raskhodnikova)
  • ICALP, Additive Spanners in Nearly Quadratic Time pdf
    Talk: .ppt
  • PODS, An Optimal Algorithm for the Distinct Elements Problem pdf
    PODS Best Paper Award, 2010
    Pat Goldberg Best Paper Award, 2010
    Invited to Journal of the ACM, preliminary full version: pdf
    (with Daniel M. Kane and Jelani Nelson)
    Long Talk: .ppt Short Talk: .ppt
  • PODS, Fast Manhattan Sketches in Data Streams pdf
    (with Jelani Nelson)
  • SODA, Coresets and Sketches for High Dimensional Subspace Approximation Problems pdf
    (with Dan Feldman, Morteza Monemizadeh, and Christian Sohler)
  • SODA, Lower Bounds for Sparse Recovery pdf
    (with Khanh Do Ba, Piotr Indyk, and Eric Price)
  • SODA, On the Exact Space Complexity of Sketching and Streaming Small Norms pdf
    (with Daniel M. Kane and Jelani Nelson)
  • SODA, 1-pass Relative-Error L_p-Sampling with Applications pdf
    (with Morteza Monemizadeh)
    Talk: .ppt


  • FOCS, The Data Stream Space Complexity of Cascaded Norms pdf
    (with T.S. Jayram)
    Talk: .ppt
  • FOCS, Efficient Sketches for Earth-Mover Distances, with Applications pdf
    (with Alex Andoni, Khanh Do Ba, and Piotr Indyk)
    Talk: .ppt
  • STOC, Numerical Linear Algebra in the Streaming Model (full version) pdf
    (with Ken Clarkson)
    Talk: .ppt Longer .ppt
  • ICDT, The Average Case Complexity of Counting Distinct Elements pdf
    Talk .ppt
  • SODA, Transitive-Closure Spanners. Full version on ArXiv
    (with Arnab Bhattacharyya, Elena Grigorescu, Kyomin Jung, and Sofya Raskhodnikova)
    Talk: .ppt


  • PODS, Epistemic Privacy ps, pdf ,
    Full version in the special issue for PODS, 2010 in Journal of the ACM: pdf
    (with Alexandre Evfimievski and Ron Fagin)
  • RANDOM, Corruption and Recovery-Efficient Locally Decodable Codes (extended abstract) pdf
    Talk: .ppt


  • Ph.D. Thesis, Efficient and Private Distance Approximation in the Communication and Streaming Models pdf
    This thesis contains results from the papers "Optimal Approximations of the Frequency Moments", "Private Polylogarithmic Approximations and Efficient Matching", "Optimal Space Lower Bounds for all Frequency Moments", and "Tight Lower Bounds for the Distinct Elements Problem". It serves as a better and more thorough exposition of these papers.
  • New Lower Bounds for General Locally Decodable Codes ECCC link
  • Invited article in the Encyclopedia of Database Systems, Frequency Moments pdf
  • Eurocrypt, Revisiting the Efficiency of Malicious Two-Party Computation eprint link
    Talk: .ppt
  • SODA, The Communication and Streaming Complexity of Computing the Longest Common and Increasing Subsequences ps, pdf
    (with Xiaoming Sun)
    Talk: .ppt


  • FOCS, Lower Bounds for Additive Spanners, Emulators, and More ps, pdf
    Long talk: .ppt , FOCS talk: ppt
  • FOCS, Explicit Exclusive Set Systems with Applications to Broadcast Encryption ps, pdf
    (with Craig Gentry, Zulfikar Ramzan)
    Long Talk: .ppt Crypto rump session Talk: .ppt , FOCS talk: ppt
  • Crypto, Fast Algorithms for the Free Riders Problem in Broadcast Encryption ps, pdf Full version on eprint eprint
    (with Zulfikar Ramzan)
    Talk: .ppt
  • APPROX, Better Approximations for the Minimum Common Integer Partition Problem ps, pdf
    Talk: .ppt
  • TCC, Private Polylogarithmic Approximations and Efficient Matching. Also, ECCC ps, pdf Conference version (with some edits) ps, pdf
    (with Piotr Indyk)
    Talk: long .ppt and short .ppt
    Also see my Ph.D. thesis.


  • STOC, Optimal Approximations of the Frequency Moments of Data Streams (some typos fixed) ps, pdf
    (with Piotr Indyk)
    Talk: .ppt
    Also see my Ph.D. thesis.
  • Eurocrypt, Practical Cryptography in High Dimensional Tori. Available: eprint
    (with Marten van Dijk, Robert Granger, Dan Page, Karl Rubin, Alice Silverberg, Martijn Stam)
    Talk: .ppt
  • CCC, to appear in SICOMP, A Geometric Approach to Information-theoretic Private Information Retrieval. Also, ECCC ps, pdf
    Conference version: pdf
    (with Sergey Yekhanin)


  • SODA, Optimal Space Lower Bounds for all Frequency Moments, revised ps, conf pdf
    Talk: short .ppt and long .ppt
    Also see my Ph.D. thesis.
  • PODS, Clustering via Matrix Powering, ps, pdf
    Also, a note clarifying the algorithm txt
    Thanks to Ding Bolin for pointing this out.
    (with Hanson Zhou)
  • Crypto, Asymptotically Optimal Communication for Torus-Based Cryptography, ps, pdf
    (with Marten van Dijk)
    Talk: .ppt
  • CCS, Private Inference Control. Conference version: ps, Long version available: IACR eprint
    (with Jessica Staddon)
    Talk: Dimacs/Portia Privacy-Preserving workshop: .ppt, conf: .ppt


  • FOCS, Tight Lower Bounds for the Distinct Elements Problem, revised ps, conf pdf
    (with Piotr Indyk)
    Talk: (Also DIMACS Embeddings workshop) .ppt
    Also see my Ph.D. thesis.


  • Eurocrypt, Cryptography in an Unbounded Computational Model, ps, pdf
    (with Marten van Dijk)
    Talk: .pdf



Miscellaneous Papers

  • Master's thesis (based on Eurocrypt 2002) ps, pdf
  • Survey on stream-computing F_0 ps, pdf
    (with Johnny Chen, Naveen Sunkavalley)