Qi He

contact information

Research Staff Member
D3-246, Almaden Research Center, San Jose, CA 95120, USA


Professional Associations

ACM SIGIR  |  ACM SIGKDD  |  W3C - World Wide Web Consortium

On this page, I maintain large-scale graph processing resources for data mining research.

Open-source large-scale graph processing systems

Apache Incubator Giraph
- Scope:

Giraph is an efficient offline graph processing system for large graphs, whether directed or undirected, weighted or unweighted. One distinctive feature is its support for multigraphs, that is, graphs that allow multiple heterogeneous edges between two vertices. As an alternative to MapReduce, it keeps the entire graph in memory, which makes it more efficient for iterative graph algorithms. A vertex is parameterized by four types: I - Vertex ID, V - Vertex data, E - Edge data, M - Message data.
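To make the four type parameters concrete, here is a minimal sketch of a vertex record in Python. It is an illustration only and not Giraph's actual Java API: the class name `Vertex`, its fields, and the `add_edge` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Generic, List, Tuple, TypeVar

I = TypeVar("I")  # vertex ID type
V = TypeVar("V")  # vertex data type
E = TypeVar("E")  # edge data type
M = TypeVar("M")  # message data type


@dataclass
class Vertex(Generic[I, V, E, M]):
    """Hypothetical in-memory vertex record (not Giraph's real class)."""

    vertex_id: I
    value: V
    # Out-edges stored as (target ID, edge data) pairs. A list, rather than
    # a dict keyed by target, lets two vertices share several edges, which
    # is what a multigraph needs.
    out_edges: List[Tuple[I, E]] = field(default_factory=list)
    # Messages delivered to this vertex, each of type M.
    inbox: List[M] = field(default_factory=list)

    def add_edge(self, target: I, data: E) -> None:
        """Append one out-edge; repeated targets are allowed."""
        self.out_edges.append((target, data))


# Example: string IDs, float vertex data, string edge data, float messages.
alice: Vertex[str, float, str, float] = Vertex("alice", 1.0)
alice.add_edge("bob", "friend")
alice.add_edge("bob", "coworker")  # a second, different edge to the same vertex
```

Storing edges as a list of pairs is one possible design choice; it trades fast lookup by target for the ability to hold parallel edges.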

By default, edges are directed: every edge in the input data is an out-edge extending from its vertex. For an undirected graph, you need to ensure that, in the input data, each of the two vertices appears in the other's out-edge list.
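The preprocessing step for undirected graphs can be sketched as follows. This is a hedged illustration in Python, not part of Giraph: the function name `make_undirected` and the dict-of-lists input layout are assumptions. It also skips exact duplicate edges, so it suits simple graphs rather than multigraphs.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, float]  # (target vertex ID, edge weight)


def make_undirected(out_edges: Dict[Hashable, List[Edge]]) -> Dict[Hashable, List[Edge]]:
    """Return out-edge lists in which every edge also appears reversed."""
    result: Dict[Hashable, List[Edge]] = defaultdict(list)
    for source, edges in out_edges.items():
        result[source]  # touch the key so isolated vertices are kept
        for target, weight in edges:
            # Keep the original direction, skipping exact duplicates.
            if (target, weight) not in result[source]:
                result[source].append((target, weight))
            # Add the reverse direction so each vertex lists the other.
            if (source, weight) not in result[target]:
                result[target].append((source, weight))
    return dict(result)


directed = {"a": [("b", 1.0)], "b": [("c", 2.0)], "c": []}
undirected = make_undirected(directed)
```

After this step, "b" lists both "a" and "c" as out-neighbors, and "c" lists "b", so a system that only understands out-edges sees a symmetric graph.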

- Interesting Slides:
  1. A preliminary introduction to Giraph given by Sebastian Schelter: Large Scale Graph Processing with Apache Giraph

PEGASUS
- Scope:

As an offline graph processing system, PEGASUS works on undirected graphs with node IDs and unweighted links between them. It is well suited to simple computations such as counting node degrees and computing PageRank scores.
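The two computations mentioned above can be sketched on a single machine as follows. This is not PEGASUS code: the function names, the edge-list input, and the default damping factor of 0.85 are assumptions, and the sketch assumes the edge list contains no duplicate edges.

```python
from collections import defaultdict


def node_degrees(edges):
    """Count the degree of each node in an undirected, unweighted edge list."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)


def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank on an undirected, unweighted graph."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    rank = {node: 1.0 / n for node in neighbors}
    for _ in range(iterations):
        # Every node starts each round with the teleportation share.
        new_rank = {node: (1.0 - damping) / n for node in neighbors}
        for node, adjacent in neighbors.items():
            # A node splits its damped rank evenly among its neighbors.
            share = damping * rank[node] / len(adjacent)
            for other in adjacent:
                new_rank[other] += share
        rank = new_rank
    return rank


# A small star graph: node 1 is linked to nodes 2, 3, and 4.
edges = [(1, 2), (1, 3), (1, 4)]
degrees = node_degrees(edges)
ranks = pagerank(edges)
```

On the star graph, the center node has the highest degree and also ends up with the highest PageRank score, while the scores sum to 1.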

Existing Graph Mining Systems

IBM Watson

Watson is an artificial intelligence computer system capable of answering questions posed in natural language. It won the quiz show Jeopardy! in 2011. The next generation of Watson moves to graph reasoning over hypotheses extracted from massive data. I maintain all IBM Watson related papers, with access restricted to inside IBM: download link
List of Papers:
  1. Cover Letter
  2. Content
  3. Introduction to "This is Watson"
  4. Question analysis: How Watson reads a clue
  5. Deep parsing in Watson
  6. Textual resource acquisition and engineering
  7. Automatic knowledge extraction from documents
  8. Finding needles in the haystack: Search and candidate generation
  9. Typing candidate answers using type coercion
  10. Textual evidence gathering and analysis
  11. Relation extraction and scoring in DeepQA
  12. Structured data and inference in DeepQA
  13. Special Questions and techniques
  14. Identifying implicit relationships
  15. Fact-based question decomposition in DeepQA
  16. A framework for merging and ranking of answers in DeepQA
  17. Making Watson fast
  18. Simulation, learning, and optimization techniques in Watson’s game strategies
  19. In the game: The interface between Watson and Jeopardy!

Facebook Graph Search
Query examples
  1. Searching people: “friends of friends who are single men in San Francisco and who are from India”.
  2. Searching photos: “photos of my friends taken in Paris”, “photos of my friends taken in national parks”, “photos I like”.
  3. Searching interests: “movies my friends like”, “TV shows my friends like”, “Videos by TV shows liked by my friends”, “TV shows liked by doctors”, “what kind of music people who like Mitt Romney or Barack Obama like”.
  4. Searching places: “bars in Dublin liked by people who live in Dublin”, “people who have been to Ireland”.
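Queries like these can be read as short traversals over a graph with typed edges. The sketch below shows that reading for “movies my friends like” on a toy graph. It makes no claim about how Facebook implements Graph Search: the `GRAPH` layout, the edge type names, the `follow` helper, and all data are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy graph: (source node, edge type) -> set of target nodes.
GRAPH: Dict[Tuple[str, str], Set[str]] = {
    ("me", "friend"): {"ann", "raj"},
    ("ann", "likes_movie"): {"Alien", "Up"},
    ("raj", "likes_movie"): {"Up"},
}


def follow(nodes: Set[str], edge_type: str) -> Set[str]:
    """Return every node reachable from `nodes` over one edge of `edge_type`."""
    reached: Set[str] = set()
    for node in nodes:
        reached |= GRAPH.get((node, edge_type), set())
    return reached


# "movies my friends like": hop over friend edges, then over likes_movie edges.
movies = follow(follow({"me"}, "friend"), "likes_movie")
```

Longer queries, such as “friends of friends who are single men in San Francisco”, would chain more hops and then filter the result on node attributes.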