Professional Associations: ACM SIGIR | ACM SIGKDD | W3C - World Wide Web Consortium
On this page, I maintain large-scale graph processing resources for data mining research.
Open-source large-scale graph processing systems
- Apache Incubator Giraph
- - Scope:
- - Interesting Slides:
- A preliminary introduction to Giraph given by Sebastian Schelter: Large Scale Graph Processing with Apache Giraph
- - Scope:
Giraph is an efficient offline graph processing system for large graphs, whether directed or undirected, weighted or unweighted. One distinctive feature is its support for multigraphs, i.e., graphs that allow multiple heterogeneous edges between the same two vertices. Unlike a chain of MapReduce jobs, which writes the graph to disk and reads it back on every iteration, Giraph keeps the entire graph in memory across iterations, which makes iterative graph algorithms considerably more efficient. Its vertex is parameterized by four types: I (vertex ID), V (vertex data), E (edge data), and M (message data).
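Giraph follows the Pregel bulk-synchronous model: computation proceeds in supersteps, and in each superstep every active vertex reads the messages sent to it in the previous superstep, may update its value, sends messages along its out-edges, and votes to halt; the job ends when all vertices have halted and no messages are in flight. The sketch below mimics that model in plain Python for single-source shortest paths, with the I, V, E, M roles marked in comments. It is not Giraph's API (Giraph jobs are written in Java against its own vertex and computation classes), and the names `Vertex` and `shortest_paths` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

INF = float("inf")


@dataclass
class Vertex:
    vid: int  # I: vertex ID
    value: float  # V: vertex data (best known distance from the source)
    edges: List[Tuple[int, float]] = field(default_factory=list)  # (target ID, E: edge weight)
    halted: bool = False


def shortest_paths(graph: Dict[int, List[Tuple[int, float]]], source: int) -> Dict[int, float]:
    """Pregel-style single-source shortest paths over an in-memory graph (hypothetical helper)."""
    ids = set(graph) | {t for out in graph.values() for t, _ in out}
    vertices = {vid: Vertex(vid, INF, list(graph.get(vid, []))) for vid in ids}
    inbox: Dict[int, List[float]] = {vid: [] for vid in ids}  # M: candidate distances
    superstep = 0
    while True:
        outbox: Dict[int, List[float]] = {vid: [] for vid in ids}
        for v in vertices.values():
            msgs = inbox[v.vid]
            if v.halted and not msgs:
                continue  # a halted vertex is skipped unless a message wakes it up
            v.halted = False
            if superstep == 0:
                candidate = 0.0 if v.vid == source else INF
            else:
                candidate = min(msgs, default=INF)
            if candidate < v.value:
                v.value = candidate
                for target, weight in v.edges:
                    outbox[target].append(candidate + weight)  # delivered next superstep
            v.halted = True  # vote to halt after every compute step
        inbox = outbox
        superstep += 1
        # Barrier: stop once every vertex has halted and no messages are in flight.
        if all(v.halted for v in vertices.values()) and not any(inbox.values()):
            return {vid: v.value for vid, v in vertices.items()}
```

For example, `shortest_paths({1: [(2, 1.0), (3, 4.0)], 2: [(3, 2.0)]}, 1)` returns distances 0.0, 1.0 and 3.0 for vertices 1, 2 and 3: vertex 3 first receives 4.0 and improves to 3.0 one superstep later. Vertices the source cannot reach keep the value `inf`.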
By default, edges are directed: every edge in the input data is treated as an out-edge of the vertex it is listed under. To represent an undirected graph, the input data must list each edge in both directions, so that each endpoint appears in the other's out-edge list.
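A minimal sketch of that preprocessing step, assuming the raw input is a list of `(u, v, weight)` tuples in which each undirected edge appears exactly once. The output line layout `[id,value,[[target,weight],...]]` is modeled on the JSON-style sample graph in Giraph's quick-start guide; treat it as an assumption and match whatever vertex input format your job actually configures. The function names are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def to_symmetric_adjacency(
    edges: Iterable[Tuple[int, int, float]]
) -> Dict[int, List[Tuple[int, float]]]:
    """Expand each undirected edge {u, v} into out-edges u->v and v->u."""
    adj: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    for u, v, weight in edges:
        adj[u].append((v, weight))
        if u != v:  # a self-loop already connects the vertex to itself; add it only once
            adj[v].append((u, weight))
    return dict(adj)


def to_json_lines(adj: Dict[int, List[Tuple[int, float]]], initial_value: float = 0) -> List[str]:
    """Emit one line per vertex as [id,value,[[target,weight],...]] (assumed layout)."""
    return [
        json.dumps([vid, initial_value, [[t, w] for t, w in adj[vid]]], separators=(",", ":"))
        for vid in sorted(adj)
    ]
```

For the edge list `[(1, 2, 1.0), (2, 3, 0.5)]`, vertex 2 ends up with out-edges to both 1 and 3, and vertex 1 is written as `[1,0,[[2,1.0]]]`. If the raw data already lists some edges in both directions, deduplicate first, or those edges will appear twice.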
PEGASUS is another offline graph processing system; it works on undirected graphs given as node IDs and unweighted links between them. It is well suited to simple computations such as counting node degrees and computing PageRank scores.
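To make these two computations concrete, here is a single-machine sketch of degree counting and PageRank (power iteration with damping factor 0.85) on an undirected, unweighted edge list. The PEGASUS authors express such computations as repeated generalized matrix-vector multiplications (GIM-V) run on Hadoop; this sketch does not reproduce that implementation, and its function names and default parameters are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def degrees(edges: List[Tuple[int, int]]) -> Dict[int, int]:
    """Count node degrees; each undirected link adds one to both endpoints."""
    deg: Dict[int, int] = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def pagerank(
    edges: List[Tuple[int, int]], damping: float = 0.85, tol: float = 1e-10, max_iter: int = 100
) -> Dict[int, float]:
    """Power-iteration PageRank, treating each undirected link as two directed ones."""
    neighbors: Dict[int, List[int]] = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)
    n = len(neighbors)
    rank = {v: 1.0 / n for v in neighbors}
    for _ in range(max_iter):
        # Teleport term, then each node splits its damped rank evenly among its neighbors.
        new = {v: (1.0 - damping) / n for v in neighbors}
        for u, nbrs in neighbors.items():
            share = damping * rank[u] / len(nbrs)
            for v in nbrs:
                new[v] += share
        delta = sum(abs(new[v] - rank[v]) for v in neighbors)  # L1 change this round
        rank = new
        if delta < tol:
            break
    return rank
```

Because every node that appears in an edge has at least one neighbor, no rank mass is lost, so the scores always sum to 1. On a star graph, for instance, the center receives the highest score and all leaves receive equal scores.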
Existing Graph Mining Systems
- IBM Watson is an artificial intelligence computer system capable of answering questions posed in natural language. It won the quiz show Jeopardy! in 2011. The next generation of Watson moves toward graph-based reasoning over hypotheses extracted from massive data. I maintain a collection of all IBM Watson-related papers, with access restricted to IBM only.
- List of Papers:
- Cover Letter
- Introduction to "This is Watson"
- Question analysis: How Watson reads a clue
- Deep parsing in Watson
- Textual resource acquisition and engineering
- Automatic knowledge extraction from documents
- Finding needles in the haystack: Search and candidate generation
- Typing candidate answers using type coercion
- Textual evidence gathering and analysis
- Relation extraction and scoring in DeepQA
- Structured data and inference in DeepQA
- Special Questions and techniques
- Identifying implicit relationships
- Fact-based question decomposition in DeepQA
- A framework for merging and ranking of answers in DeepQA
- Making Watson fast
- Simulation, learning, and optimization techniques in Watson’s game strategies
- In the game: The interface between Watson and Jeopardy!
- Facebook Graph Search
- Query examples
- Searching people: “friends of friends who are single men in San Francisco and who are from India”.
- Searching photos: “photos of my friends taken in Paris”, “photos of my friends taken in national parks”, “photos I like”.
- Searching interests: “movies my friends like”, “TV shows my friends like”, “Videos by TV shows liked by my friends”, “TV shows liked by doctors”, “what kind of music people who like Mitt Romney or Barack Obama like”.
- Searching places: “bars in Dublin liked by people who live in Dublin”, “people who have been to Ireland”.
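The first query above can be read as a two-hop graph traversal followed by attribute filtering. The sketch below illustrates that reading on toy in-memory data; it is not how Facebook implemented Graph Search, and the profile attribute names (`gender`, `relationship`, `city`, `hometown_country`) and the functions `friends_of_friends` and `search` are hypothetical. It also assumes that “friends of friends” excludes the searcher and their direct friends.

```python
from typing import Dict, Set


def friends_of_friends(friends: Dict[str, Set[str]], me: str) -> Set[str]:
    """People two hops away from `me`, excluding `me` and direct friends (assumed semantics)."""
    direct = friends.get(me, set())
    two_hop: Set[str] = set()
    for f in direct:
        two_hop |= friends.get(f, set())
    return two_hop - direct - {me}


def search(
    friends: Dict[str, Set[str]], profiles: Dict[str, Dict[str, str]], me: str, **criteria: str
) -> Set[str]:
    """Traverse first, then keep only people whose profile matches every criterion."""
    return {
        person
        for person in friends_of_friends(friends, me)
        if all(profiles.get(person, {}).get(key) == value for key, value in criteria.items())
    }
```

A call such as `search(friends, profiles, "me", gender="male", relationship="single", city="San Francisco", hometown_country="India")` then mirrors the first query. Doing the traversal first keeps the candidate set small before the more selective attribute checks run.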