Qi He

contact information

Research Staff Member
D3-246, Almaden Research Center, San Jose, CA 95120, USA
  +1-408-927-1871

links

Professional Associations:  ACM SIGIR  |  ACM SIGKDD  |  W3C - World Wide Web Consortium


Qi He - IBM Research


Event-based Social Network


Duration: 2012 – Now.

Description: Newly emerged event-based online social services, such as meetup.com and plancast.com, have experienced increased popularity and rapid growth. From these services, we observed a new type of social network: the Event-based Social Network (EBSN). An EBSN contains not only online social interactions, as in conventional online social networks, but also valuable offline social interactions captured in offline activities. In our research, we investigated the properties of the Meetup EBSN and discovered many unique and interesting characteristics, such as heavy-tailed degree distributions and strong locality of social interactions. We believe the event-based social network is a new type of social network that deserves in-depth study and collaborative research by people with various backgrounds and expertise.



[Figure: EBSN degree distribution]




EBSN defines a new genre of co-presence social networks in which users are linked through their co-participation in online virtual event groups, their co-check-ins at offline real events, or both. The figure above depicts the heavy-tailed degree distributions of EBSN, whose tails are not exponentially bounded and are much heavier than those of traditional location-based social networks.
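As a rough illustration of how a degree distribution like the one described above could be computed, the sketch below links users who took part in the same event and reports the complementary cumulative distribution of their degrees. It is a minimal sketch under stated assumptions, not the procedure used in this research: the function names, the `events` mapping, and the toy user ids are all hypothetical, and no real Meetup data is involved.

```python
from collections import Counter
from itertools import combinations


def co_participation_degrees(events):
    """Map each user to the number of distinct users they shared an event with.

    `events` is a hypothetical mapping: event (or group) id -> iterable of user ids.
    """
    neighbors = {}
    for participants in events.values():
        members = set(participants)
        for user in members:
            neighbors.setdefault(user, set())
        # Every pair of users present at the same event becomes an edge.
        for u, v in combinations(sorted(members), 2):
            neighbors[u].add(v)
            neighbors[v].add(u)
    return {user: len(adj) for user, adj in neighbors.items()}


def degree_ccdf(degrees):
    """Return [(k, P(degree >= k))] sorted by k; a heavy tail decays slowly."""
    counts = Counter(degrees.values())
    total = sum(counts.values())
    remaining = total
    ccdf = []
    for k in sorted(counts):
        ccdf.append((k, remaining / total))
        remaining -= counts[k]
    return ccdf


# Toy data, invented for illustration only.
events = {
    "e1": ["ann", "bob", "cat"],
    "e2": ["ann", "dan"],
    "e3": ["ann", "eve", "bob"],
}
degrees = co_participation_degrees(events)
ccdf = degree_ccdf(degrees)
for k, p in ccdf:
    print(k, p)
```

On real data the pairs would typically be plotted on log-log axes, where a tail that is not exponentially bounded shows up as a slow, roughly straight decay.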


Context-aware Citation Recommendation


Duration: 2008 – 2010.

Description: Recommended papers to cite at each place in a manuscript where the authors want to add a citation. In this study, we focused on context and citation network analyses for citation recommendation, where a context is defined as the words surrounding a citation place, which naturally explain the authors' motivation for citing there. High-quality citation recommendation is challenging: the recommended citations should not only be relevant to the paper under composition but also match the local contexts of the places where the citations are made. I built a citation recommendation prototype in CiteSeerX that asks the user to provide citation contexts manually, and designed a novel non-parametric probabilistic model that measures the context-based relevance between a citation context and a document. As a follow-up step, we also extracted citation contexts automatically from free text. Related work exploring the influence of one scientist on another, or of one topic on another, in citation and co-authorship networks was conducted independently.


G1: probability that a random context drawn from the unit vector universe is relevant to both d1 and d2.
G2: probability that a random context drawn from the out-link context set of the query manuscript d1 is relevant to both d1 and d2.
L1: the relevance of d2 to x without considering the other local contexts of d1 and the generative process of x.
L2: how likely it is that a random context drawn from the uniform distribution over all unit vector contexts is relevant to both d1 and d2.
L3: probability that a random context drawn from the out-link context set of the query manuscript d1 is relevant to both x and d2.
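To make the notion of a citation context concrete, the sketch below pulls a fixed window of words around each bracketed numeric citation marker in free text. It is only an illustration under stated assumptions: the marker format (`[1]`, `[2, 3]`), the window size, and the function name are hypothetical choices, and this is not the extraction method or the probabilistic relevance model used in the project.

```python
import re

# Assumed marker style: bracketed numbers such as [1] or [2, 3].
CITATION_MARKER = re.compile(r"\[\d+(?:\s*,\s*\d+)*\]")


def extract_citation_contexts(text, window=5):
    """Return [(marker, words)] with up to `window` words on each side of a marker."""
    if window < 1:
        raise ValueError("window must be at least 1")
    contexts = []
    for match in CITATION_MARKER.finditer(text):
        # Drop other citation markers so they do not count as context words.
        before = CITATION_MARKER.sub(" ", text[:match.start()]).split()[-window:]
        after = CITATION_MARKER.sub(" ", text[match.end():]).split()[:window]
        contexts.append((match.group(), before + after))
    return contexts


text = (
    "Topic models [1] have been applied to citation "
    "recommendation [2, 3] with some success."
)
contexts = extract_citation_contexts(text, window=3)
for marker, words in contexts:
    print(marker, words)
```

Each extracted word list could then be turned into a vector and scored against candidate documents; how that scoring is done is exactly what the relevance measures listed above address.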


Web Query Suggestion


Duration: 2007 – 2008.

Description: Explored two directions for Web search query recommendation by analyzing MSN Live Search logs. First, we used frequent historical query patterns within query sessions to predict the query the user is most likely to issue next. We observed that query prediction accuracy increases significantly as query history accumulates within the same query session. This method is especially well suited to real-life scenarios in which the user has already issued two or more queries and expects to receive one more correlated query as the suggestion. Second, we proposed a context-aware query suggestion approach implemented in two steps. In the offline model-learning step, to address data sparseness, queries are summarized into concepts by clustering a click-through bipartite graph. Then a concept sequence suffix tree is constructed from session data as the query suggestion model. In the online query suggestion step, the user's search context is captured by mapping the sequence of queries issued by the user to a sequence of concepts. By looking up this concept sequence in the concept sequence suffix tree, we suggest context-aware queries to the user. This method is suitable for real-life scenarios in which the user would like to receive a query suggestion with a different yet relevant concept.
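The two-step approach described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the project's implementation: a dictionary keyed by concept suffixes stands in for the concept sequence suffix tree, the query-to-concept mapping is written by hand rather than learned by clustering a click-through bipartite graph, and all names and data are hypothetical.

```python
from collections import Counter, defaultdict


def build_suffix_model(concept_sessions, max_context=3):
    """Offline step: count which concept follows each context of 1..max_context concepts."""
    model = defaultdict(Counter)
    for session in concept_sessions:
        for i in range(1, len(session)):
            next_concept = session[i]
            for n in range(1, max_context + 1):
                if i - n < 0:
                    break
                context = tuple(session[i - n:i])
                model[context][next_concept] += 1
    return model


def suggest_concepts(model, query_to_concept, queries, top_k=2, max_context=3):
    """Online step: map queries to concepts, then back off from the longest known suffix."""
    concepts = [query_to_concept[q] for q in queries if q in query_to_concept]
    for n in range(min(max_context, len(concepts)), 0, -1):
        context = tuple(concepts[-n:])
        if context in model:
            return [c for c, _ in model[context].most_common(top_k)]
    return []


# Hand-written mapping and sessions, invented for illustration only.
query_to_concept = {
    "cheap flights": "travel",
    "airline tickets": "travel",
    "hotel deals": "lodging",
    "car rental": "transport",
    "weather paris": "weather",
}
sessions = [
    ["travel", "lodging", "transport"],
    ["travel", "lodging", "transport"],
    ["travel", "lodging", "weather"],
    ["weather", "travel"],
]
model = build_suffix_model(sessions)
suggested = suggest_concepts(model, query_to_concept, ["cheap flights", "hotel deals"])
print(suggested)
```

In this sketch the suggestion is a ranked list of concepts; a full system would still need to pick representative queries from each suggested concept to show the user.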




Event Detection from News Articles


Duration: 2005 – 2008.

Description: Designed and developed a set of novel time-aware models for topical analysis of text streams. Discovered that bursty temporal word features play an important role in improving topic detection performance, and provided an in-depth analysis and systematic categorization of all word features into 5 general types using techniques from signal processing. Using a small set of bursty features extracted from historical or online news streams, I proposed a number of effective algorithms to detect topics from a news stream in both offline and online modes. I also presented a case study of a personalized news alert application, in which subscribers can specify anticipated topics of interest, and showed how a simple supervised topic transition classifier can be used to effectively identify user-anticipated topics. This work has been applied successfully to various event detection problems and was shown to outperform state-of-the-art related work in the Topic Detection and Tracking (TDT) research community.