Qi He

contact information

Research Staff Member
D3-246, Almaden Research Center, San Jose, CA 95120, USA
+1-408-927-1871

links

Professional Associations:  ACM SIGIR  |  ACM SIGKDD  |  W3C - World Wide Web Consortium




Reasoning Entity Relationships in Life Science


Role: project leader and service developer (Solr, Java Servlet, Python).

Duration: 2012 – present.

Description: The objective of this project is to integrate and analyze large-scale heterogeneous data sources (e.g., Wikipedia, patents, MEDLINE articles, DrugBank, ChEBI ontologies, and clinical trials) to build a large online entity-level information network, in which nodes are heterogeneous chemical entities such as compounds, drugs, diseases, targets, genes, and MeSH terms, and edges are relationships between entities. One research challenge is to infer unknown relationships beyond simple co-occurrence statistics between entities. For example, given the state-of-the-art biological role ontology, only thousands of the millions of belong-to relationships for chemical compounds are directly observable. All known and inferred relationships will then be profiled using the integrated data. As a first step, we have completed the prediction of drugs for biological roles, based on the labeled-LDA model, and the prediction of drugs for diseases, based on the chi-square measure and a novel language model that samples a pair of features simultaneously. I built an online system that lets the user issue a biological role or a disease as the query for drug prediction, and that uses Wikipedia and all types of entities to reason about the predicted relationship. To complete the entity network, the next steps are to reason about other types of relationships, such as drug-target and target-gene. The ultimate goal is to use this entity network to support deep QA in the life science domain.
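
As an illustration of the chi-square measure mentioned above, the following is a minimal sketch of scoring drug-disease association from document co-occurrence counts. It is not the project's implementation: the function names, the 2x2 count layout, and the example counts are all assumptions made for illustration.

```python
def chi_square_2x2(n11, n10, n01, n00):
    """Pearson chi-square statistic for a 2x2 co-occurrence table.

    n11: documents mentioning both the drug and the disease
    n10: documents mentioning the drug only
    n01: documents mentioning the disease only
    n00: documents mentioning neither
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        # A row or column is empty, so no association can be measured.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def rank_drugs(counts):
    """Rank candidate drugs for one disease by chi-square score, highest first."""
    scored = [(drug, chi_square_2x2(*table)) for drug, table in counts.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical counts (n11, n10, n01, n00), for illustration only.
example_counts = {
    "drug_a": (30, 70, 20, 880),
    "drug_b": (5, 95, 45, 855),
}
ranking = rank_drugs(example_counts)
```

Note that the statistic is symmetric: it is also large when a drug and a disease co-occur less often than expected, so a practical ranking would likely also check the direction of the association.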

The project's internal Web service: ChemPrediction





Company Search and Patent Licensing Recommendation


Role: project leader and service developer (Solr, Java Servlet, Python).

Duration: 2010 – 2012.

Description: This is an online patent recommendation system that locates patents covering the right technologies for licensing to prospective clients (companies), a business worth more than one billion USD annually to IBM. Searching multiple massive data sources for the right technologies to present to customers has traditionally been a labor-intensive manual task. I built a prospective-client-driven technology recommendation system that automates the search and profiling of technologies and companies for patent licensing. The idea is to use knowledge from Wikipedia in conjunction with 12 million patent documents to develop an online technology recommendation system for prospective clients of IBM. This system helped internal customers (the IBM T&IP Licensing Department) successfully locate the right patents for licensing to prospective clients such as Logitech in 2011. In December 2011, the system was also demonstrated to IBM clients at the SIIP Consortium meeting, where it was positively received, helping to obtain renewals of external contracts ($800K for the next phase of research, Reasoning Entity Relationships in Life Science, in 2012). The system has since been integrated into the IBM commercial product SIIP, a patent and scientific literature search engine and analytics platform.

The project's internal Web service: SPAR