Medical Text Analytics       


Anni R. Coden

Medical Text Analytics - overview

Our text analysis system, MedTAS, is tailored to the medical domain. The pathology version, MedTAS/P, contains additional components for extracting cancer-specific characteristics from unstructured text. The system is based on Natural Language Processing (NLP) principles, combines rule-based and machine-learning-based components, and runs within the UIMA framework. An application within such a framework consists of a set of programs (annotators), each with its own configuration file in XML format. The execution sequence of the annotators, called the pipeline, is also described in a configuration file. Configuration files can be modified with any text (XML) editor. Additionally, MedTAS/P provides a mechanism to ingest, process and use external resources, such as terminologies and ontologies.
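To make the idea of a configured annotator pipeline concrete, here is a minimal sketch in Python. It is not MedTAS code and does not use the UIMA API: the names `Annotation`, `Document`, `size_annotator`, `histology_annotator`, `REGISTRY` and `PIPELINE` are all hypothetical, the two-term terminology is a toy stand-in for an external resource, and a plain Python list stands in for the XML pipeline configuration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str      # e.g. "TumorSize" or "Histology" (hypothetical type names)
    begin: int     # character offset where the annotated span starts
    end: int       # character offset just past the end of the span
    value: object  # normalized value extracted from the span


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)


def size_annotator(doc):
    """Rule-based annotator: find sizes such as '3.5 cm' and store them in cm."""
    for m in re.finditer(r"(\d+(?:\.\d+)?)\s*cm\b", doc.text):
        doc.annotations.append(
            Annotation("TumorSize", m.start(), m.end(), float(m.group(1)))
        )


# Toy stand-in for an external terminology resource (assumption).
HISTOLOGY_TERMS = ("adenocarcinoma", "squamous cell carcinoma")


def histology_annotator(doc):
    """Dictionary-lookup annotator driven by the toy terminology."""
    lowered = doc.text.lower()
    for term in HISTOLOGY_TERMS:
        start = lowered.find(term)
        while start != -1:
            doc.annotations.append(
                Annotation("Histology", start, start + len(term), term)
            )
            start = lowered.find(term, start + 1)


# Maps configuration names to annotator functions.
REGISTRY = {"size": size_annotator, "histology": histology_annotator}

# Stand-in for the pipeline configuration file: an ordered list of names.
PIPELINE = ["histology", "size"]


def run_pipeline(text, pipeline=PIPELINE):
    """Run each configured annotator, in order, over one document."""
    doc = Document(text)
    for name in pipeline:
        REGISTRY[name](doc)
    return doc


doc = run_pipeline("Colon: adenocarcinoma, tumor measures 3.5 cm.")
for a in doc.annotations:
    print(a.kind, repr(doc.text[a.begin:a.end]), a.value)
```

Reordering or shortening `PIPELINE` changes what runs without touching the annotator code, which is the property the configuration files provide in the real framework.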

To represent the relationships between cancer characteristics and to track disease progression, we developed a complete Structured Cancer Disease Knowledge Model. It represents concepts such as primary and metastatic tumors and lymph node status, as well as the timeline of the disease progression. MedTAS/P automatically populates large parts of the knowledge model, with high precision, from unstructured textual sources, so that a patient’s disease status can be compared and summarized. A lightweight web-based application, QbM/C (Query by Model/Cancer), allows simple searching and browsing of patient data by exploiting the relationships represented within the knowledge model.
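One way to picture such a knowledge model is as a small set of linked record types. The sketch below is an illustration only, not the actual Structured Cancer Disease Knowledge Model: the class names (`Tumor`, `LymphNodeStatus`, `Finding`, `PatientRecord`), their fields, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Tumor:
    kind: str                        # "primary" or "metastatic"
    site: str                        # anatomical site, e.g. "colon"
    histology: str                   # e.g. "adenocarcinoma"
    size_cm: Optional[float] = None  # largest dimension, if reported


@dataclass
class LymphNodeStatus:
    examined: int  # number of lymph nodes examined
    positive: int  # number of those found positive


@dataclass
class Finding:
    """What a single report established on a given date."""
    report_date: date
    tumors: List[Tumor] = field(default_factory=list)
    lymph_nodes: Optional[LymphNodeStatus] = None


@dataclass
class PatientRecord:
    patient_id: str
    findings: List[Finding] = field(default_factory=list)

    def timeline(self):
        """Findings ordered by report date, oldest first."""
        return sorted(self.findings, key=lambda f: f.report_date)

    def summary(self):
        """One-line count of primary and metastatic tumors."""
        tumors = [t for f in self.timeline() for t in f.tumors]
        primary = sum(1 for t in tumors if t.kind == "primary")
        metastatic = sum(1 for t in tumors if t.kind == "metastatic")
        return f"{self.patient_id}: {primary} primary, {metastatic} metastatic"


# Invented sample data, for illustration only.
record = PatientRecord(
    "P-001",
    [
        Finding(date(2009, 6, 2), [Tumor("metastatic", "liver", "adenocarcinoma")]),
        Finding(
            date(2008, 3, 14),
            [Tumor("primary", "colon", "adenocarcinoma", 3.4)],
            LymphNodeStatus(examined=12, positive=2),
        ),
    ],
)
print(record.summary())
print([f.report_date.isoformat() for f in record.timeline()])
```

Keeping each finding tied to a report date is what lets the record double as a timeline: sorting the findings recovers the order in which the disease progressed.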

Figure 1 shows a user query: Find all patients who have adenocarcinoma in the colon with a tumor larger than 2 cm and positive lymph nodes, and whose cancer has metastasized.

Figure 2 shows a result screen: three patients were identified. The complete diagnosis of one patient is shown, comprising two primary tumors, one metastatic tumor, and characteristics of the tumors and lymph nodes. The full report of a patient can be viewed by clicking the appropriate button.
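The query in Figure 1 can be read as a filter over structured patient records. The following sketch shows one possible reading, under stated assumptions: it is not the QbM/C implementation, the dictionary layout and the `query` function are hypothetical, the patient data is invented, and "has metastasized" is interpreted here as "at least one metastatic tumor is recorded".

```python
# Invented records for illustration; the field names are assumptions.
PATIENTS = [
    {
        "id": "P-001",
        "positive_lymph_nodes": 2,
        "tumors": [
            {"kind": "primary", "site": "colon",
             "histology": "adenocarcinoma", "size_cm": 3.4},
            {"kind": "metastatic", "site": "liver",
             "histology": "adenocarcinoma", "size_cm": None},
        ],
    },
    {
        "id": "P-002",
        "positive_lymph_nodes": 0,
        "tumors": [
            {"kind": "primary", "site": "colon",
             "histology": "adenocarcinoma", "size_cm": 1.5},
        ],
    },
    {
        "id": "P-003",
        "positive_lymph_nodes": 3,
        "tumors": [
            {"kind": "primary", "site": "colon",
             "histology": "adenocarcinoma", "size_cm": 4.0},
        ],
    },
]


def query(patients, histology, site, min_size_cm):
    """Return ids of patients meeting all three criteria of the example query."""
    hits = []
    for p in patients:
        # Criterion 1: a primary tumor of the given histology and site,
        # with a recorded size strictly above the threshold.
        has_primary = any(
            t["kind"] == "primary"
            and t["histology"] == histology
            and t["site"] == site
            and t.get("size_cm") is not None
            and t["size_cm"] > min_size_cm
            for t in p["tumors"]
        )
        # Criterion 2: at least one positive lymph node.
        has_positive_nodes = p["positive_lymph_nodes"] > 0
        # Criterion 3: at least one metastatic tumor on record.
        has_metastasis = any(t["kind"] == "metastatic" for t in p["tumors"])
        if has_primary and has_positive_nodes and has_metastasis:
            hits.append(p["id"])
    return hits


print(query(PATIENTS, "adenocarcinoma", "colon", 2.0))
```

With the invented data above, only the first record satisfies all three criteria: the second fails on tumor size and lymph nodes, and the third has no metastatic tumor.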

Figure 1
Figure 2


  1. The MedTAS Movie

  2. The Mayo Clinic Movie

  3. The Query by Model / Cancer Movie