IBM Research Accelerated Discovery Lab       


Laura Haas, Sarah E. Knoop

IBM Research Accelerated Discovery Lab - overview

The Path from Data to Foresight

There are now powerful tools for capturing and integrating data, as well as for extracting meaning and leveraging that meaning to create value. However, the path from raw data to insight, or better yet, predictive capability, is still long, error-prone, and expensive. To get there, data must be acquired and enhanced in type-specific ways to improve quality. Key entities must be identified and matched across datasets. To understand the data, models must be created, and analytics must be developed, tested, and then deployed. With the increasing volume and velocity of data, these analytics must run efficiently, typically on highly parallel systems. The results must be interpreted, often requiring further analysis or visualization. These various tasks are all part of the discovery process, and they require a broad set of skills, most of which are not core competencies of the scientists and businesses that want to gain the insight. As a result, significant collaboration across disciplines, departments, and in many cases, institutions may be needed.

The IBM Research Accelerated Discovery Lab is creating a plug-and-play environment for facilitating this discovery process. The environment will meet dual goals: (1) to enable research in and improvements to the tools and systems that facilitate discovery, and (2) to enable the business person or domain expert who uses the environment to focus on their investigations, alleviating the systems and data challenges to speed discovery. In other words, it will improve the technology used for discovery while at the same time allowing users (for example, the business analyst or scientist) to make new discoveries in their fields more easily and at a more rapid pace. The Lab will support a set of domain-specific “Centers,” such as a Center for Healthcare Analytics. Research will be performed on problems relevant to the Centers (for example, finding the best policies for reducing the incidence of diabetes in a particular population, or the most effective use of limited funding to improve overall longevity). Research will also be performed on the technology foundations for enabling these decisions (for example, tools for automating the discovery of entities and relationships in data, new machine learning algorithms for detecting interesting correlations or trends, or tools for better visualizing data that has both structured and unstructured elements).

Key elements provided by the Accelerated Discovery Lab include a treasure trove of “pre-integrated” public and licensed data, available to serve as context and as the source of additional insights for project- or Center-specific data. A powerful compute platform, coupled with generous storage facilities and a rich software stack, forms the runtime infrastructure; a library of analytics, models, and analytic tools and frameworks supports new analyses; other tools enable collaboration as users create analytics and try to understand the results. Perhaps most valuable is the support for using this open platform, provided by the involvement of researchers with expertise in all aspects of the environment, including systems specialists, data scientists, and domain researchers. The Accelerated Discovery Lab researchers benefit from the use and requirements of “Center” researchers, and the “Center” researchers benefit from both the expertise of the Accelerated Discovery Lab researchers and the efforts to evolve the environment to better support the Centers’ needs.