For my official bio, click on the tab above. The following is a more informal summary of some of what I do now, not a vita.
I am an IBM Fellow (there are about 75 active Fellows, from a technical population of around 190,000), recognized for my work on the integration of data and for helping to broaden the definition of information management in IBM beyond simply "database". My technical interests include all aspects of helping people find and use information. I've mostly worked in the area of distributed information systems, especially information integration.
I am the founder of a new IBM Research facility, the IBM Research Accelerated Discovery Lab. Working with a small team and an army of volunteers from across IBM, we are building a collaborative platform for research that studies or leverages data and analytics. This multidisciplinary effort brings together researchers in computer science, mathematics, and a variety of vertical domains from IBM, academia, government and our clients. Our goal is to develop solutions to important data-intensive challenges while advancing the state of the art in all relevant disciplines. I drive both the technical and client-facing aspects of the Lab, and I continue to do research on information integration and, more recently, on what it means to "accelerate discovery".
Some Prior Work
Arguably my most influential work was on the Clio project, carried out jointly with colleagues at the University of Toronto. Technology from Clio has been used in many IBM products, and the pioneering concept of schema mapping spawned a new sub-field in the database research community, starting a new wave of research on information integration.
Previously, I led the Garlic project. Garlic was a distributed, heterogeneous, multimedia information system that could be thought of as object-oriented middleware. It provided its users with an integrated object-oriented view of data in a set of underlying data stores. Users could query the data or access it from a C++ API. If an underlying data store provided search capabilities, Garlic made those capabilities available to the user. Papers on Garlic appeared in many major conferences, and the Garlic technology, particularly the wrapper and optimization technology, has been key to several IBM products and offerings. It was the power behind DiscoveryLink, our first offering when we launched the IBM Life Sciences business, and it forms the core of InfoSphere Federation Server, which in turn was the first product in the InfoSphere Platform.
R* was a distributed relational database, and Starburst, an extensible, object-relational database. A lot of the technology from Starburst is now available in DB2 for Linux, Unix and Windows. Some technology from R* was used in DRDA, and some is relevant to Garlic, too.
I received my PhD from the University of Texas at Austin, and my AB from Harvard. I joined IBM in 1981 and have been here ever since, except for two sabbaticals. In 1992-93, I spent a year at the University of Wisconsin, where I worked on modeling join costs, among other things. More recently, I spent four months with the Systems Group at ETH Zurich, working on a variety of topics around information integration and distributed systems.