Vittorio Castelli

contact information

Distinguished Research Staff Member
Thomas J. Watson Research Center, Yorktown Heights, NY USA


Professional Associations: American Statistical Association | IEEE | Sigma Xi


My current work is in the Natural Language Processing area, and focuses on machine-learning algorithms for information extraction from text and on multilingual question answering.

I manage the Statistical Content Analysis group of the Multilingual NLP Technologies department, part of IBM Research AI. I lead the Domain Adaptation subtheme of the Mastering Language theme, and co-lead the Understand Language subtheme of the Neuro-Symbolic AI theme.

Currently, my main focus area is question answering, including multilingual question answering. Part of my work is in collaboration with the IBM Watson Group. Among other customer engagements, I have worked on answering questions on legal data for an IBM customer.

You can see some of the recent work from my department here: - look for the "Research Literature Q&A service" link.  Another demo of the general QA technology can be found here:  - look for "GAAMA, an answer engine for your documents". 

I have created TechQA, a challenging dataset for question answering in the technical support domain, with a leaderboard hosted here: , and described in our paper The TechQA Dataset, accepted at ACL 2020, whose authors contributed to the dataset generation and leaderboard creation.

My theoretical work is on properties of active learning algorithms, specifically on the optimal number of labeled samples as a function of the number of unlabeled samples under broad parametric assumptions.

I mentor co-workers on developing inventive ideas into patentable inventions, and I serve as the lead of a technical expert board (IDT) in IBM's internal evaluation of invention disclosures in the area of Natural Language Processing, and formerly in the general area of Cognitive Computing. The latter board has been split into multiple boards aligned with various AI disciplines. I am also a member of the IDT that evaluates inventions from our Africa Labs and.

My past work in my department was mostly focused on government projects:

I was the technical lead and principal architect for the ENEX project, which developed a system that enables users to search and browse a corpus of technical or news documents in terms of entities and their relations.

I was the technical lead for the DELPHI consortium team that participated in the BOLT IR task in the DARPA BOLT program. The team includes IBM as the primary and Columbia, UMASS, UMD, and Stanford as partners. I am also the architect of the DELPHI IR system.

I worked on algorithms for the DARPA GALE Distillation task (precursor to the BOLT IR task), and in the last two years of the program I was the principal architect of the distillation system for the Rosetta consortium, led by IBM.

My previous work at IBM has been in areas including intelligent user interfaces, autonomic computing, memory compression, statistical pattern recognition, image digital libraries, data mining, and multidimensional indexing structures.

In my spare time, I have taught Information Theory, as well as Statistical Pattern Recognition at Columbia University, through the EE department.

I served as the Watson chair of the Natural Language Processing Professional Interest Community. I am a member of the board that evaluates IBM Corporate Technical Awards (the highest technical recognition within IBM), I serve on the IBM T.J. Watson Culture Club, and I serve as an instructor for the Family Science Saturday program (an outreach program sponsored by the IBM T.J. Watson Research Center for fourth and fifth grade students), where I also coordinate one of the courses.