IBM Research - Ireland Internship Project: Automated extraction of information from scientific and patent literature on chemistry - overview
This internship project will develop new algorithms to extract and interpret information from the text and images of scientific and patent documents on chemistry.
Automatically collecting important information from such sources is a key technology for artificial intelligence, since the number and complexity of available documents are continuously increasing. Unlike the texts commonly handled in Natural Language Processing (NLP) applications, the chemical literature conveys crucial information, such as chemical structures and reaction pathways, in images. Therefore, this internship will focus on combining NLP with computer vision and image processing technologies to improve existing methods.
Good programming skills are required to implement algorithms in Python, Java/Scala or C++. Research expertise or experience in NLP, computer vision, image processing or machine learning is required. Familiarity with chemistry is an advantage.