IBM Research - Ireland Internship Project: Multi-modal Information Retrieval for Annotating Text Documents with Relevant Images - overview


Abstract:
Recent advances in word embedding approaches have taken the natural language processing and speech processing communities by storm. Joint embedding of text (ranging from character n-grams to whole documents) with images has opened up avenues for multi-modal data processing. This project will investigate the effectiveness of joint embedding approaches, i.e. embedding images together with words (Frome et al., NIPS '13), for multi-modal information retrieval. The specific application of interest is to enhance the readability of a text document, e.g. a Wiki page, by automatically inserting relevant images at appropriate places in the text.
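
As a rough illustration of the retrieval step, the sketch below ranks candidate images for a paragraph by cosine similarity, assuming that both the paragraph and the images have already been mapped into one shared embedding space. The function names (cosine, rank_images), the file names and the three-dimensional toy vectors are hypothetical and chosen only for illustration; a real system would obtain the vectors from trained text and image encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_images(paragraph_vec, image_vecs, top_k=1):
    """Return the names of the top_k images most similar to the paragraph vector."""
    scored = [(cosine(paragraph_vec, vec), name) for name, vec in image_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]


# Toy vectors, made up for illustration; they stand in for encoder outputs
# that live in the same (assumed) joint embedding space.
image_vecs = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "bridge.jpg": [0.0, 0.2, 0.9],
    "harbour.jpg": [0.1, 0.8, 0.5],
}
paragraph_vec = [0.1, 0.3, 0.8]

best = rank_images(paragraph_vec, image_vecs, top_k=2)
print(best)
```

Deciding where in the document to place a retrieved image could, under the same assumption, be done by running this ranking once per paragraph and keeping only matches above a similarity threshold.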

 

Required skills:

1. Working knowledge of recurrent neural networks and word/document embeddings.
2. Working knowledge of extracting image features using convolutional nets.
3. Knowledge of information retrieval models.
4. Strong programming skills in a high-level programming language, such as Java, Python or C/C++.
5. Quick text manipulation with shell programming, e.g. bash or awk.