Deep Learning Algorithms & Infrastructure @ IBM Research India - overview
Deep learning techniques have ushered in huge advances to the state of the art in a number of important domains such as speech, image and natural language understanding. Furthermore, deep neural networks are being explored for several other complex tasks such as natural language generation, conversational systems, and generating images and music. As these networks grow deeper and more complex, they need more and more compute power to be trained. This leads to an interesting intersection of neural network training and high performance computing techniques, aimed at making training more efficient on large computing infrastructure.
The team at the India Research Lab is involved in all aspects of deep learning, from inventing new algorithms for natural language understanding and generation and for image understanding and generation, all the way up to optimizing core deep learning algorithms (e.g. synchronous/asynchronous SGD), communication libraries and individual DL platforms (Torch, TensorFlow, ...) on large, heterogeneous GPU clusters. We are also working on an important aspect of deep learning: consumability. We are creating tools that let application developers with varying exposure to deep learning easily develop deep learning models and train them efficiently without writing any code!
Here is a sample of the projects under way at the lab:
Torch Optimization: The goal of this effort is to optimize Torch on a single node as well as on distributed systems. The optimizations encompass improving I/O performance, improving GPU utilization, and improving accuracy for specific models through appropriate initialization and normalization. Some of these optimizations are released at https://github.com/soumith/imagenet-multiGPU.torch/pull/94.
Asynchronous SGD for distributed deep learning: This project aims to investigate and improve state-of-the-art algorithms for asynchronous distributed optimization in machine learning. Distributed synchronous optimization is a proven strategy for accelerating learning in a big-data setting, but it is constrained by the comparatively slow machines in a heterogeneous system, because every step must wait for the slowest worker. Asynchronous counterparts are known to be much faster, at the cost of occasional instability in convergence caused by stale updates to the model parameters. The goal of this work is therefore to identify, and possibly invent, 'good' asynchronous distributed optimization algorithms for learning in a big-data setting, where 'good' may include metrics of scalability, speed, reproducibility and accuracy.
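The effect of stale updates can be illustrated with a toy simulation. The sketch below is not the lab's algorithm; it assumes a one-dimensional quadratic loss and a fixed delay, and the function name `stale_sgd` and its parameters are hypothetical. Each gradient is computed at the parameter value from `staleness` updates ago, which mimics a slow worker reporting back late.

```python
from collections import deque


def stale_sgd(staleness, lr, steps=200, w0=10.0, target=0.0):
    """Minimise 0.5 * (w - target)**2 with gradients that are `staleness` updates old.

    Returns the list of parameter values visited, starting with w0.
    """
    w = w0
    # Sliding window of past parameter values; history[0] is the copy a
    # delayed worker read before computing its gradient.
    history = deque([w0] * (staleness + 1), maxlen=staleness + 1)
    trajectory = [w]
    for _ in range(steps):
        stale_w = history[0]       # parameters seen by the delayed worker
        grad = stale_w - target    # gradient of the quadratic at the stale point
        w -= lr * grad             # server applies the (possibly stale) update
        history.append(w)
        trajectory.append(w)
    return trajectory


if __name__ == "__main__":
    for staleness, lr in [(0, 0.5), (5, 0.5), (5, 0.1)]:
        final_error = abs(stale_sgd(staleness, lr)[-1])
        print(f"staleness={staleness} lr={lr} final |w - target| = {final_error:.3e}")
```

In this toy setting, a learning rate that converges quickly with fresh gradients makes the iterates oscillate with growing amplitude once the gradients are five updates old, while a smaller learning rate restores convergence. This is the kind of speed-versus-stability trade-off the project description refers to.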
Deep Learning Model Compression: In most complex DNNs, the large number of weights consumes considerable storage and memory bandwidth. This makes the models difficult to deploy on mobile and embedded systems with limited hardware resources. Inference also consumes a great deal of energy, because the weights must be fetched from memory to compute the dot products. The objective of model compression is to reduce the memory and disk footprint of learned models without sacrificing accuracy or inference time. The goal is to use compressed models efficiently for both retraining and inference.
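As a rough illustration of how a weight footprint can be reduced, the sketch below applies two widely used generic techniques: magnitude pruning followed by uniform 8-bit quantization. The text above does not say which compression methods the project uses, so these are assumptions chosen for illustration, and the helper names `prune`, `quantize` and `dequantize` are hypothetical.

```python
def prune(weights, sparsity):
    """Zero out roughly the `sparsity` fraction of weights with smallest magnitude."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # Magnitude of the k-th smallest weight; ties at the threshold are pruned too.
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]


def quantize(weights, bits=8):
    """Map floats to integer codes in [0, 2**bits - 1] on a uniform grid."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Recover approximate float weights from integer codes."""
    return [lo + c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.5, -1.2, 0.03, 2.0, -0.01, 0.7]
    sparse = prune(weights, sparsity=0.5)
    codes, scale, lo = quantize(sparse, bits=8)
    restored = dequantize(codes, scale, lo)
    print("pruned:  ", sparse)
    print("codes:   ", codes)
    print("restored:", restored)
```

Each code fits in one byte instead of a 32-bit float, and the reconstruction error per weight is at most half a quantization step. Whether accuracy survives depends on the model; in practice a pruned or quantized network is usually retrained, which this sketch does not attempt.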
Deep Learning for Conversational Systems: People use the context of their previous conversation while interacting with each other. For example, an utterance such as “and what about China?” can be interpreted only by considering the previous utterances in the conversation, e.g. “what is the population of India?” To build natural interfaces to machines, it is important to resolve utterances in the context of the earlier conversation. We adapt deep learning techniques to understand utterances in the context of the conversation.
We are also applying deep learning to generate all possible questions from a given text. Generating such questions automatically is important for training a conversational system. The problem is somewhat similar to question answering and can be modeled similarly to machine translation. The added complexity is that the boundary of the answer text is not known and the available data is very sparse.
Multi Modal Conversational System: In this project, we aim to automatically learn a conversational system from past human-to-human interactions. In one of the threads, we try to learn an end-to-end conversational system using deep learning techniques. Modeling long-range context is one of the big challenges in this task. We are also working on learning conversational systems in which a user can interact across different modalities such as text and images. For example, in a fashion retail agent scenario, a user may say “show me more like the second shirt but in pink color”.
Deep Learning IDE: Although deep learning techniques have ushered in huge advances in several cognition tasks, training deep learning models remains a big challenge for ordinary developers. Several platforms for deep learning model development exist, but they are quite challenging to use: each has a steep learning curve and its own syntax and format, with no interoperability between them. To address some of these challenges and to make deep learning consumable and accessible to a larger audience, including non-experts, we have launched a new IDE for deep learning, temporarily called IBM DARVIZ. The key aspects of IBM DARVIZ are:
- Easy design of DL models, to enable mass adoption by developers with limited DL knowledge
- Interoperability across existing DL libraries. For example, upload a Caffe model and we will generate the corresponding TensorFlow or Theano code
- Interpretation of DL model learning, including validation of DL model design, next layer suggestion, and hyper-parameter suggestion.
Deep-Solar-Eye: The share of solar energy in overall electricity generation has grown at an unprecedented rate in the last few years, which has not only boosted the energy sector but also lowered energy prices immensely. Techniques such as reflectors are an effective step towards improving solar efficiency to counter falling power purchase agreement (PPA) prices. However, various environmentally induced issues, such as dust, soil and cracks, significantly hamper efficiency and are a growing concern in such a competitive market. At IRL, a deep neural network-based solution, “Deep-Solar-Eye”, is being developed to combat this inefficiency by identifying the underlying issues so that effective proactive measures can be taken. For example, the type of dust or soil, its location, and the blob distribution are known to have drastically different impacts on performance. Deep-Solar-Eye aims to combine various input sources, such as RGB images of solar panels, IR, and solar irradiance, through a multimodal neural network to efficiently localize, classify, and analyze the impact of affected areas relevant to PV performance. In contrast to traditional CNN studies that focus on objects with unique characteristics, our work directs attention to imperfections on the object (the panel). We use a hybrid fusion approach in which relevant features are reinforced across different layers of the CNN to boost performance. In addition to supporting these analyses, the CNN approach helps identify complex features at different levels, enabling a better understanding of the relevant performance factors, which can in turn inform other physics-based studies.