**2019**

Unsupervised Learning by Competing Hidden Units
Dmitry Krotov, John Hopfield
*Proceedings of the National Academy of Sciences* *116*(*16*), 7723-7731, 2019
Abstract
It is widely believed that end-to-end training with the backpropagation algorithm is essential for learning good feature detectors in early layers of artificial neural networks, so that these detectors are useful for the task performed by the higher layers of that neural network. At the same time, the traditional form of backpropagation is biologically implausible. In the present paper we propose an unusual learning rule, which has a degree of biological plausibility and which is motivated by Hebb's idea that changes in synapse strength should be local, i.e., should depend only on the activities of the pre- and postsynaptic neurons. We design a learning algorithm that utilizes global inhibition in the hidden layer and is capable of learning early feature detectors in a completely unsupervised way. These learned lower-layer feature detectors can be used to train higher-layer weights in the usual supervised way, so that the performance of the full network is comparable to the performance of standard feedforward networks trained end-to-end with backpropagation on simple tasks.
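The flavor of a local, competition-driven update of this kind can be sketched as follows. This is a simplified illustration in the spirit of the abstract (a winner-take-all Hebbian step with an anti-Hebbian runner-up and an Oja-style decay term); the function name, constants, and the exact form of the rule are ours, not the paper's:

```python
import numpy as np

def competitive_hebbian_step(W, x, lr=0.02, delta=0.4, k=2):
    """One step of a simplified competitive Hebbian rule.

    Hidden units compete via a global ranking (a stand-in for global
    inhibition): the strongest unit is pushed toward the input
    (Hebbian), the k-th strongest is pushed away (anti-Hebbian), and
    an Oja-style decay keeps weight vectors bounded. The update is
    local: it depends only on the pre-synaptic input x and the
    post-synaptic activity of each hidden unit.
    """
    currents = W @ x                      # pre-activations of hidden units
    order = np.argsort(currents)[::-1]    # rank units, strongest first
    g = np.zeros(W.shape[0])
    g[order[0]] = 1.0                     # winner: Hebbian update
    g[order[k - 1]] = -delta              # k-th place: anti-Hebbian update
    # only rows with g != 0 change; the decay term bounds |w| over time
    W += lr * g[:, None] * (x[None, :] - currents[:, None] * W)
    return W
```

Iterating this step over unlabeled inputs shapes the rows of `W` into feature detectors without any backward error signal.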

**2018**

Dense associative memory is robust to adversarial inputs
Dmitry Krotov, John Hopfield
*Neural Computation* *30*(*12*), 3151-3167, MIT Press, 2018
Also available as an arXiv preprint: https://arxiv.org/abs/1701.00939
Abstract
Deep neural networks (DNNs) trained in a supervised way suffer from two known problems. First, the minima of the objective function used in learning correspond to data points (also known as rubbish examples or fooling images) that lack semantic similarity with the training data. Second, a clean input can be changed by a small, and often imperceptible to human vision, perturbation so that the resulting deformed input is misclassified by the network. These findings emphasize the differences between the ways DNNs and humans classify patterns and raise the question of how to design learning algorithms that mimic human perception more accurately than existing methods.
Our article examines these questions within the framework of dense associative memory (DAM) models. These models are defined by an energy function with higher-order (higher than quadratic) interactions between the neurons. We show that in the limit when the power of the interaction vertex in the energy function is sufficiently large, these models have the following three properties. First, the minima of the objective function are free from rubbish images, so that each minimum is a semantically meaningful pattern. Second, artificial patterns poised precisely at the decision boundary look ambiguous to human subjects and share aspects of both classes that are separated by that decision boundary. Third, adversarial images constructed by models with small power of the interaction vertex, which are equivalent to DNNs with rectified linear units, fail to transfer to and fool the models with higher-order interactions. This opens up the possibility of using higher-order models for detecting and stopping malicious adversarial attacks. The results we present suggest that DAMs with higher-order energy functions are more robust to adversarial and rubbish inputs than DNNs with rectified linear units.
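A minimal sketch of the underlying model can make the "higher-order energy" concrete. Here we assume binary ±1 neurons and a rectified polynomial interaction F(x) = max(0, x)^n; the function names and the brute-force retrieval loop are ours, chosen for clarity rather than efficiency:

```python
import numpy as np

def dam_energy(sigma, memories, n=4):
    """Energy of a dense associative memory with interaction power n:
    E(sigma) = -sum_mu max(0, xi_mu . sigma)^n, one term per stored
    memory xi_mu. Larger n sharpens the basins around each memory."""
    overlaps = memories @ sigma
    return -np.sum(np.maximum(overlaps, 0.0) ** n)

def dam_retrieve(sigma, memories, n=4, sweeps=3):
    """Asynchronous retrieval dynamics: set each +-1 unit to whichever
    sign strictly lowers the energy, sweeping over all units."""
    sigma = sigma.copy()
    for _ in range(sweeps):
        for i in range(sigma.size):
            for s in (+1.0, -1.0):
                trial = sigma.copy()
                trial[i] = s
                if dam_energy(trial, memories, n) < dam_energy(sigma, memories, n):
                    sigma = trial
    return sigma
```

Because every accepted flip strictly lowers the energy, the dynamics converge to a local energy minimum; for large `n` those minima sit at the stored patterns, which is the property the robustness argument builds on.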

**2017**

Feature to prototype transition in neural networks
Dmitry Krotov, John Hopfield
*APS March Meeting*, 2017
Abstract
Models of associative memory with higher-order (higher than quadratic) interactions, and their relationship to neural networks used in deep learning, are discussed. Associative memory is conventionally described by recurrent neural networks with dynamical convergence to stable points. Deep learning typically uses feedforward neural nets without dynamics. However, a simple duality relates these two different views when applied to problems of pattern classification. From the perspective of associative memory, such models deserve attention because they make it possible to store a much larger number of memories than the quadratic case. In the dual description, these models correspond to feedforward neural networks with one hidden layer and unusual activation functions transmitting the activities of the visible neurons to the hidden layer. These activation functions are rectified polynomials of a higher degree rather than the rectified linear functions used in deep learning. For rectified linear functions the network learns representations of the data in terms of features, but as the power of the activation function is increased there is a gradual shift to a prototype-based representation; features and prototypes are the two extreme regimes of pattern recognition known in cognitive psychology.

**2016**

Dense associative memory for pattern recognition
Dmitry Krotov, John Hopfield
*Advances in Neural Information Processing Systems*, pp. 1172-1180, 2016
Abstract
A model of associative memory is studied, which stores and reliably retrieves many more patterns than the number of neurons in the network. We propose a simple duality between this dense associative memory and neural networks commonly used in deep learning. On the associative memory side of this duality, a family of models that smoothly interpolates between two limiting cases can be constructed. One limit is referred to as the feature-matching mode of pattern recognition, and the other one as the prototype regime. On the deep learning side of the duality, this family corresponds to feedforward neural networks with one hidden layer and various activation functions, which transmit the activities of the visible neurons to the hidden layer. This family of activation functions includes the logistic function, rectified linear units, and rectified polynomials of higher degrees. The proposed duality makes it possible to apply energy-based intuition from associative memory to analyze computational properties of neural networks with unusual activation functions, namely the higher rectified polynomials, which until now have not been used in deep learning. The utility of the dense memories is illustrated for two test cases: the logical gate XOR and the recognition of handwritten digits from the MNIST data set.
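The feedforward side of the duality can be sketched as a one-hidden-layer classifier whose hidden units are the stored memories and whose activation is a rectified polynomial. This is an illustrative toy under our own naming and scoring conventions (summing hidden activations per class), not the paper's exact architecture:

```python
import numpy as np

def rectified_poly(x, n):
    """Rectified polynomial activation F_n(x) = max(0, x)^n.
    n = 1 recovers the rectified linear unit (ReLU)."""
    return np.maximum(x, 0.0) ** n

def dam_classify(x, memories, labels, n=2, num_classes=2):
    """Feedforward dual of the dense associative memory: one hidden
    unit per stored memory; the score of a class is the summed
    activation of that class's units. As n grows, the largest overlap
    dominates and the classifier approaches nearest-prototype
    behavior; for small n it behaves more like feature matching."""
    h = rectified_poly(memories @ x, n)
    scores = np.zeros(num_classes)
    for c in range(num_classes):
        scores[c] = h[labels == c].sum()
    return int(np.argmax(scores))
```

Sweeping `n` in this toy reproduces the interpolation the abstract describes: low powers average over many weakly matching features, high powers defer to the single best-matching prototype.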

**2014**

Morphogenesis at criticality
Dmitry Krotov, Julien O Dubuis, Thomas Gregor, William Bialek
*Proceedings of the National Academy of Sciences* *111*(*10*), 3683-3688, 2014
Abstract
Spatial patterns in the early fruit fly embryo emerge from a network of interactions among transcription factors, the gap genes, driven by maternal inputs. Such networks can exhibit many qualitatively different behaviors, separated by critical surfaces. At criticality, we should observe strong correlations in the fluctuations of different genes around their mean expression levels, a slowing of the dynamics along some but not all directions in the space of possible expression levels, correlations of expression fluctuations over long distances in the embryo, and departures from a Gaussian distribution of these fluctuations. Analysis of recent experiments on the gap gene network shows that all these signatures are observed, and that the different signatures are related in ways predicted by theory. Although there might be other explanations for these individual phenomena, the confluence of evidence suggests that this genetic network is tuned to criticality.

**2011**

Infrared sensitivity of unstable vacua
Dmitry Krotov, Alexander M Polyakov
*Nuclear Physics B* *849*(*2*), 410-432, North-Holland, 2011
Abstract
We discover that some unstable vacua have long memory. By that we mean that even in theories containing only massive particles, there are correlators and expectation values which grow with time. We examine the cases of instabilities caused by constant electric fields, by expanding and contracting universes, and, most importantly, by the global de Sitter space. In the last case the interaction leads to a remarkable UV/IR mixing and to a large back reaction. This gives reason to believe that the cosmological constant problem could be resolved by the infrared physics.