Model Zoo

The Neural Network
Architecture Catalog

Highlighting some of the most interesting and impactful neural network architectures through history.



Zihang Dai, Hanxiao Liu, Quoc V. Le, Mingxing Tan

CoAtNets feature the unification of depthwise convolutions and self attention via relative attention in this hybrid network structure. Convolution layers and attention layers are vertically stacked in a principled way that proves to be effective for improving generalization, capacity, and efficiency.


MLP Mixer

Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy

A new take on vision architectures that eschew convolution and attention for only fully connected layers. MLP mixer applies spatial embeddings to the input image and then applies permutation-invariant operations through channel and token mixing layers.



Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, Jian Sun

ShuffleNet is a mobile optimized network that uses a combination of depthwise separable convolutions, channel shuffle operations, and pointwise convolutions to greatly reduce computational cost while maintaining accuracy.


Capsule Network

Sara Sabour, Nicholas Frosst, Geoffrey E Hinton

Capsule networks were introduced to solve some of the limitations that convolutional networks have in understanding hierarchies between objects in images. These capsules use small groups of neurons that specialize in identifying various parts of an object and their spatial relationships.



Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Lion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin

These models replace the traditional recurrent and convolutional networks with a self attention mechanism that allows the model to attend to different parts of the input sequence to generate contextual representations. The transformer consists of an encoder and decoder architecture that perform input/output mapping.



Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Weinberger

DenseNet builds on top of the ideas introduced in residual networks by implementing skip connections between a given layer and every other successive layer following it. Each layer receives a concatenation of all the feature maps of all the layers preceding it. This allows for learning at different levels of abstraction throughout the network.



Forrest Iandola, Song Han, Matthew Moskewicz, Khalid Ashraf, William Dally, Kurt Keutzer

SqueezeNet is notable for achieving comparable accuracy to larger neural network architectures while using significantly fewer parameters. SqueezeNet utilizes fire modules that consist of a 1x1 convolutional squeeze layer followed by a 3x3 convolutional expand layer.



Olaf Ronneberger, Philipp Fischer, Thomas Brox

U-Net is widely used in various segmentation and image-to-image translation tasks. The architecture consists of a contracting path that uses convolutional and pooling layers to capture context and a symmetric expanding path that uses deconvolutional and upsampling layers to achieve localization. The architecture's unique U-shape allows it to preserve fine-grained details during upsampling and is particularly well-suited for segmenting structures with varying shapes and sizes.



Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Residual Networks popularized skip connections in deep networks to alleviate vanishing gradient problems. The shortcut connections between blocks allows the gradients to flow directly from later layers to earlier layers, enabling depths of over a hundred layers.



Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich

Inception is a deep convolutional network developed by Google researchers. It introduced the concept of inception modules, which are composed of multiple parallel convolutional layers of different kernel sizes to capture different feature scales.


Gated Recurrent Unit

Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio

The GRU is a streamlined variant of the LSTM recurrent network. It contains memory cells that combine the input and forget gates into a single update gate which merges hidden states. This gate reduction results in fewer parameters, faster training times and reduced computational complexity.



Karen Simonyan, Andrew Zisserman

The Visual Geometry Group developed this network architecture that is characterized by its simplicity and uniformity. This model popularized the 3x3 convolutional filter size and its design has influenced the development of modern convolutional architectures.


Variational Autoencoder

Diederik P Kingma, Max Welling

A probabilistic generative model consisting of an encoder that maps input data to a latent space that corresponds to the parameters of a variational distribution and a decoder that maps the latent representations back to the input space.



Alex Krishevsky, Ilya Sutskever, Geoffrey Hinton

The model that kickstarted the deep learning revolution. Consisting of eight layers, including five convolutional layers and three fully connected layers. AlexNet was one of the first deep convolutional neural networks to achieve state-of-the-art results on ImageNet. The original implementation split the network over two independent gpus to alleviate challenges with memory at the time.


Echo State Network

Herbert Jaeger

The echo state network is a type of recurrent architecture that utilizes a reservoir of fixed and randomly weighted connections that remain frozen during training. Only layers outside of the reservoir are optimized. The dynamic behavior of the reservoir generates complex temporal patterns which create an echo of the input signal.


Long Short-Term Memory

Sepp Hochreiter, J├╝rgen Schmidhuber

The LSTM is a recurrent network that contains a memory cell and three gating units (input, output, and forget) that regulate the flow of information into and out of the cell. The memory cell can selectively remember or forget information over long periods of time.


Boltzmann Machine

David Ackley, Geoffrey Hinton, Terrence Sejnowski

The Boltzmann Machine is a type of probabilistic graph model that consists of binary stochastic units and weights learned through an energy based minimization algorithm called contrastive divergence. The name comes from using the Boltzmann distribution to model the joint probability distribution of the input data.



Yann LeCun, Bernhard Boser, John Denker, Donnie Henderson, Richard Howard, Wayne Hubbard, Lawrence Jackel

LeNet is the earliest introduction of the modern convolutional architecture. It contains three alternating convolutional and pooling layers followed by two fully connected layers. This model was introduced alongside the back propagation algorithm which produced incredible results at the time for image classification.


Hopfield Network

John Hopfield

The Hopfield network is a type of recurrent network based on bidirectional connections between neurons. Hopfield networks are a type of Ising model (also known as spin glass) that use Hebbian learning to train associative memory systems.



Kunihiko Fukushima

A lesser known piece of history. The earliest inspiration for convolutional neural networks comes from Japan in the midst of the AI winter. A breakthrough development for computer vision, the neocognitron utilizes alternating layers of locally connected feature extraction and positional shift cells.



Frank Rosenblatt

Neural network research begins with the first implementation of an artificial neuron called the perceptron. The theory for the perceptron was introduced in 1943 by McCulloch and Pitts as a binary threshold classifier. The first implementation was actually intended to be a machine rather than a program. Photocells were interconnected with potentiometers that were updated during learning with electric motors.