If you have studied the history of deep learning, you may know that the big moment came in 2012, when Alex Krizhevsky presented a deep convolutional architecture that beat the previous state of the art on the ImageNet dataset by roughly 11 percentage points of error. After that, it was clear that the deep learning era had arrived.
Two years later, Karen Simonyan and Andrew Zisserman from the University of Oxford proposed a better architecture, VGG. The main difference in this model was depth: they stacked more layers of small 3×3 convolutions, creating a deeper model that achieved a better score. A common belief that “deeper is better” took hold, and researchers kept adding more and more layers in the hope of easily reaching a new state of the art (SOTA).
However, things got complicated: beyond a certain depth, simply adding layers stopped improving our models and could even make them worse. This led researchers at Microsoft to develop ResNet, an architecture with residual connections that mitigate the vanishing gradient problem and make very deep networks trainable. The ResNet paper has become one of the most cited scientific papers of all time, and most later architectures build on it. By then, VGG-style networks seemed to be a thing of the past.
But things changed this year at the Conference on Computer Vision and Pattern Recognition (CVPR), considered by many to be the largest conference on computer vision and its deep learning applications. Ding et al., a team of Chinese researchers, showed that a plain VGG-style network built only from 3×3 convolutions and ReLU can still be used for inference. The only change is in the training-time architecture, which uses a multi-branch topology. The resulting model reaches a state-of-the-art score, running faster than ResNet-50 while achieving higher accuracy. The inference-time model is obtained by decoupling the training-time and inference-time architectures through structural re-parameterization: after training, the branches of each block (a 3×3 convolution, a 1×1 convolution and an identity shortcut) are merged into a single 3×3 convolution, leading to a simple sequential model.
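To make the re-parameterization idea concrete, here is a minimal sketch in pure Python (chosen only so it runs without dependencies, not how RepVGG is actually implemented). It assumes stride 1, the same number of input and output channels, and no BatchNorm or biases; in RepVGG each branch also carries BatchNorm, which would first be folded into the kernels and biases before this merge. All helper names (`conv2d`, `pad_1x1_to_3x3`, `identity_3x3`, `fuse_branches`, `add`) are hypothetical. Because convolution is linear, the 1×1 kernel can be zero-padded to 3×3, the identity can be written as a 3×3 kernel with a single 1 at the center, and the three kernels can simply be summed.

```python
import random


def conv2d(x, w, pad):
    """Naive stride-1 convolution (cross-correlation, as in DL libraries).

    x: [C_in][H][W], w: [C_out][C_in][k][k], zero padding `pad` on each side.
    """
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    k = len(w[0][0])
    out_h, out_w = h + 2 * pad - k + 1, wd + 2 * pad - k + 1
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(len(w))]
    for o in range(len(w)):
        for i in range(out_h):
            for j in range(out_w):
                s = 0.0
                for c in range(c_in):
                    for di in range(k):
                        for dj in range(k):
                            r, q = i + di - pad, j + dj - pad
                            if 0 <= r < h and 0 <= q < wd:  # zero padding
                                s += w[o][c][di][dj] * x[c][r][q]
                out[o][i][j] = s
    return out


def pad_1x1_to_3x3(w1):
    """Place each 1x1 weight at the center of an otherwise zero 3x3 kernel."""
    return [[[[w1[o][c][0][0] if (di, dj) == (1, 1) else 0.0
               for dj in range(3)] for di in range(3)]
             for c in range(len(w1[0]))] for o in range(len(w1))]


def identity_3x3(channels):
    """3x3 kernel that copies channel o to output o (the identity branch)."""
    return [[[[1.0 if (o == c and di == 1 and dj == 1) else 0.0
               for dj in range(3)] for di in range(3)]
             for c in range(channels)] for o in range(channels)]


def fuse_branches(w3, w1):
    """Merge 3x3 + 1x1 + identity branches into one 3x3 kernel (no BN/bias)."""
    channels = len(w3)
    w1p, wid = pad_1x1_to_3x3(w1), identity_3x3(channels)
    return [[[[w3[o][c][di][dj] + w1p[o][c][di][dj] + wid[o][c][di][dj]
               for dj in range(3)] for di in range(3)]
             for c in range(channels)] for o in range(channels)]


def add(*maps):
    """Element-wise sum of feature maps with identical [C][H][W] shapes."""
    return [[[sum(m[c][i][j] for m in maps) for j in range(len(maps[0][0][0]))]
             for i in range(len(maps[0][0]))] for c in range(len(maps[0]))]


random.seed(0)
C, H, W = 2, 4, 5
x = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w3 = [[[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
       for _ in range(C)] for _ in range(C)]
w1 = [[[[random.uniform(-1, 1)]] for _ in range(C)] for _ in range(C)]

# Training-time block: three parallel branches summed together.
train_out = add(conv2d(x, w3, 1), conv2d(x, w1, 0), x)
# Inference-time block: a single 3x3 convolution with the merged kernel.
infer_out = conv2d(x, fuse_branches(w3, w1), 1)

max_diff = max(abs(a - b) for ca, cb in zip(train_out, infer_out)
               for ra, rb in zip(ca, cb) for a, b in zip(ra, rb))
print(f"max difference: {max_diff:.2e}")
```

The two outputs agree up to floating-point rounding. The ReLU that follows each block is applied after the sum in both versions, so it does not affect the merge.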
Hence the paper’s title: “RepVGG: Making VGG-style ConvNets Great Again”. It is not only a catchy name but also an interesting technique that has given me many new ideas. Time to test it!