guanfuchen / deepnetmodel

Notes on the characteristics of commonly used deep model architectures (diagrams and code)

Jupyter Notebook 1.44% Python 98.56%

deep-learning alexnet resnet inception vgg googlenet object-detection object-classification convolution-lstm residual-convolution-lstm

deepnetmodel's Introduction

DeepNetModel

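Deep network models have improved steadily from LeNet5, AlexNet, and VGGNet to ResNet and beyond, and the distinctive design ideas behind each model are worth recording carefully. This repository mainly gathers scattered material about these networks and aims to introduce each deep network model concisely through diagrams and code.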

Network architecture index

face_detection
- ...
object_classification
- resnet
- inception
- network_in_network
- mobilenet
- shufflenet
- alexnet
- densenet
- ...
object_detection
- R-FCN
misc
- group_convolution
- normalization

ResNet

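Deep residual networks made it possible to train networks with hundreds of layers, and the architecture has since been widely adopted by other deep learning models.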

ResNeXt

See resnext for details.

Network in Network

The Caffe Model Zoo provides an ImageNet-pretrained model file, Network in Network ILSVRC, and a CIFAR-10-pretrained model file, Network in Network CIFAR10 Model.

Inception v1,v2,v3,v4

Added notes on the ideas in the Inception v1, v2, v3 and v4 papers; see the inception notes for details.

Xception

Added notes on the Xception paper; see xception for details.

Lightweight networks

Group convolution appears frequently in lightweight networks; see the group_convolution notes for background.
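As a rough illustration of why group convolution saves computation, here is a minimal weight-count sketch. The helper name `conv_params` is hypothetical; it assumes the common convention (used, for example, by PyTorch's `nn.Conv2d`, whose weight has shape `(out_channels, in_channels / groups, k, k)`) that each of the `groups` groups connects `c_in / groups` input channels to `c_out / groups` output channels.

```python
def conv_params(c_in: int, c_out: int, k: int, groups: int = 1, bias: bool = False) -> int:
    """Count the weights of a k x k convolution split into `groups` groups (hypothetical helper)."""
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must both be divisible by groups")
    # Each group holds k*k*(c_in/groups)*(c_out/groups) weights, and there are `groups` groups.
    weights = k * k * (c_in // groups) * (c_out // groups) * groups
    return weights + (c_out if bias else 0)


print(conv_params(256, 256, 3))              # 589824: ordinary 3x3 convolution
print(conv_params(256, 256, 3, groups=32))   # 18432: 32 groups, 32x fewer weights
print(conv_params(256, 256, 3, groups=256))  # 2304: groups == channels, i.e. depthwise
```

Setting `groups` equal to the number of channels gives the depthwise convolution used by MobileNet and Xception.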

MobileNet v1,v2

Added notes on the lightweight networks MobileNet v1 and v2; see the mobilenet notes for details.

ShuffleNet

Added notes on the lightweight network ShuffleNet; see the shufflenet notes for details.

AlexNet

Added notes on AlexNet; see the alexnet notes for details.

ZFNet

Added notes on ZFNet; see the zfnet notes for details.

VGGNet

Added notes on VGGNet; see the vggnet notes for details.

DenseNet

Added notes on DenseNet; see the densenet notes for details.

R-FCN

Added notes on R-FCN; see the rfcn notes for details.

FPN

Reference paper: Feature Pyramid Networks for Object Detection

RetinaNet

Reference paper: Focal Loss for Dense Object Detection
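The central idea of the referenced paper is the focal loss, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), which down-weights well-classified examples so that training concentrates on hard ones. Below is a minimal single-prediction sketch for the binary case; `focal_loss` is a hypothetical helper, and the defaults α = 0.25 and γ = 2 are the values the paper reports working well, used here only as an assumption.

```python
import math


def focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Binary focal loss for one prediction (hypothetical helper).

    p: predicted probability of the positive class, in (0, 1).
    y: ground-truth label, 1 (positive) or 0 (negative).
    """
    p_t = p if y == 1 else 1.0 - p              # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


ce = -math.log(0.9)                 # cross-entropy of an easy positive (p = 0.9)
fl = focal_loss(0.9, 1, alpha=1.0)  # same example, gamma = 2, no alpha weighting
print(fl / ce)                      # about 0.01, the modulating factor (1 - 0.9)^2
```

With γ = 0 and α_t = 1 the expression reduces to ordinary cross-entropy, so γ alone controls how strongly easy examples are suppressed.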

normalization

Layer normalization
Instance Normalization: The Missing Ingredient for Fast Stylization
Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models
MegDet: A Large Mini-Batch Object Detector
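The normalization methods listed above differ mainly in which axes the mean and variance are computed over. The pure-Python sketch below works on a 2D activation of shape (batch N, features C) and contrasts batch normalization (each feature over the batch) with layer normalization (each sample over its features). The learnable scale and shift are omitted and the function names are hypothetical; instance and group normalization additionally need spatial axes and are not covered here.

```python
import math


def _standardize(values, eps=1e-5):
    """Subtract the mean and divide by sqrt(variance + eps), using the biased variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def batch_norm(x, eps=1e-5):
    """Training-mode batch norm without affine parameters: each column over the batch."""
    columns = [_standardize(list(col), eps) for col in zip(*x)]
    return [list(row) for row in zip(*columns)]


def layer_norm(x, eps=1e-5):
    """Layer norm without affine parameters: each row over its own features."""
    return [_standardize(list(row), eps) for row in x]


x = [[1.0, 2.0, 3.0],
     [3.0, 6.0, 9.0]]
print(batch_norm(x))  # every column now has mean 0 across the two samples
print(layer_norm(x))  # every row now has mean 0 across its three features
```

Layer norm's output for a sample does not depend on the rest of the batch, whereas batch norm's does; that minibatch dependence is what the Batch Renormalization title refers to.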

References

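Deep Residual Learning for Image Recognition: the deep residual network paper.
ResNet, AlexNet, VGGNet, Inception: Understanding various architectures of Convolutional Networks: an introductory overview of AlexNet and other architectures.
CNN卷积神经网络架构综述 (A survey of CNN architectures): a blog post surveying CNN architectures, covering AlexNet, ZFNet, GoogLeNet, VGGNet, ResNet and other common deep neural networks; see also the survey paper An Analysis of Deep Neural Network Models for Practical Applications.
Caffe神经网络结构汇总 (Summary of Caffe network architectures): describes the architectures of the classification networks commonly used in Caffe.
A summary of my own projects, including interview preparation, with some notes on object detection.
Convolutions Types: covers many different types of convolution and is a good starting point for exploring each one in more depth.
donkey's blog: categorizes network models in fine detail, and its posts explain papers concisely; a useful reference.
cnn-benchmarks: compares the performance of different networks, including AlexNet, ResNet and VGGNet, under different GPU configurations.
pretrained-models.pytorch: contains pretrained weights for commonly used deep learning models.
conv-benchmark: compares the performance of several common convolutions in Keras and PyTorch, including conv1x1, conv3x1, conv1x3, conv3x3sep, conv3x3, conv5x5 and conv3x3dilated; useful for building intuition about how convolution cost varies across platforms, for future reference and design work.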

deepnetmodel's People

Contributors

Stargazers

Watchers

Forkers

ustcpcs zy20091082 eglrp zac-ji makerkrunner lsl192317 5l1v3r1 mancy123123

deepnetmodel's Issues

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

related paper

Abstract
We present a class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depthwise separable convolutions to build light weight deep neural networks. We introduce two simple global hyperparameters that efficiently trade off between latency and accuracy. These hyper-parameters allow the model builder to choose the right sized model for their application based on the constraints of the problem. We present extensive experiments on resource and accuracy tradeoffs and show strong performance compared to other popular models on ImageNet classification. We then demonstrate the effectiveness of MobileNets across a wide range of applications and use cases including object detection, finegrain classification, face attributes and large scale geo-localization.
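To make the latency/accuracy trade-off concrete, here is a minimal cost sketch in the notation of the MobileNets paper: kernel size D_K, input channels M, output channels N, feature-map size D_F, width multiplier α, resolution multiplier ρ (these symbols come from the paper rather than from the abstract above). It counts multiply-adds only and ignores stride, padding, and bias; the function names are hypothetical.

```python
def standard_conv_cost(d_k: int, m: int, n: int, d_f: int) -> int:
    """Multiply-adds of a standard D_K x D_K convolution: D_K*D_K*M*N*D_F*D_F."""
    return d_k * d_k * m * n * d_f * d_f


def separable_conv_cost(d_k: int, m: int, n: int, d_f: int,
                        alpha: float = 1.0, rho: float = 1.0) -> int:
    """Depthwise (D_K*D_K*M*D_F*D_F) plus pointwise 1x1 (M*N*D_F*D_F) multiply-adds.

    alpha thins the channels of the layer; rho shrinks the feature-map resolution.
    """
    m, n, d_f = int(alpha * m), int(alpha * n), int(rho * d_f)
    depthwise = d_k * d_k * m * d_f * d_f
    pointwise = m * n * d_f * d_f
    return depthwise + pointwise


std = standard_conv_cost(3, 512, 512, 14)   # 462422016
sep = separable_conv_cost(3, 512, 512, 14)  # 52283392
print(sep / std)                            # about 0.113, i.e. 1/N + 1/D_K**2
```

The ratio 1/N + 1/D_K² shows that for 3 × 3 kernels the separable form costs roughly 8 to 9 times less, and the pointwise term dominates, which is why shrinking channels with α saves roughly α² of the cost.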

Identity Mappings in Deep Residual Networks

related paper

Abstract
Deep residual networks [1] have emerged as a family of extremely deep architectures showing compelling accuracy and nice convergence behaviors. In this paper, we analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation. A series of ablation experiments support the importance of these identity mappings. This motivates us to propose a new residual unit, which further makes training easy and improves generalization. We report improved results using a 1001-layer ResNet on CIFAR-10 (4.62% error) and CIFAR-100, and a 200-layer ResNet on ImageNet. Code is available at: https://github.com/KaimingHe/resnet-1k-layers.
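The propagation argument in this abstract can be summarized with the paper's notation, where x_l is the input to the l-th residual unit, F is its residual function with weights W_l, and E is the loss. When both the skip connection and the after-addition activation are identity mappings, any deeper unit L equals a shallower unit l plus a sum of residuals, so the gradient always contains a term that bypasses every weight layer. This is a summary of the paper's derivation, included for orientation.

```latex
x_{l+1} = x_l + \mathcal{F}(x_l, \mathcal{W}_l)

x_L = x_l + \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i)

\frac{\partial \mathcal{E}}{\partial x_l}
  = \frac{\partial \mathcal{E}}{\partial x_L}
    \left( 1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i) \right)
```

The additive 1 means the gradient from unit L reaches unit l without being multiplied by any weight matrix, which is why the abstract stresses keeping the shortcut an identity mapping.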

Deep Residual Learning for Image Recognition

related paper

Abstract
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [41] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers.
The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
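The residual reformulation described above, learning F(x) and outputting F(x) + x through an identity shortcut, can be sketched in a few lines of pure Python. This is a toy on flat lists: the caller-supplied `residual_fn` is a hypothetical stand-in for the stacked convolution and BN layers, and the block follows the original post-addition ReLU ordering.

```python
def relu(v):
    """Element-wise ReLU on a flat list."""
    return [max(0.0, a) for a in v]


def residual_block(x, residual_fn):
    """y = ReLU(F(x) + x) with an identity shortcut (toy, flat-list version)."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("F(x) and x must have the same shape for an identity shortcut")
    return relu([a + b for a, b in zip(fx, x)])


def zero_branch(v):
    # A residual branch that has learned nothing yet outputs zeros.
    return [0.0] * len(v)


x = [1.0, 2.0, 3.0]
y = x
for _ in range(100):  # 100 stacked blocks whose residual branch outputs 0
    y = residual_block(y, zero_branch)
print(y == x)         # True: the deep stack still passes the non-negative x through unchanged
```

The point of the toy is the failure mode it avoids: a plain stack must learn an identity mapping explicitly, while a residual stack gets it for free when F(x) is zero, so adding depth does not have to hurt optimization.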

Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

related paper

Abstract
Very deep convolutional networks have been central to the largest advances in image recognition performance in recent years. One example is the Inception architecture that has been shown to achieve very good performance at relatively low computational cost. Recently, the introduction of residual connections in conjunction with a more traditional architecture has yielded state-of-the-art performance in the 2015 ILSVRC challenge; its performance was similar to the latest generation Inception-v3 network. This raises the question of whether there are any benefit in combining the Inception architecture with residual connections. Here we give clear empirical evidence that training with residual connections accelerates the training of Inception networks significantly. There is also some evidence of residual Inception networks outperforming similarly expensive Inception networks without residual connections by a thin margin. We also present several new streamlined architectures for both residual and non-residual Inception networks. These variations improve the single-frame recognition performance on the ILSVRC 2012 classification task significantly. We further demonstrate how proper activation scaling stabilizes the training of very wide residual Inception networks. With an ensemble of three residual and one Inception-v4, we achieve 3.08% top-5 error on the test set of the ImageNet classification (CLS) challenge.
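The activation scaling mentioned at the end of this abstract amounts to multiplying the residual branch by a small constant before adding it to the shortcut. The paper reports scale factors of roughly 0.1 to 0.3; the sketch below assumes 0.1 as the default, works on flat lists, and `scaled_residual` and `big_branch` are hypothetical names.

```python
def scaled_residual(x, residual_fn, scale=0.1):
    """Return x + scale * F(x), damping the residual branch before the addition."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("F(x) must have the same shape as x")
    return [a + scale * b for a, b in zip(x, fx)]


def big_branch(v):
    # Stand-in for a very wide Inception branch whose output is large early in training.
    return [10.0 for _ in v]


print(scaled_residual([1.0, 2.0], big_branch))             # [2.0, 3.0]
print(scaled_residual([1.0, 2.0], big_branch, scale=1.0))  # [11.0, 12.0], unscaled
```

Scaling keeps the shortcut signal dominant, so a wide branch with large activations cannot swamp the accumulated features.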

Densely Connected Convolutional Networks

related paper

Abstract
Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections—one between each layer and its subsequent layer—our network has L(L+1)/2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less computation to achieve high performance.
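The connection count in this abstract, and the channel growth it implies, can be checked with a small sketch. It assumes the DenseNet paper's growth-rate notation, in which each layer adds k feature maps, so the l-th layer of a block receives k_0 + k(l - 1) input channels (k_0 being the channels entering the block). These formulas come from the paper rather than the abstract, and the helper names are hypothetical.

```python
def dense_connections(num_layers: int) -> int:
    """Direct connections among L layers when each feeds every later one: L(L+1)/2."""
    return num_layers * (num_layers + 1) // 2


def dense_block_channels(k0: int, growth_rate: int, num_layers: int):
    """Input channels of each layer (k0 + k*(l-1)) and the block's concatenated output."""
    inputs = [k0 + growth_rate * (l - 1) for l in range(1, num_layers + 1)]
    output = k0 + growth_rate * num_layers
    return inputs, output


print(dense_connections(5))             # 15, versus 5 in a plain chain
print(dense_block_channels(64, 32, 6))  # ([64, 96, 128, 160, 192, 224], 256)
```

Because each layer only contributes k new maps, a small growth rate keeps every layer narrow even though its input keeps widening, which is how DenseNets reduce parameters despite the dense wiring.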

Very Deep Convolutional Networks for Large-Scale Image Recognition

related paper

Abstract
In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3 × 3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16–19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
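The benefit of very small 3 × 3 filters can be checked with simple arithmetic: a stack of n stride-1 3 × 3 convolutions covers the same receptive field as a single (2n + 1) × (2n + 1) convolution but uses fewer weights. A minimal sketch, assuming C input and C output channels per layer and ignoring biases; the helper names are hypothetical.

```python
def stacked_receptive_field(num_layers: int, k: int = 3) -> int:
    """Receptive field of num_layers stacked stride-1 k x k convolutions."""
    return 1 + num_layers * (k - 1)


def conv_stack_params(num_layers: int, channels: int, k: int = 3) -> int:
    """Weights (no bias) of num_layers k x k convolutions mapping C -> C channels."""
    return num_layers * k * k * channels * channels


c = 512
print(stacked_receptive_field(3))              # 7: three 3x3 layers see a 7x7 window
print(conv_stack_params(3, c) / (c * c))       # 27.0, i.e. 27*C^2 weights
print(conv_stack_params(1, c, k=7) / (c * c))  # 49.0, i.e. 49*C^2 for one 7x7 layer
```

The stack is both cheaper and deeper, and each extra layer adds another non-linearity between the same input and output.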

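Next steps for the project

Strikethrough marks tasks that are complete or mostly complete.

Read the paper Residual Networks Behave Like Ensembles of Relatively Shallow Networks to gain a deeper understanding of residual networks.
~~Read the paper Wide residual networks to gain a deeper understanding of wide residual networks~~, and implement the related code; see #2
Read the paper Systematic evaluation of CNN advances on the ImageNet to learn about the progress achieved on ImageNet.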

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

related paper

Abstract
We introduce an extremely computation-efficient CNN architecture named ShuffleNet, which is designed specially for mobile devices with very limited computing power (e.g., 10-150 MFLOPs). The new architecture utilizes two new operations, pointwise group convolution and channel shuffle, to greatly reduce computation cost while maintaining accuracy. Experiments on ImageNet classification and MS COCO object detection demonstrate the superior performance of ShuffleNet over other structures, e.g. lower top-1 error (absolute 7.8%) than recent MobileNet [12] on ImageNet classification task, under the computation budget of 40 MFLOPs. On an ARM-based mobile device, ShuffleNet achieves ∼13× actual speedup over AlexNet while maintaining comparable accuracy.
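The channel shuffle operation named in this abstract can be sketched on a flat list of channel indices: reshape the channels into (groups, channels_per_group), transpose, and flatten, so that the next group convolution receives channels from every group. This is a minimal pure-Python illustration, and the function name is hypothetical.

```python
def channel_shuffle(channels, groups):
    """Reshape to (groups, n), transpose to (n, groups), and flatten."""
    if len(channels) % groups:
        raise ValueError("number of channels must be divisible by groups")
    n = len(channels) // groups
    # Row g holds the n channels produced by group g.
    grid = [channels[g * n:(g + 1) * n] for g in range(groups)]
    # Reading the grid column by column interleaves channels from all groups.
    return [grid[g][i] for i in range(n) for g in range(groups)]


print(channel_shuffle(list(range(6)), 2))  # [0, 3, 1, 4, 2, 5]
print(channel_shuffle(list(range(6)), 3))  # [0, 2, 4, 1, 3, 5]
```

In a tensor framework the same permutation is typically written as a reshape to (batch, groups, n, H, W), a transpose of the two channel axes, and a reshape back. It adds no parameters and no multiply-adds.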

Wide Residual Networks

related paper

Abstract
Deep residual networks were shown to be able to scale up to thousands of layers and still have improving performance. However, each fraction of a percent of improved accuracy costs nearly doubling the number of layers, and so training very deep residual networks has a problem of diminishing feature reuse, which makes these networks very slow to train. To tackle these problems, in this paper we conduct a detailed experimental study on the architecture of ResNet blocks, based on which we propose a novel architecture where we decrease depth and increase width of residual networks. We call the resulting network structures wide residual networks (WRNs) and show that these are far superior over their commonly used thin and very deep counterparts. For example, we demonstrate that even a simple 16-layer-deep wide residual network outperforms in accuracy and efficiency all previous deep residual networks, including thousand-layer-deep networks, achieving new state-of-the-art results on CIFAR, SVHN, COCO, and significant improvements on ImageNet. Our code and models are available at https://github.com/szagoruyko/wide-residual-networks.
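Wide residual networks are usually named WRN-n-k, with total depth n and widening factor k. For the CIFAR variant described in the WRN paper, each of the three groups has N blocks with n = 6N + 4, and the group widths are 16k, 32k and 64k after a 16-channel stem; this configuration comes from the paper rather than the abstract. A minimal sketch with a hypothetical helper name:

```python
def wrn_config(depth: int, widen_factor: int):
    """Blocks per group N and channel widths for a CIFAR-style WRN-depth-k."""
    if (depth - 4) % 6 != 0:
        raise ValueError("depth must be of the form 6N + 4")
    blocks_per_group = (depth - 4) // 6
    # Stem width, then the widths of the three residual groups.
    widths = [16, 16 * widen_factor, 32 * widen_factor, 64 * widen_factor]
    return blocks_per_group, widths


print(wrn_config(28, 10))  # (4, [16, 160, 320, 640])
print(wrn_config(16, 8))   # (2, [16, 128, 256, 512])
```

Because a 3 × 3 convolution holds 9·C_in·C_out weights, widening every layer by k multiplies those weights by about k², which is the depth-for-width trade the abstract describes.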

Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?

related paper

Abstract
The purpose of this study is to determine whether current video datasets have sufficient data for training very deep convolutional neural networks (CNNs) with spatio-temporal three-dimensional (3D) kernels. Recently, the performance levels of 3D CNNs in the field of action recognition have improved significantly. However, to date, conventional research has only explored relatively shallow 3D architectures. We examine the architectures of various 3D CNNs from relatively shallow to very deep ones on current video datasets. Based on the results of those experiments, the following conclusions could be obtained: (i) ResNet-18 training resulted in significant overfitting for UCF-101, HMDB-51, and ActivityNet but not for Kinetics. (ii) The Kinetics dataset has sufficient data for training of deep 3D CNNs, and enables training of up to 152 ResNets layers, interestingly similar to 2D ResNets on ImageNet. ResNeXt-101 achieved 78.4% average accuracy on the Kinetics test set. (iii) Kinetics pretrained simple 3D architectures outperforms complex 2D architectures, and the pretrained ResNeXt-101 achieved 94.5% and 70.2% on UCF-101 and HMDB-51, respectively.
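A spatiotemporal 3D kernel adds a temporal extent to every filter, so both its weight count and its output shape gain a time dimension. A minimal sketch for computing each; the helper names and the 16-frame 112 × 112 clip in the example are illustrative assumptions.

```python
def conv3d_params(c_in, c_out, k_t, k_h, k_w, bias=False):
    """Weights of a k_t x k_h x k_w 3D convolution mapping c_in -> c_out channels."""
    return k_t * k_h * k_w * c_in * c_out + (c_out if bias else 0)


def conv_out_size(size, k, stride=1, padding=0):
    """Output length along one axis: (size + 2*padding - k) // stride + 1."""
    return (size + 2 * padding - k) // stride + 1


def conv3d_output_shape(shape, kernel=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)):
    """Apply conv_out_size independently to the (time, height, width) axes."""
    return tuple(conv_out_size(s, k, st, p)
                 for s, k, st, p in zip(shape, kernel, stride, padding))


print(conv3d_params(64, 64, 3, 3, 3))                      # 110592: 3x the 2D kernel
print(conv3d_params(64, 64, 1, 3, 3))                      # 36864: same as a 3x3 2D conv
print(conv3d_output_shape((16, 112, 112), stride=(2, 2, 2)))  # (8, 56, 56)
```

The factor of k_t in the weight count is one reason deep 3D networks need large video datasets, which is the question the abstract investigates.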
The use of 2D CNNs trained on ImageNet has produced significant progress in various tasks in image. We believe that using deep 3D CNNs together with Kinetics will retrace the successful history of 2D CNNs and ImageNet, and stimulate advances in computer vision for videos. The codes and pretrained models used in this study are publicly available.

Xception: Deep Learning with Depthwise Separable Convolutions

related paper

Abstract
We present an interpretation of Inception modules in convolutional neural networks as being an intermediate step in-between regular convolution and the depthwise separable convolution operation (a depthwise convolution followed by a pointwise convolution). In this light, a depthwise separable convolution can be understood as an Inception module with a maximally large number of towers. This observation leads us to propose a novel deep convolutional neural network architecture inspired by Inception, where Inception modules have been replaced with depthwise separable convolutions. We show that this architecture, dubbed Xception, slightly outperforms Inception V3 on the ImageNet dataset (which Inception V3 was designed for), and significantly outperforms Inception V3 on a larger image classification dataset comprising 350 million images and 17,000 classes. Since the Xception architecture has the same number of parameters as Inception V3, the performance gains are not due to increased capacity but rather to a more efficient use of model parameters.
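The depthwise separable convolution that Xception builds on, a depthwise convolution followed by a pointwise convolution, can be written out directly. This pure-Python sketch uses unpadded, stride-1 cross-correlation, applies no nonlinearity between the two steps, and assumes an input layout of [channels][height][width]; all names are hypothetical.

```python
def conv2d_valid(channel, kernel):
    """Stride-1, unpadded 2D cross-correlation of one channel with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(channel) - kh + 1, len(channel[0]) - kw + 1
    return [[sum(channel[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def depthwise_separable_conv(x, depthwise_kernels, pointwise_weights):
    """x: [C][H][W]; depthwise_kernels: [C][k][k]; pointwise_weights: [C_out][C]."""
    if len(x) != len(depthwise_kernels):
        raise ValueError("need exactly one depthwise kernel per input channel")
    # Step 1, depthwise: each input channel is filtered by its own spatial kernel.
    depthwise = [conv2d_valid(ch, k) for ch, k in zip(x, depthwise_kernels)]
    h, w = len(depthwise[0]), len(depthwise[0][0])
    # Step 2, pointwise: a 1x1 convolution mixes channels at every spatial position.
    return [[[sum(wts[c] * depthwise[c][i][j] for c in range(len(depthwise)))
              for j in range(w)]
             for i in range(h)]
            for wts in pointwise_weights]


x = [[[1, 1, 1], [1, 1, 1], [1, 1, 1]],
     [[2, 2, 2], [2, 2, 2], [2, 2, 2]]]
dw = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]
pw = [[1, 1], [1, -1]]
print(depthwise_separable_conv(x, dw, pw))  # [[[12, 12], [12, 12]], [[-4, -4], [-4, -4]]]
```

Spatial filtering and channel mixing are fully decoupled here, which is the "maximally large number of towers" view of an Inception module that the abstract describes.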
