#Paper: "Deep Residual Learning for Image Recognition"

Date: 2021-08-03 16:14:06

A paper by Kaiming He et al. at Microsoft.

Problems addressed:

The vanishing/exploding gradient problem;

Previous solutions: This problem, however, has been largely addressed by normalized initialization [23, 9, 37, 13] and intermediate normalization layers [16], which enable networks with tens of layers to start converging for stochastic gradient descent (SGD) with backpropagation [22].

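The degradation (of training accuracy): a deeper network can end up training worse than a shallower one, and this is not caused by overfitting. As a result, the deepest networks before this paper were mostly within 30 layers.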

Approach:

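Residual learning reformulation: have the layers fit a residual instead of the full mapping. This is how the paper addresses the degradation problem (vanishing/exploding gradients being largely handled by the normalization techniques quoted above).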

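Suppose the desired output is H(x). A few stacked layers can instead fit the residual F(x) = H(x) - x, so that the block outputs F(x) + x. In theory the two formulations are equivalent. However, fitting H(x) directly means passing through several nonlinear layers, which may run into gradient (optimization) problems, while fitting the residual largely avoids them.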
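As a rough illustration of the reformulation above, here is a minimal sketch of one residual block on plain vectors, assuming a two-layer residual branch F(x) = W2 · relu(W1 · x) and a ReLU applied after the addition. The helper names (`relu`, `matvec`, `residual_block`) and the tiny weight matrices are made up for this example; real networks use convolutions and batch normalization rather than small dense matrices.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]


def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def residual_block(x, w1, w2):
    """Return relu(F(x) + x), where F(x) = w2 @ relu(w1 @ x).

    The layers only model the residual F(x); the shortcut adds x back,
    so the block as a whole represents H(x) = F(x) + x.
    """
    f = matvec(w2, relu(matvec(w1, x)))               # residual branch F(x)
    return relu([fi + xi for fi, xi in zip(f, x)])    # shortcut, then ReLU


# Hypothetical toy weights, chosen only to make the arithmetic easy to follow.
x = [1.0, 2.0]
w1 = [[1.0, 0.0], [0.0, -1.0]]
w2 = [[0.5, 0.0], [0.0, 0.5]]
y = residual_block(x, w1, w2)
print(y)  # [1.5, 2.0]
```

Here F(x) = [0.5, 0.0], so the output is the input plus a small correction; the layers never have to reproduce x themselves.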

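The argument is that a deeper network should not be worse than a shallower one: shortcuts implement an identity mapping, so the deeper network can perform at least as well as the shallower one. This way, as long as compute allows, arbitrarily many layers can be added.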
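The "no worse than the shallower network" argument can be checked on a toy case. The sketch below assumes the same simplified dense-layer block as a stand-in for a real residual block and sets every weight to zero: the residual blocks then pass the input through unchanged, while the equivalent plain layers wipe it out. All names and values are illustrative only. Because a ReLU follows the addition, the identity holds exactly only for non-negative inputs, which is the case for activations that come out of a previous ReLU.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]


def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def plain_block(x, w1, w2):
    """Two stacked layers without a shortcut: relu(w2 @ relu(w1 @ x))."""
    return relu(matvec(w2, relu(matvec(w1, x))))


def residual_block(x, w1, w2):
    """The same two layers with an identity shortcut: relu(F(x) + x)."""
    f = matvec(w2, relu(matvec(w1, x)))
    return relu([fi + xi for fi, xi in zip(f, x)])


zeros = [[0.0, 0.0], [0.0, 0.0]]
x = [1.0, 2.0]  # non-negative, as if produced by an earlier ReLU

deep_residual = x
deep_plain = x
for _ in range(50):  # stack 50 extra blocks on top of the input
    deep_residual = residual_block(deep_residual, zeros, zeros)
    deep_plain = plain_block(deep_plain, zeros, zeros)

print(deep_residual)  # [1.0, 2.0]  (the extra blocks act as the identity)
print(deep_plain)     # [0.0, 0.0]  (the plain layers lose the input)
```

With shortcuts, "do nothing" corresponds to weights near zero, which is an easy target for the optimizer; without them, the layers would have to learn the identity mapping explicitly.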

Additional points and results:

Networks were trained with up to 152 layers (ImageNet) and over 1000 layers (CIFAR-10).

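The computational cost does not grow much, and neither does the number of parameters: an identity shortcut adds no parameters of its own.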
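A back-of-the-envelope count makes the parameter claim concrete. This sketch assumes a block made of two 3x3 convolutions and counts only convolution weights (biases and batch-norm parameters are ignored). The function names and the `shortcut` option strings are invented for this example; the "projection" case assumes a 1x1 convolution on the shortcut when the channel count changes.

```python
def conv_weights(kernel, c_in, c_out):
    """Number of weights in a kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def block_weights(c_in, c_out, shortcut):
    """Weights in a block of two 3x3 convs plus an optional shortcut.

    shortcut: "none" (plain block), "identity", or "projection" (1x1 conv).
    """
    body = conv_weights(3, c_in, c_out) + conv_weights(3, c_out, c_out)
    if shortcut in ("none", "identity"):
        if shortcut == "identity" and c_in != c_out:
            raise ValueError("identity shortcut needs c_in == c_out")
        return body  # an identity shortcut has no weights
    if shortcut == "projection":
        return body + conv_weights(1, c_in, c_out)
    raise ValueError("unknown shortcut type: " + shortcut)


print(block_weights(64, 64, "none"))         # 73728
print(block_weights(64, 64, "identity"))     # 73728
print(block_weights(64, 128, "projection"))  # 229376
```

The plain and identity-shortcut blocks have exactly the same count; only a projection shortcut adds weights, and in this example the 1x1 convolution contributes 8192 of the 229376 total.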

The experimental results are strong, including ImageNet (3.57% top-5 error, taking first place) and COCO (roughly a 28% relative improvement).

Other notes:

The paper mainly compares against VGG-style plain networks.

Reference, a translated version of the paper: