
Deep Residual Learning for Image Recognition (ResNet)

This is a PyTorch implementation of the paper Deep Residual Learning for Image Recognition.

ResNets train layers as residual functions to overcome the degradation problem. The degradation problem is the accuracy of deep neural networks degrading when the number of layers becomes very high. The accuracy increases as the number of layers increases, then saturates, and then starts to degrade.

The paper argues that deeper models should perform at least as well as shallower models because the extra layers can just learn to perform an identity mapping.

Residual Learning

If $\mathcal{H}(x)$ is the mapping that needs to be learned by a few layers, they train the residual function

$$\mathcal{F}(x) = \mathcal{H}(x) - x$$

instead. And the original function becomes $\mathcal{H}(x) = \mathcal{F}(x) + x$.

In this case, learning the identity mapping for $\mathcal{H}(x)$ is equivalent to learning $\mathcal{F}(x)$ to be $0$, which is easier to learn.

In the parameterized form this can be written as,

$$y = \mathcal{F}(x, \{W_i\}) + x$$

and when the feature map sizes of $\mathcal{F}(x, \{W_i\})$ and $x$ are different the paper suggests doing a linear projection, with learned weights $W_s$:

$$y = \mathcal{F}(x, \{W_i\}) + W_s x$$

The paper experimented with zero padding instead of linear projections and found linear projections to work better. Also, when the feature map sizes match, they found identity mapping to be better than linear projections.

$\mathcal{F}$ should have more than one layer, otherwise the sum $\mathcal{F}(x, \{W_i\}) + x$ also won't have non-linearities and will be like a linear layer.
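
As a minimal sketch (not part of the original implementation; the layer sizes are arbitrary), the residual formulation maps directly to code: compute a small two-layer $\mathcal{F}(x)$ and add the input back.

import torch
from torch import nn

# A toy two-layer residual function F(x); the sizes here are illustrative only.
f = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16))

x = torch.randn(4, 16)
# y = F(x) + x: if the optimal mapping is the identity, F only needs to learn zero.
y = f(x) + x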

Here is the training code for training a ResNet on CIFAR-10.

from typing import List, Optional

import torch
from torch import nn

from labml_helpers.module import Module

Linear projections for shortcut connection

This does the $W_s x$ projection described above.

class ShortcutProjection(Module):
  • in_channels is the number of channels in $x$
  • out_channels is the number of channels in $\mathcal{F}(x, \{W_i\})$
  • stride is the stride length in the convolution operation for $\mathcal{F}(x, \{W_i\})$. We do the same stride on the shortcut connection, to match the feature-map size.
    def __init__(self, in_channels: int, out_channels: int, stride: int):
        super().__init__()

Convolution layer for linear projection

        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride)

The paper suggests adding batch normalization after each convolution operation

        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x: torch.Tensor):

Convolution and batch normalization

        return self.bn(self.conv(x))
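
For intuition, here is an illustrative shape check of the projection shortcut (the sizes are assumptions, not from the original page): a stride-2 projection lets the shortcut match a residual branch that halves the feature map and changes the channel count.

# Project a [batch, 64, 32, 32] feature map to [batch, 128, 16, 16] with stride 2,
# so it can be added to the output of a stride-2 residual branch.
projection = ShortcutProjection(in_channels=64, out_channels=128, stride=2)
out = projection(torch.randn(4, 64, 32, 32))
assert out.shape == (4, 128, 16, 16)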

Residual Block

This implements the residual block described in the paper. It has two $3 \times 3$ convolution layers.

[Figure: residual block diagram]

The first $3 \times 3$ convolution layer maps from in_channels to out_channels, where out_channels is higher than in_channels when we reduce the feature map size with a stride length greater than $1$.

The second convolution layer maps from out_channels to out_channels and always has a stride length of 1.

Both convolution layers are followed by batch normalization.

class ResidualBlock(Module):
  • in_channels is the number of channels in $x$
  • out_channels is the number of output channels
  • stride is the stride length in the convolution operation.
    def __init__(self, in_channels: int, out_channels: int, stride: int):
        super().__init__()

First $3 \times 3$ convolution layer, this maps to out_channels

        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1)

Batch normalization after the first convolution

        self.bn1 = nn.BatchNorm2d(out_channels)

First activation function (ReLU)

        self.act1 = nn.ReLU()

Second $3 \times 3$ convolution layer

        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)

Batch normalization after the second convolution

        self.bn2 = nn.BatchNorm2d(out_channels)

Shortcut connection should be a projection if the stride length is not $1$ or if the number of channels changes

        if stride != 1 or in_channels != out_channels:

Projection

            self.shortcut = ShortcutProjection(in_channels, out_channels, stride)
        else:

Identity

            self.shortcut = nn.Identity()

Second activation function (ReLU) (after adding the shortcut)

        self.act2 = nn.ReLU()
  • x is the input of shape [batch_size, in_channels, height, width]
    def forward(self, x: torch.Tensor):

Get the shortcut connection

        shortcut = self.shortcut(x)

First convolution and activation

        x = self.act1(self.bn1(self.conv1(x)))

Second convolution

        x = self.bn2(self.conv2(x))

Activation function after adding the shortcut

        return self.act2(x + shortcut)
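
An illustrative use of the block (the shapes are assumptions, not from the original page): when the stride or channel count changes, the block picks the projection shortcut; otherwise it uses the identity.

# stride=2 with a channel change, so self.shortcut is a ShortcutProjection.
block = ResidualBlock(in_channels=64, out_channels=128, stride=2)
y = block(torch.randn(4, 64, 32, 32))
assert y.shape == (4, 128, 16, 16)

# Same channel count and stride=1, so the shortcut is nn.Identity.
block = ResidualBlock(in_channels=64, out_channels=64, stride=1)
y = block(torch.randn(4, 64, 32, 32))
assert y.shape == (4, 64, 32, 32)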

Bottleneck Residual Block

This implements the bottleneck block described in the paper. It has $1 \times 1$, $3 \times 3$, and $1 \times 1$ convolution layers.

[Figure: bottleneck block diagram]

The first convolution layer maps from in_channels to bottleneck_channels with a $1 \times 1$ convolution, where bottleneck_channels is lower than in_channels.

The second $3 \times 3$ convolution layer maps from bottleneck_channels to bottleneck_channels. This can have a stride length greater than $1$ when we want to compress the feature map size.

The third, final $1 \times 1$ convolution layer maps to out_channels. out_channels is higher than in_channels if the stride length is greater than $1$; otherwise, it is equal to in_channels.

bottleneck_channels is less than in_channels and the $3 \times 3$ convolution is performed on this shrunk space (hence the bottleneck). The two $1 \times 1$ convolutions decrease and then increase the number of channels.

class BottleneckResidualBlock(Module):
  • in_channels is the number of channels in $x$
  • bottleneck_channels is the number of channels for the $3 \times 3$ convolution
  • out_channels is the number of output channels
  • stride is the stride length in the $3 \times 3$ convolution operation.
    def __init__(self, in_channels: int, bottleneck_channels: int, out_channels: int, stride: int):
        super().__init__()

First $1 \times 1$ convolution layer, this maps to bottleneck_channels

        self.conv1 = nn.Conv2d(in_channels, bottleneck_channels, kernel_size=1, stride=1)

Batch normalization after the first convolution

        self.bn1 = nn.BatchNorm2d(bottleneck_channels)

First activation function (ReLU)

        self.act1 = nn.ReLU()

Second $3 \times 3$ convolution layer

        self.conv2 = nn.Conv2d(bottleneck_channels, bottleneck_channels, kernel_size=3, stride=stride, padding=1)

Batch normalization after the second convolution

        self.bn2 = nn.BatchNorm2d(bottleneck_channels)

Second activation function (ReLU)

        self.act2 = nn.ReLU()

Third $1 \times 1$ convolution layer, this maps to out_channels.

        self.conv3 = nn.Conv2d(bottleneck_channels, out_channels, kernel_size=1, stride=1)

Batch normalization after the third convolution

        self.bn3 = nn.BatchNorm2d(out_channels)

Shortcut connection should be a projection if the stride length is not $1$ or if the number of channels changes

        if stride != 1 or in_channels != out_channels:

Projection

            self.shortcut = ShortcutProjection(in_channels, out_channels, stride)
        else:

Identity

            self.shortcut = nn.Identity()

Third activation function (ReLU) (after adding the shortcut)

        self.act3 = nn.ReLU()
  • x is the input of shape [batch_size, in_channels, height, width]
    def forward(self, x: torch.Tensor):

Get the shortcut connection

        shortcut = self.shortcut(x)

First convolution and activation

        x = self.act1(self.bn1(self.conv1(x)))

Second convolution and activation

        x = self.act2(self.bn2(self.conv2(x)))

Third convolution

        x = self.bn3(self.conv3(x))

Activation function after adding the shortcut

        return self.act3(x + shortcut)
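
An illustrative use of the bottleneck block (the sizes are assumptions, not from the original page): the channel count is shrunk for the $3 \times 3$ convolution and expanded again at the end.

# Compress 256 channels to a 64-channel bottleneck for the 3x3 convolution,
# then expand to 512 channels while halving the spatial size.
block = BottleneckResidualBlock(in_channels=256, bottleneck_channels=64, out_channels=512, stride=2)
y = block(torch.randn(2, 256, 16, 16))
assert y.shape == (2, 512, 8, 8)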

ResNet Model

This is the base of the ResNet model without the final linear layer and softmax for classification.

The ResNet is made of stacked residual blocks or bottleneck residual blocks. The feature map size is halved after a few blocks with a block of stride length $2$. The number of channels is increased when the feature map size is reduced. Finally, the feature map is average pooled to get a vector representation.

class ResNetBase(Module):
  • n_blocks is a list of the number of blocks for each feature map size.
  • n_channels is the number of channels for each feature map size.
  • bottlenecks is the number of channels in the bottlenecks. If this is None, residual blocks are used.
  • img_channels is the number of channels in the input.
  • first_kernel_size is the kernel size of the initial convolution layer
    def __init__(self, n_blocks: List[int], n_channels: List[int],
                 bottlenecks: Optional[List[int]] = None,
                 img_channels: int = 3, first_kernel_size: int = 7):
        super().__init__()

Number of blocks and number of channels for each feature map size

        assert len(n_blocks) == len(n_channels)

If bottleneck residual blocks are used, the number of channels in bottlenecks should be provided for each feature map size

        assert bottlenecks is None or len(bottlenecks) == len(n_channels)

Initial convolution layer maps from img_channels to the number of channels in the first residual block (n_channels[0])

        self.conv = nn.Conv2d(img_channels, n_channels[0],
                              kernel_size=first_kernel_size, stride=2, padding=first_kernel_size // 2)

Batch norm after initial convolution

        self.bn = nn.BatchNorm2d(n_channels[0])

List of blocks

        blocks = []

Number of channels from previous layer (or block)

        prev_channels = n_channels[0]

Loop through each feature map size

        for i, channels in enumerate(n_channels):

The first block for the new feature map size will have a stride length of $2$, except for the very first block

            stride = 1 if len(blocks) == 0 else 2

            if bottlenecks is None:

Residual blocks that map from prev_channels to channels

                blocks.append(ResidualBlock(prev_channels, channels, stride=stride))
            else:

Bottleneck residual blocks that map from prev_channels to channels

                blocks.append(BottleneckResidualBlock(prev_channels, bottlenecks[i], channels,
                                                      stride=stride))

Change the number of channels

            prev_channels = channels

Add rest of the blocks - no change in feature map size or channels

            for _ in range(n_blocks[i] - 1):
                if bottlenecks is None:
                    blocks.append(ResidualBlock(channels, channels, stride=1))
                else:
                    blocks.append(BottleneckResidualBlock(channels, bottlenecks[i], channels, stride=1))

Stack the blocks

        self.blocks = nn.Sequential(*blocks)
  • x has shape [batch_size, img_channels, height, width]
    def forward(self, x: torch.Tensor):

Initial convolution and batch normalization

        x = self.bn(self.conv(x))

Residual (or bottleneck) blocks

        x = self.blocks(x)

Change x from shape [batch_size, channels, h, w] to [batch_size, channels, h * w]

        x = x.view(x.shape[0], x.shape[1], -1)

Global average pooling

        return x.mean(dim=-1)
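
As an illustrative sketch (the configuration values are assumptions, not the exact settings used by the paper or the linked training code), a small ResNet for 32×32 RGB images can be assembled from ResNetBase with a linear classification head on top of the pooled features:

# Hypothetical configuration: three feature map sizes with two residual blocks each,
# [16, 32, 64] channels, and a 3x3 initial convolution for small images.
base = ResNetBase(n_blocks=[2, 2, 2], n_channels=[16, 32, 64],
                  img_channels=3, first_kernel_size=3)
# ResNetBase excludes the classifier, so add a final linear layer for 10 classes.
classifier = nn.Linear(64, 10)

x = torch.randn(8, 3, 32, 32)
features = base(x)             # [8, 64] vector representation after average pooling
logits = classifier(features)  # [8, 10] class scores
assert logits.shape == (8, 10)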