Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting

Xingjian Shi Zhourong Chen Hao Wang Dit-Yan Yeung
Department of Computer Science and Engineering
Hong Kong University of Science and Technology
{xshiab, zchenbb, hwangaz, dyyeung}@cse.ust.hk

Wai-kin Wong Wang-chun Woo
Hong Kong Observatory
Hong Kong, China
{wkwong,wcwoo}@hko.gov.hk

Abstract

The goal of precipitation nowcasting is to predict the future rainfall intensity in a local region over a relatively short period of time. Very few previous studies have examined this crucial and challenging weather forecasting problem from the machine learning perspective. In this paper, we formulate precipitation nowcasting as a spatiotemporal sequence forecasting problem in which both the input and the prediction target are spatiotemporal sequences. By extending the fully connected LSTM (FC-LSTM) to have convolutional structures in both the input-to-state and state-to-state transitions, we propose the convolutional LSTM (ConvLSTM) and use it to build an end-to-end trainable model for the precipitation nowcasting problem. Experiments show that our ConvLSTM network captures spatiotemporal correlations better and consistently outperforms FC-LSTM and the state-of-the-art operational ROVER algorithm for precipitation nowcasting.

1 Introduction

Nowcasting convective precipitation has long been an important problem in the field of weather forecasting. The goal of this task is to give precise and timely prediction of rainfall intensity in a local region over a relatively short period of time (e.g., 0-6 hours). It is essential for taking such timely actions as generating society-level emergency rainfall alerts, producing weather guidance for airports, and integrating seamlessly with a longer-term numerical weather prediction (NWP) model. Since the forecasting resolution and time accuracy required are much higher than in other traditional forecasting tasks like weekly average temperature prediction, the precipitation nowcasting problem is quite challenging and has emerged as a hot research topic in the meteorology community [22].

Existing methods for precipitation nowcasting can roughly be categorized into two classes [22], namely, NWP based methods and radar echo$^{1}$ extrapolation based methods. For the NWP approach, making predictions at the nowcasting timescale requires a complex and meticulous simulation of the physical equations in the atmosphere model. Thus the current state-of-the-art operational precipitation nowcasting systems [19, 6] often adopt the faster and more accurate extrapolation based methods. Specifically, some computer vision techniques, especially optical flow based methods, have proven useful for making accurate extrapolation of radar maps [10, 6, 20]. One recent advance along this path is the Real-time Optical flow by Variational methods for Echoes of Radar (ROVER) algorithm [25] proposed by the Hong Kong Observatory (HKO) for its Short-range Warning of Intense Rainstorms in Localized System (SWIRLS) [15]. ROVER calculates the optical flow of consecutive radar maps using the algorithm in [5] and performs semi-Lagrangian advection [4] on the flow field, which is assumed to be still, to accomplish the prediction. However, the success of these optical flow based methods is limited because the flow estimation step and the radar echo extrapolation step are separated and it is challenging to determine the model parameters to give good prediction performance.
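To make the extrapolation step concrete, the following is a minimal sketch of semi-Lagrangian advection under a flow field that is held fixed across steps. It is not ROVER itself: the function names, the nearest-neighbour sampling, and the border clipping are assumptions chosen for brevity, and the flow is taken as given rather than estimated by an optical flow algorithm.

```python
import numpy as np

def advect_step(field, flow_u, flow_v):
    """One semi-Lagrangian step (hypothetical helper, not the ROVER code).

    Each output pixel is traced back along the flow to its departure point,
    and the field value found there is copied forward. flow_u and flow_v are
    displacements in pixels per step along columns and rows respectively.
    Nearest-neighbour sampling and edge clipping are simplifying assumptions.
    """
    rows, cols = np.indices(field.shape)
    src_r = np.clip(np.rint(rows - flow_v).astype(int), 0, field.shape[0] - 1)
    src_c = np.clip(np.rint(cols - flow_u).astype(int), 0, field.shape[1] - 1)
    return field[src_r, src_c]

def extrapolate(field, flow_u, flow_v, steps):
    """Apply the same (still) flow field repeatedly to get `steps` future frames."""
    frames = []
    current = field
    for _ in range(steps):
        current = advect_step(current, flow_u, flow_v)
        frames.append(current)
    return frames

# Usage: a single echo pixel drifting one column to the right per step.
radar = np.zeros((5, 5))
radar[2, 2] = 1.0
future = extrapolate(radar, flow_u=1.0, flow_v=0.0, steps=2)
```

Because the flow is estimated separately and then frozen, any error in it is carried through every extrapolated frame, which is the limitation the paragraph above points to.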

These technical issues may be addressed by viewing the problem from the machine learning perspective. In essence, precipitation nowcasting is a spatiotemporal sequence forecasting problem with the sequence of past radar maps as input and the sequence of a fixed number (usually larger than 1) of future radar maps as output$^{2}$. However, such learning problems, regardless of their exact applications, are nontrivial in the first place due to the high dimensionality of the spatiotemporal sequences, especially when multi-step predictions have to be made, unless the spatiotemporal structure of the data is captured well by the prediction model. Moreover, building an effective prediction model for the radar echo data is even more challenging due to the chaotic nature of the atmosphere.

Recent advances in deep learning, especially recurrent neural network (RNN) and long short-term memory (LSTM) models [12, 11, 7, 8, 23, 13, 18, 21, 26], provide some useful insights on how to tackle this problem. According to the philosophy underlying the deep learning approach, if we have a reasonable end-to-end model and sufficient data for training it, we are close to solving the problem. The precipitation nowcasting problem satisfies the data requirement because it is easy to collect a huge amount of radar echo data continuously. What is needed is a suitable model for end-to-end learning. The pioneering LSTM encoder-decoder framework proposed in [23] provides a general framework for sequence-to-sequence learning problems by training temporally concatenated LSTMs, one for the input sequence and another for the output sequence. In [18], it is shown that prediction of the next video frame and interpolation of intermediate frames can be done by building an RNN based language model on the visual words obtained by quantizing the image patches. They propose a recurrent convolutional neural network to model the spatial relationships, but the model only predicts one frame ahead and the size of the convolutional kernel used for state-to-state transition is restricted to 1. Their work is followed up later in [21], which points out the importance of multi-step prediction in learning useful representations. They build an LSTM encoder-decoder-predictor model which reconstructs the input sequence and predicts the future sequence simultaneously. Although their method can also be used to solve our spatiotemporal sequence forecasting problem, the fully connected LSTM (FC-LSTM) layer adopted by their model does not take spatial correlation into consideration.

In this paper, we propose a novel convolutional LSTM (ConvLSTM) network for precipitation nowcasting. We formulate precipitation nowcasting as a spatiotemporal sequence forecasting problem that can be solved under the general sequence-to-sequence learning framework proposed in [23]. In order to model well the spatiotemporal relationships, we extend the idea of FC-LSTM to ConvLSTM which has convolutional structures in both the input-to-state and state-to-state transitions. By stacking multiple ConvLSTM layers and forming an encoding-forecasting structure, we can build an end-to-end trainable model for precipitation nowcasting. For evaluation, we have created a new real-life radar echo dataset which can facilitate further research especially on devising machine learning algorithms for the problem. When evaluated on a synthetic Moving-MNIST dataset [21] and the radar echo dataset, our ConvLSTM model consistently outperforms both the FC-LSTM and the state-of-the-art operational ROVER algorithm.

2 Preliminaries

2.1 Formulation of Precipitation Nowcasting Problem

The goal of precipitation nowcasting is to use the previously observed radar echo sequence to forecast a fixed length of the future radar maps in a local region (e.g., Hong Kong, New York, or Tokyo). In real applications, the radar maps are usually taken from the weather radar every 6-10 minutes and nowcasting is done for the following 1-6 hours, i.e., to predict the 6-60 frames ahead. From the machine learning perspective, this problem can be regarded as a spatiotemporal sequence forecasting problem.

Suppose we observe a dynamical system over a spatial region represented by an $M \times N$ grid which consists of $M$ rows and $N$ columns. Inside each cell in the grid, there are $P$ measurements which vary over time. Thus, the observation at any time can be represented by a tensor $X \in R^{P \times M \times N}$, where $R$ denotes the domain of the observed features. If we record the observations periodically, we will get a sequence of tensors ${\hat{X}}_{1}, {\hat{X}}_{2}, \dots, {\hat{X}}_{t}$. The spatiotemporal sequence forecasting problem is to predict the most likely length-$K$ sequence in the future given the previous $J$ observations which include the current one:

$$
{\tilde{X}}_{t + 1}, \dots, {\tilde{X}}_{t + K} = \underset{X_{t + 1}, \dots, X_{t + K}}{\arg\max} \; p (X_{t + 1}, \dots, X_{t + K} \mid {\hat{X}}_{t - J + 1}, {\hat{X}}_{t - J + 2}, \dots, {\hat{X}}_{t}) \tag{1}
$$
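As an illustration of the formulation above, one plausible way to obtain training examples is to slide a window over a recorded sequence and split each window into the $J$ observed tensors and the $K$ tensors to be predicted. The sketch below assumes the sequence is stored as a NumPy array of shape `(T, P, M, N)`; the function name `make_windows` and the stride of one frame are assumptions, not details taken from the text.

```python
import numpy as np

def make_windows(seq, J, K):
    """Split a (T, P, M, N) sequence into (inputs, targets) pairs.

    inputs[s]  holds frames s .. s+J-1      (the J observations, current one last)
    targets[s] holds frames s+J .. s+J+K-1  (the K future frames to predict)
    """
    T = seq.shape[0]
    if T < J + K:
        raise ValueError("sequence is shorter than J + K frames")
    starts = range(T - J - K + 1)
    inputs = np.stack([seq[s:s + J] for s in starts])
    targets = np.stack([seq[s + J:s + J + K] for s in starts])
    return inputs, targets

# Usage: 10 recorded frames with P=1 measurement on a 2x2 grid.
recording = np.arange(10 * 1 * 2 * 2, dtype=float).reshape(10, 1, 2, 2)
past, future = make_windows(recording, J=5, K=3)
```

Each `(past[s], future[s])` pair is one instance of the conditioning sequence and the prediction target in the formulation.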

For precipitation nowcasting, the observation at every timestamp is a 2D radar echo map. If we divide the map into tiled non-overlapping patches and view the pixels inside a patch as its measurements (see Fig. 1), the nowcasting problem naturally becomes a spatiotemporal sequence forecasting problem.
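The patch construction can be sketched as follows, assuming a single-channel radar map stored as a 2D NumPy array whose height and width are multiples of the patch size. The function name `frame_to_patches` and the row-major ordering of pixels inside a patch are illustrative assumptions.

```python
import numpy as np

def frame_to_patches(frame, patch):
    """Turn an (H, W) map into a (P, M, N) tensor with P = patch * patch.

    The map is cut into non-overlapping patch x patch tiles; the M x N grid
    indexes the tiles and the P channels hold the pixels of each tile.
    """
    H, W = frame.shape
    if H % patch or W % patch:
        raise ValueError("frame size must be a multiple of the patch size")
    M, N = H // patch, W // patch
    tiles = frame.reshape(M, patch, N, patch)   # [m, a, n, b]
    tiles = tiles.transpose(1, 3, 0, 2)         # [a, b, m, n]
    return tiles.reshape(patch * patch, M, N)

# Usage: a 4x4 map with 2x2 patches gives P=4 measurements on a 2x2 grid.
radar_map = np.arange(16).reshape(4, 4)
tensor = frame_to_patches(radar_map, patch=2)
```

Here `tensor[:, m, n]` holds the pixels of the patch in row `m`, column `n` of the grid, so a larger patch size trades spatial grid resolution for more measurements per cell.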

We note that our spatiotemporal sequence forecasting problem is different from the one-step time series forecasting problem because the prediction target of our problem is a sequence which contains both spatial and temporal structures. Although the number of free variables in a length-$K$ sequence can be up to $O (M^{K} N^{K} P^{K})$, in practice we may exploit the structure of the space of possible predictions to reduce the dimensionality and hence make the problem tractable.

2.2 Long Short-Term Memory for Sequence Modeling

For general-purpose sequence modeling, LSTM as a special RNN structure has proven stable and powerful for modeling long-range dependencies in various previous studies [12, 11, 17, 23]. The major innovation of LSTM is its memory cell $c_{t}$ which essentially acts as an accumulator of the state information. The cell is accessed, written and cleared by several self-parameterized controlling gates. Every time a new input comes, its information will be accumulated to the cell if the input gate $i_{t}$ is activated. Also, the past cell status $c_{t - 1}$ could be "forgotten" in this process if the forget gate $f_{t}$ is on. Whether the latest cell output $c_{t}$ will be propagated to the final state $h_{t}$ is further controlled by the output gate $o_{t}$. One advantage of using the memory cell and gates to control information flow is that the gradient will be trapped in the cell (also known as constant error carousels [12]) and be prevented from vanishing too quickly, which is a critical problem for the vanilla RNN model [12, 17, 2]. FC-LSTM may be seen as a multivariate version of LSTM where the input, cell output and states are all 1D vectors. In this paper, we follow the formulation of FC-LSTM as in [11]. The key equations are shown in (2) below, where '$\circ$' denotes the Hadamard product:

$$
\begin{aligned}
i_{t} & = \sigma (W_{x i} x_{t} + W_{h i} h_{t - 1} + W_{c i} \circ c_{t - 1} + b_{i}) \\
f_{t} & = \sigma (W_{x f} x_{t} + W_{h f} h_{t - 1} + W_{c f} \circ c_{t - 1} + b_{f}) \\
c_{t} & = f_{t} \circ c_{t - 1} + i_{t} \circ \tanh (W_{x c} x_{t} + W_{h c} h_{t - 1} + b_{c}) \\
o_{t} & = \sigma (W_{x o} x_{t} + W_{h o} h_{t - 1} + W_{c o} \circ c_{t} + b_{o}) \\
h_{t} & = o_{t} \circ \tanh (c_{t})
\end{aligned} \tag{2}
$$
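The five equations above translate almost line for line into code. The following NumPy sketch performs one FC-LSTM step; the parameter dictionary, the helper names, and the small random initialization are assumptions made for illustration. Since the cell terms enter through a Hadamard product, the peephole weights $W_{c i}$, $W_{c f}$, $W_{c o}$ are stored here as vectors.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hid, rng):
    """Hypothetical initializer: small Gaussian weights, zero biases."""
    p = {}
    for g in "ifco":
        p[f"W_x{g}"] = rng.normal(0.0, 0.1, (n_hid, n_in))   # input-to-state
        p[f"W_h{g}"] = rng.normal(0.0, 0.1, (n_hid, n_hid))  # state-to-state
        p[f"b_{g}"] = np.zeros(n_hid)
    for g in "ifo":
        p[f"W_c{g}"] = rng.normal(0.0, 0.1, n_hid)           # peephole vectors
    return p

def fc_lstm_step(x_t, h_prev, c_prev, p):
    """One step of the FC-LSTM equations; `*` is the Hadamard product."""
    i = sigmoid(p["W_xi"] @ x_t + p["W_hi"] @ h_prev + p["W_ci"] * c_prev + p["b_i"])
    f = sigmoid(p["W_xf"] @ x_t + p["W_hf"] @ h_prev + p["W_cf"] * c_prev + p["b_f"])
    c = f * c_prev + i * np.tanh(p["W_xc"] @ x_t + p["W_hc"] @ h_prev + p["b_c"])
    o = sigmoid(p["W_xo"] @ x_t + p["W_ho"] @ h_prev + p["W_co"] * c + p["b_o"])
    h = o * np.tanh(c)
    return h, c

# Usage: run a length-5 sequence of 3-dimensional inputs through a 4-unit cell.
rng = np.random.default_rng(0)
params = init_params(n_in=3, n_hid=4, rng=rng)
h, c = np.zeros(4), np.zeros(4)
for x_t in rng.normal(size=(5, 3)):
    h, c = fc_lstm_step(x_t, h, c, params)
```

Note that the output gate uses the freshly updated cell `c`, whereas the input and forget gates use `c_prev`, matching the indices in the equations.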

Multiple LSTMs can be stacked and temporally concatenated to form more complex structures. Such models have been applied to solve many real-life sequence modeling problems [23, 26].

3 The Model

We now present our ConvLSTM network. Although the FC-LSTM layer has proven powerful for handling temporal correlation, it contains too much redundancy for spatial data. To address this problem, we propose an extension of FC-LSTM which has convolutional structures in both the input-to-state and state-to-state transitions. By stacking multiple ConvLSTM layers and forming an encoding-forecasting structure, we are able to build a network model not only for the precipitation nowcasting problem but also for more general spatiotemporal sequence forecasting problems.

$^{1}$ In real-life systems, radar echo maps are often constant altitude plan position indicator (CAPPI) images [9].
$^{2}$ It is worth noting that our precipitation nowcasting problem is different from the one studied in [14], which aims at predicting only the central region of just the next frame.
