CROSS-INPUT SUPER RESOLUTION
APPLIED TO SEA SURFACE HEIGHT DOWNSCALING
We explore a method, based on the Laplacian Pyramid Network, to downscale Sea Surface Height (SSH) using features from Sea Surface Temperature (SST), which is available at a higher resolution.
INTRODUCTION
Ocean circulation is a key player in global climate regulation. In particular, Sea Surface Height (SSH) provides critical information on ocean currents and is correlated with Sea Surface Temperature (SST). However, SSH has to be estimated by satellite altimetry [1], whereas SST can be measured with infrared or microwave sensors at a higher resolution [2]. Improving the resolution of SSH data would therefore provide more refined and accurate information. Downscaling, also called super-resolution (SR), makes this possible by reconstructing high-resolution (HR) data from low-resolution (LR) observations.
Here, we adapt the cross-input network proposed in [3], which combines LR infrared (IR) images with HR RGB images, to our own SSH downscaling task, with SSH in the role of the LR input and SST in the role of the HR input.
PREVIOUS WORK
SRCNN
SRCNN, published in 2014 [4], was the first convincing deep learning method for image super-resolution. It is a 3-layer CNN that extracts convolutional features to reconstruct the image, each layer outputting fewer channels than the previous one.
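To give an idea of the network's size, the short sketch below tabulates the base 9-1-5 configuration reported for SRCNN (kernel sizes 9, 1 and 5, with 64 and 32 intermediate channels and single-channel input and output) and counts its weights. The helper names are hypothetical, and the cost estimate assumes stride-1 convolutions that preserve the image size. Because SRCNN runs on the pre-upsampled image, its cost scales with the number of HR pixels.

```python
# Base SRCNN configuration (9-1-5) as (kernel_size, in_channels, out_channels).
SRCNN_LAYERS = [
    (9, 1, 64),   # patch extraction and representation
    (1, 64, 32),  # non-linear mapping
    (5, 32, 1),   # reconstruction
]


def conv_weights(layers):
    """Number of convolution weights, biases excluded."""
    return sum(k * k * c_in * c_out for k, c_in, c_out in layers)


def macs_per_image(layers, height, width):
    """Multiply-accumulates for stride-1, size-preserving convolutions."""
    return conv_weights(layers) * height * width


print(conv_weights(SRCNN_LAYERS))  # 8032

# Pre-upsampling x4: a 64x64 LR field is processed at 256x256.
ratio = macs_per_image(SRCNN_LAYERS, 256, 256) // macs_per_image(SRCNN_LAYERS, 64, 64)
print(ratio)  # 16
```

With a ×4 factor, the same small network does 16 times more work than it would if it operated at the LR size.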
Although it was faster than the solutions that preceded deep learning methods, it has a drawback: it pre-upsamples the LR input to the desired HR size before any learning takes place. This straightforward approach reduces the learning difficulty, but pre-upsampling methods such as bicubic interpolation tend to amplify noise and blur edges [5].
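The edge-blurring effect of pre-upsampling can be seen on a toy 1D example. The sketch below upsamples a step by a factor of 2 with cubic convolution (the Keys kernel with a = -0.5, the usual basis of bicubic interpolation), repeating edge values at the borders. The function names are our own, and only plain Python is assumed.

```python
def keys_cubic(x, a=-0.5):
    """Keys cubic convolution kernel, the usual basis of bicubic interpolation."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0


def cubic_upsample_1d(signal, factor):
    """Upsample a 1D signal by an integer factor with cubic convolution.

    Output sample j sits at input coordinate j / factor; borders are
    handled by repeating the edge values.
    """
    n = len(signal)
    out = []
    for j in range(n * factor):
        t = j / factor
        base = int(t)  # floor, since t >= 0
        value = 0.0
        for i in range(base - 1, base + 3):
            clamped = min(max(i, 0), n - 1)
            value += signal[clamped] * keys_cubic(t - i)
        out.append(value)
    return out


step = [0, 0, 0, 1, 1, 1]
print(cubic_upsample_1d(step, 2))
# [0.0, 0.0, 0.0, -0.0625, 0.0, 0.5, 1.0, 1.0625, 1.0, 1.0, 1.0, 1.0]
```

The sharp step becomes a ramp through 0.5, flanked by an undershoot (-0.0625) and an overshoot (1.0625). This smoothed, ringing version of the edge is what a pre-upsampling network receives as input.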
LAPLACIAN PYRAMID NETWORK
An interesting alternative is the Laplacian Pyramid Network (LapSRN) [6]. It uses progressive upsampling to avoid these drawbacks, while speeding up computations by reducing the number of learnable parameters. Instead of pre-upsampling the input to the output's dimensions, the network extracts convolutional features at the input dimensions, upsamples the result (say, by a factor of 2), extracts features again, and so on until the desired dimensions are reached.
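The progressive scheme can be sketched in a few lines of plain Python. This only illustrates the control flow and is not LapSRN itself: nearest-neighbor doubling stands in for the learned transposed convolution, and `extract` and `predict_residual` are hypothetical placeholders for each level's convolutional layers. The image branch adds each level's predicted residual on top of the upsampled estimate of the previous level.

```python
import math


def upsample2(grid):
    """Double height and width of a 2D grid by repeating values.

    Stand-in (an assumption of this sketch) for LapSRN's learned
    transposed convolution.
    """
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add(a, b):
    """Element-wise sum of two 2D grids of the same size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def laplacian_pyramid_sr(lr, scale, extract, predict_residual):
    """Progressive x2 upsampling in the spirit of LapSRN.

    extract(features, level) and predict_residual(features, level) are
    hypothetical placeholders for the convolutional layers of each level.
    Returns the prediction of every level, coarsest first.
    """
    levels = round(math.log2(scale))
    if levels < 1 or 2 ** levels != scale:
        raise ValueError("scale must be a power of 2, at least 2")
    features, image, outputs = lr, lr, []
    for level in range(levels):
        # Feature branch: extract at the current size, then move up one level.
        features = upsample2(extract(features, level))
        # Image branch: upsample the previous estimate and add this level's residual.
        image = add(upsample2(image), predict_residual(features, level))
        outputs.append(image)
    return outputs


def identity(features, level):
    return features


def zero_residual(features, level):
    return [[0.0] * len(features[0]) for _ in features]


outs = laplacian_pyramid_sr([[1.0, 2.0], [3.0, 4.0]], 4, identity, zero_residual)
print([(len(o), len(o[0])) for o in outs])  # [(4, 4), (8, 8)]
```

With an identity feature extractor and zero residuals, the pyramid reduces to repeated nearest-neighbor upsampling, one ×2 level at a time; a trained network would instead predict non-zero residuals at each level.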
Since each level's output is added on top of the previous level's result at a higher resolution, the network can be pictured as a pyramid, hence the name. Upsampling is done with transposed convolution, also known as deconvolution, which may produce checkerboard artifacts [5] because neighboring kernel patches overlap unevenly.
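One common way to see where these artifacts come from is to apply a naive 1D transposed convolution to a constant signal with a constant kernel, so that any variation in the output can only come from how the kernel copies overlap. The function below is a minimal sketch (no padding, no learned weights); its name is our own.

```python
def transposed_conv1d(x, kernel, stride):
    """Naive 1D transposed convolution without padding.

    Each input sample adds a scaled copy of the kernel to the output,
    shifted by `stride` positions from the previous copy.
    """
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, xi in enumerate(x):
        for j, kj in enumerate(kernel):
            out[i * stride + j] += xi * kj
    return out


ones = [1.0] * 4
# Kernel size 3, stride 2: uneven overlap.
print(transposed_conv1d(ones, [1.0] * 3, stride=2))
# [1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0]
# Kernel size 4, stride 2: even overlap away from the borders.
print(transposed_conv1d(ones, [1.0] * 4, stride=2))
# [1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0]
```

With a kernel size of 3 and a stride of 2, output samples alternately receive one or two contributions; in 2D, this alternation becomes a checkerboard. A kernel size divisible by the stride evens out the interior overlap, although learned weights can still produce artifacts in that case.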
RGB-IR CROSS-INPUT NETWORK
The RGB-IR cross-input and sub-pixel upsampling network [3] is based on LapSRN and keeps most of its core features. The key idea is to feed HR RGB images into the feature extraction branch of the LapSRN architecture, while keeping the LR IR image for the image reconstruction branch. With such a model, repeated standard interpolation-based upsampling cannot be used, since it causes a loss of details [5], and, unlike in LapSRN, images of different nature cannot be directly concatenated. In addition, classical bilinear or bicubic interpolation is not learnable, so it is usually confined to a fixed pre- or post-upsampling step rather than used as a learned upsampling layer between convolutional layers, which is exactly what we are aiming for. The network instead uses sub-pixel convolution, an end-to-end learnable upsampling method that provides more contextual information, at the cost of, once again, generating checkerboard artifacts.
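Sub-pixel convolution is a regular convolution at LR that outputs C·r² channels, followed by a periodic shuffle that interleaves them into C channels upscaled by a factor r. The sketch below implements only the shuffle step in plain Python, following the channel ordering of PyTorch's `nn.PixelShuffle`; the function name is our own.

```python
def pixel_shuffle(channels, r):
    """Rearrange C*r*r feature maps of size H x W into C maps of size rH x rW.

    Uses the channel ordering of PyTorch's nn.PixelShuffle:
    out[c][h*r + i][w*r + j] = in[c*r*r + i*r + j][h][w].
    """
    n = len(channels)
    if n % (r * r):
        raise ValueError("number of channels must be divisible by r*r")
    height, width = len(channels[0]), len(channels[0][0])
    out = []
    for c in range(n // (r * r)):
        grid = [[0.0] * (width * r) for _ in range(height * r)]
        for i in range(r):
            for j in range(r):
                src = channels[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        grid[h * r + i][w * r + j] = src[h][w]
        out.append(grid)
    return out


maps = [[[float(k)] * 2 for _ in range(2)] for k in range(4)]  # four constant 2x2 maps
for row in pixel_shuffle(maps, 2)[0]:
    print(row)
# [0.0, 1.0, 0.0, 1.0]
# [2.0, 3.0, 2.0, 3.0]
# [0.0, 1.0, 0.0, 1.0]
# [2.0, 3.0, 2.0, 3.0]
```

Each pixel of an r × r output block comes from a different channel, so whenever those channels differ systematically, the output shows a period-r pattern: this is where the checkerboard artifacts of sub-pixel convolution come from.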
METHOD
We stick to the core architecture of LapSRN while taking into account the cross-input particularity of the RGB-IR network, since Du et al. do not detail its architecture, nor do they provide their code at the time of writing. We thus end up with the following architecture:
RESULTS
We compare the performance of our method against SRCNN and against standard bicubic interpolation, our baseline, on two SR tasks: one at a very low resolution and one at a low resolution. Here are some visual outputs:
At lower resolutions, our method outperforms SRCNN and is more likely to render the essential structures of the HR images. At higher resolutions, however, it does not even compete with bicubic interpolation. This may have two explanations. The first is that SST and SSH fields have rather smooth and less detailed shapes than the data SR is usually applied to. The oversmoothing caused by interpolation is therefore less of a defect than usual, which makes the baseline hard to beat. The second explanation becomes apparent when looking at the error maps.
The top-right corner clearly shows the typical checkerboard artifacts, which interpolation does not suffer from. Fortunately, Aitken et al. proposed a layer initialization method that avoids such artifacts [7]. Implementing this initializer would undoubtedly be the first way to improve the network.
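The initializer from [7], known as ICNR (initialized to convolution nearest-neighbor resize), makes a sub-pixel convolution equivalent, at initialization, to a nearest-neighbor resize followed by a convolution. Below is a minimal plain-Python sketch, assuming the channel ordering of PyTorch's `nn.PixelShuffle` (convolution output c·r² + s feeds sub-pixel s of final channel c): one base kernel is drawn per final output channel and copied to its r² sub-pixel channels. The function name and the Gaussian base initializer with standard deviation 0.1 are arbitrary choices of ours; any standard initializer can be used for the base kernels.

```python
import random


def icnr(out_channels, in_channels, k, r, init=None):
    """ICNR-style weights for a sub-pixel convolution (sketch after Aitken et al.).

    Returns out_channels * r * r kernels indexed [conv_out][in][ky][kx].
    Kernel c * r * r + s is a copy of base kernel c, so all r * r sub-pixels
    of final channel c share the same kernel at initialization (this assumes
    the channel ordering of PyTorch's nn.PixelShuffle).
    """
    if init is None:
        # Arbitrary base initializer for this sketch; any standard one works.
        def init():
            return random.gauss(0.0, 0.1)
    weights = []
    for _ in range(out_channels):
        base = [[[init() for _ in range(k)] for _ in range(k)]
                for _ in range(in_channels)]
        for _ in range(r * r):
            # Copy each plane so that the sub-kernels can diverge during training.
            weights.append([[row[:] for row in plane] for plane in base])
    return weights


random.seed(0)
w = icnr(out_channels=1, in_channels=8, k=3, r=2)
print(len(w))                               # 4
print(all(w[s] == w[0] for s in range(4)))  # True
```

Since the r² channels that fill each r × r output block start out identical, every block is constant after the shuffle at initialization, so no checkerboard pattern is present at the start of training.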
For further details about this work, including the state of the art, implementation details and an analysis of the results, have a look at the full PDF available here.
CODE AVAILABILITY
All code used for this project is available on GitHub.
REFERENCES
[1] L. Fu and A. Cazenave, “Satellite Altimetry and Earth Sciences: A Handbook of Techniques and Applications”, 2001.
[2] K. Pearson, S. Good, C. J. Merchant, C. Prigent, O. Embury, and C. Donlon, “Sea surface temperature in global analyses: Gains from the Copernicus imaging microwave radiometer”, 2019.
[3] J. Du, H. Zhou, K. Qian, W. Tan, Z. Zhang, L. Gu, and Y. Yu, “RGB-IR Cross Input and Sub-Pixel Upsampling Network for Infrared Image Super-Resolution”, 2019.
[4] C. Dong, C. C. Loy, K. He, and X. Tang, “Image Super-Resolution Using Deep Convolutional Networks”, 2014.
[5] Z. Wang, J. Chen, and S. C. H. Hoi, “Deep Learning for Image Super-Resolution: A Survey”, 2021.
[6] W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang, “Fast and Accurate Image Super-Resolution with Deep Laplacian Pyramid Networks”, 2019.
[7] A. P. Aitken, C. Ledig, L. Theis, J. Caballero, Z. Wang, and W. Shi, “Checkerboard artifact free sub-pixel convolution: A note on sub-pixel convolution, resize convolution and convolution resize”, 2017.