Image Super-Resolution Using Convolutional Neural Networks - Review and Summary
Image quality is essential for computer vision tasks. This article provides a summary and review of a paper on image super-resolution.
Introduction
Convolutional Neural Networks (CNNs) are generally used for image classification, object detection, and image segmentation, all of which involve some form of prediction or estimation. In this paper, a CNN is used for Single Image Super-Resolution (SISR), which in turn helps with various other computer vision problems. Before this, up-sampling methods such as nearest-neighbour, bilinear, or bicubic interpolation were used.
Nearest-Neighbour Interpolation — Interpolation by nearest neighbours is the most straightforward approach: each interpolated point takes the value of the nearest input pixel, regardless of any other pixels.
Bilinear Interpolation (BLI) — This technique performs linear interpolation along one axis of the image and then along the other. Because it produces a quadratic interpolation with a 2×2 receptive field, it outperforms nearest-neighbour interpolation while maintaining reasonable speed.
Bicubic Interpolation (BCI) — Bicubic interpolation performs cubic interpolation along both axes. Compared to BLI, BCI considers 4×4 pixels, resulting in smoother output with fewer artifacts, but at a much slower pace.
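The two simpler baselines above can be sketched in a few lines of NumPy. This is a minimal illustration for a single-channel image; the function names and integer-scale assumption are my own, not from the paper:

```python
import numpy as np

def upscale_nearest(img, scale):
    """Nearest-neighbour upscaling: each output pixel copies the closest input pixel."""
    h, w = img.shape
    rows = np.arange(h * scale) // scale   # map each output row to its source row
    cols = np.arange(w * scale) // scale
    return img[rows][:, cols]

def upscale_bilinear(img, scale):
    """Bilinear upscaling: linear interpolation along one axis, then the other (2x2 window)."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, h * scale)  # output coordinates mapped into source space
    xs = np.linspace(0, w - 1, w * scale)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy
```

Note how nearest-neighbour produces blocky copies of input pixels, while bilinear blends the four surrounding values; bicubic extends the same idea to a 4×4 window with cubic weights.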
The SRCNN network consists of three steps implemented with CNN layers:
1. Patch extraction and representation,
2. Non-linear mapping,
3. Reconstruction.
Related work
Among the four categories of SISR methods (prediction models, edge-based methods, image statistical methods, and patch-based or example-based methods), SRCNN builds on patch-based methods. Internal example-based approaches take advantage of the self-similarity property of the input image to generate exemplar patches. SRCNN uses a sparse-coding formulation to map between low- and high-resolution patches. The YCbCr color channels are considered for the images.
Deep Learning for Image Restoration
Most deep learning methods for image restoration are denoising-driven. Autoencoders perform very well at denoising images, but they do not provide an end-to-end mapping from low-resolution to high-resolution images. SRCNN focuses on addressing this problem.
CNN for Super Resolution
Consider a single low-resolution image: we first upscale it to the desired size using bicubic interpolation; this is the only preprocessing performed. Denote the interpolated image by Y. Our goal is to recover from Y an image F(Y) that is as close as possible to the ground-truth high-resolution image X. For ease of presentation we still call Y "low-resolution", even though it has the same size as X. We would like to learn the mapping F, which conceptually consists of three operations:
1. Patch extraction and representation: this operation extracts (overlapping) patches from the low-resolution image Y and represents each patch as a high-dimensional vector. These vectors comprise a set of feature maps, whose number equals the dimensionality of the vectors.
2. Non-linear mapping: each high-dimensional vector is non-linearly mapped onto another high-dimensional vector. Each mapped vector conceptually represents a high-resolution patch. These vectors comprise another set of feature maps.
3. Reconstruction: this operation aggregates the high-resolution patch-wise representations above to produce the final high-resolution image, which should resemble the ground truth X.
Patch extraction and representation
A popular technique in image restoration is to extract patches and represent them by a set of pre-trained bases such as PCA or the DCT (Discrete Cosine Transform). This is equivalent to convolving the image with a set of filters, each of which is a basis. The operation is expressed as F1(Y) = max(0, W1 * Y + B1), where W1 and B1 represent the filters and biases, and * denotes convolution. W1 corresponds to n1 filters of support c × f1 × f1, where c is the number of image channels and f1 is the spatial size of a filter; B1 is an n1-dimensional vector.
Non linear mapping
The non-linear mapping maps each n1-dimensional feature vector onto an n2-dimensional one, reducing dimensionality while attempting to preserve distances between data points: F2(Y) = max(0, W2 * F1(Y) + B2).
Here W2 is of size n1 × f2 × f2 × n2 with f2 = 1 and n2 < n1, and B2 is an n2-dimensional vector.
Reconstruction
Finally, a convolution layer is used once more to produce the final high-resolution image: F(Y) = W3 * F2(Y) + B3.
W3 corresponds to c filters of size n2 × f3 × f3, and B3 is a c-dimensional vector.
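Putting the three operations together, the full SRCNN forward pass is just three convolutions, with a ReLU after the first two. Below is a toy NumPy sketch of that pipeline under my own assumptions: filter counts are far smaller than the paper's (n1 = 64, n2 = 32, f1 = 9, f3 = 5), "same" zero-padding is used to keep the toy example simple, and the helper names are mine:

```python
import numpy as np

def conv2d(x, filters, biases):
    """'Same'-padded convolution: x is (H, W, Cin), filters is (n, f, f, Cin)."""
    n, f, _, _ = filters.shape
    pad = f // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    h, w = x.shape[:2]
    out = np.empty((h, w, n))
    for i in range(h):
        for j in range(w):
            patch = xp[i:i + f, j:j + f, :]  # f x f x Cin window around (i, j)
            out[i, j] = np.tensordot(filters, patch,
                                     axes=([1, 2, 3], [0, 1, 2])) + biases
    return out

def srcnn_forward(y, w1, b1, w2, b2, w3, b3):
    f1 = np.maximum(conv2d(y, w1, b1), 0)    # 1) patch extraction & representation
    f2 = np.maximum(conv2d(f1, w2, b2), 0)   # 2) non-linear mapping (f2 = 1, i.e. 1x1 filters)
    return conv2d(f2, w3, b3)                # 3) reconstruction, no activation
```

The key point the sketch makes concrete is that all three "steps" are ordinary convolutions, so the whole network can be trained end to end.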
Relationship to Sparse-Coding-Based Methods
In the case of sparse coding (SC), the input image is convolved with filters of size f1 and projected onto an n1-dimensional dictionary; in most SC approaches, n1 = n2. The n1-dimensional representation is then mapped to an n2-dimensional one of the same size, without reduction, which is analogous to mapping a low-resolution vector to a high-resolution one. After that, each patch is reconstructed by f3 filters. The convolution averages overlapping patches rather than combining them with varying weights.
Training
The loss function used during training is the mean squared error (MSE) between the reconstructed images and the ground-truth high-resolution images.
For training, the T91 (91-image) dataset as well as images from ImageNet were used. For evaluation, PSNR (Peak Signal-to-Noise Ratio), a popular metric in image restoration, is considered. SRCNN achieved 31.42 dB when trained on the 91-image dataset and 35.2 dB (decibels) when trained on ImageNet; both results compare favourably with previous super-resolution techniques.
We can see that SRCNN clearly performs better, and it also does well on other evaluation metrics for image super-resolution. For training, the 91 images are decomposed into 24,800 sub-images extracted with a stride of 14, with Gaussian blur applied when preparing the low-resolution inputs.
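Both the training loss (MSE) and the evaluation metric (PSNR) are straightforward to compute. A small sketch, assuming 8-bit images with a peak value of 255 (the function names are my own):

```python
import numpy as np

def mse(x, y):
    """Mean squared error between two images of equal shape."""
    return float(np.mean((x - y) ** 2))

def psnr(x, y, max_val=255.0):
    """Peak Signal-to-Noise Ratio in decibels; higher means closer to the reference."""
    m = mse(x, y)
    return float("inf") if m == 0 else 10.0 * np.log10(max_val ** 2 / m)

clean = np.zeros((4, 4))
noisy = clean + 16.0          # every pixel off by 16, so MSE = 256
print(psnr(clean, noisy))     # ≈ 24.05 dB
```

Because PSNR is a monotone function of MSE, minimising the MSE loss during training directly favours a high PSNR at evaluation time.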
Conclusion
Although it may no longer be the state-of-the-art model for image super-resolution (models such as SRGANs have since surpassed it), SRCNN is an apt example of a simple network that produces exemplary results: it contains only three layers and remains well worth knowing for image restoration problems.
Reference :
You can refer to the paper and implementation links below.