Author: | Cai, Jianrui |
Title: | Learning deep neural networks for image compression and enhancement |
Advisors: | Zhang, Lei (COMP) |
Degree: | Ph.D. |
Year: | 2020 |
Subject: | Image processing; Hong Kong Polytechnic University -- Dissertations |
Department: | Department of Computing |
Pages: | xix, 147 pages : color illustrations |
Language: | English |
Abstract: | Digital cameras convert CCD/CMOS sensor data into displayable full-color images through a set of cascaded modules, collectively called the image signal processing (ISP) pipeline, and compress the generated images to save storage space. However, the in-camera ISP may not be effective enough to generate photographically pleasing images because of limited in-camera computational resources or poor imaging conditions, while commonly used image compression techniques such as JPEG and WEBP may sacrifice much of the image quality. To improve the perceptual quality of camera output images, in this thesis we aim to develop new image enhancement and compression technologies by learning deep neural network models. Image enhancement aims to improve the perceptual quality of an image. Generally, it can be divided into two parts: image restoration and image color mapping. Image restoration algorithms mainly focus on how to hallucinate high-frequency details, while image color mapping methods aim to correct the low-frequency color tone. As for the image compression problem, we focus on the lossy image compression (LIC) task, which aims to reduce the storage space while maintaining the image quality. As one of the fundamental image restoration topics, image deblurring aims to remove the blur caused by camera shake, object motion, and defocus. In chapter 2, we propose a Dark and Bright Channel Priors embedded Network (DBCPeNet) to plug the channel priors into a neural network for effective dynamic scene deblurring. A novel trainable dark and bright channel priors embedded layer (DBCPeL) is developed to aggregate both channel priors and blurry image representations, and a sparse regularization is introduced to regularize the DBCPeNet model learning. 
Furthermore, we present an effective multi-scale network architecture, namely image full scale exploitation (IFSE), which works in both coarse-to-fine and fine-to-coarse manners to better exploit the information flow across scales. Single image super-resolution is another important problem in image restoration. In chapter 3, we build a real-world super-resolution (RealSR) dataset in which paired low-resolution (LR) and high-resolution (HR) images of the same scene are captured by adjusting the focal length of a digital camera. An image registration algorithm is developed to progressively align the image pairs at different resolutions. With the newly constructed dataset, we can benchmark the real-world single image super-resolution problem. Besides, considering that the degradation kernels are naturally non-uniform in our dataset, we present a Laplacian pyramid based kernel prediction network (LP-KPN), which efficiently learns per-pixel kernels to recover the HR image. As for the image color mapping problem, image contrast enhancement aims to adjust the contrast of an image, especially when the image is captured under bad lighting conditions (e.g., under-/over-exposure). Different from multi-exposure fusion based solutions, single image contrast enhancement (SICE) improves the visibility of a photo given only a single low-contrast image. In chapter 4, we propose, for the first time to the best of our knowledge, to train a CNN-based SICE enhancer. To achieve this goal, we construct a dataset of low-contrast and high-contrast image pairs. The SICE dataset contains 589 elaborately selected high-resolution multi-exposure sequences with 4,413 images. Thirteen representative multi-exposure image fusion and stack-based high dynamic range imaging algorithms are employed to generate the contrast-enhanced images for each sequence, and subjective experiments are conducted to select the best-quality result as the reference image of each scene. 
With the constructed dataset, a CNN-based SICE enhancer is trained to improve the contrast of an under-/over-exposed image, and it demonstrates significantly better performance than previous SICE methods. Finally, in chapter 5, considering that the commonly used LIC methods (e.g., JPEG, JPEG 2000 and WEBP) often introduce visible artifacts (e.g., blurring and ringing), we develop a convolutional neural network (CNN) based lossy image compressor. Specifically, we learn a single CNN to perform LIC at multiple bits-per-pixel (bpp) rates. A simple yet effective Tucker Decomposition Network (TDNet) is developed, in which a Tucker decomposition layer (TDL) is introduced to decompose the latent image representation into a set of projection matrices and a core tensor. By changing the rank of the core tensor and its quantization, we can adjust the bpp rate of the latent image representation within a single CNN. Furthermore, an iterative non-uniform quantization scheme is presented to optimize the quantizer, and a coarse-to-fine training strategy is introduced to reconstruct the decompressed images. In summary, in this thesis we present two novel real-world image enhancement datasets, which provide good platforms for researchers to train and test their deep models, and we develop several deep neural network models for image enhancement and compression, which demonstrate state-of-the-art performance. |
Rights: | All rights reserved |
Access: | open access |
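Note on the channel priors named in the abstract (chapter 2): in their classical, non-learned form, the dark channel of an image is the minimum intensity over the color channels and over a local patch around each pixel, and the bright channel is the corresponding maximum. The sketch below illustrates only that classical definition. It is not the trainable DBCPeL layer from the thesis, and the function names, the list-of-tuples image layout, and the default patch size are assumptions made for illustration.

```python
def channel_prior(image, patch=3, reduce=min):
    """Classical channel prior of an RGB image.

    image  : list of rows, each row a list of (r, g, b) tuples (hypothetical layout)
    patch  : odd side length of the square neighbourhood
    reduce : min for the dark channel, max for the bright channel
    """
    height, width = len(image), len(image[0])
    half = patch // 2
    # Step 1: reduce across the color channels at every pixel.
    per_pixel = [[reduce(px) for px in row] for row in image]
    # Step 2: reduce across the local patch, clipped at the image border.
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                per_pixel[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            row.append(reduce(window))
        result.append(row)
    return result


def dark_channel(image, patch=3):
    return channel_prior(image, patch, min)


def bright_channel(image, patch=3):
    return channel_prior(image, patch, max)
```

With `patch=1` both functions reduce to the per-pixel channel minimum and maximum; larger patches spread the darkest or brightest value over a neighbourhood.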
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/11048
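Note on the Tucker decomposition named in the abstract (chapter 5): a Tucker decomposition writes a tensor as a small core tensor multiplied by one projection matrix per mode, and lowering the core ranks shrinks what has to be stored. The sketch below uses a classical truncated higher-order SVD computed with NumPy to show that structure. It is an assumption-laden illustration, not the learned TDL from the thesis: the function names, the example tensor shape, and the chosen ranks are all hypothetical, and no quantization or entropy coding is modelled.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    out = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(out, 0, mode)


def truncated_hosvd(tensor, ranks):
    """Return (core, factors) with core.shape == tuple(ranks)."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of each unfolding give the projection.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Expand the core back to the original tensor shape."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


# Hypothetical latent representation: 8 channels on a 6 x 6 spatial grid.
rng = np.random.default_rng(0)
latent = rng.standard_normal((8, 6, 6))
core, factors = truncated_hosvd(latent, (4, 3, 3))
approx = reconstruct(core, factors)
```

Here `core` has 4 x 3 x 3 entries instead of 8 x 6 x 6, and `approx` has the original shape. Using the full ranks `(8, 6, 6)` reproduces the input up to floating-point error, while smaller ranks trade reconstruction accuracy for a smaller core.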