Author: Li, Mu
Title: Learning deep neural networks for image compression
Advisors: Zhang, David (COMP) ; You, Jane (COMP)
Degree: Ph.D.
Year: 2020
Subject: Image compression -- Data processing
Image processing -- Digital techniques
Neural networks (Computer science)
Hong Kong Polytechnic University -- Dissertations
Department: Department of Computing
Pages: xvii, 125 pages : color illustrations
Language: English
Abstract: Data compression is a fundamental problem in computer science and has been studied for decades. The tension between the ever-growing amount of data and limited storage space arose soon after the invention of the computer and has never been fully resolved. With the popularization of the Internet, limited bandwidth versus the increasing volume of data to be transmitted became another significant conflict. Given limited storage space and bandwidth, methods that compress data into a smaller size are valuable to the whole computing community. Social media has recently become a prominent part of daily life, and media such as images and videos are now a major data type. Meanwhile, deep learning methods have shown unprecedented success and powerful fitting ability on many computer vision and natural language processing problems. In this thesis, we develop better image compression methods with powerful deep learning toolkits. Image compression methods are divided according to the decoded images. One branch is lossless image compression, which requires the decompressed image to be identical to the original image. The other branch is lossy image compression, which allows the decompressed image to differ from the original. By sacrificing a small amount of image quality, lossy image compression methods can compress an image to a much smaller size; in practice, most popular image compression standards are lossy. Lossy image compression is usually modeled as a rate-distortion optimization problem. Two issues must be addressed when building a deep image compression method. The first is quantization. Quantization functions are generally step functions whose gradient is zero almost everywhere, except at a few points where it is infinite, so the network layers before the quantization operation cannot be optimized with the back-propagation algorithm. The other issue is how to model the discrete entropy of the codes.
In the first work, we analyze the informative content of an image and build a content-weighted lossy image compression framework with deep networks. Consider an image of an eagle flying in a blue sky: the informative part, i.e., the eagle, is more important than the sky. Thus, when the bits used to code the image are limited, it is reasonable to allocate more bits to the informative, important parts and fewer bits to the unimportant parts, instead of allocating the same number of bits to all parts evenly. We introduce a side information network that summarizes the informative importance of different parts of the image as an importance map, and allocate different numbers of bits to different parts accordingly. The sum of the importance map is adopted as an upper bound on the discrete entropy of the codes. For quantization, a binarization function is adopted, and a continuous proxy of the quantization function is used in back-propagation to tackle the gradient problem. The whole framework consists of an analysis transform, a synthesis transform, and a side importance-map network. The analysis transform takes the image as input and generates code representations that are further quantized into discrete codes, and the codes are decoded by the synthesis transform to produce the decoded image.
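To make the quantization and bit-allocation ideas above concrete, the following is a minimal PyTorch-style sketch of a binarizer whose backward pass uses a continuous proxy (here the identity, a common straight-through choice) and of an importance map converted into a per-location channel mask; all names, shapes, the number of importance levels, and the proxy choice are illustrative assumptions rather than the thesis's actual implementation.

    import torch


    class Binarizer(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            # Hard binarization: a step function with zero gradient almost everywhere.
            return (x > 0.5).float()

        @staticmethod
        def backward(ctx, grad_output):
            # Back-propagate through a continuous proxy (identity) instead.
            return grad_output


    def importance_mask(imp_map, n_channels, levels):
        # Turn an importance map in [0, 1] into a per-location channel mask:
        # a location with importance p keeps roughly p * n_channels code channels,
        # quantized to `levels` discrete importance levels.
        q = torch.clamp((imp_map * levels).floor(), 0, levels - 1)   # (N,1,H,W)
        keep = (q + 1) * (n_channels // levels)                      # channels kept per location
        ch = torch.arange(n_channels, device=imp_map.device).view(1, -1, 1, 1)
        return (ch < keep).float()                                   # (N,C,H,W)


    # Codes from the analysis transform are binarized, then masked so that
    # informative regions (high importance) retain more bits; the sum of the
    # mask upper-bounds the number of coding bits, mirroring the rate bound above.
    codes = torch.rand(1, 64, 16, 16, requires_grad=True)
    imp = torch.rand(1, 1, 16, 16)
    mask = importance_mask(imp, n_channels=64, levels=16)
    bits = Binarizer.apply(codes) * mask
    rate_bound = mask.sum()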
The model is end-to-end optimized on a subset of the ImageNet dataset and tested on the Kodak dataset, where it outperforms image compression standards such as JPEG and JPEG2000 in terms of the SSIM index and produces better visual results. The first work is still inferior to state-of-the-art image compression standards such as BPG. We analyze the shortcomings of the first work and improve it as follows. First, a DenseBlock is introduced to build the encoder and decoder. Second, a channel-wise learnable quantization function is introduced by minimizing the quantization error between the proxy function and the quantization function. With a smaller quantization error, the gradient produced by the proxy function is more accurate; in particular, when the quantization error is zero, the quantization function coincides with the proxy function and the gradient estimated by the proxy is exact. Finally, we introduce a 3D mask convolutional network for the subsequent entropy coding. In the first work, a small 3D block around a target code is extracted as the context to predict the discrete probability table of the code. Compared with that, the mask CNN can employ a larger context and is computationally more efficient thanks to shared computation. With the above improvements, our method outperforms state-of-the-art image compression methods, especially at low bit rates, and achieves visually much better results. We further apply the framework to task-driven image compression with a task-driven distortion loss.
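As a hedged sketch of the 3D mask convolution idea, the code below zeroes kernel entries at and beyond the current position in raster order, so the predicted probability of each code depends only on codes that precede it in coding order (a PixelCNN-style mask); the class name, mask layout, and network sizes are illustrative assumptions, not the thesis's exact design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class MaskedConv3d(nn.Conv3d):
        def __init__(self, *args, include_center=False, **kwargs):
            super().__init__(*args, **kwargs)
            kD, kH, kW = self.kernel_size
            mask = torch.zeros(kD * kH * kW)
            # Keep kernel positions strictly before the central element
            # (or up to and including it when include_center=True).
            center = (kD * kH * kW) // 2
            mask[:center + int(include_center)] = 1.0
            self.register_buffer("mask", mask.view(1, 1, kD, kH, kW))

        def forward(self, x):
            # Apply the causal mask to the weights at every convolution.
            return F.conv3d(x, self.weight * self.mask, self.bias,
                            self.stride, self.padding, self.dilation, self.groups)


    # Predict, for every binary code, the probability that it equals 1,
    # using only its causal context within the 3D code block.
    ctx_model = nn.Sequential(
        MaskedConv3d(1, 32, kernel_size=5, padding=2, include_center=False),
        nn.ReLU(),
        MaskedConv3d(32, 1, kernel_size=5, padding=2, include_center=True),
        nn.Sigmoid(),
    )
    codes = torch.randint(0, 2, (1, 1, 8, 16, 16)).float()
    p = ctx_model(codes)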
In the third work, we focus on entropy modeling, generalize the mask CNN proposed in the second work, and introduce a general context-based convolutional network (CCN) for efficient and effective context-based entropy modeling. The CCN is more general and can be applied to any context and coding order with the required properties. The previous mask CNN can predict the probabilities of all the codes in parallel during encoding but has to process the codes serially during decoding due to the limitation of the context. For better decoding efficiency, we propose a 3D zigzag scanning order for the 3D code block generated by the analysis transform, together with a code-dividing technique that partitions the codes into groups. By removing the dependency among codes within the same group, the introduced context allows CCNs to decode in parallel, and the proposed context-based CCN speeds up decoding considerably without a clear drop in effectiveness. We test the CCN on both lossless and lossy image compression. For lossless image compression, we directly apply a CCN to binarized grayscale image planes to predict the Bernoulli distribution of each code. For lossy image compression, without further hypotheses on the probabilistic distribution of the codes, we adopt a mixture of Gaussians (MoG) to model the distribution of the codes, whose parameters are estimated with CCNs. The discrete entropy built on the MoGs is further used as the rate loss to guide the end-to-end optimization of the transforms and the CCN-based entropy model. For lossless image compression, the proposed CCN-based entropy model outperforms all current lossless image compression standards; for lossy image compression, the proposed method achieves state-of-the-art performance in the low bit-rate region.
Traditional convolutional networks can only use the local information within their receptive fields, and information outside the receptive field is ignored. Due to this structural limitation, the CCN can only exploit a local context for entropy modeling, and the global context and non-local similarity are discarded. In the fourth work, we exploit the non-local similarity among codes within the context as a prior for context-based entropy modeling. The CCNs proposed in the third work are adopted to handle the local context, and a non-local attention block is introduced to combine the local representation produced by the CCNs with the non-local estimation generated by content-related weights from the global context. In addition, a UnetBlock is introduced for the synthesis and analysis transforms. The width of the network, i.e., the minimum number of filters in the network, is important in determining the performance of low-distortion models, and the UnetBlock helps increase the width of the transforms with manageable computational cost and time complexity. With the UnetBlock and the context-based non-local entropy model, the model is end-to-end optimized on images collected from Flickr. We test the model on the Kodak and Tecnick datasets and find that both the non-local entropy modeling and the UnetBlock are effective in improving performance; the whole model achieves state-of-the-art performance not only in the low bit-rate region but also in the high bit-rate region. Among the four works, the fourth achieves the best performance. To summarize, we present four works on lossy image compression with deep convolutional networks.
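To illustrate how a mixture-of-Gaussians entropy model yields a rate loss, the sketch below computes the probability mass of each integer-quantized code over a unit-width bin under the predicted MoG and averages the negative log2 of that mass; the function name, tensor shapes, and the number of mixture components are assumptions for illustration only.

    import torch


    def mog_rate_loss(codes, weights, means, scales, eps=1e-9):
        # codes: (N, C, H, W) quantized codes.
        # weights/means/scales: (N, C, H, W, K) MoG parameters per code,
        # with weights summing to 1 over K and scales > 0.
        x = codes.unsqueeze(-1)                              # (N,C,H,W,1)
        comp = torch.distributions.Normal(means, scales)
        # Probability mass of the unit-width bin [x - 0.5, x + 0.5] per component.
        mass = comp.cdf(x + 0.5) - comp.cdf(x - 0.5)         # (N,C,H,W,K)
        p = (weights * mass).sum(dim=-1).clamp_min(eps)      # (N,C,H,W)
        # Mean code length in bits, used as the rate term of the R-D objective.
        return -p.log2().mean()


    # Example with K = 3 mixture components predicted by the context model.
    N, C, H, W, K = 1, 16, 8, 8, 3
    codes = torch.randint(-8, 8, (N, C, H, W)).float()
    w = torch.softmax(torch.randn(N, C, H, W, K), dim=-1)
    mu = torch.randn(N, C, H, W, K)
    sigma = torch.rand(N, C, H, W, K) + 0.1
    rate = mog_rate_loss(codes, w, mu, sigma)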
The first two focus on the content-weighted image compression framework, while the last two are more general and aim to build better entropy models that can also be used in other image compression methods. With these four works, we achieve state-of-the-art performance on lossy image compression tasks.
Rights: All rights reserved
Access: open access

Files in This Item:
File: 991022385358903411.pdf
Description: For All Users
Size: 2.83 MB
Format: Adobe PDF




Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/10473