Towards reliable CNN architecture design for visual recognition
Zhang, Lei (COMP)
Neural networks (Computer science)
Image processing -- Digital techniques
Hong Kong Polytechnic University -- Dissertations
Department of Computing
xx, 191 pages : color illustrations
Although many popular network architectures have been developed and widely used, designing effective and efficient convolutional neural network (CNN) architectures for visual recognition remains an important topic. The design of reliable CNN architectures faces three main challenges: how to reduce the computational cost, how to improve the accuracy, and how to enhance the robustness against adversarial attacks. In this thesis, we study the design of reliable CNN architectures for visual recognition. In Chapter 1, we review some common CNN architectures and their design methods for visual recognition, and discuss the contributions and organization of this thesis. In Chapter 2, we present a detachable second-order pooling network that improves the performance of first-order CNNs in image classification while keeping the same computational cost at the testing stage. In Chapter 3, we propose to train deep CNNs with a learnable sparse transform (LST), which learns, jointly with the CNN training process, to convert the input features into a more compact and sparser domain. The proposed LST is more effective than the conventional Conv2d in reducing spatial and channel-wise feature redundancies, and it can be efficiently implemented with existing CNN modules for seamless training and inference. We also present a hybrid LST-ReLU activation to enhance the robustness of the learned CNN models. In Chapter 4, we further improve LST for building CNNs for visual recognition. The proposed LST v2 employs hierarchical depth-wise separable convolution to allow an incomplete yet flexible expansion, and it achieves comparable or even higher accuracy than LST-Net on a wide range of visual recognition tasks. Finally, in Chapter 5, we study LST in the context of adversarial attacks. A robust convolutional layer with multiple kernels, namely RConv-MK, is proposed to improve the robustness of LST against various types of image corruptions and manually designed adversarial attacks. In summary, in this thesis we present four reliable CNN architecture design methods: a detachable second-order pooling network, a learnable sparse transform and its improved version, and a robust convolutional layer. Extensive experiments demonstrate their effectiveness and efficiency for accurate, lightweight and robust visual recognition.
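The abstract contrasts first-order CNNs with second-order pooling. As a rough illustration of that distinction only, the following minimal NumPy sketch compares global average pooling (first-order channel means) with generic covariance pooling (second-order channel statistics) on a single C x H x W feature map. It is an assumption-based example, not the detachable second-order pooling network of Chapter 2, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical helper names; a generic illustration, not the thesis's method.

def first_order_pool(x: np.ndarray) -> np.ndarray:
    """Global average pooling: (C, H, W) -> (C,) vector of channel means."""
    return x.mean(axis=(1, 2))

def second_order_pool(x: np.ndarray) -> np.ndarray:
    """Covariance pooling: (C, H, W) -> (C, C) channel covariance matrix."""
    c = x.shape[0]
    feats = x.reshape(c, -1)                      # C x N, N = H*W spatial positions
    centered = feats - feats.mean(axis=1, keepdims=True)
    return centered @ centered.T / feats.shape[1]  # biased (divide by N) estimate

def vectorize_upper(cov: np.ndarray) -> np.ndarray:
    """Flatten the symmetric matrix to its C*(C+1)/2 unique entries."""
    return cov[np.triu_indices(cov.shape[0])]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 7, 7))            # toy feature map, C=4
    print(first_order_pool(x).shape)              # 4 first-order features
    print(vectorize_upper(second_order_pool(x)).shape)  # 10 second-order features
```

The second-order descriptor grows quadratically with the number of channels C, whereas the first-order one grows linearly, which illustrates why keeping the test-time cost at the first-order level is a meaningful design goal.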
All rights reserved