Author: Animasahun, Idowu Oluwaseun
Title: Instance-based source camera identification for small size images using deep learning-based techniques
Advisors: Lam, K. M. Kenneth (EIE)
Law, Ngai-fong Bonnie (EIE)
Siu, Wan-chi (EIE)
Leung, Hung-fat Frank (EIE)
Degree: M.Phil.
Year: 2020
Subject: Image processing -- Digital techniques
Machine learning
Digital forensic science
Hong Kong Polytechnic University -- Dissertations
Department: Department of Electronic and Information Engineering
Pages: xxiv, 163 pages : color illustrations
Language: English
Abstract: Source camera identification (SCI) is an area of forensic science concerned with attributing a photo to the camera that captured it. SCI has been widely researched using several approaches, especially image photo-response non-uniformity (PRNU) fingerprints with normalized correlation and peak-to-correlation energy as decision parameters. Several classifiers, such as support vector machines (SVM) and neural networks (NN), have been used for source camera identification. In some research works, deep learning methods, such as convolutional neural networks (CNN), have been used for camera identification. However, most of the proposed methods achieve considerably good accuracy for identifying camera models, but poor accuracy for individual instance-based SCI. Furthermore, existing source camera identification algorithms mostly perform well only for images larger than 256 × 256. Motivated by this knowledge gap in the literature, in this thesis we propose robust deep learning methods that achieve high discriminative power for instance-based SCI on small-sized images. A small-sized image can result from low resolution or from cropping a patch out of a larger image. Moreover, most deep learning-based camera identification methods feed images directly into the deep networks. However, the contents of the images suppress the camera features, and this has a negative impact on identification accuracy. Therefore, we propose the use of noise residues, or individual PRNU images, so as to suppress the contamination of camera features by image contents. The proposed noise residues are also pre-processed by zero-meaning, both to remove linear patterns and as a normalization technique for our input data. The work is useful in applications such as splicing translocations and small-sized forgery detection. First, we propose a stacked sparse autoencoder (SSAE) for SCI.
An autoencoder is an auto-associative architecture, which is considered suitable for learning input data that is not completely random. Since PRNU is Gaussian distributed and is not completely random, the autoencoder can learn some interesting structure from the pre-processed noise residues of cameras. Robust features of camera characteristics are learned by stacking several encoding layers of autoencoders recursively. These robust features are then taken as inputs to a regularised softmax classifier for probabilistic prediction of the source camera. We investigated the structure of the SSAE and the hyper-parameters that give optimal performance on our data. For all our proposed deep learning methods, the cross-entropy loss function was used, and mini-batch stochastic optimisation was used for updating the network weights. Experimental results on 20 cameras from the Dresden database show that the proposed method achieves identification accuracy comparable to that of some state-of-the-art methods. The proposed SSAE also generalizes well using the same hyper-parameters on different camera sets.
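The zero-meaning pre-processing of noise residues and the normalized-correlation decision parameter mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so this assumes the common formulation in which column means and then row means are subtracted from the residue; the function names `zero_mean` and `normalized_correlation` are hypothetical and chosen only for illustration.

```python
import math


def zero_mean(residue):
    """Subtract column means, then row means, from a 2-D noise residue.

    Assumption: this row/column centring stands in for the thesis's
    "zero-meaning" step; the abstract does not spell out the details.
    """
    rows, cols = len(residue), len(residue[0])
    col_means = [sum(residue[r][c] for r in range(rows)) / rows
                 for c in range(cols)]
    centred = [[residue[r][c] - col_means[c] for c in range(cols)]
               for r in range(rows)]
    row_means = [sum(row) / cols for row in centred]
    return [[centred[r][c] - row_means[r] for c in range(cols)]
            for r in range(rows)]


def normalized_correlation(a, b):
    """Normalized correlation between two equally sized 2-D arrays."""
    xa = [v for row in a for v in row]
    xb = [v for row in b for v in row]
    ma, mb = sum(xa) / len(xa), sum(xb) / len(xb)
    num = sum((p - ma) * (q - mb) for p, q in zip(xa, xb))
    den = math.sqrt(sum((p - ma) ** 2 for p in xa)
                    * sum((q - mb) ** 2 for q in xb))
    # A constant input has zero variance; return 0.0 rather than divide by zero.
    return num / den if den else 0.0
```

After `zero_mean`, every row and every column of the residue sums to zero, which removes row-wise and column-wise offsets before the residue is passed to a network or correlated against a camera fingerprint.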
Secondly, we propose a robust deep CNN architecture for instance-based SCI. The proposed neural network consists of three convolutional layers and two fully connected layers. Each convolutional layer of the proposed CNN includes processing operations such as convolution, strides, batch normalization, and leaky rectified linear unit (Leaky ReLU) activation. Strided convolution is used as the downsampling operation in our proposed method, instead of max-pooling. This is because max-pooling aggressively downsamples feature maps, and the quality of the PRNU signal depends on the total number of pixel values; max-pooling would therefore degrade the quality of the feature maps. Dropout layers are also used in the fully connected layers to prevent network overfitting. Our dataset consists of cameras with an unequal number of images (an unbalanced dataset), so we introduced a class weight into the training function. Class weights penalize under- or over-represented classes during training. To further prevent overfitting of deep networks and reduce unnecessary computation during training, we adopted early stopping. The output of the second fully connected layer (FC2) is given as input to a regularized softmax classifier (CNN-SC) for probabilistic prediction of camera classes. Furthermore, after training the proposed CNN, the flattened output of the third convolutional layer with a linear activation was extracted and used as the embedded layer for one-vs-rest linear support vector machines (CNN-SVM). Using a one-vs-rest linear SVM classifier gives room for more training samples in the training set for each phase of training. In addition, the proposed deep CNN model was pre-trained on 10 non-target camera classes and fine-tuned on 10 target camera classes. A comparative study with some state-of-the-art methods was carried out.
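The class-weighting idea for an unbalanced dataset can be sketched as follows. The abstract does not state which weighting formula was used, so this assumes the widely used "balanced" heuristic, weight = n_samples / (n_classes × class_count), which is the same heuristic scikit-learn applies for `class_weight="balanced"`. The function names are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} using n_samples / (n_classes * class_count).

    Assumption: the thesis's actual weighting scheme is not given in the
    abstract; this "balanced" heuristic is used only for illustration.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Mean cross-entropy where each sample's loss is scaled by its class weight.

    probs[i] is the predicted probability vector for sample i,
    labels[i] is the integer index of its true class.
    """
    eps = 1e-12  # guards against log(0)
    total = sum(weights[y] * -math.log(max(p[y], eps))
                for p, y in zip(probs, labels))
    return total / len(labels)
```

With labels `[0, 0, 0, 1]`, the rarer class 1 receives a weight of 2.0 and class 0 a weight of about 0.67, so an error on an image from the under-represented camera contributes more to the loss.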
Experimental results show that the identification accuracies of our proposed CNN-based methods (CNN-SC and CNN-SVM) are 18%-25.6% and 20.37%-25.02% higher than those of four other PRNU-based SCI methods, without and with fine-tuning on a deep CNN pretrained model, respectively. We also compared our method with a deep learning-based method, namely content-adaptive fusion networks (CA-FRN). Our proposed CNN-based methods (CNN-SVM without and with fine-tuning) have identification accuracy 6.12%-10.02% lower than CA-FRN for camera brand identification, but 1.34% and 11.02%-12.83% higher than CA-FRN for camera model identification and camera device identification, respectively. Furthermore, we evaluated the effectiveness of our proposed deep CNN, fine-tuned on non-target cameras, under distortions such as JPEG compression with quality factors of 95, 90, and 80 applied before the extraction of the noise residues of images. The average identification accuracies with post JPEG compression on targeted camera classes are 0.68%, 1.74%, and 3.37% lower than the average identification accuracy without post JPEG compression, for quality factors of 95, 90, and 80, respectively. This shows that our proposed CNN system with fine-tuning is robust to post JPEG compression, with only a small reduction in accuracy compared with uncompressed images. Finally, we propose a learning method to extract the PRNU fingerprint and to perform camera identification. The extraction of the PRNU fingerprint of a camera from a smooth, plain image is much easier than from a natural or cluttered image. Based on this observation, we propose both manual and automatic curriculum learning methods for instance-based SCI using a deep residual CNN (ResNet). Residual connections are added to our proposed deep CNN architecture so as to generate more robust representational bottlenecks, and also to tackle the vanishing gradient problem.
The idea of curriculum learning (CL) is to train a system, which may be a student or a deep network, from simple concepts to hard concepts. For the manual CL method, the proposed ResNet is first trained with flat images. Cluttered or natural images are then mixed with the flat images to continue training the network. In real applications, the available images are usually natural images. Therefore, the last stage of our proposed CL uses natural images only. For the second (automatic) CL algorithm, the features of the training images are extracted from the softmax layer of the trained ResNet, and the cross-entropy of each instance of the extracted features is calculated. The indices of the sorted cross-entropies are used to classify the training images as simple or hard. Our experimental results show that ResNet-SVM has 2.18% and 0.27% higher identification accuracy than CNN-SVM, without and with fine-tuning, respectively. For the manual CL, our experimental results show that ResNet-SVM achieves 3.74% higher identification accuracy than training with no curriculum learning. In contrast, our proposed automatic CL approach achieves only 0.47% higher identification accuracy than training with no CL. In conclusion, our proposed deep learning methods for instance-based SCI can still achieve good performance using a small amount of data, unlike the large amount of data required for good performance in some camera identification problems. Moreover, unlike the CNN-based method proposed in a work published in 2019, which could not achieve better identification than a PRNU-based technique, our work achieves better identification accuracy than conventional state-of-the-art PRNU-based methods under identical settings.
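The automatic curriculum step described in the abstract (sorting training images by the cross-entropy of their softmax outputs and labelling them simple or hard) can be sketched as below. The split point is not given in the abstract, so `simple_fraction` is a hypothetical parameter, and the function names are illustrative only.

```python
import math


def per_sample_cross_entropy(softmax_outputs, labels):
    """Cross-entropy of each training image given its softmax output.

    softmax_outputs[i] is the probability vector for image i and
    labels[i] is the integer index of its true camera class.
    """
    eps = 1e-12  # guards against log(0)
    return [-math.log(max(p[y], eps))
            for p, y in zip(softmax_outputs, labels)]


def split_simple_hard(losses, simple_fraction=0.5):
    """Sort image indices by loss and split them into (simple, hard).

    Assumption: the thesis's actual threshold is not stated in the
    abstract; simple_fraction is a hypothetical knob.
    """
    order = sorted(range(len(losses)), key=losses.__getitem__)
    cut = int(len(order) * simple_fraction)
    return order[:cut], order[cut:]
```

Images with low cross-entropy (confidently and correctly classified) land in the simple set and would be presented first; the hard set would be introduced in later training stages.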
Rights: All rights reserved
Access: open access

Files in This Item:
File: 991022385553603411.pdf
Description: For All Users
Size: 1.87 MB
Format: Adobe PDF


Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/10495