Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor | Department of Electronic and Information Engineering | en_US |
dc.contributor.advisor | Chan, Yui-lam (EIE) | - |
dc.contributor.advisor | Siu, Wan-chi (EIE) | - |
dc.creator | Kuang, Wei | - |
dc.identifier.uri | https://theses.lib.polyu.edu.hk/handle/200/10214 | - |
dc.language | English | en_US |
dc.publisher | Hong Kong Polytechnic University | - |
dc.rights | All rights reserved | en_US |
dc.title | HEVC based screen content coding and transcoding using machine learning techniques | en_US |
dcterms.abstract | Screen content video is an emerging type of video that typically shows mixed content consisting of both natural image blocks (NIBs) and computer-generated screen content blocks (SCBs). Since High Efficiency Video Coding (HEVC) is optimized only for NIBs, while SCBs exhibit different characteristics, new techniques are necessary for SCBs. The Screen Content Coding (SCC) extension was developed on top of HEVC to provide new coding tools for screen content videos. SCC employs two additional intra-prediction coding modes, the intra block copy (IBC) mode and the palette (PLT) mode. However, the exhaustive mode search increases the computational complexity of SCC dramatically. Therefore, this thesis proposes novel machine learning based techniques to simplify both the encoding and the transcoding of SCC. First, a fast intra-prediction algorithm for SCC based on content analysis and dynamic thresholding is proposed. A scene change detection method is adopted to obtain a learning frame in each scene, and the learning frame is encoded by the original SCC encoder to collect learning statistics. Prediction models are then tailor-made for the subsequent frames in the same scene according to the video content and the quantization parameter (QP) of the learning frame. Simulation results show that the proposed scheme achieves remarkable complexity reduction while preserving the coded video quality. Next, we propose a decision tree based framework for fast intra mode decision by investigating various features in training sets. To avoid the exhaustive mode search, a framework with a sequential arrangement of decision trees is proposed that checks each mode separately by inserting a classifier before the mode is evaluated. Compared with previous approaches, in which both the IBC and PLT modes are checked for SCBs, the proposed coding framework is more flexible: it allows only the IBC mode or only the PLT mode to be checked for an SCB, so that computational complexity is further reduced. Simulation results show that the proposed scheme provides significant complexity savings with negligible loss of coded video quality. To avoid the need for hand-crafted features, a deep learning based fast prediction network, DeepSCC, built on a convolutional neural network (CNN), is then proposed; it consists of two parts, DeepSCC-I and DeepSCC-II. Before being fed to DeepSCC, incoming coding tree units (CTUs) are divided into two categories: dynamic CTUs and stationary CTUs. For dynamic CTUs, whose content differs from that of their collocated CTUs, DeepSCC-I takes raw sample values as the input to make fast predictions. For stationary CTUs, whose content is identical to that of their collocated CTUs, DeepSCC-II additionally utilizes the optimal mode maps of the stationary CTUs to further reduce the computational complexity. Simulation results show that the proposed scheme further improves the complexity reduction. Finally, we propose a fast HEVC-to-SCC transcoder. To migrate legacy screen content videos from HEVC to SCC and improve the coding efficiency, a fast transcoding framework is proposed that analyzes features from four categories: features from the HEVC decoder, static features, dynamic features, and spatial features. First, the coding unit (CU) depth level collected from the HEVC decoder is utilized to terminate the CU partitioning in SCC early. Second, a flexible encoding structure is proposed to make early mode decisions with the help of various features. Simulation results show that the proposed scheme dramatically shortens the transcoding time. | en_US |
dcterms.extent | xvi, 145 pages : color illustrations | en_US |
dcterms.isPartOf | PolyU Electronic Theses | en_US |
dcterms.issued | 2019 | en_US |
dcterms.educationalLevel | Ph.D. | en_US |
dcterms.educationalLevel | All Doctorate | en_US |
dcterms.LCSH | Hong Kong Polytechnic University -- Dissertations | en_US |
dcterms.LCSH | Digital video | en_US |
dcterms.LCSH | Coding theory | en_US |
dcterms.LCSH | Video compression | en_US |
dcterms.LCSH | Machine learning | en_US |
dcterms.accessRights | open access | en_US |
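
The sequential decision-tree mode decision summarized in the abstract can be pictured with a short sketch. The snippet below is a minimal, hypothetical illustration rather than code from the thesis: the function `decide_intra_mode`, the feature vector `cu_features`, and the pre-trained classifiers `ibc_tree` and `plt_tree` are assumed stand-ins for the thesis's hand-crafted features and offline-trained decision trees.

```python
# Minimal sketch (assumption, not the thesis implementation) of inserting a
# classifier before each SCC intra mode, so an SCB may check only IBC or only PLT.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def decide_intra_mode(cu_features, ibc_tree: DecisionTreeClassifier,
                      plt_tree: DecisionTreeClassifier):
    """Return the list of intra modes that will actually be evaluated for one CU."""
    x = np.asarray(cu_features, dtype=float).reshape(1, -1)
    modes_to_check = ["INTRA"]  # conventional HEVC intra remains a candidate
    # A decision tree is consulted *before* each additional SCC mode is checked,
    # replacing the exhaustive IBC + PLT search with per-mode early skips.
    if ibc_tree.predict(x)[0] == 1:
        modes_to_check.append("IBC")
    if plt_tree.predict(x)[0] == 1:
        modes_to_check.append("PLT")
    return modes_to_check
```

With this arrangement a screen content block may be routed to the IBC check, the PLT check, both, or neither, which is the flexibility the abstract contrasts with earlier approaches that always check both modes.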
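The routing between the two DeepSCC branches can be sketched in the same hedged spirit. Everything named below (`predict_ctu_modes`, `deepscc_i`, `deepscc_ii`, and the exact change-detection test) is a hypothetical stand-in for the networks and rules described in the thesis.

```python
import numpy as np

def predict_ctu_modes(ctu_samples, collocated_samples, collocated_mode_map,
                      deepscc_i, deepscc_ii):
    """Route one CTU to DeepSCC-I or DeepSCC-II (illustrative sketch only)."""
    # Stationary CTU: content identical to the collocated CTU in the previous
    # frame, so the already-decided optimal mode map can be reused as an input.
    if np.array_equal(np.asarray(ctu_samples), np.asarray(collocated_samples)):
        return deepscc_ii(ctu_samples, collocated_mode_map)
    # Dynamic CTU: content has changed, so predictions come from raw samples only.
    return deepscc_i(ctu_samples)
```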
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
991022289514203411.pdf | For All Users | 3.25 MB | Adobe PDF |
Please use this identifier to cite or link to this item:
https://theses.lib.polyu.edu.hk/handle/200/10214