Title: Sample-aware early-exit prediction for efficient device-edge collaborative inference
Advisor: Mao, Yuyi (EIE)
Subject: Neural networks (Computer science); Hong Kong Polytechnic University -- Dissertations
Department: Department of Electronic and Information Engineering
Pages: viii, 51 pages : color illustrations
Abstract: Artificial intelligence for mobile applications has attracted extensive research attention in recent years. However, deploying cumbersome deep neural networks (DNNs) on resource-constrained mobile devices introduces significant latency. Device-edge co-inference is a flexible solution in which a DNN is partitioned into two parts: a head network on the device and a tail network on the edge server. Early-exit neural networks provide dynamic inference by terminating the inference process for some samples at early layers. However, the inserted early exits incur additional computational overhead, which burdens resource-limited devices when the early exits reside in the on-device network. In this dissertation, a new methodology called early-exit prediction is proposed to alleviate the computational overhead brought by the early exits. The system consists of an early-exit network and a low-cost Exit Predictor. The Exit Predictor guides the inference to skip the computation of the early exits, so that "hard" samples can be inferred directly by the backbone network without being processed by any early exit. To verify the effectiveness of early-exit prediction, extensive experiments are conducted with three DNN models (AlexNet, VGG16-BN, and ResNet44) on the CIFAR-10 and CIFAR-100 datasets. The experimental results show that the early-exit prediction method reduces on-device computation by over 20% with only a marginal degradation in classification accuracy.
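The control flow described above can be sketched as follows. This is a minimal illustrative sketch, not the dissertation's actual implementation: the names `run_inference`, `predictor`, `head`, `early_exit`, `tail`, and the confidence threshold are all hypothetical stand-ins for the components the abstract names.

```python
# Hypothetical sketch of early-exit prediction in device-edge
# co-inference. All component names are illustrative assumptions.

def run_inference(sample, predictor, head, early_exit, tail, threshold=0.9):
    """Run one sample through the partitioned early-exit network.

    A low-cost Exit Predictor first guesses whether the sample is
    likely to terminate at the early exit. Only then is the early-exit
    classifier evaluated; "hard" samples skip it entirely and their
    features go straight to the edge-side tail network, saving the
    on-device cost of the early-exit computation.
    """
    features = head(sample)                 # on-device head network
    if predictor(features):                 # cheap binary decision
        probs = early_exit(features)        # evaluate early-exit head
        confidence = max(probs)
        if confidence >= threshold:
            return probs.index(confidence)  # terminate on the device
    # "Hard" sample: transmit intermediate features to the edge server
    return tail(features)                   # edge-side tail network
```

With toy stand-ins for the four components, a confident sample exits on the device while a sample the predictor flags as "hard" is forwarded to the edge.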
In practical applications, some of the sensed data contain no information of interest (referred to as "task-irrelevant data"); such data need not be fed to the early-exit neural network and can be filtered out at an early stage. Moreover, the intermediate features computed by the early layers of commonly used backbone DNNs are larger in size than a raw image, which burdens the communication network in device-edge co-inference. Therefore, irrelevant-data filtering and feature compression are further adopted to improve the overall performance of the system. Specifically, a low-cost model performs irrelevant-data filtering by discarding distinctly task-irrelevant data. In addition, an autoencoder consisting of a convolutional encoder and decoder reduces the transmitted data size by compressing the features on the device and recovering them before they are processed by the edge server. Under the combined effect of these methods, the end-to-end inference latency is reduced significantly compared with the original early-exit neural network.
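The device- and edge-side stages described above can be sketched as two small pipelines. This is a hedged illustration only: the function names and the stand-in filter/encoder/decoder are hypothetical, and the dissertation's actual compressor is a learned convolutional autoencoder rather than the toy transforms used here.

```python
# Hypothetical sketch of the full pipeline with irrelevant-data
# filtering and feature compression. Component names are assumptions.

def device_pipeline(sample, relevance_filter, head, encoder):
    """Device side: filter, compute head features, compress."""
    if not relevance_filter(sample):   # low-cost check on the raw input
        return None                    # task-irrelevant: send nothing
    features = head(sample)            # on-device head network
    return encoder(features)           # compressed features to transmit

def edge_pipeline(compressed, decoder, tail):
    """Edge side: recover features, then finish inference."""
    features = decoder(compressed)     # decoder restores feature size
    return tail(features)              # edge-side tail network
```

The key design point the abstract highlights is that both the filter and the encoder run before any transmission, so task-irrelevant samples cost no bandwidth at all and relevant samples cost less than their raw intermediate features would.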
Rights: All rights reserved
Files in This Item:
6519.pdf | For All Users (off-campus access for PolyU Staff & Students only) | 5.31 MB | Adobe PDF