Compiler-assisted high performance and low power optimizations for embedded systems

Pao Yue-kong Library Electronic Theses Database

Compiler-assisted high performance and low power optimizations for embedded systems


Author: Wang, Meng
Title: Compiler-assisted high performance and low power optimizations for embedded systems
Degree: Ph.D.
Year: 2009
Subject: Hong Kong Polytechnic University -- Dissertations
Embedded computer systems -- Programming
Compiling (Electronic computers)
Department: Dept. of Computing
Pages: xvii, 164 p. : ill. ; 30 cm.
InnoPac Record:
Abstract: optimization, we focus on reducing the number of memory accesses for embedded systems. The memory access significantly limits the performance of embedded systems due to the widening processor-memory gap. Besides performance, memory accesses consume a large fraction of overall power consumption. With the emergence of memory-intensive embedded applications, effective optimization techniques are required to reduce the number of memory accesses. In this thesis, we make the following original contributions in this field. - First, we develop a general compiler optimization technique called “REAM”; to reduce the number of memory accesses for Digital Signal Processing (DSP) applications with loops. In the loop kernels of DSP applications, one important characteristic is that the same memory location is repeatedly accessed by different memory operations over multiple loop iterations. For DSP applications, therefore, an important problem is how to explore redundant memory accesses and eliminate them by exploiting the desired value across iterations. We solve this problem by replacing redundant memory operations with register operations. The results show that our technique can effectively reduce the number of memory accesses and improve performance compared with previous approaches. - Second, as embedded systems have a limited number of registers, we propose a register allocation and instruction scheduling technique to improve the “REALM”; technique with register constraints. For the register operations generated by the “REALM”; technique, we analyze their data dependencies for instruction scheduling, and build up a register-matching graph model to find available physical registers that can be allocated to the operands of the register operations. The register allocation problem is solved by finding a simple path of fixed length between two specified vertices in the register-matching graph. We perform instruction scheduling based on the results of the allocation. In low power optimization, we address two challenging issues, leakage and temperature, for embedded systems. Leakage power has become an issue comparable in importance to dynamic power as semi-conductor technologies move down to the nanometer scale. Besides leakage power, temperature issues are also important because both on-chip power density and temperature are rising exponentially with decreasing feature sizes. The increase in on-chip temperature can lead to severe problems with reliability, performance, and cooling costs for embedded systems. To address these issues, we make the following contributions. - The first contribution is to reduce the leakage power consumption of VLIW (Very Long Instruction Word) processors. We propose a novel leakage-aware modulo scheduling technique that helps hardware-based leakage control schemes to achieve leakage power savings for embedded VLIW processors. We also consider transition time and power overhead in our technique, and discuss the trade-off between leakage savings and performance penalties. - The second contribution is to reduce the peak temperature of the on-chip memory subsystem. Most embedded systems adopt a hybrid memory architecture, which contains both hardware-managed cache and software-managed scratchpad memory (SPM). However, both cache and SPM have become hot spots, as they are the most frequently accessed on-chip components. We propose a temperature-aware data allocation technique to explore such a hybrid architecture to jointly optimize performance and peak temperature. Our technique can greatly alleviate the temperature hot spots of the memory subsystem by adaptively distributing the workload between cache and SPM.

Files in this item

Files Size Format
b23429926.pdf 1.745Mb PDF
Copyright Undertaking
As a bona fide Library user, I declare that:
  1. I will abide by the rules and legal ordinances governing copyright regarding the use of the Database.
  2. I will use the Database for the purpose of my research or private study only and not for circulation or further reproduction or any other purpose.
  3. I agree to indemnify and hold the University harmless from and against any loss, damage, cost, liability or expenses arising from copyright infringement or unauthorized usage.
By downloading any item(s) listed above, you acknowledge that you have read and understood the copyright undertaking as stated above, and agree to be bound by all of its terms.


Quick Search


More Information