Michigan Tech Publications

Loop transformations for architectures with partitioned register banks

Xianglong Huang, University of Massachusetts Amherst
Steve Carr, Michigan Technological University
Philip Sweany, Texas Instruments

Document Type

Article

Publication Date

1-1-2001

Abstract

Embedded systems require maximum performance from a processor within significant constraints in power consumption and chip cost. Using software pipelining, processors can often exploit considerable instruction-level parallelism (ILP), and thus significantly improve performance, at the cost of substantially increasing register requirements. These increasing register requirements, however, make it difficult to build a high-performance embedded processor with a single, multi-ported register file while maintaining clock speed and limiting power consumption. Some digital signal processors, such as the TI C6x, reduce the number of ports required for a register bank by partitioning the register bank into multiple banks. Disjoint subsets of functional units are directly connected to one of the partitioned register banks. Each register bank and its associated functional units are together called a cluster. Clustering reduces the number of ports needed on a per-bank basis, allowing an increased clock rate. However, execution speed can be hampered because of the potential need to copy "non-local" operands among register banks in order to make them available to the functional unit performing an operation. The task of the compiler is to both maximize parallelism and minimize the number of remote register accesses needed. Previous work has concentrated on methods to partition virtual registers amongst the target architecture's clusters. In this paper, we show how high-level loop transformations can enhance the partitioning obtained by low-level schemes. In our experiments, loop transformations improved software pipelining by 27% on a machine with 2 clusters, each having 1 floating-point and 1 integer register bank and 4 functional units. We also observed a 20% improvement on a similar machine with 4 clusters of 2 functional units. In fact, by performing the described loop transformations we were able to show improvements of greater than 10% over schedules (for un-transformed loops) generated with the unrealistic assumption of a single multi-ported register bank. Copyright ACM 2001.

Publication Title

SIGPLAN Notices (ACM Special Interest Group on Programming Languages)

Recommended Citation

Huang, X., Carr, S., & Sweany, P. (2001). Loop transformations for architectures with partitioned register banks. SIGPLAN Notices (ACM Special Interest Group on Programming Languages), 36(8), 48-55. http://doi.org/10.1145/384196.384206
Retrieved from: https://digitalcommons.mtu.edu/michigantech-p/12587
