ScaleUPC: A UPC compiler for multi-core systems

Document Type

Conference Proceeding

Publication Date

12-1-2009

Abstract

Since multi-core computers began to dominate the market, enormous effort has gone into developing parallel programming languages and compilers that target this architecture. Although Unified Parallel C (UPC), a parallel extension to ANSI C, was originally designed for large-scale parallel computers and cluster environments, its partitioned global address space programming model makes it a natural choice for a single multi-core machine, where the main memory is physically shared. This paper builds a case for UPC as a feasible language for multi-core programming by providing an optimizing compiler, called ScaleUPC, which outperforms other UPC compilers targeting SMPs. Because all accesses are physically local on a multi-core machine, the communication cost of remote accesses disappears, and we find that the overhead of pointer arithmetic on shared data accesses becomes a prominent bottleneck. The reason is that directly mapping the UPC logical memory layout to physical memory, as most existing UPC compilers do, incurs prohibitive address calculation overhead. This paper presents an alternative memory layout, which effectively eliminates the overhead without sacrificing the UPC memory semantics. Our research also reveals that a compiler for multi-core systems needs to pay special attention to the memory system. We demonstrate how the compiler can enforce static process/thread binding to improve cache performance. © 2009 ACM.
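
To make the address calculation overhead concrete, the following is a minimal sketch in plain C of the block-cyclic mapping that UPC defines for an array declared as `shared [B] T a[N]`. It is not ScaleUPC code, and the abstract does not specify the alternative layout the paper proposes. The names `upc_location`, `upc_locate`, `offset_per_thread_layout`, and `offset_contiguous_layout` are hypothetical, and the contiguous variant is only an assumed point of contrast showing how an index could map to an address without division or modulo.

```c
#include <assert.h>
#include <stddef.h>

/* Where element i of a UPC array `shared [B] T a[N]` logically lives. */
typedef struct {
    size_t thread;       /* thread that has affinity to the element */
    size_t local_index;  /* element offset inside that thread's portion */
} upc_location;

/* Block-cyclic mapping from the UPC language definition: blocks of
   block_size elements are dealt round-robin to the threads. */
static upc_location upc_locate(size_t i, size_t block_size, size_t threads)
{
    assert(block_size > 0 && threads > 0);
    size_t block = i / block_size;  /* global block number */
    size_t phase = i % block_size;  /* position inside that block */
    upc_location loc;
    loc.thread = block % threads;
    loc.local_index = (block / threads) * block_size + phase;
    return loc;
}

/* Hypothetical direct mapping: each thread's portion is its own region of
   elems_per_thread elements, so every access pays for the divisions and
   modulo operations in upc_locate. */
static size_t offset_per_thread_layout(size_t i, size_t block_size,
                                       size_t threads,
                                       size_t elems_per_thread)
{
    upc_location loc = upc_locate(i, block_size, threads);
    return loc.thread * elems_per_thread + loc.local_index;
}

/* Hypothetical contiguous mapping (an assumption, not the paper's stated
   design): the element offset is the index itself, and affinity is
   computed with upc_locate only when a program asks for it. */
static size_t offset_contiguous_layout(size_t i)
{
    return i;
}
```

With a block size of 2 and 3 threads, element 4 falls in global block 2, so it has affinity to thread 2 at local index 0. Under the per-thread mapping, finding that out costs two divisions and two modulo operations on every access, whereas the contiguous mapping needs none.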

Publication Title

ACM International Conference Proceeding Series
