Cool-Fetch: A Compiler-Enabled IPC Estimation Based Framework for Energy Reduction

With power consumption becoming an increasingly important factor, it is necessary to reevaluate traditional, power-intensive architectural techniques and their relative performance benefits. We believe that combined architecture-compiler efforts open up new and efficient ways to retain the performance benefits of modern architectures while addressing their power impact.

Attachment	Size
PDF	219.76 KB

Combining Compiler and Runtime IPC Predictions to Reduce Energy in Next Generation Architectures

Next-generation architectures will require innovative solutions to reduce energy consumption. One of the trends we expect is more extensive utilization of compiler information directly targeting energy optimizations. As we show in this paper, static information provides some unique benefits not available with runtime hardware-based techniques alone. To achieve energy reduction, we use IPC information at various granularities to adaptively adjust voltage and speed, and to throttle the fetch rate in response to changes in ILP.

Attachment	Size
PDF	328.15 KB

LoGPC: Modeling Network Contention in Message-Passing Programs

In many real applications, for example those with frequent and irregular communication patterns or those using large messages, network contention and contention for message processing resources can be a significant part of the total execution time. This paper presents a new cost model, called LoGPC, that extends the LogP and LogGP models to account for the impact of network contention and network interface DMA behavior on the performance of message-passing programs. We validate LoGPC by analyzing three applications implemented with Active Messages on the MIT Alewife multiprocessor.

Attachment	Size
PDF	147.43 KB

Opportunities and Challenges in Application-Tuned Circuits and Architectures Based on Nanodevices

Nanoelectronics research has primarily focused on devices. By contrast, not much has been published on innovations at higher layers: we know little about how to construct circuits or architectures out of such devices. In this paper, we focus on the currently most promising nanodevice technologies, such as arrays of semiconductor nanowires (NWs) and arrays of crossed carbon nanotubes (CNTs).

Attachment	Size
PDF	279.93 KB

Coupling Compiler-Enabled and Conventional Memory Accessing for Energy Efficiency

This article presents Cool-Mem, a family of memory system architectures that integrate conventional memory system mechanisms, energy-aware address translation, and compiler-enabled cache disambiguation techniques to reduce energy consumption in general-purpose architectures. The solutions provided in this article leverage interlayer tradeoffs between the architecture, compiler, and operating system layers. Cool-Mem achieves power reduction by statically matching memory operations with energy-efficient cache and virtual memory access mechanisms.

Attachment	Size
PDF	1.34 MB

Latching on the Wire and Pipelining in Nanoscale Designs

In contrast to general-purpose programmable fabrics, such as PLAs, we develop nano-fabrics that, while also programmable and hierarchical, are more tuned towards an application domain. Our goal is to achieve denser designs with better fabric utilization and efficient cascading of circuits. We call these designs NASICs: Nanoscale Application-Specific Integrated Circuits. We believe NASICs are a more natural fit than PLA-style designs for implementing microprocessors out of semiconductor nanowires and crossed carbon nanotubes.

Attachment	Size
PDF	191.75 KB

Energy Characterization of Hardware-Based Data Prefetching

This paper evaluates several hardware-based data prefetching techniques from an energy perspective, and explores their energy/performance tradeoffs. We present detailed simulation results and make performance and energy comparisons between different configurations. Power characterization is provided based on HSpice circuit-level simulation of state-of-the-art low-power cache designs implemented in deep-submicron process technology. This is combined with architecture-level simulation of switching activities in the memory system.

Attachment	Size
PDF	110.89 KB

Energy-Aware Data Prefetching for General-Purpose Programs

There has been intensive research on data prefetching focusing on performance improvement; however, the energy aspect of prefetching is relatively unknown. Our experiments show that although software prefetching tends to be more energy efficient, hardware prefetching outperforms software prefetching on most of the applications in terms of performance. This paper proposes several techniques to make hardware-based data prefetching power-aware. Our proposed techniques include three compiler-based approaches which make the prefetch predictor more power efficient.

Attachment	Size
PDF	153.54 KB

PARE: A Power-Aware Data Prefetching Engine

Aggressive hardware prefetching often significantly increases energy consumption in the memory system. Experiments show that a major fraction of prefetching-related energy degradation is due to the energy costs of the hardware history table. In this paper, we present PARE, a Power-Aware pRefetching Engine that uses a newly designed indexed hardware history table. Compared to the conventional single table design, the new prefetching table consumes 7-11X less power per access.

Attachment	Size
PDF	159.79 KB

Compiler-Based Adaptive Fetch Throttling for Energy Efficiency

Front-end instruction delivery accounts for a significant fraction of energy consumption in dynamically scheduled superscalar processors. Different front-end throttling techniques have been introduced to reduce the chip-wide energy consumption caused by redundant fetching. Hardware-based techniques, such as flow-based throttling, could reduce the energy consumption considerably, but with a high performance loss.

Attachment	Size
PDF	177.43 KB

Others