With power consumption becoming an increasingly important factor , it is necessary to reevaluate traditional, power-intensive, architectural techniques and their relative performance benefits. We believe that combined architecture-compiler efforts open up new and efficient ways to retain the performance benefits of modern architectures while addressing their power impact.
Next generation architectures will require innovative solutions to reduce energy consumption. One of the trends we expect is more extensive utilization of compiler information directly targeting energy optimizations. As we show in this paper, static information provides some unique benefits, not available with runtime hardware-based techniques alone. To achieve energy reduction, we use IPC information at various granularities, to adaptively adjust voltage and speed, and to throttle the fetch rate in response to changes in ILP.
In many real applications, for example those with frequent and irregular communication patterns or those using large messages, network contention and contention for message processing resources can be a significant part of the total execution time. This paper presents a new cost model, called LoGPC, that extends the LogP and LogGP models to account for the impact of network contention and network interface DMA behavior on the performance of message-passing programs. We validate LoGPC by analyzing three applications implemented with Active Messages on the MIT Alewife multiprocessor.
Nanoelectronics research has primarily focused on devices. By contrast, not much has been published on innovations at higher layers: we know little about how to construct circuits or architectures out of such devices. In this paper, we focus on the currently most promising nanodevice technologies, such as arrays of semiconductor nanowires (NWs) and arrays of crossed carbon nanotubes (CNTs).
This article presents Cool-Mem, a family of memory system architectures that integrate conventional memory system mechanisms, energy-aware address translation, and compiler-enabled cache disambiguation techniques, to reduce energy consumption in general-purpose architectures. The solutions provided in this article leverage on interlayer tradeoffs between architecture, compiler, and operating system layers. Cool-Mem achieves power reduction by statically matching memory operations with energy-efficient cache and virtual memory access mechanisms.
In contrast to general-purpose programmable fabrics, such as PLAs, we develop nano-fabrics that, while also programmable and hierarchical, are more tuned towards an application domain. Our goal is to achieve denser designs with better fabric utilization and efficient cascading of circuits. We call these designs NASICs: Nanoscale Application-Specific Inte grated Circuits. We believe NASICs are a more natural fit to implement microprocessors, out of semiconductor nanowires and crossed carbon nanotubes, than PLA style designs.
This paper evaluates several hardware-based data prefetching techniques from an energy perspective, and explores their energy/performance tradeoffs. We present detailed simulation results and make performance and energy comparisons between different configurations. Power characterization is provided based on HSpice circuit-level simulation of state-of-the-art low-power cache designs implemented in deep-submicron process technology. This is combined with architecture-level simulation of switching activities in the memory system.
There has been intensive research on data prefetching focusing on performance improvement, however, the energy aspect of prefetching is relatively unknown. Our experiments show that although software prefetching tends to be more energy efficient, hardware prefetching outperforms software prefetching on most of the applications in terms of performance. This paper proposes several techniques to make hardware-based data prefetching power-aware. Our proposed techniques include three compiler-based approaches which make the prefetch predictor more power efficient.
Aggressive hardware prefetching often significantly increases energy consumption in the memory system. Experiments show that a major fraction of prefetching related energy degradation is due to the hardware history table related energy costs. In this paper, we present PARE, a PowerAware pRefetching Engine that uses a newly designed indexed hardware history table. Compared to the conventional single table design, the new prefetching table consumes 7-11X less power per access.
Front-end instruction delivery accounts for a significant fraction of energy consumption in dynamically scheduled superscalar processors. Different front-end throttling techniques have been introduced to reduce the chip-wide energy consumption caused by redundant fetching. Hardwarebased techniques, such as flow-based throttling, could reduce the energy consumption considerably, but with a high performance loss.