# Post-CMOS Hybrid Spin-Charge Nanofabrics

## Prasad Shabadi and Csaba Andras Moritz

Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA 01003, USA. Email: shabadi@ecs.umass.edu, andras@ecs.umass.edu

Abstract – We propose a hybrid spin-charge fabric with computation in the spin domain and communication in the charge domain. In nanofabrics based on non-equilibrium physical phenomena such as the interference of spin waves, switching times are shorter than the thermal relaxation times, leading to fast multi-value logic at high fan-in without the exponential performance degradation noticeable in CMOS. While computation is much more efficient than in CMOS, these benefits can be lost to the communication requirements between spin-wave blocks when they are connected with waveguides. This inspired a new type of hybrid nanofabric in which high fan-in spin wave functions are connected to an interconnect stack similar to that of CMOS: our analysis shows a delay reduction of up to 10X (8.64ns) along the critical path for a (511;9) parallel counter implemented in this fabric vs. a spin-wave-only implementation. Similar benefits are also shown for a CLA adder, with a ~4.2ns delay reduction for a 1024-bit CLA adder.

Index Terms – Spin Wave Functions, Parallel Counters, Magnonic Logic, Hybrid Logic.

## I. BACKGROUND

Fundamental limits on CMOS technology scaling have forced researchers to explore alternative devices and materials for building future nanoscale systems. Devices based on novel state variables, materials and integration approaches are being actively investigated. Some of the promising examples include spin-FETs [1], nanowire-based xnFETs [2][3], graphene ribbons [4], CNTs [5], MQCA [6] and spin waves [7][8][9][10]. The primary focus in the emerging devices research community is to improve the intrinsic characteristics of single devices/switches while keeping the overall integration approach fairly conventional. However, due to high logic complexity and wiring requirements, the overall system performance does not scale proportionally to the performance of individual devices. We therefore propose a fundamental shift in mindset: to make the devices themselves more functional than simple switches (see Fig. 1).

Figure 1. Devices for nanofabrics: (left) conventional switch; (right) envisioned device with alternate state variables.

Magnonic logic based on spin waves is one of the promising directions for building nanoscale systems. A spin wave is a collective oscillation of electron spins in an ordered spin lattice around the direction of magnetization in ferromagnetic materials. Information may be encoded into the phase of the spin wave. Fig. 2 shows the key physical components of spin wave based fabrics [7][8]. The Magneto-Electric (ME) cell is a key component of the proposed fabric. It enables voltage control of magnetization, which is critical for low energy operation. The ME cell is mainly responsible for i) I/O coupling, ii) amplification, iii) latching and iv) synchronization. Spin waves propagate and interfere in the Spin Wave Bus (SWB). Circuits exploiting wave interference can accomplish complex logic functions, such as high fan-in majority function(s), in a single computational step. The concept of such magnonic functions, also called Spin Wave Functions (SPWFs), was first introduced in [11]. Some of the major benefits of SPWF-based designs are i) low power operation: spin wave propagation does not involve physical movement of charged particles; ii) implicit latching: ME cells behave as non-volatile implicit latches, so no separate latches are required; iii) reduced design complexity: the wave superposition principle enables efficient realization of threshold/majority logic; and iv) compatibility with CMOS.

Figure 2. Key physical components of a spin-wave computing fabric.
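The way phase encoding and superposition yield a majority function can be illustrated with a toy model. The sketch below is not a physical simulation: it assumes unit-amplitude waves of equal frequency, an encoding of logic 1 as phase 0 and logic 0 as phase π, and no attenuation. The function name `spin_wave_majority` is hypothetical.

```python
import math

def spin_wave_majority(bits):
    """Majority of an odd number of bits via superposition of phase-encoded waves.

    Toy model (assumptions): unit amplitude, equal frequency, no attenuation.
    """
    if len(bits) % 2 == 0:
        raise ValueError("use an odd number of inputs so the sum is never zero")
    # Assumed encoding: logic 1 -> phase 0, logic 0 -> phase pi.
    phases = [0.0 if b else math.pi for b in bits]
    # At the detection point each wave contributes cos(phase), i.e. +1 or -1.
    resultant = sum(math.cos(p) for p in phases)
    # The sign (phase) of the resultant wave carries the majority value.
    return 1 if resultant > 0 else 0

print(spin_wave_majority([1, 1, 0]))          # majority of three inputs
print(spin_wave_majority([1] * 5 + [0] * 4))  # majority of nine inputs
```

In this model the fan-in is just the number of waves summed at one point, which is why a high fan-in majority takes a single interference step rather than a tree of low fan-in gates.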

Prior research on spin wave transport parameters has shown that spin wave propagation (in the SWB) is inferior to charge propagation in conventional copper interconnects. Thus, while we can expect a significant reduction in logic complexity with SPWF-based designs vs. corresponding CMOS designs, the interconnect delays between SPWFs can become the bottleneck. This is one of the main limiting factors for spin-only fabrics. Moreover, since our designs leverage the high fan-in capability of SPWFs, the individual SPWFs are expected to be large, leading to high interconnect delays. In this work, we evaluate how the transport parameters of spin wave propagation (in the SWB) would impact the overall circuit-level performance. In particular, we evaluate parallel counters and adders of different sizes and discuss how these designs would benefit from a hybrid spin-charge approach where computation is implemented in the spin domain and communication is mainly implemented in the charge domain.

This work was supported in part by the Center for Hierarchical Manufacturing (CHM) at UMass Amherst, Focus Center Research Program (FCRP) – Center on Functionally Engineering Nano Architectonics (FENA), the DARPA, and NSF awards CCR:0105516, NER:0508382, and CCR:051066. Corresponding author: shabadi@ecs.umass.edu

The rest of the paper is organized as follows. Section II describes a 2-bit adder design based on SPWF majority logic. Section III discusses the need for hybrid logic. Section IV presents the case studies (parallel counter and CLA adder) for the hybrid spin-charge fabric. Section V concludes the paper.

## II. SPWFs CIRCUITS: 2-BIT RIPPLE ADDER

Fig. 3 shows a 2-bit SPWF adder implemented using two (3;2) parallel counters. Parallel counters are commonly used in fast multipliers; they are digital circuits with $n$ inputs and $\log_2(n+1)$ output bits representing the number of 1's in the $n$ input bits [12]. This SPWF design is compared with a corresponding custom CMOS layout based on the NCSU 45nm PDK (Fig. 4). Table I shows that the SPWF-based adder significantly reduces circuit complexity, with a ~33X reduction in area. While the results presented in [11] show a large performance gain for *high fan-in* arithmetic circuits, Table I shows that the SPWF version does not compare favourably for smaller circuits like the 2-bit adder. This is mainly attributed to the lower group velocity of spin waves vs. propagation on metal interconnects. Hence, SPWFs are mostly suitable for high fan-in logic, where the cost of communication is better amortized over the low-cost computation.

Figure 3. SPWF 2-bit adder layout using two (3;2) counters.

Figure 4. NCSU 45nm PDK based 2-bit CMOS adder layout.

TABLE I. COMPARISON RESULTS SPWF VS. CMOS 2-BIT ADDER

| Fabric | Delay  | Area                                                          | Device count    |
|--------|--------|---------------------------------------------------------------|-----------------|
| CMOS   | ~400ps | $\approx 2 \times 6.5 \times 3.1 \approx 40\,\mu\text{m}^2$   | 64 transistors  |
| SPWF   | ~535ps | $\approx 8\lambda \times 15\lambda = 1.2\,\mu\text{m}^2$      | 11 ME cells     |

Figure 5. a) Delay of charge transfer at various technology nodes for minimum sized metal interconnects [19]. b) Delay vs. interconnect length with different transport mechanisms [18].
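The behaviour of an $(n;k)$ parallel counter and its use in the 2-bit adder can be sketched functionally. This is a behavioural model only, not a model of the SPWF layout in Fig. 3; the function names and the LSB-first bit ordering are assumptions made for illustration.

```python
def parallel_counter(bits):
    """Behavioural (n;k) parallel counter.

    Returns the number of 1s among the n input bits as k output bits,
    least significant bit first, where k = log2(n + 1) for n = 2^k - 1.
    """
    n = len(bits)
    k = n.bit_length()          # bits needed to represent counts 0..n
    count = sum(bits)
    return [(count >> i) & 1 for i in range(k)]

def two_bit_adder(x, y, carry_in=0):
    """Add two 2-bit integers with two chained (3;2) counters."""
    x0, x1 = x & 1, (x >> 1) & 1
    y0, y1 = y & 1, (y >> 1) & 1
    # Each (3;2) counter acts as a full adder: outputs are (sum, carry).
    s0, c1 = parallel_counter([x0, y0, carry_in])
    s1, c2 = parallel_counter([x1, y1, c1])
    return s0 | (s1 << 1) | (c2 << 2)

print(two_bit_adder(3, 2))                 # 3-bit result of 3 + 2
print(len(parallel_counter([1] * 511)))    # output width of a (511;9) counter
```

The second counter waits on the carry from the first, which is the ripple path whose communication delay the text identifies as the weak point for small circuits.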

## III. NEED FOR HYBRID SPIN-CHARGE FABRIC

System delay mainly consists of the computational delay and the communication delay. We envision computation using SPWFs (spin domain) and communication using conventional metal interconnects (charge domain). Several recent analyses have been provided on metal interconnect scaling trends for current and future technology nodes [13][14][15][16][17]. Evaluation of transport parameters for spin propagation and electric charge indicates that spin propagation is inferior to charge transfer [16]. Group velocity and wave attenuation control the delay and power consumption of spin waves. Experimental results have shown that the maximum propagation velocity of spin waves is about $10^4$ m/s [7]. This is considerably slower than the charge propagation speed in metal interconnects. Fig. 5.a shows the signal delays at various technology nodes for minimum sized interconnects [19]. Fig. 5.b shows the comparison of different transport mechanisms for varying interconnect lengths [18]. At the 45nm technology node, the signal delay over a minimum sized interconnect of length 1µm is about 10ps in the charge domain, while it is about 100ps in the spin domain (10X slower). Moreover, with increased fan-in, the size of SPWFs increases and communication becomes the bottleneck. Thus, while high fan-in logic may be efficiently accomplished in the spin domain, we believe that communication is better accomplished in the charge domain.
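The 100ps figure follows directly from the quoted spin wave velocity, as the short calculation below shows. The constant and function names are hypothetical, and the 10ps charge-domain value is simply the number quoted above for a 1µm minimum sized interconnect at 45nm, not something derived here.

```python
SPIN_WAVE_VELOCITY_M_PER_S = 1e4   # maximum spin wave velocity quoted from [7]
CHARGE_DELAY_PS_AT_1UM = 10.0      # charge-domain delay quoted for 1 um at 45nm

def spin_delay_ps(length_um, velocity_m_per_s=SPIN_WAVE_VELOCITY_M_PER_S):
    """Time of flight of a spin wave over length_um micrometres, in picoseconds."""
    length_m = length_um * 1e-6
    return length_m / velocity_m_per_s * 1e12

spin_ps = spin_delay_ps(1.0)
print(f"spin delay over 1 um: {spin_ps:.0f} ps")
print(f"spin / charge ratio:  {spin_ps / CHARGE_DELAY_PS_AT_1UM:.0f}X")
```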



Figure 6. Simplified layout of a (511;9) parallel counter using 9 high fan-in SPWFs. In this example, the max fan-in of the SPWFs is 519, which includes 511 input bits and 8 feedback signals. The picture also highlights the interconnects on the critical path that can be realized using the proposed hybrid 3D approach. In this approach, ME cells connect to standard pins that in turn connect to the metal layers, similar to CMOS.

## IV. CASE STUDIES

In this section, we provide our initial analysis of the projected benefits of such hybrid logic for arithmetic circuits like adders and parallel counters. Many other arithmetic circuits and cryptographic algorithms would similarly gain from this approach. Benefits are evaluated based on the length of the SWB interconnect on the critical path of these designs. It should be noted that a delay reduction of 10X is assumed for metal vs. SWB interconnects, as per the arguments presented in section III.

#### A. Hybrid Parallel Counters

As mentioned in section II, parallel counters are mainly used in the design of fast parallel multipliers for partial product reduction. Several algorithms have been proposed to identify the optimal type of parallel counter and the corresponding reduction sequence for partial product reduction [20]. With the use of large parallel counters, the number of reduction steps decreases. However, with increased counter size, the complexity of a CMOS-based implementation increases significantly. Thus, in practice, counter sizes are limited to (7;3), or a variant of counters, namely (4;2) compressors, is used. Since SPWFs enable efficient realization of high fan-in logic, higher order counters can be efficiently implemented using SPWFs. But, as mentioned earlier, due to the large size of the SPWFs and the slow spin wave propagation velocity, interconnect delays may limit the overall performance.

Fig. 6 shows a simplified design of (511;9) counter using SPWFs. Equation (1) gives the interconnect length ( $L_{PC}$ ) along the critical path for a given counter of size (N), ME cell width ( $L_{ME}$ ) and spacing (S). It was observed that for a (511;9) counter, a delay reduction of up to 8.6ns is possible with the hybrid logic (without pipelining). Fig. 8.a also shows that even higher benefits can be obtained as the counter size increases.

$$L_{PC} = (K-1) \left[ \frac{(N+K)(L_{ME}+S)}{2\pi} \right], \quad \text{where } K = \log_2(N+1) \tag{1}$$
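As a rough check of eq. 1, the sketch below evaluates $L_{PC}$ for the (511;9) counter and converts it into a delay reduction. It assumes the ME cell width (100nm) and spacing (45nm) given in the caption of Fig. 8, and it assumes that delay scales linearly with length at the per-micrometre figures from section III (100ps/µm in the SWB, 10ps/µm in metal). The function names are hypothetical.

```python
import math

def counter_critical_path_um(n_inputs, me_width_nm=100.0, spacing_nm=45.0):
    """Critical-path interconnect length L_PC of eq. 1, in micrometres."""
    k = math.log2(n_inputs + 1)
    length_nm = (k - 1) * (n_inputs + k) * (me_width_nm + spacing_nm) / (2 * math.pi)
    return length_nm / 1000.0

def delay_reduction_ns(length_um, spin_ps_per_um=100.0, charge_ps_per_um=10.0):
    """Delay saved by replacing an SWB of this length with metal (linear model)."""
    return length_um * (spin_ps_per_um - charge_ps_per_um) / 1000.0

length = counter_critical_path_um(511)
print(f"L_PC = {length:.1f} um")
print(f"delay reduction = {delay_reduction_ns(length):.2f} ns")
```

Under these assumptions the critical path is about 96µm long, and the resulting reduction agrees with the 8.64ns reported for the (511;9) counter.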

#### B. Hybrid CLA adders

The fundamental principle in CLA adders is to generate all intermediate carries in parallel. The operation of the CLA adders is based on the generate (G) and the propagate signals (P). The general expression for carry generation is given by equation 2.

$$C_{i+1} = G_i + G_{i-1}P_i + G_{i-2}P_{i-1}P_i + \dots + C_0P_0P_1\dots P_i \tag{2}$$

where $G_i = x_i \cdot y_i$ and $P_i = x_i + y_i$.

Thus, all intermediate carries can be generated in parallel using the above 2-level logic equation. However, since the fan-in of CMOS gates is practically limited to 3 or 4, multi-level carry look-ahead units are generally used for high bit-width adders. In general, for an $N$-bit adder with $k$ as the maximum individual gate fan-in, the number of levels ($m$) of CLA is given by $m = \log_k N$.
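Eq. 2 and the level count can be made concrete with a small behavioural sketch. Each carry is computed as an independent sum of products, with no dependence on the previous carry, which is what makes the 2-level form parallel. The function names are hypothetical, bits are ordered LSB first, and `cla_levels` rounds $\log_k N$ up to a whole number of levels, which is an assumption for sizes that are not exact powers of $k$.

```python
def cla_carries(x_bits, y_bits, c0=0):
    """Carries C_1..C_N from eq. 2, each as its own sum of products (LSB first)."""
    g = [a & b for a, b in zip(x_bits, y_bits)]   # generate:  G_i = x_i AND y_i
    p = [a | b for a, b in zip(x_bits, y_bits)]   # propagate: P_i = x_i OR y_i
    carries = []
    for i in range(len(g)):
        # Product terms G_j * P_{j+1} * ... * P_i for j = 0..i
        terms = [g[j] & int(all(p[j + 1:i + 1])) for j in range(i + 1)]
        # Last term of eq. 2: C_0 * P_0 * ... * P_i (the highest fan-in AND)
        terms.append(c0 & int(all(p[:i + 1])))
        # Final OR over all product terms gives C_{i+1}
        carries.append(int(any(terms)))
    return carries

def cla_levels(n_bits, max_fan_in):
    """Number of look-ahead levels m = log_k N, rounded up to an integer."""
    levels, reach = 0, 1
    while reach < n_bits:
        reach *= max_fan_in
        levels += 1
    return levels

x, y = 0b1011, 0b0110
bits = lambda v: [(v >> i) & 1 for i in range(4)]
print(cla_carries(bits(x), bits(y)))   # carries out of bit positions 0..3
print(cla_levels(1024, 4))             # levels needed with fan-in 4 gates
```

For the top carry of an $N$-bit adder, the final OR and the last product term both need a fan-in on the order of $N$, which is why fan-in limited CMOS gates force a multi-level structure.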



Figure 7. Simplified carry look ahead generation block for CLA adders. In this case, SPWFs with fan-in equal to the adder bit-width will be required. The interconnect along the critical path that can be realized using hybrid approach is also highlighted here.

However, adders based on SPWFs (with support for high fan-in) can realize eq. 2 directly as 2-level logic. The OR function and the last term in the SOP equation (eq. 2) have the maximum fan-in, and these terms require the largest SPWFs in the CLA adder. Fig. 7 shows the simplified SPWF-based implementation of the carry look-ahead equation. The delay of this structure is mainly limited by the two largest SPWFs shown in Fig. 7 and the interconnect distance between them. For such a design, we use a hybrid metal interconnect to improve the overall performance. Eq. 3 gives the length ($L_{CLA}$) along the critical path that is replaced with metal interconnects for an N-bit CLA adder. The corresponding delay reductions for various sizes of CLA adders are shown in Fig. 8.b.

$$L_{CLA} = \frac{N(L_{ME} + S)}{\pi} \tag{3}$$
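Eq. 3 can be evaluated in the same way as the counter case. The sketch below assumes the ME cell width (100nm) and spacing (45nm) from the caption of Fig. 8 and a linear delay model with the section III figures (100ps/µm in the SWB, 10ps/µm in metal); the function names are hypothetical.

```python
import math

def cla_critical_path_um(n_bits, me_width_nm=100.0, spacing_nm=45.0):
    """Critical-path interconnect length L_CLA of eq. 3, in micrometres."""
    return n_bits * (me_width_nm + spacing_nm) / math.pi / 1000.0

def cla_delay_reduction_ns(n_bits, spin_ps_per_um=100.0, charge_ps_per_um=10.0):
    """Delay saved by routing the eq. 3 path in metal instead of SWB (linear model)."""
    saved_ps_per_um = spin_ps_per_um - charge_ps_per_um
    return cla_critical_path_um(n_bits) * saved_ps_per_um / 1000.0

for n in (64, 256, 1024):
    print(f"N = {n:4d}: L_CLA = {cla_critical_path_um(n):6.2f} um, "
          f"reduction = {cla_delay_reduction_ns(n):.2f} ns")
```

For $N = 1024$ this gives a path of roughly 47µm and a reduction of about 4.25ns, in line with the ~4.2ns quoted for the 1024-bit adder. Because $L_{CLA}$ is linear in $N$, the reduction grows in proportion to the bit-width under this model.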

It should be noted that such a hybrid approach may not be suitable or beneficial for all (local and global) interconnects in the design, because additional delay may be required to convert signals from the spin domain to the charge domain and vice versa. Moreover, current experimental progress indicates a relatively large ME cell delay of 100ps. This calls for additional analysis of the trade-offs between the granularity of such hybrid interconnects and the overall ME cell count.
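The granularity trade-off can be illustrated with a simple break-even estimate. The conversion overhead used below is an assumption, not a figure from this analysis: it charges one 100ps ME cell delay at each end of a metal link (200ps in total). The per-micrometre delays are the section III figures, applied with a linear model, and the function names are hypothetical.

```python
def hybrid_gain_ps(length_um, conversion_overhead_ps=200.0,
                   spin_ps_per_um=100.0, charge_ps_per_um=10.0):
    """Net delay saved by a metal link of this length; negative means a loss.

    conversion_overhead_ps is an assumed cost for spin-to-charge and
    charge-to-spin conversion (two ME cell delays of 100 ps each).
    """
    return length_um * (spin_ps_per_um - charge_ps_per_um) - conversion_overhead_ps

def break_even_length_um(conversion_overhead_ps=200.0,
                         spin_ps_per_um=100.0, charge_ps_per_um=10.0):
    """Link length above which the metal link is faster than the SWB."""
    return conversion_overhead_ps / (spin_ps_per_um - charge_ps_per_um)

print(f"break-even length: {break_even_length_um():.2f} um")
print(f"gain for a 1 um local link:   {hybrid_gain_ps(1.0):.0f} ps")
print(f"gain for a 96 um global link: {hybrid_gain_ps(96.0):.0f} ps")
```

Under these assumed numbers, short local links lose more in conversion than they gain in propagation, while long critical-path links such as those in the case studies remain clearly favourable.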



Figure 8. a) Delay reduction for various sizes of SPWF based parallel counters b) Delay reduction for various bit-width CLA adders (ME cell size = 100nm and spacing = 45nm). Plots show that higher order counters and high bit-width adders benefit increasingly more from the proposed hybrid nanofabric design.

## V. CONCLUSIONS

SPWF logic based on wave interference enables efficient realization of high fan-in functions and is one of the promising alternatives for building future nanoscale systems. While significant benefits can be expected with these designs, the delay on global interconnects can be the performance bottleneck. This is mainly due to the low propagation velocity of spin waves vs. metal interconnects. We propose hybrid designs with computation in the spin domain and communication in the charge domain, with ME cells providing the essential interface between the two domains. A critical-path-based analysis is provided for representative arithmetic circuits like CLA adders and parallel counters. Our analysis shows a delay reduction of up to 8.64ns along the critical path for a (511;9) parallel counter implemented in this fabric vs. a spin-wave-only implementation. Similar benefits are also shown for a CLA adder, with a ~4.2ns delay reduction for a 1024-bit CLA adder. In the future, we plan to analyse the trade-offs between the granularity of such hybrid interconnects, ME cell count, complexity and overall delay.

#### REFERENCES

- [1] S. Datta and B. Das, "Electronic analog of the electro-optic modulator," *Appl. Phys. Lett.*, vol. 56, pp. 665-667, 1990.
- [2] C. Moritz, T. Wang, P. Narayanan, M. Leuchtenburg, Y. Guo, C. Dezan, and M. Bennaser, "Fault-Tolerant Nanoscale Processors on Semiconductor Nanowire Grids," *Circuits and Systems I: Regular Papers, IEEE Transactions on*, vol. 54, pp. 2422-2437, 2007.
- [3] T. Wang, P. Narayanan, and C. A. Moritz, "Heterogeneous 2-level Logic and its Density and Fault Tolerance Implications in Nanoscale Fabrics," *IEEE Trans. on Nanotechnology*, vol. 8, no. 1, pp. 22-30, 2009.
- [4] Z.F. Wang, H. Zheng, Q.W. Shi, J. Chen, "Emerging nanocircuit paradigm: Graphene-based electronics for nanoscale computing," *IEEE/ACM Symposium on Nanoscale Architectures*, 2007.
- [5] P. L. McEuen, et al., "Single-Walled Carbon Nanotube Electronics," *IEEE Trans. Nanotechnology*, vol. 1, no. 1, pp.78-85, 2002.
- [6] A. Imre, G. Csaba, L. Ji, A. Orlov, G. Bernstein, and W. Porod, "Majority logic gate for Magnetic Quantum-Dot Cellular Automata," *Science*, vol. 311, No. 5758, pp. 205–208, 2006.
- [7] A. Khitun, M. Bao, and K. L. Wang, "Spin Wave Magnetic NanoFabric: A New Approach to Spin-based Logic Circuitry," *IEEE Trans. on Magnetics*, vol. 44, pp. 2141-53, 2008.
- [8] A. Khitun, D. E. Nikonov, M. Bao, K. Galatsis, and K. L. Wang, "Efficiency of spin-wave bus for information transmission," *IEEE Trans. Electron Devices*, vol. 54, no. 12, pp. 3418–3421, 2007.
- [9] M. Covington, T. M. Crawford, and G. J. Parker, "Time-resolved measurement of propagating spin waves in ferromagnetic thin films," *Physical Review Letters*, vol. 89, pp. 237202-1-4, 2002.
- [10] W. Eerenstein, N. D. Mathur, and J. F. Scott, "Multiferroic and magnetoelectric materials," *Nature*, vol. 442, pp. 759-65, 2006.
- [11] P. Shabadi et al., "Towards logic functions as the device," *IEEE/ACM NANOARCH*, pp. 11-16, 2010.
- [12] I. Koren, Computer Arithmetic Algorithms, 2nd Edition, A. K. Peters, Natick, MA, 2002.
- [13] International Technology Roadmap for Semiconductors, 2009 new edition. Available online at http://public.itrs.net/.
- [14] R. Ho, K.W. Mai, and M.A. Horowitz, "The future of wires," *Proceedings of the IEEE*, vol. 89, pp. 490-504, 2001.
- [15] R.H. Havemann and J.A. Hutchby, "High-performance interconnects: an integration overview," *Proceedings of the IEEE*, vol. 89, pp. 586-601, 2001.
- [16] S. Borkar, "Design challenges of technology scaling," *IEEE Micro*, vol. 19, pp. 23-29, 1999.
- [17] N. Magen, A. Kolodny, U. Weiser, and N. Shamir, "Interconnect-power dissipation in a microprocessor," *Proceedings of the 2004 International Workshop on System Level Interconnect Prediction*, 2004.
- [18] S. Rakheja, A. Naeemi, and J. D. Meindl, "Physical limitations on delay and energy dissipation of interconnects for post-CMOS devices," *IEEE IITC*, pp. 1-3, 2010.
- [19] M. Sellier et al., "Predictive Delay Evaluation on Emerging CMOS Technologies: A Simulation Framework," *IEEE ISQED*, pp. 492-497, 2008.
- [20] L.Dadda, "On Parallel Digital Multipliers," Alta Frequenza, 45(1976), 574-580.