# Architecting Connectivity for Fine-grained 3-D Vertically Integrated Circuits

Santosh Khasanvis, Mostafizur Rahman, Mingyu Li, Jiajun Shi, and Csaba Andras Moritz\* Dept. of Electrical and Computer Engineering, University of Massachusetts Amherst, MA - 01003 Email: andras@ecs.umass.edu\*

Abstract—Conventional CMOS technology is reaching fundamental scaling limits, and interconnection bottleneck is dominating IC power and performance. Migrating to 3-D integrated circuits, though promising, has eluded us due to inherent customization and manufacturing requirements in CMOS that are incompatible with 3-D organization. Skybridge, a fine-grained 3-D IC fabric technology was recently proposed towards this aim, which offers a paradigm shift in technology scaling and design. In this paper we present specifically architected core Skybridge structures to enable fine-grained connectivity in 3-D intrinsically. We develop predictive models for interconnect length distribution for Skybridge, and use them to quantify the benefits in terms of expected reduction in interconnect lengths and repeater counts when compared to 2-D CMOS in 16nm node. Our estimation indicates up to 10x reduction in longest global interconnect length vs. 16nm 2-D CMOS, and up to 2 orders of magnitude reduction in the number of repeaters for a design consisting of 10 million logic gates. These results show great promise in alleviating interconnect bottleneck due to a higher degree of connectivity in 3-D, leading to shorter global interconnects and reduced power and area overhead due to repeater insertion.

Keywords—3D Integration, Skybridge, Vertical integration, Connectivity bottleneck, Interconnect distribution

## I. INTRODUCTION

Conventional integrated circuits (ICs) using CMOS technology implement logic with gates placed in a two-dimensional array. This scheme limits the degree of connectivity, resulting in very long interconnects. In nanoscale technologies, interconnect delay is the dominant component of signal delay because it scales quadratically with interconnect length. To mitigate this, repeaters are used to break long interconnects into shorter segments, so that the delay scales linearly with length. While this method ameliorates the performance impact, it introduces significant area and leakage power overhead due to repeater insertion. Increasing wire resistance and delay in nanoscale technologies exacerbate this problem, leading to a significant increase in the number of repeaters required to maintain acceptable performance [1].

Due to the interconnect bottleneck and other fundamental scaling challenges associated with CMOS, migrating to the third dimension is seen as a way to advance scaling. Towards this end, we proposed a true 3-D fabric called Skybridge [2]. It offers fine-grained 3-D integration with vertically stacked gates interconnected in 3-D. This results in higher gate density than 2-D IC technologies [2] and shorter wires for signal propagation. In this paper, we present core fabric features in Skybridge, architected to enable 3-D connectivity intrinsically.

We develop 3-D predictive models for interconnect length distribution and repeater count estimation to enable quantifying the benefits of 3-D connectivity in large-scale Skybridge ICs, and compare with 2-D CMOS. We show that due to much shorter interconnection requirements, Skybridge drastically reduces the number of repeaters, which implies tremendous area and leakage power savings for large-scale ICs. This paper is organized as follows: Section II presents the core structures for 3-D connectivity in Skybridge. Section III is an overview of the methodology used to quantify connectivity benefits in Skybridge vs. 2-D CMOS. Sections IV and V present details on the predictive model used to estimate interconnect length distribution and for estimating repeater counts, followed by conclusion in Section VI.

## II. SKYBRIDGE: CORE FEATURES FOR 3-D CONNECTIVITY

Skybridge is a fine-grained true 3-D nanoscale fabric [2], designed with a fabric-centric mindset and providing an integrated solution for all technology challenges (Fig. 1A). It is built with a regular array of uniform vertical nanowires that forms the Skybridge template (Fig. 1B), which is functionalized by material deposition. All inserted structures for devices, circuits, connectivity, and thermal management are co-architected for 3-D compatibility and manufacturability [2]. Lithographic precision is required only for patterning the vertical nanowires, and the requirements have been shown to be more relaxed than for 2-D CMOS [2]. All active components/structures described in this work rely on multi-layer material deposition techniques, which are lower cost and can be controlled to a precision of a few angstroms. The manufacturing pathway for Skybridge ICs and experimental demonstrations are discussed in ref. [2].

Core components include Vertical Gate-All-Around (V-GAA) Junctionless transistors (Fig. 1C), which are stacked on nanowires to realize logic and memory. These form logic gates on each nanowire that need additional structures to enable signal routing in 3-D between inputs/outputs and for global signals. We define two key components for routing: Bridges and Coaxial Routing Structures (Fig. 1D). Bridges are metal lines that allow horizontal signal routing by forming links between adjacent nanowires. They can be placed at any height on nanowires, and span the required distance by hopping over intermediate nanowires, facilitated by Coaxial Routing Structures. These consist of concentric metal shells around nanowires separated by dielectric (Fig. 1E), a structure that is unique to Skybridge and enabled by its vertical integration approach. Fig. 1D shows an example: signal A is carried by the vertical nanowire and signal B is routed by Bridges; the Coaxial Routing Structure allows signal B to hop the nanowire and continue its propagation. Coaxial routing is enabled by specially configuring material structures for insulating oxide and contact metal. By controlling the thickness of the insulating oxide (SiO<sub>2</sub>), and by choosing a low workfunction metal (Ti) as contact, proper signal isolation can be achieved. The workfunction difference between Ti and *n*-doped Si is such that there is no carrier depletion; moreover, a thick layer of SiO<sub>2</sub> ensures no electron tunneling between the contact metal and the silicon nanowire. Manufacturability is discussed in ref. [2].

This material is based upon work supported by the National Science Foundation grant no. 1407906 at UMass Amherst.

Fig. 1. (A) Abstract view of Skybridge 3-D integrated fabric; (B-D) Core components: (B) Template array of regular vertical Si nanowires; (C) Vertical Gate-All-Around Junctionless nanowire transistor, (D) 3-D Connectivity features: Bridges and Coaxial Routing Structure; (E) Top view of Coaxial Routing structure showing concentric metal shells (NW – Nanowire; M1, M2 – Metal shells; ILD – inter layer dielectric); and (F) Schematic of Skybridge routing scheme for large scale ICs: Local interconnects are implemented using Bridges and Coaxial Routing Structures; Semi-global and global interconnects are implemented using metal layers on top of nanowire array.

Using multiple coaxial layers can route multiple signals and provide noise isolation. By configuring the Coaxial Routing Structure to incorporate a ground (GND) signal for noise shielding, coupling noise can be mitigated between coaxial routing shells. Fig. 1D illustrates this concept; the GND signal in between signal A and B acts as noise shield, and prevents coupling between these signals. Coaxial Routing Structures and signal carrying nanowires also allow routing in the vertical direction. These components work in conjunction for full 3-D connectivity while minimizing interconnect congestion.

For large-scale Skybridge integrated circuits, we define three interconnect tiers: *local* interconnects for close-proximity communication between logic gates, implemented intrinsically with Bridges and Coaxial Routing Structures; *semi-global* interconnects for intermediate-range communication; and *global* interconnects for long-distance communication across the chip, clocking and power distribution (Fig. 1F). Semi-global and global interconnects require larger aspect ratios and pitch than local interconnects, and are implemented using metal routing layers on top of the vertical nanowires. To minimize signal delays over long interconnects, semi-global and global wires can be segmented with repeaters.

## III. METHODOLOGY FOR ESTIMATING BENEFITS OF 3-D CONNECTIVITY IN SKYBRIDGE

Fine-grained 3-D connectivity in Skybridge is expected to lead to shorter wires than 2-D ICs for large-scale designs, which mitigates interconnect delays and ameliorates the impact of repeater insertion. In order to quantify these benefits over 2-D CMOS, we develop and use 3-D predictive interconnect modeling tailored to Skybridge for the 16nm technology node, and compare with 16nm 2-D CMOS. The methodology is outlined in Fig. 2. Several arithmetic circuits and a 3-D microprocessor were designed in Skybridge and 2-D CMOS [2]. Data from these circuits is used to derive parameters for the interconnect models. In addition, typical CMOS parameters from literature [14] are also considered for comparison. This yields the full interconnect distribution for Skybridge and 2-D CMOS, which quantifies the reduction in wire lengths.

Fig. 2. Methodology for predicting the interconnect length distribution in Skybridge and 2-D CMOS.

We then apply this distribution to evaluate the impact on repeater requirements in 3-D. This step requires identification of the boundaries between interconnect hierarchical levels, for which we use delay criterion. We then estimate the number of repeaters for each tier, based on optimal interconnect segment length for repeater insertion and the number of interconnects for a given length (from the interconnect length distribution). Subsequent sections provide details on this approach.

## IV. PREDICTIVE MODEL FOR INTERCONNECT DISTRIBUTION IN LARGE SCALE SKYBRIDGE ICS

We develop and use a predictive model for 3-D integrated circuits, tailored to Skybridge fabric, to estimate the interconnect length distribution for large-scale designs. Such predictive models have been used in literature for 2D CMOS [3] and stacked-die approaches [4]. We follow a similar high-level mindset in Skybridge by considering that all gates are distributed uniformly for a given design (Fig. 3), and the number of gates that can be vertically stacked on nanowires determines the number of gate layers in Skybridge. By following Rent's rule for terminal count estimation and through the use of intrinsic 3-D connectivity, this predictive model is adjusted and applied to Skybridge. Rent's parameters, fan-out/fan-in and gate-pitch reflect Skybridge's circuit-style, architecture, connectivity style and high gate density.

The wire-length distribution in an IC is determined by estimating (i) the number of interconnections $I(l)$ of length $l$ (using Manhattan routing measured in terms of gate-pitches) between a set of logic gate pairs, and (ii) the number of such logic gate pairs $M(l)$ separated by distance $l$. The total number of interconnections of length $l$ is then given by $f(l) = \Gamma \cdot I(l) \cdot M(l)$, where $\Gamma$ is a normalization constant. For 2-D CMOS with $L_{max}$ as the maximum interconnect length, the number of gate-pairs $M_{2D}(l)$ separated by distance $l$ is estimated by:

$$M_{2D}(l) = \begin{cases} \dfrac{l^3}{3} - l^2 L_{max} + \dfrac{l L_{max}^2}{2}, & 1 \le l < \dfrac{L_{max}}{2} \\[6pt] \dfrac{(L_{max} - l)^3}{3}, & \dfrac{L_{max}}{2} \le l < L_{max} \end{cases}$$
(1)

For 2-D CMOS, the longest interconnect  $L_{max}$  is the one that spans from one corner of a square IC to the opposite corner using Manhattan routing. If the total number of gates under consideration is  $N_{tot}$ , and assuming that they are distributed uniformly throughout the IC,  $L_{max}$  for 2-D CMOS is  $2(\sqrt{N_{tot}}-1)$  in units of gate-pitches (gate-pitch is defined as the average separation between adjacent gates). Eq. (1) can be extended to Skybridge, and is given by

$$M_{SB}(l) = G_z \cdot M_{2D}(l) + \sum_{i=1}^{G_z - 1} 2(G_z - i) \cdot M_{2D}(l - ip_z) \cdot u(l - ip_z),$$
(2)



Fig. 3. Procedure to estimate number of interconnects between a gate-pair separated by l gate-pitches in (A) 2-D CMOS ICs, and (B) 3-D Skybridge ICs. Here block A is the source gate and the destination gate belongs to block C. Block B contains all the gates lying between this gate pair. Gates are laid out both vertically and horizontally in Skybridge vs. only horizontal in CMOS.

where $G_z$ is the number of gates that can be accommodated vertically in Skybridge, $p_z$ is the vertical gate-pitch, and $u(\cdot)$ is the unit-step function. The maximum interconnect length spanning three dimensions using Manhattan routing is $2[\sqrt{N_{tot}/G_z} - 1] + (G_z - 1)p_z$. Here, we use $G_z=2$ for Skybridge [2].
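The gate-pair counts of eqs. (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the vertical pitch `p_z` is assumed to be expressed in units of horizontal gate-pitches, and the third term of the first branch of eq. (1) is taken as $l L_{max}^2 / 2$, the form for which the two branches meet at $l = L_{max}/2$.

```python
import math


def l_max_2d(n_tot):
    """Longest Manhattan interconnect (gate-pitches) for a square array of n_tot gates."""
    return 2.0 * (math.sqrt(n_tot) - 1.0)


def m_2d(l, l_max):
    """Eq. (1): number of gate pairs separated by l gate-pitches in a 2-D array."""
    if l < 1 or l >= l_max:
        return 0.0  # outside the range where eq. (1) is defined
    if l < l_max / 2.0:
        return l**3 / 3.0 - l**2 * l_max + l * l_max**2 / 2.0
    return (l_max - l) ** 3 / 3.0


def m_skybridge(l, l_max, g_z, p_z):
    """Eq. (2): gate pairs at distance l with g_z vertically stacked gate layers.

    l_max is the horizontal maximum, 2*(sqrt(N_tot/G_z) - 1).
    p_z is assumed to be given in horizontal gate-pitches.
    """
    total = g_z * m_2d(l, l_max)
    for i in range(1, g_z):
        shifted = l - i * p_z
        if shifted >= 0:  # unit step u(l - i*p_z)
            total += 2 * (g_z - i) * m_2d(shifted, l_max)
    return total
```

With `g_z = 1` the sum is empty and `m_skybridge` reduces to `m_2d`, which gives a quick consistency check between the two fabrics.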

The number of interconnects of length $l$ gate-pitches, $I(l)$, is estimated using Rent's rule as described below. For a partitioned design, Rent's rule relates the number of logic gates $N$ within a sub-module or logic block to the number of external signals or terminals $T$ of that block as $T = k N^p$. Here, $k$ is Rent's coefficient, defined as the average number of terminals per logic gate. Rent's exponent $p$ is an empirical parameter used to fit the observed data from circuits to the relationship above. Consider the group of gates shown in Fig. 3. For the gates under consideration in block A, there are several gates in block C that lie at a Manhattan distance of $l$ gate-pitches. By counting the number of terminals from logic block A to logic block C, we get the total interconnections from block A to block C. Using a partial Manhattan circle approximation [3] and taking the average fan-out (f.o.) into consideration, the average number of interconnects of length $l$ gate-pitches for a gate-pair in blocks A-C is given as follows.

$$I(l) = \frac{\alpha k}{N_C} [(N_A + N_B)^p - (N_B)^p + (N_B + N_C)^p - (N_A + N_B + N_C)^p].$$
(3)

Here k and p are Rent's parameters,  $\alpha = (f.o.)/(1+f.o.)$ , and  $N_A$  is set to 1. For 2-D CMOS,  $N_{A-2D} = 1$ ,  $N_{B-2D} = l(l-1)$  and  $N_{C-2D} = 2l$ . This can be extended to Skybridge as follows:

$$N_{A-SB}(l) = 1,$$
(4)

$$N_{B-SB}(l) = N_{B-2D}(l) + \frac{1}{G_z} \sum_{i=1}^{G_z-1} [2(G_z - i) \cdot N_{B-2D}(l - ip_z) \cdot u(l - ip_z)],$$
(5)

$$N_{C-SB}(l) = N_{C-2D}(l) + \frac{1}{G_z} \sum_{i=1}^{G_z-1} [2(G_z-i) \cdot N_{C-2D}(l-ip_z) \cdot u(l-ip_z)].$$
(6)

Substituting equations (4)-(6) in eq. (3) gives the expression to estimate the number of interconnections between gates separated by $l$ gate-pitches. If $I_{total}$ is the total number of interconnects [5], then the normalization constant $\Gamma$ is:

$$\Gamma = \frac{I_{total}}{\sum_{l=1}^{L_{max}} M(l) \cdot I(l)} = \frac{\alpha k N_{tot} \left(1 - N_{tot}^{\,p-1}\right)}{\sum_{l=1}^{L_{max}} M(l) \cdot I(l)}.$$
(7)
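As an illustration of how eqs. (3)-(7) fit together, the sketch below evaluates $I(l)$ for either fabric and the normalization constant $\Gamma$. It is a hedged example, not the authors' code: all function names are hypothetical, `p_z` is assumed to be given in horizontal gate-pitches, and the total interconnect count is taken as $\alpha k N_{tot}(1 - N_{tot}^{p-1})$.

```python
def n_b_2d(l):
    """Gates lying between the pair in 2-D: N_B = l*(l-1)."""
    return l * (l - 1) if l >= 1 else 0.0


def n_c_2d(l):
    """Destination gates at distance l in 2-D: N_C = 2*l."""
    return 2 * l if l >= 1 else 0.0


def _extend(n_2d, l, g_z, p_z):
    """Common form of eqs. (5) and (6): add contributions from other gate layers."""
    extra = sum(2 * (g_z - i) * n_2d(l - i * p_z)
                for i in range(1, g_z) if l - i * p_z >= 0)
    return n_2d(l) + extra / g_z


def interconnects_per_pair(l, k, p, fan_out, g_z=1, p_z=0):
    """Eq. (3) with N_A = 1; g_z = 1 corresponds to the 2-D CMOS case."""
    alpha = fan_out / (1.0 + fan_out)
    n_a = 1.0
    n_b = _extend(n_b_2d, l, g_z, p_z)
    n_c = _extend(n_c_2d, l, g_z, p_z)
    return (alpha * k / n_c) * ((n_a + n_b) ** p - n_b ** p
                                + (n_b + n_c) ** p - (n_a + n_b + n_c) ** p)


def total_interconnects(n_tot, k, p, fan_out):
    """I_total as used in the numerator of eq. (7)."""
    alpha = fan_out / (1.0 + fan_out)
    return alpha * k * n_tot * (1.0 - n_tot ** (p - 1.0))


def normalization(m_values, i_values, n_tot, k, p, fan_out):
    """Eq. (7): Gamma, given M(l) and I(l) sampled at l = 1 .. L_max."""
    denom = sum(m * i for m, i in zip(m_values, i_values))
    return total_interconnects(n_tot, k, p, fan_out) / denom
```

The distribution then follows as `gamma * m * i` for each length, so that summing it over all lengths recovers the total interconnect count by construction.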

### A. Determination of Model Parameters for Skybridge

a) Rent's Parameters: We use data from the designed Skybridge arithmetic circuits and microprocessor [2] to extract Rent's parameters. Using the definition of Rent's coefficient $k$ (average number of terminals per gate), we enumerate the gates and their terminal counts for all designed circuits, and calculate the average of the terminal counts. To estimate Rent's exponent $p$, we extract data-points by computing gate-counts $N$ and terminal-counts $T$ for sub-modules of circuits at various levels of hierarchy, and use curve fitting for the Rent's rule expression. Multiple terminal counts can occur for a given gate-count, since different circuits can have the same number of logic gates but differ in the number of I/O terminals depending on the function being realized. In that case we use the geometric mean of these data-points, since it has been statistically observed to track Rent's rule quite accurately [6]. The results of this analysis are shown in Table I.
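The extraction steps above can be sketched as follows. The text does not specify the curve-fitting procedure, so the log-log least-squares fit with $k$ held fixed is an assumption made purely for illustration, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def rent_coefficient(terminals_per_gate):
    """k: arithmetic mean of the per-gate terminal counts."""
    return sum(terminals_per_gate) / len(terminals_per_gate)


def rent_exponent(samples, k):
    """Fit p in T = k * N**p from (N, T) samples, with k held fixed.

    Samples that share a gate count N are merged by the geometric mean of T.
    The least-squares fit of log(T/k) against log(N) is an assumed procedure.
    """
    log_t_by_n = defaultdict(list)
    for n, t in samples:
        log_t_by_n[n].append(math.log(t))
    num = den = 0.0
    for n, log_ts in log_t_by_n.items():
        if n <= 1:
            continue  # log(N) = 0 carries no information about p
        x = math.log(n)
        # mean of logs = log of the geometric mean
        y = sum(log_ts) / len(log_ts) - math.log(k)
        num += x * y
        den += x * x
    return num / den
```

Averaging the logarithms of the terminal counts is equivalent to taking the logarithm of their geometric mean, which is why no explicit product is needed.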

b) Average Gate-Pitch and Fan-Out: Gate-pitch is defined as the average separation between adjacent logic gates. The interconnect prediction model described earlier for a 3-D fabric like Skybridge takes horizontal and vertical gate-pitch as parameters. We determine the average horizontal gate-pitch across all the designed logic circuits by considering the number of gates and the area occupied by the circuits. For each module, if the footprint area is A and it contains N gates with a vertical stacking of 2 gates, the horizontal gate pitch (G.P.) assuming uniform distribution of gates is calculated as:

$$G.P. = \sqrt{2A/N}.$$
(8)

For *m* modules under consideration, the net average horizontal gate pitch across all logic circuits is calculated as

$$G.P_{avg} = \frac{1}{m} \sum_{i=1}^{m} \sqrt{\frac{2A_i}{N_i}}.$$
(9)

The vertical gate-pitch  $p_z$  is determined by dividing the total height of nanowires by number of gates that can be stacked vertically (in this case 2). The average fan-out is determined by calculating the average fan-out of each module in the designed logic circuits, followed by taking the arithmetic mean across all modules. The parameters extracted using this method for Skybridge are shown in Table I.
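A small worked example of eqs. (8) and (9) and of the vertical pitch calculation is given below. The function names are hypothetical, and generalizing the factor 2 to a `stacked` parameter is an assumption; the text uses 2 stacked gates throughout.

```python
import math


def horizontal_gate_pitch(area, n_gates, stacked=2):
    """Eq. (8): gate pitch for a module of footprint `area` holding `n_gates` gates."""
    return math.sqrt(stacked * area / n_gates)


def average_gate_pitch(modules, stacked=2):
    """Eq. (9): arithmetic mean of per-module pitches; modules is a list of (area, n_gates)."""
    pitches = [horizontal_gate_pitch(a, n, stacked) for a, n in modules]
    return sum(pitches) / len(pitches)


def vertical_gate_pitch(nanowire_height, stacked=2):
    """Vertical pitch: nanowire height divided by the number of stacked gates."""
    return nanowire_height / stacked
```

For instance, a hypothetical module with a footprint of 45000 nm² holding 4 gates gives a horizontal pitch of $\sqrt{2 \cdot 45000 / 4} = 150$ nm.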

For comparison with 2-D CMOS, we use two sets of parameters for the predictive models for a comprehensive evaluation. The first set of parameters is extracted from designed circuits that were used to compare with Skybridge [2]. In addition, typical values for Rent's parameters and average fan-out are taken from literature [3] for microprocessors, and both sets of parameters (see Table I) are used to derive CMOS interconnect distributions.

|                               | Rent's coefficient k | Rent's exponent p | Avg. horizontal gate-pitch G.P.<sub>avg</sub> (nm) | Vertical pitch p<sub>z</sub> (nm) | Avg. fan-out f.o. |
|-------------------------------|----------------------|-------------------|----------------------------------------------------|-----------------------------------|-------------------|
| CMOS (from designed circuits) | 3.42                 | 0.47              | 803.87                                             | NA                                | 1.7               |
| CMOS (from [3])               | 4                    | 0.66              |                                                    |                                   | 3                 |
| Skybridge                     | 5.39                 | 0.57              | 150                                                | 448                               | 2.018             |

TABLE I. PARAMETERS FOR INTERCONNECT PREDICTION MODELS

## V. REPEATER COUNT ESTIMATION

### A. 2-D CMOS Fabric

CMOS integrated circuit interconnects are typically classified into three tiers – *global*, *semi-global*, and *local* interconnects. Each tier is characterized by wiring parameters such as pitch, aspect ratio and material choice. These parameters affect the signal delay when using a particular tier to communicate between gates. To mitigate propagation delay, long interconnects are broken down into shorter segments driven by repeaters (static inverters, see Fig. 4A) [7].

a) Interconnect Delay Models: The equivalent RC circuit used to model the delay of each interconnect segment of length l is shown in Fig. 4B. If  $R_{tr}$  is the resistance of the driver transistor having a parasitic output capacitance  $C_P$ , r and c are the resistance and capacitance per unit length for the interconnect respectively, and  $C_L$  is the load capacitance of the next stage, then the delay ( $\tau$ ) of this segment is given by:

$$\tau = b(x) \cdot R_{tr} (C_L + C_P) + b(x) \cdot (cR_{tr} + rC_L) \cdot l + a(x) \cdot rcl^2 .$$
(10)

Here  $R_{tr}$ ,  $C_P$  and  $C_L$  can be expressed in multiples of resistance  $r_0$ , parasitic output capacitance  $c_p$  and input capacitance  $c_0$  respectively of a minimum sized inverter. If the size of the driver is *s* times the minimum inverter, then  $R_{tr} = r_0/s$ ,  $C_L = s.c_0$  and  $C_P = s.c_p$ . In eq. (10), *a* and *b* are constants determined by the voltage swing *x* being considered (see Table II). CMOS static logic delay is typically characterized by the propagation delay, which considers a 50% output voltage swing (i.e. *x*=0.5). For a given set of wiring parameters, optimal interconnect length  $l_{opt}$  and optimal repeater size  $s_{opt}$  (in multiples of minimum-sized inverters) can be determined to minimize the overall delay by the following expressions [8].

$$l_{opt} = \sqrt{\frac{b(x).r_0.(c_0 + c_p)}{a(x).r.c}}; \text{ and } s_{opt} = \sqrt{\frac{r_0c}{rc_0}}$$
(11)

The total delay of a full interconnect as a function of its length *l* consisting of *n* such segments is then simply  $\tau_d(l) = n.\tau$ , where  $\tau$  is the delay of each segment given by eq. (10). Since the repeaters are much larger than minimum-sized inverters, cascaded drivers are typically used (see Fig. 4A). Transistor parameters ( $r_0$ ,  $c_0$ , and  $c_p$ ) were extracted for 16nm PTM FinFET models [9][10]. The interconnect resistance parameters (see Table III) were taken from ITRS specifications for 16nm technology node [11], and capacitance parameters were derived using PTM Interconnect RC models [12] which takes into account both ground and coupling capacitances.
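The delay model of eq. (10) and the optimum of eq. (11) can be checked numerically with a short sketch. The function names are hypothetical, the default `a` and `b` are the 50% swing values from Table II, and any numeric device or wire values passed in are placeholders chosen by the user, not the extracted 16nm parameters.

```python
import math


def segment_delay(l, s, r0, c0, cp, r, c, a=0.4, b=0.7):
    """Eq. (10) for a segment of length l driven by a repeater of size s.

    Uses R_tr = r0/s, C_L = s*c0 and C_P = s*cp, as in the text.
    """
    r_tr, c_l, c_p = r0 / s, s * c0, s * cp
    return (b * r_tr * (c_l + c_p)
            + b * (c * r_tr + r * c_l) * l
            + a * r * c * l**2)


def optimal_segment(r0, c0, cp, r, c, a=0.4, b=0.7):
    """Eq. (11): returns (l_opt, s_opt)."""
    l_opt = math.sqrt(b * r0 * (c0 + cp) / (a * r * c))
    s_opt = math.sqrt(r0 * c / (r * c0))
    return l_opt, s_opt
```

Dividing `segment_delay` by `l` gives the delay per unit length, which is the quantity that `l_opt` and `s_opt` minimize; perturbing either value away from the optimum should therefore increase it.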

TABLE II. PARAMETERS FOR DELAY MODELING [7]

|                                  | a(x) | b(x) |
|----------------------------------|------|------|
| Propagation Delay (50% swing)    | 0.4  | 0.7  |
| Fall/Rise Time (10% - 90% swing) | 0.9  | 2.2  |



Fig. 4. Repeater insertion in 2-D CMOS. (A) Segmentation of long interconnects and repeater insertion. Cascaded drivers are used to drive the large repeaters. Here, multiples indicate the size of the driver in units of minimum inverter size. (B) Equivalent RC circuit used to model the delay of each wire segment driven by a repeater.

| 16nm<br>Node    | Effective Resistivity<br>(µOhm-cm) | Wire Aspect<br>Ratio | Wire Pitch<br>(nm) |
|-----------------|------------------------------------|----------------------|--------------------|
| Global          | 5.26                               | 2.34                 | 152                |
| Semi-<br>Global | 6.96                               | 2                    | 76                 |
| Local           | 6.96                               | 2                    | 38                 |

TABLE III. WIRING PARAMETERS FOR 16NM TECHNOLOGY [11]

b) Classification of Interconnects in the Predicted Interconnect Distribution: The interconnect distribution can be classified into different tiers by estimating the longest interconnect for a given tier. This is determined based on the maximum allowed delay, expressed as a fraction $\beta$ of the clock period [13]. Since global signals are expected to have large delay while propagating over large distances, they are typically allowed to use 90% of the clock period ($\beta$=0.9) for signal propagation alone. Local and semi-global wires are typically allowed to use 25% of the clock period ($\beta$=0.25) for propagation delay, while accommodating delay due to intermediate logic stages during the remaining time. The longest global wire $L_{max-global}$ can be determined from the interconnect distribution as the length $l$ for which $f(l) = 1$. Using this as a baseline, the longest interconnects in the local and semi-global tiers can be estimated using the delay criterion as follows.

$$\frac{\tau(L_{max-local})}{\tau(L_{max-global})} = \frac{\beta_{local} \cdot T_{clock}}{\beta_{global} \cdot T_{clock}} = \frac{\beta_{local}}{\beta_{global}}$$

$$\tau(L_{max-local}) = \frac{\beta_{local}}{\beta_{global}} \tau(L_{max-global})$$
(12)

$$\tau(L_{max-semi-global}) = \frac{\beta_{semi-global}}{\beta_{global}} \tau(L_{max-global})$$
(13)
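One way to apply the delay criterion of eqs. (12) and (13) is to solve numerically for the length at which a tier's delay reaches its allowed fraction of the global delay. The sketch below is an assumption about how this could be done, not the authors' procedure: it uses bisection, requires each tier's delay function to increase with length, and all names are hypothetical.

```python
def longest_in_tier(delay_fn, target_delay, tol=1e-9):
    """Largest length whose delay does not exceed target_delay.

    delay_fn must be an increasing, unbounded function of length.
    """
    lo, hi = 0.0, 1.0
    while delay_fn(hi) < target_delay:  # grow the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if delay_fn(mid) <= target_delay:
            lo = mid
        else:
            hi = mid
    return lo


def tier_boundary(delay_tier, delay_global, l_max_global, beta_tier, beta_global=0.9):
    """Eqs. (12)/(13): longest wire in a tier from the global baseline delay."""
    target = (beta_tier / beta_global) * delay_global(l_max_global)
    return longest_in_tier(delay_tier, target)
```

With `beta_tier=0.25` this corresponds to the local and semi-global criteria stated above; the delay functions themselves would come from the segment delay model with each tier's wire parameters.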

c) Repeater Count Estimation: In any given tier, interconnects whose lengths are between $l_{opt}$ and $L_{max}$ will have optimally sized repeaters inserted for minimizing propagation delay. The number of segments can be computed for a given interconnect length $l$ for that tier, which in turn yields the number of repeaters required, $R(l)$. Using the interconnect distribution $f(l)$, which estimates the total number of interconnects of a given length $l$, the total number of repeaters in a given tier $i$ can be estimated as follows.

$$R_{i} = \sum_{l=l_{opt-i}}^{L_{max}} f(l) \cdot R(l)$$
(14)
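The summation in eq. (14) can be illustrated with the short sketch below. The text does not give a closed form for $R(l)$, so one repeater per internal segment boundary, with segments no longer than $l_{opt}$, is assumed here; the function names and the dictionary representation of $f(l)$ are likewise hypothetical.

```python
import math


def repeaters_on_wire(length, l_opt):
    """R(l): assumed to be (number of segments - 1), with segments no longer than l_opt."""
    if length <= l_opt:
        return 0  # short wires are left unsegmented
    return math.ceil(length / l_opt) - 1


def repeaters_in_tier(distribution, l_opt, l_max):
    """Eq. (14) for one tier.

    distribution maps an interconnect length l to the number of wires f(l);
    only lengths between l_opt and l_max contribute.
    """
    return sum(count * repeaters_on_wire(l, l_opt)
               for l, count in distribution.items()
               if l_opt <= l <= l_max)
```

Because $f(l)$ falls off steeply with length, the total is typically dominated by the wires just above $l_{opt}$, so shortening the distribution reduces the repeater count quickly.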

### B. Skybridge Fabric

In Skybridge, we similarly define different tiers for interconnects (see Section II). The wire parameters for the local tier are based on the minimum feature size for the 16nm node, and those for the semi-global and global tiers are assumed to be the same as for 16nm CMOS (see Table III).

a) Interconnect Delay Model: The interconnection scheme for different tiers in Skybridge is shown in Fig. 5. Semi-global and global tiers use repeaters, implemented through coarse-grained heterogeneous integration with islands of CMOS inverters. Local interconnects, implemented with Bridges and Coaxial Routing Structures, are not segmented since fine-grained integration with CMOS inverters would incur area overhead. Delay modeling is similar to the method described earlier, where the delay for each segment is calculated using eq. (10). Skybridge uses a dynamic circuit style [2], and the switching model considers the fall-time at the output, i.e. the 10% to 90% voltage swing. This fall-time delay should be accommodated within the evaluation clock period for correct functionality of cascaded logic gates. The parameters $a$ and $b$ for Skybridge are shown in Table II.

b) Classification of Interconnects in the Predicted Interconnect Distribution: Since Skybridge uses dynamic circuits [2], the entire evaluation clock period is devoted to the total delay of a gate driving an interconnect and gate load. Thus for all tiers, $\beta = 1$, which implies that the delay of the longest interconnect in any tier is the same. The methodology described before for 2-D CMOS was used to classify interconnects to each tier based on the delay criterion.

Fig. 5. Interconnection scheme in Skybridge. (A) Global/semi-global interconnects using static CMOS repeaters; and (B) Local interconnects. Repeaters are not used for local interconnects.

Fig. 6. Predicted interconnect length distribution (A) – (C) and estimated repeater counts (D) – (F) in Skybridge vs. 2-D CMOS. Here, number of gates $N_{CMOS} = 10^7$. Number of gates in Skybridge are (A) $N_{SB} = 0.5 \times 10^7$ and corresponding repeater counts in (D); (B) $N_{SB} = 0.75 \times 10^7$ and corresponding repeater counts in (E); and (C) $N_{SB} = 10^7$ and corresponding repeater counts in (F). Parameters for Skybridge: k=5.39, p=0.577 (Rent's parameters), average fan-out = 2.018. For CMOS, Parameter Set 1: k=4, p=0.66, average fan-out = 3; and Parameter Set 2: k=3.416, p=0.473, average fan-out = 1.7.

c) Repeater count estimation: The methodology for estimating the total number of repeaters for a given integrated circuit is the same as described earlier.

## VI. EVALUATION AND CONCLUSION

We compare the system-level implications of Skybridge with 2-D CMOS for a design with 10 million gates. In particular, we look at the effect of reduced interconnect lengths in Skybridge on repeater count. Since Skybridge supports high fan-in circuits [2], the atomic logic gates are much more expressive than CMOS and are expected to result in fewer gates for a given high bit-width function. Thus, we analyze three different scenarios – if  $N_{CMOS}$  is the total number of gates considered for a 2-D CMOS IC, the number of gates for Skybridge implementation  $(N_{SB})$  is varied between  $0.5N_{CMOS}$ ,  $0.75N_{CMOS}$  and  $N_{CMOS}$ . The results are shown in Fig. 6. Here, we can see that the longest interconnect in Skybridge is significantly shorter when compared to 2-D CMOS, specifically up to 10x shorter. We also see that there is up to 2 orders of magnitude reduction in the total number of repeaters required in Skybridge. This implies tremendous reduction in terms of area overhead of repeaters as well as leakage power savings for larger designs such as multi-core processors.

## REFERENCES

- [1] R. Puri, and D. S. Kung, "The dawn of 22nm era: Design and CAD challenges," in Proceedings of 23rd International Conference on VLSI Design, pp. 429-433, 2010.
- [2] M. Rahman, S. Khasanvis, J. Shi, M. Li, and C. A. Moritz, "Skybridge: 3-D Integrated Circuit Technology Alternative to CMOS". Available Online: http://arxiv.org/abs/1404.0607
- [3] J. A. Davis, V. K. De, and J. Meindl, "A stochastic wire-length distribution for gigascale integration (GSI)—Part I: Derivation and validation," *IEEE Transactions on Electron Devices*, vol. 45, pp. 580– 589, 1998.
- [4] A. Rahman, R. Reif, "System-level performance evaluation of three-dimensional integrated circuits," *IEEE Transactions on Very Large Scale Integration Systems*, vol. 8, pp. 671-678, 2000.
- [5] W. Donath, "Placement and average interconnection lengths of computer logic," *IEEE Transactions on Circuits and Systems*, vol. 26, pp. 272-277, 1979.
- [6] P. Christie, and D. Stroobandt, "The interpretation and application of Rent's rule," *IEEE Transactions on Very Large Scale Integration Systems*, vol. 8, pp. 639-648, 2000.
- [7] H. Bakoglu, Circuits, Interconnects and Packaging for VLSI, Addison-Wesley, Boston, 1990.
- [8] R. H. J. M. Otten, and R. K. Brayton, "Planning for performance," in Proceedings of 35th Annual Design Automation Conference, pp. 122–127, 1998.
- [9] Arizona State University. PTM-MG device models for 16nm node. http://ptm.asu.edu/. 2011.
- [10] S. Sinha, G. Yeric, V. Chandra, B. Cline, and Y. Cao, "Exploring sub-20nm FinFET design with Predictive Technology Models," in Proceedings of 49th ACM/EDAC/IEEE Design Automation Conference, pp. 283-288, 2012.
- [11] ITRS. ITRS 2012 Interconnect Tables. http://itrs.net/. 2012.
- [12] Arizona State University. PTM R-C Interconnect models. http://ptm.asu.edu/. 2012.
- [13] J. A. Davis, V. K. De, and J. D. Meindl, "A stochastic wire-length distribution for gigascale integration (GSI)—Part II: Applications to clock frequency, power dissipation, and chip size estimation," *IEEE Transactions on Electron Devices*, vol. 45, 1998.