# Design and Benchmarking of Hybrid CMOS-Spin Wave Device Circuits Compared to 10nm CMOS

Odysseas Zografos\*†, Bart Sorée\*†, Adrien Vaysset\*†, Stefan Cosemans\*, Luca Amarù‡, Pierre-Emmanuel Gaillardon‡, Giovanni De Micheli‡, Rudy Lauwereins\*†, Safak Sayan\*§, Praveen Raghavan\*, Iuliana P. Radu\* and Aaron Thean\*

\*imec, Leuven, Belgium - Email: zogra@imec.be

†Department of Electrical Engineering, KU Leuven, Belgium

‡Integrated Systems Laboratory, EPFL, Switzerland

§Intel Corporation, Santa Clara, CA, USA

Abstract—In this paper, we present a design and benchmarking methodology for Spin Wave Device (SWD) circuits based on micromagnetic modeling. SWD technology is compared against a 10nm FinFET CMOS technology, considering the key metrics of area, delay and power. We show that SWD circuits outperform the 10nm CMOS FinFET equivalents by a large margin. The area-delay-power product (ADPP) of SWD is smaller than CMOS for all benchmarks from 2.5× to 800×. On average, the area of SWD circuits is 3.5× smaller and the power consumption is two orders of magnitude lower compared to the 10nm CMOS reference circuits.

### I. INTRODUCTION

Novel beyond-CMOS devices and circuits are actively being studied to expand the functionality of future nano-electronics while overcoming their power limits. Spin-based logic, with its propensity for non-volatility, intrinsic data parallelism, and high endurance, is among the popular emerging device-circuit architectures. Spin Wave Devices (SWDs), introduced in [1], are logic components that utilize the oscillations of magnetization in ferromagnetic materials. One of the most promising concepts of circuit design using SWDs was presented in [2], and in [3] it was put forward as a competitive option to CMOS.

The operating principle of these circuits relies on a synthetic multiferroic stack used to generate and detect spin waves, called a Magneto-Electric (ME) cell [2]. The generated spin waves propagate in ferromagnetic wires, called spin wave buses. The computation principle is based on the interference of propagating spin waves, where the information is encoded in the phase of the waves. To gain insights into the device-circuit-system interactions for such a radically different information processing scheme, it is important to develop means to compare it against the familiar sequential CMOS logic of today. Hence, the purpose of this work is to quantify the benefits of the SWD technology compared to a state-of-the-art CMOS technology [4].

Spin wave generation, detection and propagation have been studied both experimentally [5], [6] and with simulations and modeling [7]–[9]. However, none of these publications explores the possibility of using ME cells for spin-wave-based logic circuits. The concept of SWD circuits was introduced in [2], but its modeling at nano-scaled dimensions was not studied with the accuracy of micromagnetic simulations. In [10], the authors use SWDs to compose circuits but assume unrealistic capabilities of ME cells, without addressing their feasibility. Finally, [3] introduces a first-order benchmarking of SWDs as circuit components but ignores any overhead from CMOS sensing circuitry.

In this paper, we provide a thorough analysis of SWD circuits ranging from modeling and simulations to circuit architecture, synthesis and benchmarking. The main contributions of this work are: (1) thorough micromagnetic simulations of the magnetic behavior of nano-scaled ME cells as circuit components; (2) use of Majority synthesis (MIG) [11] to exploit the native majority gate of SWDs; (3) benchmarking with large combinational designs, including specifically designed sense amplifiers as output sensing circuitry of the SWD designs.

The remainder of this paper is organized as follows. In Section II, we describe the operating principle of the ME cell. The voltage modeling and the micromagnetic simulation setup and results are described in Section III. Section IV summarizes the system-level benchmarking methodology we used. In Section V, we present the area, delay and power comparative results against 10nm CMOS reference designs. Section VI concludes the paper.

### II. MAGNETO-ELECTRIC CELL OPERATION

The basic component used as input and output structure for SWDs is the Magneto-Electric cell (ME cell). It is proposed and modeled analytically in [2]. In Fig. 1a, a schematic view of the ME cell is given with NiFe as the ferromagnetic material comprising the spin wave bus.



Fig. 1. (a) ME cell schematic and (b) formation of canted magnetization in the magnetostrictive layer in the ME cell.

The ME cell consists of a bottom magnetostrictive layer (Ni), in which the spin wave produces a strain that in turn is translated into voltage through the piezoelectric layer (PE) and is read out via the top contact layer. The inverse process is used to generate spin waves that propagate through the spin wave bus. The magnetization of the magnetostrictive layer is bi-stable and can switch between two canted magnetization states. The phase of the spin wave generated depends on the canted state (state '0' produces a wave with phase $\phi=0$, state '1' produces a wave with phase $\phi=\pi$). Figure 1b presents the canting of magnetization ($\vec{m}$) in the magnetostrictive layer of the ME cell. The spin wave bus (NiFe) is defined to have out-of-plane (along z-axis) magnetic anisotropy and the magnetostrictive layer (Ni) to have in-plane (along y-axis) anisotropy. This means that there will be a transition of the magnetization moving from the waveguide into the magnetostrictive layer. As a consequence, the magnetization will be tilted in the ME cell region. Two symmetric equilibrium states are thus possible, at $\theta_{ME}$ and $-\theta_{ME}$ with respect to the z-axis (Fig. 1b). The value of $\theta_{ME}$ will depend on the ratio between the anisotropies of the waveguide and the ME cell.

### III. ME CELL MODELING AND SIMULATIONS

#### A. Output Voltage modeling

The output voltage of an ME cell is a critical parameter of SWD circuits because it defines the complexity and power consumption of the surrounding sensing circuitry. In previous works, the output voltage produced by the ME cell was assumed to be $\pm 10$ mV [2], [12]. In this work, however, we investigate the voltage-to-magnetization transduction further, to allow a global optimization of both the SWD circuit and the CMOS periphery circuit needed for digital computation.

The ME cell output voltage $V_{OUT}$ can be simply modeled as:

$$V_{OUT} = |\vec{E}| \cdot t_{PE} \tag{1}$$

where $\vec{E}$ is the electric field across the piezoelectric layer of the ME cell and $t_{PE}$ is its thickness. Equation 1 can be written as:

$$V_{OUT} = c_{ME} \cdot |\vec{H}_{eff}| \cdot t_{PE} \tag{2}$$

where  $c_{ME}$  is the magneto-electric coefficient of the ME cell and  $\vec{H}_{eff}$  is the effective magnetic field induced by the detected spin wave.

As modeled in [2], in order to have correct switching from one canted state to the other, the spin wave amplitude arriving at the output ME cell has to be similar to the difference in magnetization of the two stable states. We assume that this amplitude, and hence the effective magnetic field induced, is:

$$|\vec{H}_{eff}| = 2 \cdot M_Y \tag{3}$$

where  $M_Y$  is the y-component of the magnetization, as shown in Fig. 1b. Given equations 2 and 3, we have:

$$V_{OUT} = c_{ME} \cdot 2 \cdot M_Y \cdot t_{PE} \tag{4}$$

Based on Fig. 1b, equation 4 can be written as:

$$V_{OUT} = c_{ME} \cdot M_S \cdot \tan(2 \cdot \theta_{ME}) \cdot t_{PE} \tag{5}$$

Equation 5 serves as a model to better define the configuration of an ME cell. In the remainder of this work we use it assuming that the magnetoelectric coefficient is equal to 27 V/(cm·Oe) [13].
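To make the units in Equation 2 concrete, the following is a minimal sketch that converts an effective field in Oe into an output voltage. It assumes the coefficient of 27 V/(cm·Oe) from [13] and the 40 nm piezoelectric thickness quoted with Table II; the function names and the 1000 Oe example field are illustrative choices, not values taken from the simulations.

```python
def me_cell_output_voltage(h_eff_oe, c_me_v_per_cm_oe=27.0, t_pe_nm=40.0):
    """Equation 2: V_OUT = c_ME * |H_eff| * t_PE, returned in volts."""
    t_pe_cm = t_pe_nm * 1e-7  # 1 nm = 1e-7 cm, to match the V/(cm*Oe) coefficient
    return c_me_v_per_cm_oe * h_eff_oe * t_pe_cm


def field_for_voltage(v_out_v, c_me_v_per_cm_oe=27.0, t_pe_nm=40.0):
    """Inverse of Equation 2: effective field (Oe) needed for a target voltage."""
    t_pe_cm = t_pe_nm * 1e-7
    return v_out_v / (c_me_v_per_cm_oe * t_pe_cm)


# Illustrative field value (hypothetical, not a simulated result).
v = me_cell_output_voltage(1000.0)
print(f"1000 Oe -> {v * 1e3:.1f} mV")
print(f"100 mV  -> {field_for_voltage(0.100):.1f} Oe")
```

With these parameters the transducer gain is about 0.108 mV per Oe, so output voltages in the 100 mV range require effective fields on the order of 1 kOe.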

#### B. Micromagnetic simulation setup

To better understand the switching mechanism between the two ME cell states, we performed micromagnetic simulations using the widely accepted micromagnetic solver OOMMF [14]. The structure we used for these simulations included only the magnetic layers (Ni, NiFe, as shown in Fig. 1a). Simulations were performed in two phases: the first one to compute equilibrium states and the second one to observe switching. In the initialization phase we assign perpendicular anisotropies to the wave bus and ME cell regions of the structure and let the system relax into an equilibrium state. The saturation magnetization of each region of the structure is assumed to be the same, $M_S = 500$ kA/m. The amplitude of the anisotropy of each region was varied in order to obtain different configurations of the ME cell, meaning different angles $\theta_{ME}$. The different configurations produced are shown in Table I.

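TABLE I. MICROMAGNETIC SIMULATIONS SETUP

| $\theta_{ME}$ (°) | $H_{app}$ (Oe) | $T_{app}$ (ns) | $\alpha$     |
|-------------------|----------------|----------------|--------------|
| 2                 | 438.827        | [0.05 - 1.0]   | [0.01 - 0.2] |
| 3                 | 658.576        | [0.05 - 1.0]   | [0.01 - 0.2] |
| 4                 | 878.726        | [0.05 - 1.0]   | [0.01 - 0.2] |
| 5                 | 1099.415       | [0.05 - 1.0]   | [0.01 - 0.2] |
| 12.5              | 2785.897       | [0.05 - 1.0]   | [0.01 - 0.2] |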

In the second phase, a rectangular magnetic field pulse $(H_{app})$ was applied in the ME cell region in order to mimic the response to an input voltage. While varying the duration of the applied magnetic field $(T_{app})$ and the damping $(\alpha)$, we monitored the evolution of the ME cell magnetization. This set of simulations was repeated for the five configurations listed in Table I. Thermal noise and material defects were not taken into account.
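As a numerical cross-check of the pulse amplitudes, the sketch below evaluates Equation 3 under the assumption that $M_Y = M_S \tan\theta_{ME}$ and that the magnetization is expressed in CGS units as $4\pi M_S$ (with 1 kA/m = 1 emu/cm³). This is an observation about the tabulated numbers rather than a relation stated in the text: under these assumptions the five $H_{app}$ values of Table I are reproduced to three decimals. For small angles the expression agrees to first order with the $\tan(2\theta_{ME})$ form of Equation 5. The function name is illustrative.

```python
import math

M_S_KA_PER_M = 500.0  # saturation magnetization used in the simulations


def h_app_oe(theta_me_deg, m_s_ka_per_m=M_S_KA_PER_M):
    """Assumed relation: H = 2 * (4*pi*M_S) * tan(theta_ME), in Oe (CGS)."""
    four_pi_ms = 4.0 * math.pi * m_s_ka_per_m  # 1 kA/m = 1 emu/cm^3
    return 2.0 * four_pi_ms * math.tan(math.radians(theta_me_deg))


# Angles of the five simulated configurations.
for theta in (2.0, 3.0, 4.0, 5.0, 12.5):
    print(f"theta_ME = {theta:>4} deg -> H = {h_app_oe(theta):.3f} Oe")
```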

#### C. Micromagnetic simulation results


Fig. 2 presents the switching behavior of the different ME cell configurations. Each plot in Fig. 2 (2a - 2e) is a Shmoo plot, with green representing switching and red representing no switching. We observed two reasons for which an ME cell failed to switch. First, if the pulse was too short, not enough energy was given to the magnetization of the ME cell region for it to precess from one state to the other. Second, in some cases the energy given to the ME cell was too high (or the damping too low), so that the magnetization rotated back to the initial state, failing to switch correctly.

In Fig. 2f, a summative Shmoo plot, the saturation of the color represents the number of configurations that switched correctly, varying from 0 to 4 (out of 5 simulated). We observe that the top right-hand side of the simulation space is the most likely to give an ME cell configuration that switches correctly. This can be interpreted in a qualitative way: a nano-scaled (48nm×48nm) ME cell will most probably behave correctly if the applied input pulse is relatively long (>0.2ns) and its damping has a high value ($\approx 0.2$). In Table II we present the lowest switching delays (measured from the simulations) and the output voltage (Eq. 5) of each configuration, assuming a piezoelectric thickness $t_{PE} = 40$ nm.

TABLE II. FASTEST SWITCHING CONFIGURATION FOR EACH ME ANGLE SIMULATED

| $\theta_{ME}$ (°) | $(\alpha, T_{app})$ | $T_{ME}$ (ns) | $V_{OUT}$ (mV) |
|-------------------|---------------------|---------------|----------------|
| 2                 | (0.15, 0.3)         | 0.812         | 50             |
| 3                 | (0.15, 0.2)         | 0.528         | 68             |
| 4                 | (0.20, 0.2)         | 0.475         | 97             |
| **5**             | **(0.20, 0.2)**     | **0.420**     | **119**        |
| 12.5              | (0.10, 1.0)         | 1.113         | 300            |

Based on these results, we select the configuration highlighted in Table II for the circuit benchmarking presented in the next section. We assume that the ME cells used at the input and output of the circuits are identical and that the input voltage is equal to the output one.



Fig. 2. Results of input switching simulations.

### IV. SYNTHESIS & BENCHMARKING OF SWD CIRCUITS

#### A. Majority Synthesis

Because SWD technology is based on a wave computation scheme, it can implement simple and compact majority gates (MAJ), which are produced by merging three waveguides. Majority gates enhance the logic power of a design because they can emulate both AND and OR operations, and the majority function is one of the basic operations of binary arithmetic [15]. To fully utilize them, we used a Majority synthesis methodology based on the Majority-Inverter Graph (MIG) [11]. The MIG is a logic representation structure consisting of three-input majority nodes and regular/complemented edges. This means that only two logic components are required for this representation, a MAJ gate and an inverter (INV). Figure 3 presents the two primitive gates we have considered to be implemented in SWD technology.



Fig. 3. Gate primitives used for SWD circuits.

In Fig. 3a we present the INV component, which is a simple waveguide with a magnetically pinned layer that inverts the phase of the propagating signal. The MAJ gate (Fig. 3b) is the merging of three waveguides. For the gates presented in Fig. 3, we assume a minimum propagation length equal to one wavelength of the spin wave $(\lambda_{SW})$, which in our study is calculated at 48nm, since the wavelength is defined/confined by the width of the spin wave bus. A completed SWD circuit consists of cascaded gates (INVs and MAJs), where the output ME cell of one gate is also used as the input ME cell of the next gate.
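The phase-encoded logic described above can be illustrated with an idealized sketch: each bit is a unit-amplitude wave with phase 0 or $\pi$, the MAJ gate is the superposition of three such waves, and the INV gate is a $\pi$ phase shift. Equal amplitudes and lossless propagation are assumptions of this toy model, and all function names are hypothetical; it is not a substitute for the micromagnetic simulations.

```python
import cmath
import math


def bit_to_wave(bit):
    """Encode a bit as a unit-amplitude wave: phase 0 for '0', phase pi for '1'."""
    return cmath.exp(1j * (math.pi if bit else 0.0))


def wave_to_bit(wave):
    """Decode the phase of a wave: closer to 0 gives '0', closer to pi gives '1'."""
    return 0 if wave.real > 0 else 1


def maj(a, b, c):
    """Majority gate: the three waves interfere where the waveguides merge."""
    return wave_to_bit(bit_to_wave(a) + bit_to_wave(b) + bit_to_wave(c))


def inv(a):
    """Inverter: the propagating wave acquires an extra pi phase shift."""
    return wave_to_bit(bit_to_wave(a) * cmath.exp(1j * math.pi))


def and2(a, b):
    return maj(a, b, 0)  # AND is a majority gate with one input tied to '0'


def or2(a, b):
    return maj(a, b, 1)  # OR is a majority gate with one input tied to '1'


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", and2(a, b), "OR:", or2(a, b), "carry(cin=1):", maj(a, b, 1))
```

The carry-out of a full adder is exactly `maj(a, b, cin)`, which is one reason majority-based synthesis suits arithmetic circuits.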

#### B. Benchmarks

The gains of SWD circuits are quantified using ten arithmetic combinational benchmarks with varying numbers of input and output bits (I/O bits), which is critical in order to quantify the impact of the CMOS peripheral circuitry. The list includes adders, multipliers, a MAC module (all generated by [16]), a divider (DIV32), and a cyclic redundancy check module (CRC32). The specifications in Table III are used to calculate the results presented in Table IV and are extracted based on the selected ME cell configuration. The sense amplifier (SA) specifications are from a custom-designed SA that senses the ME cell's $V_{OUT}$. All CMOS 10nm reference results are provided post-synthesis by a commercial synthesis tool.

TABLE III. SPECIFICATIONS OF SWD CIRCUIT COMPONENTS

| Component | Area $(\mu m^2)$ | Delay (ns) | Energy (fJ)           |
|-----------|------------------|------------|-----------------------|
| INV       | 0.006912         | 0.42       | $1.44 \times 10^{-8}$ |
| MAJ       | 0.03456          | 0.42       | $4.33 \times 10^{-8}$ |
| SA        | 0.050688         | 0.03       | 2.7                   |
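One way to turn the component specifications of Table III into circuit-level numbers is sketched below. The aggregation rule is an assumption: area and energy are summed over all gates and sense amplifiers, and delay is taken as the number of cascaded gate levels times the gate delay plus one SA delay. The gate counts, output count and logic depth in the example are hypothetical and do not correspond to a synthesized benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    area_um2: float
    delay_ns: float
    energy_fj: float


# Values from Table III.
INV = Component(area_um2=0.006912, delay_ns=0.42, energy_fj=1.44e-8)
MAJ = Component(area_um2=0.03456, delay_ns=0.42, energy_fj=4.33e-8)
SA = Component(area_um2=0.050688, delay_ns=0.03, energy_fj=2.7)


def swd_circuit_cost(n_inv, n_maj, n_outputs, logic_depth):
    """Assumed cost model: returns (area in um^2, delay in ns, energy in fJ)."""
    area = n_inv * INV.area_um2 + n_maj * MAJ.area_um2 + n_outputs * SA.area_um2
    energy = n_inv * INV.energy_fj + n_maj * MAJ.energy_fj + n_outputs * SA.energy_fj
    # INV and MAJ share the same delay, so every logic level costs one gate delay.
    delay = logic_depth * MAJ.delay_ns + SA.delay_ns
    return area, delay, energy


# Hypothetical circuit: 100 INVs, 500 MAJs, 65 output bits, 12 logic levels.
area, delay, energy = swd_circuit_cost(n_inv=100, n_maj=500, n_outputs=65, logic_depth=12)
print(f"area = {area:.3f} um^2, delay = {delay:.2f} ns, energy = {energy:.2f} fJ")
```

In this model the gate energies are many orders of magnitude below the SA energy, so the total energy is set almost entirely by the number of outputs that must be sensed.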

### V. EXPERIMENTAL RESULTS

Table IV includes the area metric for both technologies, the energy calculated to be consumed in the SWD circuits, the delay metric and the power consumption metric. The last three columns summarize all the aforementioned metrics into one, the ADPP.

TABLE IV. SUMMARY OF BENCHMARKING RESULTS

| Name | Area, SWD core $(\mu m^2)$ | Area, CMOS SA $(\mu m^2)$ | Area, SWD total $(\mu m^2)$ | Area, 10nm ref. $(\mu m^2)$ | Energy, SWD total (fJ) | Delay, SWD (ns) | Delay, 10nm ref. (ns) | Power, SWD $(\mu W)$ | Power, 10nm ref. $(\mu W)$ | ADPP, SWD | ADPP, 10nm ref. | ADPP impr. (×) |
|--------------|------------------|---------|-----------|-------------|------------|--------|-----------------|-------|-----------|----------------------|----------------------|-----------|
| BKA264       | 36.48            | 3.12    | 39.60     | 118.55      | 175.50     | 5.07   | 0.21            | 34.62 | 133.92    | $6.95 \cdot 10^3$    | $3.33 \cdot 10^3$    | 0.48      |
| HCA464       | 82.71            | 3.17    | 85.88     | 262.63      | 178.20     | 8.01   | 0.29            | 22.25 | 594.28    | $1.53 \cdot 10^4$    | $4.53 \cdot 10^4$    | 2.96      |
| CSA464       | 78.42            | 3.17    | 81.59     | 240.26      | 178.20     | 7.59   | 1.78            | 23.48 | 663.17    | $1.45 \cdot 10^4$    | $2.84 \cdot 10^5$    | 19.51     |
| DTM32        | 326.31           | 3.07    | 329.38    | 1183.64     | 172.80     | 14.73  | 0.52            | 11.73 | 3667.50   | $5.69 \cdot 10^4$    | $2.26 \cdot 10^{6}$  | 39.66     |
| WTM32        | 264.96           | 3.07    | 268.04    | 1163.37     | 172.80     | 20.61  | 0.58            | 8.38  | 3571.90   | $4.63 \cdot 10^4$    | $2.41 \cdot 10^{6}$  | 52.04     |
| DTM64        | 1192.69          | 6.14    | 1198.83   | 3459.32     | 345.60     | 18.09  | 0.63            | 19.10 | 12793.10  | $4.14 \cdot 10^5$    | $2.79 \cdot 10^7$    | 67.29     |
| GFMUL        | 44.09            | 0.82    | 44.91     | 162.98      | 45.90      | 7.17   | 0.16            | 6.40  | 433.92    | $2.06 \cdot 10^3$    | $1.13 \cdot 10^4$    | 5.49      |
| MAC32        | 295.25           | 3.12    | 298.37    | 1372.83     | 175.50     | 24.39  | 0.66            | 7.20  | 3872.10   | $5.24 \cdot 10^4$    | $3.51 \cdot 10^{6}$  | 67.00     |
| DIV32        | 899.04           | 6.14    | 905.18    | 3347.73     | 345.60     | 117.21 | 14.00           | 2.95  | 5346.10   | $3.13 \cdot 10^5$    | $2.51 \cdot 10^{8}$  | 800.94    |
| CRC32        | 27.61            | 1.54    | 29.14     | 95.88       | 86.40      | 5.07   | 0.22            | 17.04 | 304.30    | $2.52 \cdot 10^3$    | $6.42 \cdot 10^3$    | 2.55      |
| Averages     | 324.76           | 3.34    | 328.09    | 1140.72     | 187.65     | 22.79  | 1.91            | 15.31 | 3138.03   | $9.24 \cdot 10^4$    | $2.87 \cdot 10^7$    | 105.79    |
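The derived columns of Table IV can be recomputed from the primary ones. The sketch below assumes that SWD power is the energy divided by the delay (fJ/ns equals µW) and that the ADPP is the plain product of area, delay and power; both assumptions are consistent with the tabulated values but are not spelled out in the text. The DIV32 row is used as the worked example, and the function names are illustrative.

```python
def power_uw(energy_fj, delay_ns):
    """Average power in microwatts: 1 fJ / 1 ns = 1 uW."""
    return energy_fj / delay_ns


def adpp(area_um2, delay_ns, power_uw_value):
    """Area-delay-power product, in um^2 * ns * uW."""
    return area_um2 * delay_ns * power_uw_value


# DIV32 row of Table IV.
swd_area, swd_delay, swd_energy = 905.18, 117.21, 345.60
ref_area, ref_delay, ref_power = 3347.73, 14.00, 5346.10

swd_power = power_uw(swd_energy, swd_delay)
swd_adpp = adpp(swd_area, swd_delay, swd_power)
ref_adpp = adpp(ref_area, ref_delay, ref_power)

print(f"SWD power  = {swd_power:.2f} uW")
print(f"SWD ADPP   = {swd_adpp:.3g}")
print(f"10nm ADPP  = {ref_adpp:.3g}")
print(f"improvement = {ref_adpp / swd_adpp:.1f}x")
```

Because power is energy over delay, the SWD ADPP reduces to area times energy, so in this metric a longer delay is not penalized for the SWD circuits.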

First, we observe that for all benchmarks the CMOS-SWD hybrid circuits occupy a smaller area (on average $3.5 \times$ smaller). This is based on two main factors: (1) the Majority synthesis in conjunction with the MAJ SWD gate yields compact implementations, and (2) the modeled output voltage does not require bulky SAs. Second, we note that for all benchmarks the CMOS-SWD circuits are much slower than the reference circuits (on average $13 \times$ slower). This is due to the large ME cell switching delay (the input pulse alone is 0.2 ns). However, due to the low energy consumption of both the SWD gates and the SA design, the power consumption metrics are largely in favor of the SWD circuits for all the benchmarks. Figure 4 depicts how much the SWD circuits outperform the 10nm CMOS reference ones. In particular, for the largest benchmark (DIV32), the delay penalty of SWD technology is balanced out by the big gains in area and power ($800 \times$ better ADPP).



Fig. 4. ADP product of all benchmarks.

These results compel us to characterize SWD (with CMOS overhead circuitry) as a technology suited to ultra-low power applications, where latency is a secondary objective. SWD circuits reach power levels that CMOS circuits cannot match even if they are optimized only for power consumption. In large designs, the innate leakage power of CMOS alone would be enough to exceed the power consumption of the SWD equivalents.

### VI. CONCLUSIONS

We present first-reported micromagnetic simulation results that model the properties of SWD technology and compare its performance with state-of-the-art CMOS. We have assumed that such a novel device-circuit architecture will need to be embedded with CMOS logic components for signal restoration to complete the hybrid system. Our study shows that using simple gate structures, Majority synthesis and optimized ME cell configurations, SWD would give significant gains in area ($3.5 \times$ smaller), power consumption ($100 \times$ lower) and throughput compared to advanced CMOS technology. Therefore, SWD appears to be a strong contender for ultra-low power applications.

### REFERENCES

- [1] A. Khitun and K. L. Wang, "Nano scale computational architectures with spin wave bus," *Superlattices and Microstructures*, vol. 38, no. 3, pp. 184–200, 2005.
- [2] A. Khitun and K. L. Wang, "Non-volatile magnonic logic circuits engineering," *Journal of Applied Physics*, vol. 110, no. 3, 2011.
- [3] D. Nikonov and I. Young, "Overview of beyond-CMOS devices and a uniform methodology for their benchmarking," *Proc. of the IEEE*, vol. 101, no. 12, pp. 2498–2533, Dec 2013.
- [4] J. Ryckaert, P. Raghavan, R. Baert *et al.*, "Design technology co-optimization for N10," in *Custom Integrated Circuits Conf. (CICC), 2014 IEEE Proc. of the*, Sept 2014, pp. 1–8.
- [5] M. Madami, S. Bonetti, G. Consolo et al., "Direct observation of a propagating spin wave induced by spin-transfer torque," *Nature nanotechnology*, vol. 6, no. 10, pp. 635–638, 2011.
- [6] A. Chumak, P. Pirro, A. Serga et al., "Spin-wave propagation in a microstructured magnonic crystal," Applied Physics Letters, vol. 95, no. 26, p. 262508, 2009.
- [7] S.-K. Kim, "Micromagnetic computer simulations of spin waves in nanometre-scale patterned magnetic elements," *Journal of Physics D: Applied Physics*, vol. 43, no. 26, 2010.
- [8] S. Dutta, D. Nikonov, S. Manipatruni *et al.*, "SPICE circuit modeling of PMA spin wave bus excited using magnetoelectric effect," *Magnetics, IEEE Trans. on*, vol. 50, no. 9, pp. 1–11, Sept 2014.
- [9] J. Alzate, P. Upadhyaya, M. Lewis et al., "Spin wave nanofabric update," in Nanoscale Architectures (NANOARCH), 2012 IEEE/ACM International Symp. on. IEEE, 2012, pp. 196–202.
- [10] P. Shabadi, S. N. Rajapandian, S. Khasanvis *et al.*, "Design of spin wave functions-based logic circuits," in *SPIN*, vol. 2, no. 03. World Scientific, 2012.
- [11] L. Amarù, P.-E. Gaillardon, and G. De Micheli, "Majority-inverter graph: A novel data-structure and algorithms for efficient logic optimization," in *Proc. of the 51st Annual Design Automation Conf.*, ser. DAC '14, 2014, pp. 1–6.
- [12] O. Zografos, P. Raghavan, L. Amarù *et al.*, "System-level assessment and area evaluation of spin wave logic circuits," in *Nanoscale Architectures (NANOARCH), 2014 IEEE/ACM International Symp. on*, July 2014, pp. 25–30.
- [13] T. Wu, A. Bur, K. Wong *et al.*, "Electric-poling-induced magnetic anisotropy and electric-field-induced magnetization reorientation in magnetoelectric Ni/(011)[Pb(Mg1/3Nb2/3)O3](1-x)-[PbTiO3]x heterostructure," *Journal of Applied Physics*, vol. 109, no. 7, p. 07D732, 2011.
- [14] NIST, "Object oriented micromagnetic framework," http://math.nist.gov/oommf/.
- [15] J. Von Neumann, "Non-linear capacitance or inductance switching, amplifying, and memory organs," Dec. 3 1957, US Patent 2,815,488.
- [16] T. U. Aoki laboratory, "Arithmetic module generator," http://www.aoki.ecei.tohoku.ac.jp/arith/.