IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS

# Benchmarking of Scaled Majority-Logic-Synthesized Spintronic Circuits Based on Magnetic Tunnel Junction Transducers

Fanfan Meng, *Member, IEEE*, Siang-Yun Lee, Odysseas Zografos, Mohit Gupta, Van D. Nguyen, Giovanni De Micheli, *Life Fellow, IEEE*, Sorin Cotofana, *Fellow, IEEE*, Inge Asselberghs, Christoph Adelmann, *Member, IEEE*, Gouri Sankar Kar, Sebastien Couet, and Florin Ciubotaru, *Member, IEEE*

*Abstract*—It is envisaged that spintronic logic devices will ultimately be utilized in hybrid CMOS-spintronic systems where signal interconversion between magnetic and electrical domains via transducers takes place. This underscores the vital role of transducers in influencing the overall performance of such hybrid systems. This paper addresses the question: Can spintronic circuits based on Magnetic Tunnel Junction (MTJ) transducers outperform their state-of-the-art CMOS counterparts? To this end, we use the EPFL (École Polytechnique Fédérale de Lausanne) combinational benchmark sets, synthesize them in 7 nm CMOS and in MTJ-transducer-based spintronic technologies, and compare the two implementation methods in terms of Energy-Delay-Product (EDP). To fully utilize the technologies' potential, CMOS and spintronic implementations are built upon standard Boolean and Majority Gates, respectively. For the spintronic circuits, we assume that domain conversion (electric/magnetic to magnetic/electric) is performed by means of MTJs and that the computation is accomplished by domain wall (DW)-based majority gates, and we consider two EDP estimation scenarios: (i) Uniform Benchmarking, which ignores the circuit's internal structure and only includes the domain transducers' power and delay contributions in the calculations, and (ii) Majority-Inverter-Graph Benchmarking, which also embeds the circuit structure, the associated critical path delay, and the energy consumed by DW propagation. Our results indicate that, for the uniform case, the spintronic route is better suited for the implementation of complex circuits with few inputs and outputs. On the other hand, when the circuit structure is also considered via majority and inverter synthesis, our analysis clearly indicates that in order to match and eventually outperform CMOS performance, MTJ transducers' efficiency has to be improved by 3-4 orders of magnitude. While it is clear that for the time being the MTJ-based spintronic route cannot compete with CMOS, further technological transducer developments may tip the balance, which, when combined with information non-volatility, may make spintronic implementations attractive for certain applications that require a large number of calculations and have rather limited interaction with the environment.

Manuscript received 6 March 2024; revised 19 May 2024; accepted 12 June 2024. This work was supported by IMEC's Industrial Affiliation Program on Exploratory Logic Devices. This article was recommended by Associate Editor X. Fong. (*Corresponding author: Florin Ciubotaru.*)

Fanfan Meng was with the Department of Electrical Engineering, KU Leuven, 3000 Leuven, Belgium. He is now with the Interuniversity Microelectronics Centre (IMEC), 3001 Leuven, Belgium (e-mail: fanfan.meng@imec.be).

Siang-Yun Lee and Giovanni De Micheli are with EPFL, 1015 Lausanne, Switzerland.

Odysseas Zografos, Mohit Gupta, Van D. Nguyen, Inge Asselberghs, Christoph Adelmann, Gouri Sankar Kar, Sebastien Couet, and Florin Ciubotaru are with IMEC, 3001 Leuven, Belgium (e-mail: florin.ciubotaru@imec.be).

Sorin Cotofana is with the Computer Engineering Laboratory, Electrical Engineering, Mathematics and Computer Science Faculty, TU Delft, 2628 CD Delft, The Netherlands.

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCSI.2024.3420250.

Digital Object Identifier 10.1109/TCSI.2024.3420250

*Index Terms*—Magnetic logic, magnetic tunnel junction, domain wall devices.

## I. INTRODUCTION

The sharp increase in electronic equipment used daily across the globe, from end-user devices to data centers, and the associated energy consumption have created a strong demand for more energy-efficient computing devices [1]. However, the miniaturization of CMOS-transistor-based microelectronic circuits, epitomized by Moore's law, has been increasingly limited by rising power density and the associated chip heating [2]. Therefore, intensive research has been devoted to exploring alternative devices [3], [4], such as 2D material channel FETs [5], Mott FETs [6], and excitonic devices [7]. Spintronic devices centered on nanomagnets are seen as a promising category of beyond-CMOS devices owing to (1) the ultra-low energy associated with magnetization dynamics and nanomagnet switching; (2) high endurance; (3) non-volatility to counteract leakage power; (4) the capability to build more expressive logic gates (e.g., majority gates); and (5) applicability to both traditional and emerging architectures [8]. In the past decade, numerous spintronic logic concepts have been proposed and demonstrated for realizing Boolean logic gates, utilizing, e.g., dipolar interactions between nanomagnets, interactions between domain walls, interference of spin waves, and Magneto-Electric Spin-Orbit (MESO) logic [8], [9], [10], [11], [12].

However, for the time being, there is no concept for a fully spintronic computer that incorporates logic, memory, and interconnects using exclusively magnetic signals [8]. Therefore, it is envisaged that spintronic logic devices will be utilized in hybrid CMOS-spintronic systems where signal interconversion between magnetic and electrical domains takes place via transducers, as illustrated in Fig. 1a. The performance of such hybrid systems, in terms of energy consumption and computing throughput, will depend strongly on the conversion mechanisms used and the number of interconversions needed to perform the computation. Although many spintronic concepts have been demonstrated in individual logic gates, their integration into CMOS systems, *i.e.*, the development of corresponding transducers, is at various stages of maturity. To date, Magnetic Tunnel Junctions (MTJs), the key elements in Magnetic Random-Access Memory (MRAM), are the only transducers demonstrated in fully integrated, scaled, and CMOS-compatible Domain Wall (DW)-based spintronic logic devices [13]. Hence, in this work, which attempts to evaluate the targets and challenges of building efficient spintronic Boolean logic circuits from the transducer perspective, we use MTJs as a discussion vehicle. Specifically, the Energy-Delay-Product (EDP) is used as a figure of merit to compare a collection of spintronic logic circuits using MTJs as input/output transducers with 7 nm node CMOS technology. Spintronic devices are characterized by high energy efficiency but also low speed. Hence, the energy-delay-product (power $\times$ delay<sup>2</sup>), which increases the weight of delay, provides a more balanced perspective between energy and delay and serves as a more impartial and relevant figure of merit for spintronic systems. Additionally, this metric aligns with the benchmarking performed by Intel for beyond-CMOS devices [3]. As depicted in Fig. 1b, different levels of circuit abstraction are applied to the spintronic circuits, namely Uniform and Majority-Inverter-Graph (MIG)-based benchmarking, to gain insight into the energy-delay cost contributions from different sources.

1549-8328 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.

See https://www.ieee.org/publications/rights/index.html for more information.

Fig. 1. (a) A schematic of a hybrid-CMOS-spintronic logic circuit: charge-based information is first converted to magnetic information carriers (*e.g.*, domain wall, spin waves, magnetization) via transducers. Then, computation is achieved by information carriers' interaction within the magnetic domain, and finally, the resultant magnetic information is converted back to electrical outputs via transducers. (b) Two benchmarking approaches with different levels of circuit abstractions. (c) Full Adder representation in Uniform and MIG benchmarking, respectively.

## II. OVERVIEW OF THE BENCHMARKING STRATEGIES

The benchmarking evaluations are carried out in order of increasing complexity; as the analysis progresses, more contributors to the total EDP of the spintronic circuits are considered. We start with uniform benchmarking (Fig. 1b), which (i) considers only the energy and delay associated with input  $(x_i)$  and output  $(y_j)$  transducers, *i.e.*, the switching and detection of the magnetization orientation of MTJs' free layers and (ii) disregards the magnetic circuits between inputs and outputs, including intermediate spin logic gates and magnetic interconnects. Additionally, this method takes into account the minimum number of transducers required in a hybrid CMOS-spintronic circuit, as defined by the circuit's function. As an example, a Full Adder (FA) (Fig. 1c) adds together two binary digits plus a carry-in digit to produce a sum and carry-out digit and therefore requires at least three inputs and two outputs. By only considering the minimum number of transducers required in the system, the uniform method provides a system EDP lower bound as well as a minimum target for transducer efficiency for which spin-hybrid circuits can outperform CMOS. The minimum transducer efficiency target derived with this method holds true regardless of the paradigm (e.g., spin wave computing, plasmonic computing, MESO logic) and different circuit implementations, and hence the method is known as uniform benchmarking [14].

To obtain better EDP estimates, we need to further consider the actual structure of the benchmark circuits. Given that spintronics provides natural support for inverter [11] and majority (MAJ) gate [15] implementations, which together form a universal gate set, we use these elements to describe the internal organization of the circuit. A MAJ gate operates according to the majority voting principle: it returns true if more than 50% of its inputs are true. By setting one input to a constant 0 or 1, it can emulate both logic AND and OR operations, and it promises circuits with higher computational density [16]. Thus, to fully exploit the gains brought by majority functions, instead of using standard logic synthesis tools based on AND, OR, XOR, and NAND gates, we employ a customized logic synthesis flow based on the Majority-Inverter-Graph (MIG), which provides guidelines on the realization of logic circuits using majority gates and inverters [17]. MIGs provide the number of gates required and how they are connected at the logic level; however, they do not reveal the physical placement and routing of these gates. Again, using the FA as an example, the additional cost related to four repeated inputs, three MAJs, and two INVs is considered (Fig. 1c). We further divide this benchmarking into two phases.

In the first phase, we consider the additional energy cost due to repeated inputs and the delay in the magnetic domain. In the second phase, we additionally include the energy cost related to information propagation in the magnetic domain. Details of the assumptions made are provided in Section IV.
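As an illustration of the MAJ/INV gate set described above, the following Python sketch checks by exhaustive enumeration that a majority gate with one constant input emulates AND and OR, and that one well-known decomposition with three MAJs and two INVs realizes a full adder. The decomposition is chosen here for illustration and is not necessarily the exact graph of Fig. 1c; all function names are ours.

```python
from itertools import product


def maj(a: int, b: int, c: int) -> int:
    """Three-input majority: 1 when at least two of the inputs are 1."""
    return int(a + b + c >= 2)


def inv(a: int) -> int:
    """Logic inverter on a 0/1 value."""
    return 1 - a


def full_adder_mig(a: int, b: int, cin: int) -> tuple[int, int]:
    """Full adder built from 3 MAJ gates and 2 INV gates (illustrative)."""
    carry = maj(a, b, cin)               # MAJ 1
    inner = maj(a, b, inv(cin))          # MAJ 2, INV 1
    total = maj(inv(carry), inner, cin)  # MAJ 3, INV 2
    return total, carry


# A constant input turns MAJ into AND (constant 0) or OR (constant 1).
for a, b in product((0, 1), repeat=2):
    assert maj(a, b, 0) == (a & b)
    assert maj(a, b, 1) == (a | b)

# Exhaustive check of the adder against integer addition.
for a, b, cin in product((0, 1), repeat=3):
    total, carry = full_adder_mig(a, b, cin)
    assert 2 * carry + total == a + b + cin
```

In this particular decomposition the three primary inputs appear seven times as gate operands (a twice, b twice, cin three times), i.e., four repeated inputs on top of the three primary ones.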

In both benchmarking studies, we use a collection of representative combinational logic circuits from the EPFL Combinational Benchmarking Suite (Fig. 2a) with a large variation in sizes, complexity levels, and input/output (I/O) ratios [18]. These circuits were first synthesized with commercial software in 7 nm CMOS technology to provide a baseline for comparing the spintronic circuits with CMOS counterparts implemented in leading-edge mass-production technology [19]. The CMOS synthesis is optimized for low-power operation, and the resulting EDP is used to set targets for the spintronic counterparts. As for spintronic circuit transducers, we considered state-of-the-art Spin-Transfer-Torque (STT) [20]- and Spin-Orbit-Torque (SOT) [21]-based MTJ technologies. The energy and delay of individual MTJ writing and reading operations [22] are summarized in Fig. 2b.
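The Fig. 2b parameters can be cross-checked with a short Python sketch. It assumes that the write power is the Joule heating $I^2R$ of the switching current, that the STT pillar is circular with diameter CD, that the SOT track cross-section is CD × 7 nm, and that the energy per operation is power × time. These geometric and energy assumptions are ours, not stated explicitly in the text, although they reproduce the tabulated write powers to three significant figures.

```python
import math

CD = 50e-9          # MTJ critical dimension (m), Fig. 2b
T_TRACK = 7e-9      # SOT track thickness (m), Fig. 2 caption


def write_power_stt(j_sw: float, r_stack: float, cd: float = CD) -> float:
    """I^2 R for a current through a circular pillar (assumed geometry)."""
    current = j_sw * math.pi * (cd / 2) ** 2
    return current ** 2 * r_stack


def write_power_sot(j_sw: float, r_track: float,
                    cd: float = CD, thickness: float = T_TRACK) -> float:
    """I^2 R for a current through a cd x thickness track (assumed geometry)."""
    current = j_sw * cd * thickness
    return current ** 2 * r_track


p_write_stt = write_power_stt(7.7e10, 2100)   # A/m^2, ohm
p_write_sot = write_power_sot(110e10, 200)

# Energy per operation in fJ, assuming E = P * t with the tabulated values.
energies_fj = {
    "STT": {"write": 4.80e-5 * 5e-9 * 1e15, "read": 1.35e-5 * 3e-9 * 1e15},
    "SOT": {"write": 2.96e-5 * 1e-9 * 1e15, "read": 2.07e-5 * 1e-9 * 1e15},
}

print(f"P_write STT = {p_write_stt:.3g} W, SOT = {p_write_sot:.3g} W")
for tech, e in energies_fj.items():
    print(f"{tech}: E_w = {e['write']:.1f} fJ, E_r = {e['read']:.1f} fJ")
```

Under these assumptions a write costs about 240 fJ (STT) or 29.6 fJ (SOT), and a read about 40.5 fJ (STT) or 20.7 fJ (SOT).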

## III. UNIFORM BENCHMARKING

In uniform benchmarking, where only the energy and delay cost of the minimum number of transducers is considered, the spintronic circuits' EDP is defined as

$$EDP_{spin} = \underbrace{(n_{in} \times E_w + n_{out} \times E_r)}_{total \ energy} \times \underbrace{(t_w + t_r)}_{total \ delay}, \quad (1)$$

where $n_{in}$ and $n_{out}$ are the number of inputs and outputs as listed in Fig. 2a, and $E_w$, $E_r$, $t_w$, $t_r$ are the energy and delay associated with the writing and reading operations on an individual MTJ. Fig. 3a depicts the EDP values calculated for CMOS and for SOT- and STT-MTJ-enabled spintronic circuits. For the circuits highlighted with a red background, such as 'ctrl' and 'arbiter', the spintronic EDP is two to four orders of magnitude higher than that of CMOS. Here, the EDP cost at the transducer interfaces alone already greatly exceeds the budget set by CMOS, which implies that the MTJ performance must be drastically improved for spintronic circuits to match CMOS. However, for the circuits highlighted with a green background, such as 'log2' and 'sqrt', the spintronic EDP is lower than that of CMOS, leaving a margin for the magnetic circuitry that is included in the subsequent MIG benchmarking.
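Equation (1) can be evaluated directly from the Fig. 2 data. The sketch below is a minimal illustration that assumes the per-operation energies are the tabulated power × time products (our assumption); the class and function names are ours. It uses the 'arbiter' and 'log2' rows of Fig. 2a.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mtj:
    """Per-device parameters from Fig. 2b (SI units)."""
    p_write: float
    t_write: float
    p_read: float
    t_read: float

    @property
    def e_write(self) -> float:
        return self.p_write * self.t_write   # assumed E = P * t

    @property
    def e_read(self) -> float:
        return self.p_read * self.t_read


STT = Mtj(p_write=4.80e-5, t_write=5e-9, p_read=1.35e-5, t_read=3e-9)
SOT = Mtj(p_write=2.96e-5, t_write=1e-9, p_read=2.07e-5, t_read=1e-9)


def edp_uniform(n_in: int, n_out: int, mtj: Mtj) -> float:
    """Eq. (1), returned in fJ*ns."""
    energy_fj = (n_in * mtj.e_write + n_out * mtj.e_read) * 1e15
    delay_ns = (mtj.t_write + mtj.t_read) * 1e9
    return energy_fj * delay_ns


# (n_in, n_out, CMOS EDP target in fJ*ns) from Fig. 2a
CIRCUITS = {"arbiter": (256, 129, 1.27e1), "log2": (32, 32, 1.37e5)}

for name, (n_in, n_out, edp_cmos) in CIRCUITS.items():
    for label, mtj in (("SOT", SOT), ("STT", STT)):
        edp = edp_uniform(n_in, n_out, mtj)
        print(f"{name:8s} {label}: {edp:10.1f} fJ*ns, ratio to CMOS {edp / edp_cmos:.3g}")
```

With these inputs the 'arbiter' transducer EDP exceeds the CMOS target by roughly three orders of magnitude, whereas 'log2' stays below its target for both MTJ types, in line with the red and green groups described above.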

Figure 3a clearly shows that transducers by themselves in the spintronic circuits have used up a great portion of the total energy-delay budget set by the CMOS circuits. Consequently, circuits that entail dense calculations but fewer inputs and outputs are likely to benefit from spintronic implementations. To quantify this relationship, we introduced the metric q as

$$q = \frac{(area \times delay)_{cmos}}{n_{in} + n_{out}}, \quad (2)$$

which measures a circuit's internal computation density relative to the number of I/Os. $q$ is not a measure of circuit performance; rather, it assists in identifying spintronic-favourable circuits and applications in future endeavors. Fig. 3b indicates that, with state-of-the-art SOT-MTJ transducers, only circuits with $q > 10$ have a lower EDP than CMOS, i.e., have the potential to outperform their CMOS equivalents in terms of energy-delay efficiency. Note that in this approach, the spintronic systems do not include the logic circuit itself, whereas it is included for the CMOS implementations. For complex circuits, this leads to a larger advantage of spintronic circuits, which is expected to shrink when the logic circuit is considered, as demonstrated in the MIG benchmarking section. For circuits where the energy-delay cost of I/O transduction already exceeds the EDP budget set by CMOS, we apportion the difference in EDP to each individual MTJ,

$$\Delta EDP_{per\ MTJ} = \frac{EDP_{spin} - EDP_{cmos}}{n_{in} + n_{out}}, \quad (3)$$

which yields an upper bound on the EDP of individual SOT/STT MTJs. As plotted in Fig. 3c, an average EDP decrease of $50\times$ (SOT) to $1100\times$ (STT) is required for single MTJ devices. Note that reducing the writing and reading delays will have a much stronger impact on the EDP than reducing the power consumption, since a longer delay also increases energy consumption. Regarding energy consumption at the transduction interfaces only (as presented in Fig. 4), in SOT-MTJ-driven spintronic circuits, on average 63% of the energy is consumed by the input transducers, while in STT-MTJ-driven circuits, 84% of the energy is consumed at the input interfaces.
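The remark on delay versus power can be made concrete with a small Python sketch. It assumes, as an illustration only, that the per-operation energy is power × time, so the uniform EDP scales linearly with power but quadratically with delay; it also evaluates Eq. (3) for the 'arbiter' row with SOT parameters. Function and variable names are ours.

```python
FJ_NS = 1e-24  # one fJ*ns expressed in J*s


def edp_uniform(n_in, n_out, p_w, t_w, p_r, t_r):
    """Eq. (1) in J*s, with E = P * t assumed for each operation."""
    energy = n_in * p_w * t_w + n_out * p_r * t_r
    return energy * (t_w + t_r)


def delta_edp_per_mtj(edp_spin, edp_cmos, n_in, n_out):
    """Eq. (3): EDP excess apportioned evenly to every I/O MTJ."""
    return (edp_spin - edp_cmos) / (n_in + n_out)


N_IN, N_OUT = 256, 129            # 'arbiter', Fig. 2a
EDP_CMOS = 1.27e1 * FJ_NS         # CMOS target, Fig. 2a
sot = dict(p_w=2.96e-5, t_w=1e-9, p_r=2.07e-5, t_r=1e-9)  # Fig. 2b

base = edp_uniform(N_IN, N_OUT, **sot)
half_power = edp_uniform(N_IN, N_OUT, p_w=sot["p_w"] / 2, t_w=sot["t_w"],
                         p_r=sot["p_r"] / 2, t_r=sot["t_r"])
half_delay = edp_uniform(N_IN, N_OUT, p_w=sot["p_w"], t_w=sot["t_w"] / 2,
                         p_r=sot["p_r"], t_r=sot["t_r"] / 2)
excess = delta_edp_per_mtj(base, EDP_CMOS, N_IN, N_OUT)

print(f"halving power scales EDP by {half_power / base:.2f}")
print(f"halving delay scales EDP by {half_delay / base:.2f}")
print(f"EDP excess per MTJ: {excess / FJ_NS:.1f} fJ*ns")
```

Under this assumption, halving both powers halves the EDP, while halving both delays reduces it to one quarter.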

## IV. MAJORITY-INVERTER-GRAPH (MIG) BENCHMARKING

In addition to the minimum number of I/O transducers considered in uniform benchmarking, we bring into the picture the internal spintronic circuit structure and the associated energy consumption and delay overheads by means of MIG-based synthesis. All benchmark implementations are optimized to minimize the number of MAJ gates, and we assume that MAJ and INV gates have infinite fan-out and cascading capability in the magnetic domain [17]. Note that this is a very optimistic assumption for spintronic logic gates, as there is currently no experimental demonstration of these capabilities. Fig. 5a and 5b display a section and the full MIG of the 'ctrl' circuit, respectively, which is one of the smallest circuits in the benchmark set. Primary inputs $(x_i)$, majority gates $(n_j)$, outputs $(y_k)$, and inverters are depicted as blue squares, black dots, red squares, and blue lines, respectively. The other assumptions used in this benchmarking are explained using this circuit as an example.

### A. Duplication of Inputs and Delay in the Magnetic Domain

First, as shown in Fig. 5a, each independent input $(x_i)$ can potentially drive multiple gates at different logic depths, where the logic depth is defined as the maximum number of gates a signal must traverse from the primary inputs to the destination. For instance, primary input $x_2$ drives majority gates $n_{13}$, $n_{18}$, and $n_{17}$, which are at different logic depths. As illustrated in Fig. 6, since 3D magnetic signal crossing is not available, long magnetic interconnects are needed to bypass gates at shallower depths in order to supply the primary inputs to deeper-level gates. To minimize the delay due to signal propagation in long magnetic interconnects, we assume a duplication of each primary input at the place it is needed. The resultant number of input transducers required is summarized in Fig. 7a. Compared to the number of inputs considered in the uniform benchmarking, the number of input transducers increases by an average factor of $10\times$ for the analyzed circuits, which leads to a similar increase in the EDP. As shown in Fig. 7b and 7c, the duplication of inputs suggested by MIG synthesis also means that, at the transducing interfaces, more than 90% of the energy is spent at the input stage for both SOT- and STT-MTJ-driven spintronic circuits, *i.e.*, improving the energy performance of the input transducers will have a larger impact on the overall EDP performance.

(a)

| Circuit   | Description                                              | In   | Out  | EDP Target (fJ·ns)    |
|-----------|----------------------------------------------------------|------|------|-----------------------|
| ctrl      | Simple control unit for an arithmetic logic unit         | 7    | 26   | $3.74 \times 10^{-2}$ |
| router    | Look-ahead XY routing function                           | 60   | 30   | $9.72 \times 10^{-2}$ |
| int2float | 11-bit integer to 4-bit mantissa/3-bit exponent float    | 11   | 7    | $2.09 \times 10^{-1}$ |
| dec       | Standard decoder function                                | 8    | 256  | $4.02 \times 10^{-2}$ |
| priority  | Priority encoder                                         | 128  | 8    | $1.37 \times 10^{1}$  |
| cavlc     | Context-adaptive variable-length coding                  | 10   | 11   | $1.07$                |
| i2c       | controller (serial bus)                                  | 147  | 142  | $1.18$                |
| arbiter   | Blind Round Robin arbiter                                | 256  | 129  | $1.27 \times 10^{1}$  |
| bar       | Barrel shifter                                           | 135  | 128  | $3.60 \times 10^{1}$  |
| adder     | 128-bit adder                                            | 256  | 129  | $1.78 \times 10^{2}$  |
| mem       | Memory controller                                        | 1204 | 1231 | $2.62 \times 10^{3}$  |
| max       | Maximum finder in 4 x128-bit inputs                      | 512  | 130  | $6.52 \times 10^{2}$  |
| sin       | Boolean function approximating the sinus trig. function  | 24   | 25   | $3.85 \times 10^{3}$  |
| voter     | Majority voting of 1001 bits                             | 1001 | 1    | $7.24 \times 10^{2}$  |
| square    | 64-bit square (b = $a^2$) module                         | 64   | 128  | $1.10 \times 10^{4}$  |
| log2      | 32-bit logarithm with base 2                             | 32   | 32   | $1.37 \times 10^{5}$  |
| mult      | 64-bit combinational multiplication of unsigned integers | 128  | 128  | $4.41 \times 10^{4}$  |
| sqrt      | 128-bit square-root integer approximation                | 128  | 64   | $8.64 \times 10^{6}$  |

(b)

| Metric                      | STT                   | SOT                   |
|-----------------------------|-----------------------|-----------------------|
| CD (nm)                     | 50                    | 50                    |
| $R_{stack/track}$ (Ω)       | 2100                  | 200                   |
| $J_{sw}$ (A/m<sup>2</sup>)  | $7.7 \times 10^{10}$  | $110 \times 10^{10}$  |
| $P_{write}$ (W)             | $4.80 \times 10^{-5}$ | $2.96 \times 10^{-5}$ |
| $t_{write}$ (ns)            | 5                     | 1                     |
| $t_{read}$ (ns)             | 3                     | 1                     |
| $P_{read}$ (W)              | $1.35 \times 10^{-5}$ | $2.07 \times 10^{-5}$ |

Device schematics in panel (b): the STT-MTJ has a shared read and write path, whereas the SOT-MTJ has separate read and write paths.

Fig. 2. (a) List of circuits with descriptions, number of primary inputs and outputs, and the Energy-Delay-Product (EDP) from CMOS synthesis. The EDP of CMOS 7 nm node technology sets the target for spintronic circuits. All CMOS reference results are provided post-synthesis by a commercial synthesis tool. The gate-level netlist is sourced from the EPFL combinational benchmark suite [18] and the CMOS 7 nm node library is provided by imec and is detailed in [19]. (b) The writing energy of STT and SOT devices is determined by the switching current passing through the MTJ pillar [20] and SOT track [21], respectively. The thickness of the SOT track is 7 nm. For the detection of magnetization, both devices rely on the TMR effect by reading the final MTJ resistance with a sense amplifier (SA). The read energy per bit (SA included) is determined by assuming operating 64 bits out of a 64 Kbit memory and averaging the total energy per bit [20], [22].

Fig. 3. Uniform Benchmarking Results. (a) The comparison of EDP between CMOS circuits and SOT- and STT-MTJ mediated spintronic circuits. The circuits with lower or higher EDP compared to CMOS are highlighted in green and red, respectively. (b) q metric is defined to identify potential spintronic circuits that are more efficient than CMOS circuits. (c) The improvement in EDP required from individual MTJs to have the total EDP comparable to CMOS circuits.

Fig. 4. Percentage energy consumption by input and output transducers using (a) SOT- (b) STT-MTJs.

Fig. 5. Majority Inverter Graph (MIG). (a) A small section of the MIG graph from 'ctrl' circuit. Inputs $(x_i)$, majority gates $(n_j)$, inverters, and outputs $(y_k)$ are shown as blue squares, black dots, blue lines, and red squares, respectively. (b) The full MIG graph of the 'ctrl' circuit. The longest path is marked by yellow triangles.

Fig. 6. Schematics illustrating the assumption of input transducer duplication.

Second, the delay of the circuits is estimated by determining the maximum logic depth between inputs and outputs. For example, the longest path for the complete 'ctrl' circuit (marked by yellow triangles in Fig. 5b) is formed by 11 gates (8 MAJs and 3 INVs). Adopting the most common geometries proposed for a domain-wall-based spin-torque majority gate and an inverter, as depicted in Fig. 8a [11], [15], their delays are estimated to be $t_{maj} = 4a/v_{dw}$ and $t_{inv} = 2a/v_{dw}$, where $a$ is the critical dimension of the MTJs and $v_{dw}$ is the domain wall velocity. In this benchmarking, we assume $a = 50$ nm, which is the most commonly reported critical dimension for STT-MTJs [20], and $v_{dw} = 750$ m/s [23], [24], a typical value for the domain wall velocity reported in materials that are compatible with MTJ structures. Assuming these gates can be cascaded directly without additional interconnects that add to the delay (Fig. 8b), the total computation time in the magnetic domain for the 'ctrl' circuit is estimated to be $t_{mag} = 8 \times t_{maj} + 3 \times t_{inv}$. The longest path for each circuit, i.e., the maximum number of cascaded gates, is presented as blue bars in Fig. 8c. The corresponding delay in the magnetic domain calculated using these assumptions and the total delay in CMOS circuits are plotted as lines in Fig. 8c as well. The data show that the computing time in spintronic circuits, which has a large impact on the overall EDP, is already one order of magnitude greater than the total delay seen in CMOS circuits, even without considering the potential propagation time in the interconnects between gates. As mentioned earlier, these interconnects can be very long due to the lack of 3D magnetic signal crossing.
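The gate delays and the 'ctrl' critical-path delay follow directly from the stated values of $a$ and $v_{dw}$. The following Python sketch reproduces the arithmetic; the function names are ours.

```python
A = 50e-9      # MTJ critical dimension a (m)
V_DW = 750.0   # domain wall velocity (m/s)


def t_maj(a: float = A, v_dw: float = V_DW) -> float:
    """Majority gate delay, 4a / v_dw (s)."""
    return 4 * a / v_dw


def t_inv(a: float = A, v_dw: float = V_DW) -> float:
    """Inverter delay, 2a / v_dw (s)."""
    return 2 * a / v_dw


def t_mag(d_maj: int, d_inv: int) -> float:
    """Critical-path delay for d_maj majority gates and d_inv inverters (s)."""
    return d_maj * t_maj() + d_inv * t_inv()


t_ctrl = t_mag(d_maj=8, d_inv=3)   # longest path of 'ctrl'
print(f"t_maj = {t_maj() * 1e9:.3f} ns, t_inv = {t_inv() * 1e9:.3f} ns")
print(f"t_mag('ctrl') = {t_ctrl * 1e9:.2f} ns")
```

For 'ctrl' this gives about 0.27 ns per majority gate, 0.13 ns per inverter, and roughly 2.5 ns in the magnetic domain, before any interconnect delay is added.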

Taking into account the two additional components in the magnetic circuitry revealed by the MIG, *i.e.*, the increase in the number of input transducers and the delay in the magnetic domain, the EDP of spintronic circuits becomes

$$EDP_{spin} = \underbrace{(n_{mig\_in} \times E_w + n_{out} \times E_r)}_{total \ energy} \times \underbrace{(t_w + t_r + t_{mag})}_{total \ delay},$$

where $n_{mig\_in}$ is the number of input transducers required by the MIG synthesis and $t_{mag} = d_{maj} \times t_{maj} + d_{inv} \times t_{inv}$ is the total operation time in the magnetic domain, with $d_{maj}$ and $d_{inv}$ the number of majority gates and inverters on the critical path of each circuit. Note that, at this stage, we assume that propagating domain walls in the magnetic domain requires no energy, which is relevant to the logic concept based on exchange-driven domain wall automation [25]. Fig. 9a presents the EDP of both CMOS circuits and spintronic circuits. Now, the EDPs of all investigated spintronic circuits are on average two orders of magnitude higher. As previously, we evenly distribute the EDP excess over the budget set by CMOS to all MTJs. As a result, the EDPs of SOT- and STT-MTJs need to be reduced by about 790× and 8700×, respectively (see Fig. 9b).
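A minimal Python sketch of the MIG-level EDP expression is given below. It uses the 'ctrl' critical path (8 MAJs, 3 INVs) and the 26 outputs of the control-unit row of Fig. 2a with SOT parameters. The input-transducer count of 70 is a hypothetical placeholder (ten times the 7 primary inputs, mirroring the quoted average factor), not the value from Fig. 7a, and the per-operation energies assume power × time.

```python
A = 50e-9      # critical dimension (m)
V_DW = 750.0   # domain wall velocity (m/s)
FJ_NS = 1e-24  # one fJ*ns expressed in J*s


def t_mag(d_maj: int, d_inv: int) -> float:
    """d_maj * 4a/v_dw + d_inv * 2a/v_dw (s)."""
    return (4 * d_maj + 2 * d_inv) * A / V_DW


def edp_mig(n_mig_in, n_out, e_w, e_r, t_w, t_r, d_maj, d_inv):
    """MIG-level EDP (J*s): duplicated inputs plus magnetic-domain delay."""
    energy = n_mig_in * e_w + n_out * e_r
    delay = t_w + t_r + t_mag(d_maj, d_inv)
    return energy * delay


# SOT-MTJ values from Fig. 2b, with E = P * t assumed.
T_W, T_R = 1e-9, 1e-9
E_W, E_R = 2.96e-5 * T_W, 2.07e-5 * T_R

N_MIG_IN_HYPOTHETICAL = 70   # placeholder, NOT the Fig. 7a value
edp = edp_mig(N_MIG_IN_HYPOTHETICAL, 26, E_W, E_R, T_W, T_R, d_maj=8, d_inv=3)
edp_fj_ns = edp / FJ_NS
print(f"MIG-level EDP (hypothetical input count): {edp_fj_ns:.1f} fJ*ns")
```

Both factors of the product grow relative to Eq. (1): the energy term through the duplicated inputs and the delay term through $t_{mag}$.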

Additionally, we calculate the area of the spintronic circuits by considering only the footprints of the majority and inverter gates, whereas the area related to interconnects is neglected. Since the MIG synthesis targets gate-count minimization, this yields a lower bound on the area. The areas of an individual MAJ and INV are estimated as $49a^2$ and $15a^2$ [15], respectively, as indicated in Fig. 8a. In Fig. 9c, we compare the area of CMOS and spintronic circuits for MTJ critical dimensions of 40 and 20 nm. The results indicate that, even without considering the real physical layout of the circuits, the MTJ critical dimension has to be at most 20 nm to surpass CMOS circuits in terms of area compactness. Note that the area calculations are done for STT-MTJ transducers, as SOT-MTJ-based implementations require an even larger footprint due to their three-terminal design.
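As a worked illustration of these footprint estimates, the area lower bound can be written, in our notation with $n_{maj}$ and $n_{inv}$ denoting the total numbers of majority gates and inverters in a circuit, as

$$A_{spin} = n_{maj} \times 49a^2 + n_{inv} \times 15a^2 .$$

For the two critical dimensions considered, the per-gate footprints are

$$a = 40\ \mathrm{nm}: \quad 49a^2 = 78\,400\ \mathrm{nm}^2, \quad 15a^2 = 24\,000\ \mathrm{nm}^2,$$

$$a = 20\ \mathrm{nm}: \quad 49a^2 = 19\,600\ \mathrm{nm}^2, \quad 15a^2 = 6\,000\ \mathrm{nm}^2,$$

so halving the critical dimension reduces the estimated area by a factor of four.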




Fig. 7. (a) Comparison of number of inputs considered in uniform and MIG-based benchmarking. (b)-(c) Percentage energy consumption by inputs and outputs at transducing interfaces for spintronic circuits using SOT- and STT- MTJs, respectively.



Fig. 8. (a) Delay and area assumptions for majority gates and inverters based on domain wall logic driven by STT-MTJ transducers. (b) A schematic illustrating the assumption that delay in the interconnects is not considered. (c) The maximum number of gates cascaded between inputs and outputs in different circuits. The delay in the CMOS circuits and the delay in the spintronic circuits without considering the DW propagation time in the interconnects.



Fig. 9. MIG benchmarking results considering the increased number of inputs and delay in the magnetic domain. (a) The comparison of EDP between CMOS circuits and SOT- and STT-MTJ mediated spintronic circuits. (b) The improvement in EDP required from individual MTJs to have the total EDP comparable to CMOS circuits. (c) Area comparison for CMOS and spintronic circuits. MTJ critical dimensions of 20 nm and 40 nm are assumed for the area estimation for spintronic circuits.

### B. The Energy Required to Propagate Domain Walls

Fig. 10. (a) Comparisons of energy consumption in the CMOS circuits with the energy consumed in the magnetic domain and the transducing interfaces in the SOT-MTJs and STT-MTJs driven spintronic circuits, respectively. (b) EDP for CMOS circuits and spintronic circuits including the energy consumption in the magnetic domain. (c) The improvement in EDP required from individual MTJs to have the total EDP comparable to CMOS circuits.

Finally, we consider the energy required to propagate domain walls in the logic gates. In 2020, Luo et al. demonstrated SOT-current-driven domain wall logic, and here we adopt the same method to calculate the energy consumption per operation of the gate, which is the power-delay product of the current in the bottom Pt layer [11]. The energy required to push domain walls across one arm of the majority gate is

$$E_{arm} = \frac{\rho J^2 w h L^2}{v_{dw}}$$

where $\rho = 30\ \mu\Omega\,\mathrm{cm}$ is the resistivity of the Pt layer [11], $v_{dw} = 750$ m/s is the domain wall velocity, and $J = 3 \times 10^{12}$ A/m<sup>2</sup> is the current density required to achieve this domain wall velocity [23], [24]. We assume that the width and length of the domain wall track are $w = a$ and $L = 2a$ (see Fig. 8a), and $h = 5$ nm is the thickness of the Pt layer [11]. It is worth noting that for a cascaded network to work, the network paths have to be resistively balanced (e.g., by making use of clipping resistors [26]) so that the same current can flow through all devices in the network. Here, we only consider the minimum current density required to push domain walls; the additional energy cost of the clipping resistance is not considered. Hence, the energy required by an individual majority gate and an inverter is $E_{maj} = 4E_{arm}$ and $E_{inv} = E_{arm}$, respectively. Now the total energy required to drive domain walls in the spintronic circuits is

$$E_{DW} = n_{maj} \times E_{maj} + n_{inv} \times E_{inv}$$

where $n_{maj}$ and $n_{inv}$ are the total numbers of majority gates and inverters needed to build the circuit. In Fig. 10a, for each spintronic circuit, the energy consumed to push domain walls in the logic gates ($E_{DW}$) and the energy cost of the SOT- or STT-MTJ transducers ($E_{trans}$) are compared with the total energy consumption of the corresponding CMOS circuit. The energy consumption within the magnetic domain is of the same order of magnitude as the energy spent at the transducing interfaces, which leads to a further 2× EDP increase (Fig. 10b). Again, we evenly distribute the EDP excess over the budget set by CMOS to all MTJs; as a result, the performance of the SOT- and STT-MTJs needs to be improved by 5800× and 13300×, respectively (see Fig. 10c).
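As a rough numerical illustration of the two expressions above, the following Python sketch evaluates $E_{arm}$ as a power-delay product, using $I = Jwh$ for the current, $R = \rho L/(wh)$ for the resistance of the Pt track, and $t = L/v_{dw}$ for the transit time, and then sums the gate contributions into $E_{DW}$. The function names `arm_energy` and `dw_energy`, the value $a = 20$ nm, and the gate counts in the usage lines are assumptions chosen for illustration only; they are not taken from the benchmark results.

```python
import math

# Material and drive parameters quoted in the text (converted to SI units).
RHO_PT = 30e-8   # Pt resistivity: 30 micro-ohm cm = 3e-7 ohm m
V_DW = 750.0     # domain wall velocity, m/s
J = 3e12         # current density, A/m^2
H_PT = 5e-9      # Pt layer thickness, m


def arm_energy(a: float) -> float:
    """Energy (J) to push a domain wall across one arm, with w = a and L = 2a."""
    w, length = a, 2.0 * a
    current = J * w * H_PT                      # I = J * (w * h)
    resistance = RHO_PT * length / (w * H_PT)   # R = rho * L / (w * h)
    power = current ** 2 * resistance           # Joule heating in the Pt track
    transit_time = length / V_DW                # time for the wall to cross the arm
    return power * transit_time                 # equals rho * J^2 * w * h * L^2 / v_dw


def dw_energy(n_maj: int, n_inv: int, a: float) -> float:
    """Total domain wall energy (J): E_maj = 4 * E_arm, E_inv = E_arm."""
    e_arm = arm_energy(a)
    return n_maj * 4.0 * e_arm + n_inv * e_arm


# Hypothetical usage: a = 20 nm and the gate counts are illustrative assumptions.
a_assumed = 20e-9
e_arm = arm_energy(a_assumed)                   # about 5.76e-16 J (0.576 fJ)
e_total = dw_energy(n_maj=1000, n_inv=200, a=a_assumed)
print(f"E_arm = {e_arm:.3e} J, E_DW = {e_total:.3e} J")
```

Because $E_{arm}$ scales as $a^3$ for fixed $J$, $h$, and $v_{dw}$, the result is very sensitive to the assumed critical dimension, so the absolute numbers printed here should be read only as an order-of-magnitude check.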

### V. CONCLUSION

In this work, we evaluated the challenges and targets of building Boolean spintronic circuits from the perspective of transducers, focusing specifically on MTJs, the only scalable option to date. Considering only the components revealed at the MIG logic synthesis level, the EDP of SOT- and STT-MTJs needs to be reduced by $\sim$5800× and $\sim$13300×, respectively, and the critical dimension of MTJs needs to be reduced to 20 nm for the spintronic circuits to be more compact than CMOS circuits. It is also important to note that there are still major contributors to the EDP yet to be considered due to the lack of experimental demonstrations of the relevant capabilities. First, at the logic synthesis level, we have yet to consider the fan-out and cascading limitations of spintronic logic gates, which will lead to duplication of sections of circuits and hence an increase in the number of input transducers and logic gates required. Second, at the physical layout level, a main limitation for some of the spintronic concepts, e.g., domain walls and spin waves, is the lack of information signal crossing in magnets at the nanoscale. To lay out such circuits without any line crossover, duplication of circuits and long interconnects are expected [27], which will add significant delay and energy costs. In our benchmarking, no energy or delay related to the interconnects is considered.

In conclusion, a synergy of efforts on several fronts is required to build efficient spintronic circuits. First, efficient transducers are vital in the construction of spintronic circuits. The performance of MTJs must be strongly enhanced for them to be considered viable choices in the traditional Boolean logic architecture. Voltage-based transducers may be able to bridge this gap [10], [28]. As important as transducers are, the delay in the magnetic domain must also be drastically improved. In addition to increasing the speed of information carriers, e.g., through higher domain wall velocities, efforts need to be put into minimizing the interconnect length, which will require fan-out, cascading, and signal crossing of magnetic information carriers, capabilities that are not well addressed in the current literature. Wave-pipelining is one option for increasing the throughput of spintronic circuits [29]; however, it requires uniform propagation delay between any gates from two logic depths, *i.e.*, the same interconnect length, which places more stringent demands on the design of interconnects. Spintronic concepts such as MESO [10] that only require charge interconnects are preferred. In the magnetic domain, the energy required to propagate domain walls is also a significant contributor to overall energy consumption. Fundamental studies on reducing the current density while maintaining high speed, as well as new mechanisms to propagate magnetic information, are in demand. Other computing paradigms (e.g., analog [10], [30], approximate [31], neuromorphic [32], and time-domain computing [33], [34], [35]) that can exploit the non-volatility, stochasticity, and ability to handle continuous signals of spintronic devices, while requiring a large number of calculations and limited interaction with the environment, may compare more favorably with CMOS circuits. However, the performance of spintronic devices in these computing paradigms also depends on the transducer efficiency and their ability to be integrated with the relevant circuit architectures. Therefore, further research is needed to fully explore and optimize spintronic devices and assess them with the relevant benchmarks for other computing paradigms.

#### REFERENCES

- [1] L. Belkhir and A. Elmeligi, "Assessing ICT global emissions footprint: Trends to 2040 & recommendations," *J. Cleaner Prod.*, vol. 177, pp. 448–463, Mar. 2018.
- [2] P. P. Gelsinger, "Microprocessors for the new millennium: Challenges, opportunities, and New Frontiers," in *IEEE Int. Solid-State Circuits Conf.* (ISSCC) Dig. Tech. Papers, Feb. 2001, pp. 22–25.
- [3] D. E. Nikonov and I. A. Young, "Benchmarking of beyond-CMOS exploratory devices for logic integrated circuits," *IEEE J. Explor. Solid-State Comput. Devices Circuits*, vol. 1, pp. 3–11, 2015.
- [4] A. Chen, "Beyond-CMOS roadmap—From Boolean logic to neuroinspired computing," *Jpn. J. Appl. Phys.*, vol. 61, no. SM, Oct. 2022, Art. no. SM1003.
- [5] B. Radisavljevic, A. Radenovic, J. Brivio, V. Giacometti, and A. Kis, "Single-layer MoS<sub>2</sub> transistors," *Nature Nanotechnol.*, vol. 6, no. 3, pp. 147–150, Mar. 2011.
- [6] C. H. Ahn, J.-M. Triscone, and J. Mannhart, "Electric field effect in correlated oxide systems," *Nature*, vol. 424, no. 6952, pp. 1015–1018, Aug. 2003.
- [7] C. J. Dorow, J. R. Leonard, M. M. Fogler, L. V. Butov, K. W. West, and L. N. Pfeiffer, "Split-gate device for indirect excitons," *Appl. Phys. Lett.*, vol. 112, no. 18, Apr. 2018, Art. no. 183501.
- [8] B. Dieny et al., "Opportunities and challenges for spintronics in the microelectronics industry," *Nature Electron.*, vol. 3, no. 8, pp. 446–459, Aug. 2020.
- [9] S. Sivasubramani, V. Mattela, C. Pal, and A. Acharyya, "Nanomagnetic logic design approach for area and speed efficient adder using ferromagnetically coupled fixed input majority gate," *Nanotechnology*, vol. 30, no. 37, Sep. 2019, Art. no. 37LT02.
- [10] S. Manipatruni, D. E. Nikonov, and I. A. Young, "Beyond CMOS computing with spin and polarization," *Nature Phys.*, vol. 14, no. 4, pp. 338–343, Apr. 2018.
- [11] Z. Luo et al., "Current-driven magnetic domain-wall logic," *Nature*, vol. 579, no. 7798, pp. 214–218, Mar. 2020.
- [12] A. Barman and G. Gubbiotti, "The 2021 magnonics roadmap," *J. Phys., Condens. Matter*, vol. 33, no. 41, 2021, Art. no. 413001.
- [13] E. Raymenants et al., "All-electrical control of scaled spin logic devices based on domain wall motion," in *IEDM Tech. Dig.*, Dec. 2020, pp. 21.5.1–21.5.4.
- [14] D. E. Nikonov and I. A. Young, "Uniform methodology for benchmarking beyond-CMOS logic devices," in *IEDM Tech. Dig.*, Dec. 2012, pp. 25.4.1–25.4.4.

- [15] D. E. Nikonov, G. I. Bourianoff, and T. Ghani, "Proposal of a spin torque majority gate logic," *IEEE Electron Device Lett.*, vol. 32, no. 8, pp. 1128–1130, Aug. 2011.
- [16] O. Zografos et al., "Design and benchmarking of hybrid CMOS-spin wave device circuits compared to 10nm CMOS," in *Proc. IEEE 15th Int. Conf. Nanotechnol. (IEEE-NANO)*, Jul. 2015, pp. 686–689.
- [17] L. Amarú, P.-E. Gaillardon, and G. De Micheli, "Majority-based synthesis for nanotechnologies," in *Proc. 21st Asia South Pacific Design Autom. Conf. (ASP-DAC)*, Jan. 2016, pp. 499–502.
- [18] L. Amarú, P.-E. Gaillardon, and G. De Micheli, "The EPFL combinational benchmark suite," in *Proc. Int. Workshop Log. Synth. (IWLS)*, 2015, pp. 1–5.
- [19] P. Raghavan et al., "Holisitic device exploration for 7nm node," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, Sep. 2015, pp. 1–5.
- [20] S. Sakhare et al., "Enablement of STT-MRAM as last level cache for the high performance computing domain at the 5nm node," in *IEDM Tech. Dig.*, Aug. 2018, p. 18.
- [21] S. Couet et al., "BEOL compatible high retention perpendicular SOT-MRAM device for SRAM replacement and machine learning," in *Proc. Symp. VLSI Technol.*, Jun. 2021, pp. 1–2.
- [22] M. Gupta and M. Perumkunnil, "High-density SOT-MRAM technology and design specifications for the embedded domain at 5nm node," in *IEDM Tech. Dig.*, Dec. 2020, p. 24.
- [23] S.-H. Yang, K.-S. Ryu, and S. Parkin, "Domain-wall velocities of up to 750 m s<sup>−1</sup> driven by exchange-coupling torque in synthetic antiferromagnets," *Nature Nanotechnol.*, vol. 10, no. 3, pp. 221–226, Mar. 2015.
- [24] R. Bläsing et al., "Magnetic racetrack memory: From physics to the cusp of applications within a decade," *Proc. IEEE*, vol. 108, no. 8, pp. 1303–1321, Aug. 2020.
- [25] D. E. Nikonov, S. Manipatruni, and I. A. Young, "Cascade-able spin torque logic gates with input–output isolation," *Phys. Scripta*, vol. 90, no. 7, Jun. 2015, Art. no. 074047.
- [26] A. Vaysset, O. Zografos, M. Manfrini, D. Mocuta, and I. P. Radu, "Wide operating window spin-torque majority gate towards large-scale integration of logic circuits," *AIP Adv.*, vol. 8, no. 5, May 2018, Art. no. 055920.
- [27] A. Mahmoud et al., "Would magnonic circuits outperform CMOS counterparts?" in *Proc. Great Lakes Symp. (VLSI)*, Jun. 2022, pp. 309–313.
- [28] B. Prasad et al., "Ultralow voltage manipulation of ferromagnetism," *Adv. Mater.*, vol. 32, no. 28, Jul. 2020, Art. no. 2001943.
- [29] O. Zografos et al., "Wave pipelining for majority-based beyond-CMOS technologies," in *Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE)*, Mar. 2017, pp. 1306–1311.
- [30] S. Fukami, W. A. Borders, A. Kurenkov, C. Zhang, S. DuttaGupta, and H. Ohno, "Use of analog spintronics device in performing neuro-morphic computing functions," in *Proc. 5th Berkeley Symp. Energy Efficient Electron. Syst. Steep Transistors Workshop (E3S)*, Oct. 2017, pp. 1–3.
- [31] A. Mahmoud, F. Vanderveken, F. Ciubotaru, C. Adelmann, S. Hamdioui, and S. Cotofana, "Spin wave based approximate computing," *IEEE Trans. Emerg. Topics Comput.*, vol. 10, no. 4, pp. 1932–1940, Oct. 2022.
- [32] J. Grollier, D. Querlioz, K. Y. Camsari, K. Everschor-Sitte, S. Fukami, and M. D. Stiles, "Neuromorphic spintronics," *Nature Electron.*, vol. 3, no. 7, pp. 360–370, Mar. 2020.
- [33] Y. Zhang et al., "Time-domain computing in memory using spintronics for energy-efficient convolutional neural network," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 68, no. 3, pp. 1193–1205, Mar. 2021.
- [34] J. Yang et al., "TIMAQ: A time-domain computing-in-memory-based processor using predictable decomposed convolution for arbitrary quantized DNNs," *IEEE J. Solid-State Circuits*, vol. 56, no. 10, pp. 3021–3038, Oct. 2021.
- [35] J. Wang et al., "Reconfigurable bit-serial operation using toggle SOT-MRAM for high-performance computing in memory architecture," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 69, no. 11, pp. 4535–4545, Nov. 2022.