## The Combined Effect of Process Variations and Power Supply Noise on Clock Skew and Jitter

Hu Xu\*, Vasilis F. Pavlidis\*, Wayne Burleson<sup>†</sup>, and Giovanni De Micheli\*

\* Integrated Systems Laboratory, EPFL, Switzerland
† Department of Electrical and Computer Engineering, University of Massachusetts, USA

Email: \*{hu.xu, vasileios.pavlidis, giovanni.demicheli}@epfl.ch, †burleson@ecs.umass.edu

Abstract—In modern VLSI circuits, a large number of clock buffers are inserted in clock distribution networks, which are significantly affected by process and power supply noise variations. The combined effect of process variations and power supply noise on clock skew and jitter is investigated in this paper. A statistical model of skitter, which consists of skew and jitter, is proposed. Clock paths with different buffer insertion strategies are compared in terms of skew and jitter. The tradeoffs among the constraints on clock jitter, skew, slew rate, and power are discussed. For strict timing constraints, severe power overhead ($\geq 110\%$) has to be added to obtain a low improvement in the worst case skitter and slew rate ($\leq 13\%$). The effect of widely-used techniques, such as recombinant trees and dynamic voltage scaling, on decreasing skitter is also investigated.

*Index Terms*—Clock distribution network, clock jitter, clock skew, skitter, process variations, power supply noise

#### I. INTRODUCTION

In modern VLSI systems, the clock distribution networks are significantly affected by different sources of variations. These variations can be introduced in the design stage, the fabrication stage, and during operation [1]. The resulting clock uncertainty (due to the clock distribution network) includes the difference of delay between different clock paths and along the same clock path known as clock skew and clock jitter, respectively. High clock frequencies severely constrain the available timing margin for processing.
Precise models of skew and jitter to analyze the contributions of these sources of uncertainty and explore different tradeoffs in area and power are, consequently, useful.

There is a plethora of methods to manage the excessive clock skew in the design phase [2], [3]. Careful physical design, however, does not guarantee the elimination of undesirable skew since the unwanted skew can be introduced by process variations in the fabrication stage [4]. The model of process variations includes *die-to-die* (D2D) and *within-die* (WID) variations [5]. D2D variations affect the devices within one die uniformly, while WID variations affect these devices randomly or systematically. Process-induced skew has been extensively modeled, and the skew variation has been shown to significantly affect the system performance [6], [7].

This work is funded in part by the Swiss National Science Foundation (No. 260021\_126517/1), European Research Council Grant (No. 246810 NANOSYS), and Intel Braunschweig Labs, Germany.

Clock jitter is the deviation of the clock signal edges from their ideal temporal occurrences. Clock jitter can be described in three ways: period jitter, cycle-to-cycle jitter, and phase jitter (or time interval error) [1]. Period jitter is defined as the difference between the measured period of one clock cycle and the ideal period, which is the most explicit description of the clock jitter within a circuit. Period jitter is produced by the Phase-Locked Loop (PLL) and the clock distribution network. PLL jitter is mitigated by careful PLL design [8]. The period jitter produced in clock distribution networks is due to the power supply noise on the clock buffers [9]. The effect of the power supply noise on period jitter is analyzed in [10]–[13]. In all these publications, the effect of process variations on clock skew and the effect of power supply noise on clock jitter are investigated separately.
Nevertheless, clock distribution networks are simultaneously affected by these variations. The actual clock period for data transfer is determined by both clock skew and jitter. A statistical timing analysis method considering clock jitter and skew is proposed in [14], where the actual distribution of power supply noise is required. The contribution of skew and jitter on clock distribution networks, however, is not explored. The term "skitter" from [15] is utilized to describe both the clock skew and jitter. A subcircuit designed to measure the skitter in operation is described in [15]. This subcircuit is used in the design of the circuit architecture to mitigate undesired skew and jitter during operation [16].

Although clock skew and jitter must be cohesively treated as discussed in [14]–[16], the combined effect of process variations and power supply noise on clock distribution networks has not been thoroughly explored. This effect is investigated in terms of skitter in this paper. If the skew and jitter variations are high, the recovery and adaptation procedures have to be frequently executed at the architecture level to ensure correct data transfer [16]. Moreover, these architectural procedures cannot be used for each pair of clock sinks. Consequently, mitigating the negative effect of skitter is important for the design of robust clock distribution networks.

This paper explores the relative impact of skew and jitter, as well as methods to lower each of them and their overall effect on the clock period. Analytic models are used to describe the distribution of skitter. Simulation methods are used to obtain the delay variation of one buffer stage. The first droop of the power supply noise is investigated, since this first droop noise is, typically, the worst supply noise [10]–[12]. The proposed methodology can be integrated into the computer-aided design flow of clock trees to design a robust clock distribution network.

Fig. 1. Clock period jitter and skew between two clock paths. The clock paths and FFs are illustrated in (a). The corresponding waveforms of clock signals are illustrated in (b).

The main contributions of this paper are:

- A statistical model for skitter including both skew and jitter is proposed and verified with Monte-Carlo simulations.
- The results of buffer insertion are compared where process variations and power supply noise are considered separately and simultaneously.
- The tradeoffs between skitter constraints, slew rate, and power consumption of clock distribution networks are presented.
- The effect of recombinant trees and *dynamic voltage scaling* (DVS) on decreasing skitter is analyzed.

The remainder of the paper is organized as follows. The notation for the clock period and skew is introduced in the following section. A statistical model to describe skitter considering both process variations and power supply noise is also presented. A methodology to obtain the delay variation of a buffer stage is presented in Section III. Simulation results and a comparison of different approaches of clock buffer insertion are presented in Section IV. The tradeoff between clock skitter and power consumption, the effect of recombining clock paths and the DVS mechanism on decreasing skitter are also discussed. The conclusions are drawn in Section V.

#### II. MODEL OF SKITTER UNDER PROCESS VARIATIONS AND POWER SUPPLY NOISE

The definitions of the clock skew, period jitter, and skitter in this paper are illustrated in Fig. 1. The clock signal is fed into the clock tree from the primary clock driver. Two flip-flops are driven by this clock signal, denoted as $FF_1$ and $FF_2$ in Fig. 1(a). The corresponding waveforms are illustrated in Fig. 1(b). The waveforms $clk_1$ and $clk_2$ denote the clock signal driving $FF_1$ and $FF_2$, respectively.
Assuming the time at which the $i^{\text{th}}$ rising edge arrives at the clock input is zero, the time at which this edge arrives at $FF_1$ and $FF_2$ is, respectively, denoted by $t_{1,i}$ and $t_{2,i}$. The number of buffers before the *point of divergence* (POD) is $n_p$. The numbers of buffers from the clock input to $FF_1$ and $FF_2$ are denoted by $n_1$ and $n_2$, respectively. The skew between the $i^{\text{th}}$ edge of $clk_1$ and $clk_2$ is $S_{1,2}(i)$. The ideal clock period is $T_{\text{clk}}$. The measured clock periods after the $i^{\text{th}}$ edge for $FF_1$ and $FF_2$ are $T_1$ and $T_2$, respectively. The corresponding period jitters are $J_1 = T_1 - T_{\text{clk}}$ and $J_2 = T_2 - T_{\text{clk}}$. Assuming the data is propagated from $FF_1$ to $FF_2$ within one clock cycle, $T_{1,2}$ is the resulting time interval that affects the clock frequency of the circuit. Consequently, the variation of $T_{1,2}$ is denoted as skitter $J_{1,2}$,

$$J_{1,2} = T_{1,2} - T_{\text{clk}} = t_{2,i+1} - t_{1,i} - T_{\text{clk}} = S_{1,2}(i) + J_2. \tag{1}$$

As shown in (1), the effective time window $T_{1,2}$ is determined by $J_{1,2}$, which is the sum of the skew $S_{1,2}(i)$ and the period jitter $J_2$ along clock path 2. Simultaneously modeling the skew and jitter can more accurately determine delay uncertainty. The skitter $J_{1,2}$ can be written in terms of sums and differences of the delays of the buffer stages,

$$J_{1,2} = \sum_{k=1}^{n_2} d_{2,k}(i+1) - \sum_{k=1}^{n_1} d_{1,k}(i) - T_{\text{clk}}, \tag{2}$$

where $d_{1,k}(i)$ is the delay of the $k^{\text{th}}$ buffer stage along the path to $FF_1$ for the $i^{\text{th}}$ clock edge.
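The decomposition in (1) can be checked numerically with a short sketch. The function name and the arrival times below are hypothetical and chosen only for illustration; the clock period of 312.5 ps is the value used later in Section IV. The skew sign convention, $S_{1,2}(i) = t_2 - t_1$ for the same edge, is the one implied by (1).

```python
def skitter_from_arrivals(t1_i, t2_i, t2_next, t_clk):
    """Return (skew, period_jitter_2, skitter) as defined around Eq. (1).

    t1_i    : arrival time of edge i at FF1
    t2_i    : arrival time of edge i at FF2
    t2_next : arrival time of edge i+1 at FF2
    t_clk   : ideal clock period
    All times share one unit (ps here).
    """
    skew = t2_i - t1_i                       # S_{1,2}(i)
    period_jitter_2 = (t2_next - t2_i) - t_clk  # J_2 = T_2 - T_clk
    skitter = (t2_next - t1_i) - t_clk       # J_{1,2} = T_{1,2} - T_clk
    return skew, period_jitter_2, skitter


# Hypothetical arrival times in ps (not taken from the paper).
T_CLK = 312.5
skew, j2, j12 = skitter_from_arrivals(t1_i=400.0, t2_i=410.0,
                                      t2_next=700.0, t_clk=T_CLK)
print(skew, j2, j12)  # skitter equals skew + period jitter of path 2
```

In this made-up case the positive skew partially offsets the negative period jitter, which illustrates why the two quantities should be treated together rather than bounded separately.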
As modeled in [10], $d_{1,k}(i)$ can be approximated by a non-recursive formula for the first-droop power supply noise,

$$d_{1,k}(i) = \mathcal{F}\left(v_k(i), \vec{P}\right), \tag{3}$$

$$v_k(i) \approx v_{\text{noise}} \left( t_0(i) + \frac{k \left( d_{\text{r}}(v_1(i)) + d_{\text{f}}(v_1(i)) \right)}{2} \right), \tag{4}$$

$$v_{\text{noise}}(t) = V_{\text{n}} \sin(2\pi f_{\text{n}} t + \phi), \tag{5}$$

where $v_k(i)$ is the voltage noise which affects the $k^{\text{th}}$ buffer stage when the $i^{\text{th}}$ clock edge arrives at this stage. The power supply noise $v_{\text{noise}}$ is modeled as a sinusoidal waveform with amplitude $V_{\text{n}}$, frequency $f_{\text{n}}$, and initial phase $\phi$. This deterministic model is widely used to describe the first droop of the power supply noise, which is considered as the worst supply noise in a circuit [10], [12]. Other faster and more erratic droops of the supply noise can be included as random variables with probabilistic formulations, similar to process variations. The delay of a buffer stage under $v_1(i)$ for a rising and falling input is denoted by $d_{\text{r}}(v_1(i))$ and $d_{\text{f}}(v_1(i))$, respectively.

The set of parameters affected by process variations is denoted by $\vec{P}$. For instance, if the variations in channel length and threshold voltage are considered, $\vec{P} = \{L_{\text{eff}}, V_{\text{th}}\}$. The expression for $\mathcal{F}\left(v_k(i), \vec{P}\right)$ is obtained by quadratic fitting as presented in Section III. This expression can be approximated as a Gaussian distribution if the parameters in $\vec{P}$ are described by Gaussian distributions. The distribution of $J_{1,2}$ is, therefore, approximated as a Gaussian distribution from (2) and (3). The mean value and the standard deviation of $J_{1,2}$ are discussed separately.

- Mean value of skitter $\mu_{J_{1,2}}$.
The term $J_{1,2}$ can be expressed as the difference of the delay of the $(i+1)^{\text{th}}$ and $i^{\text{th}}$ clock edges,

$$J_{1,2} \sim \mathcal{N}(\mu_{J_{1,2}}, \sigma_{J_{1,2}}^2), \tag{6}$$

$$\mu_{J_{1,2}} = \sum_{k=1}^{n_2} \mu_{d_{2,k}(i+1)} - \sum_{k=1}^{n_1} \mu_{d_{1,k}(i)} - T_{\text{clk}}. \tag{7}$$

- Standard deviation of the skitter $\sigma_{J_{1,2}}$.

The variation on $J_{1,2}$ is determined by both the D2D and WID variations, which are independent from each other. All the devices are affected by D2D variations uniformly. The WID variations on different devices consist of random and systematic components [5], [7], [17].

$$\sigma_{J_{1,2}}^{2} = \sigma_{J_{1,2}^{\text{D2D}}}^{2} + \sigma_{J_{1,2}^{\text{WID}}}^{2}, \tag{8}$$

$$\sigma_{J_{1,2}^{\text{D2D}}} = \sum_{k=1}^{n_{2}} \sigma_{d_{2,k}^{\text{D2D}}(i+1)} - \sum_{k=1}^{n_{1}} \sigma_{d_{1,k}^{\text{D2D}}(i)}, \tag{9}$$

$$\begin{split} \sigma_{J_{1,2}^{\text{WID}}}^{2} &= \sum_{k=1}^{n_{2}} \sigma_{d_{2,k}^{\text{WID}}(i+1)}^{2} + \sum_{k=1}^{n_{1}} \sigma_{d_{1,k}^{\text{WID}}(i)}^{2} \\ &+ 2 \sum_{k=1}^{n_{2}-1} \sum_{h=k+1}^{n_{2}} \text{Cov} \left[ d_{2,k}^{\text{WID}}(i+1), d_{2,h}^{\text{WID}}(i+1) \right] \\ &+ 2 \sum_{k=1}^{n_{1}-1} \sum_{h=k+1}^{n_{1}} \text{Cov} \left[ d_{1,k}^{\text{WID}}(i), d_{1,h}^{\text{WID}}(i) \right] \\ &- 2 \sum_{k=1}^{n_{2}} \sum_{h=1}^{n_{1}} \text{Cov} \left[ d_{2,k}^{\text{WID}}(i+1), d_{1,h}^{\text{WID}}(i) \right], \end{split} \tag{10}$$

$$\text{Cov}(a,b) = \text{corr}(a,b)\, \sigma_{a} \sigma_{b}. \tag{11}$$

Assuming the number of buffers before POD is $n_p$, for $k \leq n_p$, $\text{corr}\left[d_{2,k}^{\text{WID}}(i+1), d_{1,k}^{\text{WID}}(i)\right] = 1$. Other correlations in (11) are determined based on the given models. For instance, WID variations can be modeled as independent [18] or spatially-correlated [17], [19].

#### III. DELAY VARIATION OF A BUFFER STAGE

The delay variation of a buffer stage due to process variations and the power supply noise is discussed in this section.
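As a bridge from the variance expressions for $\sigma_{J_{1,2}}$ at the end of Section II, the following minimal sketch evaluates them for the simplest correlation model, in which WID variations of different buffers are independent (as in [18]) and only the $n_p$ shared stages before the POD are fully correlated between the two paths. The function name and the per-stage sigma values are hypothetical; in the proposed flow they would come from the fitted per-stage delay statistics.

```python
import math


def skitter_sigma(sig1_d2d, sig2_d2d, sig1_wid, sig2_wid, n_p):
    """Standard deviation of the skitter under an independent-WID assumption.

    sig1_* : per-stage delay sigmas along path 1 for edge i
    sig2_* : per-stage delay sigmas along path 2 for edge i+1
    n_p    : number of shared buffer stages before the point of divergence
    """
    # D2D variations shift all devices together, so the per-stage
    # sigmas add with sign (path 2 minus path 1).
    sigma_d2d = sum(sig2_d2d) - sum(sig1_d2d)

    # WID: with independent buffers, the within-path covariances vanish.
    var_wid = sum(s * s for s in sig2_wid) + sum(s * s for s in sig1_wid)
    # The shared stages (k <= n_p) are the same devices on both paths,
    # so their correlation is 1 and their contributions partly cancel.
    var_wid -= 2.0 * sum(sig2_wid[k] * sig1_wid[k] for k in range(n_p))

    return math.sqrt(sigma_d2d ** 2 + var_wid)


# Hypothetical per-stage sigmas in ps for two 20-stage paths.
N = 20
d2d = [0.3] * N
wid = [0.5] * N
sigmas = {n_p: skitter_sigma(d2d, d2d, wid, wid, n_p) for n_p in (0, 10, 18)}
print(sigmas)
```

With identical sigmas on both paths the D2D term cancels and the result shrinks as more stages are shared, which is the same qualitative trend as $n_p$ increases in the Monte-Carlo comparison of Section IV. The sketch ignores spatial correlation; a spatially-correlated model would need the full covariance sums.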
For a clock tree with uniform clock buffer insertion, the input slew rate and load of each buffer stage are similar. Consequently, the delay variation of a buffer stage can be evaluated with an elemental circuit, as illustrated in Fig. 2. The investigated buffer stage is depicted with a dashed rectangle in Fig. 2. The interconnect between two buffers is modeled as an RLC $\pi$ network. The power supply to buffers $b_0$, $b_1$, and $b_2$ can be adapted to model the power supply noise $v_{\text{noise}}$. By measuring the delay variation from pin A to pin B, the effect of process variations under different power supply noise can be described.

Fig. 2. A circuit used to measure the delay variation of one buffer stage due to process variations and power supply noise.

Fig. 3. The mean and standard deviation of the delay of a buffer stage.

The mean and standard deviation of the delay of a buffer stage are illustrated in Fig. 3(a) and Fig. 3(b), respectively. In this example, a clock buffer is an inverter, which is based on a PTM 32 nm CMOS model [20]. The supply voltage is $V_{\text{dd}} + \Delta V_{\text{dd}}$, where $V_{\text{dd}} = 0.9$ V is the nominal supply voltage. As shown in Fig. 3(a), the delay of the buffer stage for a rising and falling input edge is denoted by $d_{\text{r}}$ and $d_{\text{f}}$, respectively. The mean delay $\mu_{d_{\text{f}}}$ decreases with $v_{\text{noise}}$ much faster than $\mu_{d_{\text{r}}}$. In Fig. 3(b), $\sigma_{d_{\text{r}}}$ and $\sigma_{d_{\text{f}}}$ also decrease with $\Delta V_{\text{dd}}$. Consequently, a higher $V_{\text{dd}}$ can produce lower mean and standard deviation of the delay of a clock buffer stage.

Both $\mu_{d_k(i)}$ and $\sigma_{d_k(i)}$ under different power supply noise can be obtained by polynomial fitting from SPICE-based Monte-Carlo simulations [10].
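The fitting step can be sketched as an ordinary least-squares fit of a second-order polynomial in the supply noise. The sample points below are synthetic, generated from made-up coefficients rather than from SPICE, and the helper name `fit_quadratic` is hypothetical. Only the Python standard library is used so that the sketch is self-contained.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y ~ a2*x**2 + a1*x + a0; returns (a2, a1, a0)."""
    # Power sums s[p] = sum(x**p) build the 3x3 normal-equation matrix.
    s = [sum(x ** p for x in xs) for p in range(5)]
    rows = [
        [s[4], s[3], s[2], sum(y * x * x for x, y in zip(xs, ys))],
        [s[3], s[2], s[1], sum(y * x for x, y in zip(xs, ys))],
        [s[2], s[1], s[0], sum(ys)],
    ]
    n = 3
    # Gaussian elimination with partial pivoting on the augmented matrix.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[piv] = rows[piv], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    # Back substitution.
    coef = [0.0] * n
    for r in reversed(range(n)):
        acc = rows[r][n] - sum(rows[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = acc / rows[r][r]
    return tuple(coef)


# Synthetic "mean delay vs. supply noise" samples (volts -> ps).
# The generating coefficients are made up purely for illustration.
v_noise = [-0.10, -0.05, 0.0, 0.05, 0.10]
mean_delay = [40.0 * v * v - 30.0 * v + 15.0 for v in v_noise]

a2, a1, a0 = fit_quadratic(v_noise, mean_delay)
print(a2, a1, a0)  # recovers the generating coefficients up to round-off
```

In practice one such fit would be made for each fitted quantity (mean and standard deviation, rising and falling input), using Monte-Carlo samples at several supply offsets.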
Considering $\Delta V_{\text{dd}} = v_{\text{noise}}$, the delay variation of a buffer stage is approximated by a second-order polynomial,

$$y = a_2 v_{\text{noise}}^2 + a_1 v_{\text{noise}} + a_0. \tag{12}$$

With the expressions for the delay variation of one buffer stage, the skitter $J_{1,2}$ is obtained from (1) through (10). The $\mu_{d_{1,k}(i)}$, $\mu_{d_{2,k}(i+1)}$, $\sigma_{d_{1,k}(i)}$, and $\sigma_{d_{2,k}(i+1)}$ used in (7) to (10) are obtained from fitted expressions of the form (12). The voltage noise $v_{\text{noise}}$ is determined through (4) and (5).

#### IV. SIMULATIONS AND DISCUSSION

Two paths of a clock tree with clock buffers inserted are simulated and discussed in this section. The electrical parameters of the transistors are based on a 32 nm PTM model [20]. The variation in channel length ($\sigma^{\text{D2D}} = 3\%\,\mu$ and $\sigma^{\text{WID}} = 5\%\,\mu$ based on ITRS data [21]) is considered in the simulations. Note that different sources of variations can be modeled by the proposed modeling approach. The parameters of the interconnects are based on an Intel 32 nm interconnect technology [10]. The resistance, capacitance, and inductance of the interconnects per unit length are 388.007 $\Omega$/mm, 68.683 fF/mm, and 1.768 nH/mm, respectively.

The skitter, including skew and period jitter, between two paths of a clock tree is investigated. Considering two clock paths with a length of 10 mm, seven cases of buffer insertion are investigated, as listed in Table I. The maximum size of the investigated nMOS transistors is assumed to be 22.5 $\mu$m. The size of the pMOS transistors is twice the $W_n$ to produce close to equal rise and fall times.

TABLE I
DIFFERENT BUFFER INSERTION STRATEGIES FOR AN INTERCONNECT.

| # Buffers          | 10   | 14  | 20  | 30  | 40  | 50  | 60  |
|--------------------|------|-----|-----|-----|-----|-----|-----|
| Length [$\mu$m]    | 1000 | 714 | 500 | 333 | 250 | 200 | 167 |
| min $W_n$ [$\mu$m] | 1.8  | 1.5 | 1.2 | 0.9 | 0.9 | 0.9 | 0.6 |
The accuracy of the proposed methodology to estimate $J_{1,2}$ is verified in the following subsection. The case where the paths contain 20 buffers with $W_n = 3\,\mu$m is taken as an example. The result is compared with SPICE-based Monte-Carlo simulations [22]. The efficiency of different buffer insertion cases in reducing $J_{1,2}$ is then discussed. The buffer insertion can be driven by considering 1) only process variations, 2) only the power supply noise, and 3) both of these two effects, respectively. The skitter of the interconnects with different length, the tradeoff between power consumption and skitter, and the effect of recombining clock paths and DVS in reducing skitter are also presented.

#### A. Accuracy of the Proposed Methodology

The accuracy of $J_{1,2}$ obtained from (1) through (12) is verified in this section. Twenty buffers are inserted along one interconnect. For a fixed $\phi = \frac{3}{2}\pi$ in (5), three cases of $n_p$ are examined, $n_p = 0, 10, 18$. The estimated $\mu_{J_{1,2}}$ and $\sigma_{J_{1,2}}$ and the results from Monte-Carlo simulations are shown in Table II. The mean values from the proposed model and the Monte-Carlo simulations are denoted by $\mu_{\text{M}}$ and $\mu_{\text{MC}}$, respectively. As reported in Table II, for all the three cases of $n_p$, the proposed model exhibits reasonably high accuracy (errors of at most 5.4% for $\mu$ and at most 11.4% for $\sigma$).

TABLE II
COMPARISON BETWEEN THE PROPOSED MODELING METHOD AND MONTE-CARLO SIMULATIONS.

| $n_p$ | $\mu_{\text{M}}$ [ps] | $\mu_{\text{MC}}$ [ps] | $\mu\%$ | $\sigma_{\text{M}}$ [ps] | $\sigma_{\text{MC}}$ [ps] | $\sigma\%$ |
|-------|-----------------------|------------------------|---------|--------------------------|---------------------------|------------|
| 0     | -57.63                | -54.2                  | 5.4%    | 32.9                     | 29.5                      | 11.4%      |
| 10    | -57.63                | -55.0                  | 3.8%    | 22.4                     | 22.5                      | -0.2%      |
| 18    | -57.63                | -54.6                  | 4.6%    | 12.4                     | 11.9                      | 3.9%       |

Fig. 4. $\mu_{J_{1,2}}$ and $\sigma_{J_{1,2}}$ from the proposed modeling method and Monte-Carlo simulations (labeled "MC").

For $n_p = 0$, different initial noise phases $\phi$ are also examined. The $\mu_{J_{1,2}}$ and $\sigma_{J_{1,2}}$ from the proposed model and the Monte-Carlo simulations (MC) are illustrated in Fig. 4. Since $J_{1,2}$ is approximated as a Gaussian distribution based on (1)–(12), the probability for $J_{1,2}$ to lie within the range $[\mu - 3\sigma, \mu + 3\sigma]$ is 99.7%. The negative $J_{1,2}$ with the maximum absolute value can be expressed as $\max(J_{1,2}) = \mu_{J_{1,2}} - 3\sigma_{J_{1,2}}$, which results in the shortest time period for data transfer. The $\max(J_{1,2})$ from the proposed model and the Monte-Carlo simulations is also illustrated in Fig. 4. As shown in this figure, the proposed modeling method produces reasonable accuracy for different $\phi$.

The worst $\max(J_{1,2})$, or the worst case period jitter (WJ), occurs where $\phi = \frac{3}{2}\pi$ (270°). This behavior is consistent with the conclusion made in [10], when $f_{\text{n}} \ll f_{\text{clk}}$. Consequently, $\phi = \frac{3}{2}\pi$ is utilized and $J_{1,2}$ implies WJ in the remainder of this paper. In this case, $\mu_{J_{1,2}}$ and $\max(J_{1,2})$ are both negative and are described with absolute values for clarity.

#### B. Different Objectives for Buffer Insertion

The three objectives previously mentioned for performing buffer insertion are compared in this section. The resulting number and size of buffers are also presented. The slew rate (rise time) for different buffer insertions is investigated, as shown in Fig. 5. Since the rise time for 10 inverters is greater than 75 ps, these solutions are not considered in the following analysis.

*1) Buffer Insertion under Process Variations:* Numerous works focus on buffer insertion considering process variations [23], [24].
In these methodologies, the buffers are inserted to reduce both the delay and power while alleviating the delay uncertainty due to process variations. All the buffer stages are considered to be supplied with a constant $V_{\text{dd}}$ (instead of using (12) to determine the distribution of the delay and skew). Consequently, the period jitter $J_1$ and $J_2$ in Fig. 1(b) are neglected. The variation of skew $S_{1,2}$ determines $J_{1,2}$. The buffer insertion cases and the resulting $\sigma_{J_{1,2}}$ from Monte-Carlo simulations are illustrated in Fig. 6, where only process variations are considered. The lowest $\sigma_{J_{1,2}}$ is achieved for 14 buffers with $W_n = 12\,\mu$m. The resulting minimum $\sigma_{J_{1,2}}$ is 21.17 ps.

Fig. 5. Mean slew rate for different buffer insertion under process variations and power supply noise.

Fig. 6. $\sigma_{J_{1,2}}$ for different buffer insertion under process variations.

*2) Buffer Insertion under Power Supply Noise:* There are also existing works on buffer insertion considering the power supply noise [10]. In these works, the effect of the power supply noise on clock jitter is modeled and buffers are inserted such that this effect is suppressed. In this case, process variations are not considered. Consequently, $S_{1,2}$ and $J_{1,2}$ are constant for a given power supply noise scenario. The WJ from SPICE-based simulations for different numbers and sizes of buffers is illustrated in Fig. 7. The lowest WJ is achieved by 14 buffers but with $W_n = 7.5\,\mu$m. The resulting minimum $\mu_{J_{1,2}}$ is 36.2 ps. The solution with fewer buffers produces lower WJ.

Fig. 7. WJ$_{1,2}$ for different buffer insertion under power supply noise.

*3) Buffer Insertion under both Process Variations and Power Supply Noise:* Since the process variations and power supply noise coexist in a real circuit, investigating the combined effect of these variations is necessary. Skitter $J_{1,2}$ combining $S_{1,2}$ and $J_2$ can be obtained from (1) to (12). In this case, the effects of both process and voltage variations are considered to determine the size and number of buffers. The $\max(J_{1,2})$ from Monte-Carlo simulations for different buffer solutions is illustrated in Fig. 8(a). In this example, the minimum $\mu_{J_{1,2}}$, $\sigma_{J_{1,2}}$, and $\max(J_{1,2})$ from different buffer insertions are 35.7 ps, 22.36 ps, and 102.98 ps, respectively. The corresponding solutions are 14 buffers with $W_n = 7.5\,\mu$m, $12\,\mu$m, and $12\,\mu$m, respectively. The solution with fewer buffers, again, produces lower $J_{1,2}$.

Fig. 8. $J_{1,2}$ for different buffer insertion under process variations and power supply noise. The maximum $J_{1,2}$ is shown in (a). The maximum and minimum difference in $\sigma_{J_{1,2}}$ between PV only and PV&PSN is shown in (b).

The comparison in $\mu_{J_{1,2}}$ and $\sigma_{J_{1,2}}$ between the proposed model and Monte-Carlo simulations for different numbers of buffers ($W_n = 7.5\,\mu$m) is reported in Table III.

TABLE III
COMPARISON BETWEEN THE PROPOSED MODELING METHOD AND MONTE-CARLO SIMULATIONS FOR DIFFERENT NUMBERS OF BUFFERS.

| # buf | $\mu_{\text{M}}$ [ps] | $\mu_{\text{MC}}$ [ps] | $\mu\%$ | $\sigma_{\text{M}}$ [ps] | $\sigma_{\text{MC}}$ [ps] | $\sigma\%$ |
|-------|-----------------------|------------------------|---------|--------------------------|---------------------------|------------|
| 14    | -33.3                 | -35.7                  | -6.9%   | 22.5                     | 22.9                      | -1.8%      |
| 20    | -39.4                 | -43.3                  | -9.1%   | 22.5                     | 23.9                      | -5.8%      |
| 30    | -51.2                 | -57.1                  | -10.3%  | 25.7                     | 28.1                      | -8.4%      |
| 40    | -64.0                 | -69.0                  | -7.3%   | 29.8                     | 26.2                      | 14.0%      |
| 50    | -73.9                 | -77.7                  | -4.8%   | 33.0                     | 28.9                      | 14.0%      |
| 60    | -80.8                 | -82.7                  | -2.3%   | 34.7                     | 30.6                      | 13.4%      |

As reported in this table, for the clock paths with different numbers of buffers, the proposed model exhibits reasonable accuracy (error magnitudes of at most 10.3% for $\mu$ and 14.0% for $\sigma$).
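The worst-case skitter used in this comparison follows directly from the Gaussian approximation, $\max(J_{1,2}) = \mu_{J_{1,2}} - 3\sigma_{J_{1,2}}$. The short sketch below applies it to the model values reported for 14 buffers in Table III and confirms the 99.7% coverage of the $\pm 3\sigma$ range; the function name is hypothetical.

```python
from statistics import NormalDist


def worst_case_skitter(mu, sigma, k=3.0):
    """Most negative skitter within k standard deviations: mu - k*sigma."""
    return mu - k * sigma


# Probability mass of a Gaussian inside [mu - 3*sigma, mu + 3*sigma].
std_normal = NormalDist()
coverage = std_normal.cdf(3.0) - std_normal.cdf(-3.0)

# Model values for 14 buffers from Table III (ps).
mu_model, sigma_model = -33.3, 22.5
wj = worst_case_skitter(mu_model, sigma_model)

print(f"coverage of +/-3 sigma: {coverage:.4f}")
print(f"worst-case skitter: {wj:.1f} ps")
```

The same helper can be applied to each row of the table to compare the candidate buffer counts under a single figure of merit.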
For clarity, the skitter is described by the results from Monte-Carlo simulations in the remainder of this section.

Comparing the results of the three considerations for buffer insertion, it is shown that under process and voltage variations, the mean of the resulting $J_{1,2}$ is dominated by power supply noise (the difference in $\mu_{J_{1,2}}$ between considering power supply noise only (PSN) and considering both process variations and power supply noise (PV&PSN) is typically below 2%). This behavior occurs because $\mu_{J_{1,2}}$ is the linear combination of the mean delay of each buffer stage as expressed by (7), which is determined by the power supply noise, as illustrated by Fig. 3(a). The difference between the $\sigma_{J_{1,2}}$ considering process variations only (PV) and PV&PSN is reported in Fig. 8(b). A non-negligible $\Delta\sigma_{J_{1,2}}$ is reported for the clock paths with different numbers of buffers. The $\Delta\sigma_{J_{1,2}}$ for 14 buffers is the highest, although it is the optimum solution for all the three objectives. Modeling PV and PSN simultaneously is, therefore, necessary to estimate the variation of $J_{1,2}$.

#### C. Skew and Jitter for Various Lengths of the Clock Path

The global clock paths, which are typically long, are investigated in the previous sections. As the length of the clock path changes, the clock skew and jitter also differ. The skew and jitter with different lengths of clock paths are discussed in this section. An example of clock skew and jitter for different interconnect length is illustrated in Fig. 9. The same buffers ($W_n = 3\,\mu$m) are inserted at the same distance (500 $\mu$m) for all the clock paths. The ideal clock period ($T_{\text{clk}} = 312.5$ ps) is denoted by the dashed line.
The actual mean ($T_{\text{clk}} - \mu_{J_{1,2}}$), the highest ($T_{\text{clk}} - \mu_{J_{1,2}} + 3\sigma_{J_{1,2}}$), and the lowest ($T_{\text{clk}} - \mu_{J_{1,2}} - 3\sigma_{J_{1,2}}$) periods within 99.7% confidence range are denoted by $\Diamond$, $\blacksquare$, and $\blacktriangle$, respectively. As shown in Fig. 9, the skitter increases with the length of the clock path, given the same buffer insertion. The mean and the variation of the period jitter increase with the interconnect length. The largest clock period, however, remains nearly constant as the interconnect length varies, since the increase in period jitter and skew counteract each other. The results from the proposed model are also illustrated in Fig. 9, which fit well with Monte-Carlo results.

Fig. 9. Skew and jitter with different length of clock paths.

Fig. 10. Power consumption vs. $\max(J_{1,2})$ for different buffer insertions.

#### D. Power Consumption with Constraints on Skitter

The power consumed by clock distribution networks still constitutes a significant portion of the total power consumed by a circuit [1], [25]. The power consumption of the clock network under different constraints on skitter is investigated in this subsection. For the investigated clock paths, the total power consumption under different constraints on $\max(J_{1,2})$ is illustrated in Fig. 10. As shown in this figure, when $\max(J_{1,2}) \geq 220$ ps, all the buffer insertions approximately consume the same power. As the constraint becomes stricter ($\max(J_{1,2})$ decreases), the power increases and the solutions with fewer buffers are more power-efficient. The solution with 14 buffers consumes the lowest power and meets the constraint on $\max(J_{1,2})$. A constraint of $\max(J_{1,2}) \geq 115$ ps can be met with low power overhead. Nevertheless, as the constraint becomes lower than 115 ps, significant power overhead is shown.
For example, to decrease the $\max(J_{1,2})$ from 118 ps to 103 ps (13% improvement), the 14 buffers inserted along each clock path are sized up from 4.5 $\mu$m to 12 $\mu$m. The resulting power consumption increases from 9.1 mW to 19.2 mW (110% increase). In conclusion, pursuing extreme constraints on clock skew and period jitter results in high power for buffer insertion.

#### E. Power Consumption with Constraints on Slew Rate

The power consumed by a clock path under different constraints on the slew rate is investigated in this subsection. The output slew is denoted by the rise time at the clock sinks. The power consumption under different constraints on the rise time is illustrated in Fig. 11. In contrast with the buffer insertion solutions minimizing $\max(J_{1,2})$, the clock path with more and smaller buffers produces a lower output slew (higher slew rate). As shown in Figs. 5 and 11, the minimum slew rate of the clock path with 14 buffers is much higher than that of the other solutions. Consequently, the solution with 14 buffers cannot be used for the clock paths with the strict slew constraint, although this solution produces the lowest $\max(J_{1,2})$. Among the other approaches, a clock path with 60 buffers consumes the lowest power under the same constraint on slew rate.

Fig. 11. Power consumption vs. output slew for different buffer insertions.

Similar to the results in Fig. 10, the increase in power becomes severe as the slew constraint becomes stricter (slew rate decreases). For example, as the slew constraint decreases from 21 ps to 18.5 ps (12% decrease), the size of the buffers increases from 3 $\mu$m to 7.5 $\mu$m. The resulting power consumption increases from 17 mW to 38.5 mW (126% increase). In conclusion, pursuing extreme constraints on slew rate also results in high power overhead.

#### F. Mitigating Skitter with Recombining Clock Paths

Recombining clock paths (e.g., in binary trees and clock spines) are used to mitigate skew by shorting different paths at the output of the clock buffers [1], [10]. The interconnects can be shorted at different levels along the clock path, as depicted in Fig. 12(a). By inserting the shorted interconnect at different positions along the clock path, the number of shorted clock buffers $n_s$ varies from 0 to $\max(n_1, n_2) - n_p$. The skew and jitter for the clock paths with different $n_s$ are illustrated in Fig. 12(b), where $n_1 = n_2 = 20$, $n_p = 0$, $n_s = \{0, 5, 10, 15, 20\}$, and $W_n = 3\,\mu$m.

As illustrated in Fig. 12(b), $3\sigma_{J_{1,2}}$ significantly decreases with $n_s$. The mean skitter $\mu_{J_{1,2}}$ between two clock paths is, however, not affected by the position of the shorted interconnect. In other words, the variation of skitter is highly reduced by shorting the clock paths at the clock sinks, while the mean skitter cannot be reduced by the shorted interconnect. As $n_s$ increases, $\mu_{J_{1,2}}$ becomes higher than $3\sigma_{J_{1,2}}$. This behavior shows that the period jitter caused by the power supply noise becomes dominant as the skew variation is reduced by binary trees. The power consumed by clock buffers increases slightly with $n_s$, which indicates that the power does not vary significantly when the buffers are shorted at different levels between two branches.

Fig. 12. Skitter and power with the shorted wire at different levels of clock paths. The number of buffers before the shorted point is denoted by $n_s$.

#### G. Effect of DVS on Skitter

The effect of dynamic voltage scaling (DVS) on skitter is discussed in this section. DVS is an efficient method to mitigate the impact of PVT variations on data transfer [16], [26]. Since DVS is commonly applied to the circuit block by block, the supply voltage for the data paths and the clock distribution networks are both tuned.
For example, consider the setup slack $t_{\rm slack}$ between FF<sub>1</sub> and FF<sub>2</sub> in Fig. 1(a),

$$t_{\text{slack}} = T_{1,2} - D_{1,2} - t_{\text{setup}} = T_{\text{clk}} - t_{\text{setup}} + J_{1,2} - D_{1,2}. \tag{13}$$

The delay $D_{1,2}$ is the propagation time of data from the clock input pin of FF<sub>1</sub>, through the logic gates between FF<sub>1</sub> and FF<sub>2</sub>, to the data input pin of FF<sub>2</sub>. The setup time of FF<sub>2</sub> is denoted by $t_{\text{setup}}$, which is constant. A positive $t_{\rm slack}$ is required for the data to be successfully latched by FF<sub>2</sub>. Since both $J_{1,2}$ and $D_{1,2}$ are affected by process variations and power supply noise, DVS can be used to ensure a positive $t_{\rm slack}$ by voltage scaling [16]. The voltage (and, consequently, the delay) of the logic gates is adjusted according to the measured delay variation. The clock buffers within the adjusted circuit block are also affected by the scheduled supply voltage.

An example of the skitter for different $V_{\rm dd}$ is illustrated in Fig. 13, which shows the skitter between two clock branches with 20 clock buffers ($W_n = 3\,\mu\mathrm{m}$) along each branch. By increasing $V_{\mathrm{dd}}$, both the mean and the variation of the skitter decrease. The maximum skitter is, therefore, reduced. This trend is consistent with the delay variation of a buffer stage shown in Fig. 3, where both the mean and the variation of the delay decrease as $V_{\mathrm{dd}}$ increases. As a result, the induced skitter decreases.

Fig. 13. Skitter between two branches vs. supply voltage.

Since $J_{1,2}$ is negative in this example, decreasing $|J_{1,2}|$ facilitates achieving a positive slack in (13). Increasing $V_{\rm dd}$ can improve the performance of the circuit by both speeding up the logic gates and reducing the skitter of the clock distribution networks. The power consumed by the clock buffers, however, increases quadratically with $V_{\rm dd}$.

#### V. CONCLUSION

The combined effect of process variations and power supply noise on clock skew and period jitter is investigated in this paper. A statistical model for the skitter, which includes clock skew and jitter between two clock sinks, is proposed and verified. The skitter for different buffer insertion cases for long clock paths is discussed. Simulation results show that when process variations and power supply noise are considered separately, the resulting standard deviation of skitter can deviate by up to 60% from the actual value. Modeling the process and power variations jointly is, consequently, necessary to obtain an accurate distribution of the clock skitter, which significantly affects the operating frequency of a circuit.

The resulting power consumption under different constraints on skitter and slew rate is also investigated. For strict timing constraints, a severe power overhead ($\geq 110\%$) is added to obtain a low improvement in the worst case skitter and slew rate ($\leq 13\%$). Consequently, accurately estimating the skitter variation is necessary to design a power-efficient clock distribution network.

Two mechanisms that can be used to mitigate the skitter are investigated. Recombining clock paths are shown to be efficient in reducing the variation of skitter but cannot mitigate the mean skitter. DVS can be used to mitigate the effect of process variations and power supply noise on both the datapath and clock distribution networks.

#### REFERENCES

- [1] T. Xanthopoulos, *Clocking in Modern VLSI Systems*. Springer, 2009.
- [2] J. Cong et al., "Bounded-Skew Clock and Steiner Routing," *ACM Transactions on Design Automation of Electronic Systems*, vol. 3, no. 3, pp. 341–388, July 1998.
- [3] E. Friedman, "Clock Distribution Networks in Synchronous Digital Integrated Circuits," *Proceedings of the IEEE*, vol. 89, no. 5, pp. 665–692, May 2001.
- [4] S.
Nassif, "Delay Variability: Sources, Impacts and Trends," in *Proceedings of the IEEE International Solid-State Circuits Conference*, February 2000, pp. 368–369.
- [5] K. A. Bowman et al., "Impact of Die-to-Die and Within-Die Parameter Variations on the Clock Frequency and Throughput of Multi-Core Processors," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 17, no. 12, pp. 1679–1690, December 2009.
- [6] X. Jiang and S. Horiguchi, "Statistical Skew Modeling for General Clock Distribution Networks in Presence of Process Variations," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 9, no. 5, pp. 704–717, October 2001.
- [7] D. Harris and S. Naffziger, "Statistical Clock Skew Modeling with Data Delay Variations," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 9, no. 6, pp. 888–898, December 2001.
- [8] B. Razavi, *Phase-locking in high-performance systems: from devices to architectures*. John Wiley & Sons, Inc., New York, USA, 2003.
- [9] M. Saint-Laurent and M. Swaminathan, "Impact of Power-Supply Noise on Timing in High-Frequency Microprocessors," *IEEE Transactions on Advanced Packaging*, vol. 27, no. 1, pp. 135–144, February 2004.
- [10] J. Jang, O. Franza, and W. Burleson, "Compact Expressions for Supply Noise Induced Period Jitter of Global Binary Clock Trees," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 20, no. 1, pp. 1–14, December 2010.
- [11] K. Wong, T. Rahal-arabi, M. Ma, and G. Taylor, "Enhancing Microprocessor Immunity to Power Supply Noise With Clock-Data Compensation," *IEEE Journal of Solid-State Circuits*, vol. 41, no. 4, pp. 749–758, April 2006.
- [12] D. Jiao, J. Gu, and C. Kim, "Circuit Design and Modeling Techniques for Enhancing the Clock-Data Compensation Effect Under Resonant Supply Noise," *IEEE Journal of Solid-State Circuits*, vol. 45, no. 10, pp. 2130–2141, October 2010.
- [13] L. Chen, M. Marek-Sadowska, and F.
Brewer, "Coping with Buffer Delay Change due to Power and Ground Noise," in *Proceedings of Design Automation Conference*, June 2002, pp. 860–865.
- [14] T. Enami et al., "Statistical Timing Analysis Considering Clock Jitter and Skew due to Power Supply Noise and Process Variation," *IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences*, vol. E93-A, no. 12, pp. 2399–2408, December 2010.
- [15] R. Franch et al., "On-chip Timing Uncertainty Measurements on IBM Microprocessors," in *Proceedings of the IEEE International Test Conference*, October 2007, pp. 1–7.
- [16] M. Gupta et al., "Tribeca: Design for PVT Variations with Local Recovery and Fine-Grained Adaptation," in *Proceedings of the IEEE/ACM International Symposium on Microarchitecture*, December 2009, pp. 435–446.
- [17] A. Agarwal, D. Blaauw, and V. Zolotov, "Statistical Timing Analysis for Intra-Die Process Variations with Spatial Correlations," in *Proceedings of the IEEE/ACM International Conference on Computer-Aided Design*, November 2003, pp. 900–907.
- [18] K. A. Bowman, S. G. Duvall, and J. D. Meindl, "Impact of Die-to-Die and Within-Die Parameter Fluctuations on the Maximum Clock Frequency Distribution for Gigascale Integration," *IEEE Journal of Solid-State Circuits*, vol. 37, no. 2, pp. 183–190, February 2002.
- [19] H. Chang and S. Sapatnekar, "Statistical Timing Analysis Under Spatial Correlations," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 24, no. 9, pp. 1467–1482, September 2005.
- [20] NIMO ASU, "ASU Predictive Technology Model," 2008. [Online]. Available: http://www.eas.asu.edu/~ptm/
- [21] ITRS, "International Technology Roadmap for Semiconductors," 2009. [Online]. Available: http://www.itrs.net
- [22] *Virtuoso Spectre Circuit Simulator User Guide*, 7.0.1 ed., Cadence Design Systems, Inc., June 2008.
- [23] R. Chen and H.
Zhou, "Fast Buffer Insertion for Yield Optimization Under Process Variations," in *Proceedings of Asia and South Pacific Design Automation Conference*, January 2007, pp. 19–24.
- [24] J. Xiong and L. He, "Fast Buffer Insertion Considering Process Variations," in *Proceedings of the International Symposium on Physical Design*, April 2006, pp. 128–135.
- [25] S. Tam, J. Leung, and R. Limaye, "Clock Generation & Distribution for a 45nm, 8-Core Xeon® Processor with 24MB Cache," in *Proceedings of Symposium on VLSI Circuits*, August 2009, pp. 154–155.
- [26] X. Liang, G.-Y. Wei, and D. Brooks, "ReVIVaL: A Variation-Tolerant Architecture Using Voltage Interpolation and Variable Latency," in *Proceedings of International Symposium on Computer Architecture*, June 2008, pp. 191–202.
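
As a supplementary illustration of the setup slack in (13) and of the quadratic dependence of clock buffer power on $V_{\rm dd}$ noted in Section IV-G, a minimal Python sketch is given below. The function names `setup_slack` and `relative_power` and all numerical values are hypothetical and chosen only for illustration; they are not taken from the simulations reported in this paper.

```python
def setup_slack(t_clk, t_setup, skitter, data_delay):
    """Setup slack following Eq. (13): T_clk - t_setup + J_{1,2} - D_{1,2}.

    All arguments share one time unit (picoseconds in the example below).
    """
    return t_clk - t_setup + skitter - data_delay


def relative_power(v_dd, v_ref):
    """Buffer power relative to its value at v_ref.

    Assumes only the quadratic dependence on V_dd stated in the text.
    """
    return (v_dd / v_ref) ** 2


# Hypothetical timing values in picoseconds (illustrative assumptions).
T_CLK, T_SETUP, D_12 = 500.0, 30.0, 400.0

# A negative skitter reduces the slack; a smaller |J_{1,2}| recovers margin.
slack_large_skitter = setup_slack(T_CLK, T_SETUP, -60.0, D_12)
slack_small_skitter = setup_slack(T_CLK, T_SETUP, -40.0, D_12)
print(slack_large_skitter, slack_small_skitter)

# Hypothetical 10% increase in supply voltage and its relative power cost.
print(relative_power(1.1, 1.0))
```

Under these assumed numbers, reducing $|J_{1,2}|$ by 20 ps returns the same 20 ps to the setup slack, while a 10% higher supply voltage costs roughly 21% more buffer power, which is the tradeoff described for DVS.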