# The clock system for LHAASO WCDA based on reduced White Rabbit

LI Cheng<sup>1,2</sup> LIU Shubin<sup>1,2,\*</sup> SHANG Linfeng<sup>1,2</sup> CAO Ping<sup>1,2</sup> AN Qi<sup>1,2</sup>

<sup>1</sup>State Key Laboratory of Particle Detection & Electronics, Department of Modern Physics, University of Science and Technology of China, Hefei 230026, China

<sup>2</sup>Anhui Key Laboratory of Physical Electronics, Hefei 230026, China

**Abstract** Because of the large scale of the Water Cherenkov Detector Array in the Large High Altitude Air Shower Observatory, frontend digitization is imperative. A clock distribution system is therefore needed to broadcast synchronous, low-jitter clock signals to the frontend electronics distributed over an area of 90 000 $\text{m}^2$. The White Rabbit protocol provides an option: it has been proven to achieve sub-ns accuracy and ps jitter when synchronizing around 1 000 nodes over distances on the order of 10 km. However, the hierarchy of the original protocol is too complex for the Large High Altitude Air Shower Observatory application, so we propose a reduced scheme based on the White Rabbit protocol. The validation circuit shows that the clock skew due to the fiber length difference can be adjusted to less than 25 ps and that the clock jitter is less than 62 ps.

Key words White Rabbit, Skew, Jitter, Digital dual mixer time difference, Time-to-digital converter

## 1 Introduction

The Large High Altitude Air Shower Observatory (LHAASO) project is a large extensive air shower detector array of 1 km<sup>2</sup> at 4 300 m above sea level at Yangbajing, Tibet, for very high energy gamma ray astronomy and cosmic ray physics. A major component of LHAASO, the Water Cherenkov Detector Array (WCDA), consists of four pools of the same size, covering an area of 90 000 m<sup>2</sup> with an effective water depth of 4 m. It contains 3600 detector units and 3600 photomultiplier tubes (PMTs) facing upward to collect the Cherenkov light produced in water by shower particles<sup>[1]</sup>. The arrival time of the Cherenkov light should be recorded by the PMT and the readout electronics with a resolution better than 1 ns. The clock delivered to the readout electronics must therefore have both skew and jitter below 100 ps. In addition, the charge of the PMT pulse should be digitized over a wide dynamic range, from a single photoelectron to 4 000 photoelectrons. Both the time and charge measurement circuits should be fed by this high-performance clock signal.

For the measurement, the conventional approach is digitization at the backend. A cable from each PMT carries the output signal to the preamplifier, and the analog signal is transmitted along a coaxial cable to the digitization module located in the counting room, where the time and charge measurements are performed. However, because of the large scale of LHAASO WCDA, the impedance and bandwidth limitations of the long cable would greatly attenuate the signals. For LHAASO WCDA, a cable length of 150 m is a reasonable estimate, and research for the LBNE (Long-Baseline Neutrino Experiment) WCD<sup>[2]</sup> showed that common RG58 broadband cables attenuate the signal by more than 9 dB over 150 m. The noise picked up by the long cable would also degrade the time and charge measurement resolution.

Another way is to realize the digitization near the detector, in the front end electronics (FEE), so that digital signals, rather than analog signals, are transmitted to the counting room *via* a long cable or fiber. There are many successful WCDA readout electronics of this type, e.g., the ANTARES neutrino telescope<sup>[3]</sup>. However, distributing the clock to every FEE is very difficult. For an array with thousands of channels, the time measurement must be carried out with a synchronous clock. For each clock, the phase alignment is of primary importance, and it should remain stable when the fiber delay varies with temperature.

Supported by the National Natural Science Foundation of China (No. 11175174 and No. 11005107) and the Knowledge Innovation Program of the Chinese Academy of Sciences (KJCX2-YW-N27).

<sup>\*</sup> Corresponding author. *E-mail address*: liushb@ustc.edu.cn Received date: 2012-02-25

We tested the thermal phase shift of a 9/125 $\mu$m bare single mode fiber as the temperature was raised from 30°C to 55°C. In Fig. 1, the fiber delays are shown as differences from the fiber delay at 30°C, and the average temperature coefficient is 35 ps·km<sup>-1</sup>·°C<sup>-1</sup>. This indicates that delay adjustment is necessary to keep the phase alignment among all FEEs.



Fig.1 Differences of fiber delay at 30°C to 55°C.
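To put the measured coefficient in perspective, the following Python sketch scales it to a single link. The 150 m length is the cable length quoted above for LHAASO WCDA; using the 25°C span of the test as the temperature swing is an assumption made only for illustration, not a site specification, and `fiber_delay_drift_ps` is a hypothetical helper name.

```python
def fiber_delay_drift_ps(coeff_ps_per_km_degc, length_m, delta_t_degc):
    """Delay change of a fiber link for a given temperature change, in ps."""
    return coeff_ps_per_km_degc * (length_m / 1000.0) * delta_t_degc


# 35 ps/km/degC is the measured average coefficient from Fig.1.
# The 150 m length and the 25 degC swing are illustrative assumptions.
drift_ps = fiber_delay_drift_ps(35.0, 150.0, 25.0)
print(f"Delay drift over 150 m for a 25 degC change: {drift_ps:.2f} ps")
```

Under these assumptions the drift is roughly 131 ps, which by itself would exceed the 100 ps skew requirement.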

In the ANTARES neutrino telescope, an echo-based time calibration method is applied for the clock phase alignment. Each FEE sends a return signal *via* the same optical path as the outgoing clock signal. The time offsets among all FEEs are measured every hour by recording the propagation delays of the return signals from each FEE with respect to the emission time of the original clock signal<sup>[3]</sup>. However, the 1-ns resolution of this echo-based time calibration method cannot meet the demand of the LHAASO WCDA clock system.

Timing systems in accelerator laboratories provide many options for the LHAASO WCDA, such as the TOF clock system for BESIII<sup>[4]</sup>. A specialized 80 m-long phase-stabilized optical fiber (PSOF) is used to transfer the RF clock reference from the BEPCII accelerator to the TOF VME64xP crates. The PSOF has good thermal stability, with a temperature coefficient of 0.4 ppm/°C from 10°C to 30°C, i.e., a 3.2 ps timing drift over the 80-m length. Nevertheless, a phase-stabilized optical fiber about 150 m long for each of the 3600 frontend channels of LHAASO WCDA would be exorbitantly expensive.

A better choice for LHAASO WCDA clock distribution is the White Rabbit protocol (WR)<sup>[5]</sup>, which is being developed by high energy physics laboratories and aims to provide a distributed timing and data network capable of synchronizing up to 1 000 nodes with an accuracy of <1 ns relative to the timing source.

## 2 The White Rabbit protocol

White Rabbit takes advantage of synchronous Ethernet, IEEE1588<sup>[6]</sup> and the digital Dual Mixer Time Difference (DMTD)<sup>[7]</sup> to synchronize remote nodes precisely and continuously.

### 2.1 Structure

WR consists of a WR master, WR switches and WR nodes. The WR master gets its clock from an external clock source and uses it to drive the encoding of all transmitters in each of its downlink ports. The downlink ports are connected by fiber links either to a final node or to the uplink port of another switch. Together, the switches and nodes form a tree in which all internal clocks are derived from the master source clock.

### 2.2 Process

Once clocks are transmitted and recovered in all nodes, there remains the task of compensating the transmission delays, which come from the differences in propagation time in the fibers. WR first uses IEEE1588 to determine the coarse delay, and the fine delay is then measured continuously by the digital DMTD. The principle of WR is shown in Fig.2. The master receives the reference clock and fans it out as the transmitted clock (TCLK), which is sent to the delay measurement module and to the slave. In the slave, the clock recovered by the Serializer/Deserializer (SerDes), RCLK, is fed back as the transmit clock of the same SerDes. Finally, the phase difference between TCLK and the bounced-back clock (BCLK), which is recovered by the master SerDes, is measured for the fine delay adjustment. The slave dynamically adjusts the phase according to the coarse and fine delay measurements.



Fig.2 Principle of the White Rabbit protocol<sup>[5]</sup>.
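The way a coarse and a fine measurement can be combined may be easier to see in code. The sketch below is a simplified illustration, not the WR implementation: it assumes that the coarse estimate is within half a clock period of the true round-trip delay, and that the outgoing and return paths are symmetric so that the one-way delay is half the round trip. The function names and the numerical values are hypothetical; only the 40 MHz clock frequency is taken from the test setup in Section 4.

```python
def round_trip_delay_ps(coarse_ps, fine_phase_ps, period_ps):
    """Combine a coarse round-trip estimate with a fine phase measurement.

    Assumption: the coarse estimate is within half a clock period of the
    truth, so it only has to select the number of whole clock periods.
    """
    whole_cycles = round((coarse_ps - fine_phase_ps) / period_ps)
    return whole_cycles * period_ps + fine_phase_ps


def one_way_delay_ps(round_trip_ps):
    """Assumption: outgoing and return paths have the same delay."""
    return round_trip_ps / 2.0


PERIOD_PS = 25_000  # period of a 40 MHz clock

# Illustrative values: a coarse estimate of 1 us and a fine phase of 3.1 ns.
round_trip = round_trip_delay_ps(
    coarse_ps=1_000_000, fine_phase_ps=3_100, period_ps=PERIOD_PS
)
one_way = one_way_delay_ps(round_trip)
print(round_trip, one_way)
```

The coarse value only resolves the integer number of clock periods; the sub-period part of the result comes entirely from the fine phase measurement.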

### 2.3 Digital DMTD

After IEEE1588 gives a rough estimate of the fiber link delay with ns-range resolution, the digital DMTD is applied to achieve ps-range resolution<sup>[5]</sup>. The digital DMTD is therefore essential for LHAASO WCDA to achieve a skew of less than 100 ps among all FEEs.

In the digital DMTD process (Fig.3), a clock CLK\_FX, whose period is only slightly different from that of TCLK and BCLK, is generated and fed into the clock inputs of two flip-flops. TCLK and BCLK are fed to the data inputs of the two flip-flops.



Fig.3 Schematics of the digital DMTD.

After sampling the two clock signals with CLK\_FX in a slow sweeping way, the phase difference of the outputs Q1 and Q2 is a linearly magnified version of the phase difference between TCLK and BCLK<sup>[5]</sup> (Fig.4). Q1 and Q2 are then processed by the deglitcher and pulse shaping module. The magnified phase difference $\Delta \Phi_Q$ between Q1 and Q2 is obtained from the deglitched Q1 and Q2 by the phase difference averaging module. In this way, phase difference measurement in the ps range is achieved.



Fig.4 Principle of the digital DMTD.

The phase difference between TCLK and BCLK can thus be related to the phase difference between Q1 and Q2:

$$\Delta \Phi_Q = A \ \Delta \Phi_{\rm Clk} \tag{1}$$

where $\Delta \Phi_Q = \Phi_{Q1} - \Phi_{Q2}$ is the phase difference of Q1 and Q2, *A* is the magnification coefficient, and $\Delta \Phi_{\text{Clk}} = \Phi_{\text{TCLK}} - \Phi_{\text{BCLK}}$ is the phase difference of TCLK and BCLK. The zooming effect of the digital DMTD is quantified by the magnification coefficient.
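The sampling scheme of Fig.3 and Fig.4 can be illustrated with a small idealized simulation. The sketch below assumes ideal square-wave clocks with 50 % duty cycle, ideal flip-flops (no jitter, no metastability), a CLK\_FX period of $T - T/A$, and integer picosecond timing; the function names, the value A = 100 and the 3.1 ns offset are hypothetical choices for illustration, while the 25 ns period corresponds to the 40 MHz TCLK of Section 4.

```python
def first_rising_edge(delay_ps, period_ps, step_ps, n_samples):
    """Index of the first 0->1 transition at the output of a flip-flop that
    samples a square wave (delayed by delay_ps) on each CLK_FX edge."""
    t_fx_ps = period_ps - step_ps   # CLK_FX period, slightly shorter than T
    half_ps = period_ps // 2        # ideal 50 % duty cycle
    previous = None
    for n in range(n_samples):
        sample_time_ps = n * t_fx_ps
        q = ((sample_time_ps - delay_ps) % period_ps) < half_ps
        if previous is False and q:
            return n
        previous = q
    raise ValueError("no rising edge found; increase n_samples")


def dmtd_phase_difference_ps(delay_t_ps, delay_b_ps, period_ps, factor):
    """Return (estimated clock phase difference, magnified Q1-Q2 interval)."""
    step_ps = period_ps // factor                       # T_step = T / A
    n_t = first_rising_edge(delay_t_ps, period_ps, step_ps, 2 * factor + 2)
    n_b = first_rising_edge(delay_b_ps, period_ps, step_ps, 2 * factor + 2)
    samples_apart = (n_t - n_b) % factor
    estimate_ps = samples_apart * step_ps
    magnified_ps = samples_apart * (period_ps - step_ps)
    return estimate_ps, magnified_ps


# BCLK lags TCLK by 3.1 ns; T = 25 ns (40 MHz); A = 100 (illustrative).
estimate, magnified = dmtd_phase_difference_ps(0, 3_100, 25_000, 100)
print(estimate, magnified)  # prints: 3250 321750
```

In this idealized model the estimate is quantized to $T/A$ = 250 ps, so it differs from the true 3.1 ns by less than one step, and the 3.1 ns offset appears between Q1 and Q2 as an interval of about 322 ns. With the CLK\_FX period assumed here, the time-domain magnification is strictly $A - 1$, which approaches *A* for large *A*.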

The relation between the total root mean square (RMS) timing precision of measuring $\Delta \Phi_Q$ ($\sigma_T$) and that of $\Delta \Phi_{\text{Clk}}$ ($\sigma_{\text{Clk}}$) can be expressed as

$$\sigma_{\rm Clk} = \sigma_T / A \tag{2}$$

## 3 Reduced WR

By providing a novel synchronization scheme for clock distribution and data transfer, WR can serve as the infrastructure of the timing system for LHAASO. Results of a preliminary test using a pair of WR master/slave nodes show that the RMS time jitter between the recovered master/slave PPS signals is below 0.4 ns, and that the peak-to-peak deviation is <100 ps among different synchronization processes over different fiber lengths<sup>[8]</sup>. This gives confidence in applying WR to the readout electronics for LHAASO WCDA. However, the complex synchronous Ethernet used to realize the clock distribution is not necessary for the readout electronics of LHAASO WCDA: we only need to distribute the synchronous clock from the counting room to the FEEs located in the pools, and gather data back *via* the same path. A star structure is better suited to this than a switch network, so the synchronous Ethernet and the WR switches are trimmed in our proposal. In addition, the digital DMTD is improved by using a high resolution time-to-digital converter (TDC), rather than a simple counter, for the delay measurement.

### 3.1 Structure

To evaluate the proposal, prototype circuits were built. There are two main parts in the reduced version of WR: the Clock Transmitter Module (CTM) and the Clock Receiver Module (CRM), as shown in Fig.5.



**Fig.5** Photos of the Clock Transmitter board (left) and Clock Receiver board (right).

The function of the CTM is to receive the reference clock from an external source, transmit TCLK to the CRM, and measure the delay between the two modules.

The function of the CRM is to obtain RCLK by recovering TCLK with the SerDes, transmit BCLK back to the CTM, adjust the clock delay between the two modules, and generate FEE\_CLK as the reference clock for the CRM to realize the time and charge measurement.

In our star structure, one CTM drives the clocks of 10 CRMs. The system process is as follows. First, a rough estimate of the fiber link delay is obtained by IEEE1588. Then the digital DMTD measures the delay at the scale of the clock phase. Both the IEEE1588 and the digital DMTD delay measurements are implemented in the FPGA of the CTM, while in the FPGA of the CRM the delay adjustment module receives the measurement results and adjusts the delay to the proper value.

### 3.2 Improvement of digital DMTD

The principle of the digital DMTD given in Section 2 assumes ideal conditions. In the experiment (described in Section 4), an interesting phenomenon occurred: a jitter of about 100 ns was found at the transitions of Q1 and Q2. Because $\Delta \Phi_Q$ is the phase difference of Q1 and Q2, the jitter of Q1 and Q2 degrades the RMS jitter of $\Delta \Phi_Q$.

This large jitter might be caused by the jitter of TCLK, BCLK and CLK\_FX, as well as by metastability of the flip-flops, hence the need to reduce the clock jitter and solve the metastable problem. With the phase-locked loop (PLL) in use, the test results (given in Section 4) show that the jitter performance meets the requirement of LHAASO WCDA. The metastable problem is caused by setup and hold time violations of the flip-flop. This was observed in an experiment<sup>[9]</sup> in which the time difference between the transition edge of the data and the rising edge of the clock could be adjusted in ps steps. At a certain position of the data edge relative to the positive edge of the clock, the flip-flop just failed to latch the rising D input. This position is called the critical switching point (CSP). The time between the data edge and the positive edge of the clock when the data edge arrives at the CSP is defined as the CSP time ($T_{\text{critical}}$).

Figure 6 shows the clock-to-Q delay as a function of the difference between the actual data setup time $T_{\text{setup}}$ and $T_{\text{critical}}$ on a logarithmic scale. Whenever the positive edge of the clock arrives about 10 ns after the CSP, the clock-to-Q delay stays at a fixed value. The position of the data edge, relative to the positive edge of the clock, at which the clock-to-Q delay remains unchanged is the metastable point (MP). The time between the critical switching point and the metastable point is the metastable point time ($T_{\text{meta}}$).



Fig.6 The essence of the metastable state.

As the positive edge of the clock moves towards the CSP, the Q output still switches, but the clock-to-Q delay gets longer. When the positive edge of the clock arrives very close to the CSP, the clock-to-Q delay is proportional to the logarithm of the difference between the data setup time and the CSP. This increase in the clock-to-output delay as a function of the real input setup time is the essence of the metastable state<sup>[9]</sup>.

According to the theory of metastability, when the positive edge of CLK\_FX arrives after the CSP and before the MP, the metastable problem happens; when the positive edge of CLK\_FX arrives after the MP, the metastable problem will not happen.

If the difference between the period of TCLK and BCLK and that of CLK\_FX is defined as $T_{\text{step}}$, then the positive edge of CLK\_FX moves relative to the transition edge of TCLK and BCLK by $T_{\text{step}}$ in every cycle. Defining $N_{\text{step}}$ as the number of times that the positive edge of CLK\_FX arrives within $T_{\text{meta}}$ (Fig.7), $T_{\text{step}}$ and $N_{\text{step}}$ can be calculated as

$$T_{\text{step}} = T - T_{\text{FX}} = T/A \tag{3}$$

$$N_{\text{step}} = T_{\text{meta}}/T_{\text{step}} = AT_{\text{meta}}/T \tag{4}$$

where $T_{\text{FX}}$ is the period of CLK\_FX; *T* is the period of TCLK and BCLK; $T_{\text{meta}}$ is the metastable point time of the flip-flop; and *A* is the magnification coefficient.



**Fig.7**  $T_{critical}$ ,  $T_{meta}$ ,  $T_{step}$  and  $N_{step}$ .
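Eqs.(3) and (4) can be evaluated directly to see how the step size shrinks and the number of edges inside the metastable window grows with *A*. In the sketch below, the 25 ns period corresponds to the 40 MHz TCLK of Section 4, whereas the value of `T_META_PS` is purely hypothetical (the text does not give a number usable here), and the function names are illustrative.

```python
def dmtd_step_ps(period_ps, factor):
    """Eq.(3): T_step = T / A."""
    return period_ps / factor


def edges_in_metastable_window(t_meta_ps, period_ps, factor):
    """Eq.(4): N_step = T_meta / T_step = A * T_meta / T."""
    return factor * t_meta_ps / period_ps


T_PS = 25_000.0     # period of the 40 MHz TCLK/BCLK
T_META_PS = 500.0   # hypothetical metastable point time, for illustration only

for factor in (50, 111, 500):
    step = dmtd_step_ps(T_PS, factor)
    n_step = edges_in_metastable_window(T_META_PS, T_PS, factor)
    print(f"A = {factor:3d}: T_step = {step:7.2f} ps, N_step = {n_step:5.2f}")
```

For a fixed $T_{\text{meta}}$, doubling *A* halves $T_{\text{step}}$ and doubles $N_{\text{step}}$, which is the trade-off between magnification and metastability discussed in the text.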

From Eq.(4), $N_{\text{step}}$ increases with *A*. A bigger $N_{\text{step}}$ means that more positive edges of CLK\_FX arrive during $T_{\text{meta}}$, which causes a more serious metastable problem, as seen in the experimental result in Fig.8. The RMS jitter of $\Delta \Phi_Q$ ($\sigma_{\Delta \Phi Q}$) was about 30 ps for $A \leq 111$, beyond which $\sigma_{\Delta \Phi Q}$ started to increase rapidly with *A*. So, as long as $A \leq 111$, the metastable problem will not occur.



**Fig.8**  $\sigma_{\Delta \Phi Q}$  as a function of *A*.

Once the RMS jitter of $\Delta \Phi_Q$ has been improved, the next task is to measure $\Delta \Phi_Q$ with a counter or a TDC. From Eq.(1), the clock skew is magnified by *A* and measured as $\Delta \Phi_Q$. For a clock skew of <100 ps among all FEEs, we have $\Delta \Phi_Q \leq 11.1$ ns at *A*=111. A 100 MHz counter, whose resolution is 10 ns, can then meet the resolution requirement for measuring $\Delta \Phi_Q$.
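As a quick check of these numbers, Eq.(1) with the 100 ps skew bound and *A* = 111 gives

$$\Delta \Phi_Q = A\,\Delta \Phi_{\rm Clk} \leq 111 \times 100\ \text{ps} = 11.1\ \text{ns}, \qquad T_{\text{counter}} = \frac{1}{100\ \text{MHz}} = 10\ \text{ns},$$

so the magnified skew bound is slightly larger than one period of the 100 MHz counter. The symbol $T_{\text{counter}}$ is introduced here only for this illustration.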

The total RMS timing precision for measuring $\Delta \Phi_Q$ ($\sigma_T$) needs careful consideration, too. The $\sigma_T$ mainly originates from the timing variations induced by the distribution of $\Delta \Phi_Q$ and from the quantization error:

$$\sigma_T = (\sigma_{\Delta \Phi Q}^2 + \sigma_q^2)^{0.5} \tag{5}$$

where  $\sigma_q$  is the RMS timing precision of a counter or a TDC due to the quantization error.
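A minimal numerical sketch of Eqs.(5) and (2) follows. The input values are round numbers chosen only to make the arithmetic easy to follow; they are not measurements from this work, and the function names are hypothetical.

```python
import math


def total_rms_ps(sigma_dphi_q_ps, sigma_q_ps):
    """Eq.(5): quadrature sum of the two independent contributions."""
    return math.hypot(sigma_dphi_q_ps, sigma_q_ps)


def clock_rms_ps(sigma_t_ps, factor):
    """Eq.(2): precision referred back to the clock phase difference."""
    return sigma_t_ps / factor


# Illustrative inputs only (not measured values).
sigma_t = total_rms_ps(sigma_dphi_q_ps=300.0, sigma_q_ps=400.0)
sigma_clk = clock_rms_ps(sigma_t, factor=100)
print(sigma_t, sigma_clk)  # prints: 500.0 5.0
```

Because the two terms add in quadrature, the larger of the two dominates $\sigma_T$, so reducing the quantization term only helps until it falls below the jitter term.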

One method to decrease $\sigma_q$ is time interval averaging (TIA), a powerful yet easy method for increasing the accuracy of time interval measurements on repetitive signals. TIA is based on the statistical reduction of the ±1 count quantization error inherent in digital measurements. The more intervals are averaged, the closer the measurement result approaches the true value of the unknown time interval.

According to TIA, the worst-case RMS timing precision due to the quantization error can be calculated by Eq.(6), and the time needed for measuring $\Delta \Phi_Q$ by Eq.(7).

$$\sigma_q = Bin/(4N)^{0.5} \tag{6}$$

$$T_{\rm M} = N T_Q \tag{7}$$

where, *Bin* is the resolution of a counter or a TDC, *N* is the sample number of  $\Delta \Phi_Q$  measured,  $T_M$  is the measurement time, and  $T_Q$  is the period of Q1 and Q2.

Calculation results show that, to reach a $\sigma_q$ of 50 ps, a 100 MHz counter needs a $T_{\rm M}$ 100 times longer than a TDC with a resolution of 1 ns does. A multi-phase clock TDC in the FPGA is therefore chosen; it provides a time resolution of 625 ps and an RMS of 312.5 ps<sup>[10]</sup>, well below 1 ns.
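The factor of 100 follows from inverting Eq.(6) for *N* and inserting the result into Eq.(7). The sketch below reproduces that arithmetic; the function names are hypothetical, and the 625 ps case is added only for comparison with the chosen TDC.

```python
import math


def samples_needed(bin_ps, target_sigma_q_ps):
    """Invert Eq.(6): N = (Bin / (2 * sigma_q))**2, rounded up."""
    return math.ceil((bin_ps / (2.0 * target_sigma_q_ps)) ** 2)


def measurement_time(n_samples, t_q):
    """Eq.(7): T_M = N * T_Q, in the same unit as t_q."""
    return n_samples * t_q


TARGET_SIGMA_Q_PS = 50

n_counter = samples_needed(10_000, TARGET_SIGMA_Q_PS)  # 100 MHz counter, 10 ns bin
n_tdc_1ns = samples_needed(1_000, TARGET_SIGMA_Q_PS)   # TDC with a 1 ns bin
n_tdc_625 = samples_needed(625, TARGET_SIGMA_Q_PS)     # TDC with a 625 ps bin

print(n_counter, n_tdc_1ns, n_tdc_625)  # prints: 10000 100 40
print(n_counter // n_tdc_1ns)           # prints: 100
```

Since $T_Q$ is the same in both cases, the ratio of the measurement times equals the ratio of the sample numbers, i.e. the square of the ratio of the bin sizes.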

## 4 Performance

### 4.1 Testing setup

An FS725 Rubidium Frequency Standard manufactured by Stanford Research Systems is used as the 10 MHz sine clock reference. Through a 0.5 m cable, the 10 MHz sine clock is transmitted to the CTM and multiplied to a 40 MHz clock by the PLL (AD9518). The 40 MHz TCLK is then fanned out to the SerDes (DS92LV16) and the FPGA (Virtex-4 XC4VLX40-10) by a CY2304. The optical transmitter and receiver are GT2541 and GT2542, respectively, which work at frequencies up to 2.5 GHz. The two clock modules are connected by 9/125 µm single mode fiber. The oscilloscope is a LeCroy WavePro715z. Fig.9 shows the test setup scheme.



Fig.9 The test setup scheme.

### 4.2 Delay adjustment performance

To test the delay adjustment performance of the system, a 40-MHz TCLK was transmitted by the CTM to the CRM. The phase difference between TCLK and FEE\_CLK indicates the performance of a single point-to-point link of the clock distribution system.

Two fiber links (100 m and 200 m) were used to simulate the variation in fiber delay. The phase differences between TCLK and FEE\_CLK, before and after the delay adjustment, were recorded by an oscilloscope. The results show that the phase difference between TCLK and FEE\_CLK can be adjusted to the preset value of about 11 ns for both fiber links (Table 1). After the delay adjustment, the delay difference between the two fibers is only 24.5 ps, within the design requirement of LHAASO WCDA.

Table 1 Delay adjustment performance: phase difference between TCLK and FEE\_CLK.

| Fiber | Before delay adjustment / ns | After delay adjustment / ns |
|-------|------------------------------|-----------------------------|
| 100 m | 8.9433                       | 11.1647                     |
| 200 m | 1.0039                       | 11.1402                     |
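The residual skew quoted in the text can be recomputed from the values in Table 1. The short sketch below does this, and also computes how much each phase difference changed during the adjustment; the variable names are illustrative.

```python
# Phase difference between TCLK and FEE_CLK from Table 1, in ns.
phase_ns = {
    "100 m": {"before": 8.9433, "after": 11.1647},
    "200 m": {"before": 1.0039, "after": 11.1402},
}

# Residual skew between the two links after adjustment, in ps.
residual_skew_ps = abs(phase_ns["100 m"]["after"] - phase_ns["200 m"]["after"]) * 1000.0

# Change of the phase difference caused by the adjustment, in ns.
phase_change_ns = {
    fiber: values["after"] - values["before"] for fiber, values in phase_ns.items()
}

print(f"residual skew: {residual_skew_ps:.1f} ps")
for fiber, change in phase_change_ns.items():
    print(f"{fiber}: phase difference changed by {change:.4f} ns")
```

The residual skew evaluates to 24.5 ps, in agreement with the value stated in the text.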

### 4.3 Jitter performance

The jitter performance of TCLK in the CTM and that of FEE\_CLK in the CRM were measured (Fig.10). The cycle-to-cycle jitter of TCLK was 28.40 ps RMS and that of FEE\_CLK was 61.13 ps RMS, a jitter performance that satisfies the requirement of LHAASO WCDA.





## 5 Conclusions

In this paper, a clock distribution system is designed and tested based on the needs of LHAASO WCDA. This clock distribution system is a reduced version of WR, which can automatically adjust the propagation delay and keep the phase alignment. The validation circuit shows that the clock skew due to the fiber length difference can be adjusted to less than 25 ps and that the clock jitter is less than 62 ps. The experimental results show that the clock performance satisfies the requirement of LHAASO WCDA. Furthermore, a new prototype clock system based on the same design is being developed for LHAASO WCDA.

## References

- 1 An Q, Bai Y X, Bi X J, *et al*. Nucl Instrum Meth Phys Res, Sect A, 2011, **644**: 11–17.
- 2 The LBNE collaboration. LBNE Conceptual design report, http://www.phy.bnl.gov, 2010.
- 3 Aguilar J A, Al Samarai I, Albert A, *et al.* Astroparticle Phys, 2011, **34**: 539–549.
- 4 Li H, Liu S, Feng C, *et al.* IEEE Trans Nucl Sci, 2006, **57**: 442–445.
- 5 Moreira P, Serrano J, Wlostowski T, *et al.* ISPCS-2009-5340196: White rabbit: Sub-nanosecond timing distribution over ethernet. ISPCS 2009 international IEEE symposium on precision clock synchronization for measurement control and communication, Brescia, Italy, 2009, 1–5.
- 6 IEEE standard for a precision clock synchronization protocol for networked measurement and control systems, IEEE Std 1588-2002, IEEE instrumentation & measurement society, 2002, 1–144.

- 7 Serrano J, Alvarez P, Cattin M, *et al.* ICALEPCS: The white rabbit project. ICALEPCS2009, Kobe, Japan, 2009, 1–3.
- 8 Gong G, Chen S, Du Q, *et al.* ICALEPCS: Subnanosecond timing system design and development for LHAASO project. ICALEPCS2011, Grenoble, France, 2011, 646–649.

- 9 Johnson H W, Graham M. High-speed digital design: a handbook of black magic, New Jersey, Prentice-Hall, 1993, Chapter 3, 120–131.
- 10 Hao X, Liu S, Zhao L, *et al.* Nucl Sci Tech, 2011, **22**: 178–184.