# DBPM signal processing with field programmable gate arrays

LAI Longwei<sup>1,2</sup> LENG Yongbin<sup>1,\*</sup> YI Xing<sup>1,2</sup> YAN Yingbing<sup>1</sup> ZHANG Ning<sup>1,2</sup> YANG Guisen<sup>1,2</sup> WANG Baopeng<sup>1,2</sup> XIONG Yun<sup>1,2</sup>

> <sup>1</sup>Shanghai Synchrotron Radiation Facility, Shanghai Institute of Applied Physics, Shanghai 201800, China <sup>2</sup>Graduate University of Chinese Academy of Sciences, Beijing 100049, China

**Abstract** The performance of a DBPM system is determined by the design and implementation of its beam position signal processing algorithm. To develop such a system, a beam position signal processing algorithm is implemented on an FPGA. The hardware is a PMC board, the ICS-1554A-002 (GE Corp.), with an XC5VSX95T FPGA chip. Quadrature frequency mixing is adopted to down-convert the high-frequency signal to baseband. Unlike the conventional method, the mixing is implemented with the CORDIC algorithm. The theory of the algorithm and its implementation details are discussed. As the board contains no front-end gain controller, a published, patent-pending technique is adopted to realize this function in digital logic. The whole design is implemented in VHDL. An on-line evaluation has been carried out on the SSRF (Shanghai Synchrotron Radiation Facility) storage ring. The results indicate that the turn-by-turn data of the system measure the real beam movement accurately, and that the system resolution is 1.1 µm.

**Key words** DBPM, FPGA, Frequency mixing, CORDIC

### 1 Introduction

The performance of a DBPM (digital beam position monitor) is determined by its signal processing. The turn-by-turn beam position signal can be acquired with three processing algorithms<sup>[1]</sup>. Although the FFT (fast Fourier transform)-based algorithm performs better, it consumes a large amount of resources. In this paper, we adopt the processing algorithm based on quadrature frequency mixing, which is the classical processing stream in software radio theory. Fig.1 shows the block diagram. The input RF signal is under-sampled at 169 times the SSRF revolution frequency. The signal is then processed by a quadrature mixing module, two CIC (cascaded integrator comb) filtering and decimate-by-13 modules, two FIR (finite impulse response) filtering and decimate-by-13 modules, and a square root operation module to obtain the turn-by-turn data. The ICS-1554A-002 hardware has four RF input connectors, and the signals are sampled by four ADCs before entering the XC5VSX95T FPGA.



Fig.1 DBPM signal processing stream.

The theory and design of the CIC filter are described in Ref.[2], and the FIR filtering and decimation are easily designed by combining the Matlab toolbox FDA-Tool with the Xilinx FIR IP core<sup>[3]</sup>. This paper therefore focuses on the design of the quadrature mixing module and on an innovative technique to optimize the performance, rather than on the CIC and FIR modules.

### 2 Quadrature mixing

The first processing module is quadrature mixing, which down-converts the signal from its central frequency to baseband<sup>[4]</sup>.

Supported by Shanghai Synchrotron Radiation Facility project

<sup>\*</sup> Corresponding author. *E-mail address:* lengyongbin@sinap.ac.cn Received date: 2011-01-18

Unlike a direct implementation, the CORDIC (coordinate rotation digital computer) algorithm is used here. It saves the RAM resources that would otherwise store the sine and cosine lookup tables, and it saves a pair of multipliers. The pipelined iteration structure of the implementation lets the system run at a high clock frequency, and the phase precision can be increased simply by adding pipeline stages.

#### 2.1 CORDIC theory

The basic CORDIC algorithms were first introduced by Volder J E<sup>[5]</sup> and Walther J S<sup>[6]</sup>. A wide range of functions, including certain trigonometric, hyperbolic, linear and logarithmic functions, can be computed using only shifts and additions<sup>[7]</sup>. This makes the algorithm well suited to implementation in FPGAs (field programmable gate arrays).

The CORDIC algorithm is derived from general rotation transform:

$$x' = x\cos\theta - y\sin\theta, \ y' = y\cos\theta + x\sin\theta$$
 (1)

where $(x, y)$ and $(x', y')$ are the coordinates of the initial vector **A** and the terminal vector **B** of the rotation, respectively, and $\theta$ is the rotation angle in the Cartesian plane. Eq.(1) can be rearranged as

$$x' = \cos\theta [x - y \tan\theta], \ y' = \cos\theta [y + x \tan\theta]$$
 (2)

Suppose **B** is obtained from **A** by *n* successive rotations. The $i^{\text{th}}$ rotation, through the angle $\theta_i$, is

$$x_{i+1} = \cos\theta_i [x_i - y_i \tan\theta_i], \ y_{i+1} = \cos\theta_i [y_i + x_i \tan\theta_i] \quad (3)$$

Setting $\tan \theta_i = \pm 2^{-i}$ (*i* = 0, 1, 2, ...), the multiplication by the tangent can be implemented by shifts. Arbitrary angles can be obtained by successively smaller elementary rotations, with *i* increasing by 1 after each rotation. The iterative rotation can be expressed by

$$x_{i+1} = K_i[x_i - y_i d_i 2^{-i}], \ y_{i+1} = K_i[y_i + x_i d_i 2^{-i}]$$
 (4)

where $d_i = \pm 1$ and $K_i = \cos(\tan^{-1}2^{-i}) = (1+2^{-2i})^{-1/2}$.

The residual angle after each rotation is

$$z_{i+1} = z_i - d_i \tan^{-1}(2^{-i}) \tag{5}$$

where $z_0$ is the angle of the target vector; $d_i = +1$ if $z_i \ge 0$, otherwise $d_i = -1$.

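The factors $K_i$ are constants, so they can be left out of the iterations and treated as a fixed gain. After *n* iterations, the result is

$$\begin{aligned} x_{n} &= M_{n}[x_{0}\cos(z_{0}) - y_{0}\sin(z_{0})] \\ y_{n} &= M_{n}[y_{0}\cos(z_{0}) + x_{0}\sin(z_{0})] \\ z_{n} &\approx 0 \\ M_{n} &= \prod_{i=0}^{n-1} \sqrt{1 + 2^{-2i}} \approx 1.647 \end{aligned} \tag{6}$$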

Setting $y_0 = 0$ gives

$$x_n = M_n x_0 \cos(z_0); \quad y_n = M_n x_0 \sin(z_0)$$
 (7)

With the input sample as $x_0$ and the oscillator phase as $z_0$, Eq.(7) is the quadrature mixing result scaled by $M_n$; this is the CORDIC-based mixing. The CORDIC Cartesian-to-polar transform is also used for the square root operation in this system<sup>[7]</sup>.
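As an illustration of Eqs.(3)-(7), the following Python sketch models the rotation-mode iteration in floating point, with the $K_i$ factors dropped so that the fixed gain $M_n$ appears in the output. The function names, the 16-stage default and the test values are assumptions made for this example; the actual design is a fixed-point VHDL pipeline. The vectoring-mode function shows one plausible way a CORDIC core can return the magnitude needed by the square root step, which the text does not detail.

```python
import math


def cordic_rotate(x0, y0, z0, n_iter=16):
    """Rotation-mode CORDIC without the K_i factors (floating-point model)."""
    x, y, z = x0, y0, z0
    for i in range(n_iter):
        d = 1.0 if z >= 0.0 else -1.0      # rotate toward zero residual angle
        shift = 2.0 ** -i                  # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)          # arctan values are constants in hardware
    return x, y, z


def cordic_gain(n_iter=16):
    """Fixed gain M_n accumulated by n_iter uncompensated iterations."""
    gain = 1.0
    for i in range(n_iter):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return gain


def cordic_magnitude(x0, y0, n_iter=16):
    """Vectoring-mode CORDIC: returns M_n * sqrt(x0^2 + y0^2)."""
    x, y = abs(x0), y0                     # fold into the right half-plane
    for i in range(n_iter):
        d = -1.0 if y >= 0.0 else 1.0      # rotate so that y is driven to zero
        shift = 2.0 ** -i
        x, y = x - d * y * shift, y + d * x * shift
    return x


# Hypothetical check: mix a unit-amplitude sample with a phase of 0.6 rad.
z0 = 0.6
x_n, y_n, z_n = cordic_rotate(1.0, 0.0, z0)
m_n = cordic_gain()
print("gain M_n:", m_n)
print("I:", x_n / m_n, "reference:", math.cos(z0))
print("Q:", y_n / m_n, "reference:", math.sin(z0))
print("amplitude of (3, 4):", cordic_magnitude(3.0, 4.0) / m_n)
```

The rotation converges only for $|z_0|$ up to about 1.74 rad (the sum of all elementary angles), so larger phases need a quadrant correction before or after the core.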

#### 2.2 Implementation structure in FPGA

An area-for-speed trade-off strategy is adopted in the FPGA design. As Fig.2 shows, the CORDIC-based quadrature mixing is structured as a pipeline that can run at the full clock frequency. It is realized in two's complement arithmetic.



Fig.2 Diagram of CORDIC-based quadrature mixing pipeline.

#### 2.2.1 Iteration times

The phase precision is determined by the number of iteration stages. When the output of the phase accumulator is *N* bits wide, the phase precision is $2\pi/2^N$. The last iteration should resolve this phase step, so the rotation angle of the *i*<sup>th</sup> iteration, $\arctan(2^{-i})$, is set equal to it. The number of iterations is then given by

$$i = -\log_2[\tan(2\pi/2^N)] \tag{8}$$

With *N* = 18, Eq.(8) gives *i* ≈ 15.3, which is rounded up to *i* = 16.
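Eq.(8) is easy to check numerically. The short Python sketch below evaluates it for a given phase width and rounds the result up to a whole number of pipeline stages; the function name is an assumption made for this example.

```python
import math


def cordic_iterations(phase_bits):
    """Evaluate Eq.(8) and round up to a whole number of iteration stages."""
    phase_step = 2.0 * math.pi / 2 ** phase_bits   # phase precision 2*pi / 2^N
    exact = -math.log2(math.tan(phase_step))
    return exact, math.ceil(exact)


exact, stages = cordic_iterations(18)
print(f"Eq.(8) value: {exact:.2f}, pipeline stages: {stages}")
```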

#### 2.2.2 Mixing precision

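The mixing precision depends on the precision of the input data and on that of the sine and cosine values; only the latter is considered here, independently of the input data. The sine value changes least over the interval $(\pi/2-\Delta\varphi, \pi/2)$, where $\Delta\varphi$ is the minimum phase change. For this smallest change to be resolved, it must amount to at least one least significant bit: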

$$[\sin(\pi/2) - \sin(\pi/2 - \Delta\varphi)] 2^{n_{bs}} \ge 1$$

$$2^{n_{bs}} \ge [1 - \cos(\Delta\varphi)]^{-1}$$

$$n_{bs} \ge \log_2 [1 - \cos(\Delta\varphi)]^{-1}$$
(9)

where $n_{bs}$ is the bit width of the sine value. With $\Delta \varphi = 2\pi/2^N$,

$$n_{bs} \ge \log_2 [1 - \cos(2\pi/2^N)]^{-1}$$
(10)

where $n_{bs} \ge 16$ at *N* = 18, and the mixing precision is 16 bits.

#### 2.2.3 Phase truncation influence

The phase accumulator is designed with a width of 32 bits, and its output is truncated to 18 bits for the CORDIC operation; this truncation can introduce spurious components into the spectrum. The beam central frequency in the SSRF storage ring is 499.654 MHz, which shifts to 30.5344 MHz after under-sampling at 117.2799 MHz, so the frequency control word is 1118215904<sup>[8]</sup>. A Matlab simulation was used to evaluate the spurious components introduced by phase truncation. Fig.3a shows that the phase truncation introduces spurious components into the output of the numerically controlled oscillator (NCO). Compared with Fig.3b, the NCO background noise increases from -200 dB to -150 dB. However, the largest spurious component introduced is below -100 dB, so its influence is negligible, and truncating 14 bits of phase is safe.
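The frequency arithmetic in this paragraph can be retraced with a few lines of Python. The sketch below folds the RF frequency into the first Nyquist zone, converts the result into a 32-bit frequency control word and truncates the accumulator output to 18 bits. The function names and the rounding rule are assumptions made for this example; the frequencies and bit widths are those quoted in the text.

```python
F_RF_MHZ = 499.654        # beam central frequency
F_S_MHZ = 117.2799        # ADC sampling rate
ACC_BITS = 32             # phase accumulator width
CORDIC_PHASE_BITS = 18    # phase width passed to the CORDIC stages


def aliased_frequency(f_rf, f_s):
    """Frequency seen after under-sampling, folded into the first Nyquist zone."""
    f = f_rf % f_s
    return min(f, f_s - f)


def frequency_control_word(f_out, f_s, acc_bits=ACC_BITS):
    """Phase increment per sample, as an integer (rounding is assumed)."""
    return round(f_out / f_s * 2 ** acc_bits)


def truncate_phase(acc_value, acc_bits=ACC_BITS, out_bits=CORDIC_PHASE_BITS):
    """Keep only the most significant out_bits of the accumulator."""
    return acc_value >> (acc_bits - out_bits)


f_if = aliased_frequency(F_RF_MHZ, F_S_MHZ)
fcw = frequency_control_word(f_if, F_S_MHZ)
print("aliased frequency (MHz):", f_if)
print("frequency control word:", fcw)

# Run the accumulator for a few samples and show the truncated phase words.
acc = 0
for _ in range(4):
    acc = (acc + fcw) & (2 ** ACC_BITS - 1)    # wrap-around of the 32-bit accumulator
    print(acc, "->", truncate_phase(acc))
```

With the rounded frequencies given in the text, the computed control word should agree with the quoted value to within a count or so.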

#### 2.2.4 Quadrant mapping<sup>[9]</sup>

The CORDIC algorithm calculates the sine and cosine for phases from $-\pi/2$ to $\pi/2$, so the two highest bits of the phase are used to determine the quadrant over the full range of 0 to $2\pi$. Table 1 lists the quadrant mapping, where *X* and *Y* are the outputs of the last iteration stage.



Fig.3 Spectra of NCO with (a) and without (b) phase truncation.

Table 1 Quadrant mapping

| Highest bit | Second-highest bit | Quadrant | I channel    | Q channel    |
|-------------|--------------------|----------|--------------|--------------|
| 0           | 0                  | I        | X            | Y            |
| 0           | 1                  | II       | Y complement | X            |
| 1           | 0                  | III      | X complement | Y complement |
| 1           | 1                  | IV       | Y            | X complement |
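The mapping of Table 1 can be checked with a small Python model. In this sketch the "complement" entries are taken to be two's complement negation, the residual phase passed to the core is assumed to lie in the first quadrant, and `math.cos`/`math.sin` stand in for the outputs *X* and *Y* of the last CORDIC stage. The function names and the 18-bit phase word are assumptions made for this example.

```python
import math


def quadrant_map(top_two_bits, x, y):
    """Return (I, Q) from the core outputs X, Y according to Table 1."""
    if top_two_bits == 0b00:      # quadrant I
        return x, y
    if top_two_bits == 0b01:      # quadrant II
        return -y, x
    if top_two_bits == 0b10:      # quadrant III
        return -x, -y
    return y, -x                  # quadrant IV


def nco_sample(phase_word, phase_bits=18):
    """Model one NCO output for an unsigned phase word covering 0 to 2*pi."""
    quadrant = phase_word >> (phase_bits - 2)
    residual = phase_word & ((1 << (phase_bits - 2)) - 1)
    angle = residual * 2.0 * math.pi / 2 ** phase_bits   # in [0, pi/2)
    x, y = math.cos(angle), math.sin(angle)              # stand-in for the CORDIC core
    return quadrant_map(quadrant, x, y)


# One phase word per quadrant, compared with the full-range cosine and sine.
for word in (1000, 70000, 140000, 200000):
    i_val, q_val = nco_sample(word)
    full_angle = word * 2.0 * math.pi / 2 ** 18
    print(word, i_val - math.cos(full_angle), q_val - math.sin(full_angle))
```

The printed differences should be at the level of floating-point rounding, which indicates that the I channel follows the cosine and the Q channel the sine over the whole circle.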

### 3 Automatic input gain control

An FPGA is a bit-based operation circuit. An addition produces an output one bit wider than its inputs, and a multiplication doubles the bit width. The data width therefore grows with the number of operations and consumes enormous resources, which hinders realization of the design. Truncation is thus an essential step in an FPGA design. For the truncated output to retain more effective information, weak signals need an input gain. A DBPM normally has a front-end gain control module, but the ICS-1554A-002 does not. However, the FPGA can implement this function in digital logic.

The DBPM has four input channels, and its automatic gain control module is shown in Fig.4. The absolute values of the four input signals are combined by a bitwise logic-OR operation, and the lock signal is derived from the result and the previous output data. An "inp\_lock" bit is set to '1' once the corresponding "inp\_mod\_check" bit detects a '1'. Each lock bit controls a multiplexer (MUX) in a priority chain. The output value of each stage is either fixed or controlled by the "inp\_lock" bits of the previous stage. The "inp\_mov\_r(10)" signal of the last stage outputs the final number of left-shift bits, *N*, and the input signal is amplified by $2^N$.
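The idea can be sketched as a behavioural model in Python. This is not the VHDL logic of Fig.4: the priority chain of multiplexers is replaced by a search for the most significant '1' in the OR-ed magnitudes, and the lock (hold) behaviour is left out. The ADC width, the maximum shift and the function names are assumptions made for this example.

```python
def agc_shift(samples, adc_bits=16, max_shift=10):
    """Choose the left-shift count N from the OR of the four channel magnitudes.

    adc_bits and max_shift are assumed values, not taken from the board.
    """
    magnitude_or = 0
    for s in samples:
        magnitude_or |= abs(s)             # bitwise OR of the absolute values
    # The highest '1' in the OR result tells how many upper bits are unused.
    headroom = (adc_bits - 1) - magnitude_or.bit_length()
    return max(0, min(max_shift, headroom))


def apply_gain(samples, shift):
    """Amplify every channel by 2**shift with a left shift."""
    return [s << shift for s in samples]


# Hypothetical weak input on four channels (signed integer samples).
channels = [100, -50, 3, 0]
n = agc_shift(channels)
print("left shift N:", n)
print("amplified:", apply_gain(channels, n))
```

Because all four channels share one shift count, the ratios between the channel amplitudes, and therefore the computed beam position, are not changed by the gain.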



Fig.4 Diagram of automatic gain control.

### 4 On-line evaluation

The on-line evaluation was carried out on the SSRF storage ring filled with 500 bunches (150 mA) during user operation. Beam signals from the spare probe of cell 16 (16BPM8) were fed into the four input channels of the ICS-1554A-002. A commercial digital BPM processor, Libera Brilliance, was connected to another spare probe in cell 15 (15BPM8) as a reference. Libera Brilliance has been working well at SSRF for measuring real beam motion and machine parameters<sup>[10]</sup>, so the agreement between the data sampled by the ICS-1554A-002 and by Libera Brilliance is a good criterion for evaluating the performance of the new DBPM system.

Figure 5 shows the turn-by-turn beam position measurements taken simultaneously by the ICS-1554A-002 and Libera Brilliance during decay mode operation. In the horizontal plane, the Libera Brilliance data (Fig.5c) clearly show a sine-like beam orbit motion with an amplitude as large as 6.6 μm (RMS value), introduced by energy oscillation.



**Fig.5** Sampled turn-by-turn results: ICS-1554A-002 positions (a) x and (b) y, and Libera Brilliance positions (c) x and (d) y.

An orbit oscillation with the same frequency and amplitude was recorded by the ICS-1554A-002, as shown in Fig.5a. The agreement between the horizontal orbit oscillations measured by the ICS-1554A-002 and by Libera Brilliance indicates that the new DBPM system can detect real beam motion. In the vertical plane, all unstable oscillation modes were damped by the transverse feedback system<sup>[11]</sup>. The standard deviation of the turn-by-turn beam position data is therefore contributed mainly by electronics noise, which makes it a good estimate of the system resolution. A calculation over 2000 samples shows almost the same spatial resolution for both processors: 1.1 µm for the ICS-1554A-002 (Fig.5b) and 1.2 µm for Libera Brilliance (Fig.5d).

Figure 6 shows the horizontal beam position spectra derived from the turn-by-turn data of the ICS-1554A-002 and Libera Brilliance. Both spectra show the energy oscillation at a frequency of about 5 kHz, overlapped by narrow-band noise.



**Fig.6** Spectra of on-line turn-by-turn data over 65536 turns (horizontal direction).

### 5 Conclusion

This paper has introduced a CORDIC-based mixing module and a digital-logic-based automatic gain control technique applied in the beam position signal processing system. The on-line evaluation shows that the system can monitor beam movement accurately, and that the system resolution is $1.1 \ \mu\text{m}$, which meets the beam diagnostic requirement. System performance could be further improved by applying a more precise frequency control word.

### References

- 1 Lai L W, Leng Y B, Yan Y B, *et al.* Nucl Tech, 2010, **33**: 734–739 (in Chinese).
- 2 Uwe Meyer-Baese. Digital Signal Processing with Field Programmable Gate Arrays, Second Edition. Beijing: Tsinghua University Press, Chapter 5, 2007, 158–173 (in Chinese).
- 3 Xilinx Inc, FIR\_Compiler v4.0 data sheet, June 27, 2008. http://www.uccs.edu/~gtumbush/4211/fir\_compiler\_ds534.pdf.
- 4 Yang X N, Lou C Y, Xu J L, *et al.* Software radio theory and application. Beijing: Publishing House of Electronics Industry, 2001. Chapter 2, 48–53 (in Chinese).
- 5 Volder J E. The CORDIC Trigonometric Computing Technique. IRE Trans Electron Comput, 1959, 8: 330–334.
- 6 Walther J S. A Unified algorithm for elementary functions, Spring Joint Computer Conference pp. California, USA 1971, 379–385.
- 7 Andraka R. A survey of CORDIC algorithms for FPGA based computers. ACG, Inc. ACM 0-89791-978-5/98/01, 1998.
- 8 Xilinx Inc, DDS Compiler v4.0 data sheet, November 30, 2006. http://www.xilinx.com/support/documentation/ip\_documentation/dds\_ds558.pdf.
- 9 Liu K. The design and implementation of digital down convector circuit corresponding verification platform. Chengdu: University of Electric Science and Technology of China, 2006, 28 (in Chinese).
- 10 Leng Y B, Zhou W M, Yuan R X, *et al.* Nucl Tech, 2010, **33**: 401–404 (in Chinese).
- 11 Leng Y B, Ye K R, Zhou W M, *et al.* SSRF beam diagnostics system commissioning Proc Of DIPAC'09, 2009, 24–26.