# Design of high-speed Delay-FXLMS hardware architecture based on FPGA

Jun Yuan<sup>1</sup>, Xiangsheng Meng<sup>1</sup>, Jia Ran<sup>1</sup>, Wei Wang<sup>1</sup>, Qiang Zhao<sup>1</sup>, Jun Li<sup>1</sup>, Qin Li<sup>2</sup>

<sup>1</sup>School of Optoelectronic Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

<sup>2</sup>Chongqing Marketing Department of Southwest Oil & Gas Field Company, Chongqing 401120, China

Received: July 25, 2021. Revised: January 17, 2022. Accepted: February 4, 2022. Published: February 28, 2022.

Abstract: To improve the convergence and clock speed of the DFxLMS adaptive filter, a hardware architecture for a fine-grained retiming DFxLMS filter in hardware-sharing transposed form (HS-TF-RDFXLMS) is proposed. First, the architecture adopts a delay decomposition algorithm to address the loss of convergence caused by increased adaptive delay and output lag. Second, with the algorithm performance unchanged, the adaptive filter module and the secondary path module are transposed to further shorten the critical path and raise the clock speed of the system, and the number of registers is reduced by optimizing the circuit sub-modules. Finally, the area/speed tradeoff of the TF-RDFXLMS filter is realized by hardware sharing while keeping the critical path constant. Experimental results show that the convergence speed of the algorithm is 3.5 times that of the DFxLMS algorithm, and the critical path is shortened by $(\lceil \log_2 N \rceil + 1)T_{ADD}$. The adaptive filter circuit designed in this paper is implemented on a Xilinx Artix-7 FPGA platform. The clock speed of the HS-TF-RDFXLMS filter is 4.386% lower than that of the TF-RDFXLMS filter, but LUT and FF resources are reduced by 10.964% and 28.322%, respectively. The power consumption is 150.73 mW.

Keywords—FxLMS, adaptive filter, retiming, hardware sharing, FPGA.

## I. INTRODUCTION

With rapid economic development and advancing urbanization, noise seriously threatens people's physical and mental health, so controlling noise pollution is an urgent problem. Noise control methods can be divided into passive noise control (PNC) [1,2] and active noise control (ANC) [3,4]. ANC can effectively suppress low-frequency noise and is widely used in active noise reduction earphones [5, 6, 7, 8]. Currently, the most popular adaptive algorithm used in ANC systems [9,10] is the Filtered-x Least Mean Square (FxLMS) algorithm [11,12]. FxLMS has been widely used as the benchmark algorithm in active noise control due to its clear physical mechanism, low computational cost and simple implementation [13]. The core of an ANC system is the adaptive algorithm [14] and the adaptive filter [15]. Adaptive filters are widely used in system identification [16,17], inverse modeling, linear prediction [18] and interference cancellation [19]. Figure 1 is the structure diagram of the FxLMS algorithm. P(z) is called the main path, which is the transfer function of the acoustic path between the reference noise source and the error microphone. S(z) is called the secondary path, which is the transfer function of the electrical and acoustic paths between the speaker and the error microphone.



With the development of IC technology, traditional algorithms implemented in software cannot meet the required processing speed. The Field Programmable Gate Array (FPGA) [20] is widely applied in voice signal processing, network communication, audio and video processing and cryptography due to its powerful functionality and flexible design [21]. Bahoura et al. [22] designed an FPGA-based adaptive noise cancellation system and successfully applied it to remove white noise from electrocardiograms and speech signals. However, its adaptive filter rate is low. Yamazaki et al. [23] analyzed the execution sequence of the FxLMS algorithm and pointed out its limitations in mapping to hardware circuits. The most common transformation is the delayed FxLMS (DFxLMS) algorithm [24], which is more consistent with hardware design practice. Mohanty et al. [25] proposed delayed FxLMS and delayed FsLMS algorithms to solve the problem of low error calculation efficiency in the air-electrical interface of ANC systems. Although these structures solve the low-rate problem of FxLMS adaptive filters, the filter design introduces delay, which causes the system output to lag. The output lag becomes more prominent as the number of filter taps increases. Shi et al. [26] proposed an FPGA-based systolic FxLMS structure. They use pipelining to give the systolic FxLMS architecture high throughput and good scalability. This structure reduces the adaptive delay in the adaptive filter and improves the convergence of the filter algorithm, which solves the problem of large adaptive delay when the DFxLMS algorithm is mapped to hardware. However, the system output still lags, and the lag introduced by the structure is proportional to the length of the filter.

The structures proposed by the above researchers mainly suffer from excessive adaptive delay and system output lag. Both the adaptive delay and the output lag are directly proportional to the order of the filter [27], which degrades the convergence of the algorithm.

In summary, this paper studies the feasibility of the feedforward FxLMS algorithm in active noise reduction headphones, with the aim of optimizing both the algorithm and the hardware structure of the adaptive filter. It proposes a fine-grained retiming DFxLMS filter in hardware-sharing transposed form (HS-TF-RDFXLMS). From the algorithm point of view, delay decomposition is used to solve the problems of excessive adaptive delay and system output lag. From the hardware design point of view, with the algorithm convergence unchanged, the critical path is shortened, the number of registers in the whole circuit is reduced by optimizing the circuit modules, and the clock speed of data processing is increased. The area/speed tradeoff of the filter is realized by hardware sharing.

## II. HIGH SPEED FINE GRANULARITY RETIMING DFXLMS ALGORITHM

#### A. DFxLMS adaptive filter

The DFxLMS algorithm is designed with hardware implementation in mind and is well suited to highly pipelined adaptive digital filters. The main challenge is to use the adaptation delays to pipeline the DFxLMS filter, which means choosing the number of delays m required for circuit pipelining. If m is too low, the circuit is slow; if m is too high, convergence is slow and tracking ability is poor. Architectures based on DFxLMS therefore have either (1) a short critical path but slow convergence and poor tracking performance, or (2) fast convergence and good tracking performance but a long critical path. To improve algorithm performance without lengthening the critical path, the retiming technique is used to redistribute delay units across the entire circuit architecture, which achieves complete pipelining.

The coefficient updating equation of DFxLMS algorithm is shown in Formula (1)

$$\boldsymbol{W}(n+1) = \boldsymbol{W}(n) - 2\mu e(n-m)\boldsymbol{x}'(n-m) \tag{1}$$

Error signals of DFxLMS algorithm are shown in Formula (2)

$$\boldsymbol{e}(n-m) = \boldsymbol{d}(n-m) - \boldsymbol{y}_s(n-m) \tag{2}$$

The secondary signal is the filter output, which is calculated from the reference signal. The specific details are shown in Formula (3)

$$\mathbf{y}(n) = \mathbf{X}(n)\mathbf{W}^{T}(n) = \sum_{i=0}^{N-1} \mathbf{w}_{i}(n)x(n-i) \tag{3}$$

where *N* is the length of the filter, *m* is the number of delay units, $\mu$ is the step size of the adaptive filter, and y(n) is the output of the adaptive filter.
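The update in Formulas (1) to (3) can be illustrated with a small floating-point simulation. The following is a minimal sketch and not the authors' implementation: the function name `dfxlms`, the toy secondary path `s_hat`, the target weights `w_true` and the test signal are all assumptions made for illustration. The error follows Formula (2), e = d - y_s, and for that definition a descent step adds 2·μ·e·x' to the weights; the minus sign written in Formula (1) corresponds to the convention in which the error is measured as d + y_s.

```python
import random

def dfxlms(x, d, s_hat, n_taps, mu, m):
    """Delayed FxLMS sketch: weights are adapted with e(n-m) and x'(n-m)."""
    hist_len = max(n_taps, len(s_hat))
    w = [0.0] * n_taps
    x_hist = [0.0] * hist_len       # x(n), x(n-1), ...
    y_hist = [0.0] * len(s_hat)     # y(n), y(n-1), ...
    xf_hist = [0.0] * n_taps        # filtered reference x'(n), x'(n-1), ...
    pending = []                    # (e, x') pairs waiting for m samples
    errors = []
    for n in range(len(x)):
        x_hist = [x[n]] + x_hist[:-1]
        y = sum(w[i] * x_hist[i] for i in range(n_taps))             # Formula (3)
        y_hist = [y] + y_hist[:-1]
        y_s = sum(s_hat[j] * y_hist[j] for j in range(len(s_hat)))   # y passed through S(z)
        e = d[n] - y_s                                               # Formula (2), before the delay
        xf = sum(s_hat[j] * x_hist[j] for j in range(len(s_hat)))    # x'(n) = s(n) * x(n)
        xf_hist = [xf] + xf_hist[:-1]
        pending.append((e, list(xf_hist)))
        if len(pending) > m:
            e_old, xf_old = pending.pop(0)                           # e(n-m), x'(n-m)
            for i in range(n_taps):
                # Sign assumption: with e = d - y_s, descent adds the correction.
                w[i] += 2.0 * mu * e_old * xf_old[i]
        errors.append(e)
    return w, errors

def fir(h, sig):
    """Plain FIR convolution used only to build the toy disturbance."""
    return [sum(h[k] * sig[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(sig))]

random.seed(0)
s_hat = [1.0, 0.5]                  # hypothetical secondary path estimate
w_true = [0.5, -0.3, 0.2, 0.1]      # hypothetical weights that cancel d exactly
x = [random.uniform(-1.0, 1.0) for _ in range(5000)]
d = fir(s_hat, fir(w_true, x))
w, errors = dfxlms(x, d, s_hat, n_taps=4, mu=0.01, m=2)
```

Setting `m=0` reduces the sketch to the undelayed FxLMS update, which makes it easy to compare convergence for different delay values.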



Figure 2 is a schematic description of retiming by inserting a cut set. The cut set divides the system into two sub-graphs, SFG-1 and SFG-2 (denoted $G_1$ and $G_2$ below). Delays can be added to the edges that cross the cut set in one direction and removed from the edges that cross it in the other direction. D represents a delay. Retiming is an effective way to improve clock speed. It maps a circuit $G$ to a retimed circuit $G_r$. The weight of each edge in the retimed graph is shown in Formula (4)

$$w_r(e) = w(e) + r(V) - r(U) \tag{4}$$

where $e$ is an edge from node $U$ to node $V$, $r(U)$ and $r(V)$ are the retiming values of nodes $U$ and $V$, $w(e)$ is the weight (number of delays) of edge $e$ in the original graph $G$, and $w_r(e)$ is the weight of edge $e$ in the retimed graph $G_r$.

Retiming is a transformation technique [28,29] that changes the position of delay elements in the circuit without affecting its input and output characteristics. For a retiming solution to be feasible, $w_r(e) \ge 0$ must hold for all edges $e$ in $G_r$. Let $e_{1,2}$ denote an edge from $G_1$ to $G_2$, and let $e_{2,1}$ denote an edge from $G_2$ to $G_1$.

Adding $k$ delays to each edge from $G_1$ to $G_2$ gives Formula (5)

$$w_r(e_{1,2}) \ge 0 \Longrightarrow w(e_{1,2}) + k \ge 0 \tag{5}$$

Similarly, subtracting $k$ delays from each edge $e_{2,1}$ from $G_2$ to $G_1$ gives Formula (6)

$$w_r(e_{2,1}) \ge 0 \Longrightarrow w(e_{2,1}) - k \ge 0 \tag{6}$$

Combining Formulas (5) and (6) over all edges of the cut set gives Formula (7)

$$-\min_{G_1 \xrightarrow{e} G_2} \{w(e)\} \le k \le \min_{G_2 \xrightarrow{e} G_1} \{w(e)\} \tag{7}$$

Here $k$ is the number of delays transferred across the cut set in the retimed circuit. Its value range is $0 \le k \le 1$.
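The retiming rule of Formula (4) and the cut-set bound of Formula (7) can be checked on a tiny graph. The sketch below is illustrative only: the two-node graph, the node names and the helper functions are hypothetical and do not correspond to any node in Figure 2 or Figure 3.

```python
def retimed_weights(edges, r):
    """Apply Formula (4): w_r(e) = w(e) + r(V) - r(U) for every edge U -> V."""
    return {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}

def cutset_delay_range(forward, backward):
    """Formula (7): bounds on k, given edge weights G1->G2 (forward) and G2->G1 (backward)."""
    return -min(forward), min(backward)

def is_feasible(weights):
    """A retiming is feasible when no edge ends up with a negative delay count."""
    return all(w >= 0 for w in weights.values())

# Hypothetical example: node A lies in G1, node B lies in G2.
edges = {("A", "B"): 0, ("B", "A"): 2}
k_min, k_max = cutset_delay_range(forward=[0], backward=[2])

# Moving k = 1 delay across the cut is the same as r(A) = 0, r(B) = 1.
retimed = retimed_weights(edges, {"A": 0, "B": 1})
```

In this example one delay is moved from the feedback edge to the forward edge, so the loop still holds two delays in total, while a choice such as r(B) = 3 would leave the feedback edge with a negative weight and is rejected by `is_feasible`.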

Figure 3 shows the three retiming processes of the DF-DFXLMS adaptive filter, which are applied in sequence. Process 1 retimes the FIR filter module. Because m delay units are added at the error signal input and the expected signal output, 0.25m delay units are mapped into the FIR filter and another 0.25m delay units are mapped to the output of the filter. After this round of retiming, the critical path of the FIR circuit of the adaptive filter is the delay of one multiplier. Processes 2 and 3 retime the weight update module and the secondary path module. The 0.25m delay units of the filter input signal are mapped to the weight update part and the secondary path part, respectively. This makes the critical path of the entire circuit a multiplier. After retiming, the number of delay units of the DFxLMS adaptive filter decreases from m to 0.5m, and the critical path is reduced from $T_{mult} + (\log_2 N + 1)T_{add}$ to $T_{mult} + T_{add}$.



Figure 3 shows a large number of registers in the circuit. Therefore, reducing the number of registers while reducing the critical path is an urgent problem to be solved.

#### B. TF-RDFxLMS adaptive filter

In order to reduce the number of registers in the adaptive filter and minimize the clock cycle of the circuit in the retiming, register minimization [30,31] is applied to the circuit design.

The number of registers required to implement the output edges of node V after retiming is shown in Formula (8)

$$R_V = \max_{V \xrightarrow{e} ?} \{w_r(e)\} \tag{8}$$

In the retimed circuit, the total register cost is shown in Formula (9)

$$COST = \sum R_V \tag{9}$$
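A short sketch of how Formulas (8) and (9) are evaluated may help. It assumes a retimed graph stored as a dictionary of edge weights; the graph and the function name `register_cost` are hypothetical and are not taken from Figure 3.

```python
def register_cost(retimed_edges):
    """Formulas (8) and (9): R_V is the largest weight on any edge leaving V; COST sums R_V."""
    r_v = {}
    for (u, _v), w in retimed_edges.items():
        r_v[u] = max(r_v.get(u, 0), w)
    return r_v, sum(r_v.values())

# Hypothetical retimed graph: node A fans out to B and C.
retimed_edges = {("A", "B"): 1, ("A", "C"): 3, ("B", "C"): 2, ("C", "A"): 0}
per_node, cost = register_cost(retimed_edges)
```

Because the fan-out edges of a node can share one register chain, node A needs 3 registers rather than 1 + 3 = 4, which is why Formula (8) takes the maximum instead of the sum.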

When the clock period meets the constraint, the retiming values of the circuit nodes in Figure 3 must satisfy Formulas (10) to (16)

$$r(1)=r(7)=r(13)=\dots=r(6N+3)=0 \tag{10}$$

$$r(2)=r(8)=r(14)=\dots=r(6N+4)=1 \tag{11}$$

$$r(3)=r(9)=r(15)=\dots=r(6N+5)=0 \tag{12}$$

$$r(4)=r(10)=r(16)=\dots=r(6N+6)=-1 \tag{13}$$

$$r(5)=r(11)=r(17)=\dots=r(6N+7)=1 \tag{14}$$

$$r(6)=r(12)=r(18)=\dots=r(6N+8)=0 \tag{15}$$

$$r(19)=1, \quad r(20)=-2 \tag{16}$$

Figure 4 shows the hardware architecture of the high-speed retimed TF-RDFXLMS algorithm implemented on FPGA. By Formula (4), r(20)=-2 means that two delays are moved from each input edge of node 20 to each of its output edges.



Figure 4. TF-RDFXLMS adaptive filter

After register minimization, the number of registers in the circuit is reduced from $7N + 2\log_2 N - 1$ to $3N + 3\log_2 N + 10$. The reduction is obtained by optimizing the circuit sub-modules while keeping the critical path constant, so the architecture can be implemented with fewer resources.

## III. HS-TF-RDFxLMS ARCHITECTURE DESIGN

The high-speed retimed TF-RDFXLMS filter shortens the critical path and improves the clock speed of the system, and register minimization reduces the number of registers in the entire circuit. However, the higher clock speed comes at the cost of increased hardware resource consumption. Therefore, a fine-grained retiming filter in hardware-sharing transposed form (HS-TF-RDFXLMS) is designed to achieve the area/speed tradeoff of the TF-RDFXLMS filter.

#### A. Hardware architecture derivation

Figure 5 shows the block diagram of the HS-TF-RDFXLMS algorithm. The adaptive filter and the secondary path are transposed, and the adaptive delay is 2. The architecture is mainly composed of an adaptive filtering module, an error calculation module, a weight update module and a secondary path module, and it realizes the audio noise reduction function in active noise control. The adaptive filtering module completes the filtering calculation. It adopts an FIR structure because the FIR filter has fast convergence and small steady-state error. The error calculation module is mainly composed of a transposed FIR filter and a subtracter. Its multiplication part computes the products of the N weights and the N corresponding input samples. The hardware structure of the weight update module depends mainly on the choice of adaptive filtering algorithm. It consists of N carry adders, which update the N weight coefficients. The convergence factor is a negative integer power of 2, so the corresponding multiplication is implemented by shifting, which greatly reduces computation and latency. The main function of the secondary path module is to correct the error gradient estimate of the LMS algorithm. Generally, an FIR filter based on the LMS algorithm is used for adaptive identification of the secondary path model.
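The shift-based step size can be illustrated in fixed-point arithmetic. The sketch below assumes Q15 operands and a hypothetical helper `step_term`; the word length, the function name and the example values are not taken from the paper. Python's `>>` on integers is an arithmetic shift, which mirrors a signed right shift in hardware.

```python
def step_term(e, x_filtered, shift, frac_bits=15):
    """Return mu * e * x' in fixed point, with mu = 2**-shift realized as a right shift."""
    product = (e * x_filtered) >> frac_bits   # Q15 * Q15 rescaled back to Q15
    return product >> shift                   # multiply by 2**-shift without a multiplier

# Hypothetical Q15 values: e = 0.5, x' = 0.25, mu = 2**-3.
e_q15 = 16384
x_q15 = 8192
correction = step_term(e_q15, x_q15, shift=3)
correction_as_float = correction / 2 ** 15
```

Here 0.5 × 0.25 × 2⁻³ = 0.015625, which the shift reproduces exactly; the sign with which the correction enters the weight update follows the error convention in use.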



The architecture combines four taps into one shared resource, 4Tapx. The four taps PM(0), PM(1), PM(2) and PM(3) are combined into a single resource, 4Tap0, and the four taps PM(4), PM(5), PM(6) and PM(7) are combined into a single resource, 4Tap1. The system is thus divided into two groups. The first group, 4Tap0, executes in clock cycles 0, 2, 4 and 6. The second group, 4Tap1, executes in clock cycles 1, 3, 5 and 7. The adaptive filter circuit is designed with hardware sharing, and the speed/area tradeoff is achieved by saving hardware resources while maintaining clock speed.
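The time-multiplexing described above can be written down as a simple schedule. This is only a sketch of the cycle assignment stated in the text; the function name `sharing_schedule` and the dictionary layout are assumptions, and the control logic of the actual circuit is not modeled.

```python
def sharing_schedule(n_cycles=8, groups=("4Tap0", "4Tap1")):
    """Interleave the shared tap groups over clock cycles: 4Tap0 on even, 4Tap1 on odd cycles."""
    return {cycle: groups[cycle % len(groups)] for cycle in range(n_cycles)}

# Taps folded into each shared resource, as listed in the text.
members = {"4Tap0": [0, 1, 2, 3], "4Tap1": [4, 5, 6, 7]}

schedule = sharing_schedule()
cycles_4tap0 = [c for c, g in schedule.items() if g == "4Tap0"]
cycles_4tap1 = [c for c, g in schedule.items() if g == "4Tap1"]
```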

The signal received by the HS-TF-RDFXLMS error sensor is shown in Formula (17)

$$\begin{aligned} e(n-2) &= d(n-2) - y_s(n-2) \\ &= d(n-2) - s(n) * [w^T(n-2)x(n-2)] \\ &= d(n-2) - w^T(n-2)x'(n-2) \end{aligned} \tag{17}$$

where d(n-2) is the main noise signal after the adaptive delay is added, s(n) is the secondary path estimate, $*$ represents the convolution operation, and $y_s(n-2) = s(n) * y(n-2)$ is the filter output after passing through the secondary path. The weight vector and reference input vector of the transversal filter at time *n* are shown in Formulas (18) and (19)

$$\boldsymbol{W}(n) = [w_L(n), \dots, w_2(n), w_1(n)]^T \tag{18}$$

$$\boldsymbol{X}(n) = [x(n), \dots, x(n-L+2), x(n-L+1)]^{T} \tag{19}$$

Thus, formula (17) is rewritten as

$$e(n-2) = d(n-2) - \sum_{i=0}^{N-1} x'(n-i-2) w_i(n-i-2) \tag{20}$$

According to the steepest descent method, the filter coefficients are updated recursively so that the error is minimized under the mean square criterion. The weight update equation is shown in Formula (21)

$$\mathbf{W}(n+1) = \mathbf{W}(n) - \mu \nabla(n) e^2(n) \tag{21}$$

where $\mu$ is the convergence coefficient, which controls stability and convergence speed, $\nabla(n)e^2(n)$ is the gradient of the squared error, and $\nabla$ is the gradient operator, defined as the column vector shown in Formula (22)

$$\boldsymbol{\nabla} = \begin{bmatrix} \frac{\partial}{\partial w_L} & \dots & \frac{\partial}{\partial w_2} & \frac{\partial}{\partial w_1} \end{bmatrix}^T \tag{22}$$

The *i*-th element of the gradient vector $\nabla e^2(n)$ is

$$\frac{\partial e^2(n)}{\partial w_i} = 2e(n)\frac{\partial e(n)}{\partial w_i} \tag{23}$$

Substituting Formula (17) into Formula (23) gives

$$\boldsymbol{\nabla} e^2(n) = -2e(n-2)\boldsymbol{x}'(n-2) \tag{24}$$

Substituting Formula (24) into Formula (21) gives

$$W(n+1) = W(n) - 2\mu e(n-2)x'(n-2) \tag{25}$$

When the tap length of the adaptive filter is long enough, the step size limit of the HS-TF-RDFXLMS algorithm is shown in Formula (26)

$$0 < \mu < \frac{1}{\lambda_{\max}} \sin \frac{\pi}{10} \tag{26}$$

where $\lambda_{\max}$ is the maximum eigenvalue of the autocorrelation matrix of the filtered-x signal.
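The bound in Formula (26) is easy to evaluate once $\lambda_{\max}$ is known. The sketch below estimates the largest eigenvalue by power iteration and then applies the bound; the 2×2 autocorrelation matrix and the function names are hypothetical and serve only to make the arithmetic concrete.

```python
import math

def largest_eigenvalue(R, iters=500):
    """Power iteration for the largest eigenvalue of a symmetric positive semidefinite matrix."""
    n = len(R)
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(c * c for c in w))
        if lam == 0.0:
            break
        v = [c / lam for c in w]
    return lam

def step_size_bound(lambda_max):
    """Upper limit of Formula (26): sin(pi/10) / lambda_max."""
    return math.sin(math.pi / 10.0) / lambda_max

# Hypothetical autocorrelation matrix of the filtered-x signal.
R = [[1.0, 0.5], [0.5, 1.0]]
lam_max = largest_eigenvalue(R)
mu_max = step_size_bound(lam_max)
```

For this matrix the largest eigenvalue is 1.5, so any step size below roughly 0.206 satisfies the bound.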

#### B. TF-RDFxLMS Filter

Figure 6 shows the structure of (a) the TF-RDFXLMS PM and (b) the TF-RDFXLMS hardware-shared filter. The PM structure is mainly composed of three adders, three multipliers, six registers, three switches and a gate, and it uses a systolic array design. The structure of the whole circuit is not symmetrical. This design is beneficial to the subsequent routing. In the PM structure, the filter weights are updated locally, so the order of the TF-RDFXLMS filter can be increased by adding more PM modules without changing the critical path of the filter. The HS-TF-RDFXLMS filter structure in (b) is composed of [(N-1)/2] identical processing modules (PMs), an adder, a multiplier and a delay unit, which achieves the speed/area balance.
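The reason a transposed structure keeps the critical path independent of the filter order can be seen in a behavioral model. The sketch below compares a direct-form and a transposed-form FIR filter with fixed weights; it is an illustration of the general transposition idea, not a model of the PM in Figure 6, and the weights and input samples are hypothetical.

```python
def fir_direct(w, x):
    """Direct form: y(n) = sum_i w[i] * x(n - i); all products are summed in the same cycle."""
    out = []
    hist = [0.0] * len(w)
    for sample in x:
        hist = [sample] + hist[:-1]
        out.append(sum(wi * xi for wi, xi in zip(w, hist)))
    return out

def fir_transposed(w, x):
    """Transposed form: each tap does one multiply and one add, then registers its partial sum."""
    regs = [0.0] * (len(w) - 1)                  # partial-sum registers between taps
    out = []
    for sample in x:
        products = [wi * sample for wi in w]     # every tap sees the same input sample
        out.append(products[0] + (regs[0] if regs else 0.0))
        # Register i takes the product of tap i+1 plus the value held by register i+1.
        regs = [products[i + 1] + (regs[i + 1] if i + 1 < len(regs) else 0.0)
                for i in range(len(regs))]
    return out

w = [0.5, -0.25, 0.125]          # hypothetical fixed weights
x = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical input samples
y_direct = fir_direct(w, x)
y_transposed = fir_transposed(w, x)
```

Both functions return the same output sequence, but in the transposed form the longest combinational chain is one multiplier followed by one adder regardless of the number of taps, which is consistent with adding PM modules without lengthening the critical path.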



## IV. EXPERIMENTAL RESULTS AND ANALYSIS

This paper designs six adaptive filter structures: the DF-FXLMS, DF-DFXLMS, DF-RDFXLMS, Systolic FxLMS, TF-RDFXLMS and HS-TF-RDFXLMS filters. The first three are direct-form filter structures, and the last three are transposed-form filter structures.

Table 1. Time vs. hardware resource complexity

| Design | Critical path | Adaptive delay | Latency | Adders | Multipliers | Registers |
|--------|---------------|----------------|---------|--------|-------------|-----------|
| DF-FxLMS | $3T_{\text{MULT}} + (N+1)T_{\text{ADD}}$ | 0 | 0 | $3N-1$ | $3N+1$ | $2N-1$ |
| DF-DFxLMS [17] | $T_{\text{MULT}} + (\lceil\log_2 N\rceil + 1)T_{\text{ADD}}$ | $\log_2 N + 2$ | $N$ | $3N-1$ | $3N+1$ | $7N + 2\log_2 N - 1$ |
| DF-RDFxLMS | $T_{\text{MULT}} + 3T_{\text{ADD}}$ | $N+1$ | $N/2$ | $3N-1$ | $3N+1$ | $5N-4$ |
| Systolic-FxLMS [18] | $2T_{\text{MULT}} + 2T_{\text{ADD}}$ | $N/4+3$ | $2 + \log_2 N$ | $3N+1$ | $3N+1$ | $4N-2$ |
| TF-RDFxLMS | $T_{\text{MULT}}$ | 2 | 2 | $3N+1$ | $3N+1$ | $3N + 3\log_2 N + 10$ |
| HS-TF-RDFxLMS | $T_{\text{MULT}}$ | 2 | 2 | $1.5N+1$ | $1.5N+1$ | $1.5N + 1.5\log_2 N + 5$ |

Table 1 compares the time and hardware complexity of the proposed architecture with other designs. As can be seen from the table, the critical path of the proposed design is $2T_{MULT} + (N+1)T_{ADD}$ shorter than that of the DF-FXLMS filter and $T_{MULT} + 2T_{ADD}$ shorter than that of the Systolic-FxLMS filter. The adaptive delay decreases from N/4+3 to 2. Compared with the TF-RDFXLMS filter, the HS-TF-RDFXLMS filter reduces the number of adders and multipliers from 3N+1 to 1.5N+1.
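The expressions in Table 1 can be evaluated for a concrete filter length to make the comparison tangible. In the sketch below the formulas are copied from Table 1, while the filter length N = 16 and the unit delays `t_mult = 4.0` and `t_add = 1.0` are hypothetical values chosen only for illustration.

```python
import math

def critical_path(design, n, t_mult, t_add):
    """Critical path expressions of Table 1, evaluated for filter length n."""
    lg = math.ceil(math.log2(n))
    paths = {
        "DF-FxLMS": 3 * t_mult + (n + 1) * t_add,
        "DF-DFxLMS": t_mult + (lg + 1) * t_add,
        "DF-RDFxLMS": t_mult + 3 * t_add,
        "Systolic-FxLMS": 2 * t_mult + 2 * t_add,
        "TF-RDFxLMS": t_mult,
        "HS-TF-RDFxLMS": t_mult,
    }
    return paths[design]

def resources(design, n):
    """Adder, multiplier and register counts of the two transposed designs in Table 1."""
    lg = math.log2(n)
    table = {
        "TF-RDFxLMS": (3 * n + 1, 3 * n + 1, 3 * n + 3 * lg + 10),
        "HS-TF-RDFxLMS": (1.5 * n + 1, 1.5 * n + 1, 1.5 * n + 1.5 * lg + 5),
    }
    return dict(zip(("adders", "multipliers", "registers"), table[design]))

N, T_MULT, T_ADD = 16, 4.0, 1.0   # hypothetical filter length and unit delays
saving_vs_df = critical_path("DF-FxLMS", N, T_MULT, T_ADD) - critical_path("HS-TF-RDFxLMS", N, T_MULT, T_ADD)
saving_vs_systolic = critical_path("Systolic-FxLMS", N, T_MULT, T_ADD) - critical_path("HS-TF-RDFxLMS", N, T_MULT, T_ADD)
tf = resources("TF-RDFxLMS", N)
hs = resources("HS-TF-RDFxLMS", N)
```

With these values the saving over DF-FxLMS equals 2·T_MULT + (N+1)·T_ADD and the saving over Systolic-FxLMS equals T_MULT + 2·T_ADD, while hardware sharing halves the register count from 70 to 35.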

The mapping between algorithm and hardware structure is not one-to-one. The same algorithm can be implemented by a variety of hardware structures, but a fixed hardware structure corresponds to one specific functional description; that is, the hardware structure determines the algorithm structure. For this reason, the proposed algorithm is simulated and analyzed to verify the superiority of the hardware architecture. To verify the convergence of the proposed algorithm, MATLAB R2019b is used for modeling and simulation. The input signal is a mixture of a sinusoidal signal and white Gaussian noise with an SNR of 15 dB. The order of the filter is 8. The convergence factor of the direct-form adaptive filters is $\mu = 10^{-3}$, and the convergence factor of the transposed-form adaptive filters is $\mu = 4 \times 10^{-3}$.
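A test input of the kind described above can be generated as follows. This is a Python sketch rather than the MATLAB model used in the paper: the normalized frequency, the sample count, the seed and the function name `make_test_signal` are assumptions, and only the 15 dB SNR comes from the text.

```python
import math
import random

def make_test_signal(n_samples, snr_db=15.0, freq=0.05, seed=1):
    """Sinusoid plus white Gaussian noise, with the noise scaled to the requested SNR."""
    rng = random.Random(seed)
    clean = [math.sin(2.0 * math.pi * freq * n) for n in range(n_samples)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    p_clean = sum(c * c for c in clean) / n_samples
    p_noise = sum(v * v for v in noise) / n_samples
    # Scale the noise so that p_clean / p_noise_scaled = 10 ** (snr_db / 10).
    gain = math.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    noise = [gain * v for v in noise]
    mixed = [c + v for c, v in zip(clean, noise)]
    return mixed, clean, noise

mixed, clean, noise = make_test_signal(4000)
measured_snr_db = 10.0 * math.log10(sum(c * c for c in clean) / sum(v * v for v in noise))
```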



Figure 7 compares the convergence characteristics of some typical adaptive filters with those of the filtering algorithm designed in this paper. Simulation results show that the proposed algorithm converges faster than the other designs. It starts to converge at around 1000 iterations, whereas the Systolic FxLMS and DF-DFXLMS algorithms start to converge after about 2000 and 3500 iterations, respectively. The convergence speed of the proposed algorithm is therefore 2 times and 3.5 times that of Systolic-FxLMS and DF-DFXLMS, respectively. The hardware circuits of the six adaptive filters designed in this paper are described in Verilog and implemented with the Vivado tool on a Xilinx Artix-7 FPGA development platform. Table 2 compares the six adaptive filters in terms of hardware resources, clock speed and power consumption.

Table 2. Performance comparison of adaptive filters

| Parameter | DF-FxLMS | DF-DFxLMS [17] | DF-RDFxLMS | Systolic-FxLMS [18] | TF-RDFxLMS | HS-TF-RDFxLMS |
|-----------|----------|----------------|------------|---------------------|------------|---------------|
| Slices | 232 | 386 | 428 | 361 | 467 | 381 |
| Flip-flops | 256 | 341 | 376 | 384 | 459 | 329 |
| 4-LUTs | 773 | 795 | 847 | 739 | 757 | 674 |
| Bonded IOBs | 49 | 49 | 49 | 49 | 49 | 49 |
| DSP | 17 | 17 | 17 | 17 | 17 | 17 |
| Max frequency of operation (MHz) | 55.261 | 89.646 | 103.380 | 131.891 | 146.092 | 139.684 |
| Max comb path delay (ns) | 18.096 | 11.155 | 9.673 | 7.582 | 6.845 | 7.159 |
| Total power consumption (mW) | 144.62 | 146.27 | 148.18 | 149.35 | 152.49 | 150.73 |

It can be seen from Table 2 that the maximum clock speed of the proposed HS-TF-RDFXLMS filter is 139.684 MHz. The critical path delay is 6.371 ns. The maximum clock speeds of the Systolic FxLMS filter and the DF-DFXLMS filter are 131.891 MHz and 103.380 MHz, and their critical path delays are 7.582 ns and 9.673 ns, respectively. Compared with the Systolic-FxLMS and DFxLMS algorithms, the maximum clock speed is improved by 15.97% and 34.13%, and the critical path is shortened by 1.211 ns and 3.302 ns, respectively. This shows that the proposed hardware architecture processes data faster than the traditional FxLMS hardware architectures. The maximum clock speed of the HS-TF-RDFXLMS filter is 4.386% lower than that of the TF-RDFXLMS filter, but LUT and FF resources are reduced by 10.964% and 28.322%, respectively, which realizes the area/speed tradeoff of the HS-TF-RDFXLMS filter. This benefits the implementation of the feedforward FxLMS algorithm in active noise reduction headphones and lays a foundation for research on multi-channel ANC systems.
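The area/speed figures quoted for the hardware-shared design follow directly from the TF-RDFxLMS and HS-TF-RDFxLMS columns of Table 2. The sketch below recomputes them; the dictionary layout and the helper `percent_reduction` are illustrative, and the numbers are copied from Table 2.

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before

# Values copied from Table 2.
table2 = {
    "TF-RDFxLMS":    {"fmax_mhz": 146.092, "flip_flops": 459, "luts": 757},
    "HS-TF-RDFxLMS": {"fmax_mhz": 139.684, "flip_flops": 329, "luts": 674},
}

tf, hs = table2["TF-RDFxLMS"], table2["HS-TF-RDFxLMS"]
clock_loss = percent_reduction(tf["fmax_mhz"], hs["fmax_mhz"])
lut_saving = percent_reduction(tf["luts"], hs["luts"])
ff_saving = percent_reduction(tf["flip_flops"], hs["flip_flops"])
period_ns = 1000.0 / hs["fmax_mhz"]     # clock period implied by the maximum frequency
```

Rounded to three decimals these evaluate to 4.386%, 10.964% and 28.322%, and the clock period implied by 139.684 MHz is about 7.159 ns, the value listed in the Max comb path delay row.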

## V. CONCLUSION

This paper studies the DFxLMS algorithm and analyzes its existing problems in depth. It proposes a fine-grained retiming DFxLMS filter in hardware-sharing transposed form (HS-TF-RDFXLMS). The design preserves the convergence of the algorithm and shortens the critical path, while realizing the area/speed tradeoff of the TF-RDFXLMS filter. From the algorithm point of view, the convergence speed of the proposed algorithm is 2 times and 3.5 times that of the Systolic FxLMS and DFxLMS algorithms, respectively. Compared with Systolic FxLMS and DFxLMS, the maximum clock speed of the proposed structure is improved by 15.97% and 34.13%, and the critical path is shortened by 1.211 ns and 3.302 ns, respectively. The HS-TF-RDFXLMS filter achieves an area/speed tradeoff. The filter has high efficiency and parallel processing capability on FPGA, which improves the convergence and clock speed of the system.

## ACKNOWLEDGMENT

This research was supported by the Science and Technology Major Project of Chongqing Municipal Science and Technology Bureau (cstc2018jszx-cyztzxX0054) and the Chongqing Municipal Science and Technology Commission Major Project of Integrated Circuit Industry (cstc2018jszx-cyztzx0217).

## References

- [1] Saravanan V, Santhiyakumari N. An Active Noise Control System for Impulsive Noise Using Soft Threshold FxLMS Algorithm with Harmonic Mean Step Size [J]. Wireless Personal Communications, 2019, 109 (4): 2263-2276.
- [2] Zhang S, Wang Y S, Guo H, et al. A normalized frequency-domain block filtered-x LMS algorithm for active vehicle inter noise control [J]. Mechanical Systems and Signal Processing, 2019, 120: 150-165.
- [3] Shi D, Gan W S, Lam B, et al. Feedforward selective fixed-filter active noise control: Algorithm and implementation [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28: 1479-1492.
- [4] Ho C Y, Shyu K K, Chang C Y, et al. Efficient narrowband noise cancellation system using adaptive line enhancer [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28: 1094-1103.
- [5] Huang C R, Chang C Y, Kuo S M. Implementation of Feedforward Active Noise Control Techniques for Headphones [C]//2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE, 2020: 293-296.
- [6] Rivera Benois P, Roden R, Blau M, et al. Optimization of a Fixed Virtual Sensing Feedback ANC Controller for In-Ear Headphones with Multiple Loudspeakers [J]. arXiv e-prints, 2021: arXiv:2110.03586.
- [7] Meng F, Yu A, Fernandez D. Sound leakage investigation of ANC headphones using particle velocity sensors [C]//INTER-NOISE and NOISE-CON Congress and Conference Proceedings. Institute of Noise Control Engineering, 2020, 261 (6): 698-708.

DOI: 10.46300/9106.2022.16.94

- [8] Niu F, Qiu X, Zhang D. Effects of active noise cancelling headphones on speech recognition [J]. Applied Acoustics, 2020, 165: 107335.
- [9] HUANG H, LIU Y. Design of LMS Algorithm Based on Adaptive Noise Cancellation Device and Implementation [J]. Guangxi Communication Technology, 2011, 4.
- [10] Ho C Y, Shyu K K, Chang C Y, et al. Efficient narrowband noise cancellation system using adaptive line enhancer[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28: 1094-1103.
- [11] Akhtar M T. On Active Impulsive Noise Control (AINC) Systems: Developing a Filtered-Reference Adaptive Algorithm Using a Convex-Combined Normalized Step-Size Approach[J]. Circuits, Systems & Signal Processing, 2020, 39(9).
- [12] Akhtar M T. A time-varying normalized step-size based generalized fractional moment adaptive algorithm and its application to ANC of impulsive sources[J]. Applied Acoustics, 2019, 155: 240-249.
- [13] Song P, Zhao H. Filtered-x least mean square/fourth (FXLMS/F) algorithm for active noise control[J]. Mechanical Systems and Signal Processing, 2019, 120: 69-82.
- [14] Ardekani I T, Abdulla W H. Theoretical convergence analysis of FxLMS algorithm[J]. Signal Processing, 2010, 90(12): 3046-3055.
- [15] Abdi F, Amiri P. Design and implementation of adaptive FxLMS on FPGA for online active noise cancellation[J]. Journal of the Chinese Institute of Engineers, 2018, 41(2): 132-140.
- [16] Nejevenko E S, Sotnikov A A. Adaptive modeling for hydroacoustic signal processing[J]. Pattern Recognition and Image Analysis, 2006, 16(1): 5-8.
- [17] Yu R, Song Y, Nambiar M. Fast system identification using prominent subspace LMS[J]. Digital Signal Processing, 2014, 27: 44-56.
- [18] Sayed A H. Fundamentals of adaptive filtering[M]. John Wiley & Sons, 2003.
- [19] Diggikar A B, Ardhapurkar S S. Design and Implementation of Adaptive filtering algorithm for Noise Cancellation in speech signal on FPGA[C]//2012 International Conference on Computing, Electronics and Electrical Technologies (ICCEET). IEEE, 2012: 766-771.
- [20] Thilagam S, Karthigaikumar P. Implementation of adaptive FPGA noise canceller using for real-time applications[C]//2015 2nd International Conference on Electronics and Communication Systems (ICECS). IEEE, 2015: 1711-1714.
- [21] Nekouei F, Talebi N Z, Kavian Y S, et al. FPGA implementation of LMS self correcting adaptive filter (SCAF) and hardware analysis[C]//2012 8th International Symposium on Communication Systems, Networks & Digital Signal Processing (CSNDSP). IEEE, 2012: 1-5.

- [22] Bahoura M, Ezzaidi H. FPGA-implementation of parallel and sequential architectures for adaptive noise cancelation[J]. Circuits, Systems, and Signal Processing, 2011, 30(6): 1521-1548.
- [23] Yamazaki I, Tomov S, Dongarra J. Stability and performance of various singular value QR implementations on multicore CPU with a GPU[J]. ACM Transactions on Mathematical Software (TOMS), 2016, 43(2): 1-18.
- [24] Goel P, Chandra M. FPGA Implementation of Adaptive Filtering Algorithms for Noise Cancellation-A Technical Survey[C]//Proceedings of the Third International Conference on Microelectronics, Computing and Communication Systems. Springer, Singapore, 2019: 517-526.
- [25] Mohanty B K, Singh G, Panda G. Hardware design for VLSI implementation of FxLMS-and FsLMS-based active noise controllers[J]. Circuits, Systems, and Signal Processing, 2017, 36(2): 447-473.
- [26] Shi D, Shi C, Gan W S. A systolic FxLMS structure for implementation of feedforward active noise control on FPGA[C]//2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA). IEEE, 2016: 1-6.
- [27] Park S Y, Meher P K. Low-power, high-throughput, and low-area adaptive FIR filter based on distributed arithmetic[J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2013, 60(6): 346-350.
- [28] Jalaja S, AM V P. Different retiming transformation technique to design optimized low power VLSI architecture[J]. AIMS Electronics and Electrical Engineering, 2018, 2(4): 117-130.
- [29] Goel P, Chandra M. VLSI implementations of retimed high speed adaptive filter structures for speech enhancement[J]. Microsystem Technologies, 2018, 24(12): 4799-4806.
- [30] Yagain D, Vijaya K A. FIR filter design based on retiming automation using VLSI design metrics[C]//2013 International Conference on Technology, Informatics, Management, Engineering and Environment. IEEE, 2013: 17-22.
- [31] Joy A, Vinitha C S. Folding and Register Minimization Transformation on DSP Filter[C]//2018 5th IEEE Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON). IEEE, 2018: 1-6.



Jun Yuan received the B.E. and M.E. degrees in Electrical Engineering from Southwest Jiaotong University, China, in 2006 and 2009, respectively, and the D.Eng. degree from Kochi University of Technology, Japan, in 2012. He then joined the School of Optoelectronic Engineering, Chongqing University of Posts and Telecommunications, China. His research interests are analog-digital mixed-signal IC design, DFT research and noise processing IC design.

# Creative Commons Attribution License 4.0 (Attribution 4.0 International, CC BY 4.0)

This article is published under the terms of the Creative Commons Attribution License 4.0

https://creativecommons.org/licenses/by/4.0/deed.en\_US