Verilog FFT 设计
FFTFast Fourier Transform快速傅立叶变换是一种 DFT离散傅里叶变换的高效算法。在以时频变换分析为基础的数字处理方法中有着不可替代的作用。FFT 原理公式推导DFT 的运算公式为其中将离散傅里叶变换公式拆分成奇偶项则前 N/2 个点可以表示为同理后 N/2 个点可以表示为由此可知后 N/2 个点的值完全可以通过计算前 N/2 个点时的中间过程值确定。对 A[k] 与 B[k] 继续进行奇偶分解直至变成 2 点的 DFT这样就可以避免很多的重复计算实现了快速离散傅里叶变换FFT的过程。算法结构8 点 FFT 计算的结构示意图如下。由图可知只需要简单的计算几次乘法和加法便可完成离散傅里叶变换过程而不是对每个数据进行繁琐的相乘和累加。重要特性(1) 级的概念每分割一次称为一级运算。设 FFT 运算点数为 N共有 M 级运算则它们满足每一级运算的标识为 m 0, 1, 2, ..., M-1。为了便于分割计算FFT 点数 N 的取值经常为 2 的整数次幂。(2) 蝶形单元FFT 计算结构由若干个蝶形运算单元组成每个运算单元示意图如下蝶形单元的输入输出满足其中每一个蝶形单元运算时进行了一次乘法和两次加法。每一级中均有 N/2 个蝶形单元。故完成一次 FFT 所需要的乘法次数和加法次数分别为(3) 组的概念每一级 N/2 个蝶形单元可分为若干组每一组有着相同的结构与因子分布。例如 m0 时可以分为 N/24 组。m1 时可以分为 N/42 组。mM-1 时此时只能分为 1 组。(4)因子分布因子存在于 m 级其中。在 8 点 FFT 第二级运算中即 m1 蝶形运算因子可以化简为(5) 码位倒置对于 N8 点的 FFT 计算X(0) ~ X(7) 位置对应的 2 进制码为X(000), X(001), X(010), X(011), X(100), X(101), X(110), X(111)将其位置的 2 进制码进行翻转X(000), X(100), X(010), X(110), X(001), X(101), X(011), X(111)此时位置对应的 10 进制为X(0), X(4), X(2), X(6), X(1), X(5), X(3), X(7)恰好对应 FFT 第一级输入数据的顺序。该特性有利于 FFT 的编程实现。FFT 设计设计说明为了利用仿真简单的说明 FFT 的变换过程数据点数取较小的值 8。如果数据是串行输入需要先进行缓存所以设计时数据输入方式为并行。数据输入分为实部和虚部共 2 部分所以计算结果也分为实部和虚部。设计采用流水结构暂不考虑资源消耗的问题。为了使设计结构更加简单这里做一步妥协乘法计算直接使用乘号。如果 FFT 设计应用于实际一定要将乘法结构换成可以流水的乘法器或使用官方提供的效率较高的乘法器 IP。蝶形单元设计蝶形单元为定点运算需要对旋转因子进行定点量化。借助 matlab 将旋转因子扩大 8192 倍左移 13 位可参考附录。为了防止蝶形运算中的乘法和加法导致位宽逐级增大每一级运算完成后要对输出数据进行固定位宽的截位也可去掉旋转因子倍数增大而带来的影响。 代码如下实例timescale 1ns/100ps/**************** butter unit *************************Xm(p) ------------------------ Xm1(p)- -- --- -- -Xm(q) ------------------------ Xm1(q)Wn -1*//////////////////////////////////////////////////////module butterfly(input clk,input rstn,input en,input signed [23:0] xp_real, // Xm(p)input signed [23:0] xp_imag,input signed [23:0] xq_real, // Xm(q)input signed [23:0] xq_imag,input signed [15:0] factor_real, // Wnrinput signed [15:0] factor_imag,output valid,output signed [23:0] yp_real, //Xm1(p)output signed [23:0] yp_imag,output signed [23:0] yq_real, //Xm1(q)output signed [23:0] yq_imag);reg [4:0] en_r ;always (posedge clk or negedge rstn) beginif (!rstn) beginen_r b0 ;endelse beginen_r {en_r[3:0], en} ;endend//////(1.0) Xm(q) mutiply and Xm(p) delayreg signed [39:0] xq_wnr_real0;reg signed [39:0] xq_wnr_real1;reg signed [39:0] xq_wnr_imag0;reg signed [39:0] xq_wnr_imag1;reg signed [39:0] xp_real_d;reg signed [39:0] xp_imag_d;always (posedge clk or negedge rstn) beginif (!rstn) beginxp_real_d b0;xp_imag_d b0;xq_wnr_real0 b0;xq_wnr_real1 b0;xq_wnr_imag0 b0;xq_wnr_imag1 b0;endelse if (en) beginxq_wnr_real0 xq_real * factor_real;xq_wnr_real1 xq_imag * factor_imag;xq_wnr_imag0 xq_real * factor_imag;xq_wnr_imag1 xq_imag * factor_real;//expanding 8192 times as Wnrxp_real_d {{4{xp_real[23]}}, xp_real[22:0], 13b0};xp_imag_d {{4{xp_imag[23]}}, xp_imag[22:0], 13b0};endend//(1.1) get Xm(q) mutiplied-results and Xm(p) delay againreg signed [39:0] xp_real_d1;reg signed [39:0] xp_imag_d1;reg signed [39:0] xq_wnr_real;reg signed [39:0] xq_wnr_imag;always (posedge clk or negedge rstn) beginif (!rstn) beginxp_real_d1 b0;xp_imag_d1 b0;xq_wnr_real b0 ;xq_wnr_imag b0 ;endelse if (en_r[0]) beginxp_real_d1 xp_real_d;xp_imag_d1 xp_imag_d;//提前设置好位宽余量防止数据溢出xq_wnr_real xq_wnr_real0 - xq_wnr_real1 ;xq_wnr_imag xq_wnr_imag0 xq_wnr_imag1 ;endend//////(2.0) butter resultsreg signed [39:0] yp_real_r;reg signed [39:0] yp_imag_r;reg signed [39:0] yq_real_r;reg signed [39:0] yq_imag_r;always (posedge clk or negedge rstn) beginif (!rstn) beginyp_real_r b0;yp_imag_r b0;yq_real_r b0;yq_imag_r b0;endelse if (en_r[1]) beginyp_real_r xp_real_d1 xq_wnr_real;yp_imag_r xp_imag_d1 xq_wnr_imag;yq_real_r xp_real_d1 - xq_wnr_real;yq_imag_r xp_imag_d1 - xq_wnr_imag;endend//(3) discard the low 13bits because of Wnrassign yp_real {yp_real_r[39], yp_real_r[1323:13]};assign yp_imag {yp_imag_r[39], yp_imag_r[1323:13]};assign yq_real {yq_real_r[39], yq_real_r[1323:13]};assign yq_imag {yq_imag_r[39], yq_imag_r[1323:13]};assign valid en_r[2];endmodule顶层例化根据 FFT 算法结构示意图将蝶形单元例化完成最后的 FFT 功能。可根据每一级蝶形单元的输入输出对应关系依次手动例化 12 次也可利用 generate 进行例化此时就需要非常熟悉 FFT 中组和级的特点(1) 8 点 FFT 设计需要 3 级运算每一级有 4 个蝶形单元每一级的组数目分别是 4、2、1。(2) 每一级的组内一个蝶形单元中两个输入端口的距离恒为m 为级标号对应左移运算 1m组内两个蝶形单元的第一个输入端口间的距离为 1。(3) 每一级相邻组间的第一个蝶形单元的第一个输入端口的距离为对应左移运算 2m。例化代码如下。其中矩阵信号 xm_realxm_imag的一维、二维地址是代表级和组的标识。在判断信号端口之间的连接关系时使用了看似复杂的判断逻辑而且还带有乘号其实最终生成的电路和手动编写代码例化 12 个蝶形单元的方式是完全相同的。因为 generate 中的变量只是辅助生成实际的电路相关值的计算判断都已经在编译时完成。这些变量更不会生成实际的电路只是为更快速的模块例化提供了一种方法。实例timescale 1ns/100psmodule fft8 (input clk,input rstn,input en,input signed [23:0] x0_real,input signed [23:0] x0_imag,input signed [23:0] x1_real,input signed [23:0] x1_imag,input signed [23:0] x2_real,input signed [23:0] x2_imag,input signed [23:0] x3_real,input signed [23:0] x3_imag,input signed [23:0] x4_real,input signed [23:0] x4_imag,input signed [23:0] x5_real,input signed [23:0] x5_imag,input signed [23:0] x6_real,input signed [23:0] x6_imag,input signed [23:0] x7_real,input signed [23:0] x7_imag,output valid,output signed [23:0] y0_real,output signed [23:0] y0_imag,output signed [23:0] y1_real,output signed [23:0] y1_imag,output signed [23:0] y2_real,output signed [23:0] y2_imag,output signed [23:0] y3_real,output signed [23:0] y3_imag,output signed [23:0] y4_real,output signed [23:0] y4_imag,output signed [23:0] y5_real,output signed [23:0] y5_imag,output signed [23:0] y6_real,output signed [23:0] y6_imag,output signed [23:0] y7_real,output signed [23:0] y7_imag);//operating datawire signed [23:0] xm_real [3:0] [7:0];wire signed [23:0] xm_imag [3:0] [7:0];wire en_connect [15:0] ;assign en_connect[0] en;assign en_connect[1] en;assign en_connect[2] en;assign en_connect[3] en;//factor, multiplied by 0x2000wire signed [15:0] factor_real [3:0] ;wire signed [15:0] factor_imag [3:0];assign factor_real[0] 16h2000; //1assign factor_imag[0] 16h0000; //0assign factor_real[1] 16h16a0; //sqrt(2)/2assign factor_imag[1] 16he95f; //-sqrt(2)/2assign factor_real[2] 16h0000; //0assign factor_imag[2] 16he000; //-1assign factor_real[3] 16he95f; //-sqrt(2)/2assign factor_imag[3] 16he95f; //-sqrt(2)/2//输入初始化和码位有关倒置assign xm_real[0][0] x0_real;assign xm_real[0][1] x4_real;assign xm_real[0][2] x2_real;assign xm_real[0][3] x6_real;assign xm_real[0][4] x1_real;assign xm_real[0][5] x5_real;assign xm_real[0][6] x3_real;assign xm_real[0][7] x7_real;assign xm_imag[0][0] x0_imag;assign xm_imag[0][1] x4_imag;assign xm_imag[0][2] x2_imag;assign xm_imag[0][3] x6_imag;assign xm_imag[0][4] x1_imag;assign xm_imag[0][5] x5_imag;assign xm_imag[0][6] x3_imag;assign xm_imag[0][7] x7_imag;//butter instantiaiton//integer index[11:0] ;genvar m, k;generate//3 stagefor(m0; m2; mm1) begin: stagefor (k0; k3; kk1) begin: unitbutterfly u_butter(.clk (clk ) ,.rstn (rstn ) ,.en (en_connect[m*4 k] ) ,//是否再组内组编号组内编号下组编号新组内编号.xp_real (xm_real[ m ] [k[m:0] (1m) ?(k[3:m] (m1)) k[m:0] :(k[3:m] (m1)) (k[m:0]-(1m))] ),.xp_imag (xm_imag[ m ] [k[m:0] (1m) ?(k[3:m] (m1)) k[m:0] :(k[3:m] (m1)) (k[m:0]-(1m))] ),.xq_real (xm_real[ m ] [(k[m:0] (1m) ?(k[3:m] (m1)) k[m:0] :(k[3:m] (m1)) (k[m:0]-(1m))) (1m) ]), //增加蝶形单元两个输入端口间距离.xq_imag (xm_imag[ m ] [(k[m:0] (1m) ?(k[3:m] (m1)) k[m:0] :(k[3:m] (m1)) (k[m:0]-(1m))) (1m) ]),.factor_real(factor_real[k[m:0](1m)?k[m:0] : k[m:0]-(1m) ]),.factor_imag(factor_imag[k[m:0](1m)?k[m:0] : k[m:0]-(1m) ]),//output data.valid (en_connect[ (m1)*4 k ] ),.yp_real (xm_real[ m1 ][k[m:0] (1m) ?(k[3:m] (m1)) k[m:0] :(k[3:m] (m1)) (k[m:0]-(1m))] ),.yp_imag (xm_imag[ m1 ][(k[m:0]) (1m) ?(k[3:m] (m1)) k[m:0] :(k[3:m] (m1)) (k[m:0]-(1m))] ),.yq_real (xm_real[ m1 ][(k[m:0] (1m) ?(k[3:m] (m1)) k[m:0] :(k[3:m] (m1)) (k[m:0]-(1m))) (1m) ]),.yq_imag (xm_imag[ m1 ][((k[m:0]) (1m) ?(k[3:m] (m1)) k[m:0] :(k[3:m] (m1)) (k[m:0]-(1m))) (1m) ]));endendendgenerateassign valid en_connect[12];assign y0_real xm_real[3][0] ;assign y0_imag xm_imag[3][0] ;assign y1_real xm_real[3][1] ;assign y1_imag xm_imag[3][1] ;assign y2_real xm_real[3][2] ;assign y2_imag xm_imag[3][2] ;assign y3_real xm_real[3][3] ;assign y3_imag xm_imag[3][3] ;assign y4_real xm_real[3][4] ;assign y4_imag xm_imag[3][4] ;assign y5_real xm_real[3][5] ;assign y5_imag xm_imag[3][5] ;assign y6_real xm_real[3][6] ;assign y6_imag xm_imag[3][6] ;assign y7_real xm_real[3][7] ;assign y7_imag xm_imag[3][7] ;endmoduletestbenchtestbench 编写如下主要用于 16 路实、复数据的连续输入。因为每次 FFT 只有 8 点数据所以送入的数据比较随意并不是正弦波等规则的数据。实例timescale 1ns/100psmodule test ;reg clk;reg rstn;reg en ;reg signed [23:0] x0_real;reg signed [23:0] x0_imag;reg signed [23:0] x1_real;reg signed [23:0] x1_imag;reg signed [23:0] x2_real;reg signed [23:0] x2_imag;reg signed [23:0] x3_real;reg signed [23:0] x3_imag;reg signed [23:0] x4_real;reg signed [23:0] x4_imag;reg signed [23:0] x5_real;reg signed [23:0] x5_imag;reg signed [23:0] x6_real;reg signed [23:0] x6_imag;reg signed [23:0] x7_real;reg signed [23:0] x7_imag;wire valid;wire signed [23:0] y0_real;wire signed [23:0] y0_imag;wire signed [23:0] y1_real;wire signed [23:0] y1_imag;wire signed [23:0] y2_real;wire signed [23:0] y2_imag;wire signed [23:0] y3_real;wire signed [23:0] y3_imag;wire signed [23:0] y4_real;wire signed [23:0] y4_imag;wire signed [23:0] y5_real;wire signed [23:0] y5_imag;wire signed [23:0] y6_real;wire signed [23:0] y6_imag;wire signed [23:0] y7_real;wire signed [23:0] y7_imag;initial beginclk 0; //50MHzrstn 0 ;#10 rstn 1;forever begin#10 clk ~clk; //50MHzendendfft8 u_fft (.clk (clk ),.rstn (rstn ),.en (en ),.x0_real (x0_real),.x0_imag (x0_imag),.x1_real (x1_real),.x1_imag (x1_imag),.x2_real (x2_real),.x2_imag (x2_imag),.x3_real (x3_real),.x3_imag (x3_imag),.x4_real (x4_real),.x4_imag (x4_imag),.x5_real (x5_real),.x5_imag (x5_imag),.x6_real (x6_real),.x6_imag (x6_imag),.x7_real (x7_real),.x7_imag (x7_imag),.valid (valid),.y0_real (y0_real),.y0_imag (y0_imag),.y1_real (y1_real),.y1_imag (y1_imag),.y2_real (y2_real),.y2_imag (y2_imag),.y3_real (y3_real),.y3_imag (y3_imag),.y4_real (y4_real),.y4_imag (y4_imag),.y5_real (y5_real),.y5_imag (y5_imag),.y6_real (y6_real),.y6_imag (y6_imag),.y7_real (y7_real),.y7_imag (y7_imag));//data inputinitial beginen 0 ;x0_real 24d10;x1_real 24d20;x2_real 24d30;x3_real 24d40;x4_real 24d10;x5_real 24d20;x6_real 24d30;x7_real 24d40;x0_imag 24d0;x1_imag 24d0;x2_imag 24d0;x3_imag 24d0;x4_imag 24d0;x5_imag 24d0;x6_imag 24d0;x7_imag 24d0;(negedge clk) ;en 1 ;forever begin(negedge clk) ;x0_real (x0_real 22h3F_ffff) ? b0 : x0_real 1 ;x1_real (x1_real 22h3F_ffff) ? b0 : x1_real 1 ;x2_real (x2_real 22h3F_ffff) ? b0 : x2_real 31 ;x3_real (x3_real 22h3F_ffff) ? b0 : x3_real 1 ;x4_real (x4_real 22h3F_ffff) ? b0 : x4_real 23 ;x5_real (x5_real 22h3F_ffff) ? b0 : x5_real 1 ;x6_real (x6_real 22h3F_ffff) ? b0 : x6_real 6 ;x7_real (x7_real 22h3F_ffff) ? b0 : x7_real 1 ;x0_imag (x0_imag 22h3F_ffff) ? b0 : x0_imag 2 ;x1_imag (x1_imag 22h3F_ffff) ? b0 : x1_imag 5 ;x2_imag (x2_imag 22h3F_ffff) ? b0 : x2_imag 3 ;x3_imag (x3_imag 22h3F_ffff) ? b0 : x3_imag 6 ;x4_imag (x4_imag 22h3F_ffff) ? b0 : x4_imag 4 ;x5_imag (x5_imag 22h3F_ffff) ? b0 : x5_imag 8 ;x6_imag (x6_imag 22h3F_ffff) ? b0 : x6_imag 11 ;x7_imag (x7_imag 22h3F_ffff) ? b0 : x7_imag 7 ;endend//finish simulationinitial #1000 $finish ;endmodule仿真结果大致可以看出FFT 结果可以流水输出。用 matlab 自带的 FFT 函数对相同数据进行运算前 2 组数据 FFT 结果如下。可以看出第一次输入的数据信号只有实部有效时FFT 结果是完全一样的。但是第二次输入的数据复部也有信号此时两者之间的结果开始有误差有时误差还很大。用 matlab 对 Verilog 实现的 FFT 过程进行模拟发现此过程的 FFT 结果和 Verilog 实现的 FFT 结果基本一致。将有误差的两种 FFT 结果取绝对值进行比较图示如下。可以看出FFT 结果的趋势大致相同但在个别点有肉眼可见的误差。设计总结就如设计蝶形单元时所说旋转因子量化时位宽的选择就会引入误差。而且每个蝶形单元的运算结果都会进行截取也会引入误差。matlab 计算 FFT 时不用考虑精度问题以其最高精度对数据进行 FFT 计算。以上所述都会导致最后两种 FFT 计算方式结果的差异。感兴趣的学者可以将旋转因子和输入数据位宽再进行一定的增加FFT 点数也可以增加然后再进行仿真对比相对误差应该会减小。附录matlab 使用生成旋转因子8 点 FFT 只需要用到 4 个旋转因子。旋转因子扩大倍数为 8192。实例clear all;close all;clc;%% Wnr calcuting%for r 0:3Wnr_factor cos(pi/4*r) - j*sin(pi/4*r) ;Wnr_integer floor(Wnr_factor * 2^13) ;if (real(Wnr_integer)0)Wnr_real real(Wnr_integer) 2^16 ; %负数的补码elseWnr_real real(Wnr_integer) ;endif (imag(Wnr_integer)0)Wnr_imag imag(Wnr_integer) 2^16 ;elseWnr_imag imag(Wnr_integer);endWnr(2*r1,:) dec2hex(Wnr_real) %实部Wnr(2*r2,:) dec2hex(Wnr_imag) %虚部endFFT 结果对比matlab 模拟 Verilog 实现 FFT 的过程如下也包括 2 种 FFT 结果的对比。实例clear all;close all;clc;%% 8dots fft%for r0:3Wnr(r1) cos(pi/4*r) - j*sin(pi/4*r) ;endx [10, 20, 30, 40, 10, 20 ,30 ,40];step [12j, 15j, 313j, 16j, 234j, 18j, 611j, 17j];x2 x step;xm0 [x2(01), x2(41), x2(21), x2(61), x2(11), x2(51), x2(31), x2(71)] ;%% stage1xm1(1) xm0(1) xm0(2)*Wnr(1) ;xm1(2) xm0(1) - xm0(2)*Wnr(1) ;xm1(3) xm0(3) xm0(4)*Wnr(1) ;xm1(4) xm0(3) - xm0(4)*Wnr(1) ;xm1(5) xm0(5) xm0(6)*Wnr(1) ;xm1(6) xm0(5) - xm0(6)*Wnr(1) ;xm1(7) xm0(7) xm0(8)*Wnr(1) ;xm1(8) xm0(7) - xm0(8)*Wnr(1) ;floor(xm1(:))%% stage2xm2(1) xm1(1) xm1(3)*Wnr(1) ;xm2(3) xm1(1) - xm1(3)*Wnr(1) ;xm2(2) xm1(2) xm1(4)*Wnr(2) ;xm2(4) xm1(2) - xm1(4)*Wnr(2) ;xm2(5) xm1(5) xm1(7)*Wnr(1) ;xm2(7) xm1(5) - xm1(7)*Wnr(1) ;xm2(6) xm1(6) xm1(8)*Wnr(2) ;xm2(8) xm1(6) - xm1(8)*Wnr(2) ;floor(xm2(:))%% stage3xm3(1) xm2(1) xm2(5)*Wnr(1) ;xm3(5) xm2(1) - xm2(5)*Wnr(1) ;xm3(2) xm2(2) xm2(6)*Wnr(2) ;xm3(6) xm2(2) - xm2(6)*Wnr(2) ;xm3(3) xm2(3) xm2(7)*Wnr(3) ;xm3(7) xm2(3) - xm2(7)*Wnr(3) ;xm3(4) xm2(4) xm2(8)*Wnr(4) ;xm3(8) xm2(4) - xm2(8)*Wnr(4) ;floor(xm3(:))%% fftfft1 fft(x)fft2 fft(x2)plot(1:8, abs(fft2))hold onplot(1:8, abs(xm3), r)