This is a heavily modified implementation of FFTPack [1,2], with the following advantages:
N*log(N), because Bluestein's algorithm [3] is used for these cases.3-clause BSD (see LICENSE.md)
Twiddle factor computation:
[0; pi/4] for higher accuracysincospi() is used, which actually computes sin(x) and (cos(x)-1).n sin/cos pairs are required, the adjusted sincospi() is only called 2*sqrt(n) times; the remaining values are obtained by evaluating the angle addition theorems in a numerically accurate way.Parallel invocation:
Efficient codelets are available for the factors:
Larger prime factors are handled by somewhat less efficient, generic routines.
For lengths with very large prime factors, Bluestein's algorithm is used, and instead of an FFT of length n, a convolution of length n2 >= 2*n-1 is performed, where n2 is chosen to be highly composite.
[1] Swarztrauber, P. 1982, Vectorizing the Fast Fourier Transforms (New York: Academic Press), 51 [2] https://www.netlib.org/fftpack/ [3] https://en.wikipedia.org/wiki/Chirp_Z-transform