Dyalog APL - Fast multi-digit product using FFT.

z←x ##.xtimes y                         ⍝ Fast multi-digit product using FFT.

From Roger Hui:

0. Introduction

FFT is used to provide an O(n×⍟n) algorithm for the convolution of  two  vectors
(polynomial multiplication), which is then used to implement a  fast  multiplic-
ation for integers with an arbitrary number of digits.  The ideas and code  were
presented by Henry Rich in the Jwiki essay FFT on  2010-12-27,  here  translated
from J to Dyalog APL. See: http://www.jsoftware.com/jwiki/Essays/FFT

In these notes, ⎕pp←6 and ⎕io←0 .

1. Convolution

Convolution (polynomial multiplication) can be coded succinctly:

   convolve←{+⌿(-⍳⍴⍺)⌽⍺∘.×⍵,0×1↓⍺}

   x←9 3 5 8 1 0 5
   y←6 2 3 7 4

   x convolve y
54 36 63 130 94 73 109 49 19 35 20

convolve works by summing the antidiagonals of x∘.×y , that is, summing the ele-
ments of x∘.×y having like indices in (⍳⍴x)∘.+⍳⍴y , resulting in a  vector  with
¯1+(⍴x)+⍴y elements.

   (x∘.×y) '     ' ((⍳⍴x)∘.+⍳⍴y)
 54 18 27 63 36         0 1 2 3  4
 18  6  9 21 12         1 2 3 4  5
 30 10 15 35 20         2 3 4 5  6
 48 16 24 56 32         3 4 5 6  7
  6  2  3  7  4         4 5 6 7  8
  0  0  0  0  0         5 6 7 8  9
 30 10 15 35 20         6 7 8 9 10

convolve takes time of order (⍴x)×(⍴y) , making it impractical  on  large  argu-
ments (longer than 1e5 , say).

2. FFT

A faster convolution obtains by use of FFT (fast Fourier  transform).  FFT  con-
verts a polynomial represented by its coefficients into its value at  n  points,
here the n roots of unity, and the inverse FFT converts from the n  points  back
into coefficients. The important properties are that

   ⍺ convolve ⍵  ←→  iFFT (FFT ⍺) × (FFT ⍵)

and that FFT and iFFT are computed  in  O(n×⍟n)  time  (justifying  the  moniker
"fast").

The key computation in FFT (and iFFT) is:

   floop←{(⊣/⍺)∇⍣(×m)⊢(+⌿⍵),[m-0.5]⍺×[⍳m←≢⍴⍺]-⌿⍵}

The right argument is a hypercube initially constructed from the argument  ⍵  to
FFT as (r⍴2)⍴⍵ where r←2⍟⍬⍴⍴⍵ . floop applies to the hypercube r times, to  each
of the axes in order. Let p←¯1*2÷2*r be a primitive root of unity of  order 2*r.
The left argument is the hypercube p*(2*k)×s⍴⍳×/s←2⍴⍨(r-1)-k on the k-th applic-
ation of floop . Equivalently, the initial left argument is p*s⍴⍳×/s←(r-1)⍴2 and
the next left argument obtains from the current one by ⊣/⍺ .

    x
9 3 5 8 1 0 5
    y
6 2 3 7 4

    iFFT (FFT 16↑x) × (FFT 16↑y)
54J¯9.32587E¯15 36J¯3.51108E¯15 63J¯6.38378E¯15 ...

    x rconvolve y
54J¯9.32587E¯15 36J¯3.51108E¯15 63J¯6.38378E¯15 ...

    ⌊0.5+9○ x rconvolve y
54 36 63 130 94 73 109 49 19 35 20

    x convolve y
54 36 63 130 94 73 109 49 19 35 20

Because rconvolve is mathematically (if not numerically) equivalent to convolve,
the application of 9○ to its result is justified when the arguments are real and
of ⌊0.5+ when the they are integral.


3. Integer Multiplication

The convolution of x and y are the digits of their product when x and y are  in-
terpreted as vectors of digits. The convention here is that the digits are list-
ed from most significant to least significant. For example, the number  3142  is
represented by 3 1 4 2 . Base 10 is used here but other bases are possible.

It remains to convert the result into standard form, with each element less than
the base.  If carry performs one step of propagating the carries, then the limit
of its application is the desired computation.

    x
9 3 5 8 1 0 5
    y
6 2 3 7 4

    ⎕←t←⌊0.5+9○ x rconvolve y
54 36 63 130 94 73 109 49 19 35 20

    carry 0,t
5 7 12 16 9 11 13 13 10 12 7 0

    carry carry 0,t
5 8 3 6 10 2 4 4 1 2 7 0

    carry carry carry 0,t
5 8 3 7 0 2 4 4 1 2 7 0

    carry carry carry carry 0,t
5 8 3 7 0 2 4 4 1 2 7 0

    carry⍣≡ 0,t
5 8 3 7 0 2 4 4 1 2 7 0

    x xtimes y
5 8 3 7 0 2 4 4 1 2 7 0

    ' '~⍨⍕x xtimes y
583702441270

    {⎕pp←18 ⋄ ⍕⍵} 9358105 × 62374
583702441270

    s t ← ↓ ? 2 1.5e4⍴10

    s xtimes time t
00.296

xtimes has the potential to multiply numbers with over a million digits in under
a minute.  The function as written is constrained by the fact that in Dyalog APL
the maximum rank of an array is 15, so that the maximum product  can  have  only
2*15 or 32768 digits.


4. Collected Definitions

The presentation here favors terseness and clarity over efficiency.  Some  poss-
ible improvements are described in the Jwiki essay.  (For example, computing the
roots of unity just once rather than four times.)

    convolve←{+⌿(-⍳⍴⍺)⌽⍺∘.×⍵,0×1↓⍺}

    xtimes←{⎕io←0
        roots     ← {×\1,1↓(⍵÷2)⍴¯1*2÷⍵}
        cube      ← {⍵⍴⍨2⍴⍨2⍟⍴⍵}
        extend    ← {(2*⌈2⍟¯1+(⍴⍺)+⍴⍵)↑¨⍺ ⍵}
        floop     ← {(⊣/⍺)∇⍣(×m)⊢(+⌿⍵),[m-0.5]⍺×[⍳m←≢⍴⍺]-⌿⍵}
        FFT       ← {      ,(cube  roots ⍴⍵)floop cube ⍵}
        iFFT      ← {(⍴⍵)÷⍨,(cube +roots ⍴⍵)floop cube ⍵}
        rconvolve ← {(¯1+(⍴⍺)+⍴⍵)↑iFFT ⊃×/FFT¨⍺ extend ⍵}
        carry     ← {1↓+⌿1 0⌽0,0 10⊤⍵}
        (+/∧\0=t)↓t←carry⍣≡0,⌊0.5+9○ ⍺ rconvolve ⍵
    }

Ref: http://www.jsoftware.com/jwiki/Essays/FFT

See also: nats big xpower

Back to: contents

Back to: Workspaces