kmap ← {xk←(⊂,⊂∘kind)⎕nl-⍳10} ##.kk fnop    ⍝ Kind Koloring of d-fnop named ⍵.

Result [kmap] is a vector of character vectors of the "kinds" of names  in  each
line of the ⎕NR of right argument function or  operator  [fnop].  Optional  left
argument [xk] is a dictionary of the kinds of external names.

We  cannot parse the following line of APL until we know whether the names refer
to variables, functions or operators:

    report ← lines print title pages

For example, "print" could be a dyadic function (F), which takes the array-pair
(title pages) as right argument and array (lines) as left:

    report ← lines print title pages        ⍝ print is a function

or title could also be a (monadic) function:

    report ← lines print title pages        ⍝ print and title are functions
                   FFFFF FFFFF

or title could be a monadic operator (M), which takes function print as left

    report ← lines print title pages        ⍝ title is a monadic operator
                   FFFFF MMMMM

or print could be a dyadic operator (D), taking title as right operand:

    report ← lines print title pages        ⍝ print is a dyadic operator
             FFFFF DDDDD FFFFF

The expression to the right of the assignment arrow might resolve  to a function
(train), making "report" a function:

    report ← lines print title pages        ⍝ (lines print title) "atop" pages

In general, an APL expression can be parsed if and only if its names can be
classified into one of four "kinds", depending on their referent values:

  N: array or niladic function
  F: function
  M: monadic operator
  D: dyadic operator

Here are some more examples using N and F to "kind-colour" names (and brackets):

    mean←{ sum←+⌿⍵ ⋄ num←≢⍵ ⋄ ( sum÷num ) }     ⍝ (sum÷num) nilad

    mean←{ sum←+⌿⍵ ⋄ num←≢  ⋄ ( sum÷num )⍵}     ⍝ (sum÷num) Agh train

    mean←{ sum←+⌿  ⋄ num←≢  ⋄ ( sum÷num )⍵}     ⍝ (sum÷num) fgh train

In addition to names,  [kk]  colours parentheses and curly braces to reflect the
kinds of their contained expressions.


    How many valid parsings are there of ⍵ consecutive distinct names?  We can
    try permutations of representatives (0 + ¨ ⍤) from each of the four kinds,
    trapping syntax errors for invalid sequences.  Here are the numbers for seq-
    uences of lengths from one to twelve, showing an approximately exponential
    increase. For a dozen adjacent distinct names there are over a million sep-
    arate interpretations:

        ⎕ ← kinds ← {+/,{2::0 ⋄ _←⍎'0+¨⍤'[⍵] ⋄ 1}¨⍳⍵/4}¨ 1 to 12
    3 7 19 60 200 669 2258 7644 25974 88475 302052 1033072

        ⌊ 0.5+ ⍟kinds               ⍝ exponential ←→ log approximately linear
    1 2 3 4 5 7 8 9 10 11 13 14

    Here are all 19 valid sequences of length 3:

        {~∘0,{2::0 ⋄ _←⍎'0+¨⍤'[⍵] ⋄ 'NFMD'[⍵] }¨⍳⍵/4} 3

Web Page Colouring
[kk] is used to colour the web pages for the code of the functions and operators
from this workspace.  Click the ##.kk link in the header line above to  see  the
effect.  As suggested by Morten Kromberg in his blog post, the pages use italics
to distinguish array from function and operator values.

Attempting to use a BOLD font-style for operators widened the characters, caus-
ing misalignment of the comment symbols and rendering the overall effect rather
untidy.   To avoid this, monadic and dyadic operator names are distinguished by
being coloured in medium and lighter blue, respectively:

       Font     Colour          Kind
       ----     ------          ----
    N: normal   black           nilad (array)
    F: italic   black           function
    M: italic   medium blue     monadic operator
    D: italic   lighter blue    dyadic operator

For example, in the code at notice that:

1. In the second line, because ⊂∘kind is a function, the parenthesised express-
   ion as a whole is a _fork_ and so the parentheses are italicised. The ital-
   ics should help when reading the line, by indicating that the parenthesised
   expression consumes, rather than provides a left argument for, the result of
   the ⎕NL:

   [1]   ⍺←(⊂,⊂∘kind)⎕NL-⍳10                ⍝ (...) is a function
           F    FFFFF

   (muse: This identification of function trains may alone be sufficient reason
          for adopting kind-colouring.)

2. Within the definition of sub-function [body], notice that [dq] and [expr] are
   both identified as functions. The colouring helps with reading the ⊃ and / to
   their right as "first" and "reduction" respectively:

   [19]  K∆←1⊃dq⊃expr/(⌽⍺),⊂E∆ K ⍬          ⍝ dq is a fn, so ⊃ is "first"
              FF FFFF

The code for →rats← at provides examples of
several more cases:

1. [rats] itself is a (medium-blue-coloured) monadic operator.  With colouring,
   we can determine this immediately from the colours of the name and opening
   curly brace, without having to scan the body of the code for the presence of
   an ⍺⍺ (and the absence of an ⍵⍵, which would make it a _dyadic_ operator).

   [0]  rats←{⎕IO←1                         ⍝ rats is a monadic operator
        MMMM M

2. Inner [arith] is also easily identified as a monadic operator:

    [2]     arith←{                         ⍝ arith is a monadic operator
            MMMMM M

3. Inferring the kind of rats' operand ⍺⍺ is currently beyond kk's ability and
   so name "aa" in line[3] is coloured (red) for "unknown".

   [3]  fn←⍺⍺{aa←⍺⍺ ⋄ ⊃⎕CR'aa'}⍵            ⍝ aa is coloured "unknown"
             M??              M

4. [norm] and [sum] reference a function and monadic operator, respectively:

   [4]      '+'≡fn:norm ⍺+sum ⍵             ⍝ sum is a monadic operator
                   FFFF   MMM

5. Towards the end of the code some functions are derived from dyadic operators:

   [59] else←{⍺⍺ ⍵:⍵ ⋄ err ⍵⍵}              ⍝ else is a dyadic operator
        DDDD D               D                ¯¯¯¯
   [60] echk←{⍵≡⌊⍵}else'irrational'         ⍝ ... deriving function echk
        FFFF F    FDDDD                                             ¯¯¯¯
   [63] err←⎕SIGNAL∘11                      ⍝ err is a derived function
        FFF                                   ¯¯¯
   [65] imin←∩mesh⌊                         ⍝ dyadic op mesh derives fn imin
        FFFF  DDDD                                      ¯¯¯¯            ¯¯¯¯
   [67] gcd←imin over umax                  ⍝ dyadic op over derives fn gcd
        FFF FFFF DDDD FFFF                              ¯¯¯¯            ¯¯¯

The code for →joy← at shows the binding of ⍺ as
left operand of a multi-line monadic operator to form a derived function "prt":

   [208]    prt←⍺{                          ⍝ ⍺{...⍺⍺...} derives fn prt
            FFF  M                             ¯        ¯            ¯¯¯
Without the colouring of both prt and the left brace, this snippet would be hard
to parse  without noticing that nothing follows the matching brace fifteen lines
further down the code.

Depending on context, some expressions can resolve to more than one kind.  Exam-
ples include the four "hybrid" tokens: / ⌿ \ ⍀,  which can each be of kind F  or
M, depending  on what is to their left, and operand tokens, ⍺⍺ and ⍵⍵, which can
each be either N or F. In isolation, the colours for names "slash" and "left" in
slash←/ and left←⍺⍺ cannot be determined.

This is not a shortcoming of the kind-inference process: given this property  of
the Dyalog language, drawing attention to such ambiguities with distinct colours
may be considered a virtue.

For example, the name "left" in this operator:

    twice ← {left←⍺⍺ ⋄ left left ⍵}

has kind F if the operator is called with a function operand,  and N if an array
operand is used. In this case [kk] correctly infers a composite kind: N∨F.

[kk] employs a total of 11 colours: N, F, M, D, f, r, h, o, ·, Z and ?:

                        ┌────────── Example of referent value
N: nilad                ⍬           (name assigned to ⍬ is coloured N)
F: function             +           (name assigned to + is coloured F)
M: monadic operator     ⌸            etc
D: dyadic operator      ⍤

r: N∨F                  ⍺⍺ ⍵⍵       ("r" stands for opeRand)
h: · F∨M                / ⌿ \ ⍀     ("h" stands for Hybrid)
o: · · M∨D              ∇∇          ("o" stands for Operator)

f: monadic function     1+          (temporary kind used internally)
?: unresolved                       (kind could not be determined)
·: not coloured         ⍝...        (for tokens other than names and brackets)
Z: internal error                   (indicates a fault in the parse table)

In the web pages for the code at, colour _red_ is used for kinds
other than the primary N, F, M and D.   See:
which contains several such expressions.

When coding APL, it is often possible to "force" the kind of a polymorphic expr-
ession.  For example, it may be intended that the [twice] operator  be used only
to apply its operand _function_ twice to an argument array.

    twice ← {left←⍺⍺ ⋄ left left ⍵}

    ⊂twice 'this'           ⍝ enclosed twice

However, as coded, the operator is equally happy with an _array_ operand:

    '<'twice 'this'         ⍝ prefixed twice

To force kind F for name "left", we could code:

    twice←{left←⍺⍺∘⊢ ⋄ left left ⍵}
    ⊂twice 'this'           ⍝ function operand: enclosed twice

    '<'twice 'this'         ⍝ array operand is a no-op.

In similar fashion, adding ⊣0 to the end of line[131] of →ratsum← to become:

    [131]  digs←↑↓/(1 ¯1×'{}'≡ext ⍺⍺),⊂⍺⍺⊣0    ⍝ digits without surrounding {}s.
... removes much of the red colouring from:

Finally, note that (1+⍺⍺) has a compound kind of N∨F, whereas the kind of (⍺⍺+1)
is simply N. If ⍺⍺ always references an array, the latter coding should reduce
the burden on the human reader.

Technical notes:

0. kk's optional left argument ⍺ defaults to a dictionary of the →kind← of each
   name in the current namespace. A dictionary is a pair of vectors (names
   values) from which function [val], given a name, can obtain its value.

1. Function →tokens← is used to create a vector of token vectors from the lines
   of the ⎕NR of the subject function.

2. The tokens vectors for each line are joined with a synthetic newline token
   (,'┘') to produce a single vector [toks] of tokens. With respect to parsing,
   newlines are the same as diamonds.

3. Function [lint] removes tokens (comments and white space), not needed in the
   parsing, and returns a vector of indices into [toks] of the remaning ones.
   The orginal tokens can be retrieved from this vector with function [tx] and
   in the following descriptions the word "token" will often be used inter-
   changeably with "token index". In addition, surplus newlines and diamonds are
   removed, as are tokens for outer-product (∘.) so that all remaining ∘s can be
   treated as the composition operator. This works because kind(∘.f) ←→ kind(f).
   There are several similar transformations:

        kind(x.y)  ←→ kind(y)
        kind(x[y]) ←→ kind(x)
        kind(x←y)  ←→ kind(y)

4. The kind-inference process uses four mutually-recursive functions: [fnop],
   [body], [expr] and [tokn], together with a de-queueing function [dq] Here
   are their types:

    fnop :: S E K Q ← Tinx ∇ S E K Q
    body ::   E K   ← Body ∇   E K
    expr ::   E K Q ← Expr ∇   E K Q
    tokn :: S E K Q ← Tinx ∇ S E K Q
      dq ::     K   ←      ∇   E K Q

       S := [Kind]                  ⍝ Stack: vector of kinds
       E := [Tinx][Kind]            ⍝ Environment: maps tokens to kinds
       K := [Tinx Kind]             ⍝ Colour map: vector of token/kind pairs
       Q := [Body]                  ⍝ Queue of deferred fnop bodies
    Kind := N F M D _F NF H MD U Z  ⍝ Kind
    Body := [Expr]                  ⍝ Vector of ⋄:┘-separated exprs
    Expr := [Tinx]                  ⍝ Expression: vector of token indices
    Tinx := Indx | [Tinx]           ⍝ Index or indices into toks vector

    [fnop] takes a nested triple: '{' body '}'; logs the colours of the { and }
    braces by looking for the presence of ⍺⍺ or ⍵⍵ in the fnop body; and splits
    tokens into ⋄-, :- and ┘-separated expressions; with which it calls function

    [body] processes the vector of expressions, which constitute the fnop body,
    with a left-to-right [expr]-reduction, accumulating E K and Q vectors.

    [expr] takes a vector of tokens, representing an expression, together with:
    an environment (E); current colour map (K); and body queue (Q), and returns
    new values for E, K and Q. The expression is processed with a right-to-left

    [tokn] takes a token to its left and the tuple (S E K Q) as right argument.
    E, K, and Q are as above and S is a stack of the kinds of tokens already
    visited. For a complete and correct expression, the stack S can be resolved
    into a single kind by function [parse]

5. The Queue Q

    The parsing process distinguishes fnop _definitions_ from _applications_.
    Applications are processed immediately on encounter but definitions are
    deferred until the end of the current body. The following code illustrates
    why this is necessary, when colouring instances of name "sum" within the two
    inner functions:

            sum←+/          ⍝ sum:F    "sum is of kind F"
            def←{sum ⍵}     ⍝ def:F     sum:N (from below)
            app←{sum ⍵}⍵    ⍝ app:N     sum:F (from above)
            sum←+/⍵         ⍝ sum:N
            def 0           ⍝ def:F

    After  processing all expressions in a fnop body, function [dq] is called to
    flush the queue.

Auxiliary functions:

    toks                        ⍝ token vector with white-space and ┘-newlines
│foo│←│{│┘│    │a│←│(│b│+│c│)│+│(│(│d│+│e│)│×│f│)│┘│    │a│[│g│]│∘│.│{│⍺│ │⍵│}│⍵│┘│}│

Function [lint] discards surplus tokens and returns a vector of the indices of
the remaining ones.

    lint toks                   ⍝ indices of de-fluffed tokens vector
0 1 2 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 24 25 26 27 30 31 33 34 35 37

[tx] reconstitutes tokens from their indices:

    tx lint toks                ⍝ reconstituted by [tx] function

[nest] encloses each occurrence of top-level of brackets into a triple.

    tx '{}'nest lint toks       ⍝ nested at {} tokens
│   │ ││{│┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┐│}││
│   │ ││ ││a│←│(│b│+│c│)│+│(│(│d│+│e│)│×│f│)│┘│a│[│g│]│{│⍺│⍵│}│⍵││ ││
│   │ ││ │└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘│ ││
│   │ │└─┴───────────────────────────────────────────────────────┴─┘│

    body ← 2 1⊃'{}'nest lint toks   ⍝ function body
    tx body

    tx '()[]'nest body              ⍝ top-level () and [] nesting
│ │ ││(│┌─┬─┬─┐│)││ ││(│┌─┬─┬─┬─┬─┬─┬─┐│)││ │ ││[│┌─┐│]││ │ │ │ │ │
│ │ ││ ││b│+│c││ ││ ││ ││(│d│+│e│)│×│f││ ││ │ ││ ││g││ ││ │ │ │ │ │
│ │ ││ │└─┴─┴─┘│ ││ ││ │└─┴─┴─┴─┴─┴─┴─┘│ ││ │ ││ │└─┘│ ││ │ │ │ │ │
│ │ │└─┴───────┴─┘│ │└─┴───────────────┴─┘│ │ │└─┴───┴─┘│ │ │ │ │ │


1. [kk] is confused by names in "dotted expressions":

    nil←0 ⋄ ⎕se.nil←⊢ ⋄ nil         ⍝ rightmost "nil" should be NNN
    NNN         FFF     FFF

2. It is possible to confound the kind inferencing by re-using the same name for
   different kinds within a particular fnop body.  Look  at  name  "what" in the
   following code:

        what←0          ⍝ what:N
        fun←{what ⍵}    ⍝ what:?
        0⊣fun 0:        ⍝ first call on fun what:N
        what←⊢          ⍝ what:F
        fun 0           ⍝ second call on fun: what:F

   In this example, name "what", within function "fun", will be coloured F by
   the de-queueing process. Ideally it should be coloured N∨F(r) but this would
   require a significantly more complex treatment.  Of course, refraining from
   changing the kind of a name, which is (arguably) good practice, should avoid
   this problem.

Possible improvements:

1. Some kinds can be inferred from their context. For example, for correct code:

    The kind of a guard or error-guard must be N:

        N    N

    Some primitive operators are known to accept only kind F operands:

        F    F

    Some primitive functions are strictly dyadic so an expression to its left
    must be of kind N:

        N    N

    Only kind N may be returned as the result of a function or operator:

          N    N

    An assignment of _multiple_ names must be of kind N:

        this that ← (0<⍺⍺)
        NNNN NNNN   N    N

    ... and so on.

2. Where the kind of an operand can be inferred, this information could be passed
   into the body of the operator by adding an entry to the kind-dictionary for ⍺⍺
   (and ⍵⍵).  This could be achieved by adding extra productions for such cases:

        ⍵,6 _ Mn  _ D _ N }{      ⍝  Mn ← D to nilad right operand     => ⍵⍵:N
        ⍵,6 _ Mf  _ D _ F }{      ⍝  Mf ← D to function right operand  => ⍵⍵:F
   This would increase the rows in the table of productions by a significant num-
   ber. However, it would, for example, remove the red colouring from the follow-
   ing inner function within

        rsum←(atab ⍺⍺~'{}'){                ⍝ row sum of matrix ⍵, carry_in ⍺.
            cov itot←↓⍉↑⍺⍺[↓⍉digs⍳⍵]        ⍝ carry vector and initial total.
            '0'∧.=cov,⍺:'0'itot             ⍝ all-zero carry: done.
            co tot←'0'∇↑(1↓cov,⍺)itot       ⍝ total with shifted carry vector.
            (⊃⊃⌽'0'∇↑co,⊂1↑cov)tot          ⍝ aggregate carry and total.

3. A Mk-II version of [kk] might choose to deal in only the four primary kinds:
   N F M D.

   The inference process would create a parse-tree for the whole capsule (fnop)
   with ambiguous nodes represented as _vectors_ of multiple interpretations: a
   tree-of-trees. Items of stack S would, in general be vectors of kinds:

        S := [[K]]                              ⍝ Stack: vector vectors of kinds

   and [parse] would be applied to the outer-product-join reduction of the stack.
   A unification process (see →unify←) would filter consistent combinations from
   the tree-of-trees:

        S∆ E∆ ← E unify filter parse¨ ,⊃∘.,/ S  ⍝ all consistent interpretations
   ... leaving some, though in practice probably few, multi-coloured names:

      {⍵/'NFMD'}¨1↓,⎕io≠⍳4/2                    ⍝ all possible kind colours

   In particular it could:

   3.1 Automatically make use of the kinds of operands passed to  inner  operat-
       ors.  For example,  if local operator "twice"  is only ever called with a
       function operand, then expressions that include ⍺⍺ within its body can be
       treated accordingly:

            twice←{left←⍺⍺ ⋄ left left ⍵}   ⍝ left is FFFF because:
            this←⊂twice ⍺                   ⍝   twice is only ever called
            that←+/twice ⍵                  ⍝       with a _function_ as operand

      Given this technology, it might be appropriate to colour the ⍺⍺, ⍵⍵ and ∇∇
      tokens themselves.

   3.2 Accumulate inference information between lines. For example, in the first
       line below, "nums" might reference either an array or a function (train).
       However, assuming the second line to be correct code,  the  expression to
       the left of the guard (which in isolation could also evaluate to a train)
       must reduce to 0 or 1, implying nums (and thus ⍺⍺) to be a nilad: NNNN.

        nums←1↓⍺⍺               ⍝ nums could be NNNN or FFFF
        0∊nums:'error'          ⍝ nums must be NNNN

4. In addition to resolving ambiguities, _conflicting_ kind determinations could
   be flagged as errors in the code.

Requires: →tokens← →kind←



    (⎕nr ,[0.5] kk)'mean'           ⍝ source lines and colours
│mean←{│    sum←+⌿⍵│    num←≢│    (sum÷num)⍵│}│

    ⍝ The following function interleaves the lines of its argument function
    ⍝ with their kind colours:

    crkk←{↑↑,/(⎕NR ⍵){⍺ ⍵}¨kk ⍵}    ⍝ ⎕cr interleaved with line colouring

    crkk'mean'                      ⍝ colours per line of function "mean"

    crkk'crkk'                      ⍝ (selfie)
crkk←{↑↑,/(⎕NR ⍵){⍺ ⍵}¨kk ⍵}

See also: parse tokens kind alists to

Back to: contents

Back to: Workspaces