tree ← #.exp.parse expr ⍝ Infix expression parser. [expr] is a character vector representing an expression to be parsed. The parse is controlled by [#.exp.defns], a 4-vector of: - order of precedence 0 1 2 3 ··· - association direction ← → ↑ (monadic) - function name * / ln ··· - APL equivalent × ÷ ⍟ ··· disp exp.defns ┌→──────────────────────────┬──────────────┬───────────────────────────────────────────────────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ │ │┌→┬─┬─┬─┬─┬─┬─┬───┬───┬───┬───┬───────┬─────┬─────┐│┌→┬─┬─┬─┬─┬─┬─┬─────────────────────────────────────┬──┬──┬──┬────────────────────────────────────────────────┬────────────────────────────────────────────────┬──────────────────────────────────────────────────────────────────┐│ │0 1 1 2 2 3 4 5 6 6 6 7 8 9│←←←←←→↑↑↑↑↑↑↑↑││,│+│-│*│/│∧│-│mod│sum│max│min│ceiling│floor│round│││ │+│-│×│÷│*│-│{0::⎕signal ⎕en ⋄ n m _←⍵,0 ⋄ m | n}│+/│⌈/│⌊/│{0::⎕signal ⎕en ⋄ num mul _←⍵,0 ⋄ mul×⌈num÷mul}│{0::⎕signal ⎕en ⋄ num mul _←⍵,0 ⋄ mul×⌊num÷mul}│{0::⎕signal ⎕en ⋄ num sig _←⍵,0 ⋄ mul←÷10*sig ⋄ mul×⌊0.5+num÷mul}││ │ │ │└→┴→┴→┴→┴→┴→┴→┴──→┴──→┴──→┴──→┴──────→┴────→┴────→┘│└⊖┴→┴→┴→┴→┴→┴→┴────────────────────────────────────→┴─→┴─→┴─→┴───────────────────────────────────────────────→┴───────────────────────────────────────────────→┴─────────────────────────────────────────────────────────────────→┘│ └~─────────────────────────→┴─────────────→┴──────────────────────────────────────────────────→┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────→┘ or in columns: ↑exp.defns 0 1 1 2 2 3 4 5 6 6 6 7 8 9 ← ← ← ← ← → ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ , + - * / ∧ - mod sum max min ceiling floor round + - × ÷ * - {0::⎕signal ⎕en ⋄ n m _←⍵,0 ⋄ m | n} +/ ⌈/ ⌊/ {0::⎕signal ⎕en ⋄ num mul _←⍵,0 ⋄ mul×⌈num÷mul} {0::⎕signal ⎕en ⋄ num mul _←⍵,0 ⋄ mul×⌊num÷mul} {0::⎕signal ⎕en ⋄ num sig _←⍵,0 ⋄ mul←÷10*sig ⋄ mul×⌊0.5+num÷mul} Note that apart from power '*', dyadic operators bind left, and that monadic and dyadic '*' and '-' have different binding strengths; see examples below. A possible source of confusion is the distinction between "order of association" and "direction of evaluation". For example, primitive operator '-' associates left, so that 3-2-1 is interpreted as (3-2)-1. This mean that the evaluation proceeds from left to right: 3-2-1 → 1-1 → 0 (in the opposite direction from the association arrow). [parse] signals the following errors with error number 666. ┌────────────────┬───────┐ │ Error Message │Example│ ├────────────────┼───────┤ │missing operator│ 2 3 │ │missing operand │ 2+ │ │null expression │ () │ │unexpected ) │ 2) │ │missing ))··· │ ((2 │ └────────────────┴───────┘ Technical notes: [parse] proceeds from left to right through the tokens in the expression. Oper- ands are pushed immediately onto an output stack (acc), which accumulates the parse tree. Operators are pushed onto a second stack (stk), where the top two are compared. If the top operator is weaker than the second, the latter is ex- tracted from the stack and bound with the top two items in [acc], which are re- placed with a single item, the (larg op rarg) triple. It is convenient to store the tokens waiting to be processed, also in a stack. We wind up with three stacks: ⍵: Tokens waiting to be processed, acc: Tree nodes waiting to be connected by an operator. stk: Operators of lower binding strength than the current one. The following illustration shows the parsing of expression 1+2-3*4∧5∧6/7+8. Initially, all tokens are in the ⍵ stack, and acc and stk are empty. See below for a key to the annotations. Remember, the expression is to be parsed according to the rules of standard arithmetic, where ∧ binds tighter than *, binds tighter than +, etc. ────────┬────────────────┬──────── acc → │ ← tokens │ ← stk ────────┼────────────────┼──────── │ 1+2-3*4∧5∧6/7+8│ · │←┘ │ ────────┼────────────────┼──────── 1│ +2-3*4∧5∧6/7+8│ · │ └→ │ ────────┼────────────────┼──────── 1│ 2-3*4∧5∧6/7+8│ + · │ ←┘ │ └>>┘ ────────┼────────────────┼──────── 1 2│ -3*4∧5∧6/7+8│ + · │ └→ │ ────────┼────────────────┼──────── 1 2│ 3*4∧5∧6/7+8│ - + · │ │ └≤≤┘ │ │ ←┘ ────────┼────────────────┼──────── + │ 3*4∧5∧6/7+8│ - · ┌┴┐│ ↑ │ 1 2│ ←┘ │ ────────┼────────────────┼──────── + 3│ *4∧5∧6/7+8│ - · ┌┴┐ │ └→ │ 1 2 │ │ ────────┼────────────────┼──────── + 3│ 4∧5∧6/7+8│ * - · ┌┴┐ │ │ └>>┘ 1 2 │ │ ────────┼────────────────┼──────── + 3│ 4∧5∧6/7+8│ * - · ┌┴┐ │ ←┘ │ 1 2 │ │ ────────┼────────────────┼──────── + 3 4│ ∧5∧6/7+8│ * - · ┌┴┐ │ └→ │ 1 2 │ │ ────────┼────────────────┼──────── + 3 4│ 5∧6/7+8│ ∧ * - · ┌┴┐ │ │ └>>┘ 1 2 │ │ ────────┼────────────────┼──────── + 3 4│ 5∧6/7+8│ ∧ * - · ┌┴┐ │ ←┘ │ 1 2 │ │ ────────┼────────────────┼──────── + 3 4 5│ ∧6/7+8│ ∧ * - · ┌┴┐ │ └→ │ 1 2 │ │ ────────┼────────────────┼──────── + 3 4 5│ 6/7+8│ ∧ ∧ * - · ┌┴┐ │ │ └>>┘ 1 2 │ │ ────────┼────────────────┼──────── + 3 4 5│ 6/7+8│ ∧ ∧ * - · ┌┴┐ │ ←┘ │ 1 2 │ │ ────────┼────────────────┼──────── + 3 4 5 6│ /7+8│ ∧ ∧ * - · ┌┴┐ │ └→ │ 1 2 │ │ │ │ ────────┼────────────────┼──────── + 3 4 5 6│ 7+8│ / ∧ ∧ * - · ┌┴┐ │ │ └<<┘ 1 2 │ │ ←┘ │ │ ────────┼────────────────┼──────── + 3 4 ∧ │ 7+8│ / ∧ * - · ┌┴┐ ┌┴┐ │ │ └<<┘ 1 2 5 6 │ │ ←┘ │ │ ────────┼────────────────┼──────── + 3 ∧ │ 7+8│ / * - · ┌┴┐ ┌┴─┐ │ │ └≤≤┘ 1 2 4 ∧ │ │ ←┘ ┌┴┐│ │ 5 6│ │ ────────┼────────────────┼──────── + * │ 7+8│ / - · ┌┴┐ ┌┴─┐ │ │ └>>┘ 1 2 3 ∧ │ │ ┌┴─┐ │ │ 4 ∧ │ │ ┌┴┐│ │ 5 6│ │ ────────┼────────────────┼──────── + * │ 7+8│ / - · ┌┴┐ ┌┴─┐ │ ←┘ │ 1 2 3 ∧ │ │ ┌┴─┐ │ │ 4 ∧ │ │ ┌┴┐│ │ 5 6│ │ ────────┼────────────────┼──────── + * 7│ +8│ / - · ┌┴┐ ┌┴─┐ │ └→│ 1 2 3 ∧ │ │ ┌┴─┐ │ │ 4 ∧ │ │ ┌┴┐ │ │ 5 6 │ │ ────────┼────────────────┼──────── + * 7│ 8│ + / - · ┌┴┐ ┌┴─┐ │ │ └<<┘ 1 2 3 ∧ │ │ ←┘ ┌┴─┐ │ │ 4 ∧ │ │ ┌┴┐ │ │ 5 6 │ │ ────────┼────────────────┼──────── + / │ 8│ + - · ┌┴┐ ┌┴──┐ │ │ └≤≤┘ 1 2 * 7 │ │ ←┘ ┌┴─┐ │ │ 3 ∧ │ │ ┌┴─┐ │ │ 4 ∧ │ │ ┌┴┐ │ │ 5 6 │ │ ────────┼────────────────┼──────── - │ 8│ + · ┌──┴───┐ │ │ └>>┘ + / │ │ ┌┴┐ ┌┴──┐ │ │ 1 2 * 7 │ │ ┌┴─┐ │ │ 3 ∧ │ │ ┌┴─┐ │ │ 4 ∧ │ │ ┌┴┐ │ │ 5 6 │ │ ────────┼────────────────┼──────── - │ 8│ + · ┌──┴───┐ │ ←┘│ + / │ │ ┌┴┐ ┌┴──┐ │ │ 1 2 * 7 │ │ ┌┴─┐ │ │ 3 ∧ │ │ ┌┴─┐ │ │ 4 ∧ │ │ ┌┴┐ │ │ 5 6 │ │ ────────┼────────────────┼──────── - 8│ │ + · ┌──┴───┐ │ │ ←┘ + / │ │ ┌┴┐ ┌┴──┐ │ │ 1 2 * 7 │ │ ┌┴─┐ │ │ 3 ∧ │ │ ┌┴─┐ │ │ 4 ∧ │ │ ┌┴┐ │ │ 5 6 │ │ ────────┼────────────────┼──────── + │ │ · ┌──┴──┐ │ │ - 8 │ │ ┌──┴───┐ │ │ + / │ │ ┌┴┐ ┌┴──┐ │ │ 1 2 * 7 │ │ ┌┴─┐ │ │ 3 ∧ │ │ ┌┴─┐ │ │ 4 ∧ │ │ ┌┴┐ │ │ 5 6 │ │ ────────┴────────────────┴──────── Key: stack → top of stack to right. ← stack top of stack to left. ↑···↑ compare operators: ⍺>>⍵ ⍺ stonger than ⍵. ⍺<<⍵ ⍺ weaker than ⍵. ⍺≥≥⍵ ⍺ same strength as ⍵, associates right. ⍺≤≤⍵ ⍺ same strength as ⍵, associates left. ←┘ └→ move token left or right. In addition to the three stacks, parse maintains two more items of state: [dep], the current parenthesis nesting depth. [cls], the "class" of the previous token: _: initial state or previous token was a left paren. a: previous token was an operand (argument). f: previous token was an operator (function). Together, cls, dep, ack, stk and ⍵ completely describe the parse state. Parentheses are processed by calling the parser recursively on encountering a '(', and pushing the result back onto the token stack as an operand, when a ')' is reached. When the last token (which in a correct expression, must be an operand) has been fetched, [stk] may still contain some operators. Any remaining ones are flushed, simply by pushing an artificially weak (strength = ¯1) dummy operator onto the stack. This displaces all of the others, causing them to bind with their oper- ands waiting in the accumulator. Vectors vs. Lists vs. Recursive Tuples -------------------------------------- A stack is often represented in APL by a _vector_. In this case, operations on the stack are: push: stack←(⊂head),stack pop: head stack←(⊃stack)(1↓stack) An alternative is to implement a recursive data structure, the _list_. A list is either empty, or is a pair (head tail), where tail is itself a list. push: stack←head stack pop: head stack←stack If each item in the stack is the same _tuple_ (a b c), we can use either a list: push: stack←(a b c)stack pop: (a b c)stack←stack or a "recursive tuple", which provides marginally easier access to its top item: push: stack←a b c stack pop: a b c stack←stack In general, a _vector_ consumes slightly less workspace, but is a fraction slow- er to access than the other two. A _tuple_ is both marginally quicker and uses less space than a _list_, but is appropriate only if all of its items have the same structure. [parse] uses a list for its accumulator [acc], and a recursive tuple for its op- erand stack [stk]. To render a vector as a list, we "link-reduce" it: list←↑{⍺ ⍵}/vector,_ where '_' is a marker that distinguishes the bottom of the list. Choosing a _scalar_ value for the marker means that we can test for it _after_ deconstruct- ing the list: head tail←list head≡_: null Error message considerations ---------------------------- In order to place an error marker accurately under an errant token, each token is bound with its position in the original source, and the pairing maintained throughout the whole of the evaluation. This results in each node in the parse tree's having a position marker. Outer nodes for which there is no corresponding source token carry a dummy position of ¯1. The following example shows the com- plete parse tree, together with token positions for source that has been offset to the right by 100 spaces. disp parse (100/' '),'2+3+4' ⍝ 100-offset expression. ┌→────────────┬───────┬─┐ │┌→┬───────┬─┐│ │ │ ││ │┌→┬───┐│ ││┌→┬───┐│ │ ││2││+│102││3│││+│104││4│ ││ │└→┴~──┘│ ││└→┴~──┘│ │ │└→┴──────→┴→┘│ │ │ └────────────→┴──────→┴→┘ Examples: )cs exp [show] displays the expression tree in graphical form: 't'show parse'1+2-3+4' ⍝ +/- associate left. + ┌─┴┐ - 4 ┌─┴┐ + 3 ┌┴┐ 1 2 't'show parse'1+2*3+4' ⍝ * binds tighter. + ┌───┴┐ + 4 ┌┴─┐ 1 * ┌┴┐ 2 3 't'show parse'1+2*(3+4)' ⍝ ()s override binding. + ┌┴─┐ 1 * ┌┴─┐ 2 + ┌┴┐ 3 4 't'show parse'1*2*3+4∧5∧6' ⍝ ∧ associates right. + ┌─┴─┐ * ∧ ┌─┴┐ ┌┴─┐ * 3 4 ∧ ┌┴┐ ┌┴┐ 1 2 5 6 't'show parse'1+--2*3' ⍝ monadic - binds tightly right. + ┌┴─────┐ 1 * ┌───┴┐ - 3 └─┐ - └┐ 2 Back to: →Implementation_Details←