segs ← tag ##.htx html                      ⍝ Extract html segments.

Extracts [tag]-tagged segments from character array [html].

NB: This function could be coded more simply using the system function ⎕XML.
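
As a rough sketch of the ⎕XML route (not the coding used by htx): assuming ⎕IO←1
and well-formed input, ⎕XML returns a matrix with one row per element, the tag
name in column 2 and the character content in column 3. The name [leaves] is
hypothetical, and the selection only suits elements that contain plain text
rather than nested tags:

    leaves←{x←⎕XML ⍵ ⋄ (x[;2]∊⊂⍺)/x[;3]}    ⍝ ⍺: tag name, ⍵: well-formed XML text.

    'td' leaves '<tr><td>Guys</td><td>60</td><td>40</td></tr>'

Unlike htx, this fails on html that is not well-formed XML, such as unclosed tags.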

Right argument [html] may be:

    - a character vector, possibly containing linefeed characters, or
    - a character matrix, or
    - a vector of character vectors (as delivered by →getfile←).
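
The three forms might be tried as follows; a small sketch in which the names
[vec], [vv] and [mat] are illustrative only, and results are not shown:

    vec←'<b>this</b> and <b>that</b>'       ⍝ simple character vector.
    vv←'<b>this</b>' 'and <b>that</b>'      ⍝ vector of character vectors.
    mat←↑vv                                 ⍝ character matrix (short rows padded).

    'b' htx vec
    'b' htx mat
    'b' htx vv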

If [tag] starts with a '<' character, the <begin> and </end> tags are themselves
included in the result; otherwise they are omitted. For aesthetic reasons, the
closing '>' may also be included in [tag], but it is ignored.
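
Going by the description above, the spellings of [tag] can be compared side by
side; results are not shown here:

    bold←'<b>this</b> and <b>that</b>'

    'b'   htx bold                  ⍝ no leading '<': tags omitted.
    '<b'  htx bold                  ⍝ leading '<': tags included.
    '<b>' htx bold                  ⍝ same request; closing '>' ignored.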

Technical notes:

The coding is an example of "programming with functions". Notice that nearly all
of the local names refer to functions, rather than to data arrays.
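
A small illustration of the style, using hypothetical names that are not taken
from the code of htx. Each name is bound to a function built from other
functions, and no intermediate data array is named:

    open←'<'∘,                      ⍝ function: prefix '<'.
    close←,∘'>'                     ⍝ function: suffix '>'.
    wrap←open∘close                 ⍝ function: composition of the two.

    wrap 'b'
<b>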

Examples:

    bold←'<b>this</b> and <b>that</b>'

    'b' htx bold                    ⍝ extract <b> text.
┌────┬────┐
│this│that│
└────┴────┘

    '<b>' htx bold                  ⍝ .. including tags.
┌───────────┬───────────┐
│<b>this</b>│<b>that</b>│
└───────────┴───────────┘

    htm                             ⍝ character vector (with linefeeds).
<html>
  <body>
    <table>
      <tr><td>%</td><td>Eye Poke</td><td>Kumquat</td></tr>
      <tr><td>Guys</td><td>60</td><td>40</td></tr>
      <tr><td>Dolls</td><td>20</td><td>80</td></tr>
    </table>
  </body>
</html>

    'table'htx htm                  ⍝ extract table.
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ <tr><td>%</td><td>Eye Poke</td><td>Kumquat</td></tr> <tr><td>Guys</td><td>60</td><td>40</td></tr> <tr><td>Dolls</td><td>20</td><td>80</td></tr> │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

    '<table>'htx htm                ⍝ extract table with tags.
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│<table> <tr><td>%</td><td>Eye Poke</td><td>Kumquat</td></tr> <tr><td>Guys</td><td>60</td><td>40</td></tr> <tr><td>Dolls</td><td>20</td><td>80</td></tr> </table>│
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

    'tr'htx htm                     ⍝ extract table rows.
┌───────────────────────────────────────────┬───────────────────────────────────┬────────────────────────────────────┐
│<td>%</td><td>Eye Poke</td><td>Kumquat</td>│<td>Guys</td><td>60</td><td>40</td>│<td>Dolls</td><td>20</td><td>80</td>│
└───────────────────────────────────────────┴───────────────────────────────────┴────────────────────────────────────┘

    'td'htx htm                     ⍝ extract table data.
┌─┬────────┬───────┬────┬──┬──┬─────┬──┬──┐
│%│Eye Poke│Kumquat│Guys│60│40│Dolls│20│80│
└─┴────────┴───────┴────┴──┴──┴─────┴──┴──┘

    ↑'td'∘htx¨'tr'htx htm           ⍝ extract table data per row.
┌─────┬────────┬───────┐
│%    │Eye Poke│Kumquat│
├─────┼────────┼───────┤
│Guys │60      │40     │
├─────┼────────┼───────┤
│Dolls│20      │80     │
└─────┴────────┴───────┘

See also: Line_vectors html getfile
