segs ← tag ##.htx html                      ⍝ Extract html segments.
Extracts [tag]-tagged segments from character array [html].
NB: This function may be coded more simply using system function ⎕XML.
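As a hedged illustration of that remark, the lines below use ⎕XML directly, not
htx, to pull the <td> text out of a small, well-formed fragment. The sketch
assumes the usual five-column ⎕XML result, with element names in column 2 and
character data in column 3. The names [doc] and [x] are introduced here for
illustration only. Unlike htx, ⎕XML expects well-formed input and returns parsed
content rather than raw markup.
    doc←'<table><tr><td>Guys</td><td>60</td></tr></table>'
    x←⎕XML doc                      ⍝ parse into a 5-column matrix (assumed layout).
    (x[;2]∊⊂'td')/x[;3]             ⍝ character data of each <td> element.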
Right argument [html] may be:
    - a character vector, possibly containing linefeed characters, or
    - a character matrix, or
    - a vector of character vectors (as delivered by →getfile←).
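The three argument forms can be tried side by side. This is a minimal sketch,
assuming htx is available in the current namespace; the names [vec], [mat] and
[vov] are introduced here for illustration and are not part of the workspace.
    vec←'<b>this</b> and <b>that</b>'   ⍝ character vector.
    mat←↑'<b>this</b>' '<b>that</b>'    ⍝ character matrix, one line per row.
    vov←'<b>this</b>' '<b>that</b>'     ⍝ vector of character vectors.
    'b' htx vec
    'b' htx mat
    'b' htx vov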
If [tag] starts with a '<' character, the <begin> and </end> tags are themselves
included in the result; otherwise they are omitted. For aesthetic reasons, the
closing '>' may also be included in [tag], but it is ignored.
Technical notes:
The coding is an example of "programming with functions". Notice that nearly all
of the local names refer to functions, rather than to data arrays.
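The style can be illustrated with a small hypothetical dfn, [name], which is not
taken from the htx source. Its local names [del] and [trim] are bound to
functions rather than to data, and the final line simply applies them in turn.
    name←{                          ⍝ hypothetical helper: bare tag name.
        del←~∘'<>'                  ⍝ del: function removing '<' and '>'.
        trim←{(∨\⍵≠' ')/⍵}          ⍝ trim: function dropping leading blanks.
        trim del ⍵                  ⍝ apply the two functions in sequence.
    }
    name' <td>'
td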
Examples:
    bold←'<b>this</b> and <b>that</b>'
    'b' htx bold                    ⍝ extract <b> text.
┌────┬────┐
│this│that│
└────┴────┘
    '<b>' htx bold                  ⍝ .. including tags.
┌───────────┬───────────┐
│<b>this</b>│<b>that</b>│
└───────────┴───────────┘
    htm                             ⍝ character vector (with linefeeds).
<html>
  <body>
    <table>
      <tr><td>%</td><td>Eye Poke</td><td>Kumquat</td></tr>
      <tr><td>Guys</td><td>60</td><td>40</td></tr>
      <tr><td>Dolls</td><td>20</td><td>80</td></tr>
    </table>
  </body>
</html>
    'table'htx htm                  ⍝ extract table.
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ <tr><td>%</td><td>Eye Poke</td><td>Kumquat</td></tr> <tr><td>Guys</td><td>60</td><td>40</td></tr> <tr><td>Dolls</td><td>20</td><td>80</td></tr> │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
    '<table>'htx htm                ⍝ extract table with tags.
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│<table> <tr><td>%</td><td>Eye Poke</td><td>Kumquat</td></tr> <tr><td>Guys</td><td>60</td><td>40</td></tr> <tr><td>Dolls</td><td>20</td><td>80</td></tr> </table>│
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
    'tr'htx htm                     ⍝ extract table rows.
┌───────────────────────────────────────────┬───────────────────────────────────┬────────────────────────────────────┐
│<td>%</td><td>Eye Poke</td><td>Kumquat</td>│<td>Guys</td><td>60</td><td>40</td>│<td>Dolls</td><td>20</td><td>80</td>│
└───────────────────────────────────────────┴───────────────────────────────────┴────────────────────────────────────┘
    'td'htx htm                     ⍝ extract table data.
┌─┬────────┬───────┬────┬──┬──┬─────┬──┬──┐
│%│Eye Poke│Kumquat│Guys│60│40│Dolls│20│80│
└─┴────────┴───────┴────┴──┴──┴─────┴──┴──┘
    ↑'td'∘htx¨'tr'htx htm           ⍝ extract table data per row.
┌─────┬────────┬───────┐
│%    │Eye Poke│Kumquat│
├─────┼────────┼───────┤
│Guys │60      │40     │
├─────┼────────┼───────┤
│Dolls│20      │80     │
└─────┴────────┴───────┘
See also: Line_vectors html getfile