segs ← tag ##.htx html                      ⍝ Extract html segments.

Extracts [tag]-tagged segments from character array [html].

NB:  This  function  may be coded more simply using system function ⎕XML, intro-
duced in Dyalog V12.1.                                                      <V>
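
For illustration only, a minimal sketch of the ⎕XML approach. It assumes ⎕IO←1
and well-formed XML input, which real-world html often is not. The names [doc]
and [x] are hypothetical, and this is not a drop-in replacement for htx:

    doc←'<r><b>this</b><b>that</b></r>' ⍝ well-formed, single root element.
    x←⎕XML doc                          ⍝ 5-column matrix, one row per element.
    ((,¨x[;2])∊⊂,'b')/x[;3]             ⍝ content of rows whose name is 'b'.

Column 2 of the ⎕XML result holds element names and column 3 their character
data, so the selection keeps the content of each <b> element.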

Right argument [html] may be:

    - a character vector, possibly containing linefeed characters, or
    - a character matrix, or
    - a vector of character vectors (as delivered by →getfile←).
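
A sketch of the three argument forms, using hypothetical names [vec], [vvs] and
[mat]. Results are not shown, as the treatment of row ends is not described in
these notes; each call is expected to find the same two <b> segments:

    vec←'<b>this</b> and <b>that</b>'   ⍝ character vector.
    vvs←'<b>this</b> and' '<b>that</b>' ⍝ vector of character vectors.
    mat←↑vvs                            ⍝ character matrix.
    'b' htx vec
    'b' htx vvs
    'b' htx mat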

If [tag] starts with a '<' character, the <begin> and </end> tags are themselves
included  in  the result, otherwise they are omitted. For aesthetic reasons, the
closing '>' may also be included in [tag], but is ignored.
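
The rule above can be sketched with three spellings of the same tag. Going by
the description, the second and third calls should give identical results:

    bold←'<b>this</b> and <b>that</b>'
    'b'   htx bold                  ⍝ tags omitted from each segment.
    '<b'  htx bold                  ⍝ tags included.
    '<b>' htx bold                  ⍝ tags included; closing '>' is ignored.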

Technical notes:

The coding is an example of "programming with functions". Notice that nearly all
of the local names refer to functions, rather than to data arrays.
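
To give a flavour of the style, here is a much simplified sketch. It is NOT the
source of htx: the name [tween] and all of its locals are hypothetical, and it
assumes tags without attributes, no nesting and properly paired tags. Note that
[open], [close], [find] and [cut] all name functions:

    tween←{⎕IO←0                        ⍝ hypothetical sketch, not htx itself.
        open←{'<',⍵,'>'}                ⍝ function: opening tag for name ⍵.
        close←{'</',⍵,'>'}              ⍝ function: closing tag for name ⍵.
        find←{(⍺⍷⍵)/⍳≢⍵}                ⍝ function: offsets of ⍺ within ⍵.
        cut←{⍺[0]↓⍺[1]↑⍵}               ⍝ function: items ⍺[0] to ⍺[1] of ⍵.
        from←(≢open ⍺)+(open ⍺)find ⍵   ⍝ offset just past each opening tag.
        upto←(close ⍺)find ⍵            ⍝ offset of each closing tag.
        (↓⍉↑from upto)cut¨⊂⍵            ⍝ one segment per (from upto) pair.
    }

    'b' tween '<b>this</b> and <b>that</b>'     ⍝ segments 'this' and 'that'.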

Examples:

    bold←'<b>this</b> and <b>that</b>'

    disp   'b' htx bold             ⍝ extract <b> (bold) text.
┌→───┬────┐
│this│that│
└───→┴───→┘

    disp '<b>' htx bold             ⍝ .. including tags.
┌→──────────┬───────────┐
│<b>this</b>│<b>that</b>│
└──────────→┴──────────→┘

    htm                             ⍝ character vector (with linefeeds).
<html>
  <body>
    <table>
      <tr><td>%</td><td>Eye Poke</td><td>Kumquat</td></tr>
      <tr><td>Guys</td><td>60</td><td>40</td></tr>
      <tr><td>Dolls</td><td>20</td><td>80</td></tr>
    </table>
  </body>
</html>

    disp 'table'htx htm             ⍝ extract table.
┌→────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ <tr><td>%</td><td>Eye Poke</td><td>Kumquat</td></tr> <tr><td>Guys</td><td>60</td><td>40</td></tr> <tr><td>Dolls</td><td>20</td><td>80</td></tr> │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────→┘

    disp '<table>'htx htm           ⍝ extract table with tags.
┌→───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│<table> <tr><td>%</td><td>Eye Poke</td><td>Kumquat</td></tr> <tr><td>Guys</td><td>60</td><td>40</td></tr> <tr><td>Dolls</td><td>20</td><td>80</td></tr> </table>│
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────→┘

    disp 'tr'htx htm                ⍝ extract table rows.
┌→──────────────────────────────────────────┬───────────────────────────────────┬────────────────────────────────────┐
│<td>%</td><td>Eye Poke</td><td>Kumquat</td>│<td>Guys</td><td>60</td><td>40</td>│<td>Dolls</td><td>20</td><td>80</td>│
└──────────────────────────────────────────→┴──────────────────────────────────→┴───────────────────────────────────→┘

    disp 'td'htx htm                ⍝ extract table data.
┌→┬────────┬───────┬────┬──┬──┬─────┬──┬──┐
│%│Eye Poke│Kumquat│Guys│60│40│Dolls│20│80│
└→┴───────→┴──────→┴───→┴─→┴─→┴────→┴─→┴─→┘

    disp ↑'td'∘htx¨'tr'htx htm      ⍝ extract table data per row.
┌→────┬────────┬───────┐
↓  %  │Eye Poke│Kumquat│
├────→┼───────→┼──────→┤
│Guys │   60   │  40   │
├────→┼───────→┼──────→┤
│Dolls│   20   │  80   │
└────→┴───────→┴──────→┘
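
Every cell above is a character vector, so numeric columns still need to be
converted. A small sketch, assuming the 3-by-3 layout just shown and trusted
input (⍎ executes whatever text it is given); the name [cells] is hypothetical:

    cells←↑'td'∘htx¨'tr'htx htm     ⍝ nested matrix of cell text.
    ⍎¨1 1↓cells                     ⍝ drop header row and label column,
                                    ⍝ leaving numeric 2 2⍴60 40 20 80.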

See also: Line_vectors html getfile
