segs ← tag ##.htx html ⍝ Extract html segments.
Extracts [tag]-tagged segments from character array [html].
NB: This function may be coded more simply using the system function ⎕XML,
introduced in Dyalog V12.1. <V>
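As a rough illustration of the ⎕XML alternative mentioned above, the following sketch pulls the <td> values out of a well-formed document such as [htm] in the examples further down. It assumes Dyalog V12.1 or later, that the input parses as XML (much real-world HTML does not), and that the elements of interest contain plain text only. The names x and tds are illustrative, and this is not how htx itself is coded.

```apl
      x←⎕XML htm                  ⍝ matrix: depth, tag name, value, attributes, type
      tds←(x[;2]∊⊂'td')/x[;3]     ⍝ values of the rows whose tag name is 'td'
      disp tds                    ⍝ expected to resemble: 'td' htx htm
```

Unlike htx, this approach returns only the text value of each element, not any markup nested inside it.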
Right argument [html] may be:
- a character vector, possibly containing linefeed characters, or
- a character matrix, or
- a vector of character vectors (as delivered by →getfile←).
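The three argument forms listed above can be compared with a small session sketch. The names vec, vtv and mat are illustrative only; going by the description, each call is expected to return the same two segments.

```apl
      vec←'<b>this</b>',(⎕UCS 10),'<b>that</b>'   ⍝ character vector with a linefeed
      vtv←'<b>this</b>' '<b>that</b>'             ⍝ vector of character vectors
      mat←↑vtv                                    ⍝ character matrix (one row per line)
      disp¨'b'∘htx¨vec vtv mat                    ⍝ same extraction from each form
```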
If [tag] starts with a '<' character, the <begin> and </end> tags are themselves
included in the result, otherwise they are omitted. For aesthetic reasons, the
closing '>' may also be included in [tag], but is ignored.
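The tag conventions above can be checked in a session along the following lines. The helper parse is hypothetical and is not taken from the htx source; it only shows one way the leading '<' could be turned into an "include tags" flag while any '<' and '>' are dropped from the name.

```apl
      bold←'<b>this</b> and <b>that</b>'
      ('<b' htx bold)≡'<b>' htx bold    ⍝ closing '>' is optional, so a match is expected
      parse←{('<'=⊃⍵)(⍵~'<>')}          ⍝ hypothetical: (include-tags flag)(bare tag name)
      parse¨'b' '<b' '<b>'              ⍝ flags 0 1 1; the bare name is 'b' each time
```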
Technical notes:
The coding is an example of "programming with functions". Notice that nearly all
of the local names refer to functions, rather than to data arrays.
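To give a flavour of this style, here is a small sketch that is unrelated to the actual htx code. The names words, blank and split are hypothetical, and the partition primitive ⊆ assumes Dyalog V16.0 or later. Every local name denotes a function; the only data array in sight is the argument ⍵.

```apl
    words←{                  ⍝ hypothetical example, not part of htx
        blank←' '∘=          ⍝ function: mask of blanks in its argument
        split←{(~⍺)⊆⍵}       ⍝ function: keep the runs of ⍵ where mask ⍺ is 0
        (blank ⍵)split ⍵     ⍝ apply the named functions to the argument
    }
      words 'programming with functions'   ⍝ three words, blanks dropped
```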
Examples:
      bold←'<b>this</b> and <b>that</b>'
      disp 'b' htx bold ⍝ extract <bold> text.
┌→───┬────┐
│this│that│
└───→┴───→┘
      disp '<b>' htx bold ⍝ .. including tags.
┌→──────────┬───────────┐
│<b>this</b>│<b>that</b>│
└──────────→┴──────────→┘
      htm ⍝ character vector (with linefeeds).
<html>
<body>
<table>
<tr><td>%</td><td>Eye Poke</td><td>Kumquat</td></tr>
<tr><td>Guys</td><td>60</td><td>40</td></tr>
<tr><td>Dolls</td><td>20</td><td>80</td></tr>
</table>
</body>
</html>
      disp 'table'htx htm ⍝ extract table.
┌→────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ <tr><td>%</td><td>Eye Poke</td><td>Kumquat</td></tr> <tr><td>Guys</td><td>60</td><td>40</td></tr> <tr><td>Dolls</td><td>20</td><td>80</td></tr> │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────→┘
      disp '<table>'htx htm ⍝ extract table with tags.
┌→───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│<table> <tr><td>%</td><td>Eye Poke</td><td>Kumquat</td></tr> <tr><td>Guys</td><td>60</td><td>40</td></tr> <tr><td>Dolls</td><td>20</td><td>80</td></tr> </table>│
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────→┘
      disp 'tr'htx htm ⍝ extract table rows.
┌→──────────────────────────────────────────┬───────────────────────────────────┬────────────────────────────────────┐
│<td>%</td><td>Eye Poke</td><td>Kumquat</td>│<td>Guys</td><td>60</td><td>40</td>│<td>Dolls</td><td>20</td><td>80</td>│
└──────────────────────────────────────────→┴──────────────────────────────────→┴───────────────────────────────────→┘
      disp 'td'htx htm ⍝ extract table data.
┌→┬────────┬───────┬────┬──┬──┬─────┬──┬──┐
│%│Eye Poke│Kumquat│Guys│60│40│Dolls│20│80│
└→┴───────→┴──────→┴───→┴─→┴─→┴────→┴─→┴─→┘
      disp ↑'td'∘htx¨'tr'htx htm ⍝ extract table data per row.
┌→────┬────────┬───────┐
↓  %  │Eye Poke│Kumquat│
├────→┼───────→┼──────→┤
│Guys │   60   │  40   │
├────→┼───────→┼──────→┤
│Dolls│   20   │  80   │
└────→┴───────→┴──────→┘
See also: Line_vectors html getfile