───────────────────────
pack-: Data Compression
───────────────────────
These functions, mostly from Veli-Matti Jantunen, pack and unpack data. The left
argument  of  each  function  (default 1)  determines whether the right argument
is to be 1:compressed or 0:expanded (mnemonic: 1-pack, 0-unpack).

          text ≡ 0 packT 1 packT text
    1

packs are suitable for compressing:

    B H R S U Z 4   bitmaps and images
    D H S T Z       text,
    N Z             sparse, simple numeric or character arrays.

We  can  test the relative performance of the various approaches with a function
that compresses and expands  its argument. [pack_test] takes an array to pack as
right argument and a 1-character algorithm code as left argument.

    pack_test←{                         ⍝ Test pack/unpack.
        pack←⍎'pack',⍺                  ⍝ compression function.
        ais←⎕AI ⋄ cmp←  pack ⍵          ⍝ compressed data.
        aic←⎕AI ⋄ exp←0 pack cmp        ⍝ re-expanded data.
        aix←⎕AI                         ⍝ expansion time.
        exp≢⍵:'N / A'                   ⍝ unpack & check.
        times←¯2-/2⊃¨ais aic aix        ⍝ processor times.
        mill←,↑6 0∘⍕¨times              ⍝ milliseconds processor time.
        fmto←⎕SIZE↑2 4⍴'exp cmp '       ⍝ before and after sizes.
        pcent←'%',⍨4 0⍕100×÷/⌽-\fmto    ⍝ % reduction (bigger is better).
        csize←6 0⍕¯1↑fmto               ⍝ compressed size.
        csize,pcent,mill                ⍝ size compression milliseconds.
    }

Result fields are:

    csize   size of compressed data.
    redn%   percentage reduction (bigger is better).
    cmill   processor milliseconds for compression.
    xmill   processor milliseconds for expansion.

The compression algorithms were tested against 3 arrays:

A bitmap:

      'dapl'⎕wc'BitMap' 'c:/dyalog90/dyalog.bmp'    ⍝ Dyalog APL bitmap

A text vector:

      notes.Data_compression.

A sparse numeric matrix:

      nmat←{⍵×0=10|⍵}?100 100⍴1000

... leading to:

      algs←'BDHNQRSTUX4Z'                           ⍝ algorithms: packB, ···
      data←notes.Data_compression dapl.CBits nmat   ⍝ subject data.
      size←{⎕size'⍵'}                               ⍝ size function.
      ttls←⍕¨'Text' 'BitMap' 'Sparse',¨size¨data    ⍝ initial data sizes.

      (' ',algs),ttls⍪algs∘.pack_test data          ⍝ comparison table.
┌─┬───────────────────────┬───────────────────────┬───────────────────────┐
│ │Text 7632              │BitMap 181940          │Sparse 20020           │
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│B│  7652   0%     0    16│  6108  97%     0    16│  3240  84%     0     0│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│D│  4612  40%    15     0│N / A                  │N / A                  │
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│H│  4552  40%    16    78│  5800  97%    15   531│  2508  87%    15   110│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│N│  6768  11%     0     0│187688  ¯3%     0     0│  3384  83%     0     0│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│Q│  4676  39%     0    16│ 13980  92%    15    32│  3760  81%     0     0│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│R│  9728 ¯27%    16     0│ 15508  91%     0     0│  5924  70%     0     0│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│S│  4764  38%    15   500│  5800  97%    15  4649│  2612  87%    15   390│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│T│  5612  26%    16     0│N / A                  │N / A                  │
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│U│  7816  ¯2%     0     0│  5776  97%    16     0│ 10284  49%     0     0│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│X│  6704  12%    46    16│  5780  97%    47    15│  6560  67%    16    15│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│4│  8612 ¯13%   125   172│ 12880  93%   265   296│  6264  69%   109   125│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│Z│  3816  50%   250   265│  2536  99%   671   249│  2556  87%   219   156│
└─┴───────────────────────┴───────────────────────┴───────────────────────┘

For  comparison, the following examples show some different packings of the same
array 'Mississippi'.

    packB'Mississippi'                  ⍝ simple array packing.
┌─────────┬─────────────────────┬───────────────────────────────┐
│1 11 Misp│1 1 1 0 1 1 0 1 1 0 1│0 0 1 0 1 0 1 0 0 1 0 1 0 1 1 1│
└─────────┴─────────────────────┴───────────────────────────────┘

    packD'Mississippi'                  ⍝ Pack char array to boolean vector.
0 0 0 1 0 0 1 0 0 1 0 1 1 0 0 0 0 0 0 1 1 0 0 0 1 1 0 0 1 0 0 1 0 0 0 1 1 0 0 1
      0 0 0 0 0 0 1 0 0 1 1 0 1 1 0 0 0 1 0 1 0 0 1 0 0 1 0 1 0 0 1 0 0 1 0 1 1
      0 1 1 0 1 0

    packH'Mississippi'                  ⍝ Huffman packing.
┌──┬────┬───────┬─────────────────────────────────────────┐
│11│sMpi│1 3 3 2│1 0 0 1 1 0 0 1 1 0 0 1 1 1 0 1 1 0 1 1 1│
└──┴────┴───────┴─────────────────────────────────────────┘

    packQ'Mississippi'                  ⍝ Assorted uniQues packer.
┌────┬────┬───────────────────────────────────────────────────────────┐
│ispM│1 11│1 0 0 0 1 0 1 0 0 1 0 0 1 0 1 0 0 1 0 0 1 0 1 1 0 1 1 0 1 0│
└────┴────┴───────────────────────────────────────────────────────────┘

    packR'Mississippi'                  ⍝ RLE packing.
┌──┬───────────────┬────────┐
│11│1 1 2 1 2 1 2 1│Misisipi│
└──┴───────────────┴────────┘

    packS'Mississippi'                  ⍝ Shannon-Fano packing.
┌──┬────┬───────┬───────────────────────────────────────────────────────────┐
│11│ispM│1 2 3 3│0 1 0 1 1 0 1 1 1 1 1 1 0 1 0 1 0 0 1 0 1 0 0 1 1 0 1 1 0 0│
└──┴────┴───────┴───────────────────────────────────────────────────────────┘

    packU'Mississippi'                  ⍝ Unique packing.
┌──┬────┬─────────────────────┐
│11│Misp│0 1 2 2 1 2 2 1 3 3 1│
└──┴────┴─────────────────────┘

    packX'Mississippi'                  ⍝ Text packing.
┌────┬────┬───────────────────────────────────────────────────────────────────┐
│Misp│1 11│0 1 0 0 1 0 1 0 0 1 1 0 0 0 0 1 0 1 0 0 1 0 0 0 1 1 0 0 0 0 0 1 1 1│
└────┴────┴───────────────────────────────────────────────────────────────────┘

    pack4'Mississippi'                  ⍝ Quad-tree packing.
┌────┬──┬──────────────────────────────────────────┐
│Misp│11│¯3 0 1 ¯1 2 ¯1 2 1 ¯2 2 ¯1 2 1 ¯1 3 ¯1 3 1│
└────┴──┴──────────────────────────────────────────┘

    packZ'Mississippi'                  ⍝ LZW packing.
┌──┬─────────────────┬────┐
│11│0 0 0 0 1 1 0 0 0│Misp│
│  │0 0 1 1 0 1 1 1 0│    │
│  │0 1 0 0 1 1 1 1 1│    │
└──┴─────────────────┴────┘

    ⍝ Here are some more examples, using a larger text matrix:

    mm←⎕fmt notes.Marilyn       ⍝ mm is a simple char matrix. See: →Marilyn←

    chk←{                       ⍝ check of compress-decompress round-trip.
        size←{⍬⍴⎕size'⍵'}
        ⍵≡0 ⍺⍺ ⍺⍺ ⍵:size ⍺⍺ ⍵
        ' Error: compression round-trip fails'
    }

        ⊢ chk mm    ⍝ Raw data size
11300
    packB chk mm    ⍝ Pack a simple array.
11376
    pack4 chk mm    ⍝ ⍝ Quad-tree packing.
9176
    packR chk mm    ⍝ Run Length Encoding (RLE packing).
7808
    packT chk ,mm   ⍝ Simple text vector packager.
7136
    packQ chk mm    ⍝ Assorted uniQues packer.
5276
    packS chk mm    ⍝ Shannon-Fano packing
5084
    packU chk mm    ⍝ Unique packer.
5064
    packH chk mm    ⍝ Huffman packing.
5028
    packX chk mm    ⍝ TeXt packer.
3800
    packZ chk mm    ⍝ Lempel-Ziv-Welch compression.
3612

See also: packB packD packH packN packQ packR
See also: packS packT packU packX packZ pack4
See also: pack base64 shannon

Back to: contents

Back to: Workspaces