───────────────────────
pack-: Data Compression
───────────────────────
These functions, mostly from Veli-Matti Jantunen, pack and unpack data. The left
argument of each function (default 1) determines whether the right argument
is to be 1:compressed or 0:expanded (mnemonic: 1-pack, 0-unpack).
text ≡ 0 packT 1 packT text
1
packs are suitable for compressing:
B H R S U Z 4 bitmaps and images
D H S T Z text,
N Z sparse, simple numeric or character arrays.
We can test the relative performance of the various approaches with a function
that compresses and expands its argument. [pack_test] takes an array to pack as
right argument and a 1-character algorithm code as left argument.
pack_test←{ ⍝ Test pack/unpack.
pack←⍎'pack',⍺ ⍝ compression function.
ais←⎕AI ⋄ cmp← pack ⍵ ⍝ compressed data.
aic←⎕AI ⋄ exp←0 pack cmp ⍝ re-expanded data.
aix←⎕AI ⍝ expansion time.
exp≢⍵:'N / A' ⍝ unpack & check.
times←¯2-/2⊃¨ais aic aix ⍝ processor times.
mill←,↑6 0∘⍕¨times ⍝ milliseconds processor time.
fmto←⎕SIZE↑2 4⍴'exp cmp ' ⍝ before and after sizes.
pcent←'%',⍨4 0⍕100×÷/⌽-\fmto ⍝ % reduction (bigger is better).
csize←6 0⍕¯1↑fmto ⍝ compressed size.
csize,pcent,mill ⍝ size compression milliseconds.
}
Result fields are:
csize size of compressed data.
redn% percentage reduction (bigger is better).
cmill processor milliseconds for compression.
xmill processor milliseconds for expansion.
The compression algorithms were tested against 3 arrays:
A bitmap:
'dapl'⎕wc'BitMap' 'c:/dyalog90/dyalog.bmp' ⍝ Dyalog APL bitmap
A text vector:
notes.Data_compression.
A sparse numeric matrix:
nmat←{⍵×0=10|⍵}?100 100⍴1000
... leading to:
algs←'BDHNQRSTUX4Z' ⍝ algorithms: packB, ···
data←notes.Data_compression dapl.CBits nmat ⍝ subject data.
size←{stuff←⍵ ⋄ ⎕size'stuff'} ⍝ size function.
ttls←⍕¨'Text' 'BitMap' 'Sparse',¨size¨data ⍝ initial data sizes.
0 disp (' ',algs),ttls⍪algs∘.pack_test data ⍝ comparison table.
┌─┬───────────────────────┬───────────────────────┬───────────────────────┐
│ │ Text 7632 │ BitMap 181940 │ Sparse 20020 │
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│B│ 7652 0% 0 16│ 6108 97% 0 16│ 3240 84% 0 0│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│D│ 4612 40% 15 0│ N / A │ N / A │
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│H│ 4552 40% 16 78│ 5800 97% 15 531│ 2508 87% 15 110│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│N│ 6768 11% 0 0│187688 ¯3% 0 0│ 3384 83% 0 0│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│Q│ 4676 39% 0 16│ 13980 92% 15 32│ 3760 81% 0 0│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│R│ 9728 ¯27% 16 0│ 15508 91% 0 0│ 5924 70% 0 0│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│S│ 4764 38% 15 500│ 5800 97% 15 4649│ 2612 87% 15 390│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│T│ 5612 26% 16 0│ N / A │ N / A │
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│U│ 7816 ¯2% 0 0│ 5776 97% 16 0│ 10284 49% 0 0│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│X│ 6704 12% 46 16│ 5780 97% 47 15│ 6560 67% 16 15│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│4│ 8612 ¯13% 125 172│ 12880 93% 265 296│ 6264 69% 109 125│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│Z│ 3816 50% 250 265│ 2536 99% 671 249│ 2556 87% 219 156│
└─┴───────────────────────┴───────────────────────┴───────────────────────┘
For comparison, the following examples show some different packings of the same
array 'Mississippi'.
disp packB'Mississippi' ⍝ simple array packing.
┌→────────┬─────────────────────┬───────────────────────────────┐
│1 11 Misp│1 1 1 0 1 1 0 1 1 0 1│0 0 1 0 1 0 1 0 0 1 0 1 0 1 1 1│
└+───────→┴~───────────────────→┴~─────────────────────────────→┘
disp packD'Mississippi' ⍝ Pack char array to boolean vector.
0 0 0 1 0 0 1 0 0 1 0 1 1 0 0 0 0 0 0 1 1 0 0 0 1 1 0 0 1 0 0 1 0 0 0 1 1 0 0 1
0 0 0 0 0 0 1 0 0 1 1 0 1 1 0 0 0 1 0 1 0 0 1 0 0 1 0 1 0 0 1 0 0 1 0 1 1
0 1 1 0 1 0
disp packH'Mississippi' ⍝ Huffman packing.
┌→─┬────┬───────┬─────────────────────────────────────────┐
│11│sMpi│1 3 3 2│1 0 0 1 1 0 0 1 1 0 0 1 1 1 0 1 1 0 1 1 1│
└~→┴───→┴~─────→┴~───────────────────────────────────────→┘
disp packQ'Mississippi' ⍝ Assorted uniQues packer.
┌→───┬────┬───────────────────────────────────────────────────────────┐
│ispM│1 11│1 0 0 0 1 0 1 0 0 1 0 0 1 0 1 0 0 1 0 0 1 0 1 1 0 1 1 0 1 0│
└───→┴~──→┴~─────────────────────────────────────────────────────────→┘
disp packR'Mississippi' ⍝ RLE packing.
┌→─┬───────────────┬────────┐
│11│1 1 2 1 2 1 2 1│Misisipi│
└~→┴~─────────────→┴───────→┘
disp packS'Mississippi' ⍝ Shannon-Fano packing.
┌→─┬────┬───────┬───────────────────────────────────────────────────────────┐
│11│ispM│1 2 3 3│0 1 0 1 1 0 1 1 1 1 1 1 0 1 0 1 0 0 1 0 1 0 0 1 1 0 1 1 0 0│
└~→┴───→┴~─────→┴~─────────────────────────────────────────────────────────→┘
disp packU'Mississippi' ⍝ Unique packing.
┌→─┬────┬─────────────────────┐
│11│Misp│0 1 2 2 1 2 2 1 3 3 1│
└~→┴───→┴~───────────────────→┘
disp packX'Mississippi' ⍝ Text packing.
┌→───┬────┬───────────────────────────────────────────────────────────────────┐
│Misp│1 11│0 1 0 0 1 0 1 0 0 1 1 0 0 0 0 1 0 1 0 0 1 0 0 0 1 1 0 0 0 0 0 1 1 1│
└───→┴~──→┴~─────────────────────────────────────────────────────────────────→┘
disp pack4'Mississippi' ⍝ Quad-tree packing.
┌→───┬──┬──────────────────────────────────────────┐
│Misp│11│¯3 0 1 ¯1 2 ¯1 2 1 ¯2 2 ¯1 2 1 ¯1 3 ¯1 3 1│
└───→┴~→┴~────────────────────────────────────────→┘
disp packZ'Mississippi' ⍝ LZW packing.
┌→─┬─────────────────┬────┐
│ │0 0 0 0 1 1 0 0 0│ │
│11│0 0 1 1 0 1 1 1 0│Misp│
│ │0 1 0 0 1 1 1 1 1↓ │
└~→┴~───────────────→┴───→┘
⍝ Here are some more examples, using a larger text matrix:
mm←⎕fmt notes.Marilyn ⍝ mm is a simple char matrix. See: →Marilyn←
chk←{ ⍝ check of compress-decompress round-trip.
size←{a←⍵ ⋄ ⍬⍴⎕size'a'}
⍵≡0 ⍺⍺ ⍺⍺ ⍵:size ⍺⍺ ⍵
' Error: compression round-trip fails'
}
⊢ chk mm ⍝ Raw data size
11300
packB chk mm ⍝ Pack a simple array.
11376
pack4 chk mm ⍝ ⍝ Quad-tree packing.
9176
packR chk mm ⍝ Run Length Encoding (RLE packing).
7808
packT chk ,mm ⍝ Simple text vector packager.
7136
packQ chk mm ⍝ Assorted uniQues packer.
5276
packS chk mm ⍝ Shannon-Fano packing
5084
packU chk mm ⍝ Unique packer.
5064
packH chk mm ⍝ Huffman packing.
5028
packX chk mm ⍝ TeXt packer.
3800
packZ chk mm ⍝ Lempel-Ziv-Welch compression.
3612
See also: packB packD packH packN packQ packR
See also: packS packT packU packX packZ pack4
See also: pack base64
Back to: contents
Back to: Workspaces
Trouble seeing APL font?