───────────────────────
pack-: Data Compression
───────────────────────
These functions, mostly from Veli-Matti Jantunen, pack and unpack data. The left
argument of each function (default 1) determines whether the right argument
is to be 1:compressed or 0:expanded (mnemonic: 1-pack, 0-unpack).
text ≡ 0 packT 1 packT text
1
packs are suitable for compressing:
B H R S U Z 4 bitmaps and images
D H S T Z text,
N Z sparse, simple numeric or character arrays.
We can test the relative performance of the various approaches with a function
that compresses and expands its argument. [pack_test] takes an array to pack as
right argument and a 1-character algorithm code as left argument.
pack_test←{ ⍝ Test pack/unpack.
pack←⍎'pack',⍺ ⍝ compression function.
ais←⎕AI ⋄ cmp← pack ⍵ ⍝ compressed data.
aic←⎕AI ⋄ exp←0 pack cmp ⍝ re-expanded data.
aix←⎕AI ⍝ expansion time.
exp≢⍵:'N / A' ⍝ unpack & check.
times←¯2-/2⊃¨ais aic aix ⍝ processor times.
mill←,↑6 0∘⍕¨times ⍝ milliseconds processor time.
fmto←⎕SIZE↑2 4⍴'exp cmp ' ⍝ before and after sizes.
pcent←'%',⍨4 0⍕100×÷/⌽-\fmto ⍝ % reduction (bigger is better).
csize←6 0⍕¯1↑fmto ⍝ compressed size.
csize,pcent,mill ⍝ size compression milliseconds.
}
Result fields are:
csize size of compressed data.
redn% percentage reduction (bigger is better).
cmill processor milliseconds for compression.
xmill processor milliseconds for expansion.
The compression algorithms were tested against 3 arrays:
A bitmap:
'dapl'⎕wc'BitMap' 'c:/dyalog90/dyalog.bmp' ⍝ Dyalog APL bitmap
A text vector:
notes.Data_compression.
A sparse numeric matrix:
nmat←{⍵×0=10|⍵}?100 100⍴1000
... leading to:
algs←'BDHNQRSTUX4Z' ⍝ algorithms: packB, ···
data←notes.Data_compression dapl.CBits nmat ⍝ subject data.
size←{⎕size'⍵'} ⍝ size function.
ttls←⍕¨'Text' 'BitMap' 'Sparse',¨size¨data ⍝ initial data sizes.
(' ',algs),ttls⍪algs∘.pack_test data ⍝ comparison table.
┌─┬───────────────────────┬───────────────────────┬───────────────────────┐
│ │Text 7632 │BitMap 181940 │Sparse 20020 │
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│B│ 7652 0% 0 16│ 6108 97% 0 16│ 3240 84% 0 0│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│D│ 4612 40% 15 0│N / A │N / A │
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│H│ 4552 40% 16 78│ 5800 97% 15 531│ 2508 87% 15 110│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│N│ 6768 11% 0 0│187688 ¯3% 0 0│ 3384 83% 0 0│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│Q│ 4676 39% 0 16│ 13980 92% 15 32│ 3760 81% 0 0│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│R│ 9728 ¯27% 16 0│ 15508 91% 0 0│ 5924 70% 0 0│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│S│ 4764 38% 15 500│ 5800 97% 15 4649│ 2612 87% 15 390│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│T│ 5612 26% 16 0│N / A │N / A │
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│U│ 7816 ¯2% 0 0│ 5776 97% 16 0│ 10284 49% 0 0│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│X│ 6704 12% 46 16│ 5780 97% 47 15│ 6560 67% 16 15│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│4│ 8612 ¯13% 125 172│ 12880 93% 265 296│ 6264 69% 109 125│
├─┼───────────────────────┼───────────────────────┼───────────────────────┤
│Z│ 3816 50% 250 265│ 2536 99% 671 249│ 2556 87% 219 156│
└─┴───────────────────────┴───────────────────────┴───────────────────────┘
For comparison, the following examples show some different packings of the same
array 'Mississippi'.
packB'Mississippi' ⍝ simple array packing.
┌─────────┬─────────────────────┬───────────────────────────────┐
│1 11 Misp│1 1 1 0 1 1 0 1 1 0 1│0 0 1 0 1 0 1 0 0 1 0 1 0 1 1 1│
└─────────┴─────────────────────┴───────────────────────────────┘
packD'Mississippi' ⍝ Pack char array to boolean vector.
0 0 0 1 0 0 1 0 0 1 0 1 1 0 0 0 0 0 0 1 1 0 0 0 1 1 0 0 1 0 0 1 0 0 0 1 1 0 0 1
0 0 0 0 0 0 1 0 0 1 1 0 1 1 0 0 0 1 0 1 0 0 1 0 0 1 0 1 0 0 1 0 0 1 0 1 1
0 1 1 0 1 0
packH'Mississippi' ⍝ Huffman packing.
┌──┬────┬───────┬─────────────────────────────────────────┐
│11│sMpi│1 3 3 2│1 0 0 1 1 0 0 1 1 0 0 1 1 1 0 1 1 0 1 1 1│
└──┴────┴───────┴─────────────────────────────────────────┘
packQ'Mississippi' ⍝ Assorted uniQues packer.
┌────┬────┬───────────────────────────────────────────────────────────┐
│ispM│1 11│1 0 0 0 1 0 1 0 0 1 0 0 1 0 1 0 0 1 0 0 1 0 1 1 0 1 1 0 1 0│
└────┴────┴───────────────────────────────────────────────────────────┘
packR'Mississippi' ⍝ RLE packing.
┌──┬───────────────┬────────┐
│11│1 1 2 1 2 1 2 1│Misisipi│
└──┴───────────────┴────────┘
packS'Mississippi' ⍝ Shannon-Fano packing.
┌──┬────┬───────┬───────────────────────────────────────────────────────────┐
│11│ispM│1 2 3 3│0 1 0 1 1 0 1 1 1 1 1 1 0 1 0 1 0 0 1 0 1 0 0 1 1 0 1 1 0 0│
└──┴────┴───────┴───────────────────────────────────────────────────────────┘
packU'Mississippi' ⍝ Unique packing.
┌──┬────┬─────────────────────┐
│11│Misp│0 1 2 2 1 2 2 1 3 3 1│
└──┴────┴─────────────────────┘
packX'Mississippi' ⍝ Text packing.
┌────┬────┬───────────────────────────────────────────────────────────────────┐
│Misp│1 11│0 1 0 0 1 0 1 0 0 1 1 0 0 0 0 1 0 1 0 0 1 0 0 0 1 1 0 0 0 0 0 1 1 1│
└────┴────┴───────────────────────────────────────────────────────────────────┘
pack4'Mississippi' ⍝ Quad-tree packing.
┌────┬──┬──────────────────────────────────────────┐
│Misp│11│¯3 0 1 ¯1 2 ¯1 2 1 ¯2 2 ¯1 2 1 ¯1 3 ¯1 3 1│
└────┴──┴──────────────────────────────────────────┘
packZ'Mississippi' ⍝ LZW packing.
┌──┬─────────────────┬────┐
│11│0 0 0 0 1 1 0 0 0│Misp│
│ │0 0 1 1 0 1 1 1 0│ │
│ │0 1 0 0 1 1 1 1 1│ │
└──┴─────────────────┴────┘
⍝ Here are some more examples, using a larger text matrix:
mm←⎕fmt notes.Marilyn ⍝ mm is a simple char matrix. See: →Marilyn←
chk←{ ⍝ check of compress-decompress round-trip.
size←{⍬⍴⎕size'⍵'}
⍵≡0 ⍺⍺ ⍺⍺ ⍵:size ⍺⍺ ⍵
' Error: compression round-trip fails'
}
⊢ chk mm ⍝ Raw data size
11300
packB chk mm ⍝ Pack a simple array.
11376
pack4 chk mm ⍝ ⍝ Quad-tree packing.
9176
packR chk mm ⍝ Run Length Encoding (RLE packing).
7808
packT chk ,mm ⍝ Simple text vector packager.
7136
packQ chk mm ⍝ Assorted uniQues packer.
5276
packS chk mm ⍝ Shannon-Fano packing
5084
packU chk mm ⍝ Unique packer.
5064
packH chk mm ⍝ Huffman packing.
5028
packX chk mm ⍝ TeXt packer.
3800
packZ chk mm ⍝ Lempel-Ziv-Welch compression.
3612
See also: packB packD packH packN packQ packR
See also: packS packT packU packX packZ pack4
See also: pack base64 shannon
Back to: contents
Back to: Workspaces