⍝ Lempel-Ziv-Welch compression:

    packZ'mississippi'                      ⍝ compress char vector
┌──┬─────────────────┬────┐
│11│0 0 0 0 1 1 0 0 0│misp│
│  │0 0 1 1 0 1 1 1 0│    │
│  │0 1 0 0 1 1 1 1 1│    │
└──┴─────────────────┴────┘

    {⍵≡0 packZ packZ ⍵}'mississippi'        ⍝ expand recovers original
1

    2⊥2⊃packZ'mississippi'                  ⍝ ... bool matrix decoded
0 1 2 2 5 7 3 3 1

    ¯8 packZ'mississippi'                   ⍝ final compression dictionary
┌─┬─┬─┬─┬──┬──┬──┬──┬───┬───┬──┬──┐
│m│i│s│p│mi│is│ss│si│iss│sip│pp│pi│
└─┴─┴─┴─┴──┴──┴──┴──┴───┴───┴──┴──┘

    segs←{(¯12 packZ ⍵)[⎕io+2⊥2⊃packZ ⍵]}   ⍝ dictionary entry per code word.

    segs'mississippi'                       ⍝ segments for string.
┌─┬─┬─┬─┬──┬──┬─┬─┬─┐
│m│i│s│s│is│si│p│p│i│
└─┴─┴─┴─┴──┴──┴─┴─┴─┘

    {⍵≡↑,/segs ⍵}'mississippi'              ⍝ enlist of segs recovers original
1
    segs 40⍴'a'                             ⍝ segs for highly repetitive string
┌─┬──┬───┬────┬─────┬──────┬───────┬────────┬────┐
│a│aa│aaa│aaaa│aaaaa│aaaaaa│aaaaaaa│aaaaaaaa│aaaa│
└─┴──┴───┴────┴─────┴──────┴───────┴────────┴────┘

    segs 40⍴'abcd'                          ⍝ segs for less repetitive string
┌─┬─┬─┬─┬──┬──┬───┬──┬──┬───┬───┬───┬────┬─────┬────┬───┐
│a│b│c│d│ab│cd│abc│da│bc│dab│cda│bcd│abcd│abcda│bcda│bcd│
└─┴─┴─┴─┴──┴──┴───┴──┴──┴───┴───┴───┴────┴─────┴────┴───┘

    packZ'hello world'                      ⍝ compressed char vector
┌──┬─────────────────────┬────────┐
│11│0 0 0 0 0 1 1 0 1 0 1│helo wrd│
│  │0 0 1 1 1 0 0 1 1 1 1│        │
│  │0 1 0 0 1 0 1 1 0 0 1│        │
└──┴─────────────────────┴────────┘

    packZ 11 ⎕dr'hello world'               ⍝ compressed bool vector
┌──┬───────────────────────────────────────────────────────────────┬───┐
│88│0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 1 0│0 1│
│  │0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 1 0 1 1 0 1 1 0 0 1 0 0 1 1 1 0│   │
│  │0 0 0 0 0 1 0 1 1 1 0 0 1 1 1 1 1 0 0 0 1 0 0 0 1 0 1 0 0 1 1 0│   │
│  │0 0 0 1 0 1 1 0 0 0 1 0 1 0 0 1 1 1 0 0 1 0 1 1 1 0 0 0 0 1 0 0│   │
│  │0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 1 1 1 1 1 0 1 0 1 0 0 1 0 0 0 0│   │
└──┴───────────────────────────────────────────────────────────────┴───┘

    packZ 83 ⎕DR'hello world'               ⍝ compressed integer vector
┌──┬─────────────────────┬──────────────────────────────┐
│11│0 0 0 0 0 1 1 0 1 0 1│104 101 108 111 32 119 114 100│
│  │0 0 1 1 1 0 0 1 1 1 1│                              │
│  │0 1 0 0 1 0 1 1 0 0 1│                              │
└──┴─────────────────────┴──────────────────────────────┘

    size←{⍬⍴⎕size'⍵'}                       ⍝ function for size in bytes.

:If 32={2×⎕size'⍵'}⍬                        ⍝ 32-bit interpreter:

        size 100 100⍴⍳7                     ⍝ size of numeric matrix.
    10020
        size packZ 100 100⍴⍳7               ⍝ compressed size.
    512
:Else                                       ⍝ 64-bit interpreter:
        size 100 100⍴⍳7                     ⍝ size of numeric matrix.
    10040
        size packZ 100 100⍴⍳7               ⍝ compressed size.
    600

:EndIf

    mat←'Scissors' 'Stone' 'Paper'[2 3⍴1 2 3 2 3 1]

    mat                                     ⍝ nested matrix
┌────────┬─────┬────────┐
│Scissors│Stone│Paper   │
├────────┼─────┼────────┤
│Stone   │Paper│Scissors│
└────────┴─────┴────────┘

:If 32={2×⎕size'⍵'}⍬                        ⍝ 32-bit interpreter:

        size mat                            ⍝ matrix size.
    188
        size packZ ⍉mat                     ⍝ compressed size.
    172
:Else                                       ⍝ 64-bit interpreter:
        size mat                            ⍝ matrix size.
    328
        size packZ ⍉mat                     ⍝ compressed size.
    320

:EndIf

    packZ mat                               ⍝ compressed matrix
┌───┬─────────┬──────────────────────┐
│2 3│0 0 0 1 0│┌────────┬─────┬─────┐│
│   │0 0 1 0 0││Scissors│Stone│Paper││
│   │0 1 0 0 0│└────────┴─────┴─────┘│
└───┴─────────┴──────────────────────┘

    tokens ⊃⎕nr'packZ'                      ⍝ tokens in first line of packZ
┌─────┬─┬─┬───┬─┬───┬─┬─┬─┬─┬──────────────────────────┬───────────────────────────────┐
│packZ│←│{│⎕IO│ │⎕ML│←│0│ │1│                          │⍝ Lempel-Ziv-Welch compression.│
└─────┴─┴─┴───┴─┴───┴─┴─┴─┴─┴──────────────────────────┴───────────────────────────────┘

    chk←{⍺←⊢ ⋄ ⍵≡0 packZ ⍺ packZ ⍵}         ⍝ check unpack pack round trip.

    toks←↑,/tokens¨⎕nr'packZ'               ⍝ nested vector of packZ's tokens.

    chk toks                                ⍝ nested pack: full circle.
1
    chk 'hello world'                       ⍝ char pack: full circle.
1
    chk 11 ⎕dr'hello world'                 ⍝ bool pack: full circle.
1
    packZ{⍵/⍵}⍳8
┌──┬───────────────────────────────────────────┬───────────────┐
│36│0 0 0 0 0 0 0 0 0 1 1 0 1 1 0 1 1 0 0 1 1 1│1 2 3 4 5 6 7 8│
│  │0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1│               │
│  │0 0 0 0 0 0 1 0 1 0 0 1 0 1 1 1 1 1 1 0 0 0│               │
│  │0 0 0 1 1 1 0 1 0 0 0 0 1 0 1 1 1 1 1 1 1 1│               │
│  │0 1 1 0 1 1 1 1 0 0 0 1 1 0 0 0 1 0 1 0 1 0│               │
└──┴───────────────────────────────────────────┴───────────────┘

    ⎕ml←⍴'VMJ'                              ⍝ Check OK in ⎕ML 3.
    'OK'≡0 packZ packZ'OK'
1

⍝∇ packZ tokens

Back to: code

Back to: Workspaces