sum ← {digs←6} ##.chksum array              ⍝ Simple ⍺-digit checksum.

A checksum  can  be  considered as a "signature" for a set of data and so may be
used to verify that the data has not been modified or corrupted.

[chksum] returns a checksum for its argument array ⍵.  Optional left argument ⍺
sets the number of digits in the checksum (default 6).
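
A minimal sketch of both call forms.  The name "array" is a placeholder chosen
for illustration and results are omitted, as they depend on the data:

    array←2 3⍴⍳6                    ⍝ hypothetical test array.

    chksum array                    ⍝ default 6-digit checksum.

    8 chksum array                  ⍝ 8-digit checksum.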

NB: Check-summing  and  hashing systems typically map a large set of data values
onto a significantly smaller set of sums.  It is therefore inevitable that there
be  "clashes",  where different values map to the same sum.

This means that comparing checksums can establish only that:

- If the checksum has changed, then the array has _certainly_ changed.
- If the checksum has not changed, then the array has _probably_ not changed.
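
In practice, this suggests recording the checksum alongside the data and compar-
ing it later.  A sketch, in which the names "data" and "sig" are illustrative:

    data←3 4⍴⍳12                    ⍝ some array to protect.
    sig←chksum data                 ⍝ record its signature.

    sig≡chksum data                 ⍝ same array: same checksum.
1
    sig≡chksum data+1               ⍝ changed array: 0, barring a clash.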

The art of creating useful checksums is to balance the conflicting requirements
to:

- Produce a fast-enough summing function for large volumes of data.
- Maximise sensitivity to typical changes in the data.

For example, a simple approach might be an ⍺-residue sum of the data bytes. How-
ever,  this  would  not  detect  added or removed 0-values or reordering of data
items. For this reason, a "weighted sum" is often used.
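
The following sketch contrasts the two approaches.  Both functions are hypothet-
ical illustrations rather than the code of [chksum];  the position weights 1 2 3
... and the modulus 10*6 are assumptions made for the example:

    ⎕IO←1
    plain←{(10*6)|+/⍵}              ⍝ plain residue sum.
    wsum←{(10*6)|(⍳≢⍵)+.×⍵}         ⍝ sum weighted by item position.

    plain¨(1 2 3)(3 2 1)(0 1 2 3)   ⍝ reordering and added 0 go unnoticed.
6 6 6
    wsum¨(1 2 3)(3 2 1)(0 1 2 3)    ⍝ weighted sums differ.
14 10 20
    wsum¨(1 3 2)(2 1 3)             ⍝ but weighted sums can still clash.
13 13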

There is a large body of literature devoted to the subject. See, for example:

Bug: [chksum] ignores array items that are namespace references (refs).
Bug: [chksum] ignores ⎕NULL.
Bug: [chksum] crashes (DOMAIN ERROR) on encountering a ⎕OR item.

Technical notes:

[chksum] returns the weighted sum of:
    the byte vector of: the shape followed by a ¯1 separator
        followed by
    the byte vector of the ravel of the array.
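
As a simplified sketch of this layout, assume a simple array of small non-negat-
ive integers, so that each item fits in one byte and (256|) can stand in for the
byte conversion.  The name "bvec" is hypothetical:

    ⎕IO←1
    bvec←{256|(⍴⍵),¯1,,⍵}           ⍝ shape, ¯1 separator, ravel.

    bvec 2 3⍴⍳6
2 3 255 1 2 3 4 5 6
    bvec 3 2⍴⍳6
3 2 255 1 2 3 4 5 6
    bvec ⍳6
6 255 1 2 3 4 5 6

Including the shape means that arrays with the same ravel but different shapes
produce different byte vectors.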

where the byte vector for each item (⎕DR) type is the:

    nested:     concatenation of the byte vectors of subarrays.
    boolean:    (0 and 1) items themselves.
    numeric:    (256|83 ⎕DR) byte-values.
    character:  (256|83 ⎕DR) byte-values of ⎕UCS unicode indices.

Separating  the last two cases above ensures that [chksum] returns the same res-
ult for character arrays in Unicode and Classic versions of Dyalog.
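
For illustration, the following lines show the item types and byte-values invol-
ved.  The (⎕DR) type shown for characters, 80, is that of a Unicode version, and
the byte order shown assumes a little-endian machine:

    ⎕DR¨(1 0 1)(1 2 3)'abc'         ⍝ boolean, 1-byte integer, character.
11 83 80
    256|83 ⎕DR 1000 2000            ⍝ two 2-byte integers as four bytes.
232 3 208 7
    ⎕UCS'abc'                       ⍝ unicode indices of characters.
97 98 99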

In  order to distinguish null arrays of differing types, such arrays are repres-
ented by their prototypical items.
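
A short illustration of the prototypical items in question, using first (⊃) on
each null vector:

    ⊃⍬                              ⍝ prototype of null numeric vector.
0
    ⎕UCS⊃''                         ⍝ prototype of null char vector: blank.
32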

Examples:

    chksum ⎕cr'chksum'              ⍝ simple char array

    chksum ⎕nr'chksum'              ⍝ nested    ..  ..

    chksum 1 2 3 ∘.○ 4 5 6          ⍝ simple numeric array.

    chksum¨(1 3 2)(2 1 3)           ⍝ clash: values with same chksum.
538 538

    chksum¨'' ⍬                     ⍝ distinct nulls.
1295 1275

    ⍝ Checksumming the notes in this workspace is reasonably quick:

    chksum time notes.(⍎¨↓⎕nl 2)    ⍝ time checksumming of notes namespace.

    chksum #                        ⍝ checksum of ref is 0.

See also: time
