sum ← {digs←6} ##.chksum array ⍝ Simple ⍺-digit checksum. A checksum can be considered as a "signature" for a set of data and so may be used to verify that the data has not been modified or corrupted. [chksum] returns a (by default 6-digit-) checksum for its argument array ⍵. NB: Check-summing and hashing systems typically map a large set of data values onto a significantly smaller set of sums. It is therefore inevitable that there be "clashes", where different values map to the same sum. This means that a good checksum algorithm guarantees that: - If the checksum has changed, then the array has _certainly_ changed. - If the checksum has not changed, then the array has _probably_ not changed. The art of creating useful checksums is to balance the conflicting requirements to: - Produce a fast-enough summing function for large volumes of data. - Maximise sensitivity to typical changes in the data. For example, a simple approach might be an ⍺-residue sum of the data bytes. How- ever, this would not detect added or removed 0-values or reordering of data items. For this reason, a "weighted sum" is often used. There is a large body of literature devoted to the subject. See, for example: http://en.wikipedia.org/wiki/Checksum Bug: [chksum] ignores array items that are namespace references (refs). Bug: [chksum] ignores ⎕NULL. Bug: [chksum] crashes (DOMAIN ERROR) on encountering a ⎕OR item. Technical notes: [chksum] returns the weighted sum of: the byte vector of: the shape followed by a ¯1 separator followed by the byte vector of the ravel of the array. where byte vectors for various item (⎕DR) types are the: nested: concatenation of the byte vectors of subarrays. boolean: (0 and 1) items themselves. numeric: (256|83 ⎕DR) byte-values. character: (256|83 ⎕DR) byte-values of ⎕UCS unicode indices. Separating the last two cases above ensures that [chksum] returns the same res- ult for character arrays in Unicode and Classic versions of Dyalog. In order to distinguish null arrays of differing types, such arrays are repres- ented by their prototypical items. Examples: chksum ⎕cr'chksum' ⍝ simple char array 314685 chksum ⎕nr'chksum' ⍝ nested .. .. 930686 chksum 1 2 3 ∘.○ 4 5 6 ⍝ simple numeric array. 412967 chksum¨(1 3 2)(2 1 3) ⍝ clash: values with same chksum. 538 538 chksum¨'' ⍬ ⍝ distinct nulls. 1295 1275 ⍝ Checksumming the notes in this workspace is reasonably quick: chksum time notes.(⍎¨↓⎕nl 2) ⍝ time checksumming of notes namespace. 00.16 chksum # ⍝ checksum of ref is 0. 0 See also: time Back to: contents Back to: Workspaces