---------------
Dyalog DataBase
---------------
The  functions in the ddb namespace are used to maintain simple data arrays in a
single mapped file. They provide a robust alternative to an "inverted" component
file, as long as the maximum size of the data in each field may be fixed at cre-
ation time.

See →RefCard← for definitions of each of the ddb functions:

    create      Create table.
    remove      Remove table.

    append      Append row(s) to table.
    retain      Retain only selected rows.

    open        Open table {read/write}.
    defs        Field definitions.

    get         Get field(s) from table.
    put         Replace vals in field(s).

Tables
------
A  table is a two-dimensional entity with a varying number of _rows_ and a fixed
number of named columns, called _fields_.

Each  (row;field) _cell_ in the table contains a simple array of any rank and of
one of the primitive types: 80 160 320 83 163 323 645.

The  values  in  each row of a particular field must have the same type and rank
but may have differing shapes up to the maximum declared for the field.

Here is a picture of a typical table:

         name       age  girth         siblings         ←--- field names.
        ┌──────────┬───┬──────────────┬──────────┐
    [1] │Tom       │ 24│44.0 38.5 42.0│Elizabeth │
        │          │   │              │Richard   │
        ├──────────┼───┼──────────────┼──────────┤
    [2] │Dick      │ 35│44.5 38.0 42.0│          │
        ├──────────┼───┼──────────────┼──────────┤
    [3] │Harry     │ 12│32.0 30.0 30.5│Edward    │
        ├──────────┼───┼──────────────┼──────────┤
    [4] │Sally     │ 19│34.0 30.0 34.5│          │
        └──────────┴───┴──────────────┴──────────┘
     ↑   ↑            ↑          ↑     ↑  name      type  max shape
     │   │            │          │     │  ----      ----  ---------
     │   │            │          │     └── siblings   80   6 10 (matrix)
     │   │            │          └──────── girth     645   3    (vector)
     │   │            └─────────────────── age        83   ⍬    (scalar)
     │   └──────────────────────────────── name       80  10    (vector)
     │
     └── row numbers.

Rows  are  selected  using  a boolean "mask" vector rather than an index vector,
which means that questions of index origin do not arise.

Fields are identified by _name_.

Fixed vs. Variable axes
-----------------------
The type and maximum shape of each value in a field are specified when the table
is  created; these quantities determine how much space is allocated for each row
as the mapped file grows.

Each axis of the given maximum shape is declared as either positive or negative.
A  positive axis accommodates a _fixed_ number of items, whereas a negative axis
accommodates a _variable_ number of items (less than or equal to the maximum).

Data  stored along a _fixed_ axis are padded to the maximum axis length, whereas
with a _variable_ axis, the axis value for the particular datum is stored in the
file, and restored when the value is accessed.

If one or more of the axes of a given field is negative, the field is said to be
_nested_. Otherwise, the field is termed _simple_. See below for a discussion of
the relative merits of nested and simple fields.

A small example shows the distinction:

        'tmp' ddb.create ('en' 80 10)('fr' 80 ¯10)  ⍝ table creation.
                               ⍝   │            └──── nested field.
                               ⍝   └───────────────── simple field.

        e←3 5⍴'one  two  three'                     ⍝ simple char matrix.
        f←'un' 'deux' 'trois'                       ⍝ nested char vectors.

        'tmp' ddb.append e f                        ⍝ append 3 rows to table.

        display 'tmp' ddb.get 'en'                  ⍝ fetch simple field.
    ┌→─────────┐
    ↓one       │
    │two       │
    │three     │
    └──────────┘

        display 'tmp' ddb.get 'fr'                  ⍝ fetch nested field.
    ┌→────────────────────┐
    │ ┌→─┐ ┌→───┐ ┌→────┐ │
    │ │un│ │deux│ │trois│ │
    │ └──┘ └────┘ └─────┘ │
    └∊────────────────────┘

        ddb.remove'tmp'                             ⍝ remove table.

Pros and cons
-------------                    simple  nested
                                ┌───────┬───────┐
  File space (disk) requirement │ less  │ more  │
                                ├───────┼───────┤
                Speed of access │quicker│slower │
                                ├───────┼───────┤
             Shape retained per │column │ cell  │
                                └───────┴───────┘

Note  that  avoiding  negative fields for reasons of speed is a false economy if
code  in  the  calling application would then be obliged to split and mix simple
cell values!

See section "Test Scripts" below for some timing samples.

Field Definitions
-----------------
Each field has a:

    name    character vector.
    type    one of 80 160 320 83 163 323 645.
    shape   numeric vector specifying the (maximum) shape of data values.

For example:

    'name'   80  20         20-item character vector.
    'age'    83             8-bit integer scalar.
    'sales' 645  10 12      10 × 12 double floating point matrix.

The corresponding field specification would look like this:

    ('name' 80 20)('age' 83)('sales' 645 10 12)

Note  that boolean (type 11) fields are not supported (of course, boolean values
may be stored in a table by using ⎕dr to convert to and from a supported type).

Table Handles
-------------
Functions  [create] and [open] return a _handle_ to the table for faster access.
Note  that  for  convenience,  many of the functions accept either a handle or a
table name.

A  handle  is a (ref to a) namespace containing, amongst other things, a ⎕map of
the table. Note that, even with a large table, this handle consumes little work-
space as the mapping uses virtual memory _outside_ the workspace.

Variables within the handle space are:

    access  read/write 'r' or 'w'.
    names   nested vector of field names.
    shapes  maximum shape for each field (-ive for variable axes).
    types   field types: 80 160 320 83 163 323 645
    tname   file name of table.

together with some extra "internal" information that is not of general interest:

    _cols   number of bytes per row of mapped matrix.
    _cvex   byte column masks.
    _flds   number of fields.
    _map    (83 ¯1 _cols)-mapped file.
    _pvex   field partition vectors.
    _skip   number of rows occupied by header.
    _svex   field shape partition vectors.
    _xtra   extra bytes for shapes of values with negative type.
    _xvex   vector of column indices per field.

Tip: note that (tab.names) is a convenient alternative to (⊃¨ddb.defs tab).

Usage
-----
Copy  the  ddb namespace into your application and prefix calls on the functions
with "ddb.":

    )copy  ddb  ddb
... saved ...

    'emp' ddb.get 'salary'      ⍝ use [get] from ddb namespace.

Importing databases from SQL
----------------------------
Function  #.import  attempts to import tables from SQL using Dyalog.Net. This is
intended  primarily to provide some test data for experimentation; the import is
not entirly accurate, as some field types are excluded.

          import'northwind'                     ⍝ import database.
    Orders
    Products
    Order Details
    CustomerCustomerDemo
    CustomerDemographics
        Type 99 field "CustomerDesc" not imported
    Region
    Territories
    EmployeeTerritories
    Employees
        Type 34 field "Photo" not imported
        Type 99 field "Notes" not imported
    Categories
        Type 99 field "Description" not imported
        Type 34 field "Picture" not imported
    Customers
    Shippers
    Suppliers
        Type 99 field "HomePage" not imported

          ↑ddb.defs'northwind/Orders'           ⍝ defs for Orders table.
     OrderID         323
     CustomerID       80             5
     EmployeeID      323
     OrderDate       323
     RequiredDate    323
     ShippedDate     323
     ShipVia         323
     Freight         645
     ShipName         80            40
     ShipAddress      80            60
     ShipCity         80            15
     ShipRegion       80            15
     ShipPostalCode   80            10
     ShipCountry      80            15

          'northwind/Orders' ddb.get 'ShipName' ⍝ ShipName from Orders table.
    Vins et alcools Chevalier
    Toms Spezialitäten
    Hanari Carnes
    Victuailles en stock
    Suprêmes délices
    Hanari Carnes
    Chop-suey Chinese
    Richter Supermarkt
    Wellington Importadora
    HILARION-Abastos
    ...

Technical notes
---------------
Field definitions are stored at the start of the file in the following format:

        ┌──────────────── number of bytes.
        │   ┌──────────── field name
        │   │       ┌──── description
        │   │       │
        1   mag0    magic number 0xaa => Dyalog file.
        1   mag1    magic number 0x80 => ddb file.
        1   vers    version 0-255.
        1   mcid    machine architecture.

        4   _skip   number of table rows occupied by this header.
        4   _cols   row width in bytes.
        4   _flds   number of fields.

  _flds×2  types   field types:  80 160 320 83 163 323 645
  _flds×1  ranks   rank of each field row.
4×+/ranks  shapes  concatenation of +/- shape words.
  _flds×?  names   null terminated field names.
        ?  padding to whole number (_skip) of rows.

Simple  fields are stored as a single simple matrix padded to ×/¨tab.shapes col-
umns. For example:

        'tmp' ddb.create ('name' 80 8)     ('age' 83)   ⍝ two simple fields.
        'tmp' ddb.append (2 6⍴'TinkerTailor') (23 45)   ⍝ two rows of values.

would be represented in the (bytes of the) mapped file:

     <name-------------------------><age>
    ┌───┬───┬───┬───┬───┬───┬───┬───┬───┐
    │ T │ i │ n │ k │ e │ r │   │   │ 23│   Fields are padded to max shape.
    ├───┼───┼───┼───┼───┼───┼───┼───┼───┤
    │ T │ a │ i │ l │ o │ r │   │   │ 45│
    └───┴───┴───┴───┴───┴───┴───┴───┴───┘

Nested fields are  prefixed with tab._xtra columns of bytes, which determine the
shape of each row/field item. For example:

        'tmp' ddb.create ('name' 80 ¯8)    ('age' 83)
        'tmp' ddb.append ('Soldier' 'Sailor') (23 45)

would be represented in the mapped file with an extra 1-byte column of shape:

    <shp><name-------------------------><age>
    ┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
    │  7│ S │ o │ l │ d │ i │ e │ r │   │ 23│
    ├───┼───┼───┼───┼───┼───┼───┼───┼───┼───┤
    │  6│ S │ a │ i │ l │ o │ r │   │   │ 45│
    └───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘

For fields of maximum size greater than 255, 2-bytes of shape would be required:

        'tmp' ddb.create 'addr' 80 ¯300
        'tmp' ddb.append 'Buckingham Palace' 'Rotten Row'

    <shape-> <name--------------------------
    ┌───────┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬
    │     17│ B │ u │ c │ k │ i │ n │ g │ h │ a │ m │   │ P │ a
    ├───────┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼─
    │     10│ R │ o │ t │ t │ e │ n │   │ R │ o │ w │ · │
    └───────┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴

Finally,  for  fields  of higher rank, several shape columns would be needed (in
the following case, 1+1+2 bytes per row):

        'tmp' ddb.create 'cuboids' 645 ¯10 ¯20 ¯200
        'tmp' ddb.append (3 4 5⍴...)(10 11 199⍴...)

    <shape---------> <cube--------------------
    ┌───┬───┬───────┬───────────
    │  3│  4│      5│ ravel of items in field ...
    ├───┼───┼───────┼───────────
    │ 10│ 11│    199│ ravel of items in field ...
    └───┴───┴───────┴───────────

Note  that  the  width  in bytes needed to store each axis of the shape for each
field is easily determined from the field's definition: ⌊(⎕DR¨tab.shapes)÷80).

Internal Mapping Type
---------------------
The  ddb  functions map the file as a type 83 matrix of 8-bit integers. A disad-
vantage  of  this  is that care must be taken to avoid values being unexpectedly
demoted to type 11 (boolean) if all items happen to be 0s or 1s. To ensure this,
[get] fixes the values at type 83 using dyadic ⎕DR.

Mapping  the file as type 11 or 82 would remove the need for this additional op-
eration. However, mapping to type:

11: would  have  the advantage of allowing support for boolean data types in the
    file but would increase the time taken to select rows from the mapped matrix
    by a significant factor.

82: would  incur extra character translations at get and put time. This could be
    avoided  by  setting  ⎕DR's  translation table to 0-255. However, this would
    need to be set and restored on each operation or set globally in the applic-
    ation, which might have undesirable effects on other elements of the system.

On balance, mapping to type 83 seems to be our best bet.

Function  [_83mat]  is  a  support function for [append] and [put] and would not
normally  be called directly from application code. Notice how [_83mat] accumul-
ates _functions_ as opposed to _arrays_, as a way of preventing the retention of
potentially large copies of the data in various states of conversion. As a simp-
le  example  of  this technique, the following functions achieve the same effect
but the second one uses less intermediate workspace:

    greedy←{                ⍝ matrix with embedded nulls from vector of vectors.
        vex←⍵,¨⊃⎕av         ⍝ vectors with trailing nulls.
        mat←↑vex            ⍝ matrix.
        pad←(⊃⍴mat)⍺↑mat    ⍝ padded (or truncated) rows.
        pad                 ⍝ result.
    }

    frugal←{                ⍝ matrix with embedded nulls from vector of vectors.
        vex←{⍵,¨⊃⎕av}       ⍝ vectors with trailing nulls.
        mat←{↑⍵}            ⍝ matrix.
        pad←{(⊃⍴⍵)⍺↑⍵}      ⍝ padded (or truncated) rows.
        pad mat vex ⍵       ⍝ result (of course "mat" may be replaced with "↑").
    }

Function [get] uses the same approach.

Ugliness in ddb.retain
----------------------
Function  #.ddb.retain  retains  specified rows using boolean compression of the
mapped  variable  and then truncating the underlying native file using ⎕nresize.
The file must be unmapped during the ⎕nresize and then remapped afterwards.

If more than one mapping handle is in existence, the unmap succeeds but the file
truncation  fails.  A  subsequent ⎕map would then remap unwanted garbage rows at
the end of the file, as shown in the following diagram:

    ┌───────────────── retention mask
    │ ┌─────────────── file rows
    1 aaa       aaa ┐
    0 bbb   →   ccc ├─ desired result
    1 ccc       eee ┘
    0 ddd       ddd ┐
    1 eee   →   eee ├─ unwanted garbage rows
    0 fff       fff ┘

This situation can occur, for example, if a handle is assigned at create or open
time and the character file name is used (instead of the handle) at retain time.

The  rather ugly solution to this problem is to try an exploratory file truncat-
ion prior to reassigning rows in the mapped variable.
    ...
    ntie←unmap 0                                ⍝ unmap and tie native file.
    lndx←¯1+⎕NSIZE ntie                         ⍝ last byte index.
    last←⎕NREAD ntie 83 1 lndx                  ⍝ last byte in file.
    19::⎕EN error'TABLE OPEN'{⍺}⍺ remap ntie    ⍝ failure: too many handles.
    {0}last ⎕NAPPEND lndx ⎕NRESIZE ntie:        ⍝ attempt remove/replace byte.
    {0}⍺ remap ntie:                            ⍝ success: not too many handles.
    ...

There must be a better way.

Test Scripts
------------
Namespace [#.scripts] contains test scripts. For example:

      test 'ddb'            ⍝ test ddb: no news => good news.

A left argument of 1 traces execution of the script:

      1 test 'ddb'          ⍝ test ddb, showing progress.
...

and a left argument of ¯1 opens an edit window on some or all of the scripts:

     ¯1 test'ddb'           ⍝ edit scripts.ddb.
     ¯1 test'get' 'put'     ⍝ edit get and put scripts.
     ¯1 test''              ⍝ edit all scripts.

[scripts.(get put open append_retain create_remove)] are intended for perfomance
timing  using operator [time].

[scripts.times]  runs  each  of  these  scripts displaying the number of seconds
taken.  Variables [loops] and [rows] determine the number of accesses and number
of  table rows respectively. If the variables are not set in the root namespace,
they both default to 1,000.

The following is typical output on a 600Mhz machine:

      test'times'

loops:  1000
rows:   1000

Get/put simple tables:
    get
06.62
    put
15.85

Get/put nested tables:
    get
30.35
    put
35.53

Miscellaneous:
    open
04.32
    append_retain
15.57
    create_remove
07.08

The  times  displayed  are  in seconds, so with [loops] set to 1000, they can be
interpreted as milliseconds per single access.

Note that the [get]s and [put]s above use _handles_ to open tables. Using a file
_name_ instead, would effectively add the [open] time to each [get] and [put].

See:  →scripts.ddb←
      →scripts.times←
      →scripts.open←
      →scripts.get←
      →scripts.put←
      →scripts.create_remove←
      →scripts.append_retain←

Examples
--------

    defns←('name' 80 20)('age' 83)('girth' 83 3)    ⍝ field definitions.

    disp defns
┌→───────────┬────────┬────────────┐
│┌→───┬──┬──┐│┌→──┬──┐│┌→────┬──┬─┐│
││name│80│20│││age│83│││girth│83│3││
│└───→┴~─┴~─┘│└──→┴~─┘│└────→┴~─┴─┘│
└───────────→┴───────→┴───────────→┘

    'example' ddb.create defns                      ⍝ create table "example.ddb"

    disp ddb.defs 'example'                         ⍝ field defs from table.
┌→───────────┬────────┬────────────┐
│┌→───┬──┬──┐│┌→──┬──┐│┌→────┬──┬─┐│
││name│80│20│││age│83│││girth│83│3││
│└───→┴~─┴~─┘│└──→┴~─┘│└────→┴~─┴─┘│
└───────────→┴───────→┴───────────→┘

    'example' ddb.append 'john' 56 (38 42 46)       ⍝ append row to table.

    block←(↑'pete' 'jessica')(55 23)(2 3⍴44 32 34, 48 24 36)

    disp block                                      ⍝ block of rows.
┌→──────┬─────┬────────┐
│pete   │55 23│44 32 34│
│jessica↓     │48 24 36↓
└──────→┴~───→┴~──────→┘

    'example' ddb.append block                      ⍝ append block of rows.

    tab←ddb.open 'example'                          ⍝ handle for table.

    tab ddb.get'name'                               ⍝ get one field from table.
john
pete
jessica

    disp tab ddb.get 'name' 'age' 'girth'           ⍝ get all fields.
┌→───────────────────┬────────┬────────┐
│john                │        │38 42 46│
│pete                │56 55 23│44 32 34│
│jessica             ↓        │48 24 36↓
└───────────────────→┴~──────→┴~──────→┘

    ⎕ex'tab'                                        ⍝ remove mapping handle.

    disp 'example' ddb.get 'age' 'girth'            ⍝ use table name.
┌→───────┬────────┐
│        │38 42 46│
│56 55 23│44 32 34│
│        │48 24 36↓
└~──────→┴~──────→┘

    'example' ddb.put 'age' (88 77 21)              ⍝ replace age values.

    'example' ddb.get'age'                          ⍝ check update.
88 77 21

    tab←ddb.open 'example'                          ⍝ open table.

    jays←^/∨\'j'=tab ddb.get 'name'                 ⍝ names starting with 'j'.

    tab ddb.get jays 'girth'                        ⍝ girths of  ..  ..
38 42 46
48 24 36

    tab ddb.put jays 'age' (66 22)                  ⍝ replace selected ages.
    tab ddb.get 'age'                               ⍝ check update.
66 77 22

    ⎕ex'tab'                                        ⍝ release table.

    'example' ddb.retain jays                       ⍝ retain only j-names.

    'example' get 'names'                           ⍝ compressed table.
john
jessica

    disp ddb.{{⍵ get ⍵.names}open ⍵}'example'       ⍝ open and get all fields.
┌→───────────────────┬─────┬────────┐
│john                │66 22│38 42 46│
│jessica             ↓     │48 24 36↓
└───────────────────→┴~───→┴~──────→┘

    get←ddb.((open'example'∘get)                    ⍝ bind handle with get fn.

    get(66=get'age')'name' 'girth'                  ⍝ compound selection.
 john  38 42 46

    ⎕ex'get'                                        ⍝ release and

    ddb.remove 'example'                            ⍝   remove table.

See also: →Examples← →RefCard←

Back to: Contents