---------------
Dyalog DataBase
---------------
The functions in the ddb namespace are used to maintain simple data arrays in a
single mapped file. They provide a robust alternative to an "inverted" component
file, as long as the maximum size of the data in each field may be fixed at cre-
ation time.
See →RefCard← for definitions of each of the ddb functions:
create Create table.
remove Remove table.
append Append row(s) to table.
retain Retain only selected rows.
open Open table {read/write}.
defs Field definitions.
get Get field(s) from table.
put Replace vals in field(s).
Tables
------
A table is a two-dimensional entity with a varying number of _rows_ and a fixed
number of named columns, called _fields_.
Each (row;field) _cell_ in the table contains a simple array of any rank and of
one of the primitive types: 80 160 320 83 163 323 645.
The values in each row of a particular field must have the same type and rank
but may have differing shapes up to the maximum declared for the field.
Here is a picture of a typical table:
name age girth siblings ←--- field names.
┌──────────┬───┬──────────────┬──────────┐
[1] │Tom │ 24│44.0 38.5 42.0│Elizabeth │
│ │ │ │Richard │
├──────────┼───┼──────────────┼──────────┤
[2] │Dick │ 35│44.5 38.0 42.0│ │
├──────────┼───┼──────────────┼──────────┤
[3] │Harry │ 12│32.0 30.0 30.5│Edward │
├──────────┼───┼──────────────┼──────────┤
[4] │Sally │ 19│34.0 30.0 34.5│ │
└──────────┴───┴──────────────┴──────────┘
↑ ↑ ↑ ↑ ↑ name type max shape
│ │ │ │ │ ---- ---- ---------
│ │ │ │ └── siblings 80 6 10 (matrix)
│ │ │ └──────── girth 645 3 (vector)
│ │ └─────────────────── age 83 ⍬ (scalar)
│ └──────────────────────────────── name 80 10 (vector)
│
└── row numbers.
Rows are selected using a boolean "mask" vector rather than an index vector,
which means that questions of index origin do not arise.
Fields are identified by _name_.
Fixed vs. Variable axes
-----------------------
The type and maximum shape of each value in a field are specified when the table
is created; these quantities determine how much space is allocated for each row
as the mapped file grows.
Each axis of the given maximum shape is declared as either positive or negative.
A positive axis accommodates a _fixed_ number of items, whereas a negative axis
accommodates a _variable_ number of items (less than or equal to the maximum).
Data stored along a _fixed_ axis are padded to the maximum axis length, whereas
with a _variable_ axis, the axis value for the particular datum is stored in the
file, and restored when the value is accessed.
If one or more of the axes of a given field is negative, the field is said to be
_nested_. Otherwise, the field is termed _simple_. See below for a discussion of
the relative merits of nested and simple fields.
A small example shows the distinction:
'tmp' ddb.create ('en' 80 10)('fr' 80 ¯10) ⍝ table creation.
⍝ │ └──── nested field.
⍝ └───────────────── simple field.
e←3 5⍴'one two three' ⍝ simple char matrix.
f←'un' 'deux' 'trois' ⍝ nested char vectors.
'tmp' ddb.append e f ⍝ append 3 rows to table.
display 'tmp' ddb.get 'en' ⍝ fetch simple field.
┌→─────────┐
↓one │
│two │
│three │
└──────────┘
display 'tmp' ddb.get 'fr' ⍝ fetch nested field.
┌→────────────────────┐
│ ┌→─┐ ┌→───┐ ┌→────┐ │
│ │un│ │deux│ │trois│ │
│ └──┘ └────┘ └─────┘ │
└∊────────────────────┘
ddb.remove'tmp' ⍝ remove table.
Pros and cons
------------- simple nested
┌───────┬───────┐
File space (disk) requirement │ less │ more │
├───────┼───────┤
Speed of access │quicker│slower │
├───────┼───────┤
Shape retained per │column │ cell │
└───────┴───────┘
Note that avoiding negative fields for reasons of speed is a false economy if
code in the calling application would then be obliged to split and mix simple
cell values!
See section "Test Scripts" below for some timing samples.
Field Definitions
-----------------
Each field has a:
name character vector.
type one of 80 160 320 83 163 323 645.
shape numeric vector specifying the (maximum) shape of data values.
For example:
'name' 80 20 20-item character vector.
'age' 83 8-bit integer scalar.
'sales' 645 10 12 10 × 12 double floating point matrix.
The corresponding field specification would look like this:
('name' 80 20)('age' 83)('sales' 645 10 12)
Note that boolean (type 11) fields are not supported (of course, boolean values
may be stored in a table by using ⎕dr to convert to and from a supported type).
Table Handles
-------------
Functions [create] and [open] return a _handle_ to the table for faster access.
Note that for convenience, many of the functions accept either a handle or a
table name.
A handle is a (ref to a) namespace containing, amongst other things, a ⎕map of
the table. Note that, even with a large table, this handle consumes little work-
space as the mapping uses virtual memory _outside_ the workspace.
Variables within the handle space are:
access read/write 'r' or 'w'.
names nested vector of field names.
shapes maximum shape for each field (-ive for variable axes).
types field types: 80 160 320 83 163 323 645
tname file name of table.
together with some extra "internal" information that is not of general interest:
_cols number of bytes per row of mapped matrix.
_cvex byte column masks.
_flds number of fields.
_map (83 ¯1 _cols)-mapped file.
_pvex field partition vectors.
_skip number of rows occupied by header.
_svex field shape partition vectors.
_xtra extra bytes for shapes of values with negative type.
_xvex vector of column indices per field.
Tip: note that (tab.names) is a convenient alternative to (⊃¨ddb.defs tab).
Usage
-----
Copy the ddb namespace into your application and prefix calls on the functions
with "ddb.":
)copy ddb ddb
... saved ...
'emp' ddb.get 'salary' ⍝ use [get] from ddb namespace.
Importing databases from SQL
----------------------------
Function #.import attempts to import tables from SQL using Dyalog.Net. This is
intended primarily to provide some test data for experimentation; the import is
not entirly accurate, as some field types are excluded.
import'northwind' ⍝ import database.
Orders
Products
Order Details
CustomerCustomerDemo
CustomerDemographics
Type 99 field "CustomerDesc" not imported
Region
Territories
EmployeeTerritories
Employees
Type 34 field "Photo" not imported
Type 99 field "Notes" not imported
Categories
Type 99 field "Description" not imported
Type 34 field "Picture" not imported
Customers
Shippers
Suppliers
Type 99 field "HomePage" not imported
↑ddb.defs'northwind/Orders' ⍝ defs for Orders table.
OrderID 323
CustomerID 80 5
EmployeeID 323
OrderDate 323
RequiredDate 323
ShippedDate 323
ShipVia 323
Freight 645
ShipName 80 40
ShipAddress 80 60
ShipCity 80 15
ShipRegion 80 15
ShipPostalCode 80 10
ShipCountry 80 15
'northwind/Orders' ddb.get 'ShipName' ⍝ ShipName from Orders table.
Vins et alcools Chevalier
Toms Spezialitäten
Hanari Carnes
Victuailles en stock
Suprêmes délices
Hanari Carnes
Chop-suey Chinese
Richter Supermarkt
Wellington Importadora
HILARION-Abastos
...
Technical notes
---------------
Field definitions are stored at the start of the file in the following format:
┌──────────────── number of bytes.
│ ┌──────────── field name
│ │ ┌──── description
│ │ │
1 mag0 magic number 0xaa => Dyalog file.
1 mag1 magic number 0x80 => ddb file.
1 vers version 0-255.
1 mcid machine architecture.
4 _skip number of table rows occupied by this header.
4 _cols row width in bytes.
4 _flds number of fields.
_flds×2 types field types: 80 160 320 83 163 323 645
_flds×1 ranks rank of each field row.
4×+/ranks shapes concatenation of +/- shape words.
_flds×? names null terminated field names.
? padding to whole number (_skip) of rows.
Simple fields are stored as a single simple matrix padded to ×/¨tab.shapes col-
umns. For example:
'tmp' ddb.create ('name' 80 8) ('age' 83) ⍝ two simple fields.
'tmp' ddb.append (2 6⍴'TinkerTailor') (23 45) ⍝ two rows of values.
would be represented in the (bytes of the) mapped file:
<name-------------------------><age>
┌───┬───┬───┬───┬───┬───┬───┬───┬───┐
│ T │ i │ n │ k │ e │ r │ │ │ 23│ Fields are padded to max shape.
├───┼───┼───┼───┼───┼───┼───┼───┼───┤
│ T │ a │ i │ l │ o │ r │ │ │ 45│
└───┴───┴───┴───┴───┴───┴───┴───┴───┘
Nested fields are prefixed with tab._xtra columns of bytes, which determine the
shape of each row/field item. For example:
'tmp' ddb.create ('name' 80 ¯8) ('age' 83)
'tmp' ddb.append ('Soldier' 'Sailor') (23 45)
would be represented in the mapped file with an extra 1-byte column of shape:
<shp><name-------------------------><age>
┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
│ 7│ S │ o │ l │ d │ i │ e │ r │ │ 23│
├───┼───┼───┼───┼───┼───┼───┼───┼───┼───┤
│ 6│ S │ a │ i │ l │ o │ r │ │ │ 45│
└───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘
For fields of maximum size greater than 255, 2-bytes of shape would be required:
'tmp' ddb.create 'addr' 80 ¯300
'tmp' ddb.append 'Buckingham Palace' 'Rotten Row'
<shape-> <name--------------------------
┌───────┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬
│ 17│ B │ u │ c │ k │ i │ n │ g │ h │ a │ m │ │ P │ a
├───────┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼─
│ 10│ R │ o │ t │ t │ e │ n │ │ R │ o │ w │ · │
└───────┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴
Finally, for fields of higher rank, several shape columns would be needed (in
the following case, 1+1+2 bytes per row):
'tmp' ddb.create 'cuboids' 645 ¯10 ¯20 ¯200
'tmp' ddb.append (3 4 5⍴...)(10 11 199⍴...)
<shape---------> <cube--------------------
┌───┬───┬───────┬───────────
│ 3│ 4│ 5│ ravel of items in field ...
├───┼───┼───────┼───────────
│ 10│ 11│ 199│ ravel of items in field ...
└───┴───┴───────┴───────────
Note that the width in bytes needed to store each axis of the shape for each
field is easily determined from the field's definition: ⌊(⎕DR¨tab.shapes)÷80).
Internal Mapping Type
---------------------
The ddb functions map the file as a type 83 matrix of 8-bit integers. A disad-
vantage of this is that care must be taken to avoid values being unexpectedly
demoted to type 11 (boolean) if all items happen to be 0s or 1s. To ensure this,
[get] fixes the values at type 83 using dyadic ⎕DR.
Mapping the file as type 11 or 82 would remove the need for this additional op-
eration. However, mapping to type:
11: would have the advantage of allowing support for boolean data types in the
file but would increase the time taken to select rows from the mapped matrix
by a significant factor.
82: would incur extra character translations at get and put time. This could be
avoided by setting ⎕DR's translation table to 0-255. However, this would
need to be set and restored on each operation or set globally in the applic-
ation, which might have undesirable effects on other elements of the system.
On balance, mapping to type 83 seems to be our best bet.
Function [_83mat] is a support function for [append] and [put] and would not
normally be called directly from application code. Notice how [_83mat] accumul-
ates _functions_ as opposed to _arrays_, as a way of preventing the retention of
potentially large copies of the data in various states of conversion. As a simp-
le example of this technique, the following functions achieve the same effect
but the second one uses less intermediate workspace:
greedy←{ ⍝ matrix with embedded nulls from vector of vectors.
vex←⍵,¨⊃⎕av ⍝ vectors with trailing nulls.
mat←↑vex ⍝ matrix.
pad←(⊃⍴mat)⍺↑mat ⍝ padded (or truncated) rows.
pad ⍝ result.
}
frugal←{ ⍝ matrix with embedded nulls from vector of vectors.
vex←{⍵,¨⊃⎕av} ⍝ vectors with trailing nulls.
mat←{↑⍵} ⍝ matrix.
pad←{(⊃⍴⍵)⍺↑⍵} ⍝ padded (or truncated) rows.
pad mat vex ⍵ ⍝ result (of course "mat" may be replaced with "↑").
}
Function [get] uses the same approach.
Ugliness in ddb.retain
----------------------
Function #.ddb.retain retains specified rows using boolean compression of the
mapped variable and then truncating the underlying native file using ⎕nresize.
The file must be unmapped during the ⎕nresize and then remapped afterwards.
If more than one mapping handle is in existence, the unmap succeeds but the file
truncation fails. A subsequent ⎕map would then remap unwanted garbage rows at
the end of the file, as shown in the following diagram:
┌───────────────── retention mask
│ ┌─────────────── file rows
1 aaa aaa ┐
0 bbb → ccc ├─ desired result
1 ccc eee ┘
0 ddd ddd ┐
1 eee → eee ├─ unwanted garbage rows
0 fff fff ┘
This situation can occur, for example, if a handle is assigned at create or open
time and the character file name is used (instead of the handle) at retain time.
The rather ugly solution to this problem is to try an exploratory file truncat-
ion prior to reassigning rows in the mapped variable.
...
ntie←unmap 0 ⍝ unmap and tie native file.
lndx←¯1+⎕NSIZE ntie ⍝ last byte index.
last←⎕NREAD ntie 83 1 lndx ⍝ last byte in file.
19::⎕EN error'TABLE OPEN'{⍺}⍺ remap ntie ⍝ failure: too many handles.
{0}last ⎕NAPPEND lndx ⎕NRESIZE ntie: ⍝ attempt remove/replace byte.
{0}⍺ remap ntie: ⍝ success: not too many handles.
...
There must be a better way.
Test Scripts
------------
Namespace [#.scripts] contains test scripts. For example:
test 'ddb' ⍝ test ddb: no news => good news.
A left argument of 1 traces execution of the script:
1 test 'ddb' ⍝ test ddb, showing progress.
...
and a left argument of ¯1 opens an edit window on some or all of the scripts:
¯1 test'ddb' ⍝ edit scripts.ddb.
¯1 test'get' 'put' ⍝ edit get and put scripts.
¯1 test'' ⍝ edit all scripts.
[scripts.(get put open append_retain create_remove)] are intended for perfomance
timing using operator [time].
[scripts.times] runs each of these scripts displaying the number of seconds
taken. Variables [loops] and [rows] determine the number of accesses and number
of table rows respectively. If the variables are not set in the root namespace,
they both default to 1,000.
The following is typical output on a 600Mhz machine:
test'times'
loops: 1000
rows: 1000
Get/put simple tables:
get
06.62
put
15.85
Get/put nested tables:
get
30.35
put
35.53
Miscellaneous:
open
04.32
append_retain
15.57
create_remove
07.08
The times displayed are in seconds, so with [loops] set to 1000, they can be
interpreted as milliseconds per single access.
Note that the [get]s and [put]s above use _handles_ to open tables. Using a file
_name_ instead, would effectively add the [open] time to each [get] and [put].
See: →scripts.ddb←
→scripts.times←
→scripts.open←
→scripts.get←
→scripts.put←
→scripts.create_remove←
→scripts.append_retain←
Examples
--------
defns←('name' 80 20)('age' 83)('girth' 83 3) ⍝ field definitions.
disp defns
┌→───────────┬────────┬────────────┐
│┌→───┬──┬──┐│┌→──┬──┐│┌→────┬──┬─┐│
││name│80│20│││age│83│││girth│83│3││
│└───→┴~─┴~─┘│└──→┴~─┘│└────→┴~─┴─┘│
└───────────→┴───────→┴───────────→┘
'example' ddb.create defns ⍝ create table "example.ddb"
disp ddb.defs 'example' ⍝ field defs from table.
┌→───────────┬────────┬────────────┐
│┌→───┬──┬──┐│┌→──┬──┐│┌→────┬──┬─┐│
││name│80│20│││age│83│││girth│83│3││
│└───→┴~─┴~─┘│└──→┴~─┘│└────→┴~─┴─┘│
└───────────→┴───────→┴───────────→┘
'example' ddb.append 'john' 56 (38 42 46) ⍝ append row to table.
block←(↑'pete' 'jessica')(55 23)(2 3⍴44 32 34, 48 24 36)
disp block ⍝ block of rows.
┌→──────┬─────┬────────┐
│pete │55 23│44 32 34│
│jessica↓ │48 24 36↓
└──────→┴~───→┴~──────→┘
'example' ddb.append block ⍝ append block of rows.
tab←ddb.open 'example' ⍝ handle for table.
tab ddb.get'name' ⍝ get one field from table.
john
pete
jessica
disp tab ddb.get 'name' 'age' 'girth' ⍝ get all fields.
┌→───────────────────┬────────┬────────┐
│john │ │38 42 46│
│pete │56 55 23│44 32 34│
│jessica ↓ │48 24 36↓
└───────────────────→┴~──────→┴~──────→┘
⎕ex'tab' ⍝ remove mapping handle.
disp 'example' ddb.get 'age' 'girth' ⍝ use table name.
┌→───────┬────────┐
│ │38 42 46│
│56 55 23│44 32 34│
│ │48 24 36↓
└~──────→┴~──────→┘
'example' ddb.put 'age' (88 77 21) ⍝ replace age values.
'example' ddb.get'age' ⍝ check update.
88 77 21
tab←ddb.open 'example' ⍝ open table.
jays←^/∨\'j'=tab ddb.get 'name' ⍝ names starting with 'j'.
tab ddb.get jays 'girth' ⍝ girths of .. ..
38 42 46
48 24 36
tab ddb.put jays 'age' (66 22) ⍝ replace selected ages.
tab ddb.get 'age' ⍝ check update.
66 77 22
⎕ex'tab' ⍝ release table.
'example' ddb.retain jays ⍝ retain only j-names.
'example' get 'names' ⍝ compressed table.
john
jessica
disp ddb.{{⍵ get ⍵.names}open ⍵}'example' ⍝ open and get all fields.
┌→───────────────────┬─────┬────────┐
│john │66 22│38 42 46│
│jessica ↓ │48 24 36↓
└───────────────────→┴~───→┴~──────→┘
get←ddb.((open'example'∘get) ⍝ bind handle with get fn.
get(66=get'age')'name' 'girth' ⍝ compound selection.
john 38 42 46
⎕ex'get' ⍝ release and
ddb.remove 'example' ⍝ remove table.
See also: →Examples← →RefCard←
Back to: Contents