--------------- Dyalog DataBase --------------- The functions in the ddb namespace are used to maintain simple data arrays in a single mapped file. They provide a robust alternative to an "inverted" component file, as long as the maximum size of the data in each field may be fixed at cre- ation time. See →RefCard← for definitions of each of the ddb functions: create Create table. remove Remove table. append Append row(s) to table. retain Retain only selected rows. open Open table {read/write}. defs Field definitions. get Get field(s) from table. put Replace vals in field(s). Tables ------ A table is a two-dimensional entity with a varying number of _rows_ and a fixed number of named columns, called _fields_. Each (row;field) _cell_ in the table contains a simple array of any rank and of one of the primitive types: 80 160 320 83 163 323 645. The values in each row of a particular field must have the same type and rank but may have differing shapes up to the maximum declared for the field. Here is a picture of a typical table: name age girth siblings ←--- field names. ┌──────────┬───┬──────────────┬──────────┐ [1] │Tom │ 24│44.0 38.5 42.0│Elizabeth │ │ │ │ │Richard │ ├──────────┼───┼──────────────┼──────────┤ [2] │Dick │ 35│44.5 38.0 42.0│ │ ├──────────┼───┼──────────────┼──────────┤ [3] │Harry │ 12│32.0 30.0 30.5│Edward │ ├──────────┼───┼──────────────┼──────────┤ [4] │Sally │ 19│34.0 30.0 34.5│ │ └──────────┴───┴──────────────┴──────────┘ ↑ ↑ ↑ ↑ ↑ name type max shape │ │ │ │ │ ---- ---- --------- │ │ │ │ └── siblings 80 6 10 (matrix) │ │ │ └──────── girth 645 3 (vector) │ │ └─────────────────── age 83 ⍬ (scalar) │ └──────────────────────────────── name 80 10 (vector) │ └── row numbers. Rows are selected using a boolean "mask" vector rather than an index vector, which means that questions of index origin do not arise. Fields are identified by _name_. Fixed vs. Variable axes ----------------------- The type and maximum shape of each value in a field are specified when the table is created; these quantities determine how much space is allocated for each row as the mapped file grows. Each axis of the given maximum shape is declared as either positive or negative. A positive axis accommodates a _fixed_ number of items, whereas a negative axis accommodates a _variable_ number of items (less than or equal to the maximum). Data stored along a _fixed_ axis are padded to the maximum axis length, whereas with a _variable_ axis, the axis value for the particular datum is stored in the file, and restored when the value is accessed. If one or more of the axes of a given field is negative, the field is said to be _nested_. Otherwise, the field is termed _simple_. See below for a discussion of the relative merits of nested and simple fields. A small example shows the distinction: 'tmp' ddb.create ('en' 80 10)('fr' 80 ¯10) ⍝ table creation. ⍝ │ └──── nested field. ⍝ └───────────────── simple field. e←3 5⍴'one two three' ⍝ simple char matrix. f←'un' 'deux' 'trois' ⍝ nested char vectors. 'tmp' ddb.append e f ⍝ append 3 rows to table. display 'tmp' ddb.get 'en' ⍝ fetch simple field. ┌→─────────┐ ↓one │ │two │ │three │ └──────────┘ display 'tmp' ddb.get 'fr' ⍝ fetch nested field. ┌→────────────────────┐ │ ┌→─┐ ┌→───┐ ┌→────┐ │ │ │un│ │deux│ │trois│ │ │ └──┘ └────┘ └─────┘ │ └∊────────────────────┘ ddb.remove'tmp' ⍝ remove table. Pros and cons ------------- simple nested ┌───────┬───────┐ File space (disk) requirement │ less │ more │ ├───────┼───────┤ Speed of access │quicker│slower │ ├───────┼───────┤ Shape retained per │column │ cell │ └───────┴───────┘ Note that avoiding negative fields for reasons of speed is a false economy if code in the calling application would then be obliged to split and mix simple cell values! See section "Test Scripts" below for some timing samples. Field Definitions ----------------- Each field has a: name character vector. type one of 80 160 320 83 163 323 645. shape numeric vector specifying the (maximum) shape of data values. For example: 'name' 80 20 20-item character vector. 'age' 83 8-bit integer scalar. 'sales' 645 10 12 10 × 12 double floating point matrix. The corresponding field specification would look like this: ('name' 80 20)('age' 83)('sales' 645 10 12) Note that boolean (type 11) fields are not supported (of course, boolean values may be stored in a table by using ⎕dr to convert to and from a supported type). Table Handles ------------- Functions [create] and [open] return a _handle_ to the table for faster access. Note that for convenience, many of the functions accept either a handle or a table name. A handle is a (ref to a) namespace containing, amongst other things, a ⎕map of the table. Note that, even with a large table, this handle consumes little work- space as the mapping uses virtual memory _outside_ the workspace. Variables within the handle space are: access read/write 'r' or 'w'. names nested vector of field names. shapes maximum shape for each field (-ive for variable axes). types field types: 80 160 320 83 163 323 645 tname file name of table. together with some extra "internal" information that is not of general interest: _cols number of bytes per row of mapped matrix. _cvex byte column masks. _flds number of fields. _map (83 ¯1 _cols)-mapped file. _pvex field partition vectors. _skip number of rows occupied by header. _svex field shape partition vectors. _xtra extra bytes for shapes of values with negative type. _xvex vector of column indices per field. Tip: note that (tab.names) is a convenient alternative to (⊃¨ddb.defs tab). Usage ----- Copy the ddb namespace into your application and prefix calls on the functions with "ddb.": )copy ddb ddb ... saved ... 'emp' ddb.get 'salary' ⍝ use [get] from ddb namespace. Importing databases from SQL ---------------------------- Function #.import attempts to import tables from SQL using Dyalog.Net. This is intended primarily to provide some test data for experimentation; the import is not entirly accurate, as some field types are excluded. import'northwind' ⍝ import database. Orders Products Order Details CustomerCustomerDemo CustomerDemographics Type 99 field "CustomerDesc" not imported Region Territories EmployeeTerritories Employees Type 34 field "Photo" not imported Type 99 field "Notes" not imported Categories Type 99 field "Description" not imported Type 34 field "Picture" not imported Customers Shippers Suppliers Type 99 field "HomePage" not imported ↑ddb.defs'northwind/Orders' ⍝ defs for Orders table. OrderID 323 CustomerID 80 5 EmployeeID 323 OrderDate 323 RequiredDate 323 ShippedDate 323 ShipVia 323 Freight 645 ShipName 80 40 ShipAddress 80 60 ShipCity 80 15 ShipRegion 80 15 ShipPostalCode 80 10 ShipCountry 80 15 'northwind/Orders' ddb.get 'ShipName' ⍝ ShipName from Orders table. Vins et alcools Chevalier Toms Spezialitäten Hanari Carnes Victuailles en stock Suprêmes délices Hanari Carnes Chop-suey Chinese Richter Supermarkt Wellington Importadora HILARION-Abastos ... Technical notes --------------- Field definitions are stored at the start of the file in the following format: ┌──────────────── number of bytes. │ ┌──────────── field name │ │ ┌──── description │ │ │ 1 mag0 magic number 0xaa => Dyalog file. 1 mag1 magic number 0x80 => ddb file. 1 vers version 0-255. 1 mcid machine architecture. 4 _skip number of table rows occupied by this header. 4 _cols row width in bytes. 4 _flds number of fields. _flds×2 types field types: 80 160 320 83 163 323 645 _flds×1 ranks rank of each field row. 4×+/ranks shapes concatenation of +/- shape words. _flds×? names null terminated field names. ? padding to whole number (_skip) of rows. Simple fields are stored as a single simple matrix padded to ×/¨tab.shapes col- umns. For example: 'tmp' ddb.create ('name' 80 8) ('age' 83) ⍝ two simple fields. 'tmp' ddb.append (2 6⍴'TinkerTailor') (23 45) ⍝ two rows of values. would be represented in the (bytes of the) mapped file: <name-------------------------><age> ┌───┬───┬───┬───┬───┬───┬───┬───┬───┐ │ T │ i │ n │ k │ e │ r │ │ │ 23│ Fields are padded to max shape. ├───┼───┼───┼───┼───┼───┼───┼───┼───┤ │ T │ a │ i │ l │ o │ r │ │ │ 45│ └───┴───┴───┴───┴───┴───┴───┴───┴───┘ Nested fields are prefixed with tab._xtra columns of bytes, which determine the shape of each row/field item. For example: 'tmp' ddb.create ('name' 80 ¯8) ('age' 83) 'tmp' ddb.append ('Soldier' 'Sailor') (23 45) would be represented in the mapped file with an extra 1-byte column of shape: <shp><name-------------------------><age> ┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐ │ 7│ S │ o │ l │ d │ i │ e │ r │ │ 23│ ├───┼───┼───┼───┼───┼───┼───┼───┼───┼───┤ │ 6│ S │ a │ i │ l │ o │ r │ │ │ 45│ └───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘ For fields of maximum size greater than 255, 2-bytes of shape would be required: 'tmp' ddb.create 'addr' 80 ¯300 'tmp' ddb.append 'Buckingham Palace' 'Rotten Row' <shape-> <name-------------------------- ┌───────┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬ │ 17│ B │ u │ c │ k │ i │ n │ g │ h │ a │ m │ │ P │ a ├───────┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼─ │ 10│ R │ o │ t │ t │ e │ n │ │ R │ o │ w │ · │ └───────┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴ Finally, for fields of higher rank, several shape columns would be needed (in the following case, 1+1+2 bytes per row): 'tmp' ddb.create 'cuboids' 645 ¯10 ¯20 ¯200 'tmp' ddb.append (3 4 5⍴...)(10 11 199⍴...) <shape---------> <cube-------------------- ┌───┬───┬───────┬─────────── │ 3│ 4│ 5│ ravel of items in field ... ├───┼───┼───────┼─────────── │ 10│ 11│ 199│ ravel of items in field ... └───┴───┴───────┴─────────── Note that the width in bytes needed to store each axis of the shape for each field is easily determined from the field's definition: ⌊(⎕DR¨tab.shapes)÷80). Internal Mapping Type --------------------- The ddb functions map the file as a type 83 matrix of 8-bit integers. A disad- vantage of this is that care must be taken to avoid values being unexpectedly demoted to type 11 (boolean) if all items happen to be 0s or 1s. To ensure this, [get] fixes the values at type 83 using dyadic ⎕DR. Mapping the file as type 11 or 82 would remove the need for this additional op- eration. However, mapping to type: 11: would have the advantage of allowing support for boolean data types in the file but would increase the time taken to select rows from the mapped matrix by a significant factor. 82: would incur extra character translations at get and put time. This could be avoided by setting ⎕DR's translation table to 0-255. However, this would need to be set and restored on each operation or set globally in the applic- ation, which might have undesirable effects on other elements of the system. On balance, mapping to type 83 seems to be our best bet. Function [_83mat] is a support function for [append] and [put] and would not normally be called directly from application code. Notice how [_83mat] accumul- ates _functions_ as opposed to _arrays_, as a way of preventing the retention of potentially large copies of the data in various states of conversion. As a simp- le example of this technique, the following functions achieve the same effect but the second one uses less intermediate workspace: greedy←{ ⍝ matrix with embedded nulls from vector of vectors. vex←⍵,¨⊃⎕av ⍝ vectors with trailing nulls. mat←↑vex ⍝ matrix. pad←(⊃⍴mat)⍺↑mat ⍝ padded (or truncated) rows. pad ⍝ result. } frugal←{ ⍝ matrix with embedded nulls from vector of vectors. vex←{⍵,¨⊃⎕av} ⍝ vectors with trailing nulls. mat←{↑⍵} ⍝ matrix. pad←{(⊃⍴⍵)⍺↑⍵} ⍝ padded (or truncated) rows. pad mat vex ⍵ ⍝ result (of course "mat" may be replaced with "↑"). } Function [get] uses the same approach. Ugliness in ddb.retain ---------------------- Function #.ddb.retain retains specified rows using boolean compression of the mapped variable and then truncating the underlying native file using ⎕nresize. The file must be unmapped during the ⎕nresize and then remapped afterwards. If more than one mapping handle is in existence, the unmap succeeds but the file truncation fails. A subsequent ⎕map would then remap unwanted garbage rows at the end of the file, as shown in the following diagram: ┌───────────────── retention mask │ ┌─────────────── file rows 1 aaa aaa ┐ 0 bbb → ccc ├─ desired result 1 ccc eee ┘ 0 ddd ddd ┐ 1 eee → eee ├─ unwanted garbage rows 0 fff fff ┘ This situation can occur, for example, if a handle is assigned at create or open time and the character file name is used (instead of the handle) at retain time. The rather ugly solution to this problem is to try an exploratory file truncat- ion prior to reassigning rows in the mapped variable. ... ntie←unmap 0 ⍝ unmap and tie native file. lndx←¯1+⎕NSIZE ntie ⍝ last byte index. last←⎕NREAD ntie 83 1 lndx ⍝ last byte in file. 19::⎕EN error'TABLE OPEN'{⍺}⍺ remap ntie ⍝ failure: too many handles. {0}last ⎕NAPPEND lndx ⎕NRESIZE ntie: ⍝ attempt remove/replace byte. {0}⍺ remap ntie: ⍝ success: not too many handles. ... There must be a better way. Test Scripts ------------ Namespace [#.scripts] contains test scripts. For example: test 'ddb' ⍝ test ddb: no news => good news. A left argument of 1 traces execution of the script: 1 test 'ddb' ⍝ test ddb, showing progress. ... and a left argument of ¯1 opens an edit window on some or all of the scripts: ¯1 test'ddb' ⍝ edit scripts.ddb. ¯1 test'get' 'put' ⍝ edit get and put scripts. ¯1 test'' ⍝ edit all scripts. [scripts.(get put open append_retain create_remove)] are intended for perfomance timing using operator [time]. [scripts.times] runs each of these scripts displaying the number of seconds taken. Variables [loops] and [rows] determine the number of accesses and number of table rows respectively. If the variables are not set in the root namespace, they both default to 1,000. The following is typical output on a 600Mhz machine: test'times' loops: 1000 rows: 1000 Get/put simple tables: get 06.62 put 15.85 Get/put nested tables: get 30.35 put 35.53 Miscellaneous: open 04.32 append_retain 15.57 create_remove 07.08 The times displayed are in seconds, so with [loops] set to 1000, they can be interpreted as milliseconds per single access. Note that the [get]s and [put]s above use _handles_ to open tables. Using a file _name_ instead, would effectively add the [open] time to each [get] and [put]. See: →scripts.ddb← →scripts.times← →scripts.open← →scripts.get← →scripts.put← →scripts.create_remove← →scripts.append_retain← Examples -------- defns←('name' 80 20)('age' 83)('girth' 83 3) ⍝ field definitions. disp defns ┌→───────────┬────────┬────────────┐ │┌→───┬──┬──┐│┌→──┬──┐│┌→────┬──┬─┐│ ││name│80│20│││age│83│││girth│83│3││ │└───→┴~─┴~─┘│└──→┴~─┘│└────→┴~─┴─┘│ └───────────→┴───────→┴───────────→┘ 'example' ddb.create defns ⍝ create table "example.ddb" disp ddb.defs 'example' ⍝ field defs from table. ┌→───────────┬────────┬────────────┐ │┌→───┬──┬──┐│┌→──┬──┐│┌→────┬──┬─┐│ ││name│80│20│││age│83│││girth│83│3││ │└───→┴~─┴~─┘│└──→┴~─┘│└────→┴~─┴─┘│ └───────────→┴───────→┴───────────→┘ 'example' ddb.append 'john' 56 (38 42 46) ⍝ append row to table. block←(↑'pete' 'jessica')(55 23)(2 3⍴44 32 34, 48 24 36) disp block ⍝ block of rows. ┌→──────┬─────┬────────┐ │pete │55 23│44 32 34│ │jessica↓ │48 24 36↓ └──────→┴~───→┴~──────→┘ 'example' ddb.append block ⍝ append block of rows. tab←ddb.open 'example' ⍝ handle for table. tab ddb.get'name' ⍝ get one field from table. john pete jessica disp tab ddb.get 'name' 'age' 'girth' ⍝ get all fields. ┌→───────────────────┬────────┬────────┐ │john │ │38 42 46│ │pete │56 55 23│44 32 34│ │jessica ↓ │48 24 36↓ └───────────────────→┴~──────→┴~──────→┘ ⎕ex'tab' ⍝ remove mapping handle. disp 'example' ddb.get 'age' 'girth' ⍝ use table name. ┌→───────┬────────┐ │ │38 42 46│ │56 55 23│44 32 34│ │ │48 24 36↓ └~──────→┴~──────→┘ 'example' ddb.put 'age' (88 77 21) ⍝ replace age values. 'example' ddb.get'age' ⍝ check update. 88 77 21 tab←ddb.open 'example' ⍝ open table. jays←^/∨\'j'=tab ddb.get 'name' ⍝ names starting with 'j'. tab ddb.get jays 'girth' ⍝ girths of .. .. 38 42 46 48 24 36 tab ddb.put jays 'age' (66 22) ⍝ replace selected ages. tab ddb.get 'age' ⍝ check update. 66 77 22 ⎕ex'tab' ⍝ release table. 'example' ddb.retain jays ⍝ retain only j-names. 'example' get 'names' ⍝ compressed table. john jessica disp ddb.{{⍵ get ⍵.names}open ⍵}'example' ⍝ open and get all fields. ┌→───────────────────┬─────┬────────┐ │john │66 22│38 42 46│ │jessica ↓ │48 24 36↓ └───────────────────→┴~───→┴~──────→┘ get←ddb.((open'example'∘get) ⍝ bind handle with get fn. get(66=get'age')'name' 'girth' ⍝ compound selection. john 38 42 46 ⎕ex'get' ⍝ release and ddb.remove 'example' ⍝ remove table. See also: →Examples← →RefCard← Back to: Contents