
Data Storage: the In-Memory Columnar Database KDB+ (Q) Documentation

Arthur Whitney, one of the founders of Kx Systems, developed the columnar database KDB and its programming language Q in 2003. Official site: www.kx.com


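Main features:

  • In-memory database: one way to understand KDB is as an in-memory database that can also persist data to disk.
  • Interpreted language: development cycles are shorter; the q language aims to be concise, efficient and expressive. (The learning curve, admittedly, is anything but gentle.)
  • Lists are ordered: unlike rows in a relational database, lists are ordered, so tables are ordered too.
  • Evaluated from right to left (q draws on several languages, including APL, LISP and functional programming.)
  • Table-oriented (tables are used as routinely as strings are in other languages)
  • Column-oriented: relational databases process and store data by row; kdb stores data by column, and operations act directly on column vectors.
  • Strongly typed
  • Null values have special meaning (see the details later in this document)
  • Built-in I/O support (very concise)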

 Getting Started with KDB+ (Q)  (the following text comes from the web and is saved here for reference)

Now that we know how q works and how to start it up, let's examine some real code that shows the power of q. The following program reads a csv file of time-stamped symbols and prices, places the data into a table and computes the maximum price for each day. It then opens a socket connection to a q process on another machine and retrieves a similar daily aggregate. Finally, it merges the two intermediate tables and appends the result to an existing file.

sample:{
 t:("DSF"; enlist ",") 0: `:c:/q/data/px.csv;
 tmpx:select mpx:max Price by Date,Sym from t;
 h:hopen `:aerowing:5042;
 rtmpx:h "select mpx:max Price by Date,Sym from tpx";
 hclose h;
 .[`:c:/q/data/tpx.dat; (); ,; rtmpx,tmpx]
}
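
To make the aggregation step concrete, here is a rough Python sketch of what `select mpx:max Price by Date,Sym from t` computes. The sample CSV content, the function name `max_price_by_date_sym` and the use of a plain dictionary are illustrative assumptions; this is not how q implements the query, and the socket and file-append steps are left out.

```python
import csv
import io

# Hypothetical sample data in the Date,Sym,Price layout the q program reads.
SAMPLE_CSV = """Date,Sym,Price
2006.07.03,ibm,81.5
2006.07.03,ibm,82.25
2006.07.03,msft,23.1
2006.07.04,ibm,80.0
"""

def max_price_by_date_sym(text):
    """Return {(date, sym): max price}, mirroring 'max Price by Date,Sym'."""
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["Date"], row["Sym"])
        price = float(row["Price"])
        # Keep the largest price seen so far for this (date, sym) group.
        if key not in result or price > result[key]:
            result[key] = price
    return result

mpx = max_price_by_date_sym(SAMPLE_CSV)
for key in sorted(mpx):
    print(key, mpx[key])
```

Each distinct (Date, Sym) pair becomes one row of the result, which is why the q query needs no explicit loop.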

All data is ultimately built from atoms, so we begin with atoms. An atom is an irreducible value with a specific data type. The basic data types in q correspond to those of SQL with some additional date and time related types that facilitate time series. We summarize the data types in the tables below, giving the corresponding types in SQL, and where appropriate Java and C#. We cover enumerations in Casting and Enumerations.

Q            SQL        Java         C#
boolean      boolean    Boolean      Boolean
byte         byte       Byte         Byte
short        smallint   Short        Int16
int          int        Integer      Int32
long         bigint     Long         Int64
real         real       Float        Single
float        float      Double       Double
char         char(1)    Character    Char
symbol       varchar    (String)     (String)
date         date       Date
datetime     datetime   Timestamp    DateTime
minute
second
time         time       Time         TimeSpan
enumeration

Note: The words boolean, short, int, etc. are not keywords in q, so they are not displayed in a special font in this text. They do have special meaning when used as name arguments in some operators. You should avoid using them as names.

The next table collects the important information about each of the q data types. We shall refer to this in subsequent sections.

type         size  char type  num type  notation                     null value
boolean      1     b          1         1b                           0b
byte         1     x          4         0x26                         0x00
short        2     h          5         42h                          0Nh
int          4     i          6         42                           0N
long         8     j          7         42j                          0Nj
real         4     e          8         4.2e                         0Ne
float        8     f          9         4.2                          0n
char         1     c          10        "z"                          " "
symbol       *     s          11        `zaphod                      `
month        4     m          13        2006.07m                     0Nm
date         4     d          14        2006.07.21                   0Nd
datetime     8     z          15        2006.07.21T09:13:39          0Nz
minute       4     u          17        23:59                        0Nu
second       4     v          18        23:59:59                     0Nv
time         4     t          19        09:01:02.042                 0Nt
enumeration                   *         `u$v
dictionary                    99        `a`b`c!10 20 30
table                         98        ([] c1:`a`b`c; c2:10 20 30)

The basic integer data type is common to nearly all programming environments.

int

An int is a signed four-byte integer. A numeric value is identified as an int by the fact that it contains only numeric digits, possibly with a leading minus sign, without a decimal point. In particular, it has no trailing character that would indicate that it is another numeric type (see below). Here is a typical int value,

        42

The other two integer data types are short and long. The short type represents a two-byte signed integer and is denoted by a trailing 'h' after optionally signed numeric digits. For example,

        b:-123h
        b
-123h

Similarly, the long type represents an eight-byte signed integer denoted by a trailing 'j' after optionally signed numeric digits.

        c:1234567890j
        c
1234567890j

Important: Type promotion is performed automatically in q primitive operations. However, if a specific integer type is required in a list and a narrower type is presented - e.g., an int is expected and a short is presented - the submitted type will not be automatically promoted and an error will result. This may be unintuitive for programmers coming from languages of C ancestry, but it will make sense in the context of tables.

Single and double precision floating point data types are supported.

float

The float type represents an IEEE standard eight-byte floating point number, often called "double" in other languages. It is denoted by optionally signed numeric digits containing a decimal point with an optional trailing 'f'. A floating point number can hold at least 15 decimal digits of precision.

For example,

        pi:3.14159265
        float1:1f

real

The real type represents a four-byte floating point number and is denoted by numeric digits containing a decimal point and a trailing 'e'. Keep in mind that this type is called 'float' in some languages. A real can hold at least 6 decimal digits of precision, 7 being the norm. Thus

        r:1.4142e
        r
1.4142e

is a valid real number.

Note: The q console abbreviates the display of float or real values having zeros to the right of the decimal.

        2.0
2f
        4.00e
4e

The behavior of substituting floating point types of different widths is analogous to the case of integer types.

Both float and real values can be specified in IEEE standard scientific notation for floating point values.

        f:1.23456789e-10
        r:1.2345678e-10e

By default, the q console displays only seven decimal digits of accuracy for float and real values by rounding the display in the seventh significant digit.

        f
1.234568e-10
        r
1.234568e-10e

You can change this by using the \P command (note upper case) to specify a display width up to 16 digits.

        f12:1.23456789012
        f16:1.234567890123456
        \P 12
        f12
1.23456789012
        f16
1.23456789012
        \P 16
        f12
1.23456789012
        f16
1.234567890123456
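
The effect of the display precision can be imitated outside q. The following Python sketch uses the `g` format type to round to a chosen number of significant digits; the helper name `q_style_display` is hypothetical, and Python's formatting only approximates the q console (for example, it never prints the trailing `f` or `e` type indicators).

```python
def q_style_display(value, precision=7):
    """Format a float with at most `precision` significant digits."""
    return format(value, ".{}g".format(precision))

f = 1.23456789e-10
f16 = 1.234567890123456

print(q_style_display(f))         # 1.234568e-10   (default of 7 digits)
print(q_style_display(f16, 12))   # 1.23456789012  (like \P 12)
print(q_style_display(f16, 16))   # 1.234567890123456  (like \P 16)
```

The stored value never changes; only the number of digits shown does.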

Binary data can be represented as bit or byte values.

The boolean type uses one byte to store an individual bit and is denoted by the bit value followed by 'b'.

        bit:0b
        bit
0b

byte

The byte type uses one byte to store 8 bits of data and is denoted by '0x' followed by a hexadecimal value,

        byte:0x2a

In handling binary data, q is more like C than its descendants, in that both binary types are considered to be unsigned integers that can participate in arithmetic expressions or comparisons with other numeric types. There are no keywords for 'true' or 'false', nor are there separate logical operators. With pi and byte as above,

        a:42
        bit:1b
        a+bit
43

is an int and

        byte+pi
45.14159

is a float. Observe that type promotion has been performed automatically.

There are two atomic character types in q. They resemble the SQL types CHAR and VARCHAR more than the character types of verbose languages.

char

A char holds an individual ASCII character and is stored in one byte. This corresponds to a SQL CHAR. A char is denoted by a single character enclosed in double quotes.

        ch:"q"
        ch
"q"

Some keyboard characters, such as the double-quote, cannot be entered directly into a char since they have special meaning in q. As in C, these characters are escaped with a preceding back-slash ( \ ). While the console display also includes the escape, these are actually single characters.

        ch:"\""                        / double-quote
        ch                              / console also displays the escape "\""
        ch:"\\"                        / back-slash
        ch:"\n"                        / newline
        ch:"\r"                        / return
        ch:"\t"                         / horizontal tab

You can also escape a character with an underlying numeric value expressed as three octal digits.

        "\142"
"b"

A symbol holds a sequence of characters as a single unit. A symbol is denoted by a leading back-quote ( ` ), also read "back tick" in q circles.

        s1:`q
        s2:`zaphod

A symbol is irreducible, meaning that the individual characters that comprise it are not directly accessible. Symbols are often used in q to hold names of other entities.

Important: A symbol is not a string. We shall see in Lists that there is an analogue of strings in q, namely a list of char. While a list of char is a kissing cousin to a symbol, we emphasize that a symbol is not made up of char. The symbol `a and the char "a" are not the same. The char "q" and the symbol `kdb are both atomic entities.

Advanced: You may ask whether a symbol can include embedded blanks and special characters such as back-tick. The answer is yes. You create such a symbol using the relationship between lists of char and symbols. See Creating Symbols from Strings for more on this.

        `$"A symbol with `backtick"
`A symbol with `backtick

Note: A symbol is somewhat akin to a SQL VARCHAR, in that it can hold an arbitrary number of characters. It is different in that it is atomic. The char "q" and the symbol `kdb are both atomic entities.

A major benefit of q is that it can process both time series and relational data in a consistent and efficient manner. Q extends the basic SQL date and time data types to facilitate temporal arithmetic, which is minimal in SQL and can be clumsy in verbose languages (e.g., Java's date library and its use of time zones). We begin with the equivalents to SQL temporal types. The additional temporal types in q deal with constituents of a date or time.

date

A date is stored in four bytes and is denoted by yyyy.mm.dd, where yyyy represents the year, mm the month and dd the day. A date value stores the count of days from Jan 1, 2000.

        d:2006.07.04
        d
2006.07.04

Important: Months and days begin at 1 (not zero) so January is '01'.

Leading zeroes in months and days are required; their omission causes an error.

        bday:2007.1.1
'2007.1.1

Advanced: The underlying day count can be obtained by casting to int.

        `int$2000.02.01
31

time

A time is stored in four bytes and is denoted by hh:mm:ss.uuu where hh represents hours on the 24-hour clock, mm represents minutes, ss represents seconds, and uuu represents milliseconds. A time value stores the count of milliseconds from midnight.

        t:09:04:59.000
        t
09:04:59.000

Again, leading zeros are required in all constituents of a time.

Advanced: The underlying millisecond count can be obtained by casting to int.

        `int$12:34:56.789
45296789

A datetime is the combination of a date and a time, separated by 'T' as in the ISO standard format. A datetime value stores the fractional day count from midnight Jan 1, 2000.

        dt:2006.07.04T09:04:59.000
        dt
2006.07.04T09:04:59.000

Advanced: The underlying fractional day count can be obtained by casting to float.

        `float$2000.02.01T12:00:00.000
31.5
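
The day counts shown for date and datetime can be checked by hand. The Python sketch below reproduces the arithmetic with the standard `datetime` module; the helper names are hypothetical and the code models only the counting rule described above, not q's internal storage.

```python
from datetime import date, datetime

# The reference point described in the text: midnight, Jan 1, 2000.
EPOCH_DATE = date(2000, 1, 1)
EPOCH_DATETIME = datetime(2000, 1, 1)

def date_to_days(d):
    """Whole days from Jan 1, 2000 (the int behind a date)."""
    return (d - EPOCH_DATE).days

def datetime_to_days(dt):
    """Fractional days from midnight Jan 1, 2000 (the float behind a datetime)."""
    return (dt - EPOCH_DATETIME).total_seconds() / 86400.0

print(date_to_days(date(2000, 2, 1)))                    # 31
print(datetime_to_days(datetime(2000, 2, 1, 12, 0, 0)))  # 31.5
print(date_to_days(date(1999, 12, 31)))                  # -1
```

Noon adds exactly half a day, which is where the .5 in the datetime result comes from. In this model, days before the reference point simply count as negative.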

month

The month type uses four bytes and is denoted by yyyy.mm with a trailing 'm'. A month value stores the count of months since January 2000.

        mon:2006.07m
        mon
2006.07m

Advanced: The underlying month offset can be obtained by casting to int.

        `int$2000.04m
3
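
The month offset follows from simple arithmetic, sketched here in Python. The function name `month_to_int` is hypothetical; the formula is inferred from the example above, in which 2000.04m maps to 3, so January 2000 is taken as offset 0.

```python
def month_to_int(year, month):
    """Months elapsed since January 2000 (offset 0)."""
    return (year - 2000) * 12 + (month - 1)

print(month_to_int(2000, 4))   # 3
print(month_to_int(2006, 7))   # 78
```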

The minute type uses four bytes and is denoted by hh:mm. A minute value stores the count of minutes from midnight.

        mm:09:04
        mm
09:04

Note: We did not use min for the variable name because min is a reserved name in q.

Advanced: The underlying minute offset can be obtained by casting to int.

        `int$01:23
83

The second type uses four bytes and is denoted by hh:mm:ss. A second value stores a count of seconds from midnight.

        sec:09:04:59
        sec
09:04:59

The representation of the second type makes it look like an everyday time value. However, a q time value is a count of milliseconds from midnight, so the underlying values are different.

Advanced: The underlying values can be obtained by casting to int. This manifests the inequality.

        `int$12:34:56
45296
        `int$12:34:56.000
45296000
        12:34:56=12:34:56.789
0b

The constituents of dates, times and datetimes can be extracted using dot notation. The individual field values are all extracted as int. The field values of a date are named 'year', 'mm' and 'dd'.

        d:2006.07.04
        d.year
2006
        d.mm
7
        d.dd
4

Similarly, the field values of time are 'hh', 'mm', 'ss'.

        t:12:45:59.876
        t.hh
12
        t.mm
45
        t.ss
59

Note: At the time of this writing (Jun 2007) there is no syntax to retrieve the millisecond constituent. Use the construct,

        t mod 1000
876
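
Because a time is just a count of milliseconds from midnight, every constituent can be recovered with integer division and remainder. The Python sketch below illustrates this; the function names are hypothetical and the code only models the arithmetic.

```python
def time_to_ms(hh, mm, ss, ms=0):
    """Milliseconds from midnight for hh:mm:ss.ms."""
    return ((hh * 60 + mm) * 60 + ss) * 1000 + ms

def ms_to_fields(count):
    """Split a millisecond count back into (hh, mm, ss, ms)."""
    seconds, ms = divmod(count, 1000)
    minutes, ss = divmod(seconds, 60)
    hh, mm = divmod(minutes, 60)
    return hh, mm, ss, ms

t = time_to_ms(12, 45, 59, 876)
print(t)                  # 45959876
print(ms_to_fields(t))    # (12, 45, 59, 876)
print(t % 1000)           # 876, the same idea as "t mod 1000"
```

The same count divided by 1000 gives the value a second would hold, which is why a second and a time that look alike have different underlying integers.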

In addition to the individual field values, you can also extract higher-order constituents.

        d.month
2006.07m
        t.minute
12:45
        t.second
12:45:59

Of course, this works for a datetime as well.

        dt:2006.07.04T12:45:59.876
        dt.date
2006.07.04
        dt.time
12:45:59.876
        dt.month
2006.07m
        dt.mm
7
        dt.minute
12:45

Advanced: It is a quirk in q that dot notation for accessing temporal constituents does not work on function arguments. For example,

        fmm:{[x] x.mm}
        fmm 2006.09.15
{[x] x.mm}
'x.mm

Instead, cast to the constituent type,

        fmm:{[x] `mm$x}
        fmm 2006.09.15
9

In addition to the regular numeric and temporal values, special values represent infinities, whose absolute values are greater than any "normal" numeric or temporal value.

Token   Value
0w      Positive float infinity
0W      Positive int infinity
0Wh     Positive short infinity
0Wj     Positive long infinity
0Wd     Positive date infinity
0Wt     Positive time infinity
0Wz     Positive datetime infinity
0n      NaN, or not a number

Important: Observe the distinction between lower case 'w' and upper case 'W'.

The result of dividing any positive (or unsigned) non-zero value by any zero value is positive float infinity, denoted 0w. Dividing a negative value by zero results in negative float infinity, denoted by -0w. The way to remember these is that 'w' looks like the infinity symbol ∞.

The integral infinities cannot be produced via an arithmetic division on normal int values, since the result of division in q is always a float.

The result of dividing any 0 value by any zero value is undefined, so q represents this as the floating point null 0n.
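
The three division rules above can be summarized in a few lines. This Python sketch is a model of the described behaviour only: the function name `q_divide` is hypothetical, `math.inf` and `math.nan` stand in for 0w and 0n, and the sign of a zero divisor is deliberately ignored.

```python
import math

def q_divide(x, y):
    """Divide following the rules in the text: always a float, never an exception."""
    x, y = float(x), float(y)
    if y != 0.0:
        return x / y                  # ordinary division
    if x == 0.0 or math.isnan(x):
        return math.nan               # 0 divided by 0: undefined, shown as 0n
    return math.inf if x > 0 else -math.inf   # 0w or -0w

print(q_divide(7, 2))     # 3.5
print(q_divide(1, 0))     # inf
print(q_divide(-1, 0))    # -inf
print(q_divide(0, 0))     # nan
```

Plain Python would raise ZeroDivisionError in the last three cases, which is exactly the kind of interruption this design avoids.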

The q philosophy is that any valid arithmetic expression will produce a result rather than an error. Therefore, dividing by 0 produces a special float value rather than an exception. You can perform a complex sequence of calculations without worrying about things blowing up in the middle or inserting cumbersome exception trapping. We shall see more about this in Primitive Operations.

Advanced: While infinities can participate in arithmetic operations, infinite arithmetic is not implemented. Instead, q performs the operation on the underlying bit patterns. Math propeller heads (including the author) find the following disconcerting.

        0W-2
2147483645
        2*0W
-2
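
These results are what ordinary 32-bit two's-complement arithmetic produces if 0W is stored as the largest four-byte signed integer, 2147483647. That storage value is an assumption inferred from the output above, and the helper `wrap_int32` is hypothetical; the Python sketch simply reproduces the wraparound.

```python
INT_MAX = 2**31 - 1   # assumed bit pattern behind 0W for a four-byte int

def wrap_int32(n):
    """Reduce n to a signed 32-bit two's-complement value."""
    n &= 0xFFFFFFFF                       # keep the low 32 bits
    return n - 2**32 if n >= 2**31 else n

print(wrap_int32(INT_MAX - 2))   # 2147483645
print(wrap_int32(2 * INT_MAX))   # -2
```

Doubling overflows the four bytes, so the result wraps around to a small negative number instead of staying infinite.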

The concept of a null value generally indicates missing data. This is an area in which q differs from both verbose programming languages and SQL.

In such languages as C++, Java and C#, the concept of a null value applies to complex entities (i.e., objects) that are accessed indirectly by pointer or by reference. A null value for such an entity corresponds to an un-initialized pointer, meaning that it has not been assigned the address of an allocated block of memory. There is no concept of null for entities that are of simple or value type. For those types that admit null, you test for being null by asking if the value is equal to null.

The NULL value in SQL indicates that the data value is inapplicable or missing. The NULL value is distinct from any value that can actually be contained in a field and does not have '=' semantics. That is, you cannot test a field for being null with = NULL. Instead, you ask if it IS NULL. Because NULL is a separate value, Boolean fields actually have three states: 0, 1 and NULL.

In q, the situation is more interesting. While most types have distinct null values, some types have no designated way of representing a null value.

The following table summarizes the way nulls are handled.

type       null
boolean    0b
byte       0x00
short      0Nh
int        0N
long       0Nj
real       0Ne
float      0n
char       " "
sym        `
month      0Nm
date       0Nd
datetime   0Nz
minute     0Nu
second     0Nv
time       0Nt

Let's start with the binary types. As you can see, they have no special null value, which means that null is equivalent to the value zero. Consequently, you cannot distinguish between a missing boolean value and the value that represents false.

In practice, this isn't an issue, since in most applications it isn't a critical distinction. It can be a problem if the default value of a boolean flag in your application is not zero, so you must ensure that this does not occur. A similar precaution applies to byte values.

Next, observe that all the numeric and temporal types have their own designated null values. Here the situation is similar to SQL, in that you can distinguish missing data from data whose underlying value is zero. The difference from SQL is that there is no universal null value.

The advantage of the q approach is that the null values have equals semantics. The tradeoff is that you must use the correct null value in type-checked situations.

Finally, we consider the character types. Considering a symbol to be a variable-length character collection justifies why the symbol null value is the empty symbol, designated by a back-tick ( ` ).

In contrast, the null value for the char type is the char consisting of the blank character ( " " ). As with binary data, you cannot distinguish between a missing char value and a blank value. Again, this is not seriously limiting in practice, but you should ensure that your application does not rely on this distinction.

Note: The value "" is not the char null. Instead, it is the empty list of char.

Data complexity is built up from atoms, which we know, and lists. It is important to achieve a thorough understanding of lists since nearly all q programming involves processing lists. The concepts are simple but complexity can build rapidly. Our approach is to introduce the basic notion of a general list in the first section, take a quick detour to cover simple and singleton lists, then return to cover general lists in more detail.

A list is simply an ordered collection. A collection of what, you ask. More precisely, a list is an ordered collection of atoms and other lists. Since this definition is recursive, let's start with the simplest case in which the list comprises only atoms.

The notation for a general list encloses its items within matching parentheses and separates them with semicolons. For readability, optional whitespace is used after the semicolon separators in the last example.

        (1;2;3)
        ("a";"b";"c";"d")
        (`Life;`the;`Universe;`and;`Everything)
        (-10.0; 3.1415e; 1b; `abc; "z")

In the preceding examples, the first three lists are simple, meaning that the list comprises atoms of uniform type. The last example is a general list, meaning that it is not simple. Otherwise put, a general list contains items that are not atoms of a uniform type. This could be atoms of mixed type, nested lists of uniform type, or nested lists of mixed type.

Important: The order of the items in the list is positional (i.e., left-to-right) and is part of its definition. The lists (1;2) and (2;1) are different. SQL is based on sets, which are inherently unordered. This distinction leads to some subtle differences between the results of queries on q tables versus the result sets from analogous SQL queries. The inherent ordering of lists makes time series processing natural and fast in q, while it is cumbersome and performs poorly in standard SQL.

Lists can be assigned to variables exactly like atoms.

        L1:(1;2;3)
        L2:("z";"a";"p";"h";"o";"d")
        L3:(`Life;`the;`Universe;`and;`Everything)
        L4:(0b;1b;0b;1b;1b;0b)
        L5:(-10.0;3.1415e;1b;`abc;"z")

count

The number of items in a list is its count. You can obtain the count of a list as follows,

        count L1
3

This is our first example of a function, which we will learn about in Functions. For now, we need only understand that count returns an int value equal to the number of items in a list to its right.

Observe that the count of any atom is 1.

        count 42
1
        count `abcd
1

A simple list - that is, a list of atoms of a uniform type - corresponds to the mathematical notion of a vector. Such lists are treated specially in q. They have a simplified notation, take less storage and compute faster than general lists. Of course, you can use general list notation for a vector, but q converts a general list to a vector whenever feasible.

A simple list of any numeric type omits the enclosing parentheses and replaces the separating semi-colons with blanks. The following two expressions for a simple list of int are equivalent,

        (100;200;300)
        100 200 300

This is confirmed by the console display,

        (100;200;300)
100 200 300

Similar notation is used for simple lists of short and long with the addition of the type indicator.

        H:(1h;2h;255h)
        H
1 2 255h

We conclude that a trailing type indicator in the display applies to the entire list and not just the last item of the list; otherwise, the list would not be simple and would be displayed in general form.

        G:(1; 2; 255h)
        G
1
2
255h

Simple lists of float and real are notated similarly. Observe that the q console suppresses the decimal point when displaying a float having zero(s) to the right of the decimal, but the value is not an int.

        F:(123.4567;9876.543;99.0)
        F
123.4567 9876.543 99

This notational efficiency for float display means that a list of floats having no decimal parts displays with a trailing f.

        FF:1.0 2.0 3.0
        FF
1 2 3f

The simplified notation for a simple list of binary data juxtaposes the individual data values together with a type indicator. The type indicator for boolean trails the value.

        bits:(0b;1b;0b;1b;1b)
        bits
01011b

The indicator for byte leads,

        bytes:(0x20;0xa1;0xff)
        bytes
0x20a1ff

Note: A simple list of boolean atoms requires the same number of bytes to store as it has atoms. While the simplified notation is suggestive, multiple bits are not compressed to fit inside a single byte. The list bits above holds its values in 5 bytes of storage.

The simplified notation for simple lists of symbols juxtaposes the individual atoms with no intervening whitespace.

        symbols:(`Life;`the;`Universe;`and;`Everything)
        symbols
`Life`the`Universe`and`Everything

Inserting spaces between the atoms causes an error.

        bad:`This `is `wrong
'is

The simplified notation for a list of char looks just like a string in most languages, with the juxtaposed sequence of characters enclosed in double quotes.

        chars:("s";"o";" ";"l";"o";"n";"g")
        chars
"so long"

Note: A simple list of char is called a string.

Lists can be defined using simplified notation,

        L:100 200 300
        H:1 2 255h
        F:123.4567 9876.543 99.99
        bits:01011b
        bytes:0x20a1ff
        symbols:`Life`the`Universe`and`Everything
        chars:"so long"

Finally, we observe that a list entered as intermixed ints and floats is converted to a simple list of floats.

        1 2.0 3
1 2 3f

Specifying a list of mixed temporal types has a different behavior from that of a list of mixed numeric types. In this case, the list takes the type of the first item in the list; other items are widened or narrowed to match.

        12:34 01:02:03
12:34 01:02
        01:02:03 12:34
01:02:03 12:34:00

To force the type of a mixed list of temporal values, append a type specifier.

        01:02:03 12:34 11:59:59.999u
01:02 12:34 11:59

Lists with one or no items merit special consideration.

It is useful to have lists with no items. A pair of parentheses with nothing (except possibly whitespace) between denotes the empty list.

        L:(  )
        L
-

We shall see in Creating Typed Empty Lists that it is possible to define an empty list with a specific type.

There is a quirk in q regarding how it handles a list containing a single item, called a singleton. Creation of a singleton presents a notational problem. To see the issue, first realize that a list containing a single atom is distinct from the individual atom. As any UPS driver will readily tell you, an item in a box is not the same as an unboxed item. By now, we recognize the following as atoms,

        42
        1b
        0x2a
        `beeblebrox
        "z"

We also recognize the following are all lists with two elements,

        (42;6)
        01b
        `zaphod`beeblebrox
        "zb"
        (40;`two)

How to create a list of a single item? Good question. The answer is that there is no syntactic way to do so. You might think that you could simply enclose the item in parentheses, but this doesn't work since the result is an atom.

        singleton:(42)
        singleton
42

The reason for this is that parentheses are used for multiple purposes in q. As we have seen, paired parentheses are used to delimit items in the specification of a general list. Paired parentheses are also used for grouping in expressions - that is, to isolate the result of the expression inside the parentheses. The latter usage forces (42) to be the same as the atom 42 and so precludes the intention in the specification of singleton above.

The way to make a list with a single item is to use the enlist function, which returns a singleton list containing what is to its right.

        singleton:enlist 42
        singleton
,42

To distinguish between an atom and the equivalent singleton, examine the sign of their types.

        signum type 42
-1
        signum type enlist 42
1

As a final check before moving on, make sure that you understand that the following also defines a list containing a single item,

        singleton:enlist 1 2 3
        count singleton
1

Recall that a list is ordered from left to right by the position of its items. The offset of an item from the beginning of the list is called its index. Thus, the first item has index 0, the second item (if there is one) has index 1, etc. A list of count n has index domain 0 to n-1.

Given a list L, the item at index i is accessed by L[i]. Retrieving an item by its index is called item indexing. For example,

        L:(-10.0;3.1415e;1b;`abc;"z")
        L[0]
-10f
        L[1]
3.1415e
        L[2]
1b
        L[3]
`abc
        L[4]
"z"

Items in a list can also be assigned via item indexing. Thus,

        L1:1 2 3
        L1[2]:42
        L1
1 2 42

Important: Index assignment into a simple list enforces strict type matching with no type promotion. Otherwise put, when you reassign an item in a simple list, the type must match exactly and a narrower type is not widened.

        L:100 200 300
        L[1]:42h
'type
        f:100.0 200.0 300.0
        f
100 200 300f
        f[1]:400
'type

This may come as a surprise if you are accustomed to numeric values always being promoted to wider types in a verbose language.

Providing an invalid data type for the index results in an error.

        L:(-10.0;3.1415e;1b;`abc;"z")
        L[`1]
'type

If you attempt to index outside of the bounds of the list, the result is not an error. Rather, you get a null value. If the list is simple, this is the null for the type of atoms in the list. For general lists, the result is 0n.

        L[5]
0n

One way to understand this is that the result of asking for a non-existent index is "missing value." Keep this in mind, since indexing one position past the end of the list is easy to do, especially if you're not used to indexing relative to 0.

An empty index returns the entire list.

        L[]
-10f
3.1415e
1b
`abc
"z"

Note: An empty index is not the same as indexing with an empty list. The latter returns an empty list.

        L[()]
_

The syntactic form double-colon ( :: ) denotes the null item, which allows explicit notation or programmatic generation of an empty index.

        L[::]
-10f
3.1415e
1b
`abc
"z"

Advanced: The type of the null item is undefined; in particular, its type does not match that of any normal item in a list. As a consequence, inclusion of the null item in a list forces the list to be general.

        L:(1;2;3;::)
        L
1
2
3
::
        type L
0h

This can be used to avoid a nasty surprise when q is too clever. To see how, consider the general list,

        L:(1;2;3;`a)
        type L
0h

Now, reassign the last item to an int and note what happens to the list.

        L[3]:4
        L
1 2 3 4
        type L
6h

The list has been converted to a simple list of int! A subsequent attempt to reassign the last item back to its original value fails with a type error.

        L[3]:`a
'type

This can be circumvented by placing a null item in the list, forcing it to remain general.

        L:(1;2;3;`a;::)
        L[3]:4
        L
1
2
3
4
::
       type L
0h
        L[3]:`a
        L
1
2
3
`a
::

Lists can be created from variables.

        L1:(1;2;100 200)
        L2:(1 2 3;`ab`c)
        L6:(L1;L2)
        L6
1     2   100 200
1 2 3 `ab`c

We get ahead of our presentation of operations in the next chapter to describe an important operation on lists. Probably the most common operation on two lists is to join them together to form a larger list. More precisely, the join operator (,) appends its right operand to the end of the left operand and returns the result. It accepts an atom in either argument.

        1 2,3 4 5
1 2 3 4 5
        1,2 3 4
1 2 3 4
        1 2 3,4
1 2 3 4

Observe that if the arguments are not of uniform type, the result is a general list.

        1 2 3,4.4 5.5
1
2
3
4.4
5.5
        1 2 3,"ab"
1
2
3
"a"
"b"

Note: To accept either a scalar or a list x and produce a uniform shape, use the idiom,

        (),x

which always yields a list with the content of x.

Thus far, we have viewed a list as a static collection of its items. We can also consider a list to be a mapping provided by item indexing. Specifically, a list L of count n represents a monadic mapping over the domain of non-negative integers 0,...,n-1. The list mapping assigns the output value L[i] to the input value i. Succinctly, the I/O association for the list is,

        i --> L[i]

Here are the I/O tables for some basic lists:

101 102 103 104

I    O
0    101
1    102
2    103
3    104

(`a; 123.45; 1b)

I    O
0    `a
1    123.45
2    1b

(1 2; 3 4)

I    O
0    1 2
1    3 4

The first two examples demonstrate ranges of a collection of atoms. The last example has a range comprised of lists.

A list not only looks like a map, it is a map whose notation is a shortcut for the I/O table assignment. This is a useful way of looking at things. We shall see in Primitive Operations that a nested list can be viewed as a multivalent map whose range is atoms.

From the perspective of list as map, the fact that indexing outside the bounds of a list returns null means the map is implicitly extended to the domain of all integers with null values outside the list items.
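
The list-as-map view can be made explicit in a few lines of Python. This is only an illustration of the idea: the helpers `as_map` and `lookup` are hypothetical, and Python's `None` stands in for the q null that comes back for an index outside the list.

```python
def as_map(items):
    """Build the I/O table of a list: index -> item."""
    return dict(enumerate(items))

def lookup(items, i, null=None):
    """Index like the extended map: any index outside the list gives null."""
    return as_map(items).get(i, null)

L = [101, 102, 103, 104]
print(as_map(L))       # {0: 101, 1: 102, 2: 103, 3: 104}
print(lookup(L, 2))    # 103
print(lookup(L, 4))    # None, one position past the end
print(lookup(L, -1))   # None, not in the domain 0..n-1
```

Returning a null instead of raising an error is what lets the map be treated as defined on all integers.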

Data complexity is built by using lists as items of lists.

Depth

Now that we're comfortable with simple lists, we return to general lists. We can nest by including lists as items of lists. The number of levels of nesting for a list is called its depth. Atoms are considered to have depth 0 and simple lists have depth 1.

The notation of complex lists reflects their nesting. For pedagogical purposes, in this section, we shall often use general notation to define even simple lists; however, the console always displays lists in simplified form. In subsequent sections, we shall use only simplified notation for simple lists.

Following is a list of depth 2 that has three items, the first two being atoms and the last a list.

        L1: (1;2;(100;200))
        count L1
3

Following is the simplified notation for the inner list,

        L1:(1;2;100 200)
        L1
1
2
100 200

We present a pictorial representation that may help invisualizing levels of nesting. An atom is represented as a circle containingits value. A list is represented as a box containing its items. A general listis a box containing boxes and atoms.

Following is a list of depth two having two elements, each of which is a simple list,

        L2:((1;2;3);(`ab;`c))
        L2
1 2 3
`ab`c
        count L2
2

Following is a list of depth two having three elements, each of which is a general list,

        L3:((1;2h;3j);("a";`bc);(1.23;4.56e))
        L3
(1;2h;3j)
("a";`bc)
(1.23;4.55999994278e)
        count L3
3

Following is a list of depth two having one item that is a simple list,

        L4:enlist 1 2 3 4
        L4
1 2 3 4
        count L4
1
        L4[0]
1 2 3 4

Following is a list of depth three having two items. The second item is a list of depth two having three items, the last of which is a simple list of four items.

        L5:(1;(100;200;(1000;2000;3000;4000)))
        L5
1
(100;200;1000 2000 3000 4000)
        count L5
2
        count L5[1]
3

Following is a "rectangular" list that can be thought of as a 3x4 matrix,

        m:((11;12;13;14);(21;22;23;24);(31;32;33;34))
        m
11 12 13 14
21 22 23 24
31 32 33 34
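The definition of depth used in this section can be checked with a small recursive function. This is only a sketch: depth is a hypothetical helper name, not a built-in, and it assumes its argument is an atom or a list rather than a dictionary or table.

```q
/ hypothetical helper: atoms have depth 0, an empty list is treated as depth 1
depth:{$[0>type x;0;0=count x;1;1+max depth each x]}
depth 42                                   / 0
depth 1 2 3                                / 1
depth (1;2;100 200)                        / 2
depth (1;(100;200;1000 2000 3000 4000))    / 3
```

The function relies on the fact that atoms have a negative type number, so anything else is treated as a list whose depth is one more than the deepest of its items.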

It is possible to index directly into the items of a nested list.

Retrieving an item via a single index always retrieves an uppermost item from a nested list.

        L:(1;(100;200;(1000;2000;3000;4000)))
        L[0]
1
        L[1]
100
200
1000 2000 3000 4000

Recalling that q evaluates expressions from right-to-left, we interpret the second retrieval above as,

·        Retrieve the item at index 1 from L

Alternatively, reading it functionally as left-of-right,

·        Retrieve from L the item at index 1

Since the result L[1] is itself a list, we can retrieve its elements using a single index.

        L[1][2]
1000 2000 3000 4000

Read this as:

·        Retrieve the item at index 2 from the item at index 1 in L

or,

·        Retrieve the item at index 1 from L, and from it retrieve the item at index 2

We can repeat single indexing once more to retrieve an item from the innermost nested list.

        L[1][2][0]
1000

Read this as,

·        Retrieve the item at index 0 from the item at index 2 in the item at index 1 in L

or,

·        Retrieve the item at index 1 from L, and from it retrieve the item at index 2, and from it retrieve the item at index 0

There is an alternate notation for repeated indexing into the constituents of a nested list. The last retrieval can also be written as,

        L[1;2;0]
1000

Retrieving inner items for a nested list with this notation is called indexing at depth.

Important: The semicolons in indexing at depth are critical.

Assignment via index also works at depth.

        L:(1;(100;200;(1000 2000 3000 4000)))
        L[1;2;0]:999
        L
1
(100;200;999 2000 3000 4000)

To verify that the notation for indexing at depth is reasonable, we return to our matrix example,

        m:((11;12;13;14);(21;22;23;24);(31;32;33;34))
        m[0;2]
13
        m[0][2]
13

The indexing at depth notation suggests thinking of m as a multi-dimensional matrix, whereas repeated single indexing suggests thinking of m as an array of arrays. Chacun à son goût.

A list of positions can be used to index a list.

In this section, we begin to see the power of q for manipulating lists. We start with,

        L1:100 200 300 400

We know how to index single items of the list

        L1[0]
100
        L1[2]
300

By extension, we can retrieve a list of multiple items via multiple indices,

        L1[0 2]
100 300

The indices can be in any order, and the corresponding items are retrieved,

        L1[3 2 0 1]
400 300 100 200

An index can be repeated,

        L1[0 2 0]
100 300 100

Some more examples,

        bits:01101011b
        bits[0 2 4]
011b
        chars:"beeblebrox"
        chars[0 7 8]
"bro"

This explains why including the semi-colon separators is essential when indexing at depth. Leaving them out effectively specifies multiple indices, and you will get a corresponding list of values from the top level as a result.
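The difference is easiest to see side by side. The following sketch reuses the 3x4 matrix m defined earlier; only the comments are new.

```q
m:(11 12 13 14;21 22 23 24;31 32 33 34)
m[0;2]                              / indexing at depth: 13
m[0 2]                              / one list index: rows 0 and 2 of the top level
(11 12 13 14;31 32 33 34)~m[0 2]    / 1b
```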

You have no doubt noticed that retrieving items via multiple indices looks just like we've substituted a list for the index. Indeed, this is exactly what is happening. Here are some examples of a simple index list,

        I:3 2 0
        L1[I]
400 300 100
        L2:(-10.0;3.1415e;1b;`abc;"z")
        L2[I]
`abc
1b
-10f
        L3:(1;(100;200;(1000;2000;3000;4000));5;(600 700))
        L3
1
(100;200;1000 2000 3000 4000)
5
600 700
        J:2 1 0
        L3[J]
5
(100;200;1000 2000 3000 4000)
1

Observe that in every case, the result of indexing a given list via a simple list is a new list whose values are retrieved from the first level of the given list and whose shape is the same as that of the index list. This suggests the behavior with an index that is a non-simple list.

        L1:100 200 300 400
        L1[(0 1; 2 3)]
100 200
300 400
        I:(1;(0;(3 2)))
        L1[I]
200
(100;400 300)

To figure out the result of indexing by any non-simple list, start with the fact that the result always has the same shape as the index.

Advanced: More precisely, the result of indexing via a list conforms to the index list. The notion of conformability of lists is defined recursively. All atoms conform. Two lists conform if they have the same number of items and each of their corresponding items conform. In plain language, two lists conform if they have the same shape.
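The recursive definition of conformability can be written down almost literally. The following is a sketch under stated assumptions: conform is a hypothetical helper name, not a built-in, and both arguments are assumed to be atoms or lists.

```q
/ hypothetical helper: 1b when x and y have the same shape
conform:{$[(0>type x)&0>type y;1b;(0>type x)|0>type y;0b;count[x]<>count y;0b;all conform'[x;y]]}
L1:100 200 300 400
I:(1;(0;3 2))
conform[L1 I;I]           / 1b: the result has the shape of the index
conform[1 2 3;(1;2 3)]    / 0b: the counts differ
```

The first two branches handle atoms (negative type numbers), the third compares counts, and the last recurses over corresponding items.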

Recall that a list item can be assigned via item indexing,

        L:100 200 300 400
        L[0]:1000
        L
1000 200 300 400

Assignment via index extends to indexing via a simple list.

        L:100 200 300 400
        L[1 2 3]:2000 3000 4000
        L
100 2000 3000 4000

Note: Assignment via a simple index list is processed in index order, i.e., from left-to-right. Thus,

        L[3 2 1]:999 888 777

is equivalent to,

        L[3]:999
        L[2]:888
        L[1]:777

Consequently, in the case of a repeated item in the index list, the right-most assignment prevails.

        L:100 200 300 400
        L[0 1 0 3]:1000 2000 3000 4000
        L
3000 2000 300 4000

You can assign a single value to multiple items in a list by indexing on a simple list and using an atom for the assignment value.

        L:100 200 300 400
        L[1 3]:999
        L
100 999 300 999

Now that we're familiar with retrieving and assigning via an index list, we introduce a simplified notation. It is permissible to leave out the brackets and juxtapose the list and index with a separating blank. Some examples follow.

        L:100 200 300 400
        L[0]
100
        L 0
100
        L[2 1]
300 200
        L 2 1
300 200
        I:2 1
        L[I]
300 200
        L I
300 200
        L[::]
100 200 300 400
        L ::
100 200 300 400

Which notation you use is a matter of personal preference. In this manual, we usually use brackets, since this notation is probably most familiar from verbose programming. Experienced q programmers often use juxtaposition since it reduces notational density.

The dyadic primitive find ( ? ) returns the index of the right operand in the left operand list.

        1001 1002 1003?1002
1

Performing find on a list is the inverse to positional indexing because it maps an item to its position.

If you try to find an item that is not in the list, the result is an int equal to the count of the list.

        1001 1002 1003?1004
3

The way to think of this result is that the position of an item that is not in the list is one past the end of the list, which is where it would be if you were to append it to the list.

Of course, find extends to lists of items.

        1001 1002 1003?1003 1001
2 0
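The inverse relationship between find and positional indexing can be checked directly. This is a small sketch; the variable name L is chosen only for illustration, and the list is assumed to contain no duplicates.

```q
L:1001 1002 1003
L[L?1002]            / 1002: find, then index, returns the item
L?L[0 1 2]           / 0 1 2: index, then find, returns the positions
(L?1004)=count L     / 1b: a missing item maps to the count of the list
```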

We return to the situation of indexing at depth for nested lists. For simplicity, let's start with a list that looks like a matrix.

        m:(1 2 3 4; 100 200 300 400; 1000 2000 3000 4000)

Analogy with traditional matrix notation suggests that we could retrieve a row or column from m by providing a "partial" index at depth. Indeed, this works.

        m[1;]
100 200 300 400
        m[;3]
4 400 4000

Observe that eliding the last index reduces to item indexing at the top level.

        m[1;]
100 200 300 400
        m[1]
100 200 300 400

Note: In the previous example, the two syntactic forms have the same result, but the first more clearly connotes the situation.

The situation of eliding an index other than the last is more interesting. The way to read m[;3] above is,

·        Retrieve the items at index 3 (the fourth position) from all items at the top level of m

Let's tackle another level of nesting.

        L:((1 2 3;4 5 6 7);(`a`b`c`d;`z`y`x`;`0`1`2);("now";"is";"the"))
        L
(1 2 3;4 5 6 7)
(`a`b`c`d;`z`y`x`;`0`1`2)
("now";"is";"the")
        L[;1;]
4 5 6 7
`z`y`x`
"is"
        L[;;2]
3 6
`c`x`2
"w e"

Interpret L[;1;] as,

·        Retrieve all items in the second position of each list at the top level

Interpret L[;;2] as,

·        Retrieve the items in the third position for each list at the second level

Observe that in L[;;2] the attempt to retrieve the item at the third position of the string "is" resulted in the null value " "; hence the blank in "w e" of the result.

Recommendation: In general, it will make things more evident if you do not omit trailing semi-colons when eliding indices. For example, with L as above,

        L[ ;;]                 / instead of L[]
        L[1;;]                / instead of L[1]
        L[;1;]                / instead of L[;1]

As the final exam for this section, let's combine an elided index with indexing by simple arrays. Let L be as above. Then we can retrieve a cross-section of L using a combination of elided and list indices.

        L[0 2;;0 1]
(1 2;4 5)
("no";"is";"th")

Interpret this as,

·        Retrieve the items from positions 0 and 1 from all columns in rows 0 and 2

In this section, we further investigate the matrix-like lists from the previous section. A "rectangular" list is a list of lists, all having the same count. Understand that this does not mean that a rectangular list is necessarily a traditional matrix, since there can be additional levels of nesting. For example, the following list is rectangular because each of its items has count three, but is not a matrix.

        L:(1 2 3; (10 20; 100 200; 1000 2000))
        L
1         2         3
10   20   100  200  1000 2000

In a rectangular list, elision of the second index corresponds to generalized row retrieval and elision of the first index corresponds to generalized column retrieval.

        r:(`a`b`c;(1 2 3 4;10 20 30 40;100 200 300 400))
        r[0;]
`a`b`c
        r[;1]
`b
10 20 30 40

Advanced: A rectangular list can be transposed with flip (see flip), meaning that the rows and columns are reflected, effectively reversing the first two indices in indexing at depth. For example, the transpose of L above is,

        flip L
1 10 20
2 100 200
3 1000 2000
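The claim that flip reverses the first two indices can be checked on a plain matrix. This sketch reuses the matrix m from the elided-index examples; the name t for the transpose is an assumption made for illustration.

```q
m:(1 2 3 4;100 200 300 400;1000 2000 3000 4000)
t:flip m
m[1;3]           / 400
t[3;1]           / 400: same item, indices swapped
m[;3]~t[3;]      / 1b: column 3 of m is row 3 of t
```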

Matrices are a special case of rectangular lists and can most easily be defined recursively. A matrix of dimension 1 is a simple list. In the context of mathematical operations, the simple list would have numeric type, but this is not a restriction. The count of a one-dimensional matrix is called its size. In some contexts, a simple one-dimensional matrix is called a vector, its count is called its length, and an atom is called a scalar. Some examples.

        v1:1 2 3
        v2:98.60 99.72 100.34 101.93
        v3:`so`long`and`thanks`for`all`the`fish

For n>1, we define a matrix of dimension n recursively as a list of matrices of dimension n-1 all having the same size. Thus, a matrix of dimension 2 is a list of matrices of dimension 1, all having the same size. If all items in a matrix have the same type, we call this the type of the matrix.
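As an illustration of the recursive definition, here is a sketch of a matrix of dimension 3 built as a list of two matrices of dimension 2. The name m3 and the values are hypothetical, chosen only to make the indices easy to follow.

```q
/ hypothetical 2x2x3 example: a list of two 2x3 matrices
m3:(2 3#til 6;2 3#6+til 6)
count m3          / 2
count each m3     / 2 2: both items have the same size
m3[1;0;2]         / 8
```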

Two-dimensional matrices are frequently encountered and have special terminology. Let m be a two-dimensional matrix. The items of m are its rows. As we have already seen, the ith row of m can be obtained via