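Data Storage: Documentation for the In-Memory Columnar Database KDB+ (Q)

阿新 • Published: 2019-02-08

Arthur Whitney, one of the founders of Kx Systems, developed the columnar database KDB and its language Q in 2003. Official site: www.kx.com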

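Main features:

In-memory database: one way to understand KDB is as an in-memory database that can also persist data to disk.
Interpreted language: development cycles are shorter, and q aims to be concise, efficient and expressive. (The learning curve, admittedly, is far from gentle.)
Lists are ordered: unlike rows in a relational database, lists have an order, so tables are ordered too.
Evaluated right to left (q draws on several languages, including APL, LISP and functional programming).
Table-oriented (tables are used as routinely as strings are in other languages).
Column-oriented: relational databases process and store data by row, whereas kdb stores data by column and applies operations directly to column vectors.

Strongly typed
Null values have special meaning (see the details later in this document)
Built-in I/O support (very concise)

KDB+ (Q) primer (the following text comes from the web and is saved here for reference)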

Now that we know how q works and how to start it up, let's examine some real code that shows the power of q. The following program reads a csv file of time-stamped symbols and prices, places the data into a table and computes the maximum price for each day. It then opens a socket connection to a q process on another machine and retrieves a similar daily aggregate. Finally, it merges the two intermediate tables and appends the result to an existing file.

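sample:{

 t:("DSF"; enlist ",") 0: `:c:/q/data/px.csv;          / read the csv as date, symbol, float columns

 tmpx:select mpx:max Price by Date,Sym from t;         / maximum price per date and symbol

 h:hopen `:aerowing:5042;                              / open a connection to the remote q process

 rtmpx:h "select mpx:max Price by Date,Sym from tpx";  / run the same aggregate remotely

 hclose h;                                             / close the connection

 .[`:c:/q/data/tpx.dat; (); ,; rtmpx,tmpx]             / append the merged result to the file

 }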
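For readers coming from other languages, the aggregation step of the program above (maximum price per date and symbol) can be sketched in Python. This is only an illustration of the grouping logic, not of q itself; the rows are hypothetical stand-ins for the contents of px.csv.

```python
from collections import defaultdict

# Hypothetical rows standing in for px.csv: (date, symbol, price).
rows = [
    ("2006.10.03", "ibm", 100.5),
    ("2006.10.03", "ibm", 101.25),
    ("2006.10.03", "msft", 27.0),
    ("2006.10.04", "ibm", 99.75),
]

def max_price_by_day(rows):
    """Return {(date, sym): max price}, mirroring 'max Price by Date,Sym'."""
    best = defaultdict(lambda: float("-inf"))
    for day, sym, price in rows:
        best[(day, sym)] = max(best[(day, sym)], price)
    return dict(best)

tmpx = max_price_by_day(rows)
print(tmpx[("2006.10.03", "ibm")])
```

The q version expresses the same grouping in a single select statement, which is the point the tutorial is making about brevity.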

All data is ultimately built from atoms, so we begin with atoms. An atom is an irreducible value with a specific data type. The basic data types in q correspond to those of SQL with some additional date and time related types that facilitate time series. We summarize the data types in the tables below, giving the corresponding types in SQL, and where appropriate Java and C#. We cover enumerations in Casting and Enumerations.

Q	SQL	Java	C#
boolean	boolean	Boolean	Boolean
byte	byte	Byte	Byte
short	smallint	Short	Int16
int	int	Integer	Int32
long	bigint	Long	Int64
real	real	Float	Single
float	float	Double	Double
char	char(1)	Character	Char
symbol	varchar	(String)	(String)
date	date	Date	
datetime	datetime	Timestamp	DateTime
minute			
second			
time	time	Time	TimeSpan
enumeration			

Note: The words boolean, short, int, etc. are not keywords in q, so they are not displayed in a special font in this text. They do have special meaning when used as name arguments in some operators. You should avoid using them as names.

The next table collects the important information about each of the q data types. We shall refer to this in subsequent sections.

type	size	char type	num type	notation	null value
boolean	1	b	1	1b	0b
byte	1	x	4	0x26	0x00
short	2	h	5	42h	0Nh
int	4	i	6	42	0N
long	8	j	7	42j	0Nj
real	4	e	8	4.2e	0Ne
float	8	f	9	4.2	0n
char	1	c	10	"z"	" "
symbol	*	s	11	`zaphod	`
month	4	m	13	2006.07m	0Nm
date	4	d	14	2006.07.21	0Nd
datetime	8	z	15	2006.07.21T09:13:39	0Nz
minute	4	u	17	23:59	0Nu
second	4	v	18	23:59:59	0Nv
time	4	t	19	09:01:02.042	0Nt
enumeration	*			`u$v	
dictionary			99	`a`b`c!10 20 30	
table			98	([] c1:`a`b`c; c2:10 20 30)	

The basic integer data type is common to nearly all programming environments.

int

An int is a signed four-byte integer. A numeric value is identified as an int by the fact that it contains only numeric digits, possibly with a leading minus sign, without a decimal point. In particular, it has no trailing character that would indicate that it is another numeric type (see below). Here is a typical int value,

        a:42

The other two integer data types are short and long. The short type represents a two byte signed integer and is denoted by a trailing 'h' after optionally signed numeric digits. For example,

        b:-123h

-123h

Similarly, the long type represents an eight byte signed integer denoted by a trailing 'j' after optionally signed numeric digits.

        c:1234567890j

1234567890j

Important: Type promotion is performed automatically in q primitive operations. However, if a specific integer type is required in a list and a narrower type is presented - e.g., an int is expected and a short is presented - the submitted type will not be automatically promoted and an error will result. This may be unintuitive for programmers coming from languages of C ancestry, but it will make sense in the context of tables.

Single and double precision floating point data types are supported.

float

The float type represents an IEEE standard eight-byte floating point number, often called "double" in other languages. It is denoted by optionally signed numeric digits containing a decimal point with an optional trailing 'f'. A floating point number can hold at least 15 decimal digits of precision.

For example,

        pi:3.14159265

        float1:1f

real

The real type represents a four-byte floating point number and is denoted by numeric digits containing a decimal point and a trailing 'e'. Keep in mind that this type is called 'float' in some languages. A real can hold at least 6 decimal digits of precision, 7 being the norm. Thus

        r:1.4142e

1.4142e

is a valid real number.
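The precision gap between the four-byte real and the eight-byte float can be seen outside q as well. The following Python sketch round-trips a value through a four-byte IEEE single using the standard struct module; the helper name as_real is hypothetical and only mimics the storage width of a q real.

```python
import struct

def as_real(x: float) -> float:
    """Round-trip an 8-byte Python float through a 4-byte IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Values that are not exactly representable in 4 bytes come back slightly changed.
print(as_real(1.4142))
print(as_real(4.56))
# A value that is a sum of a few powers of two survives unchanged.
print(as_real(0.5))
```

The small differences printed for the first two values are the reason a real is only trusted to about 6 or 7 decimal digits.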

Note: The q console abbreviates the display of float or real values having zeros to the right of the decimal.

2.0

2f

        4.00e

4e

The behavior of substituting floating point types of different widths is analogous to the case of integer types.

Both float and real values can be specified in IEEE standard scientific notation for floating point values.

        f:1.23456789e-10

        r:1.2345678e-10e

By default, the q console displays only seven decimal digits of accuracy for float and real values by rounding the display in the seventh significant digit.

        f

1.234568e-10

        r

1.234568e-10e

You can change this by using the \P command (note upper case) to specify a display width up to 16 digits.

        f12:1.23456789012

        f16:1.234567890123456

        \P 12

f12

1.23456789012

f16

1.23456789012

        \P 16

f12

1.23456789012

f16

1.234567890123456
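The effect of \P is to limit the number of significant digits shown, not the value stored. As a rough analogy (not q's actual formatting code), Python's "g" format with a given precision produces the same strings for the values used above; the function name show is hypothetical.

```python
def show(x: float, digits: int = 7) -> str:
    """Format x with at most `digits` significant digits, like a display width."""
    return f"{x:.{digits}g}"

f = 1.23456789e-10
f12 = 1.23456789012
f16 = 1.234567890123456

print(show(f))        # default width of 7 significant digits
print(show(f16, 12))  # rounded for display only
print(show(f16, 16))  # all 16 digits visible
print(show(f12, 16))  # trailing zeros are not printed
```

In each case the underlying double is unchanged; only the rendering differs.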

Binary data can be represented as bit or byte values.

The boolean type uses one byte to store an individual bit and is denoted by the bit value followed by 'b'.

        bit:0b

bit

0b

byte

The byte type uses one byte to store 8 bits of data and is denoted by '0x' followed by a hexadecimal value,

        byte:0x2a

In handling binary data, q is more like C than its descendants, in that both binary types are considered to be unsigned integers that can participate in arithmetic expressions or comparisons with other numeric types. There are no keywords for 'true' or 'false', nor are there separate logical operators. With a and pi as above,

        a:42

        bit:1b

        a+bit

43

is an int and

        byte+pi

45.14159

is a float. Observe that type promotion has been performed automatically.
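Python happens to treat its boolean the same way, as a small integer that takes part in arithmetic, so the two q expressions above have a close analogue. This is an illustration of the promotion rules only; the variable names simply mirror those in the q session.

```python
a = 42
bit = True          # plays the role of 1b
byte = 0x2a         # the same hexadecimal value as the q byte
pi = 3.14159265

int_sum = a + bit       # boolean promoted to int
float_sum = byte + pi   # integer promoted to float

print(int_sum, type(int_sum).__name__)
print(float_sum, type(float_sum).__name__)
```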

There are two atomic character types in q. They resemble the SQL types CHAR and VARCHAR more than the character types of verbose languages.

char

A char holds an individual ASCII character and is stored in one byte. This corresponds to a SQL CHAR. A char is denoted by a single character enclosed in double quotes.

        ch:"q"

ch

"q"

Some keyboard characters, such as the double-quote, cannot be entered directly into a char since they have special meaning in q. As in C, these characters are escaped with a preceding back-slash ( \ ). While the console display also includes the escape, these are actually single characters.

        ch:"\""                        / double-quote

        ch                              / console also displays the escape "\""

        ch:"\\"                        / back-slash

        ch:"\n"                        / newline

        ch:"\r"                        / return

        ch:"\t"                         / horizontal tab

You can also escape a character with an underlying numeric value expressed as three octal digits.

        "\142"

"b"

A symbol holds a sequence of characters as a single unit. A symbol is denoted by a leading back-quote ( ` ), also read "back tick" in q circles.

        s1:`q

        s2:`zaphod

A symbol is irreducible, meaning that the individual characters that comprise it are not directly accessible. Symbols are often used in q to hold names of other entities.

Important: A symbol is not a string. We shall see in Lists that there is an analogue of strings in q, namely a list of char. While a list of char is a kissing cousin to a symbol, we emphasize that a symbol is not made up of char. The symbol `a and the char "a" are not the same.

Advanced: You may ask whether a symbol can include embedded blanks and special characters such as back-tick. The answer is yes. You create such a symbol using the relationship between lists of char and symbols. See Creating Symbols from Strings for more on this.

        `$"A symbol with `backtick"

`A symbol with `backtick

Note: A symbol is somewhat akin to a SQL VARCHAR, in that it can hold an arbitrary number of characters. It is different in that it is atomic. The char "q" and the symbol `kdb are both atomic entities.

A major benefit of q is that it can process both time series and relational data in a consistent and efficient manner. Q extends the basic SQL date and time data types to facilitate temporal arithmetic, which is minimal in SQL and can be clumsy in verbose languages (e.g., Java's date library and its use of time zones). We begin with the equivalents to SQL temporal types. The additional temporal types in q deal with constituents of a date or time.

date

A date is stored in four bytes and is denoted by yyyy.mm.dd, where yyyy represents the year, mm the month and dd the day. A date value stores the count of days from Jan 1, 2000.

        d:2006.07.04

2006.07.04

Important: Months and days begin at 1 (not zero) so January is '01'.

Leading zeroes in months and days are required; their omission causes an error.

        bday:2007.1.1

'2007.1.1

Advanced: The underlying day count can be obtained by casting to int.

        `int$2000.02.01
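The day count described above can be reproduced with ordinary date arithmetic. The Python sketch below takes Jan 1, 2000 as day 0, as the text states; the helper names are hypothetical.

```python
from datetime import date, timedelta

Q_EPOCH = date(2000, 1, 1)  # day 0 of the q date type, per the text above

def date_to_count(d: date) -> int:
    """Number of days from Jan 1, 2000 to d."""
    return (d - Q_EPOCH).days

def count_to_date(n: int) -> date:
    """Inverse mapping: the date that lies n days after Jan 1, 2000."""
    return Q_EPOCH + timedelta(days=n)

print(date_to_count(date(2000, 2, 1)))              # January has 31 days
print(count_to_date(date_to_count(date(2006, 7, 4))))
```

Storing dates as plain day counts is what makes date arithmetic and comparison cheap in q.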

time

A time is stored in four bytes and is denoted by hh:mm:ss.uuu where hh represents hours on the 24-hour clock, mm represents minutes, ss represents seconds, and uuu represents milliseconds. A time value stores the count of milliseconds from midnight.

        t:09:04:59.000

09:04:59.000

Again, leading zeros are required in all constituents of a time.

Advanced: The underlying millisecond count can be obtained by casting to int.

        `int$12:34:56.789

45296789

A datetime is the combination of a date and a time, separated by 'T' as in the ISO standard format. A datetime value stores the fractional day count from midnight Jan 1, 2000.

        dt:2006.07.04T09:04:59.000

dt

2006.07.04T09:04:59.000

Advanced: The underlying fractional day count can be obtained by casting to float.

        `float$2000.02.01T12:00:00.000

31.5
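Both underlying values quoted above can be checked by hand. The following Python sketch computes the millisecond count of a time and the fractional day count of a datetime, again taking midnight Jan 1, 2000 as the origin; the function names are hypothetical.

```python
from datetime import datetime, timedelta

def time_to_millis(hh: int, mm: int, ss: int, ms: int = 0) -> int:
    """Milliseconds elapsed since midnight."""
    return ((hh * 60 + mm) * 60 + ss) * 1000 + ms

def datetime_to_days(dt: datetime) -> float:
    """Fractional days elapsed since midnight Jan 1, 2000."""
    return (dt - datetime(2000, 1, 1)) / timedelta(days=1)

print(time_to_millis(12, 34, 56, 789))
print(datetime_to_days(datetime(2000, 2, 1, 12, 0, 0)))
```

Noon on Feb 1, 2000 is 31 whole days plus half a day after the origin, which is why the float value ends in .5.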

month

The month type uses four bytes and is denoted by yyyy.mm with a trailing 'm'. A month value stores the count of months since the beginning of the millennium (January 2000).

        mon:2006.07m

mon

2006.07m

Advanced: The underlying month offset can be obtained by casting to int.

        `int$2000.04m

The minute type uses four bytes and is denoted by hh:mm. A minute value stores the count of minutes from midnight.

        mm:09:04

mm

09:04

Note: We did not use min for the variable name because min is a reserved name in q.

Advanced: The underlying minute offset can be obtained by casting to int.

        `int$01:23

The second type uses four bytes and is denoted by hh:mm:ss. A second value stores a count of seconds from midnight.

        sec:09:04:59

sec

09:04:59

The representation of the second type makes it look like an everyday time value. However, a q time value is a count of milliseconds from midnight, so the underlying values are different.

Advanced: The underlying values can be obtained by casting to int. This manifests the inequality.

        `int$12:34:56

45296

        `int$12:34:56.000

45296000

        12:34:56=12:34:56.789

0b
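The counts behind the month, minute and second types follow the same pattern as dates and times. The Python sketch below assumes, as with dates, that month 0 is January 2000; the function names are hypothetical.

```python
def month_offset(year: int, month: int) -> int:
    """Months elapsed since January 2000 (assumed origin)."""
    return (year - 2000) * 12 + (month - 1)

def minute_count(hh: int, mm: int) -> int:
    """Minutes elapsed since midnight."""
    return hh * 60 + mm

def second_count(hh: int, mm: int, ss: int) -> int:
    """Seconds elapsed since midnight."""
    return (hh * 60 + mm) * 60 + ss

print(month_offset(2000, 4))       # 2000.04m
print(minute_count(1, 23))         # 01:23
print(second_count(12, 34, 56))    # 12:34:56
# The same clock reading as a time is a millisecond count, 1000 times larger.
print(second_count(12, 34, 56) * 1000)
```

Because a second counts seconds and a time counts milliseconds, the two underlying integers differ even when the clock readings look alike.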

The constituents of dates, times and datetimes can be extracted using dot notation. The individual field values are all extracted as int. The field values of a date are named 'year', 'mm' and 'dd'.

        d:2006.07.04

        d.year

        d.mm

        d.dd

Similarly, the field values of time are 'hh', 'mm', 'ss'.

        t:12:45:59.876

        t.hh

        t.mm

        t.ss

Note: At the time of this writing (Jun 2007) there is no syntax to retrieve the millisecond constituent. Use the construct,

        t mod 1000
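Since a time is just a millisecond count, every field can be recovered by repeated division, and the remainder modulo 1000 is the millisecond part. The Python sketch below shows the arithmetic for 12:45:59.876; the function name split_time is hypothetical.

```python
def split_time(millis: int):
    """Split a millisecond-of-day count into (hh, mm, ss, ms)."""
    rest, ms = divmod(millis, 1000)   # ms is what 't mod 1000' returns
    rest, ss = divmod(rest, 60)
    hh, mm = divmod(rest, 60)
    return hh, mm, ss, ms

t = ((12 * 60 + 45) * 60 + 59) * 1000 + 876   # 12:45:59.876 as a count
print(split_time(t))
```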

In addition to the individual field values, you can also extract higher-order constituents.

        d.month

2006.07m

        t.minute

12:45

        t.second

12:45:59

Of course, this works for a datetime as well.

        dt:2006.07.04T12:45:59.876

        dt.date

2006.07.04

        dt.time

12:45:59.876

        dt.month

2006.07m

        dt.mm

        dt.minute

12:45

Advanced: It is a quirk in q that dot notation for accessing temporal constituents does not work on function arguments. For example,

        fmm:{[x] x.mm}

        fmm 2006.09.15

{[x] x.mm}

'x.mm

Instead, cast to the constituent type,

        fmm:{[x] `mm$x}

        fmm 2006.09.15

In addition to the regular numeric and temporal values, special values represent infinities, whose absolute values are greater than any “normal” numeric or temporal value.

Token	Value
0w	Positive float infinity
0W	Positive int infinity
0Wh	Positive short infinity
0Wj	Positive long infinity
0Wd	Positive date infinity
0Wt	Positive time infinity
0Wz	Positive datetime infinity
0n	NaN, or not a number

Important: Observe the distinction between lower case 'w' and upper case 'W'.

The result of dividing any positive (or unsigned) non-zero value by any zero value is positive float infinity, denoted 0w. Dividing a negative value by zero results in negative float infinity, denoted by -0w. The way to remember these is that 'w' looks like the infinity symbol ∞.

The integral infinities cannot be produced via an arithmetic division on normal int values, since the result of division in q is always a float.

The result of dividing any 0 value by any zero value is undefined, so q represents this as the floating point null 0n.

The q philosophy is that any valid arithmetic expression will produce a result rather than an error. Therefore, dividing by 0 produces a special float value rather than an exception. You can perform a complex sequence of calculations without worrying about things blowing up in the middle or inserting cumbersome exception trapping. We shall see more about this in Primitive Operations.
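The division rules above contrast with languages that raise an exception on division by zero. The following Python sketch imitates the behavior described in the text (infinity, negative infinity, or a NaN standing in for the float null); q_div is a hypothetical helper, not a q function.

```python
import math

def q_div(x, y) -> float:
    """Divide like the text describes: never raise, always return a float."""
    x, y = float(x), float(y)
    if y != 0:
        return x / y
    if x == 0 or math.isnan(x):
        return math.nan                  # 0 divided by 0: the float null
    return math.copysign(math.inf, x)    # signed infinity

print(q_div(1, 0))
print(q_div(-1, 0))
print(q_div(0, 0))
print(q_div(6, 3))
```

Plain Python would raise ZeroDivisionError for the first three calls, which is exactly the interruption the q design avoids.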

Advanced: While infinities can participate in arithmetic operations, infinite arithmetic is not implemented. Instead, q performs the operation on the underlying bit patterns. Math propeller heads (including the author) find the following disconcerting.

        0W-2

2147483645

        2*0W

-2
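The two results above are what ordinary four-byte two's-complement arithmetic gives if the int infinity is the largest four-byte signed integer, which the first result implies. The Python sketch below reproduces them by wrapping results into the signed 32-bit range; the names are hypothetical.

```python
def wrap_int32(n: int) -> int:
    """Reduce n to the signed 32-bit range, as fixed-width hardware would."""
    return (n + 2**31) % 2**32 - 2**31

INT_INF = 2**31 - 1   # assumed bit pattern of the int infinity 0W

print(wrap_int32(INT_INF - 2))   # stays in range
print(wrap_int32(2 * INT_INF))   # overflows and wraps around
```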

The concept of a null value generally indicates missing data. This is an area in which q differs from both verbose programming languages and SQL.

In such languages as C++, Java and C#, the concept of a null value applies to complex entities (i.e., objects) that are accessed indirectly by pointer or by reference. A null value for such an entity corresponds to an un-initialized pointer, meaning that it has not been assigned the address of an allocated block of memory. There is no concept of null for entities that are of simple or value type. For those types that admit null, you test for being null by asking if the value is equal to null.

The NULL value in SQL indicates that the data value is inapplicable or missing. The NULL value is distinct from any value that can actually be contained in a field and does not have '=' semantics. That is, you cannot test a field for being null with = NULL. Instead, you ask if it IS NULL. Because NULL is a separate value, Boolean fields actually have three states: 0, 1 and NULL.

In q, the situation is more interesting. While most types have distinct null values, some types have no designated way of representing a null value.

The following table summarizes the way nulls are handled.

type	null
boolean	0b
byte	0x00
short	0Nh
int	0N
long	0Nj
real	0Ne
float	0n
char	" "
sym	`
month	0Nm
date	0Nd
datetime	0Nz
minute	0Nu
second	0Nv
time	0Nt

Let's start with the binary types. As you can see, they have no special null value, which means that null is equivalent to the value zero. Consequently, you cannot distinguish between a missing boolean value and the value that represents false.

In practice, this isn't an issue, since in most applications it isn't a critical distinction. It can be a problem if the default value of a boolean flag in your application is not zero, so you must ensure that this does not occur. A similar precaution applies to byte values.

Next, observe that all the numeric and temporal types have their own designated null values. Here the situation is similar to SQL, in that you can distinguish missing data from data whose underlying value is zero. The difference from SQL is that there is no universal null value.

The advantage of the q approach is that the null values have equals semantics. The tradeoff is that you must use the correct null value in type-checked situations.

Finally, we consider the character types. Considering a symbol to be a variable-length character collection justifies why the symbol null value is the empty symbol, designated by a back-tick ( ` ).

In contrast, the null value for the char type is the char consisting of the blank character ( " " ). As with binary data, you cannot distinguish between a missing char value and a blank value. Again, this is not seriously limiting in practice, but you should ensure that your application does not rely on this distinction.

Note: The value "" is not the char null. Instead, it is the empty list of char.

Data complexity is built up from atoms, which we know, and lists. It is important to achieve a thorough understanding of lists since nearly all q programming involves processing lists. The concepts are simple but complexity can build rapidly. Our approach is to introduce the basic notion of a general list in the first section, take a quick detour to cover simple and singleton lists, then return to cover general lists in more detail.

A list is simply an ordered collection. A collection of what, you ask. More precisely, a list is an ordered collection of atoms and other lists. Since this definition is recursive, let's start with the simplest case in which the list comprises only atoms.

The notation for a general list encloses its items within matching parentheses and separates them with semicolons. For readability, optional whitespace is used after the semicolon separators in the last example.

        (1;2;3)

        ("a";"b";"c";"d")

        (`Life;`the;`Universe;`and;`Everything)

        (-10.0; 3.1415e; 1b; `abc; "z")

In the preceding examples, the first three lists are simple, meaning that the list comprises atoms of uniform type. The last example is a general list, meaning that it is not simple. Otherwise put, a general list contains items that are not atoms of a uniform type. This could be atoms of mixed type, nested lists of uniform type, or nested lists of mixed type.

Important: The order of the items in the list is positional (i.e., left-to-right) and is part of its definition. The lists (1;2) and (2;1) are different. SQL is based on sets, which are inherently unordered. This distinction leads to some subtle differences between the results of queries on q tables versus the result sets from analogous SQL queries. The inherent ordering of lists makes time series processing natural and fast in q, while it is cumbersome and performs poorly in standard SQL.

Lists can be assigned to variables exactly like atoms.

        L1:(1;2;3)

        L2:("z";"a";"p";"h";"o";"d")

        L3:(`Life;`the;`Universe;`and;`Everything)

        L4:(0b;1b;0b;1b;1b;0b)

        L5:(-10.0;3.1415e;1b;`abc;"z")

count

The number of items in a list is its count. You can obtain the count of a list as follows,

        count L1

This is our first example of a function, which we will learn about in Functions. For now, we need only understand that count returns an int value equal to the number of items in a list to its right.

Observe that the count of any atom is 1.

        count 42

        count `abcd

A simple list - that is, a list of atoms of a uniform type - corresponds to the mathematical notion of a vector. Such lists are treated specially in q. They have a simplified notation, take less storage and compute faster than general lists. Of course, you can use general list notation for a vector, but q converts a general list to a vector whenever feasible.

A simple list of any numeric type omits the enclosing parentheses and replaces the separating semi-colons with blanks. The following two expressions for a simple list of int are equivalent,

        (100;200;300)

        100 200 300

This is confirmed by the console display,

        (100;200;300)

100 200 300

Similar notation is used for simple lists of short and long with the addition of the type indicator.

        H:(1h;2h;255h)

1 2 255h

We conclude that a trailing type indicator in the display applies to the entire list and not just the last item of the list; otherwise, the list would not be simple and would be displayed in general form.

        G:(1; 2; 255h)

        G

1

2

255h

Simple lists of float and real are notated similarly. Observe that the q console suppresses the decimal point when displaying a float having zero(s) to the right of the decimal, but the value is not an int.

        F:(123.4567;9876.543;99.0)

123.4567 9876.543 99

This notational efficiency for float display means that a list of floats having no decimal parts displays with a trailing f.

        FF:1.0 2.0 3.0

FF

1 2 3f

The simplified notation for a simple list of binary data juxtaposes the individual data values together with a type indicator. The type indicator for boolean trails the value.

        bits:(0b;1b;0b;1b;1b)

        bits

01011b

The indicator for byte leads,

        bytes:(0x20;0xa1;0xff)

        bytes

0x20a1ff

Note: A simple list of boolean atoms requires the same number of bytes to store as it has atoms. While the simplified notation is suggestive, multiple bits are not compressed to fit inside a single byte. The list bits above holds its values in 5 bytes of storage.

The simplified notation for simple lists of symbols juxtaposes the individual atoms with no intervening whitespace.

        symbols:(`Life;`the;`Universe;`and;`Everything)

        symbols

`Life`the`Universe`and`Everything

Inserting spaces between the atoms causes an error.

        bad:`This `is `wrong

'is

The simplified notation for a list of char looks just like a string in most languages, with the juxtaposed sequence of characters enclosed in double quotes.

        chars:("s";"o";" ";"l";"o";"n";"g")

        chars

"so long"

Note: A simple list of char is called a string.

Lists can be defined using simplified notation,

        L:100 200 300

        H:1 2 255h

        F:123.4567 9876.543 99.99

        bits:01011b

        bytes:0x20a1ff

        symbols:`Life`the`Universe`and`Everything

        chars:"so long"

Finally, we observe that a list entered as intermixed ints and floats is converted to a simple list of floats.

        1 2.0 3

1 2 3f

Specifying a list of mixed temporal types has a different behavior from that of a list of mixed numeric types. In this case, the list takes the type of the first item in the list; other items are widened or narrowed to match.

        12:34 01:02:03

12:34 01:02

        01:02:03 12:34

01:02:03 12:34:00

To force the type of a mixed list of temporal values, append a type specifier.

        01:02:03 12:34 11:59:59.999u

01:02 12:34 11:59

Lists with one or no items merit special consideration.

It is useful to have lists with no items. A pair of parentheses with nothing (except possibly whitespace) between denotes the empty list.

        L:(  )

We shall see in Creating Typed Empty Lists that it is possible to define an empty list with a specific type.

There is a quirk in q regarding how it handles a list containing a single item, called a singleton. Creation of a singleton presents a notational problem. To see the issue, first realize that a list containing a single atom is distinct from the individual atom. As any UPS driver will readily tell you, an item in a box is not the same as an unboxed item. By now, we recognize the following as atoms,

1b

        0x2a

        `beeblebrox

"z"

We also recognize the following are all lists with two elements,

        (42;6)

01b

        `zaphod`beeblebrox

        "zb"

        (40;`two)

How to create a list of a single item? Good question. The answer is that there is no syntactic way to do so. You might think that you could simply enclose the item in parentheses, but this doesn't work since the result is an atom.

        singleton:(42)

        singleton

42

The reason for this is that parentheses are used for multiple purposes in q. As we have seen, paired parentheses are used to delimit items in the specification of a general list. Paired parentheses are also used for grouping in expressions - that is, to isolate the result of the expression inside the parentheses. The latter usage forces (42) to be the same as the atom 42 and so precludes the intention in the specification of singleton above.

The way to make a list with a single item is to use the enlist function, which returns a singleton list containing what is to its right.

        singleton:enlist 42

        singleton

,42

To distinguish between an atom and the equivalent singleton, examine the sign of their types.

        signum type 42

-1

        signum type enlist 42

As a final check before moving on, make sure that you understand that the following also defines a list containing a single item,

        singleton:enlist 1 2 3

        count singleton
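The boxed-versus-unboxed distinction exists in other languages too. In Python, for instance, parentheses around a single value are also just grouping, and a one-item list must be built explicitly. The sketch below is an analogy only; the enlist function defined here is a hypothetical stand-in for the q function of the same name.

```python
def enlist(x):
    """Wrap x in a one-item list, whatever x is."""
    return [x]

atom = (42)                      # parentheses are grouping: still the value 42
singleton = enlist(42)           # a list holding one atom
nested = enlist([1, 2, 3])       # a list holding one item that is itself a list

print(atom == 42, isinstance(atom, list))
print(singleton, len(singleton))
print(nested, len(nested))
```

The last line is the case to keep in mind: the count is 1 because the single item happens to be a three-item list.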

Recall that a list is ordered from left to right by the position of its items. The offset of an item from the beginning of the list is called its index. Thus, the first item has index 0, the second item (if there is one) has index 1, etc. A list of count n has index domain 0 to n-1.

Given a list L, the item at index i is accessed by L[i]. Retrieving an item by its index is called item indexing. For example,

        L:(-10.0;3.1415e;1b;`abc;"z")

        L[0]

-10f

        L[1]

3.1415e

        L[2]

1b

        L[3]

`abc

        L[4]

"z"

Items in a list can also be assigned via item indexing. Thus,

        L1:1 2 3

        L1[2]:42

L1

1 2 42

Important: Index assignment into a simple list enforces strict type matching with no type promotion. Otherwise put, when you reassign an item in a simple list, the type must match exactly and a narrower type is not widened.

        L:100 200 300

        L[1]:42h

'type

        f:100.0 200.0 300.0

100 200 300f

        f[1]:400

'type

This may come as a surprise if you are accustomed to numeric values always being promoted to wider types in a verbose language.
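The strict matching rule can be modeled in a few lines. The Python class below is a hypothetical illustration, not how q is implemented: it remembers the type of its items and rejects any assignment whose type differs, even where a numeric promotion would be harmless.

```python
class SimpleList:
    """A list of uniformly typed items that refuses mismatched assignments."""

    def __init__(self, items):
        self.items = list(items)
        self.item_type = type(self.items[0])
        if any(type(x) is not self.item_type for x in self.items):
            raise TypeError("type")

    def __setitem__(self, i, value):
        # Exact type match required: an int is not widened to a float.
        if type(value) is not self.item_type:
            raise TypeError("type")
        self.items[i] = value

f = SimpleList([100.0, 200.0, 300.0])
try:
    f[1] = 400            # int into a float list: rejected
except TypeError as e:
    print("error:", e)
f[1] = 400.0              # matching type: accepted
print(f.items)
```

The payoff of such strictness, in q as in this toy model, is that every item in the vector is guaranteed to have the same width and representation.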

Providing an invalid data type for the index results in an error.

        L:(-10.0;3.1415e;1b;`abc;"z")

        L[`1]

'type

If you attempt to index outside of the bounds of the list, the result is not an error. Rather, you get a null value. If the list is simple, this is the null for the type of atoms in the list. For general lists, the result is 0n.

        L[5]

0n

One way to understand this is that the result of asking for a non-existent index is "missing value." Keep this in mind, since indexing one position past the end of the list is easy to do, especially if you're not used to indexing relative to 0.
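For comparison, most languages raise an error on an out-of-range index. The Python sketch below shows what a "missing value instead of an error" lookup looks like, with None standing in for the q null; item_at is a hypothetical helper.

```python
def item_at(lst, i):
    """Return lst[i], or None (standing in for null) when i is out of range."""
    if 0 <= i < len(lst):
        return lst[i]
    return None

L = [-10.0, 3.1415, True, "abc", "z"]
print(item_at(L, 4))   # last valid index
print(item_at(L, 5))   # one past the end: missing value, no exception
```

The explicit range check matters in Python because a negative index would otherwise count from the end of the list rather than being treated as out of range.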

An empty index returns the entire list.

L[]

-10f

3.1415e

1b

`abc

"z"

Note: An empty index is not the same as indexing with an empty list. The latter returns an empty list.

        L[()]

The syntactic form double-colon ( :: ) denotes the null item, which allows explicit notation or programmatic generation of an empty index.

        L[::]

-10f

3.1415e

1b

`abc

"z"

Advanced: The type of the null item is undefined; in particular, its type does not match that of any normal item in a list. As a consequence, inclusion of the null item in a list forces the list to be general.

        L:(1;2;3;::)

::

        type L

0h

This can be used to avoid a nasty surprise when q is too clever. To see how, consider the general list,

        L:(1;2;3;`a)

        type L

0h

Now, reassign the last item to an int and note what happens to the list.

        L[3]:4

1 2 3 4

        type L

6h

The list has been converted to a simple list of int! A subsequent attempt to reassign the last item back to its original value fails with a type error.

        L[3]:`a

'type

This can be circumvented by placing a null item in the list, forcing it to remain general.

        L:(1;2;3;`a;::)

        L[3]:4

::

       type L

0h

        L[3]:`a

`a

::

Lists can be created from variables.

        L1:(1;2;100 200)

        L2:(1 2 3;`ab`c)

        L6:(L1;L2)

L6

1     2   100 200

1 2 3 `ab`c

We scoop our presentation on operations in the next chapter to describe an important operation on lists. Probably the most common operation on two lists is to join them together to form a larger list. More precisely, the join operator (,) appends its right operand to the end of the left operand and returns the result. It accepts an atom in either argument.

        1 2,3 4 5

1 2 3 4 5

        1,2 3 4

1 2 3 4

        1 2 3,4

1 2 3 4

Observe that if the arguments are not of uniform type, the result is a general list.

        1 2 3,4.4 5.5

1

2

3

4.4

5.5

        1 2 3,"ab"

1

2

3

"a"

"b"

Note: To accept either a scalar or a list x and produce a uniform shape, use the idiom,

        (),x

which always yields a list with the content of x.
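The same normalization problem arises in any language where a function may receive either one value or many. A Python analogue of the idiom might look like the sketch below; as_list is a hypothetical name, and treating only list and tuple as "already a list" is an assumption made for the example.

```python
def as_list(x):
    """Return x unchanged in content but always shaped as a list."""
    if isinstance(x, (list, tuple)):
        return list(x)
    return [x]

print(as_list(42))
print(as_list([1, 2, 3]))
print(len(as_list(42)), len(as_list([1, 2, 3])))
```

After this step, downstream code can iterate or take a count without first checking whether it was handed an atom.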

Thus far, we have viewed a list as a static collection of its items. We can also consider a list to be a mapping provided by item indexing. Specifically, a list L of count n represents a monadic mapping over the domain of non-negative integers 0,...,n-1. The list mapping assigns the output value L[i] to the input value i. Succinctly, the I/O association for the list is,

        i ——> L[i]

Here are the I/O tables for some basic lists:

101 102 103 104

I	O
0	101
1	102
2	103
3	104

(`a; 123.45; 1b)

I	O
0	`a
1	123.45
2	1b

(1 2; 3 4)

I	O
0	1 2
1	3 4

The first two examples demonstrate ranges of a collection of atoms. The last example has a range comprised of lists.

A list not only looks like a map, it is a map whose notation is a shortcut for the I/O table assignment. This is a useful way of looking at things. We shall see in Primitive Operations that a nested list can be viewed as a multivalent map whose range is atoms.

From the perspective of list as map, the fact that indexing outside the bounds of a list returns null means the map is implicitly extended to the domain of all integers with null values outside the list items.
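The list-as-map view can be made literal by building the I/O table as a dictionary keyed by index. In the Python sketch below, dict.get supplies the implicit extension to all integers, with None standing in for null; the function name io_table is hypothetical.

```python
def io_table(lst):
    """Build the explicit index -> item mapping that a list represents."""
    return dict(enumerate(lst))

table = io_table([101, 102, 103, 104])
print(table)
print(table.get(2))     # inside the domain 0..3
print(table.get(7))     # outside the domain: missing value
```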

Data complexity is built by using lists as items of lists.

Depth

Now that we're comfortable with simple lists, we return to general lists. We can nest by including lists as items of lists. The number of levels of nesting for a list is called its depth. Atoms are considered to have depth 0 and simple lists have depth 1.

The notation of complex lists reflects their nesting. For pedagogical purposes, in this section, we shall often use general notation to define even simple lists; however, the console always displays lists in simplified form. In subsequent sections, we shall use only simplified notation for simple lists.

Following is a list of depth 2 that has three items, the first two being atoms and the last a list.

        L1: (1;2;(100;200))

        count L1

Following is the simplified notation for the inner list,

        L1:(1;2;100 200)

L1

1

2

100 200

We present a pictorial representation that may help in visualizing levels of nesting. An atom is represented as a circle containing its value. A list is represented as a box containing its items. A general list is a box containing boxes and atoms.

Following is a list of depth two having two elements, each of which is a simple list,

        L2:((1;2;3);(`ab;`c))

L2

1 2 3

`ab`c

        count L2

Following is a list of depth two having three elements, each of which is a general list,

        L3:((1;2h;3j);("a";`bc);(1.23;4.56e))

        L3

(1;2h;3j)

("a";`bc)

(1.23;4.55999994278e)

        count L3

3

Following is a list of depth two having one item that is a simple list,

        L4:enlist 1 2 3 4

        L4

1 2 3 4

        count L4

1

        L4[0]

1 2 3 4

Following is a list of depth three having two items. The second item is a list of depth two having three items, the last of which is a simple list of four items.

        L5:(1;(100;200;(1000;2000;3000;4000)))

        L5

1

(100;200;1000 2000 3000 4000)

        count L5

2

        count L5[1]

3
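The depths quoted for the examples above can be checked with a small recursive function. This is only a sketch: depth is a hypothetical helper, not a built-in, and it assumes its argument is an atom or a list (dictionaries, tables and functions are not handled).

```q
/ depth: 0 for an atom, 1 for a simple or empty list, else 1 + the deepest item
depth:{$[0h>type x;0;0h<type x;1;0=count x;1;1+max .z.s each x]}

depth 42                                     / 0
depth 1 2 3                                  / 1
depth (1;2;(100;200))                        / 2
depth enlist 1 2 3 4                         / 2
depth (1;(100;200;(1000;2000;3000;4000)))    / 3
```

The function relies on the fact that type returns a negative value for atoms, a positive value for simple lists and 0h for general lists.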

Following is a "rectangular" list that can be thought of as a 3x4 matrix,

        m:((11;12;13;14);(21;22;23;24);(31;32;33;34))

        m

11 12 13 14

21 22 23 24

31 32 33 34

It is possible to index directly into the items of a nested list.

Retrieving an item via a single index always retrieves an uppermost item from a nested list.

        L:(1;(100;200;(1000;2000;3000;4000)))

        L[0]

1

        L[1]

100

200

1000 2000 3000 4000

Recalling that q evaluates expressions from right-to-left, we interpret the second retrieval above as,

· Retrieve the item at index 1 from L

Alternatively, reading it functionally as left-of-right,

· Retrieve from L the item at index 1

Since the result L[1] is itself a list, we can retrieve its elements using a single index.

        L[1][2]

1000 2000 3000 4000

Read this as:

· Retrieve the item at index 2 from the item at index 1 in L

or,

· Retrieve the item at index 1 from L, and from it retrieve the item at index 2

We can repeat single indexing once more to retrieve an item from the innermost nested list.

        L[1][2][0]

1000

Read this as,

· Retrieve the item at index 0 from the item at index 2 in the item at index 1 in L

or,

· Retrieve the item at index 1 from L, and from it retrieve the item at index 2, and from it retrieve the item at index 0

There is an alternate notation for repeated indexing into the constituents of a nested list. The last retrieval can also be written as,

        L[1;2;0]

1000

Retrieving inner items for a nested list with this notation is called indexing at depth.

Important: The semicolons in indexing at depth are critical.

Assignment via index also works at depth.

        L:(1;(100;200;(1000 2000 3000 4000)))

        L[1;2;0]:999

        L

1

(100;200;999 2000 3000 4000)

To verify that the notation for indexing at depth is reasonable, we return to our matrix example,

        m:((11;12;13;14);(21;22;23;24);(31;32;33;34))

        m[0;2]

13

        m[0][2]

13

The indexing at depth notation suggests thinking of m as a multi-dimensional matrix, whereas repeated single indexing suggests thinking of m as an array of arrays. Chacun à son goût.

A list of positions can be used to index a list.

In this section, we begin to see the power of q for manipulating lists. We start with,

        L1:100 200 300 400

We know how to index single items of the list,

        L1[0]

100

        L1[2]

300

By extension, we can retrieve a list of multiple items via multiple indices,

        L1[0 2]

100 300

The indices can be in any order, and the corresponding items are retrieved,

        L1[3 2 0 1]

400 300 100 200

An index can be repeated,

        L1[0 2 0]

100 300 100

Some more examples,

        bits:01101011b

        bits[0 2 4]

011b

        chars:"beeblebrox"

        chars[0 7 8]

"bro"

This explains why including the semicolon separators is essential when indexing at depth. Leaving them out effectively specifies multiple indices, and you will get a corresponding list of values from the top level as a result.
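The difference is easy to see on the 3x4 matrix used earlier. In this sketch the same two numbers appear inside the brackets, once with a semicolon and once without.

```q
m:(11 12 13 14;21 22 23 24;31 32 33 34)
m[0;2]    / 13: indexing at depth, row 0 then column 2
m[0 2]    / (11 12 13 14;31 32 33 34): rows 0 and 2 from the top level
```

With the semicolon the result is a single atom; without it the result is a list of two rows.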

You have no doubt noticed that retrieving items via multiple indices looks just like we've substituted a list for the index. Indeed, this is exactly what is happening. Here are some examples of a simple index list,

        I:3 2 0

        L1[I]

400 300 100

         L2:(-10.0;3.1415e;1b;`abc;"z")

         L2[I]

`abc

1b

-10f

        L3:(1;((100;200);(1000;2000;3000;4000));5;(600 700))

        L3

1

(100 200;1000 2000 3000 4000)

5

600 700

        J:2 1 0

        L3[J]

5

(100 200;1000 2000 3000 4000)

1

Observe that in every case, the result of indexing a given list via a simple list is a new list whose values are retrieved from the first level of the given list and whose shape is the same as that of the index list. This suggests the behavior with an index that is a non-simple list.

        L1:100 200 300 400

        L1[(0 1; 2 3)]

100 200

300 400

        I:(1;(0;(3 2)))

        L1[I]

200

(100;400 300)

To figure out the result of indexing by any non-simple list, start with the fact that the result always has the same shape as the index.

Advanced: More precisely, the result of indexing via a list conforms to the index list. The notion of conformability of lists is defined recursively. All atoms conform. Two lists conform if they have the same number of items and each of their corresponding items conform. In plain language, two lists conform if they have the same shape.
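The recursive definition of conformability translates almost word for word into q. The following is a sketch written in script form (the continuation lines are indented); conform is a hypothetical helper, not a built-in, and it is meant only for atoms and lists.

```q
/ conform: 1b when x and y have the same shape
conform:{[x;y]
  $[(0h>type x) and 0h>type y; 1b;   / two atoms always conform
    (0h>type x) or 0h>type y;  0b;   / an atom never conforms to a list
    (count x)<>count y;        0b;   / lists of different counts do not conform
    all .z.s'[x;y]]}                 / otherwise compare corresponding items

L1:100 200 300 400
I:(1;(0;3 2))
L1[I]                  / (200;(100;400 300))
conform[L1 I;I]        / 1b
conform[1 2 3;4 5]     / 0b
```

The check on L1[I] confirms, for this one index, that the result of indexing has the same shape as the index list.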

Recall that a list item can be assigned via item indexing,

        L:100 200 300 400

        L[0]:1000

        L

1000 200 300 400

Assignment via index extends to indexing via a simple list.

        L:100 200 300 400

        L[1 2 3]:2000 3000 4000

        L

100 2000 3000 4000

Note: Assignment via a simple index list is processed in index order, i.e., from left to right. Thus,

        L[3 2 1]:999 888 777

is equivalent to,

        L[3]:999

        L[2]:888

        L[1]:777

Consequently, in the case of a repeated item in the index list, the right-most assignment prevails.

        L:100 200 300 400

        L[0 1 0 3]:1000 2000 3000 4000

        L

3000 2000 300 4000

You can assign a single value to multiple items in a list by indexing on a simple list and using an atom for the assignment value.

        L:100 200 300 400

        L[1 3]:999

        L

100 999 300 999

Now that we're familiar with retrieving and assigning via an index list, we introduce a simplified notation. It is permissible to leave out the brackets and juxtapose the list and index with a separating blank. Some examples follow.

        L:100 200 300 400

        L[0]

100

        L 0

100

        L[2 1]

300 200

        L 2 1

300 200

        I:2 1

        L[I]

300 200

        L I

300 200

        L[::]

100 200 300 400

        L ::

100 200 300 400

Which notation you use is a matter of personal preference. In this manual, we usually use brackets, since this notation is probably most familiar from verbose programming. Experienced q programmers often use juxtaposition since it reduces notational density.

The dyadic primitive find ( ? ) returns the index of the right operand in the left operand list.

        1001 1002 1003?1002

1

Performing find on a list is the inverse to positional indexing because it maps an item to its position.

If you try to find an item that is not in the list, the result is an int equal to the count of the list.

        1001 1002 1003?1004

3

The way to think of this result is that the position of an item that is not in the list is one past the end of the list, which is where it would be if you were to append it to the list.
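The "one past the end" convention makes find convenient for lookups with a default value: give the list of results one extra item, and every key that is not found lands on it. The sketch below illustrates this; the names ids and names and their values are invented for the example.

```q
ids:1001 1002 1003
names:`ann`bob`cat,`unknown    / the extra last item serves as the default
names ids?1002 1004            / `bob`unknown
ids[ids?1003]                  / 1003: find followed by indexing returns the item itself
```

The last line shows the inverse relationship between find and positional indexing for an item that is present in the list.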

Of course, find extends to lists of items.

        1001 1002 1003?1003 1001

2 0

We return to the situation of indexing at depth for nested lists. For simplicity, let's start with a list that looks like a matrix.

        m:(1 2 3 4; 100 200 300 400; 1000 2000 3000 4000)

Analogy with traditional matrix notation suggests that we could retrieve a row or column from m by providing a "partial" index at depth. Indeed, this works.

        m[1;]

100 200 300 400

        m[;3]

4 400 4000

Observe that eliding the last index reduces to item indexing at the top level.

        m[1;]

100 200 300 400

        m[1]

100 200 300 400

Note: In the previous example, the two syntactic forms have the same result, but the first more clearly connotes the situation.

The situation of eliding other than the last index is more interesting. The way to read m[;3] above is,

· Retrieve the items at index 3 (the fourth position) from all items at the top level of m
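To make this reading concrete, the sketch below retrieves the same column in three ways. The second form applies the index to every row explicitly; the third uses flip, which transposes a rectangular list and is only used here as a cross-check.

```q
m:(1 2 3 4;100 200 300 400;1000 2000 3000 4000)
m[;3]           / 4 400 4000
{x 3} each m    / 4 400 4000: index 3 applied to each row in turn
(flip m) 3      / 4 400 4000: item 3 of the transpose
```

All three expressions return the same simple list, so the elided first index can be thought of as "for every row".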

Let's tackle another level of nesting.

        L:((1 2 3;4 5 6 7);(`a`b`c`d;`z`y`x`;`0`1`2);("now";"is";"the"))

        L

(1 2 3;4 5 6 7)

(`a`b`c`d;`z`y`x`;`0`1`2)

("now";"is";"the")

        L[;1;]

4 5 6 7

`z`y`x`

"is"

        L[;;2]

3 6

`c`x`2

"w e"

Interpret L[;1;] as,

· Retrieve all items in the second position of each list at the top level

Interpret L[;;2] as,

· Retrieve the items in the third position for each list at the second level

Observe that in L[;;2] the attempt to retrieve the item at the third position of the string "is" resulted in the null value " "; hence the blank in "w e" of the result.
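The blank can be examined in isolation. This short sketch indexes the string "is" past its end and confirms that the result is the char null.

```q
"is"[2]                   / " "
null "is"[2]              / 1b: the blank is the null for type char
("now";"is";"the")[;2]    / "w e"
```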

Recommendation: In general, it will make things more evident if you do not omit trailing semicolons when eliding indices. For example, with L as above,

        L[;;]                 / instead of L[]

        L[1;;]                / instead of L[1]

        L[;1;]                / instead of L[;1]

As the final exam for this section, let's combine an elided index with indexing by simple arrays. Let L be as above. Then we can retrieve a cross-section of L using a combination of elided and list indices.

        L[0 2;;0 1]

(1 2;4 5)

("no";"is";"th")

Interpret this as,

· Retrieve the items from positions 0 and 1 from all columns in rows 0 and 2

In this section, we further investigate the matrix-like lists from the previous section. A "rectangular" list is a list of lists, all having the same count. Understand that this does not mean that a rectangular list is necessarily a traditional matrix, since there can be additional levels of nesting. For example, the following list is rectangular because each of its items has count three, but is not a matrix.

        L:(1 2 3; (10 20; 100 200; 1000 2000))

        L

1         2         3

10   20   100  200  1000 2000
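The definition of a rectangular list can be tested directly by comparing the counts of the items. The helper below is a sketch; rect is a hypothetical name, not a built-in.

```q
/ rect: 1b when every item of x has the same count
rect:{1=count distinct count each x}

rect (1 2 3;(10 20;100 200;1000 2000))    / 1b: both items have count 3
rect (1 2 3;4 5)                          / 0b: counts 3 and 2 differ
```

Note that the check looks only at the first level, which is all that the definition requires; it says nothing about whether the list is a matrix.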

In a rectangular list, elision of the second index corresponds to generalized row retrieval and elision of the first index corresponds to generalized column retrieval.

        r:(`a`b`c;(1 2 3 4;10 20 30 40;100 200 300 400))

        r[0;]

`a`b`c

        r[;1]

`b

10 20 30 40

Advanced: A rectangular list can be transposed with flip (see flip), meaning that the rows and columns are reflected, effectively reversing the first two indices in indexing at depth. For example, the transpose of L above is,

        flip L

1 10 20

2 100 200

3 1000 2000
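The reversal of the first two indices can be verified on this example. In the sketch below, T is a name introduced here for the transposed list.

```q
L:(1 2 3;(10 20;100 200;1000 2000))
T:flip L
L[1;2]           / 1000 2000
T[2;1]           / 1000 2000
L[0;1]~T[1;0]    / 1b
```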

Matrices are a special case of rectangular lists and can most easily be defined recursively. A matrix of dimension 1 is a simple list. In the context of mathematical operations, the simple list would have numeric type, but this is not a restriction. The count of a one-dimensional matrix is called its size. In some contexts, a simple one-dimensional matrix is called a vector, its count is called its length, and an atom is called a scalar. Some examples.

        v1:1 2 3

        v2:98.60 99.72 100.34 101.93

        v3:`so`long`and`thanks`for`all`the`fish

For n>1, we define a matrix of dimension n recursively as a list of matrices of dimension n-1 all having the same size. Thus, a matrix of dimension 2 is a list of matrices of dimension 1, all having the same size. If all items in a matrix have the same type, we call this the type of the matrix.

Two-dimensional matrices are frequently encountered and have special terminology. Let m be a two-dimensional matrix. The items of m are its rows. As we have already seen, the i-th row of m can be obtained via item indexing, m[i], or equivalently via an elided index, m[i;].
