Selection

Selection

Select, transform and filter a table.

Select

Base.Sort.selectFunction.

select(t::Table, which::Selection)

Select all or a subset of columns, or a single column from the table.

Selection is a type union of many types that can select from a table. It can be:

  1. Integer – returns the column at this position.

  2. Symbol – returns the column with this name.

  3. Pair{Selection => Function} – selects and maps a function over the selection, returns the result.

  4. AbstractArray – returns the array itself. This must be the same length as the table.

  5. Tuple of Selection – returns a table containing a column for every selector in the tuple. The tuple may also contain the type Pair{Symbol, Selection}, which the selection a name. The most useful form of this when introducing a new column.

Examples:

Selection with Integer – returns the column at this position.

julia> tbl = table([0.01, 0.05], [2,1], [3,4], names=[:t, :x, :y], pkey=:t)
Table with 2 rows, 3 columns:
t     x  y
──────────
0.01  2  3
0.05  1  4

julia> select(tbl, 2)
2-element Array{Int64,1}:
 2
 1

Selection with Symbol – returns the column with this name.

julia> select(tbl, :t)
2-element Array{Float64,1}:
 0.01
 0.05

Selection with Pair{Selection => Function} – selects some columns and maps a function over it, then returns the mapped column.

julia> select(tbl, :t=>t->1/t)
2-element Array{Float64,1}:
 100.0
  20.0

Selection with AbstractArray – returns the array itself.

julia> select(tbl, [3,4])
2-element Array{Int64,1}:
 3
 4

Selection with Tuple– returns a table containing a column for every selector in the tuple.

julia> select(tbl, (2,1))
Table with 2 rows, 2 columns:
x  t
───────
2  0.01
1  0.05

julia> vx = select(tbl, (:x, :t)=>p->p.x/p.t)
2-element Array{Float64,1}:
 200.0
  20.0

julia> select(tbl, (:x,:t=>-))
Table with 2 rows, 2 columns:
x  t
────────
1  -0.05
2  -0.01

Note that since tbl was initialized with t as the primary key column, selections that retain the key column will retain its status as a key. The same applies when multiple key columns are selected.

Selection with a custom array in the tuple will cause the name of the columns to be removed and replaced with integers.

julia> select(tbl, (:x, :t, [3,4]))
Table with 2 rows, 3 columns:
1  2     3
──────────
2  0.01  3
1  0.05  4

This is because the third column's name is unknown. In general if a column's name cannot be determined, then selection returns an iterable of tuples rather than named tuples. In other words, it strips column names.

To specify a new name to a custom column, you can use Symbol => Selection selector.

julia> select(tbl, (:x,:t,:z=>[3,4]))
Table with 2 rows, 3 columns:
x  t     z
──────────
2  0.01  3
1  0.05  4

julia> select(tbl, (:x, :t, :minust=>:t=>-))
Table with 2 rows, 3 columns:
x  t     minust
───────────────
2  0.01  -0.01
1  0.05  -0.05

julia> select(tbl, (:x, :t, :vx=>(:x,:t)=>p->p.x/p.t))
Table with 2 rows, 3 columns:
x  t     vx
──────────────
2  0.01  200.0
1  0.05  20.0
source

Map

Base.mapMethod.

map(f, t::Table; select)

Apply f to every row in t. select selects fields passed to f.

Returns a new table if f returns a tuple or named tuple. If not, returns a vector.

Examples

julia> t = table([0.01, 0.05], [1,2], [3,4], names=[:t, :x, :y])
Table with 2 rows, 3 columns:
t     x  y
──────────
0.01  1  3
0.05  2  4

julia> manh = map(row->row.x + row.y, t)
2-element Array{Int64,1}:
 4
 6

julia> polar = map(p->@NT(r=hypot(p.x + p.y), θ=atan2(p.y, p.x)), t)
Table with 2 rows, 2 columns:
r    θ
────────────
4.0  1.24905
6.0  1.10715

select argument selects a subset of columns while iterating.


julia> vx = map(row->row.x/row.t, t, select=(:t,:x)) # row only cotains t and x
2-element Array{Float64,1}:
 100.0
  40.0

julia> map(sin, polar, select=:θ)
2-element Array{Float64,1}:
 0.948683
 0.894427
source
Base.mapMethod.
map(f, x::NDSparse; select)

Apply f to every data value in x. select selects fields passed to f. By default, the data values are selected.

If the return value of f is a tuple or named tuple the result will contain many data columns.

Examples

julia> x = ndsparse(@NT(t=[0.01, 0.05]), @NT(x=[1,2], y=[3,4]))
1-d NDSparse with 2 values (2 field named tuples):
t    │ x  y
─────┼─────
0.01 │ 1  3
0.05 │ 2  4

julia> manh = map(row->row.x + row.y, x)
1-d NDSparse with 2 values (Int64):
t    │
─────┼──
0.01 │ 4
0.05 │ 6

julia> vx = map(row->row.x/row.t, x, select=(:t,:x))
1-d NDSparse with 2 values (Float64):
t    │
─────┼──────
0.01 │ 100.0
0.05 │ 40.0

julia> polar = map(p->@NT(r=hypot(p.x + p.y), θ=atan2(p.y, p.x)), x)
1-d NDSparse with 2 values (2 field named tuples):
t    │ r    θ
─────┼─────────────
0.01 │ 4.0  1.24905
0.05 │ 6.0  1.10715

julia> map(sin, polar, select=:θ)
1-d NDSparse with 2 values (Float64):
t    │
─────┼─────────
0.01 │ 0.948683
0.05 │ 0.894427
source

Filter

Base.filterFunction.

filter(pred, t::Union{NextTable, NDSparse}; select)

Filter rows in t according to pred. select choses the fields that act as input to pred.

pred can be:

  • A function - selected structs or values are passed to this function

  • A tuple of column => function pairs: applies to each named column the corresponding function, keeps only rows where all such conditions are satisfied.

By default, filter iterates a table a row at a time:

julia> t = table(["a","b","c"], [0.01, 0.05, 0.07], [2,1,0],
                 names=[:n, :t, :x])
Table with 3 rows, 3 columns:
n    t     x
────────────
"a"  0.01  2
"b"  0.05  1
"c"  0.07  0

julia> filter(p->p.x/p.t < 100, t) # whole row
Table with 2 rows, 3 columns:
n    t     x
────────────
"b"  0.05  1
"c"  0.07  0

By default, filter iterates by values of an NDSparse:

julia> x = ndsparse(@NT(n=["a","b","c"], t=[0.01, 0.05, 0.07]), [2,1,0])
2-d NDSparse with 3 values (Int64):
n    t    │
──────────┼──
"a"  0.01 │ 2
"b"  0.05 │ 1
"c"  0.07 │ 0

julia> filter(y->y<2, x)
2-d NDSparse with 2 values (Int64):
n    t    │
──────────┼──
"b"  0.05 │ 1
"c"  0.07 │ 0

If select is specified. (See Selection convention) then, the selected values will be iterated instead.

julia> filter(iseven, t, select=:x)
Table with 2 rows, 3 columns:
n    t     x
────────────
"a"  0.01  2
"c"  0.07  0

julia> filter(p->p.x/p.t < 100, t, select=(:x,:t))
Table with 2 rows, 3 columns:
n    t     x
────────────
"b"  0.05  1
"c"  0.07  0

select works similarly for NDSparse:

julia> filter(p->p[2]/p[1] < 100, x, select=(:t, 3))
2-d NDSparse with 2 values (Int64):
n    t    │
──────────┼──
"b"  0.05 │ 1
"c"  0.07 │ 0

Here 3 represents the third column, which is the values, p is a tuple of t field and the value.

Filtering by many single columns can be done by passing a tuple of column_name => function pairs.

julia> filter((:x=>iseven, :t=>a->a>0.01), t)
Table with 1 rows, 3 columns:
n    t     x
────────────
"c"  0.07  0

julia> filter((3=>iseven, :t=>a->a>0.01), x) # NDSparse
2-d NDSparse with 1 values (Int64):
n    t    │
──────────┼──
"c"  0.07 │ 0
source
DataValues.dropnaFunction.

dropna(t[, select])

Drop rows which contain NA values.

julia> t = table([0.1, 0.5, NA,0.7], [2,NA,4,5], [NA,6,NA,7],
                  names=[:t,:x,:y])
Table with 4 rows, 3 columns:
t    x    y
─────────────
0.1  2    #NA
0.5  #NA  6
#NA  4    #NA
0.7  5    7

julia> dropna(t)
Table with 1 rows, 3 columns:
t    x  y
─────────
0.7  5  7

Optionally select can be speicified to limit columns to look for NAs in.


julia> dropna(t, :y)
Table with 2 rows, 3 columns:
t    x    y
───────────
0.5  #NA  6
0.7  5    7

julia> t1 = dropna(t, (:t, :x))
Table with 2 rows, 3 columns:
t    x  y
───────────
0.1  2  #NA
0.7  5  7

Any columns whose NA rows have been dropped will be converted to non-na array type. In our last example, columns t and x got converted from Array{DataValue{Int}} to Array{Int}. Similarly if the vectors are of type DataValueArray{T} (default for loadtable) they will be converted to Array{T}.

julia> typeof(column(dropna(t,:x), :x))
Array{Int64,1}
source

AoS and SoA

This section describes extracting struct of arrays (columns) and array of structs (rows) from a table or an NDSparse. (Wikipedia entry on AoS and SoA)

IndexedTables.columnsFunction.

columns(itr[, select::Selection])

Select one or more columns from an iterable of rows as a tuple of vectors.

select specifies which columns to select. See Selection convention for possible values. If unspecified, returns all columns.

itr can be NDSparse, Columns and AbstractVector, and their distributed counterparts.

Examples

julia> t = table([1,2],[3,4], names=[:x,:y])
Table with 2 rows, 2 columns:
x  y
────
1  3
2  4

julia> columns(t)
(x = [1, 2], y = [3, 4])

julia> columns(t, :x)
2-element Array{Int64,1}:
 1
 2

julia> columns(t, (:x,))
(x = [1, 2])

julia> columns(t, (:y,:x=>-))
(y = [3, 4], x = [-1, -2])
source
IndexedTables.rowsFunction.

rows(itr[, select::Selection])

Select one or more fields from an iterable of rows as a vector of their values.

select specifies which fields to select. See Selection convention for possible values. If unspecified, returns all columns.

itr can be NDSparse, Columns and AbstractVector, and their distributed counterparts.

Examples

julia> t = table([1,2],[3,4], names=[:x,:y])
Table with 2 rows, 2 columns:
x  y
────
1  3
2  4

julia> rows(t)
2-element IndexedTables.Columns{NamedTuples._NT_x_y{Int64,Int64},NamedTuples._NT_x_y{Array{Int64,1},Array{Int64,1}}}:
 (x = 1, y = 3)
 (x = 2, y = 4)

julia> rows(t, :x)
2-element Array{Int64,1}:
 1
 2

julia> rows(t, (:x,))
2-element IndexedTables.Columns{NamedTuples._NT_x{Int64},NamedTuples._NT_x{Array{Int64,1}}}:
 (x = 1)
 (x = 2)

julia> rows(t, (:y,:x=>-))
2-element IndexedTables.Columns{NamedTuples._NT_y_x{Int64,Int64},NamedTuples._NT_y_x{Array{Int64,1},Array{Int64,1}}}:
 (y = 3, x = -1)
 (y = 4, x = -2)

Note that vectors of tuples returned are Columns object and have columnar internal storage.

source
Base.keysFunction.

keys(x::NDSparse[, select::Selection])

Get the keys of an NDSparse object. Same as rows but acts only on the index columns of the NDSparse.

source
Base.valuesFunction.

values(x::NDSparse[, select::Selection])

Get the values of an NDSparse object. Same as rows but acts only on the value columns of the NDSparse.

source

Column modification

This section describes functions that can modify the set of columns of a table. Note that these functions return new tables and doesn't mutate the existing table. This is done so that type information for a given table is always available and correct.

IndexedTables.setcolFunction.

setcol(t::Table, col::Union{Symbol, Int}, x::Selection)

Sets a x as the column identified by col. Returns a new table.

setcol(t::Table, map::Pair...)

Set many columns at a time.

Examples:

julia> t = table([1,2], [3,4], names=[:x, :y])
Table with 2 rows, 2 columns:
x  y
────
1  3
2  4

julia> setcol(t, 2, [5,6])
Table with 2 rows, 2 columns:
x  y
────
1  5
2  6

x can be any selection that transforms existing columns.

julia> setcol(t, :x, :x => x->1/x)
Table with 2 rows, 2 columns:
x    y
──────
1.0  5
0.5  6

setcol will result in a re-sorted copy if a primary key column is replaced.

julia> t = table([0.01, 0.05], [1,2], [3,4], names=[:t, :x, :y], pkey=:t)
Table with 2 rows, 3 columns:
t     x  y
──────────
0.01  1  3
0.05  2  4

julia> t2 = setcol(t, :t, [0.1,0.05])
Table with 2 rows, 3 columns:
t     x  y
──────────
0.05  2  4
0.1   1  3

julia> t == t2
false

If col is not an existing column, setcol will add it:

julia> t = table([1,2], [2,3], names = [:a,:b])
Table with 2 rows, 2 columns:
a  b
────
1  2
2  3

julia> setcol(t, :c, [1,2])
Table with 2 rows, 3 columns:
a  b  c
───────
1  2  1
2  3  2
source
IndexedTables.pushcolFunction.

pushcol(t, name, x)

Push a column x to the end of the table. name is the name for the new column. Returns a new table.

pushcol(t, map::Pair...)

Push many columns at a time.

Example:

julia> t = table([0.01, 0.05], [2,1], [3,4], names=[:t, :x, :y], pkey=:t)
Table with 2 rows, 3 columns:
t     x  y
──────────
0.01  2  3
0.05  1  4

julia> pushcol(t, :z, [1//2, 3//4])
Table with 2 rows, 4 columns:
t     x  y  z
────────────────
0.01  2  3  1//2
0.05  1  4  3//4
source
IndexedTables.popcolFunction.

popcol(t, col)

Remove the column col from the table. Returns a new table.

popcol(t, cols...)

Remove many columns at a time.

julia> t = table([0.01, 0.05], [2,1], [3,4], names=[:t, :x, :y], pkey=:t)
Table with 2 rows, 3 columns:
t     x  y
──────────
0.01  2  3
0.05  1  4

julia> popcol(t, :x)
Table with 2 rows, 2 columns:
t     y
───────
0.01  3
0.05  4
source

insertcol(t, position::Integer, name, x)

Insert a column x named name at position. Returns a new table.

julia> t = table([0.01, 0.05], [2,1], [3,4], names=[:t, :x, :y], pkey=:t)
Table with 2 rows, 3 columns:
t     x  y
──────────
0.01  2  3
0.05  1  4

julia> insertcol(t, 2, :w, [0,1])
Table with 2 rows, 4 columns:
t     w  x  y
─────────────
0.01  0  2  3
0.05  1  1  4
source

insertcolafter(t, after, name, col)

Insert a column col named name after after. Returns a new table.

julia> t = table([0.01, 0.05], [2,1], [3,4], names=[:t, :x, :y], pkey=:t)
Table with 2 rows, 3 columns:
t     x  y
──────────
0.01  2  3
0.05  1  4

julia> insertcolafter(t, :t, :w, [0,1])
Table with 2 rows, 4 columns:
t     w  x  y
─────────────
0.01  0  2  3
0.05  1  1  4
source

insertcolbefore(t, before, name, col)

Insert a column col named name before before. Returns a new table.

julia> t = table([0.01, 0.05], [2,1], [3,4], names=[:t, :x, :y], pkey=:t)
Table with 2 rows, 3 columns:
t     x  y
──────────
0.01  2  3
0.05  1  4

julia> insertcolbefore(t, :x, :w, [0,1])
Table with 2 rows, 4 columns:
t     w  x  y
─────────────
0.01  0  2  3
0.05  1  1  4
source

renamecol(t, col, newname)

Set newname as the new name for column col in t. Returns a new table.

renamecol(t, map::Pair...)

Rename many columns at a time.

julia> t = table([0.01, 0.05], [2,1], names=[:t, :x])
Table with 2 rows, 2 columns:
t     x
───────
0.01  2
0.05  1

julia> renamecol(t, :t, :time)
Table with 2 rows, 2 columns:
time  x
───────
0.01  2
0.05  1
source

Column special selection

This section describes some special types that can be used to simplify column selection. These types can be used in combination with select, rows or columns, as well as any other function that requires a by or select argument.

All(cols)

Select the union of the selections in cols. If cols == (), select all columns.

Examples

julia> t = table([1,1,2,2], [1,2,1,2], [1,2,3,4], [0, 0, 0, 0],
                        names=[:a,:b,:c,:d])
Table with 4 rows, 4 columns:
a  b  c  d
──────────
1  1  1  0
1  2  2  0
2  1  3  0
2  2  4  0

julia> select(t, All(:a, (:b, :c)))
Table with 4 rows, 3 columns:
a  b  c
───────
1  1  1
1  2  2
2  1  3
2  2  4

julia> select(t, All())
Table with 4 rows, 4 columns:
a  b  c  d
──────────
1  1  1  0
1  2  2  0
2  1  3  0
2  2  4  0
source

Not(cols)

Select the complementary of the selection in cols. Not can accept several arguments, in which case it returns the complementary of the union of the selections.

Examples

julia> t = table([1,1,2,2], [1,2,1,2], [1,2,3,4],
                        names=[:a,:b,:c], pkey = (:a, :b))
Table with 4 rows, 3 columns:
a  b  c
───────
1  1  1
1  2  2
2  1  3
2  2  4

julia> select(t, Keys())
Table with 4 rows, 2 columns:
b  c
────
1  1
2  2
1  3
2  4

julia> select(t, Not(:a, (:a, :b)))
Table with 4 rows, 1 columns:
c
─
1
2
3
4
source

Keys()

Select the primary keys.

Examples

julia> t = table([1,1,2,2], [1,2,1,2], [1,2,3,4],
                               names=[:a,:b,:c], pkey = (:a, :b))
Table with 4 rows, 3 columns:
a  b  c
───────
1  1  1
1  2  2
2  1  3
2  2  4

julia> select(t, Keys())
Table with 4 rows, 2 columns:
a  b
────
1  1
1  2
2  1
2  2
source

Between(first, last)

Select the columns between first and last.

Examples

julia> t = table([1,1,2,2], [1,2,1,2], [1,2,3,4], ["a", "b", "c", "d"],
                                      names=[:a,:b,:c, :d])
Table with 4 rows, 4 columns:
a  b  c  d
────────────
1  1  1  "a"
1  2  2  "b"
2  1  3  "c"
2  2  4  "d"

julia> select(t, Between(:b, :d))
Table with 4 rows, 3 columns:
b  c  d
─────────
1  1  "a"
2  2  "b"
1  3  "c"
2  4  "d"
source

Finally, to select columns whose name respects a given predicate, pass a function to select (or rows, or columns):

julia> t = table([0.01, 0.05], [2,1], [2, 3], names=[:t, :x, :z])
Table with 2 rows, 3 columns:
t     x  z
──────────
0.01  2  2
0.05  1  3

julia> select(t, i -> i != :z)
Table with 2 rows, 2 columns:
t     x
───────
0.01  2
0.05  1