Aggregation

Reduce

Base.reduce - Function.

reduce(f, t::Table; select::Selection)

Reduce t by applying f pairwise to the values or rows selected by select.

f can be:

  1. A function

  2. An OnlineStat

  3. A tuple of functions and/or OnlineStats

  4. A named tuple of functions and/or OnlineStats

  5. A named tuple of (selector => function or OnlineStat) pairs

julia> t = table([0.1, 0.5, 0.75], [0,1,2], names=[:t, :x])
Table with 3 rows, 2 columns:
t     x
───────
0.1   0
0.5   1
0.75  2

When f is a function, it reduces the selection as usual:

julia> reduce(+, t, select=:t)
1.35

If select is omitted, the rows themselves are passed to f as named tuples, so f must combine two rows into one:

julia> reduce((a, b) -> @NT(t=a.t+b.t, x=a.x+b.x), t)
(t = 1.35, x = 3)
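
The pairwise behaviour can be pictured without any table at all. The following is a minimal sketch in plain Julia, assuming Julia 0.7 or later for the named tuple literal syntax (the examples in this section use the older @NT macro). The names rows and addrows are hypothetical and exist only for illustration.

```julia
# Rows of the example table, written as named tuples.
rows = [(t = 0.1, x = 0), (t = 0.5, x = 1), (t = 0.75, x = 2)]

# f receives two rows at a time and must return something row-shaped,
# because its result is fed back in as the next left-hand argument.
addrows(a, b) = (t = a.t + b.t, x = a.x + b.x)

total = reduce(addrows, rows)
```

Here total has the same field names as the rows, which is why the reducer in the example above rebuilds a full row rather than returning a bare number.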

If f is an OnlineStat object from the OnlineStats package, the statistic is computed on the selection.

julia> using OnlineStats

julia> reduce(Mean(), t, select=:t)
▦ Series{0,Tuple{Mean},EqualWeight}
┣━━ EqualWeight(nobs = 3)
┗━━━┓
    ┗━━ Mean(0.45)
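
To give a feel for what reducing with a statistic means, here is a rough sketch of a running mean that is updated one value at a time. RunningMean and update! are hypothetical names invented for this illustration. They are not the OnlineStats API and say nothing about how OnlineStats is implemented. The sketch assumes Julia 1.0 or later for the init keyword of foldl.

```julia
# Hypothetical accumulator type, not part of OnlineStats.
mutable struct RunningMean
    n::Int          # number of observations seen so far
    value::Float64  # current mean
end
RunningMean() = RunningMean(0, 0.0)

# Fold one observation into the accumulator and return it.
function update!(m::RunningMean, x)
    m.n += 1
    m.value += (x - m.value) / m.n
    return m
end

# Reducing column t amounts to folding every value into one accumulator.
m = foldl(update!, [0.1, 0.5, 0.75]; init = RunningMean())
```

The result is the accumulator object rather than a bare number, which mirrors the output above: reduce returns the statistic object, and the value is read from it.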

Reducing with multiple functions

Often one needs several aggregate values from a table. In that case, pass f as a tuple of functions:

julia> y = reduce((min, max), t, select=:x)
(min = 0, max = 2)

julia> y.max
2

julia> y.min
0

Reducing with a tuple of functions returns a named tuple whose keys are the function names. In the example, min and max give the minimum and maximum values of column x.

If you want to give a different name to the fields in the output, use a named tuple as f instead:

julia> y = reduce(@NT(sum=+, prod=*), t, select=:x)
(sum = 3, prod = 0)
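
One way to picture a tuple of reducers is as a single pass that advances one accumulator per function. The sketch below uses only Base Julia (1.0 or later assumed). The names fs, xs, advance, acc and y are hypothetical, and this is an illustration of the idea rather than the library's implementation.

```julia
fs = (min, max)
xs = [0, 1, 2]

# Apply each function to its own accumulator and the incoming value.
advance(acc, v) = map((f, a) -> f(a, v), fs, acc)

# Seed every accumulator with the first value, then fold in the rest.
acc = foldl(advance, xs[2:end]; init = map(_ -> first(xs), fs))

# Label each result with the name of its function.
y = NamedTuple{map(Symbol, fs)}(acc)
```

Replacing map(Symbol, fs) with an explicit tuple of symbols such as (:lo, :hi) corresponds to passing a named tuple as f to choose the output names.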

You can also compute many OnlineStats by passing a tuple or named tuple of OnlineStat objects as the reducer.

julia> y = reduce((Mean(), Variance()), t, select=:t)
(Mean = ▦ Series{0,Tuple{Mean},EqualWeight}
┣━━ EqualWeight(nobs = 3)
┗━━━┓
    ┗━━ Mean(0.45), Variance = ▦ Series{0,Tuple{Variance},EqualWeight}
┣━━ EqualWeight(nobs = 3)
┗━━━┓
    ┗━━ Variance(0.1075))

julia> y.Mean
▦ Series{0,Tuple{Mean},EqualWeight}
┣━━ EqualWeight(nobs = 3)
┗━━━┓
    ┗━━ Mean(0.45)

julia> y.Variance
▦ Series{0,Tuple{Variance},EqualWeight}
┣━━ EqualWeight(nobs = 3)
┗━━━┓
    ┗━━ Variance(0.1075)

Combining reduction and selection

In the section above, where we computed many reduced values at once, every reducer used the same selection, namely the one specified by select. It is possible to select different inputs for different reducers by using a named tuple of selector => function pairs:

julia> reduce(@NT(xsum=:x=>+, negtsum=(:t=>-)=>+), t)
(xsum = 3, negtsum = -1.35)

See Selection for more on what selectors can be specified. Since each output can select its own input here, the select keyword is usually unnecessary. If it is specified, the selections in the reducer tuple are made over the result of selecting with the select argument.
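
Spelled out by hand, the two pairs in the example mean the following. This is a plain-Julia sketch (Julia 0.7 or later assumed for the named tuple literal) in which cols is a hypothetical stand-in for the table's columns.

```julia
cols = (t = [0.1, 0.5, 0.75], x = [0, 1, 2])

# :x => +          take column x and reduce it with +
xsum = reduce(+, cols.x)

# (:t => -) => +   take column t, negate each value, then reduce with +
negtsum = reduce(+, map(-, cols.t))

result = (xsum = xsum, negtsum = negtsum)
```

The inner pair :t => - is itself a selector that maps a function over a column, and the outer pair attaches the reducer to that selection.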

Grouping

groupreduce(f, t[, by::Selection]; select::Selection)

Group rows by by, and apply f to reduce each group. f can be a function, an OnlineStat, or a tuple or named tuple of these, as described in reduce (reading the documentation for reduce first is recommended). The result of reducing each group is put in a table keyed by the unique by values. The names of the output columns are the same as the names of the fields of the reduced tuples.
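
The semantics can be sketched with a dictionary of per-group accumulators. groupreduce_sketch is a hypothetical helper written in Base Julia. It only illustrates the behaviour described here and is not how IndexedTables implements groupreduce.

```julia
# Reduce vs group-wise: each distinct key gets its own accumulator,
# and f folds every new value of that group into it.
function groupreduce_sketch(f, ks, vs)
    acc = Dict{eltype(ks), eltype(vs)}()
    for (k, v) in zip(ks, vs)
        acc[k] = haskey(acc, k) ? f(acc[k], v) : v
    end
    return acc
end

x = [1, 1, 1, 2, 2, 2]
y = [1, 1, 2, 2, 1, 1]
z = [1, 2, 3, 4, 5, 6]

by_x  = groupreduce_sketch(+, x, z)                    # group by one column
by_xy = groupreduce_sketch(+, collect(zip(x, y)), z)   # group by a pair of columns
```

Because f only ever sees an accumulator and one new value, no group has to be collected in full before it is reduced. The real function returns a table rather than a Dict.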

Examples

julia> t=table([1,1,1,2,2,2], [1,1,2,2,1,1], [1,2,3,4,5,6],
               names=[:x,:y,:z]);

julia> groupreduce(+, t, :x, select=:z)
Table with 2 rows, 2 columns:
x  +
─────
1  6
2  15

julia> groupreduce(+, t, (:x, :y), select=:z)
Table with 4 rows, 3 columns:
x  y  +
────────
1  1  3
1  2  3
2  1  11
2  2  4

julia> groupreduce((+, min, max), t, (:x, :y), select=:z)
Table with 4 rows, 5 columns:
x  y  +   min  max
──────────────────
1  1  3   1    2
1  2  3   3    3
2  1  11  5    6
2  2  4   4    4

If f is a single function or a tuple of functions, the output columns will be named the same as the functions themselves. To change the name, pass a named tuple:

julia> groupreduce(@NT(zsum=+, zmin=min, zmax=max), t, (:x, :y), select=:z)
Table with 4 rows, 5 columns:
x  y  zsum  zmin  zmax
──────────────────────
1  1  3     1     2
1  2  3     3     3
2  1  11    5     6
2  2  4     4     4

Finally, it's possible to select different inputs for different reducers by using a named tuple of selector => function pairs:

julia> groupreduce(@NT(xsum=:x=>+, negysum=(:y=>-)=>+), t, :x)
Table with 2 rows, 3 columns:
x  xsum  negysum
────────────────
1  3     -4
2  6     -4

IndexedTables.groupby - Function.

groupby(f, t[, by::Selection]; select::Selection, flatten)

Group rows by by, and apply f to each group. Unlike groupreduce, f receives all selected values of a group at once. f can be a function or a tuple of functions. The result of f on each group is put in a table keyed by the unique by values. Setting flatten to true flattens the result and is useful when f returns a vector instead of a single scalar value.
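
The difference from groupreduce can be sketched as follows: the values of each group are first collected into a vector, and f is then called once per group on that vector. groupby_sketch is a hypothetical helper for illustration only. The sketch assumes Julia 0.7 or later, where mean lives in the Statistics standard library.

```julia
using Statistics  # provides mean on Julia 0.7 and later

# Collect each group's values, then apply f to the whole vector.
function groupby_sketch(f, ks, vs)
    T = eltype(vs)
    groups = Dict{eltype(ks), Vector{T}}()
    for (k, v) in zip(ks, vs)
        push!(get!(groups, k, T[]), v)
    end
    return Dict(k => f(g) for (k, g) in groups)
end

x = [1, 1, 1, 2, 2, 2]
z = [1, 2, 3, 4, 5, 6]

means = groupby_sketch(mean, x, z)
whole = groupby_sketch(identity, x, z)
```

Functions such as mean, median or quantile need the whole group, so they fit this model, while a binary function such as + fits the pairwise model of groupreduce.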

Examples

julia> t=table([1,1,1,2,2,2], [1,1,2,2,1,1], [1,2,3,4,5,6],
               names=[:x,:y,:z]);

julia> groupby(mean, t, :x, select=:z)
Table with 2 rows, 2 columns:
x  mean
───────
1  2.0
2  5.0

julia> groupby(identity, t, (:x, :y), select=:z)
Table with 4 rows, 3 columns:
x  y  identity
──────────────
1  1  [1, 2]
1  2  [3]
2  1  [5, 6]
2  2  [4]

julia> groupby(mean, t, (:x, :y), select=:z)
Table with 4 rows, 3 columns:
x  y  mean
──────────
1  1  1.5
1  2  3.0
2  1  5.5
2  2  4.0

Multiple aggregates can be computed by passing a tuple of functions:

julia> groupby((mean, std, var), t, :y, select=:z)
Table with 2 rows, 4 columns:
y  mean  std       var
──────────────────────────
1  3.5   2.38048   5.66667
2  3.5   0.707107  0.5

julia> groupby(@NT(q25=z->quantile(z, 0.25), q50=median,
                   q75=z->quantile(z, 0.75)), t, :y, select=:z)
Table with 2 rows, 4 columns:
y  q25   q50  q75
──────────────────
1  1.75  3.5  5.25
2  3.25  3.5  3.75

Finally, it's possible to select different inputs for different functions by using a named tuple of selector => function pairs:

julia> groupby(@NT(xmean=:z=>mean, ystd=(:y=>-)=>std), t, :x)
Table with 2 rows, 3 columns:
x  xmean  ystd
─────────────────
1  2.0    0.57735
2  5.0    0.57735

By default, the result of groupby when f returns a vector or iterator of values will not be expanded. Pass the flatten option as true to flatten the grouped column:

julia> t = table([1,1,2,2], [3,4,5,6], names=[:x,:y]);

julia> groupby((:normy => x->Iterators.repeated(mean(x), length(x)),),
                t, :x, select=:y, flatten=true)
Table with 4 rows, 2 columns:
x  normy
────────
1  3.5
1  3.5
2  5.5
2  5.5

IndexedTables.flatten - Function.

flatten(t::Table, col)

Flatten the column col, which may contain a vector of vectors (or of tables), repeating the values of the other columns once for each flattened element.
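
For the vector-of-vectors case, the operation can be sketched on two plain column vectors. flatten_sketch is a hypothetical helper in Base Julia and does not cover the case where the column holds tables.

```julia
# Emit one output row per inner element, repeating the outer value.
function flatten_sketch(xs, ys)
    out_x = eltype(xs)[]
    out_y = eltype(eltype(ys))[]
    for (x, inner) in zip(xs, ys)
        for y in inner
            push!(out_x, x)
            push!(out_y, y)
        end
    end
    return out_x, out_y
end

fx, fy = flatten_sketch([1, 2], [[3, 4], [5, 6]])
```

The output has as many rows as there are inner elements in total, which is why a table of 2 rows becomes a table of 4 rows in the examples.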

Examples:

julia> x = table([1,2], [[3,4], [5,6]], names=[:x, :y])
Table with 2 rows, 2 columns:
x  y
─────────
1  [3, 4]
2  [5, 6]

julia> flatten(x, 2)
Table with 4 rows, 2 columns:
x  y
────
1  3
1  4
2  5
2  6

julia> x = table([1,2], [table([3,4],[5,6], names=[:a,:b]),
                         table([7,8], [9,10], names=[:a,:b])], names=[:x, :y]);

julia> flatten(x, :y)
Table with 4 rows, 3 columns:
x  a  b
────────
1  3  5
1  4  6
2  7  9
2  8  10

Reducedim

Base.reducedim - Function.

reducedim(f, x::NDSparse, dims)

Drop the dimension(s) dims and use f to aggregate the values whose remaining indices coincide.
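
A minimal sketch of the idea, using index tuples and a dictionary in place of an NDSparse: remove the dropped positions from each index, and combine with f whenever two shortened indices are equal. reducedim_sketch is a hypothetical helper in Base Julia, not the library's implementation.

```julia
# idx is a vector of index tuples, vals the matching values,
# dims a tuple of the dimension numbers to drop.
function reducedim_sketch(f, idx, vals, dims)
    keep = [d for d in 1:length(first(idx)) if !(d in dims)]
    out = Dict{Any, eltype(vals)}()
    for (i, v) in zip(idx, vals)
        k = Tuple(i[d] for d in keep)   # index with dropped dimensions removed
        out[k] = haskey(out, k) ? f(out[k], v) : v
    end
    return out
end

idx  = [(1, 1, 1), (1, 2, 1), (1, 2, 2), (2, 1, 1), (2, 2, 1), (2, 2, 2)]
vals = [1, 2, 3, 4, 5, 6]

drop_x  = reducedim_sketch(+, idx, vals, (1,))
drop_xz = reducedim_sketch(+, idx, vals, (1, 3))
```

With the same data as the example that follows, dropping dimension 1 leaves three distinct (y, z) indices, and dropping dimensions 1 and 3 leaves two distinct y indices.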

julia> x = ndsparse(@NT(x=[1,1,1,2,2,2],
                        y=[1,2,2,1,2,2],
                        z=[1,1,2,1,1,2]), [1,2,3,4,5,6])
3-d NDSparse with 6 values (Int64):
x  y  z │
────────┼──
1  1  1 │ 1
1  2  1 │ 2
1  2  2 │ 3
2  1  1 │ 4
2  2  1 │ 5
2  2  2 │ 6

julia> reducedim(+, x, 1)
2-d NDSparse with 3 values (Int64):
y  z │
─────┼──
1  1 │ 5
2  1 │ 7
2  2 │ 9

julia> reducedim(+, x, (1,3))
1-d NDSparse with 2 values (Int64):
y │
──┼───
1 │ 5
2 │ 16