Statistical computed column functions¶
This section provides syntax and examples for the statistical computed column functions you can use with the Data Prep Compute tool. To use a statistics function, you must store the value as a numeric data type. Numeric data types can be identified by the numeric data type icon in the header row. If the value is not stored as a numeric data type, convert the value to numeric format using the VALUE function.
AVERAGE¶
Calculates the value equal to the sum of a list of numbers divided by the number of items in the list.
Syntax
AVERAGE(VALUE_1, [VALUE_2, ...])
VALUE_1
is the first value.VALUE_2
, ... [optional] are the additional values.
Example
AVERAGE(@Column_A@, @Column_B@, @Column_C@)
Notes on use
The VALUE
you provide must be a numeric value, a column that contains numeric values, or a function that returns a numeric value.
MAX¶
Returns the greatest (maximum) value from a set of values.
Syntax
MAX(VALUE_1, [VALUE_2, ...])
VALUE_1
is the first value.VALUE_2``, ...
[optional] are the additional values.
Example
MAX(@Column_A@, @Column_B@, @Column_C@)
Notes on use
The VALUE
you provide must be a numeric value, a column that contains numeric values, or a function that returns a numeric value.
MEDIAN¶
Returns the number that exists in the middle of a range of numbers ordered from lowest to highest value.
Syntax
MEDIAN(VALUE_1, [VALUE_2, ...])
VALUE_1
is the first value.VALUE_2, ...
[optional] are the additional values.
Example
MEDIAN(@colum_A@, @colum_B@, @colum_C@, @colum_C@, @colum_E@)
Notes on use
The VALUE
you provide must be a numeric value, a column that contains numeric values, or a function that returns a numeric value.
In a range with an even set of numbers, the median is the center number, half the numbers are to the right of the value returned and half of the numbers are to the left of the value returned. Where there is no single number in the middle of the range), MEDIAN calculates the average on the two numbers on either side of the mid-point.
Note
MEDIAN is different than AVERAGE. AVERAGE is an arithmetic mean—calculated by adding up a set of numbers and then dividing by the number of values in the set. MEDIAN takes the value at the center of the range. In those number ranges that exhibit a balance in the distribution of values within the collection, the MEDIAN and AVERAGE calculations may coincide; in skewed distributions, the values will be different.
MIN¶
Returns the smallest (minimum) value from a set of values.
Syntax
MIN(VALUE_1, [VALUE_2, ...])
VALUE_1
is the first value.VALUE_2, ...
[optional] are the additional values.
Example
MIN(@Column_A@, @Column_B@, @Column_C@)
Notes on use
The VALUE
you provide must be a numeric value, a column that contains numeric values, or a function that returns a numeric value.
MODE¶
Returns the value that occurs most frequently in a set of numbers.
Syntax
MODE(VALUE_1, VALUE_2, [VALUE_3, ...])
VALUE_1
is the first value.VALUE_2
is the second value.VALUE_3, ...
[optional] are the additional values.
Example
MODE(@Column_A@, @Column_B@, @Column_C@)
Notes on use
The VALUE you provide must be a numeric value, a column that contains numeric values, or a function that returns a numeric value.
If multiple numbers have an equal occurrence count greater than one, the value returned is the number (of those that are equal in frequency) that appears first in the set (reading from left to right). If no number appears more than once, the function returns an error.
The most common problem related to MODE
is when the provided set of numbers has no duplicates. At least one number must appear two or more times for the function to successfully evaluate. If the minimum number of arguments (two) are used, then each argument must evaluate to the same number or an error will occur. As you might expect, larger sets of numbers with a more limited variation will reduce the chance of MODE
returning an error.
STDEV¶
Estimates the standard deviation, how much variation from the average, that exists within a sample set of data.
Syntax
STDEV(VALUE_1, [VALUE_2, ...])
VALUE_1
is the first value.VALUE_2, ...
[optional] are the additional values.
Example
STDEV(@Column_A@, @Column_B@, @Column_C@)
Notes on use
The VALUE
you provide must be a numeric value, a column that contains numeric values, or a function that returns a numeric value.
The standard deviation for data is the square root of its variance. If the set under analysis represents all data points (referred to as a population), use STDEVP
instead.
STDEVP¶
Estimates the standard deviation, how much variation from the average, that exists within the entire set (population) of data.
Syntax
STDEVP(VALUE_1, [VALUE_2, ...])
VALUE_1
is the first value.VALUE_2, ...
[optional] are the additional values.
Example
STDEVP(@Column_A@, @Column_B@, @Column_C@)
Notes on use
The VALUE
you provide must be a numeric value, a column that contains numeric values, or a function that returns a numeric value.
If the set under analysis represents only a sample of the data points, use STDEV
instead.
VAR¶
Estimates how much dispersion exists (how much the values are spread out) within a samples set of data.
Syntax
VAR(VALUE_1, [VALUE_2, ...])
VALUE_1
is the first value.VALUE_2, ...
[optional] are the additional values.
Example
VAR(@Column_A@, @Column_B@, @Column_C@)
Notes on use
The VALUE
you provide must be a numeric value, a column that contains numeric values, or a function that returns a numeric value.
If the set under analysis represents all data points (referred to as a population), use VARP
instead.
VARP¶
Estimates how much dispersion exists (how much the values are spread out) within the entire set (population) of data.
Syntax
VARP(VALUE_1, [VALUE_2, ...])
VALUE_1
is the first value.VALUE_2, ...
[optional] are the additional values.
Example
VARP(@Column_A@, @Column_B@, @Column_C@)
Notes on use
The VALUE
you provide must be a numeric value, a column that contains numeric values, or a function that returns a numeric value.
If the set under analysis represents only a sample of data points, use VAR
instead.