Skip to content

Click in-app to access the full platform documentation for your version of DataRobot.

Statistical computed column functions

This section provides syntax and examples for the statistical computed column functions you can use with the Data Prep Compute tool. To use a statistics function, you must store the value as a numeric data type. Numeric data types can be identified by the numeric data type icon in the header row. If the value is not stored as a numeric data type, convert the value to numeric format using the VALUE function.

AVERAGE

Calculates the value equal to the sum of a list of numbers divided by the number of items in the list.

Syntax

AVERAGE(VALUE_1, [VALUE_2, ...])

  • VALUE_1 is the first value.
  • VALUE_2, ... [optional] are the additional values.

Example

AVERAGE(@Column_A@, @Column_B@, @Column_C@)

Notes on use

The VALUE you provide must be a numeric value, a column that contains numeric values, or a function that returns a numeric value.

MAX

Returns the greatest (maximum) value from a set of values.

Syntax

MAX(VALUE_1, [VALUE_2, ...])

  • VALUE_1 is the first value.
  • VALUE_2``, ... [optional] are the additional values.

Example

MAX(@Column_A@, @Column_B@, @Column_C@)

Notes on use

The VALUE you provide must be a numeric value, a column that contains numeric values, or a function that returns a numeric value.

MEDIAN

Returns the number that exists in the middle of a range of numbers ordered from lowest to highest value.

Syntax

MEDIAN(VALUE_1, [VALUE_2, ...])

  • VALUE_1 is the first value.
  • VALUE_2, ... [optional] are the additional values.

Example

MEDIAN(@colum_A@, @colum_B@, @colum_C@, @colum_C@, @colum_E@)

Notes on use

The VALUE you provide must be a numeric value, a column that contains numeric values, or a function that returns a numeric value.

In a range with an even set of numbers, the median is the center number, half the numbers are to the right of the value returned and half of the numbers are to the left of the value returned. Where there is no single number in the middle of the range), MEDIAN calculates the average on the two numbers on either side of the mid-point.

Note

MEDIAN is different than AVERAGE. AVERAGE is an arithmetic mean—calculated by adding up a set of numbers and then dividing by the number of values in the set. MEDIAN takes the value at the center of the range. In those number ranges that exhibit a balance in the distribution of values within the collection, the MEDIAN and AVERAGE calculations may coincide; in skewed distributions, the values will be different.

MIN

Returns the smallest (minimum) value from a set of values.

Syntax

MIN(VALUE_1, [VALUE_2, ...])

  • VALUE_1 is the first value.
  • VALUE_2, ... [optional] are the additional values.

Example

MIN(@Column_A@, @Column_B@, @Column_C@)

Notes on use

The VALUE you provide must be a numeric value, a column that contains numeric values, or a function that returns a numeric value.

MODE

Returns the value that occurs most frequently in a set of numbers.

Syntax

MODE(VALUE_1, VALUE_2, [VALUE_3, ...])

  • VALUE_1 is the first value.
  • VALUE_2 is the second value.
  • VALUE_3, ... [optional] are the additional values.

Example

MODE(@Column_A@, @Column_B@, @Column_C@)

Notes on use

The VALUE you provide must be a numeric value, a column that contains numeric values, or a function that returns a numeric value.

If multiple numbers have an equal occurrence count greater than one, the value returned is the number (of those that are equal in frequency) that appears first in the set (reading from left to right). If no number appears more than once, the function returns an error.

The most common problem related to MODE is when the provided set of numbers has no duplicates. At least one number must appear two or more times for the function to successfully evaluate. If the minimum number of arguments (two) are used, then each argument must evaluate to the same number or an error will occur. As you might expect, larger sets of numbers with a more limited variation will reduce the chance of MODE returning an error.

STDEV

Estimates the standard deviation, how much variation from the average, that exists within a sample set of data.

Syntax

STDEV(VALUE_1, [VALUE_2, ...])

  • VALUE_1 is the first value.
  • VALUE_2, ... [optional] are the additional values.

Example

STDEV(@Column_A@, @Column_B@, @Column_C@)

Notes on use

The VALUE you provide must be a numeric value, a column that contains numeric values, or a function that returns a numeric value.

The standard deviation for data is the square root of its variance. If the set under analysis represents all data points (referred to as a population), use STDEVP instead.

STDEVP

Estimates the standard deviation, how much variation from the average, that exists within the entire set (population) of data.

Syntax

STDEVP(VALUE_1, [VALUE_2, ...])

  • VALUE_1 is the first value.
  • VALUE_2, ... [optional] are the additional values.

Example

STDEVP(@Column_A@, @Column_B@, @Column_C@)

Notes on use

The VALUE you provide must be a numeric value, a column that contains numeric values, or a function that returns a numeric value.

If the set under analysis represents only a sample of the data points, use STDEV instead.

VAR

Estimates how much dispersion exists (how much the values are spread out) within a samples set of data.

Syntax

VAR(VALUE_1, [VALUE_2, ...])

  • VALUE_1 is the first value.
  • VALUE_2, ... [optional] are the additional values.

Example

VAR(@Column_A@, @Column_B@, @Column_C@)

Notes on use

The VALUE you provide must be a numeric value, a column that contains numeric values, or a function that returns a numeric value.

If the set under analysis represents all data points (referred to as a population), use VARP instead.

VARP

Estimates how much dispersion exists (how much the values are spread out) within the entire set (population) of data.

Syntax

VARP(VALUE_1, [VALUE_2, ...])

  • VALUE_1 is the first value.
  • VALUE_2, ... [optional] are the additional values.

Example

VARP(@Column_A@, @Column_B@, @Column_C@)

Notes on use

The VALUE you provide must be a numeric value, a column that contains numeric values, or a function that returns a numeric value.

If the set under analysis represents only a sample of data points, use VAR instead.


Updated October 28, 2021
Back to top