
ML.PREPROCESSING.ROBUST_SCALER

Scales features using statistics that are robust to outliers (median and interquartile range).

Syntax

ML.PREPROCESSING.ROBUST_SCALER()

Returns

A RobustScaler transformer handle, ready to pass into ML.FIT_TRANSFORM or ML.PIPELINE.

When to use

Reach for ML.PREPROCESSING.ROBUST_SCALER when your data contains outliers that would distort ML.PREPROCESSING.STANDARD_SCALER (which uses the mean and standard deviation) or ML.PREPROCESSING.MIN_MAX_SCALER (which uses the observed minimum and maximum). Robust scaling subtracts each column's median and divides by its interquartile range (IQR, the 75th percentile minus the 25th). Both statistics are largely unaffected by a small number of extreme values.
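The centering and scaling described above can be sketched in a few lines of plain Python. This is an illustration of the median / IQR arithmetic only, not the add-in's implementation: the helper name robust_scale is hypothetical, and the quartile interpolation rule (method="inclusive") is an assumption that may differ from what ML.PREPROCESSING.ROBUST_SCALER uses internally.

```python
from statistics import median, quantiles

def robust_scale(column):
    """Center a column on its median and divide by its interquartile range.

    Hypothetical helper for illustration. A constant column has an IQR of 0
    and would raise ZeroDivisionError here.
    """
    # quantiles(..., n=4) returns the 25th, 50th and 75th percentiles.
    q1, _, q3 = quantiles(column, n=4, method="inclusive")
    center = median(column)
    iqr = q3 - q1
    return [(x - center) / iqr for x in column]

values = [1.0, 2.0, 3.0, 4.0, 1000.0]  # one extreme value
scaled = robust_scale(values)
print(scaled)  # [-1.0, -0.5, 0.0, 0.5, 498.5]
```

The four typical values keep a sensible spacing, while the extreme value stays extreme instead of compressing everything else.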

Compared to the alternatives in this namespace:

  • Use ML.PREPROCESSING.STANDARD_SCALER as the default when your data is roughly symmetric and outlier-free.
  • Use ML.PREPROCESSING.MIN_MAX_SCALER when you need outputs bounded to a fixed range such as [0, 1].
  • Use ML.PREPROCESSING.ROBUST_SCALER when a column has a few extreme values that shouldn't influence the scale of the rest.
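To make the comparison concrete, the following sketch applies all three scaling rules to the same column and measures how far apart the typical values end up. The three helper functions are hypothetical stand-ins written from the textbook definitions (population standard deviation, inclusive quartiles); they are assumptions for illustration and not the add-in's code.

```python
from statistics import mean, median, pstdev, quantiles

def standard_scale(col):
    # Subtract the mean, divide by the (population) standard deviation.
    mu, sigma = mean(col), pstdev(col)
    return [(x - mu) / sigma for x in col]

def min_max_scale(col):
    # Map the observed minimum to 0 and the observed maximum to 1.
    lo, hi = min(col), max(col)
    return [(x - lo) / (hi - lo) for x in col]

def robust_scale(col):
    # Subtract the median, divide by the interquartile range.
    q1, _, q3 = quantiles(col, n=4, method="inclusive")
    center = median(col)
    return [(x - center) / (q3 - q1) for x in col]

values = [1.0, 2.0, 3.0, 4.0, 1000.0]  # four typical rows, one outlier

spreads = {}
for name, scale in [("standard", standard_scale),
                    ("min_max", min_max_scale),
                    ("robust", robust_scale)]:
    scaled = scale(values)
    # Distance between the scaled versions of the typical values 1 and 4.
    spreads[name] = scaled[3] - scaled[0]
    print(f"{name:>8}: spread of typical rows = {spreads[name]:.4f}")
```

With the outlier present, the standard and min-max rules squeeze the typical rows into a sliver well under 0.01 wide, while the robust rule keeps them 1.5 units apart.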

Examples

Scale the columns in A2:E100 using the median and IQR, which is appropriate when one or two rows contain extreme values that would otherwise throw off the scale. Enter the first formula in H1 so that the second can reference it:

=ML.PREPROCESSING.ROBUST_SCALER()
=ML.FIT_TRANSFORM(H1, A2:E100)

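Use it inside a pipeline before a regression that's sensitive to outliers. Enter the formulas in consecutive cells starting at H1 so that each one can reference the handles above it: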

=ML.PREPROCESSING.ROBUST_SCALER()
=ML.REGRESSION.RIDGE()
=ML.PIPELINE(H1, H2)
=ML.FIT(H3, A2:E100, F2:F100)
=ML.PREDICT(H4, A101:E110)

Remarks

  • After scaling, each column of the fitted data has a median of 0 and an IQR of 1. Unlike MIN_MAX_SCALER, the output is not bounded to [0, 1], and unlike STANDARD_SCALER, its variance is not forced to 1, so outliers still appear as large scaled values.
  • Always fit on the training data only, then pass the test data through the same fitted scaler with ML.TRANSFORM, so that test-set statistics never influence the median and IQR.
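Both remarks can be checked with a small plain-Python sketch. The functions fit_robust_scaler and transform are hypothetical names chosen for illustration, and the inclusive quartile rule is an assumption; the point is only that the median and IQR are learned once from the training column and then reused unchanged on the test column.

```python
from statistics import median, quantiles

def fit_robust_scaler(train_col):
    """Learn the center (median) and scale (IQR) from training data only."""
    q1, _, q3 = quantiles(train_col, n=4, method="inclusive")
    return {"center": median(train_col), "scale": q3 - q1}

def transform(params, col):
    """Apply previously fitted parameters; nothing is re-estimated here."""
    return [(x - params["center"]) / params["scale"] for x in col]

train = [10.0, 12.0, 14.0, 16.0, 18.0, 500.0]  # contains one outlier
test = [11.0, 15.0, 40.0]

params = fit_robust_scaler(train)        # median 15.0, IQR 5.0
train_scaled = transform(params, train)
test_scaled = transform(params, test)    # uses the training median and IQR

# On the fitted (training) data the median is 0 and the IQR is 1,
# up to floating-point rounding.
q1, _, q3 = quantiles(train_scaled, n=4, method="inclusive")
print(median(train_scaled), q3 - q1)

# The output is not bounded: the training outlier maps to 97.0.
print(max(train_scaled))
print(test_scaled)
```

The test column is not guaranteed to have a median of 0 or an IQR of 1 after scaling, which is expected: it is measured against the training distribution rather than its own.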

See also