ML.PREPROCESSING.ROBUST_SCALER¶

Scales features using statistics that are robust to outliers (median and interquartile range).

Syntax¶

ML.PREPROCESSING.ROBUST_SCALER()

Returns¶

A RobustScaler transformer handle, ready to pass into ML.FIT_TRANSFORM or ML.PIPELINE.

When to use¶

Reach for ML.PREPROCESSING.ROBUST_SCALER when your data contains outliers that would distort ML.PREPROCESSING.STANDARD_SCALER (which uses the mean and standard deviation) or ML.PREPROCESSING.MIN_MAX_SCALER (which uses the observed minimum and maximum). Robust scaling centers each column on its median and divides by its interquartile range (IQR), two statistics that a small number of extreme values barely move.
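As a sketch of the computation described above, assuming the conventional definition of robust scaling (this page does not state which quantile interpolation the function uses), each value x in a column X is transformed as:

```latex
x' = \frac{x - \operatorname{median}(X)}{Q_3(X) - Q_1(X)}
```

Here Q_1 and Q_3 denote the 25th and 75th percentiles of the column, so the denominator is the IQR.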

Compared to the alternatives in this namespace:

Use ML.PREPROCESSING.STANDARD_SCALER as the default when your data is roughly symmetric and outlier-free.
Use ML.PREPROCESSING.MIN_MAX_SCALER when you need outputs bounded to a fixed range such as [0, 1].
Use ML.PREPROCESSING.ROBUST_SCALER whenever the column has a few extreme values that shouldn't influence the scale of the rest.
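To make the comparison concrete, here is a minimal Python sketch of the three scaling rules applied to one column with a single extreme value. It uses only the standard library and is not the add-in's implementation: the helper names (standard_scale, min_max_scale, robust_scale), the sample column, and the choice of population standard deviation and inclusive quantiles are all assumptions made for illustration.

```python
import statistics


def standard_scale(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max_scale(values):
    """Map the observed minimum to 0 and the observed maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def robust_scale(values):
    """Center on the median and divide by the interquartile range."""
    # Assumption: linearly interpolated ("inclusive") quartiles.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]


# Hypothetical column: nine ordinary values and one extreme value.
column = [10, 11, 12, 13, 14, 15, 16, 17, 18, 1000]

standard = standard_scale(column)
min_max = min_max_scale(column)
robust = robust_scale(column)

# How much of the output scale do the nine ordinary values occupy?
for name, scaled in [("standard", standard), ("min_max", min_max), ("robust", robust)]:
    spread = scaled[8] - scaled[0]
    print(f"{name:>8}: spread of ordinary values = {spread:.4f}")
```

In this sketch the outlier compresses the nine ordinary values into a narrow band under the standard and min-max rules, while the robust rule keeps them spread over a range wider than 1 and leaves the outlier as a large scaled value.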

Examples¶

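Scale the columns in A2:E100 using the median and IQR, which is appropriate when one or two rows contain extreme values that would otherwise throw the scale off. The formulas are assumed to be entered in H1 and H2, so that H1 holds the scaler handle: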

=ML.PREPROCESSING.ROBUST_SCALER()
=ML.FIT_TRANSFORM(H1, A2:E100)

Use it inside a pipeline ahead of a regression that is sensitive to outliers, with the formulas assumed to be entered in H1 through H5:

=ML.PREPROCESSING.ROBUST_SCALER()
=ML.REGRESSION.RIDGE()
=ML.PIPELINE(H1, H2)
=ML.FIT(H3, A2:E100, F2:F100)
=ML.PREDICT(H4, A101:E110)

Remarks¶

After scaling, each column of the data the scaler was fitted on has a median of 0 and an IQR of 1. The output is not bounded to [0, 1] as with MIN_MAX_SCALER, nor is its variance forced to 1 as with STANDARD_SCALER; outliers stay in the data as large scaled values.
Always fit on the training data only, then ML.TRANSFORM the test data through the same fitted scaler, so that statistics from the test data never influence the scale.
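The two remarks above can be illustrated with a small Python sketch, again using only the standard library. The functions fit_robust_scaler and transform are hypothetical stand-ins for the fit and transform steps, not the add-in's API, and the inclusive quantile method and sample values are assumptions.

```python
import statistics


def fit_robust_scaler(train_values):
    """Learn the median and IQR from the training column only."""
    q1, median, q3 = statistics.quantiles(train_values, n=4, method="inclusive")
    return {"center": median, "scale": q3 - q1}


def transform(params, values):
    """Apply previously learned statistics; nothing is re-estimated here."""
    return [(v - params["center"]) / params["scale"] for v in values]


# Hypothetical training column (with one outlier) and held-out test column.
train = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 500.0]
test = [5.0, 9.0, 300.0]

params = fit_robust_scaler(train)
train_scaled = transform(params, train)
test_scaled = transform(params, test)  # reuses the training median and IQR

# Check the first remark on the scaled training data.
q1, median, q3 = statistics.quantiles(train_scaled, n=4, method="inclusive")
print("median:", median, "IQR:", q3 - q1)
print("largest scaled training value:", max(train_scaled))
print("scaled test values:", test_scaled)
```

The scaled training column ends up with a median of 0 and an IQR of 1 while its outlier sits far outside [0, 1]. The test values are scaled with the training statistics, so their own median and IQR are generally not 0 and 1.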