ML.PREPROCESSING.ROBUST_SCALER¶
Scales features using statistics that are robust to outliers (median and interquartile range).
Syntax¶
Returns¶
A RobustScaler transformer handle, ready to pass into ML.FIT_TRANSFORM or ML.PIPELINE.
When to use¶
Reach for robust_scaler when your data contains outliers that would
distort ML.PREPROCESSING.STANDARD_SCALER (which uses the mean and standard
deviation) or ML.PREPROCESSING.MIN_MAX_SCALER (which uses the absolute
minimum and maximum). Robust scaling centers on the median and divides by
the interquartile range (IQR), both of which are largely unaffected by a handful of extreme values.
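The transformation described above can be sketched in a few lines of Python. This is only an illustration of the median / IQR arithmetic, not the add-in's implementation: the function name `robust_scale`, the quartile interpolation method, and the zero-IQR fallback are all assumptions made for the example.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    med = median(values)
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    # The "inclusive" method is an assumption; other interpolation rules exist.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # Assumed fallback: leave the scale unchanged for a constant-like column.
        iqr = 1.0
    return [(v - med) / iqr for v in values]


# One extreme value (500) among otherwise similar numbers.
data = [10, 12, 11, 13, 12, 500]
scaled = robust_scale(data)
print(scaled)
```

The typical values land close to zero, while the outlier stays far away from them instead of compressing everything else.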
Compared to the alternatives in this namespace:
- Use ML.PREPROCESSING.STANDARD_SCALER as the default when your data is roughly symmetric and outlier-free.
- Use ML.PREPROCESSING.MIN_MAX_SCALER when you need bounded outputs.
- Use robust_scaler whenever the column has a few extreme values that shouldn't influence the scale of the rest.
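To make the comparison concrete, here is a rough Python sketch that applies all three scaling rules to one column containing a single outlier. The helper names and the sample data are hypothetical, and the formulas are the textbook definitions rather than anything confirmed about this add-in's internals.

```python
from statistics import mean, median, pstdev, quantiles


def standard_scale(xs):
    # Mean / standard deviation scaling (population standard deviation assumed).
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def min_max_scale(xs):
    # Maps the minimum to 0 and the maximum to 1.
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def robust_scale(xs):
    # Median / IQR scaling; "inclusive" quartile interpolation is an assumption.
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    med = median(xs)
    return [(x - med) / (q3 - q1) for x in xs]


def spread(xs):
    # Range of the typical values only; the outlier is the last element.
    return max(xs[:-1]) - min(xs[:-1])


data = [10, 11, 12, 12, 13, 500]  # 500 is the outlier
for name, fn in [("standard", standard_scale),
                 ("min_max", min_max_scale),
                 ("robust", robust_scale)]:
    print(name, spread(fn(data)))
```

With this data, the five typical values keep a usable spread only under median / IQR scaling; the other two rules squeeze them into a narrow band because the outlier dominates the mean, standard deviation, and maximum.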
Examples¶
Scale the columns in A2:E100 using the median and IQR, which is appropriate when one
or two rows have extreme values that you don't want to throw off the scale:
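Use it inside a pipeline before a regression that's sensitive to outliers (assuming the formulas are entered in order in cells H1 downward, so that each one can reference the handle above it):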
=ML.PREPROCESSING.ROBUST_SCALER()
=ML.REGRESSION.RIDGE()
=ML.PIPELINE(H1, H2)
=ML.FIT(H3, A2:E100, F2:F100)
=ML.PREDICT(H4, A101:E110)
Remarks¶
- After scaling, the median of each column is 0 and the IQR is 1. The data is not bounded to [0, 1] like with MIN_MAX_SCALER, nor is its variance forced to 1 like with STANDARD_SCALER.
- Always fit on the training data only, then ML.TRANSFORM the test data through the same fitted scaler.
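The fit-on-train, transform-on-test rule in the last remark can be illustrated with a small Python stand-in. `RobustScalerSketch` is a hypothetical class written for this example; it is not the object that ML.PREPROCESSING.ROBUST_SCALER returns, and the quartile method is an assumption.

```python
from statistics import median, quantiles


class RobustScalerSketch:
    """Hypothetical stand-in that stores the median and IQR learned during fit."""

    def fit(self, column):
        self.center_ = median(column)
        q1, _, q3 = quantiles(column, n=4, method="inclusive")
        # Assumed fallback: use a scale of 1.0 if the IQR is zero.
        self.scale_ = (q3 - q1) or 1.0
        return self

    def transform(self, column):
        # Reuses the stored statistics; nothing is recomputed from `column`.
        return [(v - self.center_) / self.scale_ for v in column]


train = [10, 11, 12, 12, 13, 500]
test = [9, 14, 15]

scaler = RobustScalerSketch().fit(train)   # statistics come from train only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)       # same median and IQR applied to test
print(test_scaled)
```

Because the test rows are scaled with the training median and IQR, their own median will generally not be 0. That is expected, and it keeps information about the test set from leaking into the fitted transformer.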