ML.PREPROCESSING.STANDARD_SCALER
Standardizes features by removing the mean and scaling to unit variance.
Syntax
Returns
A StandardScaler transformer handle, ready to pass into ML.FIT_TRANSFORM or ML.PIPELINE.
When to use
Reach for ML.PREPROCESSING.STANDARD_SCALER whenever a model is sensitive to feature
magnitude, which covers most algorithms in formulaML except tree-based ones.
It centers each column on zero and scales it to unit variance, so a feature
measured in millions doesn't drown out a feature measured in tenths.
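As a rough illustration of the arithmetic (not formulaML's implementation), standardizing a column computes z = (x - μ) / σ for every value, where μ is the column mean and σ its standard deviation. The sketch below uses Python's `statistics` module and assumes the population standard deviation (`pstdev`, dividing by n), which is the convention scikit-learn's `StandardScaler` uses; whether formulaML divides by n or n - 1 is not stated on this page. The function name `standardize` is hypothetical.

```python
import statistics


def standardize(column):
    """Return the z-scores of a column: (x - mean) / std.

    Hypothetical helper. Assumes the population standard deviation
    (divide by n) and a non-constant column (std of 0 would divide by zero).
    """
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(x - mean) / std for x in column]


# Two features on very different scales: yearly income and a 0-1 ratio.
income = [42_000.0, 58_000.0, 1_250_000.0, 75_000.0]
ratio = [0.12, 0.40, 0.33, 0.95]

for name, col in [("income", income), ("ratio", ratio)]:
    z = standardize(col)
    # After scaling, each column has mean ~0 and standard deviation ~1.
    print(name, [round(v, 3) for v in z])
```

Once both columns are expressed in standard deviations from their own mean, a one-unit change means the same thing in either column, which is what keeps the income column from dominating.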
Use ML.PREPROCESSING.STANDARD_SCALER as the default scaler. Choose
ML.PREPROCESSING.MIN_MAX_SCALER instead when you need values in the
[0, 1] range, and ML.PREPROCESSING.ROBUST_SCALER when your data has heavy
outliers that would distort the mean and standard deviation.
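To see why the choice matters, the sketch below applies three common scaling rules to one column containing a single extreme value: z-scores, min-max scaling to [0, 1], and the median/IQR rule that robust scalers typically use. These are textbook definitions written in plain Python, not formulaML's code; in particular, the quartile method (`statistics.quantiles` with `method="inclusive"`) and the 25th/75th percentile range are assumptions about how ML.PREPROCESSING.ROBUST_SCALER behaves. The helper names are hypothetical.

```python
import statistics


def z_score(col):
    # Standard scaling: subtract the mean, divide by the population std.
    mean, std = statistics.fmean(col), statistics.pstdev(col)
    return [(x - mean) / std for x in col]


def min_max(col):
    # Min-max scaling: map the smallest value to 0 and the largest to 1.
    lo, hi = min(col), max(col)
    return [(x - lo) / (hi - lo) for x in col]


def robust(col):
    # Robust scaling: subtract the median, divide by the interquartile range.
    q1, _, q3 = statistics.quantiles(col, n=4, method="inclusive")
    median = statistics.median(col)
    return [(x - median) / (q3 - q1) for x in col]


# Four ordinary readings plus one outlier.
col = [1.0, 2.0, 3.0, 4.0, 1000.0]
for name, fn in [("z-score", z_score), ("min-max", min_max), ("robust", robust)]:
    print(name, [round(v, 3) for v in fn(col)])
```

With the outlier present, z-scores and min-max scaling both squash the four ordinary readings into a range narrower than 0.01, because the outlier inflates the standard deviation and the max. The median/IQR rule ignores the outlier when choosing its center and spread, so the ordinary readings stay at -1, -0.5, 0 and 0.5.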
Always required before:
- ML.CLASSIFICATION.LOGISTIC / ML.CLASSIFICATION.SVM
- ML.REGRESSION.RIDGE / LASSO / ELASTIC_NET
- ML.CLUSTERING.KMEANS
- ML.DIM_REDUCTION.PCA
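Distance-based algorithms such as k-means compare rows by Euclidean distance, which is where unscaled columns cause the most trouble. The sketch below, in plain Python with made-up numbers, measures how much of the squared distance between two rows comes from the small-scale column before and after standardization. The helper names are hypothetical, and the population standard deviation is an assumption rather than documented formulaML behavior.

```python
import statistics


def standardize_columns(rows):
    """Z-score each column of a list of equal-length rows (hypothetical helper)."""
    cols = list(zip(*rows))
    stats = [(statistics.fmean(c), statistics.pstdev(c)) for c in cols]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def share_of_second_feature(a, b):
    """Fraction of the squared Euclidean distance contributed by feature 2."""
    d1, d2 = (a[0] - b[0]) ** 2, (a[1] - b[1]) ** 2
    return d2 / (d1 + d2)


# Column 1 is in the millions (e.g. revenue); column 2 is a 0-1 rate.
rows = [
    [1_000_000.0, 0.1],
    [1_000_500.0, 0.9],
    [1_001_000.0, 0.1],
    [1_001_500.0, 0.9],
]
scaled = standardize_columns(rows)

print("raw:   ", share_of_second_feature(rows[0], rows[1]))
print("scaled:", share_of_second_feature(scaled[0], scaled[1]))
```

Before scaling, the rate column contributes less than 0.001% of the distance between the first two rows; after scaling it contributes about 83%, so both columns can influence which cluster a row joins.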
Examples
Scale features in A2:E100 and read the standardized values back into the
sheet:
Combine the scaler and a model into a single pipeline so the scaler is fit on training data and reused at predict time. The first four formulas go in cells H1 through H4, since each later formula refers to those cells:
=ML.PREPROCESSING.STANDARD_SCALER()
=ML.CLASSIFICATION.LOGISTIC()
=ML.PIPELINE(H1, H2)
=ML.FIT(H3, A2:E100, F2:F100)
=ML.PREDICT(H4, A101:E110)
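Conceptually, the scaler inside the pipeline remembers the mean and standard deviation it saw during ML.FIT and reuses them on the rows passed to ML.PREDICT. The Python class below is a hypothetical stand-in that mimics that fit-then-transform contract; it is not formulaML's internal code, and the population standard deviation is an assumption.

```python
import statistics


class SimpleStandardScaler:
    """Hypothetical fit/transform scaler that stores per-column statistics."""

    def fit(self, rows):
        # Learn one mean and one std per column from the training rows only.
        cols = list(zip(*rows))
        self.means = [statistics.fmean(c) for c in cols]
        self.stds = [statistics.pstdev(c) for c in cols]
        return self

    def transform(self, rows):
        # Always uses the statistics learned in fit(), never those of `rows`.
        return [
            [(x - m) / s for x, m, s in zip(row, self.means, self.stds)]
            for row in rows
        ]


train = [[10.0, 200.0], [20.0, 400.0], [30.0, 600.0]]
test = [[40.0, 800.0]]

scaler = SimpleStandardScaler().fit(train)
print(scaler.transform(test))
```

The test row lands about 2.45 training standard deviations above the training mean in both columns. Fitting a fresh scaler on the test rows would instead measure them against their own statistics (and, for a single row, divide by a standard deviation of zero).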
After fitting on training data, apply the same scaler to a held-out
test set with ML.TRANSFORM:
Remarks
- Always fit the scaler on the training data only, then reuse it on the
test or production data via ML.TRANSFORM. Fitting it again on test data lets the test set's statistics leak into preprocessing, so your evaluation no longer reflects how the model handles truly unseen data.
- The cleanest way to avoid that mistake is to wrap the scaler and model in
ML.PIPELINE; the pipeline calls fit on the scaler with the training data, and calls transform automatically at predict time.
- Tree-based models (ML.CLASSIFICATION.RANDOM_FOREST_CLF, ML.REGRESSION.RANDOM_FOREST_REG) are scale-invariant because their splits depend only on the order of values within each feature, so you usually do not need a scaler in front of them.
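The scale invariance of trees follows from how they split: a split only asks whether a value is above or below a threshold, so any strictly increasing transformation of a column, standardization included, produces the same partition of rows. The sketch below checks this for a single decision stump in plain Python; the brute-force stump search and the helper names are hypothetical illustrations, not how formulaML's random forests are implemented.

```python
import statistics


def standardize(column):
    # Z-score with the population std (an assumption); strictly increasing when std > 0.
    mean, std = statistics.fmean(column), statistics.pstdev(column)
    return [(x - mean) / std for x in column]


def best_split_rows(xs, labels):
    """Rows sent left by the threshold split with the fewest errors.

    Hypothetical helper: the left side predicts class 0, the right side class 1.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_errors, best_left = None, None
    for k in range(1, len(order)):
        left, right = order[:k], order[k:]
        errors = sum(labels[i] != 0 for i in left) + sum(labels[i] != 1 for i in right)
        if best_errors is None or errors < best_errors:
            best_errors, best_left = errors, frozenset(left)
    return best_left


income = [120_000.0, 35_000.0, 80_000.0, 20_000.0, 150_000.0]
label = [1, 0, 1, 0, 1]

raw_split = best_split_rows(income, label)
scaled_split = best_split_rows(standardize(income), label)
print(raw_split == scaled_split, sorted(raw_split))
```

The threshold itself is expressed in different units (raw values versus z-scores), but the same rows end up on each side, which is why putting a scaler in front of a tree-based model usually changes nothing.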