ML.PREPROCESSING.ONE_HOT_ENCODER
Encodes categorical columns as one-hot indicators: each distinct category becomes its own 0/1 column, giving numeric inputs that machine learning models can use.
Syntax
=ML.PREPROCESSING.ONE_HOT_ENCODER(handle_unknown)
Arguments
| Name | Type | Default | Description |
|---|---|---|---|
| handle_unknown | Any | "error" | How to handle categories seen during transform that were not present when the encoder was fitted. "error" raises an error; "ignore" encodes them as all-zero rows. |
Returns
A OneHotEncoder transformer handle, ready to pass into ML.FIT_TRANSFORM or ML.PIPELINE.
When to use
Reach for one_hot_encoder when a column holds unordered categories —
country, color, product type, well name — and you need numeric inputs for a
model. Each unique category becomes its own 0/1 column, so the model can
treat the categories independently without imposing any ordering between
them.
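The plain-Python sketch below makes the "one 0/1 column per category" idea concrete. It is not the add-in's implementation. The helper names `fit_categories` and `one_hot` are hypothetical. Sorting the learned categories alphabetically is an assumption made only so the column order is deterministic.

```python
# Conceptual sketch of one-hot encoding in plain Python.
# fit_categories and one_hot are hypothetical names, not part of the add-in.

def fit_categories(values):
    """Learn the distinct categories, sorted so the column order is stable."""
    return sorted(set(values))


def one_hot(values, categories):
    """Turn each value into a 0/1 row with a single 1 in its category's column."""
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return rows


colors = ["red", "green", "blue", "green"]
cats = fit_categories(colors)  # ['blue', 'green', 'red']
encoded = one_hot(colors, cats)
for value, row in zip(colors, encoded):
    print(value, row)
```

Each row contains exactly one 1. No column is numerically "larger" than another, which is why the encoding implies no order between categories.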
Compared to the alternative in this namespace:
- Use one_hot_encoder for unordered categories, especially when you have only a handful of distinct values per column.
- Use ML.PREPROCESSING.ORDINAL_ENCODER when the categories do have a natural order (e.g. "low" < "medium" < "high"); assigning integer ranks then makes sense.
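The difference from ordinal encoding can be sketched the same way. This illustrates the idea under assumptions; it is not ML.PREPROCESSING.ORDINAL_ENCODER itself. The names `ordinal_encode` and `LEVEL_ORDER` are hypothetical. The ranks come from an explicit list because sorting the labels alphabetically would give the wrong order ("high" < "low" < "medium").

```python
# Conceptual sketch of ordinal encoding with an explicit category order.
# ordinal_encode and LEVEL_ORDER are hypothetical names, not part of the add-in.

LEVEL_ORDER = ["low", "medium", "high"]  # the meaningful order, supplied by hand


def ordinal_encode(values, order):
    """Map each label to its integer rank in the given order."""
    rank = {cat: i for i, cat in enumerate(order)}
    return [rank[v] for v in values]


levels = ["high", "low", "medium", "low"]
codes = ordinal_encode(levels, LEVEL_ORDER)
print(codes)  # [2, 0, 1, 0]

# Alphabetical sorting would rank the labels incorrectly:
print(sorted(set(levels)))  # ['high', 'low', 'medium']
```

A single integer column is compact. It is only appropriate when "2 > 1 > 0" reflects a real ordering; for unordered categories, one-hot encoding avoids inventing one.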
Examples
One-hot encode a single categorical column in A2:A100, with the encoder created in H1:
=ML.PREPROCESSING.ONE_HOT_ENCODER()
=ML.FIT_TRANSFORM(H1, A2:A100)
Apply one-hot encoding only to the categorical columns and pass the numeric columns through unchanged using ML.COMPOSE.COLUMN_TRANSFORMER. The formulas below are entered in H1 through H4, each referring to the cell above it:
=ML.PREPROCESSING.ONE_HOT_ENCODER()
=ML.COMPOSE.DATA_TRANSFORMER(H1, "category_col")
=ML.COMPOSE.COLUMN_TRANSFORMER(H2)
=ML.FIT_TRANSFORM(H3, A2:E100)
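Conceptually, the column transformer example applies the encoder to the named column and leaves the other columns untouched. The plain-Python sketch below mimics that under stated assumptions:

- `encode_columns` is a hypothetical helper.
- The sample rows are illustrative.
- Placing the encoded columns before the passed-through ones is a choice of this sketch, not documented behaviour of ML.COMPOSE.COLUMN_TRANSFORMER.

```python
# Conceptual sketch: one-hot encode selected columns, pass the rest through.
# encode_columns is a hypothetical helper, not part of the add-in.

def encode_columns(rows, header, categorical):
    """One-hot encode the named columns and pass every other column through."""
    cat_idx = [header.index(name) for name in categorical]
    pass_idx = [i for i in range(len(header)) if i not in cat_idx]

    # Learn the distinct categories of each categorical column (sorted for a stable order).
    cats = {i: sorted({row[i] for row in rows}) for i in cat_idx}

    # Output header: one "column=category" name per indicator, then passthrough names.
    out_header = [f"{header[i]}={c}" for i in cat_idx for c in cats[i]]
    out_header += [header[i] for i in pass_idx]

    out_rows = []
    for row in rows:
        encoded = [1 if row[i] == c else 0 for i in cat_idx for c in cats[i]]
        passed = [row[i] for i in pass_idx]
        out_rows.append(encoded + passed)
    return out_header, out_rows


# Illustrative sample rows: one categorical column, two numeric columns.
header = ["category_col", "price", "qty"]
rows = [
    ["a", 1.5, 3],
    ["b", 2.0, 1],
    ["a", 0.5, 7],
]

new_header, new_rows = encode_columns(rows, header, ["category_col"])
print(new_header)  # ['category_col=a', 'category_col=b', 'price', 'qty']
for r in new_rows:
    print(r)
```

Like ML.FIT_TRANSFORM, this sketch learns the categories and encodes the same rows in one step.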
Use handle_unknown="ignore" so categories that appear only at predict time don't raise an error:
=ML.PREPROCESSING.ONE_HOT_ENCODER("ignore")
Remarks
- One-hot encoding replaces each categorical column with one column per distinct category. For a column with hundreds of distinct values, the result is a very wide and very sparse matrix; consider ML.PREPROCESSING.ORDINAL_ENCODER or merging rare categories first.
- When handle_unknown="error" (the default), an unseen category at predict time raises a clear error. Switch to "ignore" to encode unseen categories as all-zero rows instead.
- Always fit on the training data only, then ML.TRANSFORM the test data with the same fitted encoder so the column ordering matches.
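The last two remarks can be illustrated with a small stand-in class: fit on training data only, and treat unseen categories according to handle_unknown. `SketchOneHotEncoder` is hypothetical and written in plain Python. It mirrors the documented "error"/"ignore" behaviour but is not the add-in's OneHotEncoder.

```python
# Hypothetical plain-Python stand-in for a fitted one-hot encoder.
# It mirrors the documented handle_unknown behaviour; it is not the add-in's class.

class SketchOneHotEncoder:
    def __init__(self, handle_unknown="error"):
        if handle_unknown not in ("error", "ignore"):
            raise ValueError("handle_unknown must be 'error' or 'ignore'")
        self.handle_unknown = handle_unknown
        self.categories_ = None

    def fit(self, values):
        # Categories (and therefore output columns) come from the training data only.
        self.categories_ = sorted(set(values))
        return self

    def transform(self, values):
        index = {c: i for i, c in enumerate(self.categories_)}
        rows = []
        for v in values:
            row = [0] * len(self.categories_)
            if v in index:
                row[index[v]] = 1
            elif self.handle_unknown == "error":
                raise ValueError(f"unknown category: {v!r}")
            # With "ignore", an unseen value leaves the row all zeros.
            rows.append(row)
        return rows


train = ["oil", "gas", "oil"]
test = ["gas", "water"]  # "water" never appeared in training

enc = SketchOneHotEncoder(handle_unknown="ignore").fit(train)
print(enc.categories_)      # ['gas', 'oil']
print(enc.transform(test))  # [[1, 0], [0, 0]]

strict = SketchOneHotEncoder().fit(train)
try:
    strict.transform(test)
except ValueError as exc:
    print("error:", exc)    # error: unknown category: 'water'
```

Because the test rows reuse the encoder fitted on `train`, they get exactly the same two columns in the same order. Fitting a second encoder on `test` would instead produce a different column set.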