ML.CLASSIFICATION.RANDOM_FOREST_CLF¶
Creates a Random Forest classifier object.
Syntax¶
ML.CLASSIFICATION.RANDOM_FOREST_CLF(n_estimators, criterion, max_depth, min_samples_split, min_samples_leaf, max_features, bootstrap, random_state)
Arguments¶
| Name | Type | Default | Description |
|---|---|---|---|
| n_estimators | int | 100 | The number of trees in the forest. |
| criterion | str | "gini" | The function to measure the quality of a split. Supported criteria: "gini" for Gini impurity, "entropy" for information gain, and "log_loss" for cross-entropy loss. |
| max_depth | int | None | The maximum depth of the tree. |
| min_samples_split | int | 2 | The minimum number of samples required to split an internal node. |
| min_samples_leaf | int | 1 | The minimum number of samples required to be at a leaf node. |
| max_features | int \| float \| str | "sqrt" | The number of features to consider when looking for the best split. |
| bootstrap | bool | TRUE | Whether bootstrap samples are used when building trees. |
| random_state | int | None | Controls the randomness of bootstrapping and feature sampling. Pass an integer for reproducible results. |
Returns¶
A Random Forest classifier handle, ready to pass into ML.FIT.
When to use¶
Reach for a Random Forest when you want a strong, low-effort classifier that handles messy real-world data — mixed feature types, missing-ish values, non-linear interactions — without much tuning. It is often the best out-of-the-box choice when you don't yet know what shape your data has.
Compared to the alternatives in this namespace:
- Use ML.CLASSIFICATION.LOGISTIC when a linear baseline will do and you need to read off feature coefficients.
- Use ML.CLASSIFICATION.SVM on small, clean datasets where a kernel can exploit a non-linear boundary.
- Use ML.CLASSIFICATION.RANDOM_FOREST_CLF as the default workhorse on tabular data with many features, especially when those features interact.
Examples¶
Build a forest with the default 100 trees and fit it on labeled data in
A2:E100 / F2:F100, then predict ten new rows in A101:E110:
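(ML.FIT is the documented next step; the argument order shown here and the companion ML.PREDICT function are assumptions, not confirmed by this page.)

```
' H1 — create the classifier handle with all defaults (100 trees)
=ML.CLASSIFICATION.RANDOM_FOREST_CLF()

' H2 — fit the handle in H1 on features A2:E100 and labels F2:F100 (assumed order: model, X, y)
=ML.FIT(H1, A2:E100, F2:F100)

' H3 — predicted classes for the ten new rows (ML.PREDICT is assumed)
=ML.PREDICT(H2, A101:E110)
```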
Grow a larger forest for a small accuracy bump (at the cost of fit time):
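```
' 500 trees instead of the default 100; fit and predict as in the first example
=ML.CLASSIFICATION.RANDOM_FOREST_CLF(500)
```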
Cap tree depth to limit overfitting on small datasets:
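```
' 100 trees, Gini splits, no tree deeper than 5 levels
=ML.CLASSIFICATION.RANDOM_FOREST_CLF(100, "gini", 5)
```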
Remarks¶
- n_estimators is the number of trees. Adding trees rarely hurts accuracy but always costs fit time; 100 is a sensible default, and 500–1000 is typical for a final model.
- max_depth defaults to None (no limit). Set it to a small integer (e.g. 5 or 10) when you have few rows and want to curb overfitting.
- Random Forests are largely scale-invariant, so you usually do not need to scale features beforehand, unlike ML.CLASSIFICATION.LOGISTIC or ML.CLASSIFICATION.SVM.
- For reproducible results, pass an integer to the random_state argument, as shown in the sketch below.
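A minimal sketch, assuming that omitted positional arguments fall back to their defaults (standard spreadsheet behavior, but not stated on this page):

```
' Defaults everywhere except bootstrap (TRUE, position 7) and random_state (42, position 8)
=ML.CLASSIFICATION.RANDOM_FOREST_CLF(100, "gini", , , , , TRUE, 42)
```

Re-running the formula with the same random_state yields an identical forest, which keeps results comparable across edits to the sheet.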