
ML.CLASSIFICATION.RANDOM_FOREST_CLF

Creates a Random Forest object.

Syntax

ML.CLASSIFICATION.RANDOM_FOREST_CLF(n_estimators, criterion, max_depth, min_samples_split, min_samples_leaf, max_features, bootstrap, random_state)

Arguments

Name Type Default Description
n_estimators int 100 The number of trees in the forest.
criterion str "gini" The function to measure the quality of a split. Supported criteria: 'gini' for Gini impurity, 'entropy' for information gain, and 'log_loss' for cross-entropy loss.
max_depth int None The maximum depth of the tree.
min_samples_split int 2 The minimum number of samples required to split an internal node.
min_samples_leaf int 1 The minimum number of samples required to be at a leaf node.
max_features int, float, or str "sqrt" The number of features to consider when looking for the best split.
bootstrap bool TRUE Whether bootstrap samples are used when building trees.
random_state int None Seed controlling the randomness of bootstrapping and feature sampling; pass an integer for reproducible results.

Returns

A Random Forest classifier handle, ready to pass into ML.FIT.

When to use

Reach for a Random Forest when you want a strong, low-effort classifier that handles messy real-world data (mixed feature types, imperfect or missing values, non-linear interactions) without much tuning. It is often the best out-of-the-box choice when you don't yet know what shape your data has.

Compared to the alternatives in this namespace:

  • Use ML.CLASSIFICATION.LOGISTIC when a linear baseline will do and you need to read off feature coefficients.
  • Use ML.CLASSIFICATION.SVM on small, clean datasets where a kernel can exploit a non-linear boundary.
  • Use ML.CLASSIFICATION.RANDOM_FOREST_CLF as the default workhorse on tabular data with many features, especially when those features interact.

Examples

Build a forest with the default 100 trees and fit it on labeled data in A2:E100 / F2:F100, then predict ten new rows in A101:E110. The formulas below are assumed to sit in cells H1, H2, and H3 respectively, so each step references the handle returned by the one before it:

=ML.CLASSIFICATION.RANDOM_FOREST_CLF()
=ML.FIT(H1, A2:E100, F2:F100)
=ML.PREDICT(H2, A101:E110)
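For prototyping the same workflow outside the spreadsheet: assuming the add-in wraps scikit-learn's RandomForestClassifier (the parameter names and defaults above match its API), the three formulas correspond to the construct/fit/predict calls below. The dataset is synthetic and purely illustrative, standing in for the ranges A2:E100 and A101:E110.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 109 rows of 5 features, mirroring A2:E110
X, y = make_classification(n_samples=109, n_features=5, random_state=0)

# Equivalent of =ML.CLASSIFICATION.RANDOM_FOREST_CLF() with defaults
clf = RandomForestClassifier(n_estimators=100)

# Equivalent of =ML.FIT(H1, A2:E100, F2:F100): fit on the first 99 rows
clf.fit(X[:99], y[:99])

# Equivalent of =ML.PREDICT(H2, A101:E110): predict the ten new rows
preds = clf.predict(X[99:])
print(len(preds))  # 10
```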

Grow a larger forest for a small accuracy bump (at the cost of fit time):

=ML.CLASSIFICATION.RANDOM_FOREST_CLF(500)

Cap tree depth to limit overfitting on small datasets:

=ML.CLASSIFICATION.RANDOM_FOREST_CLF(100, "gini", 5)

Remarks

  • n_estimators is the number of trees. More trees rarely hurts accuracy but always costs fit time — 100 is a sensible default; 500–1000 for a final model.
  • max_depth defaults to None (no limit). Set it to a small integer (e.g. 5 or 10) when you have few rows and want to curb overfitting.
  • Random Forests are largely scale-invariant — you usually do not need to scale features beforehand, unlike ML.CLASSIFICATION.LOGISTIC or ML.CLASSIFICATION.SVM.
  • For reproducible results, pass an integer to the random_state argument.
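The last point can be demonstrated directly. Assuming the sklearn-style backend, two forests built with the same integer random_state produce identical predictions, while omitting it leaves each rebuild free to resample differently:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=150, n_features=5, random_state=0)

# Same seed twice: the two forests are trained identically
a = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y).predict(X)
b = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y).predict(X)
print((a == b).all())  # True
```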

See also