
ML.CLASSIFICATION.RANDOM_FOREST_CLF

Creates a Random Forest object.

Syntax

ML.CLASSIFICATION.RANDOM_FOREST_CLF(n_estimators, criterion, max_depth, min_samples_split, min_samples_leaf, max_features, bootstrap, random_state)

Arguments

Name Type Default Description
n_estimators int 100 The number of trees in the forest.
criterion str "gini" The function to measure the quality of a split. Supported criteria: 'gini' for Gini impurity, 'entropy' for information gain, and 'log_loss' for cross-entropy loss.
max_depth int None The maximum depth of the tree.
min_samples_split int 2 The minimum number of samples required to split an internal node.
min_samples_leaf int 1 The minimum number of samples required to be at a leaf node.
max_features int, float, or str "sqrt" The number of features to consider when looking for the best split.
bootstrap bool TRUE Whether bootstrap samples are used when building trees.
random_state int None Seed controlling the randomness of bootstrapping and feature sampling; pass an integer for reproducible results.

Returns

A Random Forest classifier handle, ready to pass into ML.FIT.

When to use

Reach for a Random Forest when you want a strong, low-effort classifier that handles messy real-world data (mixed feature types, imperfect or missing values, non-linear interactions) without much tuning. It is often the best out-of-the-box choice when you don't yet know what shape your data has.

Compared to the alternatives in this namespace:

  • Use ML.CLASSIFICATION.LOGISTIC when a linear baseline will do and you need to read off feature coefficients.
  • Use ML.CLASSIFICATION.SVM on small, clean datasets where a kernel can exploit a non-linear boundary.
  • Use ML.CLASSIFICATION.RANDOM_FOREST_CLF as the default workhorse on tabular data with many features, especially when those features interact.

Examples

Build a forest with the default 100 trees and fit it on labeled data in A2:E100 / F2:F100, then predict ten new rows in A101:E110. The formulas below are assumed to sit in cells H1, H2, and H3 respectively, so each step references the handle returned by the one before it:

=ML.CLASSIFICATION.RANDOM_FOREST_CLF()
=ML.FIT(H1, A2:E100, F2:F100)
=ML.PREDICT(H2, A101:E110)
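For prototyping the same workflow outside the spreadsheet: assuming the add-in wraps scikit-learn's RandomForestClassifier (the parameter names and defaults above match its API), the three formulas correspond to the construct/fit/predict calls below. The dataset is synthetic and purely illustrative, standing in for the ranges A2:E100 and A101:E110.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 109 rows of 5 features, mirroring A2:E110
X, y = make_classification(n_samples=109, n_features=5, random_state=0)

# Equivalent of =ML.CLASSIFICATION.RANDOM_FOREST_CLF() with defaults
clf = RandomForestClassifier(n_estimators=100)

# Equivalent of =ML.FIT(H1, A2:E100, F2:F100): fit on the first 99 rows
clf.fit(X[:99], y[:99])

# Equivalent of =ML.PREDICT(H2, A101:E110): predict the ten new rows
preds = clf.predict(X[99:])
print(len(preds))  # 10
```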

Grow a larger forest for a small accuracy bump (at the cost of fit time):

=ML.CLASSIFICATION.RANDOM_FOREST_CLF(500)

Cap tree depth to limit overfitting on small datasets:

=ML.CLASSIFICATION.RANDOM_FOREST_CLF(100, "gini", 5)

Remarks

  • n_estimators is the number of trees. More trees rarely hurts accuracy but always costs fit time — 100 is a sensible default; 500–1000 for a final model.
  • max_depth defaults to None (no limit). Set it to a small integer (e.g. 5 or 10) when you have few rows and want to curb overfitting.
  • Random Forests are largely scale-invariant — you usually do not need to scale features beforehand, unlike ML.CLASSIFICATION.LOGISTIC or ML.CLASSIFICATION.SVM.
  • For reproducible results, pass an integer to the random_state argument.
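The last point can be demonstrated directly. Assuming the sklearn-style backend, two forests built with the same integer random_state produce identical predictions, while omitting it leaves each rebuild free to resample differently:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=150, n_features=5, random_state=0)

# Same seed twice: the two forests are trained identically
a = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y).predict(X)
b = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y).predict(X)
print((a == b).all())  # True
```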

See also