The FIT/PREDICT pattern

Every classifier, regressor, and clusterer in formulaML follows the same three-step workflow. Once you've used it for one model, every other model works the same way — only the constructor changes.

The three steps

1. Build an unfitted model

Call the model's constructor with whatever hyperparameters you want. Hyperparameters are passed as constructor arguments (the examples below pass them positionally); any you omit fall back to sensible defaults.

=ML.CLASSIFICATION.LOGISTIC()
=ML.REGRESSION.RIDGE(0.5)
=ML.CLUSTERING.KMEANS(3)

Each constructor returns an object handle representing an unfitted model — the algorithm is configured, but it hasn't seen any data yet.

2. Fit the model on data

Pass the unfitted-model handle plus a features range (X) and a target range (y) to ML.FIT. Suppose the constructor's handle is in H1, features are in A2:E101, and the target is in F2:F101:

=ML.FIT(H1, A2:E101, F2:F101)

ML.FIT returns a fitted model handle. The fitted handle is a different object from the unfitted one — keep both around if you want to re-fit on different data later.

For unsupervised models like clusterers, y is omitted:

=ML.FIT(H1, A2:E101)
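
The difference between an unfitted handle and a fitted handle can be sketched in plain Python. This is an illustration of the idea only, not formulaML's implementation: the names RidgeSpec, FittedRidge, fit, and predict are hypothetical, and the model is a deliberately tiny one-feature ridge regression without an intercept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RidgeSpec:
    """Hypothetical unfitted model: configuration only, no data seen yet."""
    alpha: float = 1.0


@dataclass(frozen=True)
class FittedRidge:
    """Hypothetical fitted model: the learned weight plus the spec it came from."""
    spec: RidgeSpec
    weight: float


def fit(spec, xs, ys):
    """Return a new fitted object; the spec itself is left untouched."""
    # Closed-form 1-D ridge regression without an intercept:
    # w = sum(x * y) / (sum(x * x) + alpha)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return FittedRidge(spec=spec, weight=sxy / (sxx + spec.alpha))


def predict(model, xs):
    """Apply the learned weight to new feature values."""
    return [model.weight * x for x in xs]


spec = RidgeSpec(alpha=0.5)                           # unfitted
first = fit(spec, [1.0, 2.0, 3.0], [2.0, 4.0, 6.0])   # fitted on one data set
second = fit(spec, [1.0, 2.0], [3.0, 6.0])            # same spec, different data
```

Because fit returns a new object rather than changing the spec, the same unfitted configuration can be reused to train on a second data set without disturbing the first fitted model.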

3. Predict on new data

Pass the fitted-model handle plus a range of new features to ML.PREDICT. Suppose the ML.FIT formula from step 2 is in H2, so that H2 holds the fitted handle:

=ML.PREDICT(H2, A102:E120)

ML.PREDICT spills a vector or matrix of predictions, computed from what the model learned during ML.FIT.

A complete example

Working in columns A-G for data and column H for handles:

=ML.DATASETS.IRIS()              ' H1 - dataset handle
=ML.DATA.GET_X(H1)               ' A2:D151 - features
=ML.DATA.GET_Y(H1)               ' F2:F151 - target
=ML.CLASSIFICATION.LOGISTIC()    ' H2 - unfitted model
=ML.FIT(H2, A2:D151, F2:F151)    ' H3 - fitted model
=ML.PREDICT(H3, A2:D5)           ' G2:G5 - predictions for first 4 rows

Why three steps?

Splitting construction, fitting, and prediction across three cells matches how scikit-learn's API works underneath, and gives you natural places to inspect intermediate results:

The constructor lets you tweak hyperparameters without rebuilding the data pipeline.
ML.FIT is the expensive step (it actually trains the model). You rerun it only when the data or the model changes.
ML.PREDICT is cheap. You can call it many times on different feature ranges from the same fitted handle.
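
For readers who know Python, the same three steps can be sketched directly in scikit-learn, which the text above names as the underlying API. The mapping from each formulaML function to these specific calls is an assumption made for illustration; only the scikit-learn calls themselves are standard.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Load the iris data: 150 rows, 4 feature columns, 3 classes.
X, y = load_iris(return_X_y=True)

# Step 1: build an unfitted estimator (max_iter raised so the solver converges).
model = LogisticRegression(max_iter=1000)

# Step 2: fit on features and target. This is the expensive step.
model.fit(X, y)

# Step 3: predict for the first 4 rows. This is cheap and can be repeated.
preds = model.predict(X[:4])
```

One difference worth noting: in scikit-learn, fit updates the estimator in place and returns the same object, whereas ML.FIT is described above as returning a separate fitted handle.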

Beyond FIT/PREDICT

Once you're comfortable with the basic pattern, the rest of the formulaML surface plugs into the same flow:

Wrap multiple preprocessing steps and a model into a single ML.PIPELINE and treat the pipeline as if it were a model.
Score a fitted model against held-out data with the helpers under ML.EVAL.*.
Tune hyperparameters by varying the constructor's arguments and comparing scores.
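
The three ideas above (a pipeline treated as a model, scoring on held-out data, and tuning by varying constructor arguments) can be sketched together in scikit-learn. This is a Python analogue only: how ML.PIPELINE and the ML.EVAL.* helpers map onto these calls is an assumption, and the candidate values for C are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows so scores reflect data the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

scores = {}
for C in (0.01, 1.0, 100.0):  # arbitrary candidate hyperparameter values
    # The pipeline bundles preprocessing and the model; it is fitted and
    # scored exactly like a single model.
    pipe = make_pipeline(StandardScaler(), LogisticRegression(C=C, max_iter=1000))
    pipe.fit(X_train, y_train)
    scores[C] = pipe.score(X_test, y_test)  # accuracy on the held-out rows

best_C = max(scores, key=scores.get)
```

Only the constructor argument changes between iterations; the fit and scoring steps stay the same, which is the property the FIT/PREDICT pattern is designed to give you.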

Browse the Reference to see every available constructor and helper.