The FIT/PREDICT pattern

Every classifier, regressor, and clusterer in formulaML follows the same three-step workflow. Once you've used it for one model, every other model works the same way — only the constructor changes.

The three steps

1. Build an unfitted model

Call the model's constructor with whatever hyperparameters you want. Hyperparameters are passed as arguments to the constructor; any you omit fall back to sensible defaults.

=ML.CLASSIFICATION.LOGISTIC()
=ML.REGRESSION.RIDGE(0.5)
=ML.CLUSTERING.KMEANS(3)

Each constructor returns an object handle representing an unfitted model — the algorithm is configured, but it hasn't seen any data yet.

2. Fit the model on data

Pass the unfitted-model handle plus a features range (X) and a target range (y) to ML.FIT. Suppose the constructor's handle is in H1, features are in A2:E101, and the target is in F2:F101:

=ML.FIT(H1, A2:E101, F2:F101)

ML.FIT returns a fitted model handle. The fitted handle is a different object from the unfitted one — keep both around if you want to re-fit on different data later.
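
Because ML.FIT leaves the unfitted handle untouched, one constructor cell can feed more than one fit. A minimal sketch, assuming the unfitted handle is still in H1 and splitting the 100 example rows into two halves purely for illustration (the split itself is hypothetical):

=ML.FIT(H1, A2:E51, F2:F51)        ' fitted handle trained on rows 2-51
=ML.FIT(H1, A52:E101, F52:F101)    ' second fitted handle trained on rows 52-101

Each formula should return its own fitted handle, so the two results can be passed to ML.PREDICT independently.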

For unsupervised models like clusterers, y is omitted:

=ML.FIT(H1, A2:E101)

3. Predict on new data

Pass the fitted-model handle plus a range of new features to ML.PREDICT. Suppose the fitted handle returned by ML.FIT is in H2 and the new feature rows are in A102:E120:

=ML.PREDICT(H2, A102:E120)

ML.PREDICT spills a vector or matrix of predictions, computed from what the model learned during ML.FIT.

A complete example

Working in columns A-G for data and predictions, and column H for handles:

=ML.DATASETS.IRIS()              ' H1 - dataset handle
=ML.DATA.GET_X(H1)               ' A2:D151 - features
=ML.DATA.GET_Y(H1)               ' F2:F151 - target
=ML.CLASSIFICATION.LOGISTIC()    ' H2 - unfitted model
=ML.FIT(H2, A2:D151, F2:F151)    ' H3 - fitted model
=ML.PREDICT(H3, A2:D5)           ' G2:G5 - predictions for first 4 rows

Why three steps?

Splitting construction, fitting, and prediction across three cells matches how scikit-learn's API works underneath, and gives you natural places to inspect intermediate results:

  • The constructor lets you tweak hyperparameters without rebuilding the data pipeline.
  • ML.FIT is the expensive step (it actually trains the model). You rerun it only when the data or the model changes.
  • ML.PREDICT is cheap. You can call it many times on different feature ranges from the same fitted handle.
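
The last point can be sketched with the handles from the complete example above, assuming the fitted model is still in H3. The second feature range is chosen only for illustration:

=ML.PREDICT(H3, A2:D5)             ' predictions for rows 2-5
=ML.PREDICT(H3, A6:D10)            ' predictions for rows 6-10, same fitted handle

Neither formula calls ML.FIT again, so adding more prediction ranges does not retrain the model.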

Beyond FIT/PREDICT

Once you're comfortable with the basic pattern, the rest of the formulaML surface plugs into the same flow:

  • Wrap multiple preprocessing steps and a model into a single ML.PIPELINE and treat the pipeline as if it were a model.
  • Score a fitted model against held-out data with the helpers under ML.EVAL.*.
  • Tune hyperparameters by varying the constructor's arguments and comparing scores.
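
A minimal sketch of the tuning idea, assuming the single argument shown earlier for ML.REGRESSION.RIDGE is the hyperparameter being varied. The cells J1:K2, the candidate values, and the data ranges are hypothetical:

=ML.REGRESSION.RIDGE(0.1)          ' J1 - unfitted model, first candidate value
=ML.REGRESSION.RIDGE(1)            ' K1 - unfitted model, second candidate value
=ML.FIT(J1, A2:E101, F2:F101)      ' J2 - fitted model for the first candidate
=ML.FIT(K1, A2:E101, F2:F101)      ' K2 - fitted model for the second candidate

Each fitted handle can then be scored on held-out rows with one of the ML.EVAL.* helpers and the scores compared side by side. The exact helper names are not shown on this page, so check the Reference before wiring that step in.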

Browse the Reference to see every available constructor and helper.