Download the executable binaries from the latest TreeExtra release. You can also download the sources and compile them, but normally you don't have to. TreeExtra binaries are standalone tools that run from the command line. (On Windows, run them from a command-line shell such as Command Prompt (cmd).)
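On Linux or macOS, one way to make the tools callable from any folder is to add the folder that contains them to PATH. The following is a minimal sketch for a POSIX shell (it does not apply to cmd on Windows). The directory $HOME/TreeExtra and the helper name check_tools are hypothetical, so adjust them to wherever you unpacked the binaries.

```shell
#!/bin/sh
# Sketch: meant to be sourced (". ./file") so that the PATH change persists
# in the current shell. TREEEXTRA_DIR is a hypothetical location.
TREEEXTRA_DIR="$HOME/TreeExtra"
PATH="$TREEEXTRA_DIR:$PATH"
export PATH

# Print the name of every Additive Groves tool used in this tutorial that
# the shell cannot find; return non-zero if any of them is missing.
check_tools() {
    missing=0
    for tool in ag_train ag_expand ag_save ag_predict; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

After sourcing the snippet, run check_tools: it prints nothing and succeeds when ag_train, ag_expand, ag_save and ag_predict are all on PATH. The PATH change lasts only for the current shell session.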
Prepare your data set following the instructions on input data format. You will need training, validation and test data files, plus an attribute file. Here is a sample synthetic data set: data.train, data.valid, data.test, data.attr.
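If your data starts out as a single file, a random split is one common way to produce the three data files. The sketch below assumes one example per line with no header line, uses shuf from GNU coreutils, and picks a 60/20/20 split purely for illustration; split_data is a hypothetical helper name. It does not create the attribute file, which you still prepare following the input data format instructions.

```shell
#!/bin/sh
# Sketch: shuffle one data file and split it 60/20/20 into data.train,
# data.valid and data.test in the current folder (overwriting them).
# Assumes one example per line, no header, and GNU coreutils (shuf).
split_data() {
    input=$1
    total=$(wc -l < "$input")
    n_train=$((total * 60 / 100))
    n_valid=$((total * 20 / 100))
    shuffled=$(mktemp)
    shuf "$input" > "$shuffled"
    # First 60% of the shuffled lines: training set.
    head -n "$n_train" "$shuffled" > data.train
    # Next 20%: validation set.
    tail -n +"$((n_train + 1))" "$shuffled" | head -n "$n_valid" > data.valid
    # Remaining lines: test set.
    tail -n +"$((n_train + n_valid + 1))" "$shuffled" > data.test
    rm -f "$shuffled"
}
```

For example, split_data all_data.txt (a hypothetical input file name) writes the three files into the current folder. Every input line ends up in exactly one of them.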
Create a new folder for this experiment and cd into it. Output and temporary files will be placed in this folder.
Run ag_train (if needed, modify the file names in the following command line):
> ag_train -t data.train -v data.valid -r data.attr
The log output ends with a recommendation of which command to run next. Most likely it will recommend ag_expand. Keep following the recommendations (it often takes about 6 runs of ag_expand) until you run ag_save.
... recommendation: ag_expand -b 90
> ag_expand -b 90
... recommendation: ag_expand -b 140
> ag_expand -b 140
...
... recommendation: ag_save -a 0.02 -n 6
> ag_save -a 0.02 -n 6
The best model is saved in the file model.bin.
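Following the recommendations by hand is straightforward, but the loop can also be scripted. The sketch below is not part of TreeExtra: it assumes the recommendation appears in each tool's terminal output as a line containing "recommendation: <command>", as in the transcript above, and the helper names next_command and follow_recommendations are hypothetical. As a safety measure it only executes recommended ag_expand and ag_save commands, and it stops after ag_save or after 20 steps.

```shell
#!/bin/sh
# Sketch only: assumes each tool prints "... recommendation: <command>"
# near the end of its output, as in the transcript above.

# Print the command from the last "recommendation:" line read on stdin.
next_command() {
    grep '[Rr]ecommendation:' | tail -n 1 |
        sed 's/.*[Rr]ecommendation:[[:space:]]*//'
}

# Run the given command, then keep running whatever ag_expand/ag_save
# command its output recommends; stop after ag_save.
follow_recommendations() {
    log=$("$@" 2>&1)
    printf '%s\n' "$log"
    steps=0
    while cmd=$(printf '%s\n' "$log" | next_command) && [ -n "$cmd" ]; do
        # Only execute the tools we expect; never run arbitrary log text.
        case $cmd in
            ag_expand*|ag_save*) ;;
            *) echo "unexpected recommendation: $cmd" >&2; return 1 ;;
        esac
        steps=$((steps + 1))
        if [ "$steps" -gt 20 ]; then
            echo "giving up after 20 steps" >&2
            return 1
        fi
        printf '> %s\n' "$cmd"
        # $cmd is unquoted on purpose: "ag_expand -b 90" must split into words.
        log=$($cmd 2>&1)
        printf '%s\n' "$log"
        case $cmd in ag_save*) return 0 ;; esac
    done
}
```

For example: follow_recommendations ag_train -t data.train -v data.valid -r data.attr. Because each tool's output is captured to look for the next recommendation, it is printed only after that tool finishes.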
Run ag_predict on the test data:
> ag_predict -p data.test -r data.attr
... RMSE: 0.574717
That's it. The predictions on the test set are saved in preds.txt.
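The RMSE reported by ag_predict is the root mean squared error, sqrt((1/n) * sum over i of (prediction_i - target_i)^2) for the n test examples. To recompute it yourself, pair preds.txt with the true targets. This sketch assumes preds.txt holds one prediction per line and that you have extracted the test targets, in the same order, into a file with one value per line (targets.txt is a hypothetical name; which column holds the response depends on your attribute file). rmse is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: RMSE between two files with one number per line, same order.
# Fails (exit status 1) if the files have different lengths or are empty.
rmse() {
    paste "$1" "$2" | awk '
        NF != 2 { bad = 1; exit }              # unequal lengths or blank line
        { d = $1 - $2; sum += d * d; n++ }     # accumulate squared errors
        END {
            if (bad || n == 0) exit 1
            printf "%.6f\n", sqrt(sum / n)
        }'
}
```

For example, rmse preds.txt targets.txt. If the assumptions above hold and ag_predict uses the standard definition, the result should match the RMSE it prints, up to rounding.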
If you can afford a longer running time, I recommend repeating the same experiment in slow mode, which should produce a more accurate model. To do so, run ag_train with the additional flag -s slow. The rest of the process is the same.
> ag_train -t data.train -v data.valid -r data.attr -s slow
> ag_expand -n 16 -b 90
> ag_expand -b 140
...
> ag_save -a 0.05 -n 4
> ag_predict -p data.test -r data.attr
... RMSE: 0.565393
See the rest of the Additive Groves manual for other options, such as parallelization, evaluation by ROC, and "superfast" training with fixed parameters.