The Caret package ("Caret" stands for Classification And REgression Training) includes functions to support the model training process for complex regression and classification problems. It is widely used in the context of Machine Learning which we have dealt with earlier this year. See this blog post from spring if you are interested in an overview. Apart from other, even more advanced algorithms, regression approaches represent and established element of machine learning. There are tons of resources on regression, if you are interested and we will not focus on the basics in this post. There is, however, an earlier post if you look for a brief introduction.
In the following, the output of a knitr report into html was pasted on the blog site. We used made-up data on donor lifetime value as dependent variable and the initial donation, the last donation and donor age as predictors.
In short: The explanatory and predictive power of the model are low (for which the dummy data has to be blamed to a certain extent). However, the code you will find below aims to illustrate important concepts of machine learning in R:
Preparing Training and Test Data
We see that InitialDonation, LastDonation and Lifetimesum are factos .. so let´s prepare the data.
As we have a decent dataset now we go ahead and load the promiment machine learning package Caret (Classification And Regression Training)
So called near-zero-variance variables (i.e. variables where the observations are all the same) are cleaned.
Before we fit the model, let´s have a look at intercorrelations
Fitting the model
Now it is the moment we fit the (regression) model:
The scatterplot illustrates the relative poorness of the prediction. So: Still some work (data collection, modelling) to be done :-)