Typically, the following steps are needed to develop a predictive model:
Define the Predictive Target
This is the first and most important step in the modeling process. Accurately defining target values can be quite involved. The modeler must understand the business rules and the application scenario, because each company defines a given event in its own way and applies its own rules to it.
For example, when defining payment rates, one class of debtor might need only a call from the credit granter to prompt a payment, while another might need a letter.
Extract Customer Data
While domain expertise and survey data are helpful, the model requires only customer transactional data and demographic data. In the wireless sector, for example, call detail records can be used, whilst in the financial services sector, financial transaction data are used.
A common problem in predictive analytics is how to extract training data for a model that will reliably forecast future events. Several problems related to so-called false predictors can occur here, most of which result in a trained model that performs better on historical data sets than it can in a production environment where future events are being predicted.
We’ve encountered three main sources of false predictors:
- Errors in assembling training sets
- Failure to detect the first instance of the target event
- Inclusion of post-event information in the independent variables (i.e., the input variables to the model)
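One way to guard against the second and third sources is to fix an observation cutoff when assembling the training set. The sketch below is illustrative only: the record layout, the function name `build_training_rows`, and the feature names are assumptions, not part of any particular system. It builds features only from transactions before the cutoff, labels the target from the first event per customer, and drops customers whose first event precedes the cutoff.

```python
from datetime import date

# Hypothetical records: (customer_id, transaction_date, amount)
transactions = [
    ("a", date(2023, 1, 10), 40.0),
    ("a", date(2023, 3, 5), 25.0),
    ("b", date(2023, 2, 1), 60.0),
    ("b", date(2023, 4, 20), 10.0),
    ("c", date(2023, 1, 15), 30.0),
]

# Hypothetical target events: (customer_id, event_date); repeats are possible
events = [
    ("a", date(2023, 5, 1)),
    ("a", date(2023, 6, 1)),
    ("c", date(2023, 2, 1)),
]


def build_training_rows(transactions, events, cutoff, horizon_end):
    """Assemble one row per customer without using post-cutoff information."""
    # Keep only the FIRST event per customer; later repeats are ignored.
    first_event = {}
    for cid, event_date in events:
        if cid not in first_event or event_date < first_event[cid]:
            first_event[cid] = event_date

    rows = {}
    for cid, tx_date, amount in transactions:
        if tx_date >= cutoff:
            continue  # post-cutoff data would leak future information
        if cid in first_event and first_event[cid] < cutoff:
            continue  # the event already happened, so there is nothing to predict
        row = rows.setdefault(cid, {"n_tx": 0, "total": 0.0, "target": 0})
        row["n_tx"] += 1
        row["total"] += amount

    # Target: did the first event fall inside [cutoff, horizon_end)?
    for cid, row in rows.items():
        event_date = first_event.get(cid)
        row["target"] = int(event_date is not None and cutoff <= event_date < horizon_end)
    return rows


rows = build_training_rows(
    transactions, events, cutoff=date(2023, 4, 1), horizon_end=date(2023, 7, 1)
)
```

In this toy data, customer "b" loses the April transaction from its features, and customer "c" is excluded because its event predates the cutoff.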
Preprocess the Raw Data
Feature selection and data cleaning are necessary for the raw customer data. During the data-cleaning step, the modeler usually decides how to handle missing values and reduce noise.
Input-feature selection can achieve both data cleaning and data reduction by selecting important features and omitting redundant, noisy, or less informative ones. A second goal of preprocessing is to incorporate domain knowledge into the model through the way the data are represented.
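As a rough illustration of these two preprocessing tasks, the following sketch imputes missing values with the column median and then drops constant or near-duplicate features. The column names, the thresholds, and the choice of median imputation and a correlation filter are all assumptions made for the example; real projects may call for other techniques.

```python
from statistics import median, pvariance


def impute_median(column):
    """Replace missing values (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (norm_x * norm_y)


def select_features(columns, min_variance=1e-9, max_abs_corr=0.95):
    """Keep features that vary and are not near-duplicates of a kept feature."""
    kept = []
    for name, values in columns.items():
        if pvariance(values) <= min_variance:
            continue  # constant column: carries no information
        if any(abs(pearson(values, columns[k])) > max_abs_corr for k in kept):
            continue  # redundant with a feature already kept
        kept.append(name)
    return kept


# Hypothetical raw customer columns; None marks a missing value
raw = {
    "monthly_spend": [20.0, None, 35.0, 50.0, 45.0],
    "monthly_spend_cents": [2000.0, 3900.0, 3500.0, 5000.0, 4500.0],
    "region_code": [1.0, 1.0, 1.0, 1.0, 1.0],
    "tenure_months": [3.0, 24.0, 12.0, 6.0, None],
}

clean = {name: impute_median(col) for name, col in raw.items()}
selected = select_features(clean)
```

Here `region_code` is dropped as constant and `monthly_spend_cents` as redundant with `monthly_spend`, leaving two features.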
Find the Best Model
The specific data characteristics of each project need to be taken into account to achieve the best model. For example, one can achieve a better model after compensating for data imbalance, data nonstationarity, and data sparsity. No single algorithm or technique will outperform all others on all problems (the "no free lunch" theorem), so model selection is always necessary for each project.
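A minimal sketch of model selection on imbalanced data follows. The two candidate models, the toy data, and the use of balanced accuracy with a simple interleaved k-fold split are assumptions chosen for illustration. On this data a majority-class baseline would reach 75% plain accuracy while identifying no positives, which is why the comparison uses balanced accuracy instead.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of the recall on positives and the recall on negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def fit_majority(xs, ys):
    """Candidate 1: always predict the most frequent training label."""
    label = int(sum(ys) * 2 >= len(ys))
    return lambda x: label


def fit_threshold(xs, ys):
    """Candidate 2: predict 1 when x >= t, with t tuned on the training data."""
    best_t, best_score = None, -1.0
    for t in sorted(set(xs)):
        score = balanced_accuracy(ys, [int(x >= t) for x in xs])
        if score > best_score:
            best_t, best_score = t, score
    return lambda x: int(x >= best_t)


def cross_validate(fit, xs, ys, k):
    """Average balanced accuracy over k interleaved folds."""
    scores = []
    for fold in range(k):
        test_idx = [i for i in range(len(xs)) if i % k == fold]
        train_idx = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        y_test = [ys[i] for i in test_idx]
        y_pred = [model(xs[i]) for i in test_idx]
        scores.append(balanced_accuracy(y_test, y_pred))
    return sum(scores) / k


# Toy imbalanced data: 3 positives out of 12; every fold holds both classes
xs = [9, 8, 7, 1, 2, 3, 4, 2, 1, 3, 5, 2]
ys = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

candidates = {"majority": fit_majority, "threshold": fit_threshold}
scores = {name: cross_validate(fit, xs, ys, k=3) for name, fit in candidates.items()}
best = max(scores, key=scores.get)
```

The same loop extends to any number of candidates, which is the practical consequence of there being no universally best algorithm.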
Integrated Decision Making
Prediction of customer behaviour is only one component of the decision framework for optimizing customer value. Within accounts receivable management, for example, knowing which customers are likely to pay is the single most critical success factor in maximizing the long-term profitability of a collection campaign.
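To show how a prediction might feed into a decision, the sketch below ranks accounts by expected net value rather than by payment probability alone. The value formula, the per-contact cost, the budget, and the account fields are all hypothetical; an actual decision framework would likely include more factors.

```python
def expected_net_value(p_pay, balance, action_cost):
    """Expected recovery from contacting an account, minus the contact cost."""
    return p_pay * balance - action_cost


def plan_campaign(accounts, action_cost, budget):
    """Pick accounts in descending expected net value until the budget runs out."""
    scored = [
        (expected_net_value(a["p_pay"], a["balance"], action_cost), a["id"])
        for a in accounts
    ]
    # Contacting an account with non-positive expected value loses money.
    scored = [item for item in scored if item[0] > 0]
    scored.sort(reverse=True)

    selected, spent = [], 0.0
    for _value, account_id in scored:
        if spent + action_cost > budget:
            break
        selected.append(account_id)
        spent += action_cost
    return selected


# Hypothetical accounts; p_pay would come from the predictive model
accounts = [
    {"id": "a", "p_pay": 0.9, "balance": 100.0},
    {"id": "b", "p_pay": 0.2, "balance": 1000.0},
    {"id": "c", "p_pay": 0.05, "balance": 50.0},
]

plan = plan_campaign(accounts, action_cost=5.0, budget=10.0)
```

In this toy case account "b" ranks ahead of "a" despite a much lower payment probability, because its balance is larger, and "c" is skipped because the contact would cost more than it is expected to recover.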