Predictions

General information

What are predictions?

Predictions are a powerful tool that allows you to create a model of customer behaviour based on past data. You can choose any customer event and predict the chance of it happening for every customer.

How does it work?

Predictions are based on artificial intelligence. In general, we use machine learning algorithms to create models on training datasets (past events) and find connections between the data and the target event. The algorithms we currently use are decision trees, random forests and gradient boosted trees.

Use cases

Predictions are a tool that allows you to estimate the chance of an event happening for every customer. You can use this value in any campaign, segmentation or anywhere else the same way as you use the customer aggregates.

Currently, there are three templates in the Exponea app. The first two, “event prediction” and “churn prediction”, are easy for a general user to set up. The last one, “custom prediction”, is for specially tailored predictions and advanced users. We cover the first two templates first and the advanced template afterwards.

Event prediction

This template predicts whether an event will happen during a session. You can select any event you want to predict, e.g. “purchase”, to predict each customer’s chance of buying one of your products before they leave the page.

Besides the target event, you can also select the date range on which to train the algorithm. We suggest using at least 1 week of data.

After the prediction has run successfully, you can find the predicted value for each customer in their customer attributes under predictions.

This prediction is best used for displaying banners and changing how a web page behaves depending on the chance of the user performing a predefined action (the target).

Churn prediction

Churn prediction is for predicting whether a given customer stays with you. In particular, it can predict if a customer who has been active in the past will be active in the future or if a customer who has bought something in the last specified period will buy something the next period as well.

Setting up this prediction is very simple. You just have to choose whether you want to predict session_start or purchase, and the time period on which you want to train it. If you choose to train on a one-month period, the algorithm will consider data for the last two months. The month before the last will be used to generate features and the last month will be used to generate targets for each customer.
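The split into a feature period and a target period can be sketched in a few lines of Python. This is only an illustration of the idea described above, not Exponea's implementation: the function name, the use of a fixed number of days for "one month" and the half-open (start, end) windows are all assumptions.

```python
from datetime import date, timedelta

def churn_windows(today: date, period_days: int):
    """Split the most recent 2 * period_days into two equal windows.

    Returns (feature_window, target_window), each a half-open
    (start, end) pair. All names here are illustrative assumptions.
    """
    target_end = today
    target_start = target_end - timedelta(days=period_days)
    feature_start = target_start - timedelta(days=period_days)
    return (feature_start, target_start), (target_start, target_end)

# Training on a one-month (here: 30-day) period looks back two months.
feature_window, target_window = churn_windows(date(2018, 8, 23), period_days=30)
print("features from", feature_window)
print("targets from ", target_window)
```

The older window supplies the features and the more recent window supplies the target, so the two never overlap.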

After the prediction has run successfully, you can find the predicted value for each customer in their customer attributes under predictions.

Custom predictions

Custom predictions give you full control over every setting. With this template you can tailor a prediction exactly to your needs, and this guide explains what each setting does.

Technical overview of data mining

In order to set all the parameters to the right values, you need to understand the basic concepts of data mining and feature generation.

In Exponea, data is stored as events per customer. To translate these events into a feature set for each customer, we apply aggregation functions to the events. The aggregation is straightforward: we order the events of each customer, go through them one by one and compute statistics over them. To be able to do that across all the different projects, we have implemented a set of rules for selecting events and determining the conversion target. In the basic templates, all these rules are set to the values that performed best in our tests, but in some cases you might need to change some of them to get a more precise prediction.

All the settings are explained below:
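To make the aggregation step more concrete, here is a minimal Python sketch. The event tuples, the feature names and the particular statistics (per-type counts, total count, time between first and last event) are assumptions chosen for illustration; the statistics Exponea actually computes are not listed in this guide.

```python
from collections import defaultdict

# Hypothetical raw events: (customer_id, unix_timestamp, event_type)
events = [
    ("c1", 300, "purchase"),
    ("c1", 100, "session_start"),
    ("c1", 200, "view_item"),
    ("c2", 150, "session_start"),
    ("c2", 400, "session_start"),
]

def build_features(events):
    """Order each customer's events by time and count simple statistics."""
    per_customer = defaultdict(list)
    for customer_id, timestamp, event_type in events:
        per_customer[customer_id].append((timestamp, event_type))

    features = {}
    for customer_id, customer_events in per_customer.items():
        customer_events.sort()  # chronological order
        row = defaultdict(int)
        for _timestamp, event_type in customer_events:
            row[f"count_{event_type}"] += 1
        row["total_events"] = len(customer_events)
        row["seconds_first_to_last"] = customer_events[-1][0] - customer_events[0][0]
        features[customer_id] = dict(row)
    return features

features = build_features(events)
for customer_id, row in sorted(features.items()):
    print(customer_id, row)
```

Each customer ends up with one row of numbers, which is the form a decision tree or random forest can be trained on.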

Target customer filter – We support 3 types of predictions:

  • Binomial classification – predict the chance of reaching a target
    Choose the prediction goal. You can choose any combination of events and customer properties you want to predict.
    E.g. if you want to predict the purchase of a specific brand, select event type ‘purchase’ and set the event property to that particular brand
  • Regression – predict the numeric value of a target
    Choose the prediction goal as a continuous variable.
    E.g. predict the number of times a user will return to the site this month. Create an aggregate which counts the number of session starts for the past month and use it as the target
  • Multinomial Classification – predict which of multiple targets is the most plausible
    Create a segmentation with all possible targets you want to predict.
    E.g. predict which channel is best to use for communicating with a customer. Create one segment per channel, each combined with a purchase or other conversion goal. More details for this case in use cases below.
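The three prediction types differ mainly in what the target value looks like for each customer. The following Python sketch shows one target of each kind; the customer record, its field names and the helper functions are hypothetical and only illustrate the shape of the data.

```python
# Hypothetical customer record; field names are illustrative only.
customer = {
    "events": [
        {"type": "session_start"},
        {"type": "purchase", "brand": "Acme"},
        {"type": "session_start"},
    ],
    "segment": "email",  # e.g. the channel segment the customer falls into
}

def binomial_target(customer, event_type, brand):
    """Binomial classification: 1 if a matching event exists, else 0."""
    return int(any(
        e["type"] == event_type and e.get("brand") == brand
        for e in customer["events"]
    ))

def regression_target(customer, event_type):
    """Regression: a numeric value, here the count of events of one type."""
    return sum(1 for e in customer["events"] if e["type"] == event_type)

def multinomial_target(customer):
    """Multinomial classification: the name of the customer's segment."""
    return customer["segment"]

print(binomial_target(customer, "purchase", "Acme"))
print(regression_target(customer, "session_start"))
print(multinomial_target(customer))
```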

Eligible customer filter – Choose which customers you want to train the model on. You can select customers having some property or customers that had some series of events in the past. The eligible filter must be a superset of the target customer filter. If left empty, the model will be trained on all customers. Use case – if you want to train a model for opening emails based on one of your campaigns, set the eligible filter to people who received that campaign and the target filter to the attribute ‘opened’.
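The superset requirement can be pictured with plain sets of customer IDs. In this hedged Python sketch the two sets and the function name are made up for illustration; the function checks that every target customer is also eligible and returns the resulting target rate.

```python
def target_rate(eligible_ids, target_ids):
    """Return the share of eligible customers that reached the target.

    Raises ValueError if the target set is not contained in the
    eligible set, mirroring the superset rule described above.
    """
    outside = target_ids - eligible_ids
    if outside:
        raise ValueError(f"target customers not eligible: {sorted(outside)}")
    if not eligible_ids:
        raise ValueError("no eligible customers")
    return len(target_ids) / len(eligible_ids)

# Hypothetical example: four customers received the campaign, two opened it.
received_campaign = {"c1", "c2", "c3", "c4"}
opened_email = {"c2", "c4"}
print(target_rate(received_campaign, opened_email))
```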

Date filter – Select the dates that will be considered when computing predictions. Only events in the selected time frame will influence the model. This range should come before the date range of the target filter.

Customer attributes – Select specific attributes you want to base the predictions on, or select only those you definitely don’t want in your training set and we will choose the best out of the rest.

Event filter – Select specific events you want to base the predictions on, or select only those you definitely don’t want in your training set and we will choose the best out of the rest.

In session – This option allows you to better predict customer behaviour while customers are on your site. Select it if you want to predict whether a customer will achieve a goal during the current visit. If you want to use the prediction for campaigns or other “offline” scenarios, please turn this feature off. The feature “follows” each customer in the training dataset and generates a new learning entry on nearly every tracked event. How often an entry is generated is set in the attribute “in session chance”. If you set it to 100%, every event will generate a new entry. This is recommended only if you have a small dataset (<10K users). With large datasets (>1000K users), please use a value between 10% and 50%.
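One way to picture “in session chance” is as an independent coin flip per event. The Python sketch below assumes exactly that; the function name and the seeded random generator are illustrative, and the real sampling scheme may differ.

```python
import random

def in_session_entries(event_times, in_session_chance, rng):
    """Return the timestamps at which a training entry is generated.

    Assumption: each event independently produces an entry with
    probability in_session_chance (0.0 to 1.0).
    """
    return [t for t in event_times if rng.random() < in_session_chance]

event_times = list(range(0, 1000, 10))  # 100 hypothetical events
rng = random.Random(42)                 # seeded for repeatable runs

all_entries = in_session_entries(event_times, 1.0, rng)   # 100% chance
some_entries = in_session_entries(event_times, 0.3, rng)  # 30% chance
print(len(all_entries), "entries at 100%")
print(len(some_entries), "entries at 30%")
```

Lowering the chance shrinks the training set roughly in proportion, which is why a lower value is suggested for large datasets.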

In session time range – Select time range of events immediately relevant to the target. When training the algorithm, some events occur too early to have any effect on the target event so we want to exclude their contribution. Conversely, we don’t want to exclude possibly relevant events by setting the time range too short. The default value is 20 minutes (1200s) and usually produces good results.
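As a rough sketch of what the time range does, the Python function below keeps only events that fall within the chosen number of seconds before a reference moment. The event tuples, the function name and the decision to exclude the event at the reference moment itself are assumptions made for this example.

```python
def events_in_range(events, reference_time, time_range=1200):
    """Keep events in the time_range seconds before reference_time.

    events is a list of (unix_timestamp, event_name) tuples. The event
    occurring exactly at reference_time is excluded (an assumption).
    """
    earliest = reference_time - time_range
    return [(t, name) for t, name in events if earliest <= t < reference_time]

# Hypothetical session, timestamps in seconds.
session = [
    (0, "session_start"),
    (900, "view_item"),
    (1700, "add_to_cart"),
    (2000, "purchase"),
]
print(events_in_range(session, reference_time=2000))
```

With the default of 1200 seconds, the session start at second 0 is considered too old to matter for the purchase at second 2000.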

Target offset – Select how long before the target event to stop taking events into account. E.g. if you predict purchase, there will usually be events just before it, such as add to cart or checkout. These events are in most cases directly connected to the purchase, so the decision tree would look like “if you have a checkout, you have a purchase, prediction done”. That’s why you need to select a time frame or a funnel in which the events before the target event are ignored. There are two ways to do this: by time (seconds) or with an event funnel.

  • Time (seconds) – Only applies to positive data. A good value is usually somewhere between a few minutes (60s – 1200s) and a day (60 * 60 * 24 = 86400s). When “In Session” is set, the target time offset should be set to a very low number and can even be 0.
  • Event Funnel – Select an event or a series of events at which the preprocessing should stop taking events into account. E.g. for predicting purchase, select the “add to cart” event, and only events that happened before add to cart will be considered.
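Both variants of the target offset can be sketched as simple filters over a customer's events. The Python below is an illustration only: the function names are made up, and cutting at the first occurrence of the funnel event is an assumption, since the guide does not say which occurrence is used.

```python
def apply_time_offset(events, target_time, offset_seconds):
    """Keep only events older than offset_seconds before the target."""
    cutoff = target_time - offset_seconds
    return [(t, name) for t, name in events if t < cutoff]

def apply_event_funnel(events, funnel_event):
    """Keep only events before the first occurrence of funnel_event."""
    kept = []
    for timestamp, name in sorted(events):
        if name == funnel_event:
            break
        kept.append((timestamp, name))
    return kept

# Hypothetical journey ending in a purchase at second 600.
journey = [
    (0, "session_start"),
    (100, "view_item"),
    (500, "add_to_cart"),
    (560, "checkout"),
    (600, "purchase"),
]
print(apply_time_offset(journey, target_time=600, offset_seconds=60))
print(apply_event_funnel(journey, funnel_event="add_to_cart"))
```

In both cases the checkout event, which would trivially give away the purchase, never reaches the training data.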

Floating time window – When training the algorithm, the features for each customer are calculated only on events within the selected date range. If a customer had the target event at the beginning of the date range, only the few events that happened before the target event are included. If the floating time window is enabled, each customer’s features are instead calculated on all of their events in a window that starts at (target timestamp – length of the original date range) and ends at the target timestamp.
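The difference between a fixed and a floating window is easiest to see with numbers. The Python sketch below uses day numbers instead of real timestamps; the function name and the handling of customers without a target event (they keep the original date range) are assumptions.

```python
def feature_window(date_range, target_time=None, floating=False):
    """Return the (start, end) window used to compute a customer's features.

    date_range is the (start, end) of the date filter. Without a target
    event the whole date range is used (an assumption for this sketch).
    """
    start, end = date_range
    if target_time is None:
        return (start, end)
    if floating:
        length = end - start
        return (target_time - length, target_time)
    # Fixed window: only events inside the date range and before the target.
    return (start, min(target_time, end))

date_range = (0, 30)  # a 30-day date filter, in day numbers
# A customer who reached the target on day 2 of the range:
print(feature_window(date_range, target_time=2))                 # fixed
print(feature_window(date_range, target_time=2, floating=True))  # floating
```

With the fixed window this customer contributes only two days of history; with the floating window the same customer contributes a full 30 days ending at the target.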

Dataset balancing – use when your dataset is highly unbalanced. If the target rate is less than 20%, we will automatically rebalance the dataset by omitting random negative samples.
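Omitting random negative samples is commonly called undersampling. The Python sketch below shows the general technique; the 20% threshold comes from the description above, while the function name and the choice to stop at four negatives per positive (a 20% target rate after balancing) are assumptions.

```python
import random

def rebalance(rows, rng, min_target_rate=0.2, negatives_per_positive=4):
    """Undersample negatives when positives are rarer than min_target_rate.

    rows is a list of (features, label) pairs with label 0 or 1.
    negatives_per_positive=4 yields a 20% target rate (an assumption).
    """
    positives = [row for row in rows if row[1] == 1]
    negatives = [row for row in rows if row[1] == 0]
    if not positives or len(positives) / len(rows) >= min_target_rate:
        return list(rows)  # already balanced enough, or nothing to balance
    keep = min(len(negatives), len(positives) * negatives_per_positive)
    return positives + rng.sample(negatives, keep)

# Hypothetical dataset: 5 positives and 95 negatives (5% target rate).
rows = [({"id": i}, 1) for i in range(5)] + [({"id": i}, 0) for i in range(5, 100)]
balanced = rebalance(rows, random.Random(0))
print(len(balanced), "rows after balancing")
```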

Prediction change over time – creates a model which returns change in prediction chance for a customer over a specified time instead of its actual value. Can be used as a trigger for campaigns or weblayers.

Algorithm settings – Set which algorithm should be used and its properties.

Max depth – Set the maximum depth of a tree. A value between 5 and 10 usually works best.

Min instances per node – Set the minimal number of instances per node in the decision tree / random forest. Depending on the dataset, it should be set somewhere between 10 and a few hundred.

Algorithm type – Select the algorithm you want to use. The decision tree is the simplest algorithm and also the easiest for a human to understand. Random forests are more accurate in general, but require more computational power. Also, random forests are only in beta right now, so we don’t guarantee that you will be able to use their output in other parts of Exponea.

Use cases for custom predictions

In session – predicting what happens while the customers are on the site!

One of the most frequent and useful predictions is the In Session prediction. It calculates, for every person currently browsing your site, the chance of reaching an immediate goal. With this prediction, you can tell whether a customer is about to buy something in this session or is just looking around. In the first case you can try to increase the amount of goods they buy; in the second, you can try to persuade them to actually buy something.

In order to use this type of prediction, select custom predictions (or in session prediction in future) and set “In Session” to true. You can then change all the other parameters as you wish.

Use case: Let’s predict whether a customer will buy something and show him banners depending on the chance he is going to buy something.

  1. Create a new prediction. Choose a name by which you can identify this prediction easily
  2. Set target customer filter to purchase
  3. Since we want to ensure that only people that have been active in this time period will go into the training set, set eligible customer filter to people who had at least one session
  4. Set date filter to last month
  5. Select on which events to train the model. In our case, we will leave this to automatic feature selection, so we turn feature selection on and now we can just select events that we know we don’t want to use in the dataset.
  6. We will do the same with customer attributes here. It is extremely important to exclude any attributes directly related to purchase. For example exclude attributes like “pick up point” (or similar) which are only set once the customer has bought something. Including such attributes will create simplistic rules with no prediction power. If you are not sure about this, only stick to neutral attributes like “age”, “sex” and “location”
  7. Set In Session to true
  8. Depending on the dataset size, set in session chance to anything between 0.2 (large dataset) and 0.8 (small dataset)
  9. Set in session time range to 3600 seconds (1 hour), as the hour before a purchase is usually the most relevant. This highly depends on your data, so it’s best to know how long a typical customer journey on your website takes and use that or a slightly higher number
  10. Set target time offset to 60 seconds. That is enough to filter out events that are directly connected to purchase (like “checkout” or “order”)
  11. Set floating to true; this gives us more data for each customer
  12. Since the In Session feature generates a lot more instances for positive data, set dataset balancing to false (as long as you don’t have more positive than negative instances)
  13. For the algorithm settings, you can keep the defaults: decision tree with max depth 5 and min instances per node 50
  14. Save and run prediction
  15. Open your desired banner settings and set the condition to having the prediction value for example larger than 0.5. This way every person who we think has at least 50% chance of buying during this session will see an extra banner offering him some discount if he buys more (or anything else you want).

Predictions for use in new campaigns.

Let’s say we want to send an SMS campaign to our customers. Since SMS campaigns are rather costly, we want to send it only to people who are likely to buy something from us in the next month. In order to do this, we have to use the prediction module.

Now we have two options. If we don’t want to get too deep into data science, we can just use the default event prediction, set the time period to two months and the target to purchase. Alternatively, we can use the custom prediction template, which is more advanced, but with the help of this guide you should be able to set everything to the best value for your prediction!

So, let’s begin.

  1. Create a new prediction. Choose a name by which you can identify this prediction easily
  2. Set target customer filter to purchase
  3. Since we want to ensure that only people that have been active in this time period will go into the training set, set eligible customer filter to people who had at least one session
  4. Set date filter to last month
  5. Select on which events to train the model. In our case, we will leave this to automatic feature selection, so we turn feature selection on and now we can just select events that we know we don’t want to use in the dataset.
  6. We will do the same with customer attributes here. It is extremely important to exclude any attributes directly related to purchase. For example exclude attributes like “pick up point” (or similar) which are only set once the customer has bought something. Including such attributes will create simplistic rules with no prediction power. If you are not sure about this, only stick to neutral attributes like “age”, “sex” and “location”
  7. Set In Session to false, since we want to predict for all customers in general and not only for customers currently active on the site
  8. Set target time offset to 24 hours (86400 seconds). That’s because we want to exclude the last day before purchase, since we don’t want that day to impact the target rate
  9. Set floating to true; this gives us more data for each customer
  10. For the algorithm settings, you can keep the defaults: decision tree with max depth 5 and min instances per node 50
  11. Save and run prediction
  12. Open your desired campaign settings and set the condition to having the prediction value for example larger than 0.5. This way every person who we think has at least a 50% chance of buying will be sent an SMS
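The condition in the last step amounts to a simple threshold on the stored prediction value. As a hedged Python illustration (the prediction values and the function name are made up, and in practice the condition is set in the campaign settings rather than in code):

```python
def sms_audience(predictions, threshold=0.5):
    """Return the customers whose predicted chance is larger than threshold."""
    return sorted(
        customer for customer, chance in predictions.items() if chance > threshold
    )

# Hypothetical predicted purchase chances per customer.
predictions = {"c1": 0.82, "c2": 0.35, "c3": 0.5, "c4": 0.61}
print(sms_audience(predictions))
```

Raising the threshold shrinks the audience and the SMS cost; lowering it reaches more customers who are less likely to buy.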

Predictions for campaign retargeting

If we have a campaign that we used before and now want to use the same or a similar campaign for new customers, we can train a prediction based on old customers that were targeted by the first campaign.

You can even combine several predictions like this, one per campaign, and send each customer only the campaign they are most likely to open.

  1. Create a new prediction. Choose a name by which you can identify this prediction easily
  2. Setting target customer filter is going to be slightly difficult. We need to set a more complex filter by combining two conditions. The target event will be campaign with attribute name (or id) set to the desired campaign and the same event must also have property “status” set to opened # TODO: add picture
  3. Since we want to ensure that only people that have been targeted by the first campaign will go into the training set, set eligible customer filter to people who had a campaign event with the campaign id of the last campaign
  4. Set the date filter to start a month before the campaign and end just after the campaign
  5. Select on which events to train the model. In our case, we will leave this to automatic feature selection, so we turn feature selection on and now we can just select events that we know we don’t want to use in the dataset.
  6. We will do the same with customer attributes here. It is extremely important to exclude any attributes directly related to purchase. For example exclude attributes like “pick up point” (or similar) which are only set once the customer has bought something. Including such attributes will create simplistic rules with no prediction power. If you are not sure about this, only stick to neutral attributes like “age”, “sex” and “location”
  7. Set In Session to false, since we want to predict for all customers in general and not only for customers currently active on the site
  8. Set target time offset to 0 seconds. That’s because a campaign event doesn’t have any events attached to it the way a purchase has, so we want to train on all the data we have.
  9. Set floating to true; this gives us more data for each customer
  10. For the algorithm settings, you can keep the defaults: decision tree with max depth 5 and min instances per node 50
  11. Save and run prediction
  12. Open your desired campaign settings and set the condition to having the prediction value for example larger than 0.5. This way every person who we think has at least 50% chance of opening the email will be sent a new campaign
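When several such predictions exist, one per campaign, choosing what to send to a customer comes down to picking the campaign with the highest predicted open chance. A minimal Python sketch, with made-up campaign names and prediction values:

```python
def best_campaign(open_chances):
    """Return the campaign with the highest predicted open chance."""
    return max(open_chances, key=open_chances.get)

# Hypothetical predictions: customer -> {campaign name: open chance}
predicted = {
    "c1": {"summer_sale": 0.40, "newsletter": 0.70},
    "c2": {"summer_sale": 0.55, "newsletter": 0.20},
}
for customer, chances in sorted(predicted.items()):
    print(customer, "->", best_campaign(chances))
```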

Custom churn prediction

Predictions include a churn template, but because we wanted to make it as easy to use as possible, it offers only a few options. If these options are not enough for your problem, the custom prediction template might come in handy.

  1. Create a new custom prediction. Choose a name by which you can identify this prediction easily
  2. Setting target customer filter is going to be slightly difficult. Since churn is about predicting if a customer leaves you, we must select a target goal, in our case session start and then negate it. Now, the important part! Set the date filter inside target customer filter to last month. Later we will set the “global” date filter to the month before. This will ensure that features will be generated from the month before the last one and the target will be generated from last month
  3. Since we want to ensure that only people who were active in the first time period (the month before last month) go into the training set, we need to set the eligible customer filter to all people who had a session start in that period. So select session start and set the date filter inside the eligible customer filter to the desired dates. These dates should be the same as the “global” date filter
  4. Set the date filter to the month before last month, the same as in the eligible customer filter (one month earlier than the one in the target filter)
  5. Select on which events to train the model. In our case, we will leave this to automatic feature selection, so we turn feature selection on and now we can just select events that we know we don’t want to use in the dataset
  6. We will do the same with customer attributes here
  7. Set In Session to false, since we want to predict for all customers in general and not only for customers currently active on the site
  8. Set target time offset to 0 seconds. That’s because we have already set precise boundaries for where to get the features and where to get the target, and these boundaries do not overlap
  9. Set floating to false, because otherwise only the features for the one month right before the session start in the target filter would be used
  10. For the algorithm settings, you can keep the defaults: decision tree with max depth 5 and min instances per node 50
  11. Save and run prediction
  12. Now you have a custom churn prediction where you can select any dates you want and any target you need
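The eligible filter, the negated target and the two date ranges from the steps above can be summarised in a short Python sketch. It uses day numbers instead of real dates, and the data layout and function name are assumptions made for illustration.

```python
def churn_labels(session_starts, feature_window, target_window):
    """Label eligible customers: 1 = churned, 0 = stayed.

    session_starts maps customer -> list of day numbers with a session start.
    A customer is eligible if active in feature_window, and churned if
    there is no session start in target_window (the negated target).
    Windows are half-open (start, end) pairs.
    """
    def active(days, window):
        start, end = window
        return any(start <= day < end for day in days)

    labels = {}
    for customer, days in session_starts.items():
        if active(days, feature_window):
            labels[customer] = 0 if active(days, target_window) else 1
    return labels

# Hypothetical data: days 0-29 are the month before last, days 30-59 last month.
session_starts = {"c1": [5, 40], "c2": [10], "c3": [45]}
print(churn_labels(session_starts, feature_window=(0, 30), target_window=(30, 60)))
```

Customer c3 was not active in the feature period and is therefore left out of the training set entirely.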

Best campaign channel prediction

# TODO

Updated on August 23, 2018
