On-the-fly aggregations part 2: What happens next?
07 Feb 2018 | by Chris Roe
Read on as we delve deeper into Apteco’s new on-the-fly aggregation functionality, revealing how it can answer complex questions of conditional probability and more.
In our first post introducing the concept of ‘on-the-fly aggregation’, we looked at how the aggregation of customer data is fast becoming widespread practice for marketers and analysts alike.
Summarising transactional information for a client or customer is a useful tool for providing more simplified analysis. Most typical types of aggregation required to do this can currently be achieved in FastStats® using the RFV wizards. In previous releases, it would have been necessary to create an intermediate Virtual Variable to then use on a data grid or in a selection. The most recent Apteco blog discussed how to use aggregations in expressions, in a way that is both powerful and flexible.
In this piece, we will be looking at a new type of aggregation that will enable you to answer more complex questions, such as those that involve dependent events and conditional courses of customer activity.
Examples of these more complex questions include:
- Who has been to; destination X, then destination Y
- Whose first 3 transactions have all increased in value
Returning a value on the transactional table
An aggregation takes a set of records from a transactional table and groups them into a person record. The aggregated value is then returned at the person table (e.g. lifetime value, last product purchase).
A different approach is to return the value on the transactional table for all transactional records:
In the diagram above, the person has 4 bookings. The ‘Frequency’ of 4 bookings is an attribute of the Person. Each booking has a cost associated with it. The ‘Next Cost’ (or Cost of the Next Booking) is an attribute of each booking record. The final booking doesn’t have a ‘Next Cost’ associated with it.
This gives us the analytical tools to tackle more questions than previously allowed by FastStats®, and to address numerous other sets of problems much more efficiently.
Identifying transactions relative to each other allows us to compare the sequence of transactions a given customer has had. You could be interested in sequences of transactions for the same product; or transactions that have increased or decreased in value.
In special cases with fixed starting transactions (e.g. first or last), we could previously have created virtual variables to define the particular values we wanted to compare before using an expression to combine them in the way we needed for that analysis. The time required to create the virtual variables made this an inefficient process, leading to excess resource being spent on; the creation of virtual variables. If you only wanted to use this analysis once, you would have to create some virtual variables. If they were not removed afterwards, this would have had a knock-on effect in the form of VV refresh times, and the System Explorer becoming bogged down with unnecessary variables.
These new aggregations are computed as required in the calculations. This technique of aggregations is already used in our software to enable drag-offs from transactional analysis results – for example, if you were to select all individuals who have been to Greece, then Australia, then Portugal, with less than 400 days between each transaction.
However, this new development within expressions has opened up the flexibility to answer more general questions regarding this area of analysis.
Example 1
Aim/Task: select people who have had any 3 holidays where they have paid more on each consecutive holiday.
This is an example of a question which we could not have answered previously using FastStats® alone. For a specific booking (e.g. first) we could have created virtual variables to do the calculation – but we couldn’t have done this for any set of 3 ordered transactions.
The resulting expression from the new FastStats® aggregation functionality is shown below. Notice how we have compared the cost of the current booking to the next one, and then compared the next one to the one after that. If both these conditions hold they will return a value of 1, otherwise they will return a value of 0.
The [Cost] is the regular FastStats® variable for this booking record. The other two (shown enclosed by the {} symbols) are aggregations. The definition for one of these is shown below (the other is the same except with 2 instead of 1).
We can then make a selection of people with a value of 1, and look at the relevant transactional detail to check that they are exhibiting the desired behaviour (as shown in the highlighted rows for two example people). Note that these behaviours are at different points within the transactional history for the two highlighted people.
Example 2
Aim: select people who have been on holiday to the same destination for their last three consecutive holidays.
This is an example of a question that we could have previously answered because we have been specific about which particular set of transactions to consider. We would have had to create three virtual variables to represent ‘Last Destination, ‘2nd Last Destination’, ‘3rd Last Destination’ and then use an expression to check their equivalence.
Our approach now is similar, but we also need to use the Rank function to ensure that the destination we are considering is the Last one by Rank and then check the two previous ones are the same.
The screenshot below shows how we can define a Rank aggregation:
Within the expression we would then select bookings with a Rank = 1, and then add this as a clause to the And() expression in the first example.
These two examples simply give a flavour of the opportunities that are opened up by this new development, but in reality, there are myriad possibilities of how you might use these aggregations to unlock new details that lie within your data.
Accurately forecast future purchasing activity, and take the precision of your targeted marketing to the next level. Download: Best Next Offer: Think Ahead with Predictive Analytics