Segment Size Forecasting with “Will it Work?”
The idea for the Hackathon was simple. We all got together on a Wednesday morning and the bravest among us pitched their ideas for great new products. The rest of us jumped on board with those projects that seemed most worthy or fun and we were off.
For our project, we decided to predict the future, or more precisely, one aspect of it: the expected number of users an advertising campaign will target.
An important feature of our application is the ability to target ads to a subset of our total user base, and apply different pricing rules for different types of campaigns or users. In our platform, the combination of targeting and pricing is called a strategy. We define strategy targeting as a composition of multiple filters, called segments, which range from boolean rules like ‘North American but not Canadian visitors’ or ‘users who use tablets’, to more complex user-interest filters like ‘users who are interested in American football’. A strategy combines these segments with AND/OR boolean rules.
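To make the composition concrete, here is a minimal sketch of segments as predicates combined with AND/OR/NOT. Everything in it is an assumption for illustration: the user attributes (`country`, `device`, `interests`), the helper names, and the three-country stand-in for North America are hypothetical and not taken from our platform.

```python
from typing import Any, Callable, Dict

# Assumption: a user is a plain dict of attributes, and a segment is any
# predicate over a user. The real platform's data model is not shown here.
User = Dict[str, Any]
Segment = Callable[[User], bool]


def all_of(*segments: Segment) -> Segment:
    """AND: the user must match every segment."""
    return lambda user: all(seg(user) for seg in segments)


def any_of(*segments: Segment) -> Segment:
    """OR: the user must match at least one segment."""
    return lambda user: any(seg(user) for seg in segments)


def negate(segment: Segment) -> Segment:
    """NOT: the user must not match the segment."""
    return lambda user: not segment(user)


# Hypothetical segments mirroring the examples in the text.
def north_american(user: User) -> bool:
    # Illustrative subset of countries only.
    return user.get("country") in {"US", "CA", "MX"}


def canadian(user: User) -> bool:
    return user.get("country") == "CA"


def uses_tablet(user: User) -> bool:
    return user.get("device") == "tablet"


def likes_american_football(user: User) -> bool:
    return "american_football" in user.get("interests", ())


# 'North American but not Canadian visitors who use tablets and are
# interested in American football'
targeting = all_of(
    north_american, negate(canadian), uses_tablet, likes_american_football
)

users = [
    {"country": "US", "device": "tablet", "interests": {"american_football"}},
    {"country": "CA", "device": "tablet", "interests": {"american_football"}},
    {"country": "US", "device": "phone", "interests": {"american_football"}},
]
matched = [u for u in users if targeting(u)]
```

Only the first user matches: the second is excluded by the ‘not Canadian’ rule and the third by the tablet rule.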
The Will-It-Work Product
One of the challenges we encounter is that strategies are sometimes too broad (too many users) or too narrow (not enough users). If a strategy is too broad, we’re wasting money on users who are not interested in what we’re advertising; if it is too narrow, we won’t be able to reach advertiser goals for the number of impressions shown or the number of users reached.
Such segment sizing issues usually become apparent after a few days of running with the segment in production and checking results in our reports. Will-It-Work aims to overcome this challenge — it will predict how many users are in a newly created, never-run segment based on segments with similar properties.
The approach was to look at past data and proceed as follows:
- If a strategy with the same segments was created before, then we know exactly how many visitors belong to it.
- If no such strategy exists, then we use a similarity model to score existing strategies and extrapolate from them.
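The two cases above can be sketched roughly as follows. This is not the hackathon code: the strategy representation (a frozen set of segment names), the `forecast_size` helper, and in particular the similarity-weighted average used to extrapolate are assumptions chosen to keep the example small.

```python
from typing import Callable, Dict, FrozenSet

# Assumption: a strategy is identified by the set of segment names it uses.
Strategy = FrozenSet[str]


def forecast_size(
    new: Strategy,
    history: Dict[Strategy, int],
    similarity: Callable[[Strategy, Strategy], float],
    top_k: int = 3,
) -> float:
    """Estimate how many users a new strategy will reach."""
    # Case 1: the same combination of segments has run before.
    if new in history:
        return float(history[new])

    # Case 2: score every past strategy and keep the top_k most similar.
    scored = sorted(
        ((similarity(new, past), size) for past, size in history.items()),
        key=lambda pair: pair[0],
        reverse=True,
    )[:top_k]

    total = sum(score for score, _ in scored)
    if total == 0:
        raise ValueError("no similar strategy to extrapolate from")

    # Assumed extrapolation rule: similarity-weighted average of observed sizes.
    return sum(score * size for score, size in scored) / total


# Made-up history: observed sizes for three past strategies.
history = {
    frozenset({"north_american", "tablet"}): 1000,
    frozenset({"north_american", "mobile"}): 3000,
    frozenset({"european", "desktop"}): 500,
}


def shared_segments(a: Strategy, b: Strategy) -> float:
    return float(len(a & b))


exact = forecast_size(
    frozenset({"north_american", "tablet"}), history, shared_segments
)
estimated = forecast_size(
    frozenset({"north_american", "american_football"}), history, shared_segments
)
```

With this made-up history, `exact` is the stored size of 1000, while `estimated` averages the two strategies that share the ‘north_american’ segment and comes out at 2000.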
In the Hackathon we used a very basic similarity model as a proof of concept, where we ranked the similarity of strategies based on the number of shared segments. Using our earlier example of ‘North American but not Canadian visitors who use Tablets and are interested in American Football’, the model will score three positive filters (‘North American; Tablet; American Football’) and one negative filter (‘Not Canadian’).
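A minimal sketch of this shared-filter scoring, and of the ranking it feeds, might look like the following. The signed-filter representation, the function names, the strategy names and every size below are invented for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a filter is (segment name, included?), so 'Not Canadian'
# becomes ("canadian", False).
Filter = Tuple[str, bool]
Strategy = FrozenSet[Filter]


def shared_filter_score(a: Strategy, b: Strategy) -> int:
    """Count filters that appear in both strategies with the same sign."""
    return len(a & b)


def rank_matches(
    new: Strategy, history: Dict[str, Tuple[Strategy, int]]
) -> List[Tuple[str, int, int]]:
    """Return (name, score, observed size) rows, best match first."""
    rows = [
        (name, shared_filter_score(new, strategy), size)
        for name, (strategy, size) in history.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


new_strategy: Strategy = frozenset({
    ("north_american", True),
    ("canadian", False),
    ("tablet", True),
    ("american_football", True),
})

# Hypothetical past strategies with made-up observed sizes.
history = {
    "na_tablet_no_ca": (
        frozenset({("north_american", True), ("canadian", False), ("tablet", True)}),
        120_000,
    ),
    "na_football": (
        frozenset({("north_american", True), ("american_football", True)}),
        900_000,
    ),
    "ca_tablet": (
        frozenset({("canadian", True), ("tablet", True)}),
        40_000,
    ),
}

ranking = rank_matches(new_strategy, history)
```

Here `ranking` lists `na_tablet_no_ca` first with a score of 3, then `na_football` with 2, then `ca_tablet` with 1. The last one shares only the tablet filter, because including Canadians is not the same filter as excluding them.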
Will-It-Work then ranks all existing strategies in relation to the new one, and presents the best matches to the user alongside their similarity score and observed segment size.
Segment forecast, similarity and ranking UI.
Because we store segment size information over time, the user can see the trend in segment size among similar strategies, and can hover over the chart to get detailed information.
Our similarity model is a good start, but it is very simplistic, and consequently similar strategies can vary greatly in their predictions, as is evident in the graph. This is because some filters are simply bigger than others. Changing the filter ‘North-American’ to ‘Micronesian’ will probably have a greater effect on the number of visitors than changing the filter from ‘Tablet’ to ‘Mobile’, although both changes would score the same in our model, as each is one filter away.
Another issue with the model is that it treats the two segments ‘mobile and iOS’ and ‘iPhones’ as completely different, whereas in practice they are nearly the same. Ideally, we could use a data-informed approach to ranking similarity based not just on the sizes of segments, but on the actual users present in each. That would be a great data science project, perhaps for a future hackathon…
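As one possible direction (not something we built), segment similarity could be measured by the overlap of the users in each segment, for example with the Jaccard index. The user IDs below are made up, and the choice of Jaccard is an assumption rather than a decision we have made.

```python
from typing import Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Share of users common to both segments, out of all users in either."""
    if not a and not b:
        return 1.0  # two empty segments are treated as identical
    return len(a & b) / len(a | b)


# Made-up user IDs per segment.
mobile_and_ios = {1, 2, 3, 4, 5}
iphones = {1, 2, 3, 4}
tablets = {6, 7, 8}

near_duplicates = jaccard(mobile_and_ios, iphones)  # 4 shared of 5 total = 0.8
unrelated = jaccard(mobile_and_ios, tablets)        # no shared users = 0.0
```

Under this measure, ‘mobile and iOS’ and ‘iPhones’ come out as close neighbours even though they share no filter names.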