One of the important factors that affects efficiency of our predictive models is the recency of the model. The earlier our bidders get new version of prediction model, the better decisions they can make. Delays in producing the model result in lost money due to incorrect predictions.
The slowest steps in our modeling pipeline are those that require manipulating the full data set — multiple weeks worth of data. Our sampling process has historically required two full passes over the data set, and so was an obvious target for optimization.