Real-time Ad Targeting with Apache Kafka
Here at Magnetic, as a search-retargeting company, our core business model is to provide relevant ads to viewers. Our platform performs this task well, matching viewers up with related ads through various methods including page visits, search queries, and data analytics of each. It currently takes about 15 minutes on average for us to be able to react to new events in our core targeting infrastructure. If we could reduce this time, we could make our engineers, product management, ad operations, and our CEO really happy.
Because the average time a user stays on a web page is less than half a minute, we set our definition of ‘real-time’ to 10-20 seconds. That is, if our platform can ingest a viewer’s data and provide a relevant ad within 20 seconds, we’d consider that a success.
A little background
Like all good engineers do, we laid out a plan for how we would accomplish this task in the allotted time frame of 48 hours (realistically, it was closer to 2 work days). James pointed out that the way we transfer data between systems in our platform (log uploads and downloads via S3) was inefficient and crippled performance. Here’s a diagram of what James was talking about:
A glimpse of Magnetic’s targeting platform
I’ll try to keep it short: The entry point to our retargeting platform is the tracking server, which is responsible for logging all the events from page visits and searches. These logs are uploaded to S3 every 10 minutes, and then downloaded to our ingest service. Ingest is where we take apart the logs line-by-line for data analysis and insert the results into HBase. Ingest also outputs a log file denoting the updated cookie-rows in HBase, so our Targeting server can query HBase for those specific cookies. Targeting then pulls in the cookie-row data along with the advertising campaigns from our internal UI, associates the creatives of each campaign with every new cookie, and writes the result into Kyoto Tycoon, a memcached-style key-value store.
You can probably see where we decided to optimize our platform: the periodic log uploads could be replaced by a message queue.
Put another way, we thought each service comprising our platform was very fast; the services were simply held up by having to talk to each other through letter mail. This was the most obvious part of the system to optimize, and if we could remove these inefficiencies, we thought, we would be able to eliminate almost all of the 15 minutes.
Our choice of Kafka was an easy one: it has proven to be relatively quick, it was easy to set up, and it had a Python library (our entire platform is written in Python). All of these are advantages in a hackathon situation, where you need to get to a minimum viable product especially fast. James was able to set up one of our AWS instances as a Kafka broker and demonstrate that it was working within 10 minutes. Awesome. After more work to prove that the Python library was also working (it was), it was time to hack!
Our Tracking service was a natural fit for Kafka because it was already outputting formatted data in a log file. Instead of writing to a log file, we just had to send this data to our Kafka broker.
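As a rough illustration of that change (not the actual hackathon code), the sketch below serializes one tracking event and hands it to a producer instead of appending it to a log file. The topic name, the event fields, and the JSON wire format are all hypothetical. The `make_producer` helper assumes the current kafka-python `KafkaProducer` interface, which may differ from the library version we used at the time.

```python
import json
import time

TRACKING_TOPIC = "tracking-events"  # hypothetical topic name


def encode_event(cookie_id, event_type, detail, now=None):
    """Serialize one tracking event as UTF-8 JSON bytes (an assumed wire format)."""
    event = {
        "cookie_id": cookie_id,
        "type": event_type,  # e.g. "page_visit" or "search"
        "detail": detail,
        "logged_at": time.time() if now is None else now,
    }
    return json.dumps(event, sort_keys=True).encode("utf-8")


def publish_event(producer, cookie_id, event_type, detail, now=None):
    """Send the event to the broker instead of writing it to a log file."""
    message = encode_event(cookie_id, event_type, detail, now)
    producer.send(TRACKING_TOPIC, message)
    return message


def make_producer(bootstrap_servers="localhost:9092"):
    """Build a real producer; imported lazily so the sketch loads without Kafka."""
    from kafka import KafkaProducer  # provided by the kafka-python package
    return KafkaProducer(bootstrap_servers=bootstrap_servers)
```

Because `publish_event` only needs an object with a `send(topic, value)` method, the same function can be exercised with a stub producer in tests and with a real broker connection in the running service.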
Ingest, the consumer of tracking data and our data-cleanser (among other things), was also a natural fit for Kafka. Ingest runs continuously, reading log files from S3, processing the logs, and inserting structured data into HBase. So instead of reading log files, we set up Ingest to act as a Kafka consumer, reading from the Kafka topic we set up for our Tracking service. In addition, instead of writing to HBase, we had Ingest emit data into a Kafka topic so that our Targeting service could see the viewer data.
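The consume, clean, and re-emit pattern described here can be sketched as follows. This is a simplified stand-in, not our Ingest code: the topic name, the JSON format, and the cleaning rules are assumptions made for illustration. `messages` is any iterable of objects with a `.value` attribute holding bytes, which is the shape of the records a kafka-python `KafkaConsumer` yields when iterated.

```python
import json

UPDATES_TOPIC = "cookie-updates"  # hypothetical topic read by Targeting


def clean_event(raw_bytes):
    """Parse one tracking message; return None if it is malformed."""
    try:
        event = json.loads(raw_bytes.decode("utf-8"))
    except ValueError:  # covers both bad UTF-8 and bad JSON
        return None
    if not isinstance(event, dict) or "cookie_id" not in event:
        return None
    return {
        "cookie_id": event["cookie_id"],
        "type": event.get("type", "unknown"),
        "detail": event.get("detail", ""),
    }


def run_ingest(messages, producer):
    """Clean each incoming message and forward it to the updates topic."""
    forwarded = 0
    for message in messages:
        update = clean_event(message.value)
        if update is None:
            continue  # skip lines we cannot parse
        payload = json.dumps(update, sort_keys=True).encode("utf-8")
        producer.send(UPDATES_TOPIC, payload)
        forwarded += 1
    return forwarded
```

With a real consumer the loop blocks waiting for new messages, so Ingest keeps running continuously just as it did when polling S3, only without the upload interval in between.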
Targeting runs as a cron service that scans Ingest logs to extract the cookies that have just been written to HBase. (There are actually two ways that Compile runs; for simplicity, we’ll cover only the one we modified.) So instead of running it as a cron task, we had to make it run as a daemon that listens to the Kafka topic continuously and ‘compiles’ viewer data with creatives. This was rather straightforward after having adapted Ingest to listen to Kafka.
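A minimal sketch of the cron-to-daemon change, under several assumptions: the keyword matching rule is invented purely for illustration (our real campaign matching is more involved), the campaign structure is hypothetical, and `store` is any dict-like object standing in for the key-value store client, which is not shown.

```python
import json


def compile_cookie(update, campaigns):
    """Pick creatives whose keywords appear in the event (hypothetical rule)."""
    detail = update.get("detail", "").lower()
    return sorted(
        campaign["creative"]
        for campaign in campaigns
        if any(keyword in detail for keyword in campaign["keywords"])
    )


def run_targeting(messages, campaigns, store):
    """Process cookie updates as they arrive rather than on a cron schedule."""
    compiled = 0
    for message in messages:  # blocks on a live consumer, so this acts as a daemon loop
        update = json.loads(message.value.decode("utf-8"))
        store[update["cookie_id"]] = compile_cookie(update, campaigns)
        compiled += 1
    return compiled
```

The only structural difference from a cron job is that the loop never has to go looking for new work: when `messages` is a live Kafka consumer, each update is compiled as soon as it is published.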
We were able to cut down the start-to-finish time from 15 minutes to about 15 seconds. In fact, it took only about 1-2 seconds for a viewer to propagate through our first two services, Tracking and Ingest. The majority of those 15 seconds* was taken up by the relatively difficult task of finding which campaigns are relevant for the user. It was a huge success!
*The 15 seconds were measured on our comparatively wimpy AWS EC2 QA instance, where our QA Compile, HBase and Kyoto Tycoon services also live. Production is likely to be much faster.
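One way to obtain per-stage numbers like these is to stamp each event as it passes through a service and compare the stamps at the end. The sketch below shows that idea only; it is an assumption about how such timings could be collected, not a description of how we measured ours, and the stage names and the `stamps` field are hypothetical.

```python
import time

STAGES = ["tracking", "ingest", "targeting"]  # hypothetical stage names


def stamp(event, stage, clock=time.time):
    """Return a copy of the event with the current time recorded for a stage."""
    stamps = dict(event.get("stamps", {}))
    stamps[stage] = clock()
    return dict(event, stamps=stamps)


def stage_latencies(event, stages=STAGES):
    """Seconds spent between each pair of consecutive stages."""
    stamps = event["stamps"]
    return {
        f"{earlier}->{later}": stamps[later] - stamps[earlier]
        for earlier, later in zip(stages, stages[1:])
    }


def total_latency(event, stages=STAGES):
    """Seconds from the first stage's stamp to the last stage's stamp."""
    stamps = event["stamps"]
    return stamps[stages[-1]] - stamps[stages[0]]
```

Note that comparing wall-clock stamps taken on different machines is only as accurate as the clock synchronization between them.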
In hindsight, any of the many message queue options, such as RabbitMQ, ZeroMQ, and ActiveMQ, could also have worked well enough for our hackathon presentation. We used Kafka because we were familiar with brokered message queue systems and because it was pretty quick. In a production environment, however, we would have to be more judicious in choosing the right message queue.
For one, the Kafka-Python [https://github.com/mumrah/kafka-python] library isn’t mature. Don’t get us wrong, it’s a good library as it is, and it worked well for our hackathon (it’s also the only one of its kind). However, some features are noticeably missing (for example, it is impossible to specify the maximum message size with the MultiProcessConsumer), so we’re looking at spending some time to improve the library and send a pull request when we have something production-ready.
But for our hackathon? It was just what we needed.